There are a variety of different
plots to explore missing data available in the naniar package. This
vignette simply showcases all of the visualisations. If you would like
to know more about the philosophy of the `naniar` package,
you should read `vignette("naniar")`.

A key point to remember with the visualisation tools in
`naniar` is that there is a way to get the data behind each plot
out of the visualisation.
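As a minimal sketch of this idea, and assuming the pairing stated later in this vignette (that `gg_miss_var()` is powered by `miss_var_summary()`), the table behind that plot can be computed directly. The object name `summary_tbl` is only illustrative.

```r
library(naniar)

# The summary table that the gg_miss_var() plot is described as being built on:
# one row per variable, with the number and percentage of missing values.
summary_tbl <- miss_var_summary(airquality)
summary_tbl
```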

One of the first plots that I recommend you start with when you are
first exploring your missing data is the `vis_miss()` plot,
which is re-exported from `visdat`.

This plot provides a specific visualisation of the amount of missing data, showing the location of missing values in black, and giving the percentage of missing values both overall (in the legend) and in each variable.
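A minimal call might look like the following, assuming the built-in `airquality` data used elsewhere in this vignette. The name `p_vis` is only illustrative.

```r
library(naniar)

# vis_miss() is re-exported from visdat, so attaching naniar is enough.
p_vis <- vis_miss(airquality)
p_vis
```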

An upset plot from the `UpSetR` package can be used to
visualise the patterns of missingness, or rather the combinations of
missingness across cases. To see combinations of missingness and
intersections of missingness amongst variables, use the
`gg_miss_upset` function:
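A minimal sketch of such a call, assuming the built-in `airquality` data (the variables `Ozone` and `Solar.R` discussed in this section come from it) and that the `UpSetR` package is installed:

```r
library(naniar)

# Draw the upset plot of missingness combinations with the default settings.
gg_miss_upset(airquality)
```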

This tells us:

- Only Ozone and Solar.R have missing values
- Ozone has the most missing values
- There are 2 cases where both Solar.R and Ozone have missing values together

We can explore this with more complex data, such as riskfactors:
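As a sketch, assuming the `riskfactors` data shipped with `naniar`, the same call with default settings would be:

```r
library(naniar)

# riskfactors has many more variables with missing values than airquality,
# so the default plot shows only a subset of the sets and intersections.
gg_miss_upset(riskfactors)
```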

The default option of `gg_miss_upset` is taken from
`UpSetR::upset`, which is to use up to 5 sets and up to 40
intersections. Here, setting `nsets = 5` means to look at 5
variables and their combinations. The number of combinations, or rather
intersections, is controlled by `nintersects`.
You could, for example, look at all of the variables with missing values
by setting `nsets` using `n_var_miss`:
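One way this could be written, as a sketch that assumes the `riskfactors` data and an illustrative variable name `n_missing_vars`:

```r
library(naniar)

# How many variables contain at least one missing value?
n_missing_vars <- n_var_miss(riskfactors)
n_missing_vars

# Use that count as the number of sets shown in the upset plot.
gg_miss_upset(riskfactors, nsets = n_missing_vars)
```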

`## [1] 24`

If there are 40 intersections, there will be up to 40 combinations of
variables explored. The number of sets and intersections can be changed
by passing arguments `nsets = 10` to look at 10 sets of
variables, and `nintersects = 50` to look at 50
intersections.
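A sketch of passing both arguments together, again assuming the `riskfactors` data:

```r
library(naniar)

# Look at 10 sets of variables and up to 50 intersections.
gg_miss_upset(riskfactors,
              nsets = 10,
              nintersects = 50)
```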

Setting `nintersects` to `NA` will plot all
sets and all intersections.

There are a few different ways to explore different missing data
mechanisms and relationships. One way incorporates the method of
shifting missing values so that they can be visualised on the same axes
as the regular values, and then colours the missing and not missing
points. This is implemented with `geom_miss_point()`.

`geom_miss_point`

```
library(ggplot2)
# using regular geom_point()
ggplot(airquality,
       aes(x = Ozone,
           y = Solar.R)) +
  geom_point()
```

```
## Warning: Removed 42 rows containing missing values or values outside the scale range
## (`geom_point()`).
```
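For comparison, here is a minimal sketch of the same plot with `geom_miss_point()` in place of `geom_point()`, assuming `naniar` is attached. Rows with missing values are not dropped with a warning but shifted onto the axes and coloured as missing. The name `p_miss` is only illustrative.

```r
library(naniar)
library(ggplot2)

# Same aesthetics as the geom_point() example, but missing values are
# shifted onto the plot and coloured instead of being removed.
p_miss <- ggplot(airquality,
                 aes(x = Ozone,
                     y = Solar.R)) +
  geom_miss_point()
p_miss
```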

Here are some functions that provide quick summaries of missingness in
your data. They all start with `gg_miss_` so that they are
easy to remember and tab-complete.

`gg_miss_var`

This plot shows the number of missing values in each variable in a
dataset. It is powered by the `miss_var_summary()` function.
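A minimal sketch, assuming the `airquality` data used earlier and an illustrative name `p_var`:

```r
library(naniar)

# Number of missing values per variable.
p_var <- gg_miss_var(airquality)
p_var
```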

If you wish, you can show the percentage of missing values
instead with `show_pct = TRUE`.
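For instance, a sketch on the `airquality` data (the name `p_var_pct` is only illustrative):

```r
library(naniar)

# Show the percentage of missing values per variable instead of the count.
p_var_pct <- gg_miss_var(airquality, show_pct = TRUE)
p_var_pct
```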

You can also plot the number of missings in a variable grouped by
another variable using the `facet` argument.
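A sketch of this, assuming the `airquality` data with `Month` chosen as the grouping variable purely for illustration:

```r
library(naniar)

# One panel per month, each showing the number of missings per variable.
p_var_facet <- gg_miss_var(airquality, facet = Month)
p_var_facet
```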