Visualisation Gallery

library(brolgar)
library(ggplot2)

brolgar supports two ways of exploring your data: first looking at the raw data, then looking at summaries of it. This vignette shows a variety of techniques built around these two ideas.

Exploring raw data

When you first receive your data, you want to look at as much of the raw data as possible. This section covers a few techniques that make the raw data easier to explore without too much overplotting.

Select a sample of individuals

Sample n random individuals to explore. Note that a small random sample may not be representative of the full data.

For example, we can sample 20 random individuals and then plot them.

wages %>%
  sample_n_keys(size = 20)
#> # A tsibble: 132 x 9 [!]
#> # Key:       id [20]
#>       id ln_wages    xp   ged xp_since_ged black hispanic high_grade
#>    <int>    <dbl> <dbl> <int>        <dbl> <int>    <int>      <int>
#>  1  7173     1.58 0.247     0            0     1        0         10
#>  2  7173     1.96 0.542     0            0     1        0         10
#>  3  7173     1.68 1.41      0            0     1        0         10
#>  4  7173     1.75 1.47      0            0     1        0         10
#>  5  7173     1.48 1.93      0            0     1        0         10
#>  6  9613     1.60 0.375     0            0     0        0         11
#>  7  9613     1.69 1.38      0            0     0        0         11
#>  8  9613     1.48 2.74      0            0     0        0         11
#>  9  9613     1.37 3.68      0            0     0        0         11
#> 10  9613     1.30 4.01      0            0     0        0         11
#> # ℹ 122 more rows
#> # ℹ 1 more variable: unemploy_rate <dbl>

wages %>%
  sample_n_keys(size = 20) %>%
  ggplot(aes(x = xp,
             y = ln_wages,
             group = id)) + 
  geom_line()
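
Because sample_n_keys() draws individuals at random, each run of the code above shows a different set of lines. The following is a minimal sketch of one way to make the sample reproducible, assuming the tsibble package is available for n_keys() (brolgar depends on it). The seed value and the name wages_sample are arbitrary choices made for illustration.

```r
library(brolgar)
library(ggplot2)

# Fix the random number generator so the same 20 individuals are drawn
# each time. The seed value is arbitrary.
set.seed(2019)

# Store the sample so the table and the plot use the same individuals
wages_sample <- sample_n_keys(wages, size = 20)

# Check how many distinct individuals (keys) were drawn
tsibble::n_keys(wages_sample)

ggplot(wages_sample,
       aes(x = xp,
           y = ln_wages,
           group = id)) +
  geom_line()
```

Storing the sample in an object also avoids the situation above, where the printed table and the plot come from two separate random draws.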

Filter only those with a certain number of observations

The number of observations varies across individuals: some have only a few, and some have many. We can filter on this using add_n_obs(), which adds a new column, n_obs, containing the number of observations for each key.

wages %>%
  add_n_obs()
#> # A tsibble: 6,402 x 10 [!]
#> # Key:       id [888]
#>       id    xp n_obs ln_wages   ged xp_since_ged black hispanic high_grade
#>    <int> <dbl> <int>    <dbl> <int>        <dbl> <int>    <int>      <int>
#>  1    31 0.015     8     1.49     1        0.015     0        1          8
#>  2    31 0.715     8     1.43     1        0.715     0        1          8
#>  3    31 1.73      8     1.47     1        1.73      0        1          8
#>  4    31 2.77      8     1.75     1        2.77      0        1          8
#>  5    31 3.93      8     1.93     1        3.93      0        1          8
#>  6    31 4.95      8     1.71     1        4.95      0        1          8
#>  7    31 5.96      8     2.09     1        5.96      0        1          8
#>  8    31 6.98      8     2.13     1        6.98      0        1          8
#>  9    36 0.315    10     1.98     1        0.315     0        0          9
#> 10    36 0.983    10     1.80     1        0.983     0        0          9
#> # ℹ 6,392 more rows
#> # ℹ 1 more variable: unemploy_rate <dbl>

We can then filter the data on the number of observations and combine this with the previous step, sampling from the filtered data with sample_n_keys().

library(dplyr)
wages %>%
  add_n_obs() %>%
  filter(n_obs >= 5) %>%
  sample_n_keys(size = 20) %>%
  ggplot(aes(x = xp,
             y = ln_wages,
             group = id)) + 
  geom_line()
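
The cut-off of 5 observations used above is a judgment call. As a rough sketch of how one might inspect the spread of n_obs before settling on a threshold, the code below reduces the data to one row per individual and summarises it. The object name obs_per_id is hypothetical, and converting to a tibble first is just one way to drop the time index before calling distinct().

```r
library(brolgar)
library(dplyr)

# One row per individual, holding that individual's number of observations
obs_per_id <- wages %>%
  add_n_obs() %>%
  as_tibble() %>%        # drop the tsibble structure before de-duplicating
  distinct(id, n_obs)

# Five-number summary (plus mean) of observations per individual
summary(obs_per_id$n_obs)

# How many individuals would survive the n_obs >= 5 filter
sum(obs_per_id$n_obs >= 5)
```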

Clever facets: facet_strata

brolgar provides some clever facets to help make it easier to explore your data. facet_strata() splits the data into 12 groups by default:

# R evaluates this argument as arithmetic (2019 - 7 - 23 - 1936),
# so the seed actually used is 53
set.seed(2019-07-23-1936)
ggplot(wages,
       aes(x = xp,
           y = ln_wages,
           group = id)) +
  geom_line() +
  facet_strata()

You can control the number of groups with n_strata:

# R evaluates this argument as arithmetic (2019 - 7 - 23 - 1936),
# so the seed actually used is 53
set.seed(2019-07-23-1936)
ggplot(wages,
       aes(x = xp,
           y = ln_wages,
           group = id)) +
  geom_line() +
  facet_strata(n_strata = 6)