This variant of dplyr::count()
adds a column, pct, giving each group's share of the total number of observations, expressed as a proportion between 0 and 1.
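The core idea can be sketched in plain dplyr: count rows per group, then divide each count by the sum of all counts. The function below, `count_pct_sketch()`, is hypothetical and is not the package's implementation. It ignores `na.rm` and `.by` and only illustrates the n / sum(n) step. It uses the built-in `mtcars` data as a stand-in.

```r
library(dplyr)

# Hypothetical sketch, not the package's code: count rows per group with
# dplyr::count(), then divide each count by the total of all counts.
# On grouped input, count() keeps the grouping, so sum(n) is then taken
# within each group instead of over the whole table.
count_pct_sketch <- function(data, ...) {
  data %>%
    count(...) %>%
    mutate(pct = n / sum(n))
}

cyl_shares <- count_pct_sketch(mtcars, cyl)
# cyl_shares$n is 11, 7, 14; cyl_shares$pct is 11/32, 7/32, 14/32
```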
Usage
count_pct(
.data,
...,
na.rm = FALSE,
.by = NULL,
wt = NULL,
sort = FALSE,
.drop = dplyr::group_by_drop_default()
)
Arguments
- ...
  Variables to group by. Will be passed to dplyr::count().
- na.rm
  If TRUE, removes rows with NA values before calculations, so they count toward neither the groups nor the total. See examples.
- .by
  A selection of columns to group by for just this operation, functioning as an alternative to dplyr::group_by(). Percentages will be computed within each group rather than for the grand total. See examples.
- wt
  <data-masking> Frequency weights. Can be NULL or a variable:
  - If NULL (the default), counts the number of rows in each group.
  - If a variable, computes sum(wt) for each group.
- sort
  If TRUE, will show the largest groups at the top.
- .drop
  Handling of factor levels that don't appear in the data, passed on to group_by().
  - For count(): if FALSE, will include counts for empty groups (i.e. for levels of factors that don't exist in the data).
  - For add_count(): deprecated since it can't actually affect the output.
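The `wt`, `sort` and `.drop` descriptions mirror those of dplyr::count(). The sketch below shows their effect on the proportions using dplyr::count() directly, assuming count_pct() forwards these arguments unchanged. The `sales` and `sizes` tables are hypothetical toy data.

```r
library(dplyr)

# Hypothetical toy data: units sold per order.
sales <- tibble(
  product = c("a", "b", "a", "c", "b", "a"),
  units   = c(5, 1, 2, 10, 3, 1)
)

# wt: n becomes sum(units) per product, so pct is each product's share of
# all units sold (22) rather than of all orders (6).
# sort = TRUE puts the largest group first.
by_units <- sales %>%
  count(product, wt = units, sort = TRUE) %>%
  mutate(pct = n / sum(n))
# by_units$product is "c", "a", "b"; by_units$n is 10, 8, 4

# .drop = FALSE keeps factor levels that never occur, with n = 0 and pct = 0.
sizes <- tibble(size = factor(c("S", "M", "S"), levels = c("S", "M", "L")))
all_sizes <- sizes %>%
  count(size, .drop = FALSE) %>%
  mutate(pct = n / sum(n))
# all_sizes$n is 2, 1, 0
```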
Value
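A data frame with columns for the grouping variables, n
(the count of observations in each group), and pct
(each group's share of the total, as a proportion between 0 and 1). When the input is grouped with group_by(), or grouping is set with .by, pct is the share of the group total rather than of the grand total.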
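Because pct is a proportion, multiply it by 100 when a percentage is wanted for display. A small sketch using plain dplyr and base `sprintf()` on the built-in `mtcars` data. The `pct_label` column name is hypothetical.

```r
library(dplyr)

# Proportions per number of gears, then a percentage label for display.
gear_shares <- mtcars %>%
  count(gear) %>%
  mutate(
    pct = n / sum(n),                          # proportion in [0, 1]
    pct_label = sprintf("%.1f%%", 100 * pct)   # e.g. "37.5%"
  )
# gear_shares$pct_label is "46.9%", "37.5%", "15.6%"
```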
Examples
library(dplyr)
## note effect of `na.rm` on percentages
dplyr::starwars %>%
count_pct(gender)
#> # A tibble: 3 × 3
#> gender n pct
#> <chr> <int> <dbl>
#> 1 feminine 17 0.195
#> 2 masculine 66 0.759
#> 3 NA 4 0.0460
dplyr::starwars %>%
count_pct(gender, na.rm = TRUE)
#> # A tibble: 2 × 3
#> gender n pct
#> <chr> <int> <dbl>
#> 1 feminine 17 0.205
#> 2 masculine 66 0.795
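In the two outputs above, the denominator drops from 87 to 83 when `na.rm = TRUE`. This suggests the option behaves like filtering out missing grouping values before counting. A plain-dplyr sketch of that reading follows, which is not necessarily how count_pct() implements it:

```r
library(dplyr)

# Drop rows whose grouping value is missing *before* counting, so the NA
# group contributes to neither the counts nor the denominator.
gender_no_na <- dplyr::starwars %>%
  filter(!is.na(gender)) %>%
  count(gender) %>%
  mutate(pct = n / sum(n))
# n is 17 and 66; pct is 17/83 and 66/83, matching the na.rm = TRUE output
```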
## note effect of grouping on percentages
# no grouping: % of grand total
ggplot2::mpg %>%
count_pct(year, cyl)
#> # A tibble: 7 × 4
#> year cyl n pct
#> <int> <int> <int> <dbl>
#> 1 1999 4 45 0.192
#> 2 1999 6 45 0.192
#> 3 1999 8 27 0.115
#> 4 2008 4 36 0.154
#> 5 2008 5 4 0.0171
#> 6 2008 6 34 0.145
#> 7 2008 8 43 0.184
# grouping with `group_by()`: % of group total, output stays grouped
ggplot2::mpg %>%
dplyr::group_by(year) %>%
count_pct(cyl)
#> # A tibble: 7 × 4
#> # Groups: year [2]
#> year cyl n pct
#> <int> <int> <int> <dbl>
#> 1 1999 4 45 0.385
#> 2 1999 6 45 0.385
#> 3 1999 8 27 0.231
#> 4 2008 4 36 0.308
#> 5 2008 5 4 0.0342
#> 6 2008 6 34 0.291
#> 7 2008 8 43 0.368
# grouping with `.by`: % of group total, output isn't grouped
ggplot2::mpg %>%
count_pct(cyl, .by = year)
#> # A tibble: 7 × 4
#> year cyl n pct
#> <int> <int> <int> <dbl>
#> 1 1999 4 45 0.385
#> 2 1999 6 45 0.385
#> 3 1999 8 27 0.231
#> 4 2008 4 36 0.308
#> 5 2008 5 4 0.0342
#> 6 2008 6 34 0.291
#> 7 2008 8 43 0.368
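Judging from the two grouped outputs above, both grouping styles compute n / sum(n) within each group. They differ only in whether the result stays grouped. The following plain-dplyr sketch reproduces that behaviour. It uses `mtcars` (grouping by transmission type `am`) in place of `ggplot2::mpg`, so ggplot2 is not needed. It also assumes dplyr >= 1.1.0, which added `.by` to `mutate()` and `summarise()`.

```r
library(dplyr)

# .by: per-operation grouping; the result is not grouped.
by_am <- mtcars %>%
  count(am, cyl) %>%
  mutate(pct = n / sum(n), .by = am)

# group_by(): count() keeps the input's grouping, so mutate() works within
# each am group and the result stays grouped by am.
grouped_am <- mtcars %>%
  group_by(am) %>%
  count(cyl) %>%
  mutate(pct = n / sum(n))

# Within each transmission type the proportions sum to 1.
totals <- by_am %>% summarise(total = sum(pct), .by = am)
```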