Wrappers around base::pmin(), base::pmax(), lighthouse::psum(), and
lighthouse::pmean() that accept
tidyselect expressions.
Usage
psum_across(..., na.rm = FALSE)
pmean_across(..., na.rm = FALSE)
pmin_across(..., na.rm = FALSE)
pmax_across(..., na.rm = FALSE)
Arguments
- ...
  <tidy-select> One or more tidyselect expressions that capture numeric and/or logical columns.
- na.rm
  Should missing values (including NaN) be removed?
Details
Lighthouse includes two sets of functions for computing "parallel" or row-wise aggregates:
- psum() and pmean() (which complement base::pmin() and pmax())
- pmin_across(), pmax_across(), psum_across(), and pmean_across()
Both sets of functions differ from base::rowSums() and rowMeans() in that
they:
- work in data-masking contexts (e.g., inside dplyr::mutate()) without needing helpers like dplyr::pick() or dplyr::across().
- accept multiple inputs via `...`.
- return NA when na.rm = TRUE and all values in a row are NA. This mirrors the behavior of base::pmin() and pmax(), but differs from rowSums(), which returns 0 in this situation.
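The all-NA difference described above can be seen with base R alone. The sketch below compares base::rowSums() with base::pmin() on a row that is entirely missing. `psum_sketch()` is a hypothetical stand-in written for this illustration; it only mimics the behavior documented here and is not lighthouse's implementation.

```r
# Three "rows": fully observed, partly missing, and entirely missing.
x <- c(1, NA, NA)
y <- c(2, 5, NA)

# base::rowSums() drops the NAs and reports 0 for the all-NA row.
row_sums <- rowSums(cbind(x, y), na.rm = TRUE)   # 3 5 0

# base::pmin() keeps NA when every value in the row is NA.
par_min <- pmin(x, y, na.rm = TRUE)              # 1 5 NA

# Hypothetical stand-in for the documented psum() behavior:
# sum across inputs, but return NA where no value was observed.
psum_sketch <- function(..., na.rm = FALSE) {
  m <- cbind(...)
  out <- rowSums(m, na.rm = na.rm)
  if (na.rm) out[rowSums(!is.na(m)) == 0] <- NA
  out
}

par_sum <- psum_sketch(x, y, na.rm = TRUE)       # 3 5 NA
```

Returning NA rather than 0 keeps "no data" distinguishable from a true total of zero.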
psum_across() and friends support tidyselect expressions; e.g.,
dat %>%
  mutate(
    IDScrTotal = psum_across(IDScr1:IDScr6),
    SDScrTotal = psum_across(starts_with("SDScr"))
  )
...but must be used inside a data-masking verb like dplyr::mutate(),
group_by(), or filter(), and do not support implicit computations.
Conversely, psum() and friends do not support tidyselect expressions, but
can be used both inside and outside a data-masking context:
# data-masking
dat %>%
  mutate(
    NumColors = psum(Red, Blue, Green)
  )
# non-data-masking
psum(1:10, 6:15, 11:20)
and support "on the fly" or "implicit" computations.
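As a rough illustration of an "implicit" computation, the sketch below assumes the term means passing expressions (here, comparisons) as arguments rather than existing columns. To stay runnable without lighthouse, it uses `psum_sketch()`, a hypothetical base-R stand-in that simply adds its inputs element-wise; the vectors `red`, `blue`, and `green` are made-up data.

```r
# Made-up counts for three colors across three observations.
red   <- c(3, 0, 1)
blue  <- c(0, 0, 2)
green <- c(1, 0, 0)

# Hypothetical stand-in for a parallel sum: element-wise addition of all inputs.
# (No na.rm handling; this is only meant to show the calling style.)
psum_sketch <- function(...) Reduce(`+`, list(...))

# "On the fly": each argument is computed in the call itself,
# so the result counts how many colors are present per observation.
num_colors <- psum_sketch(red > 0, blue > 0, green > 0)   # 2 0 2

# The same calling style already works with the base functions:
largest <- pmax(red * 2, blue + 1, green)                 # 6 1 3
```

Inside dplyr::mutate(), the same call shape would apply to columns of the data frame rather than free-standing vectors.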
Examples
dat <- tibble::tribble(
  ~product,    ~price1, ~price2, ~price3,
  "Product 1", 20,      25,      22,
  "Product 2", NA,      30,      29,
  "Product 3", 15,      NA,      NA,
  "Product 4", NA,      NA,      NA
)
price_cols <- c("price1", "price2", "price3")
dat %>%
  dplyr::mutate(
    min = pmin_across(price1, price2, price3, na.rm = TRUE),
    max = pmax_across(price1:price3, na.rm = TRUE),
    sum = psum_across(starts_with("price"), na.rm = TRUE),
    mean = pmean_across(all_of(price_cols), na.rm = TRUE)
  )
#> # A tibble: 4 × 8
#>   product   price1 price2 price3   min   max   sum  mean
#>   <chr>      <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Product 1     20     25     22    20    25    67  22.3
#> 2 Product 2     NA     30     29    29    30    59  29.5
#> 3 Product 3     15     NA     NA    15    15    15  15
#> 4 Product 4     NA     NA     NA    NA    NA    NA  NA
