Summarize variables based on measurement level

Summarizes each variable selected via the ... argument. The summary depends on each variable's level of measurement:

For nominal variables, returns the n and proportion for each level
For binary variables, returns the n and proportion of TRUE (or 1) values
For continuous variables, returns the mean and standard deviation by default; use .cont_fx to specify alternative summary statistics.
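The three rules above can be sketched in base R for a single vector. This is a minimal illustration, not summary_report()'s implementation: the helper summarize_vector() and its level argument are hypothetical, missing values are simply dropped up front, and cont_fx is assumed to be a named list so the names can label the statistics.

```r
# Hypothetical helper, not summary_report()'s implementation: summarizes one
# vector at a given measurement level ("nom", "bin", or "cont").
summarize_vector <- function(x, level, cont_fx = list(mean = mean, sd = sd)) {
  x <- x[!is.na(x)]  # simplification: missing values are dropped up front
  if (level == "nom") {
    counts <- table(x)  # n for each level (all levels, if x is a factor)
    return(data.frame(Value = names(counts),
                      n = as.vector(counts),
                      prop = as.vector(counts) / length(x),
                      stringsAsFactors = FALSE))
  }
  if (level == "bin") {
    n_true <- sum(x == 1)  # counts TRUE for logicals, 1 for 0/1 numerics
    return(data.frame(Value = if (is.logical(x)) "TRUE" else "1",
                      n = n_true,
                      prop = n_true / length(x),
                      stringsAsFactors = FALSE))
  }
  # "cont": apply each summary function in cont_fx to x
  data.frame(Value = names(cont_fx),
             stat = unname(vapply(cont_fx, function(f) f(x), numeric(1))),
             stringsAsFactors = FALSE)
}

summarize_vector(c("a", "b", "a"), "nom")
summarize_vector(c(TRUE, FALSE, TRUE, TRUE), "bin")
summarize_vector(c(1, 2, 3), "cont")
```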

By default, summary_report() guesses the measurement level of each variable. This can be overridden for all variables using the .default argument, or for selected variables using the nom(), bin(), or cont() measurement wrappers. See "Determining measurement level" below.

Usage

summary_report(
  .data,
  ...,
  .default = c("auto", "nom", "bin", "cont"),
  .drop = TRUE,
  .cont_fx = list(mean, sd),
  .missing_label = NA,
  na.rm = FALSE,
  na.rm.nom = na.rm,
  na.rm.bin = na.rm,
  na.rm.cont = na.rm
)

nom(...)

bin(...)

cont(...)

Arguments

.data: a data frame or data frame extension.
...: <tidy-select> one or more variable names and/or tidyselect expressions. Elements may be wrapped in nom(), bin(), or cont() to force summarizing as nominal, binary, or continuous, respectively; see details.
.default: how to determine measurement level for variables if not specified by a measurement wrapper. "auto" will guess measurement level for each variable, while "nom", "bin", and "cont" will treat all unwrapped variables as nominal, binary, or continuous, respectively.
.drop: if FALSE, frequencies for nominal variables will include counts for empty groups (i.e., factor levels that do not appear in the data).
.cont_fx: a list of the two functions used to summarize continuous variables. Their results are reported in V1 and V2, respectively.
.missing_label: label for missing values in nominal variables.
na.rm: if TRUE, NA values in each variable will be dropped prior to computation.
na.rm.nom, na.rm.bin, na.rm.cont: control NA handling specifically for nominal, binary, or continuous variables. Overrides na.rm for that variable type.
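Because na.rm.nom, na.rm.bin, and na.rm.cont default to na.rm, setting na.rm once applies to every variable type, while any per-type argument can still override it. This relies on R evaluating default arguments lazily inside the function. The sketch below shows the pattern with a hypothetical function, resolve_na_rm(), that only reports the effective settings; it is not part of the package.

```r
# Hypothetical function mirroring the na.rm defaults in summary_report()'s
# signature; it just returns the effective setting for each variable type.
resolve_na_rm <- function(na.rm = FALSE,
                          na.rm.nom = na.rm,
                          na.rm.bin = na.rm,
                          na.rm.cont = na.rm) {
  c(nom = na.rm.nom, bin = na.rm.bin, cont = na.rm.cont)
}

resolve_na_rm()                                 # all FALSE
resolve_na_rm(na.rm = TRUE)                     # all TRUE
resolve_na_rm(na.rm = TRUE, na.rm.nom = FALSE)  # nom FALSE, bin and cont TRUE
```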

Value

A tibble with four columns:

Variable: Variable name
Value:
- For nominal variables, a row for each unique value (including unobserved factor levels if .drop = FALSE).
- For binary variables, either TRUE or 1 (for logical or numeric variables, respectively).
- For continuous variables, the names of the summary statistics specified in .cont_fx.
V1:
- For nominal and binary variables, the number of observations with the value in Value.
- For continuous variables, the value of the first summary statistic.
V2:
- For nominal and binary variables, the proportion of observations with the value in Value.
- For continuous variables, the value of the second summary statistic.
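For a nominal variable, the four-column layout and the effect of .drop can be illustrated in base R. This is a sketch under stated assumptions: nominal_rows() is a hypothetical helper, it returns a plain data frame rather than a tibble, and it ignores missing values (table() excludes NA by default), whereas summary_report() can label them via .missing_label.

```r
# Hypothetical helper: the Variable/Value/V1/V2 rows for one nominal variable.
nominal_rows <- function(x, variable, drop = TRUE) {
  if (drop && is.factor(x)) x <- droplevels(x)  # discard unobserved levels
  counts <- table(x)                     # table() keeps all levels of a factor
  data.frame(Variable = variable,
             Value = names(counts),
             V1 = as.vector(counts),                # n
             V2 = as.vector(counts) / sum(counts),  # proportion
             stringsAsFactors = FALSE)
}

grade <- factor(c("A", "B", "A"), levels = c("A", "B", "C"))
nominal_rows(grade, "grade")                # rows for A and B only
nominal_rows(grade, "grade", drop = FALSE)  # adds C with V1 = 0 and V2 = 0
```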

Determining measurement level

The measurement level for each variable is determined as follows:

Variables wrapped in nom(), bin(), or cont() will be treated as nominal, binary, or continuous, respectively.
Variables without a measurement wrapper will be treated as the type specified in .default.
If .default is "auto", measurement level will be inferred:
- Logical vectors will be treated as binary if there are no missing values or if na.rm.bin = TRUE.
- Character vectors, factors, dates and datetimes, and logical vectors with missing values will be treated as nominal.
- All other variables will be treated as continuous.
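The "auto" rules above can be expressed as a small function. This is a hypothetical sketch, guess_level(), and not the package's own code. Note that, as written, these rules send a numeric vector of 0s and 1s to the "all other variables" case, so such a vector would need bin() to be summarized as binary.

```r
# Hypothetical sketch of the "auto" measurement-level rules; not package code.
guess_level <- function(x, na.rm.bin = FALSE) {
  # Logical vectors are binary if complete, or if NAs will be removed
  if (is.logical(x) && (!anyNA(x) || na.rm.bin)) return("bin")
  # Characters, factors, dates/datetimes, and logicals with NAs are nominal
  if (is.character(x) || is.factor(x) || is.logical(x) ||
      inherits(x, c("Date", "POSIXt"))) return("nom")
  # Everything else (including 0/1 numerics) is continuous
  "cont"
}

guess_level(c(TRUE, FALSE))                    # "bin"
guess_level(c(TRUE, NA))                       # "nom"
guess_level(c(TRUE, NA), na.rm.bin = TRUE)     # "bin"
guess_level(c(0, 1, 1))                        # "cont"
```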

Support for binary variables

To be treated as binary, both of these must be true:

The variable must be either a logical vector, or a binary numeric vector containing only 0s and 1s.
The variable must not include any missing values, or na.rm.bin must be set to TRUE.
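The two requirements above amount to a simple predicate. The function can_be_binary() below is a hypothetical illustration of those rules, not an exported helper of the package.

```r
# Hypothetical check for the two binary requirements listed above.
can_be_binary <- function(x, na.rm.bin = FALSE) {
  values <- x[!is.na(x)]
  # Requirement 1: logical, or numeric containing only 0s and 1s
  is_zero_one <- is.numeric(x) && all(values %in% c(0, 1))
  # Requirement 2: no missing values, unless na.rm.bin = TRUE
  (is.logical(x) || is_zero_one) && (!anyNA(x) || na.rm.bin)
}

can_be_binary(c(0, 1, 1))                   # TRUE
can_be_binary(c(0, 1, 2))                   # FALSE: not only 0s and 1s
can_be_binary(c(TRUE, NA))                  # FALSE: has NA
can_be_binary(c(TRUE, NA), na.rm.bin = TRUE)  # TRUE
```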

Future extensions may allow handling of other dichotomous variables (e.g., "Pregnant" vs. "Not pregnant"), but this is not currently supported. Instead, consider converting these to a logical indicator, e.g., Pregnant = PregnancyStatus == "Pregnant".
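A dichotomous character variable can be converted to a logical indicator with a comparison. The data below are made up for illustration. A missing status stays NA after the comparison, so under the rules described here the indicator would only be treated as binary if na.rm.bin = TRUE (or na.rm = TRUE).

```r
# Illustrative data; PregnancyStatus has one missing value
df <- data.frame(
  PregnancyStatus = c("Pregnant", "Not pregnant", "Pregnant", NA),
  stringsAsFactors = FALSE
)

# TRUE for "Pregnant", FALSE for any other value, NA where status is missing
df$Pregnant <- df$PregnancyStatus == "Pregnant"
df$Pregnant
#> [1]  TRUE FALSE  TRUE    NA
```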
