Max Gordon
2023-08-25
- The basics of getDescriptionStatsBy
- Integration with htmlTable
- Extra everything
- P-values
- Custom p-values
- Using mergeDesc
The purpose of the first table in a medical paper is most often to describe your population. In an RCT the table frequently compares the baseline characteristics between the randomized groups, while an observational study will often compare exposed with unexposed. In this vignette I will show how I use the functions to quickly generate a descriptive table.
The basics of getDescriptionStatsBy
We will use the mtcars dataset and compare the groups with automatic transmission to those without. The units and labels are built upon the logic in the Hmisc package that allows us to specify attributes on columns. Note that this labeling is not needed; it just makes things nicer.

    library(Gmisc)
    library(dplyr) # provides mutate() and the %>% pipe

    data("mtcars")
    mtcars <- mtcars %>%
      mutate(am = factor(am, levels = 0:1, labels = c("Automatic", "Manual")),
             gear = factor(gear),
             # Make up some data for making it slightly more interesting
             col = factor(sample(c("red", "black", "silver"),
                                 size = NROW(mtcars),
                                 replace = TRUE))) %>%
      set_column_labels(mpg = "Gas",
                        wt = "Weight",
                        am = "Transmission",
                        gear = "Gears",
                        col = "Car color") %>%
      set_column_units(mpg = "Miles/(US) gallon",
                       wt = "10<sup>3</sup> lbs")

The function getDescriptionStatsBy is a simple way to do basic descriptive statistics. The only mandatory named argument is by:

    mtcars %>%
      getDescriptionStatsBy(mpg, wt, am, gear, col, by = am)

| | Automatic | Manual |
---|---|---|
Gas | 17.1 (±3.8) | 24.4 (±6.2) |
Weight | 3.8 (±0.8) | 2.4 (±0.6) |
Transmission | 19 (100.0%) | 0 (0.0%) |
Gears | ||
3 | 15 (78.9%) | 0 (0.0%) |
4 | 4 (21.1%) | 8 (61.5%) |
5 | 0 (0.0%) | 5 (38.5%) |
Car color | ||
black | 8 (42.1%) | 3 (23.1%) |
red | 5 (26.3%) | 4 (30.8%) |
silver | 6 (31.6%) | 6 (46.2%) |
If we prefer the median, we can simply specify the statistic used with continuous variables:

    mtcars %>%
      getDescriptionStatsBy(mpg, wt, am, gear, col,
                            by = am,
                            continuous_fn = describeMedian)

| | Automatic | Manual |
---|---|---|
Gas | 17.3 (14.9 - 19.2) | 22.8 (21.0 - 30.4) |
Weight | 3.5 (3.4 - 3.8) | 2.3 (1.9 - 2.8) |
Transmission | 19 (100.0%) | 0 (0.0%) |
Gears | ||
3 | 15 (78.9%) | 0 (0.0%) |
4 | 4 (21.1%) | 8 (61.5%) |
5 | 0 (0.0%) | 5 (38.5%) |
Car color | ||
black | 8 (42.1%) | 3 (23.1%) |
red | 5 (26.3%) | 4 (30.8%) |
silver | 6 (31.6%) | 6 (46.2%) |
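
To see where these numbers come from, the Gas row can be cross-checked with base R. This is only a sketch: it assumes that describeMedian reports the median together with the 25th and 75th percentiles, which is what the table above appears to show. The name cars is introduced here so that the relabelled mtcars is left untouched.

```r
# Work on a fresh copy so the relabelled mtcars from above is not overwritten
cars <- datasets::mtcars

# Split mpg by transmission (am: 0 = automatic, 1 = manual) and compute
# the quartiles with base R's default quantile algorithm (type 7)
med_iqr <- sapply(split(cars$mpg, cars$am),
                  quantile,
                  probs = c(0.25, 0.5, 0.75))

# Rows are the percentiles, columns the two transmission groups
print(round(med_iqr, 1))
```

The medians in the 50% row (17.3 and 22.8) match the Gas row of the table.
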
Integration with htmlTable
Key to having good descriptive statistics is being able to output them into a table. I usually rely on htmlTable for all my table requirements, as it has a nice set of advanced options that allow us to get publication-ready tables that can simply be copy-pasted into our paper. Note that here we use the HTML dagger symbol † that we then explain in the footer. If we name an argument like this, the name overrides the label previously set.

    mtcars %>%
      getDescriptionStatsBy(mpg,
                            `Weight†` = wt,
                            am,
                            gear,
                            col,
                            by = am) %>%
      htmlTable(caption = "Basic descriptive statistics from the mtcars dataset",
                tfoot = "† The weight is in 10<sup>3</sup> lbs")

Basic descriptive statistics from the mtcars dataset

| | Automatic | Manual |
---|---|---|
Gas | 17.1 (±3.8) | 24.4 (±6.2) |
Weight† | 3.8 (±0.8) | 2.4 (±0.6) |
Transmission | 19 (100.0%) | 0 (0.0%) |
Gears | ||
3 | 15 (78.9%) | 0 (0.0%) |
4 | 4 (21.1%) | 8 (61.5%) |
5 | 0 (0.0%) | 5 (38.5%) |
Car color | ||
black | 8 (42.1%) | 3 (23.1%) |
red | 5 (26.3%) | 4 (30.8%) |
silver | 6 (31.6%) | 6 (46.2%) |

† The weight is in 10<sup>3</sup> lbs
Extra everything
There is a large set of options for getDescriptionStatsBy; here is an example with some of them and some extra styling.

    mtcars %>%
      getDescriptionStatsBy(mpg,
                            `Weight†` = wt,
                            am,
                            gear,
                            col,
                            by = am,
                            digits = 0,
                            add_total_col = TRUE,
                            use_units = "name") %>%
      addHtmlTableStyle(pos.caption = "bottom") %>%
      htmlTable(caption = "Basic descriptive statistics from the mtcars dataset",
                tfoot = "† The weight is in 10<sup>3</sup> lbs")

| | Total | Automatic | Manual |
---|---|---|---|
Gas (Miles/(US) gallon) | 20 (±6) | 17 (±4) | 24 (±6) |
Weight† (10<sup>3</sup> lbs) | 3 (±1) | 4 (±1) | 2 (±1) |
Transmission | 19 (59%) | 19 (100%) | 0 (0%) |
Gears | |||
3 | 15 (47%) | 15 (79%) | 0 (0%) |
4 | 12 (38%) | 4 (21%) | 8 (62%) |
5 | 5 (16%) | 0 (0%) | 5 (38%) |
Car color | |||
black | 11 (34%) | 8 (42%) | 3 (23%) |
red | 9 (28%) | 5 (26%) | 4 (31%) |
silver | 12 (38%) | 6 (32%) | 6 (46%) |

Basic descriptive statistics from the mtcars dataset

† The weight is in 10<sup>3</sup> lbs
P-values
Even though p-values are discouraged in Table 1, they are not uncommon. I have therefore added basic statistics that default to Fisher's exact test for proportions and the Wilcoxon rank sum test for continuous values.

    mtcars %>%
      getDescriptionStatsBy(mpg, wt, am, gear, col,
                            by = am,
                            continuous_fn = describeMedian,
                            digits = 0,
                            header_count = TRUE,
                            statistics = TRUE) %>%
      htmlTable(caption = "Basic descriptive statistics from the mtcars dataset")

Basic descriptive statistics from the mtcars dataset

| | Automatic No. 19 | Manual No. 13 | P-value |
---|---|---|---|
Gas | 17 (15 - 19) | 23 (21 - 30) | 0.002 |
Weight | 4 (3 - 4) | 2 (2 - 3) | < 0.0001 |
Transmission | 19 (100%) | 0 (0%) | < 0.0001 |
Gears | < 0.0001 | ||
3 | 15 (79%) | 0 (0%) | |
4 | 4 (21%) | 8 (62%) | |
5 | 0 (0%) | 5 (38%) | |
Car color | 0.60 | ||
black | 8 (42%) | 3 (23%) | |
red | 5 (26%) | 4 (31%) | |
silver | 6 (32%) | 6 (46%) |
Custom p-values
By popular demand I've added the option of having custom p-values. All you need to do is provide a function that takes two arguments (the variable and the grouping vector) and returns a single p-value. There are several prepared functions that you can use directly or as a template for your own p-value function. They all start with getPval, e.g. getPvalKruskal. You can either provide a single function or set the defaults depending on the variable type:

    mtcars %>%
      getDescriptionStatsBy(mpg, wt, am, gear, col,
                            by = am,
                            continuous_fn = describeMedian,
                            digits = 0,
                            header_count = TRUE,
                            statistics = list(continuous = getPvalChiSq,
                                              factor = getPvalChiSq,
                                              proportion = getPvalFisher)) %>%
      addHtmlTableStyle(pos.caption = "bottom") %>%
      htmlTable(caption = "P-values generated from a custom set of values")

| | Automatic No. 19 | Manual No. 13 | P-value |
---|---|---|---|
Gas | 17 (15 - 19) | 23 (21 - 30) | 0.27 |
Weight | 4 (3 - 4) | 2 (2 - 3) | 0.37 |
Transmission | 19 (100%) | 0 (0%) | < 0.0001 |
Gears | < 0.0001 | ||
3 | 15 (79%) | 0 (0%) | |
4 | 4 (21%) | 8 (62%) | |
5 | 0 (0%) | 5 (38%) | |
Car color | 0.52 | ||
black | 8 (42%) | 3 (23%) | |
red | 5 (26%) | 4 (31%) | |
silver | 6 (32%) | 6 (46%) | |

P-values generated from a custom set of values
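
As a sketch of what a hand-written p-value function could look like, the example below defines getPvalTtest, a hypothetical helper that is not part of Gmisc. It follows the same two-argument convention as the prepared getPval functions (the variable first, the grouping vector second) and uses only base R, so it can be tried on its own. The name cars is introduced here so that the relabelled mtcars is left untouched.

```r
# Hypothetical helper, not part of Gmisc: p-value from Welch's two-sample t-test.
# Assumes `x` is numeric and `by` splits it into exactly two groups.
getPvalTtest <- function(x, by) {
  by <- droplevels(factor(by))
  stopifnot(is.numeric(x), nlevels(by) == 2)
  t.test(x ~ by)$p.value
}

# Try it on an untouched copy of the data: mpg by transmission type
cars <- datasets::mtcars
p <- getPvalTtest(cars$mpg, cars$am)
print(p < 0.05)
```

Assuming the convention above holds, such a function could then be supplied as, for instance, statistics = list(continuous = getPvalTtest, factor = getPvalChiSq, proportion = getPvalFisher).
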
Using mergeDesc
Prior to Gmisc v3.0, mergeDesc was the best way to quickly assemble a “Table 1”:

    getTable1Stats <- function(x, digits = 0, ...){
      getDescriptionStatsBy(x = x,
                            by = mtcars$am,
                            digits = digits,
                            continuous_fn = describeMedian,
                            header_count = TRUE,
                            ...)
    }

    t1 <- list()
    t1[["Gas"]] <- getTable1Stats(mtcars$mpg)
    t1[["Weight†"]] <- getTable1Stats(mtcars$wt)
    t1[["Color"]] <- getTable1Stats(mtcars$col)

    # If we want to use the labels set in the beginning
    # we add an element without a name
    t1 <- c(t1, list(getTable1Stats(mtcars$gear)))

    mergeDesc(t1,
              htmlTable_args = list(caption = "Basic descriptive statistics from the mtcars dataset",
                                    tfoot = "† The weight is in 10<sup>3</sup> lbs"))

Basic descriptive statistics from the mtcars dataset

| | Automatic No. 19 | Manual No. 13 |
---|---|---|
Gas | 17 (15 - 19) | 23 (21 - 30) |
Weight† | 4 (3 - 4) | 2 (2 - 3) |
Color | ||
black | 8 (42%) | 3 (23%) |
red | 5 (26%) | 4 (31%) |
silver | 6 (32%) | 6 (46%) |
Gears | ||
3 | 15 (79%) | 0 (0%) |
4 | 4 (21%) | 8 (62%) |
5 | 0 (0%) | 5 (38%) |

† The weight is in 10<sup>3</sup> lbs