How can I create a cross-table by multiple variables in R?

57 views Asked by At

I have a dummy dataset df which contains data on the treatment, disease grade and survival outcome of 50 people.

library(Hmisc)

# Generate initial cases.

set.seed(123)
n <- 50

df <- data.frame(
  treatment = sample(0:1, n, replace = TRUE),
  grade = sample(0:1, n, replace = TRUE),
  death = sample(0:1, n, replace = TRUE)
)

# Add labels to columns.

label(df$treatment) <- "Drug Intervention"
label(df$grade) <- "Cancer Grade"
label(df$death) <- "Death Occurrence"

# Factor the values.

df$treatment <- factor(df$treatment, levels = c(0, 1), labels = c("Drug A", "Drug B"))
df$grade <- factor(df$grade, levels = c(0, 1), labels = c("Grade 1", "Grade 2"))
df$death <- factor(df$death, levels = c(0, 1), labels = c("Survived", "Died"))

I want to generate a 2 x 3 cross table which shows the number of people who survived by grade and treatment. I am using tbl_strata() and tbl_summary() from gtsummary to do this.

This code is getting close to the desired outcome:

library(tidyverse)
library(gtsummary)

df %>% 
  tbl_strata(
    strata = grade,
    ~.x %>%
      tbl_summary(
        by = death,
        percent = "row"
      ))

Which produces a plot that looks like this:

Characteristic Grade 1 Survived, N = 10 Grade 1 Died, N = 17 Grade 2 Survived, N = 9 Grade 2 Died, N = 14
Treatment
Drug A 5 (28%) 13 (72%) 5 (42%) 7 (58%)
Drug B 5 (56%) 4 (44%) 4 (36%) 7 (64%)

However the desired output is:

Treatment Grade 1 Died Grade 2 Died P-Value
Drug A 13 (72%) 7 (58%) 0.46
Drug B 4 (44%) 7 (64%) 0.65

How can I use gtsummary to filter out/collapse the 'Survived` columns to simplify the table, and is it possible to add a Fisher's exact p-value for the relationship between death and grade (for the two separate drugs)?

2

There are 2 answers

3
BigMistake On BEST ANSWER

There are multiple ways to accomplish this. You could filter out individuals who survived since you're only interested in death occurrences, then use tbl_strata() to stratify by treatment rather than grade. Finally, inside tbl_strata(), you can use tbl_summary() to summarize by grade and calculate percentages and use add_p() for your p-values (or whichever method of p-value calculation you prefer).

Here is an example of the structure I am thinking of:

df_filtered <- df %>% 
  filter(death == "Died")
table_output <- df_filtered %>%
  tbl_strata(
    strata = treatment,
    .f = ~ .x %>%
      tbl_summary(
        by = grade,
        missing = "no"
      ) %>%
      add_n() %>% # Add counts
      modify_header(label = "**Grade**") %>% # Modify the header
      add_p(test = list(all_continuous() ~ "fisher.test")) # Add Fisher's exact test p-value
  )
print(table_output)

Another option is to use tbl_strata(strata = treatment) to stratify by treatment, and create separate tables for Drug A and Drug B.

To get your count and percentage format, you can use statistic = list(all_categorical() ~ "{n} ({p}%)").

And you can specify Fisher's exact explicitly as well: add_p(test = all_categorical() ~ "fisher").

This way, your code might look something like this:

df %>%
  tbl_strata(
    strata = treatment,
    ~.x %>%
      tbl_summary(
        by = grade,
        statistic = list(all_categorical() ~ "{n} ({p}%)"),
        missing = "no"
      ) %>%
      add_p(test = all_categorical() ~ "fisher.test")
  )

You can further modify various aspects of formatting like this:

   %>% modify_header(stat_by = "Grade")
   %>% as_gt() %>%  
       tbl_caption(
         caption =  "Captions"
       )
0
jpsmith On

If you're open to other packages besides gtsummary, you can use table1 here:

pvalue <- function(x, ...) {
  y <- unlist(x)
  g <- factor(rep(1:length(x), times=sapply(x, length)))
  p <- chisq.test(table(unlist(x), g))$p.value
  c("", sub("<", "&lt;", format.pval(p, digits = 3, eps = 0.001)))
}

table1::table1(~ treatment | grade, data = df[df$death == "Died",],
               overall = FALSE,
               extra.col = list(`P-value` = pvalue))

enter image description here

Or if you want it to explicitly say, "Death":

table1::table1(~ treatment | grade, data = df[df$death == "Died",],
               overall = FALSE,
               extra.col = list(`P-value` = pvalue),
               caption = "Death")

enter image description here