Q & A for Data Analysis with R

Workshop Date: December 2, 2022

FAQs

Q: Why is this course relevant today?

The skills we teach in this workshop will put you at the forefront of the analytics movement. You will learn the technical skills needed for data manipulation and visualization in R, along with more advanced statistical methods, and you will build basic predictive models. Ultimately, you will build an analytics toolkit and a mindset that transfer directly to the job you have today and the job you want tomorrow.

Q: What is R?

R is one of the most widely used programming languages for analytics and data visualization. It is free and open source, with an active user community that is constantly adding to its library of free packages. R is used every day at companies including Google, Facebook, Uber, and Microsoft.

Q: Do I need to pass an exam to finish the workshop?

No. We’ll monitor your progress along the way and step in if necessary. You’ll end the course by completing a practical assignment that helps you put your newly developed skills into practice.

Q: Is this course the right fit for me?

If you’re not sure this course is the right fit, please visit our website and schedule a call (+8801771083855) with one of our advisors.

Q: What happens after I complete the course?

You will receive a certificate signed by your instructor, sent to you by post. The certificate number lets you add the course to the education section of your LinkedIn profile.

Q: Are there any prerequisites?

No.

Answered Questions

Q: How do I save PDFs of slides?

The website will stay up, so you can access the slides at any time. To print them, open the slides in full-screen view, switch to PDF mode, and print.

Q: How can I see tables in viewer pane?

In RStudio: Tools > Global Options > R Markdown > Show output preview in > Viewer Pane

Q: Link for the code for Dan’s quarto slides?

https://github.com/ddsjoberg/clinical-reporting-gtsummary-rmed/tree/main/slides

Q: Syntax for set_variable_labels()?

See function docs at: https://larmarange.github.io/labelled/reference/var_label.html

Q: How to confirm labels worked?

str(df) or view(df)
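As a minimal sketch of the two answers above (setting labels, then confirming them), the following uses a small made-up data frame. The column names and label text are illustrative assumptions, not taken from the workshop materials.

```r
library(labelled)

# Hypothetical data frame, for illustration only
df <- data.frame(age = c(34, 51, 47), trt = c("A", "B", "A"))

# Assign one label per column, matched by column name
df <- set_variable_labels(df, age = "Age (years)", trt = "Treatment arm")

# Confirm: var_label() returns the labels as a named list
var_label(df)

# Each label is stored as a "label" attribute on its column
attr(df$age, "label")
```

str(df) shows the same attribute under each column.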

Q: What is |>?

It’s the new base R pipe, introduced in R 4.1.0! See https://ivelasq.rbind.io/blog/understanding-the-r-pipe/ for more details
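A small sketch of what the pipe does, using only base R (requires R 4.1.0 or later). The vector x is an arbitrary example.

```r
x <- c(1, 4, 9)

# Nested call: read from the inside out
nested <- sum(sqrt(x))

# Piped call: the left-hand side becomes the first argument
# of the function call on the right-hand side
piped <- x |> sqrt() |> sum()

identical(nested, piped)
```

Both forms compute the same value; the piped form simply reads left to right.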

Q: What statistics can you use in tbl_summary(statistic=) argument? How does the function know "{min}, {max}" is the range?

There is a set of known names for default statistics that you can use in this function. These names (plus glue syntax) let you display specific statistics. See more details under the statistic argument section: https://www.danieldsjoberg.com/gtsummary/reference/tbl_summary.html
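As a hedged sketch of the glue syntax: each {name} placeholder is replaced by the statistic of that name, and any text outside the braces is kept literally. The choice of the age and marker columns is only for illustration.

```r
library(gtsummary)

# Show the range for every continuous variable
tbl_range <- trial |>
  tbl_summary(
    include = c(age, marker),
    statistic = all_continuous() ~ "{min}, {max}",
    missing = "no"
  )

tbl_range
```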

Q: How do you display mean instead of median?

Adjust the statistic argument:

trial %>%
  select(age) %>%
  tbl_summary(statistic = everything() ~ "{mean}")

Characteristic	N = 200¹
Age	47
Unknown	11
¹ Mean

Q: With add_n() can you change the label to specify that it represents non-missing values?

Yes, you can change this with the col_label argument:

trial |>
  select(age, grade, trt) %>%
  tbl_summary(
    by = trt,
    missing = "no"
  ) |> 
  add_overall() |> 
  add_n(col_label = "**N (Non-missing)**")

Characteristic	N (Non-missing)	Overall, N = 200¹	Drug A, N = 98¹	Drug B, N = 102¹
Age	189	47 (38, 57)	46 (37, 59)	48 (39, 56)
Grade	200
I		68 (34%)	35 (36%)	33 (32%)
II		68 (34%)	32 (33%)	36 (35%)
III		64 (32%)	31 (32%)	33 (32%)
¹ Median (IQR); n (%)

Q: How can you remove the death variable inside the tbl_summary() function?

Use dplyr::select() prior to tbl_summary() or use the include = argument in tbl_summary(). Three options below:

sm_trial <- trial |>
  select(age, grade, trt, death)

sm_trial |>
  select(-death) %>%
  tbl_summary(
    by = trt,
    missing = "no"
  ) |> 
  add_overall()

Characteristic	Overall, N = 200¹	Drug A, N = 98¹	Drug B, N = 102¹
Age	47 (38, 57)	46 (37, 59)	48 (39, 56)
Grade
I	68 (34%)	35 (36%)	33 (32%)
II	68 (34%)	32 (33%)	36 (35%)
III	64 (32%)	31 (32%)	33 (32%)
¹ Median (IQR); n (%)

sm_trial |>
  tbl_summary(
    by = trt,
    include = -death,
    missing = "no"
  ) |> 
  add_overall()

Characteristic	Overall, N = 200¹	Drug A, N = 98¹	Drug B, N = 102¹
Age	47 (38, 57)	46 (37, 59)	48 (39, 56)
Grade
I	68 (34%)	35 (36%)	33 (32%)
II	68 (34%)	32 (33%)	36 (35%)
III	64 (32%)	31 (32%)	33 (32%)
¹ Median (IQR); n (%)

sm_trial |>
  tbl_summary(
    by = trt,
    include = !death,
    missing = "no"
  ) |> 
  add_overall()

Characteristic	Overall, N = 200¹	Drug A, N = 98¹	Drug B, N = 102¹
Age	47 (38, 57)	46 (37, 59)	48 (39, 56)
Grade
I	68 (34%)	35 (36%)	33 (32%)
II	68 (34%)	32 (33%)	36 (35%)
III	64 (32%)	31 (32%)	33 (32%)
¹ Median (IQR); n (%)

Q: Is it possible to add custom footnotes, say if you want to explain what a particular category includes etc.?

Yes. Use modify_footnote() to change existing footnotes on column headers, or modify_table_styling() to add new footnotes. See https://www.danieldsjoberg.com/gtsummary/reference/modify_table_styling.html

Q: Does gtsummary support role selection like {recipes}?

Use the selectors all_continuous(), all_categorical(), etc.
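A minimal sketch of selecting by variable type rather than by name. The statistics chosen here are illustrative assumptions.

```r
library(gtsummary)

# One formula per selector; each applies to all variables of that type
tbl_by_type <- trial |>
  tbl_summary(
    include = c(age, grade, response),
    statistic = list(
      all_continuous() ~ "{mean} ({sd})",
      all_categorical() ~ "{n} / {N} ({p}%)"
    )
  )

tbl_by_type
```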

Q: There is an error trying to use add_difference() with add_p()

This is because you are trying to add two p-values to the same table. You could build two separate tables and combine them with tbl_merge(), but presenting both p-values in one table could be confusing.

Q: It would be great to have an “asterisk_p” to show significant p-values, like bold_p()

add_significance_stars() already exists! https://www.danieldsjoberg.com/gtsummary/reference/add_significance_stars.html
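A short sketch of add_significance_stars() on a regression table. The model is an arbitrary example on the trial data; hide_ci and hide_p are set to FALSE here so the confidence interval and p-value columns stay visible next to the stars.

```r
library(gtsummary)

tbl_stars <- lm(age ~ marker + grade, data = trial) |>
  tbl_regression() |>
  # Keep the CI and p-value columns, which are hidden by default
  add_significance_stars(hide_ci = FALSE, hide_p = FALSE)

tbl_stars
```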

Q: If there are more than two groups, is there any function for post-hoc testing?

Here are supported tests: https://www.danieldsjoberg.com/gtsummary/reference/tests.html. Post-hoc tests might require custom testing. There are some relevant questions on stackoverflow.com that may be useful.

Q: Can gtsummary give Bayesian statistics result?

See https://www.danieldsjoberg.com/gtsummary/reference/tests.html for supported tests. If it is not supported, you can use custom functions for p-values (see add_p()) or add_stat() to use a custom statistic.

Q: The {rms} package fits are not supported?

You can file an issue to request support. You can also use a custom tidier. See the Wald example in the table gallery for tips on how to do this: https://www.danieldsjoberg.com/gtsummary/articles/gallery.html#wald-ci

Q: How does it handle interaction terms?

Gorgeously:

tbl <-
  lm(time ~ ph.ecog * sex, survival::lung) %>%
  tbl_regression(label = list(ph.ecog = "ECOG Score", sex = "Sex"))

tbl

Characteristic	Beta	95% CI¹	p-value
ECOG Score	-119	-233, -5.5	0.040
Sex	11	-79, 102	0.8
ECOG Score * Sex	43	-33, 120	0.3
¹ CI = Confidence Interval

Q: What does exponentiate=TRUE do again in tbl_regression()? I missed it.

It exponentiates the coefficients, so you can report odds ratios (ORs) instead of log(OR)s, or hazard ratios (HRs) instead of raw coefficients.
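To illustrate the idea outside gtsummary, here is a base R sketch of the same transformation. The mtcars model is an arbitrary stand-in, not a workshop example.

```r
# Logistic regression on a built-in dataset (illustrative choice)
fit <- glm(am ~ wt, data = mtcars, family = binomial)

log_or <- coef(fit)        # coefficients on the log-odds scale
odds_ratio <- exp(log_or)  # the same coefficients as odds ratios

round(cbind(log_or, odds_ratio), 3)
```

The confidence limits are exponentiated in the same way, which is why an interval that is symmetric on the log scale is not symmetric around the odds ratio.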

Q: Is there a way to add an additional footnote saying T1 is the control? The default dash may be interpreted as a “missing value”.

Use modify_table_styling() (https://www.danieldsjoberg.com/gtsummary/reference/modify_table_styling.html) or set the argument add_estimate_to_reference_rows = TRUE.

glm(death ~ age + grade, data = trial, family = binomial) %>%
  tbl_regression(exponentiate = TRUE, add_estimate_to_reference_rows = FALSE) %>%
  add_global_p() %>%
  modify_table_styling(
    columns = c(estimate, ci),
    rows = reference_row == TRUE,
    missing_symbol = "Ref."
  )

Characteristic	OR¹	95% CI¹	p-value
Age	1.01	0.99, 1.03	0.3
Grade			0.053
I	Ref.	Ref.
II	1.11	0.56, 2.24
III	2.28	1.11, 4.75
¹ OR = Odds Ratio, CI = Confidence Interval

You can also use themes and change the tbl_regression-str:ref_row_text option: https://www.danieldsjoberg.com/gtsummary/articles/themes.html
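A hedged sketch of the theme approach. It assumes the element name quoted above is accepted by the installed gtsummary version; check the themes article if the reference row text does not change.

```r
library(gtsummary)

# Set the reference-row text for every tbl_regression() in the session
set_gtsummary_theme(list("tbl_regression-str:ref_row_text" = "Ref."))

tbl_ref <- glm(death ~ age + grade, data = trial, family = binomial) |>
  tbl_regression(exponentiate = TRUE)

tbl_ref

# Restore the default theme afterwards
reset_gtsummary_theme()
```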

Q: Can add_global_p() add the global p-value without removing the p-value for each level?

Yes. add_global_p() has an argument called keep that retains both p-values. It is FALSE by default.

glm(death ~ age + grade, data = trial, family = binomial) %>%
  tbl_regression(exponentiate = TRUE,
                 add_estimate_to_reference_rows = FALSE) %>%
  add_global_p(keep = TRUE)

Characteristic	OR¹	95% CI¹	p-value
Age	1.01	0.99, 1.03	0.3
Grade			0.053
I	—	—
II	1.11	0.56, 2.24	0.8
III	2.28	1.11, 4.75	0.025
¹ OR = Odds Ratio, CI = Confidence Interval

Q: tbl_regression(): Are the p-values per Wald by default?

You need to look at the details for your specific model and what its summary function returns. You can change the default via the tidier for your model. An example tidier function is available in the gallery: https://www.danieldsjoberg.com/gtsummary/articles/gallery.html#wald-ci

Q: Is there an option to compare multiple regression models within a single table? (e.g., when performing sensitivity analyses). I’m thinking of a situation where we are using different adjustment sets of covariates.

Use tbl_merge()
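A minimal sketch of merging two models with different adjustment sets. The outcome, covariates, and spanner labels are illustrative assumptions using the trial data.

```r
library(gtsummary)

# Model 1: unadjusted
t_unadj <- glm(response ~ trt, data = trial, family = binomial) |>
  tbl_regression(exponentiate = TRUE)

# Model 2: adjusted for age and grade
t_adj <- glm(response ~ trt + age + grade, data = trial, family = binomial) |>
  tbl_regression(exponentiate = TRUE)

# Side-by-side table with one spanning header per model
tbl_models <- tbl_merge(
  tbls = list(t_unadj, t_adj),
  tab_spanner = c("**Unadjusted**", "**Adjusted**")
)

tbl_models
```

Variables that appear in only one model get empty cells in the other model's columns.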

Q: Does it work with {tidymodels}?

There are currently some issues with dummy variables, but we are working on it for a future release.

Q: Can you have super/subscripts in variable names?

Use UTF-8 characters and markdown syntax. {gt} sometimes has issues with “common markdown”, so be aware of your output type and print engine, but they will most likely work.

Q: Is the marginal adjusted mean estimated using the median of the other covariates fitting the model?

This is calculated outside the package. See emmeans::emmeans() for details on calculations.

Q: Does tbl_regression() support ridge/lasso regressions? I usually use {glmnet} for these

Not currently supported but see https://github.com/ddsjoberg/gtsummary/issues/1280 for more details.

Q: For sex, how do I get it to show the Male level instead of the Female reference group?

inline_text() has a level= argument to select this.
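A short sketch of the level argument. The trial data has no sex variable, so this example uses grade as a stand-in categorical covariate.

```r
library(gtsummary)

tbl_or <- glm(response ~ age + grade, data = trial, family = binomial) |>
  tbl_regression(exponentiate = TRUE)

# Report the estimate for one non-reference level of grade
txt_grade3 <- inline_text(tbl_or, variable = grade, level = "III")

txt_grade3
```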

Q: If I change the CI to 90%, I need to update the pattern to 90% CI {ci} right?

If you use the default pattern, which references {conf.level}, you don’t need to change anything. If you hard-coded the level (for example, 95%), then yes: update it, or use pattern = "odds ratio {estimate}; {conf.level*100}% CI {ci}" so the level is filled in automatically.
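A hedged sketch of a pattern that picks up the confidence level from the table. The model is an arbitrary example, and this version writes the interval with {conf.low} and {conf.high} rather than {ci}.

```r
library(gtsummary)

# Regression table with a 90% confidence level
tbl_90 <- glm(response ~ age + grade, data = trial, family = binomial) |>
  tbl_regression(exponentiate = TRUE, conf.level = 0.90)

# {conf.level*100} is evaluated by glue, so the text follows the table
txt_90 <- inline_text(
  tbl_90,
  variable = age,
  pattern = "odds ratio {estimate}; {conf.level*100}% CI {conf.low}, {conf.high}"
)

txt_90
```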

Q: Can I add LaTeX symbols in the inline text like chi square results?

Probably?

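Q: Can add_glance_source_note() give just adjusted R^2?

Yes! Use the include argument, which both add_glance_table() and add_glance_source_note() accept. The example below shows R² and adjusted R²; pass include = adj.r.squared to show adjusted R² alone.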

mod <- lm(age ~ marker + grade, trial) %>% tbl_regression()
mod %>% 
  add_glance_table(include = c(r.squared, adj.r.squared))

Characteristic	Beta	95% CI¹	p-value
Marker Level (ng/mL)	-0.04	-2.6, 2.5	>0.9
Grade
I	—	—
II	0.64	-4.7, 6.0	0.8
III	2.4	-2.8, 7.6	0.4
R²	0.005
Adjusted R²	-0.012
¹ CI = Confidence Interval

Q: How can I increase the font size of a gtsummary object in a xaringan slide?

You can set a theme for this, or check out print engine options. See https://www.danieldsjoberg.com/gtsummary/articles/rmarkdown.html

Q: How do I decrease the height of the gtsummary table so it exports properly?

It depends on the print engine and the desired output format. See https://www.danieldsjoberg.com/gtsummary/articles/rmarkdown.html

Q: What would the code be to export as XLS?

as_hux_xlsx(). See: https://www.danieldsjoberg.com/gtsummary/reference/as_hux_table.html
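A minimal sketch of the export, assuming the {huxtable} and {openxlsx} packages are installed. The output path is a temporary file chosen for illustration; replace it with your own file name.

```r
library(gtsummary)

# Illustrative output location
out_file <- tempfile(fileext = ".xlsx")

trial |>
  tbl_summary(by = trt, include = c(age, grade)) |>
  as_hux_xlsx(file = out_file)

file.exists(out_file)
```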