Q & A for Data Analysis with R
Workshop Date: December 2, 2022
FAQs
- Q: Why is this course relevant today?
The skills we teach in this workshop will put you at the forefront of the analytics movement. You will learn the technical skills required for data manipulation and visualization in R, along with more advanced statistical methods, and you will build basic predictive models. Ultimately, you will build an analytics toolkit and a mindset that is 100% transferable to the job you have today, and the job you deserve tomorrow.
- Q: What is R?
R is one of the world’s leading programming languages for analytics and data visualization. It is 100% free and open source, with a user community that is constantly adding to its library of free packages. R is used every day at leading companies including Google, Facebook, Uber, and Microsoft.
- Q: Do I need to pass an exam to finish the workshop?
No. We’ll monitor your progress along the way and intervene if necessary. You’ll end the course by completing a practical assignment that will help you put your newly developed skills into practice.
- Q: Is this course the right fit for me?
If you’re not sure this course is the right fit, please visit our website and schedule a call (+8801771083855) with one of our advisors.
- Q: What happens after completion?
After completing the course, you will receive a certificate signed by your instructor, which will be sent to you by post. The certificate number enables you to add the course to the education section of your LinkedIn profile.
- Q: Are there any prerequisites?
No.
Answered Questions
- Q: How do I save PDFs of slides?
The website will stay up, so you can keep accessing the slides. To print them, open the slides in full-screen view, switch to PDF mode, and print.
- Q: How can I see tables in viewer pane?
Global Options > RMarkdown > show output in Viewer Pane
- Q: Link for the code for Dan’s quarto slides?
https://github.com/ddsjoberg/clinical-reporting-gtsummary-rmed/tree/main/slides
- Q: Syntax for `set_variable_labels()`?
See function docs at: https://larmarange.github.io/labelled/reference/var_label.html
- Q: How to confirm labels worked?
`str(df)` or `view(df)`
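As a minimal sketch of setting and then checking labels with {labelled}: the data frame `df` and the label text below are made up for illustration, and `var_label()` is used as a third way to confirm the labels besides `str()` and `view()`.

```r
library(labelled)

# Hypothetical toy data frame, used only for illustration
df <- data.frame(age = c(34, 51, 47), trt = c("A", "B", "A"))

# Attach a label to each column by name
df <- set_variable_labels(df, age = "Age (years)", trt = "Treatment arm")

# var_label() on a data frame returns the labels as a named list
labels <- var_label(df)
str(labels)
```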
- Q: What is `|>`?
It’s the new base R pipe! See https://ivelasq.rbind.io/blog/understanding-the-r-pipe/ for more details
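A small illustration of what the pipe does, using only base R (this needs R 4.1 or later): the left-hand side becomes the first argument of the call on the right, so the two expressions below are equivalent.

```r
# Nested call: read from the inside out
nested <- sum(sqrt(c(1, 4, 9)))

# Same computation with the base pipe: read left to right
piped <- c(1, 4, 9) |> sqrt() |> sum()

identical(nested, piped)
```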
- Q: What statistics can you use in the `tbl_summary(statistic=)` argument? How does the function know `"{min}, {max}"` is the range?
There is a set of known names for default statistics that you can use in this function. These (plus glue syntax) allow you to display specific statistics. See more details under the `statistic` argument section: https://www.danieldsjoberg.com/gtsummary/reference/tbl_summary.html
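To make the glue syntax concrete, here is a short sketch using the `trial` data that ships with {gtsummary}. The choice of variables is only for illustration. Names inside braces are replaced by the computed statistic, and everything else in the string is printed literally.

```r
library(gtsummary)

tbl_range <- trial |>
  tbl_summary(
    include = c(age, marker),
    # {min} and {max} are statistic names; the comma and space are literal text
    statistic = all_continuous() ~ "{min}, {max}",
    missing = "no"
  )

tbl_range
```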
- Q: How do you display mean instead of median?
Adjust the statistic argument:
trial %>%
  select(age) %>%
  tbl_summary(statistic = everything() ~ "{mean}")

Characteristic | N = 200¹ |
---|---|
Age | 47 |
Unknown | 11 |

¹ Mean
- Q: With `add_n()`, can you change the label to specify that it represents non-missing values?
Yes, you can change this with the `col_label` argument:
trial |>
  select(age, grade, trt) %>%
  tbl_summary(
    by = trt,
    missing = "no"
  ) |>
  add_overall() |>
  add_n(col_label = "**N (Non-missing)**")

Characteristic | N (Non-missing) | Overall, N = 200¹ | Drug A, N = 98¹ | Drug B, N = 102¹ |
---|---|---|---|---|
Age | 189 | 47 (38, 57) | 46 (37, 59) | 48 (39, 56) |
Grade | 200 | | | |
I | | 68 (34%) | 35 (36%) | 33 (32%) |
II | | 68 (34%) | 32 (33%) | 36 (35%) |
III | | 64 (32%) | 31 (32%) | 33 (32%) |

¹ Median (IQR); n (%)
- Q: How can you remove the death variable inside the `tbl_summary()` function?
Use `dplyr::select()` prior to `tbl_summary()`, or use the `include =` argument in `tbl_summary()`. Three options below:
sm_trial <- trial |>
  select(age, grade, trt, death)

# Option 1: drop the variable with select() before summarizing
sm_trial |>
  select(-death) %>%
  tbl_summary(
    by = trt,
    missing = "no"
  ) |>
  add_overall()

Characteristic | Overall, N = 200¹ | Drug A, N = 98¹ | Drug B, N = 102¹ |
---|---|---|---|
Age | 47 (38, 57) | 46 (37, 59) | 48 (39, 56) |
Grade | | | |
I | 68 (34%) | 35 (36%) | 33 (32%) |
II | 68 (34%) | 32 (33%) | 36 (35%) |
III | 64 (32%) | 31 (32%) | 33 (32%) |

¹ Median (IQR); n (%)

# Option 2: exclude the variable with include = -death
sm_trial |>
  tbl_summary(
    by = trt,
    include = -death,
    missing = "no"
  ) |>
  add_overall()

Characteristic | Overall, N = 200¹ | Drug A, N = 98¹ | Drug B, N = 102¹ |
---|---|---|---|
Age | 47 (38, 57) | 46 (37, 59) | 48 (39, 56) |
Grade | | | |
I | 68 (34%) | 35 (36%) | 33 (32%) |
II | 68 (34%) | 32 (33%) | 36 (35%) |
III | 64 (32%) | 31 (32%) | 33 (32%) |

¹ Median (IQR); n (%)

# Option 3: exclude the variable with include = !death
sm_trial |>
  tbl_summary(
    by = trt,
    include = !death,
    missing = "no"
  ) |>
  add_overall()

Characteristic | Overall, N = 200¹ | Drug A, N = 98¹ | Drug B, N = 102¹ |
---|---|---|---|
Age | 47 (38, 57) | 46 (37, 59) | 48 (39, 56) |
Grade | | | |
I | 68 (34%) | 35 (36%) | 33 (32%) |
II | 68 (34%) | 32 (33%) | 36 (35%) |
III | 64 (32%) | 31 (32%) | 33 (32%) |

¹ Median (IQR); n (%)
- Q: Is it possible to add custom footnotes, say if you want to explain what a particular category includes etc.?
Yes. Use `modify_footnote()` to change footnotes that already exist on column headers, or use `modify_table_styling()` to add new footnotes. https://www.danieldsjoberg.com/gtsummary/reference/modify_table_styling.html
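A minimal sketch of the first route, changing the existing column-header footnote. The footnote wording and the choice of variables are illustrative only, and the exact behavior may differ between {gtsummary} releases.

```r
library(gtsummary)

tbl_fn <- trial |>
  tbl_summary(by = trt, include = c(age, grade), missing = "no") |>
  # Replace the existing footnote attached to the statistic column headers
  modify_footnote(all_stat_cols() ~ "Median (IQR) for age; n (%) for grade")

tbl_fn
```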
- Q: Does gtsummary support role selection like {recipes}?
Use the selectors `all_continuous()`, `all_categorical()`, etc.
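For instance, the selectors can go on the left-hand side of a formula to apply a setting to every variable of one summary type. This is a sketch on the built-in `trial` data, and the statistics chosen are arbitrary.

```r
library(gtsummary)

tbl_sel <- trial |>
  tbl_summary(
    include = c(age, marker, grade, stage),
    statistic = list(
      all_continuous() ~ "{mean} ({sd})",      # applies to age and marker
      all_categorical() ~ "{n} / {N} ({p}%)"   # applies to grade and stage
    ),
    digits = all_continuous() ~ 1,
    missing = "no"
  )

tbl_sel
```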
- Q: There is an error trying to use `add_difference()` with `add_p()`
This is because you are trying to add two p-values to the same table. You could consider making two separate tables and then using `tbl_merge()` to merge them, but presenting these two p-values in one table could be confusing.
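A rough sketch of the two-table approach on the `trial` data. The variables, the choice of test, and the spanner text are assumptions made for illustration. `add_difference()` may need extra packages installed depending on the {gtsummary} version, and the caution about showing two p-values side by side still applies.

```r
library(gtsummary)

# Shared starting table: continuous variables only, summarized by treatment
base_tbl <- trial |>
  tbl_summary(
    by = trt,
    include = c(age, marker),
    statistic = all_continuous() ~ "{mean} ({sd})",
    missing = "no"
  )

# One table gets the p-value, the other gets the difference
tbl_p    <- base_tbl |> add_p(test = all_continuous() ~ "wilcox.test")
tbl_diff <- base_tbl |> add_difference()

# Place the two tables side by side under labelled spanners
merged <- tbl_merge(
  tbls = list(tbl_p, tbl_diff),
  tab_spanner = c("**Rank-sum test**", "**Mean difference**")
)

merged
```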
- Q: It would be great to have an “asterisk_p” to show significant p-values, like `bold_p()`
`add_significance_stars()` already exists! https://www.danieldsjoberg.com/gtsummary/reference/add_significance_stars.html
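A short sketch of the function on a regression table. The model is arbitrary, and `hide_ci` and `hide_p` are set to `FALSE` here only so that the confidence interval and p-value columns stay visible next to the stars.

```r
library(gtsummary)

tbl_stars <- lm(age ~ marker + grade, data = trial) |>
  tbl_regression() |>
  # Stars are appended to the estimates; keep the CI and p-value columns too
  add_significance_stars(hide_ci = FALSE, hide_p = FALSE)

tbl_stars
```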
- Q: If there are more than two groups, is there any function for post-hoc testing?
Here are supported tests: https://www.danieldsjoberg.com/gtsummary/reference/tests.html. Post-hoc tests might require custom testing. There are some relevant questions on stackoverflow.com that may be useful.
- Q: Can gtsummary give Bayesian statistics result?
See https://www.danieldsjoberg.com/gtsummary/reference/tests.html for supported tests. If it is not supported, you can use custom functions for p-values (see `add_p()`) or `add_stat()` to use a custom statistic.
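As a sketch of the `add_stat()` route: the helper `my_ttest()` below is a user-written function (not part of the package) that returns one number per variable. A Welch t-test p-value is used purely as a stand-in; any other statistic, Bayesian or otherwise, would be computed inside the same kind of function.

```r
library(gtsummary)

# User-defined statistic: receives the data, the variable name, and the by variable
my_ttest <- function(data, variable, by, ...) {
  t.test(data[[variable]] ~ as.factor(data[[by]]))$p.value
}

tbl_custom <- trial |>
  tbl_summary(by = trt, include = c(age, marker), missing = "no") |>
  add_stat(fns = everything() ~ my_ttest) |>
  # The new column is named add_stat_1; give it a readable header
  modify_header(add_stat_1 ~ "**Custom statistic**")

tbl_custom
```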
- Q: The {rms} package fits are not supported?
You can file an issue to request support. You can also use a custom tidier. See the Wald example in the table gallery for tips on how to do this: https://www.danieldsjoberg.com/gtsummary/articles/gallery.html#wald-ci
- Q: How does it handle interaction terms?
Gorgeously:
tbl <-
  lm(time ~ ph.ecog*sex, survival::lung) %>%
  tbl_regression(label = list(ph.ecog = "ECOG Score", sex = "Sex"))
tbl

Characteristic | Beta | 95% CI¹ | p-value |
---|---|---|---|
ECOG Score | -119 | -233, -5.5 | 0.040 |
Sex | 11 | -79, 102 | 0.8 |
ECOG Score * Sex | 43 | -33, 120 | 0.3 |

¹ CI = Confidence Interval
- Q: What does `exponentiate=TRUE` do again in `tbl_regression()`? I missed it.
It exponentiates the coefficients so you can easily report ORs instead of log(OR)s, or HRs instead of raw coefficients.
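To see the relationship directly, here is a small sketch with a logistic model on the `trial` data (the model formula is arbitrary). The raw coefficients are log odds ratios, and exponentiating them gives the odds ratios that the table reports when `exponentiate = TRUE`.

```r
library(gtsummary)

fit <- glm(response ~ age + grade, data = trial, family = binomial)

log_or <- coef(fit)       # raw coefficients: log odds ratios
or     <- exp(coef(fit))  # exponentiated coefficients: odds ratios

# Same model, displayed on the odds ratio scale
tbl_or <- tbl_regression(fit, exponentiate = TRUE)
tbl_or
```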
- Q: Is there a way to add an additional footnote saying T1 is the control? The default dash may be interpreted as a “missing value”.
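Use `modify_table_styling()` (https://www.danieldsjoberg.com/gtsummary/reference/modify_table_styling.html) to replace the dash with text such as "Ref.", or use the argument `add_estimate_to_reference_rows = TRUE` to show the reference estimate instead.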
glm(death ~ age + grade, data = trial, family = binomial) %>%
  tbl_regression(exponentiate = TRUE, add_estimate_to_reference_rows = FALSE) %>%
  add_global_p() %>%
  modify_table_styling(
    columns = c(estimate, ci),
    rows = reference_row == TRUE,
    missing_symbol = "Ref."
  )

Characteristic | OR¹ | 95% CI¹ | p-value |
---|---|---|---|
Age | 1.01 | 0.99, 1.03 | 0.3 |
Grade | | | 0.053 |
I | Ref. | Ref. | |
II | 1.11 | 0.56, 2.24 | |
III | 2.28 | 1.11, 4.75 | |

¹ OR = Odds Ratio, CI = Confidence Interval
You can also use themes and change the `tbl_regression-str:ref_row_text` option: https://www.danieldsjoberg.com/gtsummary/articles/themes.html
- Q: Can `add_global_p()` add the global p-value without removing the p-value for each level?
There is an argument in `add_global_p()` called `keep` that will keep both p-values. It is `FALSE` by default.
glm(death ~ age + grade, data = trial, family = binomial) %>%
  tbl_regression(
    exponentiate = TRUE,
    add_estimate_to_reference_rows = FALSE
  ) %>%
  add_global_p(keep = TRUE)

Characteristic | OR¹ | 95% CI¹ | p-value |
---|---|---|---|
Age | 1.01 | 0.99, 1.03 | 0.3 |
Grade | | | 0.053 |
I | — | — | |
II | 1.11 | 0.56, 2.24 | 0.8 |
III | 2.28 | 1.11, 4.75 | 0.025 |

¹ OR = Odds Ratio, CI = Confidence Interval
- Q: `tbl_regression()`: Are the p-values per Wald by default?
You need to look at the details for your specific model and what the summary function returns. You can change the default via the tidier for your model. An example tidier function is available in the gallery: https://www.danieldsjoberg.com/gtsummary/articles/gallery.html#wald-ci
- Q: Is there an option to compare multiple regression models within a single table? (e.g., when performing sensitivity analyses). I’m thinking of a situation where we are using different adjustment sets of covariates.
Use `tbl_merge()`.
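A minimal sketch of that idea with two logistic models on the `trial` data. The adjustment sets and the spanner labels are assumptions chosen for illustration.

```r
library(gtsummary)

# Model 1: treatment only; Model 2: adds age and grade as covariates
m_unadj <- glm(response ~ trt, data = trial, family = binomial)
m_adj   <- glm(response ~ trt + age + grade, data = trial, family = binomial)

# One regression table per model, merged side by side
tbl_models <- tbl_merge(
  tbls = list(
    tbl_regression(m_unadj, exponentiate = TRUE),
    tbl_regression(m_adj, exponentiate = TRUE)
  ),
  tab_spanner = c("**Unadjusted**", "**Adjusted**")
)

tbl_models
```

Rows are matched across the tables, so covariates that appear only in the adjusted model show up with empty cells under the unadjusted spanner.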
- Q: Does it work with {tidymodels}?
There are currently some issues with dummy variables, but we are working on it for a future release.
- Q: Can you have super/sub scripts in variable names?
Use UTF-8 characters and markdown characters. {gt} sometimes has issues with “common markdown” syntax, so be aware of your output type and print engine, but most likely they will work.
- Q: Is the marginal adjusted mean estimated using the median of the other covariates fitting the model?
This is calculated outside the package. See `emmeans::emmeans()` for details on calculations.
- Q: Does `tbl_regression()` support ridge/lasso regressions? I usually use {glmnet} for these
Not currently supported but see https://github.com/ddsjoberg/gtsummary/issues/1280 for more details.
- Q: For sex, how do I get it to show the `Male` level instead of the `Female` reference group?
`inline_text()` has a `level=` argument to select this.
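A sketch of the `level=` argument. The `trial` data has no sex variable, so `grade` (levels I, II, III) stands in for it here, and the model itself is only an example.

```r
library(gtsummary)

fit <- glm(response ~ age + grade, data = trial, family = binomial)
tbl <- tbl_regression(fit, exponentiate = TRUE)

# Report the estimate for one specific level of a categorical variable
txt_grade3 <- inline_text(tbl, variable = grade, level = "III")
txt_grade3
```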
- Q: If I change the CI to 90%, I need to update the pattern to 90% CI {ci} right?
If you are using the default pattern (`{conf.level}`), you don’t need to change anything. If you hard-coded something like 95%, then yes, use `pattern = "odds ratio {estimate}; {conf.level*100}% CI {ci}"`.
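A sketch of a pattern that follows the confidence level instead of hard-coding it. The model is arbitrary, and the pattern here spells out `{conf.low}` and `{conf.high}` rather than `{ci}`, on the assumption that those two columns are available in the table across package versions.

```r
library(gtsummary)

fit <- glm(response ~ age + grade, data = trial, family = binomial)

# Build the table with 90% confidence intervals
tbl90 <- tbl_regression(fit, exponentiate = TRUE, conf.level = 0.90)

# {conf.level*100} picks up the level from the table
txt_age <- inline_text(
  tbl90,
  variable = age,
  pattern = "odds ratio {estimate}; {conf.level*100}% CI {conf.low}, {conf.high}"
)
txt_age
```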
- Q: Can I add LaTeX symbols in the inline text like chi square results?
Probably?
- Q: Can `add_glance_source_note()` give just adjusted R^2?
Yes!
mod <- lm(age ~ marker + grade, trial) %>% tbl_regression()
mod %>%
  add_glance_table(include = c(r.squared, adj.r.squared))

Characteristic | Beta | 95% CI¹ | p-value |
---|---|---|---|
Marker Level (ng/mL) | -0.04 | -2.6, 2.5 | >0.9 |
Grade | | | |
I | — | — | |
II | 0.64 | -4.7, 6.0 | 0.8 |
III | 2.4 | -2.8, 7.6 | 0.4 |
R² | 0.005 | | |
Adjusted R² | -0.012 | | |

¹ CI = Confidence Interval
- Q: How can I increase the font size of a gtsummary object in a xaringan slide?
You can set a theme for this, or check out print engine options. See https://www.danieldsjoberg.com/gtsummary/articles/rmarkdown.html
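One print-engine route, sketched under the assumption that the slide renders the table with {gt}: convert the table with `as_gt()` and set the font size through `gt::tab_options()`. The 12px value is arbitrary.

```r
library(gtsummary)

tbl_small <- trial |>
  tbl_summary(by = trt, include = c(age, grade)) |>
  as_gt() |>                                        # hand over to the gt print engine
  gt::tab_options(table.font.size = gt::px(12))     # adjust the size to taste

tbl_small
```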
- Q: How do I decrease the height of the gtsummary table so it exports properly?
Depends on print engine and desired output format. See https://www.danieldsjoberg.com/gtsummary/articles/rmarkdown.html
- Q: What would the code be to export as XLS?
`as_hux_xlsx()`. See: https://www.danieldsjoberg.com/gtsummary/reference/as_hux_table.html
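A minimal export sketch. It assumes the {huxtable} and {openxlsx} packages are installed, and it writes to a temporary file so nothing in your project is overwritten. Replace `out_file` with your own path.

```r
library(gtsummary)

# Temporary output path, used only for illustration
out_file <- tempfile(fileext = ".xlsx")

trial |>
  tbl_summary(by = trt, include = c(age, grade)) |>
  as_hux_xlsx(file = out_file)

file.exists(out_file)
```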