The aim of this vignette is to illustrate the use/functionality of the `glm_coef`

function. `glm_coef`

can be used to display model coefficients with confidence intervals and p-values. The advantages and limitations of `glm_coef`

are:

- Recognises the main models used in epidemiology/public health.
- Automatically back-transforms estimates and confidence intervals, when the model requires it.
- Can use robust standard errors for the calculation of confidence intervals.
- Standard errors are used by default.
- The use of standard errors is restricted by the following classes of objects (models):
`gee`

,`glm`

and`survreg`

.

- Can display nice labels for the names of the parameters.
- Returns a data frame that can be modified and/or exported as tables for publications (with further editing).

We start by loading relevant packages and setting the theme for the plots (as suggested in the Template of this package):

For continuous outcomes there is no need of exponentiating the results unless the outcome was fitted in the log-scale. In our first example we want to estimate the effect of smoking and race on the birth weight of babies.

We can generate factors and assign labels in the same `pipe`

stream:

```
data(birthwt, package = "MASS")
birthwt <- birthwt %>%
mutate(
smoke = factor(smoke, labels = c("Non-smoker", "Smoker")),
race = factor(race, labels = c("White", "African American", "Other"))
) %>%
var_labels(
bwt = 'Birth weight (g)',
smoke = 'Smoking status',
race = 'Race'
)
```

Is good to start with some basic descriptive statistics, so we can compare the birth weight between groups.

```
birthwt %>%
group_by(race, smoke) %>%
summarise(
n = n(),
Mean = round(mean(bwt, na.rm = TRUE), 2),
SD = round(sd(bwt, na.rm = TRUE), 2),
Median = round(median(bwt, na.rm = TRUE), 2),
CV = round(rel_dis(bwt), 2)
) %>%
kable(caption = "Descriptive statistics of birth weight (g) by ethnicity
and smoking status.")
```

race | smoke | n | Mean | SD | Median | CV |
---|---|---|---|---|---|---|

White | Non-smoker | 44 | 3428.75 | 710.10 | 3593.0 | 0.21 |

White | Smoker | 52 | 2826.85 | 626.47 | 2775.5 | 0.22 |

African American | Non-smoker | 16 | 2854.50 | 621.25 | 2920.0 | 0.22 |

African American | Smoker | 10 | 2504.00 | 637.06 | 2381.0 | 0.25 |

Other | Non-smoker | 55 | 2815.78 | 709.35 | 2807.0 | 0.25 |

Other | Smoker | 12 | 2757.17 | 810.04 | 3146.5 | 0.29 |

From the previous table, the group with the lower birth weight was from babies born from African Americans who were smokers. The highest birth weight was from babies born from White non-smokers.

Another way to compare the means between the groups is with `gen_bst_df`

which estimates means with corresponding bootstrapped CIs.

```
birthwt %>%
gen_bst_df(bwt ~ race|smoke) %>%
kable(caption = "Mean birth weight (g) by ethnicity
and smoking status with 95% CIs.")
```

Birth weight (g) | LowerCI | UpperCI | Race | Smoking status |
---|---|---|---|---|

3428.75 | 3204.29 | 3638.16 | White | Non-smoker |

2826.85 | 2662.15 | 2989.64 | White | Smoker |

2854.50 | 2552.06 | 3133.33 | African American | Non-smoker |

2504.00 | 2124.72 | 2860.14 | African American | Smoker |

2815.78 | 2632.59 | 2988.06 | Other | Non-smoker |

2757.17 | 2292.10 | 3127.91 | Other | Smoker |

Another approach to tabular analysis is graphical analysis. For this comparison, box-plots would be the way to go, but in health sciences it is more common to see bar charts with error bars.

```
birthwt %>%
bar_error(bwt ~ race, fill = ~ smoke) %>%
axis_labs() %>%
gf_labs(fill = "Smoking status:")
```

We fit a linear model.

Note:Model diagnostics are not be discussed in this vignette.

First, we load the following packages to help us with regression:

Traditional output from the model:

```
model_norm %>%
Anova()
Anova Table (Type II tests)
Response: bwt
Sum Sq Df F value Pr(>F)
smoke 7322575 1 15.4588 0.0001191 ***
race 8712354 2 9.1964 0.0001557 ***
Residuals 87631356 185
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

```
model_norm %>%
summary()
Call:
lm(formula = bwt ~ smoke + race, data = birthwt)
Residuals:
Min 1Q Median 3Q Max
-2313.95 -440.22 15.78 492.14 1655.05
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3334.95 91.78 36.338 < 2e-16 ***
smokeSmoker -428.73 109.04 -3.932 0.000119 ***
raceAfrican American -450.36 153.12 -2.941 0.003687 **
raceOther -452.88 116.48 -3.888 0.000141 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 688.2 on 185 degrees of freedom
Multiple R-squared: 0.1234, Adjusted R-squared: 0.1092
F-statistic: 8.683 on 3 and 185 DF, p-value: 2.027e-05
```

Table of coefficients:

```
model_norm %>%
glm_coef()
Coefficient Pr(>|t|)
(Intercept) 3334.95 (3036.11, 3633.78) < 0.001
smokeSmoker -428.73 (-667.02, -190.44) < 0.001
raceAfrican American -450.36 (-750.31, -150.4) 0.003
raceOther -452.88 (-775.17, -130.58) 0.006
```

Once we know the order in which the parameters are displayed, we can add labels to our final table:

Note:Compare results using naive and robust standard errors.

```
model_norm %>%
glm_coef(se_rob = FALSE, labels = c("Constant",
"Smoker - No smoker",
"African American - White",
"Other - White")) %>%
kable(caption = "Table of coefficients using naive standard errors.",
align = 'r')
```

Coefficient | Pr(>|t|) | |
---|---|---|

Constant | 3334.95 (3153.89, 3516.01) | < 0.001 |

Smoker - No smoker | -428.73 (-643.86, -213.6) | < 0.001 |

African American - White | -450.36 (-752.45, -148.27) | 0.004 |

Other - White | -452.88 (-682.67, -223.08) | < 0.001 |

```
model_norm %>%
glm_coef(labels = c("Constant",
"Smoker - No smoker",
"African American - White",
"Other - White")) %>%
kable(caption = "Table of coefficients using robust standard errors.",
align = "r")
```

Coefficient | Pr(>|t|) | |
---|---|---|

Constant | 3334.95 (3036.11, 3633.78) | < 0.001 |

Smoker - No smoker | -428.73 (-667.02, -190.44) | < 0.001 |

African American - White | -450.36 (-750.31, -150.4) | 0.003 |

Other - White | -452.88 (-775.17, -130.58) | 0.006 |

The function `glance`

from the `broom`

package allow us to have a quick look at statistics related with the model.

```
model_norm %>% glance %>% round(digits = 2)
# A tibble: 1 x 11
r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0.12 0.11 688. 8.68 0 4 -1501. 3012. 3028.
# â€¦ with 2 more variables: deviance <dbl>, df.residual <dbl>
```

To construct the effect plot, we can use `plot_model`

from the `sjPlot`

package. The advantage of `plot_model`

is that recognises labelled data and uses that information for annotating the plots.

When the explanatory variables are categorical, another option is `emmip`

from the `emmeans`

package. We can include CIs in `emmip`

but as estimates are *connected*, the resulting plots look more messy, so I recommend `emmip`

to look at the trace.

For logistic regression we are interested in the odds ratios. We will look at the effect of amount of fibre intake on the development of coronary heart disease.

```
data(diet, package = "Epi")
diet <- diet %>%
mutate(
chd = factor(chd, labels = c("No CHD", "CHD"))
) %>%
var_labels(
chd = "Coronary Heart Disease",
fibre = "Fibre intake (10 g/day)"
)
```

We start with descriptive statistics:

Coronary Heart Disease | N | Min. | Max. | Mean | Median | SD | CV | |
---|---|---|---|---|---|---|---|---|

Fibre intake (10 g/day) | No CHD | 288 | 0.60 | 5.35 | 1.75 | 1.69 | 0.58 | 0.33 |

CHD | 45 | 0.76 | 2.43 | 1.49 | 1.51 | 0.40 | 0.27 |

It is standard to plot the dependent variable in the \(y\)-axis, so in this case, we can use horizontal box-plots.

We fit a linear logistic model:

```
model_binom <- glm(chd ~ fibre, data = diet, family = binomial)
model_binom %>%
glm_coef(labels = c("Constant", "Fibre intake (g/day)")) %>%
kable(caption = "Parameter estimates from logistic regression.", align = "r")
```

Odds ratio | Pr(>|z|) | |
---|---|---|

Constant | 0.95 (0.3, 3.01) | 0.934 |

Fibre intake (g/day) | 0.33 (0.16, 0.67) | 0.002 |

```
model_binom %>% glance %>% round(digits = 2)
# A tibble: 1 x 7
null.deviance df.null logLik AIC BIC deviance df.residual
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 264. 332 -127. 258. 265. 254. 331
```

Effect plot:

We will look at a matched case-control study on the effect of oestrogen use and history of gall bladder disease on the development of endometrial cancer.

```
data(bdendo, package = "Epi")
bdendo <- bdendo %>%
mutate(
cancer = factor(d, labels = c('Control', 'Case')),
gall = factor(gall, labels = c("No GBD", "GBD")),
est = factor(est, labels = c("No oestrogen", "Oestrogen"))
) %>%
var_labels(
cancer = 'Endometrial cancer',
gall = 'Gall bladder disease',
est = 'Oestrogen'
)
```

We start with descriptive statistics:

Endometrial cancer | levels | Control | Case | Total |
---|---|---|---|---|

Total N (%) | 252 (80.0) | 63 (20.0) | 315 | |

Oestrogen | No oestrogen | 125 (49.6) | 7 (11.1) | 132 (41.9) |

Oestrogen | 127 (50.4) | 56 (88.9) | 183 (58.1) | |

Gall bladder disease | No GBD | 228 (90.5) | 46 (73.0) | 274 (87.0) |

GBD | 24 (9.5) | 17 (27.0) | 41 (13.0) |

We fit the conditional logistic model:

```
library(survival)
model_clogit <- clogit(cancer == 'Case' ~ est * gall + strata(set), data = bdendo)
model_clogit %>%
glm_coef(labels = c("Oestrogen/No oestrogen", "GBD/No GBD",
"Oestrogen:GBD Interaction")) %>%
kable()
```

Odds ratio | Pr(>|z|) | |
---|---|---|

Oestrogen/No oestrogen | 14.88 (4.49, 49.36) | < 0.001 |

GBD/No GBD | 18.07 (3.2, 102.01) | 0.001 |

Oestrogen:GBD Interaction | 0.13 (0.02, 0.9) | 0.039 |

```
model_clogit %>% glance %>% round(digits = 2)
# A tibble: 1 x 15
n nevent statistic.log p.value.log statistic.sc p.value.sc statistic.wald
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 315 63 49.3 0 40.6 0 25.1
# â€¦ with 8 more variables: p.value.wald <dbl>, r.squared <dbl>,
# r.squared.max <dbl>, concordance <dbl>, std.error.concordance <dbl>,
# logLik <dbl>, AIC <dbl>, BIC <dbl>
```

Creating data frame needed to construct the effect plot:

Effect plot:

We use data about house satisfaction.

```
library(ordinal)
data(housing, package = "MASS")
housing <- housing %>%
var_labels(
Sat = "Satisfaction",
Infl = "Perceived influence",
Type = "Type of rental",
Cont = "Contact"
)
```

We fit the ordinal logistic model:

```
labs_ord <- c("Constant: Low/Medium satisfaction",
"Constant: Medium/High satisfaction",
"Perceived influence: Medium/Low",
"Perceived influence: High/Low",
"Accommodation: Apartment/Tower",
"Accommodation: Atrium/Tower",
"Accommodation: Terrace/Tower",
"Afforded: High/Low")
kable(glm_coef(model_clm, labels = labs_ord),
caption = "Parameter estimates on satisfaction of householders.")
```

Ordinal OR | Pr(>|Z|) | |
---|---|---|

Constant: Low/Medium satisfaction | 0.61 (0.48, 0.78) | < 0.001 |

Constant: Medium/High satisfaction | 2 (1.56, 2.55) | < 0.001 |

Perceived influence: Medium/Low | 1.76 (1.44, 2.16) | < 0.001 |

Perceived influence: High/Low | 3.63 (2.83, 4.66) | < 0.001 |

Accommodation: Apartment/Tower | 0.56 (0.45, 0.71) | < 0.001 |

Accommodation: Atrium/Tower | 0.69 (0.51, 0.94) | 0.018 |

Accommodation: Terrace/Tower | 0.34 (0.25, 0.45) | < 0.001 |

Afforded: High/Low | 1.43 (1.19, 1.73) | < 0.001 |

```
model_clm %>% glance %>% round(digits = 2)
# A tibble: 1 x 5
edf logLik AIC BIC df.residual
<dbl> <dbl> <dbl> <dbl> <dbl>
1 8 -1740. 3495. 3539. 1673
```

Effect plot:

```
plot_model(model_clm, type = "pred", terms = c("Infl", "Cont"),
dot.size = 1, title = "") %>%
gf_theme(axis.text.x = element_text(angle = 45, hjust = 1))
```

```
plot_model(model_clm, type = "pred", terms = c("Infl", "Type"),
dot.size = 1, title = "") %>%
gf_theme(axis.text.x = element_text(angle = 45, hjust = 1))
```

Note:In the previous table parameter estimates and confidence intervals forPerceived influenceandAccommodationwere not adjusted for multiple comparisons.

For Poisson regression we are interested in incidence rate ratios. We will look at the effect of sex, ethnicity and age group on number of absent days from school in a year.

```
data(quine, package = "MASS")
levels(quine$Eth) <- c("Aboriginal", "White")
levels(quine$Sex) <- c("Female", "Male")
```

```
quine <- quine %>%
var_labels(
Days = "Number of absent days",
Eth = "Ethnicity",
Age = "Age group"
)
```

Descriptive statistics:

```
quine %>%
group_by(Eth, Sex, Age) %>%
summarise(
n = n(),
Mean = round(mean(Days, na.rm = TRUE), 2),
SD = round(sd(Days, na.rm = TRUE), 2),
Median = round(median(Days, na.rm = TRUE), 2)
) %>%
kable()
```

Eth | Sex | Age | n | Mean | SD | Median |
---|---|---|---|---|---|---|

Aboriginal | Female | F0 | 5 | 17.60 | 17.37 | 11 |

Aboriginal | Female | F1 | 15 | 18.87 | 16.33 | 13 |

Aboriginal | Female | F2 | 9 | 32.56 | 27.31 | 20 |

Aboriginal | Female | F3 | 9 | 14.56 | 14.85 | 10 |

Aboriginal | Male | F0 | 8 | 11.50 | 7.23 | 12 |

Aboriginal | Male | F1 | 5 | 9.60 | 4.51 | 7 |

Aboriginal | Male | F2 | 11 | 30.91 | 17.81 | 32 |

Aboriginal | Male | F3 | 7 | 27.14 | 10.37 | 28 |

White | Female | F0 | 5 | 19.80 | 9.68 | 20 |

White | Female | F1 | 17 | 7.76 | 6.48 | 6 |

White | Female | F2 | 10 | 5.70 | 4.97 | 4 |

White | Female | F3 | 10 | 13.50 | 11.49 | 12 |

White | Male | F0 | 9 | 13.56 | 20.85 | 7 |

White | Male | F1 | 9 | 5.56 | 5.39 | 5 |

White | Male | F2 | 10 | 15.20 | 12.88 | 12 |

White | Male | F3 | 7 | 27.29 | 22.93 | 27 |

We start by fitting a standard Poisson linear regression model:

```
model_pois <- glm(Days ~ Eth + Sex + Age, family = poisson, data = quine)
glm_coef(model_pois) %>% kable
```

Rate ratio | Pr(>|z|) | |
---|---|---|

(Intercept) | 17.66 (10.66, 29.24) | < 0.001 |

EthWhite | 0.59 (0.39, 0.88) | 0.01 |

SexMale | 1.11 (0.77, 1.6) | 0.57 |

AgeF1 | 0.8 (0.42, 1.5) | 0.482 |

AgeF2 | 1.42 (0.85, 2.36) | 0.181 |

AgeF3 | 1.35 (0.77, 2.34) | 0.292 |

```
model_pois %>% glance %>% round(digits = 2)
# A tibble: 1 x 7
null.deviance df.null logLik AIC BIC deviance df.residual
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2074. 145 -1165. 2343. 2361. 1742. 140
```

The assumption is that the mean is equal than the variance. If that is the case, deviance should be close to the degrees of freedom of the residuals (look at the above output from `glance`

). In other words, the following calculation should be close to 1:

Thus, we have over-dispersion. One option is to use a negative binomial distribution.

```
library(MASS)
model_negbin <- glm.nb(Days ~ Eth + Sex + Age, data = quine)
glm_coef(model_negbin,
labels=c(
"Constant",
"Race: Aboriginal/White",
"Sex: Female/Male",
"F1/Primary",
"F2/Primary",
"F3/Primary")
) %>%
kable()
```

Rate ratio | Pr(>|z|) | |
---|---|---|

Constant | 20.24 (12.24, 33.47) | < 0.001 |

Race: Aboriginal/White | 0.57 (0.38, 0.84) | 0.005 |

Sex: Female/Male | 1.07 (0.74, 1.53) | 0.73 |

F1/Primary | 0.69 (0.39, 1.23) | 0.212 |

F2/Primary | 1.2 (0.7, 2.03) | 0.507 |

F3/Primary | 1.29 (0.73, 2.28) | 0.385 |

```
model_negbin %>% glance %>% round(digits = 2)
# A tibble: 1 x 7
null.deviance df.null logLik AIC BIC deviance df.residual
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 192. 145 -548. 1110. 1131. 168. 140
```

Notice that age group is a factor with more than two levels and is significant:

```
model_negbin %>%
Anova()
Analysis of Deviance Table (Type II tests)
Response: Days
LR Chisq Df Pr(>Chisq)
Eth 12.6562 1 0.0003743 ***
Sex 0.1486 1 0.6998722
Age 9.4844 3 0.0234980 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

Thus, we want to report confidence intervals and \(p\)-values adjusted for multiple comparisons.

Effect plot:

We adjust for multiple comparisons:

```
multiple(model_negbin, ~ Age|Eth)$df
contrast Eth estimate SE df z.ratio p.value
1 F0 - F1 Aboriginal 0.37 0.23 Inf 1.57 0.395
2 F0 - F2 Aboriginal -0.18 0.23 Inf -0.77 0.865
3 F0 - F3 Aboriginal -0.25 0.24 Inf -1.04 0.724
4 F1 - F2 Aboriginal -0.55 0.21 Inf -2.65 0.039
5 F1 - F3 Aboriginal -0.62 0.21 Inf -2.89 0.020
6 F2 - F3 Aboriginal -0.07 0.22 Inf -0.34 0.987
7 F0 - F1 White 0.37 0.23 Inf 1.57 0.395
8 F0 - F2 White -0.18 0.23 Inf -0.77 0.865
9 F0 - F3 White -0.25 0.24 Inf -1.04 0.724
10 F1 - F2 White -0.55 0.21 Inf -2.65 0.040
11 F1 - F3 White -0.62 0.21 Inf -2.89 0.020
12 F2 - F3 White -0.07 0.22 Inf -0.34 0.987
```

We can see the comparison graphically with:

We will use an example on the effect of thiotepa versus placebo on the development of bladder cancer.

```
data(bladder)
bladder <- bladder %>%
mutate(times = stop,
rx = factor(rx, labels=c("Placebo", "Thiotepa"))
) %>%
var_labels(times = "Survival time",
rx = "Treatment")
```

Using robust standard errors (default):

Survival time ratio | Pr(>|z|) | |
---|---|---|

Treatment: Thiotepa/Placebo | 1.64 (0.89, 3.04) | 0.116 |

Scale | 1 (0.85, 1.18) | 0.992 |

In this example the scale parameter is not statistically different from one, meaning hazard is constant and thus, we can use the exponential distribution:

```
model_exp <- survreg(Surv(times, event) ~ rx, data = bladder, dist = "exponential")
model_exp %>%
glm_coef(labels = c("Treatment: Thiotepa/Placebo")) %>%
kable()
```

Survival time ratio | Pr(>|z|) | |
---|---|---|

Treatment: Thiotepa/Placebo | 1.64 (0.85, 3.16) | 0.139 |

```
model_exp %>% glance %>% round(digits = 2)
# A tibble: 1 x 8
iter df statistic p.value logLik AIC BIC df.residual
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 5 2 6.54 0.01 -594. 1192. 1199. 338
```

Interpretation:Patients receiving Thiotepa live on average 64% more than those in the Placebo group.

Using naive standard errors:

Survival time ratio | Pr(>|z|) | |
---|---|---|

Treatment: Thiotepa/Placebo | 1.64 (1.11, 2.41) | 0.012 |

Hazard ratio | Pr(>|z|) | |
---|---|---|

Treatment: Thiotepa/Placebo | 0.64 (0.44, 0.94) | 0.024 |

```
model_cox %>% glance %>% round(digits = 2)
# A tibble: 1 x 15
n nevent statistic.log p.value.log statistic.sc p.value.sc statistic.wald
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 340 112 5.24 0.02 5.14 0.02 5.06
# â€¦ with 8 more variables: p.value.wald <dbl>, r.squared <dbl>,
# r.squared.max <dbl>, concordance <dbl>, std.error.concordance <dbl>,
# logLik <dbl>, AIC <dbl>, BIC <dbl>
```

Interpretation:Patients receiving Thiotepa are 64% less likely of dying than those in the Placebo group.

We look at the relationship between sex and age on the distance from the pituitary to the pterygomaxillary fissure (mm).

We fit the model:

```
model_lme <- lme(distance ~ Sex * I(age - mean(age, na.rm = TRUE)), random = ~ 1|Subject,
method = "ML", data = Orthodont)
```

```
model_lme %>%
glm_coef(labels = c("Constant", "Sex: female-male", "Age (years)",
"Sex:Age interaction")) %>%
kable()
```

Coefficient | Pr(>|t|) | |
---|---|---|

Constant | 24.97 (24.03, 25.9) | < 0.001 |

Sex: female-male | -2.32 (-3.78, -0.86) | 0.005 |

Age (years) | 0.78 (0.63, 0.94) | < 0.001 |

Sex:Age interaction | -0.3 (-0.54, -0.07) | 0.015 |

```
model_lme %>% glance
# A tibble: 1 x 5
sigma logLik AIC BIC deviance
<dbl> <dbl> <dbl> <dbl> <lgl>
1 1.37 -214. 441. 457. NA
```

Effect plot: