Appendix J — Statistical computing in R

Published

Last modified: 2024-12-13: 0:43:41 (AM)

J.1 Online R learning resources

There are an overwhelming number of great resources for learning R; here are some recommendations:

  • The RStudio Education website, especially:
  • R for Epidemiology (Cannell and Livingston (2024))
  • The Epidemiologist R Handbook (Batra (2024))
  • R for Data Science (Wickham, Çetinkaya-Rundel, and Grolemund (2023))
  • Advanced R (Wickham (2019))
  • R Graphics Cookbook (Chang (2024))
  • R Packages (Wickham and Bryan (2023))
  • Nahhas (2023) (same author as Nahhas (2024))
  • Myatt (2022)
  • Aragon (2017) (previously Aragon (2013)); the author is State Public Health Officer and Director, California Department of Public Health (https://drtomasaragon.github.io/)
  • SAS and R (Kleinman and Horton (2009))
    • The procs package in R provides versions of common SAS procedures, such as ‘proc freq’, ‘proc means’, ‘proc ttest’, ‘proc reg’, ‘proc transpose’, ‘proc sort’, and ‘proc print’
  • R for SAS and SPSS users (Muenchen (2011))
  • Building reproducible analytical pipelines with R (Rodrigues (2023))
  • Posit Recipes: Some tasty R code snippets: https://posit.cloud/learn/recipes

J.2 UC Davis R programming courses

There are several dedicated UC Davis courses on R programming:

DataLab maintains another list of courses: https://datalab.ucdavis.edu/courses/

DataLab also provides short-form workshops on R programming and data science: https://datalab.ucdavis.edu/workshops/

J.3 Functions

J.3.1 Methods versus functions

See https://adv-r.hadley.nz/oo.html#oop-systems
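As a minimal sketch of the distinction in base R's S3 system (one of the systems described at that link): a generic function decides which method to call based on the class of its argument, and each method is an ordinary function named generic.class. The generic `describe` and its methods are hypothetical names invented for this illustration.

```r
# A generic function: dispatches on the class of its first argument.
describe <- function(x, ...) UseMethod("describe")

# Methods: ordinary functions whose names follow the pattern generic.class.
describe.default <- function(x, ...) {
  "an R object"
}
describe.numeric <- function(x, ...) {
  paste("a numeric vector of length", length(x))
}
describe.data.frame <- function(x, ...) {
  paste("a data frame with", nrow(x), "rows")
}

describe(c(1.5, 2.5)) # dispatches to describe.numeric
describe(mtcars)      # dispatches to describe.data.frame
describe("text")      # no character method, so falls back to describe.default
```

Calling `methods(describe)` lists the methods currently defined for the generic.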

J.3.2 Debugging R and C code

See https://www.maths.ed.ac.uk/~swood34/RCdebug/RCdebug.html

J.4 data.frames and tibbles

J.4.1 Displaying tibbles

See vignette("digits", package = "tibble")
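As a small illustration of the kind of display control that vignette covers, the sketch below assumes the tibble package is installed and uses the `pillar.sigfig` option to change how many significant digits are printed. The example values are invented.

```r
# Invented example values with more digits than a tibble prints by default.
example_values <- c(1.23456789, 123.456789, 12345.6789)

if (requireNamespace("tibble", quietly = TRUE)) {
  example_tbl <- tibble::tibble(value = example_values)

  # Default display: a limited number of significant digits.
  print(example_tbl)

  # Temporarily request more significant digits, then restore the old setting.
  old_options <- options(pillar.sigfig = 6)
  print(example_tbl)
  options(old_options)
}
```

Only the printed display changes; the stored values are unaffected.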

J.5 The tidyverse

The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures.

  • https://www.tidyverse.org/

These packages are being actively developed by Hadley Wickham and his colleagues at Posit [1].

Details:

  • Wickham et al. (2019)
  • Wickham, Çetinkaya-Rundel, and Grolemund (2023)
  • Kuhn and Silge (2022)

J.6 Piping

See Wickham, Çetinkaya-Rundel, and Grolemund (2023) for details.

There are currently (2024) two commonly used pipe operators in R:

  • %>%: the “magrittr pipe”, from the magrittr package (Bache and Wickham (2022); re-exported by dplyr and others).

  • |>: the “native pipe”, from base R (version 4.1.0 or later)
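A minimal sketch comparing the two operators on a simple case; the vector `x` is invented for illustration, and the magrittr part only runs if that package happens to be installed.

```r
x <- c(1, 4, 9)

# Ordinary nested call, read from the inside out.
nested <- sum(sqrt(x))

# Native pipe (R >= 4.1.0), read from left to right.
native <- x |> sqrt() |> sum()

stopifnot(identical(nested, native))

# The magrittr pipe gives the same result in this simple case.
if (requireNamespace("magrittr", quietly = TRUE)) {
  `%>%` <- magrittr::`%>%`
  with_magrittr <- x %>% sqrt() %>% sum()
  stopifnot(identical(native, with_magrittr))
}
```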

J.6.1 Which pipe should I use?

Wickham, Çetinkaya-Rundel, and Grolemund (2023) recommend the native pipe:

For simple cases, |> and %>% behave identically. So why do we recommend the base pipe? Firstly, because it’s part of base R, it’s always available for you to use, even when you’re not using the tidyverse. Secondly, |> is quite a bit simpler than %>%: in the time between the invention of %>% in 2014 and the inclusion of |> in R 4.1.0 in 2021, we gained a better understanding of the pipe. This allowed the base implementation to jettison infrequently used and less important features.
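One way to see the simplicity of the native pipe, sketched here in base R only: the parser rewrites a native-pipe expression into an ordinary function call before anything is evaluated, and it insists that the right-hand side be written as a call with parentheses. The symbols `x`, `f`, and `y` are placeholders that are never evaluated.

```r
# `lhs |> f(args)` is rewritten by the parser into `f(lhs, args)`.
piped <- quote(x |> f(y))
plain <- quote(f(x, y))
same_call <- identical(piped, plain)

# The right-hand side must be a function call: `x |> sqrt` (no parentheses)
# is rejected when the code is parsed, whereas `x |> sqrt()` is fine.
no_parens <- tryCatch(
  parse(text = "x |> sqrt"),
  error = function(e) "parse error"
)
with_parens <- parse(text = "x |> sqrt()")
```

Printing `piped` shows the rewritten call, which can be handy when working out what a long pipeline actually does.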

J.6.2 Why doesn’t ggplot2 use piping?

Here’s tidyverse creator Hadley Wickham’s answer (from 2018):

I think it’s worth unpacking this question into a few smaller pieces:

  • Should ggplot2 use the pipe? IMO, yes.
  • Could ggplot2 support both the pipe and plus? No
  • Would it be worth it to create a ggplot3 that uses the pipe? No.

https://forum.posit.co/t/why-cant-ggplot2-use/4372/7
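In practice the two operators are often combined: data preparation steps are chained with a pipe, and plot layers are then added with `+`. The sketch below assumes ggplot2 is installed; the choice of the built-in `mtcars` data and of these particular variables is arbitrary.

```r
if (requireNamespace("ggplot2", quietly = TRUE)) {
  example_plot <- mtcars |>
    subset(cyl != 6) |>                         # data steps: use the pipe
    ggplot2::ggplot(ggplot2::aes(x = wt, y = mpg)) +
    ggplot2::geom_point() +                     # plot layers: use `+`
    ggplot2::labs(x = "Weight (1000 lbs)", y = "Miles per gallon")
}
```

Because the pipe binds more tightly than `+`, the piped data reaches `ggplot()` first and the layers are then added to the resulting plot object.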

J.7 Quarto

Quarto is a system for writing documents with embedded R code and/or results.

J.8 One source file, multiple outputs

One of Quarto’s excellent features is the ability to convert the same source file into multiple output formats; in particular, I am using the same set of source files to generate an HTML website, a PDF document, and a set of revealjs slide decks.

I use ::: notes divs to mark text chunks to omit from the revealjs format but include in the website and PDF formats.
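To make the idea concrete, here is a hedged sketch that uses R to write a small Quarto source file declaring three output formats and containing one ::: notes div. The file name, title, and contents are invented for illustration and are not taken from this book's actual source files.

```r
# The code fence is built programmatically so that this example does not
# need to nest literal fences.
fence <- strrep("`", 3)

qmd_lines <- c(
  "---",
  "title: \"One source, several outputs\"",
  "format:",
  "  html: default",
  "  pdf: default",
  "  revealjs: default",
  "---",
  "",
  "## Mean fuel economy",
  "",
  paste0(fence, "{r}"),
  "mean(mtcars$mpg)",
  fence,
  "",
  "::: notes",
  "A longer explanation, intended for the website and PDF but not the slides.",
  ":::"
)

# Write the hypothetical source file to a temporary directory.
qmd_path <- file.path(tempdir(), "example.qmd")
writeLines(qmd_lines, qmd_path)

# With Quarto installed, a single format could then be rendered from a
# terminal with, for example:
#   quarto render example.qmd --to revealjs
```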

J.9 Packages

This book espouses our philosophy of package development: anything that can be automated, should be automated. Do as little as possible by hand. Do as much as possible with functions. The goal is to spend your time thinking about what you want your package to do rather than thinking about the minutiae of package structure.

  • https://r-pkgs.org/introduction.html#:~:text=This%20book%20espouses,of%20package%20structure.

  • Read this ASAP: https://r-pkgs.org/whole-game.html

  • Use the rest of Wickham and Bryan (2023) as a reference

J.10 Submitting packages to CRAN

J.11 Git

94% of respondents to a 2022 Stack Overflow survey reported using Git for version control.


J.12 Spatial data science

  • Pebesma and Bivand (2023)

J.13 Shiny apps

  • Read Wickham (2021) first
  • Use Fay et al. (2021) as a reference

K Making the most of RStudio

Over time, explore all the tabs and menus; there are a lot of great quality-of-life features.

  • Use the History tab to view past commands; you can rerun them or copy them into a source code file in one click. (The up-arrow key in the Console also recalls past commands, but less conveniently.)

L Contributing to R

Many modern R packages are developed on GitHub and welcome bug reports and pull requests (suggested edits to source code) through the GitHub interface.

To contribute to “base R” (the core systems), see https://contributor.r-project.org/


  1. The company formerly known as RStudio.