Appendix H — Statistical computing in R
There are an overwhelming number of great resources for learning R; here are some recommendations:
- Introduction to modern R: Wickham, Çetinkaya-Rundel, and Grolemund (2023)
- Advanced R programming: Wickham (2019)
- Examples of graphics: Chang (2024)
- Building R packages: Wickham and Bryan (2023)
- Translations from SAS: Kleinman and Horton (2009)
H.1 Functions
- Read this ASAP: https://r4ds.hadley.nz/functions.html
- Use this as a reference: https://adv-r.hadley.nz/functions.html
H.1.1 Methods versus functions
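As a minimal illustration of the distinction, here is a sketch using base R's S3 system. The `describe` generic and the `temperature` class are hypothetical names invented for this example. A generic function inspects the class of its argument and dispatches to a matching method. A method is an ordinary function whose name follows the `generic.class` pattern.

```r
# A generic function: UseMethod() dispatches on the class of `x`.
describe <- function(x, ...) UseMethod("describe")

# Fallback method, used when no class-specific method exists.
describe.default <- function(x, ...) {
  paste("an object of class", class(x)[1])
}

# Method for the (hypothetical) "temperature" class.
describe.temperature <- function(x, ...) {
  paste(unclass(x), "degrees Celsius")
}

reading <- structure(21.5, class = "temperature")

describe(reading)  # dispatches to describe.temperature()
describe("hello")  # no describe.character(), so describe.default() runs
```

Calling `describe.temperature(reading)` directly also works, but calling the generic is usually preferable because it lets R choose the method.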
H.1.2 Debugging R and C code
See https://www.maths.ed.ac.uk/~swood34/RCdebug/RCdebug.html
H.2 data.frames and tibbles
H.2.1 Displaying tibbles
See vignette("digits", package = "tibble")
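As a rough sketch of the kind of control that vignette describes, assuming the tibble package (version 3.1.0 or later) is installed: the `pillar.sigfig` option changes the number of significant digits shown for all numeric columns, and `num()` sets the display of a single column. The `measurements` data below are made up for illustration.

```r
library(tibble)

# Made-up data with values of very different magnitudes.
measurements <- tibble(
  id = 1:3,
  value = c(123.456789, 0.00123456, 98765.4321)
)

# Default display: numeric columns show about 3 significant digits.
print(measurements)

# Show more significant digits everywhere, then restore the old setting.
old_options <- options(pillar.sigfig = 6)
print(measurements)
options(old_options)

# Alternatively, fix the number of decimal places for a single column.
measurements$value_fixed <- num(measurements$value, digits = 2)
print(measurements)
```

These settings affect only how values are printed; the stored values are unchanged.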
H.3 The tidyverse
The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures.
- https://www.tidyverse.org/
These packages are actively developed by Hadley Wickham and his colleagues at Posit.
H.4 Piping
See Wickham, Çetinkaya-Rundel, and Grolemund (2023) for details.
There are currently (2024) two commonly-used pipe operators in R:
- %>%: the “magrittr pipe”, from the magrittr package (Bache and Wickham 2022; re-exported by dplyr and others)
- |>: the “native pipe”, from base R (≥ 4.1.0)
H.4.1 Which pipe should I use?
Wickham, Çetinkaya-Rundel, and Grolemund (2023) recommend the native pipe:
For simple cases, |> and %>% behave identically. So why do we recommend the base pipe? Firstly, because it’s part of base R, it’s always available for you to use, even when you’re not using the tidyverse. Secondly, |> is quite a bit simpler than %>%: in the time between the invention of %>% in 2014 and the inclusion of |> in R 4.1.0 in 2021, we gained a better understanding of the pipe. This allowed the base implementation to jettison infrequently used and less important features.
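The "simple cases" mentioned in the quote can be sketched as follows. This is a minimal example, assuming R 4.1.0 or later; the `values` vector is made up, and the magrittr comparison runs only if that package is installed.

```r
values <- c(4, 9, 16)

# Nested calls: read from the inside out.
nested <- round(mean(sqrt(values)), 1)

# Native pipe: read from left to right.
# Each step's result becomes the first argument of the next call.
piped <- values |> sqrt() |> mean() |> round(1)

stopifnot(identical(nested, piped))

# The magrittr pipe gives the same result in this simple case.
if (requireNamespace("magrittr", quietly = TRUE)) {
  `%>%` <- magrittr::`%>%`
  piped_magrittr <- values %>% sqrt() %>% mean() %>% round(1)
  stopifnot(identical(nested, piped_magrittr))
}
```

The two pipes differ in less common situations, such as placeholder syntax, which the reference above covers in detail.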
H.4.2 Why doesn’t ggplot2 use piping?
Here’s tidyverse creator Hadley Wickham’s answer (from 2018):
I think it’s worth unpacking this question into a few smaller pieces:
- Should ggplot2 use the pipe? IMO, yes.
- Could ggplot2 support both the pipe and plus? No
- Would it be worth it to create a ggplot3 that uses the pipe? No.
https://forum.posit.co/t/why-cant-ggplot2-use/4372/7
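In practice, the two operators are often combined: data preparation is piped into `ggplot()`, and plot layers are then added with `+`. Here is a minimal sketch, assuming the ggplot2 package is installed and using the built-in `mtcars` data set.

```r
library(ggplot2)

# The pipe passes the filtered data frame to ggplot() as its first argument;
# from ggplot() onward, layers are combined with `+`, not with the pipe.
p <- mtcars |>
  subset(cyl != 8) |>
  ggplot(aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

print(p)
```

Replacing a `+` with a pipe here would pass the plot object as the first argument of the layer function, which is not what those functions expect. This is a common source of errors when switching between the two styles.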
H.5 Quarto
Quarto is a system for writing documents with embedded R code and/or results:
- Read this ASAP: https://r4ds.hadley.nz/communicate
- Then use this for reference: https://quarto.org/docs/reference/
H.6 Packages
This book espouses our philosophy of package development: anything that can be automated, should be automated. Do as little as possible by hand. Do as much as possible with functions. The goal is to spend your time thinking about what you want your package to do rather than thinking about the minutiae of package structure.
https://r-pkgs.org/introduction.html#:~:text=This%20book%20espouses,of%20package%20structure.
- Read this ASAP: https://r-pkgs.org/whole-game.html
- Use the rest of Wickham and Bryan (2023) as a reference
H.7 Git
94% of respondents to a 2022 Stack Overflow survey reported using Git for version control.
H.8 Spatial data science
- Pebesma and Bivand (2023)