Appendix E Sample cheat sheet

As you make progress with DataCamp modules, you will learn an increasing number of new R functions, routines, etc. Before you realize it, the number will exceed what your memory can handle. This is when an external memory is needed, which is what I call a cheat sheet.

I suggest that you start with an empty file called master-cheatsheet.R. You can place the file inside one of the project folders that you created in Chapter 1. From now on, whenever you encounter a new function or a new way of using a familiar function, write it down in the cheat sheet. I attach a snippet of a sample cheat sheet below for your reference. By no means do you have to use the same style. I will periodically select a few cheat sheets from the class, take a snippet from each, and post them here. Each one has some merits that all of us can learn from.


  • A good example is worth a thousand lines of comments.
  • Be generous with spaces and empty lines to visually separate comments from code, and between code chunks.
  • Be stingy with special symbols such as *, #, _. Avoid littering them all over the file.
  • Having too much information is as useless as having too little. At some point, you have to be selective about what to include in your cheat sheet.
  • Inject structures into the file with headings of different levels.

E.1 Sample cheat sheet 1


# Purpose: View the structure of the data
# Commands: str(), dplyr::glimpse()
# Example:

# Purpose: Make an if else statement
# Commands: ifelse([logical condition], [do this if true], [do this if false])
# Example:
hsb2 <- hsb2 %>%
  dplyr::mutate(read_cat = ifelse( # create a new variable: read_cat
      read < avg_read, # logical condition
      "below average", # if case
      "at or above average" # else case
  ))

# Purpose: Specify transparency of points
# Commands: ggplot2::geom_point(alpha = )
# Example:
# Make the points 40% opaque
ggplot2::ggplot(data = ggplot2::diamonds, 
                mapping = ggplot2::aes(carat, price, color = clarity)) +
  ggplot2::geom_point(alpha = 0.4)


The author has excelled at both the content and the style of their cheat sheet. The whole file consists of numerous identically structured pieces, each with the same components: purpose, commands, and an accompanying example. The comments are concise, with carefully chosen words. The usage of # is not excessive: it only marks non-code lines or the beginning of a piece. Double blank lines separate any two adjacent pieces.
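As a minimal sketch of the same purpose/commands/example layout, the two entries below run on their own. They assume the built-in `mtcars` data frame instead of `hsb2`, and the names `avg_mpg` and `mpg_cat` are made up for illustration.

```r
# Purpose: View the structure of the data
# Commands: str()
# Example:
str(mtcars)


# Purpose: Make an if else statement
# Commands: ifelse([logical condition], [do this if true], [do this if false])
# Example:
avg_mpg <- mean(mtcars$mpg)            # reference value for the comparison

mpg_cat <- ifelse(
  mtcars$mpg < avg_mpg,                # logical condition
  "below average",                     # if case
  "at or above average"                # else case
)

table(mpg_cat)                         # count the cars in each category
```

Keeping every example runnable on a built-in dataset means you can paste an entry into the console later and see at once what it does.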

Room for improvement: as the file length grows, some hierarchical structure may be helpful. See the next example (E.2).
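One possible way to add hierarchy while staying in a plain .R file is sketched below. It assumes you edit the file in RStudio, where a comment line ending in four or more dashes is treated as a code section that appears in the document outline. The section titles and the `sepal_summary` name are hypothetical.

```r
# Data inspection ----

## Structure ----

# Purpose: View the structure of the data
# Commands: str()
# Example:
str(iris)


## Summary statistics ----

# Purpose: Get a five-number summary plus the mean
# Commands: summary()
# Example:
sepal_summary <- summary(iris$Sepal.Length)
sepal_summary


# Visualization ----

# Purpose: Draw a quick histogram
# Commands: hist()
# Example:
hist(iris$Sepal.Length)
```

The section lines are ordinary comments, so the file still runs as a normal R script in any editor.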

E.2 Sample cheat sheet 2


---
title: Cheat Sheet
output:
  html_document:
    toc: true
    toc_float:
        collapsed: false
        smooth_scroll: false
---

```{r setup, include=F}
knitr::opts_chunk$set(eval = F)
```

```{r packages-and-data, include=F}
gg <- import::from(ggplot2, .all=T, .into={new.env()})
import::from(nycflights13, df_flights = flights)
import::from(magrittr, "%>%")
```

## Visualization

### Scatterplots

AKA bivariate plots. 
They allow you to visualize the *relationship* between two **continuous** variables. 

```{r}
# import the data frame and name it df_flights
import::from(nycflights13, df_flights = flights)

# choose all rows for the Alaska Airlines carrier
df_alaska_flights <- df_flights %>% 
  dplyr::filter(carrier == "AS")

# build a scatter plot
gg$ggplot(data = df_alaska_flights, 
          mapping = gg$aes(x = dep_delay, y = arr_delay)) +
  gg$geom_point()
```

### Overplotting

Overplotting occurs when points are plotted on top of each other over and over again, 
making it difficult to know the number of points being plotted.

+ Method 1: Changing the transparency of the points

Set the `alpha` argument in `geom_point` 
(by default, `alpha` is 1, that is, 100% opaque). 
By specifying a value of `alpha` less than 1, 
we can make the points more transparent.

```{r demo-alpha, eval=T}
df_alaska_flights <- df_flights %>% 
  dplyr::filter(carrier == "AS")

gg$ggplot(data = df_alaska_flights, 
          mapping = gg$aes(x = dep_delay, y = arr_delay)) +
  gg$geom_point(alpha = 0.2)
```

When you compile the Rmd file using the “Knit” button in RStudio, the output would look like this:

[Rendered output of sample cheat sheet 2]


The author took advantage of some nice features of rmarkdown to make their cheat sheet versatile. Setting eval=F in the first chunk (named “setup”) means that none of the following R chunks are evaluated by default, which suits the design of a cheat sheet. Whenever the author wants to show the output of a specific chunk, perhaps to remind themselves what the code does, they can set eval=T in the header of that chunk, as in the demo-alpha chunk. The author also structured the file with headings at different levels, using ##, ###, etc.
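The eval mechanism described above can be tried out in a small, self-contained sketch. It assumes the knitr package is installed; the chunk names `skipped` and `shown` are hypothetical, and the tiny document is built as a character vector rather than saved as an Rmd file.

```r
library(knitr)

fence <- strrep("`", 3)  # three backticks, used to open and close each chunk

rmd_lines <- c(
  paste0(fence, "{r setup, include=FALSE}"),
  "knitr::opts_chunk$set(eval = FALSE)",  # default: later chunks are not run
  fence,
  "",
  paste0(fence, "{r skipped}"),           # inherits eval = FALSE
  "1 + 1",
  fence,
  "",
  paste0(fence, "{r shown, eval=TRUE}"),  # overrides the default for this chunk
  "2 + 3",
  fence
)

# Knit the in-memory document and print the resulting markdown
md <- knitr::knit(text = rmd_lines, quiet = TRUE)
cat(md, sep = "\n")
```

In the printed markdown, the code of both chunks appears, but only the `shown` chunk is followed by its result.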

E.3 Feeling inspired?

Keep in mind that you will be working on YOUR cheat sheet and will refer to it from time to time as you learn to become a seasoned R user. Customize it. Own it. Include whatever content you find useful. Each week, you will hand in an updated version of the same file, your cheat sheet, to fulfill the course requirement. I expect that everyone’s cheat sheet will evolve over time. By the end of the first week, for example, your cheat sheet may contain a line on how to examine the structure of a data frame; by the end of the semester, you may know this line by heart and no longer need it on the cheat sheet.