Appendix E Sample cheat sheet

As you make progress with the DataCamp modules, you will learn an increasing number of new R functions, routines, etc. Before you realize it, the number will exceed what your memory can handle. This is when you need an external memory, which I call a cheat sheet.

I suggest that you start with an empty file called master-cheatsheet.R. You can place the file inside one of the project folders that you created in Chapter 1. From now on, whenever you encounter a new function or a new way of using a familiar function, write it down in the cheat sheet. I attach snippets of sample cheat sheets below for your reference. By no means do you have to use the same style. I will periodically select a few cheat sheets from the class, take a snippet from each, and post them here. Each one has merits that all of us can learn from.
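If you prefer to create the file from the R console, the following is a minimal sketch using only base R. The folder name `my-project` is a hypothetical placeholder, and `tempdir()` is used here only so that the snippet runs anywhere; substitute the path of your own project folder.

```r
# Hypothetical project folder; replace with the one you created in Chapter 1.
project_dir <- file.path(tempdir(), "my-project")
dir.create(project_dir, recursive = TRUE, showWarnings = FALSE)

cheatsheet_path <- file.path(project_dir, "master-cheatsheet.R")

# file.create() truncates a file that already exists, so guard the call
# to avoid wiping an existing cheat sheet when re-running this snippet.
if (!file.exists(cheatsheet_path)) {
  file.create(cheatsheet_path)
}

# Uncomment to open the file in your editor:
# file.edit(cheatsheet_path)
```

The guard around `file.create()` matters once the cheat sheet has content, since the same file will be updated throughout the course.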

TL;DR

  • A good example is worth a thousand lines of comments.
  • Be generous with spaces and empty lines to visually separate comments from code, and between code chunks.
  • Be stingy with special symbols such as *, #, _. Avoid littering them all over the file.
  • Having too much information is as useless as having too little. At some point, you have to be selective about what to include in your cheat sheet.
  • Inject structure into the file with headings of different levels.

E.1 Sample cheat sheet 1

cheatsheet.txt

#
# Purpose: View the structure of the data
#
# Commands: str(), dplyr::glimpse()
# Example:
str(hsb2)
dplyr::glimpse(hsb2)


#
# Purpose: Make an if else statement
#
# Commands: ifelse([logical condition], [do this if true], [do this if false])
# Example:
hsb2 <- hsb2 %>%
  dplyr::mutate(read_cat = ifelse( # create a new variable: read_cat
      read < avg_read, # logical condition
      "below average", # if case
      "at or above average" # else case
      )
  )


#
# Purpose: Specify transparency of points
#
# Commands: ggplot2::geom_point(alpha = )
# Example:
# Make the points 40% opaque
ggplot2::ggplot(data = diamonds, 
                mapping = ggplot2::aes(carat, price, color = clarity)) +
  ggplot2::geom_point(alpha=0.4) +
  ggplot2::geom_smooth()

Comments:

The author has excelled in both the content and the style of their cheat sheet. The whole file consists of numerous identically structured pieces. Each piece has the same components: purpose, commands, and an accompanying example. The comments are concise, with carefully chosen words. The usage of # is not excessive: it only marks non-code lines or the beginning of a piece. Double blank lines separate any two adjacent pieces.
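An entry in this style is easiest to revisit when its example runs on its own. The following is a minimal sketch of the same purpose/commands/example layout, rewritten as an assumption-laden variant of the ifelse entry: it uses only base R and the built-in `mtcars` data set instead of `hsb2`, and the names `avg_mpg` and `mpg_cat` are made up for illustration.

```r
#
# Purpose: Recode a numeric variable into two categories
#
# Commands: ifelse(test, yes, no)
# Example:
avg_mpg <- mean(mtcars$mpg)

mpg_cat <- ifelse(
  mtcars$mpg < avg_mpg,    # logical condition
  "below average",         # value when the condition is TRUE
  "at or above average"    # value when the condition is FALSE
)

# Count how many cars fall into each category
table(mpg_cat)
```

Because the threshold `avg_mpg` is defined inside the entry, the example can be pasted into a fresh R session and still run.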

Room for improvement: as the file length grows, some hierarchical structure may be helpful. See the next example (E.2).
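If you want some hierarchy while staying in a plain .R file, one option is RStudio's code sections: a comment line that ends with four or more dashes is treated as a section heading and appears in the document outline. The sketch below assumes you edit the file in RStudio; the nesting of `##` under `#` depends on your RStudio version, and the section names and the variable `is_efficient` are hypothetical.

```r
# Data inspection ----

## Structure of a data frame ----
str(mtcars)


# Data wrangling ----

## Conditional recoding ----
is_efficient <- ifelse(mtcars$mpg > 25, "yes", "no")
```

Outside RStudio these lines are ordinary comments, so the file still runs unchanged in any R session.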

E.2 Sample cheat sheet 2

cheatsheet.Rmd

---
title: Cheat Sheet
output: 
  html_document: 
    toc: true
    toc_float: 
        collapsed: false
        smooth_scroll: false
---

```{r setup, include=F}
knitr::opts_chunk$set(
                      tidy=F, 
                      eval=F
)
```

```{r packages-and-data, include=F}
gg <- import::from(ggplot2, .all=T, .into={new.env()})
import::from(nycflights13, df_flights = flights)
import::from(magrittr, "%>%")
```


## Visualization

### Scatterplots

AKA bivariate plots. 
They allow you to visualize the *relationship* between two **continuous** variables. 

```{r}
# import the flights data frame and name it df_flights
import::from(nycflights13, df_flights = flights)

# choose all rows for the Alaska Airlines carrier
df_alaska_flights <- df_flights %>% 
  dplyr::filter(carrier == "AS")

# build a scatter plot
gg$ggplot(data = df_alaska_flights, 
          mapping = gg$aes(x = dep_delay, y = arr_delay)) +
  gg$geom_point()
```

### Overplotting

Overplotting occurs when points are plotted on top of each other
so many times that it is difficult to tell how many points there are.


+ Method 1: Changing the transparency of the points

Set the `alpha` argument in `geom_point`
(by default, `alpha` is 1, which means 100% opaque).
Specifying a value of `alpha` below 1
makes the points more transparent.

```{r demo-alpha, eval=T}
df_alaska_flights <- df_flights %>% 
  dplyr::filter(carrier == "AS")

gg$ggplot(data = df_alaska_flights, 
          mapping = gg$aes(x = dep_delay, y = arr_delay)) +
  gg$geom_point(alpha = 0.2)
```

When you compile the Rmd file using the “Knit” button in RStudio, the output is an HTML page with a floating table of contents (the embedded preview is not reproduced here).

Comments:

The author took advantage of some nice features of R Markdown to make their cheat sheet versatile. Setting eval=F in the first chunk (named “setup”) means that none of the following R chunks is evaluated by default, which suits a cheat sheet. Whenever the author wants to show the output of a specific chunk, perhaps as a reminder of what the code does, they can set eval=T in that chunk's header, as in the demo-alpha chunk. The author also structured the file with headings at different levels, using ##, ###, etc.
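To see the global-default-plus-override behaviour in isolation, here is a minimal sketch that knits a tiny three-chunk document from the R console. It assumes the knitr package is installed; the chunk labels `not-run` and `run-me` are hypothetical, and the fence is built with `strrep()` only so that the snippet can itself be pasted into an Rmd file.

```r
library(knitr)

# Three backticks, built programmatically to avoid nesting code fences.
fence <- strrep("`", 3)

rmd <- c(
  paste0(fence, "{r setup, include=FALSE}"),
  "knitr::opts_chunk$set(eval = FALSE)",   # global default: do not evaluate
  fence,
  "",
  paste0(fence, "{r not-run}"),            # inherits eval = FALSE
  "1 + 1",
  fence,
  "",
  paste0(fence, "{r run-me, eval=TRUE}"),  # overrides the default
  "2 + 3",
  fence
)

# knit(text = ...) returns the knitted Markdown as a character string.
knitted <- knit(text = rmd, quiet = TRUE)
cat(knitted)
```

In the printed result, both code chunks appear, but only the chunk that sets eval=TRUE in its header is followed by its computed result.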

E.3 Feeling inspired?

Keep in mind that you will be working on YOUR cheat sheet and will refer to it from time to time as you learn to become a seasoned R user. Customize it. Own it. Put in whatever content you find useful. Each week, you will hand in an updated version of the same file, your cheat sheet, to fulfill the course requirement. I expect that everyone’s cheat sheet will evolve over time. For example, your cheat sheet at the end of the first week may contain a line on how to examine the structure of a data frame; by the end of the semester, you may know this line by heart and no longer need it on the cheat sheet.