Appendix E Sample cheat sheet
As you make progress with the DataCamp modules,
you will learn an increasing number of new R
functions, routines, etc.
Before you realize it, the number will exceed what your memory can handle.
This is when an external memory, or what I call a cheat sheet, becomes necessary.
I suggest that you start with an empty file called master-cheatsheet.R.
You can place the file inside one of the project folders
that you created in Chapter 1.
From now on,
whenever you encounter a new function or a new way of using a familiar function,
I suggest that you write it down in the cheat sheet.
I attach a snippet of a sample cheat sheet below for your reference.
By no means do you have to use the same style.
I will periodically select a few cheat sheets from the class,
take a snippet from each, and post them here.
Each one has merits that all of us can learn from.
TL;DR
- A good example is worth a thousand lines of comments.
- Be generous with spaces and empty lines to visually separate comments from code, and between code chunks.
- Be stingy with special symbols such as *, #, and _. Avoid littering them all over the file.
- Having too much information is as useless as having too little. At some point, you have to be selective about what to include in your cheat sheet.
- Inject structures into the file with headings of different levels.
E.1 Sample cheat sheet 1
cheatsheet.txt
#
# Purpose: View the structure of the data
#
# Commands: str(), dplyr::glimpse()
# Example:
str(hsb2)
dplyr::glimpse(hsb2)


#
# Purpose: Make an if else statement
#
# Commands: ifelse([logical condition], [do this if true], [do this if false])
# Example:
hsb2 <- hsb2 %>%
  dplyr::mutate(read_cat = ifelse(   # create a new variable: read_cat
    read < avg_read,                 # logical condition
    "below average",                 # if case
    "at or above average"            # else case
    )
  )


#
# Purpose: Specify transparency of points
#
# Commands: ggplot2::geom_point(alpha = )
# Example:
# Make the points 40% opaque
ggplot2::ggplot(data = diamonds,
                mapping = ggplot2::aes(carat, price, color = clarity)) +
  ggplot2::geom_point(alpha = 0.4) +
  ggplot2::geom_smooth()
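The `ifelse()` entry in this sample assumes that `avg_read` and the `hsb2` data already exist in the session. As a minimal, self-contained sketch of the same idea in base R, the snippet below uses a small made-up data frame called `scores` (a hypothetical stand-in, not the real `hsb2` data) and computes the threshold itself:

```r
# Toy data standing in for hsb2 (hypothetical values, not the real dataset)
scores <- data.frame(id = 1:5, read = c(40, 55, 60, 47, 68))

# The threshold that the cheat-sheet entry refers to as avg_read
avg_read <- mean(scores$read)

# ifelse() is vectorised: the condition is checked for every row at once
scores$read_cat <- ifelse(
  scores$read < avg_read,   # logical condition
  "below average",          # value used where the condition is TRUE
  "at or above average"     # value used where the condition is FALSE
)

print(scores)
```

Keeping a tiny runnable version like this next to an entry makes it easy to re-run the example later without loading the original data.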
Comments:
The author has excelled at both the content and the style of their cheat sheet.
The whole file consists of numerous identically structured pieces.
Each piece consists of the same components: purpose, commands,
and an accompanying example.
The comments are concise, with carefully chosen words.
The usage of # is not excessive: it is used to indicate
non-code lines or the beginning of a piece.
Double blank lines separate any two adjacent pieces.
Room for improvement: as the file length grows, some hierarchical structure may be helpful. See the next example (E.2).
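One possible way to add hierarchy to a plain .R cheat sheet is sketched below. It assumes RStudio's code-section convention, in which a comment line ending in four or more dashes becomes a foldable section and, in recent RStudio versions, the number of leading # symbols sets the level shown in the document outline. The section names and the use of the built-in `mtcars` data are illustrative choices, not taken from the sample above.

```r
# 1 Data inspection ----

## 1.1 Structure of a data frame ----

# Purpose:  View the structure of the data
# Commands: str()
str(mtcars)


## 1.2 Recode a variable ----

# Purpose:  Make an if else statement
# Commands: ifelse()
mpg_cat <- ifelse(mtcars$mpg < mean(mtcars$mpg),
                  "below average",
                  "at or above average")


# 2 Visualization ----

## 2.1 Scatterplots ----

# Purpose:  Plot two continuous variables against each other
# Commands: plot()
plot(mtcars$wt, mtcars$mpg)
```

The section lines are ordinary comments, so the file still runs as a normal R script outside RStudio.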
E.2 Sample cheat sheet 2
cheatsheet.Rmd
---
title: Cheat Sheet
output:
  html_document:
    toc: true
    toc_float:
      collapsed: false
      smooth_scroll: false
---
```{r setup, include=F}
knitr::opts_chunk$set(
  tidy=F,
  eval=F
)
```
```{r packages-and-data, include=F}
gg <- import::from(ggplot2, .all=T, .into={new.env()})
import::from(nycflights13, df_flights = flights)
import::from(magrittr, "%>%")
```
## Visualization
### Scatterplots
AKA bivariate plots.
They allow you to visualize the *relationship* between two **continuous** variables.
```{r}
# import the flights data frame and name it "df_flights"
import::from(nycflights13, df_flights = flights)
# choose all rows related to the Alaska Airlines carrier
df_alaska_flights <- df_flights %>%
  dplyr::filter(carrier == "AS")
# build a scatter plot
gg$ggplot(data = df_alaska_flights,
          mapping = gg$aes(x = dep_delay, y = arr_delay)) +
  gg$geom_point()
```
### Overplotting
Overplotting occurs when points are plotted on top of each other over and over again,
making it difficult to know the number of points being plotted.
+ Method 1: Changing the transparency of the points
By setting the `alpha` argument in `geom_point` to a value less than 1,
we can make the points partially transparent
(by default, `alpha` is set to 1, i.e. 100% opaque).
```{r demo-alpha, eval=T}
df_alaska_flights <- df_flights %>%
  dplyr::filter(carrier == "AS")
gg$ggplot(data = df_alaska_flights,
          mapping = gg$aes(x = dep_delay, y = arr_delay)) +
  gg$geom_point(alpha = 0.2)
```
When you compile the Rmd file using the “Knit” button in RStudio,
the output is an HTML document.
[Embedded preview of the rendered HTML output omitted.]
Comments:
The author took advantage of some nice features of rmarkdown
and made their cheat sheet versatile.
By setting eval=F in the first chunk (named “setup”),
all the following R chunks are not evaluated by default,
which suits the purpose of a cheat sheet.
However, whenever the author wants to show the output of a specific chunk,
perhaps to remind themselves what the code does,
they can turn on eval=T inside the header of that chunk,
as in the demo-alpha chunk.
The author also structured the file with headings at different levels,
using ##, ###, etc.
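The interplay between the global eval=F default and a per-chunk eval=T can be checked with a small sketch like the one below. It assumes the knitr package is installed; the chunk names `not-run` and `is-run` are made up for illustration. The Rmd source is held in a character vector and knitted in memory, so no file needs to be created.

```r
# A tiny Rmd document held in memory, one element per line
rmd <- c(
  "```{r setup, include=FALSE}",
  "knitr::opts_chunk$set(eval = FALSE)",  # global default: do not run chunks
  "```",
  "",
  "```{r not-run}",
  "1 + 1",
  "```",
  "",
  "```{r is-run, eval=TRUE}",             # per-chunk override of the default
  "2 + 2",
  "```"
)

# Knit the text to Markdown; the result is returned as a character string
out <- knitr::knit(text = rmd, quiet = TRUE)
cat(out)
```

In the knitted text, the code of both chunks is echoed, but only the `is-run` chunk is followed by a result, because its header overrides the global default.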
E.3 Feeling inspired?
Keep in mind that you will be working on YOUR cheat sheet
and will refer to it from time to time
as you learn to become a seasoned R user.
Customize it. Own it.
Put in whatever content you find useful.
Each week, you will hand in an updated version of the same file,
your cheat sheet, to fulfill the course requirement.
I expect that everyone’s cheat sheet will evolve over time.
Your cheat sheet by the end of the first week
may contain a line on how to examine the structure of a dataframe, for example;
and by the end of the semester,
you may know this line by heart and no longer need it on the cheat sheet.