Visualization

Scatterplots

Also known as bivariate plots, scatterplots let you visualize the relationship between two continuous variables.

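# import the flights data frame from nycflights13 and name it df_flights
# (the code below assumes gg is an alias for ggplot2 and that %>% is available, e.g. from magrittr)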
import::from(nycflights13, df_flights = flights)

# keep only the rows for the Alaska Airlines carrier (code "AS")
df_alaska_flights <- df_flights %>% 
  dplyr::filter(carrier == "AS")

# build a scatterplot of arrival delay against departure delay
gg$ggplot(data = df_alaska_flights, 
          mapping = gg$aes(x = dep_delay, y = arr_delay)) +
  gg$geom_point()

Overplotting

Overplotting occurs when many points are plotted on top of each other, which makes it difficult to tell how many points lie at a given location.
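
One way to gauge how much overplotting a scatterplot will suffer from is to count how many observations share exactly the same pair of coordinates. The following is a minimal sketch in base R; the data frame df_toy and its values are made up for illustration and are not taken from nycflights13.

```r
# hypothetical toy data: several rows share identical coordinates
df_toy <- data.frame(
  dep_delay = c(0, 0, 0, 5, 5, 10),
  arr_delay = c(-2, -2, -2, 3, 3, 12)
)

# count the rows that fall on each distinct (dep_delay, arr_delay) pair
df_counts <- aggregate(n ~ dep_delay + arr_delay,
                       data = cbind(df_toy, n = 1),
                       FUN = sum)

# pairs with n > 1 would be drawn on top of each other in a scatterplot
df_overplotted <- df_counts[df_counts$n > 1, ]
print(df_overplotted)
```

The same idea should carry over to df_alaska_flights by swapping it in for df_toy, although rows with missing delays are dropped by the formula interface of aggregate.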

  • Method 1: Changing the transparency of the points

Set the alpha argument in geom_point. By default, alpha is 1 (100% opaque). Setting it to a value below 1 makes the points partially transparent, so areas where many points overlap appear darker than areas with few points.

df_alaska_flights <- df_flights %>% 
  dplyr::filter(carrier == "AS")

gg$ggplot(data = df_alaska_flights, 
          mapping = gg$aes(x = dep_delay, y = arr_delay)) +
  gg$geom_point(alpha = 0.2)
## Warning: Removed 5 rows containing missing values (geom_point).
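
The warning above means that some rows have a missing value (NA) in at least one of the plotted variables, so geom_point cannot draw them. A minimal sketch for counting such rows ahead of plotting is shown below; count_incomplete is a hypothetical helper and df_toy is made-up data, neither comes from the original example.

```r
# hypothetical helper: number of rows with an NA in any of the columns `cols`
count_incomplete <- function(df, cols) {
  sum(!complete.cases(df[, cols, drop = FALSE]))
}

# made-up toy data with two incomplete rows
df_toy <- data.frame(
  dep_delay = c(1, NA, 3, 4),
  arr_delay = c(2, 5, NA, 8)
)

n_missing <- count_incomplete(df_toy, c("dep_delay", "arr_delay"))
print(n_missing)
```

Called on df_alaska_flights with the same two column names, this count should correspond to the number of rows reported as removed in the warning.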