02-05-2021

Data Wrangling In R Cheat Sheet

I reproduce some of the plots from RStudio’s ggplot2 cheat sheet using base R graphics. I didn’t try to pretty up these plots, but you should.

I use this dataset

The main functions that I generally use for plotting are

Plotting Functions
- plot: Makes scatterplots, line plots, among other plots.
- lines: Adds lines to an already-made plot.
- par: Change plotting options.
- hist: Makes a histogram.
- boxplot: Makes a boxplot.
- text: Adds text to an already-made plot.
- legend: Adds a legend to an already-made plot.
- mosaicplot: Makes a mosaic plot.
- barplot: Makes a bar plot.
- jitter: Adds a small value to data (so points don’t overlap on a plot).
- rug: Adds a rugplot to an already-made plot.
- polygon: Adds a shape to an already-made plot.
- points: Adds a scatterplot to an already-made plot.
- mtext: Adds text on the edges of an already-made plot.
These functions are sometimes needed to transform data (or make new data) before plotting:
- table: Builds frequency and two-way tables.
- density: Calculates the density.
- loess: Calculates a smooth line.
- predict: Predicts new values based on a model.

All of the plotting functions have arguments that control the way the plot looks. You should read about these arguments. In particular, read carefully the help page ?plot.default. Useful ones are:

main: This controls the title.
xlab, ylab: These control the x and y axis labels.
col: This will control the color of the lines/points/areas.
cex: This will control the size of points.
pch: The type of point (circle, dot, triangle, etc…)
lwd: Line width.
lty: Line type (solid, dashed, dotted, etc…).
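
As a rough sketch of these arguments in use, the call below draws one scatterplot and then adds a reference line. The built-in `mtcars` data and every value chosen here are assumptions for illustration only; `abline` is a base R function that is not in the list above.

```r
# Illustrative only: mtcars ships with R and stands in for whatever data you plot.
plot(mtcars$wt, mtcars$mpg,
     main = "Fuel economy vs. weight",  # title
     xlab = "Weight (1000 lbs)",        # x axis label
     ylab = "Miles per gallon",         # y axis label
     col  = "steelblue",                # point color
     cex  = 1.2,                        # point size multiplier
     pch  = 16)                         # filled circle

# lwd and lty matter for lines: a thick dashed line at the mean mpg.
mean_mpg <- mean(mtcars$mpg)
abline(h = mean_mpg, lwd = 2, lty = 2)
```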

Discrete

Barplot

Different type of bar plot
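
A minimal sketch of both bar plots, assuming the built-in `mtcars` data as a stand-in dataset and treating `cyl` and `gear` as categorical variables. `table` builds the counts that `barplot` draws.

```r
# Assumption: mtcars is a stand-in dataset; cyl and gear act as categories.
cyl_counts <- table(mtcars$cyl)   # one-way frequency table
barplot(cyl_counts, xlab = "Cylinders", ylab = "Count")

# A different type of bar plot: side-by-side bars from a two-way table.
cyl_gear <- table(mtcars$cyl, mtcars$gear)
barplot(cyl_gear, beside = TRUE,
        xlab = "Gears", ylab = "Count",
        legend.text = rownames(cyl_gear),          # one legend entry per cyl value
        args.legend = list(title = "Cylinders"))
```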

Continuous X, Continuous Y

Scatterplot

Jitter points to account for overlaying points.

Add a rug plot
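
Jittering and rug plots can be sketched as follows, again assuming the built-in `mtcars` data purely for illustration. The first plot jitters two variables that take only a few distinct values; the second adds rugs along both axes.

```r
# Assumption: mtcars is a stand-in dataset.
# cyl and gear have few distinct values, so unjittered points would overlap.
jittered_cyl  <- jitter(mtcars$cyl)
jittered_gear <- jitter(mtcars$gear)
plot(jittered_cyl, jittered_gear,
     xlab = "Cylinders (jittered)", ylab = "Gears (jittered)")

# Rug plot: tick marks along the axes show the marginal distributions.
plot(mtcars$wt, mtcars$mpg, xlab = "Weight", ylab = "Miles per gallon")
rug(mtcars$wt, side = 1)    # bottom axis
rug(mtcars$mpg, side = 2)   # left axis
```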

Add a Loess Smoother
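
One way to add a loess smoother, sketched with the built-in `mtcars` data as an assumed stand-in: fit with `loess`, evaluate the fit on a grid with `predict`, and draw it with `lines`. The names `wt_grid` and `mpg_hat` are illustrative.

```r
# Assumption: mtcars is a stand-in dataset.
fit <- loess(mpg ~ wt, data = mtcars)

# Evaluate the smooth on an evenly spaced grid inside the observed range.
wt_grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 100)
mpg_hat <- predict(fit, newdata = data.frame(wt = wt_grid))

plot(mtcars$wt, mtcars$mpg, xlab = "Weight", ylab = "Miles per gallon")
lines(wt_grid, mpg_hat, col = "blue", lwd = 2)
```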

Loess smoother with upper and lower 95% confidence bands

Loess smoother with upper and lower 95% confidence bands and that fancy shading from ggplot2.
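
A sketch of the shaded band, assuming the built-in `mtcars` data and approximate pointwise 95% limits computed from the standard errors that `predict` returns when `se = TRUE`. The shading is drawn with `polygon` before the points so that it sits behind them.

```r
# Assumption: mtcars is a stand-in dataset; the band is pointwise and approximate.
fit <- loess(mpg ~ wt, data = mtcars)
wt_grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 100)
pred <- predict(fit, newdata = data.frame(wt = wt_grid), se = TRUE)

crit  <- qt(0.975, df = pred$df)          # t critical value for a 95% band
upper <- pred$fit + crit * pred$se.fit
lower <- pred$fit - crit * pred$se.fit

# Set up an empty plot, shade the band, then draw points and the smooth on top.
plot(mtcars$wt, mtcars$mpg, type = "n",
     ylim = range(c(lower, upper, mtcars$mpg)),
     xlab = "Weight", ylab = "Miles per gallon")
polygon(c(wt_grid, rev(wt_grid)), c(upper, rev(lower)),
        col = "grey85", border = NA)
points(mtcars$wt, mtcars$mpg, pch = 16)
lines(wt_grid, pred$fit, col = "blue", lwd = 2)
```

For unshaded bands, replace the `polygon` call with two `lines` calls for `upper` and `lower`, for example with `lty = 2`.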

Add text to a plot
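
A small sketch of `text`, assuming the built-in `mtcars` data: the plot is set up empty and each observation is drawn as its row name rather than as a point.

```r
# Assumption: mtcars is a stand-in dataset; labels are its row names.
car_labels <- rownames(mtcars)
plot(mtcars$wt, mtcars$mpg, type = "n",
     xlab = "Weight", ylab = "Miles per gallon")
text(mtcars$wt, mtcars$mpg, labels = car_labels, cex = 0.6)
```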

Discrete X, Discrete Y

Mosaic Plot
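
A mosaic plot can be sketched from a two-way table, here assuming the built-in `mtcars` data with `cyl` and `gear` treated as categories.

```r
# Assumption: mtcars is a stand-in dataset.
cyl_gear <- table(Cylinders = mtcars$cyl, Gears = mtcars$gear)
mosaicplot(cyl_gear, main = "Cylinders by gears", color = TRUE)
```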

Color code a scatterplot by a categorical variable and add a legend.
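
One possible approach, assuming the built-in `mtcars` data: convert the categorical variable to a factor, index a vector of colors by that factor, and pass the same colors to `legend`. The color choices and variable names are illustrative.

```r
# Assumption: mtcars is a stand-in dataset; colors are arbitrary.
cyl_factor   <- factor(mtcars$cyl)
palette_cols <- c("black", "red", "blue")   # one color per factor level

# Indexing by a factor uses its integer codes, so each point gets its level's color.
plot(mtcars$wt, mtcars$mpg,
     col = palette_cols[cyl_factor], pch = 16,
     xlab = "Weight", ylab = "Miles per gallon")
legend("topright", legend = levels(cyl_factor),
       col = palette_cols, pch = 16, title = "Cylinders")
```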

par sets the graphics options, where mfrow is the parameter controlling the facets.

The first line sets the new options and saves the old options in the list old_options. The last line reinstates the old options.
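
The pattern described above can be sketched like this, assuming the built-in `mtcars` data and one panel per value of `cyl`. Only `old_options` is a name taken from the text; the rest is illustrative.

```r
# First line: set a 1-by-3 panel layout and save the previous settings.
old_options <- par(mfrow = c(1, 3))

# Assumption: mtcars is a stand-in dataset; draw one panel per cylinder count.
for (k in sort(unique(mtcars$cyl))) {
  panel_data <- mtcars[mtcars$cyl == k, ]
  plot(panel_data$wt, panel_data$mpg,
       main = paste(k, "cylinders"),
       xlab = "Weight", ylab = "Miles per gallon")
}

# Last line: reinstate the old options.
par(old_options)
```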

10.1 Scoped verbs vs. purrr

It can be easy to get confused between purrr and scoped verbs. The following diagram illustrates which to use for different combinations of inputs and outputs. For example, use a scoped verb if you want to start and end with a tibble, but purrr if you want to start with a tibble and end up with a vector.
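
As a small illustration of that distinction, the sketch below computes column means both ways. The tibble `df` is made up for the example, and it assumes dplyr and purrr are installed. Current dplyr releases mark the scoped verbs as superseded in favor of `across()`, although they still work.

```r
library(dplyr)
library(purrr)

# Hypothetical data for illustration.
df <- tibble(x = c(1, 2, 3), y = c(4, 5, 6))

# Scoped verb: tibble in, tibble out (one row, one column per variable).
means_tibble <- summarize_all(df, mean)

# purrr: tibble in, named double vector out.
means_vector <- map_dbl(df, mean)
```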

10.2 Suffixes

suffix	use when
_all	you want to apply the verb to all columns
_at	you want to apply the verb to specified columns
_if	you want to apply the verb to all the columns with some property

10.3 Examples

10.3.1 `mutate()`, `summarize()`, `select()`, and `rename()`

10.3.1.1 Named functions

Verb	Example	Example explanation
summarize_all	summarize_all(mean)	finds the mean of all variables
summarize_at	summarize_at(vars(x, y), mean)	finds the mean of variables x and y
summarize_if	summarize_if(is.double, mean)	finds the mean of all double variables
mutate_all	mutate_all(as.character)	converts all variables to characters
mutate_at	mutate_at(vars(x, y), as.character)	converts variables x and y to characters
mutate_if	mutate_if(is.factor, as.character)	converts all factor variables to characters
rename_all	rename_all(str_to_lower)	changes all column names to lowercase
rename_at	rename_at(vars(X, Y), str_to_lower)	changes the names of columns X and Y to x and y
rename_if	rename_if(is.double, str_to_lower)	changes the names of double columns to lowercase
select_all	select_all(str_to_lower)	selects all columns and changes their names to lowercase (better to use rename_all())
select_at	select_at(vars(X, Y), str_to_lower)	selects just columns X and Y and changes their names to x and y
select_if	select_if(is.double, str_to_lower)	selects just double columns and changes their names to lowercase
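
A few of the rows above, run against a small made-up tibble. This is a sketch that assumes dplyr and stringr are installed; `df` and its columns are hypothetical.

```r
library(dplyr)
library(stringr)

# Hypothetical data: one double, one integer, and one factor column.
df <- tibble(X = c(1.5, 2.5), Y = c(3L, 4L), Z = factor(c("a", "b")))

all_lower <- rename_all(df, str_to_lower)            # columns become x, y, z
doubles   <- select_if(df, is.double)                # keeps only X
as_chr    <- mutate_if(df, is.factor, as.character)  # Z becomes character
means_xy  <- summarize_at(df, vars(X, Y), mean)      # means of X and Y only
```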

10.3.1.2 Extra arguments

verb	example	example_explanation
summarize_if	summarize_if(is.double, mean, na.rm = TRUE)	finds the mean, excluding NAs, of all double variables
summarize_all	summarize_all(mean, trim = 0.1, na.rm = TRUE)	finds the mean of all variables, excluding NAs. Removes the bottom and top 10% of values of each variable before computing the mean
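
Extra arguments are passed through to the function after its name. A minimal sketch with a made-up tibble, assuming dplyr is installed:

```r
library(dplyr)

# Hypothetical data with one NA per column.
df <- tibble(x = c(1, 2, NA, 100), y = c(NA, 2, 4, 6))

# Without na.rm = TRUE, both means would be NA.
means_with_na <- summarize_all(df, mean)
means_no_na   <- summarize_all(df, mean, na.rm = TRUE)
```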

10.3.1.3 Anonymous functions

verb	example	example_explanation
summarize_all	summarize_all(~ sum(is.na(.)))	determines the number of NAs in each column
select_if	select_if(~ n_distinct(.) > 1)	selects only the columns with more than one distinct value
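
Both rows can be tried on a small made-up tibble. This sketch assumes dplyr is installed; in the formula, `.` stands for the current column.

```r
library(dplyr)

# Hypothetical data: y is constant, x and z contain NAs.
df <- tibble(x = c(1, NA, 3), y = c("a", "a", "a"), z = c(NA, NA, 1))

# Count the NAs in each column.
na_counts <- summarize_all(df, ~ sum(is.na(.)))

# Keep only columns with more than one distinct value (drops y).
varying <- select_if(df, ~ n_distinct(.) > 1)
```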

10.3.2 `filter()`

verb	example	example_explanation
filter_all	filter_all(all_vars(!is.na(.)))	finds rows without any NAs
filter_all	filter_all(any_vars(!is.na(.)))	finds rows with at least one non-NA value
filter_at	filter_at(vars(x, y), all_vars(!is.na(.)))	finds rows where both x and y are non-NA
filter_at	filter_at(vars(x, y), any_vars(!is.na(.)))	finds rows where at least one of x and y is non-NA
filter_if	filter_if(is.double, all_vars(!is.na(.)))	finds rows where all double variables are non-NA
filter_if	filter_if(is.double, any_vars(!is.na(.)))	finds rows where at least one double variable is non-NA
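
A short sketch of the `all_vars()` and `any_vars()` distinction on a made-up tibble, assuming dplyr is installed:

```r
library(dplyr)

# Hypothetical data: row 1 is complete, row 2 is missing x, row 3 is missing x and y.
df <- tibble(x = c(1, NA, NA), y = c(2, 5, NA), z = c("a", "b", "c"))

complete_rows <- filter_all(df, all_vars(!is.na(.)))             # row 1 only
xy_any        <- filter_at(df, vars(x, y), any_vars(!is.na(.)))  # rows 1 and 2
dbl_all       <- filter_if(df, is.double, all_vars(!is.na(.)))   # row 1 only
```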
