R - Beginners Workshop
- Follow the structured outline to learn R basics and data
analysis
- Submit weekly scripts following the specified format
- Contact me via mail or Slack for any queries
- RStudio is an integrated development environment (IDE) that helps you write R code
- You can either write code directly in the console or use a script to
organize and save your code
- Assign values to variables using <-
- Name variables consistently so that other people can read and
understand your code
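To make the points on assignment and naming concrete, here is a minimal sketch. The variable names and values are invented for illustration, and snake_case is only one possible naming style; what matters is sticking to one.

```r
# Assign values to variables with the `<-` operator.
participant_age <- 27
participant_name <- "Alex"

# Descriptive names in one consistent style (here: snake_case)
# tell a reader what each value means.
age_in_months <- participant_age * 12

age_in_months  # 324
```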
- R packages are “add-ons” to R that provide useful new tools.
- Install a package using install.packages("packagename").
- Load a package using library(packagename) at the beginning of your script.
- Use :: to access specific functions from a package without loading it entirely.
- Scripts facilitate reproducible research
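The two ways of reaching a package's functions can be sketched as follows. This example assumes the tools package, which ships with R, so nothing needs to be installed; the file name "analysis.R" is made up.

```r
# install.packages("packagename")  # run once per machine, not in every script

# Option 1: call a single function via `::` without loading the package.
ext_via_colons <- tools::file_ext("analysis.R")

# Option 2: load the package at the top of the script,
# then call its functions by name.
library(tools)
ext_via_library <- file_ext("analysis.R")

# Both routes reach the same function and give the same result.
identical(ext_via_colons, ext_via_library)  # TRUE
```

Using :: is handy when you need just one function or want to make clear which package a function comes from.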
- Create vectors using c()
- Numeric variables are used for computations, character variables
often contain additional information
- Index vectors using vector[index] to return specific elements, or
vector[-index] to exclude them
- Use which() to find the positions of elements that meet a condition,
then use those positions to filter the vector
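A short sketch of creating, indexing, and filtering vectors. The names and ages below are invented purely for illustration.

```r
# A numeric vector (used for computations) and a character vector
# (holding additional information about each value).
ages <- c(23, 35, 41, 19, 52)
first_names <- c("Ana", "Ben", "Cho", "Dev", "Eli")

ages[2]         # second element: 35
ages[c(1, 3)]   # first and third elements: 23 41
ages[-1]        # everything except the first element: 35 41 19 52

# which() returns the positions where the condition is TRUE.
older_than_30 <- which(ages > 30)   # 2 3 5

# Those positions can then be used to index another vector.
first_names[older_than_30]          # "Ben" "Cho" "Eli"

mean(ages)  # 34
```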
- The working directory is where R looks for and saves files.
- Absolute paths give full file locations; relative paths are resolved
from the working directory.
- R Projects manage the working directory automatically, keeping work
organized.
- Using R Projects makes code portable, reproducible, and
easier to share.
- A structured project with separate folders for data and scripts
improves workflow.
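The difference between relative and absolute paths can be illustrated with a small sketch. The folder and file name "data/survey.csv" are hypothetical and stand in for whatever your own project contains.

```r
# Where does R currently look for and save files?
getwd()

# A relative path is resolved from the working directory.
# "data/survey.csv" is a hypothetical file used only for illustration.
relative_path <- file.path("data", "survey.csv")

# The matching absolute path spells out the full location,
# which differs from computer to computer.
absolute_path <- file.path(getwd(), relative_path)

# Check whether the file exists before trying to read it.
file.exists(relative_path)
```

Inside an R Project the working directory is the project folder, so the relative path keeps working when the project is moved or shared, while the absolute path does not.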
- Get an overview of the data by inspecting it or using glimpse() / describe()
- Consult a codebook for more in-depth descriptions of the variables
in the data
- Visualize the distribution of a variable using hist()
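A first look at a dataset might go roughly like this. The sketch uses R's built-in iris data rather than the workshop data and assumes the dplyr package is installed. The line for describe() is left commented out because it is only an assumption that describe() refers to the psych package.

```r
library(dplyr)

# Column names, types, and the first few values of each column.
glimpse(iris)

# Descriptive statistics per column; assumed here to come from the
# psych package, so it is left commented out.
# psych::describe(iris)

# Histogram showing the distribution of one numeric variable.
hist(iris$Sepal.Length,
     main = "Distribution of sepal length",
     xlab = "Sepal length (cm)")
```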
- Use ggplot(data = data, mapping = aes()) to supply the data and the
aesthetic mapping for a plot
- Add layers such as geom_point() or geom_smooth() using +
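Putting these pieces together, a plot could be built up as in the sketch below. It assumes the ggplot2 package is installed and uses the built-in mtcars data for illustration; the choice of a linear smoother (method = "lm") is just one option.

```r
library(ggplot2)

# Supply the data and map columns to the x and y axes,
# then add one layer per visualization step with `+`.
scatter_plot <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point() +                 # one point per car
  geom_smooth(method = "lm")     # fitted straight line with confidence band

# Printing the object draws the plot.
scatter_plot
```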
- Get to know new data by inspecting it and computing key descriptive
statistics
- Visualize distributions of key variables in order to learn about
factors that impact them
- Visualize the distribution of a numeric variable across the groups of a
categorical variable using geom_density()
- Visualize the joint distribution of two categorical variables using geom_bar()
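Both kinds of plot can be sketched as follows, assuming ggplot2 is installed. The example relies on the built-in iris data and on the mpg data that ships with ggplot2. The fill mapping, the transparency value, and the dodged bar position are illustrative choices, not the only way to do this.

```r
library(ggplot2)

# Numeric variable (Sepal.Length) split by a categorical variable (Species):
# one semi-transparent density curve per species.
density_plot <- ggplot(data = iris,
                       mapping = aes(x = Sepal.Length, fill = Species)) +
  geom_density(alpha = 0.5)

# Two categorical variables (vehicle class and drive type):
# bar height is the number of cars in each combination.
bar_plot <- ggplot(data = mpg, mapping = aes(x = class, fill = drv)) +
  geom_bar(position = "dodge")

density_plot
bar_plot
```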
- Use the pipe operator %>% to link multiple functions together
- Use filter() to keep only the rows that meet certain conditions
- Use select() to keep only the columns that interest you
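As a rough illustration of chaining these steps, the sketch below filters rows and then selects columns from the built-in iris data. It assumes the dplyr package is installed; the conditions and column choices are arbitrary.

```r
library(dplyr)

large_setosa <- iris %>%
  # filter() works on rows: keep setosa flowers with long sepals.
  filter(Species == "setosa", Sepal.Length > 5) %>%
  # select() works on columns: keep only the ones of interest.
  select(Species, Sepal.Length, Sepal.Width)

head(large_setosa)
```

Read the pipe as "and then": take iris, and then filter it, and then select columns.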
- Use mutate() to create and modify columns in a dataset.
- Assign constant values, compute values from other columns, or use
conditions to define new columns.
- Use ifelse() for conditional column creation.
- Compute row-wise sums and means efficiently using rowSums() and rowMeans().
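The different ways of defining columns can be sketched on a tiny made-up survey. The data frame, its column names, and the cut-off of 10 are all invented for illustration, and the example assumes dplyr is installed. Selecting the item columns with across() is one possible approach among several.

```r
library(dplyr)

# Hypothetical survey data: three respondents, three questionnaire items.
survey <- data.frame(
  id     = 1:3,
  item_1 = c(1, 4, 5),
  item_2 = c(2, 4, 3),
  item_3 = c(3, 5, 1)
)

survey <- survey %>%
  mutate(
    wave       = 1,                                    # constant value
    total      = rowSums(across(item_1:item_3)),       # row-wise sum
    average    = rowMeans(across(item_1:item_3)),      # row-wise mean
    score_type = ifelse(total >= 10, "high", "low")    # conditional column
  )

survey$total       # 6 13 9
survey$score_type  # "low" "high" "low"
```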
- Use count() to count how often each value, or combination of values,
occurs in one or more columns
- Use summarize() to compute summary statistics for your data
- Use group_by() to group your data so that you get summaries for each
group separately
- Combine functions like filter(), group_by() and summarize() using the
pipe to get specific results
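A combined pipeline might look like the sketch below, which uses the built-in mtcars data and assumes dplyr is installed. The filter threshold and the chosen summary statistics are arbitrary examples.

```r
library(dplyr)

# How many cars are there for each combination of cylinders and gears?
combo_counts <- mtcars %>%
  count(cyl, gear)

# Among cars with more than 100 horsepower, summarize fuel
# consumption separately for each number of cylinders.
mpg_by_cyl <- mtcars %>%
  filter(hp > 100) %>%
  group_by(cyl) %>%
  summarize(
    n_cars   = n(),
    mean_mpg = mean(mpg)
  )

combo_counts
mpg_by_cyl
```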
- Apply what you have learned to new data!