R - Beginners Workshop
- Follow the structured outline to learn R basics and data
analysis
- Submit weekly scripts following the specified format
- Contact me via mail or Slack for any queries
- RStudio is an integrated development environment (IDE) that helps
you write R code
- You can either write code directly in the console, or use a script to
organize and save your code
- You can assign variables using <-
- Be consistent in naming variables so that other people can read and
understand your code
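
As a minimal sketch of assignment with consistent naming (the variable names and values here are made up for illustration):

```r
# Assign values to descriptively named variables with <-
participant_age <- 25
participant_name <- "Alex"  # hypothetical example value

# A stored value can be reused in later computations
age_in_months <- participant_age * 12
print(age_in_months)
```

Using one naming scheme throughout (here, lowercase words joined by underscores) is one way to keep a script readable for others.
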
- R packages are “add-ons” to R that provide useful new tools.
- Install a package using install.packages("packagename").
- Load a package using library(packagename) at the beginning of your
script.
- Use :: to access specific functions from a package without loading
it entirely.
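
The three package steps could look roughly like this. dplyr is used only as an example package name, and the install and load lines are left commented out so the sketch runs without installing anything; the stats package ships with R.

```r
# Install once per machine (commented out so the script does not
# reinstall the package on every run)
# install.packages("dplyr")

# Load the package at the top of the script
# library(dplyr)

# Call a single function without loading its package: package::function()
middle_value <- stats::median(c(3, 1, 2))
print(middle_value)
```
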
- Scripts facilitate reproducible research
- Create vectors using c()
- Numeric variables are used for computations; character variables
often contain additional information
- You can index vectors by using vector[index] to return specific
elements, or vector[-index] to exclude them
- Use which() to find the positions of elements that meet a condition,
then use those positions to filter the vector
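
A short sketch of creating, indexing, and filtering vectors. The data values and variable names are invented for illustration.

```r
# A numeric vector and a character vector of the same length
reaction_times <- c(310, 450, 275, 520)
participant_ids <- c("p01", "p02", "p03", "p04")

reaction_times[2]        # the second element
reaction_times[c(1, 3)]  # the first and third elements
reaction_times[-1]       # everything except the first element

# which() returns the positions where the condition is TRUE
slow_trials <- which(reaction_times > 400)

# Those positions can then index another vector of the same length
slow_participants <- participant_ids[slow_trials]
print(slow_participants)
```
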
- The working directory is where R looks for and
saves files.
- Absolute paths give full file locations; relative paths use the
working directory.
- R Projects manage the working directory automatically, keeping work
organized.
- Using R Projects makes code portable, reproducible, and
easier to share.
- A structured project with separate folders for data and scripts
improves workflow.
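
To illustrate relative paths, here is a small sketch. The folder name data and the file name survey.csv are hypothetical, so the line that reads the file is left commented out.

```r
# Show the current working directory (inside an R Project this is the
# project folder)
print(getwd())

# Build a path relative to the working directory
relative_path <- file.path("data", "survey.csv")  # hypothetical file
print(relative_path)

# Reading the file would then look like this
# survey <- read.csv(relative_path)
```

Because the path does not start from a drive or home folder, the same script keeps working when the whole project folder is moved or shared.
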
- Get an overview of the data by inspecting it or using glimpse() /
describe()
- Consult a codebook for more in-depth descriptions of the variables
in the data
- Visualize the distribution of a variable using hist()
- Use ggplot(data = data, mapping = aes()) to provide the data and
mapping to the plots
- Add visualization steps like geom_point() or geom_smooth() using +
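
The following sketch puts these steps together. It uses the built-in mtcars dataset as a stand-in for the workshop data and assumes the dplyr and ggplot2 packages are installed; the choice of columns and of method = "lm" is only for illustration.

```r
library(ggplot2)

# Quick overview of the columns and their types (glimpse() comes from dplyr)
dplyr::glimpse(mtcars)

# Base R histogram of a single numeric variable
hist(mtcars$mpg)

# ggplot() receives the data and the mapping; layers are added with +
scatter_plot <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm")

print(scatter_plot)
```
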
- Get to know new data by inspecting it and computing key descriptive
statistics
- Visualize distributions of key variables in order to learn about
factors that impact them
- Visualize the distribution of a numeric variable across the levels
of a categorical variable using geom_density()
- Visualize the joint distribution of two categorical variables using
geom_bar()
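
One possible way to draw both plot types, again with the built-in mtcars dataset as a stand-in and assuming ggplot2 is installed. Converting cyl and am with factor() and the alpha and position settings are choices made for this sketch.

```r
library(ggplot2)

# Numeric variable (mpg) split by a categorical variable (cyl)
density_plot <- ggplot(data = mtcars,
                       mapping = aes(x = mpg, fill = factor(cyl))) +
  geom_density(alpha = 0.5)  # semi-transparent so overlapping curves stay visible

# Two categorical variables: one on the x axis, one as fill colour
bar_plot <- ggplot(data = mtcars,
                   mapping = aes(x = factor(cyl), fill = factor(am))) +
  geom_bar(position = "dodge")  # bars side by side instead of stacked

print(density_plot)
print(bar_plot)
```
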
- Use the pipe operator %>% to link multiple functions together
- Use filter() to filter rows based on certain conditions
- Use select() to keep only those columns that interest you
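
A minimal sketch of a pipeline that filters rows and then selects columns, assuming dplyr is installed. The survey data frame is invented for illustration.

```r
library(dplyr)

# Hypothetical example data
survey <- data.frame(
  id    = 1:5,
  group = c("control", "treatment", "control", "treatment", "control"),
  age   = c(23, 35, 41, 29, 52),
  score = c(10, 14, 9, 16, 12)
)

# The pipe passes the result of each step on to the next function
older_participants <- survey %>%
  filter(age > 30) %>%    # keep rows that meet the condition
  select(id, age, score)  # keep only these columns

print(older_participants)
```
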
- Use mutate() to create and modify columns in a dataset.
- Assign constant values, compute values from other columns, or use
conditions to define new columns.
- Use ifelse() for conditional column creation.
- Compute row-wise sums and means efficiently using rowSums() and
rowMeans().
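
These column operations could be combined as in the sketch below, assuming dplyr is installed. The responses data frame, the column names, and the cut-off of 10 are hypothetical.

```r
library(dplyr)

# Hypothetical questionnaire data with three items per participant
responses <- data.frame(
  id     = 1:3,
  item_1 = c(4, 2, 5),
  item_2 = c(3, 1, 4),
  item_3 = c(5, 2, 3)
)

item_columns <- c("item_1", "item_2", "item_3")

scored <- responses %>%
  mutate(
    wave    = 1,                                     # constant value
    total   = rowSums(responses[, item_columns]),    # row-wise sum
    average = rowMeans(responses[, item_columns]),   # row-wise mean
    level   = ifelse(total >= 10, "high", "low")     # conditional column
  )

print(scored)
```
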
- Use count() to compute the number of occurrences of each value (or
combination of values) in one or more columns
- Use summarize() to compute any summary statistics for your data
- Use group_by() to group your data so you can receive summaries for
each group separately
- Combine functions like filter(), group_by() and summarize() using
the pipe to receive specific results
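
A sketch of counting and grouped summaries in one pipeline, assuming dplyr is installed. The survey data frame and the age cut-off are made up for illustration.

```r
library(dplyr)

# Hypothetical example data
survey <- data.frame(
  group = c("control", "treatment", "control", "treatment", "control"),
  age   = c(23, 35, 41, 29, 52),
  score = c(10, 14, 9, 16, 12)
)

# Number of rows per group
group_counts <- survey %>%
  count(group)

# Filter first, then summarize each group separately
group_summary <- survey %>%
  filter(age >= 25) %>%
  group_by(group) %>%
  summarize(
    mean_score     = mean(score),
    n_participants = n()
  )

print(group_counts)
print(group_summary)
```
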
- Apply what you have learned to new data!