R and RStudio
R is free software for statistics and data science that provides a handy way to do day-to-day data handling such as cleaning, editing, analysing, visualising and communicating the outputs. You can download R from CRAN, the Comprehensive R Archive Network, at https://cloud.r-project.org. It works on Windows, macOS and Linux operating systems; you just need to download the right version. A new version of R is released every year, with some further updates over the year. This module is prepared based on R version 4.3.0.
RStudio is an integrated development environment (IDE) for R programming, which is also available free of cost at https://posit.co/download/rstudio-desktop/. When we talk about using R, we basically work in RStudio while R runs in the background.
There are many advantages of using R. Some of them are listed below.

Reproducibility: In R, once you have written the code for a data analysis, a fancy graph or a function and saved it in a safe directory, you can reproduce the same outputs even years later. That is the beauty of coding in R.
Help & support: Comprehensive online help is available. You can find example code for almost anything you want to do using R; you just need to copy and paste the right code for your task. Some example websites where you can find help are below.
 LES: https://letsenjoystatistics.com/
 Quick-R: https://www.statmethods.net/
 STHDA: http://www.sthda.com/english/
 Stack Overflow: https://stackoverflow.com/
 … many more.

Packages ready on demand: Thousands of R packages are available to help you get your task done.
Functionality: Besides doing statistics and data analysis, R can be used for many other purposes such as writing reports, articles, theses and books, preparing PowerPoint slides, creating web applications, checking Facebook status, sending email etc. So, it is almost an all-in-one package. The frequently used interfaces are R Markdown and Sweave. Other programming languages such as Python, C++ and SQL can be called and used in R. So, it provides versatile opportunities.
Example datasets: R has a huge number of example datasets that are built in with the packages. You can access any one of them and practise. To get the list of built-in datasets just type data() in the console and hit the Enter key. Of course, more datasets come along with each package you install. Some commands are very helpful for exploring the built-in datasets. For example, help(package="datasets") provides the documentation of the datasets, data(package="ggplot2") gives the list of the datasets built in with the package ggplot2, data(package = .packages(all.available = TRUE)) gives the list of all datasets from all installed packages, and dplyr::storms gives access to the dataset storms built in with the package dplyr.
Handling multiple datasets: R lets you handle multiple datasets at the same time, so you can load many datasets and work on them together.
Zero money: Most importantly, all of this can be done at a cost of ZERO.
Therefore, it is worth investing some time to learn a program like R to become an efficient data and web handler.
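As a minimal sketch of the dataset commands mentioned in the list above, the code below lists the datasets shipped with the datasets package and peeks at one of them. The choice of mtcars and the object names are only illustrative assumptions; any built-in dataset would do.

```r
# List the names of the datasets that ship with the 'datasets' package
available <- data(package = "datasets")$results[, "Item"]
print(head(available))

# mtcars is one of them; look at its first three rows
first_rows <- head(datasets::mtcars, 3)
print(first_rows)
```

The same pattern should work for any installed package that carries datasets, by changing the package name.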
Get started
To start using R, simply download the right version for your machine and install it. In most cases, you need to install R first and then install RStudio; however, in some cases, such as when downloading from a university software centre, you may need to install RStudio only. Once installed, just double-click the RStudio desktop icon to open it. You will see a pane layout (various windows) as below.
In the above figure the two most important windows are marked (in rectangles). The top one (source) is the input window and the one below it (console) is both an input and an output window. The red-circled tabs Plots and Help are also frequently used. The pane layout can be rearranged by following Tools > Global Options > Pane Layout. I prefer showing two panes only, source on the left and console on the right. You may not see the source pane when you open RStudio for the first time. To get it, simply click the little green plus sign in the top left corner and select R Script.
Typing in R
Typing in R is easy and simple. You can type either in the console or in the source; however, I recommend typing most of your code in the source, which you can save and use anytime later. The code written in the source is called the R script. You can save the console as well; however, it is not so efficient because every saved console takes up a lot of space on your computer.
Coding in R is object oriented. That means you can treat data, graphs or the output of an analysis as an object and assign a name to it. This is really handy because you can call the object by its name anytime. Assign an object a name using an equal sign (=) or the assignment arrow (<-). Let’s see a simple code example.
# Creating a vector of values 1 to 5 and assign a name "x"
x = c(1:5)
# Displaying the x
print(x)
## [1] 1 2 3 4 5
# Creating a vector of values from 1 to 2 with an increment of 0.2 with a name "y"
y = seq(1,2, by=0.2)
# Displaying "y"
print(y)
## [1] 1.0 1.2 1.4 1.6 1.8 2.0
# Combining x and y to create a matrix with 2 columns
# (x has 5 values and y has 6, so R recycles x and gives a warning)
dt = matrix(cbind(x,y), ncol=2)
# Naming the columns x and y
colnames(dt) = c('x', 'y')
# Display the matrix
print(dt)
##      x   y
## [1,] 1 1.0
## [2,] 2 1.2
## [3,] 3 1.4
## [4,] 4 1.6
## [5,] 5 1.8
## [6,] 1 2.0
To add notes or comments to the R script, use the # sign.
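A minimal sketch of the same idea, assuming two vectors of equal length so that no values are recycled. It uses the assignment arrow <-, which many R users prefer over the equal sign; the object names x6, y6, dt6 and df6 are hypothetical.

```r
# Two vectors of the same length (6 values each)
x6 <- 1:6
y6 <- seq(1, 2, by = 0.2)

# cbind() uses the argument names as column names
dt6 <- cbind(x = x6, y = y6)
print(dim(dt6))    # number of rows, then number of columns

# A data frame is often more convenient than a matrix
df6 <- data.frame(x = x6, y = y6)
print(df6$y[6])    # sixth value of the column y
```

With equal-length vectors each row pairs one x value with one y value, which is usually what is intended when building a small table by hand.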
Setting the working directory
The working directory is a folder on your computer where your datasets are stored. Any results, including new datasets and graphs, are saved into the current working directory by default.
setwd("C:/Users/mmoinuddin/OneDrive  UCLan/LES")
The folder LES is my working directory, where I have stored my datasets so that I can call them anytime I like. You can copy and paste the code and change the path to your own. Once you have set your working directory you can check it using the following command.
getwd()
## [1] "C:/Users/mmoinuddin/OneDrive  UCLan/LES"
You can get the list of files stored in the current working directory by simply typing the dir() command in the console and hitting the Enter key on your keyboard.
dir()
## [1] "data spread _AE.docx" "data_types.png"
## [3] "diet article _AE.docx" "evidence_pyramid.png"
## [5] "FHS_1.png" "FHS_2.png"
## [7] "google_cough.png" "google_flu1.jpeg"
## [9] "google_flu2.jpeg" "google_flu21.jpeg.png"
## [11] "Hypothesis T1 _AE.rtf" "john_map2.jpg"
## [13] "john_snow_map_tubeW.png" "John_Snow_Pub_1.jpg"
## [15] "john_snow_waterpomp.jpg" "LESwithR.html"
## [17] "LESwithR.Rmd" "LES home.png"
## [19] "LES home2.png" "LES home3.png"
## [21] "LES logo with Text.png" "LES logo.png"
## [23] "LES with R.Rmd" "LES_home_HD.png"
## [25] "LES_logo.pptx" "les_logo_modified.png"
## [27] "Ox_evidence_level.png" "Pump_Handle__John_Snow_.jpg"
## [29] "research_process.png" "rstudio_consoles.png"
## [31] "Snowcholeramap1.jpg" "SPSS data for teaching"
## [33] "study_designs.png"
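The working-directory commands above can be tried safely with a short sketch like the one below. It assumes a temporary folder (from tempdir()) as a stand-in for your own project folder, and the file name example_cars.csv is hypothetical.

```r
# Remember the current working directory so it can be restored later
old_wd <- getwd()

# Use a temporary folder as a stand-in for your own project folder
setwd(tempdir())

# A file written with a relative path lands in the working directory
write.csv(head(datasets::mtcars), "example_cars.csv", row.names = FALSE)
csv_files <- dir(pattern = "\\.csv$")
print(csv_files)

# Clean up and go back to where we started
file.remove("example_cars.csv")
setwd(old_wd)
```

Restoring the original directory at the end keeps the rest of a session unaffected by the experiment.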
Install packages
An R package consists of a bundle of functions to be used for a specific task. For example, the functions for calculating an average (mean) and for getting the number of rows and columns (dim) are in the package base, while those for calculating variance (var), conducting a statistical t-test (t.test) and a chi-square test (chisq.test) are in the package stats. Packages for basic data handling and analysis are mostly already installed; these are called R core packages. However, for a specific task you need to install some additional packages. Installing a package is very simple. To install a package use the code install.packages("package name"). For example, to install the package ggplot2 for fancy plots the command is below.
# install.packages("ggplot2", repos = "https://cloud.r-project.org")
Note that I have added some extra information, repos = "https://cloud.r-project.org", inside the brackets. This is because I have written this document using the R Markdown feature. You can install multiple packages using a single command.
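One way to install several packages with a single command is sketched below. The package names in wanted are only an illustrative assumption, and the install line is left commented out so that the sketch runs without internet access.

```r
# Packages we would like to have (illustrative choice)
wanted <- c("ggplot2", "readxl", "psych")

# Keep only those that are not installed yet
installed <- rownames(installed.packages())
to_install <- setdiff(wanted, installed)

if (length(to_install) > 0) {
  message("Not yet installed: ", paste(to_install, collapse = ", "))
  # Uncomment the next line to actually install them in one go:
  # install.packages(to_install, repos = "https://cloud.r-project.org")
}
```

Passing a character vector to install.packages() installs all the named packages in one call; checking first avoids reinstalling what is already there.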
Loading packages
Once a package is installed on your machine you do not need to install it again. You just need to load the package once each time you shut down and reopen RStudio. A package can be loaded using the code library(). I have loaded most of my required packages below.
library(ggplot2)
library(tidyverse)
library(foreign)
library(psych)
library(readxl)
library(MASS)
Any function of a package can be used even without loading the package; however, you need to write a bit of extra code. For example, ggplot(), the main function for any plot in the ggplot2 package, can be used without loading the package using the code ggplot2::ggplot().
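The package::function() form can be sketched with a package that always comes with R, so the example runs even on a fresh installation. The vector values and the object names are hypothetical.

```r
# Call a function through package::function(), without library()
values <- c(2, 4, 4, 4, 5, 5, 7, 9)
spread <- stats::sd(values)
print(spread)

# Check whether a package is installed before relying on it
has_ggplot2 <- requireNamespace("ggplot2", quietly = TRUE)
print(has_ggplot2)
```

Writing the package name explicitly also makes it clear to a reader where a function comes from, which helps when two packages share a function name.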
The following packages are useful to install after you install R and RStudio for the first time.
Packages
MASS       epiDisplay    maps    ggpmisc
readxl     foreign       mice    margins
tidyverse  summarytools  gtools  ggeffects
psych      ggplot2       dplyr   lubridate
Loading the data into R
To start exploring, you first need to load a dataset into R. The command for loading a dataset depends on the format of the dataset. Frequently used commands for loading a dataset are load() and readRDS() for an R data file, read.csv() for a .csv file, read_xlsx() and read_excel() for an Excel file, read.spss() for an SPSS file and read.dta() for a Stata file. That means almost any type of file can be loaded into R. These are the common file types. There are some other, less common file types as well, and R has options for reading almost any kind of data.
Throughout this course we will be using two datasets, Birthweight_reduced and Diet. In my working directory these data files are stored in different formats such as SPSS, CSV and Excel. Let’s start with the Birthweight_reduced dataset. I will load all three versions.
setwd("C:/Users/mmoinuddin/OneDrive  UCLan/LES")
# read.csv() comes with R; read.spss() needs the foreign package and read_xlsx() needs readxl
df_csv <- read.csv("SPSS data for teaching/Birthweight_reduced_kg_R.csv")
df_spss <- read.spss("SPSS data for teaching/Birthweight_reduced_kg_SPSS.sav", to.data.frame = TRUE)
df_excel <- read_xlsx("SPSS data for teaching/Birthweight_reduced_kg_R.xlsx")
While reading data from an Excel file you can specify the sheet name. You can also specify the number of rows you want to load.
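Choosing a sheet and limiting the number of rows can be sketched as follows. The sketch assumes the demo workbook datasets.xlsx that is bundled with the readxl package rather than the course files, and the Excel part runs only if readxl is installed.

```r
# Write a small CSV file to a temporary location, then read part of it back
csv_path <- tempfile(fileext = ".csv")
write.csv(datasets::mtcars, csv_path, row.names = FALSE)
csv_top <- read.csv(csv_path, nrows = 5)    # only the first 5 rows
print(dim(csv_top))

# The same idea for Excel, only if the readxl package is installed
if (requireNamespace("readxl", quietly = TRUE)) {
  xlsx_path <- readxl::readxl_example("datasets.xlsx")  # demo file bundled with readxl
  print(readxl::excel_sheets(xlsx_path))                # sheet names in the workbook
  xl_top <- readxl::read_excel(xlsx_path, sheet = "mtcars", n_max = 5)
  print(dim(xl_top))
}
```

Reading only a few rows first is a cheap way to check that a large file is being parsed as expected before loading all of it.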
Understanding the dataset
To learn a little about the dataset, three useful commands are dim(), to know how many rows and variables there are, names(), to know the variable names, and head(), to see the first few rows of the dataset. Let’s see what these commands give us.
dim(df_csv)
## [1] 42 16
names(df_csv)
## [1] "ID" "Length" "Birthweight" "Headcirc" "Gestation"
## [6] "smoker" "mage" "mnocig" "mheight" "mppwt"
## [11] "fage" "fedyrs" "fnocig" "fheight" "lowbwt"
## [16] "mage35"
head(df_csv)
## ID Length Birthweight Headcirc Gestation smoker mage mnocig mheight mppwt
## 1 1360 56 4.55 34 44 0 20 0 162 57
## 2 1016 53 4.32 36 40 0 19 0 171 62
## 3 462 58 4.10 39 41 0 35 0 172 58
## 4 1187 53 4.07 38 44 0 20 0 174 68
## 5 553 54 3.94 37 42 0 24 0 175 66
## 6 1636 51 3.93 38 38 0 29 0 165 61
## fage fedyrs fnocig fheight lowbwt mage35
## 1 23 10 35 179 0 0
## 2 19 12 0 183 0 0
## 3 31 16 25 185 0 1
## 4 26 14 25 189 0 0
## 5 30 12 0 184 0 0
## 6 31 16 0 180 0 0
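A few related commands give a slightly fuller first look at a dataset. The sketch below assumes the built-in mtcars data as a stand-in for the birthweight file, and the object names are hypothetical.

```r
# Use a built-in dataset as a stand-in for your own data
cars <- datasets::mtcars

# Number of rows and columns, separately
n_rows <- nrow(cars)
n_cols <- ncol(cars)
print(c(n_rows, n_cols))

# Variable types and a preview of the values
str(cars)

# First and last three rows
print(head(cars, n = 3))
print(tail(cars, n = 3))

# Summary statistics for one variable
print(summary(cars$mpg))
```

str() is handy because it shows in one place whether each variable was read as numeric, character or factor.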
Data formats in R
There are multiple types of datasets that we usually use in R. In the R language these are usually called objects. An object can be a list, matrix, tibble, data.frame etc. A list can contain several other types of objects. There are different ways of accessing a specific element of a dataset. As you go along using them, you will become an expert.
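The different ways of accessing elements can be sketched with three small objects. All object names and values below are hypothetical and serve only as an illustration.

```r
# A matrix: every value has the same type; filled column by column
m <- matrix(1:6, nrow = 2)

# A data frame: columns can have different types
df <- data.frame(id = 1:3, score = c(2.5, 3.0, 4.5))

# A list: can hold several other kinds of objects
lst <- list(numbers = 1:3, table = df, label = "demo")

# Accessing elements
cell   <- m[2, 3]              # matrix: row 2, column 3
column <- df$score             # data frame: a whole column by name
value  <- df[2, "score"]       # data frame: row 2 of the column 'score'
inner  <- lst[["table"]]$id    # list: an element, then a column inside it

print(cell)
print(column)
print(value)
print(inner)
```

Square brackets with a row and a column work for both matrices and data frames, while $ and [[ ]] pick out named elements of data frames and lists.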
Frequently used commands
I believe it will be very useful to have a list of the commands usually needed for data manipulation, management and analysis in a biostatistics setting. The table below contains the list of commands with their uses.
Command             Use

Data management
help()              Obtain documentation for a given R command
example()           View some examples of the use of a command
c()                 Enter data manually into a vector in R
seq()               Make an arithmetic progression vector
rep()               Make a vector of repeated values
data()              Load a built-in dataset (as a data.frame)
View()              View a dataset in a spreadsheet-type format
load()              Load an existing .RData file
readRDS()           Load an existing .RDS file
read.csv()          Load an existing CSV file
read.spss()         Load an existing SPSS file
read_xls()          Load an existing Excel (.xls) file
read_xlsx()         Load an existing Excel (.xlsx) file
read_excel()        Load an existing Excel file
write.csv()         Save a working data file as a CSV file
install.packages()  Install new packages
library()           Load an R package that is already installed
require()           Load an R package that is already installed
dim()               See the number of rows/cols of a data.frame
names()             See the column/variable names of a data.frame
length()            Give the length of a vector
ls()                List memory contents
rm()                Remove an item from memory
as.numeric()        Convert a string to numeric
as.data.frame()     Convert a matrix into a data frame
factor()            Create/replace/label a factor variable
ordered()           Create/replace/label an ordered variable
mutate()            Create a new variable from an existing one
if_else()           Create a new variable using a condition
Statistics
table()             Get a frequency table for a variable
addmargins()        Add marginal sums to an existing table
prop.table()        Compute proportions from a contingency table
summary()           Get summary statistics for a variable
describe()          Get specific summary statistics
describeBy()        Get specific summary statistics by groups
xtabs()             Cross-tabulation tables using formulas
mean()              Calculate the mean of a variable
median()            Calculate the median of a variable
var()               Calculate the variance of values in a vector
sd()                Calculate the standard deviation of values in a vector
sum()               Add up all values in a vector
sample()            Take a sample from a vector of data
cor()               Calculate the correlation between two variables
prop.test()         Inference for 1 proportion using normal approx
t.test()            Carry out a Student t-test
chisq.test()        Carry out a chi-square test
fisher.test()       Carry out a Fisher exact test
aov()               Perform ANOVA using a formula
wilcox.test()       Mann-Whitney U test for independent samples
wilcox.test()       Wilcoxon signed-rank test for paired samples
kruskal.test()      Kruskal-Wallis test
lm()                Linear regression model
glm()               Generalized linear model (linear/logistic/Poisson reg)
Visualise
hist()              Create a histogram
boxplot()           Create a boxplot
plot()              Create a scatter plot (many more . . . )
abline(a,b)         Add a line to a scatter plot
pairs()             Scatter plot matrix
pdf()               Save a graph as a PDF file
jpeg()              Save a graph as a JPEG file (png() does the same for PNG)
dev.off()           Necessary after pdf() and jpeg() commands
ggplot()            Generate fancy graphs of many types
This list is not exhaustive; rather, it is just the beginning. To get help for any command, use help() or ?. For example, to get help for the command dim, type help(dim) or ?dim and hit the Enter key. Go to the bottom of the help page to see multiple examples. Alternatively, Google it. Let us see some practicals in the classroom.
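To give a feel for how a few of the listed commands fit together, here is a minimal sketch. It assumes the built-in mtcars data rather than the course datasets, and the choice of variables (mpg, am, wt) is only illustrative.

```r
# Built-in data as a stand-in for the course datasets
cars <- datasets::mtcars

# Frequency table and proportions for transmission type (am: 0 or 1)
freq <- table(cars$am)
print(freq)
print(prop.table(freq))

# Mean and standard deviation of fuel consumption
print(mean(cars$mpg))
print(sd(cars$mpg))

# t-test comparing mpg between the two transmission groups
tt <- t.test(mpg ~ am, data = cars)
print(tt$p.value)

# Simple linear regression of mpg on car weight
fit <- lm(mpg ~ wt, data = cars)
print(coef(fit))
```

Each result is stored as an object, so individual pieces such as the p-value or the regression coefficients can be pulled out by name and reused later.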