Recap on what we have covered.

Session 1 covered an introduction to R data types, inputting data, plotting and statistics.

Recap (1/3)

R stores data in five main data types.

Recap (2/3)

Data can be read into R as a table with the read.table() function and written to file with the write.table() function.

Table <- read.table("data/readThisTable.csv",sep=",",header=T,row.names=1) 
Table[1:3,] 
       Sample_1.hi Sample_2.hi Sample_3.hi Sample_4.low Sample_5.low 
Gene_a    4.570237    3.230467    3.351827     3.930877     4.098247 
Gene_b    3.561733    3.632285    3.587523     4.185287     1.380976 
Gene_c    3.797274    2.874462    4.016916     4.175772     1.988263 
       Sample_1.low 
Gene_a     4.418726 
Gene_b     5.936990 
Gene_c     3.780917 
write.table(Table,file="data/writeThisTable.csv", sep=",", row.names =F,col.names=T) 

Recap (3/3)

R has a rich set of statistical functions.

1- pnorm(8,mean=8,sd=3) 
[1] 0.5 
tTestExample <- read.table("data/tTestData.csv",sep=",",header=T) 
Result <- t.test(tTestExample$A,tTestExample$B,alternative ="two.sided", var.equal = T) 
Result 
 
    Two Sample t-test 
 
data:  tTestExample$A and tTestExample$B 
t = -41.3528, df = 18, p-value < 2.2e-16 
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval: 
 -14.60253 -13.19051 
sample estimates: 
mean of x mean of y  
 26.50152  40.39804  

Conditions and Loops

Conditions and Loops (1/21)

We have looked at using logical vectors as a way to index other data types

x <- 1:10 
x[x < 4] 
[1] 1 2 3 

Logicals are also used in controlling how scripted procedures execute.

Conditions and Loops (2/21) - Two important control structures

  • Conditional branching (if,else)
  • Loops (for, while)

While I’m analysing data, if I need to execute complex statistical procedures on the data I will use R else I will use a calculator.

Conditions and Loops (3/21) - Conditional Branching.

Conditional Branching is the evaluation of a logical to determine whether a chunk of code is executed.

In R, we use the if statement with the logical to be evaluated in () and dependent code to be executed in {}.

x <- TRUE 
if(x){ 
  message("x is true") 
} 
x is true 
x <- FALSE 
if(x){ 
  message("x is true") 
} 

Conditions and Loops (4/21) - Evaluating in if() statements

More often, we construct the logical value within () itself. This can be termed the condition.

x <- 10 
y <- 4 
if(x > y){ 
  message("The value of x is ",x," is greater than ", y) 
} 
The value of x is 10 is greater than 4 

Here the message is printed because x is greater than y.

y <- 20 
if(x > y){ 
  message("The value of x is ",x," is greater than ", y) 
} 

Here, x is no longer greater than y, so no message is printed.

We would still like a message telling us the result of the condition.

Conditions and Loops (5/21) - else following an if()

If we want to perform an operation when the condition is false we can follow the if() statement with an else statement.

x <- 10
if(x < 5){
  message(x, " is less than 5")
}else{
  message(x," is greater than or equal to 5")
}
10 is greater than or equal to 5 

With the addition of the else statement, when x is not less than 5 the code following the else statement is executed.

x <- 3 
if(x < 5){ 
  message(x, " is less than 5") 
   }else{ 
     message(x," is greater than or equal to 5") 
} 
3 is less than 5 

Conditions and Loops (6/21) - else if

We may wish to execute different procedures under multiple conditions. This can be controlled in R using the else if() following an initial if() statement.

x <- 5 
if(x > 5){ 
  message(x," is greater than 5") 
  }else if(x == 5){ 
    message(x," is 5") 
  }else{ 
    message(x, " is less than 5") 
  } 
5 is 5 

Conditions and Loops (7/21) - ifelse()

A useful function to evaluate conditional statements over vectors is the ifelse() function.

x <- 1:10 
message(x) 

The ifelse() function takes three arguments: the condition to evaluate, the value to return when the condition is true and the value to return when the condition is false.

ifelse(x <= 3,"lessOrEqual","more")  
 [1] "lessOrEqual" "lessOrEqual" "lessOrEqual" "more"        "more"        
 [6] "more"        "more"        "more"        "more"        "more"        

This allows for multiple nested “else if” statements to be applied to vectors.

ifelse(x == 3,"same", 
       ifelse(x < 3,"less","more") 
       )  
 [1] "less" "less" "same" "more" "more" "more" "more" "more" "more" "more" 

Conditions and Loops (8/21) - Loops

The two main generic methods of looping in R are while and for

  • while - while loops repeat the execution of code while a condition evaluates as true.

  • for - for loops repeat the execution of code for a range of specified values.

Conditions and Loops (9/21) - While loops

While loops are most useful if you know the condition will be satisfied but are not sure when (e.g. looking for the point at which a number first occurs in a list).

x <- 1 
while(x != 3){ 
  message("x is ",x," ") 
  x <- x+1 
} 
x is 1  
x is 2  
message("Finally x is 3") 
Finally x is 3 
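
The "first occurrence" case mentioned above could be sketched roughly as follows. The vector values and the target 7 are made up for illustration, and the extra check on i is an assumption added so that the loop also stops if the target never occurs.

```r
# Hypothetical data: find the position at which 7 first occurs
values <- c(4, 9, 7, 1, 7)
target <- 7
i <- 1
# Keep stepping while we are still inside the vector and have not found the target
while(i <= length(values) && values[i] != target){
  i <- i + 1
}
if(i <= length(values)){
  message("First occurrence of ", target, " is at position ", i)
}else{
  message(target, " was not found")
}
```

Without the i <= length(values) check, a missing target would push i past the end of the vector and the condition would evaluate to NA, which while() cannot handle.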

Conditions and Loops (10/21) - For loops

For loops allow the user to cycle through a range of values applying an operation for every value.

Here we cycle through a numeric vector and print out its value.

x <- 1:5 
for(i in x){ 
  message("Loop",i," ", appendLF = F) 
} 
Loop1 Loop2 Loop3 Loop4 Loop5 

Similarly we can cycle through other vector types (or lists)

x <- toupper(letters[1:5]) 
for(i in x){ 
  message("Loop",i," ", appendLF = F) 
} 
LoopA LoopB LoopC LoopD LoopE 

Conditions and Loops (11/21) - Looping through indices

We may wish to keep track of the position in x we are evaluating to retrieve the same index in other variables. A common practice is to loop through all possible index positions of x using the expression 1:length(x).

geneName <- c("Ikzf1","Myc","Igll1") 
expression <- c(10.4,4.3,6.5) 
1:length(geneName) 
[1] 1 2 3 
for(i in 1:length(geneName)){ 
  message(geneName[i]," has an RPKM of ",expression[i]) 
} 
Ikzf1 has an RPKM of 10.4 
Myc has an RPKM of 4.3 
Igll1 has an RPKM of 6.5 
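
As a hedged aside, seq_along(x) is often used in place of 1:length(x). The sketch below reuses the vectors from this slide and adds an empty vector (emptyVector, a name invented here) to show where the two expressions differ.

```r
geneName <- c("Ikzf1","Myc","Igll1")
expression <- c(10.4,4.3,6.5)
# For a non-empty vector seq_along(x) gives the same indices as 1:length(x)
for(i in seq_along(geneName)){
  message(geneName[i]," has an RPKM of ",expression[i])
}
# For an empty vector the two expressions differ
emptyVector <- character(0)
1:length(emptyVector)    # counts down from 1 to 0, so a loop would run twice
seq_along(emptyVector)   # no indices, so a loop body would never run
```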

Conditions and Loops (12/21) - Loops and conditionals

Loops can be combined with conditional statements to allow for complex control of their execution over R objects.

x <- 1:13 
 
for(i in 1:13){ 
  if(i > 10){ 
    message("Number ",i," is greater than 10") 
  }else if(i == 10){ 
    message("Number ",i," is  10")  
  }else{ 
    message("Number ",i," is less than  10")  
  } 
} 

Number 1 is less than  10 
Number 2 is less than  10 
Number 3 is less than  10 
Number 4 is less than  10 
Number 5 is less than  10 
Number 6 is less than  10 
Number 7 is less than  10 
Number 8 is less than  10 
Number 9 is less than  10 
Number 10 is  10 
Number 11 is greater than 10 
Number 12 is greater than 10 
Number 13 is greater than 10 

Conditions and Loops (13/21) - Breaking loops

We can use conditionals to exit a loop if a condition is satisfied, just like a while loop.

x <- 1:13 
 
for(i in 1:13){ 
  if(i < 10){ 
    message("Number ",i," is less than 10") 
  }else if(i == 10){ 
    message("Number ",i," is  10") 
    break 
  }else{ 
    message("Number ",i," is greater than  10")  
  } 
} 

Number 1 is less than 10 
Number 2 is less than 10 
Number 3 is less than 10 
Number 4 is less than 10 
Number 5 is less than 10 
Number 6 is less than 10 
Number 7 is less than 10 
Number 8 is less than 10 
Number 9 is less than 10 
Number 10 is  10 

Conditions and Loops (14/21) - Functions to loop over data types

There are functions which allow you to loop over a data type and apply a function to each subsection of that data.

  • apply - Apply function to rows or columns of a matrix/data frame and return results as a vector, matrix or list.

  • lapply - Apply function to every element of a vector or list and return results as a list.

  • sapply - Apply function to every element of a vector or list and return results as a vector, matrix or list.

Conditions and Loops (15/21) - apply()

The apply() function applies a function to the rows or columns of a matrix. The argument FUN specifies the function to apply and MARGIN specifies whether to apply it by rows, by columns or both.

apply(DATA,MARGIN,FUN,...) 
  • DATA - A matrix (or something to be coerced into a matrix)
  • MARGIN - 1 for rows, 2 for columns, c(1,2) for cells

Conditions and Loops (16/21) - apply() example

matExample <- matrix(c(1:4),nrow=2,ncol=2,byrow=T) 
matExample 
     [,1] [,2] 
[1,]    1    2 
[2,]    3    4 

Get the mean of rows

apply(matExample,1,mean) 
[1] 1.5 3.5 

Get the mean of columns

apply(matExample,2,mean) 
[1] 2 3 
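
The MARGIN value c(1,2) is not shown in the examples above, so here is a minimal sketch of it. The anonymous function and the name cellResult are made up for illustration.

```r
matExample <- matrix(c(1:4),nrow=2,ncol=2,byrow=T)
# MARGIN=c(1,2) calls the function once per cell and keeps the matrix shape
cellResult <- apply(matExample,c(1,2),function(cell) cell*10)
cellResult
```

For simple arithmetic like this, matExample*10 gives the same values directly. The cell-wise form is mainly useful when the function cannot work on a whole matrix at once.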

Conditions and Loops (16/21) - Additional arguments to apply

Additional arguments to be used by the function in the apply loop can be specified after the function argument.

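Arguments may be ordered as if passed to the function directly. For the paste() function, however, this isn't possible: its first argument is ..., so any unnamed value is treated as more text to paste and arguments such as collapse must be given by name.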

apply(matExample,1,paste,collapse=";") 
[1] "1;2" "3;4" 
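
The difference between positional and named additional arguments can be sketched as below. The choice of quantile() and the result names are assumptions made for the example.

```r
matExample <- matrix(c(1:4),nrow=2,ncol=2,byrow=T)
# quantile(x, probs): the extra argument can be given by position...
byPosition <- apply(matExample,1,quantile,0.5)
# ...or by name, with the same result
byName <- apply(matExample,1,quantile,probs=0.5)
# paste(..., sep, collapse): an unnamed ";" is treated as more text to paste
pasteUnnamed <- apply(matExample,1,paste,";")
# Naming collapse gives one joined string per row
pasteNamed <- apply(matExample,1,paste,collapse=";")
```

Here pasteUnnamed ends up as a character matrix rather than one string per row, which is why collapse has to be named.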

Conditions and Loops (17/21) - lapply()

Similar to apply, lapply applies a function to every element of a vector or list.

lapply returns a list object containing the results of evaluating the function.

lapply(c(1,2),mean) 
[[1]] 
[1] 1 
 
[[2]] 
[1] 2 

As with apply() additional arguments can be supplied after the function name argument.

lapply(list(1,NA,2),mean,na.rm=T) 
[[1]] 
[1] 1 
 
[[2]] 
[1] NaN 
 
[[3]] 
[1] 2 

Conditions and Loops (18/21) - sapply()

sapply (a simplifying version of lapply) acts as lapply but attempts to simplify the results to the most appropriate data type.

Here sapply returns a vector where lapply would return lists.

exampleVector <- c(1,2,3,4,5) 
exampleList <- list(1,2,3,4,5) 
sapply(exampleVector,mean,na.rm=T) 
[1] 1 2 3 4 5 
sapply(exampleList,mean,na.rm=T) 
[1] 1 2 3 4 5 
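
To make the difference in return type concrete, the following sketch runs the same function through both lapply and sapply. The use of sqrt() and the result names are illustrative choices.

```r
exampleVector <- c(1,2,3,4,5)
listResult <- lapply(exampleVector,sqrt)
vectorResult <- sapply(exampleVector,sqrt)
class(listResult)     # "list"
class(vectorResult)   # "numeric"
# Flattening the list by hand gives the same values as sapply
identical(unlist(listResult),vectorResult)
```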

Conditions and Loops (19/21) - sapply() example

In this example lapply returns a list of vectors from the quantile function.

exampleList <- list(row1=1:5, row2=6:10, row3=11:15) 
exampleList 
$row1 
[1] 1 2 3 4 5 
 
$row2 
[1]  6  7  8  9 10 
 
$row3 
[1] 11 12 13 14 15 

lapply(exampleList,quantile) 
$row1 
  0%  25%  50%  75% 100%  
   1    2    3    4    5  
 
$row2 
  0%  25%  50%  75% 100%  
   6    7    8    9   10  
 
$row3 
  0%  25%  50%  75% 100%  
  11   12   13   14   15  

Conditions and Loops (20/21) - sapply() example 2

Here is an example of sapply parsing a result from the quantile function in a smart way.

When a function always returns a vector of the same length, sapply will create a matrix with elements by column.

sapply(exampleList,quantile) 
     row1 row2 row3 
0%      1    6   11 
25%     2    7   12 
50%     3    8   13 
75%     4    9   14 
100%    5   10   15 

Conditions and Loops (21/21) - sapply() example 3

When sapply cannot parse the result to a vector or matrix, a list will be returned.

exampleList <- list(df=data.frame(sample=paste0("patient",1:2), data=c(1,12)), vec=c(1,3,4,5)) 
sapply(exampleList,summary) 
$df 
      sample       data       
 patient1:1   Min.   : 1.00   
 patient2:1   1st Qu.: 3.75   
              Median : 6.50   
              Mean   : 6.50   
              3rd Qu.: 9.25   
              Max.   :12.00   
 
$vec 
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.  
   1.00    2.50    3.50    3.25    4.25    5.00  

Time for an exercise!

Exercise on loops and conditional branching can be found here

Answers to exercise.

Answers can be found here here

Functions

Functions (1/) - Built in functions

As we have seen, a function is a command which takes zero or more arguments and returns a single R object.

This allows the user to perform complex calculations and procedures with one simple operation.

x <- rnorm(100,70,10)
y <- jitter(x,amount=1)+20 
mean(x) 
[1] 70.40512 
lmExample <- data.frame(X=x,Y=y) 
lmResult <- lm(Y~X,data=lmExample) 

plot(Y~X,data=lmExample,main="Line of best fit with lm()", 
     xlim=c(0,150),ylim=c(0,150)) 
abline(lmResult,col="red",lty=3,lwd=3) 


Functions (2/) - Functions can be defined in R

Although we have access to many built-in functions in R, there will be many complex tasks we wish to perform regularly which are particular to our own work and for which no suitable function exists.

For these tasks we can construct our own functions with function()

Function_Name <- function(Arguments){
  Result <- Arguments
  return(Result)
}

Functions (3/) - Defining your own functions

To define a function with function() we need to decide

  • the argument names within ()
  • the expression to be evaluated within {}
  • the variable to which the function will be assigned with <-
  • the output from the function using return()

Function_name <- function(Argument1,Argument2){ Expression}

myFirstFunction <- function(myArgument1,myArgument2){ 
  myResult <- (myArgument1*myArgument2) 
  return(myResult) 
} 
myFirstFunction(4,5) 
[1] 20 

Functions (4/) - Default arguments

In functions, a default value for an argument may be used. This allows the function to provide a value for an argument when the user does not specify one.

Default arguments can be specified by assigning a value to the argument with the = operator.

mySecondFunction <- function(myArgument1,myArgument2=10){ 
  myResult <- (myArgument1*myArgument2) 
  return(myResult) 
} 
mySecondFunction(4,5) 
[1] 20 
mySecondFunction(4) 
[1] 40 
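
As a small illustrative sketch, the same function can also be called with its arguments matched by name rather than by position. The function definition is repeated so that the block runs on its own.

```r
mySecondFunction <- function(myArgument1,myArgument2=10){
  myResult <- (myArgument1*myArgument2)
  return(myResult)
}
# Arguments matched by position
mySecondFunction(4,5)
# Arguments matched by name, so the order no longer matters
mySecondFunction(myArgument2=5,myArgument1=4)
# Only the argument without a default has to be supplied
mySecondFunction(myArgument1=4)
```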

Functions (5/) - Missing Arguments

In some cases a function may wish to deal with missing arguments in a different way to setting a generic default for the argument. The missing() function can be used to evaluate whether an argument has been supplied.

mySecondFunction <- function(myArgument1,myArgument2){ 
  if(missing(myArgument2)){ 
    message("Value for myArgument2 not provided so will square myArgument1") 
    myResult <- myArgument1*myArgument1 
  }else{ 
    myResult <- (myArgument1*myArgument2)    
  } 
  return(myResult) 
} 
mySecondFunction(4) 
Value for myArgument2 not provided so will square myArgument1 
[1] 16 

Functions (6/) - Returning objects from functions

We have seen that a function returns the value within the return() function. If no return is specified, the result of the last expression evaluated in the function is returned.

myforthFunction <- function(myArgument1,myArgument2=10){ 
  myResult <- (myArgument1*myArgument2) 
  return(myResult) 
  print("I returned the result") 
} 
myfifthFunction <- function(myArgument1,myArgument2=10){
  (myArgument1*myArgument2)
}
 
myforthFunction(4,5) 
[1] 20 
myfifthFunction(4,5) 
[1] 20 

Note that the print() statement after the return() is not evaluated in myforthFunction.

Functions (7/) - Returning lists from functions

The return() function can only return one R object at a time. To return multiple data objects from one function call, a list can be used to contain other data objects.

mySixthFunction <- function(arg1,arg2){ 
  result1 <- arg1*arg2 
  result2 <- date() 
  return(list(Calculation=result1,DateRun=result2)) 
} 
result <- mySixthFunction(10,10) 
result 
$Calculation 
[1] 100 
 
$DateRun 
[1] "Tue Feb  3 11:04:13 2015" 
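
Once a list has been returned, its elements can be pulled out individually. The following is a minimal sketch reusing mySixthFunction from this slide, repeated so that the block runs on its own.

```r
mySixthFunction <- function(arg1,arg2){
  result1 <- arg1*arg2
  result2 <- date()
  return(list(Calculation=result1,DateRun=result2))
}
result <- mySixthFunction(10,10)
# Extract one element by name with $ ...
result$Calculation
# ... or with double square brackets
result[["DateRun"]]
# names() lists what the function returned
names(result)
```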

Functions (8/) - Scope

When arguments or variables are created within a function, they only exist within that function and disappear once the function is complete.

mySeventhFunction <- function(arg1,arg2){ 
  internalValue <- arg1*arg2 
  return(internalValue) 
} 
result <- mySeventhFunction(10,10) 
internalValue 
Error in eval(expr, envir, enclos): object 'internalValue' not found 
arg1 
Error in eval(expr, envir, enclos): object 'arg1' not found 
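
The reverse direction is worth a quick sketch too: a function can read a variable defined outside it, but assigning to that name inside the function only creates a local copy. The names scaleFactor and myScopeExample are invented for this illustration.

```r
scaleFactor <- 2   # defined outside any function

myScopeExample <- function(arg1){
  # scaleFactor is not defined here yet, so R looks it up outside the function
  scaled <- arg1*scaleFactor
  # This creates a new local scaleFactor; the outer one is untouched
  scaleFactor <- 100
  return(scaled)
}
myScopeExample(5)
scaleFactor
```

After the call, scaleFactor outside the function still holds 2.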

Time for an exercise!

Exercise on functions can be found here

Answers to exercise.

Answers can be found here here

Scripts

Saving scripts

Once we have got our functions together and know how we want to analyse our data, we can save our analysis as a script. By convention R scripts typically end in .r or .R.

To save a file in RStudio.

-> File -> Save as

To open a previous R script

-> File -> Open File..

To save all the objects (workspace) with extension .RData

-> Session -> Save workspace as

Sourcing scripts.

R scripts allow us to save and reuse custom functions we have written. To run the code from an R script we can use the source() function with the name of the R script as the argument.

The file dayOfWeek.r in the “scripts” directory contains a simple R script to tell you what day it is after your marathon R coding session.

#Contents of dayOfWeek.r 
dayOfWeek <- function(){ 
  return(gsub(" .*","",date()))   
} 
source("scripts/dayOfWeek.r")
dayOfWeek() 
[1] "Tue" 
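
If the scripts directory is not to hand, the same idea can be tried with a temporary file. In this sketch the temporary file created by tempfile() stands in for a saved script, and the variable name scriptFile is made up.

```r
# Write a small script to a temporary file (a stand-in for a saved .R file)
scriptFile <- tempfile(fileext=".R")
writeLines(c(
  'dayOfWeek <- function(){',
  '  return(gsub(" .*","",date()))',
  '}'
), scriptFile)

# source() runs the code in the file, which defines dayOfWeek()
source(scriptFile)
dayOfWeek()
```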

Rscript

R scripts can be run non-interactively from the command line with the Rscript command, usually with the option --vanilla to avoid saving or restoring workspaces. All messages/warnings/errors will be output to the console.

Rscript --vanilla myscript.r 

An alternative to Rscript is R CMD BATCH. Here all messages/warnings/errors are directed to a file and the processing time appended.

R CMD BATCH myscript.r 

Sending arguments to Rscript

To provide arguments to an R script at the command line we must add the commandArgs() function to the script to retrieve the command line arguments.

args <- commandArgs(TRUE)
myFirstArgument <- args[1]
myFirstArgument
'10'
as.numeric(myFirstArgument)
10

Since vectors can only be one type, all command line arguments are strings and must be converted to numeric if needed with as.numeric()
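
A script taking more than one argument might handle them along the following lines. This is only a sketch: parseArgs, inputFile and threshold are hypothetical names, and a hand-made vector stands in for the command line so the block can be run interactively.

```r
# Hypothetical helper: takes the character vector returned by commandArgs(TRUE)
parseArgs <- function(args){
  if(length(args) < 2){
    stop("Usage: Rscript --vanilla myscript.r <inputFile> <threshold>")
  }
  list(inputFile=args[1],             # kept as a string
       threshold=as.numeric(args[2])) # converted from string to number
}

# Inside a script this would be parseArgs(commandArgs(TRUE));
# here a hand-made vector stands in for the command line
parsed <- parseArgs(c("data/readThisTable.csv","10"))
parsed$threshold + 1
```

Checking the number of arguments first gives a readable error instead of an NA appearing later in the analysis.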

Loading libraries

Libraries can be loaded using the library() function with an argument of the name of the library.

library(ggplot2) 

You can see what libraries are available in the Packages panel or by calling the library() function with no arguments supplied.

library() 

Installing libraries

Libraries can be installed through the RStudio menu

-> Tools -> Install packages ..

Or by using the install.packages() command

install.packages("Hmisc") 
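
In a script it can be convenient to install a library only when it is not already present. One possible sketch is shown below. The helper name loadOrInstall is made up, and the example call uses "stats" only because it ships with R and so never triggers a download.

```r
# Hypothetical helper: load a package, installing it first only if needed
loadOrInstall <- function(packageName){
  if(!requireNamespace(packageName,quietly=TRUE)){
    install.packages(packageName)
  }
  library(packageName,character.only=TRUE)
}
# "stats" ships with R, so this call never triggers an installation
loadOrInstall("stats")
```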

Getting help

The end

Two tips

  • Use vectorisation
  • Keep 2D numeric data in matrices
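
Both tips can be illustrated with a short sketch. The data and the names loopResult, vectorResult and exprMatrix are invented for the example.

```r
x <- 1:10
# Loop version: square each value one at a time
loopResult <- numeric(length(x))
for(i in seq_along(x)){
  loopResult[i] <- x[i]^2
}
# Vectorised version: one operation on the whole vector
vectorResult <- x^2
all(loopResult == vectorResult)

# 2D numeric data kept as a matrix: row summaries in a single call
exprMatrix <- matrix(c(1:6),nrow=2,ncol=3,byrow=T)
rowMeans(exprMatrix)
```

The vectorised form is shorter and avoids the bookkeeping of an index variable.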