Basic concepts Flashcards
When was R created and which language inspired it?
R was created in 1996 and was inspired by the S language.
What’s the main purpose of R?
It’s a statistical environment for data analysis and graphics.
What are the 3 main characteristics of R?
- Free and open source
- Interpreted language (instead of compiled)
- Object oriented (everything is an object in R)
What’s the current version of R? (as of Feb 2021)
4.0.4
What are the 4 possible IDEs mentioned by Dr. Fernando in the class?
- RStudio Desktop
- RStudio Cloud (create an account first)
- Google Colab
- Emacs + ESS (highly recommended)
What is the Working Directory? When do I need to set it? How to set it?
It’s the folder R uses by default for reading and writing files. Unless I give a full path, all imported and exported files are looked up or saved in this directory. It’s very important to set it before I start working.
I can set it with the function setwd("/home/paulojardim/pasta")
I can get it with the function getwd( )
What function do I use to list all the objects created in an environment?
ls( )
It returns a vector of character strings giving the name of the objects.
What is R Workspace?
It’s the place in memory where the variables (objects) are saved. It’s everything that was created during a session, kept in RAM. We can save it in a .RData file if the objects were produced after a long calculation. But the ideal is saving the code itself.
What function should I use to generate random numbers from a uniform distribution? What are its arguments and which are mandatory? What are the defaults for the optional ones?
runif(n, min = 0, max = 1)
n = number of observations we want to return. It’s the only mandatory argument.
min and max are the limits. They are optional and default to 0 and 1 if not provided.
e.g. runif(5) returns 5 random numbers between 0 and 1
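As a quick illustration of the defaults described above, here is a minimal sketch. The seed value and the 10 to 20 limits are arbitrary choices made for this example only.

```r
set.seed(42)                        # arbitrary seed, only to make the draws repeatable

u <- runif(5)                       # only n given: min = 0 and max = 1 are used
v <- runif(5, min = 10, max = 20)   # optional limits supplied by name

length(u)   # 5
range(v)    # both values fall between 10 and 20
```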
Do I need to name the arguments when I call a function in R? What about order?
No, I don’t need to name them. But if I don’t name them I need to respect the order. If I name them I can pass them in any order.
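A small sketch of positional versus named arguments, using seq( ) only as a convenient example function (its first three arguments are from, to and by):

```r
by_position <- seq(1, 10, 3)                  # matched by order: from, to, by
by_name     <- seq(by = 3, to = 10, from = 1) # named, so any order works

by_position                      # 1 4 7 10
identical(by_position, by_name)  # TRUE
```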
How can I easily see the arguments of a function?
I can call args( ) function.
e.g. args( sample ) returns:
function (x, size, replace = FALSE, prob = NULL)
How do I know what are the mandatory arguments of a function?
When I see the args of a function, the ones that don’t have a default value are mandatory:
e.g. in sample() function below, x and size are mandatory
function (x, size, replace = FALSE, prob = NULL)
What is the technical name of “…” and when should I use it in my function?
It’s called ellipsis (written ... in code) and I use it basically in two situations:
1. When it makes sense for a function to receive an undefined number of arguments (e.g. the print function). Then I can collect the arguments into a list: arguments = list(...)
2. When I need to receive arguments to pass on to a generic function.
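Both uses of the ellipsis can be sketched as follows. The function names count_args and flexible_mean are hypothetical helpers invented for this example.

```r
# Use 1: accept any number of arguments and collect them into a list
count_args <- function(...) {
  arguments <- list(...)
  length(arguments)
}

# Use 2: receive extra arguments and pass them on to another function
flexible_mean <- function(x, ...) {
  mean(x, ...)
}

n_args <- count_args(1, "a", TRUE)                  # 3
avg <- flexible_mean(c(1, 2, NA), na.rm = TRUE)     # 1.5, na.rm is forwarded to mean()
```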
What function should I use to concatenate strings?
paste("string", "string2", "string3", ...)
How can I easily see the documentation of a function?
I can use ?function or help(function).
They both return the same thing
Within the documentation, what’s the section that tells me what the function returns?
The Value section.
What function returns the functions/objects whose names contain an expression?
apropos("mod")
What function returns a list of functions containing a word in any part of their documentation?
help.search("geo")
How can I see what are the loaded/attached packages at the moment and what are their code paths in my computer?
- search() - lists the loaded/attached packages
- searchpaths() - lists their paths in my computer
What is the most basic way of getting access to R official documentation?
- Run R from the terminal by executing “R” command
- In R prompt, execute help.start( ). It will launch a local webserver and open the html manuals and documentation.
An Introduction to R and The R Language Definition are the main ones.
Worth reading!
What are the 7 packages that are loaded/attached automatically when we run R?
- base
- utils
- stats
- graphics
- grDevices
- datasets
- methods
How do I load an installed package in R?
I need to run function library() providing the name of the package.
How do I install a new package in R?
Run function install.packages("package_name")
How do I verify if the installed packages need updates?
Execute function packageStatus()
How do I update all installed packages automatically?
Run function update.packages(ask = FALSE)
How do I create a simple function in R?
helloWorld = function( ) {
  writeLines("Hello")
}
It’s preferable to use the arrow instead of the equal sign, but Brainscape bugs with the arrow.
How can I delete one object of my workspace? And how can I delete all objects?
rm(x)
rm(list = ls( ) )
What’s the meaning of “everything is a vector in R”? Is it bad? What about a simple number like 15?
R doesn’t have primitive data types in the way that other languages do. In R even the simplest numeric value is an example of a vector.
This might seem like a crazy idea and potentially inefficient, but it fits in well with the sort of calculations you want to do in R.
A number occurring by itself in an expression (e.g. 15) is taken as a vector of length one.
Why R console prints [1] 5 when I type 5?
Because 5 is also a vector (everything in R is). It’s a vector of length one. The [1] means that the console is printing the first element of the vector. When I print a big vector, each row has the number corresponding to the index of the first vector element in that row.
What are the two types of vectors and their subtypes? What’s the main difference between those two types?
Atomic Vectors and Lists.
Atomic vectors have six types:
- double
- integer
- character
- logical
- complex
- raw
Lists are also vectors, but they can contain elements of more than one data type. That’s the main difference.
What’s the difference between type and class?
Complex data structures are created based on atomic vectors. When they are created we have a class. There are thousands of classes. One object can be of any of these classes, but its type will always be one of the six vector types (or a list).
How do I check the type of an object? And its class?
I use typeof( ) function to check the type.
And I use class( ) function to check the class.
What’s the type of x? And its class?
x = c(2, 4, 6)
type: double
class: numeric
What’s the type of x? And its class?
x = c(2L, 4L, 6L)
Type: integer
Class: integer
What’s the type of x? And its class?
x = c("a", "b", "c")
Type: character
Class: character
What’s the type of x? And its class?
x = c(TRUE, FALSE, TRUE)
Type: logical
Class: logical
What’s the type of x? And its class?
x = c(2 + 1i, 4 + 1i, 6 + 1i)
Type: complex
Class: complex
What’s the type of x? And its class?
x = raw(3)
Type: raw
Class: raw
What function do I use to create new vectors?
c( )
Concatenate function.
Does R understand a number like 5 as an integer? What do I need to do to accomplish it? Is there a difference in terms of memory usage?
No, it understands and stores it as a double. If I want it to take the number as an integer I need to use the suffix L:
5L
Yes, there is a difference in terms of memory usage because double numbers require more space.
What function do I use to see an estimate of the space in memory that is being used to store an R object? What package do I need to load?
function object.size(myobject)
I don’t need to load any package because it’s in utils package which is pre loaded.
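To see the memory difference between integers and doubles, a minimal sketch with object.size( ) follows. The vector length of 1000 is an arbitrary choice, and the exact byte counts printed may vary by platform.

```r
ints <- integer(1000)   # 1000 integer zeros
dbls <- numeric(1000)   # 1000 double zeros

object.size(ints)       # estimated bytes used by the integer vector
object.size(dbls)       # estimated bytes used by the double vector

doubles_are_bigger <- as.numeric(object.size(dbls)) > as.numeric(object.size(ints))
doubles_are_bigger      # TRUE
```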
What function do I use to generate a sequence? What are its arguments?
seq( )
seq(from = 1, to = 1, by = ((to - from)/(length.out - 1)),
length.out = NULL, along.with = NULL, ...)
- length.out - the total number of elements I want
- along.with - take the length from the length of this argument.
What’s the difference between:
rep(1:4, 2)
and rep(1:4, each = 2)
- rep(1:4, 2) returns “1 2 3 4 1 2 3 4”
- rep(1:4, each = 2) returns “1 1 2 2 3 3 4 4”
Can I do math operations between a vector and a number? What’s the result of c(3,4,5) * 2 ?
Yes! Remember that numbers ARE vectors of length 1. The result is “6 8 10”
We can do math operations with vectors of the same length, or when one length is a multiple of the other.
What’s the recycling rule and how does it work?
It’s the way R behaves when you do arithmetic operations with two vectors of different sizes.
The shorter vector is repeated until its length is the same as the longer vector’s. Then R does the operation element by element.
It works cleanly only if the longer object’s length is a multiple of the shorter object’s length; otherwise R still recycles but gives a warning.
e.g. c(1, 2, 3) * c(4, 5, 6, 7, 8, 9) is actually:
c(1, 2, 3, 1, 2, 3) * c(4, 5, 6, 7, 8, 9)
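The recycling example above can be checked directly. This sketch compares the automatic recycling with an explicit rep( ) of the shorter vector.

```r
short <- c(1, 2, 3)
long  <- c(4, 5, 6, 7, 8, 9)

recycled <- short * long                   # R repeats short to length 6
explicit <- rep(short, times = 2) * long   # the same thing written out by hand

recycled                        # 4 10 18 7 16 27
identical(recycled, explicit)   # TRUE
```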
What is the type of z? What are the values of it?
num = c(2, 4, 5, 6)
z = num > 4
z is a logical vector: FALSE FALSE TRUE TRUE
What operator do I use if I want to know if number 3 is part of this vector?
num = c(2, 4, 5, 6)
What does the code look like?
Operator %in% :
3 %in% num
What happens if I create a vector with elements of different types? Why does that happen?
Elements are coerced to a single type that can represent all the elements.
This is called implicit coercion.
It happens because a vector can only contain elements of the same type.
What’s the difference between implicit coercion and explicit coercion?
Implicit coercion is performed by R. It happens when I provide types different from what R was expecting, for example.
Explicit coercion is requested by me by calling the as.*( ) functions (e.g. as.integer( ), as.character( )).
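A short sketch of both kinds of coercion. The values are arbitrary examples.

```r
# Implicit coercion: R picks one type that can represent every element
mixed <- c(1, "a", TRUE)
typeof(mixed)            # "character"

nums <- c(1L, 2.5, TRUE)
typeof(nums)             # "double"

# Explicit coercion: I ask for the conversion myself
seven <- as.integer("7")
one   <- as.numeric(TRUE)
```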
What are the two main ways of creating regular sequences in R? What’s the relationship between them?
- Using the colon operator: from:to
e.g. 1:5 generates 1 2 3 4 5
- Using the seq( ) function.
e.g. seq(1, 4) generates 1 2 3 4
seq( ) is a generalization of from:to
What’s the result of this code? Why?
x = 0:6
typeof(x)
“integer” because the colon operator returns an integer vector unless the elements of the sequence cannot be represented as integers, in which case it returns a double vector.
What’s the type of object num and how can I convert it to integer?
num = c('1', '2', '5')
Both its type and its class are ‘character’.
I can convert it to integer by calling function as.integer(num)
I’m doing explicit coercion here.
What’s the meaning of NA? What’s the result of typeof(NA)
Not Available / Missing value
NA is a logical constant of length 1 which contains a missing value indicator.
Missing values in the statistical sense, that is, variables whose value is not known.
typeof(NA) returns logical
How do I test if each value of x is missing? What will be the return?
x = c(3, 5, NA, 2)
I can test with:
is.na(x)
The result will be:
FALSE FALSE TRUE FALSE
How do I test if x has some missing value?
x = c(3, 5, NA, 2)
What’s the result for x?
I call the function
any( is.na(x) )
The result for x will be: TRUE
What’s the meaning of these constants? Give examples of
NaN
Inf
-Inf
- NaN is “Not a Number”. (e.g. “0/0”)
- Inf is “Infinite number” (e.g. “1/0”)
- -Inf is “negative infinite number” (e.g. “-1/0”)
What’s the output of this code?
x = c(-1,0,1)/0
x
is.na(x)
-Inf NaN Inf
FALSE TRUE FALSE
What’s the output of this code?
x = c(-1,0,1)/0
x
is.infinite(x)
-Inf NaN Inf
TRUE FALSE TRUE
What’s the difference between a factor and a character vector?
factor is a class used to store items that have a finite number of possible values. These values are also called Levels of the factor. “levels” is an attribute of the class factor.
Factors may look like character vectors but they are stored and treated differently. Internally they are stored as integers, with each level mapped to an integer. Hence the type of a factor object is integer.
How can I create a factor?
By providing a character vector to function factor( ):
factor(c("alta", "baixa", "baixa", "media", "alta", "media", "baixa", "media", "media"))
What am I doing in this code and what’s the output?
fac = factor(c("alta", "baixa", "baixa", "media", "media", "media"))
f2 = as.character(fac)
typeof(f2)
f2
I’m creating a factor. Then I convert it to a character vector.
The outputs are:
- [1] "character"
- [1] "alta" "baixa" "baixa" "media" "media" "media"
Because when I convert a factor to character, I get a character vector with the level labels as strings.
What am I doing in this code and what’s the output?
fac = factor(c("alta", "baixa", "baixa", "media", "media", "media"))
f2 = as.integer(fac)
typeof(f2)
f2
I’m creating a factor. Then I convert it to an integer vector.
The outputs are:
- [1] “integer”
- [1] 1 2 2 3 3 3
Because when I convert a factor to an integer, I get an integer vector with the numbers that internally represent each level.
How should this function be called if I want the levels to be sorted in this order: baixa, media, alta?
fac = factor(c("alta", "media", "baixa", "media", "media"))
I should add the arguments levels and ordered:
fac = factor(c("alta", "media", "baixa", "media", "media"),
levels = c("baixa", "media", "alta"),
ordered = TRUE)
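A minimal sketch of what the levels and ordered arguments change. The sample values are illustrative and only need to contain the three labels.

```r
fac <- factor(c("alta", "media", "baixa", "media", "media"),
              levels = c("baixa", "media", "alta"),
              ordered = TRUE)

levels(fac)                    # "baixa" "media" "alta"
codes <- as.integer(fac)       # 3 2 1 2 2, following the order given in levels
below_alta <- fac < "alta"     # comparisons make sense because the factor is ordered
below_alta                     # FALSE TRUE TRUE TRUE TRUE
```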
I have a factor called fac and I want to know what the levels are and how many there are. What functions should I use?
- levels(fac)
- nlevels(fac)
What are matrices? What are the main characteristics?
Matrices are vectors with two dimensions (a dim attribute of length 2). They are objects of class matrix (R 4.0.0 and later also reports array). Their type depends on their content.
Main characteristics:
- They are bidimensional
- Can contain only 1 type of data
What does the code look like if I want to create a matrix like this one?
[,1] [,2] [,3] [,4]
[1,] 1 4 7 10
[2,] 2 5 8 11
[3,] 3 6 9 12
matrix(1:12, nrow = 3, ncol= 4)
What does the code look like if I want to create a matrix like this one?
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 5 6 7 8
[3,] 9 10 11 12
matrix(1:12, nrow = 3, ncol = 4, byrow = TRUE)
How do I verify the dimensions of my matrix called “m” ?
dim(m)
How do I add a new column to this matrix called “m” with all values as 99 in the new column?
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 5 6 7 8
[3,] 9 10 11 12
cbind(m, rep(99, 3))
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 99
[2,] 5 6 7 8 99
[3,] 9 10 11 12 99
How do I add a new row to this matrix called “m” with all values as 99 in the new row?
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 5 6 7 8
[3,] 9 10 11 12
rbind(m, rep(99, 4))
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 5 6 7 8
[3,] 9 10 11 12
[4,] 99 99 99 99
How can I easily transform vector “m” into a matrix without using the matrix( ) function?
m = 1:10
By changing its dimensions with function dim:
dim(m) = c(2, 5)
What is the operator used to multiply matrices?
Matrix multiplication operator:
%*%
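To make the difference from ordinary multiplication concrete, this small sketch multiplies an arbitrary 2 x 2 matrix by itself in both ways.

```r
a <- matrix(1:4, nrow = 2)   # filled by column: rows are (1, 3) and (2, 4)

prod_matrix <- a %*% a       # matrix product: rows times columns
prod_elem   <- a * a         # element-wise product, not a matrix product

prod_matrix   # rows: (7, 15) and (10, 22)
prod_elem     # rows: (1, 9) and (4, 16)
```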
What is an array? What are its main characteristics?
An array is a generalization of a matrix that can have any number of dimensions (a matrix is the two-dimensional case). Arrays are objects of class array.
Main characteristics:
- n-dimensional structure
- can only have one data type
How do I create an array object?
With the array function, providing an atomic vector and the dimensions of the array:
ar = array(1:12, dim = c(2, 2, 3))
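A brief sketch of how the array above is laid out and indexed. Values fill the first dimension fastest, then the second, then the third.

```r
ar <- array(1:12, dim = c(2, 2, 3))

dim(ar)                  # 2 2 3
first_slice <- ar[, , 1] # a 2 x 2 matrix holding 1 to 4
elem <- ar[1, 2, 3]      # row 1, column 2 of the third slice: 11
```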
Is this code possible? Why?
lista = list(1:30, "R", list(TRUE, FALSE))
Yes, because a list can contain different data types, including other lists.
What’s the output of this code?
li = list(1:30, "R", list(TRUE, FALSE))
class(li); typeof(li)
[1] “list”
[1] “list”
What function can I use to visualize the basic structure of an object (e.g. a list or data.frame)?
str( )
What is the output of this code? Why?
li = list(1:30, "R", list(TRUE, FALSE))
dim(li)
NULL
Because a list is a one-dimensional structure and has no dim attribute.
Can I put a matrix and a factor in the same list?
Yes, sure! A list can store objects of different classes and different dimensions.
What is a dataframe?
A data frame is a two-dimensional, list-based structure for storing a dataset.
A data frame object is of class ‘data.frame’ and its type is ‘list’.
They are the most common structures to work with data in R.
Main characteristics:
- a list of vectors and/or factors with the same length
- It can contain different types of data (columns)
- Two dimensional structure
How can I create this simple dataframe?
name sex age
1 John M 32
2 Joseph M 34
3 Mary F 30
By calling the data.frame function with vectors as the arguments:
da = data.frame(name = c("John", "Joseph", "Mary"),
sex = c("M", "M", "F"),
age = c(32, 34, 30))
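What’s the argument of the data.frame function that says whether or not character columns will be treated as factors?
stringsAsFactors (its default is FALSE since R 4.0.0)
What if I pass two vectors of different lengths to create a data.frame?
If the longer length is a multiple of the shorter one, the shorter vector is recycled to fill the rows. Otherwise data.frame( ) stops with an error (“arguments imply differing number of rows”); it does not pad with NA.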
What’s the difference between calling these two functions for my dataframe df?
- as.matrix(df)
- data.matrix(df)
- as.matrix(df): converts df into a matrix. If any column is not numeric, the whole matrix is generally coerced to character.
- data.matrix(df): returns the matrix obtained by converting all the variables in a data frame to numeric mode and then binding them together as the columns of a matrix. Factors and ordered factors are replaced by their internal codes. Character columns are first converted to factors and then to their integer codes.
What is an attribute in R? Can any object have attributes?
It’s a piece of information that can be attached to the object. All objects except NULL can have attributes attached to them.
What are the main attributes for objects in R?
names
dimnames
dim
class
How can I see the attributes of an object?
I should call the function attributes(my_object)
How can I get and set an attribute of an object?
Two ways:
- Using function attr( )
e.g. attr(x, "names") = c("one", "two", "three")
- Using special accessor functions when the attribute has one
e.g. names(x) = c("one", "two", "three")
dim(x) = c(2, 4)
length(x) = 10 # completes with NA
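The two ways of setting attributes can be tried on small vectors. In this sketch the "unit" attribute is a made-up custom attribute, used only to show that attr( ) also works for attributes without an accessor function.

```r
x <- c(10, 20, 30)
names(x) <- c("one", "two", "three")   # accessor function
attr(x, "names")                       # same information read through attr()
attr(x, "unit") <- "cm"                # hypothetical custom attribute
attributes(x)                          # shows both names and unit

y <- 1:8
dim(y) <- c(2, 4)                      # y becomes a 2 x 4 matrix

z <- 1:3
length(z) <- 5                         # extended with NA: 1 2 3 NA NA
```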
How can I give names to rows and columns of a matrix?
I need to set the dimnames attribute of the matrix. It can be done by using the accessor functions below:
rownames(m) = c("A", "B", "C")
colnames(m) = c("T1", "T2", "T3", "T4")
How can I change names of rows and columns of a data frame?
I should use the functions row.names( ) and names( ).
There is no col.names( ) function: data frames are a type of list, so they just have names, which are the column names (colnames( ) also works on data frames).
How many systems for object orientation does R have? What are they?
Three systems:
S3, S4, and RC (Reference Classes).
What are the main characteristics of S3 system in R?
It implements an object-oriented style called generic-function OO (as opposed to the message-passing OO that Java and C# implement). The generic functions decide which method to apply depending on the class of the object.
It’s the most basic and most used programming style in R.
What’s the main difference of the S4 system compared to S3?
S4 is more formal: classes must be formally defined (with setClass( )), and generic functions and their methods are formally declared as well.
What’s the main difference of RC system compared to S3 and S4?
The methods belong to objects and not to functions as in S3 and S4. It makes R look more like other programming languages such as C# and Java. Reference Classes are the newest of the three systems.
What function should I use to see all the methods of a generic function or class? How does it work? What does it return?
Call function:
methods(function_name)
It lists methods for S3 generic functions or classes. Methods are found in all packages on the current search() path.
The elements of the list look like generic.class (e.g. print.factor). We have one item for each class for which the generic function has a special method.
What is the method dispatch? How does it work?
It’s the mechanism responsible for identifying the class of the object that is passed to a generic function and based on that, dispatch the execution to the correct method of the function. The generic function calls UseMethod() function to decide which method to use.
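Method dispatch can be sketched with a tiny generic. The generic describe and the class temperature are hypothetical names created for this example; only UseMethod( ) and structure( ) are standard R.

```r
# The generic only calls UseMethod(), which picks a method from the class of x
describe <- function(x, ...) UseMethod("describe")

# Fallback used when no class-specific method exists
describe.default <- function(x, ...) "some object"

# Method for objects of the (hypothetical) class "temperature"
describe.temperature <- function(x, ...) paste(unclass(x), "degrees")

t1 <- structure(21.5, class = "temperature")

res_temp    <- describe(t1)   # dispatches to describe.temperature
res_default <- describe(42)   # no specific method, so describe.default runs
```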
Can I call a method directly in S3?
Yes, because methods are just normal R functions.
But I shouldn’t do this because I lose the benefits of having a generic function. I should always call the generic function and let the method dispatch take care of it.
How can I create a method for a generic function (e.g. mean) that handles a specific class? Will my new method appear in results of methods(mean)?
mean.myclass = function(x, ...) {
  # myclass is a placeholder for the class name; write my code here, e.g.:
  rowMeans(x, ...)
}
Yes, the new method will appear in results of methods(mean)
Why do we say that the S3 system gives us freedom and power to create entire R packages?
Because it allows us to create generic functions, then create methods to handle special classes with these functions and we can even create new classes that will also be handled by our methods later.
What’s the difference between these two lines of code? What do they return?
-1:5
-(1:5)
- -1:5 creates a sequence from -1 to 5:
-1 0 1 2 3 4 5
- -(1:5) creates a sequence from 1 to 5 and changes all elements to negative:
-1 -2 -3 -4 -5
What is the basic syntax of a for loop in R which prints numbers 1 to 10?
for ( i in 1:10 ) {
print(i)
}
Do for loops need to go through a sequence of numbers?
No, I can use the same for( ) structure to go through a vector of unordered integers or even a character vector.
What is the function used to see basic statistics of a data frame, like median, quartiles, min, max?
summary(df)
How can I add a new column to my data frame da? The new column should be named “grade” and have 0 in all rows.
da$grade = 0
How can I check the number of rows of my data frame df? And the number of columns?
nrow(df) : number of rows
length(df) or ncol(df) : number of columns
What function can I use to compare if two data frames or two columns are equal?
identical(df$column, df2$column)
How can I create a numeric vector with 20 items, filled with 0s?
What about a character vector with 30 blank elements?
numeric(20)
character(30)
Why should I prefer vectorized operations instead of loops in R?
Because they are more efficient to compute, so they run faster, because R just needs to interpret the code one time, running compiled code inside. Also, they require less code.
What function can I use to see the CPU time of my code?
system.time(
my_code_here
)
What’s the main recommendation if I NEED to run a for loop that will generate a vector at the end?
Create the vector before the loop with enough length to store ALL results. Never grow a vector using c( )! At each iteration R needs to allocate a new space in memory to store the new vector and delete the old one.
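The preallocation advice can be sketched as follows. The task (squares of 1 to 10) is an arbitrary example.

```r
n <- 10
squares <- numeric(n)        # allocate the full result vector once, before the loop

for (i in seq_len(n)) {
  squares[i] <- i^2          # fill the existing slot instead of growing with c()
}

squares                      # 1 4 9 16 25 36 49 64 81 100
```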
What does object x look like, and what are its class and type?
x = cbind(x1 = 3, x2 = c(4:1, 2:5))
x looks like the object below. Its type is double and class is matrix.
x1 x2
[1,] 3 4
[2,] 3 3
[3,] 3 2
[4,] 3 1
[5,] 3 2
[6,] 3 3
[7,] 3 4
[8,] 3 5
I have a data frame called students, with a column called grade. I want to add a new column called status, which will be “approved” for students with grades greater than or equal to 7 and “not approved” for the others. How can I accomplish this with a vectorized function?
Using vectorized ifelse:
students$status = ifelse(students$grade >= 7, "approved", "not approved")
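A runnable version of this card, using a made-up students data frame with three grades:

```r
# Hypothetical data, only for illustration
students <- data.frame(grade = c(5.5, 7, 9))

# ifelse() tests every grade at once and returns one label per row
students$status <- ifelse(students$grade >= 7, "approved", "not approved")

students$status   # "not approved" "approved" "approved"
```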
When does the structure repeat{ } stop?
When I call break within it.
What is this code doing?
aggregate(prova1 ~ situacao, data = notas, FUN = mean)
It’s calling vectorized function aggregate on data frame notas. It will group by column situacao and will apply function mean over column prova1 to each group.
So it will return a data frame with one row per group of situacao and two columns: situacao and the mean of prova1 for that group.
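To see the shape of what aggregate( ) returns, here is a minimal sketch. The contents of notas are invented for this example; only the column names come from the card.

```r
# Hypothetical data with the same column names as in the card
notas <- data.frame(
  situacao = c("aprovado", "reprovado", "aprovado", "reprovado"),
  prova1   = c(8, 4, 9, 5)
)

res <- aggregate(prova1 ~ situacao, data = notas, FUN = mean)

res          # one row per value of situacao, with the mean of prova1
nrow(res)    # 2
res$prova1   # 8.5 4.5
```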
What are the functions of the *apply( ) family? What are the basic differences between them?
- apply( ): able to operate on rows or columns of a matrix (chosen with MARGIN)
- sapply( ): operates on each element of a vector or list (each column of a data frame). Simplifies the result to a vector or matrix when possible
- lapply( ): operates on each element (column). Always returns a list
- tapply( ): applies a function to a vector, grouped by another vector (e.g. one column grouped by another).
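One small call per function makes the differences easier to compare. All data in this sketch is made up for illustration.

```r
m <- matrix(1:6, nrow = 2)                     # rows are (1, 3, 5) and (2, 4, 6)
row_sums <- apply(m, 1, sum)                   # MARGIN = 1: by row    -> 9 12
col_sums <- apply(m, 2, sum)                   # MARGIN = 2: by column -> 3 7 11

df <- data.frame(a = 1:3, b = c(10, 20, 30))
col_means <- sapply(df, mean)                  # named vector: a = 2, b = 20
col_means_list <- lapply(df, mean)             # same values, returned as a list

values <- c(1, 2, 3, 4)
groups <- c("x", "y", "x", "y")
by_group <- tapply(values, groups, sum)        # x = 4, y = 6
```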
What is important to know when creating functions that receive a vector x as an argument, instead of a single value x? What is the side effect and how do I solve it?
For functions of a single input variable, R automatically vectorizes the operation, that is, it applies the function to each point of the input vector.
When I pass a vector of vectors to a function, I’m passing just one vector, and since my function is already looking at specific indices of that vector, it ignores all the others.
In this case, one way of evaluating the function at more than one point is to use a for loop that goes through a matrix of inputs row by row.
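A minimal sketch of this situation, assuming a hypothetical function f of two variables packed into one vector p. Each row of the matrix pts is one input point.

```r
# Hypothetical function: reads only p[1] and p[2] and ignores anything else
f <- function(p) p[1]^2 + p[2]^2

single <- f(c(1, 2))           # one point: 1 + 4 = 5

# Several points, one per row
pts <- matrix(c(1, 2,
                3, 4,
                0, 5), ncol = 2, byrow = TRUE)

out <- numeric(nrow(pts))      # preallocate the result
for (i in seq_len(nrow(pts))) {
  out[i] <- f(pts[i, ])        # evaluate f at the i-th row
}

out                            # 5 25 25
same <- apply(pts, 1, f)       # apply() over rows gives the same values
```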