Basic concepts Flashcards

Question 1

Q

When was R created and which language inspired it?

Answer

A

R first appeared in 1993 (the paper introducing it was published in 1996) and was inspired by the S language.

Question 2

Q

What’s the main purpose of R?

Answer

A

It’s a statistical environment for data analysis and graphics.

Question 3

Q

What are the 3 main characteristics of R?

Answer

A

Free and open source
Interpreted language (instead of compiled)
Object oriented (everything is an object in R)

Question 4

Q

What’s the current version of R? (as of Feb 2021)

Question 5

Q

What are the 4 possible IDEs mentioned by Dr. Fernando in the class?

Answer

A

RStudio Desktop
RStudio Cloud (create an account first)
Google Colab
Emacs + ESS (highly recommended)

Question 6

Q

What is the Working Directory? When do I need to set it? How to set it?

Answer

A

It’s the folder R uses by default to read and write files: relative paths for imported and exported files are resolved against it. It’s very important to set it before I start working.

I can set it with the function setwd("/home/paulojardim/pasta")

I can get it with the function getwd()
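
A minimal sketch of the round trip. It uses tempdir() as a stand-in folder (an assumption, so that it runs on any machine) and a made-up file name, demo.csv:

```r
old_wd = getwd()                 # remember where we started
setwd(tempdir())                 # switch to a folder that is guaranteed to exist
new_wd = getwd()                 # absolute path of the new working directory

writeLines("a,b", "demo.csv")    # relative path: the file lands in the working directory
found = file.exists(file.path(new_wd, "demo.csv"))

unlink("demo.csv")               # clean up the demo file
setwd(old_wd)                    # restore the original working directory
```

Restoring the previous directory at the end keeps the example from changing the state of the session.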

Question 7

Q

What function do I use to list all the objects created in an environment?

Answer

A

ls( )

It returns a vector of character strings giving the name of the objects.

Question 8

Q

What is R Workspace?

Answer

A

It’s the place in memory where the variables (objects) are kept: everything that was created during a session, held in RAM. We can save it to a .RData file if the objects were produced by a long calculation, but ideally we save the code itself.

Question 9

Q

What function should I use to generate random numbers of a uniform distribution? What are its arguments and which are mandatory? What’s the default for the not mandatory?

Answer

A

runif(n, min = 0, max = 1)

n = number of observations we want to return. It’s the only mandatory argument.

min and max are the limits. They are not mandatory and default to 0 and 1 if not provided.

e.g. runif(5) returns 5 random numbers between 0 and 1
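
A short sketch of both call styles. The seed value and the 10 to 20 range are arbitrary choices for illustration:

```r
set.seed(42)                       # make the random draws reproducible

a = runif(5)                       # 5 values from the default range, 0 to 1
b = runif(5, min = 10, max = 20)   # 5 values between 10 and 20

length(a)   # number of values returned
range(b)    # smallest and largest value drawn
```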

Question 10

Q

Do I need to name the arguments when I call a function in R? What about order?

Answer

A

No, I don’t need to name them. But if I don’t name them I need to respect the order. If I name them I can pass them in any order.
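
As a quick check, the two calls below should produce the same numbers, since resetting the seed makes the draws repeatable (the seed value is arbitrary):

```r
set.seed(1)
x1 = runif(3, 5, 10)                   # positional: n, min, max, in that order

set.seed(1)
x2 = runif(max = 10, n = 3, min = 5)   # named: any order works

identical(x1, x2)
```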

Question 11

Q

How can I easily see the arguments of a function?

Answer

A

I can call args( ) function.

e.g. args( sample ) returns:

function (x, size, replace = FALSE, prob = NULL)

Question 12

Q

How do I know what are the mandatory arguments of a function?

Answer

A

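When I see the args of a function, the ones that don’t have a default value are usually mandatory. This is a rule of thumb, because a function can still handle a missing argument internally.

e.g. in the sample() function below, x is mandatory. size has no default either, but sample() fills it in when it is omitted, so sample(x) also works.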

function (x, size, replace = FALSE, prob = NULL)

Question 13

Q

What is the technical name of “…” and when should I use it in my function?

Answer

A

It’s called ellipsis and I use it basically in two situations:

1. When it makes sense for a function to receive an undefined number of arguments (e.g. the print function). Then I can turn the arguments into a list:

arguments = list(...)

2. When I need to receive arguments to pass on to a generic function.
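
Both situations can be sketched with two small functions. The names count_args and join_words are hypothetical, made up for this example:

```r
# Hypothetical helper: captures any number of arguments in a list
count_args = function(...) {
  arguments = list(...)
  length(arguments)
}

# Hypothetical wrapper: forwards extra arguments untouched to paste()
join_words = function(words, ...) {
  paste(words, ...)
}

count_args(1, "a", TRUE)                   # 3
join_words(c("a", "b"), collapse = "-")    # "a-b"
```

In the second function, collapse is never declared by join_words; it travels through the ellipsis to paste().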

Question 14

Q

What function should I use to concatenate strings?

Answer

A

paste("string", "string2", "string3", ...)

Question 15

Q

How can I easily see the documentation of a function?

Answer

A

I can use ?function or help(function).

They both return the same thing

Question 16

Q

Within the documentation, what’s the section that tells me what the function returns?

Answer

A

The Value section.

Question 17

Q

What function returns a list of functions/objects whose names contain an expression?

Answer

A

apropos("mod")

Question 18

Q

What function returns a list of functions containing a word in any part of their documentation?

Answer

A

help.search("geo")

Question 19

Q

How can I see what are the loaded/attached packages at the moment and what are their code paths in my computer?

Answer

A

search() - lists the loaded/attached packages
searchpaths() - lists their paths in my computer

Question 20

Q

What is the most basic way of getting access to R official documentation?

Answer

A

Run R from the terminal by executing the “R” command.
In the R prompt, execute help.start(). It will launch a local web server and open the HTML manuals and documentation.

An Introduction to R and The R Language Definition are the main ones.

Worth reading!

Question 21

Q

What are the 7 packages that are loaded/attached automatically when we run R?

Answer

A

base
utils
stats
graphics
grDevices
datasets
methods

Question 22

Q

How do I load an installed package in R?

Answer

A

I need to run function library() providing the name of the package.

Question 23

Q

How do I install a new package in R?

Answer

A

Run function install.packages("package_name")

Question 24

Q

How do I verify if the installed packages need updates?

Answer

A

Execute function packageStatus()

Question 25

Q

How do I update all installed packages automatically?

Answer

A

Run function update.packages(ask = FALSE)

Question 26

Q

How do I create a simple function in R?

Answer

A

helloWorld = function() {
  writeLines("Hello")
}

It’s preferable to use the arrow (<-) instead of the equal sign, but Brainscape has a bug with the arrow.

Question 27

Q

How can I delete one object of my workspace? And how can I delete all objects?

Answer

A

rm(x)

rm(list = ls( ) )

Question 28

Q

What’s the meaning of “everything is a vector in R”? Is it bad? What about a simple number like 15?

Answer

A

R doesn’t have primitive data types in the way that other languages do. In R even the simplest numeric value is an example of a vector.

This might seem like a crazy idea and potentially inefficient, but it fits in well with the sort of calculations you want to do in R.

A number occurring by itself in an expression (e.g. 15) is taken as a vector of length one.

Question 29

Q

Why does the R console print [1] 5 when I type 5?

Answer

A

Because 5 is also a vector (everything in R is). It’s a vector of length one. The [1] means that the console is printing the first element of the vector. When I print a long vector, each output line starts with the index of the first element shown on that line.

Question 30

Q

What are the two types of vectors and their subtypes? What’s the main difference between those two types?

Answer

A

Atomic Vectors and Lists.

Atomic vectors have six types:

double
integer
character
logical
complex
raw

Lists are also vectors, but they can contain elements of more than one data type. That’s the main difference.

Question 31

Q

What’s the difference between type and class?

Answer

A

Complex data structures are built on top of atomic vectors and lists, and each kind of structure is identified by a class. There are thousands of classes. An object can be of any of these classes, but its type will always be one of the six atomic vector types (or list).
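
A small sketch of the distinction, using three common classes. The values themselves are arbitrary:

```r
d = as.Date("2021-02-01")
typeof(d)    # "double": a date is stored as a number of days
class(d)     # "Date"

f = factor(c("a", "b"))
typeof(f)    # "integer": a factor is stored as integer codes
class(f)     # "factor"

df = data.frame(x = 1:2)
typeof(df)   # "list": a data frame is a list of columns
class(df)    # "data.frame"
```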

Question 32

Q

How do I check the type of an object? And its class?

Answer

A

I use typeof( ) function to check the type.

And I use class( ) function to check the class.

Question 33

Q

What’s the type of x? And its class?

x = c(2, 4, 6)

Answer

A

type: double
class: numeric

Question 34

Q

What’s the type of x? And its class?

x = c(2L, 4L, 6L)

Answer

A

Type: integer

Class: integer

Question 35

Q

What’s the type of x? And its class?

x = c("a", "b", "c")

Answer

A

Type: character

Class: character

Question 36

Q

What’s the type of x? And its class?

x = c(TRUE, FALSE, TRUE)

Answer

A

Type: logical

Class: logical

Question 37

Q

What’s the type of x? And its class?

x = c(2 + 1i, 4 + 1i, 6 + 1i)

Answer

A

Type: complex

Class: complex

Question 38

Q

What’s the type of x? And its class?

x = raw(3)

Answer

A

Type: raw

Class: raw

Question 39

Q

What function do I use to create new vectors?

Answer

A

c( )

Concatenate function.

Question 40

Q

Does R understand a number like 5 as an integer? What do I need to do to accomplish it? Is there a difference in terms of memory usage?

Answer

A

No, it understands and stores it as a double. If I want it to take the number as an integer I need to use the suffix L:

5L

Yes, there is a difference in terms of memory usage: each double takes 8 bytes, while each integer takes 4.

Question 41

Q

What function do I use to see an estimate of the space in memory that is being used to store an R object? What package do I need to load?

Answer

A

object.size(myobject)

I don’t need to load any package because it’s in the utils package, which is loaded by default.
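
To see the integer versus double difference in practice, a sketch with two vectors of 1000 elements (the length is arbitrary). Exact byte counts include a small fixed header, so no figures are quoted here:

```r
ints = rep(1L, 1000)   # integer vector: 4 bytes per element
dbls = rep(1, 1000)    # double vector: 8 bytes per element

object.size(ints)
object.size(dbls)      # roughly twice the size of the integer vector
```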

Question 42

Q

What function do I use to generate a sequence? What are its arguments?

Answer

A

seq( )

seq(from = 1, to = 1, by = ((to - from)/(length.out - 1)),
length.out = NULL, along.with = NULL, ...)

length.out - the total number of elements I want
along.with - take the length from the length of this argument

Question 43

Q

What’s the difference between:

rep(1:4, 2)

and
rep(1:4, each = 2)

Answer

A

rep(1:4, 2) returns “1 2 3 4 1 2 3 4”
rep(1:4, each = 2) returns “1 1 2 2 3 3 4 4”

Question 44

Q

Can I do math operations between a vector and a number? What’s the result of c(3,4,5) * 2 ?

Answer

A

Yes! Remember that numbers ARE vectors of length 1. The result is “6 8 10”

We can do math operations with vectors of the same length, or when one length is a multiple of the other.

Question 45

Q

What’s the recycling rule and how does it work?

Answer

A

It’s the way R behaves when you do arithmetic operations with two vectors of different sizes.

The shorter vector is concatenated to itself until its length is the same as the longer vector’s. Then R does the operation.

It works cleanly only if the longer object’s length is a multiple of the shorter object’s length. Otherwise R still recycles, cutting the last repetition short, but issues a warning.

e.g. c(1, 2, 3) * c(4, 5, 6, 7, 8, 9) is actually:

c(1, 2, 3, 1, 2, 3) * c(4, 5, 6, 7, 8, 9)
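
The same example can be run directly. The last line shows the case where the lengths are not multiples: R still returns a result, with a warning that is silenced here only to keep the output clean:

```r
short = c(1, 2, 3)
long  = c(4, 5, 6, 7, 8, 9)

recycled = short * long                      # 4 10 18 7 16 27
explicit = c(1, 2, 3, 1, 2, 3) * long        # same thing written out by hand
identical(recycled, explicit)                # TRUE

# Lengths 2 and 3: recycled as c(1, 2, 1), with a warning
uneven = suppressWarnings(c(1, 2) + c(10, 20, 30))   # 11 22 31
```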

Question 46

Q

What is the type of z? What are the values of it?

num = c(2, 4, 5, 6)

z = num > 4

Answer

A

z is a logical vector: FALSE FALSE TRUE TRUE

Question 47

Q

What operator do I use if I want to know if number 3 is part of this vector?

num = c(2, 4, 5, 6)

What does the code look like?

Answer

A

Operator %in% :

3 %in% num

Question 48

Q

What happens if I create a vector with elements of different types? Why does that happen?

Answer

A

Elements are coerced to a single type that can represent all the elements.

This is called implicit coercion.

It happens because a vector can only contain elements of the same type.
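
A few combinations show which type wins. The variable names are only for illustration:

```r
mixed_num = c(1L, 2.5)        # integer + double
mixed_chr = c(1, "a", TRUE)   # anything + character
mixed_lgl = c(TRUE, 2L)       # logical + integer

typeof(mixed_num)   # "double"
typeof(mixed_chr)   # "character"
typeof(mixed_lgl)   # "integer"

mixed_chr           # "1" "a" "TRUE"
```

The order is logical, then integer, then double, then character: each type can represent everything before it.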

Question 49

Q

What’s the difference between implicit coercion and explicit coercion?

Answer

A

Implicit coercion is performed by R. It happens when I provide types different from what R was expecting, for example.

Explicit coercion is requested by me by calling one of the as.*( ) functions, e.g. as.integer( ).

Question 50

Q

What are the two main ways of creating regular sequences in R? What’s the relationship between them?

Answer

A

Using the colon operator: from:to
e.g. 1:5 generates 1 2 3 4 5
Using the seq( ) function.
e.g. seq(1, 4) generates 1 2 3 4

seq( ) is a generalization of from:to

Question 51

Q

What’s the result of this code? Why?

x = 0:6
typeof(x)

Answer

A

“integer”, because the colon operator returns an integer vector unless the elements of the sequence cannot be represented as integers, in which case it returns a double vector.

Question 52

Q

What’s the type of object num and how can I convert it to integer?

num = c('1', '2', '5')

Answer

A

Both its type and its class are ‘character’.

I can convert it to integer by calling function as.integer(num)

I’m doing explicit coercion here.

Question 53

Q

What’s the meaning of NA? What’s the result of typeof(NA)

Answer

A

Not Available / Missing value

NA is a logical constant of length 1 which contains a missing value indicator.

Missing values in the statistical sense, that is, variables whose value is not known.

typeof(NA) returns logical

Question 54

Q

How do I test if each value of x is missing? What will be the return?

x = c(3, 5, NA, 2)

Answer

A

I can test with:

is.na(x)

The result will be:

FALSE FALSE TRUE FALSE

Question 55

Q

How do I test if x has some missing value?

x = c(3, 5, NA, 2)

What’s the result for x?

Answer

A

I call the function

any( is.na(x) )

The result for x will be: TRUE

Question 56

Q

What’s the meaning of these constants? Give an example of each.

NaN

Inf

-Inf

Answer

A

NaN is “Not a Number” (e.g. 0/0)
Inf is “Infinite number” (e.g. 1/0)
-Inf is “negative infinite number” (e.g. -1/0)

Question 57

Q

What is the output of this code?

x = c(-1,0,1)/0
x
is.na(x)

Answer

A

-Inf NaN Inf

FALSE TRUE FALSE

Question 58

Q

What is the output of this code?

x = c(-1,0,1)/0
x
is.infinite(x)

Answer

A

-Inf NaN Inf

TRUE FALSE TRUE
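
Two related tests are worth having next to is.na( ) and is.infinite( ). This sketch reuses the same x and adds a plain NA for contrast:

```r
x = c(-1, 0, 1) / 0     # -Inf NaN Inf

is.nan(x)       # FALSE TRUE FALSE: only 0/0 is NaN
is.finite(x)    # FALSE FALSE FALSE: none of these is an ordinary number

is.na(NA)       # TRUE
is.nan(NA)      # FALSE: NA is missing, but it is not NaN
```

So is.na( ) is TRUE for both NA and NaN, while is.nan( ) is TRUE only for NaN.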

Question 59

Q

What’s the difference between a factor and a character vector?

Answer

A

A factor is a class used to store items that have a finite number of possible values. These values are called the levels of the factor. “levels” is an attribute of the class factor.

A factor may look like a character vector but it’s stored and treated differently. Internally it is stored as integers, with each level mapped to an integer. Hence the type of a factor object is integer.

Question 60

Q

How can I create a factor?

Answer

A

By providing a character vector to the function factor( ):

factor(c("alta", "baixa", "baixa", "media", "alta", "media", "baixa", "media", "media"))

Question 61

Q

What am I doing in this code and what’s the output?

fac = factor(c("alta", "baixa", "baixa", "media", "media", "media"))
f2 = as.character(fac)
typeof(f2)
f2

Answer

A

I’m creating a factor. Then I convert it to a character vector.

The outputs are:

[1] "character"
[1] "alta" "baixa" "baixa" "media" "media" "media"

Because when I convert a factor to character, I get a character vector with the level names of the factor as strings.

Question 62

Q

What am I doing in this code and what’s the output?

fac = factor(c("alta", "baixa", "baixa", "media", "media", "media"))
f2 = as.integer(fac)
typeof(f2)
f2

Answer

A

I’m creating a factor. Then I convert it to an integer vector.

The outputs are:

[1] "integer"
[1] 1 2 2 3 3 3

Because when I convert a factor to an integer, I get an integer vector with the numbers that internally represent each level.

Question 63

Q

How should this function be called if I want the levels to be sorted by this order: baixa, media, alta?

fac = factor(c("alta", "media", "baixa", "media", "media"))

Answer

A

I should add the arguments levels and ordered:

fac = factor(c("alta", "media", "baixa", "media", "media"),
             levels = c('baixa', 'media', 'alta'),
             ordered = TRUE)
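
A sketch of what the two extra arguments change. The data values are illustrative, chosen to match the levels on this card:

```r
fac = factor(c("alta", "media", "baixa", "media", "media"),
             levels = c("baixa", "media", "alta"),
             ordered = TRUE)

levels(fac)       # "baixa" "media" "alta": my order, not alphabetical
as.integer(fac)   # 3 2 1 2 2: codes follow the order given in levels
fac < "alta"      # FALSE TRUE TRUE TRUE TRUE: comparisons work on ordered factors
```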

Question 64

Q

I have a factor called fac and I want to know what the levels are and how many there are. What functions should I use?

Answer

A

levels(fac)
nlevels(fac)

Answer 64

A

Matrices are vectors arranged in two dimensions. They are objects of class matrix. Their type depends on their content.

Main characteristics:

They are bidimensional
Can contain only 1 type of data

Answer 65

A

matrix(1:12, nrow = 3, ncol= 4)

Answer 66

A

matrix(1:12, nrow = 3, ncol = 4, byrow = TRUE)

Answer 67

A

dim(m)

Answer 68

A

cbind(m, rep(99, 3))

     [,1] [,2] [,3] [,4] [,5]
[1,]    1    4    7   10   99
[2,]    2    5    8   11   99
[3,]    3    6    9   12   99

Answer 69

A

rbind(matriz, rep(99, 4))

     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
[4,]   99   99   99   99

Answer 70

A

By changing its dimensions with function dim:

dim(m) = c(2, 5)

Answer 71

A

Matrix Multiplication operator:

%*%

Answer 72

A

An array generalizes a matrix: it can have more than 2 dimensions. Arrays are objects of class array.

Main characteristics:

n-dimensional structure
can only have one data type

Answer 73

A

With the array function, providing an atomic vector and the dimensions of the array:

ar = array(1:12, dim = c(2, 2, 3))

Answer 74

A

Yes, because a list can contain different data types, including other lists.

Answer 75

A

[1] "list"
[1] "list"

Answer 76

A

str( )

Answer 77

A

NULL

Because a list is a one-dimensional structure.

Answer 78

A

Yes, sure! A list can store objects of different classes and different dimensions.

Answer 79

A

A data frame is a two-dimensional list used to store a dataset.
A data frame object is of class ‘data.frame’ and its type is ‘list’.
They are the most common structures to work with data in R.
Main characteristics:

a list of vectors and/or factors with the same length
It can contain different types of data (columns)
Two dimensional structure

Answer 80

A

By calling the data.frame function with vectors as the arguments:

da = data.frame(name = c("John", "Joseph", "Mary"),
                sex = c("M", "M", "F"),
                age = c(32, 34, 30))
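
Once created, the data frame can be inspected to confirm that it is a list of equal-length columns. A short sketch using the same da:

```r
da = data.frame(name = c("John", "Joseph", "Mary"),
                sex = c("M", "M", "F"),
                age = c(32, 34, 30))

typeof(da)    # "list"
class(da)     # "data.frame"
dim(da)       # 3 rows, 3 columns
names(da)     # "name" "sex" "age"
da$age        # one column, extracted as a plain vector
```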

Answer 81

A

stringsAsFactors

Answer 82

A

It will be created but the shorter vector will be filled with NA in the last elements.

Answer 83

A

as.matrix(df): will try to convert df into a matrix, which will generally coerce its type to character when the columns have mixed types.
data.matrix(df): Returns the matrix obtained by converting all the variables in a data frame to numeric mode and then binding them together as the columns of a matrix. Factors and ordered factors are replaced by their internal codes. In current versions of R, character columns are first converted to factors and then to their integer codes.

Answer 84

A

It’s a piece of information that can be attached to the object. All objects except NULL can have attributes attached to them.

Answer 85

A

names
dimnames
dim
class

Answer 86

A

I should call the function attributes(my_object)

Answer 87

A

Two ways:

Using function attr( )
e.g. attr(x, 'names') = c('one', 'two', 'three')
Using special accessor functions when the attribute has one
e.g. names(x) = c('one', 'two', 'three')
dim(x) = c(2, 4)
length(x) = 10 # completes with NA
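
Both routes can be tried on small vectors. In this sketch the attribute called note is a made-up custom attribute, used only to show that attr( ) accepts any name:

```r
y = c(1, 2, 3)
names(y) = c("one", "two", "three")   # accessor function for the names attribute
y["two"]                              # elements can now be looked up by name

length(y) = 5                         # grows the vector, new elements are NA
y

x = 1:8
dim(x) = c(2, 4)                      # accessor function: x is now a 2 x 4 matrix
attr(x, "note") = "hypothetical custom attribute"
attributes(x)                         # lists dim and note
```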

Answer 88

A

I need to set the dimnames attribute of the matrix. It can be done by using the accessor functions below:

rownames(m) = c("A", "B", "C")

colnames(m) = c("T1", "T2", "T3", "T4")

Answer 89

A

I should use the functions row.names( ) and names( ).

There is no such thing as col.names because data frames are a type of list, so they just have names, which are the “column” names.

Answer 90

A

Three systems:

S3, S4, and RC (Reference Classes).

Answer 91

A

It implements an object-oriented style called generic-function OO (as opposed to the message-passing OO that Java and C# implement). The generic functions decide which method to apply depending on the class of the object.

It’s the most basic and most used programming style in R.

Answer 92

A

The generic functions must have a formal class defined.

Answer 93

A

The methods belong to objects and not functions as in S3 and S4. It makes R look more like other programming languages like C# and Java. This is the newest system in R.

(Reference Classes)

Answer 94

A

Call function:

methods(function_name)

It lists methods for S3 Generic Functions or Classes. Methods are found in all packages on the current search() path.
The elements of the list look like generic.class. We have one item for each class for which the generic function has a special method.

Answer 95

A

It’s the mechanism responsible for identifying the class of the object that is passed to a generic function and based on that, dispatch the execution to the correct method of the function. The generic function calls UseMethod() function to decide which method to use.
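
A minimal S3 sketch of dispatch. The generic area and the class circle are hypothetical names invented for this example:

```r
# Generic function: its only job is to call UseMethod()
area = function(shape, ...) UseMethod("area")

# Method for objects of class "circle"
area.circle = function(shape, ...) pi * shape$r^2

# Fallback used when no method matches the object's class
area.default = function(shape, ...) NA

circ = structure(list(r = 2), class = "circle")

area(circ)   # dispatches to area.circle
area(10)     # no method for numbers, so area.default runs
```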

Answer 96

A

Yes, because methods are just normal R functions.

But I shouldn’t do this because I lose the benefits of having a generic function. I should always call the generic function and let the method dispatch take care of it.

Answer 97

A

mean.<class> = function(x, ...) {

# write my code here, e.g.:
rowMeans(x, ...)

}

Here <class> stands for the name of the class the method handles. Yes, the new method will appear in the results of methods(mean)

Answer 98

A

Because it allows us to create generic functions, then create methods to handle special classes with these functions and we can even create new classes that will also be handled by our methods later.

Answer 99

A

-1:5 creates a sequence from -1 to 5:
-1 0 1 2 3 4 5
-(1:5) creates a sequence from 1 to 5 and changes all elements to negative:
-1 -2 -3 -4 -5

Answer 100

A

for ( i in 1:10 ) {

print(i)

}

Answer 101

A

No, I can use the same for( ) structure to go through a vector of unordered integers or even a character vector.

Answer 102

A

summary(df)

Answer 103

A

da$grade = 0

Answer 104

A

nrow(df) : number of rows

length(df) or ncol(df) : number of columns

Answer 105

A

identical(df$column, df2$column)

Answer 106

A

numeric(20)

character(30)

Answer 107

A

Because they are more efficient to compute and so run faster: R only needs to interpret the code once and then runs compiled code inside. Also, they require less code.

Answer 108

A

system.time(

my_code_here

)

Answer 109

A

Create the vector before the loop with enough length to store ALL results. Never grow a vector using c( )! At each iteration R needs to allocate a new space in memory to store the new vector and delete the old one.
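
A sketch of the recommended pattern, computing squares as an arbitrary example:

```r
n = 1000

squares = numeric(n)            # allocate the full result vector once
for (i in seq_len(n)) {
  squares[i] = i^2              # fill in place, no reallocation
}

# For a simple case like this one, the vectorized form needs no loop at all
squares_vec = seq_len(n)^2
```

Wrapping each version in system.time( ) is a simple way to compare them on larger values of n.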

Answer 110

A

x looks like the object below. Its type is double and its class is matrix.

     x1 x2
[1,]  3  4
[2,]  3  3
[3,]  3  2
[4,]  3  1
[5,]  3  2
[6,]  3  3
[7,]  3  4
[8,]  3  5

Answer 111

A

Using vectorized ifelse:

students$status = ifelse(students$grade >= 7, "approved", "not approved")

Answer 112

A

When I call break within it.

Answer 113

A

It’s calling the vectorized function aggregate on the data frame notas. It will group by the column situacao and apply the function mean to the column prova1 within each group.

So it will return a data frame with one row for each group of situacao and a column with the mean of prova1 for that group.
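
A sketch of such a call. The column names come from this card, but the data values and the formula form of the call are assumptions, since the original code is not shown here:

```r
# Made-up data with the two columns mentioned on the card
notas = data.frame(situacao = c("aprovado", "reprovado", "aprovado", "reprovado"),
                   prova1 = c(8, 4, 9, 5))

# Mean of prova1 within each group of situacao
res = aggregate(prova1 ~ situacao, data = notas, FUN = mean)
res
```

With this data the result has two rows, one per group, with means 8.5 and 4.5.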

Answer 114

A

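apply( ): able to operate on rows or columns of a matrix (MARGIN)
sapply( ): operates on each element of a list or vector (each column of a data frame). Simplifies the result to a vector when possible
lapply( ): operates on each element of a list or vector (each column of a data frame). Returns a list
tapply( ): operates on a vector. Allows applying the function by groups defined by another vector or column.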
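
The four functions side by side, on small made-up inputs:

```r
m = matrix(1:6, nrow = 2)             # rows: 1 3 5 and 2 4 6
row_sums = apply(m, 1, sum)           # MARGIN = 1: by row     -> 9 12
col_sums = apply(m, 2, sum)           # MARGIN = 2: by column  -> 3 7 11

df = data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
means_vec  = sapply(df, mean)         # named vector: a = 2, b = 5
means_list = lapply(df, mean)         # same values, returned as a list

values = c(10, 20, 30, 40)
groups = c("x", "y", "x", "y")
group_means = tapply(values, groups, mean)   # x = 20, y = 30
```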

Answer 115

A

For functions of a single input variable, R automatically vectorizes the operation, that is, it applies the function to each point of the input vector.

When I pass a vector of vectors to a function, I am passing just one vector, and since my function is already looking at specific indices of that vector, it ignores all the others.

In this case, one way to evaluate the function at more than one point is to use a for statement that goes through a matrix of inputs row by row.
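
A sketch of that situation. The function f and the input points are hypothetical, chosen only to make the behavior visible:

```r
# Hypothetical function of a two-element input: it only looks at p[1] and p[2]
f = function(p) p[1]^2 + p[2]

# Passing several points as one long vector: everything after index 2 is ignored
f(c(1, 2, 3, 10, 20, 30))     # 1^2 + 2 = 3

# One point per row: (1, 10), (2, 20), (3, 30)
inputs = matrix(c(1, 2, 3, 10, 20, 30), ncol = 2)

results = numeric(nrow(inputs))          # preallocate the output
for (i in seq_len(nrow(inputs))) {
  results[i] = f(inputs[i, ])            # evaluate f at the i-th point
}
results                                  # 11 24 39

# The same loop written with apply() over rows
results_apply = apply(inputs, 1, f)
```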