R Flashcards

1
Q

Modulo operation

A

The modulo returns the remainder of the division of the number to the left by the number on its right, for example 5 modulo 3 or 5 %% 3 is 2.
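
A quick console check of the operator. The variable names are illustrative only, and the integer-division operator %/% is shown alongside for contrast.

```r
# %% gives the remainder of a division, %/% gives the integer quotient
remainder <- 5 %% 3    # 5 = 1 * 3 + 2, so the remainder is 2
quotient  <- 5 %/% 3   # the whole number of times 3 fits into 5
print(c(remainder = remainder, quotient = quotient))
```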

2
Q

Check the data type of a variable

A

class()

2
Q

assign value to variable

A

var <- value

3
Q

how to create a vector

A

with the combine function c()

4
Q

assign names to vector

A

names()

5
Q

[2:5] –> which values does this include?

A

includes the second through the fifth value of a vector (positions 2, 3, 4 and 5)
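
A minimal sketch of creating, naming and slicing a vector. The vector scores and its names are made up for illustration.

```r
# Hypothetical example vector built with c()
scores <- c(10, 20, 30, 40, 50, 60)
names(scores) <- c("a", "b", "c", "d", "e", "f")

# 2:5 expands to 2, 3, 4, 5, so this keeps four values
middle <- scores[2:5]

# An arbitrary set of positions is selected with c()
picked <- scores[c(1, 3, 4)]

print(middle)
print(picked)
```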

5
Q

Define a new variable based on a selection from a vector

A

new_var <- some_vector[c(…, 3, 4, …)] or some_vector[start:end]

6
Q

Construct a matrix with 3 rows that contain the numbers 1 up to 9

A

matrix(1:9, byrow = TRUE, nrow = 3)

7
Q

Name the columns and rows of a matrix

A

rownames()
colnames()

8
Q

calculate sums of rows of matrix or of columns

A

rowSums() or colSums()

9
Q

Merge matrices and/or vectors together by column (right) or below

A

cbind() or for below: rbind()

10
Q

in console, check out contents of workspace

A

ls()

11
Q

data on the rows 1, 2, 3 and columns 2, 3, 4.

A

my_matrix[1:3,2:4]
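
The matrix cards can be tried together in one short session. This is a sketch with a made-up 3 x 4 matrix; the row and column names are placeholders.

```r
# Fill a 3 x 4 matrix row by row with the numbers 1 to 12
my_matrix <- matrix(1:12, byrow = TRUE, nrow = 3)
rownames(my_matrix) <- c("r1", "r2", "r3")
colnames(my_matrix) <- c("c1", "c2", "c3", "c4")

# Row sums, then attach them as an extra column on the right
row_totals  <- rowSums(my_matrix)
with_totals <- cbind(my_matrix, total = row_totals)

# Rows 1 to 3 and columns 2 to 4
block <- my_matrix[1:3, 2:4]

print(with_totals)
print(block)
```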

12
Q

encode the vector as a factor –> and, optional, also give them an order

A

factor() –> factor(temperature_vector, ordered = TRUE, levels = c("Low", "Medium", "High"))

12
Q

change factor levels of a factor vector to …

A

levels(factor_vector) <- c("", "") –> the order in which you assign the levels is important. If you don't specify the levels of the factor when creating the vector, R will automatically assign them alphabetically.
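
A small sketch of why level order matters. The vectors temperature_vector and survey are invented for illustration.

```r
temperature_vector <- c("High", "Low", "High", "Low", "Medium")

# Without explicit levels, R sorts them alphabetically: High, Low, Medium
default_factor <- factor(temperature_vector)

# With explicit levels and ordered = TRUE, comparisons become meaningful
ordered_factor <- factor(temperature_vector,
                         ordered = TRUE,
                         levels = c("Low", "Medium", "High"))

# Renaming levels is positional: the new names replace the
# existing levels ("F", "M") in that order
survey <- factor(c("M", "F", "F", "M"))
levels(survey) <- c("Female", "Male")

print(levels(default_factor))
print(ordered_factor[2] < ordered_factor[1])   # is "Low" below "High"?
print(survey)
```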

13
Q

quick overview of the contents of a variable

A

summary()

13
Q

see first or last rows of a built-in dataframe

A

head() or tail()

13
Q

select a subset based on a certain condition from your dataset

A

subset(dataframe, condition)

13
Q

order a vector

A

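order() returns the positions that would sort the vector, so some_vector[order(some_vector)] is sorted; sort() returns the sorted values directly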

13
Q

Call order() on a dataframe by ordering it based on certain column

A

dataframe[order(dataframe$column), ]

14
Q

see structure of a dataframe

A

str()

14
Q

create a dataframe

A

data.frame()

14
Q

add components to a list, then assign names to the components

A

my_list <- list(your_comp1, your_comp2)
names(my_list) <- c("name1", "name2")

or

my_list <- list(name1 = your_comp1,
name2 = your_comp2)
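
Both routes should produce the same list. A minimal check, with placeholder components (a number sequence and a string) standing in for your_comp1 and your_comp2:

```r
# Route 1: build the list, then name its components
my_list <- list(1:3, "text")
names(my_list) <- c("numbers", "label")

# Route 2: name the components while building the list
same_list <- list(numbers = 1:3, label = "text")

# Components are reached by name with $ or [[ ]]
print(my_list$numbers)
print(same_list[["label"]])
print(identical(my_list, same_list))
```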

14
Q

filter dataframe

A

filter()

14
Q

sort the rows of a df based on a positions vector

A

planets_df[positions, ]
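
A sketch of how the positions vector is usually obtained with order(). The small planets_df below is a made-up stand-in with illustrative values.

```r
# Hypothetical data frame; diameters are illustrative values
planets_df <- data.frame(
  name     = c("Earth", "Mars", "Venus"),
  diameter = c(1.000, 0.532, 0.949)
)

# order() returns row positions, smallest diameter first
positions <- order(planets_df$diameter)
sorted_df <- planets_df[positions, ]

# decreasing = TRUE reverses the direction
largest_first <- planets_df[order(planets_df$diameter, decreasing = TRUE), ]

print(sorted_df)
print(largest_first)
```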

15
Q

sort values in a dataset

A

arrange(column_to_use) or arrange(desc(column_to_use)) for descending

15
Q

pipe

A

%>%

16
Q

change values in a dataframe

A

mutate(what_is_replaced = what_is_calculated)

16
Q

package for data visualization

A

ggplot2

17
Q

create visualization

A

ggplot(dataset, aes(aesthetic mapping of variables)) + type of graph

17
Q

function for creating scatterplot with ggplot

A

geom_point()

18
Q

to ggplot, add color and size of dots

A

ggplot(dataset, aes(aesthetic mapping of variables, color = …, size = …)) + type of graph
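
A concrete version of the template, sketched with the built-in mtcars data. The choice of columns is only an example, and the ggplot2 package is assumed to be installed. Note that aes() is closed before the geom is added with +.

```r
library(ggplot2)

# Map weight and fuel economy to the axes, cylinder count to color
# and horsepower to point size, then add a scatterplot layer
p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl), size = hp)) +
  geom_point()

# Calling print(p), or typing p in the console, draws the plot
```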

18
Q

divide one plot into multiple smaller plots

A

faceting: facet_wrap(~…)

19
Q

turn groups into one row each before summarize()

A

group_by()

19
Q

after specifying type of graph, also specify log scale

A

+ scale_x_log10()

20
Q

turn many rows into one with pipe

A

… %>% summarize(… = mean(…))
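
A sketch of the pipe with group_by(), summarize() and arrange() together, using the built-in mtcars data. The dplyr package is assumed to be installed, and the column name mean_mpg is made up.

```r
library(dplyr)

# One row per cylinder group, holding the group's average mpg,
# sorted from highest to lowest average
avg_by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarize(mean_mpg = mean(mpg)) %>%
  arrange(desc(mean_mpg))

print(avg_by_cyl)
```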

20
Q

make a line plot

A

geom_line()

21
Q

make a bar plot

A

geom_col()

21
Q

make a histogram

A

geom_histogram(binwidth = … or bins = …)

22
Q

make a boxplot

A

geom_boxplot()

22
Q

what are the lines extending up and down from the box of a boxplot called?

A

“whiskers”

23
Q

add title to ggplot

A

+ ggtitle("…")

24
Q

Data visualization points of consideration

A
25
Q

Add a smooth geom to the plot

A

geom_smooth()

26
Q

geom_point() has an alpha argument - what does it mean?

A

controls the opacity of the points. A value of 1 (the default) means that the points are totally opaque; a value of 0 means the points are totally transparent (and therefore invisible). Values in between specify transparency.

26
Q

when is it necessary to map the color aesthetic inside a geom instead of in ggplot()?

A

When all layers should NOT inherit the same aesthetics or when mixing different data sources

26
Q

how is fill distinct from color?

A

color usually, but not always, refers to the outline of a shape; fill refers to its interior

27
Q

change line pattern

A

linetype

27
Q

text on a plot or axes

A

geom_text() layer and aes(label = …)

27
Q

attributes vs aesthetics

A

attributes are always set in the geom layer, outside aes(); for example, a geom's color attribute is set by the color argument and its size by the size argument. Aesthetics, by contrast, map a variable onto a visual property inside aes().

28
Q

get rownames of dataframe

A

rownames()

29
Q

default position for dataplot

A

identity (put position exactly where it originally should be)

29
Q

add random noise to points to counteract overplotting

A

position = "jitter", or first define posn_j <- position_jitter(…) and then pass position = posn_j

30
Q

every aesthetic has a scale, so how can we access that scale?

A

with the scale_*_*() functions, e.g. scale_x_continuous() or scale_color_discrete()

30
Q

most common scale arguments

A

limits, breaks, expand, labels

31
Q

set the x- and y-axis labels

A

labs() sets the x- and y-axis labels. It takes a string for each argument, e.g. labs(x = "…", y = "…").

31
Q

define properties of the color scale (i.e. axis). The first argument sets the legend title. values is a named vector of colors to use.

A

scale_fill_manual()

32
Q

set y limits of axis

A

+ ylim()

33
Q

to which axes are the independent and dependent variables mapped?

A

Typically, the dependent variable is mapped onto the y-axis and the independent variable is mapped onto the x-axis.

34
Q

Markdown: Bold, italics, both, headers, inline link, reference link, images, block quote, lists, soft break

A

Headers: # before the text (1-6 # possible)
Bold: **text**
Italics: *text*
Both: ***text***
Inline link: [some_text_displayed](link)
Reference link: instead of (link) directly, use [some_text_displayed][…], and at the end, […]: link
Images: same as links, just enter ! before []
Block quote: simply enter the symbol > before the text
Lists: either just * or 1., 2., etc.
Soft break: end the line with 2 spaces

34
Q

picture of different geometries

A
34
Q

how to offset bars in a histogram? How to “use the complete space top to bottom”?

A

position = "dodge", position = "fill"

35
Q

for plotting, count number of cases at each x position

A

geom_bar()

36
Q

plot errorbars e.g. for a mean

A

geom_errorbar(aes(ymin = avg - stdev, ymax = avg + stdev))
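
geom_errorbar() expects the aesthetics ymin and ymax for the lower and upper ends of each bar. A sketch with an invented summary table (stats_df and its values are hypothetical), assuming ggplot2 is installed:

```r
library(ggplot2)

# Hypothetical per-group means and standard deviations
stats_df <- data.frame(
  group = c("a", "b", "c"),
  avg   = c(5, 7, 6),
  stdev = c(1, 0.5, 1.5)
)

# Bars show the mean; error bars span one standard deviation each way
p_err <- ggplot(stats_df, aes(x = group, y = avg)) +
  geom_col() +
  geom_errorbar(aes(ymin = avg - stdev, ymax = avg + stdev), width = 0.2)

# print(p_err) draws the plot
```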

36
Q

in aes, set different line types

A

linetype = …

37
Q

modify visual elements not part of the data (text, line, rectangle)

A

element_…()

38
Q

aesthetics for categorical variables

A
39
Q

in a plot, how to change e.g. the axis title colour

A

+ theme(axis.title = element_text(color = …))

40
Q

do we need to modify each e.g. text item individually to e.g. change the colour?

A

no, they inherit from each other in a hierarchy. All text elements inherit from text, so if we changed that argument, all downstream arguments would be affected. The same goes for line and rectangle.

41
Q

remove legend

A

theme(legend.position = "none") - also: "top", "bottom", "left", or "right": place it at that side of the plot.

42
Q

remove an element in a plot

A

e.g. for lines: + theme(line = element_blank())

43
Q

look inside data how each column looks

A

glimpse()

44
Q

Basic data types

A

Character, Numeric (Double/Integer), Logical

45
Q

vector vs matrix vs array vs dataframe

A

row in excel, excel sheet, stacked excel sheets, 2-D array which can hold different data types down each column

46
Q

Tibble

A

2-D array with less functionality than a dataframe to limit user mistakes - tibble is the unifying feature of tidyverse (data is expected to be in tibble)

47
Q

Feature matrix X ∈ ℝ^(N×D)

A

feature matrix X which contains N observations and D features (and R = real numbers)

48
Q

Dimensionality of data - what counts?

A

When we talk about ‘dimensionality’ we typically mean ‘how many independent variables do I have for analysis’?

48
Q

A single observation forms a row of data - correct?

A

Yes

49
Q

is a missing value (and the opposite)

A

is.na() (NA means "not available"; it also returns TRUE for NaN) and !is.na() (opposite)
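
A short sketch of how is.na() behaves. Note that it flags missing values (NA, "not available") and also NaN. The vector x is made up.

```r
x <- c(1, NA, 3, NaN)

missing_flags <- is.na(x)     # TRUE where a value is NA or NaN
present_flags <- !is.na(x)    # the opposite

# Keep only the values that are present
clean <- x[!is.na(x)]

# Many summary functions can skip missing values directly
avg <- mean(x, na.rm = TRUE)

print(missing_flags)
print(clean)
print(avg)
```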

50
Q

how to tidy up the column names of a dataframe

A

clean_names() from the janitor package

51
Q

you need to convert characters to numbers - how to do that

A

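for categories stored as text: by creating a factor with factor(), which stores each level as an integer code. For text that already spells out numbers, such as "3.5", use as.numeric().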

52
Q

each variable forms a column, each observation forms a row, each cell is a single value - correct?

A

yes

53
Q

which format in tidyverse is considered tidy?

A

long format

54
Q

function to change date format - and package in tidyverse

A

dmy() - Lubridate package

55
Q

create tidy format in tidyverse - function

A

pivot_longer()

56
Q

select columns of interest

A

select() –> no need for “” to select column, just write column_name

57
Q

select rows of interest

A

filter()

58
Q

select all columns in a dataframe except one

A

select(-column_name)

59
Q

instead of using mutate: drop all non-transformed variables - function

A

transmute()

60
Q

mutate vs transmute

A

mutate() keeps all variables in the original dataframe (unless otherwise specified in the .keep argument.) transmute() returns a dataframe with only the newly computed or modified variables.
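
The difference shows up in the number of columns returned. A sketch on a three-row slice of the built-in mtcars data, assuming dplyr is installed; the new column wt_kg is invented (mtcars records wt in units of 1000 lb).

```r
library(dplyr)

cars_small <- head(mtcars[, c("mpg", "wt")], 3)

# mutate() keeps mpg and wt and adds wt_kg
with_all <- cars_small %>% mutate(wt_kg = wt * 453.592)

# transmute() returns only the newly computed wt_kg
only_new <- cars_small %>% transmute(wt_kg = wt * 453.592)

print(names(with_all))
print(names(only_new))
```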

61
Q

filter in a df from specific date to specific date

A

filter(date >= as.Date("2020-01-01")) %>%
filter(date < as.Date("2021-01-01")) %>%

62
Q

in timeseries, use e.g. previous value, what function?

A

lag()

63
Q

mutate datetime into year

A

mutate(year = year(date))

64
Q

join two dataframes - function

A

inner_join(df1, df2, by = …)

65
Q

for pivot_longer, use all available columns

A

pivot_longer(cols = everything(), …, …)

66
Q

how to in ggplot access different infos/settings for plot

A

+ theme(…)

67
Q

draw a straight line in a ggplot

A

+ geom_smooth(method = "lm", color = …) - another option would e.g. be method = "loess", which works basically like a local regression

68
Q

define some sort of baseline in a plot

A

geom_hline(yintercept = 0, color = …)

69
Q

filter filters rows, select selects columns - correct?

A

yes

70
Q

get probability of z-value - function

A

xpnorm(z_value, mean = …, sd = …) from the mosaic package (the base R equivalent without the plot is pnorm())

71
Q

given a particular probability of 𝑍 < 𝑧, what is the corresponding value 𝑧?

A

qnorm(probability, mean = …, sd = …)
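
The two directions are inverses of each other. A sketch with base R's pnorm() and qnorm(); the values 1.96 and 0.975 are just the familiar textbook pair.

```r
# From a z-value to the probability P(Z < z)
p <- pnorm(1.96, mean = 0, sd = 1)    # roughly 0.975

# From a probability back to the z-value
z <- qnorm(0.975, mean = 0, sd = 1)   # roughly 1.96

print(round(p, 3))
print(round(z, 2))
```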

72
Q

Function for correlation

A

cor(X,Y)

73
Q

Get summary statistics about variables in tibbles

A

skim() of the skimr package - skim() works with (grouped) tibbles

74
Q

get some statistics on some data (e.g. tibbles)

A

favstats(~ column, data = …) from the mosaic package

75
Q

Use the dplyr package to select columns from e.g. a tibble/data set and plot their correlations with one (simple/nice) function

A

… %>%
dplyr::select(…, …) %>%
ggpairs()

76
Q

add a trendline over an existing plot, and use a linear model, and control whether the standard error (confidence interval) of the fitted line should be displayed - do not display it

A

geom_smooth(method=lm, se=FALSE, color=…)

77
Q

Which library to plot 3 essential plots for residuals? Code for that?

A
  • ggfortify
  • model1 %>% autoplot(which = 1:3) + theme_bw()
78
Q

What does :: do?

A

:: is an operator that accesses a specific function from a specific package (package::function), without attaching the whole package

79
Q

Function/plots that allow to check whether assumptions of a Linear Regression model have been satisfied (i.e. examination of the behavior of the residuals for model inadequacies)

A
  • autoplot()
  • from ggfortify library to plot 3 essential plots for residuals
80
Q

Build a multiple regression model

A

model <- lm(y_variable ~ var_1 + var_2 + var_3, data = dataframe)
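
A runnable sketch of a multiple regression on the built-in mtcars data. The choice of predictors (wt, hp, qsec) is arbitrary and only illustrates the formula syntax. R variable names cannot start with a digit, so predictors need names such as var_1 rather than 1st_var.

```r
# mpg is the response; wt, hp and qsec are the predictors
model <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# One intercept plus one coefficient per predictor
coefs <- coef(model)
r_squared <- summary(model)$r.squared

print(coefs)
print(r_squared)
```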

81
Q

use a library to compare valid models (regression) to choose the best one

A
82
Q

Test in RStudio for Multicollinearity

A