R Flashcards

1
Q

console

A

where to write R functions and code; doesn’t save the code you wrote

2
Q

environment

A

where you see the objects you’ve created

3
Q

order of operations in R

A

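PEDMAS: parentheses, exponents, division/multiplication, addition/subtraction (division and multiplication rank equally, as do addition and subtraction)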
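
A quick sketch of how this plays out in the console (the variable names are only for illustration). The last line is a common surprise: the exponent is applied before the unary minus.

```r
# Multiplication binds tighter than addition
a <- 2 + 3 * 4       # 14

# Parentheses are evaluated first
b <- (2 + 3) * 4     # 20

# Exponent is applied before the unary minus, so this is -(2^2)
c_val <- -2^2        # -4

print(c(a, b, c_val))
```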

4
Q

print()

A

prints the value stored in an object

5
Q

rules for naming identifiers in R

A

must start with letter or period; if it starts with a period, can’t be followed by a digit

reserved words can’t be used as identifiers

6
Q

function

A

piece of code that performs a specific task

7
Q

arguments should be listed within…

A

parentheses

8
Q

data types in R

A

numeric: double or integer (L after the number)
string: character
logical: TRUE or FALSE

9
Q

typeof()

A

displays the data type of the argument passed
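
For example, typeof() applied to one value of each basic data type (the values themselves are arbitrary):

```r
typeof(3.14)     # "double"
typeof(3L)       # "integer" (the L suffix makes it an integer)
typeof("three")  # "character"
typeof(TRUE)     # "logical"
```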

10
Q

alphabetical string comparison

how are they compared?

A

dictionary order, ignoring case at first
if two strings tie when case is ignored, lowercase < uppercase
if there is a number, digit < letter

the exact order depends on the locale; in a typical English locale case only breaks ties, so "a" < "B" is TRUE

11
Q

why is TRUE + FALSE = 1?

A

TRUE is coerced to 1
FALSE is coerced to 0
therefore 1 + 0 = 1
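
The same coercion is what makes sum() and mean() useful on logical vectors. A small sketch with a made-up vector called passed:

```r
TRUE + FALSE                           # 1

passed <- c(TRUE, FALSE, TRUE, TRUE)   # illustrative data
sum(passed)                            # 3, the number of TRUEs
mean(passed)                           # 0.75, the share of TRUEs
```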

12
Q

implicit coercion

A

R converts data types to be able to accomplish commands

13
Q

AND, OR, NOT

what are the symbols used?

A

&, |, !

14
Q

when can you break a line in R?

A

after a comma, an operator such as &, or a pipe %>% (R keeps reading when a line ends in an incomplete expression)

15
Q

atomic data

A

object that holds a single value

16
Q

vector

can a vector have different data types? what if it’s NA?

A

object that holds multiple values of the same data type; like a column/row array

always the same data type, even if it’s NA

17
Q

creating a vector

A

vectorname <- c(element1, element2, element3)

18
Q
  1. TRUE&TRUE
  2. FALSE&TRUE
  3. TRUE&FALSE
A
  1. TRUE
  2. FALSE
  3. FALSE
19
Q

creating a new vector that is numbers added to an old vector

A

newvector <- oldvector + 2
OR newvector <- oldvector + c(2, 2, 1, 1)
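
A minimal sketch of both forms, assuming an oldvector of four made-up numbers. A single number is added to every element; a vector of the same length is added element by element.

```r
oldvector <- c(10, 20, 30, 40)           # illustrative values

plus_two  <- oldvector + 2               # 12 22 32 42
plus_each <- oldvector + c(2, 2, 1, 1)   # 12 22 31 41

print(plus_two)
print(plus_each)
```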

20
Q

length(x)

x is an object

A

outputs the number of elements in x

21
Q

subsetting

how are indices numbered?

A

retrieving specific elements from a vector using indices

numbered starting from 1

22
Q
  1. retrieve an element from a vector with a given index
  2. retrieve a range of indices
  3. retrieve specific elements with specific indices
  4. retrieve all but index 3
  5. retrieve all but index 2 and 3
A
  1. vectorname[index]
  2. vectorname[start:end]
  3. vectorname[c(1, 2, 4)]
  4. vectorname[-3]
  5. vectorname[-c(2,3)]
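
The five patterns above, tried on a made-up vector named scores:

```r
scores <- c(10, 20, 30, 40, 50)   # illustrative values

scores[2]            # 20           (single index)
scores[2:4]          # 20 30 40     (range of indices)
scores[c(1, 2, 4)]   # 10 20 40     (specific indices)
scores[-3]           # 10 20 40 50  (all but index 3)
scores[-c(2, 3)]     # 10 40 50     (all but indices 2 and 3)
```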
23
Q

adding new elements to an existing vector

A

vectorname[3:4] <- c("newvalue1", "newvalue2")
assign to locations with no values within an existing vector
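
A short sketch with a made-up two-element vector. Assigning past the current end grows the vector, and any skipped position is filled with NA.

```r
letters_vec <- c("a", "b")          # illustrative starting vector

# Positions 3 and 4 do not exist yet, so the vector grows
letters_vec[3:4] <- c("c", "d")
length(letters_vec)                 # 4

# Skipping position 5 leaves an NA there
letters_vec[6] <- "f"
is.na(letters_vec[5])               # TRUE
```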

24
Q

which(x)

what is x?

A

gives indices of TRUEs; output is a vector of position numbers

used to identify particular observations that satisfy the condition specified

x is a logical vector

25
Q

vectors

subsetting the entries that satisfy a condition

A

column[which(condition)]

ex. cities[which(population > 100000)]
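
The cities example, filled in with made-up names and populations so it can be run:

```r
cities     <- c("Ames", "Boise", "Chico")   # hypothetical data
population <- c(50000, 250000, 120000)

big <- which(population > 100000)   # positions of the TRUEs: 2 3
cities[big]                         # "Boise" "Chico"
```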

26
Q

max, min, range, sum, mean, sd, var, sqrt, sort

A

max value, min value, min and max values, sum, average, sd, variance, square root, puts elements in ascending order

27
Q

coercion in vector creation

what is the order?

A

if elements of a vector are specified in different data types, R will coerce them into 1 data type

therefore typeof(vector) = the most general data type among the elements

logical < integer < double < character (character is the most general)
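
A quick check of the hierarchy, mixing two types at a time (the values are arbitrary). The result always takes the more general type, ending at character.

```r
typeof(c(TRUE, 2L))      # "integer"
typeof(c(2L, 3.5))       # "double"
typeof(c(3.5, "a"))      # "character"

mixed <- c(TRUE, 2L, 3.5, "a")
mixed                    # "TRUE" "2" "3.5" "a"
```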

28
Q

how to install a package/library

A

install.packages("package_name") (the package name goes in quotes)

might need to add dependencies = TRUE, ex for tidyverse

R will download from CRAN

29
Q

how to load a package/library

a new > prompt in the console means the command has completed

A

library(package_name)

30
Q

what is a data frame?

what is a tibble?

A

data frame = R object that stores a collection of obs for 1 or more variables; a table

tibble: the tidyverse version of a data frame; tibble() combines a collection of vectors into one

31
Q

accessing a value from a data frame

A

df[[row, column]] outputs a single (atomic) value

df[row, column] outputs another data frame when df is a tibble; a base data.frame returns a vector if only one column is selected
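
A small sketch using a base data.frame with made-up names and ages. Selecting two columns keeps the result a data frame in base R as well.

```r
df <- data.frame(                 # hypothetical data
  name = c("Ana", "Ben", "Cy"),
  age  = c(31, 25, 40)
)

single <- df[[2, "age"]]          # 25, a single value
sub    <- df[2, c("name", "age")] # one-row data frame

is.data.frame(sub)                # TRUE
```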

32
Q

accessing a column of a data frame

A

new_object <- df$column

33
Q

extract specific items from a column that satisfy a condition in another column

A

df$column1[which(df$column2 {logical})]

34
Q

modify a single value in a data frame

A

df$column[which(df$column == identifier)] <- new_value

35
Q

calling the items in a column that satisfy a condition, as a vector

A

df$column[df$column {logical}]

outputs a vector of the values in the column that satisfy the condition; df$column {logical} on its own outputs a logical vector (TRUE where the condition holds)

36
Q

creating a new column in an existing df

calculation column?

A

df$new_column <- c(values)

calculation column: df$new_column <- df$column1 / df$column2
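
Both kinds of new column on a made-up data frame (column names are illustrative):

```r
df <- data.frame(                      # hypothetical data
  distance = c(100, 150, 90),
  hours    = c(2, 3, 1.5)
)

# New column from a vector of values
df$driver <- c("Ana", "Ben", "Cy")

# Calculation column built from two existing columns
df$speed <- df$distance / df$hours     # 50 50 60

print(df)
```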

37
Q

what functions need na.rm?

does this modify the data

A

aggregate functions like mean, sum, sd, var, etc

doesn’t modify the data, only removes the NAs from the calculation

38
Q

how to identify the NAs in a column

how to find the number of NAs in a column?

A

is.na(df$column)

outputs logical vector

sum(is.na(df$column))
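
A minimal example with a made-up score column. It also shows na.rm in action, and that the column itself is left unchanged.

```r
df <- data.frame(score = c(4, NA, 8, NA, 6))   # hypothetical data

is.na(df$score)                  # FALSE TRUE FALSE TRUE FALSE
n_missing <- sum(is.na(df$score))            # 2

mean(df$score)                   # NA, because of the missing values
avg <- mean(df$score, na.rm = TRUE)          # 6

length(df$score)                 # still 5; the data is not modified
```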

39
Q

loading an RDS file

A

use the GUI to load the data; executes readRDS by itself

40
Q

select()

syntax

A

extracts columns specified

new_obj <- select(df, variables)
View(new_obj)

assign to a new object so that the result is saved instead of being printed in the console

41
Q

filter()

syntax; what happens to NAs?

A

keeps rows/obs where conditions specified are satisfied; only TRUES are kept

new_obj <- filter(df, conditions)
View(new_obj)

can combine conditions with &, |
rows where conditions evaluate to NA are dropped

42
Q

combining select and filter with and without pipe

A

with pipe: df %>% filter(conditions) %>% select(columns)

without pipe: select(filter(df, column == condition), columns)
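
A sketch of both forms on a made-up data frame, assuming dplyr is installed. The row with an NA population is dropped by filter() in either version.

```r
library(dplyr)

df <- data.frame(                                   # hypothetical data
  city       = c("A", "B", "C", "D"),
  population = c(50000, 250000, NA, 120000),
  region     = c("north", "south", "south", "north")
)

# With pipe: read left to right
piped <- df %>%
  filter(population > 100000) %>%
  select(city, population)

# Without pipe: read inside out
nested <- select(filter(df, population > 100000), city, population)

identical(piped, nested)   # TRUE
```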

43
Q

mutate()

syntax

A

adds new columns to a df

df <- mutate(df, new_variable = function/operation)

assign back to the original df so the new column is saved instead of only being printed in the console

44
Q

add a new column to obs that meet 2 conditions (and remove NAs), and select 2 columns

A

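df %>% mutate(new_column = operation) %>% filter(condition1 & condition2, !is.na(new_column)) %>% select(column1, column2)

na.rm = TRUE belongs inside an aggregate function such as mean(x, na.rm = TRUE); as a mutate() argument it would only create a column named na.rm

df %>% mutate %>% filter %>% select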

45
Q

summarise()

syntax

A

used to create an aggregate statistic over obs; used with mean, median, sd, n(), n_distinct etc

summarise(df, new_var = agg_func(existing_var))
outputs a df

needs existing variables!

46
Q

n() and n_distinct()

A

n() counts the number of rows in the current group; used inside mutate or summarize

n_distinct(x) finds the number of unique values in x

often use filter %>% summarise(n()) to find # rows that satisfy filter

47
Q

how to filter out NAs

2 methods

A
  1. filter(df, !is.na(column))
  2. filter(df, column > 0), since rows where the comparison evaluates to NA are dropped (this also drops rows where column <= 0)
48
Q

group_by

A

takes an existing data frame and converts it to a grouped dataframe where subsequent ops are performed group by group

often followed by mutate or summarise

49
Q

when to use group_by + mutate vs group_by + summarize?

difference in how many columns are kept

A

mutate: for an atomic value function, adding an extra column and assigning values to that column, or an if_else; retains all the columns

summarize: aggregate operation over each group and displays one aggregate result per group; removes all the extra columns except those specified in group_by and the extra one for the agg stat

50
Q

how to use group_by with n()

A

counts the rows in each group separately and summarizes it

df %>% group_by(groupvar) %>% summarize(newvar = n())
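
A runnable version of the pattern, assuming dplyr is installed; the orders table and its column names are made up. A second aggregate (total) is added only to show that several statistics can be computed per group.

```r
library(dplyr)

orders <- data.frame(                               # hypothetical data
  customer = c("a", "b", "a", "c", "a", "b"),
  amount   = c(10, 20, 30, 40, 50, 60)
)

per_customer <- orders %>%
  group_by(customer) %>%
  summarize(n_orders = n(), total = sum(amount))

per_customer   # one row per customer: a (3 orders), b (2), c (1)
```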

51
Q

finding how many instances appear 2x in the dataset

A

df %>% filter(conditions) %>% group_by(groupvariable) %>% summarize(noftimes = n()) %>% group_by(noftimes) %>% summarize(nofvariables = n()) %>% filter(noftimes == 2)

nofvariables in the remaining row is the number of values that appear exactly 2x

52
Q

when to use == vs =

A

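== for logical comparison (filter, if_else)

= for setting arguments and naming new columns inside a function call, e.g. mutate(df, new = old * 2); <- for assigning objects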

53
Q

if_else

what can it be combined with?

A

makes conditional assignment based on the logical comparison provided

used with mutate and summarise, with or without group_by
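
A small sketch of if_else() inside mutate(), assuming dplyr is installed; the data and the pass mark of 50 are made up. The arguments are the condition, the value when TRUE, and the value when FALSE.

```r
library(dplyr)

df <- data.frame(                      # hypothetical data
  student = c("a", "b", "c"),
  score   = c(45, 70, 88)
)

df <- df %>%
  mutate(result = if_else(score >= 50, "pass", "fail"))

df$result   # "fail" "pass" "pass"
```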

54
Q

when to add na.rm?

A

every aggregate function (sum, mean, var, min, max, etc)

55
Q

arrange()

multiple columns? what is it used with?

A

orders the rows of the data frame by the variables specified;

if multiple columns are specified, the first column is used until a tie, where the second column is used

used with select()

56
Q

duplicated()

syntax; how to find number of duplicates in a df?

A

duplicated(x) where x is the df; returns a logical vector where TRUE = a duplicate of an earlier row

sum(duplicated(df)) = finds number of duplicates
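
A base R sketch on a made-up data frame where rows 3 and 5 repeat earlier rows. The last line is one way to keep only the first copy of each row.

```r
df <- data.frame(                          # hypothetical data
  id    = c(1, 2, 2, 3, 1),
  value = c("x", "y", "y", "z", "x")
)

duplicated(df)                # FALSE FALSE TRUE FALSE TRUE
n_dupes <- sum(duplicated(df))        # 2
where   <- which(duplicated(df))      # 3 5

deduped <- df[!duplicated(df), ]      # keeps rows 1, 2 and 4
```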

57
Q

how to find a duplicated entry in a df and return its value

A

df %>% filter(duplicated(df)) returns values in console

View(df %>% filter(column1 == identifier, …)) returns a shortened df with the duplicated entries

58
Q

how to find the location of a duplicated entry

A

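df %>% duplicated() %>% which()

same as which(duplicated(df)); which() needs the logical vector from duplicated(), not the filtered data frame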

59
Q

how to take a duplicate out of the dataset

A
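  1. find the duplicate using tempdf <- df %>% filter(duplicated(df))
  2. df <- df %>% filter(column1 != identifier | column2 != identifier) (this drops every row matching both identifiers, including the first copy; df %>% filter(!duplicated(df)) keeps the first copy)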
60
Q

inner_join(x,y)

how are columns matched? what if the column names aren’t the same?

A

joins 2 dfs, and returns all rows from x where there are matching values in y, and all columns from x and y

matches columns based on same column names and gives all combinations

use the by-argument: by = c("xname" = "yname")

61
Q

left_join(x,y)

syntax; what happens to unmatched entries?

A

new_df <- left_join(x, y) returns all rows from x and all columns from x and y; every obs from x is kept whether or not it has a match in y

unmatched rows of x get NA in the columns that come from y
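
A sketch comparing the two joins, assuming dplyr is installed. The tables x and y and their columns are made up, and the key columns deliberately have different names so the by-argument is needed.

```r
library(dplyr)

x <- data.frame(id = c(1, 2, 3), name = c("Ana", "Ben", "Cy"))   # hypothetical
y <- data.frame(key = c(1, 3), dept = c("ops", "lab"))           # hypothetical

# Only rows of x with a match in y
inner <- inner_join(x, y, by = c("id" = "key"))   # 2 rows (id 1 and 3)

# All rows of x; dept is NA where there is no match
left <- left_join(x, y, by = c("id" = "key"))     # 3 rows

left$dept   # "ops" NA "lab"
```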

62
Q

inner_join, left_join

one-to-one matching vs non-one-to-one matching

how to make non-one-to-one matching work?

A

one-to-one: unique identifiers; each row in x matches with at most 1 row in y

non-one-to-one: a row in x matches multiple rows in y, so it is repeated once for each match

if the matching columns have different names, use the by-argument: left_join(x, y, by = c("xcolumn" = "ycolumn"))

63
Q

multiple columns as matching variables to merge datasets

how to use the by-argument?

A

if the names are the same, it will match automatically

if not, use by = c("x1" = "y1", "x2" = "y2")

64
Q

as.character(), is.character()

what types of objects does it work on?

A

as.character() converts a numerical obj into a character obj

can use atomic values or vectors (all elements get converted)

is.character() checks if the obj is a character

65
Q

as.numeric(), is.numeric()

what types of objects does it work on?

A

as.numeric() converts characters into numeric; works on atomic values or vectors

is.numeric() checks if the obj is numeric
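
A round trip between the two types on a made-up character vector, using base R only:

```r
x <- c("1", "2.5", "10")      # numbers stored as text

is.character(x)               # TRUE
is.numeric(x)                 # FALSE

y <- as.numeric(x)            # every element is converted
is.numeric(y)                 # TRUE
sum(y)                        # 13.5

back <- as.character(y)       # "1" "2.5" "10"
```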

66
Q

change an existing df column from character to number

A

df <- df %>% mutate(column = as.numeric(column)) %>% select(columns)

67
Q

export an RDS

syntax

A

saveRDS(data_object, "file name")

68
Q

CSV import and export

difference between CSV and RDS?

A

import: library(readr), read_csv("path/url"), or use GUI (same as RDS)
export: library(readr), write_csv(data_frame, "file name")

CSV is compatible with all languages, lists rows with attributes separated by commas. RDS is specific to R and retains data type while CSV does not
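
A sketch of the type difference, using a made-up data frame with a factor column. To stay free of extra packages it uses base write.csv() and read.csv() in place of the readr functions on the card, and writes to temporary files.

```r
df <- data.frame(                                   # hypothetical data
  id   = 1:3,
  size = factor(c("low", "high", "low"), levels = c("low", "high"))
)

rds_path <- tempfile(fileext = ".rds")
csv_path <- tempfile(fileext = ".csv")

saveRDS(df, rds_path)
write.csv(df, csv_path, row.names = FALSE)

from_rds <- readRDS(rds_path)
from_csv <- read.csv(csv_path, stringsAsFactors = FALSE)

is.factor(from_rds$size)      # TRUE: RDS keeps the factor and its levels
is.character(from_csv$size)   # TRUE: the CSV only stored the text
```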