R Flashcards

Question

# vectors subsetting the entries that satisfy a condition

Answer 1

column[which(condition)] | ex. cities[which(population > 100000)]
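
A minimal sketch of this card. The cities and population vectors hold made-up values (hypothetical data, not from the deck):

```r
# Hypothetical example data: two parallel vectors of equal length
cities <- c("Springfield", "Shelbyville", "Ogdenville")
population <- c(150000, 80000, 120000)

# which() returns the positions where the condition is TRUE
big_idx <- which(population > 100000)

# Use those positions to subset the other vector
big_cities <- cities[big_idx]
print(big_cities)
```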

Answer 2

max value, min value, min and max values, sum, average, sd, variance, square root, puts elements in ascending order

Answer 3

if elements of a vector are specified in different data types, R will coerce them into 1 data type, therefore typeof(vector) = the highest-ordered data type present | logical < integer < double < character
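
A quick way to see the coercion is to mix types in c() and check typeof(). The vector names v1 to v3 are illustrative only:

```r
v1 <- c(TRUE, 2L)             # logical + integer
v2 <- c(TRUE, 2L, 3.5)        # logical + integer + double
v3 <- c(TRUE, 2L, 3.5, "a")   # adding a character element

# Each vector takes the highest-ordered type among its elements
print(typeof(v1))
print(typeof(v2))
print(typeof(v3))
```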

Answer 4

install.packages("package_name") | might need to add dependencies = TRUE, e.g. for tidyverse | R will download from CRAN

Answer 5

library(package_name)

Answer 6

data frame = R object that stores a collection of obs for 1 or more variables; a table | tibble: allows a collection of vectors to be combined into a data frame

Answer 7

df[[row, column]] outputs an atomic value | df[row, column] outputs another data frame (for a tibble; a base data.frame returns a vector when a single column is selected)

Answer 8

new_object <- df$column

Answer 9

df$column1[which(df$column2 {logical})]

Answer 10

df$column[which(df$column == identifier)] <- new value
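
A small sketch of extracting and then overwriting values by condition. The data frame and its column names (name, score) are hypothetical:

```r
# Hypothetical data frame
df <- data.frame(name = c("a", "b", "c"),
                 score = c(1, 2, 3),
                 stringsAsFactors = FALSE)

# Extract values of one column based on a condition on another column
high_names <- df$name[which(df$score > 1)]

# Overwrite the score for the row identified by name == "b"
df$score[which(df$name == "b")] <- 20
print(df)
```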

Answer 11

df$column {logical} outputs a logical vector (TRUE where the condition is satisfied) | df$column[df$column {logical}] outputs the values within the column that satisfy the condition

Answer 12

df$new_column <- c(values) | calculation column: df$new_column <- df$column1 / df$column2

Answer 13

aggregate functions like mean, sum, sd, var, etc | doesn't modify the data, only removes the NAs from the calculation

Answer 14

is.na(df$column) outputs logical vector | sum(is.na(df$column))
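
A short sketch of counting NAs and of na.rm = TRUE, using a made-up vector x in place of df$column:

```r
x <- c(4, NA, 6, NA)   # hypothetical values with two missing entries

na_flags <- is.na(x)           # logical vector, TRUE where a value is missing
na_count <- sum(na_flags)      # TRUE counts as 1, so this is the number of NAs

mean_with_na <- mean(x)                  # NA, because missing values propagate
mean_no_na   <- mean(x, na.rm = TRUE)    # NAs are skipped in the calculation only

print(na_count)
print(mean_no_na)
```

x itself is unchanged after the call; na.rm only affects what the function uses.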

Answer 15

use the GUI to load the data; executes readRDS by itself

Answer 16

extracts the columns specified | new_obj <- select(df, variables) | View(new_obj) | assign to a new obj so that the result isn't printed to the console

Answer 17

keeps rows/obs where the conditions specified are satisfied; only TRUEs are kept; new_obj <- filter(df, conditions); View(new_obj); conditions can be combined with & (and) or | (or); rows where conditions evaluate to NA are dropped

Answer 18

with pipe: df %>% filter(conditions) %>% select(columns) | without pipe: select(filter(df, column == condition), columns)
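
A sketch comparing the two forms, assuming dplyr is installed. The tibble and its columns (city, population, area) are hypothetical:

```r
library(dplyr)

# Hypothetical data
df <- tibble(city = c("A", "B", "C"),
             population = c(150000, 80000, 120000),
             area = c(10, 20, 30))

# With the pipe: read left to right
piped <- df %>%
  filter(population > 100000) %>%
  select(city, population)

# Without the pipe: the inner call runs first
nested <- select(filter(df, population > 100000), city, population)

print(piped)
print(nested)
```

Both forms should give the same two-row result; the pipe version tends to be easier to read as steps are added.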

Answer 19

adds new columns to a df | df <- mutate(df, new_variable = function/operation) | assign back to the original df so that the result is saved instead of only printed in the console

Answer 20

df %>% mutate(new_column = func(column, na.rm = TRUE)) %>% filter(condition1 & condition2) %>% select(columns) | na.rm = TRUE goes inside the aggregate function, not mutate | df %>% mutate %>% filter %>% select
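
A sketch of the mutate, filter, select chain, assuming dplyr is installed. The columns id, x, y and the new column dev are hypothetical; na.rm = TRUE is passed to mean(), the aggregate function:

```r
library(dplyr)

# Hypothetical data with one missing value in x
df <- tibble(id = 1:4,
             x = c(1, NA, 3, 5),
             y = c("a", "b", "a", "a"))

result <- df %>%
  mutate(dev = x - mean(x, na.rm = TRUE)) %>%  # deviation from the mean of x
  filter(dev >= 0 & y == "a") %>%              # both conditions must be TRUE
  select(id, dev)                              # keep only these columns

print(result)
```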

Answer 21

used to create an aggregate statistic over obs; used with mean, median, sd, n(), n_distinct, etc. | summarise(df, new_var = agg_func(existing_var)) outputs a df | needs existing variables!

Answer 22

counts the number of rows passed into mutate or summarize | n_distinct() finds the number of unique values (or unique rows, for a data frame) | often use filter %>% summarise(n()) to find # rows that satisfy filter

Answer 23

1. filter(df, !is.na(column)) | 2. filter(df, column > 0), since a logical comparison evaluates to NA for missing values and filter drops those rows (this also drops non-positive values)
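
A sketch of the two approaches side by side, assuming dplyr is installed. The income column is hypothetical and includes a zero to show where the two results differ:

```r
library(dplyr)

# Hypothetical data: one NA and one zero
df <- tibble(id = 1:4, income = c(100, NA, 0, 250))

# Approach 1: drop only the rows where income is missing
no_na <- filter(df, !is.na(income))

# Approach 2: NA > 0 evaluates to NA, so the NA row is dropped,
# but the row with income == 0 is dropped as well
positive <- filter(df, income > 0)

print(no_na)
print(positive)
```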

Answer 24

takes an existing data frame and converts it to a grouped data frame where subsequent ops are performed group by group | often followed by mutate or summarise

Answer 25

mutate: for an atomic value function, adding an extra column and assigning values to that column, or an if_else; retains all the columns | summarize: aggregate operation over each group and displays one aggregate result per group; removes all the extra columns except those specified in group_by and the extra one for the agg stat

Answer 26

counts the rows in each group separately and summarizes it | df %>% group_by(groupvar) %>% summarize(newvar = n())
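
A sketch contrasting summarize and mutate after group_by, assuming dplyr is installed. The columns g and v and the output names are hypothetical:

```r
library(dplyr)

# Hypothetical data: group "a" has two rows, group "b" has one
df <- tibble(g = c("a", "a", "b"), v = c(1, 3, 10))

# summarize: one row per group, only the grouping column and the new stats
summ <- df %>%
  group_by(g) %>%
  summarize(mean_v = mean(v), n_rows = n())

# mutate: every original row and column is kept, the group mean is repeated
mut <- df %>%
  group_by(g) %>%
  mutate(mean_v = mean(v)) %>%
  ungroup()

print(summ)
print(mut)
```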

Answer 27

df %>% filter %>% group_by(groupvariable) %>% summarize(noftimes = n()) %>% group_by(noftimes) %>% summarize(nofvariables = n()) %>% filter(nofvariables == 2)
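
A sketch of this count-of-counts pattern, assuming dplyr is installed. The visits data and customer column are hypothetical, and the leading filter step is left out for brevity:

```r
library(dplyr)

# Hypothetical data: one row per visit
visits <- tibble(customer = c("a", "a", "b", "c", "c", "d", "d", "d"))

result <- visits %>%
  group_by(customer) %>%
  summarize(noftimes = n()) %>%       # how many times each customer appears
  group_by(noftimes) %>%
  summarize(nofvariables = n()) %>%   # how many customers share each count
  filter(nofvariables == 2)           # counts shared by exactly 2 customers

print(result)
```

Here customers a and c both appear twice, so the only row left is noftimes = 2.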

Answer 28

== for logical comparison (filter, if_else) | = for assignment

Answer 29

makes conditional assignment based on the logical comparison provided | used with mutate and summarise, with or without group_by
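
A sketch of a conditional assignment inside mutate, assuming dplyr is installed and that this card refers to if_else. The score column and the pass mark of 50 are hypothetical:

```r
library(dplyr)

# Hypothetical data with one missing score
df <- tibble(score = c(40, 75, NA))

out <- df %>%
  mutate(result = if_else(score >= 50, "pass", "fail"))  # == / >= compare, = assigns

print(out)
```

A missing score gives a missing result, since the comparison itself evaluates to NA.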

Answer 30

every aggregate function (sum, mean, var, min, max, etc)

Answer 31

orders the rows of the data frame by the variables specified; if multiple columns are specified, the first column is used until a tie, where the second column is used | used with select()

Answer 32

duplicated(x) where x is the df; returns a logical vector where TRUE = a duplicate of an earlier row | sum(duplicated(df)) = finds number of duplicates
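
A sketch of duplicated() on a small data frame. The data and column names are hypothetical; row 3 repeats row 2:

```r
# Hypothetical data frame where the third row repeats the second
df <- data.frame(id = c(1, 2, 2, 3),
                 v = c("a", "b", "b", "c"))

dup <- duplicated(df)     # TRUE only for the later copy of a repeated row
n_dup <- sum(dup)         # number of duplicate rows
dup_rows <- which(dup)    # row positions of the duplicates

print(dup)
print(n_dup)
print(dup_rows)
```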

Answer 33

df %>% filter(duplicated(df)) returns values in console | View(df %>% filter(column1 == identifier, ...)) returns a shortened df with the duplicated entries

Answer 34

which(duplicated(df)) | with pipe: df %>% duplicated() %>% which()

Answer 35

1. find the duplicate using tempdf <- df %>% filter(duplicated(df)) | 2. df <- df %>% filter(column1 != identifier | column2 != identifier) (this drops every row that matches both identifiers, not only the repeated copy)

Answer 36

joins 2 dfs and returns all rows from x where there are matching values in y, and all columns from x and y | matches columns based on same column names and gives all combinations when a row has multiple matches | if the names differ, use the by-argument: by = c("xname" = "yname")

Answer 37

new_df <- left_join(x, y) | returns all rows from x and all columns from x and y, keeping all obs from x | rows in x with no match in y get NA in the columns from y

Answer 38

one-to-one: unique identifiers; each row in x matches with at most 1 row in y | non-one-to-one: no common column name, and a row in x is used with multiple rows in y | by-argument: left_join(x, y, by = c("xcolumn" = "ycolumn"))

Answer 39

if the names are the same, it will match automatically | if not, use by = c("x1" = "y1", "x2" = "y2")
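
A sketch comparing inner_join and left_join with differently named key columns, assuming dplyr is installed. The tables x and y and their columns (id, name, key, score) are hypothetical:

```r
library(dplyr)

# Hypothetical tables: id 2 in x has no match in y
x <- tibble(id = c(1, 2, 3), name = c("a", "b", "c"))
y <- tibble(key = c(1, 3), score = c(90, 70))

# inner_join keeps only rows of x that have a match in y
inner <- inner_join(x, y, by = c("id" = "key"))

# left_join keeps every row of x; unmatched rows get NA in y's columns
left <- left_join(x, y, by = c("id" = "key"))

print(inner)
print(left)
```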

Answer 40

as.character() converts a numerical obj into a character obj | can use atomic values or vectors (all elements get converted) | is.character() checks if the obj is a character

Answer 41

as.numeric() converts characters into numeric; works on atomic values or vectors | is.numeric() checks if the obj is numeric
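
A sketch of both conversions on made-up vectors. The value "x" is included to show what happens when a string cannot be read as a number:

```r
# Numeric to character: every element is converted
chars <- as.character(c(1, 2.5))

# Character to numeric: "x" cannot be converted and becomes NA
# (R also issues a warning, silenced here to keep the output clean)
nums <- suppressWarnings(as.numeric(c("3", "4.5", "x")))

print(chars)
print(is.character(chars))
print(nums)
print(is.numeric(nums))
```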

Answer 42

df <- df %>% mutate(column = as.numeric(column)) %>% select(columns)

Answer 43

saveRDS(data_object, "file name")

Answer 44

import: read_csv("path/url"), or use GUI (same as RDS) | export: library(readr), write_csv(data_frame, "file name") | CSV is plain text readable by most languages and tools; it lists rows with attributes separated by commas | RDS is specific to R and retains data types while CSV does not
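
A sketch of the type-retention difference, using a round trip through temporary files. As an assumption to keep it dependency-free, base write.csv/read.csv stand in for readr's write_csv/read_csv; the data frame and its ordered factor column are hypothetical:

```r
# Hypothetical data with an ordered factor column
df <- data.frame(size = factor(c("small", "large"),
                               levels = c("small", "large"),
                               ordered = TRUE),
                 n = c(1.5, 2))

# RDS round trip: the object is stored as-is
rds_path <- tempfile(fileext = ".rds")
saveRDS(df, rds_path)
rds_back <- readRDS(rds_path)

# CSV round trip: only the text of each value is stored
csv_path <- tempfile(fileext = ".csv")
write.csv(df, csv_path, row.names = FALSE)
csv_back <- read.csv(csv_path)

print(is.ordered(rds_back$size))  # ordering survives in RDS
print(is.ordered(csv_back$size))  # ordering is lost in CSV
```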

(68 cards)