R Flashcards
In R’s lattice, what argument makes panels appear top to bottom, left to right?
…, as.table = TRUE
dplyr version of:
merge(x, y, all.x=T, all.y = T)
full_join(x, y)
with stringr, return the 1st match for a regex?
str_extract(str, regex)
with stringr, replace each vowel in x with "-"?
str_replace_all(x, "[aeiou]", "-")
with stringr, replace 1 with one and 2 with two in x?
str_replace_all(x, c("1" = "one", "2" = "two"))
with stringr, return all matches in a string for a regex?
str_extract_all(x, regex)
dplyr version of:
merge(x, y)
inner_join(x, y)
In R’s plot, set number size at tick marks?
plot(…, cex.axis = number)
do with join operation:
flights %>%
filter(dest %in% top_dest$dest)
flights %>%
semi_join(top_dest)
In R, after xaxt = "n", add ticks for the years 2008 and 2016?
axis.Date(1,
at = c(as.Date("2008-01-01"), as.Date("2016-01-01")),
labels = c("2008", "2016"))
In R, set the outer margin to leave 2 lines for text on Top and add “Title” there?
par(oma = c(0, 0, 2, 0))
mtext("Title", outer = T)
With stringr, treat NA as a string?
str_replace_na()
With stringr, turn myVec (a vector) into one long string with no spaces?
str_c(myVec, collapse = "")
In R’s plot(), set point type?
plot(…, pch = n), where n is one of 0:25
In R, add a surrogate key to dat?
dat %>%
mutate(surrogate_key = row_number())
Confirm tailnum is the primary key in planes in R?
planes %>%
count(tailnum) %>%
filter(n > 1)
In base R, what function is critical to unique arrangements of plots?
layout()
In R, add a line for a linear model with y & x?
abline(lm(y ~ x))
In R’s plot(), set axis label size?
plot(…, cex.lab = #)
stringr’s function to filter string matches?
str_subset()
Make tidy with tidyr:
tablea
country ‘99’ ‘00’ ‘01’
A x, y, z
B …
C
tablea %>%
gather(`99`:`01`, key = "year", value = "measure")
with stringr, return the 1st match in each sentence?
sentences %>%
_______(“(a|the) ([^ ]+)”)
str_match
in base R, calculate the mean of variables in DAT at each level of FACTOR?
by(DAT, FACTOR, colMeans)
in R, add “label” on the right side of an existing plot in the outer margin?
mtext("label", 4, outer = TRUE)
In R, create a function that returns hello or goodbye based on user’s choice?
myFunc
In ggplot, how do you zoom in to the range 0-50 on the y-axis?
… +
coord_cartesian(ylim = c(0, 50))
In R, plot x as an overlay to an existing plot in the top right corner?
par(fig = c(0.5, 1, 0.5, 1), new = TRUE)
plot(x)
In R, how do you review this layout?
nf
layout.show(nf)
In R, how can you define a number of regions within the current device that can be treated as separated graphics devices?
split.screen()
Why manually call regex() in stringr functions?
To set arguments, which include: ignore_case, multiline, comments, and dotall
In R, create box plot from Y and FACTOR with notches?
plot(FACTOR, Y, notch = TRUE)
In R, transfer data from R to Excel?
1) write.table(data, "clipboard", sep = "\t", col.names = NA)
2) paste in Excel
In R, set orientation of #s on tick marks to always horizontal?
par(las = 1)
In R, remove white space before and after string?
trimws(string)
In R, sum of each row of matrix X?
rowSums(x)
In R, get names in a factor variable?
levels(factor)
In R, get # of names in factor variable?
nlevels(factor)
In R, reorder names in a factor variable?
factor(factor, levels = c("name1", "name2"))
In R, turn factor names into integers?
as.vector(unclass(factor))
In R, set # of digits to 5 for any output?
options(digits = 5)
xyplot(root~week | plant):
add a line for a regression?
xyplot(root ~ week | plant,
-> panel = function(x, y) {
-> panel.xyplot(x, y)
-> panel.abline(lm(y ~ x))
-> })
In R, generate a q-q plot?
qqnorm()
In R, given events A and B, and sample space S, calculate the probability of A or B (or both) occurring?
length(union(A, B)) / length(S)
Base R, return X where X is NA?
x[is.na(x)]
in R, reverse order of vector Y?
rev(Y)
In R, log to base n of x?
log(x, n)
In R, return max at each point of vector x?
cummax(x)
In R, what is the square root of x?
sqrt(x)
In R, difftime() vs. as.difftime()?
difftime() calculates the time difference between two dates or date-times; as.difftime() creates a difftime object from time values, not dates.
In R, calculate the 25th percentile of x?
quantile(x, 0.25)
In R, generate a list of 4 1s, 4 2s, and so on up to 10?
rep(1:10, each = 4)
In R, calculate the probability of A & B occurring within sample space S?
length(intersect(A, B)) / length(S)
In R, what function is equivalent to IF() formula in excel?
ifelse()
In R, get 5 items from vector Keys, allowing the same value to be drawn repeatedly?
sample(Keys, 5, replace = T)
Return df’s column B?
df$B
In R, name m’s columns AA, BB, CC?
colnames(m) <- c("AA", "BB", "CC")
In R, probability of only A, not B within sample space S?
length(setdiff(A, B)) / length(S)
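A minimal sketch of the three set-based probability cards, assuming a made-up sample space (one roll of a die) and two made-up events; S, A, and B below are illustrative values only.

```r
# Hypothetical sample space: one roll of a fair die
S <- 1:6
A <- c(2, 4, 6)   # event: roll is even
B <- c(4, 5, 6)   # event: roll is greater than 3

p_a_or_b  <- length(union(A, B)) / length(S)      # A or B (or both): 4/6
p_a_and_b <- length(intersect(A, B)) / length(S)  # both A and B:     2/6
p_a_only  <- length(setdiff(A, B)) / length(S)    # A but not B:      1/6
```

Counting outcomes this way only gives probabilities when every outcome in S is equally likely.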
In R, return a histogram of vector x and then add dashed density lines?
hist(x, freq = FALSE)
lines(density(x), lty = "dashed")
In R, return the dimensions of vector v?
length(v) (dim(v) returns NULL for a plain vector)
In R, return dat with the column medians subtracted?
sweep(dat, 2, apply(dat, 2, median))
In R, test for normality of dat & describe null hypothesis?
shapiro.test(dat)
Null hypothesis = normally distributed
In R, X is weights and Y is heights. Create a scatter plot with X and Y labels and filled in dots?
plot(X, Y, xlab = "weight", ylab = "height", pch = 16)
Calculate the sum of the rows in m, by group?
rowsum(m, group)
What is the square root of x ?
sqrt(x)
In R, what is the working directory?
getwd()
In R, how do you create a list of words out of a string?
strsplit(str, " ")
In R, get product of all values in vector X?
prod(x)
In R: add function to remove NAs?
newDat
na.omit(dat)
In R, x
x[which(abs(x - 50) == min(abs(x - 50)))]
In R, return just the means of Dat[,c(1, 2, 3)] by variables var5 and var6?
aggregate(Dat[,c(1,2, 3)], by=list(var5, var6), mean)
In R, how do you see data available in the loaded package “UsingR”?
data(package = "UsingR")
In R, is it daylight savings right now?
as.POSIXlt(Sys.time())$isdst
In R’s legend, how do you set the fill color for symbols?
pt.bg = …
In R, set tick marks to be on the inside by the default length?
tcl = 0.5
When using dplyr’s arrange(), where do missing values end up?
the end
dates
strptime(dates, "%d%b%y")
With dplyr, return all columns from flights except year through day?
select(flights, -(year:day))
With dplyr, return columns from flights with “ijk” in the name?
select(flights, contains("ijk"))
dplyr command to reorder the rows?
arrange()
With dplyr, assign new names to specific columns while returning all columns?
rename()
With dplyr, put flights in descending order by distance?
arrange(flights, desc(distance))
With dplyr, put flights data in order by year, month and then day?
arrange(flights, year, month, day)
In R, find the chi-square critical value for alpha = 0.05, where x follows a chi-square distribution with 12 degrees of freedom?
qchisq(0.05, 12, lower.tail=F)
Note: lower.tail = TRUE is the default, so lower.tail = F is needed to get the upper-tail value.
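A quick check of how lower.tail behaves, as a sketch rather than a full treatment: the upper-tail quantile at 0.05 equals the default (lower-tail) quantile at 0.95.

```r
# Upper-tail critical value for alpha = 0.05 with 12 degrees of freedom
upper <- qchisq(0.05, 12, lower.tail = FALSE)   # about 21.03

# Same value using the default, lower.tail = TRUE
same <- qchisq(0.95, 12)

all.equal(upper, same)   # TRUE
```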
In R, command to find which package qplot is in?
find("qplot")
In R, what function is useful for mathematical notations inside the plots functions?
expression()
In R’s lattice package, create a scatter plot for weight vs age given gender?
xyplot(weight ~ age | gender)
In ggplot, what grammar does size, shape, color and x/y locations relate to?
aesthetics… aes()
In ggplot, 2 ways to facet?
facet_wrap()
facet_grid()
What argument to jitter dots in geom_point?
position = "jitter"
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
-> geom_point()
vs.
ggplot(data = mpg) +
-> geom_point(mapping = aes(x = displ, y = hwy))
Same graph, top uses global mapping and bottom uses local mapping
… + facet_grid(drv ~ .)
facets the plot by drv in rows (panels stacked up and down)
ggplot(data = diamonds) +
-> geom_bar(aes(x = cut, fill = clarity))
1) stacked bar chart?
2) 100% stacked bar chart?
3) Grouped bar chart?
position = …
1) no argument
2) “fill”
3) “dodge”
To plot hwy ~ displ from mpg:
ggplot(data = mpg) +
-> geom_point(? 1 = ? 2(x = displ, y = hwy))
? 1 mapping
? 2 aes
In R’s plot(), argument for no tick marks and no #?
xaxt = "n", yaxt = "n"
In R, return items exclusive to A as compared to B?
setdiff(A, B)
In R, create a sample of 1000 families with 3 children and probability of 0, 1, 2, 3 boys as equal to 1/8, 3/8, 3/8, 1/8?
sample(0:3, size = 1000, prob = c(1/8, 3/8, 3/8, 1/8), replace = T)
In R, transpose matrix m?
t(m)
In R, show all possible scatterplot dot types?
plot(0:25, pch = 0:25)
In R, 4 useful function for standard normal distribution?
pnorm(): cum probability
dnorm(): probability density
qnorm(): quantile function
rnorm(): random #s from distribution
In R, return Test’s attributes?
attributes(Test)
In R, how do you time a function?
system.time(functionName())
In R, create a QQ plot with a diagonal line for dat?
qqnorm(dat)
qqline(dat)
In R, check if file “fname.txt” exists in working directory?
file.exists("fname.txt")
A
lapply(list.object, length)
In R, with data.frame(DF), typeof(DF)?
“list”
In R’s data.frame(), suppress factor creation?
stringsAsFactors = F
In R, mean of each row of matrix X?
rowMeans(X)
In R, what are attributes?
metadata about objects
In R’s data frame df, what does length(df) return?
the same as ncol(df)
In R, set x’s “cust_attr” attribute?
attr(x, "cust_attr") <- value
In R, what function is useful for running random #s through a formula?
replicate()
remake this card…
In R, generate a plot on a 3-d plane using vectors x, y, z?
persp(x, y, z)
What 3 attributes stay with modified objects?
Names - names()
Dimensions - dim()
Class - class()
In R, describe bootstrap test for testing a mean?
1) Create vector of means based on samples from true data -> x.bar
2) p.val <- length(x.bar[which(x.bar > testVal)]) / length(x.bar)
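The two steps on this card can be sketched as follows. This is a rough illustration under stated assumptions, not a definitive test: dat is a made-up sample, testVal is a hypothetical value to test the mean against, N is an arbitrary number of resamples, and the comparison is one-sided.

```r
set.seed(1)                                            # reproducible sketch
dat     <- c(4.1, 5.3, 6.0, 5.5, 4.8, 6.2, 5.1, 5.9)   # made-up sample
testVal <- 5                                           # hypothetical test value
N       <- 1000                                        # number of resamples

# 1) Vector of means, each from a resample (with replacement) of the data
x.bar <- numeric(N)
for (i in 1:N) {
  x.bar[i] <- mean(sample(dat, replace = TRUE))
}

# 2) Share of resampled means that exceed the test value
p.val <- length(x.bar[which(x.bar > testVal)]) / length(x.bar)
```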
In R, given vector A, B, and function F that takes 2 arguments, create an array of dimensions (A, B) that is the result of function(A,B) for each cell?
outer(A, B, F)
In R, how do I print vector Y without printing missing values?
Y[!is.na(Y)]
not_cancelled %>%
-> count(dest)
A table showing count of not cancelled by dest
What 2 R packages are useful for larger, interactive heatmaps?
1) d3heatmap
2) heatmaply
In an R plot, set x & y labels color and font?
col.lab =
font.lab =
In R, make a scatterplot matrix of the data in obj?
pairs(obj)
dplyr’s measures of position:
1) x[1]
2) x[2]
3) x[length(x)]
1) first(x)
2) nth(x, 2)
3) last(x)
Geom for a tile plot?
geom_tile()
In R, calculate the mean of X without NAs?
mean(x, na.rm=T)
Rather than filtering out messy data, another–perhaps better–route?
Make the values missing
In R, find F value for alpha = 0.05 in the lower tail, where x follows f-dist and df1 = 5, df2 = 15?
qf(0.05, 5, 15)
In base R, x is a vector of age data. Create a histogram with an x-label, a title, and about 20 bins. Then add density lines to the histogram?
hist(x, xlab = "Age", main = "title", breaks = 20, freq = FALSE)
lines(density(x))
In RStudio, start a new script?
ctrl - shift - n
ggplot histogram?
geom_histogram()
Convert data frame to tibble?
as_tibble()
ggplot frequency polygon?
geom_freqpoly()
not_cancelled %>%
-> count(tailnum, wt=distance)
A table showing miles flown by each tailnum among not_cancelled
With dplyr, return dat’s columns var1, var2, var3?
select(dat, num_range(“var”, 1:3))
A function useful for 5<=x<=10?
between(x, 5, 10)
In base R, change the color of the axes?
par(fg = )
In R, return items in both A and B?
intersect(A, B)
ggplot’s bar graph?
geom_bar()
readr’s parsing functions when the data are already read into R?
parse_*()
delays ?
- > filter(n > 25) ?
- > ggplot(aes(x=n, y=delay)) ?
- > -> geom_point(alpha = 1/10)
1) %>%
2) %>%
3) +
Create tibble from individual vectors?
tibble()
In readr, read in txt.csv and identify the comment lines as starting with #?
read_csv("txt.csv", comment = "#")
in readr functions, don’t read first 5 lines?
skip = 5
In R, return 50th percentile of x?
median(x)
What ggplot function is critical for horizontal bar chart?
coord_flip()
In dplyr, how do you remove grouping?
ungroup()
In RStudio, send previously sent chunk from editor?
ctrl - shift - p
What is geom_bin2d() and geom_hex()?
Divides coordinate plane into 2d bins and uses fill color to show density
ggplot(smaller, aes(carat, price)) +
-> geom_boxplot(aes(group = ???(carat, 0.1)))
cut_width
tidyverse package for querying databases?
DBI
tidyverse package to read in SPSS, Stata, or SAS files?
haven
Using geom_point(), add transparency?
alpha = ..
readr’s identify encoding?
guess_encoding()
What are the main differences between data.frame and tibble?
1) printing
2) subsetting
print(flights, 1? = 10, 2? = 3?)
1=argument for number of rows
2=argument for number of columns
3=argument for all columns
1) n
2) width
3) Inf
For a density plot (frequency polygon) in ggplot:
ggplot(data = diamonds, aes(x = price, y = 1?)) +
-> 2?(aes(color = cut), binwidth = 500)
1) ..density..
2) geom_freqpoly
In R, return the mean of each column in matrix x?
colMeans(x)
In R, create a boxplot of age data in x?
boxplot(x)
In R, return the sum of columns in matrix m without colSums()?
apply(m, 2, sum)
In R, test if x is TRUE?
isTRUE(x)
In dplyr, return flights where month equals the last 6 months of the year?
filter(flights, month %in% 7:12)
dplyr command to pick variables by name?
select()
dplyr command to operate on data group-by-group?
group_by()
RStudio shortcut for the assignment operator <-?
alt + -
dplyr command to pick observations by value?
filter()
In R, view list of functions and data in package “spatial”?
library(help = spatial)
With dplyr, return flights where month equals 1 and day equals 1?
filter(flights, month == 1, day ==1)
dplyr command to create new variables with functions of existing variables?
mutate()
What happens to NA values using dplyr’s filter()?
Filter excludes NA & False
With dplyr, return year, month, and day from flights?
select(flights, year, month, day)
With dplyr, return flights’ column time_hour and then all other columns?
select(flights, time_hour, everything())
In R, what argument is used for removing borders?
bty = "n"
border type…
In R, what are 2 ways to show overlapping dots in a scatterplot?
1) jitter(x) or jitter(y) or both
2) sunflowerplot(x, y)
In R, what is the default graphics window size?
7 inches by 7 inches
In R, what does which(requests %in% stock) return?
The index of items in requests that match an item from stock
In R, set the line thickness?
lwd = #
Line width…
In R, what does this do?
peas[1:length(peas) %% 2 ==0]
Returns the elements of peas at even positions
In R, pmin(x, y, z)?
Returns the element-wise (parallel) minimum: for each position, the smallest of the values in x, y, and z
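A small example, with made-up vectors, showing how pmin() differs from min():

```r
x <- c(1, 5, 9)
y <- c(4, 2, 8)
z <- c(3, 6, 7)

by_position <- pmin(x, y, z)   # 1 2 7  (one minimum per position)
overall     <- min(x, y, z)    # 1      (single minimum of everything)
```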
In R, what is the difference between unique() and duplicated()
unique returns just the unique items while duplicated returns a boolean vector identifying duplicates
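A short illustration with a made-up character vector:

```r
x <- c("a", "b", "a", "c", "b")

u <- unique(x)       # "a" "b" "c"
d <- duplicated(x)   # FALSE FALSE TRUE FALSE TRUE

# Dropping the flagged duplicates gives the same result as unique()
identical(x[!d], u)  # TRUE
```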
In R, what does this do?
peas[-length(peas)]
Returns peas without its last item
In R, what functions are useful in naming rows or columns?
rownames()
colnames() or names()
In R, how do I generate 1, 1.5, 2, 2.5, 3?
seq(1, 3, 0.5)
In R, what function is helpful in making flat contingency tables?
ftable()
In R, DF is data for 2 factor variables with 2 levels each, count up the combinations for dat1 & dat2?
table(dat1, dat2)
In R, create a plot of x and y that looks like a line plot with no right border?
plot(x, y, type = "l", bty = "c")
In R, how do you view the complete list of available packages?
library()
With dplyr, by_day=group_by(flights, year, month, day):
return the average daily dep_delay?
summarize(by_day, mean(dep_delay, na.rm = T))
In R, turn off x-, y-labels, and title?
ann = F
annotations…
In R, remove all user-defined variables?
rm(list=ls())
In R, how can you fine tune top, left, right and bottom axes?
axis(side, …), where side is 1 = bottom, 2 = left, 3 = top, 4 = right
In R, counts
table(counts)
In R, add a grid to a plot?
tck = 1 (default is NA, which defers to tcl)
tick marks…
In R, sub() vs. gsub()?
sub replaces 1st occurrence of a pattern; gsub replaces all occurrences of a pattern
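For example, on a made-up string:

```r
s <- "banana"

first_only  <- sub("a", "A", s)    # "bAnana"  (first match only)
all_matches <- gsub("a", "A", s)   # "bAnAnA"  (every match)
```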
In R, set background color for graphics?
par(bg = "grey")
background…
In R, if is.ts(dat)=true, then what is returned for plot(dat)?
a timeseries graph, which is actually plot.ts(dat)
In R, what function is superior to attach() due to environment issues?
with(data, function(…))
In R, na._(x) will return x w/o NAs?
na.omit
In R, sort DF by Var1, Var2, and then Var3?
DF[order(DF$Var1, DF$Var2, DF$Var3), ]
In R’s lattice, create a box and whisker plot of Growth vs. Water and Daphnia given detergent?
bwplot(Growth ~ Water + Daphnia | Detergent)
In R, standardize dat’s columns 2:3?
scale(dat[, 2:3])
In dplyr, create new variables and get rid of all others?
transmute()
read_csv(“challenge.csv”, 1? = 2?(x = col_double(), y = col_date()))
col_types = cols
Geom for a boxplot?
geom_boxplot()
In R, transfer data from Excel to R?
Copy from Excel and readClipboard()
With dplyr, return flights columns that end with “es”?
select(flights, ends_with(“es”))
In R, return p-value when observed chi-square is 14.56 and df = 7?
1 - pchisq(14.56, 7) or
pchisq(14.56, 7, lower.tail = F)
In R, how do you prepare to make 4 plots on the same output?
par(mfrow = c(2, 2)) (fills by row)
par(mfcol = c(2, 2)) (fills by column)
In R, justify text w/i the text() function?
adj = c(x, y)
In R, x
DOTplot(x)
In R, when is the argument used to change the plotting symbol color?
When pch = 21:25
In R, rows in DF where Var1 is greater than its median and Var2 is TRUE?
DF[DF$Var1 > median(DF$Var1) & DF$Var2 == TRUE, ]
In R, x
x[-which(is.na(x))]
In R, N
x.bar = c()
for (i in 1:N){
-> x x.bar[i] = mean(x)
}
Whenever you group_by(), what should you include?
counts using n()
In R, if data are not normal and a t test is not possible, what is the appropriate test function?
wilcox.test()
Apply readr parsing heuristics to the character columns in data frame?
type_convert()
tidyverse package to read Excel?
readxl
read_csv("file.txt", ? = "#N/A")
na
In R, return the names of columns in a data frame?
names(table)
Call df$x using pipe?
df %>% .$x
In readr, parse_() vs. col_()
parse_*() when the data are already in a character vector in R
col_*() when telling readr how to load the data
In R, x
x[x %% 4 == 0]
In R, name matrix m’s rows A, B, C, D?
rownames(m) <- c("A", "B", "C", "D")
dplyr command to collapse many values down to single summary?
summarize()
With dplyr, number of unique items?
n_distinct()
read_csv(“file.csv”, ? = F)
col_names
In R, rotate text 45 degrees in a plot?
arg srt = #
What are R’s 6 types of atomic vectors?
logical = T, F
integer = 1L, 2L, 3L
double (numeric) = 2.5, 4.5
character = "a", "1"
complex & raw, which are both rare
In R, how do you view loaded libraries and environments?
search()
ggplot(data = mpg, aes(x = displ, y = hwy)) +
-> geom_point(data = ?)
Only include subcompacts from class variable?
filter(mpg, class == "subcompact")
In R, sum x when x is less than 5?
sum(x[x<5])
In R, how do you save your existing history of commands to “fname”?
savehistory(file = "fname")
In R, cut(x, c(0, 2, 4, 6))?
Returns a factor of length(x) with levels (0,2], (2,4], (4,6]; for example, (2,4] means 2 < x <= 4
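A quick check, using made-up values, of which side of each interval is closed. By default (right = TRUE) the intervals are closed on the right, so 2 falls in the first bin and 4 in the second.

```r
x    <- c(1, 2, 3, 4, 5, 6)
bins <- cut(x, c(0, 2, 4, 6))

levels(bins)       # "(0,2]" "(2,4]" "(4,6]"
as.integer(bins)   # 1 1 2 2 3 3
```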
In R, add an arrow from (1,1) to (3,8)?
arrows(1, 1, 3, 8)
In R, return months from dates of class POSIXlt?
dates$mon
In R’s plot() or lines() function, what arguments sets line type?
lty
In R, return a current date/time?
Sys.time() or date()
In R’s plot, what argument for setting the scale for y?
ylim = c(0, 100) (example…)
In R, return the day of the month for POSIXlt formatted dates?
dates$mday
In R, output DF as “table.txt” that includes the names of rows and columns?
write.table(DF, “table.txt”, col.names = T, row.names = T)
In R, how do you find all objects that match “lm”?
apropos("lm")
In R, what function is useful for generating a palette in grey scale?
grey()
In R, capitalize all characters in a string?
toupper()
In an R plot, how do you add dots from additional data?
points()
In R’s hist(), set bin edges for count data with range 0:9 and width of 1?
breaks = (-0.5:9.5)
In R, what is the current value of the parameter ‘family’?
par("family")
In R, remove quotes around a string for printing?
noquote()
In R, take a bunch of DVs across columns and make it 1 long vector?
stack()
Reorder class based on hwy’s median?:
ggplot(mpg, aes(class, hwy)) +
-> geom_boxplot()?
ggplot(mpg) +
-> geom_boxplot(aes(reorder(class, hwy, FUN = median), hwy))
In R, given dates of class POSIXlt, return seconds?
dates$sec
In R, how you create a plot’s key?
legend()
In R, return dates day of the year?
dates$yday
In R, xv[which(abs(xv - 108) == min(abs(xv - 108)))]
Returns xv that is nearest 108
In R, iris[,5] is flower names. Return index of rows that contain names that include an "a"?
grep("a", iris[,5])
In R, what is ‘not’, ‘and’, and ‘or’ inside and outside an if operation?
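not = ! (in both cases)
and = & (element-wise, outside if) and && (single values, inside if)
or = | (element-wise, outside if) and || (single values, inside if)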
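A small sketch of the difference, using made-up logical vectors: the single-character operators work element by element, while the doubled operators compare single values, which is what if() expects.

```r
x <- c(TRUE, FALSE, TRUE)
y <- c(TRUE, TRUE, FALSE)

negated  <- !x       # FALSE TRUE FALSE
both     <- x & y    # TRUE FALSE FALSE  (element-wise and)
either   <- x | y    # TRUE TRUE TRUE    (element-wise or)

# Inside if(), use && or || on single values
if (x[1] && y[1]) {
  result <- "both true"
} else {
  result <- "not both"
}
```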
In R, set axis notation color and font?
col.axis, font.axis
In R, return a vector of the position of a matched pattern in the text where it exists and a -1 otherwise?
regexpr()
In R, view all available datasets included in installed packages?
data(package = .packages(all.available = TRUE))
In R, calculate the proportion of each item in a table based on the grand total?
prop.table(table), with no margin argument
For R, what is a for loop to print 1 to 5 one at a time?
for (i in 1:5){
->print(i)}
In R, quickly return a set of common statistics for obj?
summary(obj) or fivenum(obj)
In R, return probability that x is <=4 based on a normal distribution where mean = 5 and sd = 0.125?
pnorm(4, mean=5, sd = 0.125)
In R, find probability that -1 < t < 1.5, where t follows a t-distribution with 29 degrees of freedom?
pt(1.5, 29) - pt(-1, 29)
In R, return the sums of rows of M without using rowSums()?
apply(M, 1, sum)
In R, what function can add words to a graph based on x and y coordinates?
text()
Using R’s RColorBrewer package and the Set2 palette, create an 8-color palette?
brewer.pal(8, "Set2")
In R, turn a vector of positive and negative numbers into -1s, 0s, and 1?
sign()
In R, how do you restore a previously saved R file called “Fname”?
load(file = "Fname")
In R, what function is useful for printing a sentence as output?
paste()
In base R, read in “fname.txt”, which is a file that has columns separated by whitespace & a header line?
dat <- read.table("fname.txt", header = TRUE)
In R, if t
t2
In base R, return the positions of matched patterns in each string for all strings in S?
gregexpr(pattern, text)
In base R, what is a 1st step when doing date calculations?
Convert objects to POSIXlt
In R, tapply(temp, month, function(x) sqrt(var(x) / length(x)))?
Returns the standard error of temp for each month.
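A worked example with made-up temperatures and a two-level month factor; each group has variance 4 and 3 observations, so each standard error is sqrt(4 / 3).

```r
temp  <- c(10, 12, 14, 20, 22, 24)                              # made-up data
month <- factor(c("Jan", "Jan", "Jan", "Jul", "Jul", "Jul"))

# Standard error of the mean within each month
se <- tapply(temp, month, function(x) sqrt(var(x) / length(x)))
```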
In R, what happens to a vector of words in a data frame? How do you go back?
- coerced to factor
- as.character(factor)
In R, how do you get the modulo of 119/3 and how do you get the integer quotient?
1) 119 %% 3 = modulo (remainder)
2) 119 %/% 3 = integer quotient
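Checking the two operators against each other:

```r
remainder <- 119 %% 3    # 2
quotient  <- 119 %/% 3   # 39

# The two pieces rebuild the original number
quotient * 3 + remainder == 119   # TRUE
```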
In R, how do you generate n random numbers from a uniform distribution between 0 and 1?
runif(n)
In R, closest integer to x between x and 0?
trunc(x) (floor(x) gives the same result only when x >= 0)
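The rounding functions only disagree for negative numbers, as this made-up example shows:

```r
x <- c(2.7, -2.7)

toward_zero <- trunc(x)     #  2 -2
down        <- floor(x)     #  2 -3
up          <- ceiling(x)   #  3 -2
```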
In R, how do you see an example for the “lm” function?
example(lm)
In R, return the length of vector x?
length(x)
In R, anti log of x?
exp(x)
In R, see help pages for sum() function?
?sum
In R, return vector of ranks of values in x?
rank(x)
In R, sample variance of vector x?
var(x)
In R, how do you combine vector x with vector y?
c(x, y)
In R, vector of the product of all values of x up to that point?
cumprod(x)
In R, dat
tapply(dat$height, list(dat$gender, dat$race), mean)
Describe match(x, y)?
Returns y’s index numbers for each item of x that is in y
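For example, with made-up vectors (items of x that are not in y come back as NA):

```r
x <- c("b", "z", "a")
y <- c("a", "b", "c")

pos <- match(x, y)   # 2 NA 1
```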
With dplyr, return flights columns that have a title w/ a repeated character back to back?
select(flights, matches("(.)\\1"))
Five ways to subset a tibble?
1) .$name-vector
2) .[['name']]-vector
3) .[[position]]-vector
4) .['name']-tibble
5) .[position]-tibble
In R, set the size of the margin around the plot based on lines of text?
par(mar = c(bottom, left, top, right))
In R, sum each column of matrix x?
colSums(x)
In R, how do you remove the variable x?
rm(x)
In R, x
x >= 5
In R, how can you enter values one at a time from input?
scan()
In R, how do you view existing variables?
ls() or objects()
In R, how do you see a list of built in datasets?
data()
In R, smallest integer >= x?
ceiling(x)
In R, how do I return all of dat’s columns from row 4 or all of dat’s rows from column 10?
dat[4,]
dat[,10]
In R, return vector of the cumulative sum of x?
cumsum(x)
In R, round x to nearest integer?
round(x, digits = 0)
In R, assign dat to a file I choose, which is a csv with headers?
dat <- read.csv(file.choose(), header = TRUE)
In R, return the name of the day of the week for dates?
weekdays(dates)
In R, x
stem(x)
In R, how do you create a function that returns multiple variables?
Use return() with a list containing the variables to be returned
In R, prepare to plot 16 graphs, 2 in each row?
par(mfrow = c(8, 2))
In R, return a sequence of dates between 10/1/1997 and 10/1/2007, with a date every 3 months?
seq(as.POSIXlt("1997-10-01"), as.POSIXlt("2007-10-01"),
-> by = "3 months")
In R, make a bar graph of the categorical data day with a label “A” on x-axis, a title “Title”, and “B” on y-axis?
barplot(table(day), xlab = "A", ylab = "B", main = "Title")
In R, right justify text in a graph and then left justify it?
par(adj = 1)
par(adj = 0)
In R, correlation of vector x and y?
cor(x, y)
In R, return min of vector X up to each point in vector?
cummin(x)
In R, x
which(x<3)
In R, return vector from 5 to 25 that increases by 0.25?
seq(5, 25, 0.25)
In R, x
x[x<=50]
In R, how do you make it prompt you to press Enter before each subsequent graph?
par(ask = TRUE)
In R, return sorted version of x?
sort(x)
In R, what function provides info about how to cite R software?
citation()
In an R plot, how do you add stepped lines that connect points?
lines(x, y, type = "s")
S for up then over
s for over then up
In R, return any item in A or B?
union(A, B)
In R, return the positions in a vector of a matched pattern?
grep()
In R, what are four useful functions for rounding?
round()
ceiling()
floor()
trunc()
In R, which.max(x)?
Returns index of the maximum value of x.
In R, vector’s 3 common properties?
Type - typeof()
Length - length()
Attributes - attributes()
In R, how do you install and load a package?
install.packages("package")
library(package)
In R, is any value greater than 0 in X?
Are all values greater than 0 in X?
any(X>0)
all(X>0)
With R’s RColorBrewer package, create an 11-color palette (the maximum) with the “Spectral” colors?
brewer.pal(11, "Spectral")
In R’s plot(), do not include any axis?
axes = FALSE
In R, read in a csv file saved as “fname.csv”?
read.csv("fname.csv")
In R, suppress the creation of the y-axis?
yaxt = "n"
In R, given dates of class POSIXct, return the minutes object?
as.POSIXlt(dates)$min
In R, view the components of a list?
unlist(list)
In R, sort dataframe DF by the variable CARS?
DF[order(DF$CARS), ]
In R’s plot(), argument for the label on the x-axis?
xlab = "label"
In R:
union(A, B) vs intersect(A, B) vs setdiff(A,B)
union provides all items from A and B
intersect provides items that are in A and B
setdiff returns items in A that are not in B
In R, prepare to overlay existing plot with another plot?
par(new = TRUE)
In R, test whether 2 items/sets are equal?
setequal(a, b)
In R, set the font to serif for plotted text?
par(family = "serif")
In base R, return items that match a pattern?
grep(pattern, vector, value = T)
In ggplot, add labels?
labs()
In R, create a vector of A, B, C that each repeat 4 times?
gl(3, 4, labels = LETTERS[1:3])
In R, what is coplot()?
coplot(y~x|z) returns multiple scatter plots y vs x at various ranges of z.
In R plot, set title color and font?
col.main =
font.main =
In R’s plot what function is useful for drawing the area under the curve?
polygon()
In R’s plot, plot labels L using X and Y, centered on X, placed half a character below original points?
text(X, Y, labels = L, pos = 1, offset = 0.5)
pos = 1 places the labels below the points
In base R, how do you join strings?
paste()
In R’s lattice, draw a histogram for minTemp given month?
histogram(~minTemp | month)
month must be a factor…
In R, how can you point and click on the location you want a legend?
locator(1) as position argument
In R’s graphs change the box line color?
fg =
In R, what is the probability of A given B within sample space S?
(length(intersect(A, B)) / length(S)) /
(length(B) / length(S))
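A minimal sketch of the conditional probability, assuming a made-up sample space (one roll of a die) with equally likely outcomes; S, A, and B are illustrative values only. The parentheses around the denominator matter, because R evaluates a / b / c as (a / b) / c.

```r
S <- 1:6
A <- c(2, 4, 6)   # event: roll is even
B <- c(4, 5, 6)   # event: roll is greater than 3

p_a_and_b   <- length(intersect(A, B)) / length(S)   # 2/6
p_b         <- length(B) / length(S)                 # 3/6
p_a_given_b <- p_a_and_b / p_b                       # 2/3
```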
In R, what function is useful for creating file paths?
file.path()
In ggplot, describe the 7 parameters for making any plot using a generic example?
ggplot(data = DATA) +
GEOM_FUNCTION(aes(MAP), stat = STAT, position = POSITION)+
COORDINATE_FUNCTION +
FACET_FUNCTION
data, geom, map, statistic, position, coordinate, facet
In R, return the hour object for right now?
as.POSIXlt(Sys.time())$hour
In R, sort dataframe DF by Var1 in reverse order?
DF[rev(order(DF$Var1)),]
In base R, what function is useful for counting letters in a string?
nchar()
In R plot, set subtitle color and font?
col.sub =
font.sub =
In R, return dates day of week?
dates$wday
In R: what function is for applying functions to rows/columns of matrices or data frames?
apply()
In R: what function is for applying functions to vectors?
sapply()
In R: what function is for applying functions to lists?
lapply()
In R: what function is for applying functions to subsets of a vector grouped by a factor (such as a DF column)?
tapply()
In R’s plot(), set plot char size?
argument cex =
In base R, extract from STRING the characters from M to N?
substr(STRING, M, N)
In R, dates
strptime(dates, "%d/%m/%Y")
In R, reverse sort DF by factor Var1 and normal sort it by Var2?
DF[order(-rank(DF$Var1), DF$Var2), ]
With tidyr, merge table5’s century and year columns to make new_year column?
unite(table5, new_year, century, year)
In base R, join Dat1’s Var1 and Var2 with Dat2’s Name1 and Name2, including incomplete cases?
merge(Dat1, Dat2, by.x = c("Var1", "Var2"), by.y = c("Name1", "Name2"), all = TRUE)
tidyr::unite’s default sep?
_
tidyr function to replace missing values with last observation?
fill()
Make tidy with tidyr:
TABLEA
Country -> type -> count
x -> cases -> #
y -> cases -> #
z -> cases -> #
x -> pop -> #
y -> pop -> #
z -> pop -> #
TABLEA %>% spread(key = type, value = count)
With tidyr, 2 ways to set separate’s sep parameter?
1: regular expression
2: position (positive # counts from the far left, negative # from the far right)
With tidyr, combine multiple columns into a single column?
unite()
Default sep in tidyr’s separate function?
any non-alphanumeric character
tidyr verb to deal with observations scattered across rows
spread()
In R, how do you adjust the plotting region?
plt = c(x1, x2, y1, y2), given as fractions of the figure region
In R, how do you set the fill color for boxplots, histograms, etc?
col =
What tidyr verb to turn a variable spread across columns into a single column?
gather()
With tidyr, split a ‘rate’ column (from dat), x/y, into 2 columns?
separate(dat, rate, into = c("x", "y"))
tidyr’s function for making implicit missing values explicit?
complete()
Stocks has year, quarter, and return, use tidyr to check for missing values?
stocks %>% complete(year, quarter)
With tidyr, separate, pull, gather, and spread functions, re-evaluate column types?
convert = TRUE
In R’s plot(), set orientation of #s on tick marks?
the argument las =
x
str_sub(x, 1, 3)
3 ways to use ‘by’ in join operations from dplyr?
1) default, by = NULL, uses all variables that appear in both tables
2) character vector, by = "varname", uses the variable name specified from both tables
3) named character vector, by = c("a" = "b"), uses "a" from x and "b" from y
astr
“ater”
In R, return the proportion of items in each group organized and computed by column, using the matrix dat?
prop.table(dat, 2)
In R, how do you load the environment history saved in “fname”?
loadhistory(file = "fname")
dplyr version of:
merge(x, y, all.x = TRUE)
left_join(x, y)
In R, create Y that is a sorted version of x?
Y <- sort(x)
In R, how do you save existing environment objects to “fname”?
save.image(file = "fname")
In R, view history?
history(Inf)
With stringr, return the start and end of first match in x?
str_locate()
What is the explicit way to str_view(fruit, ‘nana’)?
str_view(fruit, regex('nana'))
With stringr, return boolean vector for string matches?
str_detect()
Two stringr functions to test regexp?
str_view()
str_view_all()
With str_extract_all(), return a matrix result?
simplify = TRUE
Using dplyr and stringr: return df$words where words is equal to “x$”
df %>% filter(str_detect(words, "x$"))
In R, given FACTOR and Y, create a plot that shows the value of Y for each case with FACTOR?
stripchart(Y~FACTOR)
Merge with dplyr:
flights %>%
-> _____(airlines, by = "carrier")
left_join
str_sub(“Apple”, -3, -1)
"ple"
In R, how do you save the object X to “fname”?
save(x, file = "fname")
To speed up stringr functions for simple searches, what do you replace regex() with?
fixed()
What are dplyr’s filtering joins?
semi_join(x, y): keeps all in x that have match in y
anti_join(x, y): drops all in x that have match in y
str_c("p", c("a", "b", "c"), "s")?
"pas" "pbs" "pcs"
x %>% full_join(y)?
Keeps all observations of x and y
In R v
matrix(R, nrow=3, byrow=TRUE)
stringr’s version of nchar()?
str_length()
with stringr, combine strings with no space?
str_c()
The default sep = “”
x %>% right_join(y)?
Keep all of y’s observations
With stringr, what types are possible with boundary()?
character
line
sentence
word
With stringr, identify the number of string matches?
str_count()
dplyr version of:
merge(x, y, all.y=TRUE)
right_join(x, y)
x %>% left_join(y)?
return all observations of x
ggplot(dat, aes(x)) + geom_bar()
forcats: Add a line to prevent dropping levels of x that have no values.
… + scale_x_discrete(drop = FALSE)
ggplot(relig, aes(tvhours, relig)) + geom_point()
forcats: Rewrite this to put relig in order of tvhours?
ggplot(relig, aes(tvhours, fct_reorder(relig, tvhours)))+
-> geom_point()
Using forcats, reorder FACTOR so that “Not Applicable” is the first category?
fct_relevel(FACTOR, "Not Applicable")
forcats function to make legend colors match order of plotted objects?
fct_reorder2()
gss_cat %>%
- > mutate(marital = marital …???
- > ggplot(aes(marital)) +
- > geom_bar()
Add forcats functions to get marital in order of increasing frequency on the plot
gss_cat %>%
- > mutate(marital = marital %>% fct_infreq() %>% fct_rev()) %>%
- > ggplot(aes(marital)) +
- > geom_bar()
forcats function to adjust a factor’s levels?
fct_recode()
forcats function to adjust a factor’s levels, while reducing the number of levels as well because you can pass a vector of levels for each new level?
fct_collapse()
forcats function to aggregate smaller factor levels into an “Other” category?
fct_lump()
lubridate function to get current date?
today()
lubridate function to get current date-time?
now()
lubridate function to create date from “2011-01-15”
ymd()
lubridate function to create date from “Jun 15 2011”
mdy()
lubridate function to create date from “15 April 2009”
dmy()
lubridate function to create date-time from “2011-01-15 20:11:19”
ymd_hms()
lubridate function to create date from month, day, and year spread across columns?
make_date(year, month, day)
lubridate function to create date-time from month, day, year, hour, min, second spread across columns?
make_datetime(year, month, day, hour, minute, second)
lubridate function to convert date to datetime?
as_datetime()
lubridate function to convert datetime to date?
as_date()
lubridate function to extract year from dt
year(dt)
lubridate function to extract the full month name from dt
month(dt, label = T, abbr = F)
lubridate function to extract the day of the month from dt
mday(dt)
lubridate function to extract day of the year from dt
yday(dt)
lubridate function to extract full day of the week name from dt
wday(dt, label = T, abbr = F)
lubridate function to extract hour from dt
hour(dt)
lubridate function to extract minute from dt
minute(dt)
lubridate function to extract second from dt
second(dt)
lubridate function to round dt down to the start of its week
floor_date(dt, "week")
lubridate function to round dt up to the next month boundary
ceiling_date(dt, "month")
dt
year(dt)
dt
update(dt, year = 2010, mday = 19)
my_age
as.duration(my_age)
my_age
my_age + dyears(2) + dweeks(7) + ddays(3)
my_age
my_age + years(2) + weeks(7) + days(3)
What is the difference between lubridate’s durations and periods?
durations use seconds and are exact, but can do unexpected things around daylight saving time
periods work with “human” times and aren’t exact, but do what you would expect around daylight saving time (for example)