R Flashcards
In R’s lattice, make panels show up top to bottom, left to right?
…, as.table = TRUE
dplyr version of:
merge(x, y, all.x=T, all.y = T)
full_join(x, y)
with stringr, return the 1st match for a regex?
str_extract(str, regex)
with stringr, replace each vowel in x with “-”?
str_replace_all(x, "[aeiou]", "-")
with stringr, replace 1 with one and 2 with two in x?
str_replace_all(x, c("1" = "one", "2" = "two"))
with stringr, return all matches in a string for a regex?
str_extract_all(x, regex)
dplyr version of:
merge(x, y)
inner_join(x, y)
In R’s plot, set number size at tick marks?
plot(…, cex.axis = number)
Rewrite with a join operation:
flights %>%
filter(dest %in% top_dest$dest)
flights %>%
semi_join(top_dest)
In R, after xaxt = “n”, add ticks for the years 2008 and 2016?
axis.Date(1,
  at = as.Date(c("2008-01-01", "2016-01-01")),
  labels = c("2008", "2016"))
In R, set the outer margin to leave 2 lines for text on top and add “Title” there?
par(oma = c(0, 0, 2, 0))
mtext("Title", outer = TRUE)
With stringr, turn NA into the string "NA"?
str_replace_na()
With stringr, turn myVec (a vector) into one long string with no spaces?
str_c(myVec, collapse = "")
In R’s plot(), set point type?
plot(…, pch = n), where n is 0 to 25 for the standard symbols (32 to 127 give ASCII characters)
In R, add a surrogate key to dat?
dat %>%
mutate(surrogate_key = row_number())
Confirm tailnum is the primary key in planes in R?
planes %>%
count(tailnum) %>%
filter(n > 1)
In base R, what function is critical to unique arrangements of plots?
layout()
In R, add a line for a linear model with y & x?
abline(lm(y ~ x))
In R’s plot(), set axis label size?
plot(…, cex.lab = #)
stringr’s function to filter string matches?
str_subset()
Make tidy with tidyr:
tablea
country  `99`  `00`  `01`
A        x     y     z
B        …
C
tablea %>%
gather(`99`:`01`, key = "year", value = "measure")
with stringr, return the 1st match in each sentence?
sentences %>%
_______("(a|the) ([^ ]+)")
str_match
In base R, calculate the mean of each variable in DAT at each level of FACTOR?
by(DAT, FACTOR, FUN = colMeans)
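A quick check of the by() card, using the built-in iris data as a stand-in for DAT and iris$Species as FACTOR (both are illustrative choices, not from the card). colMeans() is used because mean() on a whole data frame returns NA with a warning.

```r
# iris stands in for DAT and iris$Species for FACTOR (illustrative choice)
num_cols <- iris[, 1:4]                            # numeric columns only
res <- by(num_cols, iris$Species, FUN = colMeans)  # one vector of means per level
res[[1]]                                           # means for the first level, "setosa"

# The same numbers collected into one data frame
aggregate(num_cols, by = list(Species = iris$Species), FUN = mean)
```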
in R, add “label” on the right side of an existing plot in the outer margin?
mtext("label", 4, outer = TRUE)
In R, create a function that returns hello or goodbye based on user’s choice?
myFunc
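One possible way to write such a function, as a hypothetical sketch: the argument name arriving and the TRUE/FALSE convention are assumptions, not the card's original code.

```r
# Hypothetical sketch: 'arriving' is an assumed argument name.
# TRUE (the default) returns "hello"; anything else returns "goodbye".
myFunc <- function(arriving = TRUE) {
  if (isTRUE(arriving)) "hello" else "goodbye"
}

myFunc()        # "hello"
myFunc(FALSE)   # "goodbye"
```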
In ggplot, how do you zoom in to the range 0-50 on the y-axis?
… +
coord_cartesian(ylim = c(0, 50))
In R, plot x as an overlay to an existing plot in the top right corner?
par(fig = c(0.5, 1, 0.5, 1), new = TRUE)
plot(x)
In R, how do you review this layout?
nf <- layout(…)
layout.show(nf)
In R, how can you define a number of regions within the current device that can be treated as separate graphics devices?
split.screen()
Why manually call regex() in stringr functions?
To set its arguments, which include ignore_case, multiline, comments and dotall
In R, create box plot from Y and FACTOR with notches?
plot(FACTOR, Y, notch = TRUE)
In R, transfer data from R to Excel?
1) write.table(data, "clipboard", sep = "\t", col.names = NA)   (the "clipboard" file works on Windows)
2) paste in Excel
In R, set orientation of #s on tick marks to always horizontal?
par(las = 1)
In R, remove white space before and after string?
trimws(string)
In R, sum of each row of matrix X?
rowSums(X)
In R, get names in a factor variable?
levels(factor)
In R, get # of names in factor variable?
nlevels(factor)
In R, reorder names in a factor variable?
factor(factor, levels = c("name1", "name2"))
In R, turn factor names into integers?
as.vector(unclass(factor))
In R, set # of digits to 5 for any output?
options(“digits” = 5)
xyplot(root~week | plant):
add a line for a regression?
xyplot(root ~ week | plant,
  panel = function(x, y, ...) {
    panel.xyplot(x, y, ...)
    panel.abline(lm(y ~ x))
  })
In R, generate a q-q plot?
qqnorm()
In R, given events A and B and sample space S, calculate the probability of at least one of A or B occurring?
length(union(A, B)) / length(S)
Base R, return the elements of X that are NA?
X[is.na(X)]
in R, reverse order of vector Y?
rev(Y)
In R, log to base n of x?
log(x, n)
In R, return the running (cumulative) maximum of vector x?
cummax(x)
In R, what is the square root of x?
sqrt(x)
In R, difftime() vs. as.difftime()?
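difftime() computes the time between two dates or date-times (units picked automatically or set with units =), while as.difftime() builds a time-span object directly from numbers or time strings, with no dates involved.
In R, calculate the 25th percentile of x?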
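A small illustration of the contrast; the dates and the 90-minute span are example values.

```r
# difftime(): time between two dates (time1 minus time2)
d <- difftime(as.Date("2016-01-01"), as.Date("2008-01-01"), units = "days")
as.numeric(d)                          # 2922 (8 years plus 2 leap days)

# as.difftime(): build a time span directly, no dates involved
span <- as.difftime(90, units = "mins")
as.numeric(span, units = "hours")      # 1.5
```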
quantile(x, 0.25)
In R, generate a vector of four 1s, four 2s, and so on up to 10?
rep(1:10, each = 4)
In R, calculate the probability of A & B occurring within sample space S?
length(intersect(A, B)) / length(S)
In R, what function is equivalent to IF() formula in excel?
ifelse()
In R, sample 5 items from vector Keys, allowing the same value to be drawn more than once?
sample(Keys, 5, replace = T)
Return df’s column B?
df$B
In R, name m’s columns AA, BB, CC?
colnames(m) <- c("AA", "BB", "CC")
In R, probability of only A, not B within sample space S?
length(setdiff(A, B)) / length(S)
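The three set-based probability cards can be checked together on a toy sample space. This sketch assumes equally likely outcomes (one roll of a fair die), which is what dividing counts by length(S) relies on.

```r
# Sample space: one roll of a fair die (equally likely outcomes assumed)
S <- 1:6
A <- c(2, 4, 6)   # even number
B <- c(4, 5, 6)   # greater than 3

p_union     <- length(union(A, B)) / length(S)      # A or B: {2, 4, 5, 6}
p_intersect <- length(intersect(A, B)) / length(S)  # A and B: {4, 6}
p_only_A    <- length(setdiff(A, B)) / length(S)    # A but not B: {2}

# Inclusion-exclusion check: P(A or B) = P(A) + P(B) - P(A and B)
all.equal(p_union, length(A) / length(S) + length(B) / length(S) - p_intersect)
```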
In R, return a histogram of vector x and then add dashed density lines?
hist(x, freq = FALSE)
lines(density(x), lty = "dashed")
In R, return the dimensions of vector v?
length(v); dim(v) is NULL for a plain vector (dim() applies to matrices, arrays and data frames)
In R, return dat with each column’s median subtracted?
sweep(dat, 2, apply(dat, 2, median))
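A small worked case of the sweep() card, assuming dat is all-numeric; a tiny matrix stands in for it here.

```r
# A small numeric matrix standing in for dat (assumed all-numeric)
dat <- cbind(a = c(1, 2, 3), b = c(10, 20, 30))
med <- apply(dat, 2, median)     # column medians: a = 2, b = 20
centered <- sweep(dat, 2, med)   # default FUN = "-" subtracts each column's median
centered                         # column a becomes -1, 0, 1; column b becomes -10, 0, 10
```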
In R, test for normality of dat & describe null hypothesis?
shapiro.test(dat)
Null hypothesis: the data come from a normal distribution
In R, X is weights and Y is heights. Create a scatter plot with X and Y labels and filled in dots?
plot(X, Y, xlab = "weight", ylab = "height", pch = 16)
Calculate the sum of the rows in m, by group?
rowsum(m, group)
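rowsum() (lowercase, not rowSums()) adds up rows that share a group label. The matrix and the grouping vector below are example values.

```r
m <- matrix(1:6, ncol = 2)     # rows: (1, 4), (2, 5), (3, 6)
group <- c("a", "b", "a")      # example grouping of the three rows
totals <- rowsum(m, group)     # one output row per group, sorted by group name
totals                         # "a" = (1 + 3, 4 + 6) = (4, 10); "b" = (2, 5)
```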
In R, what is the working directory?
getwd()
In R, how do you create a list of words out of a string?
strsplit(str, " ")
In R, get the product of all values in vector X?
prod(X)
In R, remove rows containing NAs from dat?
newDat <- na.omit(dat)
In R, x <- …; return the value(s) of x closest to 50?
x[which(abs(x - 50) == min(abs(x - 50)))]
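A worked case with made-up data, showing why the which(... == min(...)) form is used: it keeps every tied value, whereas which.min() returns only the first.

```r
x <- c(12, 47, 55, 53, 90)             # example data (assumed)
gap <- abs(x - 50)                     # distances from 50: 38 3 5 3 40
closest <- x[which(gap == min(gap))]   # 47 53: all ties are kept
first_only <- x[which.min(gap)]        # 47: only the first closest value
```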
In R, return just the means of Dat[,c(1, 2, 3)] by variables var5 and var6?
aggregate(Dat[, c(1, 2, 3)], by = list(Dat$var5, Dat$var6), mean)
In R, how do you see data available in the loaded package “UsingR”?
data(package = "UsingR")
In R, is it daylight savings right now?
as.POSIXlt(Sys.time())$isdst
In R’s legend, how do you set the fill color for symbols?
pt.bg = …
In R, set tick marks to be on the inside by the default length?
tcl = 0.5
When using dplyr’s arrange(), where do missing values end up?
the end
dates
strptime(dates, "%d%b%y")
With dplyr, return all columns from flights except year through day?
select(flights, -(year:day))
With dplyr, return columns from flights with “ijk” in the name?
select(flights, contains("ijk"))
dplyr command to reorder the rows?
arrange()
With dplyr, assign new names to specific columns while returning all columns?
rename()
With dplyr, put flights in descending order by distance?
arrange(flights, desc(distance))
With dplyr, put flights data in order by year, month and then day?
arrange(flights, year, month, day)
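In R, find the chi-square critical value for alpha = 0.05 (upper tail), where x follows a chi-square distribution with 12 degrees of freedom?
qchisq(0.05, 12, lower.tail = FALSE)
lower.tail = TRUE is the default, so lower.tail = FALSE is needed for the upper tail; qchisq(0.95, 12) gives the same value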
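A quick check that the upper-tail and lower-tail forms agree and that pchisq() inverts the result.

```r
crit <- qchisq(0.05, df = 12, lower.tail = FALSE)   # upper-tail critical value
crit                                                # about 21.03
all.equal(crit, qchisq(0.95, df = 12))              # same point via the lower tail
pchisq(crit, df = 12, lower.tail = FALSE)           # recovers 0.05
```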
In R, command to find which package qplot is in?
find("qplot")
In R, what function is useful for mathematical notation inside plotting functions?
expression()
In R’s lattice package, create a scatter plot for weight vs age given gender?
xyplot(weight ~ age | gender)
In ggplot, what grammar does size, shape, color and x/y locations relate to?
aesthetics… aes()
In ggplot, 2 ways to facet?
facet_wrap()
facet_grid()
What argument to jitter dots in geom_point?
position = "jitter"
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()
vs.
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))
Same graph, top uses global mapping and bottom uses local mapping
… + facet_grid(drv ~ .)
facets the plot by drv into rows (panels stacked top to bottom)
ggplot(data = diamonds) +
  geom_bar(aes(x = cut, fill = clarity))
1) stacked bar chart?
2) 100% stacked bar chart?
3) Grouped bar chart?
position = …
1) no argument (default "stack")
2) "fill"
3) "dodge"
To plot hwy ~ displ from mpg:
ggplot(data = mpg) +
  geom_point(? 1 = ? 2(x = displ, y = hwy))
? 1 mapping
? 2 aes
In R’s plot(), arguments for no tick marks and no numbers?
xaxt = "n", yaxt = "n"
In R, return items exclusive to A as compared to B?
setdiff(A, B)
In R, create a sample of 1000 families with 3 children and probability of 0, 1, 2, 3 boys as equal to 1/8, 3/8, 3/8, 1/8?
sample(0:3, size = 1000, prob = c(1/8, 3/8, 3/8, 1/8), replace = TRUE)
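A reproducible version of the simulation (the seed is arbitrary). The probabilities 1/8, 3/8, 3/8, 1/8 are exactly Binomial(3, 0.5), assuming each child is a boy with probability 1/2, so rbinom() describes the same model.

```r
set.seed(1)   # arbitrary seed for reproducibility
boys <- sample(0:3, size = 1000, replace = TRUE, prob = c(1/8, 3/8, 3/8, 1/8))
table(boys) / 1000                  # observed proportions, near 1/8, 3/8, 3/8, 1/8

# Same model expressed as a binomial distribution
dbinom(0:3, size = 3, prob = 0.5)   # 0.125 0.375 0.375 0.125
boys2 <- rbinom(1000, size = 3, prob = 0.5)
```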
In R, transpose matrix m?
t(m)
In R, show all possible scatterplot dot types?
plot(0:25, pch = 0:25)
In R, 4 useful functions for the normal distribution (standard normal by default)?
pnorm(): cumulative probability (CDF)
dnorm(): probability density
qnorm(): quantile function
rnorm(): random #s from distribution
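A short tour of the four functions with their default mean = 0 and sd = 1; the seed and the evaluation points are arbitrary.

```r
pnorm(1.96)       # P(Z <= 1.96), about 0.975
qnorm(0.975)      # inverse of pnorm(): about 1.96
dnorm(0)          # density at 0, equal to 1 / sqrt(2 * pi), about 0.399
set.seed(42)      # arbitrary seed
z <- rnorm(5)     # five random draws from N(0, 1)
```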
In R, return Test’s attributes?
attributes(Test)
In R, how do you time a function?
system.time(functionName())
In R, create a QQ plot with a diagonal line for dat?
qqnorm(dat)
qqline(dat)
In R, check if file “fname.txt” exists in working directory?
file.exists("fname.txt")
A
lapply(list.object, length)
In R, for a data frame DF, what does typeof(DF) return?
“list”
In R’s data.frame(), suppress factor creation?
stringsAsFactors = FALSE (the default since R 4.0.0)
In R, mean of each row of matrix X?
rowMeans(X)
In R, what are attributes?
metadata about objects
In R’s data frame df, what does length(df) return?
the same as ncol(df)
In R, set x’s “cust_attr” attribute?
attr(x, "cust_attr") <- value
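Setting versus getting a custom attribute, with an example value. The last line also shows that ordinary subsetting drops custom attributes.

```r
x <- 1:3
attr(x, "cust_attr") <- "measured in cm"   # set (example value)
attr(x, "cust_attr")                       # get it back
attributes(x)                              # list of all attributes on x
attr(x[1:2], "cust_attr")                  # NULL: [ drops custom attributes
```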
In R, what function is useful for running random #s through a formula?
replicate()
remake this card…
In R, draw a 3-D perspective plot from vectors x and y and a matrix z of heights?
persp(x, y, z)
What 3 attributes stay with modified objects?
Names - names()
Dimensions - dim()
Class - class()
In R, describe a bootstrap test for a mean?
1) Resample the observed data with replacement many times and store each resample’s mean -> x.bar
2) p.val <- length(x.bar[which(x.bar > testVal)]) / length(x.bar)
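A runnable sketch of the two steps, assuming a small made-up sample dat, a hypothesized mean testVal = 5.5 and B = 2000 resamples. The proportion of resampled means above testVal is roughly a one-sided p-value for the alternative that the true mean is below testVal.

```r
set.seed(2024)                        # arbitrary seed
dat <- c(4.1, 5.3, 4.8, 5.9, 4.4, 5.0, 4.7, 5.2, 4.6, 4.9)  # made-up sample
testVal <- 5.5                        # hypothesized mean (assumed)
B <- 2000                             # number of bootstrap resamples

# 1) Means of resamples drawn with replacement from the observed data
x.bar <- replicate(B, mean(sample(dat, replace = TRUE)))

# 2) Share of bootstrap means above testVal
p.val <- length(x.bar[which(x.bar > testVal)]) / length(x.bar)
p.val                                 # same as mean(x.bar > testVal)
```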
In R, given vectors A and B and a vectorized function F of 2 arguments, create an array of dimensions (length(A), length(B)) whose [i, j] cell is F(A[i], B[j])?
outer(A, B, F)
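A small example of outer(). The function is named f rather than F here to avoid masking R's F (FALSE) shortcut. The second half shows that a function written only for scalars must be wrapped with Vectorize() first, because outer() calls it once on whole vectors.

```r
A <- 1:3
B <- 1:2
f <- function(a, b) a^b          # vectorized: works element-wise on whole vectors
res <- outer(A, B, f)            # res[i, j] is f(A[i], B[j]); dim(res) is c(3, 2)

# A scalar-only function (because of if) needs Vectorize() first
g <- function(a, b) if (a > b) a else b
res2 <- outer(A, B, Vectorize(g))   # here equivalent to outer(A, B, pmax)
```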
In R, how do I print vector Y without printing missing values?
Y[!is.na(Y)]
not_cancelled %>%
  count(dest)
A table showing count of not cancelled by dest
What 2 R packages are useful for larger, interactive heatmaps?
1) d3heatmap
2) heatmaply
In an R plot, set the color and font of the x & y labels?
col.lab =
font.lab =
In R, make a scatterplot matrix of the data in obj?
pairs(obj)
dplyr’s measures of position:
1) x[1]
2) x[2]
3) x[length(x)]
1) first(x)
2) nth(x, 2)
3) last(x)
Geom for a tile plot?
geom_tile()
In R, calculate the mean of X without NAs?
mean(X, na.rm = TRUE)
Rather than filtering out messy data, another–perhaps better–route?
Make the values missing
In R, find F value for alpha = 0.05 in the lower tail, where x follows f-dist and df1 = 5, df2 = 15?
qf(0.05, 5, 15)
In base R, x is a vector of age data. Create a histogram with an x-label, a title, and bins 20 years wide. Then add a density line to the histogram?
hist(x, xlab = "Age", main = "title", breaks = seq(0, max(x) + 20, by = 20), freq = FALSE)
lines(density(x))
In RStudio, start a new script?
ctrl - shift - n
ggplot histogram?
geom_histogram()
Convert data frame to tibble?
as_tibble()
ggplot frequency polygon?
geom_freqpoly()
not_cancelled %>%
  count(tailnum, wt = distance)
A table showing miles flown by each tailnum among not_cancelled
With dplyr, return dat’s columns var1, var2, var3?
select(dat, num_range("var", 1:3))
A function useful for 5<=x<=10?
between(x, 5, 10)
In base R, change the color of the axes?
par(fg = )
In R, return items in both A and B?
intersect(A, B)
ggplot’s bar graph?
geom_bar()
readr’s parsing functions when the data are already read into R?
parse_*()
delays ?1
  filter(n > 25) ?2
  ggplot(aes(x = n, y = delay)) ?3
    geom_point(alpha = 1/10)
1) %>%
2) %>%
3) +
Create tibble from individual vectors?
tibble()
In readr, read in txt.csv and identify the comment lines as starting with #?
read_csv("txt.csv", comment = "#")
in readr functions, don’t read first 5 lines?
skip = 5
In R, return 50th percentile of x?
median(x)
What ggplot function is critical for horizontal bar chart?
coord_flip()
In dplyr, how do you remove grouping?
ungroup()
In RStudio, send previously sent chunk from editor?
ctrl - shift - p
What do geom_bin2d() and geom_hex() do?
Divides coordinate plane into 2d bins and uses fill color to show density
ggplot(smaller, aes(carat, price)) +
  geom_boxplot(aes(group = ???(carat, 0.1)))
cut_width
tidyverse package for querying databases?
DBI
tidyverse package to read in SPSS, Stata, or SAS files?
haven
Using geom_point(), add transparency?
alpha = ..
readr function to guess a file’s encoding?
guess_encoding()
What are the main differences between data.frame and tibble?
1) printing
2) subsetting
print(flights, 1? = 10, 2? = 3?)
1=argument for number of rows
2=argument for number of columns
3=argument for all columns
1) n
2) width
3) Inf
For a density plot (frequency polygon) in ggplot:
ggplot(data = diamonds, aes(x = price, y = 1?)) +
  2?(aes(color = cut), binwidth = 500)
1) ..density..
2) geom_freqpoly
In R, return the mean of each column in matrix x?
colMeans(x)
In R, create a boxplot of age data in x?
boxplot(x)
In R, return the sum of columns in matrix m without colSums()?
apply(m, 2, sum)
In R, test if x is TRUE?
isTRUE(x)
In dplyr, return flights where month equals the last 6 months of the year?
filter(flights, month %in% 7:12)
dplyr command to pick variables by name?
select()
dplyr command to operate on data group-by-group?
group_by()
RStudio shortcut for the assignment operator <- ?
alt + -
dplyr command to pick observations by value?
filter()
In R, view list of functions and data in package “spatial”?
library(help = spatial)
With dplyr, return flights where month equals 1 and day equals 1?
filter(flights, month == 1, day ==1)
dplyr command to create new variables with functions of existing variables?
mutate()
What happens to NA values using dplyr’s filter()?
filter() keeps only rows where the condition is TRUE; rows where it is FALSE or NA are dropped
With dplyr, return year, month, and day from flights?
select(flights, year, month, day)
With dplyr, return flight’s column time_hour and then all other columns?
select(flights, time_hour, everything())
In R, what argument is used for removing borders?
bty = "n"
box type: "n" draws no box
In R, what are 2 ways to show overlapping dots in a scatterplot?
1) jitter(x) or jitter(y) or both
2) sunflowerplot(x, y)
In R, what is the default graphics window size?
7 inches by 7 inches
In R, what does which(requests %in% stock) return?
The indices (positions) of the items in requests that match an item in stock
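A small illustration with made-up vectors: %in% gives one TRUE/FALSE per element of requests, and which() turns that into positions.

```r
requests <- c("apple", "kiwi", "pear", "fig")   # example data
stock    <- c("pear", "apple", "plum")

requests %in% stock               # TRUE FALSE TRUE FALSE
which(requests %in% stock)        # 1 3: positions within requests
requests[requests %in% stock]     # "apple" "pear"
```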
In R, set the line thickness?
lwd = #
Line width…
In R, what does this do?
peas[1:length(peas) %% 2 ==0]
Returns the elements of peas at even positions (2nd, 4th, …)
In R, pmin(x, y, z)?
Returns the element-wise (parallel) minimum: for each position i, the smallest of x[i], y[i] and z[i]
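Contrasting pmin() with min() on three example vectors of equal length.

```r
x <- c(1, 5, 3)
y <- c(4, 2, 6)
z <- c(0, 7, 2)
pmin(x, y, z)   # 0 2 2: smallest value at each position
min(x, y, z)    # 0: one overall minimum
```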
In R, what is the difference between unique() and duplicated()?
unique() returns the distinct values (first occurrences), while duplicated() returns a logical vector that is TRUE wherever an element repeats an earlier one
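The two functions side by side on an example vector; negating duplicated() reproduces unique().

```r
v <- c(3, 1, 3, 2, 1)
unique(v)           # 3 1 2
duplicated(v)       # FALSE FALSE TRUE FALSE TRUE
v[!duplicated(v)]   # 3 1 2, the same as unique(v)
```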