R1 Flashcards
assign the variable “h” the value 2
h <- 2
r workspace
The place where variables and other information are stored in R.
list all variables in workspace
ls()
remove variable named “a”
rm(a)
clear workspace
rm( list = ls() )
code to multiply 3 and 5
3 * 5
code to calculate 2 to the power of 5
2 ^ 5
code to calculate 28 modulo 6
28 %% 6
WRITE R CODE TO:
Combine the variables MY_APPLES and MY_ORANGES into a new variable MY_FRUIT, which is the total amount of fruits in your fruit basket
MY_FRUIT <- MY_APPLES + MY_ORANGES
what is the result of ls() on an empty work space?
character(0)
what is the r variable for pi?
pi
remove variables ‘p’ and ‘q’ from the workspace
rm(p, q)
what are R’s fundamental data types called?
atomic vectors
3 ways to determine an object's type?
typeof() - the internal type of an R object
class() - the class used by object-oriented programming in R
mode() - the mode of an object
3 booleans in R
TRUE or T, FALSE or F, NA
class( 2L) returns?
integer
class(2) returns
numeric
is.numeric(2)
TRUE
is.integer(2)
FALSE
class(“string of stuff”)
character
6 Basic Atomic Data Types in R:
logical, integer, double, complex, character, raw
define a vector in R
a vector is an INDEXED SET of values that are all of the same type
in R, data elements are ____, not scalar
vectors
what is the rule for types in vectors?
can only contain one type, can’t mix types
what is the process where a variable’s type is changed?
coercion
coerce logical TRUE to numeric
as.numeric(TRUE)
result of as.numeric(TRUE)
1
result of as.numeric(FALSE)
0
coerce 4 to a character
as.character(4)
can “hello” be coerced into a numeric?
No, as.numeric("Hello") returns NA (with a warning)
Can the character string: “4.5” be coerced into a numeric?
Yes. as.numeric(“4.5”) returns 4.5
as.integer(“4.5”) returns 4
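The coercion cards above can be checked in one short session. This is only a minimal sketch; the variable names are made up for illustration.

```r
# Coercion between atomic types (variable names are illustrative only)
from_logical <- as.numeric(TRUE)    # logical -> numeric
from_number  <- as.character(4)     # numeric -> character
from_string  <- as.numeric("4.5")   # numeric-looking string -> numeric
truncated    <- as.integer("4.5")   # the fractional part is dropped

# A string that is not a number becomes NA (R also emits a warning)
not_a_number <- suppressWarnings(as.numeric("hello"))

print(c(from_logical, from_string, truncated))
print(from_number)
print(is.na(not_a_number))
```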
which two of the following variables are logical values? TRUE; “hello”; 2L, NA
TRUE, NA
what data type is 4.5
numeric
what data type is 4L?
integer
what is the result of 5 + “five”?
ERROR. non-numeric argument to binary operator
check that 3 is a numeric value and return as boolean
is.numeric(3)
Convert the value in var1 to character and store in variable “var1_car”
var1_car <- as.character(var1)
Convert var2 to a logical: var2_log
var2_log <- as.logical(var2)
inspect the class of var2_log
class(var2_log)
coerce var3 to a numeric: var3_num
var3_num <- as.numeric(var3)
what function is used to create a vector?
c() the c function
name the vector CARDS using the vector SUITS
names(CARDS) <- SUITS
create a vector containing 3 ages and assign each value to a persons name
people
my_apples
is.vector(my_apples)
TRUE
length(my_apples)
1
How are computations on vectors performed?
element-wise
earnings
[1] 20 60 -50
earnings
[1] 50 200 90
earnings
[1] 50 50 0
earnings expenses
[1] TRUE TRUE FALSE
# Casino winnings poker
x roulette_vector
stuff
stuff[ 1]
people
people [‘sarah’]
what does ‘recycling’ mean in R?
if an operation combines two vectors of different lengths, R repeats the contents of the shorter vector until it has the same length as the longer one
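A small sketch of recycling, assuming two made-up vectors whose lengths are 6 and 2. When the longer length is not a multiple of the shorter one, R still recycles but adds a warning.

```r
# Recycling: the shorter vector is repeated to match the longer one
long_vec  <- c(1, 2, 3, 4, 5, 6)
short_vec <- c(10, 20)

# short_vec is used as 10, 20, 10, 20, 10, 20
recycled_sum <- long_vec + short_vec
print(recycled_sum)
```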
earnings
earnings[‘Monday’]
or
earnings[1]
earnings
earnings[c(2, 4)]
earnings
earnings[2:4]
earnings
earnings[c(‘Monday’, ‘Wednesday’)]
earnings
earnings > 0
earnings
earnings[ c( profitable )]
profit
2
print the number of profitable days
earnings 0
sum( profitable)
print the sum of profitable days
earnings 0
sum( earnings[c( profitable )] )
When using the minus operator for subsetting a named vector, you can subset by:
index
earnings
earnings[-1]
earnings
earnings[-c(1,3)]
e
Monday Wednesday Friday Sunday
5 -1 2 2
What does this do:
assign(“x”, c(10.4, 5.6, 3.1, 6.4, 21.7))
creates x; equivalent to x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
Does this cause an error?
c(10.4, 5.6, 3.1, 6.4, 21.7) -> x
No, the assignment arrow works in both directions (<- and ->)
c(10.4, 5.6, 3.1, 6.4, 21.7) -> x
create a vector y that contains all x elements, a zero, and then all x elements again
y <- c(x, 0, x)
create a matrix containing values one to six in two rows
matrix(1:6, nrow= 2)
matrix(1:6, nrow= 2)
matrix(1:6, nrow= 2, byrow = TRUE)
What is the difference between these two matrices?
the first fills values down the column and the second fills across the row, left to right
> matrix(1:6, nrow= 2)
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
How does R fill up this 2x3 matrix with 3 values?
matrix(1:3, nrow= 2, ncol = 3)
It recycles, looping through 1-3 twice
> matrix(1:3, nrow= 2, ncol = 3)
[,1] [,2] [,3]
[1,] 1 3 2
[2,] 2 1 3
What is the output of:
matrix(1:3, nrow= 2, ncol = 2)
> matrix(1:3, nrow= 2, ncol = 2)
[,1] [,2]
[1,] 1 3
[2,] 2 1
Warning message: ...not a sub-multiple or multiple of the number of rows [2]
What is the output of row bind:
rbind(1:3, 1:3)
> rbind(1:3, 1:3)
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 1 2 3
What is the output of column bind:
cbind(1:3, 1:3)
> cbind(1:3, 1:3)
[,1] [,2]
[1,] 1 1
[2,] 2 2
[3,] 3 3
> m
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
add row with 7, 8, and 9 to m
m <- rbind(m, 7:9)
> m
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
> m
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
add a column with 10 and 11
m <- cbind(m, c(10, 11))
> m
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
gives the rows names
rownames(m)
> m
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
Give names to the columns in m
colnames(m) <- c("col1", "col2", "col3")
> m
col1 col2 col3
[1,] 1 2 3
[2,] 4 5 6
what function can be used to name both rows and columns at the same time?
dimnames
create a matrix for numbers 1 to 6,
with 2 rows
and during creation, name the columns and rows
m <- matrix(1:6, nrow = 2, dimnames = list(c("row1", "row2"), c("col1", "col2", "col3")))
> m
col1 col2 col3
row1 1 3 5
row2 2 4 6
What happens if a matrix of numbers and a matrix of characters are bound together using rbind or cbind?
coercion
numbers to characters
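A quick check of that coercion, using two small made-up matrices (the names num_mat and char_mat are illustrative):

```r
# Binding a numeric matrix to a character matrix coerces everything to character
num_mat  <- matrix(1:4, nrow = 2)
char_mat <- matrix(c("a", "b", "c", "d"), nrow = 2)

combined <- rbind(num_mat, char_mat)

print(typeof(combined))  # the numbers are now stored as strings
print(combined)
```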
What data structure(s) in R can contain different types of elements?
dataframe
list
# Star Wars box office in millions (!) box
star_wars_matrix
# Star Wars box office in millions (!) new_hope
> star_wars_matrix star_wars_matrix [,1] [,2] new_hope 460.998 314.4 empire_strikes 290.475 247.9 return_jedi 309.306 165.8
# Star Wars box office in millions (!) new_hope
rownames(star_wars_matrix)
Configure these elements in the correct order to produce pseudocode for using dimnames: = ( , ) ) dimnames list row col
dimnames = list ( row, col ) )
studentID
g
In a matrix of student grades across multiple tests, where each student is a row.
What function will give the students total points?
rowSums()
sums across rows
stems ‘, ‘>’, ‘>’)
leftHeads and one with arrows pointing left
rightArrows
For a matrix with columns of exams scores, a row for each student.
What function would easily calculate the total points scored for each test?
colSums ( )
* note capital letter
What happens when a data sequence that is too short is used to fill up a matrix in R?
R will fill up the matrix column by column and repeat the data sequence.
What are 3 functions that can be used to make a matrix?
rbind ( )
cbind ( )
matrix ( )
What are the TWO advantages of using the function cbind() and rbind() over the function matrix() when creating matrices?
You don’t need to pass it an input vector explicitly that is then converted to a matrix.
You don’t have to explicitly state the way in which the matrix has to be filled.
produce a matrix with three rows containing 12 random numeric elements between 1 and 15
m <- matrix(runif(12, min = 1, max = 15), nrow = 3)
how do you arrange rows and columns to subset matrix m?
m[ ?? , ??]
m [row, column]
select all elements from matrix m in row three
m[ 3, ]
what data type is returned from:
matrix [ ,3]
what does it contain?
vector with all elements from column 3
what does
matrix [ 4 ] return?
the fourth element in the matrix counting from upper left down each column
> m
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 5 6 7 8
[3,] 1 2 3 4
[4,] 5 6 7 8
m[2, c(2,3)]
What is returned?
vector of 6 and 7
> m
o1 e1 o2 e2
[1,] 1 2 3 4
[2,] 5 6 7 8
[3,] 1 2 3 4
[4,] 5 6 7 8
Can you subset the upper left element?
m[1, ‘o1’ ]
> matrix US_revenue non_us new hope 460.998 314.4 empire strikes 290.475 247.9 returnJedi 309.306 165.8
return the average non_us
mean (matrix [ ,2] )
> matrix US_revenue non_us new hope 460.998 314.4 empire strikes 290.475 247.9 returnJedi 309.306 165.8
Subset all data from “A New Hope” and “Return of the Jedi”
m[ c(1, 3) , ]
What does this output
matrix[c(FALSE, TRUE, TRUE), c(TRUE, TRUE)]
the last two rows for both columns
What does this return for a 3 row by 2 col matrix?
matrix[ c(FALSE, TRUE, TRUE), ]
The last two rows for both columns
Function to take the sum of each column and store it in a vector
colSums()
function to take the sum of each row and store it in a vector
rowSums()
> m
[,1] [,2]
a 2 2
b 2 2
> m * 2
What is the output
> m * 2
[,1] [,2]
a 4 4
b 4 4
> m
[,1] [,2]
a 2 2
b 2 2
> m-1
What is the output
> m-1
[,1] [,2]
a 1 1
b 1 1
> m
[,1] [,2]
a 2 2
b 3 3
> mm
[,1] [,2]
[1,] 1 1
[2,] 1 1
what is m + mm
> m + mm
[,1] [,2]
a 3 3
b 4 4
element wise addition
convert
blood
blood_factor
What does factor () function do:
- scans for categories
- sorts the levels alphabetically and stores them
- converts the character vector, to a vector of integer values. Integer values map to displayed character values.
what happens if the str function is called on a factor variable?
str ( factor_variable)
shows the number of levels, character displays, and mapped integer values
R's default order for factor levels is:
alphabetical
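A short sketch of the default ordering next to an explicitly ordered factor. The sizes vector is made up for illustration.

```r
# Made-up example data
sizes <- c("medium", "small", "large", "small")

# Default: levels are sorted alphabetically
default_factor <- factor(sizes)
print(levels(default_factor))

# Explicit level order plus ordered = TRUE allows comparisons
ordered_factor <- factor(sizes,
                         levels = c("small", "medium", "large"),
                         ordered = TRUE)
medium_beats_small <- ordered_factor[1] > ordered_factor[2]
print(medium_beats_small)
```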
blood
blood_factor2
tshirt
tshirt_factor22 creates a factor variable with 3 levels, it will have the correct 2
tshirt_factor23 specifies the labels for the 3 factors IN THE WRONG ORDER, therefore it will WRONGLY show 2 large shirts
tshirt
No, you must add ordered = TRUE
tshirt_factor
tshirt_factor
TRUE
What is the output of c( “RecordName”, 100, 5)
[1] "RecordName" "100" "5"
R performed coercion to create vector with a single datatype
Save "song" with the name 'Song', 100 with the name 'hundred', and 5 with the name 'rand'
to a single data structure without coercion.
list(Song = “song”, hundred =100, rand =5)
how do you add names to elements in a list?
names(listName) <- namesVector
Display structure of a list
str ( listName)
What data type is returned:
list [ 1 ]
subsetting a list using single brackets returns a list
what data type is returned:
x
> y [1] "x" > typeof( y) [1] "character" >
x
> x [[ 1]]
[1] “x”
character x
x
> x [[ ‘var1’]]
[1] “x”
character x
x
Error in x[[c(“var1”, “var2”)]] : subscript out of bounds
because double brackets means - return single element from a list
x
> x[ c(‘var1’, ‘var2’)]
$var1
[1] “x”
$var2
[1] 2
> str(x2)
List of 2
$ var1: chr "var1"
$ var4:List of 1
..$ var3: chr "var3"
select var3
x2[[2]][[1]]
> str(x2) List of 2 $ var1: num 1 $ var4:List of 2 ..$ var3: num 3 ..$ var4: num 4 select var3
x2[[ 2 ]] [[ 1 ]]
> str(x2) List of 2 $ var1: num 1 $ var5:List of 2 ..$ var3: num 3 ..$ var4: num 4 subset var5 to a list
x2[[ 2 ]] or x2[ 2 ] or x2$var5
rule of thumb for difference between single and double square brackets for lists?
[ ]
[[ ]]
double brackets [[ to select element
single brackets [ for sublist
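The rule of thumb can be verified on a small list. The list below is a made-up stand-in for the x used in the cards above.

```r
# Hypothetical list used only for illustration
x <- list(var1 = "x", var2 = 2)

sub_list <- x[1]         # single brackets: a list of length 1
element  <- x[[1]]       # double brackets: the element itself
by_name  <- x[["var2"]]  # double brackets also accept a name

print(class(sub_list))
print(class(element))
print(by_name)
```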
> str(shining_list) List of 3 $ title : chr ... $ actors : chr [1:5] ... $ reviews: Ord.factor...
return actors
shining_list$actors
> str(shining_list) List of 3 $ title : chr ... $ actors : chr [1:5] ... $ reviews: Ord.factor...
List containing title and reviews
shining_list[c(‘title’,’reviews’)]
> str(shining_list) List of 3 $ title : chr ... $ actors : chr [1:5] ... $ reviews: Ord.factor... select the last actor
shining_list[["actors"]][5]
* note use of double and single brackets for chaining selections
x1
> x3
x2 l
[1] “1” “2” “string”
make l a list rather than a vector to avoid coercion
l
What function prints the first observations of a dataset?
head ()
What function prints the last observations of a dataset?
tail ()
What function prints the dimensions of a dataset?
dim()
What function shows the structure of a dataset or list?
str ()
Encode type as a factor: type_factor
type
type_factor
planets_df
planets_df
What is a dataframe “under the hood”?
a list containing same-length vectors
The dataframe planets_df exists and you want to rename its two columns to ‘name’ and ‘distance’
names(planets_df) <- c("name", "distance")
change a column in a dataframe to a factor
df$colName <- as.factor(df$colName)
change a column in a dataframe to character
df$colName <- as.character(df$colName)
create a sequence of 10 letters
LETTERS[seq( from = 1, to = 10 )]
create a vector of TRUE and FALSE randomly
bool1
c1
identical(c1, c2)
c1
false
Check that c2 is an accurate recode of c1
c1
> table(c1, c2) c2 c1 2 dog 2 1 0 dog 0 1
student1
be sure that name is set to a factor
all variables must be named
student1
> class
name age gpa
1 sarah 26 4
2 jill 21 1
order class dataframe by gpa, lowest first
ranks <- order(class$gpa)
class[ranks, ]
> class
name age gpa
1 sarah 26 1
2 jill 21 4
order the dataset by gpa, highest first
ranks <- order(class$gpa, decreasing = TRUE)
class[ranks, ]
What part of a dataframe does this return:
my_df[1,2]
first row, second column
select rows 1, 2, and 3 and
columns 2, 3, and 4 from my_df
# rows 1, 2 and 3
# columns 2, 3 and 4
my_df[1:3, 2:4]
select the first row from my_df
# Entire first row
my_df[1, ]
> df students grades control 1 E 79 TRUE 2 R 96 FALSE 3 Z 75 TRUE
subset control dataframe
# boolean vector controlT
> df students grades control 1 E 79 TRUE 2 R 96 FALSE 3 Z 75 TRUE
subset grades over 75
subset(df, subset = grades>75)
OR
logical <- df$grades > 75
df[logical, ]
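Both routes can be compared on the small data frame shown in the card. This is a sketch; the names over_75, by_index and by_subset are chosen here for illustration.

```r
# The data frame from the card above
df <- data.frame(
  students = c("E", "R", "Z"),
  grades   = c(79, 96, 75),
  control  = c(TRUE, FALSE, TRUE)
)

# Route 1: build a logical vector, then index the rows with it
over_75  <- df$grades > 75
by_index <- df[over_75, ]

# Route 2: let subset() evaluate the condition inside the data frame
by_subset <- subset(df, subset = grades > 75)

print(by_index)
```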
my_df[["new_column"]] <- my_vec
adds a column named "new_column" to my_df containing the values of my_vec
use cbind to add a column to a dataframe
my_df <- cbind(my_df, new_column = my_vec)
a
order(a)
OR
rank(a)
a
a[order(a)]
What function is described:
visualizes the distribution of your data by placing all values in bins and displaying the bin frequencies
hist ()
create a histogram of variable x with 10 columns
hist(x, breaks = 10)
df: $ rating : num $ votes : int $ runtime: int $ genre # Create a boxplot of the runtime variable
boxplot( df$runtime )
df: $ rating : num $ votes : int $ runtime: int $ genre # Subset rating, votes and runtime and plot all 3
toPlot
Create a pie chart from movies$genre
pietable
view 10
the result is:
> view > 10
[1] TRUE FALSE FALSE
views checks
> views > checks
[1] TRUE TRUE TRUE
How does this expression evaluate?
“Rchitect” == “rchitect”
FALSE
R is case sensitive
How does this expression evaluate?
TRUE == 1
TRUE
because TRUE coerces to 1 under the hood
How does this expression evaluate?
“dog”
FALSE
R determines the greater than relationship based on alphabetical order.
How does this expression evaluate?
“raining”
TRUE
> linkedin > 15
[1] TRUE FALSE FALSE FALSE FALSE TRUE FALSE
views
what happens with the double &&
> c(TRUE, TRUE, FALSE) && c(TRUE, FALSE, FALSE)
[1] TRUE
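only evaluates the first element of each vector (in older R versions; R 4.3.0 and later raise an error when an operand has length greater than one)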
14
x
!!FALSE
evaluates to:
FALSE
Count the number of TRUES:
> extremes
[1] TRUE FALSE FALSE FALSE FALSE FALSE TRUE TRUE
> sum(extremes)
[1] 3
if (condition) {
expr1
}
insert an else that evaluates "expr2"
if (condition) {
  expr1
} else {
  expr2
}
* the else keyword must come on the same line as the closing bracket of the if part!
speed
print(paste("Your speed is", speed))
speed 30) { # stop the while loop when speed exceeds 80 }
speed 30) { if (speed > 80){ break ## break } } }
# The linkedin vector has already been defined for you linkedin
# Loop version 2
for (i in 1:length(linkedin)) {
  print(paste(linkedin[i], i))
}
# The linkedin vector has already been defined for you linkedin
# Loop version 1
for (l in linkedin) {
  print(l)
}
primes_vec
primes_vec[4] # equals 7
primes_list[[4]]
for (i in 1: length(nyc)){
print (is.list(nyc [i] )) }
for (i in 1: length(nyc)){
print (is.list(nyc [[i]] )) }
if applied to a list, what do these print out?
with single brackets it prints TRUE for each element, because [ returns a list
with double brackets it prints FALSE for each element, because [[ returns the element itself (here a vector)
write a nested for loop to print out:
“On row i and column j, matrix contains x”
# define the double for loop
for (i in 1:nrow(matrix)) {
  for (j in 1:ncol(matrix)) {
    # paste() already separates its arguments with a single space
    print(paste("On row", i, "and column", j, "matrix contains", matrix[i, j]))
  }
}
what word exits a loop
break
what word skips the remainder of the code in the loop, but continues the iteration.
next
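A minimal sketch showing both keywords in one loop. The loop bounds and the visited vector are made up for illustration.

```r
# Collect odd numbers, stopping once a value above 7 is reached
visited <- c()
for (i in 1:10) {
  if (i %% 2 == 0) next   # skip the rest of this iteration for even numbers
  if (i > 7) break        # leave the loop entirely
  visited <- c(visited, i)
}
print(visited)
```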
If a vector value is
for (e in matrix) {
if (e
what pseudocode would count all the uses of R or r up until the letter ‘u’?
quote
set rcount
ask if ‘u’ and ‘u’ are the same
ask if ‘u’ and ‘u’ are not the same
‘u’ == ‘u’
‘u’ != ‘u’
!’u’ == ‘u’
Print “this is” and the value of x
print ( paste ( “this is “, x) )
the documentation for sd() is "sd(x, na.rm = FALSE)". What is the difference between calling sd(x = vector, na.rm = FALSE) and sd(vector)?
sd(x = vector, na.rm = FALSE)
matches both arguments by name
sd(vector)
matches the vector to x by position and uses the default na.rm = FALSE, so both calls return the same result
values
NA
because na.rm is FALSE so sd did not remove the missing values.
rather sd( values, na.rm = TRUE)
values
sd (values, TRUE)
or
sd (values, na.rm = TRUE)
WRONG - sd(values), gives NA
get arguments for a function
args ( )
get help documentation for a function
? function
?? function
help (function)
What is the … in r method definitions?
e.g. mean (x, …)
what is its purpose?
the ellipsis
a way for R to pass arguments to or from other methods without the function having to name them explicitly.
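A sketch of the ellipsis in use. rounded_mean() is a hypothetical wrapper written for this card, not a base R function; it forwards any extra arguments to mean().

```r
# Hypothetical wrapper: extra arguments travel through ... to mean()
rounded_mean <- function(x, digits = 1, ...) {
  round(mean(x, ...), digits = digits)
}

values <- c(1, 2, 3, NA)

without_na_rm <- rounded_mean(values)                # na.rm keeps its default
with_na_rm    <- rounded_mean(values, na.rm = TRUE)  # na.rm is passed on to mean()

print(without_na_rm)
print(with_na_rm)
```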
mean(x, trim = 0, na.rm = FALSE, …)
in the above definition, trim and na.rm are __________ arguments because they have default values
optional
Calculate the mean of the element-wise sum of the vector linkedin
mean ( linkedin )
- note mean ( sum (linkedin) )
would take the mean of one value
Calculate the mean of the element-wise sum of linkedin and facebook
mean ( (linkedin + facebook) )
remove missing values from a vector
vector <- vector[!is.na(vector)]
triple
Yes, r automatically returns the last value
triple
the character “dog”, the last value referenced
triple
y, the last value referenced
triple
triple
Is the return() in a function similar to:
- break in a for loop
- next in a for loop
- break in a for loop
the function stops evaluating, and ignores the rest of the fxn, and returns
Create a function pow_two(): it takes one argument and returns that number squared (that number times itself).
pow_two <- function(x) {
  x * x
}
create a function sum_abs(), that takes two arguments and returns the sum of the absolute values of both arguments.
# Create a function sum_abs()
sum_abs <- function(x, y) {
  abs(x) + abs(y)
}
what can be done to make a function return nothing?
return ( NULL )
Develop a new function, my_filter(), that takes a single argument and that simply returns the function’s input if it’s positive. If it’s negative, have my_filter() return NULL.
my_filter <- function(x) {
  if (x > 0) {
    return(x)
  }
  return(NULL)
}
print(paste("h", "i"))
prints:
[1] “hi”
or
[1] “h i”
> print(paste(“h”, “i”))
[1] “h i”
change sep to nothing
> print(paste(“h”, “i”, sep = “”))
[1] “hi”
triple
5,
the value of a was not changed,
a
what is the implication of r passing variables “by value” to functions?
If R were to pass variables “by reference”, changes in the function would change value of variable.
However, R passes "by value", so the R objects you pass to a function can never change unless you do an explicit assignment.
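A short demonstration of the point. triple_in_place() is a made-up function that reassigns its argument internally.

```r
# The assignment inside the function only changes the local copy
triple_in_place <- function(x) {
  x <- 3 * x
  x
}

a <- 5
result <- triple_in_place(a)

print(a)       # unchanged by the call
print(result)  # the tripled value

# Only an explicit assignment changes a
a <- triple_in_place(a)
```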
sample
> sample(v)
[1] 1 2 3
what returns?
sample 2){
return (x)}}
sample(c( 1,3,3))
sample(c( 3,1,1))
> sample(c( 1,3,3)) Warning... > sample(c( 3,1,1)) [1] 3 1 1 Warning .. #Only the first element is checked by if
sumOver10 10){
sum
load package ggvis
library( “ggvis” )
list packages loaded
search()
What are two ways to load packages?
How does each respond when asked to load a package that is not installed?
library ()
error message
require()
gives warning message
returns FALSE
how do I get a package not on my computer?
how do I import a package to the session?
install.packages ( )
packages loaded with library() or require() are attached to the search list and available in the current session
library(ggvis)
library(“ggvis”)
Are these two correct?
yes
foo
> library(foo,character.only=TRUE)
library(foo,character)
Error in library(foo, character) : there is no package called ‘foo’
words
v
words
words
correctly arrange:
lapply
data
function
… ( additional args e.g. parameters)
lapply (data, function, …)
how can you change a list to a vector?
unlist ()
pioneers
strsplit(pioneers,”:” )
> str(split_math)
List of 4
$ : chr [1:2] "GAUSS" "1777"
$ : chr [1:2] "BAYES" "1702"
# Convert to lowercase strings
split_low <- lapply(split_math, tolower)
> str(split_low)
List of 4
$ : chr [1:2] "gauss" "1777"
$ : chr [1:2] "bayes" "1702"
create a function and use lapply to create a list of only the names
select_first <- function(x) {
  x[1]
}
lapply(split_low, select_first)
Functions in R are o_ _ _ _ _ _ _
This means that they aren’t automatically bound to a name.
objects
# Named function triple
# Anonymous function with same implementation function(x) { 3*x }
# Anonymous function function(x) { 3*x }
use lapply to perform above function on list(1,2,3)
lapply(list(1,2,3), function(x) { 3*x })
select_el
lapply(S, select_el, index = 1)
lapply() always returns a:
list
A pre-defined function returns all NULLs when used in lapply but works fine when called WITHOUT assignment.
What could be the cause?
May use invisible() behind the scenes, which returns an invisible copy of the return value, NULL in this case.
sapply is short for
simplify apply
under the hood, sapply() calls ___________() and then uses ___________() to convert the list output to an array.
lapply()
simplify2array()
sapply’s USE.NAMES parameter is set to:
TRUE
FALSE
TRUE
Calculates the average of the min and max of a vector: extremes_avg
can use *apply
extremes
Create a function that takes a vector and returns all values below zero
below_zero <- function(x) {
  x[x < 0]
}
sapply and lapply will give the same output when…
the applied function returns vectors of different lengths, so sapply() cannot simplify the result
Will sapply() simplify a list of NULL’s?
No,
because the ‘vector-version’ of a list of NULL’s would simply be a NULL, which is no longer a vector with the same length as the input.
sapply(list(runif (10), runif (10)), function(x) c(min = min(x), mean = mean(x), max = max(x)))
This code generates a matrix with __ rows and ___ columns.
3 rows and 2 columns.
How does the length of the input list for lapply relate to the length of the output list?
same length
What is the danger of sapply?
simplifies output to array or returns the same list as lapply
dangerous because output type depends on specifics of input
what makes vapply different than sapply or lapply?
must specify output data format
List of 2
$ : num [1:5] 3 7 9 6 -1
$ : num [1:5] 6 9 12 13 5
basics
vapply ( temp, basics, numeric(3))
basics
basics
What’s an easy way to fix the FUN.VALUE arg input in vapply if it is wrong?
Read the error message:
>
Error: values must be length 3,
but FUN(X[[1]]) result is length 4
Why is vapply() considered a more robust version of sapply(),
because you explicitly restrict the output of the function you want to apply
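A sketch of that restriction, using the two temperature vectors from the earlier card. The body of basics() is an assumption, since its definition is not shown in these cards.

```r
temp <- list(c(3, 7, 9, 6, -1), c(6, 9, 12, 13, 5))

# Assumed definition: returns three named numbers per vector
basics <- function(x) {
  c(min = min(x), mean = mean(x), max = max(x))
}

# FUN.VALUE promises three numerics per element, which matches basics()
summary_mat <- vapply(temp, basics, numeric(3))
print(summary_mat)

# A wrong promise fails with an error instead of silently changing shape
mismatch <- tryCatch(
  vapply(temp, basics, numeric(2)),
  error = function(e) "error"
)
print(mismatch)
```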
strsplit(string, “”)
returns what TYPE of output?
list
unlist (output)
Sort the vectors inside a list alphabetically
hint: use *apply and a function
abcSort
round these numbers. what is the output:
n
round(n)
1 7 5 3
create a vector that has the even numbers 2 through 8
seq(2,8, by = 2)
print(“hello”)
use a built-in R function to print hello twice
print(rep("hello", times = 2))
c
sort(c)
v
> rep(v, times= 2)
[1] 2 1 4 2 1 4
rep(v, each = 2)
[1] 2 2 1 1 4 4
v
append(v, 4)
v
append(v, 4)
s
v
x
sum( abs ( round (x) ) )
What function:
Generate sequences, by specifying the from, to and by arguments.
seq()
Replicate elements of vectors and lists.
rep()
Arrange a vector in ascending order. Works on numerics, but also on character strings and logicals.
sort()
Reverse the elements in a data structure for which reversal is defined.
rev()
Display the structure of any R object.
str()
Merge vectors or lists.
append()
Convert an R object from one class to another.
as.*()
Flatten (possibly embedded) lists to produce a vector.
unlist()
fix this sort function so it returns the highest results first:
sort (vec )
sort(vec, decreasing = TRUE)
List of 3
$ : num 1.1
$ : num 3
$ : num 5
sum this list named “x”
sum( unlist( x) )
Create a sequence that ranges from 1 to 500 in increments of 3. Assign the resulting vector to a variable seq1.
seq1
find indices that are true from a logical vector
which ( )
what is the difference between
sub() and gsub ()
The g stands for global, as in replace globally (all)
sub will only replace the first instance of a match in a string
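A one-string comparison of the two functions. The input word is made up for illustration.

```r
word <- "banana"

first_only  <- sub("a", "o", word)   # replaces only the first match
all_matches <- gsub("a", "o", word)  # replaces every match

print(first_only)
print(all_matches)
```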
list
list [index ]
what arguments are necessary to search a string vector for ‘s’ with grep or grepl
grep ( x = vectorName, pattern = ‘s’)
ADD MORE regex
xx
assign today’s date to the variable “today”
today <- Sys.Date()
What is the syntax difference below:
Sys.Date()
Sys.time()
Date has a capital and time is lower case
make a date object out of:
“2016-01-21”
as.Date(“2016-01-21” )
What does this do:
my_date
by default as.Date() expects year-month-day, so a date string in another order throws an error (or is misread).
The format argument states the order explicitly, so the alternate order can be used.
What is an easy method to find the difference between two dates objects?
subtract them!
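A small sketch of date subtraction. The second date is made up for illustration; the result is a difftime object that can be converted to a plain number.

```r
start_date <- as.Date("2016-01-21")
end_date   <- as.Date("2016-02-01")  # illustrative second date

day_gap <- end_date - start_date     # a difftime object measured in days
print(day_gap)
print(as.numeric(day_gap))
```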
what function changes data.frame into a data table?
what package must be loaded?
tbl_df() (deprecated in recent dplyr versions; as_tibble() is the current replacement)
dplyr
What is the class of a data table?
it is a data frame class so it has all the functionality of a data frame AND more
What function in dplyr is a little like str() applied to a data frame,
but tries to show you as much data as possible?
glimpse ( )
print the possible values of a factor variable, all unique values present in that variable
unique( v )
OR
just print out the variable and it will show the levels (or call levels(v))
Instead of data types R has…
data objects
How do you get the name of the current working directory in R?
getwd()
How is R used in logistic regression?
Logistic regression models the probability of a binary response variable. In R, the function glm() with family = binomial is used to fit a logistic regression.
How do you access the element in the 2nd column and 4th row of a matrix named M?
M[4,2]
What is recycling of elements in a vector?
When two vectors of different length are involved in a operation then the elements of the shorter vector are reused to complete the operation.
Can we update and delete any of the elements in a list?
Yes. Any element can be updated by assigning a new value to it, and any element (not only the last) can be deleted by assigning NULL to it.
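A quick check of updating and deleting list elements, using a made-up list:

```r
# Made-up list for illustration
items <- list(a = 1, b = "two", c = 3)

items[["b"]] <- "TWO"  # update an element in place
items[["a"]] <- NULL   # delete an element that is not at the end

print(names(items))
print(length(items))
```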
Give the general expression to create a matrix in R.
The general expression to create a matrix in R is - matrix(data, nrow, ncol, byrow, dimnames)
What is the output of runif(4)?
It generates 4 random numbers between 0 and 1.
What is expected from running the command - strsplit(x,”e”)?
It splits the strings in vector x into substrings at the position of letter e.
Give an R script to extract all the unique words in uppercase from the string - "The quick brown fox jumps over the lazy dog".
x <- "The quick brown fox jumps over the lazy dog"
unique(toupper(unlist(strsplit(x, " "))))
> str(x)
List of 1
$ : int [1:4] 5 6 7 8
return the first element of x
> str(x[1])
List of 1
$ : int [1:4] 5 6 7 8
> str(x[[1]])
int [1:4] 5 6 7 8
> str(x[[1]][1])
int 5
X is the vector c(5,9.2,3,8.51,NA), What is the output of mean(x)?
NA
How do you convert the data in a JSON file to a data frame?
Parse the file first with a JSON reader such as jsonlite::fromJSON(), then convert the result with as.data.frame()
Give a function in R that replaces all missing values of a vector x with the sum of elements of that vector?
function(x) { x[is.na(x)] <- sum(x, na.rm = TRUE); x }
Is an array a matrix or a matrix an array?
Every matrix can be called an array but not the reverse. Matrix is always two dimensional but array can be of any dimension.
How to find the help page on missing values?
?NA
How do you get the standard deviation for a vector x?
sd(x, na.rm=TRUE)
How do you set the path for current working directory in R?
setwd(“Path”)
What does max.col(x) do?
Finds the column position of the maximum value in each row of a matrix.
How do you remove a vector from the R workspace?
rm(x)
List the data sets available in package “MASS”
data(package = “MASS”)
What is the use of the command - install.packages(file.choose(), repos=NULL)?
It is used to install an R package from a local directory by browsing and selecting the file.
Give the command to check if the element 15 is present in vector x.
15 %in% x
What is the difference between subset() function and sample() function in R?
The subset() functions is used to select variables and observations. The sample() function is used to choose a random sample of size n from a dataset.
check if all values in a matrix are not equal to NA
missing
What is the use of “next” statement in R?
The “next” statement in R programming language is useful when we want to skip the current iteration of a loop without terminating it.
> sample 2){
+ return(x) } }
v
MUST VECTORIZE IF sample 0, return ("neg"), return(1)) } OR lapply(v, sample)
tV
it is unchanged so 1,2 because the newly appended vector was not saved to the tV variable
tV
append(tV, 3)
tV[3]
To make R treat these values as nominal variables instead of numbers, you should use what function? T
v2
What function can be used on a vector to change numeric values or ordinal values?
factor(v, ordered = TRUE, levels = c("Low", "Medium", "High"))
Quasi-experimental design means
independent variables that cannot be randomly assigned - e.g. sex, medical condition
call the method describe from the psych package on data.frame df
psych::describe(df)
sapply(mydata, mean)
what should be added to the above call for a real data set?
na.rm=TRUE
phrase to remember skew direction:
“the skew is where there’s ____”
“the skew is where there’s few”
Ways distributions can be not normal:
bi-modal (two peaks - two groups sampled)
pos. skew
neg. skew
platykurtic (flat)
leptokurtic (spike at mean)
function to create z-scores
scale()
scale(x, center = TRUE, scale = TRUE)
does what?
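With both arguments TRUE, scale() subtracts the mean and divides by the standard deviation, which gives z-scores. A minimal sketch comparing it with the manual formula, using a made-up vector:

```r
x <- c(2, 4, 6, 8)  # made-up data

# scale() returns a one-column matrix; as.vector() drops the matrix attributes
z_scaled <- as.vector(scale(x, center = TRUE, scale = TRUE))

# The same z-scores computed by hand
z_manual <- (x - mean(x)) / sd(x)

print(z_scaled)
print(all.equal(z_scaled, z_manual))
```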
par(mfrow = c(1,2))
plot two side by side
the peak or highest point of a histogram is the value for what measurement of central tendency?
Mode
a distribution with extreme scores is best described using what measure of central tendency?
Median - less biased by extreme scores than mean
what measure of central tendency can be used to describe nominal data?
mode - most frequently occurring
command to find packages attached, data sets, and much much more..
sessionInfo()
create a vector x that contains a regular sequence of length 100 between -4 and 4.
x <- seq(-4, 4, length.out = 100)
Create a subset for the dataframe x for when column “level” is 1 (experimental )
new <- subset(x, level == 1)
add a column that is twice the value of another column in the dataframe
transform(df, newCol = oldCol * 2)
for(i in 1:nrow(df)){
if (df$column[i] == x) {
df$column2[i]
df$column2[df$column1 == x]
how to import a single function from a library
import::from(libraryName, functionName)
Open data so that it can be inspected in another tab
View(data)
- capital V
see the names for all variables
labels (data )
From a large dataset, grab only the variables/columns ‘a’ and ‘b’
cols