W2: Intro to R and RStudio Flashcards
< OR %l%
Less than
<= OR %le%
Less than or equal to
%gl%
Greater than AND less than
%gel%
Greater than or equal to AND less than
%gle%
Greater than AND less than or equal to
%gele%
Greater than or equal to AND less than or equal to
%!in% OR %nin%
not in
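A minimal sketch of what these operators test, written with base R comparisons so it runs without extra packages. The %gl%-style operators are assumed here to come from the extraoperators package; the vector x and the bounds used are made up for illustration.

```r
x <- c(1, 2, 3, 4, 5, 6)

# Base R equivalent of "greater than AND less than" (strictly between 2 and 5)
between_open <- x > 2 & x < 5

# Base R equivalent of "greater than or equal to AND less than or equal to"
between_closed <- x >= 2 & x <= 5

# Base R equivalent of "not in"
not_in <- !(x %in% c(1, 6))

x[between_open]    # 3 4
x[between_closed]  # 2 3 4 5
x[not_in]          # 2 3 4 5
```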
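.N
Number of rows (the sample size); with by, the number of rows in each group
d[UserID != 56 & NA <= 4]
Exclude UserID 56 and keep only observations where the variable NA is at or below 4 (NA is a reserved word in R, so a column with that name must be written in backticks)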
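A short sketch of this kind of row filter together with .N, assuming the data.table package is installed. The data are invented, and the column NegAff is a hypothetical stand-in for the column called NA on the card.

```r
library(data.table)

# Hypothetical data for illustration only
d <- data.table(
  UserID = c(55, 56, 57, 58),
  NegAff = c(3, 2, 5, 4))

# Drop user 56 and keep rows where NegAff is at or below 4
kept <- d[UserID != 56 & NegAff <= 4]
kept$UserID   # 55 58

# .N counts rows: 4 in the full data, 2 after filtering
d[, .N]
kept[, .N]
```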
What are the 4 data types using class()?
Logical, integer, numeric, character
What is logical data?
TRUE (1) or FALSE (0)
What is integer type data?
Whole numbers (positive or negative), e.g., -1, 0, 1, 2
What is numeric type data?
Real numbers (whole, decimals, fractions)
What is character data?
Text data, including numbers stored as strings
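The four types can be checked directly with class(). This is a small illustrative sketch; the values are arbitrary.

```r
class(TRUE)    # "logical"
class(1L)      # "integer" (the L suffix asks for an integer)
class(1.5)     # "numeric"
class("1.5")   # "character": a number stored as text

# Logical values behave as 1 and 0 in arithmetic
n_true <- sum(c(TRUE, FALSE, TRUE))
n_true         # 2
```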
What does this represent: d[i, j, by]?
i = rows, j = columns, by = grouping variable
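A minimal sketch of the three slots working together, assuming the data.table package is installed. The table and its column names are hypothetical.

```r
library(data.table)

# Hypothetical data for illustration only
d <- data.table(
  Group = c("a", "a", "b", "b"),
  Age   = c(19, 22, 18, 25),
  Sleep = c(8, 6, 7, 5))

# i: rows with Age above 18; j: mean of Sleep; by: one result per Group
res <- d[Age > 18, .(MeanSleep = mean(Sleep)), by = Group]
res
```

Group a keeps both of its rows (mean 7), while group b keeps only the row with Age 25 (mean 5).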
%Y-%m-%d
2019-03-12 (4-digit year)
%d/%m/%y
12/03/19 (2-digit year)
%Y-%b-%d
2019-Mar-12 (abbreviated month name)
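A brief sketch of these format strings in use with as.Date() and format(). The month abbreviation produced by %b depends on the locale, so the last line is only certain to print "Mar" in an English locale.

```r
# Parse text into Date objects using the matching format string
d1 <- as.Date("2019-03-12", format = "%Y-%m-%d")
d2 <- as.Date("12/03/19",   format = "%d/%m/%y")

d1 == d2                # TRUE: both are 12 March 2019

# Print a Date in another layout
format(d1, "%d/%m/%y")  # "12/03/19"
format(d1, "%Y-%b-%d")  # abbreviated month name (locale dependent)
```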
What does using factor() need?
levels = c(1, 0, 2) and
labels = c("dog", "cat", "rabbit")
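A minimal sketch of how levels and labels pair up by position. The vector pet_code is invented for illustration.

```r
# Hypothetical numeric codes for a pet variable
pet_code <- c(1, 0, 2, 1)

# Each level is matched to the label in the same position:
# 1 -> "dog", 0 -> "cat", 2 -> "rabbit"
pet <- factor(pet_code,
  levels = c(1, 0, 2),
  labels = c("dog", "cat", "rabbit"))

as.character(pet)  # "dog" "cat" "rabbit" "dog"
levels(pet)        # "dog" "cat" "rabbit"
```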
Name the join and argument used for:
Data with only rows present in both x and y
Natural Join, all = FALSE
Name the join and argument used for:
Data with all rows in x and y
Full Outer Join, all = TRUE
Name the join and argument used for:
Data with all rows in x
Left Outer Join, all.x = TRUE
Name the join and argument used for:
Data with all rows in y
Right Outer Join, all.y = TRUE
Which join / merge will have most rows?
Full Outer Join
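The four joins can be compared side by side with merge(). This sketch uses base data frames and made-up data with no duplicated IDs, so the row counts are easy to check by hand.

```r
# Hypothetical data: IDs 2 and 3 appear in both tables
x <- data.frame(ID = c(1, 2, 3), Age   = c(19, 18, 20))
y <- data.frame(ID = c(2, 3, 4), Sleep = c(8, 6, 7))

inner <- merge(x, y, by = "ID", all = FALSE)   # rows in both x and y
full  <- merge(x, y, by = "ID", all = TRUE)    # all rows from x and y
left  <- merge(x, y, by = "ID", all.x = TRUE)  # all rows from x
right <- merge(x, y, by = "ID", all.y = TRUE)  # all rows from y

c(nrow(inner), nrow(full), nrow(left), nrow(right))  # 2 4 3 3
```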
What do you need to check for before merging if grouping by ID?
If there are duplicates
What does anyDuplicated( ) do?
Returns the index of the first duplicated entry, or 0 if there are no duplicates
What does unique(x) %in% unique(y) do?
Checks which unique IDs from dataset x also appear in dataset y (returns TRUE/FALSE for each ID; wrap it in sum() or table() to count them)
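A quick sketch of both checks on two made-up ID vectors.

```r
# Hypothetical ID columns from two datasets
x_id <- c(1, 2, 2, 3)
y_id <- c(2, 3, 4)

anyDuplicated(x_id)  # 3: the third element is the first repeat
anyDuplicated(y_id)  # 0: no duplicates

# For each unique ID in x, is it also in y?
in_y <- unique(x_id) %in% unique(y_id)
in_y       # FALSE TRUE TRUE
sum(in_y)  # 2 IDs from x are also in y
```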
When is it necessary to reshape data to long format?
For repeated measures (RM), longitudinal, or panel data
What arguments are required when using reshape() to long format?
Each ID will have multiple rows.
varying = list(stress = c("stress1", "stress2"))
v.names = "Stress"
timevar = "weeks"
times = c(0, 6, 24), which become the values of the timevar variable
idvar = "ID"
direction = "long"
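A minimal sketch of a wide-to-long reshape with base R's reshape(). The data frame and its column names (stress0, stress6, stress24) are hypothetical, chosen so that there is one column per value in times.

```r
# Hypothetical wide data: one row per ID, one stress column per time point
wide <- data.frame(
  ID       = c(1, 2),
  stress0  = c(5, 3),
  stress6  = c(4, 2),
  stress24 = c(3, 1))

long <- reshape(wide,
  varying   = list(stress = c("stress0", "stress6", "stress24")),
  v.names   = "stress",      # name of the stacked measurement column
  timevar   = "weeks",       # name of the new time column
  times     = c(0, 6, 24),   # one value per column listed in varying
  idvar     = "ID",
  direction = "long")

nrow(long)  # 6: each of the 2 IDs now has 3 rows
```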
What arguments are required when using reshape() to wide format?
v.names = c("stress", "happy"),
timevar = "weeks"
idvar = "ID"
direction = "wide"
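A short sketch of the reverse reshape, long to wide, using base R's reshape(). The data are invented for illustration.

```r
# Hypothetical long data: two rows (weeks 0 and 6) per ID
long <- data.frame(
  ID     = c(1, 1, 2, 2),
  weeks  = c(0, 6, 0, 6),
  stress = c(5, 4, 3, 2),
  happy  = c(2, 3, 4, 5))

wide <- reshape(long,
  v.names   = c("stress", "happy"),  # measures that vary over time
  timevar   = "weeks",
  idvar     = "ID",
  direction = "wide")

nrow(wide)   # 2: one row per ID
names(wide)  # ID plus one column per measure and week, e.g. stress.0, happy.6
```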
How do you merge on multiple ID variables?
by = c("ID", "Time")
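A small sketch showing why both keys matter when each ID has several time points. The data frames are made up for illustration.

```r
# Hypothetical repeated measures data: two time points per ID
x <- data.frame(ID = c(1, 1, 2, 2), Time = c(0, 6, 0, 6), stress = c(5, 4, 3, 2))
y <- data.frame(ID = c(1, 1, 2, 2), Time = c(0, 6, 0, 6), sleep  = c(8, 7, 6, 5))

# Matching on ID and Time pairs each row with exactly one partner
both_keys <- merge(x, y, by = c("ID", "Time"))
nrow(both_keys)  # 4

# Matching on ID alone pairs every time point with every other one
id_only <- merge(x, y, by = "ID")
nrow(id_only)    # 8
```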
If data has more extreme large values (upper tail) than extreme small values (lower tail), what kind of skewness is this?
Positively skewed
If data has more extreme small values (lower tail) than extreme large values (upper tail), what kind of skewness is this?
Negatively skewed
If there is no skewness (normal distribution), what value will the skewness be?
Skewness near 0
Skewness of -.93 is positive/negative?
Negative
Skewness of .76 is positive/negative?
Positive
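A rough sketch of how the sign of skewness follows the longer tail. The helper skew() is a hypothetical function using one common moment-based formula; packages that report skewness may apply a slightly different sample correction.

```r
# Hypothetical helper: third central moment divided by variance^(3/2)
skew <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

upper_tail <- c(1, 2, 2, 3, 3, 3, 10)  # one extreme large value
lower_tail <- -upper_tail              # mirror image: one extreme small value
symmetric  <- c(1, 2, 3, 4, 5)

skew(upper_tail) > 0  # TRUE: positively skewed
skew(lower_tail) < 0  # TRUE: negatively skewed
skew(symmetric)       # 0: no skew
```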
z-score is also known as ____ score
standard score
What is the z-score formula?
z = (raw score - mean) / SD
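A minimal sketch of the z-score calculation. The subtraction has to happen before the division, which is why it is wrapped in parentheses; the vector raw is invented for illustration.

```r
raw <- c(10, 12, 14, 16, 18)

# Subtract the mean first, then divide by the SD
z <- (raw - mean(raw)) / sd(raw)

mean(z)  # 0 (up to rounding error)
sd(z)    # 1

# Base R's scale() performs the same standardisation
z_scaled <- as.numeric(scale(raw))
all.equal(z, z_scaled)  # TRUE
```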
What are 3 measures of variability?
Range, interquartile range (IQR), standard deviation (SD)
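All three measures are available in base R. A short sketch on made-up data:

```r
x <- c(2, 4, 4, 5, 7, 9)

range(x)                 # 2 9: the minimum and maximum
range_width <- diff(range(x))
range_width              # 7: maximum minus minimum

iqr <- IQR(x)            # 75th percentile minus 25th percentile
iqr                      # 2.5 with R's default quantile method

sd(x)                    # standard deviation
```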
What is the default origin date and time in R?
1970-01-01 00:00:00 UTC
What is happening here:
as.numeric(d1[1] - d1[2])/365.25
The difference between two dates (in days) is converted to a plain number and divided by 365.25 to express it in years
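A brief sketch of the origin date and of the date arithmetic in this card. The two dates in d1 are invented for illustration.

```r
# Dates are stored as the number of days since the origin
as.numeric(as.Date("1970-01-01"))  # 0
as.numeric(as.Date("1970-01-02"))  # 1

# Hypothetical dates exactly 19 calendar years apart
d1 <- as.Date(c("2019-03-12", "2000-03-12"))

days  <- as.numeric(d1[1] - d1[2])  # difference in days as a plain number
years <- days / 365.25              # 365.25 averages in leap years

days   # 6939
years  # just under 19
```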
surveys2 <- data.table(
ID = c(1, 2, 2, 3),
Age = c(19, 18, 18, 20))
acti2 <- data.table(
ID = c(2, 2, 3, 4),
Sleep = c(8, 7, 6, 7))
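How many rows would a full outer join of surveys2 and acti2 have?
7 (ID 1: 1 row; ID 2: 2 x 2 = 4 rows, because it is duplicated in both tables; ID 3: 1 row; ID 4: 1 row)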
##    ID Age Sleep
## 1:  1  19    NA
## 2:  2  18     8
## 3:  2  18     7
## 4:  2  18     8
## 5:  2  18     7
## 6:  3  20     6
## 7:  4  NA     7
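The row count can be confirmed by running the merge. This sketch rebuilds the two tables from the card and assumes the data.table package is installed; the object name merged is an illustrative choice.

```r
library(data.table)

surveys2 <- data.table(
  ID  = c(1, 2, 2, 3),
  Age = c(19, 18, 18, 20))
acti2 <- data.table(
  ID    = c(2, 2, 3, 4),
  Sleep = c(8, 7, 6, 7))

# Both tables repeat ID 2, which is what inflates the merged result
anyDuplicated(surveys2$ID)  # 3
anyDuplicated(acti2$ID)     # 2

# Full outer join: keep all rows from both tables
merged <- merge(surveys2, acti2, by = "ID", all = TRUE)
nrow(merged)           # 7
merged[, .N, by = ID]  # ID 2 contributes 4 of the 7 rows
```

Checking for duplicated IDs before merging makes this kind of row multiplication visible in advance.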