test 1 Flashcards
== means
“are the 2 things equal to each other?” or “is equal to”
!= means
is not equal to
>= means
is greater than or equal to
<= means
is less than or equal to
> means
is greater than
< means
is less than
logical values in R: what counts as true and what counts as false?
TRUE = 1
FALSE = 0
(a comparison answers with TRUE or FALSE; in arithmetic such as sum(), TRUE is treated as 1 and FALSE as 0)
what are the ways I can ask this question and what type of responses would I get?
are any of the x greater than or equal to 40 AND less than 60?
x >= 40 & x < 60 - returns a logical vector with TRUE or FALSE for each element of x
sum(x >= 40 & x < 60) - wrapping it in sum() counts how many elements are TRUE
which(x >= 40 & x < 60) - gives the positions of the elements for which the statement is TRUE; not used much but still useful
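A small sketch of the three ways of asking, using a made-up vector x (the values are only for illustration). any() is included as one more option that answers the "are any" question with a single TRUE or FALSE:

```r
# Hypothetical example vector (not from the course data)
x <- c(35, 40, 52, 59, 60, 71)

in_range <- x >= 40 & x < 60    # logical vector, one TRUE/FALSE per element
n_in_range <- sum(in_range)     # TRUE counts as 1, so this counts the TRUEs
idx <- which(in_range)          # positions of the elements that are TRUE
any_in_range <- any(in_range)   # single TRUE/FALSE: is at least one TRUE?

print(in_range)
print(n_in_range)
print(idx)
```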
what does $ do?
- pulls a column out of a data frame by name
- can use it to add a new column too
how to add another column to a data frame?
df.name$column <- c(…)
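A minimal sketch of both uses of $, assuming a made-up data frame called df with made-up columns (none of these names come from the course files):

```r
# Hypothetical data frame for illustration only
df <- data.frame(id = 1:3)

# Add a new column with $ (the vector must have one value per row)
df$height <- c(150, 160, 170)

# Extract a column with $ (gives back a plain vector)
heights <- df$height
```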
if you have already defined “seedlings,” how can you extract the number of times the count “0” was observed?
sum(seedlings == 0)
sum() gives the count; without it you would just get a TRUE or FALSE for each element
if you defined seedlings, and now you want to see how many times each count occurred, what would you do?
df.seedlings <- data.frame (seedlings = c(0,1,2,3,4,5), freq = c(sum(seedlings==0), sum(seedlings ==1), sum(seedlings==2), sum(seedlings==3), sum(seedlings==4), sum(seedlings==5)))
first create a data frame with 2 columns, one for the seedling counts and one for the frequency, and then use the sum command to count how many observations were equal to 0, 1, 2, 3, 4, and 5
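The same frequency table can be built with less typing. This is a sketch with a made-up seedlings vector; sapply() and table() are alternatives to writing out each sum() by hand, not necessarily the approach used in class:

```r
# Hypothetical observations (the real seedlings vector is not shown here)
seedlings <- c(0, 2, 1, 0, 5, 2, 0)

# Option 1: apply the same sum(seedlings == k) to every k from 0 to 5
freq_sapply <- sapply(0:5, function(k) sum(seedlings == k))

# Option 2: table() counts each value; the factor levels keep counts
# of zero for values that never occurred
freq_table <- as.vector(table(factor(seedlings, levels = 0:5)))

df.seedlings <- data.frame(seedlings = 0:5, freq = freq_sapply)
print(df.seedlings)
```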
you’ve defined x, now you want to get the 3rd and 4th elements of x, how?
all but the 3rd and 4th elements?
x[c(3,4)]
have to use both the square brackets and c(): the positions go in a vector inside the brackets
x[-c(3,4)]
put the minus sign outside of the c()
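defined x, how to get only the even-numbered elements out?
to get the elements in reverse?
x[seq(2, length(x), by=2)]
this picks the elements at positions 2, 4, 6, ...; to get the values that are even numbers instead, use x[x %% 2 == 0]
x[10:1]
this works when x has 10 elements; rev(x) works for any length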
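A sketch showing the difference between even positions and even values, plus two ways to reverse, using a made-up vector x:

```r
# Hypothetical example vector
x <- c(7, 4, 10, 3, 8, 6)

# Elements at even positions (2nd, 4th, 6th)
even_positions <- x[seq(2, length(x), by = 2)]

# Elements whose value is an even number (remainder 0 when divided by 2)
even_values <- x[x %% 2 == 0]

# Reverse: by index, or with rev() which works for any length
reversed_index <- x[length(x):1]
reversed_rev <- rev(x)
```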
if x is defined, how to get the sum of the first and last element if there are 10 elements? the product of second and ninth?
x[1] + x[10]
x[2] * x[9]
how to get square of each element in x?
x ^ 2
what does the sample command do and how to use it?
draws a random sample from a vector, used like this:
sample(1:100, 25, replace=FALSE)
1:100 is the set of values you want to draw from
25 is how many values you want
replace=FALSE (the default) means no value can be drawn twice; use replace=TRUE to sample with replacement
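A short sketch of sampling without and with replacement. The die-roll example is an illustration added here, not something from the course:

```r
# Without replacement: 25 different numbers from 1 to 100
without_repl <- sample(1:100, 25, replace = FALSE)

# With replacement: 25 rolls of a six-sided die, so repeats are expected
# (asking for 25 values from only 6 would be an error with replace = FALSE)
with_repl <- sample(1:6, 25, replace = TRUE)

print(without_repl)
print(with_repl)
```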
what does set.seed() do?
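sets the starting point (seed) of the random number generator so results are reproducible; calling set.seed() with the same number before each sample() call makes sample1 and sample2 the same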
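A sketch of how set.seed() makes two samples match. The seed value 42 is arbitrary; any number works as long as it is the same both times:

```r
set.seed(42)                      # fix the random number generator
sample1 <- sample(1:100, 25)

set.seed(42)                      # reset to the same seed
sample2 <- sample(1:100, 25)

same <- identical(sample1, sample2)   # TRUE: same seed, same sample

sample3 <- sample(1:100, 25)      # no reset here, so this draw continues
                                  # the random stream and will generally differ
```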
with the cdc file we used, how would you find how many of the subjects are male?
sum(gender=="m")
(this works if cdc has been attached; otherwise use sum(cdc$gender=="m"))
list 2 ways to find the median of the ages of smokers vs nonsmokers
many ways to find the median using the median command, here are 2 (shown for smokers):
- median(cdc[smoke100==1,]$age)
- smokers <- subset(cdc, smoke100==1)
  smokers_median <- median(smokers$age)
for nonsmokers, use smoke100==0 instead
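A sketch of both approaches on a tiny made-up data frame called toy (standing in for cdc, which is not included here). It assumes smoke100 is coded 1 for smokers and 0 for nonsmokers. tapply() is shown as a third option that gives both medians at once:

```r
# Hypothetical stand-in for the cdc data
toy <- data.frame(
  age      = c(23, 45, 31, 60, 52, 38),
  smoke100 = c(1, 0, 1, 0, 1, 0)
)

# Way 1: pick the rows with square brackets, then take the age column
median_smokers <- median(toy[toy$smoke100 == 1, ]$age)

# Way 2: subset() first (note the double ==), then take the median
nonsmokers <- subset(toy, smoke100 == 0)
median_nonsmokers <- median(nonsmokers$age)

# Both groups at once: median of age within each value of smoke100
by_group <- tapply(toy$age, toy$smoke100, median)
print(by_group)
```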
what does the ~ (tilde) mean in box plots?
it builds a formula that tells a function how variables relate to each other, read as "left side by right side"
like:
boxplot(age ~ smoke100, data = cdc)
tells R to show the distribution of ages for smokers and nonsmokers separately, allowing you to compare the age distributions between the two groups visually
what is the comma used for in [smoke100==1,]?
used to separate row indices from column indices. In this case, we’re leaving the column part empty, meaning we’re selecting all columns
sapply()
simplify apply: applies the same function to each element of a vector (or list) and simplifies the results into a vector when it can
how to solve this using sapply function:
Make a cumulative distribution of the ages of the males or females in the dataset by finding the number of subjects not older than each age in the sequence seq(from = 0,to = 100,by = 5). Use barplot() to make and attach a plot of your results.
df.males <- data.frame(Age = seq(0,100,by = 5), Cumulative_Frequency = sapply(seq(0,100,by = 5), function(X) sum(age[gender == "m"] <= X))) # males age cumulative distribution
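A runnable sketch of the same idea on made-up age and gender vectors (standing in for the attached cdc columns), including the barplot() step the problem asks for. The axis labels are illustrative choices:

```r
# Hypothetical stand-ins for the cdc columns
age    <- c(22, 35, 35, 47, 61, 19)
gender <- c("m", "f", "m", "m", "f", "m")

cutoffs  <- seq(from = 0, to = 100, by = 5)
male_age <- age[gender == "m"]

# For each cutoff, count the males who are not older than that age
cum_freq <- sapply(cutoffs, function(a) sum(male_age <= a))

df.males <- data.frame(Age = cutoffs, Cumulative_Frequency = cum_freq)

# One bar per cutoff age
barplot(df.males$Cumulative_Frequency,
        names.arg = df.males$Age,
        xlab = "Age", ylab = "Number of males not older than age")
```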
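what does adding type = 1 inside of the quantile function do?
uses the empirical distribution instead, so the answer is always one of the observed values rather than a value interpolated between two of them like the default type = 7 (kinda like rounding to an observed number)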
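A small sketch comparing the default with type = 1 on four made-up heights. With an even number of values the median falls between two observations, which makes the difference visible:

```r
# Hypothetical heights
h <- c(60, 62, 65, 70)

# Default (type = 7) interpolates between 62 and 65
q_default <- quantile(h, probs = 0.5)

# type = 1 returns an actual observed value
q_type1 <- quantile(h, probs = 0.5, type = 1)

print(q_default)   # 63.5
print(q_type1)     # 62
```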
how to solve this problem : What fraction of men have heights greater than the 90th percentile of height among women? What fraction of women have heights less than the 10th percentile of height among men?
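quantile(height[gender == "f"], probs = 0.9, type = 1) # 90th percentile of height among females
sum(height[gender == "m"] > quantile(height[gender == "f"], probs = 0.9, type = 1)) / sum(gender == "m") # fraction of males taller than the 90th percentile of female heights
the second part works the same way with the roles swapped: count the females whose height is less than the 10th percentile among males (probs = 0.1) and divide by the number of females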
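A sketch of both parts on made-up height and gender vectors (standing in for the attached cdc columns). It uses mean() on a logical vector, which gives the same fraction as dividing sum() by the group size:

```r
# Hypothetical stand-ins for the cdc columns
height <- c(70, 72, 68, 75, 66, 64, 62, 67, 65, 63)
gender <- c(rep("m", 5), rep("f", 5))

men   <- height[gender == "m"]
women <- height[gender == "f"]

# Part 1: fraction of men taller than the women's 90th percentile
p90_women <- quantile(women, probs = 0.9, type = 1)
frac_men_taller <- mean(men > p90_women)

# Part 2: fraction of women shorter than the men's 10th percentile
p10_men <- quantile(men, probs = 0.1, type = 1)
frac_women_shorter <- mean(women < p10_men)

print(frac_men_taller)
print(frac_women_shorter)
```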