test 1 Flashcards

Question 1

Q

== means

Answer

A

“are the 2 things equal to each other?” or “is equal to”

Question 2

Q

!= means

Answer

A

is not equal to

Question 3

Q

>= means

Answer

A

is greater than or equal to

Question 4

Q

<= means

Answer

A

is less than or equal to

Question 5

Q

> means

Answer

A

is greater than

Question 6

Q

< means

Answer

A

is less than
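A quick way to check the six comparison operators from these cards is to try them on two numbers. The values of a and b below are made up for illustration.

```r
# Two made-up numbers to compare
a <- 5
b <- 7

a == b   # FALSE: 5 is not equal to 7
a != b   # TRUE:  5 is not equal to 7
a >= b   # FALSE
a <= b   # TRUE
a > b    # FALSE
a < b    # TRUE
```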

Question 7

Q

logical values in R: what number stands for TRUE and what number stands for FALSE?

Answer

A

TRUE = 1
FALSE = 0

(R answers a comparison with TRUE or FALSE; in arithmetic such as sum(), TRUE counts as 1 and FALSE counts as 0)
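A small sketch of how R treats TRUE and FALSE as numbers. The vector name answers is made up for illustration.

```r
as.numeric(TRUE)    # 1
as.numeric(FALSE)   # 0

# Made-up logical vector
answers <- c(TRUE, FALSE, TRUE, TRUE)
sum(answers)        # 3, the number of TRUEs
mean(answers)       # 0.75, the fraction of TRUEs
```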

Question 8

Q

what are the ways I can ask this question and what type of responses would I get?

are any of the x greater than or equal to 40 AND less than 60?

Answer

A

x >= 40 & x < 60 - gives a logical vector with one TRUE or FALSE for each element of x

sum(x >= 40 & x < 60) - wrapping it in sum counts how many elements are TRUE

which(x >= 40 & x < 60) - gives the positions of the elements for which the statement is TRUE; not used much but still useful
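Here is a sketch of the three versions on a made-up vector x. It also shows any(), which is not on the card but answers the "are any..." wording directly with a single TRUE or FALSE.

```r
# Made-up data
x <- c(12, 40, 55, 59, 60, 73)

in_range <- x >= 40 & x < 60
in_range          # FALSE TRUE TRUE TRUE FALSE FALSE
any(in_range)     # TRUE: at least one element is in the range
sum(in_range)     # 3: how many elements are in the range
which(in_range)   # 2 3 4: positions of those elements
x[in_range]       # 40 55 59: the elements themselves
```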

Question 9

Q

what does $ do?

Answer

A

extracts a column from a data frame by name (for example df.name$column)
can also be used with <- to add a new column

Question 10

Q

how to add another column to a data frame?

Answer

A

df.name$column <- c(…)
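A minimal sketch of both uses of $, with a made-up data frame df and made-up column names.

```r
# Made-up data frame
df <- data.frame(site = c("a", "b", "c"), height = c(10, 12, 9))

df$height                # extract a column: 10 12 9
df$width <- c(3, 4, 5)   # add a new column (one value per row)
names(df)                # "site" "height" "width"
```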

Question 11

Q

if you have already defined “seedlings,” how can you extract the number of times the count “0” was observed?

Answer

A

sum(seedlings == 0)

sum gives the count; without it, seedlings == 0 just gives TRUE or FALSE for each element

Question 12

Q

if you defined seedlings, and now you want to see how many times each count occurred, what would you do?

Answer

A

df.seedlings <- data.frame (seedlings = c(0,1,2,3,4,5), freq = c(sum(seedlings==0), sum(seedlings ==1), sum(seedlings==2), sum(seedlings==3), sum(seedlings==4), sum(seedlings==5)))

first create a data frame with 2 columns: one for the seedling counts and one for the frequency. then use the sum command to count how many observations were equal to 0, 1, 2, 3, 4, and 5
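The same frequency table can be sketched with table(), which is not on the card but avoids typing one sum() per count. The seedlings values below are made up; factor(..., levels = 0:5) is used so that counts that never occur still show up as 0.

```r
# Made-up seedling counts
seedlings <- c(0, 2, 1, 0, 3, 0, 1, 5)

sum(seedlings == 0)   # 3: how many times the count 0 was observed

# Count every value from 0 to 5, including values that never occur
counts <- table(factor(seedlings, levels = 0:5))
df.seedlings <- data.frame(seedlings = 0:5, freq = as.vector(counts))
df.seedlings$freq     # 3 2 1 1 0 1
```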

Question 13

Q

you’ve defined x, now you want to get the 3rd and 4th elements of x, how?

all but the 3rd and 4th elements?

Answer

A

x[c(3,4)]

need both the square brackets and c(): c() combines the positions into one vector, and the square brackets do the indexing

x[-c(3,4)]

put the negative outside of the c
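A short sketch on a made-up vector x, showing keeping versus dropping the 3rd and 4th elements.

```r
# Made-up data
x <- c(10, 20, 30, 40, 50)

x[c(3, 4)]    # 30 40: only the 3rd and 4th elements
x[-c(3, 4)]   # 10 20 50: everything except the 3rd and 4th
```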

Question 14

Q

defined x, how to get only even numbers out?

to get the elements in reverse?

Answer

A

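x[seq(2, length(x), by=2)]

this takes the elements in even positions (2nd, 4th, ...); if "even numbers" means even values, use x[x %% 2 == 0]

x[10:1]

works when x has 10 elements; rev(x) works for any length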
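A sketch on a made-up 10-element vector, showing that "even positions" and "even values" are different questions, plus two ways to reverse.

```r
# Made-up data with 10 elements
x <- c(7, 4, 9, 2, 6, 5, 8, 1, 3, 10)

x[seq(2, length(x), by = 2)]   # even positions: 4 2 5 1 10
x[x %% 2 == 0]                 # even values:    4 2 6 8 10

x[10:1]                        # reversed: 10 3 1 8 5 6 2 9 4 7
rev(x)                         # same result, works for any length
```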

Question 15

Q

if x is defined, how to get the sum of the first and last element if there are 10 elements? the product of second and ninth?

Answer

A

x[1] + x[10]

x[2] * x[9]

Question 16

Q

how to get square of each element in x?
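One way to do this, assuming x is a numeric vector (the values below are made up): arithmetic in R works element by element, so ^ squares every element at once.

```r
# Made-up data
x <- c(1, 2, 3, 4)

x^2     # 1 4 9 16
x * x   # same result
```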

Question 17

Q

what does the sample command do and how to use it?

Answer

A

draws a random sample from a set of values, used like this:

sample(1:100, 25, replace=FALSE)

1:100 is the set of values to draw from

25 is how many values you want

replace=FALSE (the default) means no value can be drawn more than once; use replace=TRUE to allow repeats
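A sketch of the difference between replace = FALSE and replace = TRUE. The names no_repeats and with_repeats are made up; the actual numbers drawn will differ from run to run.

```r
# 25 values from 1:100, no value drawn twice
no_repeats <- sample(1:100, 25, replace = FALSE)
length(no_repeats)          # 25
anyDuplicated(no_repeats)   # 0, meaning no duplicates

# 25 rolls of a six-sided die: repeats must be allowed, because
# 25 values cannot be drawn from only 6 without replacement
with_repeats <- sample(1:6, 25, replace = TRUE)
length(with_repeats)        # 25
```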

Question 18

Q

what does set.seed() do?

Answer

A

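fixes the starting point of the random number generator so results are reproducible: calling set.seed() with the same number before each sample() call makes sample1 and sample2 come out the same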
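A minimal sketch; the seed value 42 is arbitrary, any number works as long as it is the same both times.

```r
set.seed(42)                    # arbitrary seed
sample1 <- sample(1:100, 25)

set.seed(42)                    # reset to the same seed
sample2 <- sample(1:100, 25)

identical(sample1, sample2)     # TRUE: same seed, same sample
```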

Question 19

Q

with the cdc file we used, how would you find how many of the subjects are male?

Answer

A

sum(gender == "m")

Question 20

Q

list the 2 ways to find the median of the ages of nonsmokers vs smokers

Answer

A

many ways to find the median using the median command, here are 2 (shown for smokers; use smoke100 == 0 for nonsmokers):

1) median(cdc[smoke100==1,]$age)

2) smokers <- subset(cdc, smoke100 == 1)
smokers_median <- median(smokers$age)
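Both approaches can be tried on a small stand-in for the cdc data. The data frame cdc_toy and its values are made up; only the column names age and smoke100 come from the cards. The tapply() line is an extra, not from the card, that gives both medians in one call.

```r
# Made-up stand-in for the cdc data frame
cdc_toy <- data.frame(
  age      = c(23, 35, 47, 52, 61, 30),
  smoke100 = c(1, 0, 1, 0, 1, 0)
)

# Way 1: logical row indexing
median(cdc_toy[cdc_toy$smoke100 == 1, ]$age)    # smokers: 47

# Way 2: subset()
nonsmokers <- subset(cdc_toy, smoke100 == 0)
median(nonsmokers$age)                          # nonsmokers: 35

# Both groups at once
group_medians <- tapply(cdc_toy$age, cdc_toy$smoke100, median)
```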

Question 21

Q

what does the ~ (tilde) mean in box plots?

Answer

A

used to specify relationships between variables in various functions

like:
boxplot(age ~ smoke100, data = cdc)

tells R to show the distribution of ages for smokers and nonsmokers separately, allowing you to compare the age distributions between the two groups visually

Question 22

Q

what is the comma used for in [smoke100==1,]?

Answer

A

used to separate row indices from column indices. In this case, we’re leaving the column part empty, meaning we’re selecting all columns
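A sketch of the [rows, columns] pattern on a made-up data frame (cdc_toy and its values are hypothetical; the column names follow the cards).

```r
# Made-up stand-in for the cdc data frame
cdc_toy <- data.frame(
  age      = c(23, 35, 47, 52),
  smoke100 = c(1, 0, 1, 0)
)

cdc_toy[cdc_toy$smoke100 == 1, ]        # smoker rows, all columns
cdc_toy[cdc_toy$smoke100 == 1, "age"]   # smoker rows, age column only: 23 47
cdc_toy[, "age"]                        # all rows, age column: 23 35 47 52
```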

Question 23

Q

sapply()

Answer

A

"simplify apply": applies the same function to each element of a vector or list and simplifies the result to a vector (or matrix) when possible
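Two small sketches of sapply(), with made-up inputs: one over a plain vector and one over a named list.

```r
# Apply a function to each element of a vector
squares <- sapply(c(1, 2, 3), function(v) v^2)
squares   # 1 4 9

# Apply a function to each element of a list; names are kept
means <- sapply(list(a = 1:3, b = 4:6), mean)
means     # a = 2, b = 5
```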

Question 24

Q

how to solve this using sapply function:

Make a cumulative distribution of the ages of the males or females in the dataset by finding the number of subjects not older than each age in the sequence seq(from = 0,to = 100,by = 5). Use barplot() to make and attach a plot of your results.

Answer

A

df.males <- data.frame(Age = seq(0,100,by = 5), Cumulative_Frequency = sapply(seq(0,100,by = 5), function(X) sum(age[gender == "m"] <= X))) # males age cumulative distribution

barplot(df.males$Cumulative_Frequency, names.arg = df.males$Age) # plot the results
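The same steps can be sketched on made-up data to see what sapply() returns here. The age and gender values and the names ages_seq and cum_males are hypothetical; the axis labels are just one possible choice.

```r
# Made-up stand-ins for the cdc columns
age    <- c(22, 34, 45, 51, 67, 29, 38)
gender <- c("m", "f", "m", "m", "f", "m", "f")

ages_seq <- seq(from = 0, to = 100, by = 5)

# For each cutoff X, count the males who are not older than X
cum_males <- sapply(ages_seq, function(X) sum(age[gender == "m"] <= X))

# Bars never decrease and end at the total number of males (4 here)
barplot(cum_males, names.arg = ages_seq,
        xlab = "Age", ylab = "Males not older than this age")
```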

Question 25

Q

what does adding type = 1 inside the quantile function do?

Answer

A

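switches to the empirical method (inverse of the empirical distribution function), so the quantile is always one of the actual observed values instead of an interpolated value between two observations (the default, type = 7, interpolates)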
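A sketch of the difference on a made-up vector h with an even number of values, where the default method has to interpolate.

```r
# Made-up data
h <- c(60, 62, 65, 70)

q_default <- quantile(h, probs = 0.5)             # type = 7: halfway between 62 and 65
q_type1   <- quantile(h, probs = 0.5, type = 1)   # an actual observation

unname(q_default)   # 63.5
unname(q_type1)     # 62
```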

Question 26

Q

how to solve this problem : What fraction of men have heights greater than the 90th percentile of height among women? What fraction of women have heights less than the 10th percentile of height among men?

Answer

A

quantile(height[gender == "f"], probs = 0.9, type = 1) # 90th percentile of height among females

sum(height[gender == "m"] > quantile(height[gender == "f"], probs = 0.9, type = 1)) / sum(gender == "m") # fraction of males taller than the 90th percentile of female height

for the second part, swap the groups: take the 10th percentile (probs = 0.1) of height among males, then find the fraction of females with height less than that value
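Both parts can be sketched on made-up data. The height and gender values and the names p90_f, p10_m, frac_men and frac_women are hypothetical. Taking mean() of a logical vector gives the same fraction as sum(...) divided by the group size.

```r
# Made-up stand-ins for the cdc columns
height <- c(70, 72, 68, 75, 64, 62, 66, 69)
gender <- c("m", "m", "m", "m", "f", "f", "f", "f")

# Part 1: men taller than the 90th percentile of women
p90_f <- quantile(height[gender == "f"], probs = 0.9, type = 1)
frac_men <- mean(height[gender == "m"] > p90_f)

# Part 2: women shorter than the 10th percentile of men
p10_m <- quantile(height[gender == "m"], probs = 0.1, type = 1)
frac_women <- mean(height[gender == "f"] < p10_m)

frac_men     # 0.75
frac_women   # 0.75
```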

Question 27

Q