Functions: Descriptive Statistics Flashcards

1
Q

mean()

A

mean(pirates$age)

Returns the mean age of the pirates in this dataset.

2
Q

max()

A

max(pirates$height)

Returns the maximum height of the pirates in this dataset.

3
Q

table()

A

table(pirates$sex)

Generates a frequency table of the pirates’ sex:

## female male other
## 464 490 46

4
Q

aggregate()

A

# Calculate the mean age, separately for each sex

aggregate(x = age ~ sex,
          data = pirates,
          FUN = mean)

##      sex age
## 1 female  30
## 2   male  25
## 3  other  27

5
Q

median()

A
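
This card has no answer text, so the following is only a minimal sketch of what median() does in base R. The vector `loot` is a hypothetical example, not data from the deck; it shows that one extreme value moves the mean but not the median.

```r
# Hypothetical data: one extreme value pulls the mean up, but not the median
loot <- c(1, 2, 3, 4, 100)

loot_median <- median(loot)  # middle value of the sorted vector
loot_mean <- mean(loot)      # arithmetic average, sensitive to the outlier

print(loot_median)
## [1] 3
print(loot_mean)
## [1] 22
```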
6
Q

quantile()

A
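
This card has no answer text. As a hedged sketch, base R's quantile() returns the minimum, quartiles, and maximum by default, and the `probs` argument requests other quantiles. The vector `x` is a hypothetical example.

```r
x <- 1:9  # hypothetical data

# Default probs: 0%, 25%, 50%, 75%, 100%
q_default <- quantile(x)

# Ask for specific quantiles with the probs argument
q_custom <- quantile(x, probs = c(0.1, 0.9))

print(q_default)
print(q_custom)
```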
7
Q
A
8
Q

length()

A

The length() function takes a vector as an argument, and returns a scalar representing the number of elements in the vector

a <- 1:10
length(a) # How many elements are in a?
##[1] 10

9
Q

additional numeric functions

A
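
This card has no answer text, and which functions the deck had in mind is not known. As an assumption, here are a few other base R functions that summarize a numeric vector; `x` is a hypothetical example.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)  # hypothetical data

x_sum <- sum(x)      # total of all values
x_min <- min(x)      # smallest value
x_range <- range(x)  # smallest and largest value as a length-2 vector
x_var <- var(x)      # sample variance (divides by n - 1)
x_sd <- sd(x)        # sample standard deviation, i.e. sqrt(var(x))
```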
10
Q

unique()

A

vec <- c(1, 1, 1, 5, 1, 1, 10, 10, 10)
gender <- c("M", "M", "F", "F", "F", "M", "F", "M", "F")

unique(vec)
##[1] 1 5 10

unique(gender)
##[1] "M" "F"

Note that unique() doesn’t tell you how often each of these values occurs.

11
Q

table()

A

The function table() does the same thing as unique(), but goes a step further in telling you how often each of the unique values occurs:

table(vec)
##vec
##1 5 10
##5 1 3

table(gender)
##gender
##F M
##5 4

12
Q

na.rm = TRUE argument

A

a <- c(1, 5, NA, 2, 10)

mean(a, na.rm = TRUE)
##[1] 4.5

The na.rm = TRUE argument tells R to remove NA values before calculating. Without it, mean(a) returns NA because the vector contains a missing value.
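
A short sketch of the difference, using the same vector `a` as the card. The variable names on the left-hand side are illustrative only.

```r
a <- c(1, 5, NA, 2, 10)

mean_with_na <- mean(a)              # NA: the missing value propagates
n_missing <- sum(is.na(a))           # count the NAs before dropping them
mean_no_na <- mean(a, na.rm = TRUE)  # mean of the 4 non-missing values
max_no_na <- max(a, na.rm = TRUE)    # max() accepts na.rm in the same way
```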

13
Q

z score formula

A

a <- c(5, 3, 7, 5, 5, 3, 4)

a.z <- (a - mean(a)) / sd(a) # formula for z-scores (standardized scores)

a.z
##[1] 0.31 -1.12 1.74 0.31 0.31 -1.12 -0.41

The mean of the z-scores should be 0 (up to floating-point rounding), and their standard deviation should be 1.
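
The check can be sketched as follows, reusing the vector `a` from the card. The comparison with base R's scale() is an addition for illustration; scale() centers and scales by default and returns a one-column matrix, hence the as.numeric().

```r
a <- c(5, 3, 7, 5, 5, 3, 4)
a.z <- (a - mean(a)) / sd(a)

z_mean <- mean(a.z)  # 0, up to floating-point error
z_sd <- sd(a.z)      # 1

# Same standardization via scale(), converted back to a plain vector
a.z.scaled <- as.numeric(scale(a))
```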

14
Q

summary()

A

Print descriptive statistics of the piercing data

summary(american.bp)
##Min. 1st Qu. Median Mean 3rd Qu. Max.
##1.0 3.0 4.0 3.7 4.8 6.0

summary(european.bp)
##Min. 1st Qu. Median Mean 3rd Qu. Max.
##3.0 4.2 5.5 5.3 6.0 7.0

15
Q

independent samples t-test code

A
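
This card has no answer text. As a minimal sketch, an independent samples t-test in base R uses t.test() with either two vectors or a formula. The piercing counts below are assumed, illustrative values (not confirmed by the deck), and `bp.df` is a hypothetical dataframe built from them.

```r
# Illustrative body piercing counts (assumed values)
american.bp <- c(3, 5, 2, 1, 4, 4, 6, 3, 5, 4)
european.bp <- c(6, 5, 7, 7, 6, 3, 4, 6, 5, 4)

# Vector notation: pass the two groups as x and y
bp.test <- t.test(x = american.bp, y = european.bp, alternative = "two.sided")

# Formula notation: dv ~ iv, with both columns in one dataframe
bp.df <- data.frame(
  bp = c(american.bp, european.bp),
  group = rep(c("american", "european"), each = 10)
)
bp.test.formula <- t.test(formula = bp ~ group, data = bp.df)

print(bp.test)
```

Both calls run the same Welch two-sample test (the default, var.equal = FALSE), so they give the same statistic and p-value.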
16
Q

p value definition

A

Assuming that the null hypothesis is true (i.e., that there is no difference between the groups), what is the probability that we would have gotten a test statistic at least as far away from 0 as the one we actually got?

It’s a bullshit detector aimed at the null hypothesis. If the p-value gets too small, the bullshit detector goes off.
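
To make the definition concrete, here is a hedged sketch that recomputes a two-sided t-test p-value by hand from the test statistic and degrees of freedom. The two vectors are assumed, illustrative values, and `p.manual` is a hypothetical name.

```r
# Illustrative data (assumed values)
american.bp <- c(3, 5, 2, 1, 4, 4, 6, 3, 5, 4)
european.bp <- c(6, 5, 7, 7, 6, 3, 4, 6, 5, 4)

bp.test <- t.test(x = american.bp, y = european.bp)

t.stat <- unname(bp.test$statistic)  # observed test statistic
t.df <- unname(bp.test$parameter)    # degrees of freedom

# Probability, under the null, of a t at least this far from 0
# in either direction (both tails of the t distribution)
p.manual <- 2 * pt(-abs(t.stat), df = t.df)
```

`p.manual` matches `bp.test$p.value`: the p-value is just the tail area of the null distribution beyond the observed statistic.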

17
Q

Does the p-value tell us the probability that the null hypothesis is true?

A

No!!! The p-value does not tell you the probability that the null hypothesis is true. In other words, if you calculate a p-value of .04, this does not mean that the probability that the null hypothesis is true is 4%. Rather, it means that if the null hypothesis were true, the probability of obtaining a result at least as extreme as the one you got is 4%. Now, this does indeed set off our bullshit detector, but again, it does not mean that the probability that the null hypothesis is true is 4%.

18
Q

htest

A

R stores hypothesis tests in a special object class called htest. htest objects contain all the major results from a hypothesis test, from the test statistic (e.g., a t-statistic for a t-test, or a correlation coefficient for a correlation test), to the p-value, to a confidence interval.

Different hypothesis tests require the data to be passed to the function in different formats (vectors, dataframes, or tables).
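
A minimal sketch of inspecting an htest object, here one returned by cor.test(). The vectors `x` and `y` are hypothetical data.

```r
# Hypothetical data
x <- 1:8
y <- c(2, 1, 4, 3, 6, 5, 8, 7)

ct <- cor.test(x = x, y = y)

ct_class <- class(ct)  # "htest"
ct$estimate            # the correlation coefficient
ct$statistic           # the t-statistic used to test it
ct$p.value             # the p-value
ct$conf.int            # confidence interval for the correlation
```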

19
Q

names()

A

names() returns the names of all the elements stored in an htest object. Each element can then be accessed with $.
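
As a short sketch, assuming a one-sample t-test on a hypothetical vector `x`:

```r
x <- c(6, 5, 7, 7, 6, 3, 4, 6, 5, 4)  # hypothetical data
test <- t.test(x = x, mu = 5)

# List the names of everything stored in the htest object
element.names <- names(test)
print(element.names)

# Pull out one element by name: here, the sample mean
sample.mean <- unname(test$estimate)
```

The exact set of names depends on the test and the R version, so it is worth printing them rather than assuming.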

20
Q

one sample t-test

A

You can pull the data from a dataframe column or from a separate vector; it doesn’t have to come from a table() function.
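
A minimal sketch of both options with base R's t.test(). The dataframe `pirates_demo`, its values, and the null value `mu = 25` are hypothetical and only stand in for real data.

```r
# Hypothetical dataframe standing in for the pirates data
pirates_demo <- data.frame(age = c(28, 31, 25, 35, 29, 33, 27, 30))

# Option 1: take the data from a dataframe column
test.df <- t.test(x = pirates_demo$age, mu = 25)

# Option 2: take the data from a standalone vector
age <- c(28, 31, 25, 35, 29, 33, 27, 30)
test.vec <- t.test(x = age, mu = 25)
```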

21
Q

t tests compared to each other in bar chart form

A

You can pull the data from a dataframe or from separate vectors; it doesn’t have to come from a table() function.

22
Q

Using subset to select levels of an IV

A

Use the %in% operator inside the subset argument to specify which levels of an IV you want to test.
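
A hedged sketch, assuming a hypothetical dataframe `pirates_demo` whose `sex` column has three levels. The formula method of t.test() needs exactly two groups, so subset keeps only the rows whose level is listed.

```r
# Hypothetical data with a three-level IV
pirates_demo <- data.frame(
  sex = rep(c("female", "male", "other"), each = 5),
  age = c(30, 32, 29, 31, 33,
          25, 27, 24, 26, 28,
          27, 29, 26, 28, 30)
)

# Without subset, t.test() would stop because the grouping
# variable has more than two levels
sex.test <- t.test(formula = age ~ sex,
                   data = pirates_demo,
                   subset = sex %in% c("female", "male"))

group.means <- unname(sex.test$estimate)  # mean age per retained group
```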

23
Q

cor.test()

24
Q

two ways to run a correlation test

A

To run a correlation test between two variables x and y, use the cor.test() function. You can do this in one of two ways. If x and y are columns in a dataframe, use the formula notation (formula = ~ x + y). If x and y are separate vectors (not in a dataframe), use the vector notation (x, y).

You can pull the data from a dataframe or from separate vectors; it doesn’t have to come from a table() function.
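
The two notations can be sketched as follows. The dataframe `pirates_demo` and its values are hypothetical.

```r
# Hypothetical data
pirates_demo <- data.frame(
  age = c(22, 25, 28, 30, 33, 35, 38, 41),
  parrots = c(1, 0, 2, 3, 2, 4, 5, 4)
)

# Formula notation: columns of a dataframe (note the empty left-hand side)
cor.formula <- cor.test(formula = ~ age + parrots, data = pirates_demo)

# Vector notation: two separate vectors
age <- pirates_demo$age
parrots <- pirates_demo$parrots
cor.vector <- cor.test(x = age, y = parrots)
```

Both calls test the same Pearson correlation, so their estimates and p-values agree.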

25
Q

example correlation test

26
Q

using subset() in the cor.test() function

A

Just like the t.test() function, we can use the subset argument in the cor.test() function to conduct a test on a subset of the entire dataframe. For example, to run the same correlation test between a pirate’s age and the number of parrots she’s owned, but only for female pirates, I can add the subset = sex == "female" argument.
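
A minimal sketch of that call, assuming a hypothetical dataframe `pirates_demo` with made-up values:

```r
# Hypothetical data
pirates_demo <- data.frame(
  sex = rep(c("female", "male"), each = 6),
  age = c(22, 25, 28, 31, 34, 37,
          23, 26, 29, 32, 35, 38),
  parrots = c(1, 2, 2, 4, 3, 5,
              4, 1, 3, 0, 2, 1)
)

# Correlation between age and parrots, female pirates only
female.cor <- cor.test(formula = ~ age + parrots,
                       data = pirates_demo,
                       subset = sex == "female")

# Same correlation after filtering the rows by hand
females <- pirates_demo[pirates_demo$sex == "female", ]
manual.cor <- cor(females$age, females$parrots)
```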

27
Q

chisq.test()

A

Used to determine whether there is a significant association between two categorical variables.

You first create a table of counts (for example with table()) and pass that table to chisq.test().

This example has one nominal variable, and we are testing whether a pirate is equally likely to have attended either school.
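
The original example is not included on the card, so here is a hedged sketch with hypothetical data. The school names and counts are placeholders.

```r
# Hypothetical college attendance for 50 pirates (placeholder school names)
college <- rep(c("School A", "School B"), times = c(30, 20))

college.table <- table(college)  # counts per school

# With a one-way table, chisq.test() tests against equal expected proportions
college.test <- chisq.test(x = college.table)

print(college.test)
```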

28
Q

2 sample chisq.test()

A

If you want to see if the frequency of one nominal variable depends on a second nominal variable, you’d conduct a 2-sample chi-square test.
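
A minimal sketch with hypothetical data: cross-tabulate the two nominal variables with table(), then pass the two-way table to chisq.test(). The dataframe `pirates_demo`, its column names, and its counts are made up for illustration.

```r
# Hypothetical data: which school a pirate attended and eyepatch use
pirates_demo <- data.frame(
  college = rep(c("School A", "School B"), times = c(40, 40)),
  eyepatch = c(rep(c("yes", "no"), times = c(30, 10)),
               rep(c("yes", "no"), times = c(15, 25)))
)

# Two-way table of counts: rows are schools, columns are eyepatch use
two.way <- table(pirates_demo$college, pirates_demo$eyepatch)

# For a 2 x 2 table, chisq.test() applies a continuity correction by default
two.way.test <- chisq.test(x = two.way)

print(two.way)
print(two.way.test)
```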

29
Q

apa-style conclusions using the apa() function

A

The apa() function takes a raw htest object and returns only the relevant results, formatted as an APA-style conclusion.