Quiz 9 Flashcards

1
Q

When does it make sense to create a histogram for your R data?

A
  • you have only one variable AND
  • the variable has “continuous/numeric” data AND
  • you want to check if data are normally distributed
2
Q

What would you type in R to make a histogram of a particular variable (ex. column “Mean_RT”, mean reaction time)?

A

ggplot(ldt_df, aes(x = Mean_RT)) +
  geom_histogram(bins = 13, color = 'black', fill = 'lightblue')

3
Q

What is the function to save a ggplot?

A

ggsave('my_histogram.png', height = 5, width = 7, units = 'in')
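
A minimal sketch of saving a plot, assuming the ggplot2 package is installed. The simulated data frame stands in for ldt_df and the temporary file path is only for illustration. By default ggsave() saves the last plot displayed; passing plot = explicitly makes it clear which plot gets written.

```r
library(ggplot2)

# Hypothetical stand-in for ldt_df: 200 simulated mean reaction times
set.seed(1)
ldt_df <- data.frame(Mean_RT = rnorm(200, mean = 800, sd = 100))

p <- ggplot(ldt_df, aes(x = Mean_RT)) +
  geom_histogram(bins = 13)

# Write to a temporary folder so the example does not clutter the project
out_file <- file.path(tempdir(), "my_histogram.png")
ggsave(out_file, plot = p, height = 5, width = 7, units = "in")
```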

4
Q

What does the ‘aes’ function do?

A

Maps variables (columns) in your data to parts of the plot, such as the X and Y axes
ex. aes(x = Mean_RT)

5
Q

How many variables do histograms need and how do you assign the variable/s to an axis?

A
  • only one (numeric) variable
    ex. aes(x = Mean_RT) (column name without quotes)
  • this assigns the “Mean reaction time” variable to the x axis
6
Q

What do you need to check in your data when you are assigning a variable to an axis when making a histogram?

A

That it is numeric

7
Q

Besides assigning the variable using the ‘aes’ function in R, what is needed to make a histogram?

A

Need to tell R we want a histogram plot

8
Q

What are the different plot options we are using in ggplot?

A
  • histogram
  • box plot
  • scatter plot
  • bar plot
9
Q

How do you specify what kind of plot you want in R?

A

add a ‘geom’ function (a plot layer) to the ggplot() call
ex. geom_histogram() or geom_boxplot()

10
Q

What are “bins” in R (ggplot)?

A

the intervals that the variable's values are grouped into; each bin is drawn as one bar in the histogram

11
Q

How can you tell R to make the bins of your histogram different colors and fills?

A

ex. geom_histogram(bins = 13, color = 'black', fill = 'lightblue')
(color sets the bar outline, fill sets the bar interior; bins sets the number of bins to be shown)
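
A runnable sketch of the full histogram call, assuming ggplot2 is installed. The simulated data frame is a hypothetical stand-in for ldt_df. layer_data() is used here only to peek at the counts ggplot computed for each bar.

```r
library(ggplot2)

# Hypothetical stand-in for ldt_df: 200 simulated mean reaction times
set.seed(1)
ldt_df <- data.frame(Mean_RT = rnorm(200, mean = 800, sd = 100))

# color = outline of each bar, fill = inside of each bar
p <- ggplot(ldt_df, aes(x = Mean_RT)) +
  geom_histogram(bins = 13, color = "black", fill = "lightblue")

# One row per bar, with the number of observations that fell into that bin
bin_table <- layer_data(p)
total_counted <- sum(bin_table$count)
```

Typing p in the console (or print(p) in a script) draws the plot.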

12
Q

When is it better to use mean?

A

When the distribution is normal

13
Q

When would you choose to use a box plot?

A

You have 2 variables: one numeric (“continuous”) and one categorical
- the categorical variable has more than one group/distribution
- you are interested in seeing whether the groups/distributions are different from each other
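
A minimal box plot sketch, assuming ggplot2 is installed. The data are simulated, and the “medium” group and its values are made up for illustration. The categorical variable goes on x and the numeric variable on y.

```r
library(ggplot2)

# Hypothetical data: reaction times for three word-length groups
set.seed(2)
ldt_df <- data.frame(
  Length_type = rep(c("short", "medium", "long"), each = 50),
  Mean_RT = c(rnorm(50, 700, 80), rnorm(50, 800, 80), rnorm(50, 900, 80))
)

p <- ggplot(ldt_df, aes(x = Length_type, y = Mean_RT)) +
  geom_boxplot()

# One row per box: lower quartile, median (middle), upper quartile, etc.
box_stats <- layer_data(p)
```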

14
Q

In boxplots, which variable goes on which axis?

A

x-variable is categorical
y-variable is continuous

15
Q

How do we approach the data when doing descriptive statistics?

A

with the intention of summarizing the characteristics of the data
ex. calculate mean of a continuous variable (like avg duration of English unstressed vowels)
ex. count how many times something occurs in the dataset (like the frequency of a word in a corpus/book)
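
Both kinds of summary can be sketched in base R. The tiny vowel data set below is invented for illustration.

```r
# Hypothetical measurements: vowel durations in milliseconds
vowels <- data.frame(
  stress = c("stressed", "unstressed", "stressed", "unstressed", "unstressed"),
  duration_ms = c(120, 60, 110, 70, 65)
)

# Mean of a continuous variable for one group
mean_unstressed <- mean(vowels$duration_ms[vowels$stress == "unstressed"])

# Counts of how often each category occurs
stress_counts <- table(vowels$stress)
```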

16
Q

How do we approach the data when doing inferential statistics?

A

with some predictions in mind, running tests to verify the predictions
ex. noticed that the mean duration of stressed vowels was different from that of unstressed vowels in our sample
so, we make a prediction (“infer”) that the 2 means are significantly different from each other in general, not just in our sample

17
Q

What is done during inferential statistics to verify whether predictions are valid?

A

run a significance test

18
Q

What is an example of a null hypothesis?

A

ex. there is NO significant difference between stressed and unstressed vowels in terms of duration
(assumes no relationship between 2 variables and that controlling one variable has no effect on the other)

19
Q

What is done in a study that has a null hypothesis?

A

Make every effort to prove Null hypothesis wrong

20
Q

What is the abbreviation for a Null hypothesis?

A

H0

21
Q

What is an example of an Alternative Hypothesis and how is it abbreviated?

A

“There is a significant difference between stressed and unstressed vowels in terms of duration”
Alternative Hypothesis (H1)

22
Q

Why is a null hypothesis a good way to begin a study?

A

Because it is not starting with a bias (ex. starting with something you want to prove and setting about trying to prove it)

23
Q

What is the question we want to ask if we have data that our null hypothesis might not be true?

A

How small do we want that probability to be before we stop believing our H0?
ie. what is the probability of getting data like ours if our null hypothesis were true?
ex. if the hypothesis is that there is no difference between the mean salaries of carpenters and electricians, and the data show a large difference in their salaries, that probability would be very low.

24
Q

What is the standard threshold (=alpha level) in social science and related fields when it comes to what probability we want to reject a null hypothesis?

A

0.05 (5%)
the p-value from a test is compared against this threshold; both are probabilities, so always between 0 and 1

25
Q

What happens if you get a p-value less than 0.05? What is this called?

A
  • decide that it is very unlikely to see such a difference (if there was no actual difference between two means)
  • thus, only possible alternative is that our beginning assumption was wrong; therefore, the 2 means must be different from each other
  • “accepting the H1 by means of rejecting the H0”
26
Q

What is it called when you don’t disprove the H0?

A

“fail to reject the H0”, can’t accept the H1

27
Q

If a study says something is “significant”, what measurement are they using?

A

a p-value of less than 0.05

28
Q

What does p-value stand for?

A

probability value
must be BELOW 0.05 (not equal to 0.05) to be statistically significant (to reject the null hypothesis)

29
Q

Can you PROVE a null hypothesis?

A

No, you can only reject a null hypothesis or fail to reject it
ex. p-value is 0.1 (0.05 or more), fails to reject null hypothesis

30
Q

What are the steps in performing a hypothesis test?

A

(see slide 17 for flow chart)
1. look at the data type (continuous/categorical); compute summary stats (mean, median, counts, etc)
2. choose the test based on:
- how many DV’s and IV’s you have
- types of your DV’s and IV’s
3. check whether your data meets the assumption for a parametric test
- if it does, run the parametric test
- if it doesn’t, run the non-parametric test
4. See if you can reject the H0
- if yes, can accept your H1 to conclude that you see a significant effect/difference, etc
- if not, you fail to reject the H0: no significant effect/difference was found
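
The steps above can be sketched in base R for the case of one continuous DV and one two-level categorical IV. This is only an illustration: the data are simulated, and the Shapiro-Wilk test is one common normality check, assumed here rather than taken from the slides.

```r
# Hypothetical vowel durations (ms) for two independent groups
set.seed(3)
stressed <- rnorm(30, mean = 120, sd = 15)
unstressed <- rnorm(30, mean = 70, sd = 15)

alpha <- 0.05

# Step 1: summary stats
group_means <- c(stressed = mean(stressed), unstressed = mean(unstressed))

# Step 3: normality check (p above alpha = no evidence against normality)
normal_ok <- shapiro.test(stressed)$p.value > alpha &&
  shapiro.test(unstressed)$p.value > alpha

# Parametric test if the assumption holds, non-parametric otherwise
result <- if (normal_ok) {
  t.test(stressed, unstressed)
} else {
  wilcox.test(stressed, unstressed)
}

# Step 4: reject H0 only if the p-value is below alpha
reject_h0 <- result$p.value < alpha
```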

31
Q

What are parametric statistics (vs. non-parametric)

A
  • parametric (when data normally distributed) ex. mean—more powerful, reliable than non-parametric, results more robust and can be reproduced
  • non-parametric (can work when data not normally distributed) ex. median—can still be useful if you can’t use parametric because your data isn’t normally distributed
32
Q

Memorize flow chart slide 17 (significance testing slides)

A

Need to know for exam

33
Q

What is the purpose of descriptive statistics?

A

to make a summary of the data we have at hand
to explore existing data for patterns

34
Q

What are the types of data involved in descriptive statistics?

A
  1. ratio— A x B, A/B
  2. interval— A + B, A-B
  3. ordinal— A>B or A<B
  4. nominal—A doesn’t equal B
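
One way these levels can show up in R, as a rough sketch with made-up values: nominal data as a plain factor, ordinal data as an ordered factor (which allows < and >), and ratio data as ordinary numbers.

```r
# Nominal: labels only, no order
speaker <- factor(c("native", "non-native", "native"))

# Ordinal: labels with an order, so comparisons make sense
rating <- factor(c("low", "high", "medium"),
                 levels = c("low", "medium", "high"), ordered = TRUE)
low_below_high <- rating[1] < rating[2]

# Ratio: differences and ratios are both meaningful
duration_ms <- c(120, 60)
times_longer <- duration_ms[1] / duration_ms[2]
```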
35
Q

What is nominal data? Examples?

A

When values are simply names or labels—least precise and informative level of measurement
ex. speaker of a language can be:
- native or non-native
- male or female
ex. languages can be
- English vs. French

36
Q

When making a boxplot, which kinds of variables go on which axis?

A

x-axis is categorical variable
y-axis is continuous/numeric

37
Q

What would you type in R to create a subset of your data, for example only keeping the “short” and “long” values from the Length_type column?

A

subset_df <- filter(ldt_df, Length_type == 'short' | Length_type == 'long')
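
A runnable sketch of this subset, assuming filter() here is the one from the dplyr package (the card does not name the package). The small data frame is a hypothetical stand-in for ldt_df. The %in% version is an equivalent shorthand when there are several values to keep.

```r
library(dplyr)

# Hypothetical stand-in for ldt_df
ldt_df <- data.frame(
  Length_type = c("short", "medium", "long", "short"),
  Mean_RT = c(700, 800, 900, 720)
)

# Keep rows where Length_type is 'short' OR 'long'
subset_df <- filter(ldt_df, Length_type == "short" | Length_type == "long")

# Same result, written with %in%
subset_in <- filter(ldt_df, Length_type %in% c("short", "long"))
```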

38
Q

What does this character | mean in R and what is it called?

A

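The vertical bar (often called a “pipe” character); in R it is the logical OR operator
allows us to ask R to keep the rows in which either of the listed conditions is true (here, Length_type is ‘short’ or ‘long’)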
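
A quick base R check of how | behaves next to & (logical AND), using a made-up vector. A single value can never equal both 'short' and 'long' at once, so & keeps nothing here.

```r
x <- c("short", "medium", "long")

# OR: TRUE when either condition holds
keep_or <- x == "short" | x == "long"

# AND: TRUE only when both conditions hold at the same time
keep_and <- x == "short" & x == "long"
```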

39
Q

When would you do a scatterplot?

A
  • you have 2 variables
  • both variables have “continuous” data
  • you are interested in checking if there’s a relationship between 2 variables (linear/non-linear, positive/negative, etc.)
40
Q

When making a scatterplot, which variable goes on which axis?

A

x axis- the “predictor” variable
y axis- the response/dependent variable

41
Q

What do you type in R to create a scatterplot?

A

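add geom_point() to a ggplot() call that assigns both axes
ex. ggplot(df, aes(x = predictor, y = response)) + geom_point()
(df, predictor and response are placeholder names)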
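
A minimal scatterplot sketch, assuming ggplot2 is installed. The data frame and the word_length column are invented for illustration. cor() gives a quick numeric check of the direction of the relationship.

```r
library(ggplot2)

# Hypothetical data: longer words get slower reaction times
set.seed(4)
df <- data.frame(word_length = sample(3:12, 100, replace = TRUE))
df$Mean_RT <- 500 + 30 * df$word_length + rnorm(100, sd = 40)

# Predictor on x, response on y
p <- ggplot(df, aes(x = word_length, y = Mean_RT)) +
  geom_point()

# Positive value = positive linear relationship
r <- cor(df$word_length, df$Mean_RT)
```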

42
Q

What is a paired t-test and when is it used?

A
  • compares means of 2 related groups to check for significant difference between them
  • considered paired because each observation in one group is directly related to an observation in the other group (ie. the same participants tested under 2 conditions, such as pre-test/post-test)
  • assumes the paired differences are normally distributed
43
Q

What is an unpaired t-test?

A
  • compares means of 2 independent groups to determine if there is a significant difference between them
  • groups not related or matched
  • assumes normal distribution
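
Both t-tests can be run with base R's t.test(); the only change for the paired version is paired = TRUE. The scores below are simulated for illustration. Note that for two independent groups t.test() runs Welch's version by default, which does not assume equal variances.

```r
set.seed(5)

# Paired: the same 20 hypothetical participants, before and after
pre <- rnorm(20, mean = 60, sd = 10)
post <- pre + rnorm(20, mean = 5, sd = 3)
paired_result <- t.test(post, pre, paired = TRUE)

# Unpaired: two separate hypothetical groups of 25
group_a <- rnorm(25, mean = 70, sd = 8)
group_b <- rnorm(25, mean = 90, sd = 8)
unpaired_result <- t.test(group_a, group_b)

# The p-values to compare against alpha = 0.05
p_values <- c(paired = paired_result$p.value,
              unpaired = unpaired_result$p.value)
```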
44
Q

What is a paired U-test?

A
  • alternative to paired t-test when distribution not normal (commonly known as the Wilcoxon signed-rank test)
  • non-parametric (uses median values)
  • used for pre-test-post-test experiments
    H0 ex. Median difference between paired samples is zero (no difference)
45
Q

What is an unpaired U-test and when would you use it?

A
  • non-parametric (also known as the Mann-Whitney U test), used when data not normally distributed or small sample size
  • compares 2 independent groups to evaluate difference between distribution of a dependent variable (ex. scores, measurements)
  • variable of interest is either ordinal or interval/ratio that doesn’t meet parametric assumptions
    ex. Group A—teaching method 1, Group B—teaching method 2, look at difference between scores after teaching
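
In base R both non-parametric tests go through wilcox.test(): with paired = TRUE it runs the Wilcoxon signed-rank test, and without it the rank-sum (Mann-Whitney U) test. The skewed scores below are simulated for illustration.

```r
set.seed(7)

# Paired: skewed pre-test scores, every post-test score a bit higher
pre <- rexp(20, rate = 1 / 60)
post <- pre + runif(20, min = 1, max = 10)
signed_rank <- wilcox.test(post, pre, paired = TRUE)

# Unpaired: two independent, skewed hypothetical groups
group_a <- rexp(25, rate = 1 / 50)
group_b <- rexp(25, rate = 1 / 50) + 200
rank_sum <- wilcox.test(group_a, group_b)
```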
46
Q

What is the ANOVA and when is it used?

A
  • Analysis of Variance
  • compare means of 3+ groups, otherwise like t-test
  • looks at variability inside groups and differences between groups
  • ex. H0: all 3 group means are equal
  • ex. 3 fertilizers for plants—plants growth measured, tests to see if significant difference between them
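
The fertilizer example can be sketched as a one-way ANOVA with base R's aov(). The growth values are simulated, and the fertilizer names A, B, C are placeholders.

```r
# Hypothetical plant growth (cm) under three fertilizers, 15 plants each
set.seed(8)
plants <- data.frame(
  fertilizer = factor(rep(c("A", "B", "C"), each = 15)),
  growth_cm = c(rnorm(15, 20, 3), rnorm(15, 25, 3), rnorm(15, 30, 3))
)

# Numeric DV on the left of ~, grouping variable on the right
fit <- aov(growth_cm ~ fertilizer, data = plants)

# First row of the table is the fertilizer effect, second is residuals
anova_table <- summary(fit)[[1]]
p_value <- anova_table[["Pr(>F)"]][1]
```

A p-value below 0.05 only says that at least one group mean differs; it does not say which one.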