Quiz 9 Flashcards
When does it make sense to create a histogram for your R data?
- you have only one variable AND
- the variable has “continuous/numeric” data AND
- you want to check whether the data are normally distributed
What would you type in R to make a histogram of a particular variable (ex. the column “Mean_RT”, mean reaction time)?
library(ggplot2)
ggplot(ldt_df, aes(x = Mean_RT)) +
  geom_histogram(bins = 13, color = "black", fill = "lightblue")
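Because ldt_df is not defined in these notes, here is a minimal sketch that simulates a stand-in data frame so the histogram code can run on its own. The 200 reaction times and their mean (650 ms) and SD (80 ms) are made-up assumptions, not real data. layer_data() is used only to peek at the bins ggplot2 computed.

```r
# Minimal sketch: histogram of a simulated stand-in for ldt_df.
# ASSUMPTION: ldt_df and its values are hypothetical, generated with rnorm().
library(ggplot2)

set.seed(1)  # make the simulated data reproducible
ldt_df <- data.frame(Mean_RT = rnorm(200, mean = 650, sd = 80))

# One numeric variable mapped to the x axis; geom_histogram() picks the plot type
p <- ggplot(ldt_df, aes(x = Mean_RT)) +
  geom_histogram(bins = 13, color = "black", fill = "lightblue")

print(p)  # draw the plot

# layer_data() returns the computed bins: one row per bin, with a count column
bin_counts <- layer_data(p)$count
print(bin_counts)
```

If the bars form a roughly symmetric, bell-shaped outline, that is informal evidence the variable is approximately normal.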
What is the function to save a ggplot?
ggsave("my_histogram.png", height = 5, width = 7, units = "in")  # saves the most recently displayed plot
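A minimal sketch of saving a plot, assuming ggplot2 is installed. Passing plot = explicitly avoids relying on whichever plot was drawn last (ggsave()'s default). The simulated data and the temporary output path are illustrative assumptions, and PDF is used here only because the pdf device ships with every R installation.

```r
# Minimal sketch: build a histogram and save it with ggsave().
# ASSUMPTION: the data are simulated; the file goes to R's temporary directory.
library(ggplot2)

set.seed(2)
ldt_df <- data.frame(Mean_RT = rnorm(100, mean = 650, sd = 80))

p <- ggplot(ldt_df, aes(x = Mean_RT)) +
  geom_histogram(bins = 13, color = "black", fill = "lightblue")

# The file extension (.pdf here, .png in the card above) chooses the format
out_file <- file.path(tempdir(), "my_histogram.pdf")
ggsave(out_file, plot = p, height = 5, width = 7, units = "in")

print(file.exists(out_file))
```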
What does the ‘aes’ function do?
Maps variables in your data to the plot's "aesthetics", ex. what goes on the x and y axes (it can also map variables to color, fill, etc.)
ex. aes(x=Mean_RT)
How many variables do histograms need and how do you assign the variable/s to an axis?
- only one (numeric) variable
ex. aes(x = Mean_RT) - this assigns the mean reaction time variable (Mean_RT) to the x axis (write the column name without quotes)
What do you need to check in your data when you are assigning a variable to an axis when making a histogram?
That it is numeric
Besides assigning the variable using the ‘aes’ function in R, what is needed to make a histogram?
You need to tell R which kind of plot you want, here a histogram, by adding geom_histogram()
What are the different plot options we are using in ggplot?
- histogram
- box plot
- scatter plot
- bar plot
How do you specify what kind of plot you want in R?
add a geom_ function (a “geom”) for that plot type to your ggplot() call
ex. geom_histogram() or geom_boxplot()
What are “bins” in R (ggplot)?
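the bars of a histogram: each bin covers a range of values, and its height shows how many observations fall in that range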
How can you set the outline color and the fill color of your histogram's bins?
ex. geom_histogram(bins = 13, color = "black", fill = "lightblue")
(color sets the outline of each bar and fill sets its inside; bins = 13 also sets the number of bins shown)
When is it better to use the mean?
When the distribution is normal (roughly symmetric, without strong skew or extreme outliers)
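A minimal sketch of why the mean suits normal data: on a symmetric sample the mean and median nearly coincide, while a long right tail pulls the mean above the median. Both samples are simulated (rnorm / rexp) and are assumptions for illustration only.

```r
# Minimal sketch: mean vs. median on simulated symmetric and skewed data.
# ASSUMPTION: both samples are made up (rnorm / rexp), not real measurements.
set.seed(3)

normal_rt <- rnorm(1000, mean = 600, sd = 50)  # roughly symmetric
skewed_rt <- 500 + rexp(1000, rate = 1 / 100)  # long right tail

# For symmetric data the two measures nearly coincide
cat("normal: mean =", round(mean(normal_rt), 1),
    " median =", round(median(normal_rt), 1), "\n")

# For right-skewed data the tail pulls the mean above the median
cat("skewed: mean =", round(mean(skewed_rt), 1),
    " median =", round(median(skewed_rt), 1), "\n")
```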
When would you choose to use a box plot?
You have 2 variables: one numeric (“continuous”) and one categorical
- categorical variable has more than one group/distribution
- interested in seeing whether groups/distributions are different from each other
In boxplots, which variable goes on which axis?
x-variable is categorical
y-variable is continuous
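A minimal box plot sketch following the axis rule above: the categorical variable on x, the continuous one on y. vowel_df and its durations (in ms) are hypothetical, simulated values chosen only to echo the vowel example used elsewhere in these cards.

```r
# Minimal sketch: box plot with a categorical x and a continuous y.
# ASSUMPTION: vowel_df and its durations (ms) are simulated, not real data.
library(ggplot2)

set.seed(4)
vowel_df <- data.frame(
  stress   = rep(c("stressed", "unstressed"), each = 30),  # categorical
  duration = c(rnorm(30, mean = 120, sd = 20),             # continuous
               rnorm(30, mean = 80,  sd = 20))
)

p_box <- ggplot(vowel_df, aes(x = stress, y = duration)) +
  geom_boxplot()

print(p_box)

# One box per group: layer_data() has one row per level of 'stress'
print(nrow(layer_data(p_box)))
```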
How do we approach the data when doing descriptive statistics?
with the intention of summarizing the characteristics of the data
ex. calculate the mean of a continuous variable (like the average duration of English unstressed vowels)
ex. count how many times something occurs in the dataset (like the frequency of a word in a corpus/book)
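The two descriptive summaries in the card above map onto base R's mean() and table(). In this minimal sketch the durations are simulated and the word list is a made-up toy corpus; both are assumptions for illustration.

```r
# Minimal sketch of two descriptive summaries: a mean and a frequency count.
# ASSUMPTION: durations are simulated; 'words' is a hypothetical toy corpus.
set.seed(5)
unstressed_dur <- rnorm(50, mean = 80, sd = 15)  # vowel durations in ms

# 1) Average of a continuous variable
avg_dur <- mean(unstressed_dur)
print(avg_dur)

# 2) Frequency count: how often each word occurs
words <- c("the", "cat", "sat", "on", "the", "mat", "the", "end")
word_freq <- table(words)
print(word_freq)

the_count <- as.integer(word_freq["the"])  # count for one word
print(the_count)
```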
How do we approach the data when doing inferential statistics?
with some predictions in mind, and run tests to check whether the predictions hold
ex. we noticed that the mean duration of stressed vowels was different from that of unstressed vowels in our sample
*so, we predict (“infer”) that the 2 means are significantly different from each other, i.e., that the difference is not just due to chance
What is done during inferential statistics to verify whether predictions are valid?
run a significance test
What is an example of a null hypothesis?
ex. there is NO significant difference between stressed and unstressed vowels in terms of duration
(it assumes there is no relationship between the 2 variables, i.e., that changing one variable has no effect on the other)
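One common significance test for this vowel-duration null hypothesis is a two-sample t-test; R's t.test() runs Welch's version by default. This minimal sketch assumes simulated durations (not real measurements) and the conventional 0.05 cutoff, which is a choice rather than a rule. A p-value below the cutoff is taken as evidence against the null hypothesis of no difference.

```r
# Minimal sketch: testing the null hypothesis "no difference in duration
# between stressed and unstressed vowels" with a two-sample t-test.
# ASSUMPTION: durations (ms) are simulated; alpha = 0.05 is a convention.
set.seed(6)
stressed   <- rnorm(30, mean = 120, sd = 20)
unstressed <- rnorm(30, mean = 80,  sd = 20)

result <- t.test(stressed, unstressed)  # Welch's t-test by default
print(result)

alpha <- 0.05
if (result$p.value < alpha) {
  cat("Reject the null hypothesis: the mean durations differ significantly.\n")
} else {
  cat("Fail to reject the null hypothesis.\n")
}
```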