Research 2 Exam 1 Study Guide Flashcards
What key lessons did you learn from the statistics readiness quiz?
Stats isn’t super complex math; you only need to be able to do PEMDAS (order of operations).
Read Instructions
What are the levels of measurement? Give your OWN examples of each.
Nominal - categorical, not numerical; scores represent a category. Example: Do you like pizza? Yes or no.
Ordinal - ordering/ranking things. Example: Rank these cereals: Fruity Pebbles, Rice Krispies, Chex Mix.
Interval - no absolute zero; numerical responses reflect the actual amount of the variable, and differences between units are the same across the number line (e.g., a Likert scale). Example: How much do you like cats? 1, 2, 3, 4, 5, 6, 7
Ratio - there is an absolute zero; differences between units are the same, and numerical responses reflect the actual amount of the variable. Example: How many naps did you take today?
Distinguish between categorical and continuous variables. Give your OWN example of each that measures the same topic.
Categorical (nominal) means the numbers represent categories. For continuous variables (interval and ratio), the value is numerical and can be any value, including a fraction or decimal.
Categorical: Do you like to eat sushi? Continuous: On a scale of 1 (not at all) to 7 (a lot), how much do you like sushi?
Which type of variable (categorical or continuous) is typically better to use? why?
Continuous is better because you get more specific data and it gives you more info/context.
What’s your favorite football team? Is that a nominal or a categorical variable?
My favorite football team is the Giants. That is a nominal and a categorical variable because they mean the same thing.
Our class has a PAL. What’s her name?
Julia Macey
Be able to generate survey question items based on key parameters (e.g., categorical variable with 4 levels, etc.)
What is your favorite brand? Nike, Adidas, Puma, Nautica
How do you take a screenshot?
Windows + Shift + S
In your own words, what’s the difference between heuristics and algorithms? Give your own example of how you could use each to solve the same problem.
Heuristics are mental shortcuts (e.g., the representativeness and availability heuristics).
Algorithms follow steps; they are systematic, slow, and intentional.
Putting together Ikea furniture. The heuristic (shortcut) way would be to just look at the picture and try to figure it out to get it together quickly. The algorithm way would be to read and follow all the directions.
What are illusory correlations? How does research and stats help address them?
Correlations we make between two things that are not actually connected. Research and stats help address this by showing that the correlation we assumed was not actually true.
How would stats help you in your future career?
Stats will help me in my future career by making my skill set more competitive, and employers may pay me more. The skill set is important because of the medical files I will have to explain to the hospitalized child. There may be graphs and numerical results that I should be able to interpret. (For example, if given a graph, I can read it and make sure it isn’t a misleading graph.)
A group of 50 people act odd on the day of the full moon. The moon obviously causes them to do this right? Why/why not?
No, it does not obviously cause them to do this, because the number 50 has no context. I would need to know a lot more information, such as 50 out of how many, and how many people act odd on days other than full moons. The context is important because the answer could be 50/50, or it could be 50/1,256, or even 50/2,374,805. Additionally, if on new moons the number of people acting odd is 75/75, 345/879, or 7,809/5,683,583, it would again change the perception and significance of the number. I would also want a percentage and a measure of variability. It would be better if I knew that 1% (SD = 1.45) of people acted odd on full moon days and 1% (SD = 1.51) on new moon days.
From the video, she mentioned 3 questions to ask that help spot a bad statistic. Describe each in your own words.
Can you see uncertainty? Polls are not reliable; they cannot accurately predict who will win. Charts and averages can be misleading. Can I see myself in the data? You need context. How much pee is a lot? Change the scale or zoom out. Did you know the male unemployment rate is higher than the female rate? How was the data collected? Methodologies differ, such as how they operationalize a definition, and that affects replicability. Who is completing the survey: is it a representative group of people, or did you let anyone answer, even non-Muslims about jihad? Did you ask biased people in the company how much they like it there?
NON-complicated like a grandma.
Give your own example of how everything is counting/stats are all around.
When I wake up I check the time; it is 7:20 a.m. Then I estimate how much longer I can stay in bed, about 5 minutes. When the time reaches 7:25, I get up and go to the bathroom. Then I get dressed, choosing between two different outfits. Finally, I go to the kitchen and grab a 100% apple juice bottle and a bagel. The time is now 8:15 and I have to get to class, so I get my headphones and the message tells me that the battery is about 70%. I select my playlist of 100 songs and walk across the grass to class. It takes me about 4 minutes, which is equivalent to 2 songs, so I arrive at 8:19. Now I sit and wait for the rest of the class and the professor to arrive. By 8:25 the majority of the students are here. At 8:29, the professor walks in and sets up his PowerPoint. I finish up my last song, shut off my headphones, and listen to the lecture.
Distinguish between descriptive and inferential statistics.
Descriptive statistics are a branch of statistics used to summarize the basic characteristics of a sample dataset. Inferential statistics are a branch of statistics used to draw conclusions about a population based on the data from a sample. Descriptive statistics tell us what is happening but not why, and we cannot use them alone to make inferences about a population. Inferential statistics give more context, so they can help explain why something is happening, and their conclusions are generalizable to the population.
Distinguish between Variables vs. Values vs. Scores using your own example.
A variable is anything that can change. Example: How much do you like popcorn? (1-7) A value is any possible outcome; in this case the possible values are 1, 2, 3, 4, 5, 6, 7. A score is a participant’s individual result on a variable. Example: Participant #1 answered 3.
Be able to identify independent and dependent variables in a research question (Are people who like statistics more awesome than those who don’t?) AND know if they are categorical/nominal or continuous.
Are people who like statistics more awesome than those who don’t? More/less = continuous
DV = awesomeness = continuous (it’ll always be for our class) = ratio? Could you have 0 awesomeness? IV = stats lovers = continuous (more/less is a continuous rating system/scale) = ratio; sad but true, a person could hate stats and rate it zero.
When you see any kind of numbers reported, what do we need to know, or what questions do you have before making sense of these numbers?
From the TikTok example, I would first want to understand the question and what it is asking. What is the operationalized definition of followers? Are they physical, real-life people following another person? Because then 10 would be a whole lot. Is it followers on YouTube? Then it would be small. Additionally, I would like to know information such as out of how many, a measure of central tendency, and a measure of variability such as the standard deviation. If the average YouTuber’s follower count is 25, then 10 is pretty good. If the average is 100, then your number is small. Even smaller if 1,000. Then if we know that the average person has 10/100 followers, or 10%, with a standard deviation of 57, that would also change how we think of the number, to being average.
What’s better, percentages or frequency counts? Why?
Percentages are better than frequency counts because percentages give us more context. A frequency count only gives us a number; we would want to know more, like how many that is out of, before placing value on what the number could mean. A count of 50 on its own means nothing. On the other hand, 50% would be pretty significant, because we are given much more context. 50% isn’t perfect because it could be out of 2 people, out of 10 people, out of 800 people, etc. It definitely is better though.
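To make the point about context concrete, here is a minimal sketch in Python that converts the same count of 50 into a percentage against different totals. The helper name percentage is hypothetical, and the totals are the illustrative ones from the full moon answer earlier in this guide.

```python
# Hypothetical helper: turn a frequency count into a percentage of the total.
def percentage(count: int, total: int) -> float:
    if total <= 0:
        raise ValueError("total must be positive")
    return count / total * 100


# The same count of 50 means very different things depending on the total.
for total in (50, 1_256, 2_374_805):
    print(f"50 out of {total:,} = {percentage(50, total):.4f}%")
```

The count never changes, but the percentage goes from everyone in the group to a tiny fraction of it, which is why a count alone is hard to interpret.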
a) Distinguish between unimodal and bimodal distributions. (i.e., what do these look like?) b) Distinguish between positive and negative skew.
A unimodal distribution is a distribution that has one “peak,” representing the most frequently occurring value. One example is a normal distribution (bell curve).
A bimodal distribution is a distribution that has two “peaks,” indicating the two most frequently occurring scores. In a histogram, you would see two distinct high points.
In a positive skew, values on the left side of the distribution are more frequent than values on the right, so the tail stretches out to the right.
In a negative skew, values on the right side of the distribution are more frequent than values on the left, so the tail stretches out to the left.
What does it mean to cherry pick data?
Pull what you want: choose what supports your claim, not what goes against it. For example, 5% got 100%, so show that, but hide that 60% got under 75%. Only focus on what you want.
What are the common issues with misleading graphs?
Failure to use equal intervals and exaggeration of proportions: the axis does not start at zero, the sizes do not match the numbers (the numbers have to add up to the stated proportions/percentages), or the wrong type of graph is used.
Find a bad graph, indicate what the problem(s) are. (On the exam I could also ask you to create a bad graph and label the problems or give you a bad graph and ask you to fix it)
Mar 2003, Jun 2004, August 2005 (unequal intervals on the axis)
50,000, 60,000 70,000 (axis does not start at 0)
50% + 13% + 23% + 24% (adds up to 110%, not 100%)
What’s your favorite season? fall 25% winter 25% spring 25% summer 25% (should be a bar graph not a pie chart)
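One quick check for the percentage example above (50%, 13%, 23%, 24%) is to add up the slices. This is a minimal sketch; the function name sums_to_100 is a hypothetical helper, not something from the course materials.

```python
# Hypothetical helper: do the slices of a pie chart add up to 100%?
def sums_to_100(slices):
    return sum(slices) == 100


bad_pie = [50, 13, 23, 24]        # the slices from the bad-graph example
seasons = [25, 25, 25, 25]        # the favorite-season example

print(sum(bad_pie), sums_to_100(bad_pie))   # 110 False
print(sum(seasons), sums_to_100(seasons))   # 100 True
```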
A bar chart depicts what type of data? A histogram depicts what type of data?
A bar chart depicts categorical data: “A chart for displaying frequencies of nominal data.” A histogram depicts continuous data: “A visual representation of data for a single variable that uses bars to chart values on the x-axis and shows frequencies on the y-axis.”
a) Why do we create a codebook? b) On what page in the Guide to Writing can you read about how to do a codebook? c) On what page can you find an example? d) Why am I asking these last two questions? e) Do we need to label levels for categorical or continuous? Why? f) Why do we make some variables string in SPSS?
a) We create a codebook to organize and clearly label the questions. As a group, it keeps us uniform in naming. It is used later for reference, almost like a dictionary, to know what each question or label means. When necessary, you can see the whole question, not just selfesteem01, selfesteem02. b) You can find creating a codebook at hyperlink 6 = page 7. c) You can find an example on the following page, 8. d) You are asking these questions so that we know where to find information on codebooks in case we need it. This proves we know how to use the guide. e) We need to label levels for categorical data because we need to know what each category means. For example, when we need to turn words into numbers for SPSS, yes could be 1 and no could be 2, and that needs to be labeled. Continuous data uses the number given, so it does not need to be labeled. f) We need to make some variables string in SPSS to allow us to type in words, which we would want to do because we do not yet know what we are looking for or what to expect from the results. It is helpful to keep all data together instead of leaving it out, and we cannot simply label it 1 or 2.
In your own words, what does the mean do? What makes it theoretical?
The mean finds the statistical average. What makes it theoretical is that it may not match any actual score (no one has 1.83 kids) and that outliers can pull the mean toward them.
What is the formula for a mean? (In general, make sure you know what symbols stand for and what the formulas are doing) Which piece provides the context?
The formula for the mean is M = ΣX / N. In words, you add up all the scores (ΣX) and then divide by the number of scores (N).
The piece that provides the context is N (the number of scores), the denominator that ΣX is divided by.
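As a minimal sketch, the formula can be written out in Python so each symbol maps to one piece of code. The scores below are made up for illustration.

```python
# M = ΣX / N, written out step by step.
def mean(scores):
    n = len(scores)              # N: the number of scores
    if n == 0:
        raise ValueError("need at least one score")
    sigma_x = sum(scores)        # ΣX: add up all the scores
    return sigma_x / n           # M: the mean


scores = [3, 5, 4, 6, 2]         # made-up scores (X)
m = mean(scores)
print(m)                         # 20 / 5 = 4.0
```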
How are outliers a problem when using the mean?
Outliers are a problem because they influence the mean and pull it towards the outliers.
When are we more likely to use a mode, with categorical or continuous data?
We are more likely to use a mode with categorical data.
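A short sketch of why the mode fits categorical data: you cannot add up team names, but you can count them. The answers below are made up for illustration.

```python
from collections import Counter

# Made-up answers to "What's your favorite football team?"
answers = ["Giants", "Jets", "Giants", "Eagles", "Giants", "Jets"]

counts = Counter(answers)                       # frequency of each category
mode_team, mode_count = counts.most_common(1)[0]

print(mode_team, mode_count)                    # Giants 3
```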
What’s better at handling outliers, medians or means? why?
Medians are better at handling outliers because a score’s value does not matter as much as its position in the order. For example, when calculating the mean, you have to add in the outlier’s score, which could be an additional 60 on its own, so it significantly changes the mean. For the median, however, if the outlier is 1, it just goes to the left end of the scores ordered from least to most, and the middle position shifts by only one place, so the median does not move much.
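Here is a small sketch of that comparison using Python's built-in statistics module. The scores are made up; the outlier of 60 echoes the example in the answer.

```python
import statistics

scores = [2, 3, 3, 4, 5]          # made-up scores with no outlier
with_outlier = scores + [60]      # the same scores plus one outlier of 60

for label, data in (("without outlier", scores), ("with outlier", with_outlier)):
    m = statistics.mean(data)
    mdn = statistics.median(data)
    print(f"{label}: mean = {m:.2f}, median = {mdn}")
```

The mean jumps from 3.40 to 12.83, while the median only moves from 3 to 3.5.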
If you read that the average student has been to 3 football games. How can that number lie?
If the average number is 3, we need the standard deviation. That would help show whether there are outliers, which is one way the number can lie. Maybe 42 students have never been to a football game, but 1 girl has been to 129 games, so the mean is still 129/43 = 3. The mean was pulled up by the one outlier, which is why we would want the standard deviation to be able to notice those types of things.
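A sketch of how the standard deviation exposes this, using hypothetical numbers chosen so that the mean comes out to exactly 3: 42 students with 0 games and 1 student with 129 games. It uses the population standard deviation (pstdev) from Python's statistics module; a course may use the sample version instead.

```python
import statistics

# Hypothetical data: 42 students with 0 games, 1 student with 129 games.
games = [0] * 42 + [129]

avg = statistics.mean(games)       # 129 / 43 = 3
mdn = statistics.median(games)     # the middle score is 0
sd = statistics.pstdev(games)      # population SD, about 19.44

print(f"mean = {avg}, median = {mdn}, SD = {sd:.2f}")
```

A standard deviation more than six times the size of the mean is a strong hint that "3 games" does not describe the typical student.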
A new movie gets 3 out of 5 stars. How could this be good? How could it be bad?
A new movie getting 3 stars (out of 5) could be good if 4578 people voted it 4 or 5 stars and 23 people gave it 1 star. It could be bad if 3782 voted it one or two stars and 14 gave it five stars. With the same logic, the mean is easily influenced by outliers, so although most people give it a high review, a couple of outliers hated it and influenced the average. The same is true vice versa, where most people said the movie was terrible but some outliers like low-budget movies and thought it was hilarious and deserved five stars. That influences the mean. What you would want to ask for is the median, to see what the central tendency looks like regardless of outliers, and a measure of variability.
Give your own example of a “flaw of averages” or when a mean is misleading.
One example of a “flaw of averages,” or when a mean is misleading, is rainfall in the U.S. The average rainfall may be 3 inches, but some states like Maine may actually get 12 inches of rain while Arizona gets 0 inches. Or the average snowfall may look moderate, but it all falls in Colorado, not NJ.
When reporting income levels, why is it common to use medians?
It is common to report income levels with medians because they are not as influenced by outliers like means are.
What measure of central tendency is most common for categorical data?
The measure of central tendency most common for categorical data is mode.
(Use what you know about means and the examples we did in class to answer this) You’re thinking of going to grad school, and the school says that the average student gets $20,000 of financial aid a year. Based on what you know about averages, what questions should you have based on that figure?
What is the median? What is the variability? How many students were surveyed? Were there any outliers?
The book section of Becoming a Better Consumer is a nice review of much of what we discussed in class.
Page 94!! The median does not equal the mean; the flaw of averages; averages are theoretical (they do NOT reflect reality; no one has 1.83 kids). What variability does and does not tell you (it indicates how much scores vary; we know the WHAT, NOT the why). Interpreting graphs and charts from research articles (do not read only the abstract; you need to see the tables and figures, since more info means more context, and it is all important).
Fun with symbols X=
score
Fun with symbols N=
number of scores
Fun with Symbols M=
mean