Chapter 2 - Frequency Distributions Flashcards
What are raw scores?
The data gathered from participants: all the numbers that have not yet been organized, graphed, or cleaned up.
WHY not use raw data?
* Finding a pattern in raw data is difficult
* We want to visualize and summarize the data
* Need to also inspect for outliers and for data entry errors.
What are the steps to create a frequency distribution table and a grouped frequency table?
- Frequency Distribution Table = a visual depiction of data that shows how often each value occurred (how many scores fell at a given value, e.g. how many students got exactly 7 hours of sleep? 5 hours?). SEE PIC BELOW for how it’s done.
- Grouped Frequency Table = groups the data into intervals. 2 reasons to use one:
1. when data has a large range of potential values (like IQ going from 70 - 149 ) see table on next card
2. When the data has decimal points (is continuous)
Principles to keep in mind for a Grouped Table:
a) determine the full range of the data and include the values that have zero frequency. Full range = Top Value - Bottom Value + 1, e.g. 8 - 3.5 = 4.5, then + 1 = 5.5
b) aim for between approx. 5-10 intervals (no less than 5, no more than 15)
c) for continuous data, use lower and upper limits (the lowest and highest possible values)
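A minimal Python sketch of both tables. The sleep scores and the function names are made up for illustration and do not come from the chapter. The grouped table assumes each interval starts at a multiple of the interval width and runs up to, but not including, the next lower limit.

```python
from collections import Counter
import math

def frequency_table(scores):
    """Rows of (value, frequency) from highest to lowest value,
    keeping whole values that never occurred (frequency 0)."""
    counts = Counter(scores)
    low, high = min(scores), max(scores)
    return [(v, counts.get(v, 0)) for v in range(high, low - 1, -1)]

def grouped_frequency_table(scores, width):
    """Rows of (lower, upper, frequency) for intervals lower <= score < upper."""
    lower = math.floor(min(scores) / width) * width  # start at a multiple of width
    rows = []
    while lower <= max(scores):
        upper = lower + width
        freq = sum(1 for s in scores if lower <= s < upper)
        rows.append((lower, upper, freq))
        lower = upper
    return rows

# Hypothetical whole-number data (hours of sleep).
sleep_hours = [7, 5, 7, 8, 7, 3, 7, 8, 5, 3]
freq_rows = frequency_table(sleep_hours)

# Hypothetical continuous data (hours of sleep with decimals).
sleep_decimal = [3.5, 6.25, 7.0, 7.5, 5.75, 8.0, 6.5, 7.25]
grouped_rows = grouped_frequency_table(sleep_decimal, width=1)

for row in freq_rows:
    print(row)
for row in grouped_rows:
    print(row)
```

Note that the values 6 and 4 appear in the first table with a frequency of 0, which is what principle a) asks for.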
Frequency Distribution Table
GROUPED FREQUENCY DATA
GROUPED FREQUENCY TABLE (the data initially)
GROUPED FREQUENCY TABLE - for continuous data
HISTOGRAM - for continuous data
What is PIE CHART?
When you want to show proportions of the whole picture.
What is a BAR GRAPH?
Visual depictions of data when the independent variable is nominal and the dependent variable is interval (specifically, scale).
TWO WAYS:
- Present frequency or proportion data. EX: a graph showing the % of girls and boys getting over 9 hours of sleep per night.
- Present mean or average values. EX: the previous graph shows the mean score of the two variables, neutral and emotional. The black stick bars on top are ‘standard error bars’.
EX: develop a chart demonstrating the cost of tuition (dep. variable) for 3 types of schools - public, semi-public, & private (indep. variable)
What is a SCATTERPLOT?
Used to depict the relationship between 2 scale variables
ex: amount of abdominal fat & dementia symptoms
What is a HISTOGRAM?
Histogram bar graph
A histogram is a bar graph of data that shows the frequency of each value of a variable. Same info as a frequency table, but visualised differently.
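As a rough illustration of "same info, visualised differently", the sketch below turns frequency counts into a sideways text histogram. The scores are made up and `text_histogram` is a hypothetical helper name. A real histogram would normally draw vertical bars.

```python
from collections import Counter

def text_histogram(scores):
    """One line per whole value from lowest to highest; bar length = frequency."""
    counts = Counter(scores)
    lines = []
    for value in range(min(scores), max(scores) + 1):
        lines.append(f"{value:>2} | " + "#" * counts[value])
    return "\n".join(lines)

# Hypothetical hours-of-sleep scores.
hist = text_histogram([7, 5, 7, 8, 7, 3, 7, 8, 5, 3])
print(hist)
```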
What is the Biased Scale Lie?
- When the choices are biased towards an outcome, such as when a scale has ‘Not Satisfactory, Good, Excellent, Truly Superior’ and there are almost no negative ratings on there! Another example is ‘Rate Toronto as 1st, 2nd, 3rd, or 4th’, after which the person reports ‘Toronto is in the top 4 cities in Canada!’. It is set up to have a biased outcome.
What is the Sneaky Sample Lie?
- When people self-select to participate and it’s not randomized sampling. Sometimes there is a dichotomy among the data because people had either very good or very bad experiences (Travel Advisor, Rate my Professor, Yelp).
What is an Interpolation Lie?
- When a line is drawn between data points that have been selectively placed on the graph.
What is an Extrapolation Lie?
- When a line is drawn outside of the data points and the graph assumes the model line will go down, up, or across.
What is an Inaccurate Value Lie?
- Uses scaling to distort the graph data. Looking at the pic below, the Tim Hortons and the Starbucks graphs use different scales, so the whole thing is hard to read at a glance! (Should start at 0 and label the scales)
All of these need to have representative sampling.
What is a normal distribution?
A distribution whose graph shows the typical bell curve: symmetrical, with most of the participants’ scores in the middle of the graph.
How do positively skewed distributions and negatively skewed distributions deviate from a normal distribution?
Instead of being a ‘normal’ graph with the bell graph in the middle, there is a tail to one side. It is non-normal and non-symmetrical.
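POSITIVE skew: the tail extends to the right (toward higher values); generally has ‘floor’ effects
NEGATIVE skew: the tail extends to the left (toward lower values); generally has ‘ceiling’ effects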
What is the benefit of creating a visual distribution of data rather than simply looking at a list of the data?
To see the shape of the distribution, which is hard to judge from a list of numbers.
What is a floor effect and how does it affect a distribution?
A situation in which a constraint prevents a variable from taking values below a certain point. Scores pile up at the low (LEFT) side of the graph and the tail extends to the right (positive skew).
CALCULATING STATS:
What is 63 out of 1264 as a percentage?
What is 2 out of 88 as a percentage?
What is 7 out of 39 as a percentage?
What is 122 out of 300 as a percentage?
What type of variable (nominal, ordinal, scale) are these data as counts?
What kind of variable are they as percentages?
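A small sketch of the percentage arithmetic for the four counts above, assuming the results are reported to two decimal places. The helper name `percent` is made up for illustration.

```python
def percent(count, total):
    """Convert 'count out of total' to a percentage, rounded to 2 decimals."""
    return round(100 * count / total, 2)

cases = [(63, 1264), (2, 88), (7, 39), (122, 300)]
results = [percent(count, total) for count, total in cases]

for (count, total), pct in zip(cases, results):
    print(f"{count} out of {total} = {pct:.2f}%")
```

This should give 4.98%, 2.27%, 17.95%, and 40.67% respectively.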
Report these to only 2 decimal places:
1888.999
2.6454
0.0833
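One way to check the rounding, sketched with Python's `decimal` module so that the values are rounded exactly as written (round-half-up is an assumption here, since the card does not name a rounding rule). The helper name `to_two_decimals` is made up.

```python
from decimal import Decimal, ROUND_HALF_UP

def to_two_decimals(text):
    """Round a number given as text to exactly 2 decimal places (half rounds up)."""
    return str(Decimal(text).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))

rounded = [to_two_decimals(x) for x in ["1888.999", "2.6454", "0.0833"]]
print(rounded)
```

This should give 1889.00, 2.65, and 0.08.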
On a test of marital satisfaction, scores could range from 0 to 27:
1. What is the full range of data, according to the calculation procedure described in this chapter?
2. What would the interval size be if we wanted six intervals?
3. List the 6 intervals
If you have data that range from 2 - 68 and you want seven intervals in a grouped frequency table, what would the intervals be?
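A sketch of the interval arithmetic for the two questions above. The "+ 1" in the full range comes from the principles earlier in these cards. The other two steps are assumptions and are not stated on the cards: the interval size is the full range divided by the number of intervals, rounded to the nearest whole number, and the bottom interval starts at a multiple of that size. The function name is hypothetical.

```python
def grouped_intervals(lowest, highest, n_intervals):
    """Return (full_range, interval_size, list of (lower, upper) intervals)."""
    full_range = highest - lowest + 1
    # Assumption: round the interval size to the nearest whole number.
    size = int(full_range / n_intervals + 0.5)
    # Assumption: the bottom interval starts at a multiple of the interval size.
    start = (lowest // size) * size
    intervals = [(start + i * size, start + (i + 1) * size - 1)
                 for i in range(n_intervals)]
    return full_range, size, intervals

marital = grouped_intervals(0, 27, 6)
other = grouped_intervals(2, 68, 7)
print(marital)
print(other)
```

Under these assumptions, the marital satisfaction scores have a full range of 28 and an interval size of 5 (0-4 up to 25-29), and the 2 - 68 data have an interval size of 10 (0-9 up to 60-69). For other inputs the rounded size can leave the top score uncovered, so check the last interval.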
A grouped frequency table has the following intervals:
30-44
45-59
60-74
If converted into a histogram, what would the midpoints be?
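A short sketch of the midpoint calculation, assuming each interval is treated as running from its lower value up to the lower value of the next interval (30 up to 45, and so on), which makes the width 15. If you instead average the two listed endpoints, you get 37, 52, and 67, so check which convention your course uses.

```python
def midpoints(lower_bounds, width):
    """Midpoint of each interval = lower bound + half the interval width."""
    return [lower + width / 2 for lower in lower_bounds]

mids = midpoints([30, 45, 60], width=15)
print(mids)
```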
Referring to the grouped frequency table (2.6), how many countries had at least 30 volcanoes?
Referring to the histogram (2.1), how many countries had one or two volcanoes?
If the average person convicted of murder killed only 1 person, serial killers would create what kind of skew?
Would the data for number of murders by those convicted of the crime be an example of a floor effect or a ceiling effect?
A researcher collects data on the ages of university students. As you have probably observed, the distribution of ages clusters around 19 - 22 yrs, but there are extremes on both the low end (high school prodigies) and the high end (non-traditional students returning to school):
- What type of skew might you expect for such data?
- Do the skewed data represent a floor effect or a ceiling effect?
If you have an Instagram account, you are allowed to follow up to 7500 other accounts. At that point, Instagram cuts you off, and you have to unfollow people to add more. Imagine you collected data from Instagram users at your university about the number of accounts each one follows:
- What type of skew might you expect for such data?
- Do the skewed data represent a floor effect or ceiling effect?
APPLYING THE CONCEPTS:
Frequency tables, histograms, and the National Survey of Student Engagement: The National Survey of Student Engagement (NSSE) surveys U.S. first-year university students and seniors about their level of engagement in campus and classroom activities that enhance learning. Hundreds of thousands of students at almost 1000 schools have completed surveys since 1999, when the NSSE was first administered. Among the many questions, students are asked how often they have been assigned a paper of 20 pages or more during the academic year. For a sample of 19 institutions classified as national universities that made their data publicly available through the U.S. News & World Report Web site, here are the percentages of students who said they were assigned between 5 and 10 twenty-page papers:
0 5 3 3 1 10 2
2 3 1 2 4 2 1
1 1 4 3 5
a. Create a frequency table for these data. Include a third column for percentages.
b. For what percentage of these schools did exactly 4% of the students report that they wrote between 5 and 10 twenty-page papers that year?
c. Is this a random sample? Explain your answer.
d. Create a histogram of grouped data, using six intervals.
e. In how many schools did 6% or more of the students report that they wrote between 5 and 10 twenty-page papers that year?
f. How are the data distributed?
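A sketch for checking parts a, b, d, and e, using the 19 values listed in the exercise. The interval size of 2 for part d is an assumption (full range of 11 divided by six intervals, rounded to a whole number), and the variable names are made up.

```python
from collections import Counter

# Percentages reported by the 19 institutions (copied from the exercise).
percentages = [0, 5, 3, 3, 1, 10, 2,
               2, 3, 1, 2, 4, 2, 1,
               1, 1, 4, 3, 5]

counts = Counter(percentages)
n = len(percentages)

# Part a: one row per value, highest to lowest, keeping zero-frequency values.
table = [(value, counts[value], round(100 * counts[value] / n, 2))
         for value in range(max(percentages), min(percentages) - 1, -1)]

print("value  frequency  percent")
for value, freq, pct in table:
    print(f"{value:>5} {freq:>10} {pct:>8.2f}")

# Part e: number of schools where 6% or more of students reported this.
schools_6_or_more = sum(freq for value, freq, _ in table if value >= 6)

# Part d: six intervals of size 2 (0-1, 2-3, ..., 10-11), an assumed grouping.
grouped = [(low, low + 1, counts[low] + counts[low + 1])
           for low in range(0, 12, 2)]
print(grouped)
```

For part b, read the percent column in the row for the value 4.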