Module 5- Descriptive Statistics Flashcards

1
Q

Descriptive Statistics

A
  • summarizing our data set to better understand and communicate important information
  • helps researchers identify and communicate important characteristics about the empirical data
2
Q

Raw Scores

A
  • data resulting from our measurement procedures
  • not very informative on their own
  • ex. listing all the scores from the quiz

Instead, using descriptive statistics, we could communicate performance on the quiz with a class average.

3
Q

Frequency Distribution

A
  • vital for describing data
    -quick way to summarize how many scores were observed at each data point
  • type of frequency distribution used depends on the level of measurement
  • x axis; observations of the variable in question
  • y axis; frequency of each observation
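A minimal Python sketch of a frequency distribution, assuming a small made-up set of quiz scores (the data and variable names are illustrative, not from the module). Each score plays the role of the x axis and the row of stars shows its frequency:

```python
from collections import Counter

# Hypothetical quiz scores out of 10 (illustrative data only)
scores = [7, 8, 8, 9, 7, 8, 10, 6, 8, 9]

# Count how many times each score was observed
freq = Counter(scores)

# Text version of the graph: score on the "x axis", frequency as stars
for score in sorted(freq):
    print(score, "*" * freq[score])
```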
4
Q

Bar Graphs

A
  • used for data representing discrete categories (distinct/ non overlapping categories)
  • summarizes nominal or categorical data
  • can also be used for interval and ratio data but not often
5
Q

Frequency Polygon

A
  • graph continuous data
  • interval and ratio data
  • not used for nominal data bc there is no assumption of equal intervals, so data points cannot be connected with a continuous line
  • the line connecting the points represents equal intervals bw data points
6
Q

Grouped Bar Graph

A
  • taking ratio data and grouping it into categories
  • grouping continuous data into categories
    ex. scores of quiz, group people scores of 70-79% together into a bar
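One possible way to group continuous scores into categories, sketched in Python. The percentages and the helper name `bin_label` are hypothetical, and 10-point bands are an assumption based on the 70-79% example:

```python
from collections import Counter

# Hypothetical quiz percentages (illustrative data only)
percents = [72, 85, 91, 78, 66, 74, 88, 79, 95, 70]

def bin_label(p):
    """Return the 10-point band a percentage falls into, e.g. 74 -> '70-79'."""
    low = (p // 10) * 10
    return f"{low}-{low + 9}"

# Count how many scores land in each band; each band would be one bar
grouped = Counter(bin_label(p) for p in percents)
for label in sorted(grouped):
    print(label, "*" * grouped[label])
```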
7
Q

Frequency distribution tells us

A
  • number of observations at each data point
  • normal vs skewed data
8
Q

Normal Distribution

A
  • symmetrical bell curve
  • ex. IQ, height, weight
  • the majority of scores are in the middle, with fewer observations at the ends/ extremes
  • most observations around the mean
9
Q

Skewed Distribution

A
  • scores are bunched at one end, with a few extreme scores pulling the tail toward the other end
10
Q

Positive Skew

A
  • mean greater than the median (mean is pulled by higher scores)
  • more values are clustered to the left (lower end of the scale)
  • right end of the distribution (high end of the scale) gets pulled to the right and has a longer tail
  • this happens when we have a few extremely high observations
11
Q

Negative Skew

A
  • mean less than the median (mean is pulled by lower scores)
  • more values clustered to the right (higher end of the scale)
  • left end of the distribution is pulled (lower end of the scale) and has a longer tail
  • this happens when we have a few extremely low observations
12
Q

Measures of central tendency (MCT)

A
  • convey info about the typical observation in our data set
  • Mean
  • Median
  • Mode
13
Q

Mean

A
  • most used MCT
  • mathematical average of our data set
  • mean= sum of scores/ number of scores
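The formula can be checked by hand with a short Python sketch. The scores are made up for illustration:

```python
# Hypothetical quiz scores (illustrative data only)
scores = [6, 7, 8, 9, 10]

# mean = sum of scores / number of scores
mean = sum(scores) / len(scores)
print(mean)  # 8.0
```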
14
Q

Mode

A
  • Most frequent score/ observation in the data set
  • peak of the frequency distribution
15
Q

Bimodal Distribution

A
  • when we have 2 peaks in the distribution, or 2 scores tied for the most frequent
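A small sketch of finding the mode(s) in Python, assuming Python 3.8 or later for `statistics.multimode`. Both data sets are invented for illustration:

```python
from statistics import multimode

# Hypothetical data sets (illustrative only)
one_peak = [1, 2, 2, 2, 3, 4]
two_peaks = [1, 2, 2, 3, 3, 4]

# multimode returns every value tied for most frequent
print(multimode(one_peak))   # [2]
print(multimode(two_peaks))  # [2, 3]
```

Two values in the result suggest a bimodal set of scores.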
16
Q

Median

A
  • middle point of the distribution
  • to find; list all the scores in order of magnitude; the score in the middle is the median (with an even number of scores, average the 2 middle scores)
  • cuts distribution in half; 50% of observations fall above and 50% fall below
  • used less often than the mean
  • use the median when data are skewed; it is more informative bc the mean is strongly affected by extreme scores
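The "list in order, take the middle" procedure could be sketched in Python as follows. The function name `median_of` is hypothetical, and averaging the two middle scores for an even count is the usual convention rather than something stated on the card:

```python
def median_of(scores):
    """Return the middle score of a list (average of the 2 middle scores if even)."""
    ordered = sorted(scores)          # list scores in order of magnitude
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: one middle score
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the 2 middle scores

print(median_of([5, 1, 3]))      # 3
print(median_of([4, 1, 3, 2]))   # 2.5
```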
17
Q

Mean and Median can only be calculated for…

A

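Interval and Ratio Data (the mean requires interval or ratio data; the median can also be found for ordinal data, since it only needs ranked scores)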

18
Q

MCT and normal distribution

A
  • Mean, median and mode are all equal
19
Q

MCT and Skewed distribution

A
  • Mean is very much impacted by extreme scores/ outliers
  • Median is more informative and representative of the distribution
  • positive skew; Mean > Median bc higher scores pull the mean
  • negative skew; Mean < Median bc lower scores pull the mean
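A quick Python illustration of how one extreme score pulls the mean but barely moves the median. Both data sets are made up for this sketch:

```python
from statistics import mean, median

# Hypothetical data: one extremely high score, then one extremely low score
positive_skew = [2, 3, 3, 4, 4, 5, 30]
negative_skew = [70, 96, 97, 97, 98, 98, 99]

for name, data in [("positive", positive_skew), ("negative", negative_skew)]:
    print(name, "mean:", round(mean(data), 2), "median:", median(data))
```

In the first set the mean lands above the median; in the second it lands below.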
20
Q

Variability

A
  • provides us with an index of how spread out the scores are around the MCT
21
Q

measures of variability

A
  • range
  • variance
  • standard deviation
22
Q

Range

A
  • most basic way to represent dispersion of scores
  • difference bw the largest and smallest score
  • not always informative
  • sensitive to outliers; one extreme score can drastically change the range of the data set
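A short Python sketch of the range and its sensitivity to a single outlier, using invented scores:

```python
# Hypothetical quiz percentages (illustrative data only)
scores = [70, 72, 75, 78, 80]

# range = largest score - smallest score
score_range = max(scores) - min(scores)
print(score_range)  # 10

# Add one extremely low score and the range changes drastically
with_outlier = scores + [20]
outlier_range = max(with_outlier) - min(with_outlier)
print(outlier_range)  # 60
```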
23
Q

Variance

A
  • how much each score in the distribution varies from the mean of the distribution
  • average squared deviation from the mean
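A worked sketch of "average squared deviation from the mean" in Python. The data are illustrative, and dividing by the number of scores (the population formula) is an assumption; many courses divide by N - 1 when working with a sample:

```python
# Hypothetical scores (illustrative data only)
scores = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(scores) / len(scores)                      # 5.0
squared_deviations = [(x - mean) ** 2 for x in scores]

# Average of the squared deviations (population formula, assumed here)
variance = sum(squared_deviations) / len(scores)
print(variance)  # 4.0
```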
24
Q

Problem with Variance

A
  • is based on squared deviations, so it is in a different unit of measurement than the observations
  • makes it hard to interpret
  • ex. if looking at quiz grades, the variance would be in % squared
25
Q

Standard Deviation

A
  • Measures the dispersion of the data set relative to the mean.
  • in a normal distribution, tells us what percentage of scores fall within a given distance of the mean
  • solves the problem of variance
  • is the square root of the variance
  • therefore converts the scores back into the same scale as the observations
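A minimal Python sketch of taking the square root of the variance. The scores are invented, and `pvariance` (the population formula, dividing by N) is assumed here rather than the sample formula:

```python
import math
from statistics import pvariance

# Hypothetical scores (illustrative data only)
scores = [2, 4, 4, 4, 5, 5, 7, 9]

variance = pvariance(scores)   # in squared units
sd = math.sqrt(variance)       # back in the same units as the scores
print(variance, sd)
```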
26
Q

Properties of Normal Distribution

A

- about 68% of all observations/ scores will fall w/in (+/-) 1 SD of the mean
- about 95% of all observations will fall w/in (+/-) 2 SD of the mean
- about 99.7% (often rounded to 99%) of observations will fall w/in (+/-) 3 SD of the mean

27
Q

Smaller the Standard Deviation…

A
  • the smaller the interval and scores vary less around the mean
28
Q

Larger the Standard Deviation…

A
  • the larger the interval and scores vary more around the mean
29
Q

If you know the mean and standard deviation, you can calculate

A
  • the interval in which 68, 95 or 99% of the scores will fall
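A sketch of that calculation in Python, assuming a normal distribution with a hypothetical mean of 100 and SD of 15. The helper name `interval` is made up, and the 3 SD figure is written as 99.7%, which the cards round to 99%:

```python
def interval(mean, sd, k):
    """Return the (low, high) bounds lying k standard deviations around the mean."""
    return (mean - k * sd, mean + k * sd)

mean, sd = 100, 15  # hypothetical values for illustration

for k, pct in [(1, "68%"), (2, "95%"), (3, "99.7%")]:
    low, high = interval(mean, sd, k)
    print(f"about {pct} of scores fall between {low} and {high}")
```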
30
Q

Data Transformation

A
  • transform data from its original state so it can be compared to data measured on a different scale
  • different measures cannot be compared directly, so the data have to be transformed into the same units
31
Q

Z scores

A
  • most common transformation of data
  • expresses each of the scores or observations in the data set in relation to the mean and standard deviation of the entire distribution
  • measures exactly how many standard deviations above or below the mean a data point is
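A small Python sketch of transforming a whole data set into Z scores. The scores are invented, and the population SD (`pstdev`) is an assumption:

```python
from statistics import mean, pstdev

# Hypothetical scores (illustrative data only)
scores = [2, 4, 4, 4, 5, 5, 7, 9]

m = mean(scores)      # 5
sd = pstdev(scores)   # 2.0 (population SD, assumed here)

# Express every score as a number of SDs above or below the mean
z_scores = [(x - m) / sd for x in scores]
print(z_scores)
```

After the transformation, the Z scores themselves have a mean of 0 and a standard deviation of 1.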
32
Q

when z scores are used

A
  • data did not form a normal distribution and we have to do inferential stats
  • want to compare 2 data sets that use different measures
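As an illustration of comparing different measures, here is a Python sketch with one hypothetical student and two tests scored on different scales. All numbers are made up:

```python
# Test A: student scored 80; class mean 70, SD 10 (hypothetical)
z_test_a = (80 - 70) / 10

# Test B: student scored 60; class mean 50, SD 5 (hypothetical)
z_test_b = (60 - 50) / 5

# Raw scores are not comparable, but Z scores are on the same scale
print(z_test_a, z_test_b)  # 1.0 2.0
```

The raw score is higher on Test A, yet relative to the class the student did better on Test B.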
33
Q

Z score mean

A
  • always 0; a score equal to the mean has Z = 0

34
Q

Z score Standard Deviation

A
  • always 1; Z scores are measured in standard deviation units

35
Q

When can Z score not be used?

A
  • Nominal or Ordinal Data
  • bc they do not have a meaningful mean
36
Q

equation for z score

A

(score- mean)/ Standard Deviation
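
The equation translates directly into a short Python function. The name `z_score` and the example numbers (a quiz score of 64, class mean of 82, SD of 10) are hypothetical:

```python
def z_score(score, mean, sd):
    """Return how many standard deviations a score lies above or below the mean."""
    return (score - mean) / sd

# Hypothetical student: 64 on a quiz with class mean 82 and SD 10
print(z_score(64, 82, 10))  # -1.8
```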

37
Q

Z scores tell us

A
  • Valence (sign)
  • Size (magnitude)
38
Q

Z score Valence

A

+Z; observed score is larger than the mean
-Z; observed score is smaller than the mean
ex. if Z=-1.8, we know the student fell below the class average

39
Q

Z score size

A
  • tells us with more precision where on the distribution the score fell
    in a normal distribution:
    about 68% of all Z scores fall between -1 and +1
    about 95% of all Z scores fall between -2 and +2
    about 99.7% (often rounded to 99%) of all Z scores fall between -3 and +3
  • ex. Z= -1.8, close to -2 so we know the score fell more towards the left of the distribution
40
Q

Pearson Product Moment Correlation Coefficient (r)

A

- type of descriptive statistic
- describes the relationship bw 2 variables based on how much they vary together
- used for interval or ratio data
- can use the analogy of 2 overlapping circles. the amount 2 circles overlap is how much variance the 2 variables share
- small overlap; correlation coefficient is small
- big overlap; correlation coefficient is big
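
One way to compute r by hand, sketched in Python using the standard deviation-score formula. The function name `pearson_r` and the study-hours data are invented for illustration:

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much the two variables vary together
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable varies on its own
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(ss_x * ss_y)

# Hypothetical data: hours studied and quiz score for 5 students
hours = [1, 2, 3, 4, 5]
quiz = [2, 4, 5, 4, 5]
print(round(pearson_r(hours, quiz), 2))  # 0.77
```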

41
Q

Coefficient of Determination

A
  • r squared (r²)
  • Proportion of variance accounted for in one variable by knowing the other variable
  • allows us to make predictions; if 2 variables are highly correlated, we can predict one from the other
  • the higher the r², the better our predictions will be
  • r² = 0.45; proportion of variance accounted for is 45%
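A tiny Python sketch of going from r to the proportion of variance accounted for. The function name and the r values are hypothetical; r = 0.67 is chosen because its square is roughly 0.45:

```python
def coefficient_of_determination(r):
    """Return r squared, the proportion of variance accounted for."""
    return r ** 2

for r in [0.2, 0.67, 0.9]:
    r_squared = coefficient_of_determination(r)
    print(f"r = {r}: r squared = {r_squared:.2f} ({r_squared:.0%} of variance)")
```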
42
Q

if SD is small

A
  • tall and skinny graph
  • little dispersion of scores around the mean
  • usually what we want; scores are consistent and the mean represents them well
43
Q

If SD is large

A
  • flat and wide graph
  • large dispersion of scores around the mean