Module 14: Descriptive statistics Flashcards

1
Q

2 most common ways to summarize data

A
  1. Measure of central tendency
  2. Measure of variability
1
Q

Measure of central tendency (4)

what it is + represented by

A

A measure of the typical value in a collection of numbers or a data set
- measured by mean, median and mode

2
Q

Mean (2)

+ how to find?

A

The average
Sum of all the scores divided by the total number of scores
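
A minimal sketch of the calculation described above. The function name `mean` and the example scores are made up for illustration.

```python
def mean(scores):
    """Sum of all the scores divided by the total number of scores."""
    return sum(scores) / len(scores)


scores = [2, 4, 4, 5, 10]  # made-up example data
print(mean(scores))  # 5.0
```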

3
Q

Population mean

A
4
Q

Sample mean

A
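
For reference, the standard textbook formulas for the population mean and the sample mean. The symbols are assumptions, not taken from the cards: N is the population size, n is the sample size, and x_i is the i-th score.

```latex
\mu = \frac{\sum_{i=1}^{N} x_i}{N}
\qquad
\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}
```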
5
Q

Median (2)

How to find?

A

The value that lies in the middle of the data when the data set is ordered
- First rank the data; the position of the median is then (number of entries + 1) / 2

6
Q

Odd number of entries when calculating median:

A

median is the middle data entry

7
Q

Even number of entries when calculating median:

A

Median is the mean of the 2 middle data entries
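
A small sketch covering both the odd and even cases from the median cards. The function name `median` and the sample entries are illustrative assumptions.

```python
def median(entries):
    """Rank the data, then take the middle entry (odd count)
    or the mean of the two middle entries (even count)."""
    ranked = sorted(entries)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]                      # odd: the middle data entry
    return (ranked[mid - 1] + ranked[mid]) / 2  # even: mean of the 2 middle entries


print(median([7, 1, 3]))     # 3
print(median([7, 1, 3, 5]))  # 4.0
```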

8
Q

Mode

A

The most frequent value

9
Q

If no entry is repeated, then the data set has no

A

mode

10
Q

If two entries occur with the same greatest frequency, each entry is a ___ and the data set is called ___

A
  • mode
  • bimodal
11
Q

Finding the mode

A

Find the entry (or entries) with the greatest frequency
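
One possible way to find the mode in code, following the mode cards: no mode when nothing repeats, and two modes (bimodal) when two entries tie for the greatest frequency. The helper name `modes` is hypothetical.

```python
from collections import Counter


def modes(entries):
    """Return every entry that occurs with the greatest frequency.

    Returns an empty list when no entry is repeated (no mode).
    """
    counts = Counter(entries)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # nothing repeats, so the data set has no mode
    return sorted(value for value, count in counts.items() if count == top)


print(modes([1, 2, 2, 3]))     # [2]
print(modes([1, 1, 2, 2, 3]))  # [1, 2]  (bimodal)
print(modes([1, 2, 3]))        # []
```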

12
Q

Advantage of using the mean (2)

A
  • most common statistic
  • Takes into account every entry of a data set
13
Q

Disadvantage of using the mean (2)

A
  • greatly affected by extreme scores (outliers)
  • Knowledge about individual cases is lost with averages
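
A quick illustration of the outlier point, using Python's standard `statistics` module on made-up scores: adding one extreme score moves the mean a lot and the median very little.

```python
from statistics import mean, median

scores = [10, 11, 12, 13, 14]   # made-up example data
with_outlier = scores + [100]   # same data plus one extreme score

print(mean(scores), median(scores))              # both are 12
print(mean(with_outlier), median(with_outlier))  # mean jumps above 26, median is 12.5
```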
14
Q

Advantages of using the median (2)

A
  • Little influence by extreme scores
  • Reasonable estimate of what most people mean by the center of a distribution
15
Q

Disadvantage of using the median

A
  • may not be good to ignore extreme values
16
Q

Advantages of using the mode (2)

A
  • the most frequently obtained score
  • not influenced by extreme scores
17
Q

Disadvantages of using the mode (2)

A
  • may not represent a large proportion of the scores
  • ignores extreme values
18
Q

Variability

A

numbers which describe how spread out a set of data is

19
Q

Examples of variability measures (4)

A
  • range (interquartile range)
  • deviation
  • variance
  • standard deviation
20
Q

Range + formula (2)

A

length of the smallest interval that contains all the data

Range = largest value - smallest value

21
Q

range is sensitive to

A
  • sample size: small samples tend to give a smaller (less representative) range
  • extreme scores (tells you the smallest and largest values but not where the bulk of the data lies)
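
A short sketch of the range formula and its sensitivity to a single extreme score. The name `value_range` and the data are illustrative assumptions.

```python
def value_range(data):
    """Range = largest value - smallest value."""
    return max(data) - min(data)


scores = [4, 5, 5, 6, 7]             # made-up example data
print(value_range(scores))           # 3
print(value_range(scores + [40]))    # 36: one extreme score dominates the range
```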
22
Q

Interquartile range (2)

+ formula

A

Measure of distance between first and third quartiles
- IQR= Q3-Q1

23
Q

Second quartile (Q2) is the

A

median

24
Benefits of IQR (2)
- less affected by extreme values
- helpful for identifying outliers
25
Quartile (2): what it is + median
- positions in a range of values representing multiples of 25%
- median: 50% of scores fall below it, 50% of scores above
26
First quartile (Q1)
25% of scores fall below Q1, 75% above
27
Third quartile (Q3)
75% of scores fall below Q3, 25% above
28
deviation
The difference between each score and the mean of the data set
How far you are from the mean
29
deviation formula
deviation of x_i = x_i - μ
30
deviation scores always sum to
0
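
A small check of the claim that deviation scores sum to zero, using made-up scores. The variable names are illustrative only.

```python
scores = [2, 4, 4, 5, 10]                   # made-up example data
mu = sum(scores) / len(scores)              # mean of the data set: 5.0
deviations = [x - mu for x in scores]       # each score minus the mean

print(deviations)       # [-3.0, -1.0, -1.0, 0.0, 5.0]
print(sum(deviations))  # 0.0
```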
31
Difference between deviation and IQR/boxplots
Deviation scores show dispersion around the mean; the IQR and boxplots show dispersion around the median
32
Variance
Single number representing the average amount of variation in a set of scores (how spread out the scores are)
33
Steps for finding the sample variance (5)
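
The answer to this card is not present in the source. As a hedged sketch, the usual textbook sequence for the sample variance is assumed below (mean, deviations, squares, sum, divide by n - 1); the course may word the five steps differently.

```python
def sample_variance(scores):
    """Sample variance via the usual textbook steps (assumed, not from the card)."""
    n = len(scores)
    mean = sum(scores) / n                    # step 1: find the mean
    deviations = [x - mean for x in scores]   # step 2: deviation of each score
    squared = [d ** 2 for d in deviations]    # step 3: square each deviation
    sum_of_squares = sum(squared)             # step 4: add the squares
    return sum_of_squares / (n - 1)           # step 5: divide by n - 1


print(sample_variance([2, 4, 4, 5, 10]))  # 9.0
```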
34
Standard deviation
Measure of the spread of scores out from the mean of the sample
35
How to calculate standard deviation
1. Calculate the variance
2. Find the square root of the variance
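
The two steps can be sketched with Python's standard library. This uses the sample variance (dividing by n - 1) on made-up scores, and the result is compared against `statistics.stdev` as a sanity check.

```python
import math
from statistics import stdev, variance

scores = [2, 4, 4, 5, 10]   # made-up example data

var = variance(scores)      # step 1: calculate the (sample) variance
sd = math.sqrt(var)         # step 2: find the square root

print(sd)                               # 3.0
print(math.isclose(sd, stdev(scores)))  # True
```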
36
Population standard deviation formula
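
The answer to this card is not present in the source. The standard textbook formula is given here for reference, with μ the population mean and N the population size (notation assumed):

```latex
\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}
```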
37
Standard deviation is a measure of the typical amount an entry deviates from the mean, thus the more entries are spread out, the
greater the standard deviation
38
Descriptive statistics (2)
- cannot make predictions or generalizations
- only drawing conclusions about current sample and not extrapolating or going beyond
39
Inferential statistics (2)
- can make predictions or generalizations
- allow conclusions about the population based on data from a sample
40
Data matrices
a table or worksheet that organizes the data together with all the variables of interest
41
Frequency distributions
A table indicating the frequency of each value in a data set
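
A minimal sketch of building a frequency distribution with `collections.Counter`; the scores are made up for illustration.

```python
from collections import Counter

scores = [3, 1, 2, 3, 3, 2]   # made-up example data
freq = Counter(scores)        # maps each value to how often it occurs

# Print the table in order of value.
for value in sorted(freq):
    print(value, freq[value])
```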
42
Histogram (3): what it is + illustrates + can help identify
- A graphical representation of the frequency of a variable
- illustrates the distribution of scores
- can help identify outliers or violations of normal distribution assumptions
43
symmetrical
44
Negative skew or left skew
45
Positive skew/right skew
46
Central tendency
helps identify the typical or most common value in data
47
Measures of central tendency
Mean median mode
48
measure of central tendency for symmetrical distribution/ skewed
49
If the mean is 100 and the standard deviation is 10, then (for roughly normally distributed data)
about 2/3 (roughly 68%) of the data falls between 90 and 110
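
As a check on the "about 2/3" figure, assuming the data follow a normal distribution: `statistics.NormalDist` (Python 3.8+) gives the share of values within one standard deviation of the mean.

```python
from statistics import NormalDist

scores = NormalDist(mu=100, sigma=10)          # assumes normally distributed data
share = scores.cdf(110) - scores.cdf(90)       # proportion between 90 and 110

print(round(share, 3))  # 0.683
```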
50
For data that is skewed or has outliers, ___ may be a better choice to describe the centre of the distribution
median
51
Q position
Q position = [(Q#)(n + 1)] / 4
Q# = number of the quartile you're trying to find
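
A sketch of the Q position formula in code. When the position is not a whole number, this version interpolates linearly between the two neighbouring ranked values; that is one common convention and an assumption here, not necessarily the rounding rule the course uses. The function names are hypothetical.

```python
def q_position(q_number, n):
    """Q position = (Q# * (n + 1)) / 4, as a 1-based position in the ranked data."""
    return q_number * (n + 1) / 4


def quartile(data, q_number):
    """Value at the Q position, interpolating between neighbours (assumed convention)."""
    ranked = sorted(data)
    pos = q_position(q_number, len(ranked))
    lower = int(pos)        # whole part of the position
    frac = pos - lower      # fractional part of the position
    if lower < 1:
        return ranked[0]
    if lower >= len(ranked):
        return ranked[-1]
    return ranked[lower - 1] + frac * (ranked[lower] - ranked[lower - 1])


data = [1, 3, 5, 7, 9, 11, 13]    # made-up example data, n = 7
q1, q3 = quartile(data, 1), quartile(data, 3)
print(q1, q3, q3 - q1)            # 3.0 11.0 8.0  (IQR = Q3 - Q1)
```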
52
Round Q position to
the median
53
How to find outliers with IQR
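
The answer to this card is not present in the source. A widely used convention (assumed here, not confirmed by the cards) flags any value more than 1.5 × IQR below Q1 or above Q3. The sketch uses `statistics.quantiles` (Python 3.8+) with its default method; the helper name `iqr_outliers` is hypothetical.

```python
from statistics import quantiles


def iqr_outliers(data, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (k = 1.5 is a convention)."""
    q1, _, q3 = quantiles(data, n=4)   # quartile cut points Q1, Q2, Q3
    iqr = q3 - q1                      # IQR = Q3 - Q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]


print(iqr_outliers([1, 3, 5, 7, 9, 11, 13, 100]))  # [100]
```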
54
Scatterplots
- visualize the form, direction and strength of 2 variable relationships
55
correlation coefficients
indicate the degree of covariance between variables: how much one variable changes in relation to another
56
Data points that are more closely positioned around the best fit line represent
a stronger relationship than data points that are farther from the line
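
To make the link between scatter and strength concrete, here is a hand-rolled Pearson correlation coefficient (one common correlation coefficient, chosen for this sketch) applied to made-up data: points lying exactly on a line give r close to 1, and more scattered points give a smaller r.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return co_deviation / (spread_x * spread_y)


xs = [1, 2, 3, 4, 5]
tight = [2, 4, 6, 8, 10]      # points exactly on a line
scattered = [2, 9, 3, 10, 6]  # same x values, points far from any line

print(pearson_r(xs, tight))      # very close to 1
print(pearson_r(xs, scattered))  # noticeably smaller
```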