Lecture 4 REVISED Flashcards

1
Q

continuous variable

A

can take on any value in an interval

e.g., worker’s hourly income can take on any value between 0 and infinity

2
Q

discrete variable

A

can only take on a set of distinct values within an interval

e.g., the number of people who chose blue as their favourite colour can only be a whole number

3
Q

what levels of measurement are required for continuous variables?

A

interval or ratio

4
Q

rectangle in a histogram is called a…

A

bin

5
Q

how does a discrete/continuous data distribution look on a graph?

A

discrete: bars
continuous: curve

6
Q

frequency distribution is…

A

a tabular summary of a dataset showing the frequency of items in each class
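
A minimal sketch of building a frequency distribution in Python. The survey responses are made-up illustration data, and `favourite_colours` and `frequency_table` are hypothetical names.

```python
from collections import Counter

# Hypothetical survey responses (made-up data for illustration).
favourite_colours = ["blue", "red", "blue", "green", "blue", "red"]

# Counter tallies how often each class (here, each colour) occurs.
frequency_table = Counter(favourite_colours)

# Print the classes from most to least frequent as a simple table.
for colour, count in frequency_table.most_common():
    print(f"{colour:<6} {count}")
```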

7
Q

symmetric, skewness, kurtosis in frequency distributions?

A

symmetric: distribution is split into two identical halves

skewness: degree of asymmetry, where an elongated tail extends to one side (tail to the right = positive skew, tail to the left = negative skew)

kurtosis: degree of peakedness/steepness in a distribution

8
Q

when a distribution is perfectly symmetrical, what is the relationship between the mean and median?

A

the mean and median are equal

when a distribution is skewed, they differ: the mean is pulled towards the longer tail

9
Q

why does the median tend to be more representative than the mean?

A

because if a distribution isn’t symmetrical, outliers pull the mean (average) towards the tail, while the median is largely unaffected
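
A small sketch of this effect, using Python's standard `statistics` module. The income figures are made up for illustration; the last value plays the role of the outlier.

```python
import statistics

# Hypothetical hourly incomes; the last value is an extreme outlier.
incomes = [12, 14, 15, 16, 18, 200]

mean_income = statistics.mean(incomes)      # 275 / 6, roughly 45.8
median_income = statistics.median(incomes)  # (15 + 16) / 2 = 15.5

# The single outlier drags the mean far above most of the data,
# while the median stays close to the typical values.
print(f"mean:   {mean_income:.1f}")
print(f"median: {median_income:.1f}")
```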

10
Q

where is the mode in a frequency distribution?

A

the peak

11
Q

what formula is used to find the position of the median value?

A

(n+1) / 2
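
A sketch of applying the position formula to ordered data. The function name `median_from_position` is hypothetical. The card does not say what to do when the position is not a whole number; the sketch assumes the usual convention of averaging the two middle values.

```python
def median_from_position(values):
    """Find the median using the (n + 1) / 2 position formula."""
    if not values:
        raise ValueError("median of an empty dataset is undefined")
    ordered = sorted(values)        # the formula applies to ordered data
    n = len(ordered)
    position = (n + 1) / 2          # 1-based position in the ordered list
    if position.is_integer():
        # Odd n: the position points at a single middle value.
        return ordered[int(position) - 1]
    # Even n (assumed convention): the position falls halfway between
    # two values, so average the values on either side of it.
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return (lower + upper) / 2


print(median_from_position([7, 1, 3, 5, 9]))  # n = 5, position 3
print(median_from_position([4, 1, 3, 2]))     # n = 4, position 2.5
```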

12
Q

what is the formula to calculate standard deviation?

A
  • subtract the mean from each value
  • square all the deviations and add them together
  • divide this by (n-1)
  • square root this figure
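
The four steps above can be sketched in Python as follows. The function name `sample_standard_deviation` and the example scores are illustrative assumptions; dividing by (n-1) makes this the sample standard deviation.

```python
import math


def sample_standard_deviation(values):
    """Follow the four steps: deviations, squares, divide by n-1, root."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to divide by n - 1")
    mean = sum(values) / n
    # Step 1: subtract the mean from each value.
    deviations = [x - mean for x in values]
    # Step 2: square the deviations and add them together.
    sum_of_squares = sum(d ** 2 for d in deviations)
    # Step 3: divide by (n - 1); this intermediate figure is the variance.
    variance = sum_of_squares / (n - 1)
    # Step 4: take the square root.
    return math.sqrt(variance)


# Hypothetical scores: mean is 5 and the squared deviations sum to 32.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(sample_standard_deviation(scores), 3))
```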
13
Q

what does standard deviation tell us about the dataset?

A

how far the values typically lie from the mean

low standard deviation = low variability, values are close to the mean

high standard deviation = high variability, values are far from the mean

14
Q

how is variance related to standard deviation?

A

standard deviation is the square root of the variance

15
Q

density curve

A

an idealised description of a data distribution

describes the overall pattern of a distribution

16
Q

disadvantage of variance for practical applications?

A

its units are the square of the variable’s units (e.g., squared dollars), so it is hard to interpret directly

which is why standard deviation is more commonly reported as a measure of dispersion

17
Q

how is the standard deviation denoted and calculated for a sample versus a population?

A

sample: denoted s, calculated by dividing the sum of squared deviations by n-1, then taking the square root

population: denoted sigma (σ), calculated by dividing the sum of squared deviations by n, then taking the square root
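
As a quick check of the two divisors, Python's standard `statistics` module provides `stdev` (divides by n-1) and `pstdev` (divides by n). The scores below are made-up illustration data.

```python
import statistics

# Hypothetical scores: mean is 5 and the squared deviations sum to 32.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Treating the data as a sample: divide by n - 1 = 7.
s = statistics.stdev(scores)

# Treating the data as the whole population: divide by n = 8.
sigma = statistics.pstdev(scores)

# The sample figure is slightly larger because of the smaller divisor.
print(f"s     = {s:.3f}")
print(f"sigma = {sigma:.3f}")
```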

18
Q

mean absolute deviation (MAD)

A

the average of the absolute distances (deviations) of the values in a dataset from the mean

19
Q

how is MAD calculated in a sample/population?

A

divide the sum of the absolute deviations from the mean by the number of data points (if the signs were kept, the deviations would always sum to zero)
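
A minimal sketch of the MAD calculation, assuming the deviations are taken as absolute values, as the name implies. The function name and the scores are illustrative.

```python
def mean_absolute_deviation(values):
    """Average absolute distance of the values from their mean."""
    if not values:
        raise ValueError("MAD of an empty dataset is undefined")
    n = len(values)
    mean = sum(values) / n
    # Take absolute values so positive and negative deviations
    # do not cancel each other out.
    return sum(abs(x - mean) for x in values) / n


# Hypothetical scores with mean 5.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Without the absolute value the deviations cancel to zero.
raw_deviation_sum = sum(x - 5 for x in scores)

print(raw_deviation_sum)
print(mean_absolute_deviation(scores))
```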

20
Q

what does MAD indicate?

A

how spread out data is

21
Q

percentile

A

describes the percentage of data values that fall at or below another data value

22
Q

how to calculate percentiles?

A

(p/100)n

the percentile in question (p) divided by 100, multiplied by the number of values in the dataset (n); this gives the position of that percentile in the ordered data
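
A sketch of turning the (p/100)n position into a value. The card does not say how to handle the result, so the rounding rule here is an assumption taken from one common textbook convention: if the position is not a whole number, round it up; if it is a whole number, average the values at that position and the next. Other conventions exist and give slightly different answers. The function name `percentile_value` is hypothetical, and p is assumed to be a whole number strictly between 0 and 100.

```python
def percentile_value(values, p):
    """Value at the p-th percentile using the (p/100)n position rule."""
    if not values:
        raise ValueError("percentile of an empty dataset is undefined")
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)
    n = len(ordered)
    # (p/100)n computed with integer arithmetic to avoid rounding noise:
    # whole is the integer part, remainder tells us if anything is left.
    whole, remainder = divmod(p * n, 100)
    if remainder == 0:
        # Assumed convention: position is a whole number i, so average
        # the values at 1-based positions i and i + 1.
        return (ordered[whole - 1] + ordered[whole]) / 2
    # Assumed convention: otherwise round the position up.
    return ordered[whole]  # 1-based position whole + 1


data = list(range(1, 21))          # hypothetical data: 1, 2, ..., 20
print(percentile_value(data, 25))  # position (25/100) * 20 = 5
print(percentile_value([10, 20, 30, 40, 50, 60, 70], 50))  # position 3.5
```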

23
Q

quartiles

A

specific percentiles dividing the data into four parts

first (lower) quartile (Q1) corresponds to the 25th percentile

second quartile (Q2, the median) corresponds to the 50th percentile

third (upper) quartile (Q3) corresponds to the 75th percentile

fourth quartile corresponds to the maximum

24
Q

interquartile range

A

the difference between the third and first quartiles

Q3 - Q1

the range for the middle 50% of the data

less sensitive to extreme data values than the full range (maximum - minimum)
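
A sketch comparing the interquartile range with the full range when an outlier is present. It uses `statistics.quantiles` from Python's standard library (3.8+), whose default interpolation method is only one of several quartile conventions, so results may differ slightly from a hand calculation. The data and helper names are hypothetical.

```python
import statistics

# Hypothetical data; the second list swaps the largest value for an outlier.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
data_with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1000]


def interquartile_range(values):
    # quantiles(n=4) returns the three cut points Q1, Q2 and Q3.
    q1, _q2, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


def value_range(values):
    return max(values) - min(values)


# The outlier inflates the range but leaves the IQR unchanged here.
print(interquartile_range(data), value_range(data))
print(interquartile_range(data_with_outlier), value_range(data_with_outlier))
```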