Central Tendency, Variability, & Z-scores Flashcards

1
Q

What data classes are best for mean?

A

interval, ratio

2
Q

What data classes are best for mode?

A

nominal, ordinal, interval, ratio

3
Q

What data classes are best for median?

A

ordinal, interval, ratio

4
Q

Under what conditions might a median be a better measure of central tendency than the mean?

A
  • when the data is ordinal (mean does not apply)
  • interval/ratio data if there are extreme values
5
Q

It should seem clear how the mean and the median are measures of the central tendency of the data, since the mean is a familiar average and the median is the middle. However, explain why the mode is also considered a measure of central tendency.

A

Many data sets peak near the middle (bell shape). The mode is the value with the highest frequency, so it usually falls somewhere in the middle.

6
Q

For any data set, what is Σ(X − X̄)?

A

Σ(X − X̄) may be written as:
ΣX − ΣX̄ = ΣX − nX̄ = ΣX − n(ΣX)/n = ΣX − ΣX = 0.
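
As a quick numerical check of this identity, here is a minimal Python sketch. The helper name sum_of_deviations and the data values are made up for illustration only.

```python
import math

def sum_of_deviations(values):
    """Return the sum of (x - mean) over all x in values."""
    mean = sum(values) / len(values)
    # fsum keeps floating-point rounding error as small as possible
    return math.fsum(x - mean for x in values)

# Arbitrary illustrative data
data = [2, 5, 11, 4, 9, 30]
total = sum_of_deviations(data)

# The deviations cancel, so the total is zero up to rounding error
print(math.isclose(total, 0.0, abs_tol=1e-9))
```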

7
Q

The following data represent a sample of the time to complete a certain task in minutes and seconds (mm:ss).

6:30, 11:15, 6:22, 11:32, 8:12, 5:02, 9:17, 6:51, 8:44, 7:45, 9:37, 7:28, 4:29, 7:42

compute the mean:
compute the std. dev.:

A

Since the values are given in minutes and seconds, they first need to be converted either to decimal minutes (e.g. 6:30 = 6 + 30/60 = 6 + 0.5000 = 6.5000 min) or to seconds (e.g. 6:30 = 6*60 + 30 = 360 + 30 = 390 s) so that they can be easily added.

mean: 7:55
std. dev.: 2:04
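
The conversion to seconds can be sketched in Python as follows. The helper names to_seconds and to_mmss are hypothetical, and the sample standard deviation (n − 1 divisor) is assumed because the card calls the data a sample.

```python
from statistics import mean, stdev

def to_seconds(mmss):
    """Convert an 'mm:ss' string to a whole number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)

def to_mmss(total_seconds):
    """Format a number of seconds as 'm:ss', rounded to the nearest second."""
    total = round(total_seconds)
    return f"{total // 60}:{total % 60:02d}"

times = ["6:30", "11:15", "6:22", "11:32", "8:12", "5:02", "9:17",
         "6:51", "8:44", "7:45", "9:37", "7:28", "4:29", "7:42"]
secs = [to_seconds(t) for t in times]

print(to_mmss(mean(secs)))   # 7:55
print(to_mmss(stdev(secs)))  # 2:04 (sample standard deviation)
```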

8
Q

For a certain set of data, the mean and standard deviation are computed.

How does X̄ (data treated as sample) compare to μ (data treated as a population)?

How does s (data treated as sample) compare to σ (data treated as a population)?

A

X̄ is the sample mean, μ is the population mean; they are calculated the same way, so the values are equal.

s divides the sum of squared deviations by n − 1, while σ divides by N, so s is slightly larger than σ for the same data.

9
Q

Given the following sample data set:

        6, 12, 9, 7, 8, 4, 3, 12, 15 

Compute the mean.
What is the median?
What is the mode?
Compute the variance.
Compute the standard deviation.

A

mean: 8.44
median: 8
mode: 12
variance: 15.78
standard deviation: 3.97
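
These values can be checked with Python's standard statistics module. This is a minimal sketch that treats the data as a sample (n − 1 divisor), as the card states; the dictionary name summary is just for illustration.

```python
from statistics import mean, median, mode, variance, stdev

data = [6, 12, 9, 7, 8, 4, 3, 12, 15]

summary = {
    "mean": round(mean(data), 2),
    "median": median(data),
    "mode": mode(data),
    "variance": round(variance(data), 2),   # sample variance, n - 1 divisor
    "std dev": round(stdev(data), 2),       # sample standard deviation
}
for name, value in summary.items():
    print(f"{name}: {value}")
```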

10
Q

For the following sample data set:
X frequency
52 5
54 8
57 2

Compute the mean.
Compute the variance.
Compute the standard deviation.

A

mean: 53.73
variance: 2.638
standard deviation: 1.62
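
One way to handle a frequency table is to expand it back into raw scores and then use the usual sample formulas. A minimal Python sketch, where the names freq and expanded are illustrative:

```python
from statistics import mean, variance, stdev

# Frequency table from the card: value -> frequency
freq = {52: 5, 54: 8, 57: 2}

# Repeat each value as many times as its frequency
expanded = [x for x, f in freq.items() for _ in range(f)]

print(len(expanded))                  # 15 observations in total
print(round(mean(expanded), 2))       # 53.73
print(round(variance(expanded), 3))   # sample variance (n - 1 divisor)
print(round(stdev(expanded), 2))      # 1.62
```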

11
Q

The following sample data of the number of communications are taken from logs of communication with Distance Education students:

5, 9, 5, 23, 27, 55, 34, 7, 30, 15, 22, 60, 14, 52, 297, 8, 51, 15, 51, 35, 15, 39, 137, 43, 38, 14, 93, 7 

Compute the mean.
Compute the standard deviation.
Draw a boxplot with the minimum, Q1, Q2, Q3, and maximum.
Which is a better representation of the central tendency: mean or median? Explain.

A

mean: 42.89
std. dev.: 57.68
Minimum: 5
Q1: 14
Q2: 28.5
Q3: 51
Maximum: 297

The median is; the mean is pulled upward by the extreme values (137 and 297).
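
The five-number summary for the boxplot can be sketched in Python. This assumes the median-of-halves convention for quartiles (Q1 is the median of the lower half, Q3 the median of the upper half); other conventions can give slightly different quartiles. The function name five_number_summary is hypothetical.

```python
from statistics import mean, median

def five_number_summary(values):
    """Return (min, Q1, Q2, Q3, max) using the median-of-halves convention."""
    xs = sorted(values)
    half = len(xs) // 2
    lower = xs[:half]     # values below the median position
    upper = xs[-half:]    # values above the median position
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

data = [5, 9, 5, 23, 27, 55, 34, 7, 30, 15, 22, 60, 14, 52, 297,
        8, 51, 15, 51, 35, 15, 39, 137, 43, 38, 14, 93, 7]

print(five_number_summary(data))   # (5, 14.0, 28.5, 51.0, 297)
print(round(mean(data), 2))        # 42.89, well above the median of 28.5
```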

12
Q

If the two largest values in the sample data set of the previous problem were omitted,

Compute the mean.
Compute the standard deviation.
Draw a boxplot with the minimum, Q1, Q2, Q3, and maximum.
Which is a better representation of the central tendency: mean or median? Explain.

A

mean: 29.50
std. dev.: 21.68
minimum: 5
Q1: 14
Q2: 25
Q3: 43
Maximum: 93

Mean may now be a better measure because extreme outliers have been removed.
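
To see how much the two largest values affect the mean, the gap between mean and median can be compared before and after removing them. A minimal Python sketch, with illustrative variable names:

```python
from statistics import mean, median, stdev

data = [5, 9, 5, 23, 27, 55, 34, 7, 30, 15, 22, 60, 14, 52, 297,
        8, 51, 15, 51, 35, 15, 39, 137, 43, 38, 14, 93, 7]
trimmed = sorted(data)[:-2]   # drop the two largest values (137 and 297)

for label, xs in (("all 28 values", data), ("largest two removed", trimmed)):
    gap = mean(xs) - median(xs)
    print(f"{label}: mean={mean(xs):.2f} sd={stdev(xs):.2f} "
          f"median={median(xs)} gap={gap:.2f}")
```

The gap between mean and median shrinks from about 14.4 to 4.5, which is why the mean becomes a more reasonable summary once the extreme values are gone.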

13
Q

Consider the following data set:
21, 34, 18, 26, 30, 35, 24, 29, 25

If this is a population, compute the mean.
If this is a sample, compute the mean.
If this is a population, compute the standard deviation.
If this is a sample, compute the standard deviation.

A

μ=26.9
X̄= 26.9
σ= 5.34
s= 5.67
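
The population and sample versions can be compared directly in Python: statistics.pstdev divides by N, while statistics.stdev divides by n − 1. A minimal sketch:

```python
from statistics import mean, pstdev, stdev

data = [21, 34, 18, 26, 30, 35, 24, 29, 25]

mu = mean(data)          # same calculation for population and sample
sigma = pstdev(data)     # population standard deviation (divide by N)
s = stdev(data)          # sample standard deviation (divide by n - 1)

print(round(mu, 1))      # 26.9
print(round(sigma, 2))   # 5.34
print(round(s, 2))       # 5.67
```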

14
Q

If we had a set of ordinal values (not interval/ratio), could you create a boxplot?

A

Technically yes, because quartiles depend only on the position in the ordered data set. Thus, one could determine the positions in the ordered set for Q1, Q2 (median), and Q3 and the first and last position for the minimum and maximum. However, without interval/ratio data, visualizing this with a boxplot would not make sense.

For example, imagine you ask 9 people what size drink they ordered, small, medium, or large. The ordered data might be: small, small, small, medium, medium, large, large, large, large. Q1 is position 2.5 (small), Q2 is position 5 (medium), and Q3 is position 7.5 (large), minimum is position 1 (small) and maximum is position 9 (large).

15
Q

Typically we consider quantitative data that is symmetric about the mean. If we have a data set that has a few extreme high values, then

a. How is it skewed?
b. Would you use a mean or median? Why?

A

It is positively skewed (right-skewed)

You would use median since it is less sensitive to extreme values.

16
Q
For the MCAT, µ = 500 and σ = 10. What is the probability of an individual getting a score greater than 502.5?
A

z=0.25
p=0.4013
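
The upper-tail probability can be checked with statistics.NormalDist from the Python standard library. This sketch assumes the scores are normally distributed with the µ and σ given on the card.

```python
from statistics import NormalDist

mcat = NormalDist(mu=500, sigma=10)

z = (502.5 - 500) / 10            # z-score of the cutoff
p_above = 1 - mcat.cdf(502.5)     # area to the right of the cutoff

print(z)                   # 0.25
print(round(p_above, 4))   # 0.4013
```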

17
Q
For the MCAT, µ = 500 and σ = 10.

What is the minimum score you would have to obtain to be in the top 5%?

What is the minimum score you would have to obtain to be in the top 2.5%?

A

top 5% (95th percentile, z = 1.645)
500 + (1.645 x 10) = 516.45

top 2.5% (97.5th percentile, z = 1.96)
500 + (1.96 x 10) = 519.6
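
The cutoff scores can also be found without a z-table by inverting the normal CDF. A minimal Python sketch using statistics.NormalDist, again assuming normally distributed scores; the variable names are illustrative.

```python
from statistics import NormalDist

mcat = NormalDist(mu=500, sigma=10)

# Top 5% means 95% of scores fall below the cutoff, and so on
top_5_cutoff = mcat.inv_cdf(0.95)
top_2_5_cutoff = mcat.inv_cdf(0.975)

print(round(top_5_cutoff, 2))     # 516.45
print(round(top_2_5_cutoff, 2))   # 519.6
```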

18
Q

Correlational method

A

looking for relationships between variables (correlation or regression)

19
Q

Experimental method

A

manipulating one variable to determine if this causes changes in another variable

20
Q

independent variable

A

what we control/manipulate

21
Q

dependent variable

A

what we measure (is influenced)

22
Q

confounding/extraneous variables

A

other variables that affect the dependent variable but are not the independent variable

23
Q

random assignment

A

every participant has an equal chance of ending up in each group (the bigger the sample, the better)

helps decrease extraneous variables

24
Q

experimental vs control groups

A

experimental: group(s) that receive the treatment; an experiment needs at least 2 different groups to compare
control: group with no treatment (or a placebo)

25
Q

Placebo

A

any treatment that has no active properties

26
Q

hypothetical constructs

A

an explanatory variable which is not directly observable

we must find ways to operationalize these

27
Q

operational definition

A

specifies how a construct is measured (how do we assign a number?)

28
Q

population

A

all the people we want to apply results to (we control/decide)

29
Q

sample

A

we find a subset of the pop. that is representative of the whole pop. (random)

30
Q

random sample

A

a group drawn from the population so that every member has an equal chance of being selected

31
Q

descriptive statistics

A

summarizes data

32
Q

inferential statistics

A

trying to infer back to population (generalize)

33
Q

parameter vs. statistics

A

statistics for sample; parameter for population

34
Q

sampling error

A

the difference between a stat. from a sample and its parameter

35
Q

discrete vs. continuous variable

A

discrete: categories w/ nothing in between

continuous: infinite values between any two values

36
Q

quantitative vs. categorical data

A

quantitative: directly measuring something numerically (often continuous data)

categorical data: group membership, summarized as counts of things (discrete variables)

37
Q

scales of measurements

A

nominal: no inherent order of different categories (weakest)

ordinal: one group is above other, not evenly spaced (can use median)

interval: there’s equal interval, but no true zero

ratio: there’s equal interval, and there is a true zero

38
Q

frequency distributions

A
  • real lower limit
  • real upper limit
  • midpoint
39
Q

visualizing data

A
  • histogram: frequency distribution turned into a graph. We can see the shape of the distribution and the spread of the data.
  • line graph: good for looking at change/time
  • scatterplot: tells us about relationships between variables. shows pos. & neg. relationships. strength of relationship based on how linear.
  • boxplot: box represents the middle 50% of data.
40
Q

shapes of distributions

A

symmetrical
- unimodal: bell-shaped (normal dist.)
- bimodal: clear 2 peaks (one can be higher)
- rectangular: data of equal freq. for all values

asymmetrical
- pos. skewed: skewed to right (not norm. but unimodal)
- neg. skewed: skewed to left (not norm. but unimodal)

41
Q

central tendency

A

mean: avg. of all numbers
median: middle number in list
mode: most freq. number

42
Q

variability
- range
- interquartile range
- variance
- standard dev.

A

range: x(max)- x(min)
interquartile range:
Q1 position ≈ .25 x (# in data set)
Q2 position ≈ .50 x (# in data set)
Q3 position ≈ .75 x (# in data set)
IQR= Q3-Q1
variance: avg. squared dev. of each number from mean
std. dev.: sqrt var. (takes away squared unit)

43
Q

z-scores

A

raw score to z-score: z=(x-μ)/σ
z-score to raw score: x=μ+zσ
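
The two conversions are inverses of each other, which a short Python sketch can confirm. The function names z_score and raw_score are hypothetical, and the example numbers (mean 500, standard deviation 10) are for illustration only.

```python
import math

def z_score(x, mu, sigma):
    """Raw score to z-score: how many standard deviations x lies from the mean."""
    return (x - mu) / sigma

def raw_score(z, mu, sigma):
    """z-score back to raw score."""
    return mu + z * sigma

z = z_score(502.5, mu=500, sigma=10)
print(z)                                 # 0.25

# Converting back recovers the original raw score
x = raw_score(z, mu=500, sigma=10)
print(math.isclose(x, 502.5))            # True
```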

44
Q

standardize a dist.

A

shape of standard distribution: the shape of the distribution of z-scores will be the same as the shape of the original dist. of raw scores
mean: a z-score dist. always has a mean of zero, so any score above the mean is + and any below is -
standard deviation: the z-score dist. will always have a standard dev. of 1. The numerical val. of a z-score is exactly the number of standard deviations from the mean
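
These properties can be verified numerically. The sketch below standardizes a small illustrative data set with the population formulas; the function name standardize and the data are assumptions for the example.

```python
import math
from statistics import mean, pstdev

def standardize(values):
    """Convert every raw score to a z-score using the population mean and sd."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(x - mu) / sigma for x in values]

scores = [21, 34, 18, 26, 30, 35, 24, 29, 25]   # illustrative raw scores
z = standardize(scores)

# Mean of the z-scores is 0 and their standard deviation is 1 (up to rounding)
print(math.isclose(mean(z), 0.0, abs_tol=1e-9))
print(math.isclose(pstdev(z), 1.0, rel_tol=1e-9))

# The ordering of scores is unchanged, so the shape is preserved
print(sorted(range(len(z)), key=z.__getitem__) ==
      sorted(range(len(scores)), key=scores.__getitem__))
```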

45
Q

normal distribution

A

empirical rule: the following approx. holds
- 68% of obs. fall between μ-σ & μ+σ
- 95% of obs. fall between μ-2σ & μ+2σ
- 99.7% of obs. fall between μ-3σ & μ+3σ
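
The percentages in the empirical rule are rounded areas under the normal curve. A minimal Python sketch with statistics.NormalDist shows the more precise values; the helper name share_within is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, standard deviation 1

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```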

46
Q

un-biased stat

A

a statistic whose long-run average (over all possible samples) is equal to the parameter it estimates
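
A small exhaustive example can illustrate the idea. The sketch below uses a made-up population of three values, lists every possible sample of size 2 drawn with replacement, and averages the sample variance over all of them. Exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction
from itertools import product
from statistics import pvariance, variance

# Hypothetical tiny population, chosen only for illustration
population = [Fraction(1), Fraction(2), Fraction(3)]

# All 9 ordered samples of size 2, drawn with replacement
samples = list(product(population, repeat=2))

avg_unbiased = sum(variance(s) for s in samples) / len(samples)   # n - 1 divisor
avg_biased = sum(pvariance(s) for s in samples) / len(samples)    # n divisor

print(pvariance(population))   # 2/3, the population variance
print(avg_unbiased)            # 2/3, matches the parameter
print(avg_biased)              # 1/3, too small on average
```

Dividing by n − 1 gives a sample variance whose average over all samples equals the population variance, while dividing by n underestimates it.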