Exploring Data Flashcards

1
Q

What is a categorical variable?

A

A variable whose values are names or labels, such as color or breed of dog.

2
Q

What is a quantitative variable?

A

A variable whose values are numerical and represent a measurable quantity, such as salary or height.

3
Q

How do we represent categorical variables?

A

With bar charts or pie charts

4
Q

How do we represent quantitative variables?

A

With histograms, stem and leaf plots, or boxplots

5
Q

When do we use the mean to describe a distribution?

A

When the distribution is unimodal and symmetric

6
Q

When do we use the median to describe a distribution?

A

When the distribution is not unimodal and symmetric, for example when it is skewed or has outliers.

7
Q

What is standard deviation?

A

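Roughly, the typical distance of the data values from the mean (formally, the square root of the averaged squared deviations from the mean).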
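
A minimal sketch of the calculation, assuming the sample standard deviation (dividing by n - 1) and a made-up data set. It also computes the literal average distance from the mean, to show that the two numbers are close but not identical.

```python
from math import sqrt
from statistics import mean, stdev

data = [1, 2, 3, 4, 5]  # illustrative values, not from the deck
center = mean(data)

# Sample standard deviation: square each deviation, average them
# (dividing by n - 1), then take the square root.
squared_deviations = [(x - center) ** 2 for x in data]
sd_by_hand = sqrt(sum(squared_deviations) / (len(data) - 1))

# Literal average distance from the mean, for comparison.
mean_abs_deviation = sum(abs(x - center) for x in data) / len(data)

print(sd_by_hand)          # matches statistics.stdev(data)
print(stdev(data))
print(mean_abs_deviation)  # a bit smaller than the standard deviation
```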

8
Q

What is a z score?

A

The number of standard deviations a value lies above or below the mean: z = (value - mean) / standard deviation.
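
As a quick sketch, the z score can be computed like this. The function name `z_score` and the numbers (mean 70, standard deviation 10) are made up for illustration.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd

# Hypothetical test scores with mean 70 and standard deviation 10.
print(z_score(85, 70, 10))  # 1.5  (above the mean)
print(z_score(55, 70, 10))  # -1.5 (below the mean)
```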

9
Q

What is percentile?

A

The percent of the distribution that lies to the left of (below) a given value.

10
Q

What is the five number summary?

A

min, Q1, median, Q3, max

11
Q

What is the IQR?

A

interquartile range: Q3 - Q1
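
A rough sketch of the five number summary and the IQR on a made-up data set. Quartile conventions differ between textbooks and software; this version assumes the "median of each half" method (leaving the overall median out of both halves when n is odd), and the name `five_number_summary` is only illustrative.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    values = sorted(data)
    half = len(values) // 2
    lower = values[:half]
    # Skip the middle value when the count is odd.
    upper = values[half:] if len(values) % 2 == 0 else values[half + 1:]
    return min(values), median(lower), median(values), median(upper), max(values)

data = [2, 4, 4, 5, 7, 8, 10, 12, 30]  # illustrative values
low, q1, med, q3, high = five_number_summary(data)
iqr = q3 - q1

print(low, q1, med, q3, high)  # 2 4.0 7 11.0 30
print(iqr)                     # 7.0
```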

12
Q

What is the empirical rule?

A

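For a roughly normal distribution, about 68% of the data lies within 1 standard deviation of the mean, 95% within 2, and 99.7% within 3. Remember: 68-95-99.7!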
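
The 68-95-99.7 figures can be checked against the standard normal curve. This is a small sketch using Python's standard library `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Share of the distribution within k standard deviations of the mean.
shares = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, share in shares.items():
    print(f"within {k} SD: {share:.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```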

13
Q

What percent of data lies above the median?

A

50%

14
Q

How do you determine “outliers”?

A

A value is an outlier if it lies more than 1.5 IQRs above Q3 or below Q1.
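
A sketch of the 1.5 IQR rule on a made-up data set. The quartiles here assume the "median of each half" method (other conventions can shift the fences slightly), and `find_outliers` is an illustrative name.

```python
from statistics import median

def find_outliers(data):
    """Return the values more than 1.5 IQRs below Q1 or above Q3."""
    values = sorted(data)
    half = len(values) // 2
    q1 = median(values[:half])
    # Skip the middle value when the count is odd.
    q3 = median(values[half:] if len(values) % 2 == 0 else values[half + 1:])
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in values if x < lower_fence or x > upper_fence]

data = [2, 4, 4, 5, 7, 8, 10, 12, 30]  # illustrative values
# Q1 = 4, Q3 = 11, IQR = 7, so the fences are -6.5 and 21.5.
print(find_outliers(data))  # [30]
```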

15
Q

If a distribution is skewed right, which is higher, the median or mean?

A

The mean. The mean chases the tail!
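
A tiny illustration with made-up numbers: one large value in the right tail pulls the mean up, while the median stays put.

```python
from statistics import mean, median

# Illustrative right-skewed data: most values are small, one is large.
skewed_right = [1, 2, 2, 3, 3, 4, 20]

print(median(skewed_right))  # 3
print(mean(skewed_right))    # 5
```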

16
Q

How do you know whether to use the mean and s.d. to describe data, or median and IQR?

A

If the data is unimodal and symmetric, use the mean and s.d., otherwise use median and IQR

17
Q

What should you remember when making graphs?

A

Label your axes, give a key if needed, and give the graph a name!

18
Q

How much data is between Q1 and Q3?

A

50%

19
Q

What is a contingency table?

A

A table that shows how counts are distributed across two categorical variables, such as gender and music preference. Also known as a two-way table.

20
Q

How can you tell if variables in a contingency table are independent?

A

If the conditional distributions of one variable are the same across every category of the other variable. Then it doesn’t DEPEND on the category, so the variables are INDEPENDENT.
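
A sketch of that check on a hypothetical table (the counts and the helper names are invented for illustration). Each row is turned into a conditional distribution, and the rows are then compared. With real sample data the rows rarely match exactly, so in practice "about the same" is the working standard.

```python
from math import isclose

# Hypothetical counts: rows are gender, columns are music preference.
table = {
    "female": {"rock": 40, "pop": 60},
    "male": {"rock": 20, "pop": 30},
}

def conditional_distribution(row):
    """Convert one row of counts into proportions that sum to 1."""
    total = sum(row.values())
    return {category: count / total for category, count in row.items()}

def looks_independent(table, tolerance=1e-9):
    """True if every row has the same conditional distribution."""
    distributions = [conditional_distribution(row) for row in table.values()]
    first = distributions[0]
    return all(
        isclose(dist[category], first[category], abs_tol=tolerance)
        for dist in distributions
        for category in first
    )

print(looks_independent(table))  # True: both rows are 40% rock, 60% pop
```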

21
Q

What is a marginal distribution?

A

The overall distribution of a single variable in a contingency table, found in the totals out in the margins.

22
Q

What is a conditional distribution?

A

The distribution of one variable within a single row or a single column of the table. NOT IN THE MARGINS.
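
To make the contrast between marginal and conditional distributions concrete, here is a sketch on a hypothetical table. All counts and variable names are invented for illustration.

```python
# Hypothetical counts: rows are gender, columns are music preference.
table = {
    "female": {"rock": 30, "pop": 70},
    "male": {"rock": 45, "pop": 55},
}

grand_total = sum(sum(row.values()) for row in table.values())  # 200

# Marginal distribution of gender: row totals over the grand total.
marginal_gender = {
    gender: sum(row.values()) / grand_total for gender, row in table.items()
}

# Conditional distribution of music preference among females only:
# counts in one row over that row's total.
female_total = sum(table["female"].values())  # 100
pref_given_female = {
    pref: count / female_total for pref, count in table["female"].items()
}

print(marginal_gender)    # {'female': 0.5, 'male': 0.5}
print(pref_given_female)  # {'rock': 0.3, 'pop': 0.7}
```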

23
Q

How do you describe distributions (histograms)?

A

Shape, Center, Spread, and STRANGE (outliers and gaps). Some say GSOCS. Where’s yo GSOCS?

24
Q

If asked to compare distributions, what should you write about?

A

Compare Shapes, Centers, Spreads, and Stranges: the GSOCS.

25
Q

Give a simple example showing that adding a constant doesn’t change the spread, but changes the center. (this always happens)

A

Data set: 1,2,3,4,5 Spread (range): 5-1=4, Center: 3
Add three and get new data set: 4,5,6,7,8. Spread: still 4, Center: 6 (center went up by 3, spread stayed the same). The IQR and SD stay the same, but the median and mean each go up by 3.
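
The same check can be run in code. This is a small sketch using the 1 to 5 data set and a shift of 3; the variable names are illustrative.

```python
from math import isclose
from statistics import mean, median, stdev

data = [1, 2, 3, 4, 5]
shifted = [x + 3 for x in data]  # add the constant 3 to every value

print(shifted)                         # [4, 5, 6, 7, 8]
print(mean(data), mean(shifted))       # 3 6  (center moves up by 3)
print(median(data), median(shifted))   # 3 6
print(max(data) - min(data), max(shifted) - min(shifted))  # 4 4 (range unchanged)
print(isclose(stdev(data), stdev(shifted)))                # True (SD unchanged)
```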

26
Q

Give a simple example showing that multiplying by a constant changes both the spread and the center. (this always happens)

A

Data set: 1,2,3,4,5 Spread (range): 5-1=4, Center: 3
Multiply by three and get new data set: 3,6,9,12,15. Spread: 12, Center: 9 (both center and spread were multiplied by three). The IQR and SD are multiplied by 3, and so are all the values, including Q1, the median, etc.
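
A matching sketch for scaling, again with the 1 to 5 data set and a multiplier of 3; the variable names are illustrative.

```python
from math import isclose
from statistics import mean, median, stdev

data = [1, 2, 3, 4, 5]
scaled = [3 * x for x in data]  # multiply every value by the constant 3

print(scaled)                        # [3, 6, 9, 12, 15]
print(mean(data), mean(scaled))      # 3 9  (center is tripled)
print(median(data), median(scaled))  # 3 9
print(max(data) - min(data), max(scaled) - min(scaled))  # 4 12 (range is tripled)
print(isclose(stdev(scaled), 3 * stdev(data)))           # True (SD is tripled)
```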

27
Q

How do you describe center?

A

Talk about the mean (balance), median (splits area in half), mode (peaks? if bimodal, talk about both modes) or simply say: “centered around ____”

28
Q

How do you describe shape?

A

unimodal, bimodal, multimodal, uniform AND symmetric, skewed

29
Q

How do you describe spread?

A

Range, IQR, standard deviation, variance, or simply say: “From here to about here”

30
Q

What happens if you ADD a constant to each value in a data set?

A

It is SHIFTED only. This affects all of the data values, the measures of center (mean, median), and the quartiles, deciles, etc. IT DOES NOT CHANGE THE SPREAD! (IQR, standard deviation, and range all stay the SAME.)

31
Q

What happens if you multiply all of a data set by a constant?

A

It is scaled. Everything is affected. The mean, median, standard deviation, IQR, and quartiles are all multiplied by that constant. Center, spread, and all individual values are changed.

32
Q

If you want to calculate % above a value, what do you put into normcdf(? ?)

A

Find the z score for the value, then use normcdf(z, 999). The z score is the left bound, and 999 stands in for infinity on the right.
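
Away from the calculator, the same idea can be sketched with Python's `statistics.NormalDist`. The `normcdf` helper below is a hypothetical stand-in that mimics the calculator function for z scores, and the numbers (mean 100, standard deviation 15, cutoff 130) are made up.

```python
from statistics import NormalDist

def normcdf(lower_z, upper_z):
    """Area under the standard normal curve between two z scores."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return standard_normal.cdf(upper_z) - standard_normal.cdf(lower_z)

# Hypothetical question: what share of values lies above 130
# when the mean is 100 and the standard deviation is 15?
z = (130 - 100) / 15           # z score of the cutoff: 2.0
share_above = normcdf(z, 999)  # 999 acts as "infinity" on the right

print(round(share_above, 4))   # 0.0228, about 2.3% lies above
```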

33
Q

Which calculator function gives you a z score?

A

invnorm(percentile). YOU MUST USE THE PERCENTILE (% to the left), entered as a decimal.
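
The same lookup can be sketched in Python with `statistics.NormalDist.inv_cdf`. The `invnorm` wrapper is a hypothetical helper named after the calculator function; it expects the area to the left as a decimal strictly between 0 and 1.

```python
from statistics import NormalDist

def invnorm(area_to_left):
    """z score that has the given area (as a decimal) to its left."""
    return NormalDist().inv_cdf(area_to_left)

print(round(invnorm(0.90), 2))  # 1.28 (90th percentile)
print(round(invnorm(0.50), 2))  # 0.0  (the median of the standard normal)
```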

34
Q

What does normcdf do?

A

It gives you the area under the normal curve between any two z scores