Chapter 1: Statistical Thinking Flashcards

1
Q

What is a median?

A

The middle value of a data set once the values are sorted. With an even number of values, it is the average of the two middle values.

2
Q

What is a mean?

A

The arithmetic average of the data set: the sum of the values divided by the number of values.

3
Q

What is the difference between the mean mu and the mean x bar?

A

The mean x bar is the average of a sample, while the mean mu is the average of a population.

4
Q

What is a mode?

A

The most frequently occurring value in a data set.

5
Q

What is central tendency?

A

A single value that describes the center of a data set.

6
Q

What are the 3 ways to describe the central tendency of a data set?

A

Mean (average), median, and mode.
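
As a quick illustration, the three measures can be computed with Python's standard `statistics` module. This is only a sketch; the data values are made up for the example and are not part of the deck.

```python
import statistics

# Made-up illustrative data set (an assumption, not from the flashcards).
data = [2, 3, 3, 5, 7, 10]

mean_value = statistics.mean(data)      # sum of the values / number of values
median_value = statistics.median(data)  # middle of the sorted values (average of the two middle ones here)
mode_value = statistics.mode(data)      # most frequently occurring value

print("mean:", mean_value)
print("median:", median_value)
print("mode:", mode_value)
```

The three results differ for this data set, which is why the choice of measure matters.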

7
Q

What type of distribution has only one mode?

A

Unimodal distribution

8
Q

What is a “camel” distribution?

A

A bimodal distribution: a distribution with 2 modes, which gives it two humps like a camel.
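
A rough way to see one mode versus two is `statistics.multimode`, which returns every value tied for the highest count. The data below is invented for illustration. Note the simplification: in practice "bimodal" describes two peaks in a distribution, and the peaks need not have exactly equal counts as they do here.

```python
import statistics

# Invented example data: one peak at 2, versus peaks at 2 and 8.
unimodal = [1, 2, 2, 2, 3, 4]
bimodal = [1, 2, 2, 2, 3, 7, 8, 8, 8, 9]

modes_uni = statistics.multimode(unimodal)  # list of all most-frequent values
modes_bi = statistics.multimode(bimodal)

print("unimodal modes:", modes_uni)
print("bimodal modes:", modes_bi)
```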

9
Q

What are the 4 data types?

A

Nominal, Ordinal, Interval, Ratio

10
Q

What is nominal data? Give an example.

A

Label-type (categorical) data that is not quantitative and has no natural order. Values can be grouped into categories, e.g., gender.

11
Q

Which measure of central tendency is preferred for nominal data? Give an example.

A

The mode is the preferred measure of central tendency for nominal data. For example: how many people voted for the Hershey brand vs. the Walmart brand, and which brand is the mode? The mode is also unaffected by outliers.
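
A minimal sketch of finding the mode of nominal data with `collections.Counter`. The brand names come from the card; the individual votes are invented for illustration.

```python
from collections import Counter

# Hypothetical votes: the counts are assumptions made up for this example.
votes = ["Hershey", "Walmart", "Hershey", "Hershey", "Walmart"]

counts = Counter(votes)                             # tally each label
mode_brand, mode_count = counts.most_common(1)[0]   # label with the highest tally

print(dict(counts))
print("mode:", mode_brand, "with", mode_count, "votes")
```

A mean or median would be meaningless here, because brand labels cannot be added or ordered.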

12
Q

With what data type(s) is the mean the preferred measure of central tendency?

A

Interval or ratio data, when the data is not excessively skewed. E.g., what is the average salary of data scientists? What is the average temperature in Texas in June?

13
Q

With what data type(s) is the median the preferred measure of central tendency?

A

Ordinal data. The median also works well when the data is skewed, because extreme values barely move it. E.g., how would you rate the quality of the class?
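
To illustrate why skew is less of a problem for the median, the sketch below adds one extreme value to a small made-up data set and compares how far the mean and the median move. The numbers are assumptions chosen only for the example.

```python
import statistics

# Made-up values (say, salaries in thousands); not from the flashcards.
typical = [50, 52, 55, 58, 60]
with_outlier = typical + [1000]  # one extreme value skews the data

mean_before = statistics.mean(typical)
median_before = statistics.median(typical)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

The mean jumps far away from the typical values, while the median shifts only slightly.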

14
Q

What do standard deviation and variance measure about a data set?

A

Both measure the spread of the data around the mean. Variance is the square of the standard deviation.

15
Q

What is the relationship between standard deviation and the mean of a data set?

A

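Standard deviation measures the typical distance of the data points from the mean. More precisely, it is the square root of the average squared distance from the mean.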

16
Q

What are the 2 differences between the formulas for population and sample standard deviation?

A

The population standard deviation uses the population mean mu and divides by N. The sample standard deviation uses the sample mean x bar and divides by N-1.
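
The two formulas can be sketched side by side. The data values are made up; which formula applies depends on whether the values are treated as a whole population or as a sample, which is an assumption the analyst has to make.

```python
import math
import statistics

# Made-up values used for both calculations.
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
center = sum(data) / n  # mu if data is the whole population, x bar if it is a sample
squared_dev = sum((x - center) ** 2 for x in data)

population_sd = math.sqrt(squared_dev / n)     # divide by N
sample_sd = math.sqrt(squared_dev / (n - 1))   # divide by N - 1

# The standard library offers the same two calculations.
print(population_sd, statistics.pstdev(data))
print(sample_sd, statistics.stdev(data))
```

Because N-1 is smaller than N, the sample standard deviation always comes out slightly larger for the same values.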

17
Q

What is the 95% rule?

A

Approximately 95% of observations fall within 2 standard deviations of the mean in a normal distribution.
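
The rule can be checked numerically with `statistics.NormalDist` (Python 3.8+). This sketch uses the standard normal distribution; the same share holds for any normal distribution, since only the number of standard deviations matters.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Share of observations within 2 standard deviations of the mean.
within_2_sd = standard_normal.cdf(2) - standard_normal.cdf(-2)

# Number of standard deviations that captures exactly 95%.
z_for_95 = standard_normal.inv_cdf(0.975)

print(f"within 2 SD: {within_2_sd:.4f}")
print(f"z for exactly 95%: {z_for_95:.3f}")
```

The share within 2 standard deviations is slightly above 95%, which is why the rule says "approximately".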

18
Q

What does it mean when the 95th percentile male weighs 216 lbs and the mean is 171 lbs?

A

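It means 95 percent of the US male population weighs 216 lbs or less, and 5 percent weighs more. A symmetric range around the mean, such as 126 lbs to 216 lbs, is a different statement and does not follow from the 95th percentile alone.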
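
A percentile and a symmetric interval around the mean are different things. The sketch below assumes, purely for illustration, that weights follow a normal distribution. That assumption is not stated on the card; it is used here to back out a standard deviation and compare the two shares.

```python
from statistics import NormalDist

mean_weight = 171.0  # lbs, from the card
p95_weight = 216.0   # lbs, from the card

# ASSUMPTION: weights are normally distributed (real weight data is usually right-skewed).
z_95 = NormalDist().inv_cdf(0.95)                # z score of the 95th percentile
implied_sd = (p95_weight - mean_weight) / z_95   # standard deviation implied by the assumption
weights = NormalDist(mu=mean_weight, sigma=implied_sd)

share_at_or_below_216 = weights.cdf(p95_weight)
share_between_126_and_216 = weights.cdf(216.0) - weights.cdf(126.0)

print(f"at or below 216 lbs: {share_at_or_below_216:.2f}")
print(f"between 126 and 216 lbs: {share_between_126_and_216:.2f}")
```

Under this assumed model, the interval from 126 lbs to 216 lbs holds about 90 percent of the population, because 5 percent lies in each tail.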

19
Q

What does a z score describe?

A

The z score describes how many standard deviations a data point lies above or below the mean: z = (x - mean) / standard deviation. A positive z score is above the mean and a negative one is below it.
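
A minimal sketch of the calculation. The function name `z_score` and the example numbers are illustrative choices, not part of the deck.

```python
def z_score(x, mean, std_dev):
    """Return how many standard deviations x lies from the mean."""
    if std_dev <= 0:
        raise ValueError("std_dev must be positive")
    return (x - mean) / std_dev


# Hypothetical test scores with mean 70 and standard deviation 10.
above = z_score(85, mean=70, std_dev=10)    # above the mean, so positive
at_mean = z_score(70, mean=70, std_dev=10)  # exactly at the mean, so zero
below = z_score(55, mean=70, std_dev=10)    # below the mean, so negative

print(above, at_mean, below)
```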

20
Q

Why is a z score important with respect to a normal distribution?

A

Converting every value to its z score transforms any normal distribution into the standard normal distribution, so different data sets can be compared on one scale.
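
The transformation can be sketched by converting each value in a data set to its z score. The data is made up and treated as a full population for simplicity. Standardizing only shifts and rescales the values; it does not make non-normal data normal.

```python
import statistics

# Made-up values, treated here as a whole population (an assumption).
data = [2, 4, 4, 4, 5, 5, 7, 9]

mu = statistics.mean(data)
sigma = statistics.pstdev(data)

# Subtract the mean, then divide by the standard deviation.
z_scores = [(x - mu) / sigma for x in data]

print(z_scores)
print("mean of z scores:", statistics.mean(z_scores))
print("standard deviation of z scores:", statistics.pstdev(z_scores))
```

After the transformation the values have mean 0 and standard deviation 1, matching the standard normal scale.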

21
Q

What are the mean and standard deviation of the standard normal distribution?

A

mean = 0, standard deviation = 1

22
Q

Why does the sample standard deviation use N-1 instead of N?

A

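A sample is only a subset of the population, and deviations measured from the sample mean x bar tend to be smaller than deviations from the true mean mu. Dividing by N-1 instead of N (Bessel's correction) makes the estimate slightly larger to compensate for that underestimate.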
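
A small simulation can make the effect visible. This is a sketch under stated assumptions: the population is normal with variance 1, the samples have 5 values each, and the seed, sample size, and repetition count are arbitrary choices for the example.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 5       # small samples show the bias most clearly
NUM_SAMPLES = 10_000  # number of repeated samples

sum_div_n = 0.0
sum_div_n_minus_1 = 0.0
for _ in range(NUM_SAMPLES):
    # Draw a sample from a population with mean 0 and variance 1.
    sample = [random.gauss(0.0, 1.0) for _ in range(SAMPLE_SIZE)]
    x_bar = sum(sample) / SAMPLE_SIZE
    squared_dev = sum((x - x_bar) ** 2 for x in sample)
    sum_div_n += squared_dev / SAMPLE_SIZE
    sum_div_n_minus_1 += squared_dev / (SAMPLE_SIZE - 1)

avg_div_n = sum_div_n / NUM_SAMPLES
avg_div_n_minus_1 = sum_div_n_minus_1 / NUM_SAMPLES

print(f"average variance estimate dividing by N:   {avg_div_n:.3f}")
print(f"average variance estimate dividing by N-1: {avg_div_n_minus_1:.3f}")
```

Dividing by N averages out well below the true variance of 1, while dividing by N-1 averages out close to it.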

23
Q

When do we use the z distribution instead of the t distribution?

A

When sigma (the population standard deviation) is known and the sample size is greater than or equal to 30. Otherwise, use the t distribution.
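
The rule on this card can be written as a tiny helper. The function name `choose_distribution` is hypothetical, and the threshold of 30 is the rule of thumb from the card rather than a hard statistical boundary.

```python
def choose_distribution(sigma_known: bool, sample_size: int) -> str:
    """Return "z" or "t" following the rule of thumb on this card."""
    if sample_size < 2:
        raise ValueError("sample_size must be at least 2")
    # z only when the population standard deviation is known AND n >= 30.
    if sigma_known and sample_size >= 30:
        return "z"
    return "t"


print(choose_distribution(sigma_known=True, sample_size=50))
print(choose_distribution(sigma_known=False, sample_size=50))
print(choose_distribution(sigma_known=True, sample_size=12))
```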

24
Q

What is covariance?

A

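A measure of the linear relationship between 2 variables. A positive covariance means they tend to move in the same direction, and a negative covariance means they tend to move in opposite directions.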
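
A minimal sketch of the sample covariance calculation, dividing by N-1 as with the sample standard deviation. The function name and the data are illustrative assumptions.

```python
def sample_covariance(xs, ys):
    """Sample covariance of two equal-length sequences (divides by N - 1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Average product of paired deviations from each mean.
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


# Made-up paired data: one series rises with x, the other falls.
x_values = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]
falling = [10, 8, 6, 4, 2]

cov_rising = sample_covariance(x_values, rising)    # positive: move together
cov_falling = sample_covariance(x_values, falling)  # negative: move oppositely

print(cov_rising, cov_falling)
```

The sign gives the direction of the relationship. The magnitude depends on the units of both variables, so it is hard to interpret on its own.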