Statistics Chapter 3 (and a bit of 2) (Variability and Association) Flashcards

Learn about associations and variability

1
Q

What is the range in statistics?

A

The range is the difference between the largest and the smallest observation.

It is strongly affected by outliers.

2
Q

What is a deviation and how is it measured?

A

The deviation of an observation x is the difference between x and the sample mean $\bar{x}$.

The formula is (observation value - sample mean), i.e. $x - \bar{x}$.

It can be positive or negative.
The sum of all deviations is always 0.
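
As a small illustration of the definition above, the following sketch computes the deviations for a made-up sample (the data and the function name `deviations` are assumptions for illustration only) and checks that they sum to zero.

```python
from statistics import mean

def deviations(observations):
    """Return each observation minus the sample mean."""
    x_bar = mean(observations)
    return [x - x_bar for x in observations]

sample = [2, 4, 4, 6, 9]  # hypothetical data, mean is 5
devs = deviations(sample)
print(devs)       # some deviations are negative, some positive
print(sum(devs))  # zero, up to floating-point rounding
```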

3
Q

What is variance and how is it measured?

What is its relation to deviation?

A

Variance is (approximately) the average of the squared deviations: their sum divided by n - 1.
Because the deviations are squared, the variance is in squared units and has no direct interpretation.
It is measured by:

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
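
The formula can be sketched in a few lines of Python. The sample is made up, and `sample_variance` is a hypothetical helper name; the result is compared with `statistics.variance` from the standard library, which also divides by n - 1.

```python
from statistics import variance

def sample_variance(observations):
    """Sum of squared deviations divided by n - 1."""
    n = len(observations)
    x_bar = sum(observations) / n
    return sum((x - x_bar) ** 2 for x in observations) / (n - 1)

sample = [2, 4, 4, 6, 9]  # hypothetical data
s2 = sample_variance(sample)
print(s2)                # squared deviations 9 + 1 + 1 + 1 + 16 = 28, divided by 4
print(variance(sample))  # the standard library gives the same value
```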

4
Q

What is the standard deviation and how is it measured?

How is it related to variance?

A

The standard deviation is the square root of the variance.
It represents a typical distance of an observation from the mean.

It is denoted s.

5
Q

What does it mean if s = 0?

What is a disadvantage of s?

A

If s = 0, all observations have the same value.

A disadvantage of s is that it is strongly affected by outliers, because it is based on the mean.

6
Q

What is the empirical rule?

A

The empirical rule states that in a bell-shaped distribution

  • about 68% of all observations fall within 1 standard deviation s of the mean, that is, between $\bar{x} - s$ and $\bar{x} + s$.
  • about 95% of all observations fall within 2 standard deviations of the mean.
  • almost 100% fall within 3 standard deviations of the mean.
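
The rule can be checked informally on simulated data. This is only a sketch: the bell-shaped sample is generated with `random.gauss`, and the mean of 100, standard deviation of 15, sample size and seed are arbitrary assumptions. The shares should come out close to 68%, 95% and nearly 100%, but not exactly.

```python
import random
from statistics import mean, stdev

def share_within(observations, k):
    """Proportion of observations within k standard deviations of the mean."""
    x_bar = mean(observations)
    s = stdev(observations)
    inside = [x for x in observations if abs(x - x_bar) <= k * s]
    return len(inside) / len(observations)

random.seed(0)  # fixed seed so the run is repeatable
simulated = [random.gauss(100, 15) for _ in range(10_000)]  # hypothetical data

for k in (1, 2, 3):
    print(k, round(share_within(simulated, k), 3))
```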
7
Q

What are quartiles and how do you find them?

A

Quartiles divide the ordered data into 4 parts with the same number of observations, so the first quarter contains the lowest 25% of the observations, and so on.
The median is always the second quartile (Q2), as it is the 50th percentile. The first quartile (Q1) is the median of the observations below the median, and the third quartile (Q3) is the median of the observations above it.
The quartiles tell you about the shape of the distribution.
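
A minimal sketch of the median-of-halves procedure described above, on made-up data. The function name `quartiles` is an assumption, and when n is odd the median itself is left out of both halves. Software packages use several slightly different quartile conventions, so other tools may return somewhat different values.

```python
from statistics import median

def quartiles(observations):
    """Return (Q1, Q2, Q3) using the median of the lower and upper halves."""
    data = sorted(observations)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # For odd n, skip the middle observation (the median itself).
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), median(data), median(upper)

sample = [1, 3, 4, 7, 8, 10, 12, 15, 20]  # hypothetical data
q1, q2, q3 = quartiles(sample)
print(q1, q2, q3)
```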

8
Q

What is the interquartile range IQR?

A

The interquartile range IQR is the distance between the third and the first quartile (IQR = Q3 - Q1), so it spans the middle half of the data.
It is resistant to outliers, as it ignores the lowest and the highest quarter of the data.

9
Q

How do you find out if a value is an outlier?

A

An observation is a potential outlier if it falls more than 1.5 × IQR below the first quartile or more than 1.5 × IQR above the third quartile.

Another way is the z-score, which is the number of standard deviations an observation lies from the mean. In a bell-shaped distribution, an observation more than 3 standard deviations from the mean (z below -3 or above 3) can be considered an outlier.

$$z = \frac{\text{observation} - \text{mean}}{\text{standard deviation}}$$
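
Both checks can be sketched as follows. The data are made up, the helper names `iqr_outliers` and `z_scores` are assumptions, and the quartiles use the median-of-halves convention (at least two observations are needed). In this small sample the 1.5 × IQR rule flags the value 40 while its z-score stays below 3, which shows that the two rules need not agree.

```python
from statistics import mean, median, stdev

def iqr_outliers(observations):
    """Observations more than 1.5 * IQR below Q1 or above Q3."""
    data = sorted(observations)
    half = len(data) // 2
    q1 = median(data[:half])   # median of the lower half
    q3 = median(data[-half:])  # median of the upper half
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]

def z_scores(observations):
    """Number of standard deviations each observation lies from the mean."""
    x_bar = mean(observations)
    s = stdev(observations)
    return [(x - x_bar) / s for x in observations]

sample = [10, 12, 12, 13, 14, 15, 15, 16, 40]  # hypothetical data
print(iqr_outliers(sample))                  # flagged by the 1.5 * IQR rule
print([round(z, 2) for z in z_scores(sample)])
```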

10
Q

What is a box plot and what does it contain?

A

The box plot is a graphical display of data which shows the median and the variability.
It contains lines (whiskers) extending to the minimum and maximum values (except outliers, which are marked separately), the first and third quartiles as the edges of a box, and the median as a line inside the box.

11
Q

What are the advantages and disadvantages of a box plot compared to a histogram?

A

A box plot does not show mounds and gaps the way a histogram does, but it clearly marks potential outliers.

12
Q

What are the explanatory and the response variable?

A

The explanatory variable is the variable that is used to predict or explain the outcome; in an experiment it is the one you can change (independent variable).
The response variable is the outcome variable that you want to study (dependent variable).

13
Q

Define association

A

Two variables are associated if particular values of one variable are more likely to occur with certain values of the other variable.

14
Q

How do you summarize the association of two categorical variables?

A

You use a contingency table, which displays the counts for every combination of categories of the two variables. You can then use bar graphs to plot conditional proportions, which show the proportion of each response category within one level of the explanatory variable.
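
As a rough sketch, conditional proportions can be computed by dividing each cell count of the contingency table by the total for its explanatory level. The categories, the counts and the function name `conditional_proportions` below are invented for illustration.

```python
from collections import Counter

def conditional_proportions(pairs):
    """Map (explanatory level, response) to its share within that level."""
    pairs = list(pairs)
    cell_counts = Counter(pairs)                       # contingency table cells
    row_totals = Counter(level for level, _ in pairs)  # totals per level
    return {
        (level, response): count / row_totals[level]
        for (level, response), count in cell_counts.items()
    }

# Hypothetical data: (food type, pesticide residue present?)
observations = (
    [("organic", "yes")] * 2 + [("organic", "no")] * 8
    + [("conventional", "yes")] * 15 + [("conventional", "no")] * 5
)
props = conditional_proportions(observations)
print(props[("organic", "yes")])       # 2 of 10 organic items
print(props[("conventional", "yes")])  # 15 of 20 conventional items
```

The proportions within each level sum to 1, and a clear difference between the levels (here 0.2 versus 0.75) suggests an association.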

15
Q

How do you summarize the association between two quantitative variables? Explain the associations

A

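A common way is the scatterplot, where the x-axis represents the explanatory variable and the y-axis the response variable.
The association is positive if high values of one variable tend to occur with high values of the other, and negative if high values of one tend to occur with low values of the other.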

16
Q

What is the value r? Explain it and how it is calculated.

A

r is the correlation. It measures the strength and direction of the linear association and can vary between -1 and 1.
On a scatterplot the points form a straight line if r is 1 or -1.
To calculate r you use the z-scores of x and y:

$$r = \frac{1}{n - 1} \sum z_x z_y$$
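
The z-score formula translates directly into code. This is a sketch with made-up data and a hypothetical function name `correlation`; points lying exactly on a rising line should give r = 1 and points on a falling line r = -1, up to rounding.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sum of the products of the z-scores, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)

xs = [1, 2, 3, 4, 5]        # hypothetical data
rising = [2, 4, 6, 8, 10]   # exactly on a rising straight line
falling = [10, 8, 6, 4, 2]  # exactly on a falling straight line
print(correlation(xs, rising))
print(correlation(xs, falling))
```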

17
Q

How do you predict the outcome of a variable?

A

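You can predict the outcome with the regression line, which is given by

$$\hat{y} = a + bx$$

where $\hat{y}$ is the predicted value of the response variable, a is the y-intercept and b is the slope. a and b are mostly calculated by computers.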

18
Q

What is the absolute value of a residual?

A

It is the vertical distance between a point and the regression line on a scatterplot. The residual is the difference between the actual and the predicted value, $y - \hat{y}$, and its absolute value $|y - \hat{y}|$ is the size of that difference.
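
A short sketch of the idea, assuming a line with intercept a = 1 and slope b = 2 and four made-up points (none of these numbers come from the cards): each residual is the actual value minus the value predicted by the line.

```python
def residuals(xs, ys, a, b):
    """Actual value minus predicted value (a + b * x) for each point."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

xs = [1, 2, 3, 4]  # hypothetical data
ys = [3, 5, 6, 9]
a, b = 1.0, 2.0    # hypothetical regression line: predicted y = 1 + 2x
res = residuals(xs, ys, a, b)
print(res)                    # the third point lies below the line
print([abs(e) for e in res])  # vertical distances to the line
```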

19
Q

How do you calculate a and b of a regression line?

A

b (slope): $b = r \cdot \frac{s_y}{s_x}$

a (y-intercept): $a = \bar{y} - b\bar{x}$
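
The two formulas can be sketched as below, using made-up data and the hypothetical helper names `correlation` and `regression_line`. The resulting line is then used to predict y for a new x; predicting far outside the observed x values would be extrapolation.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sum of the products of the z-scores, divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)

def regression_line(xs, ys):
    """Return (a, b) with b = r * s_y / s_x and a = mean(y) - b * mean(x)."""
    b = correlation(xs, ys) * stdev(ys) / stdev(xs)
    a = mean(ys) - b * mean(xs)
    return a, b

xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]
a, b = regression_line(xs, ys)
print(round(a, 3), round(b, 3))  # intercept and slope
print(round(a + b * 3, 3))       # predicted y at x = 3, the mean of x
```

The line always passes through the point of means, so the prediction at the mean of x equals the mean of y.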
20
Q

Which three things do you need to watch out for when working with associations?

A
  1. Extrapolation: the relationship may not follow the same straight line beyond the range of the observed data; the trend may change direction.
  2. Outliers: a few very high or very low values may strongly influence measures like the correlation and the regression line.
  3. Causation: association alone never establishes causation; there may be lurking variables and confounding that explain the association.