Week 1: Central Tendency, Variability Measures, Z-Score, Linear Regression, Correlation, R Squared, Predicted Values, Residual Flashcards

Question 1

Q

1. What type of variables are these?
Country
Income
Temperature
IQ
pH
Cancer types
Hair color
Socio-economic status
Number of Pets
2. What central tendency and measures of variability can we calculate for each?

Answer

A

1. Country - Nominal (mode only)
Income - Ratio (mode, median, mean, IQR, Var, SD)
Temperature - Interval when measured in °C or °F (all of them)
IQ - Interval (all)
pH - Interval (all)
Cancer types - Nominal (mode only)
Hair color - Nominal (mode only)
SES - Ordinal (mode, median)
Number of pets - Ratio (all; it is a discrete count, so the median and mode are often the most informative)

Question 2

Q

What graphs do we use for numerical/qualitative variables?

Answer

A

Qualitative - bar chart

Numeric - histogram, boxplot

Question 3

Q

Central Tendency
What is 
1. Mean
2. Median
3. Mode

Answer

A

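Mean - the average, a typical value (sum of the values / number of values)
Median - the middle value: order the values, then pick the middle one (with an even number of values, average the two middle ones)
Mode - the most frequent value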
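
A minimal sketch of the three measures, using Python's standard `statistics` module. The data set (number of pets in nine households) is made up for illustration.

```python
from statistics import mean, median, mode

# Hypothetical sample: number of pets in nine households.
pets = [0, 1, 1, 2, 2, 2, 3, 4, 12]

pets_mean = mean(pets)      # sum of the values / number of values
pets_median = median(pets)  # middle value after sorting
pets_mode = mode(pets)      # most frequent value

print(pets_mean, pets_median, pets_mode)
```

The single large value (12) pulls the mean above the median, while the median and mode stay at the typical household.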

Question 4

Q

Measures of Variability
What is: (formulas)
1. Variance
2. Standard Deviation
3. IQR

Answer

A

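Variance: how much the observations differ from the mean (and so from each other)
Population Var: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$, where $\mu$ is the population mean and $N$ the population size
Sample Var: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$, where $\bar{x}$ is the mean of all observations
SD: measures the amount of variation/dispersion of a set of values
Formula: the square root of the variance, $\sigma = \sqrt{\sigma^2}$ (population) or $s = \sqrt{s^2}$ (sample)
IQR: spread of the middle 50% of the data, also called the midspread
$IQR = Q_3 - Q_1$ (3rd quartile minus 1st quartile)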
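
The formulas above can be sketched in Python as follows. The data set is hypothetical, and the quartiles come from `statistics.quantiles` with its default method; other quartile conventions exist and can give slightly different IQR values.

```python
import math
from statistics import quantiles

# Hypothetical data set, chosen so the arithmetic is easy to follow.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n

# Sum of squared deviations from the mean.
squared_dev = sum((x - mean) ** 2 for x in data)

pop_var = squared_dev / n           # divide by N for a population
sample_var = squared_dev / (n - 1)  # divide by n - 1 for a sample
pop_sd = math.sqrt(pop_var)
sample_sd = math.sqrt(sample_var)

# quantiles(..., n=4) returns the three cut points [Q1, Q2, Q3].
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1

print(pop_var, sample_var, pop_sd, sample_sd, iqr)
```

Dividing by n - 1 instead of N makes the sample variance slightly larger than the population variance computed on the same numbers.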

Question 5

Q

What is a normal distribution?

Answer

A

mean=median=mode
empirical rule: about 68/95/99.7% of values lie within 1/2/3 SDs of the mean
fully described by its mean and SD
unimodal
symmetrical
centered on the mean
fixed score distribution
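
As a rough check of the empirical rule, the sketch below uses `statistics.NormalDist` from the standard library. The mean of 100 and SD of 15 are an assumed IQ-like scale; any other mean and SD would give the same proportions.

```python
from statistics import NormalDist

# Assumed IQ-like scale: mean 100, SD 15.
dist = NormalDist(mu=100, sigma=15)

def share_within(k):
    """Proportion of the distribution within k SDs of the mean."""
    upper = dist.cdf(dist.mean + k * dist.stdev)
    lower = dist.cdf(dist.mean - k * dist.stdev)
    return upper - lower

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```

The printed proportions should come out close to 68%, 95% and 99.7%.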

Question 6

Q

The Standard Normal Distribution is…

Answer

A

a ND with mean 0 and variance 1 (so SD is also 1)

Question 7

Q

Describe the +/- skewness

Answer

A

+ right (positively) skewed: long tail to the right, typically mode < median < mean

- left (negatively) skewed: long tail to the left, typically mode > median > mean
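
A small sketch of both orderings, using two made-up data sets that are mirror images of each other. The ordering is a rule of thumb that holds for typical unimodal data, not a guarantee.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed data: a few large values form a long right tail.
right = [1, 2, 2, 2, 3, 3, 4, 5, 9, 19]

# Mirror image (20 - x): the long tail is now on the left.
left = [20 - x for x in right]

for name, data in (("right", right), ("left", left)):
    print(name, mode(data), median(data), mean(data))
```

For the right-skewed set the mean is the largest of the three measures; for the left-skewed set it is the smallest.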

Question 8

Q

What is the Z-Score?

Answer

A

How far is an observation from the mean in terms of SDs
The nr of SDs by which the value of raw score is above or below the mean value of what is being observed
The standardized score
Z = (observed value - mean)/SD
If we extend 1 SD above the mean and 1 SD below => approx. 68% of the observations are within the interval (for a ND)
Approx. 95% of the population is between 2 SDs above the mean and 2 SDs below for a ND
Also, if x is normally distributed, then z is ND, with mean = 0 and SD = 1
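
A minimal sketch of the Z formula. The function name `z_score`, the IQ-like scale (mean 100, SD 15) and the data set are assumptions for illustration.

```python
from statistics import fmean, pstdev

def z_score(value, mean, sd):
    """Number of SDs by which value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

# Assumed IQ-like scale with mean 100 and SD 15.
z_130 = z_score(130, 100, 15)  # 2 SDs above the mean
z_85 = z_score(85, 100, 15)    # 1 SD below the mean

# Standardizing a whole data set gives scores with mean 0 and SD 1.
data = [2, 4, 4, 4, 5, 5, 7, 9]
m, sd = fmean(data), pstdev(data)
z_values = [z_score(x, m, sd) for x in data]

print(z_130, z_85, fmean(z_values), pstdev(z_values))
```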

Question 9

Q

What is the Correlation Coefficient?

Steps

Answer

A

A way of summarizing a scatter plot into a number between -1 and 1
Steps
1. fits a straight line to the data
2. the cc remembers whether the slope of the straight line points downwards or upwards
if slope + => coeff between 0 and 1 => positive
if slope - => coeff between -1 and 0 => negative
if the line is flat => coeff is 0; the closer to 0, the weaker the relationship
3. looks at the quality of the fit of the straight line to the data

Question 10

Q

What is Pearson Correlation (Formula)

FOR LINEAR (STRAIGHT-LINE) RELATIONSHIPS ONLY!

Answer

A

summarizes the strength and direction of a straight-line relationship
1. strength - the closeness of the points to a straight line
2. direction - whether one var generally increases or decreases as the other increases
$r_{xy} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}}$
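
The Pearson formula can be sketched directly in Python. The helper name `pearson_r` and the two small data sets are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation from the sums of deviation products."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 4, 5]     # tends to rise with x
y_down = [10, 8, 6, 4, 2]  # falls on an exact straight line

r_up = pearson_r(x, y_up)
r_down = pearson_r(x, y_down)
print(round(r_up, 4), round(r_down, 4))
```

The first result is positive but below 1 because the points only roughly follow a line; the second is -1 because the points lie exactly on a falling line.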

Question 11

Q

What is Linear Regression Analysis?

Answer

A

used to predict the value of a variable based on the value of another variable
describes the average relation between y-values and x-values
the points on the regression line are the predicted y-values, denoted by y hat
explores the relation between a quantitative response var and one or more explanatory vars

Question 12

Q

Regression Line is fully determined if:

Answer

A

- the intersection with the y-axis is known --> intercept

- it is known how steep the line is --> slope

Question 13

Q

Formulas for:
Regression Line
Regression Model

Answer

A

RL: $\hat{Y}_i = b_0 + b_1 X_i$, where $b_0$ is the intercept and $b_1$ is the slope
RM: $Y_i = \hat{Y}_i + e_i = b_0 + b_1 X_i + e_i$, where $e_i$ is the residual
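
The cards do not say how b0 and b1 are obtained. The sketch below assumes the usual least-squares estimates (slope = sum of deviation products / sum of squared x deviations, and a line passing through the point of means). The function name `fit_line` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 (assumed fitting method)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx         # slope
    b0 = y_bar - b1 * x_bar  # intercept: line passes through (x_bar, y_bar)
    return b0, b1

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = fit_line(x, y)
y_hat_at_6 = b0 + b1 * 6  # regression line evaluated at a new x value

print(round(b0, 3), round(b1, 3), round(y_hat_at_6, 3))
```

Once b0 and b1 are known the line is fully determined, so a predicted value is available for any x.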

Question 14

Q

How are R squared and rxy related?

Answer

A

In simple linear regression (one explanatory variable), R squared is the square of the Pearson correlation: R squared = (rxy)^2

Question 15

Q

About simple linear regression

Answer

A

One explanatory variable→simple regression
Multiple explanatory variables→multiple regression
- describes the average relation between Y values and X values
- used when y is numeric (continuous), and the x variable as well
- limited because it is useful only for summarizing associations

Y Hat = estimated value
Y Line = predicted value for an individual

Question 16

Q

What is R squared? Coefficient of Determination

Answer

A

indicates the percentage of variability of y explained by the variable x
tells us how well a regression line estimates/predicts the actual values
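
A sketch of R squared as "share of the variability of y explained by x", computed as 1 - (residual sum of squares / total sum of squares). It assumes a least-squares line and uses made-up data; for one explanatory variable the result should match the squared Pearson correlation.

```python
import math

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)  # total variability of y

# Least-squares line (assumed fitting method) and its predictions.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

# Variability left unexplained by the line.
ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

r_squared = 1 - ss_res / s_yy
r_xy = s_xy / math.sqrt(s_xx * s_yy)

print(round(r_squared, 4), round(r_xy ** 2, 4))
```

Here about 60% of the variability in y is explained by x, and both printed numbers agree.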

Question 17

Q

What is an Error? What is Residual?

Answer

A

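both describe how far an observed value lies from its expected value
error: difference between observed y and the true (unobservable) population regression line
residual: difference between observed y and estimated y
= (yi - y hat i)
i.e. the vertical distance between the regression line and the actual observed value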
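
A short sketch of residuals as observed minus predicted values. The data are made up, and the line is assumed to be fitted by least squares; with that method and an intercept, the residuals sum to zero.

```python
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Least-squares slope and intercept (assumed fitting method).
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

# Residual = observed y minus predicted y (y hat) for each observation.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

print([round(e, 2) for e in residuals], round(sum(residuals), 10))
```

A positive residual means the observed point lies above the regression line, a negative one means it lies below.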

Question 18

Q

What are the independent and dependent variables?

Answer

A

x- independent
y - dependent
x - explanatory var
y - response var
e.g.
x - type of treatment (indep)
y - blood pressure (dep)
treatment --> effect--> on blood pressure

Question 19

Q

What does a low/high variance indicate?

Answer

A

A small variance indicates that the data points tend to be very close to the mean, and to each other. A high variance indicates that the data points are very spread out from the mean, and from one another.
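
To illustrate, the sketch below compares two hypothetical data sets with the same mean but very different spread, using the population variance from the standard `statistics` module.

```python
from statistics import fmean, pvariance

# Two made-up data sets, both with mean 50.
tight = [49, 50, 50, 50, 51]  # values close to the mean
wide = [10, 30, 50, 70, 90]   # values far from the mean

tight_var = pvariance(tight)
wide_var = pvariance(wide)

print(fmean(tight), tight_var)
print(fmean(wide), wide_var)
```

The means are identical, so only the variance reveals how differently the two sets are spread out.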

Question 20

Q

What is the difference between the predicted and actual Y score?

Answer

A

The residual

Question 21

Q

How high can R squared go

Answer

A

up to 1 only!

Question 22

Q

If the dots on a scatter plot are spread out randomly, the researcher would report the correlation as

Answer

A

close to 0

Question 23

Q

What is a negative correlation

Answer

A

A negative correlation is a relationship between two variables in which one variable increases as the other decreases, and vice versa.
e.g.:
The more often a person visits the dentist, the fewer cavities she/he will have.

(23 cards)