Linear Models Flashcards

1
Q

What formula is used to express a linear relationship?

A

response ~ explanatory

2
Q

What are the three parts of ols?

A
  1. Formula
  2. Fit
  3. Summary
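
As a rough illustration of the three parts, here is a minimal sketch that assumes the cards refer to `ols` from `statsmodels.formula.api` (the cards do not name the library). The data, the column names `x` and `y`, and the true slope and intercept are all made up for the example.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Hypothetical data: y depends linearly on x, plus a little noise.
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": np.linspace(0, 10, 50)})
df["y"] = 2.0 * df["x"] + 1.0 + rng.normal(scale=0.5, size=len(df))

model = ols("y ~ x", data=df)  # 1. Formula: response ~ explanatory
results = model.fit()          # 2. Fit: estimate intercept and slope
print(results.summary())       # 3. Summary: coefficients, R-squared, p-values
```

The fitted coefficients are also available directly as `results.params`, indexed by `Intercept` and `x`.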
3
Q

Why do we test data for normal distribution?

A

Certain statistical tests, such as regression analysis and t-tests, assume that the data is normally distributed. If the data is not normally distributed, then non-parametric tests (which don't rely on the normal distribution), such as the Mann-Whitney U test, can be used. More commonly, we log-transform the data.
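
For reference, a non-parametric comparison of two groups might look like the sketch below. It assumes SciPy's `mannwhitneyu`; the two skewed samples `group_a` and `group_b` are invented for illustration.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Hypothetical skewed (non-normal) samples for two groups.
rng = np.random.default_rng(1)
group_a = rng.exponential(scale=1.0, size=40)
group_b = rng.exponential(scale=3.0, size=40)

# Two-sided test of whether one group tends to have larger values than the other.
stat, p_value = mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"Mann-Whitney U statistic={stat}, p-value={p_value}")
```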

4
Q

How can you tell if data is normally distributed?

A

Shapiro-Wilk test: tests the null hypothesis that the data is normally distributed. A p-value > 0.05 means the null hypothesis is not rejected, so the data is consistent with a normal distribution.

from scipy.stats import shapiro

# dataframe and new_col_name are assumed to be defined already
stat, p_value = shapiro(dataframe[new_col_name])
print(f'Shapiro-Wilk test for {new_col_name}:')
print(f'Statistic={stat}, p-value={p_value}\n')

5
Q

How do you log-transform data?

A

import numpy as np

def log_transform_and_test(dataframe, columns):
    # Apply log transformation to specified columns and add new columns to the dataframe
    for col in columns:
        new_col_name = f'log{col}'
        dataframe[new_col_name] = np.log(dataframe[col])  # values must be > 0
    return dataframe
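
The function name suggests that each new column is also tested for normality. One possible complete version is sketched below, assuming the test is Shapiro-Wilk from SciPy. The `income` column and the lognormal sample are hypothetical, and returning the dataframe is a choice made for this example.

```python
import numpy as np
import pandas as pd
from scipy.stats import shapiro

def log_transform_and_test(dataframe, columns):
    """Add a log-transformed copy of each column and run Shapiro-Wilk on it."""
    for col in columns:
        new_col_name = f"log{col}"
        # np.log is only defined for strictly positive values.
        dataframe[new_col_name] = np.log(dataframe[col])
        stat, p_value = shapiro(dataframe[new_col_name])
        print(f"Shapiro-Wilk test for {new_col_name}:")
        print(f"Statistic={stat}, p-value={p_value}\n")
    return dataframe

# Hypothetical right-skewed data: lognormal values become normal after a log.
rng = np.random.default_rng(2)
df = pd.DataFrame({"income": rng.lognormal(mean=3.0, sigma=1.0, size=200)})

raw_stat, raw_p = shapiro(df["income"])
print(f"Raw income: Statistic={raw_stat}, p-value={raw_p}\n")
df = log_transform_and_test(df, ["income"])
```

Comparing the p-value for the raw column with the one for the transformed column shows whether the transformation moved the data closer to a normal distribution.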
