Linear Models Flashcards
What formula is used to specify a linear relationship?
response ~ explanatory
What are the three parts of an OLS (ordinary least squares) analysis?
- Formula
- Fit
- Summary
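The three parts can be sketched as follows, assuming the card refers to the formula interface of statsmodels (the card itself does not name a library). The DataFrame and its values are made up for illustration; only the formula `response ~ explanatory` comes from the card above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Made-up illustrative data: response is roughly 1 + 2 * explanatory
x = np.arange(20)
df = pd.DataFrame({
    "explanatory": x,
    "response": 1 + 2 * x + 0.1 * (-1.0) ** x,  # small alternating wobble
})

# 1. Formula: describe the model as response ~ explanatory
model = smf.ols("response ~ explanatory", data=df)

# 2. Fit: estimate the intercept and slope
results = model.fit()

# 3. Summary: coefficient table, R-squared, p-values, and diagnostics
print(results.summary())
```

The fitted coefficients are also available directly as `results.params`, which is handy when only the slope and intercept are needed.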
Why do we test data for normal distribution?
Certain statistical tests, such as regression analysis and t-tests, assume that the data (for regression, strictly the residuals) are normally distributed. If the data are not normally distributed, then non-parametric tests that don't rely on the normal distribution, such as the Mann-Whitney U test, can be used. More commonly, we log-transform the data.
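As a minimal sketch of the non-parametric route, the Mann-Whitney U test is available as `scipy.stats.mannwhitneyu`. The two groups below are made-up, right-skewed values chosen only for illustration, and the 0.05 cutoff is the conventional threshold rather than anything specific to these cards.

```python
from scipy.stats import mannwhitneyu

# Made-up, right-skewed measurements for two independent groups
group_a = [1.1, 1.3, 1.4, 1.8, 2.0, 2.5, 3.9, 7.2]
group_b = [4.5, 5.1, 5.8, 6.0, 6.9, 8.4, 12.7, 25.3]

# Two-sided test: do values in one group tend to be larger than in the other?
stat, p_value = mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"U statistic={stat}, p-value={p_value:.4f}")

if p_value < 0.05:
    print("Reject the null hypothesis: the two groups differ.")
else:
    print("No evidence of a difference between the groups.")
```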
How can you tell if data is normally distributed?
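Shapiro-Wilk test: tests the null hypothesis that the data are normally distributed. A p-value > 0.05 means we fail to reject that hypothesis, so the data are consistent with a normal distribution; a p-value ≤ 0.05 suggests they are not.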
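from scipy.stats import shapiro

# Run the test on one column; it returns the W statistic and the p-value
stat, p_value = shapiro(dataframe[new_col_name])
print(f'Shapiro-Wilk test for {new_col_name}:')
print(f'Statistic={stat}, p-value={p_value}\n')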
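To see the decision rule in action, here is a small self-contained sketch that runs the test on one normal-shaped sample and one right-skewed sample. The samples are built from theoretical quantiles (an illustrative choice, so the outcome does not depend on a random seed), and the names `samples` and `looks_normal` are hypothetical.

```python
import numpy as np
from scipy.stats import expon, norm, shapiro

# Evenly spaced probabilities strictly between 0 and 1
probs = (np.arange(200) + 0.5) / 200

samples = {
    "normal_shaped": norm.ppf(probs),  # quantiles of a standard normal
    "right_skewed": expon.ppf(probs),  # quantiles of an exponential
}

looks_normal = {}
for name, values in samples.items():
    stat, p_value = shapiro(values)
    # p-value > 0.05: fail to reject the null hypothesis of normality
    looks_normal[name] = bool(p_value > 0.05)
    print(f"{name}: W={stat:.3f}, p-value={p_value:.3g}, "
          f"looks normal: {looks_normal[name]}")
```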
How do you log-transform data?
import numpy as np

def log_transform_and_test(dataframe, columns):
    # Apply a natural-log transformation to the specified columns and add the results as new columns
    for col in columns:
        new_col_name = f'log{col}'
        # np.log requires strictly positive values
        dataframe[new_col_name] = np.log(dataframe[col])
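One possible way to complete and use the function is sketched below: it combines the log transform with the Shapiro-Wilk test from the earlier card. Returning a dictionary of results is an assumption made here for convenience, and the `income` column is made-up data with a log-normal shape.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm, shapiro

def log_transform_and_test(dataframe, columns):
    """Add log-transformed copies of `columns` and Shapiro-Wilk test each one."""
    results = {}  # assumption: collect (statistic, p-value) per new column
    for col in columns:
        new_col_name = f"log{col}"
        dataframe[new_col_name] = np.log(dataframe[col])  # needs values > 0
        stat, p_value = shapiro(dataframe[new_col_name])
        results[new_col_name] = (stat, p_value)
    return results

# Made-up right-skewed column: exponentiated normal quantiles (log-normal shape)
probs = (np.arange(200) + 0.5) / 200
df = pd.DataFrame({"income": np.exp(norm.ppf(probs))})

raw_stat, raw_p = shapiro(df["income"])
results = log_transform_and_test(df, ["income"])
log_stat, log_p = results["logincome"]

print(f"raw income: p-value={raw_p:.3g}")
print(f"logincome:  p-value={log_p:.3g}")
```

The raw column fails the normality check while its log-transformed copy passes, which is the situation where a log transform is worth trying. Columns containing zeros or negative values need a different treatment, since `np.log` is undefined for them.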