lecture 4-7 Flashcards
Dependent samples:
If the values in one sample affect the values in the other sample, then the samples are dependent. Example: measuring blood pressure before and after taking medication
Independent samples:
If the values in one sample reveal no information about those of the other sample, then the samples are independent. Example: one group gets an active drug, the other placebo, and blood pressure is compared between the groups. Different people = independent samples
Pooled variance:
is a single estimate of the common variance of two or more populations, used when you assume those populations have equal variances. It is a weighted average of the sample variances.
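As a sketch, the pooled variance of two samples can be computed as the weighted average of their sample variances, weighted by degrees of freedom (the measurements below are hypothetical):

```python
# Pooled variance: weighted average of two sample variances, assuming the
# two populations share a common variance. Data here is hypothetical.
from statistics import variance

group_a = [5.1, 4.9, 5.4, 5.0, 5.2]
group_b = [6.0, 5.8, 6.3, 5.9]

n1, n2 = len(group_a), len(group_b)
s1_sq, s2_sq = variance(group_a), variance(group_b)  # sample variances (n-1 denominator)

# s_p^2 = ((n1-1)*s1^2 + (n2-1)*s2^2) / (n1 + n2 - 2)
pooled_var = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
print(round(pooled_var, 4))
```

Note that the pooled estimate always lands between the two individual sample variances, closer to the one from the larger sample.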
Dependent samples (two types): matched pairs and repeated measurements.
Matched pairs: the members of each pair should resemble one another as closely as possible, e.g. matched on factors such as weight, height, or age.
Repeated measurements: two measurements on the same individual, e.g. before and after treatment.
Pooled proportion p^:
is a single estimate of a proportion that combines the data from two or more samples. It is used when comparing proportions from different groups, assuming that the population proportions are equal.
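A minimal sketch of the pooled proportion, using hypothetical success counts for two groups, and the z statistic it feeds into when testing whether the two proportions are equal:

```python
# Pooled proportion: combine successes across both samples, assuming the
# population proportions are equal under H0. Counts below are hypothetical.
import math

x1, n1 = 45, 100   # successes / sample size, group 1
x2, n2 = 30, 80    # successes / sample size, group 2

p_hat = (x1 + x2) / (n1 + n2)  # combined successes over combined sample size

# Standard error under H0 uses the pooled estimate in place of both proportions
se = math.sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
z = (x1 / n1 - x2 / n2) / se
print(round(p_hat, 4), round(z, 3))
```

Here z is compared to the critical value (e.g. 1.96 at the 5% level for a two-sided test) to decide whether the group proportions differ.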
ANOVA
Used to compare means across three or more groups.
If the group means differ, the variability between groups will be large relative to the variability within groups.
If the F statistic is significantly large (greater than the critical value at the chosen significance level), reject H0.
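The F statistic can be computed by hand as between-group variability over within-group variability. A sketch on three hypothetical groups:

```python
# One-way ANOVA F statistic, computed manually on hypothetical data:
# F = (between-group mean square) / (within-group mean square).
groups = [
    [4.2, 4.8, 4.5, 4.4],
    [5.5, 5.9, 5.6, 5.8],
    [4.9, 5.1, 5.0, 5.2],
]

k = len(groups)                      # number of groups
n = sum(len(g) for g in groups)      # total observations
grand_mean = sum(sum(g) for g in groups) / n

# Sum of squares between groups (df = k - 1)
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
# Sum of squares within groups (df = n - k)
ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
print(round(f_stat, 2))
```

A large F (here far above typical critical values) means the group means differ by more than within-group noise would explain, so H0 is rejected.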
Degrees of freedom:
It’s the number of values in your data that are free to vary while calculating a statistic.
- One sample: df = n − 1 (where n is the sample size).
- Chi-square goodness-of-fit test: df = number of categories − 1.
- T-test or ANOVA: df depends on the number of groups and observations.
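The three cases above as a quick sketch with hypothetical sizes:

```python
# Degrees of freedom for the cases listed above (hypothetical sizes).
n = 12                                    # one sample of 12 observations
df_one_sample = n - 1                     # 11

categories = 4                            # chi-square goodness-of-fit, 4 categories
df_chi_square = categories - 1            # 3

k, total = 3, 12                          # one-way ANOVA: 3 groups, 12 observations
df_between, df_within = k - 1, total - k  # 2 (between) and 9 (within)
print(df_one_sample, df_chi_square, df_between, df_within)
```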
Linear model:
- A Linear Model describes the relationship between two variables by fitting a straight line to the data. This line represents how one variable (dependent) changes in response to the other variable (independent).
Regression coefficients (b0 and b1):
- are the parameters in a regression model that define the equation of the line used to describe the relationship between the independent (x) and dependent (y) variables.
- b0 = the value of y when x = 0 (where the line crosses the y-axis)
- b1 = the slope of the line, positive or negative
- y = b0 + b1x (where x = independent variable)
Residual
- A Residual is the difference between the observed value (y) and the predicted value (ŷ) for the dependent variable in a regression model. It measures how far off the model’s prediction is for a given data point.
Least squares method (OLS):
The Least Squares Method is a technique used in regression analysis to find the line that best fits the data, i.e. the line that best explains the relationship between x and y. It does this by minimizing the sum of the squared residuals.
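The closed-form least-squares solution for b0 and b1 can be sketched in a few lines (the x/y data below is hypothetical, generated to lie roughly on y = 2 + 3x):

```python
# Ordinary least squares fit of y = b0 + b1*x using the closed-form formulas.
# Data is hypothetical, roughly following y = 2 + 3x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [5.1, 7.9, 11.2, 13.8, 17.1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# b1 = sum((x - x̄)(y - ȳ)) / sum((x - x̄)^2)
b1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
      / sum((x - mean_x) ** 2 for x in xs))
b0 = mean_y - b1 * mean_x  # the fitted line passes through (x̄, ȳ)

# Residuals: observed minus predicted; OLS makes their squared sum minimal
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print(round(b0, 2), round(b1, 2))
```

A useful property to remember: with an intercept in the model, the OLS residuals always sum to zero.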
Residuals:
Difference between the actual (observed) value and the value predicted by the model.
Value below predicted = negative residual.
Hypothesis test:
A statistical method used to determine if there is enough evidence in sample data to draw conclusions about a population. It involves formulating two hypotheses, the null and the alternative, and then collecting data to assess the evidence.
Multiple regression:
- Multiple Regression is an extension of simple linear regression that uses multiple predictor (independent) variables to explain or predict the outcome of a single dependent variable.
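A sketch of multiple regression with two predictors, solving the normal equations (XᵀX)b = Xᵀy in plain Python. The data is hypothetical, generated exactly from y = 1 + 2·x1 + 3·x2, so the fit should recover those coefficients:

```python
# Multiple regression y = b0 + b1*x1 + b2*x2 via the normal equations,
# solved with small-scale Gaussian elimination. Hypothetical, noise-free data.
rows = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 5.0)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]

X = [[1.0, x1, x2] for x1, x2 in rows]   # design matrix with intercept column
p = 3                                     # number of coefficients (b0, b1, b2)

# Build X^T X and X^T y
xtx = [[sum(row[r] * row[c] for row in X) for c in range(p)] for r in range(p)]
xty = [sum(X[i][r] * ys[i] for i in range(len(X))) for r in range(p)]

# Forward elimination to upper-triangular form
for col in range(p):
    for r in range(col + 1, p):
        factor = xtx[r][col] / xtx[col][col]
        xtx[r] = [a - factor * b for a, b in zip(xtx[r], xtx[col])]
        xty[r] -= factor * xty[col]

# Back-substitution for the coefficients
coefs = [0.0] * p
for r in range(p - 1, -1, -1):
    coefs[r] = (xty[r] - sum(xtx[r][c] * coefs[c] for c in range(r + 1, p))) / xtx[r][r]

print([round(v, 6) for v in coefs])
```

In practice a library routine (e.g. a least-squares solver) would be used instead of hand-rolled elimination; this just makes visible what "fitting multiple predictors" computes.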
Error term (e)
- The Error Term (e) represents the part of the dependent variable (Y) that is not explained by the predictors in the regression model, capturing random noise and other unmeasured factors.
Independent variables (x1, x2)
- Independent Variables (X1, X2, …) are the variables in a regression model that are used to explain or predict the dependent variable (Y). They are also called predictors, explanatory variables, or inputs.