Formulas and Definitions Flashcards

Question

Confidence interval

Answer 1

95% confidence interval for μ

Answer 2

Degrees of freedom = sample size minus 1 = **n − 1**

Answer 3

We can use S as an estimate of σ, but the distribution then changes: it is not Normal but Student’s t. The shape of the t distribution depends on the *degrees of freedom*, which is *n − 1* in this case. It is symmetrical about the origin.
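Written out as a formula, the 95% interval for μ that uses S in place of σ is usually given as below. The notation is an assumption, since the card does not fix one: x̄ is the sample mean, n the sample size, and t with subscripts n − 1 and 0.025 is the upper 2.5% point of Student's t with n − 1 degrees of freedom, read from the t table.

```latex
\bar{x} \;\pm\; t_{n-1,\,0.025}\,\frac{S}{\sqrt{n}}
```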

Answer 4

Example: in a sample of 100 people, 55 said they were opposed to the Euro. Find a CI for the population proportion.

If X is the number of people in a sample of size n who agree with a given statement, then for large n: X ∼ Bin(n, p) ≈ N(np, np(1−p))

* **p is unknown, so the best estimator is X/n**
* **The df is "infinity"**, so the critical value is taken from the Normal distribution
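The Euro example can be worked through numerically. The sketch below assumes the usual large-sample interval, estimate ± z × sqrt(p̂(1 − p̂)/n), which the card implies but does not spell out; the function name `proportion_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n  # point estimate X/n
    # Two-sided critical value; about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# 55 of 100 people opposed to the Euro, as in the card.
low, high = proportion_ci(55, 100)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

The interval runs from roughly 0.45 to 0.65, so with this sample size a true proportion of 0.5 cannot be ruled out.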

Answer 5

* **H₀** = Null hypothesis = opposite of statement made
* **H₁** = Alternative hypothesis = assumption being tested

Answer 6

* **F:** Testing whether two independent normal samples come from populations with the same **variance**.
* **t:** Testing whether two independent normal samples come from populations with the same **mean**.

The results found with the formulas should be compared to the critical values in the t or F tables. The lower critical value of t is found by putting a minus sign in front of the number in the table.

Answer 7

* The **2-sample t test can only be performed if σ₁² = σ₂²**. The F test is often used to see whether it is permissible to use the 2-sample t test. (**Place the larger variance on top.**)
* It has many other uses as well: see **ANOVA**, **Linear Regression**.
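A minimal sketch of the two statistics, assuming the pooled-variance form of the 2-sample t test (the version that requires equal variances). The two samples are invented for illustration, and the results still have to be compared with the F and t tables.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical samples, invented for illustration only.
sample_a = [5.1, 4.9, 5.6, 5.8, 6.0]
sample_b = [4.2, 4.8, 5.0, 4.4, 4.6]

# Sample variances (divisor n - 1).
var_a, var_b = variance(sample_a), variance(sample_b)

# F statistic: larger variance on top, so F >= 1.
f_stat = max(var_a, var_b) / min(var_a, var_b)

# Pooled two-sample t statistic, which assumes equal population variances.
n_a, n_b = len(sample_a), len(sample_b)
df = n_a + n_b - 2
pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))

print(f"F = {f_stat:.2f}, t = {t_stat:.2f} on {df} degrees of freedom")
```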

Answer 8

**t-distribution special characteristics:**

* Sample size **n ≤ 30** and/or
* We don't know the variance/standard deviation

Answer 9

The Poisson distribution focuses on the number of discrete events or occurrences over a specified interval or continuum (time, length, distance...)
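To make the definition concrete, the sketch below evaluates the standard Poisson probability P(X = k) = e^(−λ) λ^k / k!. The rate of 3 events per interval is a made-up value and `poisson_pmf` is a hypothetical helper name.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(X = k) when X counts events in an interval with mean rate lam."""
    return exp(-lam) * lam**k / factorial(k)


# Hypothetical rate: on average 3 events per interval.
p_zero = poisson_pmf(0, 3)                               # no events at all
p_at_most_two = sum(poisson_pmf(k, 3) for k in range(3))  # 0, 1 or 2 events

print(f"P(X = 0) = {p_zero:.4f}, P(X <= 2) = {p_at_most_two:.4f}")
```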

Answer 10

Use the chi-square table to compare the results.

* **df** = Number of categories − Number of restrictions
* **Number of restrictions** = 1 + Number of estimated parameters
* **For Binomial, df** = number of observations in one group − 1
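The degrees-of-freedom rule can be sketched alongside the test statistic it is used with. The sketch assumes the card refers to the usual Pearson goodness-of-fit statistic, the sum of (observed − expected)² / expected; the die-roll counts and both function names are hypothetical.

```python
def chi_square_statistic(observed, expected):
    """Pearson goodness-of-fit statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def degrees_of_freedom(n_categories, n_estimated_params=0):
    """df = categories - restrictions, with restrictions = 1 + estimated parameters."""
    restrictions = 1 + n_estimated_params
    return n_categories - restrictions


# Hypothetical data: 60 rolls of a die, testing whether it is fair.
observed = [8, 12, 9, 11, 6, 14]
expected = [10] * 6  # no parameter is estimated from the data

chi2 = chi_square_statistic(observed, expected)
df = degrees_of_freedom(len(observed))

print(f"chi-square = {chi2:.1f} on {df} degrees of freedom")
```

The value of `chi2` is then compared with the chi-square table entry for `df` degrees of freedom.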

Answer 11

* To compare the means of more than 2 populations
* To compare populations each containing several levels/subgroups

H₀: μ₁ = μ₂ = μ₃

Answer 12

* **Treatment** = Between = Distance of each group mean from the overall mean
* **Error** = Within = Internal spread/standard deviation of each sample distribution
* **Sum of Squares** = Sum of all squared distances from the group means (SSE) or from the overall mean (SSTr)

Answer 13

Mean Square Treatment MSTr = SSTr/df_treatment

Answer 14

Mean Square Error MSE = SSE/df_error
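The sums of squares and mean squares for a one-way ANOVA can be sketched as follows. The three groups are invented, and the degrees of freedom used (number of groups − 1 for treatment, total observations − number of groups for error) are the standard one-way values, which the cards do not state explicitly.

```python
from statistics import mean

# Hypothetical data: three treatment groups of three observations each.
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]

n_total = sum(len(g) for g in groups)
grand_mean = mean(x for g in groups for x in g)

# Treatment (between): squared distance of each group mean from the overall mean,
# weighted by the group size.
sstr = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

# Error (within): squared distance of each observation from its own group mean.
sse = sum((x - mean(g)) ** 2 for g in groups for x in g)

df_treatment = len(groups) - 1
df_error = n_total - len(groups)

mstr = sstr / df_treatment
mse = sse / df_error
f_ratio = mstr / mse

print(f"SSTr = {sstr}, SSE = {sse}, F = {f_ratio}")
```

Here the group means are 5, 7 and 9 around an overall mean of 7, so almost all of the variation is between groups and the F ratio is large.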

Answer 15

* **p value \< 0.05**: **reject** the null hypothesis
* **p value \> 0.05**: **don’t reject** it (but we can say “There is some evidence against H₀” if 0.05 \< p \< 0.10)

Answer 16

* Each sample comes from a population that follows a normal distribution
* Equal variances
* All samples are independent and randomly selected

Answer 17

In one-way ANOVA, variation from the mean is attributed either to the columns or to error. In two-way ANOVA, we want to know what proportion of that error variation can be attributed to the rows. We want SSE to be as small as possible, because the F-ratio compares SSC against it.

Answer 18

**row mean + column mean - grand mean**

Answer 19

**actual value - fitted value**

Answer 20

**sum** **of squared residuals**
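The last three cards (fitted value, residual, sum of squared residuals) chain together in a two-way layout. A minimal sketch, assuming one observation per cell and a made-up 2 × 3 table:

```python
from statistics import mean

# Hypothetical table: rows and columns are the two factors, one observation per cell.
table = [
    [10, 12, 14],
    [13, 15, 20],
]

row_means = [mean(row) for row in table]
col_means = [mean(col) for col in zip(*table)]
grand_mean = mean(x for row in table for x in row)

# Fitted value = row mean + column mean - grand mean.
fitted = [[r + c - grand_mean for c in col_means] for r in row_means]

# Residual = actual value - fitted value.
residuals = [
    [obs - fit for obs, fit in zip(row, fit_row)]
    for row, fit_row in zip(table, fitted)
]

# Sum of squared residuals.
sum_sq_residuals = sum(e**2 for row in residuals for e in row)

print(fitted)
print(residuals)
print(sum_sq_residuals)
```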

Answer 21

* **SSR / SST**
* **SSTr / SST**

Interpretation:

* **R² = 0** ⇒ The dependent variable **cannot** be predicted using the independent one
* **R² = 1** ⇒ The dependent variable **can** be predicted exactly using the independent one
* **0 \< R² \< 1** ⇒ The coefficient of determination measures the extent to which the dependent variable is predicted by the independent variable. An R-squared of 0.20, for example, means that 20% of the variation in the dependent variable is explained by the independent variable.

A coefficient greater than 0.85 is considered a good fit.

Answer 22

An interaction means that the main effects cannot be relied upon to tell the full story. When there is an interaction effect, the main effects do not collectively explain all of the influence of the independent variables (IVs) on the dependent variable (DV). The IVs have an interactive effect on the DV, which means the cell means must be examined for each sub-group: this is where the nature and direction of the interaction can be found.

Answer 23

**H₀= *y* does not depend on *x***

Answer 24

* Distance between the "raw best-fit line" (the mean of the dependent variable) and the observed value. Also called **Error**. SSE = Sum of Squared Errors. Raw SSE = SST.
* After conducting a regression, the SSE should be greatly diminished (it becomes the distance between the calculated best-fit line and the observed values). The difference SST − new SSE = SSR (regression).
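The relationship between SST, the reduced SSE and SSR can be checked on a small data set. The sketch assumes an ordinary least-squares line and uses invented x and y values.

```python
from statistics import mean

# Hypothetical data, invented for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(x), mean(y)

# Least-squares slope and intercept.
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - slope * x_bar

# "Raw" error: squared distances from the mean of y.
sst = sum((yi - y_bar) ** 2 for yi in y)

# Error after regression: squared distances from the fitted line.
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

ssr = sst - sse
r_squared = ssr / sst

print(f"SST = {sst:.1f}, SSE = {sse:.1f}, SSR = {ssr:.1f}, R^2 = {r_squared:.2f}")
```

In this example the regression line removes 3.6 of the 6.0 units of raw squared error, so R² is 0.6.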

Answer 25

Coordinate plane (a coordinate system). The dependent variable is on the left, on the vertical axis.

Answer 26

***y = α + βx***

* x = random variable
* β = slope of the line (rise over run)
* α = y-intercept (where the line crosses the y-axis)

Answer 27

* r is always between -1 and 1
* Values close to **1** show a strong **positive linear relationship**
* Values close to **-1** show a strong **negative linear relationship**
* Values close to **0** indicate **no linear relationship**
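The three cases can be reproduced with a small helper. This is a sketch assuming r is the Pearson correlation coefficient; the function name `pearson_r` and the data are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


r_pos = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])     # exactly linear, increasing
r_neg = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])     # exactly linear, decreasing
r_none = pearson_r([1, 2, 3, 4], [1, -1, -1, 1])  # symmetric pattern, no linear trend

print(r_pos, r_neg, r_none)
```

The third data set is clearly patterned yet has r = 0, which is why the card says "no *linear* relationship" rather than "no relationship".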

Answer 28

Z is roughly N(0,1) if H₀ is true. So **we reject H₀ if the observed value of Z is \> 1.96 or \< −1.96.**

Answer 29

**(1 - p(z)) \* 2**
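The two-sided p-value formula and the ±1.96 rejection rule describe the same decision, which the sketch below illustrates. It assumes p(z) means the standard Normal cumulative probability at the observed |z|; `two_sided_p_value` is a hypothetical helper name.

```python
from statistics import NormalDist


def two_sided_p_value(z):
    """(1 - p(z)) * 2, with p(z) the standard Normal CDF evaluated at |z|."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


p_at_critical = two_sided_p_value(1.96)   # sits right at the 5% boundary
p_beyond = two_sided_p_value(2.5)         # further out in the tail
reject_h0 = p_beyond < 0.05

print(f"p(1.96) = {p_at_critical:.4f}, p(2.5) = {p_beyond:.4f}, reject: {reject_h0}")
```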

Answer 30

Notice that the denominator, SST/(n−1), can alternatively be called MST.

Answer 31

The independent variables (IVs) are potentially related to each other. Ideally, we don't want IVs to be correlated with each other. When they are, we don't want to use them both in the multiple regression, because they are redundant. This can be checked using scatterplots and correlations.

(56 cards)