Stats Flashcards

1
Q

What are the two kinds of statistics in respect to their use?

A

1) Descriptive statistics: Measures of central tendency and variability
2) Inferential statistics: Parameter estimation, defining uncertainty, determining reasons for variation.

2
Q

Bias

A

Any systematic deviation between sample estimates and a true value

3
Q

Inference

A

Drawing a conclusion from a premise.

4
Q

Premise

A

A premise is a statement we assume is true (e.g. data and observations).

5
Q

The two kinds of variability in a study

A

1) Variability related to the variables we’re investigating.
2) Variability that is not interesting in the context of what we are investigating (noise variability).

6
Q

What is the purpose of inferential statistics?

A

1) To discriminate between interesting variation and noise variation.
2) To determine the probability of observing such variability if a scientific mechanism was not operating.

7
Q

What is an informal way to think of “statistically significant”?

A

Statistically significant = unlikely to have occurred by chance.

8
Q

How does statistical analysis fit into the scientific method?

A

Statistical analysis allows for an objective assessment of the evidence in support of or against a hypothesis.

9
Q

What is a scientific hypothesis?

A

A scientific hypothesis is a proposed cause and effect relationship between a process and an observation.

Observation = what
Hypothesis = how
10
Q

What is a statistical hypothesis?

A

Simply a statement about whether or not there is a pattern of interest in the data.

11
Q

What are the two types of statistical hypotheses?

A
  • H0 (null hypothesis) = the predictor variable has no effect
  • HA (alternative hypothesis) = the predictor variable has an effect
12
Q

What are the two kinds of variables in an experiment?

A

1) Predictor variable (aka independent variable)
2) Response variable (aka dependent variable)

13
Q

What is µ0 (“mu-naught”) in a one-sample study?

A

µ0 is the hypothesized value of the population mean, i.e. the value of µ stated in the null hypothesis (e.g. 37 °C in the body-temperature example).

14
Q

What is α?

A

α is a set probability criterion we use to reject a null hypothesis. It is the accepted chance of incorrectly rejecting a true null hypothesis.

15
Q

In testing a hypothesis, what is a sample used for?

A

In testing a hypothesis, we use a sample to estimate characteristics of an underlying population.

16
Q

The statement “We calculate the probability H0 is true, given the data” is wrong.

1) Why is this?
2) What is the correct statement?

A

1) Population parameters are fixed, so either H0 is true or it is not.
2) The correct statement would be “We calculate the probability of observing the data we gathered, given that H0 is true”.

17
Q

How are H0 and HA formulated in a one-sample test?

A
  • H0: µ = µ0
  • HA: µ ≠ µ0

OR

  • H0: µ - µ0 = 0
  • HA: µ - µ0 ≠ 0
18
Q

To test a hypothesis we use a test statistic. Broadly, how is a test statistic calculated?

A

Test statistic = effect (i.e. µ - µ0) / error

19
Q

How is a test statistic used for testing a hypothesis?

A

1) Either comparing the test statistic to a critical value

or
2) calculating a p-value associated with that test statistic

20
Q

How is a p-value interpreted?

A

The p-value can be thought of as the probability of observing the data if H0 were true.

21
Q

In the example of a z statistic, what is z?

A

z is the number of standard deviations by which the observed mean differs from the population mean.

22
Q

What does the central limit theorem state?

A

The CLT states that the distribution of means from a non-normal population will not itself be normal, but will approximate normality as n increases.

23
Q

How is population variance calculated?

A

σ² = Σ(Xi - µ)²/N, where µ is the population mean and N the population size.
24
Q

How is sample variance calculated?

A

s² = Σ(Xi - X̄)²/(n - 1), where X̄ is the sample mean and n the sample size.
25
What is standard error and how is it calculated?
The standard error (aka SE, SEM) is the standard deviation of a statistic (in this case the mean) and is calculated as SE = σ/sqrt(n) = sqrt(σ²/n).
26
Noting that we don't know *σ*, how is SE estimated?
We can estimate SE as s/sqrt(n). This is because the best estimate for the population standard deviation *σ* is the sample standard deviation *s*.
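A minimal R sketch of this estimate; the vector x is made-up illustration data, not from the course:
```
# Estimate the SE of the mean as s/sqrt(n)
x <- c(36.8, 37.4, 38.1, 37.0, 37.6)  # hypothetical sample
sd(x) / sqrt(length(x))               # sample SD divided by sqrt(sample size)
```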
27
What is the relationship between sample size *n* and variance in the distribution of sample means?
The variance in the distribution of means will decrease as *n* increases.
28
How is the t statistic calculated?
t = (X̄ - µ0)/SE, where SE = s/sqrt(n).
29
1) What is the difference in shape between the z-distribution and the t-distribution? 2) What effect does this have on a critical value?
1) The t-distribution has more area in the tails and is "pushed down at the top". 2) A t-critical value is more extreme than a z-critical value (see bars in the figure).
30
Note: Remember that for a normal distribution, the percentage of values in an area can be known from the number of standard deviations from the mean.
31
**★ one-sample t-test example** We want to know whether drug A significantly changes the body temperature of healthy human adults 2 hours after taking the drug. Note that normal body temperature is 37 °C. We take our measurements from a sample and find a mean temperature of 38.5 °C and a variance of 3.4. The sample size is 30. Note: On the final exam we'll have to calculate variance, which will not be given to us.
SE = sqrt(s²/n) = sqrt(3.4/30) = 0.3367
t = (38.5 - 37)/0.3367 = 4.456
Then we look up the t-critical value in a table using: two-tailed, 29 df, and an α of 5%, and get a value of 2.045. Because 4.456 > 2.045, we reject the null hypothesis and conclude that drug A significantly changes body temperature.
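A hedged R sketch reproducing this card's arithmetic from the given summary statistics (mean 38.5, variance 3.4, n = 30, µ0 = 37); variable names are illustrative:
```
xbar <- 38.5; s2 <- 3.4; n <- 30; mu0 <- 37
se <- sqrt(s2 / n)                     # standard error: 0.3367
t  <- (xbar - mu0) / se                # t statistic: 4.456
t_crit <- qt(0.975, df = n - 1)        # two-tailed critical value, alpha = 0.05: 2.045
p <- 2 * (1 - pt(abs(t), df = n - 1))  # two-tailed p-value
c(t = t, t_crit = t_crit, p = p)
```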
32
When do we use a *t* statistic instead of a *z* statistic?
We can't use *z* if we are estimating *σ* from *s*.
33
What is the value of *v* (degrees of freedom) for a hypothesis about a mean?
*v = n* - 1
34
How does the location of the critical value of a one-tailed test differ from the critical value of a two-tailed test?
For a one-tailed test, we put the entire rejection region into one tail of the t-distribution, instead of splitting it between the two tails.
35
In the following t-distribution graph, you would reject the null hypothesis if the t-value was less than the critical value (shown in red).
36
How can t be thought of?
Like z, t is the number of standard errors by which the sample mean differs from µ0.
37
What are the types of errors in hypothesis testing?
1) Type I error (α) = rejecting a true *H0* 2) Type II error (β) = failing to reject a false *H0*
38
Note that: When *µ* ≠ *µ0*, the critical value defines the boundary between power and type II error.
39
In a t-distribution, why do we need to know the degrees of freedom?
The degrees of freedom are needed because the distribution shape changes for different degrees of freedom.
40
Note that t-tables only tell us whether the p-value is greater or less than a specified α. If instead of using tables, you want to know the p-value, how do you calculate it?
In R, you can use: 1 - pt(4.456, 29) for a one-tailed p-value; for a two-tailed test, double it: 2 * (1 - pt(4.456, 29)).
41
In an example similar to the drug A and temperature example, when would you use a one-tailed test?
You would use a one-tailed test if you're only interested in whether body temperature increases (or only in whether it decreases) as a result of the drug.
42
In the drug A and temperature example, how would you write the one-tailed statistical hypotheses in the following cases? 1) We want to know whether the drug increases body temperature 2) We want to know whether the drug decreases body temperature
1) We want to know whether the drug increases body temperature:
H0: µ - µ0 ≤ 0
HA: µ - µ0 > 0
2) We want to know whether the drug decreases body temperature:
H0: µ - µ0 ≥ 0
HA: µ - µ0 < 0
43
Note that the = sign is _always_ part of *H0* and never *HA*:
H0: µ - µ0 ≤ 0
HA: µ - µ0 > 0
44
What is the formula for a two-sample t-test looking for any difference between the two samples?
t = (X̄1 - X̄2)/sX̄1-X̄2, where sX̄1-X̄2 = sqrt(s²p/n1 + s²p/n2)
45
What is the formula for a two-sample t-test with a given µ0 different from 0 (i.e. looking for a specific difference between two sample means)?
t = (X̄1 - X̄2 - µ0)/sX̄1-X̄2
46
In a one-sample t-test we use *s* to estimate *σ*. In a two-sample t-test we do something similar. We assume that s1 and s2 are similar, but not the same, so we use s²p as a pooled variance estimator (see formula). How is s²p calculated?
s²p = (SS1 + SS2)/(df1 + df2), where SS is a sum of squares and df = n - 1 for each sample.
47
How is the formula for two-sample one-tailed t-test different from the two-tailed formula?
In the formula for one-tailed t-tests, the difference between the means is not taken as an absolute value.
48
Using the visual representation of a t-distribution, explain why we always need to accept some level of error.
We need to accept some level of error because the t-distribution approaches the x-axis asymptotically, so there is no value of t that corresponds to a probability of 0%.
49
What is statistical power?
Statistical power (1 - β) is the probability of correctly rejecting a false *H0*.
50
What is the relationship between power and the difference between *µ* and *µ0*?
The greater the difference between *µ* and *µ0*, the greater the power we have to detect the difference.
51
What does the probability of a type II error depend on?
The probability of a type II error depends on: 1) what *HA* is 2) how large an effect we hope to detect 3) sample size 4) how good the experimental design was
52
When we set an α of 0.05, we often have a β of around 0.2 and a power of around 0.8.
53
**★ Welch's test example** We want to test for a difference in protein concentration between two pea populations. We determine the variances are heterogeneous and thus use a Welch's test. Results:
meanfert = 24 g protein; SSfert = 261 g²; nfert = 30
meanunfert = 21.8 g protein; SSunfert = 320 g²; nunfert = 29
s²f = SSf/(nf - 1) = 261/29 = 9
s²u = SSu/(nu - 1) = 320/28 = 11.43
**t′** = (x̄f - x̄u)/sqrt(s²f/nf + s²u/nu) = (24 - 21.8)/sqrt(9/30 + 11.43/29) = **2.6406**
Welch's t′ has a different distribution, so we need a special formula for the degrees of freedom. First:
s²x̄f = s²f/nf = 9/30 = 0.3
s²x̄u = s²u/nu = 11.43/29 = 0.3941
**ν′** = (s²x̄f + s²x̄u)²/[(s²x̄f)²/(nf - 1) + (s²x̄u)²/(nu - 1)] = (0.3 + 0.3941)²/[(0.3)²/29 + (0.3941)²/28] = 55.69
Now that we know ν′ we check the t-table and find t0.05(1),55.69 = **1.6727**. Because 2.6406 > 1.6727, *H0* is rejected.
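A hedged R sketch of the same Welch calculation from the card's summary statistics (variable names are illustrative):
```
ss_f <- 261; n_f <- 30; mean_f <- 24      # fertilized group
ss_u <- 320; n_u <- 29; mean_u <- 21.8    # unfertilized group
v_f <- ss_f / (n_f - 1); v_u <- ss_u / (n_u - 1)    # sample variances
se2_f <- v_f / n_f; se2_u <- v_u / n_u              # squared SEs of each mean
t_prime <- (mean_f - mean_u) / sqrt(se2_f + se2_u)  # t' = 2.6406
df_prime <- (se2_f + se2_u)^2 /
  (se2_f^2 / (n_f - 1) + se2_u^2 / (n_u - 1))       # Welch df: 55.69
c(t_prime = t_prime, df_prime = df_prime,
  t_crit = qt(0.95, df_prime))                      # one-tailed critical: 1.6727
```
With raw data, t.test(x, y, var.equal = FALSE) performs the same test directly.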
54
What increases statistical power?
These elements increase statistical power: 1) a greater difference between *µ* and *µ0* 2) larger *α* 3) larger *n* 4) smaller σ² 5) one-tailed tests
55
For a one-tailed Mann-Whitney / Wilcoxon test, you have to decide which is the tail of interest. How does this work?
56
What are the assumptions of one-sample t-tests?
1) Data are a **random** sample 2) Each data point is **independent** from each other 3) Data come from a **normally-distributed** population
57
Note: One-sample t-tests are robust against non-normality as long as the data are symmetrical.
58
How are the statistical hypotheses written for testing the probability of getting different means from two populations?
H0: µ1 = µ2; HA: µ1 ≠ µ2
OR
H0: µ1 - µ2 = 0; HA: µ1 - µ2 ≠ 0
59
What are the assumptions for a two-sample t-test?
1) data are random and independent 2) Both samples come from normally-distributed populations 3) Both populations have equal variances
60
**★ two-sample two-tailed t-test example** We want to test for a difference in protein concentration between two pea populations. Results:
meanfert = 24 g protein; SSfert = 261 g²; nfert = 30
meanunfert = 21.8 g protein; SSunfert = 320 g²; nunfert = 29
H0: µ1 - µ2 = 0
HA: µ1 - µ2 ≠ 0
s²p = (SSf + SSu)/(dff + dfu) = (261 + 320)/(29 + 28) = 10.193 g²
sx̄f-x̄u = sqrt(s²p/nf + s²p/nu) = sqrt(10.193/30 + 10.193/29) = 0.8314 g
t = (x̄f - x̄u)/sx̄f-x̄u = (24 - 21.8)/0.8314 = 2.646
ν = 57, t-critical = 2.0. The absolute value is greater than the critical value, so we reject the null hypothesis.
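A hedged R sketch of the pooled-variance calculation from the same summary statistics:
```
ss_f <- 261; n_f <- 30; mean_f <- 24
ss_u <- 320; n_u <- 29; mean_u <- 21.8
df_tot <- (n_f - 1) + (n_u - 1)             # 57
s2_p <- (ss_f + ss_u) / df_tot              # pooled variance: 10.193
se_diff <- sqrt(s2_p / n_f + s2_p / n_u)    # SE of the difference: 0.8314
t <- (mean_f - mean_u) / se_diff            # 2.646
c(t = t, t_crit = qt(0.975, df_tot))        # compare |t| to the two-tailed critical
```
With raw data, t.test(x, y, var.equal = TRUE) gives the same result.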
61
For the following one-tailed test hypotheses, based on the relationship between the observed and critical t-values, when do you reject the null hypothesis? 1) HA: µ1 - µ2 < 0 2) HA: µ1 - µ2 > 0
1) HA: µ1 - µ2 < 0: H0 is rejected if t ≤ -tα(1),ν
2) HA: µ1 - µ2 > 0: H0 is rejected if t ≥ tα(1),ν
Note that for a two-tailed test, HA: µ1 - µ2 ≠ 0, we reject H0 if |t| ≥ tα(2),ν
62
**★ two-sample one-tailed t-test example** We want to test the hypothesis that bean protein concentration increases by at least 2 g/100 g beans when bean plants are fertilized. We do the study and get the following results:
meanfert = 24 g protein; SSfert = 261 g²; nfert = 30; dffert = 29
meanunfert = 21.8 g protein; SSunfert = 320 g²; nunfert = 29; dfunfert = 28
H0: µf - µu ≤ 2
HA: µf - µu > 2
s²p = (SSf + SSu)/(dff + dfu) = (261 + 320)/(29 + 28) = 10.193 g²
sx̄f-x̄u = sqrt(s²p/nf + s²p/nu) = sqrt(10.193/30 + 10.193/29) = 0.8314 g
t = (x̄f - x̄u **- 2**)/sx̄f-x̄u = (24 - 21.8 **- 2**)/0.8314 = 0.2406
ν = 57, t-critical = 1.67. Because our t-value is less than the t-critical value, we cannot reject H0; we have no evidence that the increase exceeds 2 g.
63
What assumption violations is the t-test most sensitive to?
The t-test is quite robust to considerable non-normality, but violations of randomness/independence and of homogeneity of variances are serious.
64
8. For the figure below, in which two-sample t-test would there be higher power? a) A b) B
a) A **b) B**
65
9. Use the figure below to answer the next 3 questions. Which area under the curve(s) represents the probability of correctly not rejecting the null hypothesis? A) A B) B C) C D) D E) A + D F) C + B
A) A B) B **C) C** D) D E) A + D F) C + B
66
10. In the figure above, which area under the curve(s) represents the probability of incorrectly not rejecting the null hypothesis? A) A B) B C) C D) D E) A + D
A) A **B) B** C) C D) D E) A + D
67
11. In the figure above, if this hypothesis test were performed at a significance level of 0.01, what probability would A represent? A) 0.05 B) 0.975 C) 0.01 D) 0.0005 E) 0.005
A) 0.05 B) 0.975 C) 0.01 D) 0.0005 **E) 0.005**
68
Which factors increase robustness against heterogeneous variances in a t-test?
T-tests are a little more robust against variance heterogeneity if: 1) sample sizes are similar 2) sample sizes are above 30 3) the test is two-tailed
69
How are assumptions of a two-sample t-test tested?
1) data are random and independent: cannot be checked; this is ensured through experimental design. 2) Both samples come from normally-distributed populations: visual inspection and Shapiro-Wilk test 3) Both populations have equal variances: visual inspection and Fligner-Killeen test
70
One example of a violation of the independence assumption is when samples are paired (repeated measures). How could you get around the assumption of independence with paired data?
Paired data can be combined into a single new sample by calculating the difference within each pair; these differences are now independent data points.
71
For two-sample analysis, how do you analyse the data in the following scenarios? 1) Both samples normal and equal variances 2) Both samples normal but unequal variances 3) Both samples non-normal but equal variances 4) Both samples non-normal and unequal variances
1) Both samples normal and equal variances: two-sample t-test with pooled variance 2) Both samples normal but unequal variances: Welch's two-sample t-test (no pooled variance) 3) Both samples non-normal but equal variances: Mann-Whitney or Wilcoxon rank test 4) Both samples non-normal and unequal variances: transformation and re-assessment
72
Note that for Welch's test we use a t′ statistic instead of a t statistic, and likewise ν′, a different calculation of the df.
73
What is the main characteristic of the Mann-Whitney / Wilcoxon test?
It's a non-parametric test. Because of this: 1) it does not require estimation of population parameters 2) hypotheses are not statements about population parameters. However, 3) it still assumes that the data are random.
74
How are data treated in a Mann-Whitney / Wilcoxon test? What is a drawback of this test?
Data are ranked either from high to low or from low to high. Conversion of data into ranks causes a loss of information and therefore of power.
75
For the following samples of germination times, fill in the “Rank A” and “Rank B” columns with the ranks that we would assign to these data in order to do a two-sample Mann-Whitney/Wilcoxon test.
Step 1: assign ranks to all numbers; repeated numbers are still ranked consecutively. Step 2: give each set of repeated numbers the average of the ranks they span.
76
What are the two statistics calculated in a Mann-Whitney / Wilcoxon test? How are they calculated?
u = n1n2 + [n1(n1 + 1)]/2 - R1
u′ = n1n2 - u
77
**★ Mann-Whitney / Wilcoxon test example** Height of males: 193, 188, 185, 183, 180, 175, 170. Height of females: 178, 173, 168, 156, 163.
Ranks of male heights: 1, 2, 3, 4, 5, 7, 9. Ranks of female heights: 6, 8, 10, 11, 12.
nm = 7; nf = 5; Rm = 31; Rf = 47
R is the sum of the ranks from each sample.
H0 = Male and female students are the same height
HA = Male and female students are not the same height
Note that no hypothesis is made about any population parameters.
u = n1n2 + n1(n1 + 1)/2 - R1 = (7)(5) + (7)(8)/2 - 31 = 35 + 28 - 31 = 32
u′ = n1n2 - u = (7)(5) - 32 = 3
Then you compare u or u′, whichever is larger, to the u-critical value (uα(2),n1,n2). If greater, reject *H0*. This calculation is not done by hand in the exam.
78
★Mann-Whitney / Wilcoxon test in R How do you do this in R?
1) make a vector with all the data: height 2) make a vector giving the sex of each data point: sex 3) test: wilcox.test(height ~ sex)
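For concreteness, a sketch running the height data from card 77 above through wilcox.test(); the vector names height and sex follow this card:
```
height <- c(193, 188, 185, 183, 180, 175, 170,   # males
            178, 173, 168, 156, 163)             # females
sex <- factor(c(rep("M", 7), rep("F", 5)))
wilcox.test(height ~ sex)   # two-tailed Mann-Whitney / Wilcoxon rank-sum test
```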
79
1. What is a t-value? a) A variance b) A number of standard errors from the mean for a t-distribution with a given number of degrees of freedom c) A statistic that, without any other information, tells you whether your alternative hypothesis is true d) A non-parametric test statistic
a) A variance **b) A number of standard errors from the mean for a t-distribution with a given number of degrees of freedom** c) A statistic that, without any other information, tells you whether your alternative hypothesis is true d) A non-parametric test statistic
80
2. On a standard normal distribution, 95% of the observations are contained within how many σ of μ? Choose the best approximation. a) 1 b) 1.645 c) 2 d) 2.5 e) 3
a) 1 b) 1.645 **c) 2** d) 2.5 e) 3
81
3. In which of the following situations should we select a Welch’s two-sample t-test as the most appropriate and powerful option for conducting a hypothesis test? a) Both samples are non-normally distributed, sample variances are equal, and sample distributions are similar b) One sample is non-normally distributed and variances are unequal c) One sample is non-normally distributed and variances are not equal d) Both samples are normally distributed, and variances are equal e) Both samples are normally distributed and variances are unequal
a) Both samples are non-normally distributed, sample variances are equal, and sample distributions are similar b) One sample is non-normally distributed and variances are unequal c) One sample is non-normally distributed and variances are not equal d) Both samples are normally distributed, and variances are equal **e) Both samples are normally distributed and variances are unequal**
82
4. Which statement about the following study description is correct? A herbicide-resistant strain of wheat and a non-herbicide resistant strain of wheat are grown, with 30 plants of each in a greenhouse before they are sprayed with a new herbicide that is going on the market. The researcher wants to test whether the herbicide-resistant strain (which was genetically engineered for resistance to different herbicides than the one being tested in this study) shows better **growth and seed** set than the control, following the spray. a) The dependent variables are growth and seed set. b) In a graph of the seed set results, seed set should be plotted on the x-axis. c) A one-sample test is appropriate for this situation. d) A paired-sample test is appropriate for this situation.
**a) The dependent variables are growth and seed set.** b) In a graph of the seed set results, seed set should be plotted on the x-axis. c) A one-sample test is appropriate for this situation. d) A paired-sample test is appropriate for this situation.
83
5. Which of the following statements is correct? a) A statistical hypothesis is a statement about a cause-and-effect relationship between 2 or more variables. b) A scientific hypothesis is a statement about a cause-and-effect relationship between 2 or more variables. c) A statistical hypothesis must be proved to accept or reject a scientific hypothesis d) “Descriptive statistics” refers to testing how much variation in an observed variable is due to a predictor variable, versus how much is due to chance alone.
a) A statistical hypothesis is a statement about a cause-and-effect relationship between 2 or more variables. **b) A scientific hypothesis is a statement about a cause-and-effect relationship between 2 or more variables.** c) A statistical hypothesis must be proved to accept or reject a scientific hypothesis d) “Descriptive statistics” refers to testing how much variation in an observed variable is due to a predictor variable, versus how much is due to chance alone.
84
6. If we are interested in testing a hypothesis about a difference in two means, as the uncertainty (error) of our estimates of the means increases, our chance of detecting a real difference: a) Decreases b) Increases c) Is not affected
**a) Decreases** b) Increases c) Is not affected
85
7. Conceptually, why is the standard error of the mean always smaller than the standard deviation of a sample, when both are derived from the same sample data? a) Standard deviation is a measure of sample variability, whereas standard error of the mean is an estimate of the standard deviation of the distribution of sample means from which that sample is assumed to have come, and distributions of sample means are always narrower than the sample distribution from which they are estimated. b) Standard deviation is not always smaller than the estimate of standard error derived from the same sample. It is bigger when sample size is large (>30). c) Because the standard deviation represents the 95% confidence interval, whereas standard error represents one standard deviation of the distribution of sample means. d) Standard deviation is the width of the distribution of sampling means, whereas standard error is a measure of sample variability, and the distribution of sample means is always more variable than a single sample.
**a) Standard deviation is a measure of sample variability, whereas standard error of the mean is an estimate of the standard deviation of the distribution of sample means from which that sample is assumed to have come, and distributions of sample means are always narrower than the sample distribution from which they are estimated.** b) Standard deviation is not always smaller than the estimate of standard error derived from the same sample. It is bigger when sample size is large (>30). c) Because the standard deviation represents the 95% confidence interval, whereas standard error represents one standard deviation of the distribution of sample means. d) Standard deviation is the width of the distribution of sampling means, whereas standard error is a measure of sample variability, and the distribution of sample means is always more variable than a single sample.
86
1. What is the Central Limit Theorem?
It states the following: the distribution of means taken from a population, whether or not that population is normal, will approximate normality as sample size increases.
87
2. What is a p-value?
The probability of collecting the observed data if *H0* were true.
88
3. For a two-tailed t-test, with *α* of 0.05, what does the _lower critical value_ tell us?
It tells us the t-value below which there is a 2.5% or lower chance of having gotten a sample t-value that small if the null hypothesis was true.
89
The probability of rejecting a true *H0* is called Type ______ Error.
Type I error
90
The probability of failing to reject a false *H0* is called Type _____ Error.
Type II error
91
The Greek symbol for the probability of rejecting a true *H0* is
alpha
92
The Greek symbol for the probability of failing to reject a false *H0* is
beta
93
Power =
Power = 1 - beta
94
List the 3 assumptions of a two-sample t-test.
1) data are random and independent 2) both samples come from normally-distributed populations 3) both populations have equal variances
95
note: there is no simple mathematical relationship between type I and type II error
96
How are statistical hypotheses for a paired samples test written?
H0: µd = µ0 HA: µd ≠ µ0 where µd is the mean difference between pairs
97
A test statistic for two samples can be expressed as effect / error. How does this scale up to multiple samples?
98
Why does a test with multiple samples have to use two tails?
One-tailed tests do not make sense when we have more than two samples.
99
We conduct a study in which we raise a cohort of 36 goldfish in one large tank for one year. We then place 12 of the goldfish in a small pond, 12 in a medium-sized pond, and 12 in a large pond. We leave them in these ponds for 1 year, and then collect them all and measure their lengths. What are we trying to determine using a test statistic?
We use a test statistic to see if the differences in the means that we found are statistically significant (i.e. if we would observe similar differences if the experiment was repeated).
100
Based on the following image, how do we measure effect?
To measure effect we measure the distance of each group mean from the overall mean of all samples. Differences (X̄i - X̄), where i is the group identifier and X̄ is the overall mean:
X̄1 - X̄ = -1.69
X̄2 - X̄ = 0.22
X̄3 - X̄ = 1.47
101
Research hypothesis: adult goldfish grow larger when they live in larger ponds. How do you write this as a statistical hypothesis if we have 3 samples?
H0: Mean fish size is the same in all pond sizes
HA: Mean fish size is not the same in all pond sizes
An often-seen but incorrect way to write this:
H0: μL = μM = μS
HA: μL ≠ μM ≠ μS (not correct)
102
Note that SSamong is the numerator of the test statistic: test statistic = SSamong/error. Now we need the error for the denominator.
103
Based on the following image, what are *n*, *N*, and *k*?
n = number of observations within each group (j = 1 to ni)
N = total number of observations
k = number of groups (in this case ponds)
Here: n1 = 12; n2 = 12; n3 = 12; N = 36; k = 3
104
Based on the following image, what are the two kinds of error we can think about?
1) deviation of each observation from its group mean (SSwithin): Xi - X̄i
2) deviation of each observation from the overall mean (SStotal): Xi - X̄
where Xi is each **observation**, X̄i is the group mean, and X̄ is the overall mean
105
The formula for SSwithin is SSwithin = ∑∑(Xij - X̄i)². What is the formula for SStotal?
The formula for SStotal is SStotal = ∑∑(Xij - X̄)².
106
Note that SStotal equals the sum of SSamong and SSwithin SStotal = SSamong + SSwithin
107
Differences (X̄i - X̄): -1.69, 0.22, 1.47. How do we avoid the differences cancelling out?
To avoid the differences cancelling each other out, we square them AND multiply by sample size to weight them: ni(X̄i - X̄)²
Group 1: 34.45; Group 2: 0.59; Group 3: 26.01
Summing these, we get: **SSamong groups = ∑ni(X̄i - X̄)² = 61.06**
108
What is error in statistics?
Error is _any deviation of an observation from the true mean_ of its population.
109
How is the F-ratio aka F-statistic composed?
The F-statistic is composed of: variance due to deviation of _group means from overall mean_ (Effect), divided by variance due to deviation of each _observation from its group mean_ (Error) Note we're dividing variances.
110
The F-statistic has a known distribution. What does the shape of the F-distribution depend on?
The shape of the F-distribution depends on the DF of the numerator and the DF of the denominator.
111
What are the degrees of freedom for SStotal, SSamong, SSwithin?
DF for SStotal = *N* - 1
DF for SSamong = *k* - 1
DF for SSwithin = *N* - *k*
Note also that **DFtotal = DFamong + DFwithin**
112
Results from ANOVA are reported in ANOVA tables. How would an ANOVA table look for the previous example?
Source          DF    SS      MS      F
Among groups    2     61.06   30.53   8.43
Within groups   33    119.5   3.62
113
Instead of using SS, we use MS, which makes SS into variance terms. How do you calculate MS?
MSamong = SSamong/DFamong
114
We saw the following visual representation of the fish ANOVA data: What is another way of plotting?
Another way of plotting is response variable on y-axis and predictor variable on x-axis.
115
How do you calculate MSerror? What are other names for MSerror?
MSerror = SSwithin/DFwithin = SSerror/DFerror. MSerror is also called MSwithin interchangeably; it is also called the residual MS.
116
Why do we use DF as a denominator instead of sample size?
DF represents the number of observations available to estimate a parameter.
117
What statistic do we use for a test with multiple samples?
F-statistic aka F-ratio
118
Why is F-statistic also known as F-ratio?
The F-statistic is a ratio between two variances.
119
Just like for the normal distribution, we can calculate the area under the curve for a given point or critical value. What is the formula for F-critical?
Fcrit = Fα(1),DFamong,DFwithin
120
What is a very important assumption of an F-statistic?
F assumes that the variances come from normally-distributed populations.
121
For the fish size and lake size example:
SSAmong groups = 61.06; DFAmong = 2; MSAmong = 30.53
SSWithin groups = 119.5; DFWithin = 33; MSWithin = 3.62
F = MSamong/MSwithin = 30.53/3.62 = 8.43
F0.05(1),2,33 = 3.28
Because 8.43 > 3.28, we reject the null hypothesis and conclude fish size is not equal across ponds.
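A hedged R sketch of a one-way ANOVA for this design; fish is a hypothetical data frame (the card gives only summary values), so the simulated numbers will not reproduce the table exactly:
```
set.seed(1)
fish <- data.frame(
  pond   = factor(rep(c("small", "medium", "large"), each = 12)),
  length = rnorm(36, mean = rep(c(3.9, 5.8, 7.1), each = 12), sd = 1.9)
)
model <- aov(length ~ pond, data = fish)
summary(model)               # ANOVA table: DF, SS, MS, F, p-value
qf(0.95, df1 = 2, df2 = 33)  # F-critical at alpha = 0.05, about 3.28
```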
122
In a case where k = 2: 1) we could use either an F-test or a t-test and get the same result 2) MSerror = s²p 3) the F-value will equal the t-value squared: Fα(1),1,(N-2) = (tα(2),(N-2))² 4) if a one-tailed test is required, the t-test is applicable, but ANOVA is not.
123
Note that we use one tail in the notation of the F-critical formula. This is because the F-distribution is asymmetrical and has only one tail.
124
Why do we need to do multiple comparisons?
To know which means are significantly different from one another.
125
Why is it invalid to use multiple t-tests after an ANOVA?
Multiple t-tests inflate type I error.
126
plotting the results of a Tukey test in R: plot(TukeyHSD(model)) gives us 95% confidence intervals
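A self-contained sketch of this call in context, with made-up data (y, g, and model are illustrative names):
```
set.seed(2)
dat <- data.frame(y = rnorm(30, mean = rep(c(10, 12, 15), each = 10)),
                  g = factor(rep(c("A", "B", "C"), each = 10)))
model <- aov(y ~ g, data = dat)
TukeyHSD(model)        # pairwise differences with adjusted p-values and 95% CIs
plot(TukeyHSD(model))  # intervals that exclude 0 indicate significant pairs
```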
127
Results from Tukey tests are often plotted:
128
Tukey test results also plotted with lines
129
What test do you use if sample sizes are not equal? How is it different from Tukey?
The Tukey-Kramer test. It differs from the regular Tukey test because it uses a different SE term: SE = sqrt((s²/2)(1/n1 + 1/n2)).
130
What happens if you want to do a Tukey or Tukey-Kramer test but the variances across samples are unequal?
Tukey is sensitive to unequal variances, so you can use the Welch approximation for the Tukey test:
131
If we wanted to test: H0: μ1 = μ2 AND H0: μ2 = μ3 AND H0: μ1 = μ3, what is the probability of incorrectly rejecting at least one of the three H0's? What is the problem with this?
The probability of incorrectly rejecting at least one of the H0's is 1 - (1 - α)^C = 1 - (1 - 0.05)³ = 0.14, where C is the number of possible different pairwise combinations of the k samples. The problem is that 0.14 is much larger than 0.05.
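A one-line R check of this card's arithmetic:
```
alpha <- 0.05; C <- 3
1 - (1 - alpha)^C   # experimentwise type I error: 0.142625
```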
132
What do multiple comparison procedures control for?
Multiple comparisons control for the _experimentwise type I error_ by keeping it at α.
133
What is the meaning of α when doing multiple comparisons?
For multiple comparisons, α is the probability of committing _at least one type I error_.
134
What are the two options for performing multiple comparisons?
1) post hoc comparisons 2) a priori (pre-planned) contrasts
135
1) Specifically, what are post hoc comparisons used for? 2) Specifically, what are pre-planned contrasts used for?
1) post hoc comparisons are used to compare all pairs of means 2) pre-planned contrasts are used to test a limited subset of hypotheses
136
Tukey test, aka honestly significant difference (HSD) test or wholly significant difference (WSD) test.
137
★ How is the MS of a contrast calculated?
(In the figure) the decimal numbers are the treatment means, and 14 is *n* for every treatment.
138
Can Tukey test be used without doing an ANOVA?
Yes, Tukey tests can be performed without first doing an ANOVA. Note that not all posthoc tests can.
139
What is a disadvantage of doing a Tukey test after an ANOVA, instead of doing the Tukey test first?
Doing an ANOVA before a Tukey test can lower statistical power. Nonetheless, the common practice is to do the ANOVA and then the Tukey test.
140
What are the steps for doing a Tukey test?
***Assuming that the sample sizes are equal:*** 1) Arrange and number all sample means in order of increasing magnitude 2) Calculate the pairwise differences between means, X̄B - X̄A (where B is the group with the larger mean) 3) Calculate a q-statistic: **q = (X̄B - X̄A)/SE**, where SE = sqrt(s²/n). **Note that you calculate _a q for each comparison_** 4) *H0*: µB = µA is rejected if q is greater than the critical value, qα,df,k
141
The conclusions of the Tukey test depend on the order in which the pairs of means are compared. What is the proper procedure for comparing pairs of means?
1) Largest mean compared against smallest mean, then against second smallest, so on... 2) Second largest mean compared against smallest, then second smallest, so on...
142
How is it demonstrated that the SS of contrasts is partitioned among the 3 orthogonal contrasts?
The SSamong equals the SS of the 3 contrasts added together:
143
What can we conclude if no significant difference between 2 means is found?
If no significant difference between 2 means is found, we can conclude that there are no significant differences between the means enclosed by them.
144
Calculation of the q-statistics for the fish experiment example: Mean 1 = 3.917; Mean 2 = 5.833; Mean 3 = 7.083; SE = 0.549; q-crit = q0.05,33,3 = 3.407
**3 vs 1:** (7.083 - 3.917)/0.549 = 3.166/0.549 = 5.767 ⇒ **reject *H0***
**3 vs 2:** (7.083 - 5.833)/0.549 = 1.25/0.549 = 2.277 ⇒ ***H0* not rejected**
**2 vs 1:** (5.833 - 3.917)/0.549 = 1.916/0.549 = 3.49 ⇒ **reject *H0***
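A hedged R sketch reproducing these q-statistics; the critical value comes from the studentized range distribution (qtukey, base R) and may differ slightly from the table value quoted on the card:
```
se <- 0.549
means <- c(3.917, 5.833, 7.083)
q <- c("3 vs 1" = (means[3] - means[1]) / se,
       "3 vs 2" = (means[3] - means[2]) / se,
       "2 vs 1" = (means[2] - means[1]) / se)
q_crit <- qtukey(0.95, nmeans = 3, df = 33)  # q-critical at alpha = 0.05
q > q_crit   # TRUE where H0 is rejected
```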
145
What is a disadvantage of using *a priori* tests instead of *post hoc*?
*a priori* tests do not allow comparisons of all pairs of means
146
What is something *a priori* contrasts allow to do but *post hoc* don't?
*A priori* contrasts allow comparing one mean against an average of other means. *A priori* contrasts are also more powerful than post hoc comparisons.
147
Researchers are interested in determining whether there are positive effects of two common sponge species on the root growth of a mangrove tree. Treatments: A: unmanipulated B: fake sponge C: sponge species 1 D: sponge species 2. The ANOVA yielded a p-value of 0.003.
148
1) what does orthogonality mean?
1) orthogonality means that the contrasts are independent from one another
149
example of orthogonal contrasts: If living sponge tissue enhances mangrove root growth, then the average growth of the two living sponge treatments should be greater than the growth of roots in the inert foam treatment How are the coefficients of these contrasts computed?
control (0) fake sponge (2) sponge spp 1 (-1) sponge spp 2 (-1)
150
what are the degrees of freedom of a contrast?
1 df
151
★ how is the F-ratio of *a priori* contrasts calculated?
Fcontrast = MScontrast/MSerror = 0.145/0.164 = 0.882
152
The formula for SS*within* is: SSwithin = ∑∑(Xij - X̄i)²
153
why is orthogonality important?
orthogonality ensures that p-values are not inflated
154
what are the rules for orthogonality?
1) the sums of the coefficients must equal 0 2) for *k* treatment groups, there can be only *k* - 1 contrasts 3) the sum of the cross-wise products of the coefficients must also be 0
155
In this example, are contrasts orthogonal? Why? **contrast one:** control (0) fake sponge (2) sponge 1 (-1) sponge 2 (-1) **contrast two:** control (3) fake sponge (-1) sponge 1 (-1) sponge 2 (-1)
They are orthogonal because: 1) their coefficient sums are 0 in both cases 2) their number does not exceed k - 1 3) the sum of their cross-products equals 0: (0)(3) + (2)(-1) + (-1)(-1) + (-1)(-1) = 0
156
what are the components of an ANOVA table?
1) source of variation 2) degrees of freedom 3) sums of squares 4) mean squares 5) F 6) p-value
157
What is a required condition to use *a priori* contrasts?
They have to be planned before doing the statistical analysis.
158
what does sample non-normality suggest?
Sample non-normality suggests population non-normality.
159
What happens if the ANOVA assumptions are not met and an ANOVA is performed anyway?
the result of the ANOVA cannot be trusted if the assumptions are not met
160
why are assumptions important for ANOVA? what are the assumptions of ANOVA?
ANOVA is a parametric test. Assumptions: 1) _independent_, random samples 2) all samples come from _normal_ populations 3) _variances_ between all treatments are equal
161
What test is performed instead of ANOVA if sample variances are unequal but distributions are normal?
Welch's ANOVA for unequal variances
162
what test is performed instead of ANOVA if sample variances are equal but distributions are non-normal?
Kruskal-Wallis
163
what is done if samples for an ANOVA are neither normal nor have equal variances?
transformation and re-assessment of the assumptions
164
how is the assumption of variance homogeneity checked?
1) Visual assessment (histograms or QQ-plots) 2) Fligner-Killeen test
165
When can QQ-plots not be used?
QQ-plots are not advised for samples with fewer than 25 observations; in that case, histograms are better.
166
how is the assumption of normality checked?
Normality is checked through 1) visual assessment (histograms) 2) the Shapiro-Wilk test
167
How does Kruskal-Wallis ranked test work?
1) assign ranks to observations 2) tied observations get average of the ranks they would get if not tied
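A minimal R sketch with made-up data (growth and group are illustrative names), showing both the tie handling described here and the test itself:
```
growth <- c(5, 7, 7, 9, 4, 6, 8, 10, 3, 5, 6, 7)
group  <- factor(rep(c("A", "B", "C"), each = 4))
rank(growth)                  # tied values share the average of their ranks
kruskal.test(growth ~ group)  # non-parametric alternative to one-way ANOVA
```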
168
What is a disadvantage of the Kruskal-Wallis ranked test?
observations lose information when they are converted to ranks
169
what is P-hacking?
trying different analyses until one is significant
170
Statistically, why is P-hacking wrong?
P-hacking changes the actual α value
171
Other ways of P-hacking:
* taking too many data points * not adjusting p-values for multiple comparisons
174
What does alpha represent in a Tukey test?
In a Tukey test, alpha represents the probability of committing at least one Type I error among all comparisons.
175
**Note that:** when k = 2, either an ANOVA or a t-test can be used, and the F value will equal the t value squared. **However:** if a one-tailed test is required, then an ANOVA cannot be used.
176
how are data treated in linear regression?
pairs of data. x-values paired with y-values
177
in a real life situation (where variation is present), what is the equation for linear regression?
Yi = α + βXi + εi
178
how are data from linear regression plotted?
data from linear regression are plotted in a scatterplot.
179
what is the equation for linear regression assuming a perfect model?
Ŷi = α + βXi
180
in Ŷi = α + βXi 1) what is α? 2) what is β?
1) α is the intercept (i.e. the value of Ŷi where the line crosses the y-axis) 2) β is the slope of the line (i.e. the increase in Ŷi per unit increase in X)
181
α and β are population parameters How are they estimated?
we estimate α and β from our sample as a and b
182
in Yi = α + βXi + εi, what is ε?
ε is the error (i.e. the departure of an observed Yi from the predicted Ŷi, where Ŷi is what the equation predicts Yi to be)
183
What is the sum of all εi?
the sum of all εi equals 0
184
what method is used to calculate the linear regression parameter estimates?
Least Squares
185
what does (Xi, Yi) mean?
(Xi, Yi) is a single point (a pair of X and Y values)
186
what is (Xi, Ŷi)?
(Xi, Ŷi) is the point corresponding to Xi that falls on the line of best fit
187
what is the difference between (Xi, Yi) and (Xi, Ŷi) called?
the difference between Yi and Ŷi is called a residual
188
what is the equation to calculate the slope?
b = ∑[(Xi - X̄)(Yi - Ȳ)] / ∑(Xi - X̄)²
189
What is the equation to calculate the intercept?
We derive it from the regression equation: a = Ȳ - bX̄
190
What happens if you change the intercept but not the slope?
The line moves up or down. Note that a negative intercept makes the line cross the y-axis below 0.
191
what happens if you change the slope?
The line pivots, but stays anchored at the intercept.
192
what does the Least Squares method calculate?
the Least Squares method calculates the equation of the line that minimizes the (squared) differences between Yi and Ŷi
193
visually estimating the slope
``` Slope = (Y2 – Y1)/(X2 – X1) Slope = (10 − 16) / (5 − 2) Slope = (−6) / (3) Slope = −2 ```
194
why should extrapolation not be done in regression?
The fitted function does not hold infinitely; we cannot assume it applies beyond the range of our data, in either direction.
195
calculating α and β from two points
1) calculate b 2) use Ŷi = a + bXi
b = (20 - 0)/(3 - 1) = 10
Use the point (3, 20) to calculate a: a = Y - bX = 20 - 10·3 = -10
196
Interpolation in linear regression is not wrong
197
what population parameter are we interested in from linear regression?
we're interested in β because that's the parameter that defines the relationship between predictor and response variable
198
1) determine the variability of the response variable, SSY or SStotal = ∑(Yi - Ȳ)² 2) determine the amount of variability explained by the regression line, the "regression sum of squares" or SSR or SSreg = ∑(Ŷi - Ȳ)². Note: the last formulae are easiest for hand calculations.
To obtain SSR we need the fitted values Ŷi from the regression line.
199
What is the difference between simple linear regression and simple linear correlation?
simple linear regression assumes dependence of one variable upon another; in simple linear correlation there is a relationship but not dependence
200
What is residual error in a regression and how is it obtained?
Residual error is a measure of the scatter of the data points around the regression line; it is obtained from the residuals Yi - Ŷi (SSresid = ∑(Yi - Ŷi)²).
201
Using SSR, SSY, and SSresid, we have partitioned the total variation in Yi into variation explained by the regression line and variation not explained by the regression line SSY = SSR + SSE AKA SStotal = SSregression + SSresidual
Note: (in the figure) the deviation lines don't add up exactly because they have not yet been squared
202
What does β tell us?
β tells how much the response variable Y increases per unit increase of predictor variable X
203
what are the assumptions for linear regression?
1) for each value of X, the Y values must be random and independent of one another 2) for each X, there exists a normal distribution of Y values (and a normal distribution of ε) 3) homogeneity of variances in the population (the variances of the distributions of Y values must all be equal) 4) the relationship between X and Y is linear (the mean of the Yi lies on a straight line) 5) measurements of X are obtained without error (impossible in practice, so we assume the error is negligible)
204
We use b, but what we're really interested in is β. What is β?
β is the functional dependence in the population
205
what are the hypotheses for linear regression?
H0: β = 0 HA: β ≠ 0
206
How is the value of r2 interpreted?
r² = 1 means all the variation in Y is explained by X; r² = 0 means none of the variation in Y is explained by X
207
How can hypotheses about β be tested?
ANOVA or t-test method. Note: testing anything other than H0: β = 0 (e.g. H0: β = β0) requires that we use a t-test
208
Regression using t-test for hypothesis about a:
209
SSR will be equal to SSY only if each data point falls on the regression line (very unlikely).
210
How are the DF calculated in regression?
DFreg = 1
DFtotal = n - 1
DFresid = n - 2
211
with the DF you can now calculate the MSs:
MSreg = SSreg/DFreg
MSresid = SSresid/DFresid
F = MSreg / MSresid
212
What is r2, aka coefficient of determination?
r² indicates how strong the relationship is, i.e. how much of the total variation in Y is attributed to X
213
How is r2 calculated?
r2 = SSreg/SStotal = SSR/SSY
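A minimal R sketch tying these pieces together; x and y are made-up illustration data:
```
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 11.9)
fit <- lm(y ~ x)
coef(fit)                 # a (intercept) and b (slope)
anova(fit)                # SSreg, SSresid, MS, F (DFreg = 1, DFresid = n - 2)
summary(fit)$r.squared    # r2 = SSreg/SStotal
```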
214
Regression using a t-test for a hypothesis about b: ***t* = (b - β0)/sb**, where **sb = sqrt(MSresid/SSX)** and **MSresid = ∑(Yi - Ŷi)²/(n - 2)**
215
Formulae for t-critical: tα(2),n-2 (two-tailed) or tα(1),n-2 (one-tailed)
216
What is the concept of a degree of freedom?
If you calculate the mean from a set of n numbers, one of those numbers is no longer free to vary, so df = n - 1.
217
what is one definition of degrees of freedom?
the number of values in the final calculation of a statistic that are free to vary in the data sample; the maximum number of logically independent values
218
What are the two types of degrees of freedom?
DF associated with the effect of interest DF associated with the error
219
1) What are DF in ANOVA? 2) What are DF in regression?
ANOVA: DFgroups = k - 1 (where k is the number of groups) Regression: DFreg = 1
220
In linear regression, why does DFreg = 1?
In regression we only estimate 1 parameter more than the mean of Y (if b were 0, a would simply be Ȳ): Ŷ = a + **bX**
221
what is the general formula for error DF?
DFerror = n - p, where n is the sample size and p is the number of parameters estimated. For regression, DFerror = n - 2 because we estimate a and b.