Brief Review of Intro to Statistics (Module 14) Flashcards

1
Q

What type of data analysis would be appropriate for 1 continuous explanatory variable (x) and 1 continuous response variable (y)?

A

simple linear regression

2
Q

What type of data analysis would be appropriate for multiple continuous explanatory variables (x1, x2, … x[n]) and 1 continuous response variable (y)?

A

multiple linear regression

3
Q

What 2 types of data analyses would be appropriate for 1 categorical variable (x) and 1 continuous response variable (y)?

A

(1) T-test
(2) one-way ANOVA

4
Q

Under what circumstance would we run a T-test rather than a one-way ANOVA?

A

If the categorical explanatory variable is binary, such as for sex (‘male’, ‘female’) or the main belligerents in the Wars of the Roses (‘House of York’, ‘House of Lancaster’), we perform a T-test.
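
To make the two-group case concrete, here is a minimal sketch of a two-sample t statistic. It assumes the pooled-variance (Student's) form, which the card does not specify, and uses made-up measurements; the function name `pooled_t_statistic` is hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t_statistic(a, b):
    """Student's two-sample t statistic, assuming equal group variances.

    Returns the t statistic and its degrees of freedom.
    """
    na, nb = len(a), len(b)
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    std_err = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / std_err, na + nb - 2

# Hypothetical response values (y) for the two levels of a binary variable (x).
group_a = [5.0, 6.0, 7.0, 8.0, 9.0]
group_b = [3.0, 4.0, 5.0, 6.0, 7.0]

t_stat, df = pooled_t_statistic(group_a, group_b)
print(f"t = {t_stat:.2f} on {df} degrees of freedom")
```

The statistic would then be compared against a t distribution with the reported degrees of freedom to obtain a P-value.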

5
Q

Under what circumstance would we run a one-way ANOVA rather than a T-test?

A

If the categorical explanatory variable has more than two possibilities, such as regions of Italy (‘Tuscany’, ‘Campania’, ‘Sicilia’, etc.), or different types of fruit people commonly pack for lunch (‘Oranges’, ‘Bananas’, ‘Apples’, etc.), we perform a one-way ANOVA.
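
The sketch below shows how the F statistic of a one-way ANOVA can be computed by hand for three groups. The data are hypothetical and the function name `one_way_anova_f` is an assumption made for illustration.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of y values."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, k - 1, n - k

# Hypothetical response values for three levels of one categorical variable.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_between, df_within = one_way_anova_f(groups)
print(f"F = {f_stat:.2f} on ({df_between}, {df_within}) degrees of freedom")
```

A large F means the group means differ by more than the within-group scatter would suggest.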

6
Q

What is the difference between a one-way ANOVA and a two-way ANOVA?

A

A one-way ANOVA will have only one explanatory variable (x), while a two-way ANOVA will have two explanatory variables (x1, x2).

7
Q

What is an example of a question we might answer with a two-way ANOVA, given people’s preferred fruit (x1), the U.S. state in which they live (x2), and their average annual mileage (y)?

A

Is the average annual mileage that people drive influenced by their favorite fruit and does that depend on the U.S. state where they live?

8
Q

What does the phrase “n-way ANOVA” mean?

A

ANOVAs can be performed with as many explanatory variables as one wants - “n” represents the number in the analysis, whether that is two or three or ten, etc.

9
Q

What type of data analysis would be appropriate for 1 categorical explanatory variable (x) and 1 categorical response variable (y)?

A

Analysis of Contingency Tables

10
Q

In an analysis of contingency tables, if the explanatory and response variables are both binary, what term do we use to describe the test we perform?

A

Analysis of a Two-Way Contingency Table (specifically a 2-by-2 table, since each variable has two levels)

11
Q

What is it called if the explanatory variable, the response variable, or both in an Analysis of Contingency Tables have more than two possible levels?

A

Analysis of an R-by-C (Row-by-Column) Contingency Table
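
One common way to analyze a contingency table is Pearson's chi-square test of independence; the cards do not name a specific test, so that choice is an assumption here. The sketch computes the statistic for any R-by-C table of counts. The counts are invented and the function name `chi_square_statistic` is hypothetical.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an R-by-C table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical 2-by-2 table: rows are levels of x, columns are levels of y.
two_by_two = [[10, 20], [20, 10]]
chi2, df = chi_square_statistic(two_by_two)
print(f"chi-square = {chi2:.2f} on {df} degree(s) of freedom")

# The same function handles a table with more than two levels (here 2-by-3).
chi2_rc, df_rc = chi_square_statistic([[12, 8, 10], [6, 14, 10]])
```

The degrees of freedom grow with the table: (R - 1) times (C - 1).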

12
Q

What type of analysis would we perform if we have 1 continuous explanatory variable (x) and 1 categorical response variable (y)?

A

simple logistic regression
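
As a rough illustration of simple logistic regression with a binary response, the sketch below fits an intercept and slope by plain gradient descent on the log-likelihood. Statistical software typically uses a different fitting algorithm; gradient descent is used here only to keep the example short. The data, learning rate, iteration count, and function names are all assumptions.

```python
from math import exp

def sigmoid(z):
    """Map a linear predictor onto a probability between 0 and 1."""
    return 1.0 / (1.0 + exp(-z))

def fit_simple_logistic(x, y, learning_rate=0.1, n_iter=5000):
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by gradient descent."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(n_iter):
        # Difference between predicted probability and observed outcome.
        errors = [sigmoid(b0 + b1 * xi) - yi for xi, yi in zip(x, y)]
        b0 -= learning_rate * sum(errors) / n
        b1 -= learning_rate * sum(e * xi for e, xi in zip(errors, x)) / n
    return b0, b1

# Hypothetical data: continuous x, categorical (0/1) y.
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
y = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_simple_logistic(x, y)
print(f"P(y = 1 | x = 4.0) = {sigmoid(b0 + b1 * 4.0):.2f}")
```

A positive slope means the probability of the "1" category rises as x increases.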

13
Q

What type of analysis would we perform if we have more than 1 continuous explanatory variable (x1, x2, … x[n]) and 1 categorical response variable (y)?

A

multiple logistic regression

14
Q

T-tests, ANOVA, linear regressions and logistic regressions are all part of what family of mathematical concepts?

A

linear models

15
Q

What is the term and a description for the first assumption of linear models?

A

LINEARITY - stipulates that there is a linear relationship between the explanatory variable (x) and the response variable (y)

16
Q

What is the term and a description for the second assumption of linear models?

A

NORMALITY - for any given value of the explanatory variable (x), the values of the response variable (y) have normally distributed errors

17
Q

What is the term and a description for the third assumption of linear models?

A

HOMOGENEITY OF VARIANCE - the variance in the response variable (y) is constant across the range of values of the explanatory variable (x)

18
Q

What is the term and a description for the fourth assumption of linear models?

A

INDEPENDENCE - for any given value of the explanatory variable (x), the values from the response variable (y) have independent errors

19
Q

FILL IN THE BLANKS: For linear models, a decent method for (1)______________ and a strong (2)____________________ will make it easier to analyze the data than (3)______________ or (4)_______________ it after the fact to better fit the (5)___________________.

A

(1) sampling
(2) experimental design
(3) transforming
(4) sub-setting
(5) assumptions of linearity

20
Q

What kind of data can be analyzed in multiple different ways and may reveal answers to multiple different questions?

A

data collected in accordance with a sound experimental design

21
Q

What term refers to the ability to detect a pattern when one is present in the data?

A

statistical power

22
Q

What component of proper experimental design results in good statistical power?

A

selecting the necessary number of replicates
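
The link between replicates and power can be illustrated with a small simulation. This is a sketch under simplifying assumptions that are not from the cards: two groups, normally distributed data with a known standard deviation, and a z-test at the 0.05 level. The effect size and the function name `estimated_power` are hypothetical.

```python
import random
from math import sqrt
from statistics import mean

def estimated_power(n_per_group, effect=1.0, sd=1.0, n_sims=2000, seed=1):
    """Fraction of simulated experiments in which a true effect is detected."""
    rng = random.Random(seed)
    # Standard error of the difference in means when sd is known.
    std_err = sd * sqrt(2 / n_per_group)
    detections = 0
    for _ in range(n_sims):
        control = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
        treated = [rng.gauss(effect, sd) for _ in range(n_per_group)]
        z = (mean(treated) - mean(control)) / std_err
        if abs(z) > 1.96:  # two-sided test at alpha = 0.05
            detections += 1
    return detections / n_sims

power_small = estimated_power(n_per_group=5)
power_large = estimated_power(n_per_group=30)
print(f"5 replicates per group:  power is about {power_small:.2f}")
print(f"30 replicates per group: power is about {power_large:.2f}")
```

With the same true effect, the design with more replicates detects it far more often.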

23
Q

What do we call data for which there are multiple response variables (y1, y2, … y[n]) for each sampling unit or observation?

A

multivariate data

24
Q

What would be an example of research which collects multivariate data from the hydrological sciences?

A

The sampling unit is Lake Apopka, from which we collect multiple measurements (y1 = pH, y2 = dissolved oxygen, … y[n-1] = salinity, y[n] = nitrate concentration)

25
Q

What would be an example of research which collects multivariate data from the science of forestry?

A

The sampling unit is the German Schwarzwald, from which we collect multiple measurements (y1 = canopy height, y2 = canopy cover, … y[n-1] = snag density, y[n] = trunk diameter)

26
Q

Multivariate analysis always involves multiple response variables (y1, y2, … y[n]), but we can have (1)______________________, referred to as (2)____; or (3)________________________, referred to as (4)__________________; or even (5)______________________.

A

(1) one explanatory variable
(2) x
(3) multiple explanatory variables
(4) x1, x2, … x[n]
(5) no explanatory variables

27
Q

The linear models studied in this class have only focused on explanatory variables which have been chosen ahead of time (i.e., the treatment groups). What are these referred to as?

A

fixed effects

28
Q

What do the fixed effects influence?

A

the mean of the response variable

29
Q

What is the difference between a Fixed Effects Model (FEM), Random Effects Model (REM), and a Mixed Effects Model (MEM)?

A

FEMs only contain parameters which are fixed or non-random quantities

REMs only contain parameters which are random quantities

MEMs contain both fixed effects and random effects

30
Q

What is the definition of a random effect?

A

A variable which is represented by a random sample of all the possible levels of said variable

31
Q

What do random effects influence?

A

the variance of the response variable
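
A short simulation can show this contrast with fixed effects. In the sketch below, each site contributes a random offset drawn from a population of possible site effects; the sites, means, and standard deviations are all hypothetical. The mean of the response barely moves, but its variance grows.

```python
import random
from statistics import mean, variance

rng = random.Random(42)
n_sites, n_per_site = 200, 20
grand_mean, site_sd, noise_sd = 10.0, 2.0, 1.0  # assumed values

without_site_effect = []
with_site_effect = []
for _site in range(n_sites):
    # Random effect: this site's offset is a draw from a population of offsets.
    site_effect = rng.gauss(0.0, site_sd)
    for _obs in range(n_per_site):
        noise = rng.gauss(0.0, noise_sd)
        without_site_effect.append(grand_mean + noise)
        with_site_effect.append(grand_mean + site_effect + noise)

print(f"mean without / with site effect: "
      f"{mean(without_site_effect):.2f} / {mean(with_site_effect):.2f}")
print(f"variance without / with site effect: "
      f"{variance(without_site_effect):.2f} / {variance(with_site_effect):.2f}")
```

The extra variance is roughly the variance of the site effects themselves, which is the quantity a random effects model estimates.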

32
Q

What type of effects represent unobserved variables?

A

random effects represent unobserved variables

33
Q

Are you interested in the size of the effect? If you are, it is most likely a (1)_______________. If you are not interested, it is most likely a (2)_______________.

A

(1) fixed effect
(2) random effect

34
Q

Is it reasonable to think that the factor levels arise from a population of levels? If it is, the variable likely has (1)________________. If it is not reasonable to assume that, the variable likely has (2)_________________.

A

(1) random effects
(2) fixed effects

35
Q

Are there enough levels of the factor in the data frame on which to base an estimate of the variance of the population of effects? If yes, the factor likely has (1)________________. If no, the factor likely has (2)___________________.

A

(1) random effects
(2) fixed effects

36
Q

Are the factor levels of the explanatory variable informative? If they are, we are likely dealing with ____________________.

A

fixed effects

37
Q

Are the levels of a variable just numeric labels? If they are, the variable probably has _____________________.

A

random effects

38
Q

Consider the four variables below:

(1) sex assigned at birth (M/F)
(2) the colors on a rose (red/white)
(3) possible summer temperatures to the nearest whole degree Fahrenheit (integer values, 78 ≤ x ≤ 111)
(4) one strain of ‘B. bassiana’ versus another possible strain (ANT-03 vs. GHA)

Would we expect fixed effects or random effects from these variables?

A

fixed effects would be expected

39
Q

Consider the four variables below:

(1) day selected at which to perform a certain collection of data given the demands of the researcher’s schedule
(2) site selected at which to collect data
(3) responses from different members of the same household
(4) the nesting attempts by the same bird

Would we expect fixed effects or random effects from these variables?

A

random effects from these variables would be expected

40
Q

What do we call flexible linear models which allow for non-normal error terms to be applied to response variables?

A

generalized linear models (GLMs)

41
Q

What are four types of distribution commonly used in generalized linear models?

A

normal
binomial
Poisson
gamma
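
Each of these distributions is usually paired with a link function that connects the mean of the response to the linear predictor. The cards do not cover link functions, so the sketch below is supplementary: it lists the standard canonical link for each family and checks that the inverse link undoes the link. The dictionary name `CANONICAL_LINKS` and the example mean are assumptions.

```python
from math import exp, log

# Each entry maps a distribution to (link, inverse_link).
CANONICAL_LINKS = {
    # identity link
    "normal": (lambda mu: mu, lambda eta: eta),
    # logit link
    "binomial": (lambda mu: log(mu / (1 - mu)), lambda eta: 1 / (1 + exp(-eta))),
    # log link
    "Poisson": (lambda mu: log(mu), lambda eta: exp(eta)),
    # inverse (reciprocal) link
    "gamma": (lambda mu: 1 / mu, lambda eta: 1 / eta),
}

mean_response = 0.25  # hypothetical mean, valid for all four families
for family, (link, inverse_link) in CANONICAL_LINKS.items():
    eta = link(mean_response)
    print(f"{family:>8}: linear predictor = {eta:.3f}, "
          f"back-transformed mean = {inverse_link(eta):.3f}")
```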

42
Q

What makes GLMs more useful in some respects than simpler linear models, even if it also makes them more conceptually challenging?

A

GLMs can unify several types of data analysis into the same model framework

43
Q

Most of the statistics we have learned in “Intro to Stats” is referred to typically as what?

A

frequentist statistics

44
Q

What is another name for frequentist statistics, albeit one which is slightly less descriptive?

A

classical statistics

45
Q

The main idea of frequentist statistics can be summarized how?

A

uncertainty is considered in terms of how a statistic would be expected to behave under repeated sampling

46
Q

What is the desired culmination of the repeated sampling iteration one performs under frequentist statistics?

A

obtaining a P-value, or a confidence interval at the desired confidence level

47
Q

What is the most common significance level (alpha), which corresponds directly to the most commonly desired confidence interval?

A

the most common significance level (alpha) is 0.05, which corresponds to a 95% confidence interval; a P-value below alpha is treated as statistically significant
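
The repeated-sampling idea behind a 95% confidence interval can be checked by simulation. This sketch assumes normally distributed data with a known standard deviation, so the interval is the sample mean plus or minus 1.96 standard errors. The population values and the function name `ci_coverage` are hypothetical.

```python
import random
from math import sqrt
from statistics import mean

def ci_coverage(n=25, true_mean=50.0, sd=10.0, n_sims=4000, seed=7):
    """Fraction of repeated samples whose 95% interval contains the true mean."""
    rng = random.Random(seed)
    half_width = 1.96 * sd / sqrt(n)  # known-sd interval, alpha = 0.05
    hits = 0
    for _ in range(n_sims):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        sample_mean = mean(sample)
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / n_sims

coverage = ci_coverage()
print(f"Share of intervals containing the true mean: {coverage:.3f}")
```

Across many repeated samples, close to 95% of the intervals should contain the true mean, and about 5% should miss it.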

48
Q

What development has allowed Bayesian statistics to become more popular in the 21st century than it was in the early 20th century and beforehand?

A

advent of cheap computing technologies

49
Q

What do we rely on if we run a linear model in which we know the errors are not normally distributed, but we assume that we have enough data to compensate for that shortcoming?

A

asymptotic theory

50
Q

What is the main advantage that Bayesian statistics has over the premade linear models?

A

Bayesian statistics frees the researcher to devise a model which best matches the problem at hand, rather than forcing the data into one of the premade statistical models like ANOVA or T-tests, with all of the assumptions they require

51
Q

What is the primary disadvantage of using Bayesian models?

A

Bayesian statistics comes with a knowledge barrier (needing to know how to write computer code) and a material barrier (needing access to powerful computers)