Brief Review of Intro to Statistics (Module 14) Flashcards
What type of data analysis would be appropriate for 1 continuous explanatory variable (x) and 1 continuous response variable (y)?
simple linear regression
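A minimal sketch of what fitting a simple linear regression involves, using the ordinary least-squares formulas. The function name and the data are made up for illustration and are not from the course.

```python
from statistics import mean

def fit_simple_linear_regression(x, y):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-products and sum of squares around the means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Made-up data lying exactly on the line y = 1 + 2x
x_values = [1.0, 2.0, 3.0, 4.0, 5.0]
y_values = [3.0, 5.0, 7.0, 9.0, 11.0]
slope, intercept = fit_simple_linear_regression(x_values, y_values)
print(slope, intercept)
```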
What type of data analysis would be appropriate for multiple continuous explanatory variables (x1, x2, … x[n]) and 1 continuous response variable (y)?
multiple linear regression
What 2 types of data analyses would be appropriate for 1 categorical variable (x) and 1 continuous response variable (y)?
(1) T-test
(2) one-way ANOVA
Under what circumstance would we run a T-test rather than a one-way ANOVA?
If the categorical explanatory variable is binary, such as sex (‘male’, ‘female’) or the main belligerents in the Wars of the Roses (‘House of York’, ‘House of Lancaster’), we perform a T-test.
Under what circumstance would we run a one-way ANOVA rather than a T-test?
If the categorical explanatory variable has more than two possibilities, such as regions of Italy (‘Tuscany’, ‘Campania’, ‘Sicilia’, etc.), or different types of fruit people commonly pack for lunch (‘Oranges’, ‘Bananas’, ‘Apples’, etc.), we perform a one-way ANOVA.
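The T-test and the one-way ANOVA in the cards above are closely related: with exactly two groups, the ANOVA F statistic equals the square of the pooled-variance t statistic. A small sketch with made-up numbers, assuming the equal-variance (pooled) form of the T-test; the function names are illustrative only.

```python
from statistics import mean, variance

def pooled_t_statistic(a, b):
    """Two-sample t statistic using the pooled (equal-variance) estimate."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (pooled_var * (1 / na + 1 / nb)) ** 0.5

def one_way_f_statistic(groups):
    """F statistic: between-group mean square over within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Made-up measurements for two groups (binary explanatory variable)
group_a = [4.0, 5.0, 6.0]
group_b = [7.0, 9.0, 11.0]
t_stat = pooled_t_statistic(group_a, group_b)
f_two_groups = one_way_f_statistic([group_a, group_b])
print(t_stat ** 2, f_two_groups)  # the two values agree

# A third group needs ANOVA; a single T-test no longer covers it
group_c = [10.0, 12.0, 14.0]
f_three_groups = one_way_f_statistic([group_a, group_b, group_c])
```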
What is the difference between a one-way ANOVA and a two-way ANOVA?
A one-way ANOVA will have only one explanatory variable (x), while a two-way ANOVA will have two explanatory variables (x1, x2).
What is an example of a question we might answer with a two-way ANOVA, given people’s preferred fruit (x1), the U.S. state in which they live (x2), and their average annual mileage (y)?
Is the average annual mileage that people drive influenced by their favorite fruit, and does that influence depend on the U.S. state where they live?
What does the phrase “n-way ANOVA” mean?
ANOVAs can be performed with as many explanatory variables as one wants - “n” represents the number in the analysis, whether that is two or three or ten, etc.
What type of data analysis would be appropriate for 1 categorical explanatory variable (x) and 1 categorical response variable (y)?
Analysis of Contingency Tables
In an analysis of contingency tables, if the explanatory and response variables are both binary, what term do we use to describe the test we perform?
Analysis of a Two-by-Two (2×2) Contingency Table
What is it called if the explanatory variable, the response variable, or both in an Analysis of Contingency Tables have more than two possible levels?
Analysis of an R-by-C (Row-by-Column) Contingency Table
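The cards do not name a specific test for contingency tables. One common choice is Pearson's chi-square statistic, sketched below as an assumption rather than as the course's method. It compares each observed count with the count expected if the two variables were unrelated. The table values are made up.

```python
def chi_square_statistic(table):
    """Return (statistic, degrees of freedom) for an R-by-C table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

# Made-up 2-by-2 table: rows are explanatory levels, columns are response levels
counts = [[10, 20],
          [20, 10]]
statistic, dof = chi_square_statistic(counts)
print(statistic, dof)
```

The same function accepts larger R-by-C tables, since it only relies on row and column totals.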
What type of analysis would we perform if we have 1 continuous explanatory variable (x) and 1 categorical response variable (y)?
simple logistic regression
What type of analysis would we perform if we have more than 1 continuous explanatory variable (x1, x2, … x[n]) and 1 categorical response variable (y)?
multiple logistic regression
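A sketch of how a logistic regression turns a continuous x into a probability for a categorical y, assuming the common case of a binary response. The coefficients are hypothetical values chosen for illustration, not estimates from any data.

```python
import math

def predicted_probability(x, intercept, slope):
    """Logistic model: P(y = 1 | x) = 1 / (1 + exp(-(intercept + slope * x)))."""
    linear_predictor = intercept + slope * x
    return 1.0 / (1.0 + math.exp(-linear_predictor))

# Hypothetical coefficients (assumed, not fitted)
INTERCEPT, SLOPE = -4.0, 0.8

for x in (2.0, 5.0, 8.0):
    print(x, round(predicted_probability(x, INTERCEPT, SLOPE), 3))
```

A multiple logistic regression has the same shape, with the linear predictor extended to intercept + slope1 * x1 + slope2 * x2 and so on.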
T-tests, ANOVA, linear regressions and logistic regressions are all part of what family of mathematical concepts?
linear models
What is the term and a description for the first assumption of linear models?
LINEARITY - stipulates that there is a linear relationship between the explanatory variable (x) and the response variable (y)
What is the term and a description for the second assumption of linear models?
NORMALITY - for any given value of the explanatory variable (x), the values of the response variable (y) have normally distributed errors
What is the term and a description for the third assumption of linear models?
HOMOGENEITY OF VARIANCE - the variance in the response variable (y) is constant across the range of values of the explanatory variable (x)
What is the term and a description for the fourth assumption of linear models?
INDEPENDENCE - for any given value of the explanatory variable (x), the values from the response variable (y) have independent errors
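The four assumptions above can be collected into one model statement. A sketch for the single-x case follows; the symbols (beta_0, beta_1, epsilon_i, sigma^2) are notation assumed here, not taken from the cards.

```latex
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,
\qquad
\varepsilon_i \overset{\text{iid}}{\sim} \mathcal{N}(0, \sigma^2)
```

Linearity is the straight-line mean beta_0 + beta_1 x_i, normality is the normal distribution of the errors, homogeneity of variance is the single sigma^2 shared by every observation, and independence is the "iid" over the errors.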
FILL IN THE BLANKS: For linear models, a decent method for (1)______________ and a strong (2)____________________ will make it easier to analyze the data than (3)______________ or (4)_______________ it after the fact to better fit the (5)___________________.
(1) sampling
(2) experimental design
(3) transforming
(4) sub-setting
(5) assumptions of linear models
What kind of data is able to be analyzed in multiple different ways and may reveal answers to multiple different questions?
data collected in accordance with a sound experimental design
What term refers to the ability to detect a pattern when one is present in the data?
statistical power
What component of proper experimental design leads to good statistical power?
selecting the necessary number of replicates
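The link between replicates and power can be checked by simulation. The sketch below is a simplified illustration, not the course's procedure: it assumes two groups, a known standard deviation (so a z-test can stand in for a T-test), and a true difference of one standard deviation. All names and default values are made up.

```python
import random
from statistics import NormalDist, mean

def estimated_power(n_per_group, effect=1.0, sigma=1.0, alpha=0.05,
                    n_sims=2000, seed=1):
    """Fraction of simulated experiments in which a real effect is detected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Standard error of the difference in means when sigma is known
    std_error = sigma * (2 / n_per_group) ** 0.5
    rejections = 0
    for _ in range(n_sims):
        control = [rng.gauss(0.0, sigma) for _ in range(n_per_group)]
        treated = [rng.gauss(effect, sigma) for _ in range(n_per_group)]
        z = (mean(treated) - mean(control)) / std_error
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_sims

power_small = estimated_power(n_per_group=5)
power_large = estimated_power(n_per_group=50)
print(power_small, power_large)
```

With the same true effect, the run with more replicates per group detects it far more often.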
What do we call data for which there are multiple response variables (y1, y2, … y[n]) for each sampling unit or observation?
multivariate data
What would be an example of research which collects multivariate data from the hydrological sciences?
The sampling unit is Lake Apopka, from which we collect multiple measurements (y1 = pH, y2 = dissolved oxygen, … y[n-1] = salinity, y[n] = nitrate concentration)
What would be an example of research which collects multivariate data from the science of forestry?
The sampling unit is the German Schwarzwald, from which we collect multiple measurements (y1 = canopy height, y2 = canopy cover, … y[n-1] = snag density, y[n] = trunk diameter)
Multivariate analysis always involves multiple response variables (y1, y2, … y[n]), but we can have (1)______________________, referred to as (2)____; or (3)________________________, referred to as (4)__________________; or even (5)______________________.
(1) one explanatory variable
(2) x
(3) multiple explanatory variables
(4) x1, x2, … x[n]
(5) no explanatory variables
The linear models studied in this class have only focused on explanatory variables which have been chosen ahead of time (i.e., the treatment groups). What are these referred to as?
fixed effects
What do the fixed effects influence?
the mean of the response variable
What is the difference between a Fixed Effects Model (FEM), Random Effects Model (REM), and a Mixed Effects Model (MEM)?
FEMs only contain parameters which are fixed or non-random quantities
REMs only contain parameters which are random quantities
MEMs contain both fixed effects and random effects
What is the definition of a random effect?
A variable which is represented by a random sample of all the possible levels of said variable
What do random effects influence?
the variance of the response variable
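One common way to write a mixed effects model is a random-intercept model, sketched below. The notation is assumed here and not taken from the cards: i indexes observations and j indexes groups such as sites.

```latex
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_j + \varepsilon_{ij},
\qquad
u_j \sim \mathcal{N}(0, \sigma_u^2),
\quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

The fixed effect beta_1 shifts the mean of y, while the random effect u_j has mean zero and contributes only its variance sigma_u^2 to the spread of y.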
What type of effects represent unobserved variables?
random effects represent unobserved variables
Are you interested in the size of the effect? If you are, it is most likely a (1)_______________. If you are not interested, it is most likely a (2)_______________.
(1) fixed effect
(2) random effect
Is it reasonable to think that the factor levels arise from a population of levels? If it is, the variable likely has (1)________________. If it is not reasonable to assume that, the variable likely has (2)_________________.
(1) random effects
(2) fixed effects
Are there enough levels of the factor in the data frame on which to base an estimate of the variance of the population effects? If yes, the factor likely has (1)________________. If no, the factor likely has (2)___________________.
(1) random effects
(2) fixed effects
Are the factor levels of the explanatory variable informative? If they are, we are likely dealing with ____________________.
fixed effects
Are the levels of a variable just numeric labels? If they are, the variable probably has _____________________.
random effects
Consider the four variables below:
(1) sex assigned at birth (M/F)
(2) the colors on a rose (red/white)
(3) possible Summer temperatures to the nearest whole degree Fahrenheit (integer values, 78 ≤ x ≤ 111)
(4) one strain of ‘B. bassiana’ versus another possible strain (ANT-03 vs. GHA)
Would we expect fixed effects or random effects from these variables?
fixed effects would be expected
Consider the four variables below:
(1) day selected at which to perform a certain collection of data given the demands of the researcher’s schedule
(2) site selected at which to collect data
(3) responses from different members of the same household
(4) the nesting attempts by the same bird
Would we expect fixed effects or random effects from these variables?
random effects from these variables would be expected
What do we call flexible linear models which allow the response variable to have non-normal error terms?
generalized linear models (GLMs)
What are four types of distribution commonly seen in generalized linear models?
normal
binomial
Poisson
gamma
What makes GLMs more useful in some respects than simpler linear models, even if it also makes them more conceptually challenging?
GLMs can unify several types of data analysis into the same model framework
Most of the statistics we have learned in “Intro to Stats” is referred to typically as what?
frequentist statistics
What is another name for frequentist statistics, albeit one which is slightly less descriptive?
classical statistics
How can the main idea of frequentist statistics be summarized?
uncertainty is considered in terms of how a statistic would be expected to behave under repeated sampling
What is the desired end product of the repeated-sampling reasoning used in frequentist statistics?
obtaining a P-value, which is judged against the significance level that matches the desired confidence level
What is the most common threshold for the P-value, which directly reflects the most common desired confidence level?
the most common threshold (alpha-level) is 0.05, which corresponds to a 95% confidence interval
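The "repeated sampling" idea behind a 95% confidence interval can be made concrete by simulation: draw many samples from a population with a known mean, build an interval from each, and count how often the interval contains the true mean. The sketch assumes a normal population with known standard deviation, and every number in it is made up.

```python
import random
from statistics import NormalDist, mean

def ci_coverage(true_mean=10.0, sigma=2.0, n=25, level=0.95,
                n_repeats=2000, seed=7):
    """Fraction of repeated samples whose interval contains the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(n_repeats):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        sample_mean = mean(sample)
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / n_repeats

coverage = ci_coverage()
print(coverage)  # close to 0.95, up to simulation noise
```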
What development has allowed Bayesian statistics to become more popular in the 21st century than it was in the early 20th century and beforehand?
advent of cheap computing technologies
What is it called when we run a linear model knowing that the errors are not normally distributed, but rely on having enough data to compensate for that shortcoming?
asymptotic theory
What is the main advantage that Bayesian statistics has over the premade frequentist models?
Bayesian statistics frees the researcher to devise a model which best matches the problem at hand, rather than forcing the data into one of the premade statistical models like ANOVA or T-tests, with all of the assumptions those models require
What is the primary disadvantage of using Bayesian models?
Bayesian statistics comes with the knowledge barrier of needing to write computer code and the material barrier of needing powerful computers
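For a feel of what a Bayesian model does, here is a sketch of about the simplest case: estimating a proportion with a Beta prior, where the updated (posterior) distribution has a closed form. This is an illustrative assumption, not an example from the course, and the counts are made up. Most realistic Bayesian models have no such shortcut and must be fitted by simulation, which is where the coding and computing demands come from.

```python
def update_beta_prior(prior_a, prior_b, successes, failures):
    """Posterior Beta parameters after observing binomial data."""
    return prior_a + successes, prior_b + failures

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

# Flat Beta(1, 1) prior, then 7 successes and 3 failures (made-up data)
post_a, post_b = update_beta_prior(1.0, 1.0, successes=7, failures=3)
posterior_mean = beta_mean(post_a, post_b)
print(post_a, post_b, posterior_mean)
```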