Exam 1 Flashcards

1
Q

Science is the search for ______ through the accumulation of facts

A

truth

2
Q

True, actual state of things; reality, never truly known

A

Truth

3
Q

Measurement of truth, only known, perceived reality

A

Fact

4
Q

True or false: Truth can be known

A

False

5
Q

Why can’t truth be known?

A

Due to error

6
Q

What are the two types of error?

A
  1. Sampling error: we can never measure perfectly, so different samples give different estimates; it can be reduced but never fully eliminated
  2. Process error: other factors that contribute to a particular outcome
7
Q

Other factors that can influence an outcome are chalked up to “_____” or ______.

A

Error or noise

8
Q

Purpose of statistics?

A

Estimate truth from measurements of a sample; a tool to estimate truth from facts; a “statistic” is an estimate

9
Q

Accuracy vs Precision

A

Accuracy: how close an estimate is to truth
Precision: How repeatable an estimate is; how close measurements are to each other
Most procedures are accurate/precise
Tradeoff between accuracy and precision
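A minimal sketch of the distinction, using made-up numbers: accuracy is summarized here as bias (average estimate minus truth) and precision as the standard deviation of repeated estimates. `TRUTH`, `method_a`, and `method_b` are hypothetical values chosen for illustration only.

```python
import statistics

TRUTH = 10.0  # hypothetical true value (never known in practice)

# Two hypothetical sets of repeated estimates of the same quantity
method_a = [9.0, 11.0, 10.5, 9.5, 10.0]    # centered on truth, but spread out
method_b = [12.0, 12.1, 11.9, 12.0, 12.0]  # tightly clustered, but off target

def bias(estimates, truth):
    """Accuracy: distance of the average estimate from truth (0 is best)."""
    return statistics.mean(estimates) - truth

def spread(estimates):
    """Precision: sample standard deviation of the estimates (0 is best)."""
    return statistics.stdev(estimates)

for name, est in [("A", method_a), ("B", method_b)]:
    print(name, "bias:", round(bias(est, TRUTH), 3), "spread:", round(spread(est), 3))
```

In this toy data, method A is accurate but not precise, and method B is precise but not accurate.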

10
Q

5 steps of the Scientific Method

A
  1. Observation: field, previous studies, statistics, math models
  2. Hypothesis: generated by observation
  3. Experiment: collect data
  4. Analysis: data
  5. Conclusion: interpretation
11
Q

Scientific hypothesis is a statement of _____

A

Truth

12
Q

T/F: Predictions are specific to your experiment

A

True

13
Q

What is a variable?

A

Something that is “measured” or manipulated

14
Q

Dependent variable?

A

Y; depends on x; response

15
Q

Independent variable?

A

X; causes y; predictor

16
Q

A variable in which every number (any value within a range) is possible

A

Continuous variable

17
Q

Discrete or different group (Ex: drug or no drug; F or M)

A

Categorical variable

18
Q

Types of categorical variables?

A
  1. Binomial: 2 groups
  2. Nominal variables: ‘named’; or multinomial: greater than 2 groups
  3. Ordinal: ranked: low, med, high
  4. Counts: # of species in habitat
19
Q
You hypothesize that controlled burns will increase the number of gopher tortoises in pine stands. You conduct an experiment in which you burn every 3 years in 3 experimental stands and don't burn at all in 3 control stands. At the end of 10 years, you measure the # of gopher tortoises in all 6 stands. What is the dependent variable in your study?
A. Stand
B. Gopher density
C. Burn/control
D. Time
A

B. Gopher density

20
Q
Continued; what type of variable is burn/control?
A. Ordinal
B. Continuous
C. Binomial
D. Count
A

C. Binomial

21
Q
Continued; What type of variable is gopher density? Note that the units of density are tortoises/km^2.
A. Continuous
B. Count
C. Ordinal
D. Binomial
A

A. Continuous

22
Q

Is this a statement of truth or fact: In a study, we found there were 127 deer living in Auburn city limits

A

Fact

23
Q

_____- used anytime you have a continuous x and continuous y

A

Regression

24
Q

Purpose of regression?

A

Estimate equation from data to get slope

25
Q

What is the linear model equation?

A

yi = B0 + B1*xi + ei, where ei ~ N(0, s) (error is normally distributed with mean 0 and standard deviation s)
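A rough sketch of what the equation describes: data are generated as intercept plus slope times x plus normal error, and a least-squares fit then estimates B0 and B1 from the data. The function names `simulate` and `fit_line` and all parameter values below are illustrative assumptions, not part of the course material.

```python
import random

def simulate(b0, b1, s, xs, seed=1):
    """Generate y_i = b0 + b1*x_i + e_i with e_i ~ N(0, s)."""
    rng = random.Random(seed)
    return [b0 + b1 * x + rng.gauss(0, s) for x in xs]

def fit_line(xs, ys):
    """Least-squares estimates of the intercept (B0) and slope (B1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

xs = [float(i) for i in range(50)]
ys = simulate(b0=2.0, b1=0.5, s=1.0, xs=xs)  # hypothetical "truth"
b0_hat, b1_hat = fit_line(xs, ys)
print("estimated intercept:", b0_hat, "estimated slope:", b1_hat)
```

The estimates land near, but not exactly on, the values used to generate the data, which is the error the cards describe.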

26
Q

B0 is the ____

A

Intercept

27
Q

B1 is the ____

A

slope

28
Q

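What is a nuisance parameter?

A

A parameter that is not of direct interest but must be estimated for the model to work correctly (ex: the error standard deviation s)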

29
Q

What is the Principle of Parsimony?

A

Assume no relationship unless data says otherwise

Simplest explanation is best

30
Q

Null hypothesis testing

A
  1. Formulate null hypothesis: a statistical hypothesis, written as an equation, that specifies the expected outcome of the statistical procedure; the opposite of the scientific hypothesis; specifies no relationship (ex: H0: #H = #T)
  2. Collect data
  3. Calculate probability of your outcome given/assuming Null is true
  4. Evaluate probability to make conclusions about H0
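The four steps can be sketched with a coin-flip null, H0: #H = #T (a fair coin). This is only an illustration: the counts are made up and `two_sided_p` is a hypothetical helper that sums exact binomial probabilities.

```python
from math import comb

def two_sided_p(heads, flips):
    """P(outcome at least as far from flips/2 as observed | fair coin)."""
    observed_gap = abs(heads - flips / 2)
    total = 0.0
    for k in range(flips + 1):
        # Count every outcome as extreme as, or more extreme than, the data
        if abs(k - flips / 2) >= observed_gap:
            total += comb(flips, k) * 0.5 ** flips
    return total

# Step 2: hypothetical data, 16 heads in 20 flips
p = two_sided_p(heads=16, flips=20)
# Steps 3 and 4: compare the probability to the 0.05 cutoff
print("p =", round(p, 4), "-> reject H0" if p < 0.05 else "-> fail to reject H0")
```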
31
Q

Regression:

H0: slope = ?

A

0

32
Q

If p is less than 0.05?

A

Reject null hypothesis; scientific hypothesis was supported; there is a relationship

33
Q

If p is greater than 0.05

A

Fail to reject H0; scientific hypothesis was not supported

34
Q

What is the p value?

A

Probability of getting the observed outcome (or something more extreme) given that the null hypothesis is true

35
Q

What are the 3 influences on p value in regression?

A
  1. Actual slope- “effect”; greater slope, smaller p value
  2. Noise or error- increase noise, bigger p value
  3. Sample size- increase n, decrease p value
36
Q

Rules for p values

A
  1. If p is really small, report
  2. Number- always report 2 nonzero values after the decimal
  3. Scaling- scale your x units, then scale beta by same amount
  4. Always report results regardless of p
37
Q

2 attributes to get

A
  1. Sum of errors = 0
  2. Sum of squared errors is as small as possible

38
Q

How is p value calculated?

A

Calculated by partitioning total variation in y:
Partition into:
1. Sum of squared errors
2. Sum of squares due to regression (variation in y due to variation in x)
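A small sketch of the partition on made-up data, checking that total variation in y splits into SSR plus SSE for a least-squares line. The helper `partition` and the data are assumptions for illustration. The last line forms the F ratio, (SSR/1) / (SSE/(n-2)); the p value is then read from an F distribution, which is not computed here.

```python
def partition(xs, ys):
    """Return (TSS, SSR, SSE) for a least-squares line through the data."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * x for x in xs]
    tss = sum((y - y_bar) ** 2 for y in ys)               # total variation in y
    ssr = sum((f - y_bar) ** 2 for f in fitted)           # explained by x
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # left over as error
    return tss, ssr, sse

xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical data
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
tss, ssr, sse = partition(xs, ys)
f_ratio = ssr / (sse / (len(xs) - 2))
print("TSS:", tss, "SSR:", ssr, "SSE:", sse, "F:", f_ratio)
```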

39
Q

If SSR/SSE is large:

A

Very small p

40
Q

If SSR/SSE is small

A

Very big p

41
Q

If SSR = 0, then SSE = ?

A

TSS

42
Q

Best fit is found by?

A
  1. Minimizing squared error, with the errors summing to 0
  2. Partitioning total variation in y

43
Q

What are some errors in hypothesis testing?

A
  1. Type I error: reject H0 when it is true; happens 5% of the time when H0 is true (with a 0.05 cutoff)
  2. Type II error: fail to reject H0 when it is false
44
Q

What do we mean by assumptions?

A

Assumptions made about the data in order to calculate probability
Regression is robust to violations of assumptions: even if violated, this rarely influences results
p values are generally affected more

45
Q

What are the 5 assumptions and what do they look like graphically?

A
  1. Continuous y variable: not categorical
  2. Error is normally distributed. Violation seen as unusually distributed error, not symmetrical around the line
  3. Linear relationship between x and y
  4. Homoscedasticity: st. dev. of error is the same across all x. Violation seen as error variance that changes along the line (ex: low in one region, high in another)
  5. Independent samples: no autocorrelation! Violation seen as a “path” in the errors
46
Q

What are the 2 sources of autocorrelation?

A
  1. Spatial: ex: pollution in river
  2. Temporal: ex: height/age every 3 days

47
Q

What is kurtosis?

A

Pinched/bulging sides of bell curves but normally distributed. Cause? Points close to line at low x, opposite at high x

48
Q

Experiment where you want to know the effect of understory plant density on gopher density. Over 30 plots you measure understory density and estimate gopher density. What is the null hypothesis?
A. %gophers=%understory density
B. understory=gophers
C. Change in gopher density divided by change in understory=0
D. As plant density increases, gopher density increases

A

C. Change in gopher density divided by change in understory=0

49
Q

You found a p of 0.01. What does that mean for the null?

A

Reject

50
Q

What are the 2 components when partitioning the variation in y?

A

Sum of squared error and sum of squares due to regression

51
Q

What 3 things do you typically want out of a regression?

A
  1. Slope- relationship of x and y
  2. P-value- probability of getting the observed slope (or something more extreme) even though the real slope is 0
  3. r^2- measure of strength of relationship
52
Q

Definition of slope

A

How much y changes for each unit change in x

53
Q

R^2?

A

Proportion of variation in y “caused” by x

0 <= r^2 <= 1

54
Q

High r^2 means?

A

Strong relationship; variation in y is largely driven by variation in x

55
Q

Low r^2 means?

A

More error

56
Q

If r^2=1

A

ALL variation in y is explained by x

57
Q

If r^2=0

A

All variation in y is error

58
Q

R^2 can be found by?

A

SSR/TSS

59
Q

Calculation of p?

A

TSS: total variation in y
SSR: variation in y explained by x
SSE: variation in y not explained by x (error)
p is calculated from the ratio SSR/SSE: the larger the ratio, the smaller the p

60
Q

T/F R^2 is influenced by sample size

A

False! P is!

61
Q

What are measures of uncertainty?

A
  • Standard deviation- measure of variation in data
  • About 68% of all data is within 1 st. dev. (for normally distributed data)
  • About 95% of all data is within 2 st. dev.
  • About 99.7% of all data is within 3 st. dev.
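As a quick check, and assuming the data are normally distributed, the share of data within k standard deviations of the mean can be computed from the error function in Python's standard library. `within_k_sd` is a hypothetical helper name.

```python
from math import erf, sqrt

def within_k_sd(k):
    """P(|Z| < k) for a standard normal variable Z."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, "st. dev.:", round(within_k_sd(k), 4))
```

This prints roughly 0.6827, 0.9545, and 0.9973 for 1, 2, and 3 standard deviations.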
62
Q

The confidence intervals are?

A

95% of all such intervals contain truth

63
Q

Rule: If p=0.05, then one end of the 95% C.I. must be?

A

0

64
Q

Increased n = ?

A

Decreased C.I.

65
Q

Decreased st. dev.

A

Decreased C.I.

66
Q

Ex: If p is less than 0.05, then the 95% CI half-width is less than the slope, then ?

A

0 is not contained in the interval

67
Q

If p is greater than 0.05, then the 95% CI half-width is greater than the slope, then ?

A

0 will be contained in the interval

68
Q

T/F: Slope must be contained in the C.I.

A

True

69
Q

What is extrapolation?

A

Making predictions outside the observed x or y values

70
Q

What is interpolation?

A

Making predictions within range of observed values- good!

71
Q

What are prediction intervals?

A

Measures of uncertainty that capture the data or outcomes

72
Q

C.I. vs P.I.

A

CI: uncertainty in estimates/averages
PI: uncertainty in data/outcomes

73
Q

What test do you use with a continuous y but a categorical x?

A

t-test

74
Q

What is the B1 then?

A

Difference in average y between the two groups of the categorical x

75
Q

In regression, H0: slope=0

In Binomial x: ?

A

H0: slope=0

Difference between groups=0

76
Q

T/F: We report r2 in our binomial data

A

False

77
Q

P-value

A

Probability of getting the observed data or something more extreme, given that the null hypothesis (H0) is true

78
Q

Difference between statistical significance vs biological significance

A

Statistical significance: p is less than 0.05
deals with certainty of results
not about whether the effect is biologically meaningful
Biological significance: effect- observed importance
Stats cannot tell you this
large p does not always mean unimportant effect

79
Q

What are some problems with p values and null testing?

A
  1. P values say nothing about your data
  2. H0 are rarely true or even possible (except in manipulated experiments)
  3. p values are a function of:
    - effect size
    - sample size
80
Q

Ex: A big p value means?

A

1) No or small effect

2) Sample size too small

81
Q

Ex: A small p value means?

A

1) Big effect

2) Big sample size

82
Q

Johnson suggests…?

A

Report your confidence intervals with estimates of effect

83
Q

B0 in binomial tests mean?

A

Average y for the reference group

84
Q

B1x in binomial tests mean?

A

Difference in y between 2 groups (Reference and other group)
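A minimal sketch of why B0 and B1 take these meanings: code the reference group as x = 0 and the other group as x = 1, then fit an ordinary least-squares line. The group values and the helper `fit_line` are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares estimates of the intercept (B0) and slope (B1)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

control = [4.0, 5.0, 6.0]   # hypothetical reference group, coded x = 0
burned = [8.0, 9.0, 10.0]   # hypothetical other group, coded x = 1

xs = [0.0] * len(control) + [1.0] * len(burned)
ys = control + burned
b0, b1 = fit_line(xs, ys)

mean_control = sum(control) / len(control)
mean_burned = sum(burned) / len(burned)
print("B0:", b0, "reference mean:", mean_control)
print("B1:", b1, "difference in means:", mean_burned - mean_control)
```

With this coding the intercept matches the reference group's average and the slope matches the difference between the two group averages.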

85
Q

Group left out in binomial test is the ______?

A

Reference

86
Q

Probability of getting at least one type I error while running 6 tests is?

A

1 - ( ( 1 - 0.05 )^6)
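The formula can be evaluated directly. This sketch assumes the tests are independent, each uses a 0.05 cutoff, and every null hypothesis is actually true; `familywise_error` is a hypothetical helper name.

```python
def familywise_error(alpha, n_tests):
    """P(at least one Type I error) across independent tests when every H0 is true."""
    return 1 - (1 - alpha) ** n_tests

print(round(familywise_error(0.05, 6), 3))  # prints 0.265
```

So with 6 tests there is roughly a 26% chance of at least one false rejection, compared with 5% for a single test.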

87
Q

Is there an inflated risk of committing a type I error by running multiple t-tests?

A

Not really a problem if you use ANOVA instead: calculate a single p by testing the H0 that all groups are the same

88
Q

In ANOVA testing, if p is less than 0.05, reject H0 that all groups are the same but that doesn’t mean ____?

A

All groups are different! It means at least 2 are different

89
Q

Experimental design rules

A

1) Think about how you will analyze your data before you collect it
2) You must have replication- greater than 1 sample per treatment
3) Samples have to be representative (random, stratified, systematic)
4) Need good assignment of treatments
5) Samples must be independent- not too close together in space or time
6) With categorical variables- must have something to compare to (pre vs. post density)
7) With continuous x variables you should have representative x values
8) How many samples? As many as you can get!