Sub-topic 1: Variables, distributions and summary statistics Flashcards

Week 2

1
Q

Biologists make observations of (collect data on) selected
variables on a sample from the population to estimate
the value of one or more parameters of that
population.

A

Yes.

2
Q
Variable: any observable feature of the
natural world (e.g. number of limpets in
a quadrat, sex of a frog, moisture
content of a leaf). These are all variables
as they have the potential to vary
A

Yes.

3
Q

Population: the target group of interest in
the study. Can be finite (e.g. number of fish
in a pond) or infinite (e.g. number of fish in
the ocean)

A

Yes.

4
Q
Sample: we cannot practically count
every unit in a population, therefore we
sample a subset of a population and
attempt to draw inferences about the
entire population from this sample
A

Yes.

5
Q

Parameters: a parameter is some
characteristic of the distribution of the
variables in a population (e.g. the
average or variance of weights of fish in
a pond)

A

Yes.

6
Q

VARIABLE A variable is any observable feature of the natural world

A

Yes.

7
Q

DATUM A datum, or observation, is any one record of the state of a variable.

A

Yes.

8
Q

DATASET Any collection of observations made on a variable is a data set

A

Yes.

9
Q

POPULATION The set of all possible observations on a variable is the population

A

Yes.

10
Q

FINITE
POPULATION
Populations can be either finite or infinite. Finite populations have a finite, countable
number of elements and can, in theory at least, be completely sampled.

A

Yes.

11
Q

INFINITE
POPULATION
Infinite populations have an infinite number of elements and can never be completely
sampled.

A

Yes.

12
Q

SAMPLE Large and infinite populations cannot be observed in their entirety, so we take
(nearly always at random) a sample: a subset of the possible observations from the population.

A

Yes.

13
Q

PARAMETER A parameter is some characteristic of the distribution of the values of a variable in a
population.

A

Yes.

14
Q

STATISTIC The term “statistic” is used in two ways: to refer to the entire body of procedures for
dealing with data; or to refer to estimates of population parameters based on samples.

A

Yes.
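
The difference between a parameter and a statistic can be sketched in code. This is only an illustration: the pond of 500 fish, the weights and the sample size of 30 are all hypothetical values, and in practice the population mean is usually unknown.

```python
import random
import statistics

# Hypothetical finite population: weights (g) of all 500 fish in a pond.
rng = random.Random(42)  # fixed seed so the sketch is repeatable
population = [rng.gauss(200.0, 25.0) for _ in range(500)]

# Parameter: a characteristic of the whole population (usually unknown).
mu = statistics.fmean(population)

# Statistic: an estimate of that parameter from a random sample.
sample = rng.sample(population, 30)
x_bar = statistics.fmean(sample)

print(f"population mean (parameter) = {mu:.1f} g")
print(f"sample mean (statistic)     = {x_bar:.1f} g")
```

The sample mean will generally be close to, but not equal to, the population mean.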

15
Q

NOMINAL/
CLASSIFICATION
Features which can be classified into named groups, lacking
order

A

Yes.

16
Q

ORDINAL/
RANKING
Features which can be ranked in order

A

Yes.

17
Q

NUMERICAL/
QUANTITATIVE
Features which can be enumerated or quantified (counted or
measured), e.g. weight, number, temperature, counts of animals
Can be subdivided into
• Interval v Ratio: arbitrary zero and unit
(e.g. temperature [Celsius]) v true zero (e.g. weight)
• Discrete v Continuous: values which are whole
numbers (e.g. counts) v values which can be fractions
(e.g. weight)

A

Yes.

18
Q
Measures of location
• Mode: most common value
• Median: middle value
• Mean: average value
A

Yes.
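
A short sketch of the three measures of location, using Python's standard `statistics` module. The limpet counts are hypothetical data invented for the example.

```python
import statistics

# Hypothetical counts of limpets in nine quadrats.
limpets = [3, 5, 5, 6, 7, 8, 5, 12, 4]

mode = statistics.mode(limpets)      # most common value -> 5
median = statistics.median(limpets)  # middle value of the sorted data -> 5
mean = statistics.mean(limpets)      # arithmetic average -> 55 / 9, about 6.11

print(mode, median, round(mean, 2))
```

Here the single large count (12) pulls the mean above the median, which hints at a distribution with a longer right-hand tail.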

19
Q
Symbols
• Sample mean: x̄
• Population mean: μ
A

Yes.

20
Q
Measures of shape
• Variance: spread of distribution
• Skewness: skew of peak of distribution to one side of the mean
• Kurtosis: “peakedness” or “flatness” of distribution
A

Yes.

21
Q
Symbols
• Sample variance: s²
• Population variance: σ²
• Sample standard deviation: s
• Population standard deviation: σ
A

Yes.

22
Q

VARIANCE (s²): measures the dispersion of data around their mean value

A

Yes.

23
Q

NORMAL DISTRIBUTION: a symmetric distribution, often called a
bell curve, which describes many variables of the natural world,
e.g. height, weight, test scores in a very large class

A

Yes.

24
Q

STANDARD DEVIATION (s = √s²): the square root of the variance; a measure of
dispersion in the data, expressed in the same units as the mean

A

Yes.
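
A minimal sketch of how s² and s might be computed. It assumes the usual n − 1 divisor for a sample (the cards do not state the formula), and the leaf moisture values are hypothetical.

```python
import math

def sample_variance(values):
    """Sample variance s^2: sum of squared deviations from the mean, over n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def sample_sd(values):
    """Sample standard deviation s: the square root of the sample variance."""
    return math.sqrt(sample_variance(values))

# Hypothetical moisture content (%) of five leaves; the mean is 64.
leaf_moisture = [62.0, 65.0, 59.0, 70.0, 64.0]

s2 = sample_variance(leaf_moisture)  # (4 + 1 + 25 + 36 + 0) / 4 = 16.5
s = sample_sd(leaf_moisture)         # sqrt(16.5), about 4.06

print(f"s^2 = {s2:.2f}, s = {s:.2f}")
```

The variance is in squared units (%² here), whereas the standard deviation is back in the units of the original measurements.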

25
Q

SKEWNESS: measures the extent to which the distribution is
“pushed” to either side of the mean

A

Yes.

26
Q

KURTOSIS: measures the “peakedness” of the distribution

A

Yes.
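
Skewness and kurtosis can be sketched from the central moments of the data. Several slightly different definitions are in use; the version below is the simple moment-based form (dividing by n), which is an assumption rather than necessarily the formula used in this unit. The two small data sets are invented for illustration.

```python
def central_moment(values, k):
    """k-th central moment: the mean of (value - mean) ** k."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** k for v in values) / n

def skewness(values):
    """Moment-based skewness: 0 for symmetric data, > 0 for a longer right tail."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a normal distribution scores about 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0

symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 10]

print(skewness(symmetric))         # 0.0 (no skew)
print(skewness(right_tailed) > 0)  # True (long right-hand tail)
print(excess_kurtosis(symmetric))  # about -1.3 (flatter than a normal curve)
```

With this convention, negative excess kurtosis indicates a flatter distribution and positive excess kurtosis a more peaked one.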

27
Q

Kurtosis prefixes (mesokurtic, leptokurtic, platykurtic):
• meso – middle, intermediate, halfway
• lepto – small, fine, thin, delicate
• platy – broad, flat

A

Yes.

28
Q
Accuracy
• How close the estimate is to the
true value
• A biased method gives estimates
which differ consistently from
the true value
• Cannot be determined from the
data
A

Yes.

29
Q
Precision
• How close repeated estimates
are
• Precision can be determined
from the data (standard error,
confidence interval)
A

Yes.
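
As a sketch of how precision can be determined from the data, the standard error of the mean is s / √n. The interval below uses the normal multiplier 1.96 as a rough approximation; for small samples a t multiplier would be more appropriate, so treat it as illustrative only. The data are hypothetical.

```python
import math
import statistics

def standard_error(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))

def approx_ci95(values):
    """Rough 95% interval: mean +/- 1.96 * SE (normal approximation, an assumption)."""
    centre = statistics.fmean(values)
    half_width = 1.96 * standard_error(values)
    return centre - half_width, centre + half_width

# Hypothetical moisture content (%) of five leaves.
leaf_moisture = [62.0, 65.0, 59.0, 70.0, 64.0]

se = standard_error(leaf_moisture)   # sqrt(16.5 / 5), about 1.82
low, high = approx_ci95(leaf_moisture)
print(f"SE = {se:.2f}, approx. 95% CI = ({low:.1f}, {high:.1f})")
```

A smaller standard error (and so a narrower interval) means repeated estimates would be expected to lie closer together.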

30
Q
Accuracy – how true (unbiased) is the
result?
• Requires attention to the methods
of sample selection and
measurement
• Ensure no bias in instruments or
techniques
• Calibrate, ground-truth or similar
• Randomly select samples
A

Yes.

31
Q
Precision – how variable is the result?
• Requires attention to sampling
design and effort
• Vary the number of samples taken
• Vary the size of the sampling unit
• Vary the arrangement of the
sampling units
A

Yes.

32
Q
Aim
• To estimate the lead content of oysters
with a SE (standard error) of 8 ppm
• Prior sampling indicates that s²
(variance) is usually about 900 ppm²
Method
• Calculate SE = √(s²/n) for varying n and use
the number which gives the desired SE
> this is about 14 (900/8² ≈ 14.06)
A

Yes.
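
The oyster calculation can be sketched as follows, assuming SE = √(s²/n) and using the values on the card (s² = 900, target SE = 8 ppm). The function names are made up for this example.

```python
import math

def standard_error(variance, n):
    """Standard error of the mean for a given variance and sample size."""
    return math.sqrt(variance / n)

def required_n(variance, target_se):
    """Smallest whole n whose standard error does not exceed target_se."""
    return math.ceil(variance / target_se ** 2)

variance = 900.0   # s^2 from prior sampling
target_se = 8.0    # desired standard error (ppm)

# Tabulate SE for a range of sample sizes, as the card's method suggests.
for n in range(10, 19, 2):
    print(n, round(standard_error(variance, n), 2))

print(required_n(variance, target_se))  # 15
```

Solving directly gives n = 900 / 64 ≈ 14.06, so n = 14 gives an SE just over 8 ppm (about 8.02) and n = 15 is the first sample size that meets the target exactly.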

33
Q

Recap: Take home messages
• Variables are any feature of the world – divided into 3 main groups:
• NOMINAL/CLASSIFICATION variables can be classified into named groups
e.g. sex, colour, habitat type etc.
• ORDINAL/RANKING variables can be ranked in order e.g. social position,
size-class, education level etc.
• NUMERICAL/QUANTITATIVE variables can be enumerated or quantified
(counted or measured), e.g. height, weight etc.

A

Yes.

34
Q

• Variable distribution can be graphically represented in frequency
histograms showing the ‘shape’ and ‘spread’ of the data

A

Yes.
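
A frequency histogram can be sketched in plain text by counting observations per bin. The shell lengths and the bin width of 10 mm are hypothetical, and the sketch assumes whole-number data; bins with no observations are simply not shown.

```python
from collections import Counter

def frequency_table(values, bin_width):
    """Count observations per bin; each bin is labelled by its lower edge."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

def text_histogram(values, bin_width):
    """One line per non-empty bin, with one '#' per observation."""
    lines = []
    for lower, count in frequency_table(values, bin_width).items():
        lines.append(f"{lower:>4}-{lower + bin_width - 1:<4} {'#' * count}")
    return "\n".join(lines)

# Hypothetical shell lengths (mm) of 14 limpets.
lengths_mm = [12, 15, 17, 21, 22, 22, 24, 25, 27, 28, 31, 33, 38, 46]

print(frequency_table(lengths_mm, 10))  # {10: 3, 20: 7, 30: 3, 40: 1}
print(text_histogram(lengths_mm, 10))
```

The row of '#' marks shows the peak in the 20-29 mm bin and a thinner tail to the right.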

35
Q

Summary statistics can be broadly divided into measures of location
(e.g. mean, median and mode) and measures of shape (e.g. variance
and standard deviation)

A

Yes.

36
Q

Shape of distribution is vital in choosing appropriate statistical tests

A

Yes.

37
Q

To be reliable, observations and estimates should be accurate (not
showing bias) and precise (not too variable)

A

Yes.

39
Q
Sampling schemes: advantages, disadvantages and uses
Scheme: Simple random (SR)
• Advantages: Usually simple to use
• Disadvantages: Provides limited information, probably not efficient or precise
• Uses: Pilot studies, simple studies
Scheme: Stratified
• Advantages: Usually provides more precise results than other methods; provides more information than SR
• Disadvantages: More complex to run and analyse; may take more time to sample
• Uses: Situations where an area, or population, can be divided into homogeneous strata; testing hypotheses
Scheme: Cluster
• Advantages: When the situation is suitable, this scheme is likely to be more efficient; provides more information than SR
• Disadvantages: More complex to run and analyse; may be less precise than stratified sampling
• Uses: Situations where items of interest are naturally grouped in clusters; can be used to test some types of hypotheses
Scheme: Systematic
• Advantages: Usually simple to use
• Disadvantages: Unless done carefully, may provide biased estimates
• Uses: Drawing maps and similar situations
A

Yes.
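
The four schemes can be sketched with Python's `random` module. This is a simplified illustration, not a field protocol: the 60 numbered quadrats, the three shore-zone strata and the rock-pool clusters are all hypothetical, and the cluster version shown is the one-stage form (every unit in each chosen cluster is taken).

```python
import random

def simple_random(units, n, rng):
    """Every unit has the same chance of selection."""
    return rng.sample(units, n)

def stratified_random(strata, n_per_stratum, rng):
    """Take a separate random sample within each stratum."""
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters, then take every unit in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

def systematic(units, step, rng):
    """Random starting point, then every step-th unit."""
    start = rng.randrange(step)
    return units[start::step]

rng = random.Random(1)  # fixed seed so the sketch is repeatable
quadrats = list(range(60))
strata = {"low shore": quadrats[:20], "mid shore": quadrats[20:40], "high shore": quadrats[40:]}
clusters = {"pool A": [0, 1, 2], "pool B": [10, 11], "pool C": [30, 31, 32, 33], "pool D": [50]}

print(simple_random(quadrats, 6, rng))
print(stratified_random(strata, 2, rng))
print(cluster_sample(clusters, 2, rng))
print(systematic(quadrats, 10, rng))
```

Note that the systematic sample is random only in its starting point, which is why it can give biased estimates if the spacing happens to match a pattern in the habitat.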

40
Q

There is no one-size-fits-all approach to sampling; the choice depends on, e.g.:
• Cost/benefit → pilot studies useful in this regard
• Accuracy and precision trade-offs
• The study focus and ramifications
Three main sampling schemes (and a special fourth case):
• Simple random
• Stratified random
• Cluster
• (Systematic)
Models may be developed from observations and tested by
• Sampling (mensurative) experiments [more general results];
or,
• Manipulative experiments, generally trickier but can provide
much more explicit tests of mechanisms

A

Yes.

41
Q

Use a balanced design when possible
• Balanced designs have equal numbers of replicates in all treatments
• Analysis is usually easier
• More readily meet the assumptions necessary for some tests
• May conflict with the requirements of a stratified sampling scheme

A

Yes.

42
Q

Use a multifactor design when possible
• These designs are usually more efficient (more powerful for same
effort)
• These designs are usually more informative
• Ensure that all combinations of treatments are included
• This is referred to as an orthogonal design
• Non-orthogonal designs can be difficult to analyse

A

Yes.

43
Q

REPLICATION → DO IT!
• Studies must be replicated in order to draw correct inferences
• Avoiding pseudoreplication is imperative and requires close
attention

A

Yes.

44
Q

CONFOUNDING FACTORS
• To be avoided at all costs
• Makes it very difficult, if not impossible, to test hypotheses

A

Yes.

45
Q

RANDOM & INDEPENDENT
• Random samples alleviate bias and maintain independence
• Non-independence can lead to incorrect conclusions
• Non-independent sampling may be used if it is an inherent part of the study and is
appropriately accounted for

A

Yes.

46
Q

BALANCED & COMPLETE (ORTHOGONAL)
• Usually provide more/better information
• Easier to analyse in many instances
• Unbalanced can be accommodated, incomplete not so much

A

Yes.

47
Q

Test Statistics
• Theoretical distributions based on sample data for varying sample sizes
(degrees of freedom)

A

Yes.

49
Q

Multiple pairwise tests (3+ means)
• DON’T DO IT
• Greatly inflate Type I error rate (≈ 5% per comparison)
• ANOVA
• Use when comparing 3+ means
• Controls α at 0.05 (5%) for the entire procedure
• Assumptions
• Independent, normal, equal variance, additive
• Check assumptions graphically and using Cochran’s test
*generally robust to violations of normality and equal
variance assumptions

A

Yes.
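
The inflation of the Type I error rate can be sketched numerically. The calculation below assumes the pairwise tests are independent, which is a simplification (comparisons that share a mean are not strictly independent), so the figures are indicative only.

```python
from math import comb

def familywise_error(n_means, alpha=0.05):
    """Number of pairwise tests and P(at least one Type I error), assuming independence."""
    n_tests = comb(n_means, 2)
    return n_tests, 1.0 - (1.0 - alpha) ** n_tests

for k in (3, 4, 5):
    n_tests, rate = familywise_error(k)
    print(f"{k} means: {n_tests} pairwise tests, error rate about {rate:.3f}")
# 3 means -> 3 tests, about 0.143
# 4 means -> 6 tests, about 0.265
# 5 means -> 10 tests, about 0.401
```

Even with only three means the overall error rate is nearly three times the nominal 5%, which is the motivation for using a single ANOVA instead.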

50
Q

Post-hoc multiple comparisons
• ANOVA identifies significant result but doesn’t tell you WHERE
• Use post-hoc tests, e.g. Tukey’s HSD, to determine which means
differ

A

Yes.

51
Q
Sub-topic 5: Two-factor ANOVA
Design
• Two rivers were sampled: one with pollutant
released in upper reaches; the other was the
closest similar unpolluted control
• Samples were taken in the upper reaches of each
river, at the mouth, and about half-way between
• 3 water samples were collected at each
combination of river and section
• The number of plankton was counted and
pollutants measured
Factors
• River (Pollution): Polluted, Control
• Section (Area): Upper, Middle, Lower
Replicates
• Water samples: 3
Variables
• Number of plankton
• Pollutants
A

Yes.
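
The layout of this design can be sketched by listing every combination of the two factors and attaching the replicates. The factor levels and the replicate count come from the card; the variable names are made up for the example.

```python
from itertools import product

rivers = ["Polluted", "Control"]
sections = ["Upper", "Middle", "Lower"]
replicates = 3

# Every river x section combination (a crossed, or orthogonal, layout).
cells = list(product(rivers, sections))

# One entry per water sample: (river, section, replicate number).
samples = [
    (river, section, rep)
    for river, section in cells
    for rep in range(1, replicates + 1)
]

print(len(cells))    # 6 treatment combinations
print(len(samples))  # 18 water samples in total
```

Because every combination is present with the same number of replicates, the design is both orthogonal and balanced.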

52
Q
Arrangement and number of factors:
• Stratified (crossed/orthogonal) – all levels of one factor are present
with all levels of the other factor(s)
• Cluster (nested/hierarchical) – some levels of one factor are present
only at some levels of the other factor(s)
Selection and number of replicates:
• Random – the replicates in each subgroup
are randomly and independently selected
• Repeated measures – the replicates in some subgroups are
the same as replicates in other subgroups
Selection and number of levels:
• Fixed – specific levels are chosen from the
range available (or all available levels are used)
• Random – the levels in the study are randomly
selected from those available and not all
available are used
These affect:
- the complexity of the design
- the appropriate model for the analysis
- the power of tests for different effects
A

Yes.

53
Q
Correlated variables vary together
Parametric (Pearson’s)
• For numerical or quantitative
variables
• r measures the closeness of the
relationship
• Correlation analyses linear
relationships
• Non-linear relations need other
methods
Non-parametric (Spearman’s rank)
• Non-parametric correlation is used
when one or both variables are ordinal
• May be useful when the assumptions
of parametric analyses do not hold
A

Yes.
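
The two coefficients can be sketched in plain Python. Spearman's coefficient is computed here as Pearson's r applied to the ranks, with tied values sharing their average rank; the data are invented to show a relationship that is perfectly monotonic but not linear.

```python
import math

def pearson_r(x, y):
    """Pearson's r: strength of the linear relationship between x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # y = x squared: monotonic but curved

print(round(pearson_r(x, y), 3))  # 0.981
print(spearman_rho(x, y))         # 1.0
```

Pearson's r falls short of 1 because the relationship is curved, while the rank-based coefficient is exactly 1 because y always increases with x.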

54
Q
Assumptions of parametric correlation
Normality of observations
• The observations on each of the
variables are assumed to be normally
distributed
Linearity of relationship
• The relationship is assumed to be
linear (a straight line)
Checking assumptions
Normality of observations
• With many observations (>40) can
plot frequency distribution
• With fewer observations can do
normal probability plots (not
discussed in this unit)
Linearity of relationship
• Graph the data!
A

Yes.

55
Q

A valid test for a correlation requires the following:
• Quantitative observations: if one or both variables are ranking, or ordinal,
variables, use the non-parametric correlation coefficient (Sub-Topic 2)

A

Yes.

56
Q

Independent observations: the selection of one point (e.g. animal) must not
influence whether or not any other point is selected. Analyses of non-independent observations may be unreliable

A

Yes.

57
Q

Observations bivariate normal: this means that the two variables must be
normally distributed. If one or both variables are not normally distributed it
may be possible to transform them so that they are (Sub-Topic 4)

A

Yes.

58
Q

Linear relationship: the correlation coefficient measures only the degree of
linear relationship. If the relationship is not linear it may be possible to
transform one or both variables so that it is (Sub-topic 4)

A

Yes.
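
As a sketch of how a transformation can linearise a relationship, the example below applies a log10 transform to both variables. The choice of a log transform and the power-law data (y = 2x³, with no noise) are assumptions made purely for illustration; the appropriate transformation depends on the data.

```python
import math

def pearson_r(x, y):
    """Pearson's r: strength of the linear relationship between x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical curved relationship: y = 2 * x^3.
x = [1, 2, 4, 8, 16]
y = [2 * v ** 3 for v in x]

r_raw = pearson_r(x, y)

# After taking logs, log(y) = log(2) + 3 * log(x), which is a straight line.
log_x = [math.log10(v) for v in x]
log_y = [math.log10(v) for v in y]
r_log = pearson_r(log_x, log_y)

print(f"r on raw data:        {r_raw:.3f}")
print(f"r on log-transformed: {r_log:.3f}")
```

The coefficient on the transformed data is essentially 1 because the transformed relationship is exactly linear, whereas r on the raw data understates how tightly the two variables are related.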