Lab 5 Flashcards
Ecological data,
whether collected in the lab or in the field, has the distinction of being abundant and naturally varied.
Many abiotic and biotic variables must be examined or controlled so that a given variable can be assessed in various treatments or situations.
Environmental variables can influence natural variation.
If you only look at raw data, it can be difficult to see whether true differences exist between biological populations or variables.
Statistics are necessary to objectively assess if differences or relationships exist between biological populations and to evaluate ecological hypotheses.
It is rare that an entire population can be sampled.
Samples are the logistical solution to this problem, but remember that a sample introduces uncertainty because it is only an estimate of the whole statistical population.
There is also inherent variability within biological populations due to genetic differences among individuals.
On a larger scale, favorable environments can result in pockets of individuals that have limited contact with each other and may show group differences that do not prevent interbreeding.
There are many potential hypotheses relating to the response of individuals and communities to the environment, and because individuals respond variably, we need statistics to determine whether an observed effect goes beyond the natural variation.
Organizing
- After you have downloaded the dataset, open it in Excel. Look for the summary or treatment data.
Remember to be sure there is a column identifier for each column.
In Excel, the first row is often used as the row for headings, and the following rows each represent one replicate (in this case, one quadrat).
- After you have looked at the data, consider the sample size in this data set (in other words, how many samples do you have?). If you wanted a more accurate estimate of the variation in a measured variable, what would you do?
Once you have worked through the descriptive statistics with the data, you should take a moment and think about which statistical tests you need to perform on your data to evaluate your hypotheses for your Scientific Article.
quantitative (measurement) data;
values on a numerical scale. Data in this category may exist either as discontinuous data (also called discrete data), e.g. number of individuals, or as continuous data, e.g. soil moisture.
qualitative (nominal) data;
where you as the experimenter determine a classification, set of categories or attributes and record outcomes as counts or frequencies placed in your categories (for example: no defoliation, slight defoliation, etc.).
Descriptions of patterns, gender, and visible differences between groups are good examples of nominal data descriptions.
Why should you care about what type of data you collect?
The graph, data summary, and statistical test will be different for different types of data.
Graphing with measurement data is done as a line or bar graph with error bars, or a scatter plot. Data summary is done as averages and an associated measure of the population variation.
With nominal data, graphing may be done as a bar graph, but there will be no error bars or measure of variation (number of females compared to males in a sample does not have variability to report). Data summary consists of reporting the data gathered as frequencies or proportions (including percentages) for each of the categories.
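As a minimal sketch of the nominal data summary described above, the Python snippet below tallies categories and reports each one as a count and a proportion. The defoliation scores and the function name `summarize_nominal` are made up for illustration and do not come from the lab dataset.

```python
from collections import Counter

# Hypothetical defoliation scores for ten plants (illustrative values only).
observations = [
    "none", "slight", "slight", "none", "severe",
    "slight", "none", "none", "slight", "none",
]

def summarize_nominal(values):
    """Return {category: (count, proportion)} for a list of category labels."""
    counts = Counter(values)
    total = len(values)
    return {category: (n, n / total) for category, n in counts.items()}

summary = summarize_nominal(observations)
for category, (count, proportion) in summary.items():
    # Nominal data are reported as frequencies and percentages, with no error bars.
    print(f"{category}: {count} ({proportion:.0%})")
```

Note that the output has one number per category and nothing that plays the role of a standard deviation, which is why a bar graph of these counts carries no error bars.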
Normality
One important use of statistics is that they can determine (with some degree of confidence) how similar or different two samples really are.
parametric statistics:
Parametric statistics make assumptions about the distribution of the data being tested, namely that a frequency histogram of the measured data conforms to a theoretical Gaussian (normal) curve.
normal curve distribution
In this type of distribution, observations are grouped symmetrically about the mean.
The shape of the distribution is such that
68.3% of the observations (or area under the curve) are within one standard deviation of the mean,
95.4% are within two standard deviations,
and 99.7% are within three standard deviations of the mean
(remember that the area under the bars of a histogram is proportional to the frequency of observations).
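The percentages on this card can be checked numerically. The sketch below uses the standard result that the fraction of a normal distribution within k standard deviations of the mean equals erf(k / sqrt(2)); the function name `fraction_within` is an illustrative choice, not part of the lab materials.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    # Formats the fraction as a percentage with one decimal place.
    print(f"within {k} SD of the mean: {fraction_within(k):.1%}")
```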
Many, but not all, types of biological data are similar to a normal distribution. Before making this assumption, a frequency histogram of the data should be drawn, and its shape examined.
non-parametric statistics
which have no assumptions about the data conforming to some theoretical distribution (such as the χ² test, which we used previously).
For most parametric statistical tests there is a non-parametric equivalent test.
Method for assessing Normality
Now that you have downloaded your data, you can examine your data to assess the normality of your variables.
For our purposes, if a distribution has a single mode and it is relatively symmetrical, the normal distribution will be assumed.
If you follow the histogram procedures and discover that your data are highly skewed, then you will need to mention this in the discussion of your scientific article (i.e., mention that you would recommend reanalyzing your data using a non-parametric test or transforming your data).
Luckily for us, many parametric procedures (such as the t-test) are robust to departures from normality, so only extreme departures would have an effect.
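The histogram check described above is normally done in the spreadsheet, but the idea can be sketched in Python as well. The snippet below is only an illustration: the soil moisture values, the choice of four bins, and the function name `histogram_bins` are all assumptions made for the example.

```python
# Hypothetical soil moisture readings (illustrative values only).
moisture = [12, 14, 15, 15, 16, 16, 16, 17, 17, 18, 20]

def histogram_bins(values, n_bins):
    """Return a list of (bin_start, bin_end, count) for equal-width bins."""
    low, high = min(values), max(values)
    if low == high:
        raise ValueError("all values are identical, so no histogram can be drawn")
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The largest value would fall just past the last bin, so clamp it in.
        index = min(int((v - low) / width), n_bins - 1)
        counts[index] += 1
    return [(low + i * width, low + (i + 1) * width, counts[i])
            for i in range(n_bins)]

bins = histogram_bins(moisture, 4)
for start, end, count in bins:
    # One '#' per observation gives a quick text histogram.
    print(f"{start:5.1f} to {end:5.1f} | {'#' * count}")
```

With these made-up values the bars rise to a single peak and fall away on both sides, which is the kind of single-mode, roughly symmetrical shape for which the normal distribution would be assumed.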
Descriptive statistics
types of analyses (calculation of mean, mode, median, standard deviation, variance, standard error and confidence intervals) are very useful to get a clearer picture of what quantitative/measurement data might be telling us about the variable of interest.
Figures (often graphs) help the reader to visualize the summarized data, and written descriptions of these statistics give a better picture of what is occurring in the experiment or in the ecosystem.
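The descriptive statistics named on this card can be sketched with Python's standard `statistics` module. The quadrat counts and the function name `describe` are hypothetical. The 95% confidence interval uses the large-sample multiplier 1.96, which is an assumption made to keep the example short; for a sample this small, a t-based multiplier would give a somewhat wider interval.

```python
import math
import statistics

# Hypothetical species counts from eight quadrats (illustrative values only).
quadrats = [4, 6, 5, 7, 5, 8, 5, 6]

def describe(values):
    """Return common descriptive statistics for a list of measurements."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample standard deviation (n - 1 denominator)
    se = sd / math.sqrt(n)          # standard error of the mean
    # Assumption: 1.96 is the normal-approximation multiplier for a 95% interval.
    half_width = 1.96 * se
    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "variance": statistics.variance(values),  # sample variance
        "sd": sd,
        "se": se,
        "ci95": (mean - half_width, mean + half_width),
    }

result = describe(quadrats)
for name, value in result.items():
    print(name, value)
```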
The most widely used average (or measure of central tendency) is the mean (x̄, read "x-bar").
One common descriptor reported for a variable (such as species diversity) is the average for each treatment or category (such as side of the river valley or treated and untreated areas of the pond).
Two other measures of central tendency are the median and the mode.
The median is the middle observation, above and below which half of the observations lie.
The mode is the most frequent observation made.
When would you use the median or the mode instead of the mean?
The median or mode is an appropriate measure of central tendency if you are working with data that are not normally distributed. (In a normal distribution, the most frequent observation is also in the middle of the range, so the mean, median, and mode coincide.)
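A small illustration of why the median or mode can be preferable for skewed data: in the made-up sample below, one very large value pulls the mean well above where most observations sit, while the median and mode stay put. The values are hypothetical and chosen only to make the effect obvious.

```python
import statistics

# Hypothetical right-skewed counts: most values are small, one is very large.
skewed = [1, 1, 2, 2, 2, 3, 3, 4, 25]

skewed_mean = statistics.mean(skewed)
skewed_median = statistics.median(skewed)
skewed_mode = statistics.mode(skewed)

# The single large value inflates the mean but not the median or the mode.
print(f"mean = {skewed_mean:.2f}, median = {skewed_median}, mode = {skewed_mode}")
```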
sample data
is variable between individual observations (for example, not all the quadrats have the same diversity).
If there were no variability within populations, there would be no need for statistics.
The simplest measure of variability is
The range.
This is the largest value minus the smallest value (often it is given by stating the largest and smallest value).
While simple to calculate, the range is limited in its usefulness because it gives no information about how the observations are distributed (are there more observations in the middle or at either end of the range?).
The range should be used when all that is required is the knowledge of the overall spread of the data, or when observations are too few or too scattered to warrant the calculation of a more precise measure of variability.
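The limitation of the range can be seen in a short sketch. The two made-up samples below have the same range, yet their observations are distributed very differently, which the standard deviation picks up and the range does not. The sample values and the function name `data_range` are illustrative assumptions.

```python
import statistics

def data_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

# Hypothetical samples with identical extremes (illustrative values only).
clustered = [2, 5, 5, 5, 5, 5, 8]   # most observations in the middle
spread_out = [2, 2, 2, 5, 8, 8, 8]  # most observations at the ends

for name, sample in (("clustered", clustered), ("spread_out", spread_out)):
    print(f"{name}: range = {data_range(sample)}, "
          f"SD = {statistics.stdev(sample):.2f}")
```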