Lab 5 Flashcards

1
Q

Ecological data,

A

whether collected in the lab or in the field, are characteristically abundant and naturally variable.

Many abiotic and biotic variables must be examined or controlled so that a given variable can be assessed in various treatments or situations.

Environmental variables can influence natural variation.

If you only look at raw data, it can be difficult to see whether true differences exist between biological populations or variables.

Statistics are necessary to objectively assess if differences or relationships exist between biological populations and to evaluate ecological hypotheses.

2
Q

It is rare that an entire population can be sampled

A

Samples are the logistical solution to this problem, but remember that they introduce uncertainty because a sample is only an estimate of the whole statistical population.

There is also inherent variability within biological populations due to genetic differences among individuals.

On a larger scale, favorable environments can result in pockets of individuals that have limited contact with each other and may have group differences that do not prevent interbreeding.

There are many potential hypotheses relating to the response of individuals and communities to the environment, and because individuals respond variably, we need to use statistics to determine whether an observed effect goes beyond the natural variation.

3
Q

Organizing

A
  1. After you have downloaded the dataset, open it in Excel. Look for the summary or treatment data.

Remember to be sure there is a column identifier for each column.

In Excel, the first row is often used as the row for headings, and the following rows each represent one replicate (in this case, one quadrat).

  2. After you have looked at the data, consider the sample size in this data set (in other words, how many samples do you have?). If you wanted a more accurate estimate of the variation in a measured variable, what would you do?

Once you have worked through the descriptive statistics with the data, you should take a moment and think about which statistical tests you need to perform on your data to evaluate your hypotheses for your Scientific Article.

4
Q

quantitative (measurement) data;

A

values on a numerical scale. Data in this category may exist either as discontinuous (also called discrete) data (e.g., number of individuals) or continuous data (e.g., soil moisture).

5
Q

qualitative (nominal) data;

A

where you as the experimenter determine a classification, set of categories or attributes and record outcomes as counts or frequencies placed in your categories (for example: no defoliation, slight defoliation, etc.).

Descriptions of patterns, gender, and visible differences between groups are good examples of nominal data descriptions.

6
Q

Why should you care about what type of data you collect?

A

The graph, data summary, and statistical test will be different for different types of data.

Graphing with measurement data is done as a line or bar graph with error bars, or a scatter plot. Data summary is done as averages and an associated measure of the population variation.

With nominal data, graphing may be done as a bar graph, but there will be no error bars or measure of variation (number of females compared to males in a sample does not have variability to report). Data summary consists of reporting the data gathered as frequencies or proportions (including percentages) for each of the categories.

7
Q

Normality

A

One important use of statistics is that they can determine (with some degree of confidence) how similar or different two samples really are.

parametric statistics:
Parametric statistics make assumptions about the data being tested; in particular, they assume a distribution for the measured data, namely that a frequency histogram of the data conforms to a theoretical Gaussian (normal curve) distribution.

8
Q

normal curve distribution

A

In this type of distribution, observations are grouped symmetrically about the mean.

The shape of the distribution is such that

68.3% of the observations (or area under the curve) are within one standard deviation of the mean,

95.4% are within two standard deviations,

and 99.7% are within three standard deviations of the mean

(remember that the area under the bars of a histogram is proportional to the frequency of observations).

Many, but not all, types of biological data are similar to a normal distribution. Before making this assumption, a frequency histogram of the data should be drawn, and its shape examined.
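As a quick numerical check of these percentages (a sketch independent of any particular dataset), the areas under a standard normal curve can be computed in R:

```r
# Area under the normal curve within 1, 2 and 3 standard deviations of the mean
pnorm(1) - pnorm(-1)   # ~0.683
pnorm(2) - pnorm(-2)   # ~0.954
pnorm(3) - pnorm(-3)   # ~0.997
```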

9
Q

non-parametric statistics

A

which make no assumptions about the data conforming to some theoretical distribution (such as the χ² test, which we used previously).

For most parametric statistical tests there is a non-parametric equivalent test.

10
Q

Method for assessing Normality

A

Now that you have downloaded your data, you can examine your data to assess the normality of your variables.

For our purposes, if a distribution has a single mode and it is relatively symmetrical, the normal distribution will be assumed.

If you follow the histogram procedures and discover that your data is highly skewed, then you will need to mention this in your discussion of your scientific article (i.e., mention that you would recommend reanalyzing your data using a non-parametric test or transforming your data).

Luckily for us, many parametric procedures (such as the t-test) are robust to departures from normality, so only extreme departures would have an effect.
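A minimal sketch of this check in R, using a hypothetical vector of diversity values (any measured variable from your dataset could be substituted):

```r
# Hypothetical diversity values from 20 quadrats
diversity <- c(1.8, 2.1, 2.3, 1.9, 2.0, 2.4, 2.2, 1.7, 2.1, 2.0,
               2.3, 1.9, 2.2, 2.5, 2.0, 1.8, 2.1, 2.4, 2.2, 1.9)
# A single mode and a roughly symmetrical shape suggest normality is a reasonable assumption
hist(diversity)
```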

11
Q

Descriptive statistics

A

types of analyses (calculation of mean, mode, median, standard deviation, variance, standard error, and confidence intervals) are very useful to get a clearer picture of what quantitative/measurement data might be telling us about the variable of interest.

Figures (often graphs) help the reader to visualize the summarized data, and written descriptions of these statistics give a better picture of what is occurring in the experiment or in the ecosystem.

12
Q

The most widely used average (or measure of central tendency) is the mean (x̄, “x-bar”)

A

One common descriptor reported for a variable (such as species diversity) is the average for each treatment or category (such as side of the river valley or treated and untreated areas of the pond).

Two other measures of central tendency are the median and the mode.

The median is the middle observation, above and below which half of the observations lie.

The mode is the most frequent observation made.
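For illustration, all three measures can be computed in R on a small hypothetical sample (note that base R has no built-in function for the statistical mode, so a frequency table is used):

```r
x <- c(2, 3, 3, 5, 7, 8, 9)                 # hypothetical observations
mean(x)                                      # arithmetic mean
median(x)                                    # middle observation
as.numeric(names(which.max(table(x))))       # most frequent observation (mode)
```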

13
Q

When would you use the median or the mode instead of the mean?

A

The median or mode is an appropriate measure of central tendency if you are working with data that is not normally distributed; in a normal distribution, the most frequent observation also lies in the middle of the range, so the mean, median, and mode roughly coincide.

14
Q

sample data

A

is variable between individual observations (for example, not all the quadrats have the same diversity).

If there were no variability within populations, there would be no need for statistics.

15
Q

The simplest measure of variability is

A

The range.

This is the largest value minus the smallest value (often it is given by stating the largest and smallest value).

While simple to calculate, the range is limited in its usefulness because it gives no information about how the observations are distributed (are there more observations in the middle or at either end of the range?).

The range should be used when all that is required is the knowledge of the overall spread of the data, or when observations are too few or too scattered to warrant the calculation of a more precise measure of variability.
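In R, assuming a hypothetical vector of observations, the range can be reported either as the two extreme values or as their difference:

```r
x <- c(4, 7, 6, 9, 5)     # hypothetical observations
range(x)                  # smallest and largest value
diff(range(x))            # largest minus smallest
```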

16
Q

Variance and standard deviation:

A

Measure the variability or spread of ecological data.

Variance (s²) or its square root, the standard deviation (s).

The sample variance is a measure for describing the spread of data around the sample mean.

Equation S5.2b:

s² = [ Σxᵢ² − (Σxᵢ)² / n ] / (n − 1)

In words, the sample variance is equal to: the sum of the squared observations (Σxᵢ²), minus the sum of the observations squared, (Σxᵢ)², divided by the number of observations in the sample (n), all divided by the number of observations minus one (n − 1).

You can see by the numbers included in this equation the importance of the means, the sample size and the range of the data in determining these measures of variance.

This helps to give you an idea of where the standard deviation is coming from.

The sample standard deviation (s) can be easily calculated by taking the square root of the variance.

It should be evident that the variance and standard deviation provide more information than the range described earlier.
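The hand calculation and the built-in R functions can be compared directly; a sketch with hypothetical values:

```r
x <- c(4, 7, 6, 9, 5)                      # hypothetical observations
n <- length(x)
(sum(x^2) - sum(x)^2 / n) / (n - 1)        # sample variance from equation S5.2b
var(x)                                     # built-in sample variance (same value)
sd(x)                                      # standard deviation = square root of the variance
```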

17
Q

Standard Error of the Mean and Confidence Intervals

A

Often a confidence interval is displayed as the mean, plus or minus a number - this indicates the upper and lower confidence limits.

From the bottom limit to the top limit would be the confidence interval.

The 95% confidence limits can be calculated from a sample, i.e., we can be 95% sure that the true population mean lies within that interval, based on our sample of the population.

Both standard error and confidence intervals are often reported in scientific articles.

To determine confidence limits, we must first calculate the standard error of the mean: SE = s / √n, where s is the sample standard deviation and n is the sample size.

Note that standard error is more sensitive to sample size than the standard deviation, as the sample size is included in calculating standard deviation and again in calculating standard error.

Standard error can be used to indicate the precision of knowing the true mean of the population.

Using the standard error of the mean, we can then calculate the confidence interval:

confidence limits = mean ± (t × SE)

where t is a value, obtained from a t-table, for n − 1 degrees of freedom and a particular level of confidence or probability (alpha level) (α, where α = 1.0 − degree of certainty).

The more variable your sample data, the larger your confidence interval will be, meaning that there is a larger interval in which your true statistical population mean may fall.
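A sketch of these calculations in R, using a hypothetical sample; qt() supplies the t value in place of a printed t-table:

```r
x <- c(12, 15, 11, 14, 13, 16, 12, 15)    # hypothetical sample
n <- length(x)
se <- sd(x) / sqrt(n)                     # standard error of the mean
t_crit <- qt(0.975, df = n - 1)           # t value for alpha = 0.05, two-tailed, n - 1 df
mean(x) + c(-1, 1) * t_crit * se          # lower and upper 95% confidence limits
```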

18
Q

More on Confidence intervals

A

95% confidence intervals can be useful in determining where the true population value of a parameter is likely to fall, based on a given sample.

A 95% confidence interval means that we are 95% certain that the true population value (often a mean) falls within the calculated interval that was based on our sample data.

Confidence intervals can also be another way of assessing likely differences between groups of data.

Since the confidence interval indicates the most plausible values for a population parameter, if two groups have confidence intervals that do not overlap, chances are good that those populations would also be different if analyzed statistically.

This can be a quick way of assessing parameters prior to testing hypotheses using statistics – specifically, a quick way of approximating a 95% confidence interval is doubling standard error.

Adding and subtracting 2 standard errors from a parameter mean will give a range that is usually fairly close to the 95% confidence interval.
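A quick sketch of this approximation, using the same kind of hypothetical sample as above:

```r
x  <- c(12, 15, 11, 14, 13, 16, 12, 15)   # hypothetical sample
se <- sd(x) / sqrt(length(x))
mean(x) + c(-2, 2) * se                   # roughly the 95% confidence interval
```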

19
Q

Parametric Comparison of Two Means: The t -test

A

A t-test is a parametric test that compares the same variable measured in two different groups and is appropriate when our research hypothesis asks whether the two groups are from the same or different statistical populations.

Generally, it uses measurement data (either continuous or discontinuous).

The test compares the means of these two groups (like our diversity between locations) to see if there is a real difference (a statistically significant difference) between the two groups or if the means appear different due to the expected variation in sampling twice.

This expected variation is often called “sampling error” – not because we have made an error in sampling, but because there is a difference due to natural variation, rather than something that we are testing (for example: phenotypic plasticity, acclimation, etc.).

20
Q

In statistical testing, prior to running their statistics, the researchers come up with an acceptable level of uncertainty.

A

We are never absolutely certain that we are making the ‘correct’ conclusion when we run our statistics (the reason we are unable to ‘prove’ things) – but we are generally willing to settle for being almost certain when we decide to reject our null hypothesis, which is a statistical hypothesis that generally assumes no difference between groups, or, in other tests, no relationship between variables.

It is common for this acceptable level of uncertainty to be 5%, and this is called an alpha level (α = 0.05).

This means that 5% of the time, a scientist will be making a mistake when they reject the null hypothesis and state that there is a significant difference between two groups in a t-test.

There is also a risk of making an error where there is a true difference and it goes unnoticed, and an α = 0.05 is generally an acceptable level that minimizes both errors as much as possible.

21
Q

The P-value, the t-test, and significance

A

Our t-test will calculate a t-value using the data that we have collected by comparing the means of two groups as well as the variability within each group. Associated with this t-value is a P-value, or probability.

This P-value helps us to interpret the meaning of the calculated t-value more easily.

If this P-value is less than alpha (P < 0.05), then our results are statistically different.

This is usually seen with a large difference in the means, which results in a large t-value.

If P>0.05, we interpret our results to be not statistically significant (= not statistically different) and assume that any difference in the means was due to natural variation in our samples.

If there is a large amount of natural variation in our statistical populations, the difference in the means must be larger for the test to be statistically significant.

We will use a non-paired test (i.e., one that applies to samples collected independently between the two groups) that does not make assumptions about the equality of the variances, as that minimizes our error in interpreting our research hypothesis.
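In R this corresponds to t.test() with paired = FALSE and var.equal = FALSE (the default, Welch's test); a sketch with hypothetical diversity values for two locations:

```r
location_A <- c(2.1, 2.5, 1.9, 2.8, 2.3)   # hypothetical diversity, location A
location_B <- c(3.0, 2.9, 3.4, 2.7, 3.1)   # hypothetical diversity, location B
# Unpaired test that does not assume equal variances; reports t, df and the P-value
t.test(location_A, location_B, paired = FALSE, var.equal = FALSE)
```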

22
Q

Frequently, ecologists are interested in determining whether two variables are related to each other.

A

It may be that the relationship in a correlation is one of cause-and-effect, but we neither know nor assume this. Often both variables may be responding to a common cause.

What we wish to estimate is the degree to which the two variables vary together, i.e., are correlated.

Once established, a significant correlation between two variables may lead to hypotheses about causal relations (and, perhaps, to experimental test of generated hypotheses), but it is important not to confuse significant correlation with causation.

However, correlation analysis often serves as an important descriptive technique used in analyzing ecological relations.

23
Q

Correlation Coefficient

A

The strength of the linear association, or correlation, between two variables is measured by the product-moment correlation coefficient (r).

This coefficient varies between −1 and +1. A coefficient of 0 indicates no correlation, i.e., the values for one variable are completely independent of the corresponding values for the other variable.

Coefficients of +1 and −1 indicate perfect positive and negative correlations, respectively.
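These extremes are easy to demonstrate in R with made-up vectors (a sketch, not real field data):

```r
x <- 1:10
cor(x,  2 * x + 3)    # perfectly positively related: r = +1
cor(x, -x)            # perfectly negatively related: r = -1
set.seed(1)
cor(x, rnorm(10))     # unrelated variable: r near 0
```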

24
Q

Linear Regression

A

In many cases, ecologists have already formulated a hypothesis regarding a cause-and-effect relationship between two variables or have even established such a relation.

In general, we use regression when we want to establish the form and significance of a functional relationship between two measured variables.

If the functional dependence of a variable y (e.g., weight of a mouse) on a second variable x (e.g., density of mice) can be described by a straight line, we can use linear regression to objectively give us the equation for the line that best fits the data.

y = a + b x

Where y is the dependent variable, x is the independent variable, a is the y-intercept and b is the slope. Thus, we need to determine values for a and b.

To predict some value of y for a specific value of x, we simply plug in the value of x into our equation and solve for y.

Linear regression tests also use an F-value to assess significance, so if your test does not generate a P-value, you would use the F-table to find an associated P-value that can help you determine whether your variables indicate a significant relationship.
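A sketch of a linear regression in R, using hypothetical values for mouse weight (y) and density (x):

```r
density <- c(5, 10, 15, 20, 25, 30)                  # hypothetical independent variable (x)
weight  <- c(32, 30, 27, 25, 22, 20)                 # hypothetical dependent variable (y)
fit <- lm(weight ~ density)                          # least-squares line y = a + b*x
summary(fit)                                         # intercept (a), slope (b), F-value and P-value
predict(fit, newdata = data.frame(density = 18))     # predicted y for a specific x
```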

25
Q

Methods for Correlation

A
  1. To run a correlation test, you will assess the relationship between two variables that you do not believe are functionally related (one is not dependent on the other, they simply vary together).

Remember that you need to keep all of the information about each transect together (i.e. the data from a specific row should stay together – as correlation treats them as x, y coordinates).

  2. Your first step is to graph the two variables using a scatterplot to see if there is an identifiable trend.

You can proceed with the parametric correlation test below, but you will need to mention in your discussion of your scientific article if you see that there is a different shape to the data points (i.e., mention that you would recommend reanalyzing your data using a non-linear test).

If there is just random scatter, the correlation is an appropriate test – it will likely indicate no relationship between your variables.

  3. R will give you an r-value and a P-value for your statistical test.

Another way to assess the significance of your correlation results (or any statistical results) is to compare your calculated coefficient to the appropriate critical value; use your degrees of freedom to do this.

If your calculated r-value is greater than the critical r-value, this means that P < 0.05 (statistically significant), whereas if your calculated r-value is less than the critical r-value, then P > 0.05 (not statistically significant).
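A sketch of the whole procedure in R, with hypothetical paired measurements kept together by transect (the variable names are placeholders):

```r
soil_moisture <- c(12, 18, 25, 30, 35, 41, 47)   # hypothetical variable 1
plant_cover   <- c(20, 24, 31, 36, 42, 44, 50)   # hypothetical variable 2, same rows
plot(soil_moisture, plant_cover)                  # scatterplot: look for a linear trend
cor.test(soil_moisture, plant_cover)              # r-value, degrees of freedom and P-value
```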

26
Q

Methods for Linear Regression

A
  1. Linear regression should only be run when we have reason to hypothesize that these variables are closely related (i.e., published relationships), or when we wish to try to determine one variable from another that is known to have an influence. What is your calculated F-value? What is your P-value?
  2. Is your regression significant? What does this say about the relationship between your abiotic and biotic variables?
  3. Use the data in your output to figure out the equation for the regression line that you just calculated.
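Assuming a fitted model object like the one sketched earlier (fit is a hypothetical name), the quantities needed for the equation and the significance test can be read from the output:

```r
coef(fit)                  # intercept (a) and slope (b) for y = a + b*x
summary(fit)$fstatistic    # F-value with its degrees of freedom
```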
27
Q

Methods for Chi-square (χ²) analysis

A

Looking at our lichen species, we compare site preference.

If you examine simply the presence of lichen (not including species), this is a count (nominal) value.

The distances can then be used as categories, which makes this data appropriate for a chi - square analysis.

First, create your hypothesis and prediction for lichen preference.

Ex: if you predicted that one distance was preferred over another, you would need to evaluate whether the prediction was supported by the data.

For support, you would need to demonstrate that distance A had more quadrats with lichen than distance B, and that it was not random chance affecting the presence at each distance.

If the presence at a distance was random, there should be no statistical difference in the presence of lichen between distances A and B (the statistical null hypothesis), and we would expect that half of the quadrats with lichen recorded would be at distance A and half at distance B.

Therefore, the expected results, if there were a total of 50 quadrats with lichen present in all categories, would be 25 quadrats with lichen present at distance A and 25 quadrats with lichen present at distance B.

If, however, after sampling the observed results were 32 quadrats with lichen present at distance A and 18 quadrats with lichen present at distance B, can we now conclude that the lichens preferred distance A?

I.e., does distance influence the presence of lichens and support our prediction and research hypothesis?

To answer this, we would compute the Chi-square (χ²) test statistic and compare the value we calculate with the critical value from the chi-square table.

The Chi-square (χ²) test statistic is often calculated by hand, although some calculators are available online.
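A sketch of this test in R, using the example counts above and expected proportions of 0.5 and 0.5 under the null hypothesis:

```r
observed <- c(A = 32, B = 18)            # quadrats with lichen present at each distance
chisq.test(observed, p = c(0.5, 0.5))    # compares observed counts with the expected 25 and 25
```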