Appendix B Flashcards
chi-square test
goodness-of-fit or contingency test. A test of significance designed to determine whether the difference between observed and expected frequencies is significant at a selected level of probability (used for frequency data rather than continuous data)
continuous variable
an infinite number of values is possible (ex. haemoglobin in blood may be measured in whole grams or in fractional units)
discrete variable
fixed numerical values with no intermediates possible (ex. number of toes, number of trees, number of white blood cells)
dispersion
scatter of values from a central point
histogram
graph of a frequency distribution of a continuous variable
interpolation
predicts values that lie within the range of the measured values but were not actually measured
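A minimal sketch of the idea, assuming the simplest case of a straight line between two neighbouring measurements (linear interpolation). The function name and the plant-height numbers are hypothetical, chosen only for illustration.

```python
def linear_interpolate(x0, y0, x1, y1, x):
    """Estimate y at x on the straight line through (x0, y0) and (x1, y1)."""
    if not (min(x0, x1) <= x <= max(x0, x1)):
        # Outside the measured range this would be extrapolation, not interpolation.
        raise ValueError("x lies outside the measured range")
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Hypothetical data: a plant measured at 10.0 cm on day 2 and 16.0 cm on day 6.
estimate = linear_interpolate(2, 10.0, 6, 16.0, 4)
print(estimate)  # 13.0
```

Day 4 was never measured, but it lies between the two measured days, so the estimate counts as interpolation.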
Mean
denoted by x̄ (an x with a bar on top)
measurement error
reflects discrepancy between a measurement and the true value of the variable being measured
median
middle value in an ordered set of values (half of the values are smaller and half are larger)
mode
most frequently occurring value
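The mean, median and mode cards can be compared on one small data set. This is a sketch using Python's standard `statistics` module; the values are hypothetical.

```python
import statistics

values = [2, 3, 3, 5, 7]  # hypothetical counts, already in order

mean_value = statistics.mean(values)      # sum of values / number of values = 20 / 5
median_value = statistics.median(values)  # middle value of the ordered set
mode_value = statistics.mode(values)      # most frequently occurring value

print(mean_value, median_value, mode_value)  # 4 3 3
```

The mean is pulled upward by the single large value 7, while the median and mode stay at 3.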
normal distribution
a theoretical frequency distribution that is bell shaped and symmetrical
population
in stats, all possible values of a particular variable in all sampling units of a particular group
precision
nearness of values of successive measurements of the same specimen
range
largest value - smallest value
scientific notation
write a number as a coefficient multiplied by a power of 10 (ex. 91 000 = 9.1 x 10^4)
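As a quick illustration, Python's format specifier `e` prints a number in scientific notation. The two values below are only examples; `.1e` keeps one digit after the decimal point in the coefficient.

```python
value = 91000
formatted = f"{value:.1e}"
print(formatted)  # 9.1e+04, i.e. 9.1 x 10^4

small = 0.00042
formatted_small = f"{small:.1e}"
print(formatted_small)  # 4.2e-04, i.e. 4.2 x 10^-4
```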
significant figures
reflect the accuracy with which we are able to measure
standard deviation
a measure of the dispersion of a set of data about the mean (calculated as the square root of the variance)
- average size of the deviation from the mean
standard error
a measure of how reliable the sample mean is as an approximation to the population mean (standard deviation/square root of sample size)
student’s t-test
a test of significance used to determine if the mean values of two groups of data are significantly different within a selected degree of probability (are populations the same with respect to variable tested). Tests for difference between two sample means.
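A sketch of how the t statistic for two sample means can be computed by hand, assuming the classic pooled-variance form (equal variances in both groups). The leaf counts and the function name `pooled_t` are hypothetical.

```python
import math
import statistics

shade = [12, 15, 11, 14, 13]  # hypothetical leaf counts, shaded plants
sun = [18, 17, 20, 16, 19]    # hypothetical leaf counts, sunny plants


def pooled_t(a, b):
    """t statistic for the difference between two sample means (pooled variance)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    std_err_diff = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / std_err_diff


t = pooled_t(shade, sun)
df = len(shade) + len(sun) - 2  # degrees of freedom
print(t, df)
```

Here the means are 13 and 18, both sample variances are 2.5, and t comes out at about -5 with 8 degrees of freedom. Its absolute value is then compared with the tabulated critical value (2.306 for 8 degrees of freedom at the 5% level, two-tailed), so in this made-up example the null hypothesis of no difference would be rejected.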
test of significance
Statistical test designed to evaluate the probability of rejecting the null hypothesis when it is true. Example: if the chance of rejecting the null hypothesis when it is in fact true is 1%, the significance level for that test is 1%.
variability
range of measured values in a population
variance
a measure of how much scatter there is around the mean
- calculated as the sum of the squared differences between each measurement and the mean, divided by the sample size minus one
- average size of the squared deviations from the mean
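The variance, standard deviation and standard error cards build on one another, which a short worked calculation makes visible. This is a sketch with hypothetical measurements, using the sample formulas (division by n - 1).

```python
import math

values = [4.0, 6.0, 8.0, 10.0]  # hypothetical measurements
n = len(values)

mean = sum(values) / n                                 # 7.0
squared_deviations = [(x - mean) ** 2 for x in values]  # [9.0, 1.0, 1.0, 9.0]

variance = sum(squared_deviations) / (n - 1)  # 20 / 3
std_dev = math.sqrt(variance)                 # square root of the variance
std_err = std_dev / math.sqrt(n)              # standard deviation / sqrt(sample size)

print(round(variance, 3), round(std_dev, 3), round(std_err, 3))
```

Squaring the deviations before summing matters: the raw deviations from the mean always add up to zero.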
variation
members of a given population differ from each other by characteristics they possess as a result of both genetic and environmental factors affecting the organism.
single measurement consists of
- unit measured
- scale factor showing relative magnitude of value
- significant figures
units
1 m = 100 cm = 1000 mm = 1 x 10^6 micrometres
If measurement is accurate to nearest unit
then zeroes are included (91 000 micrometres accurate to the nearest µm has 5 sig figs). If it is not accurate to the nearest unit, the zeroes in 91 000 are just a scale factor (2 sig figs)
when multiplying and dividing
result should have no more reliability than the initial value with the fewest significant figures (ex. 3.9 x 42.15 = 1.6 x 10^2)
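The multiplication example can be checked with a small helper. `round_sig` is a hypothetical function written for this sketch, not a library call. It works on floats, so it is only meant for illustration and may behave unexpectedly for values that sit exactly on a rounding boundary.

```python
import math


def round_sig(x, sig):
    """Round x to `sig` significant figures (illustrative helper)."""
    if x == 0:
        return 0.0
    exponent = math.floor(math.log10(abs(x)))  # position of the leading digit
    return round(x, sig - 1 - exponent)


product = 3.9 * 42.15            # about 164.385 before rounding
rounded = round_sig(product, 2)  # 3.9 has only 2 sig figs, so keep 2
print(f"{rounded:.1e}")          # 1.6e+02
```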
when adding and subtracting
result should contain only as many decimal places as did the initial value with the least number of decimal places
18.328 + 2.3 = 20.6
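The addition rule can be sketched with Python's `decimal` module, which remembers how many decimal places each value was written with. The helper `decimal_places` is hypothetical and exists only for this example.

```python
from decimal import Decimal

a = Decimal("18.328")
b = Decimal("2.3")


def decimal_places(d):
    """Number of digits after the decimal point in the value as written."""
    return max(0, -d.as_tuple().exponent)


places = min(decimal_places(a), decimal_places(b))  # 1, set by 2.3
total = a + b                                       # 20.628 before rounding
result = total.quantize(Decimal(10) ** -places)     # keep 1 decimal place
print(result)  # 20.6
```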
exception to arithmetic rules applying to sig figs
when calculating the mean of a set of measurements, usually the mean is given one more significant figure than the measurement with the least number of significant figures
If the digit to be rounded is followed by a five followed by zeroes…
- if the digit is EVEN leave it unchanged ex. 12.50 becomes 12 when rounded to 2 sig figs
- if the digit is ODD increase it by one ex. 11.50 becomes 12
Over many values, the roundings up and the roundings down theoretically cancel each other out
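This round-half-to-even rule is available in Python's `decimal` module as `ROUND_HALF_EVEN`. The sketch below rounds to a whole number, which matches 2 significant figures for the two-digit examples on this card. The function name is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def round_half_even(text):
    """Round a decimal string to a whole number, sending exact halves to the even digit."""
    return Decimal(text).quantize(Decimal("1"), rounding=ROUND_HALF_EVEN)


print(round_half_even("12.50"))  # 12  (2 is even, left unchanged)
print(round_half_even("11.50"))  # 12  (1 is odd, increased by one)
```

Values are passed as strings so that 12.50 is stored exactly rather than as a binary floating-point approximation.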
sample size and pop are abbreviated by?
sample size = n
population = N
Size of sample must always
be reported so others can assess results and their applicability. You also cannot increase sample size by endlessly repeating measurements on the same sampling unit.
Qualitative attributes
Easy to group as only a definite number of possibilities. Ex. sex, hair colour, or presence of a certain disease.
Quantitative attributes
Counted or measured values; may be discrete or continuous
tables should be designed so
they are read vertically not across
bar graph is used for
discontinuous (discrete) variables (scale line must start at zero and bars must be wider than spaces between them)
line graph
shows relationship between dependent and independent variable (time relationships)
statistics of location
Describe the position of a sample along a given dimension and give a representative value of the observed set. They do not describe the distribution of the observations; they are the measures of central tendency (mean, median, mode).
statistic of variation
Shows variability (dispersion) of a data set. Variability is often represented by the range, variance, standard deviation or standard error
sum
denoted by Σ (capital Greek letter sigma)
assumptions of chi-squared test
- data must consist of samples taken at random from a large population
- for each sample taken there is a restricted number of outcomes that can be divided into distinct categories
- probabilities for each outcome are independent of each other and do not
chi-squared statistic tells us
whether an observed count deviates significantly from the expected count. ex. with 10 females and 10 males expected: observing 10 females and 10 males gives a chi-squared value of 0; observing 20 females and 0 males gives a value of 20
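The two values on this card can be reproduced with the usual formula, the sum of (observed - expected)^2 / expected over all categories. This sketch assumes an expected count of 10 females and 10 males; the function name is hypothetical.

```python
def chi_squared(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


expected = [10, 10]  # assumed expectation: equal numbers of females and males

print(chi_squared([10, 10], expected))  # 0.0
print(chi_squared([20, 0], expected))   # 20.0
```

The further the observed counts fall from the expected counts, the larger the statistic becomes.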
When do you use a t test vs a chi-squared test?
- a t test is for testing the null hypothesis that “there is no difference between the means of two samples” example number of leaves on shady plant vs sunny plant, height of girls vs boys
- a chi-squared test looks at the relationship between two categorical variables by comparing observed counts with expected counts
What are the necessary conditions of the data set for a t- test
Must be two variables: one is categorical with only 2 possible levels, and the other is quantitative so that a mean can be calculated for each group
What are the necessary conditions of the data set for a chi-squared test
Two categorical variables which can have numerous levels (presidential candidate voted for vs. ethnicity)
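For two categorical variables, the expected count in each cell is usually taken as row total x column total / grand total. The sketch below applies that to a hypothetical 2 x 2 table of counts. The function names and the numbers are made up for illustration.

```python
def expected_counts(table):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_squared_contingency(table):
    """Chi-squared statistic for a table of observed counts."""
    expected = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))


# Hypothetical counts: rows are two groups, columns are two choices.
observed = [[30, 10],
            [20, 40]]

chi2 = chi_squared_contingency(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)  # (rows - 1) * (columns - 1)
print(round(chi2, 3), df)  # 16.667 1
```

With 1 degree of freedom the tabulated critical value at the 5% level is 3.841, so a statistic this large would suggest the two variables in this made-up table are not independent.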