2801 Unit 9: statistics Flashcards
biostatistics
science of analyzing data and interpreting results
- used to solve problems in bio or health related fields
univariate analysis
describes ONE variable in a data set using simple statistics like
frequencies, proportions, and averages
bivariable analysis
examines the associations between TWO variables
multivariable analysis
examines the relationships among THREE or more variables
variable
a characteristic that can take on more than one value
OR
Any quantity that varies from one entity to another
2 types of qualitative variables
nominal and ordinal
2 types of quantitative variables
discrete and continuous
nominal variable
categories with no obvious numerical way to rank them
ex. favourite sport or blood type (you can't rank blood types)
ordinal variable
A variable with responses that span from first to last, from best to worst, from most favorable to least favorable, from always to never, or that are expressed using other types of ranked scales
ex. mild pain to severe pain (scale)
continuous variable
can take any value within a range (including fractions and decimals)
plotted as a line
ex. Blood pressure, temperature
discrete variable
can take only a finite, limited number of values
plotted as dots
ex. age in whole years, # of drinks; you can own 2 dogs, not 2.5
interval variable
a value of 0 doesn’t mean absence of the characteristic
ex. 0 degrees Celsius does not mean no temperature, and 100 degrees Celsius is not twice as hot as 50 degrees Celsius. ex2: a pH of 0 does not mean no acidity
ratio variable
can be plotted on a scale on which a value of 0 indicates the total absence of the characteristic
ex. heart rate, age, blood pressure
mean calculation
add values and divide by how many you have
median calculation
the middle value once the numbers are ranked from smallest to largest (if there are 2 middle values, take the average of those)
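A minimal Python sketch of the mean and median cards. The function names and the data set are illustrative only, not part of the course material.

```python
def mean_of(values):
    """Add the values and divide by how many there are."""
    return sum(values) / len(values)


def median_of(values):
    """Middle value of the ranked data; average the 2 middle values if the count is even."""
    ordered = sorted(values)          # rank from smallest to largest first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: one true middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the 2 middles


data = [2, 7, 8, 9, 2, 3, 1, 6]       # illustrative values
print(mean_of(data))                  # 38 / 8 = 4.75
print(median_of(data))                # ranked: 1 2 2 3 6 7 8 9 -> (3 + 6) / 2 = 4.5
```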
mode calculation
the response that appears most often
ex. 2 7 8 9 2 3 1 6
2 appears twice, more than any other value, therefore 2 is the mode
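The mode example above can be checked with a short Python sketch. The helper name `modes_of` is hypothetical; it returns a list because a data set can have more than one mode when values tie.

```python
from collections import Counter


def modes_of(values):
    """Return every value that appears the greatest number of times."""
    counts = Counter(values)              # value -> how many times it appears
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


data = [2, 7, 8, 9, 2, 3, 1, 6]           # the example from the card
print(modes_of(data))                     # [2]
```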
range calculation
range is the difference between the minimum and maximum values in the data set
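As a quick check of the range card, a sketch in Python (the function name and data are illustrative):

```python
def range_of(values):
    """Difference between the maximum and minimum values."""
    return max(values) - min(values)


data = [2, 7, 8, 9, 2, 3, 1, 6]   # illustrative values
print(range_of(data))             # 9 - 1 = 8
```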
quartiles
rank values from smallest to largest, then draw 3 lines separating the numbers into 4 equal parts
Interquartile Range (IQR)
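the spread of the middle 50% of values for a numeric variable (from the 25th percentile to the 75th percentile)
IQR = 75th percentile - 25th percentile
25th percentile = add the 2 values on either side of the first dividing line, then divide by 2
50th percentile (median) = same method at the second dividing line
75th percentile = same method at the third dividing line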
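A sketch of the quartile and IQR cards in Python. It assumes the "median of each half" convention, which matches the draw-3-lines description; other software may use slightly different percentile rules and give slightly different answers. The function name and data are illustrative.

```python
import statistics


def quartiles_of(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention.

    Needs at least 2 values. With an odd count, the overall median is
    left out of both halves.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]                # values below the middle line
    upper = ordered[-half:]               # values above the middle line
    return (statistics.median(lower),
            statistics.median(ordered),
            statistics.median(upper))


data = [2, 7, 8, 9, 2, 3, 1, 6]           # ranked: 1 2 2 3 | 6 7 8 9
q1, q2, q3 = quartiles_of(data)
iqr = q3 - q1
print(q1, q2, q3, iqr)                    # 2.0 4.5 7.5 5.5
```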
outlier
value that is distinct from the other observations and outside the expected range of values
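The card does not say how to decide what the "expected range" is. One common convention, assumed here and not taken from the cards, flags values more than 1.5 x IQR below the 25th percentile or above the 75th percentile. A sketch with hypothetical names and data:

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR].

    ASSUMPTION: the 1.5 x IQR rule and the median-of-halves quartiles
    are conventions chosen for illustration. Needs at least 2 values.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = statistics.median(ordered[:half])
    q3 = statistics.median(ordered[-half:])
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    return [v for v in values if v < low_fence or v > high_fence]


data = [2, 7, 8, 9, 2, 3, 1, 6, 40]       # illustrative values; 40 looks suspicious
print(iqr_outliers(data))                 # [40]
```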
variance
variance = [(x1 - mean)^2 + (x2 - mean)^2 + ... + (xn - mean)^2] / n
- subtract the mean from each value, square the result, add up all the squares, then divide by how many values you have (n)
- when the values are a sample from a larger population, divide by n - 1 instead of n
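The variance recipe can be written out step by step in Python. This sketch shows both the divide-by-n version described on the card and the n - 1 version commonly used for samples; function names and data are illustrative.

```python
def sum_of_squared_deviations(values):
    """Add up (value - mean)^2 for every value in the set."""
    mean = sum(values) / len(values)
    return sum((value - mean) ** 2 for value in values)


def population_variance(values):
    """Divide by how many values you have (n), as on the card."""
    return sum_of_squared_deviations(values) / len(values)


def sample_variance(values):
    """Divide by n - 1, the usual choice when the data are a sample."""
    return sum_of_squared_deviations(values) / (len(values) - 1)


data = [2, 7, 8, 9, 2, 3, 1, 6]           # mean = 4.75, squared deviations sum to 67.5
print(population_variance(data))          # 67.5 / 8 = 8.4375
print(sample_variance(data))              # 67.5 / 7
```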
standard deviation
square root of variance
standard error
standard deviation / square root of the sample size (n = # of values given)
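A sketch of the standard deviation and standard error cards in Python. It assumes the divide-by-n variance from the variance card; many textbooks and statistics packages use the n - 1 version instead, which gives a slightly larger result. Names and data are illustrative.

```python
import math


def standard_deviation(values):
    """Square root of the variance (variance here divides by n)."""
    mean = sum(values) / len(values)
    variance = sum((value - mean) ** 2 for value in values) / len(values)
    return math.sqrt(variance)


def standard_error(values):
    """Standard deviation divided by the square root of the sample size."""
    return standard_deviation(values) / math.sqrt(len(values))


data = [2, 7, 8, 9, 2, 3, 1, 6]           # variance = 8.4375
print(standard_deviation(data))           # sqrt(8.4375), about 2.9
print(standard_error(data))               # about 2.9 / sqrt(8), about 1.03
```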
confidence interval
a range of values expected to contain the true value of a measure in a source population, based on the value of that measure in a study population
- with a 95% CI, expect about 5% of intervals to miss the true value of the measurement
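The cards do not give a formula for the confidence interval. A common approximation, assumed here, is mean ± z x standard error, where z is about 1.96 for a 95% CI. The sketch below uses the sample (n - 1) standard deviation from Python's `statistics` module; the function name and data are illustrative, and for small samples a t-based interval would be wider.

```python
import math
import statistics


def confidence_interval(values, level=0.95):
    """Approximate CI for the mean: mean +/- z * standard error.

    ASSUMPTION: normal approximation, not a method taken from the cards.
    """
    n = len(values)
    mean = statistics.fmean(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    # z cuts off the outer (1 - level) of a standard normal curve,
    # e.g. about 1.96 when level = 0.95
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * standard_error, mean + z * standard_error


data = [2, 7, 8, 9, 2, 3, 1, 6]           # illustrative values, mean = 4.75
low, high = confidence_interval(data)
print(low, high)                          # an interval centred on 4.75
```

A higher confidence level (for example `level=0.99`) gives a wider interval, because fewer intervals are allowed to miss the true value.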
fabrication
CREATE fake data
falsification
misrepresentation of results
comparative statistics
Tests that compare the characteristics of two or more independent populations
OR
Tests that compare the before and after of 1 population being followed forward in time
null hypothesis
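states that there is no difference (or no association) between the groups being compared; any difference seen in the sample is due to chance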
inferential statistics
use statistics from a random sample of a population to draw conclusions (inferences) about the population as a whole
steps in hypothesis testing
- take random sample
- set 2 competing hypotheses (null and alt)
- use sample statistics to decide whether to reject or fail to reject the null
- determine how likely the observed statistics would be if the null were really true
p-value
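the probability of getting sample results at least as extreme as those observed if the null is really true
- measures how strongly the sample data agree with the null (a small p-value means poor agreement)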
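To make the hypothesis-testing steps and the p-value card concrete, here is a sketch of a permutation test in Python. This technique is chosen for illustration and is not named in the cards. The null says the 2 groups do not differ, so the group labels are shuffled many times to see how often chance alone gives a difference in means at least as large as the observed one. The function name, the groups, and the seed are hypothetical.

```python
import random
import statistics


def permutation_p_value(group_a, group_b, n_shuffles=5000, seed=0):
    """Estimate a two-sided p-value for a difference in means."""
    rng = random.Random(seed)             # fixed seed so the result is repeatable
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    size_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)               # pretend the group labels mean nothing
        shuffled_diff = abs(statistics.fmean(pooled[:size_a])
                            - statistics.fmean(pooled[size_a:]))
        if shuffled_diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_shuffles


group_a = [1, 2, 3, 4, 5]                 # illustrative values
group_b = [11, 12, 13, 14, 15]
p = permutation_p_value(group_a, group_b)
print(p)                                  # small: the data agree poorly with the null
```

A small p-value (often judged against a cutoff such as 0.05) leads to rejecting the null; a large one means failing to reject it.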
parametric test
assumes the variables being examined have particular distributions (usually a normal distribution)
nonparametric test
does not make assumptions about the distributions of responses
- used for ranked (ordinal) variables and when the distribution of a ratio/interval variable is not normal