WEEK 9: STATISTICS Flashcards
What is biostatistics
is the science of analyzing data and interpreting the results so that they can be applied to solving problems related to biology, health, or related fields
what is univariate analysis
describes one variable in a data set using simple statistics like counts (frequencies), proportions, and averages
what is bivariable analysis
uses rate ratios, odds ratios, and other comparative statistical tests to examine the associations between two variables (mostly exposure and outcome)
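A minimal Python sketch of two comparative measures computed from a 2x2 exposure-by-outcome table. The counts are invented for illustration, and the risk ratio is shown as one example of the ratio measures in the same family as rate ratios.

```python
# Hypothetical 2x2 table (counts are invented for illustration)
#                 outcome   no outcome
# exposed           30         70
# unexposed         10         90
exposed_cases, exposed_noncases = 30, 70
unexposed_cases, unexposed_noncases = 10, 90

# Odds ratio: odds of the outcome among the exposed / odds among the unexposed
odds_ratio = (exposed_cases / exposed_noncases) / (unexposed_cases / unexposed_noncases)

# Risk ratio: proportion with the outcome among the exposed / among the unexposed
risk_exposed = exposed_cases / (exposed_cases + exposed_noncases)
risk_unexposed = unexposed_cases / (unexposed_cases + unexposed_noncases)
risk_ratio = risk_exposed / risk_unexposed

print(f"odds ratio = {odds_ratio:.2f}, risk ratio = {risk_ratio:.2f}")
```

A ratio of 1 would indicate no association between exposure and outcome in this table.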
what is multivariable analysis
encompasses statistical tests, such as multiple regression models, that examine the relationships among three or more variables
what is a variable
Any quantity that varies from one entity to another (sometimes within an entity over time)
- any attribute, phenomenon or event that can have different values
what are the 2 types of variables
quantitative and qualitative
nominal variables (qualitative)
- no intrinsic or logical order or value
- ex. university programs
- you can assign numbers to the different categories
- the assigned numbers do not have any other numeric properties
ordinal variables (qualitative)
Intrinsic value but with no clear or equal differences between levels (a set of ordered categories)
- ex. mild vs. moderate vs. severe pain
- rating scales
3 ways to display qualitative data (nominal, ordinal)
pie chart, bar chart, frequency tables
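A frequency table can be sketched in a few lines of Python. The responses below are made up, and the pain-rating categories are borrowed from the ordinal example earlier in the deck.

```python
from collections import Counter

# Hypothetical responses for an ordinal pain-rating variable
responses = ["mild", "severe", "moderate", "mild", "mild", "moderate"]

counts = Counter(responses)
total = sum(counts.values())

# Frequency table: category -> (count, proportion), kept in ordinal order
frequency_table = {
    level: (counts[level], counts[level] / total)
    for level in ["mild", "moderate", "severe"]
}

for level, (n, share) in frequency_table.items():
    print(f"{level:<10}{n:>3}{share:>8.1%}")
```

The same counts are what a bar chart or pie chart would draw.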
numeric variable (quantitative)
- any real number; depending on the nature of the variable, it can be expressed in decimals
- meaningful numeric scales
- age, blood pressure, # of friends, temperature
- assigned numbers have total mathematical meaning
continuous variable
- can take any value within a range
- ex. a person's height (can be 60 inches)
- blood pressure, temp.
- plotted as a line
discrete variable
- can take a finite or limited number of values
- not continuous
- a family cannot own 10 1/2 cars
- age in years, number of drinks
- can be plotted as dots
quantitative variables: interval vs ratio
interval:
- difference is meaningful
- no natural zero
ratio:
- ratio is meaningful
- zero means absence of the attribute (zero is natural)
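The interval vs ratio distinction can be illustrated with temperature, a sketch that assumes Celsius as the interval scale (no natural zero) and Kelvin as the ratio scale (zero is natural). The two temperatures are arbitrary.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius temperature to Kelvin."""
    return celsius + 273.15

warm_c, cool_c = 20.0, 10.0
warm_k, cool_k = celsius_to_kelvin(warm_c), celsius_to_kelvin(cool_c)

# Differences are meaningful on both scales and agree (up to rounding)
diff_c = warm_c - cool_c
diff_k = warm_k - cool_k

# Ratios are only meaningful on the scale with a natural zero
ratio_c = warm_c / cool_c  # 2.0, but 20 C is not "twice as hot" as 10 C
ratio_k = warm_k / cool_k  # about 1.035

print(f"difference: {diff_c:.1f} C, {diff_k:.1f} K")
print(f"ratio: {ratio_c:.3f} (Celsius), {ratio_k:.3f} (Kelvin)")
```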
Mean
is calculated by adding up all the values for a particular variable and dividing that sum by the total number of individuals with a value for the variable (the arithmetic average)
median
is the value in the middle when you rank the data in ascending or descending order
- Divides the data into 2 equal parts
Mode
the most frequently occurring value for a particular variable in a data set
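The three measures above (mean, median, mode) can be checked with Python's standard `statistics` module. The ages are made-up values for illustration.

```python
import statistics

ages = [21, 22, 22, 23, 25, 30, 41]  # hypothetical ages in years

mean_age = statistics.mean(ages)      # sum of values / number of values
median_age = statistics.median(ages)  # middle value of the ranked data
mode_age = statistics.mode(ages)      # most frequently occurring value

print(f"mean = {mean_age:.2f}, median = {median_age}, mode = {mode_age}")
```

The single large value (41) pulls the mean above the median, while the median and mode are unaffected by it.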
histogram
a graph that shows the frequency of numerical data using rectangles
- the choice of intervals (bin widths) is important because it changes the shape of the graph
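A small sketch of how the interval choice changes a histogram. `bin_counts` is a hypothetical helper written for this example, and the ages are made up.

```python
from collections import Counter

def bin_counts(values, width):
    """Count values per interval [start, start + width)."""
    counts = Counter((value // width) * width for value in values)
    return dict(sorted(counts.items()))

ages = [21, 22, 22, 23, 25, 30, 41]  # hypothetical ages in years

for width in (10, 5):
    print(f"interval width {width}")
    for start, count in bin_counts(ages, width).items():
        # One '#' per observation stands in for the rectangle height
        print(f"  {start}-{start + width - 1}: {'#' * count}")
```

With 10-year intervals most ages fall into one rectangle; with 5-year intervals the same data spread over four.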
range
range for a variable is the difference between the minimum (lowest) and the maximum (highest) values in the data set
what are quartiles
mark the three values that divide a data set into four equal parts
what is the interquartile range
captures the middle 50% of values for a numeric variable
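Range, quartiles, and interquartile range for a made-up data set, sketched with the standard library. Quartile conventions differ between textbooks and software; this uses the default method of `statistics.quantiles`, which is an assumption and may not match the course's hand calculation.

```python
from statistics import quantiles

values = [2, 4, 4, 5, 7, 9, 10, 12, 15]  # hypothetical data

# Range: maximum minus minimum
value_range = max(values) - min(values)

# Quartiles: the three cut points that split the data into four parts
q1, q2, q3 = quantiles(values, n=4)

# Interquartile range: spread of the middle 50% of values
iqr = q3 - q1

print(f"range = {value_range}, Q1 = {q1}, Q2 = {q2}, Q3 = {q3}, IQR = {iqr}")
```

The second quartile (Q2) is the median.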
standard error of the mean
adjusts for the number of observations in the data set by dividing the variance by the total number of observations and then taking the square root of that number
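The calculation described above, sketched in Python with made-up values. It assumes the sample variance (dividing by n - 1), which is what `statistics.variance` computes.

```python
from math import sqrt
from statistics import stdev, variance

values = [4, 8, 6, 5, 7]  # hypothetical measurements
n = len(values)

# Divide the variance by the number of observations, then take the square root
sem = sqrt(variance(values) / n)

# Equivalent form: standard deviation divided by the square root of n
sem_alt = stdev(values) / sqrt(n)

print(f"standard error of the mean = {sem:.4f}")
```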
confidence intervals
Provide information about the expected value of a measure in a source population based on the measured value in a study population
- a larger sample size will yield a narrower confidence interval
what does a 95% confidence interval mean
a 95% confidence interval is usually reported for statistical estimates; it means that 5% of the time the confidence interval is expected to miss capturing the true value of a measure in the source population
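A rough sketch of a 95% confidence interval for a mean, assuming the normal approximation (critical value of about 1.96); small samples would usually use a t distribution instead. `mean_confidence_interval` is a hypothetical helper, and the mean, standard deviation, and sample sizes are invented.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(sample_mean, sample_sd, n, level=0.95):
    """Normal-approximation confidence interval for a mean."""
    # Critical value from the standard normal distribution (about 1.96 for 95%)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z * sample_sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Same mean and standard deviation, different sample sizes
small = mean_confidence_interval(120, 15, n=25)
large = mean_confidence_interval(120, 15, n=100)

print(f"n = 25:  {small[0]:.2f} to {small[1]:.2f}")
print(f"n = 100: {large[0]:.2f} to {large[1]:.2f}")
```

Quadrupling the sample size halves the width of the interval, which is the "larger sample, narrower interval" point.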
inferential statistics
Techniques that use statistics from a random sample of a population to make evidence-based assumptions (inference) about the values of parameters in the population as a whole
null hypothesis
there is no difference between the two or more values being compared
alternative hypothesis
there is a difference between the two or more populations being compared
steps in hypothesis testing
- Take a random sample from the population of interest
- Set up two competing hypotheses (based on research questions)
- use sample statistics (mean, frequency) to decide whether to reject or fail to reject the null hypothesis
- determine, assuming the null hypothesis is really true, how likely the observed sample statistics would be
p value
Introduced by Fisher to determine whether the observed sample supports the null
- between 0.1 and 0.9: no reason to suspect the null hypothesis is false
- 0.05 is the cutoff conventionally used in health research
how is p value calculated
from observed data based on pertinent test statistic
if p = 0.01 what does this mean
If p = 0.01, it means that if the null is true in the real world (no difference), there is only a 1% chance that the data would show a difference at least as large as the one observed
what is the significance level
is the threshold p value below which the null hypothesis is rejected, usually 0.05 in health research
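One way to see a p value computed and compared with the significance level is an exact binomial test. The test choice and the data (16 successes in 20 trials, null proportion 0.5) are assumptions made for illustration, and `binomial_p_value` is a hypothetical helper.

```python
from math import comb

def binomial_p_value(successes, trials, null_prob=0.5):
    """Two-sided exact p value: total probability, under the null, of
    outcomes no more likely than the one observed."""
    def prob(k):
        return comb(trials, k) * null_prob**k * (1 - null_prob) ** (trials - k)

    observed = prob(successes)
    # Small tolerance so equally likely outcomes are not lost to rounding
    return sum(
        prob(k) for k in range(trials + 1) if prob(k) <= observed * (1 + 1e-9)
    )

p_value = binomial_p_value(16, 20)
alpha = 0.05  # significance level
reject_null = p_value < alpha

print(f"p = {p_value:.4f}, reject null: {reject_null}")
```

Here the p value falls below 0.05, so under this convention the null hypothesis of no difference would be rejected.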
parametric test
assumes the variables being examined have particular distributions
- Inferential methods are based on types of distributions (mostly normal)
nonparametric test
does not make assumptions about the distributions of responses
- Nonparametric tests are used for ranked variables and when the distribution of a ratio or interval variable is non-normal
bar chart and pie chart
Bar Chart - graph that presents categorical data with rectangle lengths proportional to the values they represent
Pie Chart - circle in which each wedge or slice displays the percentage of participants who provided a particular answer to 1 question
kurtosis
describes how peaked or flat a bell-shaped distribution is
leptokurtic vs platykurtic
Leptokurtic - distribution curve is very peaked
Platykurtic - curve is relatively flat
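A sketch of how kurtosis can be quantified, assuming the common "excess kurtosis" convention (fourth moment divided by squared variance, minus 3, so a normal distribution scores 0). The convention and both data sets are assumptions for illustration.

```python
from statistics import fmean

def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3."""
    mean = fmean(values)
    m2 = fmean([(v - mean) ** 2 for v in values])  # second central moment
    m4 = fmean([(v - mean) ** 4 for v in values])  # fourth central moment
    return m4 / m2**2 - 3

flat = [1, 2, 3, 4, 5]                 # evenly spread values
peaked = [1, 3, 3, 3, 3, 3, 3, 3, 5]   # most values piled at the center

print(f"flat:   {excess_kurtosis(flat):.2f}  (negative: platykurtic)")
print(f"peaked: {excess_kurtosis(peaked):.2f}  (positive: leptokurtic)")
```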
unimodal and bimodal
Unimodal - has 1 peak
Bimodal - has 2 peaks
standard deviation
the square root of variance
z score
is a number that indicates how many standard deviations away from the sample mean the response of an individual from within that population is
ex. a person who is the mean age has a z score of 0. A person whose age is 1 standard deviation above the mean in the population will have a z score of 1
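The age example can be sketched in Python. The ages are made up, and the sample standard deviation (`statistics.stdev`) is assumed.

```python
from statistics import mean, stdev

ages = [20, 25, 30, 35, 40]  # hypothetical ages in years
mean_age = mean(ages)
sd_age = stdev(ages)  # square root of the sample variance

def z_score(value):
    """Number of standard deviations between a value and the mean."""
    return (value - mean_age) / sd_age

print(f"mean = {mean_age}, sd = {sd_age:.2f}")
print(f"z score at the mean: {z_score(mean_age):.1f}")
print(f"z score one sd above the mean: {z_score(mean_age + sd_age):.1f}")
```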
fabrication, falsification and plagiarism
Fabrication - creation of fake data (creating fictitious data in a spreadsheet for people who never completed a questionnaire or ever participated in an experiment)
Falsification - misrepresentation of results (modifying extreme values to improve the results of statistical tests, manipulating photographs, or intentionally misreporting a study’s methods to make the study look more rigorous)
Plagiarism - the use of other people’s ideas, words, or images without permission and proper attribution
outlier
value in a numeric data set that is distant from other observations and outside the expected range of values
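One common screening rule, not stated in these notes and used here as an assumption, flags values more than 1.5 interquartile ranges below the first quartile or above the third quartile. `iqr_outliers` is a hypothetical helper and the data are made up.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

values = [2, 4, 4, 5, 7, 9, 10, 12, 15, 60]  # hypothetical data

outliers = iqr_outliers(values)
print(f"flagged as outliers: {outliers}")
```

A flagged value still needs a judgment call: it may be a data entry error or a genuine extreme observation.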
Steps for identifying a statistical test
- Select variables to compare
- specify the goal of the test
- Check variable types
- Choose appropriate test for variables
- Confirm that assumptions of the test are met
- Run test and interpret results
Fisher’s Exact Test
compares the values of a binomial (two-category) variable in 2 independent populations
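A minimal sketch of a two-sided Fisher's exact test for a 2x2 table, built from the hypergeometric distribution with only the standard library. `fisher_exact_two_sided` is a hypothetical helper, and the counts are invented; statistical software would normally be used in practice.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def table_prob(x):
        # Probability of x in the top-left cell with all margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = table_prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    return sum(
        table_prob(x)
        for x in range(lowest, highest + 1)
        if table_prob(x) <= observed * (1 + 1e-9)
    )

# Hypothetical counts: rows are the 2 populations, columns are yes / no
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(f"p = {p_value:.4f}")
```

With these small counts the p value is well above 0.05, so the difference between the two populations would not be judged statistically significant.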