INTRO Flashcards
is the science of conducting studies to collect, organize, summarize, analyze, and draw conclusions from data.
Statistics
a valuable tool in making sense of data in the information age
Statistics
is the development and application of statistical concepts and techniques to biological sciences
Biostatistics
subcategory of statistics, it is statistics applied to biology
Biostatistics
in a table of data, the variables are the column headers
a characteristic or attribute that can assume different values
e.g. age, sex
Variable
a variable that can have values that are determined by chance
yet to be determined, may still assume different values
e.g. age - ages of participants are yet to be determined
Random Variable
variable with data that is already known because it is already pre-recorded
it is already determined
e.g. date - a non-random variable because date cannot assume different values, date is already determined by convention
Non-random Variable
values that the variables can assume
below the headers in a table of data
can be determined through measurement or observation
Data
collection of data values
table of data
Data Set
each value in a data set
individual values
Data value or datum
consists of all the subjects that fit the criteria
Population
group of subjects selected from the population
Sample
is a decision making process for evaluating claims about a population
Hypothesis testing
collection, organization, summarization, and presentation of data
describing a situation
merely describing the data
Descriptive Statistics
e.g. a census (income, family members) covers the whole population; a survey collects the same kind of information but only from a sample of the population
Descriptive Statistics
EXAMPLE:
Male - 51%
Female - 49%
Descriptive Statistics
generalizing from samples to populations; the concept of probability (the chance of an event occurring) is used
e.g. studying the average grade of MLS-1 students in BE-100, with a population of 3725: a sample of 100 students is taken from the population and their average grade is determined. As long as the 100 students are chosen using probabilistic methods, the conclusion drawn from this sample is probably true for the whole population.
Inferential Statistics
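The sample-to-population example above can be sketched in a few lines of Python. This is only an illustration: the 3725 grades are randomly generated placeholders (not real data), and `random.sample` stands in for whichever probabilistic method is actually used.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 3725 made-up grades (assumption, not real data).
population = [random.uniform(60, 100) for _ in range(3725)]

# Probabilistic selection: every student has the same chance of being drawn.
sample = random.sample(population, 100)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

The two means should land close to each other, which is the sense in which a conclusion from the sample is "probably true" for the population.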
performing estimations and hypothesis tests;
e.g. testing the claim that the average age of MLS students is 23
Inferential Statistics
determining relationships among variables; and
e.g. relationship between the number of hours studying and the final grade
making predictions
e.g. if there is a relationship between variables, predictions can be made
describing and drawing conclusions from a given data set
Inferential Statistics
variables that can be placed into distinct categories, according to some characteristic or attribute
numbers can be used, but for labeling only (e.g. 0 for male, 1 for female in an Excel sheet)
e.g. sex (male/female), gender, program
Qualitative Variable
are numerical and can be ordered or ranked
e.g. age, grades, blood glucose level
Quantitative Variable
Quantitative Variable: 2 types
discrete variables
continuous variables
assume values that can be counted
e.g. number of participants, number of siblings (counting numbers)
e.g. 1,2,3
discrete variables
can assume an infinite number of values between any two specific values
e.g. arm length/span
e.g. 1.1, 1.2, 1.3
continuous variables
______________ must be measured; answers must be rounded off because of the limits of the measuring device.
continuous data
Measurement Scales
Nominal Level of Measurement
Ordinal Level of Measurement
Interval Level of Measurement
Ratio Level of Measurement
classifies data into mutually exclusive (nonoverlapping), exhaustive categories in which no order or ranking can be imposed on the data
categorical in nature
Nominal Level of Measurement
e.g. religion (Christian, Jewish, Muslim, and others) - a Christian cannot also be Jewish, a Jew cannot also be Muslim, etc., because the categories do not overlap.
Nominal Level of Measurement
e.g. zip codes (no meaningful order or ranking, just for labeling cities)
it is simply for naming or categorical purposes
Nominal Level of Measurement
classifies data into categories that can be ranked; however, precise differences between the ranks do not exist
Ordinal Level of Measurement
higher than nominal, has all the characteristics of nominal and can now be ranked
Ordinal Level of Measurement
e.g. 1st yr, 2nd yr, 3rd yr - categorical as it is used to label a student, and can also be ranked in the sense that students with the higher level can be recognized (3rd yr being the highest)
Ordinal Level of Measurement
precise differences between ranks do not exist
e.g. superior, average, poor in a student evaluation - the difference between the superior and the average professor may not be the same as the difference between the average and the poor.
Gaps are not precisely defined.
Ordinal Level of Measurement
ranks data, and precise differences between units of measure do exist; however, there is no meaningful zero
Interval Level of Measurement
a higher level of measurement than the ordinal because the data can be ranked and the gaps between units exist
Interval Level of Measurement
e.g. Biostat final grades of 90, 80, and 70 - the grades can be ranked from highest to lowest, with a difference of 10 between 90 and 80, and between 80 and 70
the difference between interval and ratio level is that interval has no meaningful zero
Interval Level of Measurement
e.g. the lowest grade cannot be 0
Interval Level of Measurement
e.g. temperature - 0°C is still a temperature, so zero does not mean "no temperature"; there are even temperatures below zero
Interval Level of Measurement
possesses all the characteristics of interval measurement, and there exists a true zero
Ratio Level of Measurement
true ratios exist when the same variable is measured on two different members of the population
Ratio Level of Measurement
e.g. heights of 152 cm, 132 cm, 121 cm - these can be ranked from tallest to shortest, the differences between values are meaningful, and there is a true zero (0 cm means no height at all)
Ratio Level of Measurement
Temperature in Celsius or Fahrenheit falls under ________
interval-level
Kelvin falls under _________ as 0 K is a true (absolute) zero.
ratio-level
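A short sketch of why ratios are only meaningful with a true zero. The two temperatures are arbitrary illustrative values, and `celsius_to_kelvin` is a helper defined here for the example.

```python
def celsius_to_kelvin(celsius):
    """Convert degrees Celsius to kelvin."""
    return celsius + 273.15

# Two illustrative temperatures (assumed values, not from the notes).
warm_c, cool_c = 20.0, 10.0

# On the Celsius scale the ratio looks like "twice as hot"...
ratio_celsius = warm_c / cool_c

# ...but on the Kelvin scale, which has a true zero, it is only about 3.5% more.
ratio_kelvin = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(f"ratio in Celsius: {ratio_celsius:.3f}")
print(f"ratio in Kelvin:  {ratio_kelvin:.3f}")
```

The difference of 10 degrees is the same on both scales, which is why Celsius is interval-level, but only the Kelvin ratio describes a real physical proportion.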
Interval and Ratio fall under ____________
Numerical or Quantitative Data
3 types of data collection
Survey
Surveying records
Direct observations
PROBABILISTIC SAMPLING
Random Sampling
Systematic Sampling
Stratified Sampling
Cluster Sampling
samples are selected by using chance methods or random numbers
simplest method of sampling
the disadvantage of random sampling is that it is tedious
ensures that samples are chosen without bias
e.g. fishbowl method
Random Sampling
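A minimal sketch of the fishbowl idea using Python's standard library. The function name `simple_random_sample` and the 50 numbered subjects are assumptions made for illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct subjects, each equally likely to be chosen."""
    rng = random.Random(seed)  # seed only makes the demo repeatable
    return rng.sample(population, n)

# Hypothetical population: subjects numbered 1 to 50.
subjects = list(range(1, 51))
chosen = simple_random_sample(subjects, 5, seed=42)
print(chosen)
```

`random.sample` draws without replacement, like pulling slips of paper out of a bowl without putting them back.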
numbering each subject of the population and then selecting every kth subject
still a random type of sampling but differs in the methods used
samples should be reflective of the population
Systematic Sampling
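Selecting every kth subject can be sketched as below. The random starting point and the choice of k as population size divided by sample size are common conventions assumed here, not something stated in the notes.

```python
import random

def systematic_sample(population, n, seed=None):
    """Pick every kth subject after a random start (k = len(population) // n)."""
    if n <= 0 or n > len(population):
        raise ValueError("n must be between 1 and the population size")
    k = len(population) // n          # sampling interval
    rng = random.Random(seed)
    start = rng.randrange(k)          # assumed: random start within the first k
    return population[start::k][:n]

# Hypothetical population: subjects numbered 1 to 100.
subjects = list(range(1, 101))
every_kth = systematic_sample(subjects, 10, seed=3)
print(every_kth)
```

With 100 subjects and a sample of 10, k is 10, so consecutive picks are always 10 numbers apart.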
dividing the population into groups (called strata) according to some characteristic that is important to the study, then sampling from each group
samples within the strata should be randomly selected
e.g. if the entire population is 40% male and 60% female, the sample should reflect the same proportions. For instance, if 10 participants are chosen as the sample, it should consist of 4 males and 6 females.
Stratified Sampling
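The 40%/60% example can be sketched as proportional allocation followed by random selection within each stratum. The function name and the placeholder subject labels are assumptions for illustration.

```python
import random

def stratified_sample(strata, n, seed=None):
    """Randomly sample from each stratum in proportion to its size."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Proportional share; rounding may not sum exactly to n in general.
        share = round(n * len(members) / total)
        sample[name] = rng.sample(members, share)
    return sample

# Hypothetical population: 40 males and 60 females.
strata = {
    "male": [f"M{i}" for i in range(40)],
    "female": [f"F{i}" for i in range(60)],
}
picked = stratified_sample(strata, 10, seed=1)
print({name: len(members) for name, members in picked.items()})
```

For a sample of 10 this yields 4 males and 6 females, matching the proportions in the population.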
population is divided into groups called clusters by some means such as geographic area
then the researcher randomly selects some of these clusters and uses all members of the selected clusters as the subjects of the samples
Cluster Sampling
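A small sketch of cluster sampling: whole clusters are chosen at random, and every member of a chosen cluster becomes a subject. The cluster names and members are made up for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and keep every member of each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # pick cluster names
    subjects = [member for name in chosen for member in clusters[name]]
    return chosen, subjects

# Hypothetical clusters grouped by geographic area.
clusters = {
    "North": ["N1", "N2", "N3"],
    "South": ["S1", "S2"],
    "East": ["E1", "E2", "E3", "E4"],
    "West": ["W1", "W2"],
}
chosen, subjects = cluster_sample(clusters, 2, seed=7)
print(chosen, subjects)
```

Unlike stratified sampling, nothing is drawn from the clusters that were not selected.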
NON PROBABILISTIC SAMPLING
Convenience Sampling
Voluntary response Sampling
Snowball Sampling
is a non-probability sampling method where units are selected for inclusion in the sample due to convenience
Convenience Sampling
a voluntary response sample can be defined as a sample made up of participants who have voluntarily chosen to participate as a part of the sample group
Voluntary response Sampling
is a recruitment technique in which research participants are asked to assist researchers in identifying other potential subjects
Snowball Sampling
Statistical Studies: 2 types
Observational Study
Experimental Study
the researcher merely observes what is happening or what has happened in the past and tries to draw conclusions based on these observations
no interventions or manipulations, variables cannot be controlled
Observational Study
advantages: it occurs in a natural setting, and it can be done in situations where conducting an experiment would be unethical or dangerous
disadvantages: the variables are not controlled; depending on how the study is designed, a cause-effect relationship cannot be clearly established; it can be expensive and time-consuming; and the researcher may not be using his/her own measurements, so results can be inaccurate
Observational Study
the researcher manipulates one of the variables and tries to determine how the manipulation influences other variables
e.g. clinical trials - the researcher decides whether each participant belongs to the control group or the treatment group
Experimental Study
cause-and-effect is clear as the variables are controlled by researchers
Experimental Study
disadvantages: it occurs in an unnatural setting because it is controlled; the study must pass high ethical standards; and the Hawthorne Effect may occur (subjects of an experimental study change or improve their behavior simply because it is being evaluated or studied)
in order to avoid the Hawthorne Effect, clinical trials are now blind studies
Experimental Study
Variables in Statistical Studies: 3
- Independent Variable (Explanatory Variable)
- Dependent Variable (Outcome Variable)
- Confounding Variable
the variable being manipulated by the researcher
Independent Variable (Explanatory Variable)
dependent on the independent variable
resultant variable
heavily affected by the independent variable
Dependent Variable (Outcome Variable)
a variable that influences the dependent variable but cannot be separated from the independent variable
those that affect other variables in a way that produces spurious or distorted associations between two variables
Confounding Variable
if the sample size is too small, it might not be conclusive
if the samples are selected purposefully or with bias, i.e. probabilistic sampling methods are not used
falsely skew results
Suspect Samples
averages can be deceptive
difference between mean (heavily influenced by outliers) and median (not heavily influenced by outliers)
choosing between mean and median depends on the data set: the median is the better choice when the data set contains outliers, and the mean when it does not
Ambiguous Averages
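A quick sketch of how one outlier pulls the mean but barely moves the median. The income figures are invented for illustration.

```python
import statistics

# Hypothetical incomes (in thousands); not real data.
incomes = [20, 22, 25, 27, 30]
with_outlier = incomes + [500]  # one extreme value added

print("mean without outlier:  ", statistics.mean(incomes))
print("median without outlier:", statistics.median(incomes))
print("mean with outlier:     ", statistics.mean(with_outlier))
print("median with outlier:   ", statistics.median(with_outlier))
```

The mean jumps from 24.8 to 104, while the median only moves from 25 to 26, so reporting "the average" without saying which one can be deceptive.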
presenting the same data as either an absolute number (count) or a relative number (percentage), whichever makes the result look more favorable
Changing The Subject
one in which no comparison is made
statistics must be in context
Detached Statistics
relationships between ideas that are suggested but not specifically stated in the passage
Implied Connections
numbers or labels on the axes are missing or make no sense
Misleading Graphs
the way questions are asked, leading questions
Recall Bias - a type of bias that occurs when participants in a research study or clinical trial do not accurately remember a past event or experience or leave out details when reporting about them
Faulty Survey Questions