CH17 Biostatistics Flashcards
______ is the use of data analysis and interpretation in health care research.
Biostatistics
______ involves the application of statistical tests to the data in order to organize, describe, summarize, and analyze it to answer a research question or test a hypothesis.
It also explains results and requires that ________ be used to explain the meaning and application of the findings, identifies possible factors that could have influenced the results, and draws inferences to the population.
Data Analysis; critical thinking
Dental hygienists should know the research process in order to understand the epidemiology of disease, practice therapies, implement programs, and practice _________ dentistry.
evidence-based
An insufficient number of subjects, too short a duration, as well as the use of incorrect measurement instruments, incorrect procedure utilization, and incorrect statistical tests are all causes of ________.
invalid research
What are some examples of Nominal scale data?
(Unordered categories)
Male/Female
Smoker/Non-smoker
(Qualitative categories)
What are some examples of Ordinal scale data?
(Ordered categories)
Mutually exclusive categories:
1, 2, 3, 4, 5
IOTN
Minimal, Moderate, Severe, Unbearable pain
(Each of the above have data that exclude all other data in the data set)
_______ data are a scale of measurements that contain all of the characteristics of the preceding scales.
This data is quantitative and has an absolute zero point (0 means there is an absence).
Some examples are height, weight, duration, and number of teeth/sealants.
Ratio Scale Data
Data that are represented by numbers would be considered _________. These data can be expressed as counts, percentages, and means of something.
Examples of this in DH are pocket depths, number of DMFT, and time spent scaling.
quantitative data
Asks the question HOW MANY
Information that reflects the quality or nature of variables and cannot be expressed numerically is called ________ data. It is expressed as outcomes, or states, and can be counted for reporting, and its variables can be rank ordered.
Examples of this in DH are tissue color, tenacity of calculus, and what patients liked most & least about their visit.
Qualitative Data
Asks the question, HOW MUCH?
What are some examples of Continuous Variables?
Height in cm
pocket depth in mm
Age
Time
(Example of age: 25 years, 10 months, 2 days, 5 hours, 4 seconds, 4 milliseconds, 8 nanoseconds, 99 picoseconds…and so on.)
What are some examples of data that are Discrete Variables?
Number of visits to the dentist
DMF
______ is a type of data that has no numeric representation; therefore, it is qualitative in nature.
Ex: male/female, freshman/sophomore/jr/sr, eye color, race
Categorical Variable Data
_______ data are categorical variable data that place subjects into ONLY two groups/categories. They take on one of only two possible values when observed or measured and are qualitative in nature.
Ex: M/F, yes/no, T/F
Dichotomous Variable Data
What are the 3 Categorical Data Categories?
(Qualitative Data Categories)
- Nominal
- Ordinal
- Dichotomous
Name the 4 Numerical Data Categories.
(Quantitative Data Categories)
- Discrete
- Continuous
- Interval
- Ratio
__________ allows raw data to be organized and summarized in a meaningful way that allows for a pattern to emerge.
This type of statistics always precedes ________.
Descriptive statistics; inferential statistics
(If raw data was just presented it would be hard to visualize what was being seen. By using descriptive statistics we can see data in a meaningful way.)
_______ are used when researchers want to study something but do not have access to the entire population (or total). The conclusions drawn are ________ about the population.
Because of this limitation a sample of the population is taken and studied.
Inferential Statistics; generalizations
What measure of central tendency is an average used with continuous data?
It is appropriately used for ratio and interval data.
Mean
What measure of central tendency is a midpoint of data when placed in ascending or descending order?
If there is an even number of values, the ____ of the two middle numbers must be taken.
Its appropriate use is for ordinal data.
Median; mean
Calculate the Mean of the following:
2 + 3 + 3 + 5 + 7 + 10 = 30
30 ÷ 6 = 5
Mean = 5
Calculate the median of the following numbers:
3, 2, 5, 10, 3, 7
In order to calculate the median the numbers must be placed in ascending order.
2, 3, 3, 5, 7, 10
(the median point is when ½ the data is above and ½ the data is below)
There is no single midpoint, so take the mean of the two middle numbers:
(3 + 5) ÷ 2 = 8 ÷ 2 = 4
Median = 4
Calculate the median of the following numbers:
7, 3, 2, 3, 5, 4, 10
In order to calculate the median the numbers must be placed in ascending order.
2, 3, 3, 4, 5, 7, 10
Median = 4
(the midpoint)
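The two median cards above can be checked with a short Python sketch. It assumes the standard-library statistics module; the variable names are only illustrative.

```python
import statistics

even_scores = [3, 2, 5, 10, 3, 7]      # six values, no single middle value
odd_scores = [7, 3, 2, 3, 5, 4, 10]    # seven values, one middle value

# statistics.median sorts the data internally. For an even count it
# returns the mean of the two middle values.
even_median = statistics.median(even_scores)   # mean of 3 and 5
odd_median = statistics.median(odd_scores)     # middle value of the sorted list

print(even_median, odd_median)
```

Both lists give a median of 4, matching the hand calculations.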
What measure of central tendency is concerned with the value that occurs most often? It is used in all types of data.
Its appropriate use is for nominal data.
mode
Calculate the Mode of the following numbers:
2, 3, 3, 5, 7, 10
Mode = 3
What is the goal of using the measures of central tendency?
To take a collection of data and identify the middle of the data collected.
A measure of central tendency is a single value that attempts to describe a set of data by identifying the central position within that set of data.
Name the 3 Measures of Central Tendency.
Mean, Median and Mode
What two data categories are numerical?
Discrete and Continuous
Define: Discrete Variable
Discrete variables are counted in whole units, a finite number of times.
Descriptive Statistics are used to summarize data in a meaningful way. There are generally two MAIN types of statistics used to describe data. Name them.
1. Measures of Central Tendency (Mean, Median, Mode)
2. Measures of Dispersion (Range, Variance, Standard Deviation)
(Though not a type of statistic, graphs, histograms, and charts are also used to describe and summarize data.)
_______ communicates how much variation is present in a group of data.
In statistics, this is a way of describing how spread out a set of data is.
(Range, Variance, Standard Deviation)
Measures of Dispersion
(aka Measures of variability)
Measures of dispersion communicate how much variation is present in a group of data. Name the three measures that are used to describe the dispersion of a group of data.
- Range
- Variance
- Standard deviation
What measure of dispersion is determined by subtracting the lowest score from the highest score?
It is the simplest and least helpful measurement and is usually reported with the median.
Range
What represents the average squared distance of each score from the mean, is associated with standard deviation, and is the most common and useful measure of dispersion?
It is usually reported with the mean to calculate data intervals.
Its value or the SD in relation to the mean depicts the distribution of scores.
Variance
(it measures how far each number in the set is from the mean and therefore from every other number in the set.)
Square root of the variance = standard deviation
Define: Variance
When is Standard Deviation used and how is it determined?
Standard deviation is used when determining how spread out the numbers are around the mean.
(Used with Quantitative Continuous Data)
Square root of the variance = standard deviation
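A minimal Python sketch of the three measures of dispersion, reusing the scores from the mean card. It assumes the population formulas (divide by n); statistics.variance and statistics.stdev would give the sample versions (divide by n - 1) instead.

```python
import statistics

scores = [2, 3, 3, 5, 7, 10]

# Range: highest score minus lowest score.
score_range = max(scores) - min(scores)

# Population variance: mean of the squared deviations from the mean.
variance = statistics.pvariance(scores)

# Standard deviation: square root of the variance.
std_dev = statistics.pstdev(scores)

print(score_range, round(variance, 2), round(std_dev, 2))
```

Squaring std_dev gives back the variance, which is the relationship stated on the card.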
Define: Standard Error of the Mean
Standard Error of the Mean is used to determine how accurately the sample mean estimates (or generalizes to) the mean of the entire population.
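One common way to compute the standard error of the mean is the sample standard deviation divided by the square root of the sample size. The sketch below assumes that formula and uses an illustrative sample, not real study data.

```python
import math
import statistics

sample = [2, 3, 3, 5, 7, 10]   # illustrative sample only

# Sample standard deviation (divides by n - 1).
s = statistics.stdev(sample)

# Standard error of the mean: s divided by the square root of n.
sem = s / math.sqrt(len(sample))

print(round(sem, 2))
```

A larger sample makes the denominator bigger, so the standard error shrinks and the sample mean becomes a more precise estimate.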
A _________ is an asymmetrical curve distorted by a few extreme scores.
Skewed Distribution

A ________ shows how often something happened in a specific category.
These tables may be _______ or _______.
Example: how many times does the number 9 occur?
1, 2, 3, 4, 6, 9, 9, 8, 5, 1, 1, 9, 9, 0, 6, 9.
Frequency Distribution Table
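The example on this card can be tallied with a short Python sketch. It assumes collections.Counter from the standard library as the counting tool.

```python
from collections import Counter

values = [1, 2, 3, 4, 6, 9, 9, 8, 5, 1, 1, 9, 9, 0, 6, 9]

# Counter tallies how many times each distinct value occurs.
frequency = Counter(values)

# Print an ungrouped frequency table, one row per distinct value.
for value in sorted(frequency):
    print(value, frequency[value])
```

The row for 9 shows a frequency of 5, which answers the question on the card.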
Is this frequency distribution table grouped or ungrouped?

Grouped

Characteristics of Effective Tables
- Accuracy
- Simplicity
- Clarity
- Appearance
- Well-Designed Structure
________ is a relationship or association between variables that can be measured mathematically.
Correlation
A _______ is a relationship between two variables in which both variables move in the same direction.
Positive Correlation

A ________, or inverse correlation, is a relationship between two variables that move in opposite directions.
Negative Correlation

“_” signifies the correlation coefficient. Its value communicates the ______ and strength of the association.
“r”; direction
Hypothesis testing
______ is a formal decision-making process of testing a hypothesis using statistical significance and inference, followed by interpreting the statistical results
Hypothesis testing
A ________ is an initial negative statement of belief about the value of a population parameter. It rejects the research or alternative hypothesis.
Null hypothesis
Probability
expressed as “p” value
(AKA alpha 𝛼 level)
A ______ is also called an alpha (α) error. It occurs when the null hypothesis is rejected but is actually true, so it should have been accepted.
The probability of committing this error is the same as the alpha level.
Researchers can control a type I error by setting the alpha level low.
This type of error can be very costly.
Type I Error
A _____ error is also called a beta (β) error. It occurs when the null hypothesis is accepted but is actually false, so it should have been rejected. The exact probability of committing this type of error is generally unknown.
They are caused by using too small a sample, unreliable measuring devices, or imprecise research methods.
type II
Chi-square test
External validity
Fewer than ____ subjects would make a research project invalid.
30
A _______ is expressed by a large or infinite number of measures along a continuum and can be expressed in fractions or decimals.
This type of data is considered quantitative and can be converted into nominal or ordinal scales.
continuous variable
_________ are data made up of distinct and separate units or categories, but are counted only in whole numbers. These data are quantitative in nature because they are represented numerically. They can be converted to nominal or ordinal scale.
Discrete Variable Data
What type of categorical variable data organizes its data into mutually exclusive categories that have no rank order, value or numeric relationship between the different classifications?
Ex: L/R handed, M/F, hair color
Nominal Scale
What type of categorical data organizes data into mutually exclusive categories that are rank ordered based on a criterion?
In this type of data, the difference between ranks is not equal in value.
Ex: Poor/fair/good/excellent, shades of whiteness of teeth, calc class A-B-C-D
Ordinal Scale
What type of data has the characteristics of the ordinal scale and an equal distance between any two adjacent units of measurement?
This type of data is quantitative in nature and does not have a meaningful zero point.
Ex: temperature (0 degrees is colder than 90 degrees)
Interval Scale Data
Data summaries such as bar graphs, histograms, pie charts; measures of central tendency such as mean, median, mode; and measures of variability such as range, variance and standard deviation are all considered _______ Statistics.
Descriptive
A mode value can be either _____ (consisting of 2 modes) or _____ (consisting of more than 2 modes).
bimodal; multimodal
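A small Python sketch of bimodal and multimodal data. It assumes statistics.multimode (Python 3.8 or later); both score lists are made up for illustration.

```python
import statistics

bimodal_scores = [2, 3, 3, 5, 7, 7, 10]      # 3 and 7 each occur twice
multimodal_scores = [1, 1, 4, 4, 6, 6, 9]    # 1, 4 and 6 each occur twice

# statistics.multimode returns every value tied for the highest frequency,
# in the order each one first appears in the data.
bimodal_modes = statistics.multimode(bimodal_scores)
multimodal_modes = statistics.multimode(multimodal_scores)

print(bimodal_modes, multimodal_modes)
```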
The _________, also referred to as the ________, forms the theoretical foundation for comparisons and making statistical decisions.
It is a symmetrical, unimodal, bell-shaped curve that explains why random variables tend to be normally distributed.
The mean, median, and mode are equal in value.
Normal Distribution; Gaussian Distribution

The ______ provides an estimation of the spread of data given the mean and the standard deviation of a data set that follows the standard normal distribution.
Empirical Rule
The Empirical rule says that ___% of data fall within one SD of the mean, ___% within two SD of the mean, and ___% within three SD of the mean.
–68%
–95%
–99.7%
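The three percentages can be checked against the normal curve itself. This sketch assumes statistics.NormalDist (Python 3.8 or later); share_within is a hypothetical helper name.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k) * 100, 1))
```

The computed shares are about 68.3%, 95.4%, and 99.7%, which the Empirical Rule rounds to 68%, 95%, and 99.7%.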

________ is the foundation of the ________.
Normal distribution; central limit theorem
What factor is most affected by a skewed distribution?
the mean
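A quick Python sketch of why the mean is the measure most affected by skew. The extreme score of 100 is a hypothetical value added only to distort the data.

```python
import statistics

scores = [2, 3, 3, 5, 7, 10]
skewed_scores = scores + [100]    # one extreme score added (hypothetical)

mean_before = statistics.mean(scores)
median_before = statistics.median(scores)
mean_after = statistics.mean(skewed_scores)
median_after = statistics.median(skewed_scores)

print(mean_before, median_before)
print(round(mean_after, 2), median_after)
```

The single extreme score pulls the mean up by more than 13 points while the median moves by only 1.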
Skewed distributions can be _____ or ______.
positive or negative
Is this frequency distribution table grouped or ungrouped?

Ungrouped

An example of an ________ frequency distribution table would include all the scores in the distribution, good for less than 30 observations.
Ungrouped
An example of a ____ frequency distribution table groups a set number of scores into mutually exclusive intervals, usually 5-10 intervals (easier to understand) (Those who got A’s, Those who got B’s…)
grouped
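A grouped frequency table can be sketched in Python by mapping each score to an interval and counting. The exam scores and the 10-point grading scale below are assumptions made up for illustration; letter_grade is a hypothetical helper.

```python
from collections import Counter

# Hypothetical exam scores, used only for illustration.
exam_scores = [95, 88, 72, 91, 65, 84, 79, 99, 58, 83]

def letter_grade(score):
    """Map a score to one mutually exclusive interval (assumed 10-point scale)."""
    if score >= 90:
        return "A (90-100)"
    if score >= 80:
        return "B (80-89)"
    if score >= 70:
        return "C (70-79)"
    if score >= 60:
        return "D (60-69)"
    return "F (0-59)"

# Count how many scores fall into each interval.
grouped = Counter(letter_grade(score) for score in exam_scores)

for interval in sorted(grouped):
    print(interval, grouped[interval])
```

Every score lands in exactly one interval, so the interval counts add up to the number of observations.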
A ____ is used to represent categorical data. Its length corresponds with the frequency of the value.
Bar graph

A _______ is similar to a bar graph but the bars appear side by side and are touching. They are used to represent interval or ratio variables, grouped & ungrouped frequencies, and ordinal data that is treated as continuous data.
histogram

A _________ is a line graph that represents frequency data that are continuous in nature. It is drawn by connecting midpoints of the bars of a histogram, then extending the line at both ends to imaginary midpoints at the right and left of the histogram.
This graph can represent grouped or ungrouped frequencies and can also represent frequency, percent, cumulative frequency, or cumulative percent.
frequency polygon

A _______ is a line graph used to plot a variable over time.
Polygon

A ______ shows the relationship between two variables and how the level of one variable varies as the level of the other variable changes.
Scattergram

As it relates to correlation, the “r” value indicates the ______ of the relationship.
If a value moves closer to +1 or -1, there is a stronger relationship. When it is closer to 0, there is a weaker relationship.
+1 or -1 indicates a PERFECT relationship, while 0 indicates ZERO relationship.
strength
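A minimal Python sketch of the correlation coefficient, assuming the Pearson formula. The function name pearson_r and the data are hypothetical, chosen so one pair moves together and the other moves in opposite directions.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(spread_x * spread_y)

# Hypothetical data for illustration only.
hours = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]     # moves in the same direction as hours
falling = [10, 8, 6, 4, 2]    # moves in the opposite direction

r_positive = pearson_r(hours, rising)
r_negative = pearson_r(hours, falling)

print(r_positive, r_negative)
```

The sign of r gives the direction of the relationship, and its distance from 0 gives the strength.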
A _________ can be used to quantify the relationship of two variables, and expresses the functional relationship between the variables.
It is used to predict the score of one variable based on the score of another
Example: National board scores based on students’ GPA
regression analysis
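Prediction with regression can be sketched in Python as a least-squares line with one predictor. The GPA and board-score pairs are invented for illustration, and fit_line is a hypothetical helper, not a library function.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical GPA and board-score pairs, invented for illustration.
gpa = [2.5, 3.0, 3.5, 4.0]
board_score = [70, 78, 86, 94]

slope, intercept = fit_line(gpa, board_score)

# Predict the board score for a student with a GPA of 3.2.
predicted = intercept + slope * 3.2

print(slope, intercept, round(predicted, 1))
```

With this made-up data each extra GPA point adds 16 board-score points, so a GPA of 3.2 predicts a score of about 81.2.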
A _________ provides a mathematical model that gives the strength or ability of two or more variables to predict another variable.
Examples: SAT scores, GPA strength
Multiple Regression Analysis
A _____ is called the alternative or positive hypothesis. It is the logical opposite of the null hypothesis and can indicate a direction of difference.
Example: One brand of sealants does differ from another brand of sealants.
Research Hypothesis
The ______ is a probability value, also called the alpha value or significance value. It represents the probability that the findings from the study are due to chance. In oral health research, a value equal to or smaller than 0.05 (p ≤ .05) is commonly accepted as statistically significant, so we reject the null hypothesis because we are confident that the statistical decision is correct.
If this value is ______ than 0.05, the results are said to be not statistically significant, so we do not reject the null hypothesis.
p-value; larger
_________ are used for hypothesis testing when the data meet certain assumptions.
The data must be classified as continuous (includes ratio, interval, and ordinal data).
Parametric Inferential Statistics
What are the types of parametric statistics?
–Student t-test
–Analysis of variance (ANOVA)
The ______ determines if a statistically significant difference exists between two mean scores.
T-test
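A rough Python sketch of the t statistic for two independent groups. It assumes the pooled (equal-variance) form of the Student t-test; pooled_t is a hypothetical helper and the scores are invented for illustration.

```python
import math
import statistics

def pooled_t(group_a, group_b):
    """t statistic for two independent groups, assuming equal variances."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    # Weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    # Standard error of the difference between the two means.
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_error

# Hypothetical scores for two groups, invented for illustration.
brand_a = [2, 3, 3, 4, 3]
brand_b = [4, 5, 5, 6, 5]

t_value = pooled_t(brand_a, brand_b)
degrees_of_freedom = len(brand_a) + len(brand_b) - 2

print(round(t_value, 2), degrees_of_freedom)
```

Turning the t value into a p-value requires the t distribution, which the Python standard library does not provide, so this sketch stops at the test statistic.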
_____ determines if statistically significant differences occur when comparing more than two mean scores and tells researchers that there is a difference among groups.
It does not, however, specify which group is different.
ANOVA
(Analysis of variance)
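A minimal Python sketch of the F statistic behind a one-way ANOVA, which compares the variation between group means with the variation within groups. The function one_way_f is a hypothetical helper and the three groups are invented for illustration.

```python
import statistics

def one_way_f(*groups):
    """F statistic for a one-way ANOVA across two or more groups."""
    all_scores = [score for group in groups for score in group]
    grand_mean = statistics.mean(all_scores)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
                     for g in groups)
    # Variation of each score around its own group mean.
    ss_within = sum((score - statistics.mean(g)) ** 2
                    for g in groups for score in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical scores for three groups, invented for illustration.
group_1 = [2, 3, 4]
group_2 = [4, 5, 6]
group_3 = [6, 7, 8]

f_value = one_way_f(group_1, group_2, group_3)

print(f_value)
```

A large F value suggests that at least one group mean differs, but as the card notes, it does not say which group is different.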