Week 9 Descriptive and Comparative Statistics Flashcards

1
Q

Objectives of Today’s Class

A

-understanding the role of statistical analysis in health research
-getting familiar with terminology
You must know all terms in Ch. 29 and 30
-essential for correct interpretation of health research results
Note: today's class only introduces concepts that you will learn in much more depth in a biostatistics class

2
Q

Research process

A
3
Q

Descriptive Statistics

A

Definitions you need to know in ch

4
Q

Biostatistics

A

is the science of analyzing data and interpreting the results so that they can be applied to solving problems related to biology, health or related fields

5
Q

Univariate analysis

A

describes one variable in a data set using simple statistics like counts (frequencies), proportions, and averages
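A minimal Python sketch of univariate description, using made-up data (the variable names and values are hypothetical): frequencies and proportions for a categorical variable, and an average for a numeric one.

```python
from collections import Counter
from statistics import mean

# Hypothetical data: smoking status (categorical) and age in years (numeric)
smoking = ["never", "current", "former", "never", "never", "current"]
ages = [34, 51, 47, 29, 38, 41]

counts = Counter(smoking)                            # frequency of each category
n = len(smoking)
proportions = {k: v / n for k, v in counts.items()}  # share of the total
mean_age = mean(ages)                                # arithmetic average

print(counts)
print(proportions)
print(mean_age)
```

Each statistic looks at one variable at a time; nothing here relates smoking to age.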

6
Q

Bivariable analysis

A

uses rate ratios, odds ratios, and other comparative statistical tests to examine the associations between two variables (mostly exposure and outcome)
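A sketch of two of the bivariable measures named above, computed from hypothetical counts. The function names and the 2x2 cell labels (a, b, c, d) are illustrative assumptions, not taken from the textbook.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table.
    a = exposed with outcome,   b = exposed without outcome,
    c = unexposed with outcome, d = unexposed without outcome."""
    return (a * d) / (b * c)


def rate_ratio(cases_exp, pt_exp, cases_unexp, pt_unexp):
    """Ratio of incidence rates (cases per unit of person-time)."""
    return (cases_exp / pt_exp) / (cases_unexp / pt_unexp)


# Hypothetical cohort: 100 exposed and 100 unexposed people
print(odds_ratio(20, 80, 10, 90))        # 2.25
# Hypothetical person-time: 2000 person-years in each group
print(rate_ratio(20, 2000, 10, 2000))    # 2.0
```

A value of 1 would mean no association between exposure and outcome; whether 2.25 or 2.0 is more than sampling error still needs a comparative test.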

7
Q

Multivariable analysis

A

encompasses statistical tests such as multiple regression models that examine the relationships among three or more variables

8
Q

Advanced statistics should be used only when

A

they are appropriate for the study question and the analyst knows how to use and interpret them correctly

9
Q

Link to Study Design

A
10
Q

What is a Variable?

A
  • any quantity that varies from one entity to another (sometimes within an entity over time)
    *any attribute, phenomenon, or event that can have different values
    -used to describe characteristics of a person, place, thing, or idea
    We measure variables when an experiment is carried out or an observation is made (week5-2)
11
Q

THE BIG PICTURE

A

VMDSI

12
Q

WHAT are the types of variables?

A

Qn-DC (Quantitative: Discrete, Continuous)
Ql-NO (Qualitative: Nominal, Ordinal)

13
Q

Types of Variables (1) Nominal

A

-No intrinsic or logical order or value
* university programs, countries, types of fruits
-You can assign numbers to different categories (like assigning a number to a pear), but the numbers have no other numeric properties
-Doing arithmetic with them (e.g., 1 + 2 = 3) IS NONSENSICAL

14
Q

Types of Variables (1) ORDINAL

A

Intrinsic order, but with no clear or equal differences between levels (a set of ordered categories)
Primary vs. secondary vs. university education
Mild vs. moderate vs. severe pain
Rating scales (assigning numbers)

Legitimate to say: 1 ≠ 2; 5 > 4 > 3 > 2 > 1
But in terms of the attribute being measured, we cannot say
(4-3) = (3-2) = (2-1)
4 is not two times larger than 2

15
Q

Displaying Qualitative (nominal, ordinal) Data

A

-Pie chart
-Bar chart

16
Q

Florence Nightingale

A
17
Q

Displaying Qualitative (nominal, ordinal) Data

A
18
Q

Types of Variables: Quantitative (Numeric)

A

Meaningful numeric scales

Age, blood pressure, # of friends, temperature

Assigned numbers have full mathematical meaning

1 ≠ 2; 2 ≠ 3
5 > 4 > 3 > 2 > 1
4 is indeed two times larger than 2 (strictly, this holds only for ratio variables; see Interval vs Ratio)
(4-3) = (3-2) = (2-1)

19
Q

Classification of QUANTITATIVE VARIABLES
Continuous vs Discrete

A
20
Q

Classification of QUANTITATIVE VARIABLES
Interval vs Ratio

A
21
Q

Numeric Variables; Measures of Central Tendency: 1- Mean

A

A sample MEAN (the arithmetic average) is calculated by adding up all the values for a particular variable and dividing that sum by the total number of individuals with a value for the variable
Find the mean for the set of measurements:
2, 9, 11, 6, 6, 26
Solution: x̅=(2+9+11+6+6+26)/6=10
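The worked example above can be checked with a few lines of Python; this is a sketch that computes the mean by hand and with the standard library's `statistics.mean`.

```python
from statistics import mean

values = [2, 9, 11, 6, 6, 26]

# Sum of all values divided by the number of values
manual = sum(values) / len(values)
print(manual)          # 10.0
print(mean(values))
```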

22
Q

Numeric Variables; Measures of Central Tendency: 2-Median

A

The median is the value in the middle when you rank the data in ascending or descending order
Divides the data into 2 equal parts
Find the median for the set of measurements: 9, 5, 11, 6, 6, 26
Solution:
Rank the measurements from smallest to largest: 5, 6, 6, 9, 11, 26
Find the middle observation(s)
With an even number of observations, the median is the average of the two middle observations
Median = (6 + 9) / 2 = 7.5
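A sketch of the same steps in Python: rank the data, then take the middle value (odd count) or the average of the two middle values (even count). `statistics.median` applies the same rule.

```python
from statistics import median

values = [9, 5, 11, 6, 6, 26]
ranked = sorted(values)            # smallest to largest
n = len(ranked)
mid = n // 2

if n % 2 == 1:
    manual = ranked[mid]                           # odd n: the single middle value
else:
    manual = (ranked[mid - 1] + ranked[mid]) / 2   # even n: average of the two middle values

print(ranked)          # [5, 6, 6, 9, 11, 26]
print(manual)          # 7.5
print(median(values))  # 7.5
```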

23
Q

Numeric Variables; Measures of Central Tendency: 3-Mode

A

The most frequently occurring value for a particular variable in a data set

Find the mode for the set of measurements:
2, 9, 11, 6, 6, 26
Solution: mode = 6 (it occurs twice; every other value occurs once)
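A quick Python check using the standard library. `multimode` (Python 3.8+) returns every most-frequent value, which matters when a data set has more than one mode.

```python
from statistics import mode, multimode

values = [2, 9, 11, 6, 6, 26]
print(mode(values))        # 6
print(multimode(values))   # [6]
```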

24
Q

Displaying Distributions: Histogram

A

Choosing the intervals (bins) carefully is important
-Remember: in a histogram, each bar covers a group (interval) of values, such as 70-80
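A minimal sketch of how values are grouped into histogram intervals, assuming hypothetical scores and a bin width of 10. Each value falls into exactly one interval, labeled here as 70-79 so that a value of 80 goes into the next bar.

```python
# Hypothetical scores; a bin width of 10 gives intervals 50-59, 60-69, ...
scores = [52, 58, 61, 67, 68, 71, 73, 74, 79, 82, 85, 91]
width = 10

bins = {}
for s in scores:
    lower = (s // width) * width          # left edge of the interval containing s
    bins[lower] = bins.get(lower, 0) + 1  # count values per interval

# Text "bars": one # per observation in each interval
for lower in sorted(bins):
    print(f"{lower}-{lower + width - 1}: {'#' * bins[lower]}")
```

Changing `width` (for example to 5 or 20) changes the shape the histogram appears to have, which is why the choice of intervals matters.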

25
Q

Shape of the Histogram: Normal Distribution

A

Positively (right-) skewed: the long tail extends to the right; it looks like a P lying on its back.

26
Q

Numeric Variables: Measures of Variability (Spread, dispersion)

A

-The range for a variable is the difference between the minimum (lowest) and the maximum (highest) values in the data set
-Quartiles mark the three values that divide a data set into four equal parts
The interquartile range (IQR = Q3 - Q1) captures the middle 50% of values for a numeric variable
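A sketch of range, quartiles, and IQR on hypothetical blood pressure readings, using `statistics.quantiles` (Python 3.8+, default `method="exclusive"`). Software packages use slightly different quartile definitions, so results may differ a little between tools.

```python
from statistics import quantiles

# Hypothetical systolic blood pressure readings (mmHg)
sbp = [110, 115, 118, 120, 124, 130, 145]

data_range = max(sbp) - min(sbp)   # maximum minus minimum
q1, q2, q3 = quantiles(sbp, n=4)   # three cut points -> four equal parts
iqr = q3 - q1                      # spread of the middle 50%

print(data_range, q1, q2, q3, iqr)  # 35 115.0 120.0 130.0 15.0
```

The single high reading (145) stretches the range but leaves the IQR unchanged, which is why the IQR is often preferred for skewed data.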

27
Q

Boxplot: Display of Distribution

A

A simple, intuitive visual depiction for exploring a distribution (the box spans the quartiles, with a line at the median)

28
Q

Variance (σ²)

A

-Measures the extent to which values deviate from the average value of the variable in the data set
-Calculated by adding together the squares of the differences between each observation and the mean (µ) and then dividing by the total number of observations (n); the sample variance usually divides by n - 1 instead

-The standard deviation (σ) is the square root of the variance
-The standard error of the mean adjusts for the number of observations in the data set: divide the variance by the total number of observations and take the square root (SE = √(σ²/n) = σ/√n)
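A sketch of the three definitions above on a small hypothetical data set. The hand calculation divides by n, as the card describes; `statistics.pvariance`/`pstdev` do the same, while `statistics.variance` is the sample version that divides by n - 1.

```python
import math
from statistics import pstdev, pvariance, variance

values = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data
n = len(values)

mu = sum(values) / n                                 # mean = 5.0
var_n = sum((x - mu) ** 2 for x in values) / n       # squared deviations, divided by n
sd = math.sqrt(var_n)                                # standard deviation
se = math.sqrt(var_n / n)                            # standard error of the mean = sd / sqrt(n)

print(var_n, sd, round(se, 4))       # 4.0 2.0 0.7071
print(pvariance(values), pstdev(values))
print(variance(values))              # divides by n - 1, so slightly larger
```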

29
Q

Mean (µ) and SD (σ) in a Normal distribution

A

-About 68% of area (population) within μ±1σ;
-95% of area within μ±2σ;
-99.7% of area within μ±3σ
If μ = 20 and σ = 5, then about 68% of subjects have values between 15 (20 - 5) and 25 (20 + 5)
The probability of observing a value:
-between 15 and 25 ≈ 0.68
-between 10 and 30 ≈ 0.95
-between 5 and 35 ≈ 0.997
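The μ = 20, σ = 5 example can be checked with `statistics.NormalDist` (Python 3.8+): the probability of falling in an interval is the difference of the cumulative distribution function at its two ends.

```python
from statistics import NormalDist

mu, sigma = 20, 5
dist = NormalDist(mu=mu, sigma=sigma)

results = {}
for k in (1, 2, 3):
    lo, hi = mu - k * sigma, mu + k * sigma
    results[k] = dist.cdf(hi) - dist.cdf(lo)   # area under the curve between lo and hi
    print(f"P({lo} <= X <= {hi}) = {results[k]:.3f}")
```

The printed values are roughly 0.683, 0.954, and 0.997. The "95%" rule is a rounding: exactly 95% of the area lies within about μ ± 1.96σ.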

30
Q

Confidence Intervals (CI)

A

-Provide information about the likely value of a measure in a source population, based on the value of that measure in a study population
-A larger sample size will yield a narrower confidence interval
-A 95% confidence interval is usually reported for statistical estimates: intervals constructed this way are expected to miss the true value of the measure in the source population about 5% of the time
-Example: mean systolic blood pressure of a sample is 120 mmHg; 95% CI: 110-130 mmHg
-We are 95% confident that the true population mean lies between 110 and 130 mmHg (the 95% describes how often the method captures the true value, not a probability about this one interval)
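A minimal sketch of a 95% CI for a mean, assuming a normal (z) approximation: mean ± 1.96 × SD/√n. The function `ci_mean` and the SD of 15 mmHg are hypothetical; small samples would normally use the t distribution instead of z.

```python
import math
from statistics import NormalDist


def ci_mean(sample_mean, sd, n, level=0.95):
    """Approximate confidence interval for a mean using the z distribution.
    Assumption: n is large enough that z is a reasonable stand-in for t."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # about 1.96 for 95%
    se = sd / math.sqrt(n)                          # standard error of the mean
    return sample_mean - z * se, sample_mean + z * se


# Hypothetical: sample mean SBP 120 mmHg, SD 15 mmHg, three sample sizes
widths = {}
for n in (25, 100, 400):
    lo, hi = ci_mean(120, 15, n)
    widths[n] = hi - lo
    print(f"n={n}: 95% CI {lo:.1f} to {hi:.1f} mmHg")
```

Quadrupling the sample size halves the standard error, so the interval gets narrower, matching the point that larger samples give narrower confidence intervals.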

31
Q

Comparative Statistics

A

Comparing main factors between exposed and unexposed groups in cohort studies, e.g.:
Average age of exposed = average age of unexposed?
% male in exposed = % male in unexposed?
Testing whether randomization was effective in experimental studies
Comparing outcome status between groups
We can NOT just look at the calculated values (they are estimates from samples and are subject to random sampling error)

32
Q

Inferential Statistics

A

Techniques that use statistics from a random sample of a population to make evidence-based assumptions (inference) about the values of parameters in the population as a whole
Decisions about population parameters based on information obtained from a sample are made via hypothesis testing

33
Q

Hypothesis Testing

A
34
Q

Steps in Hypothesis Testing

A
  1. Take a random sample from the population of interest
  2. Set up two competing hypotheses (based on the research question)
    Null Hypothesis (H0): no effect, no difference between the sample and the original population
    Alternative Hypothesis (H1 or Ha): there is an effect (a difference)
  3. Use sample statistics (mean, frequency) to decide whether or not to reject the null
    By calculating a test statistic
    Note: Tests are developed (specific formula) for different types of data and research questions (Figures 30-12 to 30-15 of the textbook)
  4. Determine how likely the observed sample statistic would be if the null hypothesis were really true
    How?
35
Q

Idea of (Probability) p Value

A

Introduced by Fisher to determine whether the observed sample supports the null
Between 0.1 and 0.9: no reason to suspect the null is false
Below 0.02: sufficiently strong evidence to conclude the null does not reflect the state of nature (it is unlikely to be true)
“The value for which P=0.05, or 1 in 20; it is convenient to take this point as a limit in judging whether a deviation is to be considered significant or not.”
0.05 is the convention commonly used in health research
The p value measures how strongly the sample data agree with the null

36
Q

Idea of (Probability) p Value

A

Is calculated from the observed data using a pertinent test statistic
The probability of obtaining a value of the test statistic as extreme as, or more extreme than, the observed one, in a universe in which we know the null is true
If p = 0.01: if the null is true in the real world (no difference), there is only a 1% chance that the data would show a difference at least as large as the one observed
This chance is small, so we can reasonably reject the null
The significance level (α) is the cutoff: the null hypothesis is rejected when p is at or below α, usually 0.05 in health research
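A sketch of the hypothesis-testing steps with one simple test, a two-sided one-sample z test. This particular test, the function name, and all numbers are illustrative assumptions (it presumes a known SD or a large sample); the textbook figures list the tests suited to other data types.

```python
import math
from statistics import NormalDist


def z_test_p_value(sample_mean, mu0, sd, n):
    """Two-sided one-sample z test. H0: the population mean equals mu0."""
    z = (sample_mean - mu0) / (sd / math.sqrt(n))   # test statistic
    # Probability of a z at least this extreme, in either direction, if H0 is true
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical: H0 says mean SBP = 120 mmHg; a sample of 100 has mean 122, SD 10
z, p = z_test_p_value(sample_mean=122, mu0=120, sd=10, n=100)
alpha = 0.05
decision = "reject H0" if p <= alpha else "do not reject H0"
print(round(z, 2), round(p, 4), decision)   # 2.0 0.0455 reject H0
```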

37
Q

A parametric test

A

assumes the variables being examined have particular distributions
Inferential methods are based on types of distributions (mostly normal)

38
Q

A nonparametric test

A

does not make assumptions about the distributions of responses
Nonparametric tests are used for ranked variables and when the distribution of a ratio or interval variable is non-normal
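Many nonparametric tests (for example the Wilcoxon rank-sum/Mann-Whitney U test, which this card does not name) work on the ranks of the values rather than the values themselves, so they do not depend on a normal shape. A minimal sketch of that rank step, with tied values sharing the average of their ranks; `average_ranks` is a hypothetical helper name.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) upward; ties share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])  # indices in ascending order
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


print(average_ranks([9, 5, 11, 6, 6, 26]))   # [4.0, 1.0, 5.0, 2.5, 2.5, 6.0]
```

Replacing 26 with 260 would leave the ranks unchanged, which is why rank-based methods are less sensitive to extreme, non-normal values.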