Week 9: Descriptive and Comparative Statistics Flashcards

1
Q

Research Process

A
2
Q

Descriptive Statistics

A
3
Q

Biostatistics

A

is the science of analyzing data and interpreting the results so that they can be applied to solving problems related to BIOLOGY, HEALTH, or related fields

4
Q

Univariate analysis

A

describes ONE variable in a data set using simple statistics like counts (frequencies), proportions, and averages

5
Q

Bivariable analysis

A

uses rate ratios, odds ratios, and other comparative statistical tests to examine the associations between two variables (mostly exposure and outcome)
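
As a minimal sketch of two such comparative measures, the snippet below computes a risk ratio and an odds ratio from a 2x2 table of exposure by outcome. The counts and the function names are hypothetical, chosen only for illustration.

```python
def risk_ratio(a, b, c, d):
    """Risk of the outcome in the exposed divided by risk in the unexposed.

    a = exposed with outcome,   b = exposed without outcome
    c = unexposed with outcome, d = unexposed without outcome
    """
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Odds of the outcome in the exposed divided by odds in the unexposed."""
    return (a * d) / (b * c)


# Hypothetical counts: 20 of 100 exposed and 10 of 100 unexposed have the outcome.
rr = risk_ratio(20, 80, 10, 90)      # 0.20 / 0.10, about 2.0
or_value = odds_ratio(20, 80, 10, 90)  # (20*90) / (80*10) = 2.25
print(rr, or_value)
```

A value of 1 for either measure would indicate no association between exposure and outcome.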

6
Q

Multivariable analysis

A

encompasses statistical tests such as multiple regression models that examine the relationships among three or more variables

7
Q

Advanced statistics should be used only when…

A

they are appropriate for the study question and the analyst knows how to use and interpret them correctly

8
Q

Link to Study Design

A
9
Q

What is a Variable?

A

-Any quantity that varies from one entity to another (sometimes within an entity over time)

  • Any attribute, phenomenon, or event that can have different values
    -Describes a characteristic of a person, place, thing, or idea
    -We measure variables when an experiment is carried out or an observation is made (week 5-2)
11
Q

What is The Big Picture?

A
12
Q

Types of Variables

A

Quantitative: discrete, continuous

Qualitative: nominal, ordinal

13
Q

(Qualitative)
Types of Variables

Nominal

A

-No intrinsic or logical order or value
  -University programs
  -Countries
  -Types of fruits
-You can assign numbers to different categories
  -1=apple
  -2=peach
  -3=pear
-But they do not have any other numeric properties
-Doing arithmetic (e.g., 1 + 2 = 3) is nonsensical

14
Q

(Qualitative)
Types of Variables

Ordinal

A

-Intrinsic order, but with no clear or equal differences between levels (a set of ordered categories)
  -Primary vs. secondary vs. university education
  -Mild vs. moderate vs. severe pain
  -Rating scales (assigning numbers)

-Legitimate to say: 1 ≠ 2; 5 > 4 > 3 > 2 > 1
-But in terms of the attribute being measured, we cannot say that
  -(4-3) = (3-2) = (2-1)
  -4 is two times larger than 2

15
Q

How can you display qualitative (nominal, ordinal) data?

A
16
Q

Who is Florence Nightingale?

A

The lady with the pie chart (she popularized a pie-chart variant, the polar area diagram, for presenting mortality data)

17
Q

How else can we display qualitative (nominal, ordinal) data?

A

-Frequency tables
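
A frequency table can be sketched in a few lines of Python. The pain ratings below are made-up data, and the variable names are only illustrative; the table lists each category with its count and proportion.

```python
from collections import Counter

# Hypothetical ordinal data: self-reported pain level for six patients.
pain = ["mild", "severe", "moderate", "mild", "mild", "moderate"]

counts = Counter(pain)
n = len(pain)

# Keep the categories in their natural order (this matters for ordinal data).
levels = ["mild", "moderate", "severe"]
table = {level: (counts[level], counts[level] / n) for level in levels}

for level, (count, proportion) in table.items():
    print(f"{level:<10}{count:>3}{proportion:>8.1%}")
```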

18
Q

Classification of Quantitative Variables
What is the difference between Continuous and Discrete?

A
19
Q

Quantitative Variables: Interval vs Ratio
What is the difference?

A
20
Q

Numeric Variables; Measures of Central Tendency:
Mean

A

A sample mean (the arithmetic average) is calculated by adding up all the values for a particular variable and dividing that sum by the total number of individuals with a value for the variable
-Find the mean for the set of measurements: 2, 9, 11, 6, 6, 26
-Solution: x̅ = (2+9+11+6+6+26)/6 = 60/6 = 10
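
The same calculation as a short Python check, using the six measurements from this card. The variable names are illustrative.

```python
import statistics

values = [2, 9, 11, 6, 6, 26]

# By hand: sum of the values divided by the number of values.
mean_manual = sum(values) / len(values)

# The standard library gives the same arithmetic average.
mean_stdlib = statistics.mean(values)

print(mean_manual, mean_stdlib)
```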

21
Q

Median

A

The median is the value in the middle when you rank the data in ascending or descending order
Divides the data into 2 equal parts
Find the median for the set of measurements: 9, 5, 11, 6, 6, 26
Solution:
1. Rank the measurements from smallest to largest: 5, 6, 6, 9, 11, 26
2. Find the middle observation(s)
With an even number of observations, take the value halfway between the two middle observations
Median = (6+9)/2 = 7.5
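
The ranking procedure for the median can be sketched in Python as follows, using the measurements from this card. The variable names are illustrative.

```python
import statistics

values = [9, 5, 11, 6, 6, 26]

# Step 1: rank the measurements from smallest to largest.
ranked = sorted(values)

# Step 2: find the middle observation(s).
n = len(ranked)
mid = n // 2
if n % 2 == 0:
    # Even count: average the two middle observations.
    median_manual = (ranked[mid - 1] + ranked[mid]) / 2
else:
    # Odd count: the single middle observation is the median.
    median_manual = ranked[mid]

print(ranked, median_manual, statistics.median(values))
```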

22
Q

Mode

A

The most frequently occurring value for a particular variable in a data set
-Find the mode for the set of measurements: 2, 9, 11, 6, 6, 26
-Solution: 6 is the only value that occurs more than once, so the mode is 6
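
A quick Python check of the mode for these measurements. `statistics.multimode` is included because a data set can have more than one most frequent value.

```python
import statistics

values = [2, 9, 11, 6, 6, 26]

# The single most frequently occurring value.
mode_value = statistics.mode(values)

# All values tied for most frequent (a list, in case of ties).
all_modes = statistics.multimode(values)

print(mode_value, all_modes)
```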

23
Q

Displaying Distributions: H

A

Histograms
-It is important to choose the intervals (bin widths) carefully

24
Q

Shape of the histogram: Normal Distribution

A
25
Q

Numeric Variables: Measures of Variability (Spread, dispersion)

A

-The range for a variable is the difference between the minimum (lowest) and the maximum (highest) values in the data set
-Quartiles mark the three values that divide a data set into four equal parts
-The interquartile range (IQR) captures the middle 50% of values for a numeric variable
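
A minimal sketch of these three measures in Python, reusing the measurements from the mean card. Note that several conventions exist for computing quartiles; this example relies on the default ("exclusive") method of `statistics.quantiles`, so other software may report slightly different quartile values.

```python
import statistics

values = [2, 9, 11, 6, 6, 26]

# Range: maximum minus minimum.
data_range = max(values) - min(values)

# Quartiles: the three cut points that split the data into four equal parts.
q1, q2, q3 = statistics.quantiles(values, n=4)

# Interquartile range: the spread of the middle 50% of values.
iqr = q3 - q1

print(data_range, (q1, q2, q3), iqr)
```

The second quartile is the median, which is why `q2` matches the result of ranking the values and averaging the two middle ones.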

26
Q

Boxplot: Display of Distribution

A

A simple visual depiction of the data and an intuitive way to explore it

27
Q

Variance (σ²)

A

The standard error of the mean (SD/√n) ADJUSTS FOR THE NUMBER OF OBSERVATIONS
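
The card does not spell out the formulas, so the sketch below uses the standard textbook definitions as an assumption: sample variance (dividing by n - 1), standard deviation as its square root, and standard error of the mean as SD divided by the square root of n. The data are the measurements from the mean card.

```python
import math
import statistics

values = [2, 9, 11, 6, 6, 26]
n = len(values)

# Sample variance: sum of squared deviations from the mean, divided by n - 1.
variance = statistics.variance(values)

# Standard deviation: square root of the variance.
sd = statistics.stdev(values)

# Standard error of the mean: SD adjusted for the number of observations.
sem = sd / math.sqrt(n)

print(variance, sd, sem)
```

Because `n` sits in the denominator of the standard error, collecting more observations shrinks the SEM even when the SD stays about the same.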

28
Q

Mean (µ) and SD (σ) in a Normal distribution

A
29
Q

Confidence Intervals (CI)

A

-Provide information about the expected value of a measure in a source population based on the value of that measure in a study population
-A larger sample size will yield a narrower confidence interval
-A 95% confidence interval is usually reported for statistical estimates, which means that 5% of the time the confidence interval is expected to miss capturing the true value of a measure in the source population

-Example: mean systolic blood pressure of a sample is 120 mmHg; 95% CI: 110-130 mmHg
-We are 95% confident that the real average is between 110 and 130 mmHg; intervals constructed this way are expected to miss the true mean 5% of the time, with the true value lying above 130 or below 110
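
A rough sketch of how such an interval could be computed, assuming a normal approximation (mean ± z × SEM). The function name `mean_ci`, the standard deviation of 20 mmHg, and the sample sizes are hypothetical values chosen so that the result lands near the 110-130 example; small samples would normally use a t distribution instead.

```python
import math
from statistics import NormalDist


def mean_ci(mean, sd, n, level=0.95):
    """Normal-approximation confidence interval for a mean (illustrative only)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    sem = sd / math.sqrt(n)
    return mean - z * sem, mean + z * sem


# Hypothetical inputs: mean 120 mmHg, SD 20 mmHg.
low16, high16 = mean_ci(120, 20, 16)  # roughly 110.2 to 129.8
low64, high64 = mean_ci(120, 20, 64)  # larger sample, narrower interval

print((low16, high16), (low64, high64))
```

Quadrupling the sample size halves the standard error, which is why the second interval is about half as wide as the first.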

30
Q

Comparative Statistics
what are we comparing and in what type of studies?

A

COMPARING main factors between exposed and unexposed in cohort studies
-Average age of exposed=Average age of unexposed
-% male in exposed=% male in unexposed
-Testing if randomization was effective in experimental studies
-Comparing the outcome status
-We can NOT just look at the calculated values (these are estimates from samples, subject to random sampling error)

31
Q

Inferential Statistics

A

-Techniques that use statistics from a random sample of a population to make evidence-based assumptions (inference) about the values of parameters in the population as a whole
-Decisions about parameters, based on information obtained from a sample, are made via hypothesis testing

33
Q

Hypothesis testing:

A

Aim:
To test an explicit statement or a ‘hypothesis’ about a population parameter
The null hypothesis (H0): there is no difference between the two or more values being compared
The alternative hypothesis (Ha): there is a difference between the two or more populations being compared

34
Q

What are the four steps for Hypothesis Testing?

A

1. Take a random sample from the population of interest
2. Set up two competing hypotheses (based on research questions): Null and Alternative
3. Use sample statistics (mean, frequency) to decide whether to support or reject the null, by calculating a TEST STATISTIC
4. Determine what the sample statistics would be expected to look like if the null hypothesis were really true

35
Q

Steps for Hypothesis Testing
1

A
  1. Take a random sample from the population of interest
36
Q

Steps for Hypothesis Testing
2

A

Set up two competing hypotheses (based on research questions)
Null Hypothesis (H0): no effect, no difference between sample and the original population
Alternative Hypothesis (H1 or Ha): there is an effect (a difference)

37
Q

Steps for Hypothesis Testing
3

A
  Use sample statistics (mean, frequency) to decide whether to support or reject the null

By calculating a test statistic
Note: Tests (each with a specific formula) have been developed for different types of data and research questions (Figures 30-12 to 30-15 of the textbook)
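
As one illustration of a test statistic, the sketch below computes a one-sample t statistic, (sample mean - hypothesized mean) / standard error. The choice of this particular test and the null value `mu0 = 5` are assumptions made for the example, not something taken from the textbook figures; the data are the measurements from the mean card.

```python
import math
import statistics

sample = [2, 9, 11, 6, 6, 26]
mu0 = 5  # hypothetical population mean stated by the null hypothesis

sample_mean = statistics.mean(sample)
sem = statistics.stdev(sample) / math.sqrt(len(sample))

# How many standard errors the sample mean lies from the null value.
t_stat = (sample_mean - mu0) / sem

print(t_stat)
```

The further the statistic is from zero, the less compatible the sample is with the null hypothesis; turning it into a p-value requires the reference distribution for the chosen test.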

38
Q

Steps for Hypothesis Testing
4

A
  Determine what the sample statistics would be expected to look like if the null hypothesis were really true
    How?
    The idea of the p-value (probability)
    -Introduced by Fisher to determine whether the observed sample supports the null
    -Between 0.1 and 0.9: no reason to suspect the null is false
    -<0.02: sufficiently strong evidence to conclude that the null does not reflect the state of nature and is unlikely to be true
    -“The value for which P=0.05, or 1 in 20; it is convenient to take this point as a limit in judging whether a deviation is to be considered significant or not.”
    -0.05 is the convention commonly used in health research
    * The p-value measures how strongly the sample data agree with the null
    -It is calculated from the observed data based on a pertinent test statistic
    -It is the probability of obtaining a value of the test statistic as extreme as, or more extreme than, the observed one in a universe in which we know that the null is true
    -If p = 0.01, it means that if the null is true in the real world (no difference), there is only a 1% chance that the data would show a difference at least as large as the one observed
    -With such a small chance, we can safely reject the null
    -The significance level (α) is the p-value threshold at or below which the null hypothesis is rejected, usually 0.05 in health research
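
The p-value idea can be made concrete with a small exact permutation test, shown here as a sketch rather than as the method used in the course. If the null is true, the group labels are interchangeable, so every way of splitting the pooled values into two groups is equally likely; the p-value is the share of splits whose difference in means is at least as extreme as the observed one. The two groups of measurements are hypothetical.

```python
from itertools import combinations
from statistics import mean

# Hypothetical measurements for two small groups.
exposed = [5, 6, 7, 8]
unexposed = [1, 2, 3, 4]

# Test statistic: absolute difference between the group means.
observed = abs(mean(exposed) - mean(unexposed))

pooled = exposed + unexposed
indices = range(len(pooled))

extreme = 0
total = 0
# Enumerate every possible relabelling of the pooled values.
for group in combinations(indices, len(exposed)):
    chosen = set(group)
    a = [pooled[i] for i in chosen]
    b = [pooled[i] for i in indices if i not in chosen]
    # Count splits at least as extreme as the observed difference.
    if abs(mean(a) - mean(b)) >= observed - 1e-12:
        extreme += 1
    total += 1

p_value = extreme / total      # 2 of 70 splits, about 0.029
alpha = 0.05                   # significance level
reject_null = p_value <= alpha

print(observed, p_value, reject_null)
```

Here only the observed split and its mirror image are as extreme as the data, so the p-value falls below 0.05 and the null would be rejected at that significance level.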