Exam 1: Biostatistics Flashcards

1
Q

Descriptive Statistics

  1. Involves
  2. Purpose
A

Involves: Collecting, Presenting, and Characterizing Data

Purpose: Describe Data

2
Q

Inferential Statistics

  1. Involves
  2. Purpose
A

Involves: Estimation and Hypothesis Testing

Purpose: Make decisions about population characteristics

***Allows us to describe a population based on a sample***

3
Q

Variable

A

A symbol of an event, act, characteristic, trait, or attribute that can be measured and to which we can assign some values

4
Q

Categorical Variable

A

Consists of some numeric or character codes that represent either:

  1. The presence or absence of something that is of research interest
  2. The relative weight or rank of the thing that is of research interest
5
Q

Quantitative Variable

A

Variable that holds the numerical result of some measurement

6
Q

Process

A

Series of actions or operations that transforms inputs to outputs; generates output over time

7
Q

Characteristics of Variables: Nominal Scale

A

Simplest level of measurement - categories without order

8
Q

Characteristics of Variables: Ordinal Scale

A

Nominal variables with an inherent order among the categories

9
Q

Characteristics of Variables: Interval Scale

A

Measurable difference, interval, or distance between observations

10
Q

Characteristics of Variables: Ratio

A

Same as interval but with an absolute reference point (such as “0”)

11
Q

Data Presentation: Qualitative Data

A

Summary Table –> Either a Bar Graph or Pie Chart

12
Q

Data Presentation: Quantitative Data

A

Dot Plot, Stem and Leaf Display,

or

Frequency Distribution –> Histogram

13
Q

Class

A

One of the categories into which qualitative data can be classified

14
Q

Class Frequency

A

Number of observations in the data set falling into a particular class

15
Q

Class Relative Frequency

A

Class frequency divided by the total number of observations in the data set

16
Q

Class Percentage

A

The class relative frequency multiplied by 100
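
Worked example (a minimal sketch, not part of the original cards): class frequency, class relative frequency, and class percentage computed for a small data set. The blood-type values are made up for illustration.

```python
from collections import Counter

# Hypothetical qualitative data (made up for illustration): blood types of 10 patients.
blood_types = ["A", "O", "B", "O", "A", "O", "AB", "O", "A", "B"]

n = len(blood_types)                      # total number of observations
class_frequency = Counter(blood_types)    # observations falling into each class

# Relative frequency = class frequency / n; percentage = relative frequency * 100.
class_relative_frequency = {c: f / n for c, f in class_frequency.items()}
class_percentage = {c: rf * 100 for c, rf in class_relative_frequency.items()}

for c in sorted(class_frequency):
    print(c, class_frequency[c], class_relative_frequency[c], class_percentage[c])
```

The relative frequencies always sum to 1 and the percentages to 100, which is a quick check on a summary table.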

17
Q

Bar Graph

A

Classes (Bars) have heights equivalent to class frequency, class relative frequency, or class percentage

(Unlike Histogram –> just class frequency and class relative frequency, bars are touching)

18
Q

Pie Chart

A

Classes are in slices proportional to the class relative frequency

19
Q

Central Tendency

A

Tendency to cluster/center about certain numerical values

20
Q

Variability

A

Spread of the data

21
Q

What symbols represent Sample/Population Mean and Size?

A

Mean: x̄ for a sample (x-bar, written with a lowercase x), μ for a population

Size: n for a sample, N for a population

22
Q

Which is used for both quantitative and qualitative data: mean, median, or mode?

Which is not affected by extreme values?

A

Mode

Median and Mode
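
Worked example (a sketch with made-up numbers): adding one extreme value pulls the mean a long way, while the median barely moves and the mode does not change. The last line shows that the mode also works for qualitative data.

```python
import statistics

# Hypothetical data (made up): lengths of stay in days, then the same data plus one extreme value.
stays = [2, 3, 3, 4, 5]
stays_with_outlier = stays + [60]

for label, data in [("original", stays), ("with outlier", stays_with_outlier)]:
    print(label, statistics.mean(data), statistics.median(data), statistics.mode(data))

# The mode is the only one of the three that is defined for category labels.
most_common_type = statistics.mode(["A", "O", "O", "B"])
print(most_common_type)
```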

23
Q

Summary of Mean, Median, and Mode

A
24
Q

Variance and Standard Deviation

A
  1. Measures of dispersion ***More reliable than Range***
  2. Most common measures
  3. Consider how data are distributed (unlike Range)
  4. Show variation about mean
25
Q

What does Normal Distribution mean?

A

Mean = Median = Mode

26
Q

What does the mean equal in the standard normal curve and what is the first standard deviation?

A

Mean = 0

First SD is +/- 1

27
Q

Standard Notation (Sample vs. Population)

  1. Mean
  2. Standard Deviation
  3. Variance
  4. Size
A
28
Q

When do you use n-1 vs. n in the denominator of the Variance Formula?

A

n-1 = Sample Variance

n = Population Variance
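
Worked example (a sketch with made-up data): the same sum of squared deviations divided by n-1 and by n, checked against Python's standard `statistics` module.

```python
import statistics

data = [4, 8, 6, 5, 7]  # hypothetical measurements, made up for illustration

mean = sum(data) / len(data)
squared_deviations = sum((x - mean) ** 2 for x in data)

sample_variance = squared_deviations / (len(data) - 1)   # n - 1 in the denominator
population_variance = squared_deviations / len(data)     # n in the denominator

# statistics.variance uses n - 1; statistics.pvariance uses n.
print(sample_variance, statistics.variance(data))
print(population_variance, statistics.pvariance(data))
```

The standard deviation in each case is the square root of the corresponding variance.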

29
Q

Shape of Curve: Mean vs. Median

1. Left-Skewed

A

Left-Skewed

Mean < Median

30
Q

Shape of Curve: Mean vs. Median

  1. Right-Skewed
A

Right-Skewed

Mean > Median

31
Q

The Empirical Rule

  1. Applies to
  2. What percentage of the measurements lie within 1, 2, and 3 SDs of the mean? What are their Z-scores?
A

Applies to: Data sets that are mound-shaped and symmetric (i.e. Normal Distributions)

68% of measurements lie within one SD of the mean (x-s to x+s) z-score = b/w -1 and 1

95% of measurements lie within two SDs of the mean (x-2s to x+2s) z-score = b/w -2 and 2

99.7% of measurements lie within three SDs of the mean (x-3s to x+3s) z-score = b/w -3 and 3
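
Quick check (a sketch using the standard normal curve): the 68%, 95%, and 99.7% figures are rounded areas under a normal distribution, which can be recomputed from its cumulative distribution function.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean (z between -k and k)."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, share_within(k))
```

The exact areas are slightly different from the rounded figures (for example, a little over 95% lies within two SDs), so the rule is an approximation.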

32
Q

If you scored in the 58th percentile, what percentage of test takers scored lower/higher than you?

A

Lower: 58%

Higher: 42%

33
Q

Numerical Measures of Relative Standing: Z-Scores

  1. Describes…
  2. Measures…
A

Describes the relative location of a measurement compared to the rest of the data

Measures the number of standard deviations away from the mean a data value is located
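
Worked example (a sketch; the exam mean and SD are made up): a z-score is the distance from the mean divided by the standard deviation.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical exam with mean 70 and SD 10.
print(z_score(85, 70, 10))   # a score above the mean gives a positive z
print(z_score(60, 70, 10))   # a score below the mean gives a negative z
```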

34
Q

What is the Frequentist definition of Probability?

A

If an experiment is repeated n times under identical conditions and if the event A occurs m times, then as n grows, the ratio of m/n approaches a fixed limit called the probability of A

P(A) = m/n

“Law of Large Numbers”
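
Simulation (a minimal sketch, assuming a fair simulated coin and event A = "heads"): as n grows, the ratio m/n settles near the probability of A. The function name and seed are illustrative choices.

```python
import random

def relative_frequency_of_heads(n, seed=0):
    """Toss a simulated fair coin n times and return m/n, the share of heads."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    m = sum(1 for _ in range(n) if rng.random() < 0.5)
    return m / n

for n in (10, 1_000, 100_000):
    print(n, relative_frequency_of_heads(n))
```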

35
Q

Probability Equation

A

Frequency of times an outcome occurs divided by the total number of possible outcomes (symbolized as p)

36
Q

Random Event

A

Any event where the outcomes observed involve uncertainty or the outcome can vary

(predicted by Probability)

37
Q

When is probability unnecessary to calculate?

A

For a fixed event

38
Q

An Event (Two Definitions)

A
  1. An occurrence due to nature
  2. A collection of one or more outcomes of an experiment
39
Q

Simple vs Compound Probabilities

A

Simple = Single occurrence

Compound = Result of operations

-Define relationships between or combination of event occurrences

40
Q

What are the three operations that can be used to create compound events?

A
  1. Intersection
  2. Union
  3. Complement
41
Q

Intersection

A

The intersection is defined as “both A and B”

Represented by A ∩ B

42
Q

Union

A

Union is defined as “either A or B or both A and B”

Represented by A ∪ B

43
Q

Complement

A

Defined as “Not A”

Denoted by A^c (A with a superscript c) or -A

44
Q

The Additive Law: Special Rule of Addition

A

Two events A and B that cannot occur simultaneously are said to be mutually exclusive or disjoint

e.g. The probability of a newborn weighing under 2000 grams is 0.025 and over is 0.043

***simply would add the probabilities of the individual events***

45
Q

The Additive Law: General Rule of Addition

A

This is used when there is a common region; must subtract out common region
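
Worked example (a sketch using one roll of a fair die; the events are chosen for illustration): the general rule subtracts the common region once, and the special rule is what remains when the common region is empty.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # one roll of a fair die
A = {2, 4, 6}                       # "even"
B = {4, 5, 6}                       # "greater than 3" (overlaps A at 4 and 6)
C = {1}                             # "rolls a 1" (mutually exclusive with A)

def p(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

# General rule: P(A or B) = P(A) + P(B) - P(A and B)
p_a_or_b = p(A) + p(B) - p(A & B)

# Special rule (mutually exclusive events): P(A or C) = P(A) + P(C)
p_a_or_c = p(A) + p(C)

print(p_a_or_b, p(A | B))   # both routes give the same value
print(p_a_or_c, p(A | C))
```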

46
Q

Two-Way Table Example

A
47
Q

Probabilities Example

A
48
Q

Independent Events

A

Two unrelated events

***When expressing the joint probability of independent events, the general rule of multiplication is not needed; it simplifies to the special rule, P(A and B) = P(A) x P(B)***

49
Q

The Special Rule of Multiplication

A

e.g. tossing a coin

Second toss has nothing to do with the first

50
Q

Questions about Mutual Exclusiveness…

  1. If events are mutually exclusive…
  2. If events are not mutually exclusive…
A

Use “or” and the additive rule

  1. ME: add them all up
  2. Not ME: Subtract out common region
51
Q

Questions about independence…

  1. Independent events…
  2. Not independent…
A

Use “and” and the multiplication rule

  1. Multiply them all together
  2. P (A and B) = P(A|B) x P(B)

P (B and A) = P(B|A) x P(A)
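
Worked example (a sketch; coin tosses and a standard 52-card deck are illustrative choices): independent events multiply directly, while dependent events need the conditional probability.

```python
from fractions import Fraction

# Independent: two tosses of a fair coin. P(H and H) = P(H) x P(H)
p_two_heads = Fraction(1, 2) * Fraction(1, 2)

# Not independent: drawing two aces from a 52-card deck without replacement.
p_first_ace = Fraction(4, 52)                 # P(A)
p_second_ace_given_first = Fraction(3, 51)    # P(B|A): one ace and one card are gone
p_two_aces = p_second_ace_given_first * p_first_ace   # P(A and B) = P(B|A) x P(A)

print(p_two_heads, p_two_aces)
```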

52
Q

Bayes’ Theorem

  1. When is it used
  2. P(A) vs P(B|A)
  3. Importance
A
  1. When multiplicative events are not independent
  2. P(A) = prior probability (known before calculation)

P(B|A) = posterior probability (only known after calculation)

  3. Helps investigators determine the other pertinent probability when only one is known
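
Worked example (a sketch with made-up screening numbers): Bayes' theorem turns a known conditional probability into the reversed one. Here D = "has the disease" and + = "tests positive"; the prevalence, sensitivity, and false-positive rate are hypothetical.

```python
from fractions import Fraction

p_disease = Fraction(1, 100)              # prior: P(D), hypothetical prevalence
p_pos_given_disease = Fraction(95, 100)   # P(+|D), hypothetical sensitivity
p_pos_given_healthy = Fraction(5, 100)    # P(+|not D), hypothetical false-positive rate

# Total probability of a positive test, over diseased and healthy people.
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Bayes' theorem: P(D|+) = P(+|D) x P(D) / P(+)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos

print(p_disease_given_pos, float(p_disease_given_pos))
```

With a rare condition, most positives come from the large healthy group, so the updated probability stays well below the sensitivity.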
53
Q

How do you figure out the population mean?

A

You can use a sample and will be very close

54
Q

Unbiased vs. Biased Estimates

A

Unbiased: if the sampling distribution of a sample statistic has a mean equal to the population parameter that the statistic is intended to estimate

Biased: if the mean of the sampling distribution is not equal to the parameter

55
Q

Central Limit Theorem

A

As the sample size gets large enough, the sampling distribution of the sample mean becomes approximately normal

***Justifies Inferential Statistics***
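
Simulation (a minimal sketch; the exponential population, sample size of 50, and seed are illustrative assumptions): even when the population is strongly right-skewed, the means of repeated samples pile up symmetrically around the population mean.

```python
import random
import statistics

rng = random.Random(42)  # seeded so the run is repeatable

def sample_mean(n):
    """Mean of n draws from a right-skewed population (exponential, mean 1, SD 1)."""
    return statistics.mean(rng.expovariate(1.0) for _ in range(n))

# 2000 repeated samples of size 50; each contributes one sample mean.
means = [sample_mean(50) for _ in range(2000)]

print(statistics.mean(means))     # close to the population mean, 1
print(statistics.median(means))   # close to the mean, as in a symmetric distribution
print(statistics.stdev(means))    # close to 1 / sqrt(50)
```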

56
Q

Confidence Interval for a Population Mean: Normal (z) Statistic

  1. Finds what?
A

Finds the range over which the population parameter MIGHT be found

***A range of plausible values for the population parameter***
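
Worked example (a sketch for a large random sample; the sample mean, SD, and n are made up): a z-based interval is the sample mean plus or minus z times s divided by the square root of n.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, sample_sd, n, confidence=0.95):
    """Large-sample confidence interval for a population mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    half_width = z * sample_sd / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

# Hypothetical sample: mean 120, SD 15, n = 100.
low, high = z_confidence_interval(120.0, 15.0, 100)
print(low, high)
```

Raising the confidence level or shrinking n widens the interval.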

57
Q

What does a 95% Confidence Level indicate?

A

In the long run, 95% of our confidence intervals will contain μ (the population mean) and 5% will not

58
Q

What are 2 conditions required for a Valid Confidence Interval for μ?

A
  1. A Random Sample is selected from the target population
  2. The sample size n is LARGE
    - Due to the Central Limit Theorem, this condition guarantees that the sampling distribution of x̄ is approximately normal

Also, for large n, s will be a good estimator of σ (population standard deviation)

59
Q

Student’s t-statistic

A

Has a sampling distribution very much like that of the z-statistic (mound shaped, symmetric, with mean 0)

***Primary difference is that t-statistic is more variable than z-statistic***

60
Q

Degrees of Freedom (df)

A

Actual amount of variability in the sampling distribution of t depends on the sample size, n

T-statistic has (n-1) degrees of freedom

61
Q

What happens as Degrees of Freedom (df) go down?

A

The t-distribution flattens out

62
Q

Sampling Error

A

A way of expressing the reliability associated with a confidence interval for the population mean, μ

Sampling Error (SE) is equal to the half-width of the confidence interval

63
Q

What is a Hypothesis?

A

A statement about the numerical value of a population parameter

64
Q

Null Hypothesis (H0)

A

The hypothesis that will be accepted unless the data provide convincing evidence that it is false. This usually represents the “status quo” or some claim about the population parameter that the researcher wants to test

65
Q

Alternative Hypothesis (Ha)

A

The hypothesis that will be accepted only if the data provide convincing evidence of its truth. This usually represents the values of a population parameter for which the researcher wants to gather evidence to support

***Opposite of the null hypothesis***

66
Q

When do we use Hypothesis Testing?

A
  1. Observational Studies
    - Find the “true” population parameter
    (e.g. what is the prevalence of AIDS in some community)

***1 sample***

  2. Clinical Trials
    - Compare Group 1 to Group 2 or
    - Compare Baseline state to post-intervention state

***2 sample tests - Independent Samples***

67
Q

Test Statistic

A

A sample statistic, computed from information provided in the sample, that the researcher uses to decide between the null and alternative hypotheses

68
Q

Type I Error

A

Occurs if the researcher rejects the null hypothesis in favor of the alternative when, in fact, the null hypothesis is true. The probability of committing a Type I error is denoted by α (alpha)

***The level of α is usually small and is referred to as the level of significance of the test***

69
Q

Rejection Region

A

The set of possible values of the test statistic for which the researcher will reject H0 in favor of Ha

70
Q

Type II Error

A

Occurs if the researcher accepts the null hypothesis when, in fact, it is false. The probability of committing a Type II error is denoted by β (beta)

71
Q

How do you identify the null hypothesis?

A

It will always have an equality sign

72
Q

What is a p-value?

A

The observed significance level for a specific statistical test is the probability (assuming H0 is true) of observing a value of the test statistic that is at least as contradictory to the null hypothesis, and supportive of the alternative hypothesis, as the actual one computed from the sample data

***Used to make rejection decision***

73
Q

What does a p-value ≥ α mean?

A

DO NOT reject H0

74
Q

What does a p-value < α mean?

A

REJECT H0
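
Worked example (a sketch assuming a two-sided z-test; the function names are illustrative): the p-value is compared with the significance level, and H0 is rejected only when the p-value is smaller.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """Probability, assuming H0 is true, of a z statistic at least this far from 0."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def decide(p_value, alpha=0.05):
    """Decision rule from the two cards above: reject only when p-value < alpha."""
    return "reject H0" if p_value < alpha else "do not reject H0"

for z in (1.0, 1.96, 2.5):
    p = two_sided_p_value(z)
    print(z, p, decide(p))
```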

75
Q

Where is the Confidence in hypothesis testing?

A

Confidence is in the testing process, NOT in the particular result of a single test

76
Q

Strength of Correlation

A

Reflects how consistently scores for each factor change

77
Q

Regression Line

A

The best-fitting straight line to a set of data points. The best-fitting line is the one that minimizes the distance between the line and all the data points (under the usual least-squares criterion, the sum of squared vertical distances)
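
Worked example (a sketch assuming the ordinary least-squares criterion; the study-hours data are made up): the slope and intercept of the best-fitting line can be computed directly from deviations about the means.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x   # the line passes through (mean_x, mean_y)
    return slope, intercept

# Hypothetical data: hours studied and exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

slope, intercept = fit_line(hours, scores)
print(slope, intercept)
```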

78
Q

Numerical Measure of Correlation: Pearson Correlation Coefficient

A

The Pearson (product moment) correlation coefficient (r)- used to measure the direction and strength of the linear relationship of two factors in which the data for both factors are measured on an interval or ratio scale of measurement

Numerator –> Covariance (extent to which X and Y vary together)

Denominator –> Extent to which X and Y vary independently or separately
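
Worked example (a sketch with made-up data): r computed as "how X and Y vary together" divided by "how X and Y vary separately".

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired interval/ratio data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: how X and Y vary together (sum of cross-products of deviations).
    together = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: how X and Y vary separately (from their sums of squared deviations).
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return together / math.sqrt(ss_x * ss_y)

# Hypothetical data: hours studied and exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

r = pearson_r(hours, scores)
print(r)
```

The sign of r gives the direction of the linear relationship, and its distance from 0 (up to 1) gives the strength.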

79
Q

Regression Analysis

A

Statistical procedure used to determine the equation of a regression line to a set of data points and to determine the extent to which the regression equation can be used to predict values of one variable, given known values of a second factor in a population

  • One quantitative dependent variable
  • One or more quantitative or qualitative (binary) independent variables
80
Q

Regression Analysis –>

Logistic Regression –>

A

Regression Analysis (Quantitative DV)

Logistic Regression (Qualitative DV)

-Yes or no, Male or Female

81
Q

What do rows and columns represent in a data table?

A

Rows = Cases

Columns = Variables

82
Q

What type of data do proportions summarize?

A

Nominal and Ordinal

(i.e. Qualitative data)

83
Q

How are rates different from proportions?

A

They are similar to proportions EXCEPT a multiplier (e.g. 100, etc.) is used

***They have a time reference - are computed over a known/given period of time***

84
Q

Vital Statistics Rates

A

Also known as demographic measures

***Describe the health status of a population***

e.g. Mortality Rates (Crude, Specific) and Morbidity Rates

85
Q

Crude Mortality Rate

A

Number of all deaths in a given geography over a given year divided by the total population of the geography during the same year

86
Q

Specific Mortality

A

Relates to specific populations within the geographic region

87
Q

What is Morbidity Rate also known as?

A

Prevalence or Prevalence Rate

88
Q

Incidence

A

The number of new cases that have occurred during a given interval of time divided by the total population at risk

89
Q

What are Adjusting Rates used for?

A

To make a fair comparison between different populations and to avoid Confounding

90
Q

Examples of Confounding Factors

A

Age composition, Gender composition, Race/ethnic composition of a population

91
Q

Absolute Risk Reduction (ARR)

A

The reduction in risk (by the experiment) compared with the baseline risk

92
Q

Number Needed to Treat (NNT)

A

The number needed to treat in order to prevent one event

93
Q

What is the reciprocal of the absolute value of NNT?

A

Absolute Risk Increase or the Number Needed to Harm

94
Q

Relative Risk Reduction (RRR)

A

The amount of risk reduction relative to the baseline risk
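
Worked example (a sketch; the trial event rates are made up): ARR, RRR, and NNT all come from the event rates in the control and treatment groups, with NNT taken as the reciprocal of ARR.

```python
from fractions import Fraction

# Hypothetical trial: 20% of controls and 15% of treated patients have the event.
control_event_rate = Fraction(20, 100)     # baseline risk
treatment_event_rate = Fraction(15, 100)

arr = control_event_rate - treatment_event_rate   # absolute risk reduction
rrr = arr / control_event_rate                    # reduction relative to baseline risk
nnt = 1 / arr                                     # number needed to treat

print(float(arr), float(rrr), float(nnt))
```

The same ARR looks more impressive as an RRR when the baseline risk is low, which is why both are usually reported together.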

95
Q

Relative Risk

What types of studies is it mainly used in?

A

The ratio of the incidence of a disease in people who are exposed to a risk to the incidence in people without exposure to the risk

Mainly used in cohort studies

(Prospective)

96
Q

Odds Ratio

What type of study is this used in?

A

The odds that a person with the disease was exposed to a potential cause of the disease relative to the odds that a person without the disease was exposed to the potential cause

Mainly used in a case/control study

(Retrospective)

97
Q

What does a RR or OR <1, >1, or =1 mean?

A

< 1 = Protective exposure

> 1 = Risky exposure

= 1 = No effect
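
Worked example (a sketch; the 2x2 counts are made up): RR and OR computed from the same table. The cell names a, b, c, d are a common labeling convention, used here as an assumption.

```python
from fractions import Fraction

# Hypothetical 2x2 table:
#               disease   no disease
# exposed        a = 30     b = 70
# unexposed      c = 10     d = 90
a, b, c, d = 30, 70, 10, 90

# Relative risk: incidence in the exposed divided by incidence in the unexposed.
relative_risk = Fraction(a, a + b) / Fraction(c, c + d)

# Odds ratio: odds of exposure among cases (a/c) divided by odds among non-cases (b/d).
odds_ratio = Fraction(a, c) / Fraction(b, d)

print(float(relative_risk), float(odds_ratio))
```

Both values are above 1 here, which the card above reads as a risky exposure.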

98
Q

Inference (on RR and OR)

A

Inference is possible using the normal distribution

RR and OR themselves do not follow a normal distribution

The natural logs of RR and OR do follow an approximately normal distribution

***Need to transform to generate inferential statistics***
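
Worked example (a sketch; the counts are made up and the standard-error formula is the commonly used large-sample one for log RR, an assumption not stated on the card): the interval is built on the log scale and then transformed back.

```python
import math
from statistics import NormalDist

def relative_risk_ci(a, b, c, d, confidence=0.95):
    """RR with a confidence interval computed on the natural-log scale.

    a, b = exposed with / without disease; c, d = unexposed with / without disease.
    """
    rr = (a / (a + b)) / (c / (c + d))
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    low = math.exp(math.log(rr) - z * se_log_rr)    # back-transform the lower limit
    high = math.exp(math.log(rr) + z * se_log_rr)   # back-transform the upper limit
    return rr, low, high

rr, low, high = relative_risk_ci(30, 70, 10, 90)
print(rr, low, high)
```

An interval that excludes 1 is evidence against "no effect"; note that the back-transformed interval is not symmetric around RR.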

99
Q

When can you reject the null hypothesis?

A

When the p-value involves less error than you were willing to commit (the significance level, α)

p-value of 0.03

significance level of 0.05

***Can reject the null hypothesis in this case***