Exam 1: Biostatistics Flashcards

Question

What does **Normal Distribution** mean?

Answer 1

Mean = Median = Mode

Answer 2

**Mean = 0** First SD is +/- 1

Answer 3

**n-1** = Sample Variance **n** = Population Variance
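
The two divisors can be compared with a minimal sketch using Python's standard `statistics` module. The data values below are hypothetical and chosen only for illustration.

```python
import statistics

# Hypothetical sample of five measurements (illustrative values only).
data = [4.0, 8.0, 6.0, 5.0, 7.0]

# Sample variance: sum of squared deviations divided by n - 1.
sample_var = statistics.variance(data)

# Population variance: the same sum divided by n.
population_var = statistics.pvariance(data)

print(sample_var, population_var)
```

The sample variance is always the larger of the two, because the same sum of squares is divided by a smaller number.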

Answer 4

Left Skewed Mean \< Median

Answer 5

Right-Skewed Mean \> Median

Answer 6

**Applies to**: Data sets that are **mound-shaped** and **symmetric** (i.e. **_Normal Distributions_**)

- **68%** of measurements lie within **one** SD of the mean (x̄ - s to x̄ + s) **z-score = between -1 and 1**
- _95%_ of measurements lie within _two_ SDs of the mean (x̄ - 2s to x̄ + 2s) **z-score = between -2 and 2**
- *99.7%* of measurements lie within *three* SDs of the mean (x̄ - 3s to x̄ + 3s) **z-score = between -3 and 3**
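
The 68 / 95 / 99.7 percentages can be checked against the normal distribution itself. This is a minimal sketch, assuming an exactly normal variable; the helper name `normal_within` is hypothetical.

```python
import math

def normal_within(k: float) -> float:
    """Probability that a normal value lies within k SDs of its mean."""
    # For a standard normal Z, P(-k < Z < k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(normal_within(k), 4))
```

The rounded percentages on the card are approximations of these probabilities.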

Answer 7

**Lower:** 58% **Higher:** 42%

Answer 8

Describes the relative location of a measurement compared to the rest of the data Measures the **number of standard deviations** away from the mean a data value is located
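
As a small illustration, a z-score is the distance from the mean expressed in standard deviations. The function name and the numbers below are hypothetical.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

# Hypothetical measurement of 130 in a data set with mean 100 and SD 15.
print(z_score(130, 100, 15))
```

A positive result means the value lies above the mean; a negative result means it lies below.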

Answer 9

If an experiment is repeated **n times** under identical conditions and if the event **A** occurs **m** times, then as **n** grows, the ratio m/n approaches a fixed limit called the probability of **A**: P(A) = limit of m/n as n grows without bound **"Law of Large Numbers"**
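
The idea can be sketched with a simulated fair coin, assuming P(heads) = 0.5. The function name `relative_frequency` and the seed are hypothetical choices made so the run is repeatable.

```python
import random

def relative_frequency(n: int, seed: int = 0) -> float:
    """Ratio m/n of simulated heads (m) in n tosses of a fair coin."""
    rng = random.Random(seed)  # fixed seed so results are repeatable
    m = sum(1 for _ in range(n) if rng.random() < 0.5)
    return m / n

# The ratio m/n settles near 0.5 as n grows.
for n in (10, 1_000, 100_000):
    print(n, relative_frequency(n))
```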

Answer 10

Frequency of times an outcome occurs **divided by** the total number of possible outcomes (symbolized as *p*)

Answer 11

Any event whose observed outcomes involve uncertainty or can vary (predicted by **Probability**)

Answer 12

For a **fixed event**

Answer 13

1. An occurrence due to nature 2. A collection of one or more outcomes of an experiment

Answer 14

Simple = Single occurrence Compound = Result of operations -Define relationships between or combination of event occurrences

Answer 15

1. Intersection 2. Union 3. Complement

Answer 16

The intersection is defined as **"both A and B"** Represented by **A ∩ B**

Answer 17

Union is defined as **"either A or B or both A and B"** Represented by **A ∪ B**

Answer 18

Defined as **"Not A"** Denoted by **Aᶜ or Ā**

Answer 19

Two events A and B that **cannot** occur simultaneously are said to be **mutually exclusive or disjoint** e.g. The probability of a newborn weighing under 2000 grams is 0.025 and over is 0.043 \*\*\*Simply **add the probabilities of the individual events**\*\*\*

Answer 20

This is used when there is a **common region**; must **_subtract out common region_**

Answer 21

Two **unrelated** events \*\*\*When expressing the joint probability of independent events, the conditional term in the general rule of **multiplication** _drops out_: P(A and B) = P(A) x P(B)\*\*\*

Answer 22

e.g. tossing a coin Second toss has nothing to do with the first

Answer 23

Use **"or"** and the **additive rule** 1. ME: add them all up 2. Not ME: Subtract out **common region**
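
Both cases of the additive rule fit in one formula, P(A or B) = P(A) + P(B) - P(A and B), where the common region is zero for mutually exclusive events. A minimal sketch follows; the function name `prob_union` is hypothetical.

```python
def prob_union(p_a: float, p_b: float, p_a_and_b: float = 0.0) -> float:
    """P(A or B). Leave p_a_and_b at 0 for mutually exclusive events."""
    return p_a + p_b - p_a_and_b

# Mutually exclusive: two separate birth weight categories.
print(prob_union(0.025, 0.043))

# Not mutually exclusive (hypothetical numbers): subtract the common region.
print(prob_union(0.5, 0.5, 0.25))
```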

Answer 24

Use **"and"** and the **multiplication rule** 1. Multiply them all together 2. P (A and B) = P(A|B) x P(B) P (B and A) = P(B|A) x P(A)
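
A short worked example of the general multiplication rule for events that are not independent. The card-drawing scenario is an assumption added for illustration: A is "first card is an ace" and B is "second card is an ace", drawn without replacement.

```python
from fractions import Fraction

# P(A): 4 aces among 52 cards.
p_a = Fraction(4, 52)

# P(B|A): 3 aces left among the remaining 51 cards.
p_b_given_a = Fraction(3, 51)

# General multiplication rule: P(A and B) = P(B|A) x P(A).
p_a_and_b = p_b_given_a * p_a

print(p_a_and_b)
```

Exact fractions are used here so the result is not blurred by floating-point rounding.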

Answer 25

1. When **multiplicative** events are **not independent** 2. P(A) = prior probability (**_known before_** calculation) P(B|A) = posterior probability (only **_known after_** calculation) 3. Helps investigators determine the other pertinent probability when only one is known

Answer 26

You can use a sample and will be **very close**

Answer 27

**Unbiased:** if the sampling distribution of a sample statistic has a mean **equal to** the population parameter that the statistic is intended to estimate **Biased:** if the mean of the sampling distribution is **not equal to** the parameter

Answer 28

As sample size gets **large enough**, the **sampling distribution** becomes _almost normal_ \*\*\***Justifies Inferential Statistics**\*\*\*

Answer 29

Finds the **range** over which the population parameter **MIGHT** be found \*\*\*A **range of plausible values** for the **population parameter**\*\*\*

Answer 30

In the long run, 95% of our confidence intervals will contain **μ** (the **population mean**) and 5% will not

Answer 31

1. A **Random Sample** is selected from the target population 2. The sample size **n** is **LARGE** - Due to the **Central Limit Theorem** this condition guarantees that the sampling distribution of x̄ is approximately normal. Also, for large n, s will be a good estimator of σ (population standard deviation)
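
Under these two conditions, the usual large-sample interval is x̄ ± z·s/√n. The sketch below assumes a 95% confidence level (z = 1.96); the function name `large_sample_ci` and the data are hypothetical.

```python
import math
import statistics

def large_sample_ci(sample, z: float = 1.96):
    """Large-sample confidence interval for the population mean."""
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)        # sample SD (n - 1 divisor)
    half_width = z * s / math.sqrt(n)   # half the width of the interval
    return mean - half_width, mean + half_width

# Hypothetical sample of 100 measurements.
sample = list(range(1, 101))
print(large_sample_ci(sample))
```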

Answer 32

Has a sampling distribution very much like that of the **z-statistic** (mound shaped, symmetric, with mean 0) \*\*\*Primary difference is that t-statistic is more variable than z-statistic\*\*\*

Answer 33

Actual amount of **variability** in the sampling distribution of **t** depends on the **sample size, n** T-statistic has **(n-1)** degrees of freedom

Answer 34

The t-distribution **flattens out**

Answer 35

A way of expressing the **reliability** associated with a confidence interval for the population mean, μ **Sampling Error (SE)** is equal to **half-width** of the **confidence interval**

Answer 36

A statement about the **numerical value** of a _population parameter_

Answer 37

The hypothesis that will be accepted unless the data provide convincing evidence that it is false. This usually represents the **"status quo"** or some claim about the population parameter that the researcher wants to test

Answer 38

The hypothesis that will be accepted only if the data provide convincing evidence of its truth. This usually represents the values of a population parameter for which **the researcher _wants to gather evidence to support_** \*\*\***Opposite** of the null hypothesis\*\*\*

Answer 39

1. **_Observational Studies_** - Find the **"true" population parameter** (e.g. what is the prevalence of AIDS in some community) **\*\*\*1 sample\*\*\*** 2. **_Clinical Trials_** - Compare Group 1 to Group 2 or - Compare Baseline state to post-intervention state **\*\*\*2 sample tests - _Independent Samples_\*\*\***

Answer 40

A sample statistic, computed from information provided in the sample, that the researcher uses to **_decide between_** the null and alternative hypotheses

Answer 41

Occurs if the researcher rejects the null hypothesis in favor of the alternative when, in fact, the **null hypothesis is _true_**. The probability of committing a Type I error is denoted by **α (alpha)** \*\*\*The level of α is usually small and is referred to as the **level of significance** of the test\*\*\*

Answer 42

The set of possible values of the test statistic for which the researcher will reject **H₀** in favor of **H_a**

Answer 43

Occurs if the researcher **accepts the null hypothesis** when, in fact, it is **false**. Probability of committing a Type II error is denoted by **β (beta)**

Answer 44

It will always have an **equality sign**

Answer 45

The **observed significance level** for a specific statistical test is the probability (assuming H₀ is true) of observing a value of the test statistic that is at least as contradictory to the null hypothesis, and supportive of the alternative hypothesis, as the actual one computed from the sample data

**\*\*\*Used to make _rejection decision_\*\*\***

Answer 46

**DO NOT** reject H₀

Answer 47

**REJECT** H₀

Answer 48

Confidence is in the **testing process, _NOT_** in the particular result of a single test

Answer 49

Reflects how consistently scores for each factor change

Answer 50

The **best fitting straight line** to a set of data points. A best fitting line is the line that minimizes the total distance between itself and the data points (in least squares regression, the sum of the squared vertical distances)
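
A minimal sketch of fitting such a line, assuming the ordinary least squares criterion (minimizing the sum of squared vertical distances). The function name `fit_line` and the data points are hypothetical.

```python
def fit_line(xs, ys):
    """Least squares slope and intercept for the line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical points that lie exactly on y = 1 + 2x.
print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))
```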

Answer 51

**The Pearson (product moment) correlation coefficient (r)** - used to measure the _direction_ and _strength_ of the linear relationship between two factors when the data for both factors are measured on an interval or ratio scale of measurement

- Numerator --\> **Covariance** (extent to which X and Y vary _together_)
- Denominator --\> extent to which X and Y vary _independently_ or _separately_
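
The numerator and denominator can be made concrete with a small sketch. The function name `pearson_r` and the data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: joint variation divided by separate variation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: how X and Y vary together.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: how X and Y vary separately.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # perfectly increasing line
print(pearson_r([1, 2, 3], [3, 2, 1]))        # perfectly decreasing line
```

The sign of r gives the direction of the relationship and its magnitude gives the strength.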

Answer 52

Statistical procedure used to determine the equation of a regression line to a set of data points and to determine the **extent to which the regression equation can be used to predict values of one variable**, given known values of a second factor in a population

- One quantitative dependent variable
- One or more quantitative or qualitative (**binary**) independent variables

Answer 53

Regression Analysis (**Quant**itative DV) Logistic Regression (**Qual**itative DV) -Yes or no, Male or Female

Answer 54

Rows = **Cases** Columns = **Variables**

Answer 55

**Nominal** and **Ordinal** (i.e. **Qual**itative data)

Answer 56

They are similar to proportions EXCEPT a **multiplier (e.g. 100, etc.) is used** \*\*\*They have a **time reference** - are computed over a known/given period of time\*\*\*

Answer 57

Also known as **demographic measures** \*\*\*Describe the **health status of a population**\*\*\* e.g. Mortality Rates (Crude, Specific) and Morbidity Rates

Answer 58

Number of **all deaths** in a given geography over a given year **divided by** the total population of the geography during the same year
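
A minimal sketch of a crude rate, assuming a multiplier of 1,000 so the result reads as deaths per 1,000 population per year. The function name `crude_rate` and the counts are hypothetical.

```python
def crude_rate(events: int, population: int, multiplier: int = 1000) -> float:
    """Events divided by total population, scaled by a multiplier."""
    return events / population * multiplier

# Hypothetical: 850 deaths in one year in a population of 100,000.
print(crude_rate(850, 100_000))
```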

Answer 59

Relates to **specific populations** within the geographic region

Answer 60

**Prevalence** or **Prevalence Rate**

Answer 61

The number of **new cases** that have occurred during a given interval of time **divided by** the total population at risk

Answer 62

To make a **fair** comparison between different populations and to **avoid _Confounding_**

Answer 63

Age composition, Gender composition, Race/ethnic composition of a population

Answer 64

The **reduction in risk** (by the experiment) compared with the **baseline risk**

Answer 65

The number needed to treat in order to **prevent _one_ event**

Answer 66

Absolute Risk Increase or the **Number Needed to Harm**

Answer 67

The amount of risk reduction relative to the baseline risk
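
The three risk measures are linked by simple arithmetic: ARR = baseline risk - treated risk, RRR = ARR / baseline risk, and NNT = 1 / ARR. The sketch below uses hypothetical risks, and the function name `risk_measures` is an assumption.

```python
def risk_measures(control_risk: float, treated_risk: float):
    """Return (ARR, RRR, NNT) for a baseline risk and a treated risk."""
    arr = control_risk - treated_risk   # absolute risk reduction
    rrr = arr / control_risk            # reduction relative to baseline risk
    nnt = 1 / arr                       # number needed to treat
    return arr, rrr, nnt

# Hypothetical trial: 20% event risk in controls, 15% in the treated group.
print(risk_measures(0.20, 0.15))
```

If the treated risk is higher than the baseline risk, ARR turns negative, which corresponds to an absolute risk increase and a number needed to harm.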

Answer 68

The ratio of the **incidence of a disease** **in people who are exposed** to a risk to the incidence of **people without exposure to risk** Mainly used in **cohort studies** (Prospective)

Answer 69

The odds that a person with the disease was exposed to a potential cause of the disease relative to the odds that a person without the disease was exposed to the potential cause Mainly used in a **case/control study** (Retrospective)
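
Both ratios can be computed from a 2x2 table. In this sketch the cell labels are an assumed convention: a = exposed with disease, b = exposed without disease, c = unexposed with disease, d = unexposed without disease. The counts are hypothetical.

```python
def relative_risk(a: int, b: int, c: int, d: int) -> float:
    """Incidence in the exposed divided by incidence in the unexposed."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds of exposure among cases divided by odds of exposure among non-cases."""
    return (a * d) / (b * c)

# Hypothetical table: 30 of 100 exposed and 10 of 100 unexposed develop disease.
print(relative_risk(30, 70, 10, 90))
print(odds_ratio(30, 70, 10, 90))
```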

Answer 70

- \< 1 = **Protective** exposure
- \> 1 = **Risky** exposure
- = 1 = **No effect**

Answer 71

Inference is possible using the **normal distribution**. RR and OR distributions do not themselves follow a normal distribution, but the distributions of the **natural log** of RR and OR **do** follow the **normal distribution** \*\*\*Need to **transform** to generate inferential statistics\*\*\*
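
A sketch of the log transformation in practice for the relative risk, assuming a 95% level (z = 1.96) and the commonly used standard error of ln(RR), sqrt(1/a - 1/(a+b) + 1/c - 1/(c+d)). The cell labels (a, b = exposed with/without disease; c, d = unexposed with/without disease), the function name, and the counts are all hypothetical.

```python
import math

def relative_risk_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Relative risk with a confidence interval built on the log scale."""
    rr = (a / (a + b)) / (c / (c + d))
    # Standard error of ln(RR).
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    # Build the interval for ln(RR), then transform back with exp().
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

print(relative_risk_ci(30, 70, 10, 90))
```

Because the interval is symmetric on the log scale, it is not symmetric around RR after transforming back.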

Answer 72

When the p-value involves **less error** than you were willing to commit (the **significance level, α**), e.g. a p-value of **0.03** with a significance level of **0.05** \*\*\***Can reject** the null hypothesis in this case\*\*\*
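
The decision rule can be sketched in a few lines. The p-value calculation assumes a two-sided test based on a z-statistic, which is only one of several possible tests; both function names are hypothetical.

```python
import math

def two_sided_p_value(z: float) -> float:
    """P(|Z| >= |z|) for a standard normal Z, assuming H0 is true."""
    return math.erfc(abs(z) / math.sqrt(2))

def reject_null(p_value: float, alpha: float = 0.05) -> bool:
    """Reject H0 when the p-value is below the significance level."""
    return p_value < alpha

print(reject_null(0.03, 0.05))   # the example on this card
print(two_sided_p_value(1.96))   # close to 0.05
```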

(99 cards)