Sampling, Data Description & Probability Flashcards

1
Q

Sampling error

A

the difference between a result measured in a sample and the true underlying population value (the value you would get if you measured everyone). This error may result from biased selection (i.e. sampling bias) or from random variation.

2
Q

biased selection (i.e. sampling bias)

A

sampled subjects are not representative of the population. Bias can be minimized through good study design.

3
Q

random variation

A

variation due strictly to chance; the sample-to-sample variability that we expect in any study, even with unbiased selection of subjects.

4
Q

Types of samples

A

Sample of Convenience, Random sample, Stratified sampling

5
Q

Sample of Convenience

A

take who you can get:

  • easy to obtain
  • bias may be a problem
6
Q

Random sample

A

every individual in the population has an equal chance of being in the sample:

  • used to ensure that uncontrolled factors do not bias results
  • may be difficult to obtain
7
Q

Stratified sampling

A

a sample is drawn within each of two or more strata (groups sharing a common characteristic):

  • used to improve accuracy of results in certain circumstances
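
The difference between a random sample and a stratified sample can be sketched in a few lines of Python. This is only an illustration: the population, the stratum labels, and the helper names `simple_random_sample` and `stratified_sample` are made up for the example, and the stratified version assumes the same number of subjects is drawn from every stratum.

```python
import random

def simple_random_sample(population, k, seed=0):
    """Draw k subjects so that every subject has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, k)

def stratified_sample(population, stratum_of, k_per_stratum, seed=0):
    """Draw k_per_stratum subjects at random within each stratum."""
    rng = random.Random(seed)
    strata = {}
    for subject in population:
        strata.setdefault(stratum_of(subject), []).append(subject)
    return {name: rng.sample(members, k_per_stratum)
            for name, members in strata.items()}

# Hypothetical population: 90 subjects under 50 and 10 subjects aged 50+
population = [("under50", i) for i in range(90)] + [("50plus", i) for i in range(10)]

srs = simple_random_sample(population, 10)                 # may contain few or no 50+ subjects
strat = stratified_sample(population, lambda s: s[0], 5)   # always 5 from each stratum
```

With a small stratum such as the 50+ group here, the stratified draw guarantees that group is represented, which a simple random sample does not.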

8
Q

Categorical variables

A

values fit into natural categories

Examples: gender, disease status, vital status, type of bone break (hairline, simple, etc.).

9
Q

discrete variables

A

ordered numerical data restricted to integer values (count data)
Examples: # of siblings, # of days hospitalized, # of pregnancies

10
Q

continuous variables

A

numerical data that can take on any value within a range. Often limited by the precision of the measuring instrument (e.g. height to the nearest 1/4 inch)
Examples: age, height, weight, cholesterol, blood pressure

11
Q

Frequency distribution

A

a table of categories along with their observed frequencies.

a. categories may be natural (e.g. gender, race, type of fracture) or they may be created from continuous variables by grouping values together (e.g. age < 21 yrs, 21-49, 50+)
b. percentages are often included (relative frequencies)
c. may also include cumulative percentages
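
As a rough sketch, a frequency distribution with relative and cumulative percentages might be built like this. The ages are invented, the age groups follow the example in (a), and `frequency_table` and `age_group` are hypothetical helper names.

```python
from collections import Counter

def frequency_table(values, order):
    """Return rows of (category, frequency, percent, cumulative percent)."""
    counts = Counter(values)
    n = len(values)
    rows, cumulative = [], 0
    for category in order:
        count = counts[category]
        cumulative += count
        rows.append((category, count, 100 * count / n, 100 * cumulative / n))
    return rows

def age_group(age):
    """Group a continuous variable (age in years) into categories."""
    if age < 21:
        return "<21"
    if age < 50:
        return "21-49"
    return "50+"

ages = [18, 25, 34, 47, 52, 61, 19, 33, 70, 45]  # made-up data
table = frequency_table([age_group(a) for a in ages], ["<21", "21-49", "50+"])
for row in table:
    print(row)
```

The last row's cumulative percentage is always 100, which is a quick check that every observation landed in some category.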

12
Q

Distribution shapes

A

Unimodal and symmetric (Mean = Median = Mode), Bimodal, Skewed left (Mean < Median)/right (Mean > Median)
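
The link between skew and the mean/median ordering can be checked on small made-up data sets. The numbers below are purely illustrative: one large value creates a long right tail and pulls the mean above the median, and the mirror image does the opposite.

```python
import statistics

right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]      # long tail to the right
left_skewed = [-x for x in right_skewed]      # mirror image: long tail to the left

right_mean, right_median = statistics.mean(right_skewed), statistics.median(right_skewed)
left_mean, left_median = statistics.mean(left_skewed), statistics.median(left_skewed)

print(right_mean, right_median)  # mean 4.75, median 3
print(left_mean, left_median)    # mean -4.75, median -3
```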

13
Q

Mean (x̄)

A

the sum of all observations divided by n, the number of subjects

14
Q

Median

A

Half the values are below the median and half are above. The middle-most observation of ordered data. If the data are ordered from smallest to largest, the median is

  • the observation in the middle of the list if n is odd
  • the mean of the two middle observations if n is even
15
Q

Mode

A

the most frequently occurring observation(s) in the sample

16
Q

Range

A

difference between the lowest and highest observations

17
Q

Standard Deviation (sd or s)

A

a measure of spread: roughly the average distance of the observations from the mean
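
The summary measures on the preceding cards (mean, median, mode, range, standard deviation) can be computed with Python's standard `statistics` module. The data are made up, and the sketch assumes the sample standard deviation (divisor n - 1), which is what `statistics.stdev` computes.

```python
import statistics

data = [2, 4, 4, 5, 7, 8]  # made-up sample, n = 6 (even)

mean = statistics.mean(data)        # sum of observations / n = 30 / 6
median = statistics.median(data)    # n is even: mean of the two middle values, 4 and 5
modes = statistics.multimode(data)  # most frequent observation(s), returned as a list
data_range = max(data) - min(data)  # highest minus lowest observation
sd = statistics.stdev(data)         # sample standard deviation, divisor n - 1

print(mean, median, modes, data_range, round(sd, 3))
```

Here the squared deviations from the mean sum to 24, so the sample variance is 24 / 5 = 4.8 and the standard deviation is its square root.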

18
Q

Characteristics of a normal distribution

A
  • mean, median, and mode are equal
  • approximately:
    68% of its area lies within 1 std. dev. of mean
    95% of its area lies within 1.96 std. dev. of mean
    99% of its area lies within 2.58 std. dev. of mean
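
The three approximate areas can be checked against the standard normal distribution using `statistics.NormalDist` from the Python standard library. The helper name `area_within` is just for this sketch.

```python
from statistics import NormalDist

def area_within(z):
    """Area under the standard normal curve within z std. dev. of the mean."""
    std_normal = NormalDist(mu=0, sigma=1)
    return std_normal.cdf(z) - std_normal.cdf(-z)

areas = {z: area_within(z) for z in (1, 1.96, 2.58)}
for z, area in areas.items():
    print(f"within {z} std. dev.: {area:.4f}")
```

The printed areas come out close to 0.68, 0.95, and 0.99, matching the rounded percentages on the card.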
19
Q

Normal Range AKA Normal Limits

A

Can be computed from data and gives an estimate of where a certain percentage (usually 95%) of the population values lie. It is a statistical measure and may not adequately reflect “normal” in a clinical sense.

20
Q

Confidence Interval (CI) AKA Confidence Limits

A

An interval that describes where the population mean is likely to be with a certain level of confidence (usually 95%). If the interval is narrow, we can feel that we have a good (precise) estimate of the population mean.
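
The contrast between a normal range and a confidence interval can be sketched as follows. The cards do not give formulas, so this assumes the common textbook versions based on the normal approximation: mean ± 1.96 × sd for the normal range and mean ± 1.96 × sd/√n for the 95% CI of the mean (small samples would usually call for a t multiplier instead of 1.96). The cholesterol values and function names are made up.

```python
import math
import statistics

def normal_range(sample, z=1.96):
    """Approximate limits containing about 95% of individual population values."""
    m, s = statistics.mean(sample), statistics.stdev(sample)
    return (m - z * s, m + z * s)

def confidence_interval(sample, z=1.96):
    """Approximate 95% confidence limits for the population mean."""
    m, s = statistics.mean(sample), statistics.stdev(sample)
    se = s / math.sqrt(len(sample))  # standard error of the mean
    return (m - z * se, m + z * se)

cholesterol = [180, 195, 200, 210, 220, 190, 205, 215]  # hypothetical values
nr = normal_range(cholesterol)
ci = confidence_interval(cholesterol)
print(nr, ci)
```

Because the CI divides the standard deviation by √n, it is always narrower than the normal range computed from the same data, and it shrinks as the sample grows.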

21
Q

Pr(Event)/Probability of the event

A

estimated as (# experiencing event)/(# in the Set)

22
Q

Event

A

A characteristic defining a subset of our Set (e.g. affliction with a disease)

23
Q

Set

A

A collection of distinct objects (e.g. a sample of patients)

24
Q

General Multiplication rule

A

Pr(E1 and E2) = Pr(E1|E2) * Pr(E2).
Simple Multiplication Rule (if independent events): Pr(E1 and E2) = Pr(E1) * Pr(E2)
[since, if E1 and E2 are independent events, Pr(E1|E2) = Pr(E1).]
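
A small worked check of the general multiplication rule, estimating each probability as (# experiencing event)/(# in the Set). The Set of 10 patients is invented, with E1 = has the disease and E2 = is a smoker; `pr` is a hypothetical helper.

```python
from fractions import Fraction

# Hypothetical Set: 10 patients, each recorded as (smoker, has_disease)
patients = ([(True, True)] * 3 + [(True, False)] * 2
            + [(False, True)] * 1 + [(False, False)] * 4)

def pr(event, given=lambda p: True):
    """Estimate Pr(event | given) as a count ratio within the Set."""
    subset = [p for p in patients if given(p)]
    return Fraction(sum(1 for p in subset if event(p)), len(subset))

smoker = lambda p: p[0]
disease = lambda p: p[1]
both = lambda p: p[0] and p[1]

lhs = pr(both)                                # Pr(E1 and E2) = 3/10
rhs = pr(disease, given=smoker) * pr(smoker)  # Pr(E1|E2) * Pr(E2) = 3/5 * 1/2
naive = pr(disease) * pr(smoker)              # simple rule: 2/5 * 1/2 = 1/5

print(lhs, rhs, naive)
```

In this made-up Set the simple rule gives 1/5 rather than 3/10, because disease and smoking are not independent here: Pr(E1|E2) = 3/5 differs from Pr(E1) = 2/5.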

25
Q

General Addition Rule

A

Pr(E1 or E2) = Pr(E1) + Pr(E2) - Pr(E1 and E2).
Simple Addition Rule (if mutually exclusive): Pr(E1 or E2) = Pr(E1) + Pr(E2)
[since, if E1 and E2 are mutually exclusive, Pr(E1 and E2) = 0.]
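
Both addition rules can be checked on one roll of a fair six-sided die. The events are chosen only for illustration: "even" and "greater than 4" overlap (at 6), while "roll is 1" and "roll is 6" are mutually exclusive.

```python
from fractions import Fraction

SET = range(1, 7)  # outcomes of one roll of a fair six-sided die

def pr(event):
    """Pr(event) = (# outcomes in the event) / (# outcomes in the Set)."""
    return Fraction(sum(1 for outcome in SET if event(outcome)), len(SET))

even = lambda x: x % 2 == 0
high = lambda x: x > 4
is_one = lambda x: x == 1
is_six = lambda x: x == 6

# General rule: subtract the overlap so the outcome 6 is not counted twice
general_lhs = pr(lambda x: even(x) or high(x))
general_rhs = pr(even) + pr(high) - pr(lambda x: even(x) and high(x))

# Simple rule: mutually exclusive events have Pr(E1 and E2) = 0
simple_lhs = pr(lambda x: is_one(x) or is_six(x))
simple_rhs = pr(is_one) + pr(is_six)

print(general_lhs, general_rhs, simple_lhs, simple_rhs)
```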