Biostats Flashcards

1
Q

In biostatistics, name the steps in study design. There are 5 steps.

A

1) Design of studies –> sample size, selection of study participants, role of randomization
2) Data collection variability –> important patterns in data are obscured by variability
3) Inference –> draw conclusions from limited data
4) Summarize –> what summary measures will best convey the results
5) Interpretation –> what do the results mean in terms of practice, the program and the population

2
Q

What are the 4 types of data in biostatistics?

A

1) Binary (Dichotomous) data: yes/no answers
2) Categorical Data: either nominal (no ordering) or ordinal (ordering)
3) Continuous data: blood pressure, weight, etc.
4) Time to event data: time in remission

3
Q

There are different statistical methods for different types of data. What two methods are used for binary data?

A

Fisher's Exact Test

Chi-Square Test

4
Q

What methods are used for continuous data?

A

Two-sample t-test

Wilcoxon rank-sum (nonparametric) test

5
Q

How would you calculate the mean of a sample (sample average)?

A

Add up all the observations, then divide by the sample size (n)
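As a quick check, here is a minimal Python sketch of the same recipe; the `sample_mean` function name and the weights are made up for illustration:

```python
def sample_mean(data):
    """Add up the observations, then divide by the sample size n."""
    n = len(data)
    if n == 0:
        raise ValueError("sample must contain at least one observation")
    return sum(data) / n

# Hypothetical sample of 5 weights (kg)
weights = [60, 72, 65, 80, 68]
print(sample_mean(weights))  # → 69.0
```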

6
Q

What is the difference between a population and a sample with regard to data?

A

Population –> the entire group about which you want information (e.g., all women between ages 30 and 40)
Sample –> a part of the population from which we actually collect information; used to draw conclusions about the whole population.

7
Q

How is population vs sample mean differentiated when it comes to statistical symbols?

A

Population Mean –> μ (mu)

Sample Mean –> X̄ (x-bar)

8
Q

The median is the middle value of the ordered data. What happens when the sample size is an even number?

A

Average the two middle numbers
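A minimal Python sketch of the median rule (odd n: take the middle value; even n: average the two middle values). The function name and example values are hypothetical:

```python
def median(data):
    """Middle value of the ordered data; for even n, average the two middle values."""
    if not data:
        raise ValueError("median of empty data is undefined")
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the two middle values

print(median([7, 3, 9]))     # → 7
print(median([7, 3, 9, 5]))  # → 6.0
```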

9
Q

What are some ways to describe the spread of a distribution?

A

Min and Max
Range –> max - min
sample standard deviation (SD)
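A minimal Python sketch of these spread measures. It assumes the usual sample SD with n - 1 in the denominator; the function name and data are hypothetical:

```python
import math

def spread_summary(data):
    """Return min, max, range (max - min) and sample SD (divisor n - 1)."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations for a sample SD")
    lo, hi = min(data), max(data)
    mean = sum(data) / n
    # Sample SD: squared deviations from the mean, divided by n - 1, then square-rooted
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return {"min": lo, "max": hi, "range": hi - lo, "sd": sd}

# Hypothetical data, for illustration only
print(spread_summary([2, 4, 4, 4, 5, 5, 7, 9]))
```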

10
Q

Why would a researcher feel it appropriate to make a histogram?

A

A way of displaying the distribution of a set of data by charting the number of observations whose values fall within predefined numerical ranges

11
Q

How would one go about making a histogram?

A

Divide the data into equal intervals
Count the number of observations in each class
Draw the histogram
Label scales

12
Q

Generally, how many intervals should you have in a histogram?

A

It depends on the sample size, n

A common guideline is about √n (the square root of n) intervals
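A rough Python sketch of the histogram steps, using about √n equal-width intervals when no interval count is given. The function name, the data, and the choice to put the maximum value in the last interval are assumptions for illustration:

```python
import math

def histogram_counts(data, k=None):
    """Count observations in k equal-width intervals (k defaults to about √n)."""
    n = len(data)
    if n == 0:
        raise ValueError("no data")
    if k is None:
        k = max(1, round(math.sqrt(n)))  # rough guideline: about √n intervals
    lo, hi = min(data), max(data)
    width = (hi - lo) / k or 1.0  # fall back to width 1 if all values are equal
    counts = [0] * k
    for x in data:
        # Interval index for x; the maximum value is placed in the last interval
        i = min(int((x - lo) / width), k - 1)
        counts[i] += 1
    edges = [lo + j * width for j in range(k + 1)]
    return edges, counts

# Hypothetical data, n = 16, so about √16 = 4 intervals
data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 6, 7]
edges, counts = histogram_counts(data)
for left, right, c in zip(edges, edges[1:], counts):
    print(f"{left:.1f}-{right:.1f}: {'*' * c}")  # text version of the bars
```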

13
Q

What are other types of histograms?

A

frequency histogram
relative frequency histogram
relative frequency polygon
(Note: see lecture page 9 for images.)

14
Q

There are several shapes of distribution when plotting data. Explain what right skewed, left skewed, and symmetrical mean.

A
Symmetrical --> right and left sides are mirror images (mean = median = mode)
Left skewed (negatively skewed) --> long left tail; mean < median
Right skewed (positively skewed) --> long right tail; mean > median (ex: hospital stays)
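A small Python illustration using the standard `statistics` module; the hospital-stay numbers are invented to show a long right tail pulling the mean above the median:

```python
import statistics

# Invented hospital stays (days): most are short, a few are very long
stays = [2, 2, 3, 3, 3, 4, 4, 5, 6, 21, 30]

mean_stay = statistics.mean(stays)
median_stay = statistics.median(stays)
print(f"mean = {mean_stay:.2f}, median = {median_stay}")
# The long right tail pulls the mean above the median (right skewed)
print("right skewed" if mean_stay > median_stay else "not right skewed")
```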
15
Q

Describe in general terms what a probability density curve refers to.

A

A smooth, idealized curve that shows the shape of the distribution in the population

16
Q

What are some features of a normal (Gaussian) distribution?

A

symmetric
bell shaped
mean = median = mode
(mean is the center) (SD is the spread)

17
Q

What does the 68-95-99.7 rule mean?

A

In any normal distribution, approximately:
68% of the observations fall within one standard deviation of the mean
95% of the observations fall within two standard deviations of the mean
99.7% of the observations fall within three standard deviations of the mean
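A sketch for checking the rule in Python: for a standard normal Z, P(|Z| < k) = erf(k/√2), and `math.erf` is in the standard library. The exact values are about 68.27%, 95.45% and 99.73%, which is why the rule says "approximately". The function name is hypothetical:

```python
import math

def prob_within(k):
    """P(|Z| < k) for a standard normal Z, i.e. the share within k SD of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {prob_within(k):.2%}")
```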

18
Q

What is a Z score?

A

Tells how many standard deviations an observation lies from the population mean
Z = (observation - population mean) / SD
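A minimal Python version of the formula (the `z_score` name is hypothetical), checked against the female-height example on card 20:

```python
def z_score(x, mu, sd):
    """Number of standard deviations x lies from the mean: Z = (x - mu) / sd."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mu) / sd

# Female heights from the flashcard example: mean 65 in, SD 2.5 in
print(z_score(72.5, 65, 2.5))  # → 3.0
print(z_score(60, 65, 2.5))    # → -2.0
```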

19
Q

What are the standard Z scores?

A

Z = 1 –> observation lies one SD above the mean
Z = 2 –> observation lies two SD above the mean
Z = -1 –> observation lies one SD below the mean
Z = -2 –> observation lies two SD below the mean

20
Q

If female heights have mean = 65 inches and SD s = 2.5 inches,

What is the Z score for 72.5 inches and for 60 inches?

A
For 72.5 inches:
Z = (72.5 - 65) / 2.5 = +3.0 (3 SD above the mean)
For 60 inches:
Z = (60 - 65) / 2.5 = -2.0 (2 SD below the mean)

21
Q

Example:
Suppose the population is normally distributed: if you have a standard score of Z=2, what percent of the population would have scores greater than you?

A

2.5% (95% lie within 2 SD, so 5% lie outside; by symmetry, half of that, 2.5%, lie above Z = 2)
(refer to the 68-95-99.7 rule)

22
Q

Example:

If you have a standard score of Z=2, what % of the population would have scores less than you?

A

97.5% (2.5% of the population lies above Z = 2, so 100 - 2.5 = 97.5% lies below)
Again, refer to the 68-95-99.7 rule

23
Q

Example:

If you have a standard score of Z=3, what % of the population would have scores greater than you?

A

0.15% (0.3% of observations lie more than 3 SD from the mean; only half of those, 0.15%, lie above Z = 3)
Again, refer to the 68-95-99.7 rule

24
Q

Example:

If you have a standard score of Z=-1.5, what % of the population would have scores less than you?

A

This requires a Z table: 6.68%

However, since 2.5% lie below Z = -2 and 16% lie below Z = -1, the answer has to be higher than 2.5% but less than 16%
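Instead of a printed Z table, the standard normal CDF Φ(z) = P(Z < z) can be computed with `math.erf` from the Python standard library. A minimal sketch (the `normal_cdf` name is hypothetical):

```python
import math

def normal_cdf(z):
    """Φ(z) = P(Z < z) for a standard normal Z; stands in for a Z table."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

print(f"P(Z < -1.5) = {normal_cdf(-1.5):.2%}")  # → P(Z < -1.5) = 6.68%
print(f"P(Z > 2) = {1 - normal_cdf(2):.2%}")    # exact upper tail above Z = 2
```

The exact upper tail above Z = 2 is about 2.3%, slightly less than the 2.5% the 68-95-99.7 rule gives.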

25
Example: Suppose we call "unusual" observations those that are either at least 2 SD above the mean or at least 2 SD below the mean. What % are unusual? (In other words, what % of the observations will have a standard score of either Z > +2.0 or Z < -2.0?)
5% lie outside ±2 SD (100 - 95 = 5, from the 95% part of the rule)
26
What % of the observations would have |Z| > 1.0 (i.e., lie more than 1 SD away from the mean)?
32% | 100 - 68 = 32
27
What % of the observations would have |Z| > 3.0?
0.3% | again 100-99.7 = 0.3
28
What % of the observations would have |Z| > 1.15?
|Z| > 1.0 is 32% and |Z| > 2.0 is 5% | so the answer lies between 5% and 32% (a Z table gives about 25%)
29
what is the difference between a parameter and a statistic?
Parameter --> number that describes the population; this is a fixed number (population mean; population proportion)
Statistic --> number that describes a sample of data; can be calculated (sample mean; sample proportion)
30
What are errors from biased sampling?
```
Study systematically favors certain outcomes
Examples: voluntary response, nonresponse, convenience sampling
Solution? Random sampling
```
31
what are errors from random sampling?
Caused by chance occurrence: you get a bad sample because of bad luck | can be controlled by taking a larger sample
32
When a selection procedure is biased does taking a larger sample help?
no | this just repeats the mistake on a larger scale
33
When a sample is randomly selected from the population, it is called what?
random sample
34
What is an advantage to random sample?
helps control systematic bias | however there is still some sampling variability or error
35
If we repeatedly choose samples from the same population, a statistic will take different values in different samples, what is this called?
Sampling Variability
36
The spread of a sampling distribution depends on the sample size. Is it better to have a bigger or smaller sample size?
Larger unbiased samples are better | larger samples also give more tightly clustered histograms of the statistic, so more of its values fall close to the mean
37
If the researcher were to increase the sample size by a factor of 4, what would happen to the spread?
The spread is cut in half each time (it is proportional to 1/√n, and √4 = 2)
38
Describe the sampling distribution
What the distribution of the statistic would look like if we chose a large number of samples from the same population. It describes the distribution of all sample means from all possible random samples of the same size taken from a population.
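A simulation sketch of a sampling distribution, assuming a hypothetical normal population with mean 120 and SD 14 (values invented for illustration). It draws many samples of size 25 and of size 100 and compares the spread of their means; quadrupling n should roughly halve the spread:

```python
import random
import statistics

def simulate_sample_means(mu, sd, n, reps, rng):
    """Means of `reps` random samples of size n from a normal(mu, sd) population."""
    return [sum(rng.gauss(mu, sd) for _ in range(n)) / n for _ in range(reps)]

rng = random.Random(42)  # fixed seed so the run is reproducible
means_25 = simulate_sample_means(120, 14, 25, 2000, rng)
means_100 = simulate_sample_means(120, 14, 100, 2000, rng)

# Theory: spread of the sample mean = 14/√n, i.e. 2.8 for n = 25 and 1.4 for n = 100
print(f"n = 25:  SD of sample means ≈ {statistics.stdev(means_25):.2f}")
print(f"n = 100: SD of sample means ≈ {statistics.stdev(means_100):.2f}")
```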
39
What is the central limit theorem?
Mathematical result: the sampling distribution of a statistic such as the sample mean is approximately normal when the sample size is large. For the theorem to apply, the sample size (n) must be large (n > 60)
40
What is a standard error (SE)?
Measures the precision of a sample statistic, such as the sample mean or sample proportion, calculated from n observations.
41
As the sample size gets bigger what happens to the standard error?
It gets smaller, so the sample mean is more precise.
42
Standard Error of the Mean (SEM) is again a measure of the precision of the sample mean. What is the formula to calculate SEM?
```
SEM = s / √n
Example: blood pressure on a random sample of 100 students
Sample size: n = 100
Sample mean: X̄ = 123.4
Sample SD: s = 14.0
SEM: 14 / √100 = 1.4 mmHg
```
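The same SEM calculation as a minimal Python sketch (the `sem` name is hypothetical):

```python
import math

def sem(s, n):
    """Standard error of the mean: SEM = s / √n."""
    if n < 1:
        raise ValueError("sample size must be at least 1")
    return s / math.sqrt(n)

# Blood pressure example from the card: s = 14.0 mmHg, n = 100
print(sem(14.0, 100))  # → 1.4
print(sem(14.0, 400))  # quadrupling n halves the SEM → 0.7
```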
43
How close to the population mean (μ) is the sample mean (X̄)?
The standard error of the sample mean tells us that about 95% of the time, the sample mean will lie within about 2 standard errors of the population mean. X̄ ± 2 SEM = 123.4 ± 2 × 1.4 = 123.4 ± 2.8 | We are 95% confident that the sample mean is within 2.8 mmHg of the population mean. The 95% error bound is 2.8 mmHg.
44
From the blood pressure example, what would be the 95% Confidence Interval (CI)?
123.4 ± 2.8 | We are 95% confident that the population mean falls in the range 120.6 to 126.2
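A minimal Python sketch of the large-sample CI (X̄ ± 2 SEM). The function name is hypothetical, the n ≥ 60 check mirrors the assumption on card 47, and 2 is the rounded version of the normal multiplier 1.96:

```python
import math

def approx_95_ci(xbar, s, n):
    """Large-sample 95% CI for the mean: X̄ ± 2·SEM (assumes n >= 60)."""
    if n < 60:
        raise ValueError("the ±2 SEM shortcut assumes n >= 60; use a t multiplier instead")
    margin = 2 * s / math.sqrt(n)  # 2 × SEM
    return xbar - margin, xbar + margin

low, high = approx_95_ci(123.4, 14.0, 100)
print(f"({low:.1f}, {high:.1f})")  # → (120.6, 126.2)
```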
45
Is a 99% or 90% CI wider?
99% CI is wider | 90% is narrower
46
The length of CI decreases (narrower) when n and s do what?
n increases and s decreases (the CI also narrows when the level of confidence decreases)
47
what are the two underlying assumptions for a 95% CI for the population mean?
Random Sample of Population | Sample Size n is at least 60 to use +- 2SEM
48
How would one calculate a 95% CI for the mean if the sample size is smaller than 60?
Use a t table: df (degrees of freedom) = n - 1; find the t value for that df and use X̄ ± t × SEM
49
For example, if n = 5, X̄ = 99 mmHg, and s = 15.97, what is the 95% CI?
```
95% CI = X̄ ± t × SEM, with t = 2.776 (from t table, df = 5 - 1 = 4)
SEM = 15.97 / √5 = 7.142
99 ± 2.776 × 7.142
99 ± 19.83
The 95% CI for mean blood pressure is: (79.17, 118.83)
```
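A minimal Python sketch of the small-sample CI. The Python standard library has no t distribution, so the critical value is passed in from a t table (2.776 for df = 4); if SciPy is installed, `scipy.stats.t.ppf(0.975, df)` returns the same value. The `t_ci` name is hypothetical:

```python
import math

def t_ci(xbar, s, n, t_crit):
    """95% CI for the mean using a t multiplier: X̄ ± t × s/√n.

    t_crit is looked up in a t table for df = n - 1.
    """
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin

low, high = t_ci(99, 15.97, 5, 2.776)
print(f"({low:.2f}, {high:.2f})")  # → (79.17, 118.83)
```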
50
Does standard error or standard deviation depend on sample size?
standard error | remember the formula (s/sq.root n)