Biostats Flashcards

1
Q

In biostatistics, name the steps in study design. There are 5 steps.

A

1) Design of studies –> sample size / selection of study participants / role of randomization
2) Data collection variability –> important patterns in data are obscured by variability
3) Inference –> draw conclusions from limited data
4) Summarize –> what summary measures will best convey the results
5) Interpretation –> what do the results mean in terms of practice, the program, and the population

2
Q

What are the 4 types of data in biostatistics?

A

1) Binary (dichotomous) data: yes/no answers
2) Categorical data: either nominal (no ordering) or ordinal (ordering)
3) Continuous data: blood pressure, weight, etc.
4) Time-to-event data: e.g., time in remission

3
Q

There are different statistical methods for different types of data. What two methods are used for binary data?

A

Fisher's Exact Test

Chi-Square Test

4
Q

What methods are used for continuous data?

A

Two-sample t-test

Wilcoxon rank-sum (nonparametric) test

5
Q

How would you calculate the mean of a sample (sample average)?

A

Add up data and then divide by the sample size
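
A minimal Python sketch of the sample mean. The readings are hypothetical values chosen for illustration, `sample_mean` is an illustrative helper name, and the standard-library `statistics.mean` is printed only as a cross-check.

```python
import statistics

def sample_mean(data):
    """Sample average: add up the observations, then divide by the sample size n."""
    return sum(data) / len(data)

# Hypothetical systolic blood pressure readings (mmHg), for illustration only
readings = [120, 118, 125, 131, 116]
print(sample_mean(readings))      # 122.0
print(statistics.mean(readings))  # same value from the standard library
```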

6
Q

What is the difference between population and sample in regards to data?

A

Population –> the entire group about which you want information (e.g., all women between the ages of 30 and 40)
Sample –> a part of the population from which we actually collect information; used to draw conclusions about the whole population.

7
Q

How is population vs sample mean differentiated when it comes to statistical symbols?

A

Population mean –> μ (mu)

Sample mean –> X̄ (X-bar)

8
Q

The median number is the middle number. What happens when the sample size is an even number?

A

Average the two middle numbers
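
A short Python sketch of the median rule, including the even-n case. `sample_median` is a hypothetical helper name and the lists are made-up data; the standard library also offers `statistics.median`, which follows the same convention.

```python
def sample_median(data):
    """Middle value of the sorted data; with an even n, average the two middle values."""
    s = sorted(data)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

print(sample_median([12, 3, 8, 7]))     # 7.5  (sorted: 3, 7, 8, 12)
print(sample_median([12, 3, 8, 7, 5]))  # 7    (sorted: 3, 5, 7, 8, 12)
```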

9
Q

What are ways to describe the spread of a distribution?

A

Min and max
Range –> max - min
Sample standard deviation (SD)
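
A minimal sketch computing these spread measures for a small hypothetical data set. It assumes the usual sample SD convention of dividing by n - 1 (the same convention as `statistics.stdev`); `spread_summary` is an illustrative name.

```python
import math

def spread_summary(data):
    """Min, max, range (max - min) and sample SD of a list of numbers."""
    n = len(data)
    lo, hi = min(data), max(data)
    mean = sum(data) / n
    # The sample SD divides the sum of squared deviations by n - 1, not n
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return {"min": lo, "max": hi, "range": hi - lo, "sd": sd}

summary = spread_summary([4, 8, 6, 5, 7])  # hypothetical data
print(summary)
```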

10
Q

Why would a researcher make a histogram?

A

It displays the distribution of a set of data by charting the number of observations whose values fall within predefined numerical ranges.

11
Q

How would one go about making a histogram?

A

Divide the range of the data into equal intervals
Count the number of observations in each interval
Draw the histogram
Label the scales (axes)

12
Q

Generally, how many intervals should you have in a histogram?

A

It depends on the sample size, n

A common guideline is about √n intervals (the square root of n)
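
A rough sketch of the "divide into equal intervals, then count" steps, using the √n guideline to pick the number of intervals. The ages are hypothetical, `histogram_counts` is an illustrative name, and the choice of left-closed intervals (with the maximum placed in the last one) is an assumption; a text bar of `*` stands in for drawing the chart.

```python
import math

def histogram_counts(data, k=None):
    """Split min..max into k equal-width intervals and count observations in each."""
    n = len(data)
    if k is None:
        k = max(1, round(math.sqrt(n)))  # guideline: about sqrt(n) intervals
    lo, hi = min(data), max(data)
    width = (hi - lo) / k or 1.0  # fall back to width 1 if every value is identical
    counts = [0] * k
    for x in data:
        # Interval index for x; the maximum value is placed in the last interval
        i = min(int((x - lo) / width), k - 1)
        counts[i] += 1
    edges = [lo + j * width for j in range(k + 1)]
    return edges, counts

# Hypothetical ages of 16 study participants
ages = [21, 23, 25, 26, 28, 30, 31, 33, 34, 35, 38, 40, 42, 45, 48, 53]
edges, counts = histogram_counts(ages)  # n = 16, so 4 intervals
for left, right, c in zip(edges, edges[1:], counts):
    print(f"{left:4.0f} to {right:4.0f} | {'*' * c}")
```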

13
Q

What are other types of histograms?

A

frequency histogram
relative frequency histogram
relative frequency polygon
(note see lecture page 9 for images)

14
Q

There are several shapes of distribution when plotting data. Explain what symmetrical, left-skewed, and right-skewed mean.

A

Symmetrical --> right and left sides are mirror images (mean = median = mode)
Left skewed (negatively skewed) --> long left tail; mean < median
Right skewed (positively skewed) --> long right tail; mean > median (ex: hospital stays)
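
A small illustration of why the mean exceeds the median in right-skewed data such as hospital stays. The lengths of stay are hypothetical, and comparing mean with median is only a rough check of skew direction, not a formal skewness measure.

```python
import statistics

# Hypothetical hospital lengths of stay (days): most are short, one is very long
stays = [2, 2, 3, 3, 4, 5, 30]
mean_stay = statistics.mean(stays)      # pulled upward by the long right tail
median_stay = statistics.median(stays)  # middle value, barely affected by the tail
print(f"mean = {mean_stay}, median = {median_stay}")
print("right-skewed" if mean_stay > median_stay else "not right-skewed by this check")
```
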
15
Q

Describe in general terms what a probability density is.

A

smooth idealized curve that shows the shape of the distribution in the population

16
Q

What are some features of a normal (Gaussian) distribution?

A

symmetric
bell shaped
mean = median = mode
(mean is the center) (SD is the spread)

17
Q

What does the 68-95-99.7 rule mean?

A

In any normal distribution, approximately:
68% of the observations fall within one standard deviation of the mean
95% of the observations fall within two standard deviations of the mean
99.7% of the observations fall within three standard deviations of the mean
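
The rule can be checked numerically with the standard normal distribution from Python's `statistics.NormalDist`. This sketch computes the area within k SDs of the mean; the exact values come out at about 68.27%, 95.45% and 99.73%, which the rule rounds to 68, 95 and 99.7.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)
# Proportion of a normal population within k SDs of the mean, for k = 1, 2, 3
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}
for k, p in within.items():
    print(f"within {k} SD: {p:.2%}")
```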

18
Q

What is a Z score?

A

Tells how many standard deviations an observation lies from the population mean
Z = (observation - population mean) / SD

19
Q

What are the standard Z scores?

A

Z = 1 –> observation lies one SD above the mean
Z = 2 –> observation lies two SD above the mean
Z = -1 –> observation lies one SD below the mean
Z = -2 –> observation lies two SD below the mean

20
Q

If female heights have mean = 65 inches and SD s = 2.5 inches,

what is the Z score for 72.5 inches and for 60 inches?

A

72.5 inches: Z = (72.5 - 65) / 2.5 = +3.0, i.e., 3 SD above the mean
60 inches: Z = (60 - 65) / 2.5 = -2.0, i.e., 2 SD below the mean
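
The Z-score formula as a one-line Python function, applied to the female-height numbers above. `z_score` is an illustrative name; the parentheses around the subtraction matter, since the difference is taken before dividing by the SD.

```python
def z_score(x, mean, sd):
    """How many SDs x lies above (positive) or below (negative) the mean."""
    return (x - mean) / sd

# Female heights: mean 65 inches, SD 2.5 inches
print(z_score(72.5, 65, 2.5))  # 3.0
print(z_score(60, 65, 2.5))    # -2.0
```
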
21
Q

Example:
Suppose the population is normally distributed: if you have a standard score of Z=2, what percent of the population would have scores greater than you?

A

2.5% (95% of observations lie within 2 SD of the mean, so 5% lie outside; half of that 5%, i.e., 2.5%, lie above Z = +2)
(refer to the 68-95-99.7 rule)

22
Q

Example:

If you have a standard score of Z=2, what % of the population would have scores less than you?

A

97.5% (5% of observations lie more than 2 SD from the mean and half of those, 2.5%, lie above Z = +2, so 100 - 2.5 = 97.5%)
Again, refer to the 68-95-99.7 rule

23
Q

Example:

If you have a standard score of Z=3, what % of the population would have scores greater than you?

A

0.15% (0.3% of observations lie more than 3 SD from the mean; only the upper half of those lie above Z = +3, so the answer is 0.15%)
Again, refer to the 68-95-99.7 rule

24
Q

Example:

If you have a standard score of Z=-1.5, what % of the population would have scores less than you?

A

6.68% (this requires a standard normal table)

However, since 2.5% lie below Z = -2 and 16% lie below Z = -1 (from the 68-95-99.7 rule), the answer has to be higher than 2.5% but less than 16%
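
Instead of a printed table, the lower-tail area can be computed with the standard normal cumulative distribution function in Python's `statistics.NormalDist`. This is a sketch of the lookup only; it also prints the exact areas below Z = -2 and Z = -1, which the rule of thumb rounds to 2.5% and 16%.

```python
from statistics import NormalDist

std_normal = NormalDist()       # mean 0, SD 1
p_below = std_normal.cdf(-1.5)  # area to the left of Z = -1.5
print(f"{p_below:.2%}")         # 6.68%
# Sanity check against the exact areas below Z = -2 and Z = -1
print(f"{std_normal.cdf(-2):.2%} < {p_below:.2%} < {std_normal.cdf(-1):.2%}")
```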

25
Q

Example:
Suppose we call "unusual" observations those that are either at least 2 SD above the mean or at least 2 SD below the mean. What % are unusual? (In other words, what % of the observations will have a standard score of either Z > +2.0 or Z < -2.0?)

A

5% lie outside ±2 SD (again, this follows from the 95% part of the rule)

26
Q

What % of the observations would have |Z| > 1.0 (i.e., lie more than 1 SD away from the mean)?

A

32%

again 100-68 = 32

27
Q

What % of the observations would have |Z| > 3.0 (i.e., lie more than 3 SD away from the mean)?

A

0.3%

again 100-99.7 = 0.3

28
Q

What % of the observations would have |Z| > 1.15?

A

|Z| > 1.0 would be 32% and |Z| > 2.0 would be 5%

so the answer must be between 5% and 32%
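
To go beyond the bracketing estimate, the two-sided tail area can be computed directly with `statistics.NormalDist`. This sketch assumes a normal population; `two_sided_tail` is an illustrative name. For 1.15 it gives about 25%, inside the 5% to 32% bracket, and for 1 and 3 it reproduces the rule-of-thumb answers to rounding.

```python
from statistics import NormalDist

def two_sided_tail(z):
    """Proportion of a normal population lying more than |z| SDs from the mean."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

for z in (1.0, 1.15, 2.0, 3.0):
    print(f"|Z| > {z}: {two_sided_tail(z):.1%}")
```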

29
Q

What is the difference between a parameter and a statistic?

A

Parameter –> number that describes the population; this is a fixed number (population mean; population proportion)
Statistic –> number that describes a sample of data; calculated from the sample (sample mean; sample proportion)

30
Q

What are errors from biased sampling?

A

The study systematically favors certain outcomes, e.g.:
voluntary response
nonresponse
convenience sampling
Solution? Random sampling

31
Q

What are errors from random sampling?

A

Caused by chance occurrence
You get a bad sample because of bad luck
Can be controlled by taking a larger sample

32
Q

When a selection procedure is biased does taking a larger sample help?

A

no

this just repeats the mistake on a larger scale

33
Q

When a sample is randomly selected from the population, it is called what?

A

random sample

34
Q

What is an advantage of a random sample?

A

helps control systematic bias

however there is still some sampling variability or error

35
Q

If we repeatedly choose samples from the same population, a statistic will take different values in different samples, what is this called?

A

Sampling Variability

36
Q

The spread of a sampling distribution depends on the sample size. Is it better to have a bigger or smaller sample size?

A

larger unbiased samples are better

Larger samples also give a more tightly clustered sampling distribution, so the values of the statistic (e.g., sample means) fall closer to the population value

37
Q

If the researcher were to increase the sample size by a factor of 4, what would happen to the spread?

A

Each time the sample size is multiplied by 4, the spread (standard error) is cut in half, because it is proportional to 1/√n and √4 = 2
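
Assuming the sample SD s stays roughly the same as the sample grows, the halving follows directly from SE = s/√n:

```latex
\mathrm{SE}_{4n} = \frac{s}{\sqrt{4n}} = \frac{s}{2\sqrt{n}} = \frac{1}{2}\,\mathrm{SE}_{n}
```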

38
Q

Describe the sampling distribution

A

what the distribution of the statistic would look like if we chose a large number of samples from the same population
it describes the distribution of all sample means, from all possible random samples of the same size taken from a population.
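
A minimal simulation sketch of a sampling distribution. It assumes a hypothetical population of the whole numbers 1 to 100 and samples with replacement so that each draw is independent; the seed, sample sizes and number of repetitions are arbitrary choices. It keeps the mean of each of 2,000 samples of size 25 and of size 100, then compares their spreads. Because n is quadrupled, the spread should come out roughly halved.

```python
import random

def sample_means(population, n, reps, rng):
    """Means of `reps` random samples of size n drawn with replacement."""
    return [sum(rng.choices(population, k=n)) / n for _ in range(reps)]

def sd(values):
    """Sample SD (divide by len - 1)."""
    m = sum(values) / len(values)
    return (sum((v - m) ** 2 for v in values) / (len(values) - 1)) ** 0.5

rng = random.Random(1)            # fixed seed so the run is reproducible
population = list(range(1, 101))  # hypothetical population, mean 50.5
means_25 = sample_means(population, 25, 2000, rng)
means_100 = sample_means(population, 100, 2000, rng)

print(f"average of sample means (n=25): {sum(means_25) / len(means_25):.2f}")
print(f"spread of sample means, n=25:  {sd(means_25):.2f}")
print(f"spread of sample means, n=100: {sd(means_100):.2f}")
```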

39
Q

What is the central limit theorem?

A

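A mathematical result: the sampling distribution of a statistic such as the sample mean is approximately normal when n is large, even if the population itself is not normally distributed
For the theorem to apply, the sample size (n) must be large (n ≥ 60 is the guideline used here)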
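
A rough simulation of the theorem, assuming a strongly right-skewed population: an exponential distribution with rate 1, which has mean 1 and SD 1. It takes 5,000 samples of size 60 and checks what share of the sample means fall within 2 standard errors of the population mean. If the sampling distribution is close to normal, that share should be near 95% even though the population is far from normal. The seed and sizes are arbitrary.

```python
import random

rng = random.Random(2)   # fixed seed so the run is reproducible
n, reps = 60, 5000
mu = sigma = 1.0         # exponential population with rate 1: mean 1, SD 1
se = sigma / n ** 0.5    # standard error of the mean

means = [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]
within = sum(abs(m - mu) <= 2 * se for m in means) / reps
print(f"share of sample means within 2 SE of mu: {within:.3f}")
```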

40
Q

What is a standard error (SE)?

A

It measures the precision of a sample statistic, such as a sample mean or proportion, calculated from n observations.

41
Q

As the sample size gets bigger what happens to the standard error?

A

It gets smaller, so the sample mean becomes more precise.

42
Q

Standard Error of the Mean (SEM) is again a measure of the precision of the sample mean. What is the formula to calculate SEM?

A

SEM = s / √n
Example: blood pressure on a random sample of 100 students
Sample size: n = 100
Sample mean: X̄ = 123.4 mmHg
Sample SD: s = 14.0 mmHg
SEM = 14.0 / √100 = 1.4 mmHg
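
The SEM formula as a small Python function, checked against the blood pressure example. `sem` is an illustrative name; the second call shows that four times the sample size halves the SEM when s is unchanged.

```python
import math

def sem(s, n):
    """Standard error of the mean: sample SD divided by the square root of n."""
    return s / math.sqrt(n)

print(sem(14.0, 100))  # 1.4 (mmHg), matching the blood pressure example
print(sem(14.0, 400))  # 0.7: four times the sample size halves the SEM
```
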
43
Q

How close is the sample mean (X̄) to the population mean (μ)?

A

The standard error tells us that about 95% of the time, the sample mean will lie within about 2 standard errors of the population mean.
X̄ ± 2 SEM
123.4 ± 2 × 1.4
123.4 ± 2.8
We are 95% confident that the sample mean is within 2.8 mmHg of the population mean. The 95% error bound is 2.8 mmHg.

44
Q

From the blood pressure example, what would be the 95% Confidence Interval (CI)?

A

123.4 ± 2.8 mmHg

We are 95% confident that the population mean falls in the range 120.6 to 126.2 mmHg
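
A minimal sketch of the large-sample interval X̄ ± 2 SEM, reproducing the blood pressure numbers. `ci95_two_sem` is an illustrative name, and the multiplier 2 is the rounded value these cards use.

```python
import math

def ci95_two_sem(xbar, s, n):
    """Approximate 95% CI for the population mean: xbar ± 2 × SEM (the n ≥ 60 rule)."""
    margin = 2 * s / math.sqrt(n)
    return xbar - margin, xbar + margin

low, high = ci95_two_sem(123.4, 14.0, 100)
print(f"({low:.1f}, {high:.1f}) mmHg")  # (120.6, 126.2) mmHg
```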

45
Q

Is a 99% or 90% CI wider?

A

99% CI is wider

90% is narrower

46
Q

The length of a CI decreases (becomes narrower) when n and s do what?

A

n increases
s decreases
(it also narrows when the level of confidence decreases)
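
A sketch showing how CI width responds to the confidence level, n, and s, using the blood pressure values (s = 14.0, n = 100). Note one swap: instead of the rounded multiplier 2, it uses the exact standard normal multiplier z* from `statistics.NormalDist.inv_cdf` (about 1.645, 1.96 and 2.576 for 90%, 95% and 99%). `ci_width` is an illustrative name.

```python
import math
from statistics import NormalDist

def ci_width(s, n, level):
    """Full width of a normal-based CI for a mean: 2 * z* * s / sqrt(n)."""
    # z* cuts off (1 - level) / 2 in each tail, e.g. about 1.96 for 95%
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return 2 * z_star * s / math.sqrt(n)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} CI width: {ci_width(14.0, 100, level):.2f} mmHg")
print(f"95% width with n = 400: {ci_width(14.0, 400, 0.95):.2f} mmHg")  # half the n = 100 width
print(f"95% width with s = 7.0: {ci_width(7.0, 100, 0.95):.2f} mmHg")
```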

47
Q

What are the two underlying assumptions for a 95% CI for the population mean?

A

Random sample from the population

Sample size n is at least 60, so that X̄ ± 2 SEM can be used

48
Q

How would one calculate a 95% CI for the mean if the sample size is smaller than 60?

A

Use a t table instead of the multiplier 2
Degrees of freedom: df = n - 1
Look up the t value for that df, then use X̄ ± t × SEM

49
Q

For example, if:
n = 5
X̄ = 99 mmHg
s = 15.97 mmHg, then what is the 95% CI?

A

X̄ ± t × SEM, with t = 2.776 (from the t table, df = 4) and SEM = 15.97 / √5 = 7.142
99 ± 2.776 × 7.142
99 ± 19.83
The 95% CI for mean blood pressure is:
(79.17, 118.83) mmHg
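
A minimal sketch of the small-sample interval, with a few two-sided 95% critical values copied from a standard t table (the Python standard library has no t distribution). The partial table and the name `t_ci95` are illustrative assumptions; a full table or a statistics library would be needed for other degrees of freedom.

```python
import math

# Two-sided 95% critical values t* from a standard t table, keyed by df = n - 1.
# Only a few rows are copied here for illustration.
T_TABLE_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 9: 2.262, 19: 2.093, 29: 2.045}

def t_ci95(xbar, s, n):
    """95% CI for a mean from a small sample: xbar ± t* × s / sqrt(n)."""
    df = n - 1
    t_star = T_TABLE_95[df]  # raises KeyError if df is not in the partial table above
    sem = s / math.sqrt(n)
    margin = t_star * sem
    return xbar - margin, xbar + margin

low, high = t_ci95(99, 15.97, 5)
print(f"({low:.2f}, {high:.2f})")  # (79.17, 118.83)
```
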
50
Q

Does standard error or standard deviation depend on sample size?

A

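Standard error

Remember the formula SE = s / √n: the SE shrinks as n increases, whereas the SD describes the spread of individual observations and does not shrink just because n grows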