Biostats Flashcards

1
Q

Tchebysheff's (Chebyshev's) Theorem

A

For any value of k ≥ 1, at least 100(1 – 1/k²)% of the data will lie within k standard deviations of the mean. For k = 1, 100(1 – 1/1²)% = 0%, so nothing is guaranteed to lie within one standard deviation of the mean; for k = 2, at least 75% of the data lie within two standard deviations.
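A minimal Python sketch of the bound, using hypothetical, deliberately skewed data (the function names are illustrative, not from any textbook). It checks that the observed fraction within k standard deviations never falls below 100(1 – 1/k²)%; the population standard deviation (statistics.pstdev) is used so the data are treated as the whole distribution.

```python
import statistics

def chebyshev_min_fraction(k):
    """Chebyshev lower bound on the fraction of data within k SDs of the mean."""
    if k < 1:
        raise ValueError("the bound is only informative for k >= 1")
    return 1 - 1 / k**2

def fraction_within_k_sd(data, k):
    """Observed fraction of values within k standard deviations of the mean."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)  # treat the data as the whole population
    return sum(abs(x - mean) <= k * sd for x in data) / len(data)

# Hypothetical, right-skewed data with one extreme value
data = [1, 1, 2, 2, 2, 3, 3, 4, 5, 20]
for k in (1, 2, 3):
    print(k, chebyshev_min_fraction(k), fraction_within_k_sd(data, k))
```

The bound holds for any distribution, which is why it is usually much looser than the 68–95–99.7 rule for normal data.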

2
Q

When performing a nonparametric Wilcoxon rank-sum test, the first step is to combine the data values in the two samples and assign a rank of ‘1’ to

A

the smallest observation
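The ranking step can be sketched in Python as follows (hypothetical data and function name). It pools both samples, gives the smallest observation rank 1, and assigns tied values the average of the ranks they would occupy, which is a common convention for the rank-sum test; it then returns the rank sum for the first sample.

```python
def rank_combined(sample1, sample2):
    """Pool two samples and rank them (smallest value = rank 1).

    Tied values share the mean of the ranks they span.
    Returns (list of (value, group, rank), rank sum of sample1).
    """
    pooled = [(x, 1) for x in sample1] + [(x, 2) for x in sample2]
    pooled.sort(key=lambda pair: pair[0])
    ranked = []
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over the run of values tied with pooled[i]
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = ((i + 1) + (j + 1)) / 2  # positions are 0-based, ranks 1-based
        for value, group in pooled[i:j + 1]:
            ranked.append((value, group, avg_rank))
        i = j + 1
    r1 = sum(rank for _, group, rank in ranked if group == 1)
    return ranked, r1

# Hypothetical outcome scores in two independent groups
placebo = [7, 5, 6, 4, 12]
treatment = [3, 6, 4, 2, 1]
ranked, r1 = rank_combined(placebo, treatment)
print(ranked)
print("rank sum (placebo):", r1)
```

As a quick check, all ranks together must sum to N(N + 1)/2, where N is the combined sample size.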

3
Q

Contingency table test

A

Degrees of freedom: df = (r – 1)(c – 1), where r is the number of rows and c is the number of columns
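A minimal sketch of a chi-square test on an r × c contingency table, computing expected counts from the row and column totals and the degrees of freedom from the formula above. The table and function name are hypothetical; converting the statistic to a p-value (from a chi-square table or a library) is not shown.

```python
def chi_square_independence(table):
    """Chi-square statistic and df for an r x c table of observed counts."""
    r, c = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(c)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(r):
        for j in range(c):
            # Expected count under independence: (row total * column total) / n
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    df = (r - 1) * (c - 1)
    return chi2, df

# Hypothetical 2 x 2 table: exposure (rows) by outcome (columns)
chi2, df = chi_square_independence([[20, 30], [30, 20]])
print(chi2, df)
```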

4
Q

Dichotomous variables

A

Only two possible responses

Used to classify participants (e.g. has/does not have attribute of interest)

5
Q

Ordinal variables

A

Categorical, ordered variable

6
Q

Nominal variables

A

Categorical, unordered variable

7
Q

Continuous variables

A

Quantitative/measurement variables that can take any value within a range (in theory, an unlimited number of possible responses)

8
Q

Standard deviation

A

Measures how far individual observations typically deviate from the mean
Small = the observed values are close to the mean
Large = the observed values vary widely around the sample mean

9
Q

Sample variance

A

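Roughly the average of the squared deviations from the sample mean (the sum of squared deviations divided by n – 1). Because it is in squared units it is hard to interpret, so the sample standard deviation is used instead.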

10
Q

Sample standard deviation

A

Square root of sample variance
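A short Python sketch of the sample variance and standard deviation computed by hand, with hypothetical data, cross-checked against the standard-library statistics module (which also divides by n – 1).

```python
import math
import statistics

def sample_variance(data):
    """Sum of squared deviations from the mean divided by n - 1."""
    mean = sum(data) / len(data)
    deviations = [x - mean for x in data]
    # Raw deviations always sum to zero, so they are squared before summing
    return sum(d ** 2 for d in deviations) / (len(data) - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements
var = sample_variance(data)
sd = math.sqrt(var)  # back in the original units
print(var, statistics.variance(data))
print(sd, statistics.stdev(data))
```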

11
Q

Interquartile range

A

Difference between 1st and 3rd quartiles

IQR = Q3 – Q1
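A minimal Python sketch of the IQR using the standard library's statistics.quantiles, on hypothetical data. Note that textbooks and software use several slightly different quartile conventions; this uses Python's default ("exclusive") method, so results may differ a little from a hand calculation in a given textbook.

```python
import statistics

def iqr(data):
    """Interquartile range Q3 - Q1 using Python's default quartile method."""
    q1, _median, q3 = statistics.quantiles(data, n=4)
    return q3 - q1

data = [3, 5, 7, 8, 9, 11, 13, 15, 17, 19, 21]  # hypothetical, already sorted
print(statistics.quantiles(data, n=4))
print(iqr(data))
```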

12
Q

Sensitivity

A

true positive fraction; the probability of a diseased person testing positive

13
Q

Specificity

A

true negative fraction; the probability of a disease-free person testing negative
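Sensitivity and specificity can be computed from the four cells of a 2 × 2 screening table. The counts below are hypothetical and the function names are illustrative.

```python
def sensitivity(tp, fn):
    """True positive fraction: P(test positive | diseased)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative fraction: P(test negative | disease-free)."""
    return tn / (tn + fp)

# Hypothetical screening results
tp, fn = 90, 10    # diseased people: tested positive / tested negative
fp, tn = 30, 870   # disease-free people: tested positive / tested negative
print(sensitivity(tp, fn), specificity(tn, fp))
```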

14
Q

Z scores

A

Used to standardize values from any normal distribution so that the properties of the standard normal distribution (e.g., the Z table) can be applied
Converting to a z score means we are standardizing
The z score formula converts x values to the standard normal distribution: Z = (x – μ)/σ
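A minimal Python sketch of standardizing a value and reading a probability from the standard normal distribution with the standard library's statistics.NormalDist. The mean of 120 and SD of 10 are assumed values for illustration only.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Standardize x: number of SDs above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Hypothetical: systolic blood pressure assumed normal with mean 120, SD 10
z = z_score(135, 120, 10)
p_below = NormalDist().cdf(z)  # area to the left of z under the standard normal
print(z, p_below)
# Same probability without standardizing, using the original scale directly
print(NormalDist(mu=120, sigma=10).cdf(135))
```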

15
Q

Central Limit Theorem

A

The theorem states that, as long as the sample size is sufficiently large (n ≥ 30), the distribution of sample means is approximately normal, whether the population distribution is normal or skewed
Two exceptions:
1. If the population is normally distributed, the sample means are normally distributed even when the sample size is less than 30
2. If the outcome for the population is dichotomous, the distribution of sample proportions is approximately normal when min[np, n(1 – p)] > 5
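A small simulation sketch of the theorem, assuming a right-skewed Exponential(1) population (mean 1, SD 1) chosen purely for illustration. It draws many samples of size 30 and shows that the sample means center on the population mean with a spread close to σ/√n; it also encodes the card's rule of thumb for dichotomous outcomes.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the simulation is reproducible

def simulate_sample_means(n, reps=10_000):
    """Means of `reps` random samples of size n from an Exponential(1) population."""
    return [
        statistics.fmean(random.expovariate(1.0) for _ in range(n))
        for _ in range(reps)
    ]

def dichotomous_ok(n, p):
    """Rule of thumb from the card: min[np, n(1 - p)] > 5."""
    return min(n * p, n * (1 - p)) > 5

means = simulate_sample_means(n=30)
# The CLT predicts the means center on 1 with SD close to 1 / sqrt(30)
print(statistics.fmean(means), statistics.stdev(means), 1 / math.sqrt(30))
print(dichotomous_ok(100, 0.10), dichotomous_ok(20, 0.10))
```

A histogram of `means` would look roughly bell-shaped even though individual exponential values are strongly skewed.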

16
Q

Standard error

A

Standard deviation of the sample means
Decreases as sample size increases
Variability in sample means is smaller for larger sample sizes (extreme values less likely to impact larger samples)
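The standard error of the mean is the standard deviation divided by the square root of n. A tiny sketch with an assumed SD of 12 shows that quadrupling the sample size halves the standard error.

```python
import math

def standard_error(sd, n):
    """Standard error of the sample mean: SD / sqrt(n)."""
    return sd / math.sqrt(n)

for n in (10, 40, 160):  # each step quadruples n
    print(n, standard_error(12.0, n))
```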

17
Q

confidence interval

A

Range of values for a population parameter with a level of confidence attached (e.g., 95% confidence = we are 95% confident that the interval contains the unknown parameter)

18
Q

Confidence Interval estimates

A

General form: point estimate ± margin of error
The confidence interval starts with the point estimate, then adds and subtracts a margin of error
Margin of error = Z × SE
Z value = value from the standard normal distribution that corresponds to the confidence level (e.g., 90%, 95%, etc.)
SE = standard error of the point estimate (sampling variability)
Confidence level = reflects the likelihood that the confidence interval contains the true, unknown parameter
Commonly used values are 90%, 95%, and 99% (Table 1B in textbook)
Higher confidence levels = larger z values, and therefore wider confidence intervals (a 99% CI covers a wider range so that we can be more confident it includes the unknown parameter)
Null value = 0 for a difference (1 for a ratio such as a relative risk or odds ratio); if the null value is included in the interval, the results are not statistically significant
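A minimal sketch of a z-based confidence interval for a mean (point estimate ± Z × SE), assuming a large sample and hypothetical summary statistics (mean 127.3, SD 19.0, n = 100). The z value is taken from the standard normal distribution with statistics.NormalDist.inv_cdf instead of a printed table.

```python
import math
from statistics import NormalDist

def z_confidence_interval(mean, sd, n, level=0.95):
    """Large-sample CI for a mean: point estimate ± z * SE."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    se = sd / math.sqrt(n)
    margin = z * se
    return mean - margin, mean + margin

# Hypothetical sample summary; higher confidence gives a wider interval
for level in (0.90, 0.95, 0.99):
    print(level, z_confidence_interval(127.3, 19.0, 100, level))
```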

19
Q

T distribution

A

Used for small samples (generally n < 30)

20
Q

Hypothesis testing

A

An explicit statement or hypothesis generated about a population parameter. We analyze sample statistics to determine if the hypothesis about the population parameter is supported or rejected.
Based on probability theory and the Central Limit Theorem

21
Q

Type 1 Error

A

Rejecting the null hypothesis when it is actually true

22
Q

Type 2 Error

A

Failing to reject the null hypothesis when it is actually false

More likely when sample sizes are small (low statistical power)

23
Q

Chi-Square test for goodness of fit

A

Tests with one sample, categorical or ordinal outcome
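A minimal sketch of the goodness-of-fit statistic, χ² = Σ(O – E)²/E with k – 1 degrees of freedom, where k is the number of categories. The observed counts and hypothesized proportions are hypothetical.

```python
def chi_square_gof(observed, expected_props):
    """Chi-square goodness-of-fit statistic and df = (number of categories - 1)."""
    n = sum(observed)
    expected = [p * n for p in expected_props]  # expected counts under H0
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return chi2, df

# Hypothetical: 200 responses in 4 categories, H0 says all categories equally likely
chi2, df = chi_square_gof([60, 50, 50, 40], [0.25, 0.25, 0.25, 0.25])
print(chi2, df)
```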

24
Q

Independent Samples T-Test

A

Tests with two independent samples, continuous outcome
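A sketch of the two-sample t statistic using the pooled standard deviation, which assumes equal population variances (an unequal-variance version such as Welch's test is not shown). The data are hypothetical; the p-value would come from a t table with n1 + n2 – 2 degrees of freedom.

```python
import math
import statistics

def pooled_t(x1, x2):
    """Independent-samples t statistic with pooled SD; returns (t, df)."""
    n1, n2 = len(x1), len(x2)
    s1, s2 = statistics.variance(x1), statistics.variance(x2)
    # Pooled SD: weighted combination of the two sample variances
    sp = math.sqrt(((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2))
    t = (statistics.fmean(x1) - statistics.fmean(x2)) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

t, df = pooled_t([5, 7, 9], [1, 3, 5])  # hypothetical data
print(t, df)
```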

25
Q

Paired-Samples T-Test

A

Two matched samples, continuous outcome
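For matched samples the analysis works on the within-pair differences: t = d̄ / (s_d/√n) with n – 1 degrees of freedom. A minimal sketch with hypothetical before/after measurements on the same four participants:

```python
import math
import statistics

def paired_t(before, after):
    """Paired t statistic on differences (after - before); returns (t, df)."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = statistics.fmean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1

t, df = paired_t(before=[10, 12, 14, 16], after=[12, 13, 17, 18])
print(t, df)
```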

26
Q

Chi-Square Test for Independence

A

Two or more independent samples, categorical or ordinal outcome

27
Q

ANOVA

A

More than two independent samples, continuous outcome

28
Q

The goal of an ANOVA statistical analysis is to determine whether or not

A

One of the simplest experimental designs is the completely randomized design, in which random samples are selected independently from each of g populations. An analysis of variance tests whether the g population means are all equal or whether at least one mean differs from the others.
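A minimal sketch of the one-way ANOVA F statistic for a completely randomized design: the between-group mean square divided by the within-group (error) mean square, with (g – 1, N – g) degrees of freedom. The three groups of data are hypothetical.

```python
import statistics

def one_way_anova_f(groups):
    """F = MSB / MSW for a one-way ANOVA; returns (F, (df_between, df_within))."""
    all_values = [x for grp in groups for x in grp]
    grand_mean = statistics.fmean(all_values)
    g, total_n = len(groups), len(all_values)
    group_means = [statistics.fmean(grp) for grp in groups]
    # Between-group sum of squares: distance of each group mean from the grand mean
    ssb = sum(len(grp) * (m - grand_mean) ** 2 for grp, m in zip(groups, group_means))
    # Within-group sum of squares: spread of values around their own group mean
    ssw = sum((x - m) ** 2 for grp, m in zip(groups, group_means) for x in grp)
    msb = ssb / (g - 1)
    msw = ssw / (total_n - g)
    return msb / msw, (g - 1, total_n - g)

f_stat, df = one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # hypothetical data
print(f_stat, df)
```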

29
Q

The probability distribution for all possible values of a given sample statistic is called

A

The sampling distribution of a statistic is the distribution of values of the statistic over all possible samples of size n that could have been selected from the reference population.

30
Q

The sum of the deviations of the individual observations from their mean is?

A

The sum of the deviations of the individual data elements from their mean is always equal to zero. This is why we use the sum of squared deviations.

31
Q

How to find cumulative relative frequency

A

Add all the previous relative frequencies to the relative frequency for the current row. The last entry of the cumulative relative frequency column is therefore one, indicating that one hundred percent of the data has been accumulated.
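A short sketch with a hypothetical frequency table. Accumulating the counts first and then dividing by n gives the same cumulative relative frequencies as adding up relative frequencies row by row, and it avoids small floating-point rounding drift.

```python
from itertools import accumulate

# Hypothetical frequency table: counts per category (or class interval)
counts = [5, 15, 20, 10]
n = sum(counts)
relative = [c / n for c in counts]
# Running totals of the counts divided by n = cumulative relative frequency
cumulative = [running / n for running in accumulate(counts)]
print(relative)
print(cumulative)  # the last entry is 1.0 (100% of the data)
```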

32
Q

Which is larger: sample size or population size?

A

The sample is a subset of the population, so the sample size is always smaller than or equal to the population size.

33
Q

Protective factor

A

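The confidence interval lies entirely below the null value (below 1 for a relative risk or odds ratio, below 0 for a risk difference)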

34
Q

Risk factor

A

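The confidence interval lies entirely above the null value (above 1 for a relative risk or odds ratio, above 0 for a risk difference)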

35
Q

If the range includes _ or _ there is no significant difference

A

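0 (the null value for a difference) or 1 (the null value for a ratio, such as a relative risk or odds ratio)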

36
Q

The assumption of a t-test for the difference between the means of two independent populations is that the respective

A

One of the assumptions for the t-test for two independent populations is normality.

37
Q

One can describe the F-distribution as a sampling distribution of the ratio of which of the following

A

The ratio of two sample variances, i.e., the F statistic, which is typically used for comparing two population variances. If the parent populations are independent and normally distributed, the F statistic is calculated as F = s1²/s2² (or s2²/s1²), with the larger of the two variances in the numerator. This ratio follows an F distribution with n1 – 1 and n2 – 1 degrees of freedom, where n1 and n2 are the sample sizes and sample 1 is the one whose variance is in the numerator.
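A minimal sketch of the variance-ratio F statistic with the larger sample variance in the numerator, returning the matching (numerator, denominator) degrees of freedom. The data and function name are hypothetical; looking up the critical value in an F table is not shown.

```python
import statistics

def variance_ratio_f(x1, x2):
    """F = larger sample variance / smaller; returns (F, (df_num, df_den))."""
    v1, v2 = statistics.variance(x1), statistics.variance(x2)
    if v1 >= v2:
        return v1 / v2, (len(x1) - 1, len(x2) - 1)
    return v2 / v1, (len(x2) - 1, len(x1) - 1)

f_stat, df = variance_ratio_f([2, 4, 6, 8], [4, 5, 6])  # hypothetical samples
print(f_stat, df)
```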

38
Q

A clinical experiment with four treatment groups was analyzed using an ANOVA and a significant difference in the population means is found. Which of the following is a natural next step?

A

Once a significant difference among the population means is found after performing an ANOVA, we next examine pairwise comparisons to further identify the nature of the differences while adjusting for the multiple comparisons via Tukey’s method or a similar method

39
Q

Poisson Distribution

A

The Poisson distribution is used to model data that represent the number of occurrences of a specified event in a given unit of time or space
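The Poisson probability of exactly k events is P(X = k) = e^(–λ) λ^k / k!, where λ is the mean number of events per unit of time or space. A minimal sketch, assuming a hypothetical rate of 2 admissions per hour:

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson count with mean lam per unit of time or space."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam = 2.0  # hypothetical: on average 2 admissions per hour
p_none = poisson_pmf(0, lam)
p_at_most_2 = sum(poisson_pmf(k, lam) for k in range(3))
print(p_none, p_at_most_2)
```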

40
Q

Parts of a box plot

A

The lower fence is defined as: Q1 – 1.5(IQR). The upper fence is defined as: Q3 + 1.5(IQR) where Q1 and Q3 are the lower and upper quartiles and IQR is the interquartile range. The upper and lower fences are boundaries to detect any measurements beyond those fences which are called outliers
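A minimal sketch of the box-plot fences and outlier check on hypothetical data. Quartiles come from Python's statistics.quantiles (default "exclusive" method); other quartile conventions can shift the fences slightly.

```python
import statistics

def box_plot_fences(data):
    """Return (lower fence, upper fence, outliers) using Q1 - 1.5 IQR and Q3 + 1.5 IQR."""
    q1, _median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in data if x < lower or x > upper]
    return lower, upper, outliers

data = [3, 5, 7, 8, 9, 11, 13, 15, 17, 19, 60]  # hypothetical, one extreme value
print(box_plot_fences(data))
```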

41
Q

What components are needed to compute a z-score?

A

The observed value (x), the mean, and the standard deviation

42
Q

Kurtosis

A

a measure of the “peakedness” of the probability distribution of a real-valued random variable. Higher kurtosis means more of the variance is due to infrequent extreme deviations versus frequent modestly sized deviations

43
Q

standard deviation of a mean

A

This is the standard error: the standard deviation divided by the square root of n, σ/√n (e.g., with n = 100, σ/√100 = σ/10).

44
Q

scatterplot

A

used to investigate the relationship between two continuous variables

45
Q

If all of the numbers in a list increase by 2, then the standard deviation is

A

Unchanged. Adding a constant to every value in a list shifts every value, and the mean, by the same amount, so the deviations from the mean and therefore the standard deviation stay the same.
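A quick check of this property on hypothetical data: the standard deviation is identical before and after adding 2, while the mean moves up by exactly 2.

```python
import statistics

data = [4, 8, 15, 16, 23, 42]   # hypothetical values
shifted = [x + 2 for x in data]
print(statistics.stdev(data), statistics.stdev(shifted))  # same SD
print(statistics.fmean(data), statistics.fmean(shifted))  # mean moves up by 2
```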

46
Q

In simple linear regression, what is a method of determining the slope and intercept of the best-fitting line

A

least squares
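The least-squares slope is Σ(x – x̄)(y – ȳ) / Σ(x – x̄)², and the intercept is ȳ – slope × x̄. A minimal sketch on hypothetical data:

```python
import statistics

def least_squares_line(x, y):
    """Slope and intercept of the least-squares regression line of y on x."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

slope, intercept = least_squares_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # hypothetical
print(slope, intercept)
```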

47
Q

point prevalence

A

# of current cases / # of people in the population at a given point in time

48
Q

What is beta?

A

slope of a regression