Biostatistics Flashcards
Standard Deviation [SD; s]
Shows variability of observations in a data set
What is the standard error [SE] of the mean?
The standard deviation of the random sampling distribution of means.
Also written SE or SEM; in other words, the SE shows how much sample means vary around the true population mean.
Describe the 95% Confidence Interval [CI] of the mean.
The range of values, calculated from the sample, within which you can be 95% confident that the true population mean lies.
List the measures of central location.
Mean [average], Median [middle value], and Mode [most common value]
List the measures of variation (dispersion).
Range, Percentiles, Standard Deviation, and Standard Error of the mean.
What is the normal range of a frequency distribution?
Mean +/- 2 SD
When is an independent (unpaired) t-test used?
An independent t-test is used when two separate groups of subjects are each sampled on one occasion; e.g., testing the mean difference in body weight between subjects in group A and subjects in group B at time 1.
When is a dependent (paired) t-test used?
A dependent/paired t-test is used when a variable of interest is measured in the same group of subjects on two occasions; e.g., testing the mean difference in body weight of subjects in group A between time 1 and time 2.
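As a rough sketch of how the two designs differ in calculation (assuming Student's pooled-variance form for the independent test; the function names and body weights are hypothetical), the paired test reduces to a one-sample t on the within-subject differences:

```python
from math import sqrt
from statistics import mean, stdev

def independent_t(group_a, group_b):
    """Pooled-variance (Student's) t for two independent groups; returns (t, df)."""
    na, nb = len(group_a), len(group_b)
    # Pool the two sample variances, weighting each by its degrees of freedom
    pooled_var = ((na - 1) * stdev(group_a) ** 2 + (nb - 1) * stdev(group_b) ** 2) / (na + nb - 2)
    se = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(group_a) - mean(group_b)) / se, na + nb - 2

def paired_t(time1, time2):
    """Paired t: a one-sample t on the within-subject differences; returns (t, df)."""
    diffs = [after - before for before, after in zip(time1, time2)]
    n = len(diffs)
    se = stdev(diffs) / sqrt(n)
    return mean(diffs) / se, n - 1

# Hypothetical body weights (kg)
group_a_time1 = [70, 72, 68, 75]
group_b_time1 = [66, 69, 65, 70]
group_a_time2 = [72, 73, 71, 76]
print(independent_t(group_a_time1, group_b_time1))  # group A vs group B at time 1
print(paired_t(group_a_time1, group_a_time2))       # group A at time 1 vs time 2
```

The paired form removes between-subject variability, which is why it is used whenever the same subjects are measured twice.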
What is an element in statistics?
A single observation; denoted by X
The total number of elements in a population is denoted by?
N
The total number of elements in a sample is denoted by?
n
What are probability samples?
Samples in which the researcher can specify the probability of any one element in the population being included.
Probability samples permit the use of what type of statistics?
Inferential statistics
Non-probability samples allow what type of statistics to be used?
Descriptive statistics only
What are the four basic kinds of probability samples?
- simple random samples
- stratified random samples
- cluster samples
- systematic samples
What is a simple random sample?
A sample in which every individual in the population has an equal probability of being included in the sample.
When are clustered samples used?
When drawing a simple random sample or stratified random sample is too expensive or laborious.
How is a cluster sample drawn?
The investigator starts by selecting a random set of clusters, and then randomly selects a subsample of elements from each chosen cluster.
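A minimal sketch of the four probability sampling schemes using Python's `random` module; the helper names, the "ward" clusters and the strata are hypothetical, and a seeded `random.Random` is passed in so results are reproducible:

```python
import random

def simple_random_sample(population, n, rng):
    # Every element has the same chance of selection
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    # Every k-th element after a random start in [0, k)
    k = len(population) // n
    start = rng.randrange(k)
    return population[start::k][:n]

def stratified_sample(strata, n_per_stratum, rng):
    # strata: dict mapping stratum name -> list of elements; sample within each
    return {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, n_per_cluster, rng):
    # Stage 1: choose clusters at random; stage 2: subsample within each chosen cluster
    chosen = rng.sample(list(clusters), n_clusters)
    return {c: rng.sample(clusters[c], n_per_cluster) for c in chosen}

rng = random.Random(42)
patients = list(range(100))  # hypothetical patient IDs
wards = {f"ward{i}": patients[i * 20:(i + 1) * 20] for i in range(5)}
print(simple_random_sample(patients, 10, rng))
print(systematic_sample(patients, 10, rng))
print(cluster_sample(wards, 2, 5, rng))
```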
The probability of an event is denoted by what?
p
How are probabilities expressed?
Decimal fractions between 0 and 1
What is the probability of an event “not occurring”?
q = 1-p
What is the addition rule of probability?
The probability that any one of several particular events will occur is equal to the sum of their individual probabilities, provided the events are mutually exclusive (they cannot occur together).
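A small worked example of the addition rule and of q = 1 - p, assuming a fair six-sided die (exact fractions are used to avoid rounding):

```python
from fractions import Fraction

p_face = Fraction(1, 6)  # probability of any single face of a fair die

# Addition rule: rolling a 1 OR a 2 (mutually exclusive outcomes)
p_one_or_two = p_face + p_face

# Complement: probability of NOT rolling a 1 or a 2
q = 1 - p_one_or_two

print(p_one_or_two, q)
```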
What is a typical medical use of the binomial distribution?
Genetic counseling
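As an illustration of the genetic counseling use, a sketch of the binomial formula applied to two carriers of an autosomal recessive trait, where each child independently has a 1 in 4 chance of being affected (the family size of 4 is hypothetical):

```python
from math import comb

def binomial_probability(k, n, p):
    """P(exactly k events in n independent trials, each with probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p_affected = 0.25  # two carrier parents, autosomal recessive trait
n_children = 4     # hypothetical family size

probs = [binomial_probability(k, n_children, p_affected) for k in range(n_children + 1)]
p_at_least_one = 1 - probs[0]  # complement of "no child affected"

for k, p in enumerate(probs):
    print(f"P({k} affected) = {p:.4f}")
print(f"P(at least one affected) = {p_at_least_one:.4f}")
```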
List the four scales of measurement.
Nominal, Ordinal, Interval, and Ratio
What are nominal data that fall into only two groups called?
Dichotomous data
Describe nominal scale data.
data can be divided into qualitative categories or groups, but cannot be ordered
Describe ordinal scale data.
data that can be placed in a meaningful order, but lack information about the size of the intervals between values
What are interval scale data?
Measurements that can be placed in a meaningful order and have meaningful intervals between observations, but have no absolute zero.
What are ratio scale data?
data with meaningful order, meaningful intervals, and absolute zero.
What are discrete variables?
Variables that can take only certain values and none in between
What are continuous variables?
Variables that may take any value within a given range.
What is a centile rank?
represents the percentage of observations that fall below a particular score.
What are quantiles?
divide distributions into a number of equal parts
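A minimal sketch of a centile (percentile) rank and of quartiles, assuming "below" means strictly below the score; `statistics.quantiles` (Python 3.8+) returns the n - 1 cut points that split the data into n equal parts, and the scores are hypothetical:

```python
from statistics import quantiles

def percentile_rank(data, score):
    """Percentage of observations that fall strictly below `score`."""
    below = sum(1 for x in data if x < score)
    return 100 * below / len(data)

scores = list(range(1, 11))            # hypothetical scores 1..10
pr = percentile_rank(scores, 8)        # 7 of 10 observations lie below 8
quartiles = quantiles(scores, n=4)     # 3 cut points -> 4 equal parts

print(pr, quartiles)
```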
What is a distribution called when 2 scores both occur with the greatest frequency?
Bimodal distribution
When is a median better measure than mean?
The median is more useful for highly skewed distributions.
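A quick illustration with hypothetical lengths of hospital stay, where one very long stay skews the distribution: the mean is pulled toward the outlier, while the median stays at the middle value.

```python
from statistics import mean, median

stays = [2, 3, 3, 4, 4, 5, 60]  # hypothetical lengths of stay (days)
print(f"mean   = {mean(stays):.1f}")   # dragged upward by the single 60-day stay
print(f"median = {median(stays)}")     # unaffected by how extreme the outlier is
```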
The mean in a population is denoted by?
μ
The mean in a sample is denoted by?
X̄ (X-bar)
Which measure of central tendency best resists fluctuation between different samples?
The mean
What is variability?
The extent to which scores are clustered together or scattered about.
How do you find the deviation score of an element?
By subtracting the distribution’s mean from the element.
How can you verify the results of calculating the deviation scores for all the elements in a distribution?
By checking that the sum of the deviation scores for all the elements is 0
What is the variance of a distribution?
The mean of the squares of all the deviation scores in the distribution
Variance is sometimes known as what?
Mean square
What is standard deviation?
The square root of the variance.
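The deviation-score route to the variance and standard deviation can be sketched as follows (the data are hypothetical). The "mean square" above divides by N; the sample estimate shown at the end divides by n - 1 instead, which is the usual convention when estimating a population variance from a sample.

```python
from math import sqrt

data = [4, 8, 6, 5, 7]  # hypothetical observations
mu = sum(data) / len(data)

deviations = [x - mu for x in data]  # deviation score = element - mean
assert abs(sum(deviations)) < 1e-9   # check: deviation scores sum to 0

variance = sum(d ** 2 for d in deviations) / len(data)  # mean square (divides by N)
sd = sqrt(variance)                                     # SD = square root of variance

# Sample estimate: divide by n - 1 instead of n
sample_variance = sum(d ** 2 for d in deviations) / (len(data) - 1)

print(variance, sd, sample_variance)
```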
What are the symbols for standard deviation?
σ in a population
s in a sample
or SD
Why is the standard deviation particularly useful in normal distributions?
In any normal distribution, the proportion of elements lying within a given number of standard deviations above or below the mean is constant.
How much of the population falls within +/- 1, +/- 2, and +/-3 SD of the mean?
68%, 95%, and 99.7% respectively.
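These proportions can be checked against the standard normal distribution with `statistics.NormalDist` (Python 3.8+); the area between -k and +k SD comes out at roughly 68.3%, 95.4% and 99.7%, and the exact multiplier for the central 95% is about 1.96 rather than 2.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1
proportions = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}
for k, p in proportions.items():
    print(f"within +/-{k} SD: {p:.1%}")

# Multiplier that captures exactly the central 95%
z_95 = std_normal.inv_cdf(0.975)
print(round(z_95, 2))
```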
What is the Z score of an element?
The location of any element in a normal distribution, expressed in terms of how many standard deviations it lies above or below the mean of the distribution.
List population parameters.
The population mean and standard deviation
What are the sample mean and standard deviation called?
Sample statistics
Inferential statistics involves what?
Using a statistic to estimate a parameter
What is sampling error?
Natural, expected random variation that will cause the sample statistic to differ from the population parameter.
Describe the central limit theorem.
- The random sampling distribution of means tends to be normal, irrespective of the shape of the population distribution.
- The random sampling distribution of means becomes closer to normal as sample size increases.
- The mean of the random sampling distribution of means is equal to the mean of the original population.
How is the standard error calculated?
The population SD divided by the square root of the sample size (σ/√n).
What are the steps for determining the probability that a random sample of size n will have a mean above X?
- calculate the standard error
- calculate the z score of the sample mean
- find the proportion of the normal distribution that lies beyond that z score
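The three steps can be sketched in a few lines, assuming the population SD is known (otherwise a t distribution would be needed). The population values below (μ = 100, σ = 15, n = 25, X = 106) are hypothetical: the SE is 3, the z score is 2, and about 2.3% of sample means would lie above 106.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_mean_above(x, mu, sigma, n):
    se = sigma / sqrt(n)            # step 1: standard error
    z = (x - mu) / se               # step 2: z score of the sample mean
    return 1 - NormalDist().cdf(z)  # step 3: proportion of the normal curve beyond z

p = prob_sample_mean_above(x=106, mu=100, sigma=15, n=25)
print(f"{p:.4f}")
```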
what are confidence limits equal to?
the sample mean plus or minus the z score obtained from the table multiplied by the standard error
95% confidence limits are approximately equal to what?
the sample mean plus or minus two standard errors
what must be done to the sample size to halve the width of the confidence interval?
it must be increased fourfold, because the width of the interval is proportional to 1/√n
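A sketch of z-based confidence limits (assuming a known population SD; the mean of 50, SD of 10 and sample sizes are hypothetical), showing that quadrupling n from 25 to 100 halves the interval width:

```python
from math import sqrt
from statistics import NormalDist

def confidence_interval(sample_mean, sigma, n, level=0.95):
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    se = sigma / sqrt(n)
    return sample_mean - z * se, sample_mean + z * se

lo25, hi25 = confidence_interval(50, 10, 25)
lo100, hi100 = confidence_interval(50, 10, 100)  # four times the sample size
width_ratio = (hi100 - lo100) / (hi25 - lo25)

print((round(lo25, 2), round(hi25, 2)), (round(lo100, 2), round(hi100, 2)), round(width_ratio, 3))
```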
what is precision?
the degree to which a figure is immune from random variation
the width of the confidence interval reflects what?
precision
what is accuracy?
the degree to which an estimate is immune from systematic error or bias
When is a t score used instead of the z score?
t score is used when inferences about means are based on estimates of population parameters rather than the population parameters themselves
how similar are the values for z and t?
The values of z and t are similar when the sample size is large; t and z scores become increasingly different as the sample size gets smaller.
how is the sample size expressed in a table of t values?
indirectly as degrees of freedom [df]
what are degrees of freedom equal to?
n-1
what are the steps to test a hypothesis about a mean?
- state the null and alternative hypotheses
- select the decision criterion α (level of significance)
- establish the critical values
- draw a random sample from the population, and calculate the mean of that sample
- calculate the standard deviation and estimated standard error of the sample
- calculate the value of the test statistic t that corresponds to the mean of the sample
- compare the calculated value of t with the critical values of t, and then accept or reject the null hypothesis
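A minimal sketch of these steps for a two-tailed one-sample t test at α = .05. The sample and the hypothesized mean of 120 are hypothetical, and the critical value 2.262 is the tabulated two-tailed 5% value of t for df = 9 (it would change with n).

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    n = len(sample)
    se = stdev(sample) / sqrt(n)  # estimated standard error (uses s, not sigma)
    return (mean(sample) - mu0) / se, n - 1

# H0: mu = 120; H1: mu != 120; alpha = 0.05 (two-tailed)
sample = [128, 125, 119, 131, 124, 127, 122, 130, 126, 129]  # hypothetical data
t, df = one_sample_t(sample, 120)

T_CRITICAL = 2.262  # from a t table: df = 9, two-tailed alpha = 0.05
reject_h0 = abs(t) > T_CRITICAL
print(f"t = {t:.2f}, df = {df}, reject H0: {reject_h0}")
```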
When can a null hypothesis be rejected?
When the probability that the sample mean could have come from the hypothesized population is less than the chosen α, conventionally .05 (p < .05).
If the probability of obtaining the sample mean is greater than .05, what will be made of the null hypothesis?
the null hypothesis is not rejected (it is retained, or "accepted"), although this does not prove it is correct.
what is the area of acceptance?
inside the range within which 95% of random sample means would be expected to fall
what does it mean when a result is significant at p < 0.05?
the result would be unlikely (less than a 5% chance) to have occurred by chance alone if the null hypothesis were true.
what is a type I /α error?
H0 is true but rejected; a false-positive conclusion (a difference is claimed where none exists)
what is a type II/β error?
H0 is false but accepted; a false-negative conclusion (a real difference is missed)
how is it possible to guard against a type I error?
using a more stringent (lower) level of α
what is the power of a test?
The ability of a test to reject H0 when it is false;
power = 1-β
conventionally, a study is required to have a power of what to be acceptable?
0.8
what is the most practical and important way of increasing the power of a statistical test?
increasing the sample size
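To see how sample size drives power, here is a sketch for a two-tailed one-sample z test (a simplification that assumes a known σ); the true mean shift of 5 units and σ = 15 are hypothetical. Power rises steadily with n, and with no true effect it collapses to α.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a two-tailed one-sample z test when the true mean differs by `effect`."""
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)
    shift = effect / (sigma / sqrt(n))  # true difference measured in standard errors
    # Probability the test statistic lands in either rejection region
    return std.cdf(shift - z_crit) + std.cdf(-shift - z_crit)

for n in (10, 25, 50, 100):
    print(n, round(z_test_power(effect=5, sigma=15, n=n), 3))
```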
which are more powerful, one-tailed tests or two-tailed tests?
one-tailed tests (provided the true effect lies in the predicted direction)
When is ANOVA used?
when more than two means are being compared.
What contributes to the total variability in the results?
- the variability resulting from the known differences between the groups.
- the ordinary random variability within each group, expected in any set of data, caused by sampling error, individual differences between the patients and so on.
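The two sources of variability map onto the between-groups and within-groups mean squares of a one-way ANOVA. A minimal sketch (the groups are hypothetical, and the F value would still need to be compared with an F table):

```python
from statistics import mean

def one_way_anova_f(*groups):
    """F = between-groups mean square / within-groups mean square; returns (F, (df1, df2))."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n_total = len(groups), len(all_values)
    # Variability from the known differences between the groups
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Ordinary random variability within each group
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within, (k - 1, n_total - k)

print(one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```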
what type of data does a chi-square test analyze?
nominal data;
chi-square is a test of proportions.
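A sketch of the Pearson chi-square statistic for a contingency table of counts, without Yates' continuity correction; the 2 x 2 table of hypothetical counts gives χ² = 4.0 with df = 1, which exceeds the tabulated 5% critical value of 3.84.

```python
def chi_square(observed):
    """Pearson chi-square for a contingency table (list of rows); returns (stat, df)."""
    row_totals = [sum(r) for r in observed]
    col_totals = [sum(c) for c in zip(*observed)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand  # count expected under H0
            stat += (obs - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df

# Hypothetical counts: rows = treatment / control, columns = improved / not improved
print(chi_square([[20, 30], [30, 20]]))
```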
what are the two basic kinds of correlational techniques?
Correlation and Regression
what is correlation used for?
to quantify the strength and direction of the relationship between two variables.
what is regression used for?
to express the functional relationship between two variables, so that the value of one variable can be predicted from the knowledge of the other
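A minimal least-squares sketch of simple linear regression, predicting y from x (the function name and data are hypothetical):

```python
def least_squares_line(x, y):
    """Intercept a and slope b of the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    return a, b

a, b = least_squares_line([1, 2, 3, 4], [3, 5, 7, 9])
print(f"y = {a} + {b}x; predicted y at x = 5: {a + b * 5}")
```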
what are the possible values of the correlation coefficient r?
-1 (-ve correlation) to +1 (+ve correlation);
+ve correlation: means high values of one variable are associated with high values of the other variable.
-ve correlation: means high values of one variable are associated with low values of the other variable.
what are the two most commonly used correlation coefficients?
- Pearson product-moment correlation [r]
- Spearman rank-order correlation [ρ]
What is the Pearson product-moment correlation used for?
interval scale data
ratio scale data
What is the Spearman rank-order correlation used for?
ordinal scale data
True/False? Pearson and Spearman correlational techniques are suited to non-linear data.
False; Pearson's r measures only linear association, and Spearman's ρ only monotonic (consistently increasing or decreasing) association.
if there is a strong non-linear relationship between 2 variables, the Pearson or Spearman coefficients will be what?
an underestimate of the true strength of the relationship.
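A sketch of both coefficients, computing Spearman's ρ as Pearson's r on the ranks (tied values get the average of their ranks). The last example, a perfect but U-shaped relationship (y = x²), gives r = ρ = 0 even though y is completely determined by x. All data are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    # Spearman's rho is Pearson's r computed on the ranks
    return pearson_r(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 2 for v in x]  # monotonic but curved
print(round(pearson_r(x, y), 3), spearman_rho(x, y))

u = [-2, -1, 0, 1, 2]
v = [w ** 2 for w in u]  # strong non-linear (U-shaped) relationship
print(pearson_r(u, v), spearman_rho(u, v))
```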
does a correlation between two variables demonstrate a causal relationship?
No