Biostatistics Flashcards
What are the two types of statistics?
Descriptive and Inferential
Define population
An aggregate of subjects we want to study
- Things
- Cases
- Bacteria
- Animals
- Humans
Define sample
A sample is a set of observations drawn from a population.
Define observation
Study unit / subject / individual
Define variable
Quality or quantity measured for each subject in the sample (age, sex, colour, weight)
Define dataset
A set of values on all variables of interest for all observations in the study
Define parameters
Parameters are quantities used to describe characteristics of the population
Parameters are quantities such as:
Mean height of Swedish men
Prevalence of Hepatitis C in Swedish drug users
Proportion of breast cancer patients who develop another cancer
μ
Population mean
σ²
Population variance
p
Population proportion
Define target population
The population to whom we wish to generalize our findings
Define study population
The population from which we sample

What are the measures of central tendency?
Median
Mean
Mode
What measure of central tendency is good to use when the data contain outliers?
Median
Define mode
The mode is the most frequently occurring value in the data
S²
Sample variance
S
Standard deviation of a sample
How is the standard deviation calculated?
By taking the square root of the variance
What does a low standard deviation indicate?
A low standard deviation indicates that the data points tend to be very close to the mean
What does a high standard deviation indicate?
A high standard deviation indicates that the data points are spread out over a large range of values
What does the standard deviation tell us?
It tells us how much variation or “dispersion” exists from the average (mean, or expected value)
What does the variance tell us?
The variance describes how far the values lie from the mean (expected value), measured as the average squared deviation
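A minimal sketch, using Python's standard `statistics` module, of how variance and standard deviation relate. The data values are made up for illustration. `pvariance` divides by N (population formula) and `variance` divides by n − 1 (sample formula); the standard deviation is the square root of either.

```python
import math
import statistics

# Hypothetical example data (made up for illustration)
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Population variance: average squared deviation from the mean (divide by N)
pop_var = statistics.pvariance(data)
# Sample variance: sum of squared deviations divided by n - 1
sample_var = statistics.variance(data)

# The standard deviation is the square root of the variance
pop_sd = math.sqrt(pop_var)
sample_sd = math.sqrt(sample_var)

print(pop_var, pop_sd)        # population variance and SD
print(sample_var, sample_sd)  # sample variance and SD
```

`statistics.pstdev` and `statistics.stdev` return the same standard deviations directly.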

What is the constant for a 90% confidence interval?
C = 1.64
What is the constant for a 95% confidence interval?
C = 1.96
What is the constant for a 99% confidence interval?
C = 2.58
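These constants are quantiles of the standard Normal distribution: a two-sided interval leaves α/2 in each tail. A sketch assuming Python 3.8+ (`statistics.NormalDist`); `critical_value` is a hypothetical helper name. The printed values are slightly more precise than the rounded constants on the cards.

```python
from statistics import NormalDist

# Standard Normal distribution: mean 0, standard deviation 1
z = NormalDist(mu=0, sigma=1)

def critical_value(confidence: float) -> float:
    """Two-sided constant C such that P(-C < Z < C) = confidence."""
    alpha = 1 - confidence
    # Leave alpha/2 in each tail, so take the (1 - alpha/2) quantile
    return z.inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: C = {critical_value(level):.3f}")
```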
What is a stochastic or random variable?
A variable whose value is subject to variation due to chance
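$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Sample mean
$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$
Population mean
$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$
Population variance (sigma squared)
$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
Sample variance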
What is a nominal variable?
A variable whose values fall into unordered categories (e.g. marital status, place of birth)
What is a binary or dichotomous variable?
A nominal variable with only two categories (e.g. gender, yes/no)
What is an ordinal variable?
A variable whose values fall into ordered categories, e.g.:
- Disease status: minor, moderate, and severe
- Blood pressure: low, normal, and high
What is the interquartile range?
The interquartile range is equal to Q3 minus Q1
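A short sketch of Q1, Q3 and the IQR with Python's `statistics.quantiles` (Python 3.8+). The data are hypothetical. Several quartile conventions exist, so other software may report slightly different quartiles for the same data; `method="inclusive"` is one of them.

```python
import statistics

# Hypothetical data (made up for illustration)
data = [1, 3, 4, 6, 7, 8, 10, 12, 15]

# quantiles(n=4) returns the three cut points Q1, Q2 (the median) and Q3
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range = Q3 - Q1
print(q1, q2, q3, iqr)
```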
Quantitative variables can either be:
Discrete or continuous
Define discrete variable
A variable that can take only distinct, countable values (typically whole numbers). For example, the number of children in a family or the number of cigarettes smoked per day.

Define continuous variable
A variable with a potentially infinite number of possible values along a continuum. For example height and weight
Explain the *range of a distribution*
The difference between the largest and smallest values in a distribution.
The number of successes that result from the binomial experiment is denoted by the symbol
X
The number of trials in the binomial experiment is denoted by the symbol
n
The probability of success on an individual trial in a binomial experiment is denoted by the symbol
P
The probability of failure on an individual trial in a binomial experiment is denoted by
1 - P
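Using the symbols X, n and P from the cards above, the probability of exactly x successes is C(n, x) · Pˣ · (1 − P)ⁿ⁻ˣ. A minimal Python sketch (3.8+ for `math.comb`); `binomial_pmf` is a hypothetical helper name, and the example values are made up. It also checks numerically that the mean (expectation) equals n · P.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for x successes in n independent trials with success probability p."""
    # comb(n, x) counts the ways to place x successes among n trials;
    # each arrangement has probability p**x * (1 - p)**(n - x)
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Example: probability of exactly 3 successes in 10 trials with P = 0.5
print(binomial_pmf(3, 10, 0.5))

# The mean (expectation) of a binomial distribution is n * P
n, p = 10, 0.5
mean = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
print(mean, n * p)
```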
The mean of any distribution is also called…
Expectation
Both standard deviation and standard error (SE) are calculated from the…
Variance
When calculating variance why do we square the deviations?
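To prevent positive and negative deviations from cancelling out (deviations from the mean always sum to zero); squared deviations are never negative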
How is the standard error calculated?
By dividing the standard deviation by the square root of n
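A minimal sketch of the standard error of the mean, SE = S / √n, in Python. The sample values are made up; `statistics.stdev` uses the n − 1 (sample) formula.

```python
import math
import statistics

# Hypothetical sample (made up for illustration)
sample = [4.1, 5.3, 4.8, 6.0, 5.5, 4.9, 5.2, 5.7]

n = len(sample)
s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
se = s / math.sqrt(n)         # standard error of the mean

print(f"SD = {s:.3f}, SE = {se:.3f}")
```

Because of the √n in the denominator, the SE shrinks as the sample grows, while the SD does not systematically change with n.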

What measure of spread (dispersion) is good to use together with the median?
Percentiles or quartiles
What is a type I error?
Type I error occurs when the researcher rejects a null hypothesis when it is true.
What is a type II error?
A Type II error occurs when the researcher fails to reject a null hypothesis that is false.
What is the confidence interval used for?
The *confidence interval* is used to express the degree of uncertainty associated with a sample statistic.
What is a continuous variable?
A variable that can take on any value between its minimum value and its maximum value.
Z-score is also called…
Standard score
What does a Z-score indicate?
It indicates how many standard deviations an element is from the mean.
How is the Z-score calculated?
$z = \frac{x - \mu}{\sigma}$
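The Z-score subtracts the mean from a value and divides by the standard deviation. A one-function Python sketch; `z_score` is a hypothetical helper name and the numbers are made up.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical example: a value of 130 when the mean is 100 and the SD is 15
print(z_score(130, 100, 15))  # 2.0
```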
How is the variance of a population calculated?
$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$
What does the horizontal line inside the box of a box plot represent?
It represents the median, or the 50th percentile

What type of variables are histograms good for?
Continuous variables
What does the lower limit of the box in a box plot represent?
The 25th percentile (Q1)
What does the upper limit of the box in a box plot represent?
The 75th percentile (Q3)
What does the lower whisker of a box plot represent?
It extends to the smallest value within 1.5 times the interquartile range below the lower limit of the box
What does the upper whisker of a box plot represent?
It extends to the largest value within 1.5 times the interquartile range above the upper limit of the box
What do the outer dots in a box plot represent?
**Outliers**
Values above the upper whisker or below the lower whisker
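A sketch of how the box, whiskers and outlier dots can be computed with the 1.5 × IQR rule, assuming the "inclusive" quartile convention of Python's `statistics.quantiles`; plotting software may use other quartile conventions or whisker rules. `boxplot_summary` is a hypothetical helper name and the data are made up.

```python
import statistics

def boxplot_summary(data):
    """Box (Q1, median, Q3), whisker ends and outliers using the 1.5 x IQR rule."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers reach the most extreme observations that lie inside the fences
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "box": (q1, median, q3),
        "whiskers": (min(inside), max(inside)),
        # Everything beyond the fences is drawn as a separate dot
        "outliers": sorted(x for x in data if x < lower_fence or x > upper_fence),
    }

# Hypothetical data with one unusually large value
summary = boxplot_summary([1, 3, 4, 6, 7, 8, 10, 12, 40])
print(summary)
```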
What percentage of the observations lie within 1 standard deviation of the mean in a Normal distribution?
About 68%
What percentage of the observations lie within 2 standard deviations of the mean in a Normal distribution?
About 95%
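These percentages follow from the standard Normal distribution and can be checked with `statistics.NormalDist` (Python 3.8+). Within 2 SD the exact share is about 95.4%; exactly 95% corresponds to 1.96 SD, the same constant used for the 95% confidence interval.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

for k in (1, 2):
    # Probability that a Normal observation lies within k SDs of the mean
    share = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within {k} SD: {share:.1%}")
```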
The standard deviation has the same unit as the…?
Mean
Name four characteristics of the Normal distribution
- meant for continuous variables
- defined from minus infinity to plus infinity
- symmetrical and bell-shaped
- centered about its mean
A Normal distribution with mean zero and variance one is called
standard Normal distribution.
Name five sampling schemes
Simple random sampling
Systematic sampling
Stratified sampling
Cluster sampling
Non-probability sampling
Simple random sample
Every sampling unit in the population is equally likely to be included in the sample
Systematic sampling
A statistical method involving the selection of elements at a fixed interval from an ordered sampling frame.
Example: a random starting point is chosen, then every 5th unit is selected.
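A minimal sketch of the systematic scheme in the card: a random start among the first k units, then every k-th unit. `systematic_sample` and the 20-unit frame are hypothetical; it assumes the frame is already ordered.

```python
import random

def systematic_sample(frame, k, rng=random):
    """Pick a random start among the first k units, then take every k-th unit."""
    start = rng.randrange(k)  # random start in 0 .. k-1
    return frame[start::k]

# Hypothetical ordered sampling frame of 20 unit IDs, every 5th unit selected
frame = list(range(1, 21))
print(systematic_sample(frame, 5, random.Random(42)))
```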
Stratified sampling
Divide the population into strata; draw random samples within each stratum;
sampling fractions may vary across strata
It ensures that all the strata are represented
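A sketch of stratified sampling: group units by stratum, then draw a simple random sample within each stratum using a stratum-specific sampling fraction. The helper name, the urban/rural population and the fractions are all hypothetical, and taking at least one unit per stratum is a design choice of this sketch.

```python
import random

def stratified_sample(units, stratum_of, fractions, rng=random):
    """Simple random sample within each stratum.

    units      -- list of sampling units
    stratum_of -- function mapping a unit to its stratum label
    fractions  -- sampling fraction per stratum (may differ across strata)
    """
    strata = {}
    for unit in units:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for label, members in strata.items():
        # Assumption of this sketch: at least one unit per stratum,
        # so every stratum is represented
        size = max(1, round(fractions[label] * len(members)))
        sample.extend(rng.sample(members, size))
    return sample

# Hypothetical population: 80 urban and 20 rural units
population = [("urban", i) for i in range(80)] + [("rural", i) for i in range(20)]
chosen = stratified_sample(population, lambda u: u[0],
                           {"urban": 0.10, "rural": 0.25}, random.Random(0))
print(len(chosen))
```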
Cluster sampling
Identify clusters or groups of units in the population (e.g. families); draw a random sample of clusters rather than of units (e.g. individuals)
Non-probability sampling
Convenience sampling schemes (e.g. volunteers)
Prone to bias
Probability can also be described as the…?
Long-run relative frequency
The probability is always a number between…?
0 and 1 (inclusive)
In linear regressions the independent variable is denoted by what letter?
X
In linear regressions the dependent variable is denoted by what letter?
Y
Positive linear association means
Positive covariance
Negative linear association means
Negative covariance
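What is the association? (scatter plot not preserved)
Positive
What is the association? (scatter plot not preserved)
Negative
What is the association? (scatter plot not preserved)
None: the variables are independent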
The correlation coefficient can never be greater than…?
1
The correlation coefficient can never be smaller than…?
-1
What does it mean if the correlation coefficient is equal to 0?
There is no linear association (zero covariance) between the two variables
Explain residuals
It is the difference between the observed value of the dependent variable (y) and the predicted value (ŷ), i.e. y − ŷ
What is the coefficient of determination (r²) if x does not explain y at all?
r² = 0 (0%)
What does the intercept of a regression equation mean?
The intercept is the predicted value of the dependent variable when the independent variable equals 0
What does β (slope) represent?
β is the change in y when x increases by one unit (an increase if β is positive, a decrease if β is negative).
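A minimal ordinary least-squares sketch that ties together the regression cards: slope, intercept, residuals and the correlation coefficient r. The code calls the slope `b` (an estimate of β) and the intercept `a`; these names, the helper `least_squares` and the data are hypothetical.

```python
import math

def least_squares(x, y):
    """Fit y = a + b*x by ordinary least squares; return a, b, residuals, r."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    b = sxy / sxx          # slope: change in y per one-unit increase in x
    a = y_bar - b * x_bar  # intercept: predicted y when x = 0
    # Residual = observed y minus predicted y-hat
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    r = sxy / math.sqrt(sxx * syy)  # correlation coefficient, always in [-1, 1]
    return a, b, residuals, r

# Hypothetical data (made up for illustration)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b, residuals, r = least_squares(x, y)
print(f"intercept = {a:.2f}, slope = {b:.2f}, r = {r:.3f}, r^2 = {r**2:.3f}")
```

In simple linear regression the coefficient of determination is the square of r, so r = 0 gives r² = 0.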
What types of variables are used in binomial distributions?
Categorical binary variables
The null hypothesis is denoted by…?
H0
The alternative hypothesis is denoted by…?
H1 or HA
What are the most common α-levels?
- 0.01
- 0.05
- 0.10
If the confidence level is 95%, then α equals
0.05
What do we do if the p-value is less than the significance level?
p < α
We reject the null hypothesis (H0)
The criterion for rejecting the null hypothesis is:
p ≤ α
Reject the null hypothesis
The criterion for not rejecting the null hypothesis is:
p > α
Do not reject the null hypothesis
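The decision rule can be written as a tiny function. A sketch only: `decide` and `alpha_from_confidence` are hypothetical helper names, and the p-values in the usage lines are made up.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Reject H0 when p <= alpha; otherwise do not reject H0."""
    if not 0 <= p_value <= 1:
        raise ValueError("a p-value must lie between 0 and 1")
    return "reject H0" if p_value <= alpha else "do not reject H0"

def alpha_from_confidence(confidence: float) -> float:
    """Significance level implied by a confidence level (e.g. 95% -> 0.05)."""
    return round(1 - confidence, 10)  # rounding hides floating-point noise

print(decide(0.03))  # p <= 0.05
print(decide(0.20))  # p > 0.05
print(alpha_from_confidence(0.95))
```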
What values can a p-value take?
Only values between 0 and 1
The 95% confidence interval for the mean represents
An interval produced by a method that, over repeated sampling, captures the true population mean in 95% of samples.
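The "95% over repeated sampling" interpretation can be checked by simulation. A sketch assuming Normally distributed data with a known σ, so the interval is mean ± 1.96 · σ/√n; the true mean, σ, n, number of repeats and seed are all made-up values, and `coverage` is a hypothetical helper name.

```python
import math
import random
import statistics

def coverage(true_mean=10.0, sigma=2.0, n=30, repeats=2000, seed=1):
    """Share of 95% confidence intervals (known sigma) that contain the true mean."""
    rng = random.Random(seed)
    se = sigma / math.sqrt(n)  # standard error of the mean
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        m = statistics.fmean(sample)
        lower, upper = m - 1.96 * se, m + 1.96 * se
        hits += lower <= true_mean <= upper  # True counts as 1
    return hits / repeats

print(coverage())  # close to 0.95; the exact value depends on the seed
```

Any single interval either contains the true mean or it does not; the 95% describes how often the method succeeds.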
A binomial distribution must meet these four requirements
- A fixed number of trials
- Each trial is independent of the others
- Each trial has only two possible outcomes (success or failure)
- The probability of success P is the same on every trial
Define Z-score
A z-score is defined as the number of standard deviations a specific point is away from the mean.