Biostatistics Flashcards

1
Q

What are the two types of statistics?

A

Descriptive and Inferential

2
Q

Define population

A

An aggregate of subjects we want to study, for example:

  • Things
  • Cases
  • Bacteria
  • Animals
  • Humans

3
Q

Define sample

A

A sample is a set of observations drawn from a population.

4
Q

Define observation

A

Study unit / subject / individual

5
Q

Define variable

A

Quality or quantity measured for each subject in the sample (age, sex, colour, weight)

6
Q

Define dataset

A

A set of values on all variables of interest for all observations in the study

7
Q

Define parameters

A

Parameters are quantities used to describe characteristics of the population.

8
Q

Parameters are quantities such as:

A

Mean height of Swedish men

Prevalence of Hepatitis C in Swedish drug users

Proportion of breast cancer patients who develop another cancer

9
Q

μ

A

Population mean

10
Q

σ²

A

Population variance

11
Q

p

A

population proportion

12
Q

Define target population

A

The population to whom we wish to
generalize our findings

13
Q

Define study population

A

The population from which we sample


14
Q

What are the measures of central tendency?

A

Median

Mean

Mode
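
A minimal sketch computing all three measures with Python's standard-library `statistics` module; the visit counts are hypothetical data made up for illustration:

```python
import statistics

# Hypothetical data: number of clinic visits per patient
visits = [2, 3, 3, 4, 5, 3, 8]

mean_visits = statistics.mean(visits)      # sum of the values / number of values
median_visits = statistics.median(visits)  # middle value of the sorted data
mode_visits = statistics.mode(visits)      # most frequently occurring value

print(mean_visits, median_visits, mode_visits)
```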

15
Q

Which measure of central tendency is best to use when the data contain outliers?

A

Median
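
A small illustration with made-up hospital-stay data: a single extreme value drags the mean far upward, while the median barely moves.

```python
import statistics

# Hypothetical lengths of hospital stay (days); 60 is an extreme outlier
stays = [3, 4, 4, 5, 6, 60]

mean_stay = statistics.mean(stays)      # pulled toward the outlier
median_stay = statistics.median(stays)  # average of the two middle values, 4 and 5

print(f"mean = {mean_stay:.1f}, median = {median_stay}")  # mean = 13.7, median = 4.5
```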

16
Q

Define mode

A

The mode is the most frequently occurring value in the data.

16
Q

S²

A

Sample variance

17
Q

S

A

Standard deviation of a sample

18
Q

How is the standard deviation calculated?

A

By taking the square root of the variance.
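
A minimal Python sketch on hypothetical blood-pressure readings. It assumes the sample version: `statistics.variance` divides by n − 1, and the square root brings the result back to the original unit (mmHg).

```python
import math
import statistics

# Hypothetical systolic blood pressure readings (mmHg)
bp = [118, 122, 130, 125, 135]

s_squared = statistics.variance(bp)  # sample variance s², divides by n - 1
s = math.sqrt(s_squared)             # standard deviation = square root of the variance

# statistics.stdev returns the same value directly
assert math.isclose(s, statistics.stdev(bp))
print(s_squared, round(s, 2))  # 44.5 6.67
```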

19
Q

What does a low standard deviation indicate?

A

A low standard deviation indicates that the data points tend to be very close to the mean

20
Q

What does a high standard deviation indicate?

A

A high standard deviation indicates that the data points are spread out over a large range of values.

21
Q

What does the standard deviation tell us?

A

It tells us how much variation or “dispersion” exists around the average (mean, or expected value).

22
Q

What does the variance tell us?

A

The variance describes how far the values lie from the mean (expected value): it is the average squared deviation from the mean.

23
Q

What is the constant for a 90% confidence interval?

A

C = 1.64

24
Q

What is the constant for a 95% confidence interval?

A

C = 1.96

25
Q

What is the constant for a 99% confidence interval?

A

C = 2.58
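
These constants are the two-sided critical values of the standard Normal distribution, i.e. its (1 − α/2) quantile. A sketch that recovers them with Python's `statistics.NormalDist`; the unrounded values are about 1.645, 1.960 and 2.576, which flashcards usually round.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

constants = {}
for level in (0.90, 0.95, 0.99):
    alpha = 1 - level
    # Two-sided interval: alpha/2 in each tail, so take the (1 - alpha/2) quantile
    constants[level] = std_normal.inv_cdf(1 - alpha / 2)
    print(f"{level:.0%} CI: C = {constants[level]:.3f}")
```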

26
Q

What is a stochastic (random) variable?

A

A variable whose value is subject to variation due to chance.

27
Q
A

Sample mean

28
Q
A

Population mean

28
Q
A

Population variance

(sigma squared)

29
Q
A

Sample variance

30
Q

What is a nominal variable?

A

A variable that assumes values that fall into unordered categories (e.g. marital status, place of birth).

31
Q

What is a binary or dichotomous variable?

A

A nominal variable with only two categories (e.g. gender, yes/no)

32
Q

What is an ordinal variable?

A

A variable that assumes values that fall into ordered categories, for example:

Disease status: minor, moderate, and severe

Blood pressure: low, normal, and high

33
Q

What is the interquartile range?

A

The interquartile range is equal to Q3 minus Q1 (the 75th percentile minus the 25th percentile).
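
A sketch using `statistics.quantiles` (Python 3.8+) on hypothetical ages. Software packages use slightly different quartile conventions, so Q1 and Q3 may differ a little between tools; this uses the module's default `"exclusive"` method.

```python
import statistics

# Hypothetical ages of eight patients
ages = [23, 29, 31, 35, 38, 42, 47, 60]

# n=4 returns the three cut points Q1, Q2 (the median) and Q3
q1, q2, q3 = statistics.quantiles(ages, n=4)
iqr = q3 - q1

print(q1, q2, q3, iqr)  # 29.5 36.5 45.75 16.25
```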

34
Q

Quantitative variables can either be:

A

Discrete or continuous

35
Q

Define discrete variable

A

A variable that can take only separate, countable values (often whole numbers). For example, the number of children in a family or the number of cigarettes smoked per day.

36
Q

Define continuous variable

A

A variable with a potentially infinite number of possible values along a continuum. For example, height and weight.

37
Q

Explain the range of a distribution

A

The difference between the largest and smallest values in a distribution.

38
Q

The number of successes that result from the binomial experiment is denoted by the symbol

A

X

39
Q

The number of trials in the binomial experiment is denoted by the symbol

A

n

40
Q

The probability of success on an individual trial in a binomial experiment is denoted by the symbol…

A

P

41
Q

The probability of failure on an individual trial in a binomial experiment is denoted by…

A

1 - P
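
With n trials, X successes and success probability P, the binomial probability is P(X = x) = C(n, x) · P^x · (1 − P)^(n − x). A minimal Python sketch; the helper name `binomial_pmf` and the treatment-response numbers are hypothetical:

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Binomial(n, p): C(n, x) * p**x * (1 - p)**(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Hypothetical: 10 patients, each responds to treatment with probability P = 0.3
prob_3 = binomial_pmf(3, 10, 0.3)
print(round(prob_3, 4))  # 0.2668

# The probabilities of all possible outcomes 0..n sum to 1
total = sum(binomial_pmf(k, 10, 0.3) for k in range(11))
```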

42
Q

The mean of any distribution is also called…

A

Expectation

43
Q

Both standard deviation and standard error (SE) are calculated from the…

A

Variance

44
Q

When calculating the variance, why do we square the deviations?

A

So that positive and negative deviations do not cancel each other out: squaring makes every term non-negative.

45
Q

How is the standard error calculated?

A

By dividing the standard deviation by the square root of n (the sample size).
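
A minimal sketch of the standard error of the mean on hypothetical glucose values; because SE = SD / √n, quadrupling the sample size halves the standard error.

```python
import math
import statistics

# Hypothetical fasting glucose values (mmol/L)
glucose = [5.1, 4.8, 5.6, 5.0, 5.3, 4.9, 5.5, 5.2]

n = len(glucose)
sd = statistics.stdev(glucose)  # sample standard deviation
se = sd / math.sqrt(n)          # standard error of the mean

print(f"SD = {sd:.3f}, SE = {se:.3f}")
```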

46
Q

Which measure of spread is good to use together with the median?

A

Percentiles or quartiles

47
Q

What is a type I error?

A

A Type I error occurs when the researcher rejects a null hypothesis that is true.

48
Q

What is a type II error?

A

A Type II error occurs when the researcher fails to reject (accepts) a null hypothesis that is false.

49
Q

What is the confidence interval used for?

A

The confidence interval is used to express the degree of uncertainty associated with a sample statistic.

50
Q

What is a continuous variable?

A

A variable that can take on any value between its minimum value and its maximum value.

51
Q

Z-score is also called…

A

Standard score

52
Q

What does a Z-score indicate?

A

It indicates how many standard deviations an element is from the mean.

53
Q

How is the Z-score calculated?

A

z = (x − μ) / σ

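
A z-score is z = (x − μ) / σ: the distance of a value x from the mean μ, expressed in standard deviations σ. A minimal sketch; the helper name `z_score` and the height mean and SD are hypothetical round numbers, not real population values.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Hypothetical population: mean height 178 cm, SD 7 cm
print(z_score(192, 178, 7))  # 2.0  (two SDs above the mean)
print(z_score(171, 178, 7))  # -1.0 (one SD below the mean)
```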
54
Q

How is the variance of a population calculated?

A

σ² = Σ(xᵢ − μ)² / N, where μ is the population mean and N is the population size

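
The population variance is the average squared deviation from the population mean, σ² = Σ(xᵢ − μ)² / N. A sketch that applies the formula directly and checks it against `statistics.pvariance`; the helper name `population_variance` is hypothetical and the eight values are treated as a made-up complete population.

```python
import statistics

def population_variance(values: list[float]) -> float:
    """sigma^2 = sum((x - mu)^2) / N, dividing by N (not N - 1)."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

population = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical complete population
sigma_sq = population_variance(population)
print(sigma_sq)  # 4.0

# The standard library gives the same result
assert sigma_sq == statistics.pvariance(population)
```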
55
Q

What does the horizontal line inside the box of a box plot represent?

A

It represents the median (the 50th percentile).

56
Q

What type of variables are histograms good for?

A

Continuous variables

57
Q

What does the lower limit of the box in a box plot represent?

A

the 25th percentile

58
Q

What does the upper limit of the box in a box plot represent?

A

The 75th percentile

59
Q

What does the lower whisker of a box plot represent?

A

It extends to the smallest value that lies within 1.5 times the interquartile range below the lower limit of the box.

60
Q

What does the upper whisker of a box plot represent?

A

It extends to the largest value that lies within 1.5 times the interquartile range above the upper limit of the box.

61
Q

What do the outer dots in a box plot represent?

A

Outliers

Values greater than the upper whisker or smaller than the lower whisker.
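
The whisker and outlier rules (the 1.5 × IQR convention) can be sketched in Python. The data are hypothetical, and the quartiles come from `statistics.quantiles`, whose default method may differ slightly from the one a given plotting package uses.

```python
import statistics

# Hypothetical length-of-stay data (days)
data = sorted([2, 3, 3, 4, 4, 5, 5, 6, 7, 21])

q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme observations still inside the fences
lower_whisker = min(x for x in data if x >= lower_fence)
upper_whisker = max(x for x in data if x <= upper_fence)
# Anything beyond the whiskers is drawn as an outlier dot
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(lower_whisker, upper_whisker, outliers)  # 2 7 [21]
```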

62
Q

What percentage of the observations lie within 1 standard deviation of the mean in a Normal distribution?

A

68 %

63
Q

What percentage of the observations lie within 2 standard deviations of the mean in a Normal distribution?

A

95 %
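
These percentages hold for a Normal distribution. A quick check with Python's `statistics.NormalDist`; the exact values are about 68.3% and 95.4%, and exactly 95% lies within ±1.96 SD.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

# Area under the curve between -k and +k standard deviations
within_1_sd = std_normal.cdf(1) - std_normal.cdf(-1)
within_2_sd = std_normal.cdf(2) - std_normal.cdf(-2)

print(f"within 1 SD: {within_1_sd:.1%}")  # within 1 SD: 68.3%
print(f"within 2 SD: {within_2_sd:.1%}")  # within 2 SD: 95.4%
```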

64
Q

The standard deviation has the same unit as the…?

A

Mean

65
Q

Name four characteristics of the Normal distribution

A
  • meant for continuous variables
  • defined from minus infinity to plus infinity
  • symmetrical and bell-shaped
  • centered about its mean
66
Q

A Normal distribution with mean
zero and variance one is called

A

standard Normal distribution.

67
Q

Name five sampling schemes

A

Simple random sampling

Systematic sampling

Stratified sampling

Cluster sampling

Non-probability sampling

68
Q

Simple random sample

A

Every sampling unit in the population has the same probability of being selected for the sample.

69
Q

Systematic sampling

A

A statistical method involving the selection of elements from an ordered sampling frame.

Example: a random starting point is chosen, then every 5th unit is selected.
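
A minimal sketch of systematic sampling with a sampling interval of k = 5; the helper name `systematic_sample` and the patient IDs are hypothetical.

```python
import random

def systematic_sample(frame: list, k: int) -> list:
    """Pick a random start among the first k units, then take every k-th unit."""
    start = random.randrange(k)  # random index 0 .. k-1
    return frame[start::k]

# Hypothetical ordered sampling frame: patient IDs 1..100
frame = list(range(1, 101))

random.seed(1)  # fixed seed so the draw is reproducible
sample = systematic_sample(frame, k=5)  # sampling interval of 5
print(sample)
```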

70
Q

Stratified sampling

A

Divide the population into strata; draw random samples within each stratum;

sampling fractions may vary across strata

It ensures that all the strata are represented
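
A sketch of stratified sampling: a simple random sample is drawn within each stratum, and the sampling fraction differs between strata. The helper name `stratified_sample`, the age bands and the fractions are all hypothetical.

```python
import random

def stratified_sample(strata: dict[str, list], fractions: dict[str, float]) -> dict[str, list]:
    """Simple random sample within each stratum, using that stratum's sampling fraction."""
    sample = {}
    for name, units in strata.items():
        size = round(len(units) * fractions[name])
        sample[name] = random.sample(units, size)  # drawn without replacement
    return sample

# Hypothetical strata: patients grouped by age band
strata = {
    "18-39": [f"A{i}" for i in range(200)],
    "40-64": [f"B{i}" for i in range(100)],
    "65+": [f"C{i}" for i in range(50)],
}
# Sampling fractions may differ: the small 65+ stratum is oversampled
fractions = {"18-39": 0.10, "40-64": 0.10, "65+": 0.30}

random.seed(1)
sample = stratified_sample(strata, fractions)
print({name: len(units) for name, units in sample.items()})
```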

71
Q

Cluster sampling

A

Identify clusters or groups of units in the population (e.g. families); draw a random sample of clusters rather than of individual units (e.g. individuals).
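
A minimal sketch of one-stage cluster sampling: whole families are selected at random, then every member of each selected family is included. The family labels and names are hypothetical.

```python
import random

# Hypothetical population organised into clusters (families)
families = {
    "fam1": ["Anna", "Erik"],
    "fam2": ["Lars", "Maja", "Nils"],
    "fam3": ["Olle"],
    "fam4": ["Sara", "Tove", "Ulf", "Vera"],
    "fam5": ["Wilma", "Axel"],
}

random.seed(1)
# Randomly select whole clusters rather than individuals...
chosen = random.sample(sorted(families), k=2)
# ...then include every unit in each selected cluster
sample = [person for fam in chosen for person in families[fam]]
print(chosen, sample)
```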

72
Q

Non-probability sampling

A

Convenience sampling schemes (e.g. volunteers)

Prone to bias

73
Q

Probability can also be said to be the….?

A

Relative frequency in the long run
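
A small simulation, assuming a fair coin, shows the relative frequency of heads settling near the probability 0.5 as the number of tosses grows.

```python
import random

random.seed(42)  # fixed seed so the run is reproducible
p_heads = 0.5    # true probability for a fair coin

freqs = {}
for n in (10, 100, 1_000, 100_000):
    heads = sum(random.random() < p_heads for _ in range(n))
    freqs[n] = heads / n  # relative frequency after n tosses
    print(f"n = {n:>7}: relative frequency of heads = {freqs[n]:.3f}")
```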

74
Q

The probability is always a number between…?

A

0 and 1 (inclusive)

75
Q

In linear regressions the independent variable is denoted by what letter?

A

X

76
Q

In linear regressions the dependent variable is denoted by what letter?

A

Y

77
Q

Positive linear association means

A

Positive covariance

78
Q

Negative linear association means

A

Negative covariance

79
Q

What is the association?

A

Positive

80
Q

What is the association?

A

Negative

81
Q

What is the association?

A

None

Independent

82
Q

The correlation coefficient can never be greater than…?

A
83
Q

The correlation coefficient can never be smaller than?

A

-1

84
Q

What does it mean if the correlation coefficient is equal to 0?

A

There is no linear association (zero covariance) between the two variables.

85
Q

Explain residuals

A

It is the difference between the observed value of the dependent variable (y) and the predicted value (ŷ): residual = y − ŷ.

86
Q

What is the coefficient of determination (r²) if x does not affect y at all?

A

The coefficient of determination (r²) is 0 (0%).

87
Q

What does the intercept of a regression equation mean?

A

The intercept is the (predicted) value of the dependent variable when the independent variable is equal to 0.

88
Q

What does β (the slope) represent?

A

β is the number of units y changes, on average, when x increases by one unit (an increase if β is positive, a decrease if β is negative).

90
Q

What types of variables are used in binomial distributions?

A

Categorical binary variables

91
Q

The null hypothesis is denoted by…?

A

H0

92
Q

The alternative hypothesis is denoted by…?

A

H1 or HA

93
Q

What are the most common α-levels?

A
  • 0.01
  • 0.05
  • 0.10
94
Q

If the confidence level is 95%, then α equals

A

0.05

95
Q

What do we do if the p-value is less than the significance level?

p < α

A

We reject the null hypothesis, H0.

96
Q

What is the decision about the null hypothesis when:

p ≤ α

A

reject the null hypothesis

97
Q

What is the decision about the null hypothesis when:

p > α

A

do not reject the null hypothesis
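
The decision rule can be written as a tiny function; the name `decide` and the default α = 0.05 are illustrative choices, not a standard API.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Compare a p-value with the significance level alpha."""
    if not 0 <= p_value <= 1:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value <= alpha:
        return "reject H0"
    return "do not reject H0"

print(decide(0.03))  # reject H0
print(decide(0.12))  # do not reject H0
```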

98
Q

What values can a p-value take?

A

Only values between 0 and 1 (inclusive)

99
Q

The 95% confidence interval for the mean represents

A

An interval constructed so that, if the sampling were repeated many times, 95% of such intervals would contain the true population mean.
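
A sketch of a 95% confidence interval for the mean, computed as mean ± C × SE with C = 1.96 on hypothetical haemoglobin values. The 1.96 constant is a large-sample Normal approximation; for a sample as small as this one, a t-distribution constant (about 2.26 for n = 10) would usually be used instead, giving a wider interval.

```python
import math
import statistics

# Hypothetical sample of haemoglobin values (g/L)
hb = [132, 140, 128, 145, 138, 150, 135, 142, 129, 141]

n = len(hb)
mean = statistics.mean(hb)
se = statistics.stdev(hb) / math.sqrt(n)  # standard error of the mean
c = 1.96                                  # constant for a 95% CI (Normal approximation)

lower, upper = mean - c * se, mean + c * se
print(f"mean = {mean:.1f}, 95% CI = ({lower:.1f}, {upper:.1f})")
```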

100
Q

A binomial experiment must meet these four requirements:

A
  1. A fixed number of trials, n
  2. The trials are independent: no trial has any impact on any other trial
  3. Each trial has only two possible outcomes (success or failure)
  4. The probability of success, P, is the same on every trial
101
Q

Define Z-score

A

A z-score is defined as the number of standard deviations a specific point is away from the mean.

102
Q
A