BIOSTATISTICS Flashcards

1
Q

What is statistics, and what is its history?

A

A branch of mathematics concerned with collecting and interpreting data.

Also a tool for prediction and forecasting using data and statistical models.

Thought to have originated in 1662 with John Graunt, then developed further during the 17th century.

2
Q

What are the 2 kinds of statistics?

A

descriptive statistics
inferential statistics

3
Q

descriptive statistics

A

Summarizes data by describing what was observed in the sample, numerically or graphically.
Numerical descriptors include the mean and standard deviation for continuous data (such as height or weight).

It explains the "what".

4
Q

inferential statistics

A

Deals with the generalization of information. Inference is the principle of reasoning from concrete information, acquired by observing and measuring samples, to general rules that are valid for the whole population.

It explains the "why".

5
Q

what is applied statistics?

A

Descriptive statistics and inferential statistics put into practice.

6
Q

deductive inference

A

We hold a theory and, based on it, predict its consequences: we predict what the observation should be.

7
Q

Inductive inference

A

We go from the specific to the general. We make many observations, discern a pattern, make a generalization, and infer an explanation.

8
Q

population

A

A collection of individuals in which we may be interested and who have something in common.

It is mostly not possible to examine a whole population, so we usually take a sample.

9
Q

sample

A

A group of individuals taken from a larger population and used to find out something about that population.

It needs to be representative of the population, with the relevant characteristics present in the same proportions.

10
Q

Random sampling

A

pick individuals at random

simple random sampling - every individual has the same chance of selection, e.g. picked with a random number generator

systematic random sampling - individuals are picked from a sampling frame at a regular interval, starting from a random position

multi-stage sampling - constructed by taking a series of simple random samples in stages.
For example, to take a sample of children aged 10 to 12 years, we divide the population into hierarchically arranged stages (towns, schools, classes, pupils). We randomly take a few elements from the highest stage (towns), from these we randomly choose a few elements from the next stage down (schools), and so on.

stratified random sampling - divide the population into strata (age groups, sex, etc.), then take a random sample from within each stratum in order to obtain a sample that is representative
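
The sampling schemes on this card can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sampling frame of 100 patient IDs, the odd/even "strata", and the function names are all hypothetical, and systematic sampling is read in the usual textbook way (every k-th unit after a random start).

```python
import random

def simple_random_sample(frame, n, rng):
    """Pick n distinct units, each equally likely to be chosen."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    """Pick every k-th unit from the frame after a random start."""
    k = len(frame) // n          # sampling interval
    start = rng.randrange(k)     # random position within the first interval
    return [frame[start + i * k] for i in range(n)]

def stratified_sample(frame, stratum_of, fraction, rng):
    """Take the same fraction at random from within each stratum."""
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample

rng = random.Random(0)                 # fixed seed so the run is repeatable
frame = list(range(1, 101))            # hypothetical frame: patient IDs 1..100

srs = simple_random_sample(frame, 10, rng)
sys_s = systematic_sample(frame, 10, rng)
# Hypothetical strata: odd vs even IDs stand in for e.g. sex or age group.
strat = stratified_sample(frame, lambda pid: pid % 2, 0.1, rng)

print(srs, sys_s, strat, sep="\n")
```

Stratifying guarantees that each stratum contributes in proportion to its size, which a simple random sample only achieves on average.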

11
Q

types of data

A

qualitative - nominal + ordinal

quantitative - interval + ratio

graphical presentation of data

12
Q

data

A

the indications produced by observation and measurement.

13
Q

qualitative data

A

individuals fall into separate classes

nominal data - it is possible to decide clearly whether any 2 objects are the same or different.
e.g. sex of a person, employment, colour of eyes, blood group

ordinal data - it is possible to decide clearly whether 2 objects are the same or different in the surveyed characteristic, and also to determine their rank.
e.g. the intensity of pain, seriousness of diabetes mellitus, school classification

14
Q

quantitative data

A

On a metric scale the unit of measurement is determined unambiguously, but the start (zero point) of the scale is not always determined.

  • numerical

split into discrete (values are integers, e.g. number of teeth)
and continuous (values can take any number in a range, e.g. time, weight, height)

  • ratio data

e.g. the number of cars produced last year, capacity of lungs, number of blood elements

  • interval data
    the start (zero point) of the scale is not determined uniquely
    e.g. measurement of temperature on the Celsius or Fahrenheit scale

15
Q

graphical presentation of data

A

Data are convenient to convey with diagrams, but diagrams can be misleading. They should only be used in addition to the numbers, not as a replacement.

16
Q

chart types for graphical presentation of data

A

scatter plot (relationship between 2 quantitative variables)
line graph (how a variable changes over time)
bar chart (shows absolute + relative freq of values)
age-sex pyramid (distribution of various age groups in a population)
histogram (freq distribution)
pie chart (shows relative freq for each category)
box and whisker plot (shows distance between quartiles)

17
Q

probability rules

A

Suppose that two events A and B are mutually exclusive, i.e. when one happens the other cannot happen (symbolically P(A and B) = 0).
Then the probability that one or the other happens is the sum of their probabilities.

Symbolically, P(A or B) = P(A) + P(B) (the addition rule).
For example, the throw of a die may show a one or a two, but not both. The probability that it shows a one or a two = 1/6 + 1/6 = 2/6 = 1/3.

Mutually exclusive: cannot happen at the same time.
If A and B are not mutually exclusive, symbolically P(A and B) ≠ 0, then P(A or B) = P(A) + P(B) - P(A and B).
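
Both forms of the addition rule can be checked by counting outcomes of a fair die. The sketch below is illustrative only: the helper `prob` and the second pair of events (even number, number greater than 3) are assumptions chosen for the example, not part of the card.

```python
from fractions import Fraction

die = range(1, 7)  # faces of a fair six-sided die

def prob(event):
    """Probability of an event as (favourable faces) / 6."""
    return Fraction(sum(1 for face in die if event(face)), 6)

# Mutually exclusive events: "shows a one" and "shows a two".
p_one = prob(lambda f: f == 1)
p_two = prob(lambda f: f == 2)
p_one_or_two = prob(lambda f: f in (1, 2))

# Not mutually exclusive: "even" and "greater than 3" overlap on 4 and 6.
p_even = prob(lambda f: f % 2 == 0)
p_gt3 = prob(lambda f: f > 3)
p_both = prob(lambda f: f % 2 == 0 and f > 3)
p_either = prob(lambda f: f % 2 == 0 or f > 3)

print(p_one_or_two, p_one + p_two)            # both sides of the simple rule
print(p_either, p_even + p_gt3 - p_both)      # both sides of the general rule
```

Subtracting P(A and B) in the general rule removes the outcomes that would otherwise be counted twice.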

18
Q

conditional probability

A
19
Q

QUESTION TO TRY

In a random sample of 140 men aged 40-50 years suffering from hypertension, the risk factor "hypercholesterolemia" (event A) was present in 37 patients and the risk factor "smoking" (event B) in 98 patients. 31 patients had both risk factors. Estimate the probabilities of the following events: A, B, C = (A and B) and D = (A or B). Use relative frequencies.

Estimate the conditional probability of "hypercholesterolemia" (event A) given that "smoking" (event B) occurred, P(A|B).

Verify the independence of the events "hypercholesterolemia" (event A) and "smoking" (event B).

A
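
The card's own answer is not available here. As a hedged sketch, the estimates can be worked out from the counts in the question using relative frequencies, the addition rule, and the definition P(A|B) = P(A and B)/P(B); the variable names are illustrative.

```python
from fractions import Fraction

n = 140                       # men in the sample
n_a, n_b, n_ab = 37, 98, 31   # hypercholesterolemia, smoking, both

p_a = Fraction(n_a, n)        # P(A)
p_b = Fraction(n_b, n)        # P(B)
p_c = Fraction(n_ab, n)       # P(C) = P(A and B)
p_d = p_a + p_b - p_c         # P(D) = P(A or B), by the addition rule

p_a_given_b = p_c / p_b       # P(A|B) = P(A and B) / P(B)

# Independence would require P(A and B) = P(A) * P(B).
product = p_a * p_b
looks_independent = (p_c == product)

for name, value in [("P(A)", p_a), ("P(B)", p_b), ("P(A and B)", p_c),
                    ("P(A or B)", p_d), ("P(A|B)", p_a_given_b),
                    ("P(A)P(B)", product)]:
    print(f"{name} = {value} = {float(value):.3f}")
print("independent by relative frequencies:", looks_independent)
```

Here P(A and B) and P(A)P(B) differ, and P(A|B) is larger than P(A), so the relative frequencies do not support independence. Whether the difference is statistically significant would need a formal test, which this sketch does not perform.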
20
Q

Example 1
Suppose we draw a card from a deck of playing cards.
What is the probability that we draw a spade?

Example 2
Suppose a coin is flipped 3 times. What is the probability of getting two tails and one head?

A

Solution 1 : The sample space of this experiment consists of 52 cards, and the probability of each sample point is 1/52. Since there are 13 spades in the deck, the probability of drawing a spade is P(Spade) = (13)(1/52) = 1/4

Solution 2 : For this experiment, the sample space consists of 8 sample points.
S = {TTT, TTH, THT, THH, HTT, HTH, HHT, HHH}
Each sample point is equally likely to occur, so the probability of getting any particular sample point is 1/8. The event “getting two tails and one head” consists of the following subset of the sample space.
A = {TTH, THT, HTT}
The probability of Event A is the sum of the probabilities of the sample points in A. Therefore, P(A) = 1/8 + 1/8 + 1/8 = 3/8
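
Solution 2 can be reproduced by listing the sample space mechanically. A minimal sketch, assuming a fair coin so that all eight sequences are equally likely:

```python
from fractions import Fraction
from itertools import product

# All sequences of three tosses: HHH, HHT, ..., TTT.
sample_space = ["".join(tosses) for tosses in product("HT", repeat=3)]

# Event A: exactly two tails (and therefore one head).
event_a = [outcome for outcome in sample_space if outcome.count("T") == 2]

p_a = Fraction(len(event_a), len(sample_space))
print(sorted(event_a), p_a)
```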

21
Q

Example 3
A coin is tossed three times. What is the probability of getting three tails?

Example 4
An urn contains 6 red marbles and 4 blue marbles. Two marbles are drawn without replacement from the urn. What is the probability that both of the marbles are blue?

A

Solution 3
If you toss a coin three times, there are a total of eight possible outcomes. They are: HHH, HHT, HTH, THH, HTT, THT, TTH, and TTT. Of the eight possible outcomes, one has three tails (TTT). Therefore, the probability of getting three tails is 1/8.

Solution 4 : Let A = the event that the first marble is blue; and let B = the event that the second marble is blue. We know the following:
In the beginning, there are 10 marbles in the urn, 4 of which are blue. Therefore, P(A) = 4/10.
After the first selection, there are 9 marbles in the urn, 3 of which are blue. Therefore, P(B|A) = 3/9.
Therefore, based on the rule of multiplication (for dependent events):
P(A and B) = P(A) P(B|A)
P(A and B) = (4/10)*(3/9) = 12/90 = 2/15
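
As a cross-check on Solution 4, the multiplication rule can be compared with a brute-force count of every ordered pair of marbles drawn without replacement. The sketch assumes each marble is equally likely to be drawn; the names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

urn = ["R"] * 6 + ["B"] * 4   # 6 red and 4 blue marbles

# Every ordered pair of two different marbles (draws without replacement).
draws = list(permutations(range(len(urn)), 2))
both_blue = [d for d in draws if urn[d[0]] == "B" and urn[d[1]] == "B"]

p_by_counting = Fraction(len(both_blue), len(draws))
p_by_rule = Fraction(4, 10) * Fraction(3, 9)   # P(A) * P(B|A)

print(p_by_counting, p_by_rule)
```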

22
Q

Example 5
A student goes to the library. The probability that she checks out a work of fiction is 0.40, a work of non-fiction is 0.30, and both fiction and non-fiction is 0.20. What is the probability that the student checks out a work of fiction, non-fiction, or both?

Example 6
Of all of Dr. Smith's patients, 20% run every day (event R), 50% drink two glasses of milk each day (event M), and 12% do both. What is the probability that a patient runs every day, given that the patient is known to drink two glasses of milk daily?

A

Solution 5 :
Let F = the event that the student checks out fiction; and let N = the event that the student checks out non-fiction. Then, based on the rule of addition:
P(F or N) = P(F) + P(N) - P(F and N)
P(F or N) = 0.40 + 0.30 - 0.20 = 0.50

Solution 6 :
P(R|M) = P(R and M)/P(M) = 0.12/0.50 = 0.24

23
Q

Example 7
A card is drawn randomly from a deck of ordinary playing cards. You win $10 if the card is a spade or an ace. What is the probability that you will win the game?

A

Solution
Let S = the event that the card is a spade;
and let A = the event that the card is an ace.
We know the following:
There are 52 cards in the deck.
There are 13 spades, so P(S) = 13/52.
There are 4 aces, so P(A) = 4/52.
There is 1 ace that is also a spade, so P(S ∩ A) = 1/52.
Therefore, based on the rule:
P(S ∪ A) = P(S) + P(A) - P(S ∩ A)
P(S ∪ A) = 13/52 + 4/52 - 1/52 = 16/52 = 4/13

24
Q
A

A
C
D
B
C
C
C
C
A
D

25
Q

Screening test

A
  • To identify individuals who might have the disease. Screening tests are generally cheap and patient friendly,
    and are chosen on the basis of cost/benefit analysis, e.g. blood pressure measurement.

The difficulty with a screening test is determining who really has the disease and who does not. This can be assessed through sensitivity analysis (patients are also examined by another method, such as a gold standard test).

In the formulas below, a, b, c and d are the counts in the 2x2 table of test result against disease status: a = diseased with a positive test, b = nondiseased with a positive test, c = diseased with a negative test, d = nondiseased with a negative test.

PREDICTIVE VALUE OF A POSITIVE TEST: the proportion of positive tests that correctly identify diseased persons,
= a/(a + b) = P(D+|T+) (probability of disease given a positive test)

PREDICTIVE VALUE OF A NEGATIVE TEST: the proportion of negative tests that correctly identify nondiseased people,
= d/(c + d) = P(D-|T-) (probability of no disease given a negative test)
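
The two predictive values can be computed directly from the 2x2 table. In this sketch the cell labels follow from the formulas on the card (a = diseased and test positive, b = nondiseased and test positive, c = diseased and test negative, d = nondiseased and test negative). The counts are made up for illustration, and sensitivity and specificity use their standard definitions, which the card itself does not spell out.

```python
def screening_summary(a, b, c, d):
    """Summaries of a 2x2 table of test result against true disease status.

    a: diseased, test positive      b: nondiseased, test positive
    c: diseased, test negative      d: nondiseased, test negative
    """
    return {
        "ppv": a / (a + b),          # P(D+|T+)
        "npv": d / (c + d),          # P(D-|T-)
        "sensitivity": a / (a + c),  # P(T+|D+), standard definition
        "specificity": d / (b + d),  # P(T-|D-), standard definition
    }

# Hypothetical counts for 1000 screened people.
summary = screening_summary(a=90, b=30, c=10, d=870)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts the test catches 90% of diseased people, yet a quarter of its positive results are false alarms, which is why positive screening results are followed up with a more definitive test.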

27
Q

descriptive statistics

A

the main ones essentially

28
Q

measure of variability

how to work out variance

standard deviation

standard error of the mean

variation coefficient
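
The card's answer is missing here, so the following is only a sketch using the standard sample formulas: variance s² = Σ(x - x̄)²/(n - 1), standard deviation s = √s², standard error of the mean s/√n, and variation coefficient s/x̄ expressed as a percentage. The data values are hypothetical.

```python
import math
import statistics

data = [4.0, 6.0, 8.0, 10.0, 12.0]   # hypothetical measurements

n = len(data)
mean = statistics.mean(data)
variance = statistics.variance(data)   # sample variance, divides by n - 1
sd = math.sqrt(variance)               # standard deviation
sem = sd / math.sqrt(n)                # standard error of the mean
cv = sd / mean * 100                   # variation coefficient, in percent

print(f"mean={mean}, variance={variance}, sd={sd:.3f}, "
      f"sem={sem:.3f}, cv={cv:.1f}%")
```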

29
Q

find the variance and standard deviation of these data

30
Q

find the standard error of the mean and the variation coefficient

33
Q

probability distribution

A

A probability distribution identifies either the probability of each value of a random variable (when the variable is discrete), or the probability of the value falling within a particular interval (when the variable is continuous).

The probability distribution describes the range of possible values that a random variable can attain and the probability that the value of the random variable is within any subset of that range.

A well-known example is the normal distribution, the familiar bell-shaped curve.

34
Q

probability distributions can be split into which types?

A

discrete and continuous

35
Q

discrete distributions

A

uniform, binomial, Poisson

36
Q

continuous distribution

A

Continuous distributions - normal distribution, standard normal distribution, t-distribution, chi-squared distribution

37
Q

who made the normal distribution?

A

It is a continuous probability distribution describing data that cluster around a mean. It is also known as the Gaussian distribution (named after Carl Friedrich Gauss) or the bell curve.

38
Q

explanation of the normal distribution: how does it work, what is the equation, what is its shape?
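
The card's answer is missing here. As a hedged sketch, the standard density of a normal distribution with mean μ and standard deviation σ is f(x) = 1/(σ√(2π)) · exp(-(x - μ)²/(2σ²)): a symmetric bell shape that peaks at μ. The function name below is illustrative.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and SD sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-0.5 * ((x - mu) / sigma) ** 2)

# Shape: highest at the mean, symmetric on either side of it.
peak = normal_pdf(0.0, 0.0, 1.0)
left, right = normal_pdf(-1.3, 0.0, 1.0), normal_pdf(1.3, 0.0, 1.0)

# Proportion of values within one standard deviation of the mean.
std = NormalDist(0.0, 1.0)
within_one_sd = std.cdf(1.0) - std.cdf(-1.0)

print(f"peak={peak:.4f}, left={left:.4f}, right={right:.4f}, "
      f"within 1 SD={within_one_sd:.4f}")
```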

39
Q

solve this question

40
Q

what is the standard normal distribution?
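
The card's answer is missing here. In the usual definition, the standard normal distribution is the normal distribution with mean 0 and standard deviation 1, and any normal value x is converted to it with z = (x - μ)/σ. A minimal sketch, with a made-up example (systolic pressure with mean 120 and SD 15):

```python
from statistics import NormalDist

mu, sigma = 120.0, 15.0      # hypothetical population mean and SD
x = 150.0                    # hypothetical observed value

z = (x - mu) / sigma         # standardised value (z-score)

# The probability of a value at or below x is the same on both scales.
p_from_z = NormalDist(0.0, 1.0).cdf(z)
p_from_x = NormalDist(mu, sigma).cdf(x)

print(f"z={z}, P(Z <= z)={p_from_z:.4f}, P(X <= x)={p_from_x:.4f}")
```

Standardising is what allows a single table of the standard normal distribution to serve every normal distribution.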

41
Q

what is skewness

A

Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution, or data set, is symmetric if it looks the same to the left and right of the center point.
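
One common way to put a number on skewness is the moment coefficient g1 = m3 / m2^1.5, where m2 and m3 are the second and third central moments. The sketch below assumes that definition (other variants exist) and uses made-up data: a longer right tail gives a positive value, a longer left tail a negative value, and symmetric data give zero.

```python
def sample_skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n   # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 2, 2, 3, 10]               # one large value stretches the right tail
left_tailed = [-v for v in right_tailed]      # mirror image

skew_sym = sample_skewness(symmetric)
skew_right = sample_skewness(right_tailed)
skew_left = sample_skewness(left_tailed)

print(skew_sym, skew_right, skew_left)
```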

42
Q

what do positive, negative and zero skewness mean?

43
Q

what is kurtosis

A

Kurtosis is a measure of whether the data are peaked or flat relative to a normal distribution. That is, data sets with high kurtosis tend to have a distinct peak near the mean, decline rather rapidly, and have heavy tails. Data sets with low kurtosis tend to have a flat top near the mean rather than a sharp peak.

44
Q

discuss the kurtosis of this image

45
Q

what is confidence interval

A

Confidence Intervals - The Basics:

Instead of using only a single point estimate (the sample mean) to estimate a population mean, we use an interval that has a stated probability of containing the true population mean.
This accounts for sampling variability: the fact that different samples will give different estimates.

The interval for the mean is x̄ ± tα,n-1 · s/√n, where:
x̄ (x-bar) is the sample mean
s is the sample standard deviation
n is the sample size
α is the significance level (e.g., 0.05 for a 95% confidence interval)
tα,n-1 is the critical value from the t-distribution with n-1 degrees of freedom

Key Points:

The width of the interval indicates the precision of our estimate - narrower intervals mean more precision
The confidence level (1-α) is usually set at 95% or 99%, with 95% being most common
These intervals are built using the t-distribution (rather than the normal distribution) when the population standard deviation is unknown, which is almost always the case in practice
The “n-1 degrees of freedom” refers to the fact that we lose one degree of freedom when estimating the standard deviation from the sample

In everyday terms, if you calculate a 95% confidence interval, it means that if you were to take many samples and calculate the confidence interval for each, about 95% of these intervals would contain the true population mean.
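
A minimal sketch of the calculation, assuming the usual interval x̄ ± t · s/√n. The sample values are hypothetical, and the critical value 2.262 is the two-sided 95% value for 9 degrees of freedom as read from a standard t-table, passed in by hand because Python's standard library has no t-distribution.

```python
import math
import statistics

def mean_confidence_interval(sample, t_crit):
    """Confidence interval for the mean: x_bar +/- t_crit * s / sqrt(n)."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)              # sample SD, n - 1 in the denominator
    half_width = t_crit * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Hypothetical sample of n = 10 measurements, so df = 9.
sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.5, 5.7, 5.2, 5.9]
lower, upper = mean_confidence_interval(sample, t_crit=2.262)

print(f"95% CI for the mean: ({lower:.3f}, {upper:.3f})")
```

A larger sample narrows the interval in two ways: √n grows, and the critical value shrinks as the degrees of freedom increase.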

46
Q

what is degrees of freedom

A

Degrees of freedom (df) are the number of values in a calculation that are free to vary.
When calculating confidence intervals using t-distributions, we use n-1 degrees of freedom (where n is the sample size).
A simple example:

If you have 3 values in your sample and know their mean is 7
You can freely choose 2 of those values (like 1 and 2)
But the third value must be 18 (to make the sum 21, giving a mean of 7)

You’ve “lost” one degree of freedom because once you know the mean and all but one value, the last value is determined. This is why we use n-1 degrees of freedom with a sample of size n.
This adjustment makes the t-distribution account for the additional uncertainty that comes from estimating the population standard deviation from the sample.

47
Q

how does t distribution relate to the shape of a graph

A

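The t-distribution is bell-shaped like the normal distribution but has heavier tails, and it approaches the normal distribution as the degrees of freedom increase. Its critical values are tabulated by degrees of freedom, so they can be looked up in a table.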

48
Q

what is a confidence interval limit

49
Q

answer these questions

50
Q

answer these questions