Measurement and Descriptive Analysis Flashcards

Question

Is the Mean/Median affected by outliers?

Answer 1

Mean: Yes. Median: No.
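A minimal sketch of why this holds, using made-up heart-rate values (the numbers are assumptions for illustration, not data from the course). Replacing one value with an outlier pulls the mean upward while the median stays where it was.

```python
from statistics import mean, median

# Hypothetical heart rates in bpm; the second list swaps 70 for an outlier of 150.
without_outlier = [60, 62, 65, 68, 70]
with_outlier = [60, 62, 65, 68, 150]

mean_before = mean(without_outlier)      # 65
mean_after = mean(with_outlier)          # 81: the outlier drags the mean up
median_before = median(without_outlier)  # 65
median_after = median(with_outlier)      # 65: the middle value is unchanged

print(mean_before, mean_after)
print(median_before, median_after)
```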

Answer 2

Distribution or graph of the frequency of occurrence

Answer 3

- Symmetrical, bell-shaped
- Also called a Gaussian distribution

Answer 4

- Used to describe the variability of normally distributed data
- Gives an idea of the width of the curve, i.e. the spread of the data around the mean, e.g. a mean of 75 with an SD of 10 is reported as 75 +/- 10
- The most commonly used measure of data variability for medical and health data

Answer 5

Within +/- 1 SD of the mean: about 68.2% of the data
Within +/- 2 SD of the mean: about 95.5% of the data
Within +/- 3 SD of the mean: about 99.7% of the data
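These shares can be checked against the standard normal distribution. The sketch below uses Python's `statistics.NormalDist`; the helper name `share_within` is an assumption made for this example. The printed values should agree with the card to within rounding.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, SD 1.
std_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Fraction of a normal distribution lying within k SD of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within +/- {k} SD: {share_within(k):.2%}")
```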

Answer 6

Standard deviation

Answer 7

- Number of patients (a small number of patients will result in a large SD, a large number of patients in a small SD)
- Outliers: they increase the SD in one direction, so the distribution is no longer bell-shaped (Gaussian) but skewed

Answer 8

- The data is distributed toward one side of the curve
- The standard deviation is the wrong measure to use, because SD is best used with a bell-shaped curve

Answer 9

- The number of standard deviations (SD) a value lies away from the mean
- e.g. Z = +1.65 cuts off the upper 5% of a normally distributed population; the remaining 95% lies under the curve below that point
- e.g. a heart rate of 65 bpm that lies 1.5 SD below the mean has a Z-score of -1.5
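A short sketch of the calculation. The card does not give the mean or SD behind the heart-rate example, so the values below (mean 80 bpm, SD 10 bpm) are assumptions chosen so that 65 bpm lies exactly 1.5 SD below the mean.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

# Assumed population: mean 80 bpm, SD 10 bpm.
z = z_score(65, mean=80, sd=10)
print(z)  # -1.5 (negative because the value is below the mean)
```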

Answer 10

Leaves 5% in the tail outside the cutoff and 95% under the curve

Answer 11

- Leaves 2.5% in the tail outside the cutoff and 97.5% under the curve
- In the case of +/- 1.96, 2.5% lies outside on each side, so 5% lies outside in total and 95% lies under the curve
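The tail areas for these cutoffs can be reproduced with `statistics.NormalDist`. This is a sketch for checking the percentages; the variable names are assumptions made for the example.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

# One-sided cutoff: area above Z = +1.65 (roughly 5%).
upper_tail_165 = 1 - std_normal.cdf(1.65)

# One-sided cutoff: area above Z = +1.96 (roughly 2.5%).
upper_tail_196 = 1 - std_normal.cdf(1.96)

# Two-sided cutoff: area outside +/- 1.96 (roughly 5%), leaving about 95% inside.
two_tails_196 = 2 * upper_tail_196

print(f"{upper_tail_165:.3f} {upper_tail_196:.3f} {two_tails_196:.3f}")
```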

Answer 12

We include 95% of the data

Answer 13

- Skewed to the left (tail toward negative Z-scores) or to the right (tail toward positive Z-scores)
- The tail runs off toward either the high or the low end of the measurement

Answer 14

- The first quartile cuts off the lowest 25% of the data
- The third quartile cuts off the highest 25% of the data
- IQR = 25th to 75th percentile
- The midspread is the middle 50% of the data
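A minimal sketch of finding the quartiles with Python's `statistics.quantiles`. The data values are made up for illustration, and the `"inclusive"` method is one of several quartile conventions; other tools may give slightly different cut points for the same data.

```python
from statistics import quantiles

# Hypothetical, already sorted measurements.
data = [50, 60, 65, 70, 80, 90, 95, 100, 110]

# n=4 splits the data into quarters and returns the three cut points.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

iqr_width = q3 - q1
print(f"Q1={q1}, median={q2}, Q3={q3}, IQR={q1}-{q3} (width {iqr_width})")
```

With these values the middle 50% of the data lies between Q1 and Q3, with a quarter of the data below Q1 and a quarter above Q3.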

Answer 15

So when given an IQR of 65-95, we know that the middle 50% of the data lies within that range, 25% lies below it in the lowest quartile, and 25% lies above it in the highest quartile

Answer 16

- The bottom of the box is the 25th percentile
- The top of the box is the 75th percentile
- The black bar in the middle is the median
- The whisker on the bottom is the 10th percentile and the whisker on the top is the 90th percentile
- The dots represent values outside of the 90th percentile

Answer 17

- Skewed, with values at the higher end
- It cannot be Gaussian shaped, because 3 SD below the mean would give -120 and the values cannot go below 0

Answer 18

- Possibly negatively skewed
- The values cannot go over 100, which 2 SD or 3 SD above the mean would require

Answer 19

- Patients: inversely related -> the more patients, the smaller the SD
- Outliers: proportional -> the greater the distance of a data point from the mean, the greater the SD

Answer 20

- Measures the share of the entire population that has died from the disease
- Calculation: the number of deaths divided by the total population

Answer 21

- It can make a disease look more harmless, because it takes the whole population into account, regardless of whether some people were never exposed to the disease

Answer 22

- Those who died from the disease among all who were diagnosed with the disease over a period of time
- A measure of disease severity
- # of deaths in a period of time divided by the # of individuals diagnosed with the disease in that time, x 100 (for a percentage)
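To see how the choice of denominator changes the picture, the sketch below computes both the mortality rate and the case fatality rate from the same number of deaths. All counts and function names are hypothetical, chosen only for illustration.

```python
def mortality_rate_percent(deaths, total_population):
    """Deaths from the disease as a percentage of the entire population."""
    return 100 * deaths / total_population

def case_fatality_rate_percent(deaths, diagnosed_cases):
    """Deaths from the disease as a percentage of those diagnosed with it."""
    return 100 * deaths / diagnosed_cases

# Hypothetical counts for one period of time.
deaths = 50
diagnosed_cases = 1_000
total_population = 1_000_000

mortality = mortality_rate_percent(deaths, total_population)  # 0.005%
cfr = case_fatality_rate_percent(deaths, diagnosed_cases)     # 5%

print(f"mortality rate: {mortality}%  CFR: {cfr}%")
```

The same 50 deaths look negligible against the whole population but substantial among the people who were actually diagnosed.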

Answer 23

Because only by looking at the people who actually have the disease can we tell how deadly it is -> exclude all those who don't have the disease

Answer 24

- It is not the same as the risk of death for an infected person
- It is the ratio between the # of deaths from the disease and the # of confirmed cases (not total cases)
- It is less accurate than the IFR because it doesn't take into account patients who were not diagnosed but still have the disease

Answer 25

- # of deaths from a disease / # of ALL cases (not only confirmed cases)
- The IFR tells how likely someone who is infected with the disease is to die from it
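A small sketch contrasting the CFR and the IFR for the same outbreak. The counts are hypothetical; in practice the number of undiagnosed infections is not observed directly and has to be estimated.

```python
def fatality_ratio_percent(deaths, cases):
    """Deaths as a percentage of the given case count."""
    return 100 * deaths / cases

# Hypothetical outbreak.
deaths = 50
confirmed_cases = 1_000     # diagnosed
undiagnosed_cases = 4_000   # infected but never diagnosed (an estimate in practice)

cfr = fatality_ratio_percent(deaths, confirmed_cases)                      # 5%
ifr = fatality_ratio_percent(deaths, confirmed_cases + undiagnosed_cases)  # 1%

print(f"CFR: {cfr}%  IFR: {ifr}%")
```

Because the IFR denominator includes every infection, the IFR can never be higher than the CFR computed from the same deaths.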

Answer 26

Because the IFR takes all cases into account, whereas the CFR only refers to the # of confirmed (diagnosed) cases

Answer 27

- Occurrence of new cases of disease or injury in a population over time
- Incidence = new cases / (population x timeframe)

Answer 28

- In person-years, e.g. 795,000 new cases of stroke per year in the US (population 324 million)
- 795,000 / 324 million = 0.0025 (0.25%) -> meaning for every person in the US there are about 0.0025 new cases per year -> or about 2.5 new strokes per 1,000 people per year
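The arithmetic can be checked with the card's own figures (795,000 new cases in one year in a population of 324 million). The quotient is about 0.0025 new cases per person-year, which is roughly 2.5 per 1,000 people per year. Variable names are assumptions made for this sketch.

```python
new_cases = 795_000        # new strokes in one year (figure from the card)
population = 324_000_000   # US population (figure from the card)
years = 1

# Incidence = new cases / (population x timeframe), in cases per person-year.
incidence = new_cases / (population * years)

per_1000 = incidence * 1_000

print(f"{incidence:.5f} per person-year = {per_1000:.2f} per 1,000 person-years")
```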

Answer 29

- Because not everyone may be followed for the same period of time -> so the number of people followed is multiplied by the length of time they were followed
- It normalizes the data so that groups with different follow-up times can be combined into one total, e.g. 10 people followed for 6 months = 10 x 0.5 = 5 person-years
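A minimal sketch of combining groups with different follow-up times into one person-year total. The groups and the case count are hypothetical.

```python
# (number of people, years each was followed); hypothetical follow-up groups.
groups = [
    (10, 0.5),  # 10 people followed for 6 months -> 5 person-years
    (20, 1.0),  # 20 people followed for 1 year   -> 20 person-years
    (5, 2.0),   # 5 people followed for 2 years   -> 10 person-years
]

person_years = sum(people * years for people, years in groups)

# Hypothetical number of new cases observed across all groups.
new_cases = 7
incidence_rate = new_cases / person_years  # cases per person-year

print(person_years, incidence_rate)
```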

Answer 30

How many people in the population have the disease in a given period of time, expressed as a percentage

Answer 31

The probability of getting a positive test result if the patient has the suspected disease (true positive)

Answer 32

The probability of getting a negative test result if the patient does NOT have the disease (true negative)

Answer 33

- The probability of getting a false positive test result is low (3%)
- The specificity is high at 97% (disease absent and getting a negative test result)
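Sensitivity, specificity and the false positive rate all come from the same four counts. The sketch below uses hypothetical counts chosen so that the specificity is 97% and the false positive rate is 3%, as on the card; the sensitivity value is an assumption.

```python
def sensitivity(true_pos, false_neg):
    """Share of people WITH the disease who test positive."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Share of people WITHOUT the disease who test negative."""
    return true_neg / (true_neg + false_pos)

def false_positive_rate(true_neg, false_pos):
    """Share of people WITHOUT the disease who test positive."""
    return false_pos / (true_neg + false_pos)

# Hypothetical study: 100 people with the disease, 100 without.
true_pos, false_neg = 90, 10
true_neg, false_pos = 97, 3

sens = sensitivity(true_pos, false_neg)
spec = specificity(true_neg, false_pos)
fpr = false_positive_rate(true_neg, false_pos)

print(f"sensitivity={sens:.0%} specificity={spec:.0%} false positive rate={fpr:.0%}")
```

Specificity and the false positive rate always add up to 100%, which is why a 3% false positive rate corresponds to 97% specificity.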

Answer 34

- A combination of test approaches
- Start with a test with a reasonably high sensitivity to detect anyone who potentially has the disease (tested positive but could be a false positive)
- For those who tested positive on the first test -> test again with a test that has high specificity and high sensitivity for clarification

(58 cards)