Measurement and Descriptive Analysis Flashcards

1
Q

Why is it important to classify the type of data?

A

-It determines the type of statistical test that is going to be used

-the type of data will determine how it is described

-when analyzing an article the type of data needs to be determined

2
Q

Which data types are qualitative?

A

-Categorical (nominal) data

-Ordinal data

3
Q

How are qualitative data described?

A

-Qualitative
-No mathematical meaning (any numbers are just labels)
-Fall into distinct and discrete categories (finite number of categories)
-Gender (1=male, 2=female)
-Pass/fail
-Race
-Eye color
-Clinical diagnosis (1=heart failure, 2=renal failure,..)

4
Q

Characteristics of categorical data

A

-Qualitative
-There is no natural order between categories (eye color, dead or alive, male or female)

5
Q

What are dichotomous data?

A

If there are only 2 groups, the data are dichotomous (e.g., male/female)

6
Q

What are ordinal data?

A

-Qualitative
-Data with a natural order
-Values/observations can be ranked (put in order) or have a rating scale attached (e.g., rate your experience from good to bad)

-Numbers are not arbitrary in ordinal data (they carry meaning, e.g., the higher the better)

7
Q

What are examples of Ordinal data?

A

-Pain scale (ranked, but not continuous)
-Likert scale (Strongly agree=5, agree=4, undecided=3)
-Neither is continuous

8
Q

The average score of an ordinal variable (scale from 1 to 5) is reported as 4.75. What is wrong with that statement?

A

4.75 does not correspond to any category on the scale
It is better to report the median (middlemost value) rather than the average, bc the median normally falls on an actual category

9
Q

What are quantitative data?

A

-have mathematical meaning
-derived from counts or measurements
-most biological systems are represented in quantitative data

10
Q

Which data type is quantitative?

A

-Continuous data
-Values can take on any number (including fractions)
-Most biomedical measurements are continuous
-e.g., temperature, blood pressure, weight, LDL, age

11
Q

What are baseline characteristics important for?

A

-Internal validity: ensure that the groups are similar at baseline, thereby limiting confounding

-External validity: are the results generalizable to another population or location?

12
Q

Duration of treatment:
Number of patients treated for < 4 weeks, n (%):
Drug A: 60 (25.1%) n=239
Drug B: 44 (18.5%) n=238
Drug C: 55 (22%) n=250

What type of data is that?

A

-Dichotomous: YES or NO
-> The question is: was a patient on a given drug treated for less than 4 weeks? YES or NO

for Drug A: 60 of 239 were treated for < 4 weeks: YES (179 were not: NO)
for Drug B: 44 of 238 were treated for < 4 weeks (194 were not)
for Drug C: 55 of 250 were treated for < 4 weeks (195 were not)
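
As a check on the percentages in this card, a minimal Python sketch (the dictionary and variable names are illustrative, not from the source) recomputes each proportion from the counts given in the question:

```python
# Counts from the card: (patients treated < 4 weeks, group size)
groups = {"Drug A": (60, 239), "Drug B": (44, 238), "Drug C": (55, 250)}

# Dichotomous data are summarized as a count and a percentage of "yes" answers
percentages = {}
for drug, (yes, n) in groups.items():
    percentages[drug] = round(100 * yes / n, 1)
    print(f"{drug}: {yes} yes, {n - yes} no, {percentages[drug]}% yes")
```

The rounded results agree with the 25.1%, 18.5% and 22% quoted in the question.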

13
Q

Concomitant psychotropic treatment with
Trazodone: 23
Anxiolytics: 44
Sedatives or hypnotics: 19

What type of data?

A

Categorical (nominal), bc each patient can be put into a bucket: Trazodone / Anxiolytics / Sedatives or hypnotics

14
Q

How many patients had fever/cough/runny nose?
Fever: 213
Cough: 163
Runny nose: 78

What type of data?

A

Categorical (nominal)
-can be put in buckets

15
Q

Which type of data do percentages often fall into?

A

-Qualitative data
-Categorical (nominal), including dichotomous

16
Q

What is Descriptive Data?

A

-Measures of central tendency: where the data center (mean, median)

-Measures of variability: how scattered or dispersed the data are

17
Q

What does “Measures of Central Tendency” mean?

A

The data tend to converge on a central value (described by the mean or the median)

18
Q

How are measures of variability expressed?

A

-Standard deviation SD
-standard error of the mean SEM
-confidence intervals
-range
-percentile
-interquartile range

19
Q

What is the purpose of descriptive data?

A

-Describe, organize, or summarize actual data
-No statistical conclusions are drawn

20
Q

What is a “Mean”?

A

-Arithmetic average of the data
-Affected by outliers (extreme values of the data distribution)
-Often used to describe normally distributed continuous data

21
Q

Which value is less affected by outliers?

A

-The median

-bc the median depends only on the rank order of the values: most values lie in the normal range near the median, and how far away an outlier lies does not move the middle value -> it is less affected by outliers and gives a better picture of the typical value

22
Q

What is the Median?

A

-Mid-most value (50th percentile)
-Half the data points lie above it and half below
-Largely unaffected by outliers
-Often used to describe non-normally distributed continuous data
-Often used to describe ordinal data (pain scale, Likert scale)

23
Q

What is the median of these values?

A

1, 2, 3, 4, 5 -> 3

1, 2, 3, 4, 5, 6, 7, 8 -> calculate the average of the two middlemost values -> (4 + 5) / 2 = 4.5
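
The two cases above (odd and even number of values) can be checked with Python's standard library; this is only a small sketch, and the variable names are illustrative:

```python
from statistics import median

odd_values = [1, 2, 3, 4, 5]
even_values = [1, 2, 3, 4, 5, 6, 7, 8]

median_odd = median(odd_values)    # single middle value
median_even = median(even_values)  # average of the two middle values, (4 + 5) / 2
print(median_odd, median_even)
```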

24
Q

Can Mean/Median be used for Continuous Data and Ordinal data?

A

Continuous data - Mean: Yes, Median: Yes

Ordinal data - Mean: No, Median: Yes

Nominal data - Mean: No, Median: No

25
Q

Is the Mean/Median affected by outliers?

A

Mean: Yes Median: No
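
A small sketch of why this is so, using hypothetical heart-rate values (invented for illustration, not from the source): replacing one value with an extreme outlier shifts the mean but leaves the median where it was.

```python
from statistics import mean, median

# Hypothetical heart rates (bpm); the second list swaps one value for an extreme outlier
typical = [60, 62, 65, 68, 70]
with_outlier = [60, 62, 65, 68, 170]

print(mean(typical), median(typical))
print(mean(with_outlier), median(with_outlier))
```

In this example the mean moves from 65 to 85 while the median stays at 65.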

26
Q

How can the distribution of data be organized?

A

As a frequency distribution: a table or graph (e.g., a histogram) of how often each value occurs

27
Q

What does a normal Distribution look like?

A

-Symmetrical, bell-shaped
-also called Gaussian distribution

28
Q

What is the Standard deviation SD?

A

-used to describe the variability of normally distributed data

-gives an idea of the width of the curve, i.e., the spread of the data around the mean
e.g., mean = 75, SD = 10 -> reported as 75 +/- 10

-most commonly used measure of data variability with medical and health data

29
Q

What are the percentages of data represented by SD?

A

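Within +/- 1 SD of the mean: about 68.3% of the data
Within +/- 2 SD of the mean: about 95.4% of the data
Within +/- 3 SD of the mean: about 99.7% of the data
(assuming normally distributed data)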
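
These percentages follow from the normal curve itself. As a hedged sketch (the helper name `coverage_within` is made up for this example), the share of a normal distribution within k SD of the mean equals erf(k / sqrt(2)):

```python
from math import erf, sqrt

def coverage_within(k):
    """Fraction of a normal distribution lying within k SD of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"within +/- {k} SD: {100 * coverage_within(k):.2f}%")
```

Textbooks round these values slightly differently, which is why figures such as 68.2% or 95.5% also appear.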

30
Q

What do whiskers represent on bars in a graph?

A

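Usually the standard deviation (error bars can also show the SEM or a confidence interval, so check the figure legend)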

31
Q

What affects the SD?

A

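-Number of patients: with few patients the SD is estimated imprecisely and a single extreme value can inflate it; with many patients the estimate stabilizes (it is the SEM = SD / sqrt(n), not the SD itself, that shrinks steadily as the number of pt grows)

-Outliers: pull the SD up and stretch the curve in one direction, so it is no longer bell-shaped (Gaussian) -> skewed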

32
Q

What does a skewed-shaped data curve imply?

A

-the data are concentrated on one side of the curve, with a long tail on the other

-the standard deviation is the wrong measure to use bc SD is best used with a bell-shaped curve

33
Q

What are Z-scores?

A

-The number of standard deviations (SD) a value lies away from the mean

-e.g., Z = +1.65 cuts off the upper 5% of a normal distribution; the remaining 95% of the area under the curve lies below it

-e.g., if a heart rate of 65 bpm lies 1.5 SD below the mean, it has a Z-score of -1.5

34
Q

How much data is under the curve with a Z-score of 1.65?

A

About 95% of the area under the curve lies below Z = +1.65 and about 5% lies above it (one-sided)

35
Q

How much data is under the curve with a Z-score of 1.96?

A

About 97.5% of the area under the curve lies below Z = +1.96 and about 2.5% lies above it
-in the case of +/-1.96, 2.5% lies outside on each side = 5% outside in total -> 95% lies between -1.96 and +1.96
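
The Z-score areas on these cards can be reproduced with the standard library's `NormalDist`. This is a sketch only: the helper `z_score` and the heart-rate mean and SD (80 and 10 bpm) are assumptions made up for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def z_score(value, mean, sd):
    """Number of SDs that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

for z in (1.65, 1.96):
    below = standard_normal.cdf(z)             # area under the curve to the left of z
    between = below - standard_normal.cdf(-z)  # area between -z and +z
    print(f"z={z}: {below:.1%} below, {1 - below:.1%} above, {between:.1%} within +/-z")

# Hypothetical numbers: mean heart rate 80 bpm, SD 10 bpm
print(z_score(65, mean=80, sd=10))
```

A value below the mean gets a negative Z-score, as in the 65 bpm example.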

36
Q

What does a confidence interval of 95% imply?

A

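-A range, calculated from the sample, that is expected to contain the true population value (e.g., the true mean) in 95% of repeated studies; for a mean it is roughly mean +/- 1.96 x SEM

-Not to be confused with mean +/- 1.96 SD, which covers about 95% of the individual data points in a normal distribution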

37
Q

What does a Skewed data curve look like?

A

-Skewed to the left (negative skew, tail toward low values) or to the right (positive skew, tail toward high values)

-The tail trails off toward either the high or the low end of the measurement scale

38
Q

What is the Interquartile range?

A

-The first quartile (25th percentile) cuts off the lowest 25% of the data

-The third quartile (75th percentile) cuts off the highest 25% of the data
-IQR = 25th to 75th percentile

-Also called the midspread: the middle 50% of the data

39
Q

Example of IQR

A

When given an IQR of 65-95, we know that the middle 50% of the data lies between 65 and 95, 25% lies below 65 (lowest quartile), and 25% lies above 95 (highest quartile)
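
As an illustration, the quartiles can be computed with `statistics.quantiles`. The scores below are hypothetical, chosen so that the result matches the 65-95 example; other quantile methods can give slightly different cut points.

```python
from statistics import quantiles

# Hypothetical exam scores, for illustration only
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 98, 100]

# n=4 splits the data into quartiles; the three cut points are Q1, median, Q3
q1, q2, q3 = quantiles(scores, n=4)
iqr = q3 - q1
print(q1, q2, q3, iqr)
```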

40
Q

Explain Box Plots

A

-The bottom of the box is the 25th percentile
-The top of the box is the 75th percentile
-The black bar in the middle is the median
-The whiskers on the bottom mark the 10th percentile and the whiskers on the top mark the 90th percentile (other conventions exist, e.g., 1.5 x IQR, so check the legend)

-The dots represent individual values outside the whiskers (below the 10th or above the 90th percentile)

41
Q

If the mean of LDL values is 100 and the SD is +/- 40, what would be the shape of the data curve?

A

-Positively skewed, with a tail toward the higher values

-It cannot be Gaussian because 3 SD below the mean would be 100 - 120 = -20, and LDL cannot go below 0

42
Q

If the mean of the exam score is 85 and the SD is +/- 15, what would be the shape of the data curve?

A

-Possibly negatively skewed, with a tail toward the lower scores
-Mean + 2 SD = 115 and mean + 3 SD = 130, but the score cannot go over 100
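
The reasoning on this card and on the LDL card before it can be written as a quick plausibility check: if mean +/- k SD falls outside the values that are physically possible, the data cannot be bell-shaped. A minimal sketch (function names are invented for this example; the 0-100 exam range is an assumption):

```python
def implied_range(mean, sd, k=2):
    """Interval mean +/- k*SD that a bell-shaped curve would span."""
    return mean - k * sd, mean + k * sd

def fits_bounds(mean, sd, lower, upper, k=2):
    """True if mean +/- k*SD stays inside the possible values [lower, upper]."""
    low, high = implied_range(mean, sd, k)
    return lower <= low and high <= upper

# LDL: mean 100, SD 40, values cannot be negative (no upper bound assumed)
ldl_ok = fits_bounds(100, 40, lower=0, upper=float("inf"), k=3)
# Exam score: mean 85, SD 15, scores assumed to range from 0 to 100
exam_ok = fits_bounds(85, 15, lower=0, upper=100, k=2)
print(implied_range(100, 40, k=3), ldl_ok)
print(implied_range(85, 15, k=2), exam_ok)
```

Both checks fail, which points to a skewed rather than a Gaussian distribution.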

43
Q

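SD formula (sample SD): SD = sqrt( sum of (x - mean)^2 / (n - 1) )
How are the number of patients and outliers related to the SD?

A

-Patients: n sits in the denominator, so on its own a larger n pulls the SD down; in practice the sum of squared deviations grows with n as well, so the SD mainly becomes more stable (the SEM = SD / sqrt(n) is what shrinks steadily with more patients)

-Outliers: proportional -> the greater the distance of a data point from the mean, the greater the SD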

44
Q

What is the crude mortality rate?

A

Measures the share of the entire population that has died from the disease

-CALCULATE: the number of deaths from the disease DIVIDED by the total population

45
Q

Why can the crude mortality rate be misinterpreted?

A

-it can make a disease look more harmless than it is, because it takes the whole population into account, regardless of whether people were ever exposed to the disease

46
Q

What is the Case Fatality Rate (CFR)?

A

-The proportion who died from the disease among all who were diagnosed with it OVER a period of time

-A measure of disease severity

-# of deaths in a period of time DIVIDED by the # of individuals diagnosed with the disease in that time x 100 (for percentage)

47
Q

Why is the CFR the measure of severity?

A

Bc only if we look at the people who actually have the disease, we can tell how deadly the disease is
-> Exclude all those who don’t have the disease

48
Q

How might the CFR be misinterpreted?

A

-it is not the same as the risk of death for an infected person

-it is the ratio between the # of deaths from the disease and the # of confirmed cases (not total cases)

-it is less accurate than the IFR because it doesn’t take into account patients who have the disease but were never diagnosed

49
Q

What is the Infection Fatality Rate IFR?

A

-# of deaths from a disease / # of ALL cases (not only confirmed cases)

-the IFR tells how likely someone who is infected with the disease is to die from it
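
To see how the crude mortality rate, CFR and IFR differ, here is a sketch with hypothetical numbers (all four counts are invented for illustration). The number of deaths is the same in each ratio; only the denominator changes.

```python
# Hypothetical numbers, chosen only to illustrate the three ratios
population = 1_000_000
deaths = 500
confirmed_cases = 10_000   # diagnosed cases
all_cases = 50_000         # diagnosed + undiagnosed infections (usually an estimate)

crude_mortality_pct = 100 * deaths / population
cfr_pct = 100 * deaths / confirmed_cases
ifr_pct = 100 * deaths / all_cases
print(crude_mortality_pct, cfr_pct, ifr_pct)
```

With these made-up counts the crude mortality rate is 0.05%, the CFR is 5% and the IFR is 1%.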

50
Q

Why is the IFR more accurate than the CFR?

A

Because the IFR takes all cases into account, whereas the CFR only refers to the # of confirmed (diagnosed) cases

51
Q

What is the Incidence?

A

-Occurrence of new cases of disease or injury in a population over time

-Incidence = New cases / (population x timeframe)

52
Q

How can the incidence be specified?

A

-in person-years
e.g., 795,000 new strokes per year in the US (population 324 million)
795,000 / 324 million ≈ 0.0025 = 0.25% -> meaning about 0.25 new cases per 100 people per year -> or about 2.5 new strokes per 1,000 people per year
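
The arithmetic in this example can be checked in a few lines (a sketch; variable names are illustrative):

```python
new_cases_per_year = 795_000
population = 324_000_000

incidence = new_cases_per_year / population   # new cases per person per year
per_1000 = 1000 * incidence                   # new cases per 1,000 person-years
print(f"{incidence:.4f} per person-year, {per_1000:.2f} per 1,000 person-years")
```

The result is about 0.0025 per person per year, i.e. roughly 2.5 new cases per 1,000 people per year.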

53
Q

Why is the period of time and the number of people combined -> person-years?

A

Because not everyone is followed for the same period of time

-> So each person contributes the time he or she was actually followed; the number of people is multiplied by their follow-up time

-this normalizes the data so that different follow-up times can be combined into one denominator

e.g., 10 people w/ stroke, each followed for 6 months = 10 x 0.5 years = 5 person-years
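
A minimal sketch of how person-years are added up and used as the denominator of an incidence rate. The follow-up groups and the number of new cases are hypothetical, chosen only for illustration:

```python
# Hypothetical follow-up: (number of people, years each was followed)
follow_up = [(10, 0.5), (5, 2.0), (20, 1.0)]
new_cases = 3  # hypothetical number of new cases observed

# Each group contributes people x years; the contributions are summed
person_years = sum(people * years for people, years in follow_up)
incidence_rate = new_cases / person_years
print(person_years, incidence_rate)
```

Here the 10 people followed for 6 months contribute 5 person-years to a total of 35.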

54
Q

What is the Prevalence?

A

The proportion of the population that has the disease at a given point in time or over a period of time -> usually given as a percentage

55
Q

What is Sensitivity?

A

The probability of getting a positive test result if the patient has the suspected disease
= true positive rate

56
Q

What is Specificity?

A

The probability of getting a negative test result if the patient does NOT have the disease
= true negative rate

57
Q

If the disease in a patient is Absent and the probability of getting a positive test result is 3%, what does that say about a diagnostic test?

A

-The probability of getting a false positive test result is low (3%)

-The Specificity is high at 97% (disease absent and getting a negative test result)
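
Sensitivity, Specificity and the false positive rate all come from the same 2x2 table. A sketch with hypothetical counts (invented so that the false positive rate is 3%, as on this card):

```python
# Hypothetical 2x2 table counts
true_pos, false_neg = 90, 10    # patients WITH the disease
true_neg, false_pos = 970, 30   # patients WITHOUT the disease

sensitivity = true_pos / (true_pos + false_neg)           # positive test given disease present
specificity = true_neg / (true_neg + false_pos)           # negative test given disease absent
false_positive_rate = false_pos / (true_neg + false_pos)  # equals 1 - specificity
print(sensitivity, specificity, false_positive_rate)
```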

58
Q

What is the strategy to prevent unnecessary interventions after false positive test results?

A

-Combination of test approaches

-Start with a test with a reasonably high Sensitivity to detect anyone who potentially has the disease (tested positive but could be false positive)

-For those who tested positive on that first test -> test again with a test with high Specificity and high Sensitivity for clarification