Describing Data Flashcards

1
Q

What does descriptive statistics do?

A

Helps to organise and summarise data in an easily communicable manner.

2
Q

What are measures of central tendency?

A

Mean
Median
Mode

3
Q

Is the mean or median more affected by extreme values?

A

Mean
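A small illustration of this card, using made-up scores and Python's standard `statistics` module: one extreme value moves the mean a long way and leaves the median where it was.

```python
from statistics import mean, median

# Hypothetical scores; `with_outlier` replaces the top score with an extreme value.
scores = [4, 5, 5, 6, 7]
with_outlier = [4, 5, 5, 6, 70]

# How far each measure moves when the extreme value is introduced.
mean_shift = mean(with_outlier) - mean(scores)
median_shift = median(with_outlier) - median(scores)

print("shift in mean:", mean_shift)
print("shift in median:", median_shift)
```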

4
Q

What makes the mean more accurate?

A

A larger number of observations (a bigger sample)

5
Q

What is the unit of mean the same as?

A

The unit of original measure

6
Q

What is a geometric mean?

A

The value obtained when individual observations are log-transformed, averaged and then back-transformed using the antilog
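The three steps in this answer can be sketched as follows. The function name and the data are illustrative assumptions; natural logs are used, though any base gives the same result as long as the matching antilog is applied.

```python
import math

def geometric_mean(values):
    """Log-transform each value, average the logs, then back-transform (antilog)."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs strictly positive values")
    logs = [math.log(v) for v in values]
    return math.exp(sum(logs) / len(logs))

# Hypothetical positively skewed data whose logs are evenly spaced.
data = [1, 10, 100]
gm = geometric_mean(data)      # geometric mean
am = sum(data) / len(data)     # ordinary (arithmetic) mean, for comparison

print(gm, am)
```

With these made-up values the geometric mean lands on the middle observation, while the arithmetic mean is pulled up by the largest one.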

7
Q

Advantage of geometric mean?

A

Lies closer to the median than the arithmetic mean does when the log-transformed data have a symmetrical distribution

8
Q

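Difference between arithmetic mean and geometric mean?

A

The geometric mean is never greater than the arithmetic mean; for positive values that are not all identical it is smaller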

9
Q

What is weighted mean?

A

Each value is multiplied by the weight (a constant) attached to it; the sum of these products is then divided by the sum of the weights
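A minimal sketch of a weighted mean. The function name and the example (study means weighted by sample size) are assumptions made for illustration.

```python
def weighted_mean(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight

# Hypothetical example: three study means, weighted by their sample sizes.
study_means = [10.0, 12.0, 20.0]
sample_sizes = [50, 30, 20]
pooled = weighted_mean(study_means, sample_sizes)

print(pooled)
```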

10
Q

When is weighted mean used?

A

When some individual observations are more or less valuable than others

11
Q

Another name for the median?

A

50th percentile

12
Q

What data is median preferable for?

A

Ordinal data when treated as values (not as counts); also skewed numerical data

13
Q

What does 5th percentile mean?

A

The value below which 5% of observations lie

14
Q

What type of data is mode mostly used for?

A

Nominal

15
Q

When can mode be useful for ordinal data?

A

To understand most common rating obtained

16
Q

In which type of distribution are the mean, mode and median equal?

A

Normal, symmetric distribution

17
Q

Where will median lie in skewed distribution?

A

Between mean and mode

18
Q

What happens to mean in positive skew?

A

Mean will be higher than median

19
Q

Name some measures of variability

A

Range
Variance
SD
SE

20
Q

What is range?

A

Difference between highest and lowest scores in a distribution

21
Q

What is the interquartile range?

A

Difference between the 75th and 25th percentiles

22
Q

Why does variance give more information than the range?

A

It takes every score in the distribution into account, whereas the range uses only the highest and lowest scores

23
Q

Formula for variance

A

Sum of squared differences of individual observations from mean/(number of observations - 1)
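The formula can be written out directly. This is a sketch with made-up data; the hand-written function is checked against `statistics.variance` from the standard library, which also divides by n - 1.

```python
from statistics import variance

def sample_variance(values):
    """Sum of squared differences from the mean, divided by (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / (n - 1)

# Hypothetical observations.
data = [2, 4, 4, 4, 5, 5, 7, 9]
var = sample_variance(data)
sd = var ** 0.5   # standard deviation is the square root of the variance

print(var, sd, variance(data))
```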

24
Q

What is degrees of freedom?

A

N - 1 (number of observations minus 1, when calculating the sample variance)

25
When is variance high?
When scores are widely scattered
26
How is variance expressed?
In squared units of the original measure
27
What is the formula for SD?
Square root of variance
28
What is the most commonly used measure of dispersion?
SD
29
What is coefficient of variation a measure of?
Relative spread of data
30
How does one calculate the coefficient of variation?
SD / mean (multiplied by 100 to give a percentage)
31
Unit of coefficient of variation?
None (it is a ratio); usually expressed as a percentage
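A short sketch of the coefficient of variation, with hypothetical heights. Recording the same data in centimetres and in metres gives the same value, which illustrates why it is a relative measure with no unit of its own.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample SD divided by the mean, expressed as a percentage."""
    m = mean(values)
    if m == 0:
        raise ValueError("coefficient of variation is undefined when the mean is zero")
    return stdev(values) / m * 100

# Hypothetical heights, recorded in two different units.
heights_cm = [150, 160, 170, 180, 190]
heights_m = [h / 100 for h in heights_cm]

cv_cm = coefficient_of_variation(heights_cm)
cv_m = coefficient_of_variation(heights_m)

print(cv_cm, cv_m)
```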
32
Formula of SE?
SD / square root of sample size
33
What leads to smaller SE?
Larger sample
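A sketch of the SE formula showing the effect of sample size. The SD of 10 is an assumed value chosen to keep the arithmetic simple.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of n."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return sd / math.sqrt(n)

sd = 10.0                          # assumed sample SD
se_25 = standard_error(sd, 25)     # sample of 25
se_100 = standard_error(sd, 100)   # sample of 100

print(se_25, se_100)
```

Quadrupling the sample size halves the SE, because n enters through its square root.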
34
What do authors use SE for?
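Often (misleadingly) to describe variability of the sample; SE describes the precision of the sample mean, whereas SD describes variability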
35
What does SE give estimate of?
How the mean of the sample is related to the mean of the population | Precision and uncertainty of how study sample represents population
36
What does SD estimate?
Variability in study sample
37
What does SE tell us of the mean?
How precise our estimate of the mean is
38
Graphs used for categorical and discrete numerical data
Bar chart | Pie chart
39
Graphs for continuous data
Histogram | Dot plot | Scatter diagram
40
Difference between bar chart and histogram
A histogram has no gaps between its bars, because the data are continuous
41
How to draw a dot plot
Dot placed for each observation along one axis
42
When does a dot plot become a scattergram?
When dot plot is extended to two axes
43
What measures can be plotted on a scattergram?
Two continuous measures
44
What happens in a stem and leaf plot?
Plot first few digits of numerical observation along vertical axis | Then add numbers to one or both sides to represent individual values of observations
45
What is a box whisker plot?
Rectangle drawn from the 1st to the 3rd quartile, encompassing the middle 50% of observations | Median value is the line cutting through the rectangle
46
What do whiskers in box whisker plot show?
Minimum and maximum values of observation
47
Why is a normal distribution important?
A number of statistical tests assume data comes from normal distribution | In a normal population, the mean and variance (and SD) are not dependent on each other | Many natural phenomena are normally distributed | Central limit theorem
48
What is the central limit theorem?
States that if we draw equally sized samples from a non-normal distribution, the distribution of the means of these samples will still be approximately normal, as long as the samples are large enough
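The theorem can be explored with a small simulation. This is only a sketch: the exponential distribution, the sample size of 30, the number of samples and the fixed seed are all choices made for this illustration.

```python
import random
from statistics import mean, stdev

random.seed(0)        # fixed seed so the run is repeatable

SAMPLE_SIZE = 30      # observations in each sample
N_SAMPLES = 2000      # number of equally sized samples drawn

# Exponential distribution with rate 1: strongly right-skewed, mean 1 and SD 1.
sample_means = [
    mean([random.expovariate(1.0) for _ in range(SAMPLE_SIZE)])
    for _ in range(N_SAMPLES)
]

centre = mean(sample_means)    # expected to sit near the population mean (1)
spread = stdev(sample_means)   # expected to sit near 1 / sqrt(30)

print(centre, spread)
```

Plotting `sample_means` as a histogram should give a roughly bell-shaped picture even though the individual observations are heavily skewed.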
49
What sample size is large enough to give normal distribution for experimental purposes?
30
50
Properties of normal distribution
Bell shaped | Mean, median and mode are same value | Curve is symmetric about the mean - skew is 0 | Excess kurtosis is 0 | Tails of curve reach close to x axis but never touch it
51
What is kurtosis?
Peakedness or flatness of the curve (how heavy its tails are)
52
What parameters have to be specified to describe a normal distribution?
Mean - where the peak of the density occurs | SD - indicates spread of curve
53
At a given value for variance, what will a higher mean do to a curve?
Shift curve to right
54
At a given value for a mean, what will higher SD do to a curve?
Decrease peakedness of curve
55
At a given value for a mean, what will lower SD do to a curve?
Increase peakedness
56
What is a leptokurtic curve?
Sharp peak
57
What is a standard normal distribution?
Normal distribution whose mean is 0 and SD is 1 unit
58
What is the standard normal deviate denoted by?
z
59
What is the formula for standard normal deviate?
(random value 'x' - mean) / SD
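The formula as a short function. The example numbers (a score of 130 on a scale with mean 100 and SD 15) are assumptions chosen for illustration.

```python
def z_score(x, mean, sd):
    """Standard normal deviate: how many SDs the value x lies from the mean."""
    if sd <= 0:
        raise ValueError("SD must be positive")
    return (x - mean) / sd

# Hypothetical scale with mean 100 and SD 15.
z = z_score(130, 100, 15)

print(z)
```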
60
Value of mean in negative skew?
Left of the median
61
What is the interquartile range?
Distance from value at 1st quartile to value at 3rd quartile
62
SE calculation
SD/square root of n
63
Calculation for 95% CI for population mean
Mean +/- 1.96 x SE
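A sketch of the interval calculation on made-up data. The multiplier 1.96 corresponds to 95% coverage under the normal approximation; with a sample as small as the one below, a multiplier from the t distribution would usually be preferred, so treat the numbers as illustrative only.

```python
import math
from statistics import mean, stdev

def ci95_for_mean(values):
    """Return (lower, upper) limits: mean -/+ 1.96 x SE."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    m = mean(values)
    se = stdev(values) / math.sqrt(n)   # SE = SD / square root of n
    return m - 1.96 * se, m + 1.96 * se

# Hypothetical observations.
data = [2, 4, 4, 4, 5, 5, 7, 9]
lower, upper = ci95_for_mean(data)

print(lower, upper)
```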
64
What is Gaussian distribution?
Normal distribution
65
What do one tailed tests do?
Examine only one direction of alternative hypothesis
66
What is usual value of beta?
0.2
67
What is an unpaired test?
2 groups have different subjects
68
What is a paired test?
Same subjects at different points in time
69
Descriptions of categorical data
Mode | Frequency
70
Descriptions of non-normal data
Median | Inter-quartile range
71
Descriptions of normal data
Mean | SD
72
Comparing two unpaired groups of categorical data
Chi-squared | Fisher's exact test
73
Comparing two paired categorical groups
McNemar's test
74
Comparing two unpaired non-normal groups
Mann-Whitney U Test
75
Comparing two paired non-normal groups
Wilcoxon's signed-rank test
76
Comparing paired or unpaired normal data
Student's t test
77
Comparing >2 paired categorical groups
Cochran's Q test (an extension of McNemar's test)
78
Comparing >2 unpaired categorical groups
Chi-squared
79
Comparing >2 unpaired non-normal groups
Kruskal-Wallis ANOVA
80
Comparing >2 paired non-normal groups
Friedman test
81
Comparing >2 normal data; paired or unpaired
ANOVA
82
What do statistical tests give us?
Value for p
83
What types of data are contingency tables used for?
Categorical
84
X and Y axis for contingency tables
X: Outcome | Y: Risk/variable
85
Impact of small sample size on correlation coefficient?
Less the value of r
86
How can one dampen the effect of outlying values in small samples?
Using ranks of raw data instead of absolute numbers
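A sketch of how ranking dampens an outlier, using made-up data. Pearson's r calculated on the ranks is Spearman's rank correlation; the simple `ranks` helper below is an assumption of this sketch and only works when there are no tied values.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank each value from 1 (smallest); assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

# Hypothetical small sample: y falls as x rises, apart from one extreme value.
x = [1, 2, 3, 4, 5, 6]
y = [6, 5, 4, 3, 2, 60]

r_raw = pearson_r(x, y)                   # dominated by the single outlier
r_ranked = pearson_r(ranks(x), ranks(y))  # outlier counts only as "largest"

print(r_raw, r_ranked)
```

On the raw numbers the single extreme point produces a sizeable positive r; on the ranks the coefficient is close to zero and slightly negative.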
87
What is used if both variables are normal?
Pearson
88
What is used if 1 variable is normal, the other non-normal?
Spearman
89
What is used if 1 variable is normal, the other categorical?
Spearman
90
What is used if 1 variable is non-normal, the other normal?
Spearman
91
What is used if both variables are non-normal?
Kendall
92
What is used if one variable is categorical and the other normal?
Spearman
93
What is used if both variables are categorical?
Spearman | Kendall
94
What does regression equation do?
Describes relationship between 2+ variables by an equation that has a predictive value
95
What is needed to construct a regression line?
Regression equation
96
What can a regression line represent?
Relationship between variables on a scattergraph
97
Where on the scattergraph is the IV?
X axis
98
Where on the scattergraph is the DV?
Y axis
99
Equation of best fit for regression line
y=a+bx
100
What is a in y=a+bx
intercept of the regression line on y axis
101
What is b in y=a+bx
Regression coefficient (slope of regression line)
102
What does b in y=a+bx describe?
The change in y (DV) for each one-unit change in x (IV)
103
What is x in y=a+bx
Value of IV
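A sketch of fitting y=a+bx, assuming the line of best fit is found by ordinary least squares (the cards do not say which method is meant). The function name and the dose/response numbers are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx  # slope
    a = my - b * mx                                             # y-axis intercept
    return a, b

# Hypothetical data lying exactly on the line y = 3 + 2x.
dose = [1, 2, 3, 4, 5]               # IV, plotted on the x axis
response = [5, 7, 9, 11, 13]         # DV, plotted on the y axis

a, b = fit_line(dose, response)
predicted = a + b * 6                # predicted DV for a new IV value of 6

print(a, b, predicted)
```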
104
What happens to PPV and NPV as prevalence of a disorder decreases?
PPV will decrease | NPV will increase
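A sketch of the effect of prevalence on predictive values, calculated from sensitivity and specificity. The test characteristics (90% for both) and the two prevalences are assumed values for illustration.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) as proportions for a test at a given prevalence."""
    tp = sensitivity * prevalence                # true positives
    fp = (1 - specificity) * (1 - prevalence)    # false positives
    tn = specificity * (1 - prevalence)          # true negatives
    fn = (1 - sensitivity) * prevalence          # false negatives
    return tp / (tp + fp), tn / (tn + fn)

# Same hypothetical test applied where the disorder is common (30%) and rare (1%).
ppv_common, npv_common = predictive_values(0.9, 0.9, 0.30)
ppv_rare, npv_rare = predictive_values(0.9, 0.9, 0.01)

print(ppv_common, npv_common)
print(ppv_rare, npv_rare)
```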
105
What is serial testing?
When 2 or more tests are used in sequence until a test returns a negative result | A diagnosis is only confirmed if all tests return a positive result
106
Advantages of serial testing
Increases specificity | Useful if treatment is hazardous
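A sketch of why requiring every test to be positive raises specificity at the cost of sensitivity. It assumes the two tests make their errors independently of each other, which real tests may not; the sensitivities and specificities are made-up values.

```python
def serial_test(sens1, spec1, sens2, spec2):
    """Combined (sensitivity, specificity) when both tests must be positive.

    Assumes the two tests err independently of one another.
    """
    combined_sens = sens1 * sens2                    # a case must be caught by both
    combined_spec = 1 - (1 - spec1) * (1 - spec2)    # false positive needs both to err
    return combined_sens, combined_spec

# Hypothetical tests: first 90% sensitive / 80% specific, second 85% / 90%.
sens, spec = serial_test(0.9, 0.8, 0.85, 0.9)

print(sens, spec)
```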
107
What does larger AUC in ROC curve correspond to?
The better the test
108
AUC of 0.5 in ROC curve?
Worthless test
109
AUC of 1 in ROC curve?
Perfect test
110
How is cumulative survival probability calculated?
Each time an end event occurs, the survival probability just before the event is multiplied by the proportion of subjects still at risk (uncensored) who survive the event.
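A sketch of this step-by-step calculation, which is the product-limit (Kaplan-Meier) approach. The function name, the record format and the follow-up data are assumptions made for illustration.

```python
def cumulative_survival(records):
    """Cumulative survival probability after each event time.

    records: list of (time, event) pairs, where event is True if the endpoint
    occurred at that time and False if the subject was censored.
    Returns a list of (time, cumulative survival probability).
    """
    at_risk = len(records)
    survival = 1.0
    curve = []
    for t in sorted({time for time, _ in records}):
        events = sum(1 for time, ev in records if time == t and ev)
        leaving = sum(1 for time, _ in records if time == t)
        if events:
            # Prior survival multiplied by the proportion at risk who survive.
            survival *= (at_risk - events) / at_risk
            curve.append((t, survival))
        at_risk -= leaving   # events and censored subjects both leave the risk set
    return curve

# Hypothetical follow-up times in months.
records = [(2, True), (3, False), (5, True), (5, True), (8, False), (9, True)]
curve = cumulative_survival(records)

for t, s in curve:
    print(t, round(s, 3), "endpoint probability:", round(1 - s, 3))
```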
111
Endpoint probability calculation?
1 - survival probability
112
What is hazard?
Probability that a subject will have an endpoint at a given time, given that they have not had it before that time
113
What does a hazard ratio >1 mean?
The factor increases risk of outcome
114
What does a hazard ratio <1 mean?
Factor decreases risk
115
What does it mean if chi square is bigger than its degrees of freedom?
Evidence of heterogeneity
116
How does forest plot show evidence of heterogeneity?
CIs of individual studies do not overlap with those of other studies