Describing Data Flashcards by Yazan Halasa

What does descriptive statistics do?

Helps to organise and summarise data in an easily communicable manner.

What are measures of central tendency?

Mean
Median
Mode

Is the mean or median more affected by extreme values?

Mean
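
A minimal sketch of why the mean is the more sensitive of the two, using made-up numbers (the data values are assumptions for illustration only): replacing one observation with an extreme value moves the mean a long way but leaves the median where it was.

```python
import statistics

# Made-up data: identical except for the last value, which becomes extreme.
typical = [2, 3, 4, 5, 6]
with_outlier = [2, 3, 4, 5, 60]

mean_typical = statistics.mean(typical)
median_typical = statistics.median(typical)

mean_outlier = statistics.mean(with_outlier)
median_outlier = statistics.median(with_outlier)

# The mean is dragged towards the extreme value; the median is unchanged.
print(mean_typical, median_typical)
print(mean_outlier, median_outlier)
```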

What makes the mean more accurate?

A larger sample size (more observations)

What is the unit of mean the same as?

The unit of original measure

What is a geometric mean?

When individual observations are log transformed, averaged and then back-transformed using antilog

Advantage of geometric mean?

Will be closer to median if log-transformed data had symmetrical distribution

Difference between mean and geometrical mean?

The geometric mean is never greater than the arithmetic mean; for positive values that are not all equal it is smaller
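
The geometric mean cards can be illustrated with a short sketch. The function names and the data are assumptions chosen for illustration; the data are picked so that their logs are evenly spaced, which is the symmetric case in which the geometric mean sits at the median.

```python
import math

def arithmetic_mean(values):
    """Ordinary mean: sum divided by count."""
    return sum(values) / len(values)

def geometric_mean(values):
    """Log-transform, average, then back-transform with the antilog.

    Only defined for strictly positive values.
    """
    logs = [math.log(v) for v in values]
    return math.exp(sum(logs) / len(logs))

# Made-up, positively skewed data whose logs are evenly spaced.
data = [1, 10, 100]

am = arithmetic_mean(data)   # 37.0
gm = geometric_mean(data)    # approximately 10, the median of the data

print(am, gm)
```

Here the arithmetic mean is pulled up by the largest value, while the geometric mean stays at the middle observation.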

What is weighted mean?

Individual values are multiplied by weights (constants) attached to them before averaging

When is weighted mean used?

When some individual observations are more or less valuable than others
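
A hedged sketch of a weighted mean follows. It assumes the common convention of dividing the weighted sum by the sum of the weights; the function name and the example scores and weights are made up.

```python
def weighted_mean(values, weights):
    """Multiply each value by its weight, then divide by the total weight."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight

# Made-up example: the third observation counts twice as much as the others.
scores = [2.0, 4.0, 6.0]
weights = [1, 1, 2]

wm = weighted_mean(scores, weights)   # (2 + 4 + 12) / 4 = 4.5
print(wm)
```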

Another name for the median?

50th percentile

What data is median preferable for?

Ordinal data when treated as values (not as counts)

What does 5th percentile mean?

The value below which 5% of observations lie

What type of data is mode mostly used for?

Nominal

When can mode be useful for ordinal data?

To understand most common rating obtained

In which type of distribution are the mean, mode and median equal?

Normal, symmetric distribution

Where will median lie in skewed distribution?

Between mean and mode

What happens to mean in positive skew?

Mean will be higher than median

Name some measures of variability

Range
Variance
SD
SE

What is range?

Difference between highest and lowest scores in a distribution

What is the interquartile range?

Difference between the 75th and 25th percentiles
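
One possible way to compute quartiles and the interquartile range uses Python's standard `statistics.quantiles`. The data are made up, and note as a caveat that different software packages use slightly different percentile conventions, so quartile values can differ a little between tools.

```python
import statistics

# Made-up data, already sorted for readability.
data = [1, 2, 3, 4, 5, 6, 7]

# n=4 asks for the three cut points that split the data into quarters:
# 25th percentile, 50th percentile (the median) and 75th percentile.
q1, q2, q3 = statistics.quantiles(data, n=4)

iqr = q3 - q1                       # interquartile range
full_range = max(data) - min(data)  # range: highest minus lowest

print(q1, q2, q3, iqr, full_range)
```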

Why does variance give more information than the range?

It takes every score in the distribution into account, not just the highest and lowest

Formula for variance

Sum of squared differences of individual observations from mean/(number of observations - 1)

What is degrees of freedom?

N-1

When is variance high?

When scores are widely scattered

How is variance expressed?

In squared units of the original measure

What is the formula for SD?

Square root of variance

What is the most commonly used measure of dispersion?

What is coefficient of variation a measure of?

Relative spread of data

How does one calculate the coefficient of variation?

SD / mean

Unit of coefficient of variation?

None (it is a ratio), usually expressed as a percentage

Formula of SE?

SD / square root of sample size

What leads to smaller SE?

Larger sample
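
The formulas on the preceding cards (variance, SD, coefficient of variation and SE) can be sketched together as follows. The function names and data are made up for illustration; the variance uses the N - 1 denominator given on the variance card.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def describe(values):
    n = len(values)
    mean = sum(values) / n
    variance = sample_variance(values)   # squared units of the measure
    sd = math.sqrt(variance)             # same units as the measure
    cv = sd / mean                       # unitless relative spread
    se = sd / math.sqrt(n)               # shrinks as n grows
    return {"mean": mean, "variance": variance, "sd": sd, "cv": cv, "se": se}

# Made-up data.
data = [2, 4, 4, 4, 5, 5, 7, 9]
summary = describe(data)
print(summary)
```

Because the SE divides the SD by the square root of the sample size, a larger sample with similar scatter gives a smaller SE.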

What do authors use SE for?

To describe variability of sample

What does SE give estimate of?

How the mean of the sample is related to the mean of the population | Precision and uncertainty of how study sample represents population

What does SD estimate?

Variability in study sample

What does SE tell us of the mean?

How precise our estimate of the mean is

Graphs used for categorical and discrete numerical data

Bar chart | Pie chart

Graphs for continuous data

Histogram | Dot plot | Scatter diagram

Difference between bar chart and histogram

A histogram has no gaps between its bars, showing that the data are continuous; a bar chart has gaps between its bars

How to draw a dot plot

Dot placed for each observation along one axis

When does a dot plot become a scattergram?

When dot plot is extended to two axes

What measures can be plotted on a scattergram?

Two continuous measures

What happens in a stem and leaf plot?

The first few digits of each numerical observation (the stem) are plotted along the vertical axis | The remaining digits (the leaves) are then added to one or both sides to represent the individual values of the observations

What is a box whisker plot?

A rectangle is drawn encompassing the 2nd and 3rd quartiles of the observations (the 25th to 75th percentiles) | The median is the line cutting through the rectangle

What do whiskers in box whisker plot show?

Minimum and maximum values of observation

Why is a normal distribution important?

A number of statistical tests assume the data come from a normal distribution | In a normal population, the mean and variance (and SD) are not dependent on each other | Many natural phenomena are normally distributed | Central limit theorem

What is the central limit theorem?

States that if we draw equally sized samples from a non-normal distribution, the distribution of the means of these samples will still be approximately normal, as long as the samples are large enough
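
A small simulation can make the theorem concrete. This is only a sketch under stated assumptions: the exponential distribution, the sample size of 50, the 2000 repetitions and the seed are all arbitrary choices for illustration. It checks that the sample means cluster around the population mean with a spread close to SD / square root of n.

```python
import math
import random
import statistics

rng = random.Random(42)   # fixed seed so the run is repeatable

SAMPLE_SIZE = 50          # assumed size of each sample
N_SAMPLES = 2000          # assumed number of samples drawn

def one_sample_mean(n):
    """Mean of n draws from an exponential distribution (mean 1, SD 1),
    which is strongly skewed and clearly non-normal."""
    return statistics.mean([rng.expovariate(1.0) for _ in range(n)])

sample_means = [one_sample_mean(SAMPLE_SIZE) for _ in range(N_SAMPLES)]

centre = statistics.mean(sample_means)      # should be close to 1
spread = statistics.stdev(sample_means)     # should be close to 1 / sqrt(50)
expected_spread = 1 / math.sqrt(SAMPLE_SIZE)

print(centre, spread, expected_spread)
```

Plotting `sample_means` as a histogram would show a roughly bell-shaped curve even though the underlying data are skewed.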

What sample size is large enough to give normal distribution for experimental purposes?

Properties of normal distribution

Bell shaped | Mean, median and mode are the same value | Curve is symmetric about the mean (skew is 0) | Excess kurtosis is 0 | Tails of the curve approach the x axis but never touch it

What is kurtosis?

How flat or peaked the curve is

What parameters have to be specified to describe normal distribution

Mean - where the peak of the density occurs | SD - indicates spread of curve

At a given value for variance, what will a higher mean do to a curve?

Shift curve to right

At a given value for mean, what will a higher SD do to a curve?

Decrease peakedness of curve

At a given value for a mean, what will lower SD do to a curve?

Increase peakedness

What is a leptokurtic curve?

Sharp peak

What is a standard normal distribution?

Normal distribution whose mean is 0 and SD is 1 unit

What is standard normal deviate expression denoted by?

What is the formula for standard normal deviate?

(random value 'x' - mean) / SD
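
As a quick worked sketch of this formula (the function name and the numbers are made up for illustration):

```python
def standard_normal_deviate(x, mean, sd):
    """(x - mean) / SD: how many SDs the value x lies from the mean."""
    if sd <= 0:
        raise ValueError("SD must be positive")
    return (x - mean) / sd

# Made-up example: a score of 130 on a scale with mean 100 and SD 15.
deviate = standard_normal_deviate(130, mean=100, sd=15)   # 2.0
print(deviate)
```

A positive result lies above the mean and a negative result below it.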

Value of mean in negative skew?

Left of the median

What is the interquartile range?

Distance from value at 1st quartile to value at 3rd quartile

SE calculation

SD/square root of n

Calculation for CI for population mean

Mean +/- 1.96 x SE (for a 95% CI)
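
A minimal sketch of this calculation follows, with a made-up function name and data. The multiplier 1.96 is the normal-distribution value for a 95% interval; with small samples a larger multiplier from the t distribution is normally used instead, so treat the output for this tiny data set as illustrative only.

```python
import math
import statistics

def ci95_for_mean(values):
    """Return (lower, upper) as mean -/+ 1.96 x SE, with SE = SD / sqrt(n)."""
    n = len(values)
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)   # stdev uses n - 1
    margin = 1.96 * se
    return mean - margin, mean + margin

# Made-up data.
data = [2, 4, 4, 4, 5, 5, 7, 9]
lower, upper = ci95_for_mean(data)
print(lower, upper)
```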

What is Gaussian distribution?

Normal distribution

What do one tailed tests do?

Examine only one direction of alternative hypothesis

What is usual value of beta?

0.2

What is an unpaired test?

2 groups have different subjects

What is a paired test?

Same subjects at different points in time

Descriptions of categorical data

Mode | Frequency

Descriptions of non-normal data

Median | Inter-quartile range

Descriptions of normal data

Mean | SD

Comparing two unpaired groups of categorical data

Chi-squared | Fisher's exact test

Comparing two paired categorical groups

McNemar's test

Comparing two unpaired non-normal groups

Mann-Whitney U Test

Comparing two paired non-normal groups

Wilcoxon's signed-rank test

Comparing paired or unpaired normal data

Student's t test

Comparing >2 unpaired categorical groups

Chi-squared

Comparing >2 paired categorical groups

McNemar's test

Comparing >2 unpaired non-normal groups

Kruskal-Wallis ANOVA

Comparing >2 paired non-normal groups

Friedman test

Comparing >2 normal data; paired or unpaired

ANOVA

What do statistical tests give us?

Value for p

What types of data are contingency tables used for?

Categorical

X and Y axis for contingency tables

X: Outcome | Y: Risk/variable

Impact of small sample size on correlation coefficient?

Less the value of r

How can one dampen the effect of outlying values in small samples?

Using ranks of raw data instead of absolute numbers

What is used if both variables are normal

Pearson

What is used if 1 variable is normal, the other non-normal

Spearman

What is used if 1 variable is normal, the other categorical

Spearman

What is used if 1 variable is non-normal, the other normal

Spearman

What is used if both variables are non-normal?

Kendall

What is used if one variable is categorical and the other normal?

Spearman

What is used if both variables are categorical?

Spearman | Kendall

What does regression equation do?

Describes relationship between 2+ variables by an equation that has a predictive value

What is needed to construct a regression line?

Regression equation

What can a regression line represent?

Relationship between variables on a scattergraph

Where on the scattergraph is the IV?

X axis

Where on the scattergraph is the DV?

Y axis

Equation of best fit for regression line

y=a+bx

What is a in y=a+bx

intercept of the regression line on y axis

What is b in y=a+bx

Regression coefficient (slope of regression line)

What does b in y=a+bx describe

Strength of relationship, as the change in y for each one-unit change in x

What is x in y=a+bx

Value of IV
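
The regression cards above can be tied together with a short sketch that fits y = a + bx. It assumes ordinary least squares, the usual way of finding the line of best fit (the cards do not name a method); the function name and data are made up.

```python
def fit_line(xs, ys):
    """Return (a, b) for y = a + b*x using ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("x values must not all be identical")
    b = s_xy / s_xx          # slope (regression coefficient)
    a = y_bar - b * x_bar    # intercept on the y axis
    return a, b

# Made-up data lying exactly on y = 1 + 2x.
iv = [1, 2, 3, 4]    # independent variable, plotted on the x axis
dv = [3, 5, 7, 9]    # dependent variable, plotted on the y axis

a, b = fit_line(iv, dv)
predicted = a + b * 5    # predicted DV for an IV value of 5
print(a, b, predicted)
```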

What happens to PPV and NPV as prevalence of a disorder decreases?

PPV will decrease | NPV will increase
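
The effect of prevalence on the positive and negative predictive values can be checked with a short sketch. The function names and the sensitivity, specificity and prevalence figures are assumptions chosen for illustration.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Proportion of positive results that are true positives."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

def negative_predictive_value(sensitivity, specificity, prevalence):
    """Proportion of negative results that are true negatives."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)

# Made-up test with 90% sensitivity and 90% specificity.
SENS, SPEC = 0.9, 0.9

ppv_high = positive_predictive_value(SENS, SPEC, prevalence=0.5)
ppv_low = positive_predictive_value(SENS, SPEC, prevalence=0.1)
npv_high = negative_predictive_value(SENS, SPEC, prevalence=0.5)
npv_low = negative_predictive_value(SENS, SPEC, prevalence=0.1)

# As prevalence falls from 50% to 10%, the positive predictive value
# drops and the negative predictive value rises.
print(ppv_high, ppv_low, npv_high, npv_low)
```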

What is serial testing?

Two or more tests are used in sequence until one returns a negative result | A diagnosis is confirmed only if all tests return a positive result

Advantages of serial testing

Increases specificity | Useful if treatment is hazardous

What does larger AUC in ROC curve correspond to?

The better the test

AUC of 0.5 in ROC curve?

Worthless test

AUC of 1 in ROC curve?

Perfect test

How is cumulative survival probability calculated?

Each time an end event occurs, the cumulative survival probability is recalculated by multiplying the survival probability just before the event by the proportion of the remaining uncensored subjects who survive the event.

Endpoint probability calculation?

1 - survival probability
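
The survival calculation described above can be sketched as follows. This assumes the cards refer to the Kaplan-Meier (product-limit) approach, which they do not name; the function name and the follow-up data are made up for illustration.

```python
def cumulative_survival(observations):
    """Return [(time, survival)] at each time an end event occurs.

    observations: list of (time, had_event) pairs, where had_event is
    True for an end event and False for a censored subject.
    """
    at_risk = len(observations)
    survival = 1.0
    curve = []
    for t in sorted({time for time, _ in observations}):
        events = sum(1 for time, e in observations if time == t and e)
        censored = sum(1 for time, e in observations if time == t and not e)
        if events:
            # Prior survival multiplied by the proportion of subjects
            # still at risk who survive this event time.
            survival *= (at_risk - events) / at_risk
            curve.append((t, survival))
        # Subjects with an event or censored leave the at-risk group.
        at_risk -= events + censored
    return curve

# Made-up follow-up data: (time, had_event).
subjects = [(1, True), (2, False), (3, True), (4, True), (5, False)]

curve = cumulative_survival(subjects)
endpoint_probabilities = [(t, 1 - s) for t, s in curve]
print(curve)
print(endpoint_probabilities)
```

Censored subjects do not lower the survival estimate themselves; they only reduce the number at risk for later events.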

What is hazard?

Probability that a subject will have an endpoint at a given time, given that they have not had one before that time

What does a hazard ratio >1 mean?

The factor increases risk of outcome

What does a hazard ratio <1 mean?

Factor decreases risk

What does it mean if chi square is bigger than its degree of freedom?

Evidence of heterogeneity

How does forest plot show evidence of heterogeneity?

The CIs of the individual studies do not overlap with one another

(116 cards)