Statistics Basics Flashcards

1
Q

What is continuous data?

A

Data that can take any value within a range rather than being confined to specific numbers. Recorded values are limited only by the precision of measurement, e.g., the number of decimal places (6.2%, 6.24% or 6.238%).

2
Q

What is discrete data?

A

Data that can only take certain distinct values, which are usually integers (e.g., the number of people in a particular underground carriage). The values do not strictly have to be integers, but discrete data are most often counts.

3
Q

What is a probability density function?

A

A function which shows the range of values that a continuous random variable can take and how likely each range of outcomes is to occur (e.g., a normal distribution).
The probability that a continuous variable takes on a specific value is always zero, since the variable could be defined to any arbitrary degree of accuracy (e.g. 0.1 vs 0.100001) and thus we can only calculate the probability that the variable lies within a particular range.
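As a rough illustration of the point above, the sketch below uses a standard normal distribution (an assumption made only for this example) to show that a range has a non-zero probability while a single exact value has probability zero. It relies on `NormalDist` from Python's standard `statistics` module.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1), chosen for illustration.
dist = NormalDist(mu=0.0, sigma=1.0)

# Probability that the variable lies between -1 and 1: the area under the density.
p_range = dist.cdf(1.0) - dist.cdf(-1.0)

# Probability of one exact value: a range of zero width has zero area.
p_point = dist.cdf(0.5) - dist.cdf(0.5)

# The density at a point is a height, not a probability.
density_at_zero = dist.pdf(0.0)

print(round(p_range, 4), p_point, round(density_at_zero, 4))
```

The density value at a point can be read as "how concentrated" the probability is there, but only areas over a range are probabilities.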

4
Q

What is the arithmetic mean?

A

The sum of all N observations divided by N.
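A minimal sketch of that definition, using a made-up list of observations:

```python
# Made-up observations, for illustration only.
observations = [4, 8, 15, 16, 23, 42]

n = len(observations)            # N, the number of observations
mean = sum(observations) / n     # sum of all N observations divided by N

print(mean)
```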

5
Q

What is the mode?

A

The most frequently occurring value in a set of observations/data. A set of data can have more than one mode or no mode at all.
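The three cases can be sketched with `multimode` from Python's standard `statistics` module, on made-up data. Note that when every value occurs equally often, `multimode` returns all of them; under the convention used on this card that situation is read as "no mode".

```python
from statistics import multimode

# Made-up samples, for illustration only.
one_mode = multimode([1, 2, 2, 3])       # a single most frequent value
two_modes = multimode([1, 1, 2, 2, 3])   # two values tie for most frequent
all_tied = multimode([1, 2, 3])          # every value occurs once: "no mode"

print(one_mode, two_modes, all_tied)
```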

6
Q

What is the median?

A

The middle value in a series when the observations are arranged in ascending order. If the number of values is an even number, the median is the mean of the two middle numbers.
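A short sketch of both cases, using `median` from Python's standard `statistics` module on made-up data (the function sorts the values itself):

```python
from statistics import median

# Odd number of values: sorted order is 1, 5, 7, so the middle value is taken.
odd_count = median([7, 1, 5])

# Even number of values: sorted order is 1, 3, 5, 7, so the two middle
# values (3 and 5) are averaged.
even_count = median([7, 1, 5, 3])

print(odd_count, even_count)
```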

7
Q

What are variance and standard deviation?

A

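Both measure the spread of a series about its mean value, i.e., the variation of the data in a dataset. Variance is the average of the squared deviations from the mean. Standard deviation is the square root of the variance, so it is expressed in the same units as the data.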
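A minimal sketch on made-up data. It assumes the population form of the variance (dividing by N); the sample form divides by N - 1 instead.

```python
from math import sqrt

# Made-up data, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)

# Population variance: average of the squared deviations from the mean.
variance = sum((x - mean) ** 2 for x in data) / len(data)

# Standard deviation: square root of the variance.
std_dev = sqrt(variance)

print(mean, variance, std_dev)
```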

8
Q

What is the range of a dataset?

A

The difference between the largest and smallest data point.

9
Q

What is the semi-interquartile range?

A

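Half of the interquartile range. The interquartile range is the range of the central 50% of the data, i.e., the difference between the first (lower) and third (upper) quartile points in the series, so the semi-interquartile range is (Q3 - Q1)/2.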
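A sketch on made-up data that computes the quartiles, the interquartile range (Q3 - Q1) and the semi-interquartile range, taken here as half of the interquartile range. Quartile conventions differ between textbooks and tools; the "inclusive" method of `statistics.quantiles` is an assumption made for this example.

```python
from statistics import quantiles

# Made-up data, for illustration only.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 splits the data into quarters and returns the three cut points.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

iqr = q3 - q1          # interquartile range: spread of the central 50%
semi_iqr = iqr / 2     # semi-interquartile range: half of that spread

print(q1, q3, iqr, semi_iqr)
```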

10
Q

What is the coefficient of variation?

A

A unit-free measure of the spread of a series: the standard deviation divided by the mean.
CV = σ/μ
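To illustrate "unit-free", the sketch below computes CV for the same made-up heights expressed in centimetres and in metres. The population standard deviation (`pstdev`) is an assumption made for this example.

```python
from statistics import mean, pstdev

# The same made-up heights in two different units.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
heights_m = [h / 100 for h in heights_cm]

# CV = standard deviation / mean; the units cancel in the ratio.
cv_cm = pstdev(heights_cm) / mean(heights_cm)
cv_m = pstdev(heights_m) / mean(heights_m)

print(cv_cm, cv_m)
```

The standard deviations differ by a factor of 100 between the two lists, but the two CV values agree up to floating-point rounding.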

11
Q

What happens to the standard deviation if you add a constant to, or subtract a constant from, each value?

A

The standard deviation would not change.
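A quick check on made-up data: shifting every value moves the mean by the same amount, so the deviations from the mean, and hence the standard deviation, stay the same. The population standard deviation (`pstdev`) is an assumption made for this example.

```python
from statistics import pstdev

# Made-up data, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]
shifted = [x + 10 for x in data]   # add a constant to every value

sd_original = pstdev(data)
sd_shifted = pstdev(shifted)

print(sd_original, sd_shifted)
```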

12
Q

What happens to the standard deviation if you multiply each value by a constant number, k?

A

New standard deviation = |k| x Old standard deviation (a standard deviation cannot be negative, so a negative k scales it by its absolute value).
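A quick check on made-up data, using a negative k to show that the standard deviation scales by the size of k and stays non-negative. The population standard deviation (`pstdev`) is an assumption made for this example.

```python
from statistics import pstdev

# Made-up data and multiplier, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]
k = -3
scaled = [k * x for x in data]   # multiply every value by k

sd_original = pstdev(data)
sd_scaled = pstdev(scaled)

print(sd_original, sd_scaled, abs(k) * sd_original)
```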

13
Q

What is skewness?

A

A measure of the extent to which a distribution is asymmetric.

14
Q

How does a positive skew affect the position of the mean relative to mode and median?

A

A positively (right) skewed distribution will typically have a mean that is greater than the median and the mode, because the mean is pulled up by the large values in the tail of the distribution. The mode is always the peak of a distribution.
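A small made-up sample with a long right tail shows the ordering described above (mode, then median, then mean):

```python
from statistics import mean, median, mode

# Made-up right-skewed sample: most values are small, a few are large.
data = [1, 2, 2, 2, 3, 4, 5, 9, 17]

sample_mode = mode(data)       # the most frequent value (the peak)
sample_median = median(data)   # the middle value
sample_mean = mean(data)       # pulled up by 9 and 17 in the tail

print(sample_mode, sample_median, sample_mean)
```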

15
Q

What is kurtosis?

A

A measure of the extent to which a series is ‘fat’ or ‘thin’ tailed.
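One common way to quantify this, sketched below as an assumption rather than the card's own definition, is the fourth standardised moment: the average fourth power of the deviations divided by the squared variance. A normal distribution scores 3 on this measure, so values above 3 suggest fatter tails and values below 3 thinner tails. The helper name `kurtosis` and both samples are made up for illustration.

```python
def kurtosis(data):
    """Fourth standardised moment (population form); a normal distribution gives 3."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n   # variance
    m4 = sum((x - mean) ** 4 for x in data) / n   # fourth central moment
    return m4 / m2 ** 2

# Made-up samples: one evenly spread, one with a few extreme values.
thin_tailed = [1, 2, 3, 4, 5]
fat_tailed = [-10, -1, 0, 0, 0, 0, 0, 1, 10]

k_thin = kurtosis(thin_tailed)
k_fat = kurtosis(fat_tailed)

print(k_thin, k_fat)
```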

16
Q

What are covariance and correlation? Which is better and why?

A

Covariance is a measure of linear association between two series; its value depends on the units of both series.
Correlation is a unit-free measure of association that must lie between -1 and +1.
Correlation (Pearson’s correlation measure) is the better measure of association because it is the covariance divided by the product of the two standard deviations, so it does not scale with the standard deviations and is standardised (always unit-free, so values can be easily compared).
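The sketch below computes both measures by hand on made-up data, then rescales one series by 100 to show that covariance changes with the units while correlation does not. The population forms (dividing by N) and the helper names `covariance` and `correlation` are assumptions made for this example.

```python
from math import sqrt

def covariance(xs, ys):
    """Population covariance: average product of paired deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

def correlation(xs, ys):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    sd_x = sqrt(covariance(xs, xs))   # covariance of a series with itself is its variance
    sd_y = sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sd_x * sd_y)

# Made-up paired series, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
x_rescaled = [100 * v for v in x]     # same data in different units

cov_xy = covariance(x, y)
cov_rescaled = covariance(x_rescaled, y)
corr_xy = correlation(x, y)
corr_rescaled = correlation(x_rescaled, y)

print(cov_xy, cov_rescaled, corr_xy, corr_rescaled)
```

The covariance grows by the same factor of 100 as the rescaled series, which is why two covariances are hard to compare directly, whereas the correlation is unchanged.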