Basic stats Flashcards

1
Q

Descriptive statistics

A

Describing the data you have and presenting it in an easily understandable manner

2
Q

Inferential statistics

A

Using data from a sample to make predictions or estimations about a larger group

3
Q

Mean

A
  • Measure of centre
  • Average value of a dataset
  • Use it when the data is roughly symmetric (e.g. normally distributed) and doesn’t contain outliers
  • Calculated by summing up all the values in a dataset and dividing by the total number of values
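The calculation above can be sketched in a few lines of Python. The values are made up for illustration, and the extra value 95 is a hypothetical outlier added to show how strongly a single extreme value can pull the mean.

```python
from statistics import mean

values = [2, 4, 4, 6, 9]                  # illustrative data
manual_mean = sum(values) / len(values)   # sum of all values / number of values

with_outlier = values + [95]              # one hypothetical extreme value
outlier_mean = mean(with_outlier)         # pulled far above most of the data

print(manual_mean, outlier_mean)
```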
4
Q

Median

A
  • Measure of centre
  • Middle value in a dataset
  • More robust to outliers
  • Use it when the data is skewed or contains outliers
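A small sketch of that robustness, using Python's standard `statistics` module. The numbers are invented, and the value 400 is a hypothetical outlier.

```python
from statistics import mean, median

incomes = [30, 32, 35, 38, 40]   # illustrative values, roughly symmetric
skewed = incomes + [400]         # same data plus one hypothetical outlier

print(median(incomes), mean(incomes))   # both sit at the centre of the data
print(median(skewed), mean(skewed))     # median barely moves, mean jumps
```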
5
Q

Mode

A
  • Measure of centre
  • Value that appears most frequently in a dataset
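As a rough illustration with invented data, the standard `statistics` module can find the mode, including for non-numeric values. `multimode` is used for the case where several values tie.

```python
from statistics import mode, multimode

colours = ["red", "blue", "red", "green", "blue", "red"]  # illustrative data
most_common = mode(colours)          # works for non-numeric data too

scores = [1, 1, 2, 2, 3]             # two values tie for most frequent
all_modes = multimode(scores)        # returns every tied mode as a list

print(most_common, all_modes)
```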
6
Q

Variance

A
  • Measure of spread
  • Average of the squared deviations of the values from the mean, so it is expressed in squared units of the data
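A minimal sketch of the calculation on made-up values. The card does not say whether the population or the sample form is meant, so both are shown: the population variance divides by n, the sample variance by n - 1.

```python
from statistics import pvariance, variance

values = [2, 4, 4, 4, 5, 5, 7, 9]                    # illustrative data
m = sum(values) / len(values)                        # mean of the data
squared_devs = [(x - m) ** 2 for x in values]        # squared distance from the mean

population_var = sum(squared_devs) / len(values)         # divide by n
sample_var = sum(squared_devs) / (len(values) - 1)       # divide by n - 1

# The standard library gives the same results.
print(population_var, pvariance(values))
print(sample_var, variance(values))
```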
7
Q

Standard deviation

A
  • Measure of spread
  • Square root of the variance
  • Roughly the typical distance of a data point from the mean, expressed in the same units as the data
  • A higher standard deviation indicates greater variability in the data
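To illustrate the last point, the sketch below compares two invented datasets that share the same mean but differ in spread. It uses the population form (`pstdev`); that choice is an assumption, since the card does not specify one.

```python
from math import sqrt
from statistics import mean, pstdev, pvariance

tight = [4, 5, 5, 6]     # illustrative data clustered near the mean
wide = [0, 2, 8, 10]     # illustrative data spread far from the mean

# Same mean, very different variability.
print(mean(tight), pstdev(tight))
print(mean(wide), pstdev(wide))

# Standard deviation is the square root of the variance.
print(sqrt(pvariance(wide)))
```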
8
Q

Range

A
  • Measure of spread
  • Difference between the maximum and minimum values in a dataset
9
Q

IQR: Interquartile Range

A
  • Measure of spread
  • Difference between the third quartile (Q3) and the first quartile (Q1) of a dataset: IQR = Q3 − Q1
  • Covers the middle 50% of the data, so it is less affected by outliers than the range
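A short sketch comparing the IQR with the plain range on invented data. The helper names `iqr` and `value_range` are hypothetical, and quartile conventions differ between tools; this uses the default method of `statistics.quantiles`.

```python
from statistics import quantiles

data = list(range(1, 12))            # 1..11, illustrative
with_outlier = data[:-1] + [1000]    # replace the largest value with an outlier

def iqr(values):
    q1, _, q3 = quantiles(values, n=4)   # three cut points: Q1, median, Q3
    return q3 - q1

def value_range(values):
    return max(values) - min(values)

print(iqr(data), value_range(data))
print(iqr(with_outlier), value_range(with_outlier))  # IQR unchanged, range explodes
```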
10
Q

Sampling and simulations

A

Selecting a subset of individuals or observations from a larger population for analysis
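As a hedged illustration, the sketch below simulates a hypothetical population (heights in cm, an assumption made only for the example) and draws a simple random sample from it without replacement.

```python
import random
from statistics import mean

rng = random.Random(42)                                    # fixed seed for repeatability
population = [rng.gauss(170, 10) for _ in range(10_000)]   # simulated population
sample = rng.sample(population, 100)                       # simple random sample

# The sample mean is an estimate of the population mean.
print(round(mean(population), 2), round(mean(sample), 2))
```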

11
Q

Regression analysis

A

Allows us to find out and predict how changes in one variable are associated with changes in another
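One common form is simple linear regression, sketched below with ordinary least squares. The helper `fit_line` and the data (hours studied against a score) are hypothetical and chosen only for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

hours = [1, 2, 3, 4, 5]               # hypothetical predictor
score = [2.1, 3.9, 6.2, 7.8, 10.0]    # hypothetical response

slope, intercept = fit_line(hours, score)
predicted = slope * 6 + intercept      # predicted response at x = 6
print(round(slope, 2), round(intercept, 2), round(predicted, 2))
```

The slope estimates how much the response changes, on average, for a one-unit change in the predictor.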

12
Q

Hypothesis testing

A

Used to determine whether there is enough evidence to support a claim about a population parameter
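The card names no particular test, so the sketch below picks one simple case as an assumption: an exact one-sided binomial test of whether a coin is fair, given a hypothetical result of 16 heads in 20 flips. The 0.05 threshold is a common convention, not something stated in the card.

```python
from math import comb

flips, heads = 20, 16        # hypothetical experiment
# Null hypothesis: the coin is fair, so P(heads) = 0.5.
# p-value: probability of 16 or more heads in 20 flips if the null is true.
p_value = sum(comb(flips, k) for k in range(heads, flips + 1)) / 2 ** flips

alpha = 0.05                 # conventional significance level (an assumption)
reject_null = p_value < alpha
print(round(p_value, 4), reject_null)
```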

13
Q

Confidence interval

A

A range of values, calculated from sample data, that is likely to contain the true value of a population parameter at a stated confidence level (e.g. 95%)
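A rough sketch of a 95% interval for a mean, using the normal approximation (mean ± z × standard error). The data are simulated and hypothetical; for small samples a t-based interval would usually be preferred.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

rng = random.Random(0)
sample = [rng.gauss(50, 8) for _ in range(60)]   # hypothetical measurements

z = NormalDist().inv_cdf(0.975)                  # about 1.96 for a 95% interval
centre = mean(sample)
margin = z * stdev(sample) / sqrt(len(sample))   # z times the standard error
lower, upper = centre - margin, centre + margin
print(round(lower, 2), round(upper, 2))
```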

14
Q

Probability

A

Measures the likelihood of an event occurring, expressed as a number between 0 (impossible) and 1 (certain)
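For equally likely outcomes, a probability can be computed as favourable outcomes divided by total outcomes. The example below, rolling a total of 7 with two fair dice, is chosen purely for illustration.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))        # 36 equally likely rolls
favourable = [roll for roll in outcomes if sum(roll) == 7]
p_seven = Fraction(len(favourable), len(outcomes))     # favourable / total

print(p_seven)
```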

15
Q

Correlation

A

Measures the strength and direction of the linear relationship between two variables.
It ranges from -1 to 1, where:
  • 1 indicates a perfect positive correlation
  • -1 indicates a perfect negative correlation
  • 0 indicates no linear correlation
Correlation does not imply causation.
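A minimal sketch of the Pearson correlation coefficient, assuming that is the measure meant here. The helper `pearson` and the data are hypothetical. The last case shows that a coefficient of 0 rules out only a linear relationship: the two variables are still clearly related.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [-2, -1, 0, 1, 2]
print(pearson(xs, [2 * x + 1 for x in xs]))   # increasing straight line
print(pearson(xs, [-3 * x for x in xs]))      # decreasing straight line
print(pearson(xs, [x ** 2 for x in xs]))      # U-shape: related, but not linearly
```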

16
Q

Signal vs noise

A

Signal: true underlying pattern or effect you’re interested in detecting and understanding
Noise: random, irrelevant variations or errors that can distort the signal/pattern

17
Q

Exploratory Data Analysis (EDA)

A

Aims to understand the characteristics and relationships of the data (mostly through visualisations) leading to potential insights

18
Q

Univariate analysis

A

Analysis of each variable’s distribution individually

19
Q

Bivariate analysis

A

Explores relationships between pairs of variables

20
Q

Multivariate analysis

A

Explores relationships between three or more variables

21
Q

Normal distribution

A
  • A symmetric, bell-shaped probability distribution where data clusters around the mean, with fewer data points in the tails
  • Widely used in statistical analysis because of its predictable properties
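One of those predictable properties is the share of values that fall within 1, 2 and 3 standard deviations of the mean. A short sketch using the standard library's `NormalDist`; the helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

standard = NormalDist(mu=0, sigma=1)   # standard normal distribution

def share_within(k):
    """Probability of falling within k standard deviations of the mean."""
    return standard.cdf(k) - standard.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```

The results are approximately 68%, 95% and 99.7%, and the same proportions hold for any mean and standard deviation.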