Statistics - Representations of data Flashcards

1
Q

What is an outlier?

A

An extreme value that lies outside the overall pattern of the data

2
Q

What is a common definition of an outlier?

A

Any value that is either:
Greater than Q3 + k × IQR, or
Less than Q1 - k × IQR
where IQR = Q3 - Q1 and k is a given constant (k = 1.5 is a common choice)
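A minimal Python sketch of this rule. Quartile conventions differ between textbooks and calculators, so this assumes the "inclusive" method of Python's `statistics.quantiles`; the function names `outlier_fences` and `find_outliers` are hypothetical.

```python
from statistics import quantiles

def outlier_fences(data, k=1.5):
    """Return the (lower, upper) outlier limits Q1 - k*IQR and Q3 + k*IQR."""
    # Assumption: quartiles use the 'inclusive' method; exam conventions may differ
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(data, k=1.5):
    """List the values lying strictly outside the outlier limits."""
    lower, upper = outlier_fences(data, k)
    return [x for x in data if x < lower or x > upper]

data = [2, 4, 5, 6, 7, 8, 9, 30]
print(outlier_fences(data))  # (-0.5, 13.5)
print(find_outliers(data))   # [30]
```

Here Q1 = 4.75 and Q3 = 8.25, so IQR = 3.5 and the limits are 4.75 - 5.25 and 8.25 + 5.25.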

3
Q

What is an anomaly?

A

An outlier that is removed from the data because it is clearly an error and it would be misleading to keep it

4
Q

What is cleaning the data?

A

The process of removing anomalies from a data set
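Deciding that an outlier is an anomaly needs context, not just a formula, so this sketch leaves that decision to a caller-supplied check. The `clean_data` helper, the height data and the plausible range of 50 to 250 cm are all illustrative assumptions.

```python
def clean_data(data, is_plausible):
    """Remove anomalies: keep only values that pass a context-specific check."""
    return [x for x in data if is_plausible(x)]

# Hypothetical example: adult heights in cm. A reading of 1750 is assumed to be
# a recording error (probably entered in mm), so it is removed as an anomaly.
heights = [162, 175, 181, 1750, 158]
cleaned = clean_data(heights, lambda h: 50 <= h <= 250)
print(cleaned)  # [162, 175, 181, 158]
```

Outliers that are genuine values (not errors) would pass the check and stay in the data.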

5
Q

What features does a box plot show with lines?

A

Lowest value (excluding outliers)
Highest value (excluding outliers)
Q1, Q2 (median) and Q3
Outliers are shown as crosses
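A sketch of how the values needed to draw a box plot could be computed, with the whiskers ending at the lowest and highest values that are not outliers. The function name `box_plot_summary`, the default k = 1.5 and the "inclusive" quartile method of `statistics.quantiles` are assumptions.

```python
from statistics import quantiles

def box_plot_summary(data, k=1.5):
    """Values needed to draw a box plot, with outliers listed separately."""
    # Assumption: 'inclusive' quartile method; other conventions give slightly different quartiles
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in data if lower <= x <= upper]
    return {
        "lowest": min(inside),   # end of the lower whisker
        "q1": q1,
        "median": q2,
        "q3": q3,
        "highest": max(inside),  # end of the upper whisker
        "outliers": sorted(x for x in data if x < lower or x > upper),  # drawn as crosses
    }

print(box_plot_summary([2, 4, 5, 6, 7, 8, 9, 30]))
```

For this data the whiskers run from 2 to 9 and 30 is plotted as a cross.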

6
Q

What should be done when comparing two box plots?

A

Use the same scale

Compare medians, IQR and extremes

7
Q

What is bivariate data?

A

Data which has pairs of values for two variables

8
Q

What is usually plotted on each axis of a graph of bivariate data?

A

x-axis - independent variable

y-axis - dependent variable

9
Q

What is a causal relationship?

A

When a change in one variable causes a change in the other

Correlation doesn’t show causation

10
Q

What is the regression line?

A

A straight line that minimises the sum of the squares of the vertical distances from each data point to the line
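Written out, the least-squares condition and its standard solution for the line y = a + bx are as follows; the summary statistics S_xx and S_xy are introduced here as notation, since the card does not define them:

```latex
\begin{aligned}
&\text{choose } a, b \text{ to minimise } \sum_{i=1}^{n} \bigl(y_i - a - b x_i\bigr)^2 \\
&b = \frac{S_{xy}}{S_{xx}}, \qquad a = \bar{y} - b\,\bar{x} \\
&S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}), \qquad S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2
\end{aligned}
```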

11
Q

What is the equation of the regression line?

A

y = a + bx, where a is the y-intercept and b is the gradient
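A plain-Python sketch of finding a and b by least squares, using the standard results b = Sxy / Sxx and a = ȳ - b x̄; the function name `regression_line` is hypothetical.

```python
def regression_line(xs, ys):
    """Least-squares coefficients (a, b) for the line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx          # gradient
    a = mean_y - b * mean_x  # y-intercept
    return a, b

print(regression_line([1, 2, 3, 4], [3, 5, 7, 9]))  # (1.0, 2.0)
```

The example points lie exactly on y = 1 + 2x, so the fitted line recovers that equation.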

12
Q

When should you use a regression line to make predictions?

A

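Only for values of the explanatory variable within the range of the given data (interpolation); predictions outside this range (extrapolation) are unreliable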
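One way to enforce this in code is to refuse to extrapolate. This is a sketch only: the `predict` function and its parameters are hypothetical, and a and b are assumed to come from an already fitted line y = a + bx.

```python
def predict(a, b, x, x_min, x_max):
    """Predict y = a + b*x, refusing to extrapolate beyond the observed x range."""
    if not x_min <= x <= x_max:
        raise ValueError(f"x = {x} is outside the data range [{x_min}, {x_max}]")
    return a + b * x

print(predict(1.0, 2.0, 3.5, x_min=1, x_max=4))  # 8.0
```

Calling `predict(1.0, 2.0, 10, x_min=1, x_max=4)` raises a `ValueError` instead of returning an unreliable value.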

13
Q

What are histograms used for?

A

Displaying grouped continuous data

14
Q

What is the equation for frequency density?

A

Frequency density = frequency / class width
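A short Python sketch applying this formula to a grouped frequency table; the table values and the function name `frequency_densities` are illustrative assumptions.

```python
def frequency_densities(classes):
    """classes: list of (lower_bound, upper_bound, frequency) tuples."""
    # Frequency density = frequency / class width
    return [freq / (upper - lower) for lower, upper, freq in classes]

# Hypothetical grouped data, e.g. times in minutes
table = [(0, 10, 20), (10, 15, 15), (15, 30, 30)]
print(frequency_densities(table))  # [2.0, 3.0, 2.0]
```

Unequal class widths are why the heights are frequency densities rather than raw frequencies.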

15
Q

What is the relationship between frequency and area in a histogram?

A

Frequency = k × area of the bar, where k is a constant (so frequency is proportional to area)
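In practice k is found from one bar whose frequency is known, then used for the others. A minimal sketch, with hypothetical function names and made-up bar areas:

```python
def find_k(known_frequency, known_area):
    """k = frequency / area, using one bar whose frequency is known."""
    return known_frequency / known_area

def frequency_from_area(area, k):
    """Frequency represented by a bar of the given area."""
    return k * area

# Hypothetical: a bar of area 12 square cm represents a frequency of 30
k = find_k(30, 12)                # 2.5 per square cm
print(frequency_from_area(8, k))  # 20.0
```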

16
Q

What does unimodal mean?

A

The data has one point where the distribution peaks

17
Q

What does bimodal mean?

A

There are two points in the data where the distribution peaks
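As a rough illustration, peaks can be counted as classes whose frequency density is higher than each neighbouring class. This heuristic and the name `count_peaks` are assumptions rather than a formal definition; in practice the shape is judged by eye, and this version misses flat-topped peaks where two neighbouring classes tie.

```python
def count_peaks(densities):
    """Count classes whose frequency density is higher than each neighbouring class."""
    peaks = 0
    for i, d in enumerate(densities):
        # Classes at either end have only one neighbour to compare with
        left = densities[i - 1] if i > 0 else float("-inf")
        right = densities[i + 1] if i < len(densities) - 1 else float("-inf")
        if d > left and d > right:
            peaks += 1
    return peaks

print(count_peaks([1, 3, 5, 3, 1]))     # 1 (unimodal)
print(count_peaks([2, 6, 3, 2, 7, 4]))  # 2 (bimodal)
```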

18
Q

What is bivariate data?

A

Data made up of pairs of values (x, y)

19
Q

What is a scatter diagram?

A

A graph of bivariate data in which each variable is plotted along one of the axes, so each (x, y) pair appears as a point

20
Q

What are scatter diagrams used for?

A

Showing whether two variables are correlated

21
Q

What is important to remember about correlation?

A

Correlation does not mean causation

The two variables could both be linked to a third factor

22
Q

What is linear regression?

A

The process used to find the equation of the regression line (line of best fit)

23
Q

What is the explanatory variable?

A

Independent variable - the variable that affects the other; always plotted on the horizontal axis

24
Q

What is the response variable?

A

Dependent variable - the variable being affected; always plotted on the vertical axis