Topic 1 - Summarising Data Flashcards

1
Q

What is a 5-number summary?

A

A five-number summary consists of the smallest data value (minimum), the first quartile (Q1), the median (Q2), the third quartile (Q3), and the largest data value (maximum).
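
As a rough illustration, the five values can be computed with Python's standard library. This is a minimal sketch: the function name `five_number_summary` and the sample numbers are made up for this example, and it relies on `statistics.quantiles` (Python 3.8+), whose default "exclusive" method places the quartiles using (n+1)-based positions.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of at least two numbers."""
    data = sorted(values)
    # n=4 asks for the three cut points that split the data into quarters.
    q1, q2, q3 = statistics.quantiles(data, n=4)
    return data[0], q1, q2, q3, data[-1]

# Hypothetical sample data, chosen only for illustration
print(five_number_summary([7, 1, 4, 2, 6, 3, 5]))  # (1, 2.0, 4.0, 6.0, 7)
```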

2
Q

What is the first quartile?

A

• First quartile (also known as the lower quartile or Q1): the point between the
lowest 25% of values and the highest 75% of values. It is also called the
25th percentile.

3
Q

What is the second quartile?

A

• Second quartile (also known as the median or Q2): the midpoint of the sorted dataset. It is also called the 50th percentile.

4
Q

What is the third quartile?

A

• Third quartile (also known as the upper quartile or Q3): the point between the
lowest 75% and the highest 25% of values. It is also called the 75th percentile.

5
Q

How do you work out the location of the first quartile?

A

Loc Q1 = 0.25(n+1), where n is the number of values in the sorted dataset

6
Q

How do you work out the location of the second quartile?

A

Loc Q2 = 0.5(n+1)

7
Q

How do you work out the location of the third quartile?

A

Loc Q3 = 0.75(n+1)

8
Q

What is the maximum data point?

A

• Maximum: largest value in the dataset

9
Q

How do you work out the location of a percentile?

A

Loc = (p/100) x (n+1), where p is the percentile and n is the number of values in the sorted dataset
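
The location formula can be turned into a value with a short helper. This is only a sketch: the function name `percentile` and the sample scores are invented for illustration, and it assumes that when the location is not a whole number you interpolate linearly between the two neighbouring values, which is one common convention rather than the only one.

```python
def percentile(values, p):
    """Value at the p-th percentile, using Loc = (p / 100) * (n + 1)."""
    data = sorted(values)
    n = len(data)
    loc = (p / 100) * (n + 1)   # 1-based position in the sorted data
    if loc <= 1:                # location before the first value
        return data[0]
    if loc >= n:                # location at or beyond the last value
        return data[-1]
    lower = int(loc)            # whole-number part of the location
    fraction = loc - lower      # how far to move towards the next value
    # Assumption: linear interpolation between neighbouring values
    return data[lower - 1] + fraction * (data[lower] - data[lower - 1])

# Hypothetical sample data, n = 9
scores = [3, 5, 7, 8, 12, 13, 14, 18, 21]
print(percentile(scores, 25))  # Loc = 2.5, halfway between 5 and 7   -> 6.0
print(percentile(scores, 50))  # Loc = 5, the fifth value             -> 12.0
print(percentile(scores, 75))  # Loc = 7.5, halfway between 14 and 18 -> 16.0
```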

10
Q

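How do you work out the median?

A

Location = 0.5(n+1)

The median is the value at this location in the sorted data. If the location falls halfway between two values, take the average of those two values.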

11
Q

What is central tendency?

A

Measures of central tendency are descriptive statistics that describe the overall ‘central’ value of a set of data. There are three key measures: mode, median, and mean.
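
A quick way to see the three measures side by side is Python's built-in `statistics` module. The list `marks` below is made-up sample data, and the variable names are chosen just for this sketch.

```python
import statistics

# Hypothetical sample data
marks = [1, 3, 3, 5, 8, 13]

mean_value = statistics.mean(marks)      # sum of values divided by how many there are
median_value = statistics.median(marks)  # middle of the sorted data (average of 3 and 5 here)
mode_value = statistics.mode(marks)      # most frequently occurring value

print(mean_value, median_value, mode_value)  # 5.5 4.0 3
```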

12
Q

What do measures of spread include?

A
  • Range: the difference between the smallest value and the largest value in a dataset

Range = Maximum - Minimum

  • Percentile (a quartile is a special case): the value below which a given percentage of the data falls

Loc = (p/100) x (n+1)

  • Interquartile range (IQR): the difference between the upper quartile (Q3) and the lower quartile (Q1)

IQR = Q3 - Q1
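
The range and IQR formulas above could be sketched in Python as follows. The helper names `value_range` and `interquartile_range` and the sample data are assumptions made for this example; the quartiles come from `statistics.quantiles` (Python 3.8+), whose default method uses (n+1)-based positions.

```python
import statistics

def value_range(values):
    """Range = Maximum - Minimum."""
    return max(values) - min(values)

def interquartile_range(values):
    """IQR = Q3 - Q1."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

# Hypothetical sample data
data = [3, 5, 7, 8, 12, 13, 14, 18, 21]
print(value_range(data))          # 21 - 3 = 18
print(interquartile_range(data))  # 16.0 - 6.0 = 10.0
```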

13
Q

What is variance?

A

Variance is a statistical measure of how far the values in a dataset are spread out from the mean. It is based on the squared differences between each value and the mean.

14
Q

What is standard deviation?

A

The standard deviation tells you, on average, how far each score lies from the mean. It is the square root of the variance. A high standard deviation means that values are generally far from the mean, while a low standard deviation indicates that values are clustered close to the mean.

15
Q

How do you calculate variance and standard deviation?

A

1.) Calculate the range = maximum - minimum
2.) Calculate the IQR = Q3 - Q1
3.) Calculate the variance: s^2 = Σ(x(i) - x̄)^2 / (n - 1), rounded to 2 d.p.
- x(i) represents each value in the dataset
- x̄ is the mean of the dataset
- n is the total number of values in the dataset
4.) Calculate the standard deviation: s is the square root of s^2 from step 3
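
Steps 3 and 4 can be checked with a few lines of Python. This is a minimal sketch of the sample variance (dividing by n - 1) and its square root; the function names and the data are made up for illustration.

```python
import math

def sample_variance(values):
    """s^2 = sum of squared deviations from the mean, divided by (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)

def sample_std_dev(values):
    """s = square root of the sample variance."""
    return math.sqrt(sample_variance(values))

# Hypothetical sample data: mean is 5, squared deviations sum to 32
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(sample_variance(data), 2))  # 32 / 7 -> 4.57
print(round(sample_std_dev(data), 2))   # 2.14
```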

16
Q

What is the frequency of a value?

A

The number of times the value occurs in a dataset

17
Q

What is a frequency distribution?

A

The pattern of frequencies of a variable: the number of times each possible value occurs in a dataset

18
Q

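What is a relative frequency distribution?

A

The proportion of observations that fall in each value or class interval of a variable, that is, the frequency divided by the total number of observations. It can be read as an estimate of probability.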
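
Frequencies and relative frequencies can be tallied with `collections.Counter`. The sketch below uses invented function names and made-up letter grades purely as an illustration.

```python
from collections import Counter

def frequency_distribution(values):
    """Number of times each distinct value occurs."""
    return dict(Counter(values))

def relative_frequency_distribution(values):
    """Proportion of observations taken by each distinct value."""
    total = len(values)
    return {value: count / total for value, count in Counter(values).items()}

# Hypothetical sample data: 8 observations
grades = ["B", "A", "C", "B", "B", "A", "B", "C"]
print(frequency_distribution(grades))           # {'B': 4, 'A': 2, 'C': 2}
print(relative_frequency_distribution(grades))  # {'B': 0.5, 'A': 0.25, 'C': 0.25}
```

The relative frequencies always add up to 1, which is a useful check on the tally.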

19
Q

What is a histogram?

A

A plot that shows class intervals on the x-axis and the frequency of each class on the y-axis
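
To show what goes into a histogram, the sketch below counts how many values fall into each class interval and prints a simple text version. The function name, the equal-width classes, and the height data are all assumptions made for this example; values outside the chosen classes are simply ignored.

```python
def histogram_counts(values, start, width, classes):
    """Count the values falling in each class [lower, lower + width)."""
    counts = [0] * classes
    for x in values:
        index = int((x - start) // width)  # which class this value belongs to
        if 0 <= index < classes:           # skip values outside the classes
            counts[index] += 1
    return counts

# Hypothetical sample data: heights in cm
heights = [151, 158, 160, 163, 165, 167, 171, 172, 178, 184]
counts = histogram_counts(heights, start=150, width=10, classes=4)
print(counts)  # [2, 4, 3, 1]

# Text histogram: class interval on the left, one '#' per observation
for i, count in enumerate(counts):
    lower = 150 + 10 * i
    print(f"{lower}-{lower + 10}: {'#' * count}")
```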

20
Q

What is nominal data?

A

Nominal: a naming scale, where variables are simply “named” or labeled, with no specific order. Examples: gender, postcode, political preference.

21
Q

What is ordinal data?

A

Ordinal: a scale whose values have a specific order, beyond just naming them. Examples: socioeconomic status, military ranks, and letter grades for coursework.

22
Q

What is discrete data?

A

Discrete: a data type that involves counting rather than measurement (only distinct, countable values are possible). Example: the number of students who attended class.

23
Q

What is continuous data?

A

Continuous: a data type that involves measurement rather than counting. It can take any value between two realistic points, so an infinite number of values is possible. Examples: height, temperature.

24
Q

What displays can be used for numerical data?

A

• A histogram displays the frequencies or relative frequencies in each
class
• A line chart shows data that occur at regular time intervals
• A box plot displays the five-number summary of a numerical variable and can be used to compare two or more numerical variables

25
Q

What displays can be used for categorical data?

A

• A bar chart shows the frequencies of categorical data
• A pie chart shows the proportion of the data in each category

26
Q

What is population data?

A

• Population data is used when you are gathering data from every individual of interest.

27
Q

What is sample data?

A

• Sample data is used when you are gathering data from some of the individuals of interest.