Statistics - Summarising and presenting data Flashcards

Question 1

Q

What is the purpose of statistics?

Answer

A

To summarise and present the information contained in a data set
To handle and quantify variation and uncertainty in the data, to help to infer what they tell us about the underlying theory of interest.

Question 2

Q

What are the 5 main summary measures of any numerical data?

Answer

A

Mean, Median, Mode, range, and inter-quartile range (IQR)

Question 3

Q

How do you calculate Mean?

Answer

A

Add all the values together and divide by how many values there are
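As a minimal sketch in Python, the calculation looks like this. The values are made up for illustration and do not come from any dataset in these notes.

```python
# Hypothetical example values, chosen only for illustration.
values = [2, 4, 4, 5, 10]

# Mean = sum of the values divided by how many values there are.
mean = sum(values) / len(values)
print(mean)
```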

Question 4

Q

How do you calculate Median?

Answer

A

The median is the middle value. Arrange all of the values in size order and locate the middle value.

If there are 2 middle values (an even number of observations), the median is the mean of those two values: add them together and divide by 2.
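A minimal Python sketch of this rule, assuming the usual convention that two middle values are averaged. The function name and example values are hypothetical.

```python
def median(values):
    """Return the middle value of the data, averaging the two middle values if needed."""
    ordered = sorted(values)          # arrange the values in size order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: a single middle value
        return ordered[mid]
    # even count: mean of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_example = median([7, 1, 3])       # ordered: 1, 3, 7
even_example = median([7, 1, 3, 5])   # ordered: 1, 3, 5, 7
print(odd_example, even_example)
```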

Question 5

Q

How do you calculate inter-quartile range (IQR)?

Answer

A

Inter-quartile range (IQR) is the difference between the 75th and 25th percentiles of the data.

The rank-ordered data are split into 4 equal parts by three quartiles (Q1, Q2, and Q3):
- Q1 / lower quartile / 25%
- Q2 / the median / 50%
- Q3 / upper quartile / 75%

IQR = Q3 - Q1
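The calculation can be sketched in Python with the standard library's `statistics.quantiles`. The data are made up, and the result depends on the quartile convention: this sketch uses the function's default method, and other software may give slightly different quartiles for the same data.

```python
import statistics

# Hypothetical data, for illustration only.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# n=4 splits the ordered data into four equal parts and returns Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(values, n=4)

iqr = q3 - q1
print(q1, q2, q3, iqr)
```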

Question 6

Q

How do you calculate range?

Answer

A

Range = largest value - smallest value
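In Python this is a one-line calculation. The values below are hypothetical.

```python
# Hypothetical example values, chosen only for illustration.
values = [2, 4, 4, 5, 10]

# Range = largest value - smallest value.
value_range = max(values) - min(values)
print(value_range)
```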

Question 7

Q

How do you calculate Mode?

Answer

A

Mode is the number or value which is repeated most often among all of the values.
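A short Python sketch using the standard library's `statistics.multimode`, which returns every value tied for most frequent. The data are made up; the second example is included to show that a dataset can have more than one mode.

```python
import statistics

# Hypothetical data, for illustration only.
single_mode = statistics.multimode([2, 4, 4, 5, 10])   # 4 appears most often
tied_modes = statistics.multimode([1, 1, 2, 2, 3])     # 1 and 2 are tied

print(single_mode, tied_modes)
```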

Question 8

Q

What is standard deviation?

Answer

A

Standard deviation is the square root of the variance

Standard deviation (Std. Dev.) = √(variance)

Question 9

Q

How do you calculate variance?

Answer

A

To calculate the variance of a dataset, find the distance (deviation) of each value from the mean, square each distance, add the squared distances together, and divide by the number of values.

Squaring is needed because values below the mean give negative distances; squaring makes every distance positive, so they do not cancel out when added together.

Variance = sum of squared distances from the mean / number of values.

For a sample, the sum is usually divided by (number of values - 1) instead; this is the version that statistical packages such as STATA report.
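A worked sketch in Python may help here. It assumes the standard definition of variance (the mean of the squared distances from the mean) and shows both the divide-by-n and divide-by-(n - 1) versions. The data are made up.

```python
import math

# Hypothetical data, for illustration only.
values = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(values)
mean = sum(values) / n

# Square each distance from the mean so negative distances do not cancel out.
squared_distances = [(v - mean) ** 2 for v in values]

population_variance = sum(squared_distances) / n        # divide by n
sample_variance = sum(squared_distances) / (n - 1)      # divide by n - 1

# Standard deviation is the square root of the variance.
population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

print(population_variance, population_sd)
print(sample_variance, sample_sd)
```

The standard library functions `statistics.pvariance` and `statistics.variance` compute the same two quantities.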

Question 10

Q

STATA can be used to run statistical tests on a dataset once the variables and commands have been entered. TRUE or FALSE?

Question 11

Q

Can STATA statistical software calculate mean, standard deviation, range, mode, median, and variance?

Answer

A

Yes it can, but you should still know how to calculate them all yourself.

Question 12

Q

When the variable ‘Age’ is selected in STATA, what is the command that should be used to calculate summary measures (Obs/Mean/Std. Dev./Min/Max)?

Answer

A

summarize Age
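For readers who want to check the output by hand, the five measures this command reports can be reproduced with a short Python sketch. The ages are made up, and the sketch assumes Std. Dev. is the sample standard deviation (dividing by n - 1).

```python
import statistics

# Hypothetical ages, for illustration only.
ages = [23, 35, 41, 52, 29]

summary = {
    "Obs": len(ages),
    "Mean": statistics.mean(ages),
    "Std. Dev.": statistics.stdev(ages),  # sample SD (divides by n - 1)
    "Min": min(ages),
    "Max": max(ages),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```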

Question 13

Q

What command should be used in STATA to obtain more detailed summary measures (quartiles, median etc. rather than just mean/Std. Dev. etc.)?

Answer

A

summarize Age, detail

Question 14

Q

If data presents in a graph as either positively or negatively skewed (not normally distributed), is finding the mean and standard deviation an appropriate measure?

Answer

A

No, median and inter-quartile range are more appropriate measures for data which is NOT normally distributed.

This is because skewness pulls the mean towards the long tail: the mean is larger than the median in positively skewed data (peak to the left, long tail to the right) and smaller than the median in negatively skewed data (peak to the right, long tail to the left).
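The effect of skew on the mean can be seen in a small Python sketch. Both datasets are made up: the first has one large value forming a long right tail, and the second is its mirror image.

```python
import statistics

# Hypothetical positively skewed data: most values are small, one is large.
positive_skew = [1, 2, 2, 3, 3, 3, 4, 20]

# Mirror image of the data above, giving a long tail of small values.
negative_skew = [21 - v for v in positive_skew]

pos_mean = statistics.mean(positive_skew)
pos_median = statistics.median(positive_skew)
neg_mean = statistics.mean(negative_skew)
neg_median = statistics.median(negative_skew)

print(pos_mean, pos_median)   # mean is pulled above the median
print(neg_mean, neg_median)   # mean is pulled below the median
```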

Question 15

Q

If data presents as normally distributed (distribution tail extended equally over both left and right sides) in a graph, is finding the mean and standard deviation an appropriate measure?

Answer

A

Yes, finding the mean and standard deviation is an appropriate measure for normally distributed data.

Question 16

Q

In positively skewed data (peak to the left of the graph, long tail to the right), is the mean larger or smaller than the median?

Answer

A

The mean is larger than the median in positively skewed data.

Positively skewed data: mean > median

Question 17

Q

In negatively skewed data (peak to the right of the graph, long tail to the left), is the mean larger or smaller than the median?

Answer

A

In negatively skewed data the mean is smaller than the median.

Negatively skewed data: mean < median

Question 18

Q

Name three main things which presenting data in graphs allows us to easily derive from the data.

Answer

A

Graphical representation of data enables us to get a feel for:
1. Typical (central) values and range of values
2. Shape and spread of the distribution of values
3. Interesting patterns and relationships in the data

Question 19

Q

Name two data-quality problems that graphical displays (graphs) can reveal.

Answer

A

Graphical displays can reveal problems concerning the quality of the data, including:
1. Identifying outlying / erroneous observations
2. Digit preference

Question 20

Q

Name three types of graph used in statistical analysis.

Answer

A

Bar charts
Histograms
Line graphs

Question 21

Q

Name two types of tables used in statistical analysis.

Answer

A

Frequency tables
Cross tabulations (contingency tables)

Question 22

Q

What is the risk of having too few classes within your data set when using a histogram to present data?

Answer

A

If there are too few classes in the data set when using a histogram, it could be difficult to see any interesting patterns when the data is presented.

Question 23

Q

What is the risk associated with having too many classes within your data set when using a histogram to present data?

Answer

A

If there are too many classes when presenting data in a histogram, there may be only one observation per class as opposed to a group of observations. The number of observations per class should be no less than 2.
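The trade-off between too few and too many classes can be sketched in Python by counting observations per class at different class widths. The helper `class_counts`, the widths, and the data are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def class_counts(values, width):
    """Count observations per class, where each class spans `width` units."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))


# Hypothetical data, for illustration only.
values = [12, 17, 21, 24, 28, 33, 35, 41]

too_few = class_counts(values, 100)   # one class holds every observation
too_many = class_counts(values, 1)    # every class holds one observation
balanced = class_counts(values, 10)   # a few observations per class

print(too_few)
print(too_many)
print(balanced)
```

Each dictionary maps the lower bound of a class to the number of observations in it, which is the height of the corresponding histogram bar.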

Question 24

Q

The optimal number of classes in a data set that is presented in a histogram ensures that interesting patterns are not unintentionally masked, unlike in the case that there are either too many or too few classes. TRUE or FALSE.

Question 25

Q

Is continuous data a type of quantitative data or categorical data?

Answer

A

Continuous data is a type of quantitative data.

Question 26

Q

Give an example of continuous data.

Answer

A

Any of the following:
- Blood pressure
- Age
- Concentration of a pollutant

Question 27

Q

Is discrete data a type of quantitative data or categorical data?

Answer

A

Discrete data is a type of quantitative data.

Question 28

Q

Give an example of discrete data.

Answer

A

Any of the following:
- Number of children (parity)
- Number of cigarettes per day
- Counts of death in small areas

Question 29

Q

Is ordinal data a type of quantitative data or categorical data?

Answer

A

Ordinal data (ordered categories) is a type of categorical data.

Question 30

Q

Give an example of ordinal data (ordered categories).

Answer

A

Any of the following:
- Grade of breast cancer
- Disease severity (mild/moderate/severe)
- Social class (I, II, III, IV, V)

Question 31

Q

Is nominal data a type of quantitative data or categorical data?

Answer

A

Nominal data (unordered categories) is a type of categorical data.

Question 32

Q

Give an example of nominal data (unordered categories).

Answer

A

Any of the following:
- Sex (male/female)
- Exposed/unexposed
- Ethnicity (white/asian/black/other)

Question 33

Q

What are factors?

Answer

A

‘Factors’ is the name often given to categorical covariate data.

Question 34

Q

What is dichotomous or binary data?

Answer

A

Categorical data which takes on only two distinct values is also known as dichotomous or binary data.

Question 35

Q

Categorical data can often be coded using numerical values. TRUE or FALSE?

Question 36

Q

Name a disadvantage that can present when using statistical packages to analyse coded categorical data.

Answer

A

It is important to declare that the data is categorical before running tests, as statistical packages will often treat numeric data (including coded categorical data) as quantitative unless explicitly declared as categorical.
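The pitfall can be illustrated with a small Python sketch. The coding scheme and the observations are hypothetical: the point is only that arithmetic on category codes runs without error but produces a number with no meaning.

```python
import statistics
from collections import Counter

# Hypothetical coding scheme and observations, for illustration only.
ETHNICITY_LABELS = {1: "White", 2: "Asian", 3: "Black", 4: "Other"}
coded = [1, 1, 2, 3, 4, 1]

# Treated as quantitative: this runs, but a "mean ethnicity" is meaningless.
meaningless_mean = statistics.mean(coded)

# Treated as categorical: a frequency count per category is the sensible summary.
frequencies = Counter(ETHNICITY_LABELS[code] for code in coded)

print(meaningless_mean)
print(frequencies)
```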

Question 37

Q

Name one limiting factor of continuous observation.

Answer

A

One limiting factor of continuous observation is the accuracy of the measurement instrument.

Question 38

Q

It is possible to transform continuous data into categorical data in the case that the amount of detail provided by continuous data is not necessary. TRUE or FALSE?

Answer

A

TRUE

E.g. in a study of the effect of maternal smoking on birthweight, birthweight can be re-coded as: >2.5 kg = 0, and <2.5 kg = 1.
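A minimal Python sketch of this re-coding. The function name and the weights are hypothetical, and the card does not say how a weight of exactly 2.5 kg is coded, so this sketch assumes it is coded 0.

```python
def low_birthweight(weight_kg, threshold_kg=2.5):
    """Code birthweight as 1 if below the threshold, otherwise 0.

    Assumption: a weight exactly equal to the threshold is coded 0.
    """
    return 1 if weight_kg < threshold_kg else 0


# Hypothetical birthweights in kg, for illustration only.
weights_kg = [3.4, 2.1, 2.9, 2.4]
coded = [low_birthweight(w) for w in weights_kg]
print(coded)
```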

Question 39

Q

How can transforming data to a different scale sometimes be helpful?

Answer

A

It is sometimes helpful to transform data to a different scale to aid interpretation and/or statistical analysis.

Question 40

Q

Name a reason for transforming data.

Answer

A

Any of the following:
- To get improved approximation to normality
- To reduce skewness
- To linearise the relationship between two variables
- To make multiplicative relationships additive

Question 41

Q

Name a common transformation.

Answer

A

Any of the following:
- Natural logarithm (y = log_e(x) ⇔ x = e^y or exp(y), where e = 2.718…)
- Power transformations (y = x, y = x^2, y = x^3, etc.)
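As a rough illustration of why the natural logarithm is useful, the Python sketch below applies it to made-up, strongly right-skewed data. It shows the mean and median coming together after the transformation, the back-transformation with exp, and a multiplicative relationship becoming additive.

```python
import math
import statistics

# Hypothetical right-skewed data, for illustration only.
values = [1, 10, 100, 1000, 10000]

# y = log_e(x)
logged = [math.log(v) for v in values]

raw_mean, raw_median = statistics.mean(values), statistics.median(values)
log_mean, log_median = statistics.mean(logged), statistics.median(logged)

# x = exp(y) recovers the original scale.
back_transformed = [math.exp(y) for y in logged]

# Multiplicative becomes additive: log(a * b) = log(a) + log(b).
product_log = math.log(2 * 8)
sum_of_logs = math.log(2) + math.log(8)

print(raw_mean, raw_median)   # far apart on the original scale
print(log_mean, log_median)   # essentially equal on the log scale
```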

Question 42

Q

Which common transformation is the following example?

(y = x , y = x2 , y = x3 , etc.)

Answer

A

Power transformation (y = x , y = x2 , y = x3 , etc.)

Question 43

Q

Which common transformation is the following example?

(y = log_e(x) ⇔ x = e^y or exp(y), where e = 2.718…)

Answer

A

Natural logarithm (y = log_e(x) ⇔ x = e^y or exp(y), where e = 2.718…)

Question 44

Q

Name important things to check when displaying data in a spreadsheet - to ensure your data is ready for analysis

Answer

A

- Coding: check twice that your coding is correct (including typos, where you may have entered incorrect information or mistyped a number)
- Check that relevant research data matches your findings
- Compare your data with that of similar study cohorts: is it consistent?
- Decide how you will handle missing values

Question 45

Q

It is necessary to be able to distinguish between different types of data, such as continuous, discrete or categorical. TRUE or FALSE?

Question 46

Q

The most appropriate way to present data is dependent on the type of data. TRUE or FALSE?

Question 47

Q

What type of data are frequency tables most appropriate for?

Answer

A

Frequency tables are appropriate for all types of data.

Question 48

Q

What are two main tips for creating a good frequency table?

Answer

A

For quantitative data, it is important to think carefully about appropriate choice of classes/intervals to group data before display
Keep information in tables to the minimum necessary to convey the message you want to present (significant figures, number of variables/categories)

Question 49

Q

What type of graph is most appropriate for displaying categorical data?

Answer

A

Bar charts are appropriate for displaying categorical data.

Question 50

Q

What graphs are most appropriate for displaying quantitative data?

Answer

A

Histograms and box plots are appropriate for displaying quantitative data.

Question 51

Q

Which of the following statements is true for positively skewed data?
a) Mean = Median
b) Mean = Mode
c) Median < Mean
d) Median > Mean

Answer

A

c) Median < Mean

Question 52

Q

An appropriate summary measure for any skewed data is:
a) Median and interquartile range
b) Mean and variance
c) Mean and mode
d) Mode and standard deviation

Answer

A

a) Median and interquartile range

Question 53

Q

Daily death counts due to Covid-19 virus is a/an:
a) Continuous variable
b) Discrete variable
c) Ordered categorical variable
d) Unordered categorical variable

Answer

A

b) Discrete variable

Question 54

Q

Disease severity (mild/moderate/severe) is a/an:
a) Continuous variable
b) Discrete variable
c) Unordered categorical variable (nominal variable)
d) Ordered categorical variable (ordinal variable)

Answer

A

d) Ordered categorical variable

Question 55

Q

Which of the following is true for negatively skewed data?
a) Median < Mean
b) Mean = Median
c) Median > Mean
d) Mean = Mode

Answer

A

c) Median > Mean in negatively skewed data

Question 56

Q

Which of the following statements is true for positively skewed data?
a) Median < Mean
b) Median > Mean
c) Mean = Mode
d) Mean = Median

Answer

A

a) Median < Mean in positively skewed data

Question 57

Q

Which of the following statements is true for a normal distribution with the tail extended equally over both sides?
a) Median and standard deviation are appropriate measures.
b) Median and interquartile range are appropriate measures.
c) Mean and standard deviation are appropriate measures.
d) Mean and interquartile range are appropriate measures.

Answer

A

c) Mean and standard deviation are appropriate measures.

Question 58

Q

Which of these subtractions gives the value of the interquartile range for a continuous variable?
a) 75th percentile - 25th percentile
b) Median - mean
c) Largest value - smallest value
d) Mean - median

Answer

A

a) 75th percentile - 25th percentile