Why do we need statistics? Flashcards

Question 1

Q

What is data?

Answer

A

Observations that have been collected e.g. measurements

Question 2

Q

What is a population?

Answer

A

The COMPLETE collection of subjects studied

Question 3

Q

What is a sample?

Answer

A

A SUBCOLLECTION of subjects from the population

Question 4

Q

What is random sampling?

Answer

A

Sampling so that each subject in the population is EQUALLY likely to be selected

The sample mean is likely to be biased if subjects are not sampled randomly from the population
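
A minimal sketch of simple random sampling, assuming a made-up population of 1,000 numbered subjects (the IDs, sample size and seed are illustrative, not from the course). For contrast it also takes a non-random "first 50" sample, whose mean is far from the population mean.

```python
import random

# Hypothetical population: subject IDs 1..1000 (illustrative only).
population = list(range(1, 1001))

rng = random.Random(42)  # fixed seed so the draw is repeatable

# Simple random sample without replacement:
# every subject is equally likely to be selected.
sample = rng.sample(population, k=50)

# Non-random alternative: the first 50 subjects only.
convenience_sample = population[:50]

population_mean = sum(population) / len(population)                    # 500.5
convenience_mean = sum(convenience_sample) / len(convenience_sample)   # 25.5
random_mean = sum(sample) / len(sample)

print(population_mean, convenience_mean, random_mean)
```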

Question 5

Q

What is replication?

Answer

A

Repeating the experiment to measure the degree of variation and to increase confidence in the results

Question 6

Q

What are the two different types of data?

Answer

A

Quantitative data: Data which can be counted or measured; may be discrete or continuous, e.g. temperature, height, length

Qualitative data: Data which can be separated into different categories and are distinguished by some non-numeric characteristics

Question 7

Q

What is the difference between nominal data and ordinal data?

Answer

A

Nominal: CANNOT be ordered e.g. yes or no, different colours, male or female

Ordinal: CAN be ordered e.g. short, medium, tall or none, few, many

Question 8

Q

What is the difference between observational studies and experimental studies?

Answer

A

Observational studies: Just monitoring, with no attempt to modify the subjects being studied (e.g. the patients are not changed). This allows us to say something about correlation

Experimental: Apply one or more treatments and observe their effect

In general, it is easier to infer causation from an experimental study, whereas observational studies tend to reveal only correlations

Question 9

Q

What is a parameter?

Answer

A

Measurement describing some characteristic of a POPULATION (what we would ideally like to know)

μ= population mean

Question 10

Q

What is a statistic?

Answer

A

Measurement describing some characteristic of a SAMPLE = Used as an estimate (what we find out from the study)

x̅= Sample mean

Population parameter is estimated by the sample statistic

Question 11

Q

What is a sampling error?

Answer

A

The difference between the statistic and the parameter
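
A small worked sketch of sampling error, assuming a tiny made-up population where every member is known (in practice the parameter is usually unknown, so the error cannot be computed directly).

```python
from statistics import mean

# Hypothetical population (all members known, for illustration only).
population = [2, 4, 6, 8, 10]
mu = mean(population)          # parameter: population mean = 6

# One possible sample drawn from that population.
sample = [2, 4, 10]
x_bar = mean(sample)           # statistic: sample mean = 16/3

# Sampling error = statistic - parameter
sampling_error = x_bar - mu    # about -0.667

print(mu, x_bar, sampling_error)
```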

Question 12

Q

What are histograms used for?

Answer

A

X axis: Classes (bins) of the data
Y axis: Frequency of the data, or proportion of the data, in each class

Proportion: Number of observations in the class / Total number of observations, so the heights of the bars sum to 1

Used to show the variation in the data; can be used to visualise continuous data
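
A rough sketch of how the heights of a histogram could be computed by hand, assuming made-up measurements and four classes of width 1 (the data and bin choices are illustrative only).

```python
# Hypothetical continuous data (made-up lengths in cm).
data = [1.2, 1.9, 2.4, 2.7, 3.1, 3.3, 3.8, 4.5, 4.9, 5.0]

lower = 1.0      # left edge of the first class
width = 1.0      # class width
n_classes = 4    # classes: [1,2), [2,3), [3,4), [4,5]

counts = [0] * n_classes
for x in data:
    # Index of the class containing x; the maximum value goes in the last class.
    i = min(int((x - lower) // width), n_classes - 1)
    counts[i] += 1

# Proportion = observations in the class / total observations.
proportions = [c / len(data) for c in counts]

print(counts)       # [2, 2, 3, 3]
print(proportions)  # heights sum to 1 (up to floating-point rounding)
```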

Question 13

Q

What are the different ways of measuring the centre of the data?

Answer

A

Mean: Sum of the values divided by the number of values

Median: Middle value of the data when arranged in order. It is robust to outliers (extreme values that do not fit the trend)

Mode: Value that occurs most frequently
Bimodal: Two different values with the same greatest frequency
Multimodal: If more than two values occur with the same greatest frequency
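
A short sketch of the three measures using Python's standard `statistics` module, assuming made-up data with one extreme value (40) to show how the mean and median react differently.

```python
from statistics import mean, median, multimode

# Hypothetical data; 40 is an outlier that does not fit the trend.
data = [2, 3, 3, 4, 5, 5, 40]
without_outlier = [2, 3, 3, 4, 5, 5]

print(mean(without_outlier), mean(data))      # mean jumps from about 3.67 to about 8.86
print(median(without_outlier), median(data))  # median only moves from 3.5 to 4

# multimode returns every value tied for the greatest frequency.
modes = multimode(data)
print(modes)  # [3, 5] -> bimodal
```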

Question 14

Q

What are the different ways of measuring similarity or dissimilarity in the data?

Answer

A

Range: The difference between the smallest and largest value in the data

Sample variance: Measure of the average squared distance of the data points from the sample mean

Question 15

Q

How do you measure the sample variance?

Answer

A

s² = Σ(xᵢ − x̅)² / (n − 1)

Subtract the sample mean from each data value, square each difference, add the squares together and divide by the number of data values − 1

s² is a STATISTIC and high values= high dissimilarity in the data

σ² is a PARAMETER: if all members of the population are measured, the result is the population variance, and you divide by n instead of n − 1

s² can estimate σ²
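
A minimal worked sketch of the variance calculation, assuming a small made-up data set. It computes the value by hand and compares it with `statistics.variance` (divides by n − 1) and `statistics.pvariance` (divides by n).

```python
from statistics import mean, variance, pvariance

# Hypothetical data values (illustrative only).
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
x_bar = mean(data)  # 5

# Squared distance of each data point from the sample mean.
squared_deviations = [(x - x_bar) ** 2 for x in data]  # sums to 32

s2 = sum(squared_deviations) / (n - 1)  # sample variance: 32/7, about 4.571
sigma2 = sum(squared_deviations) / n    # population variance: 32/8 = 4.0

print(s2, variance(data))       # both use n - 1
print(sigma2, pvariance(data))  # both use n
```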

Question 16

Q

What is the sample standard deviation?

Answer

A

Square root of the sample variance

The sample variance is in squared units of the data, whereas the sample standard deviation has the same units as the data

s= Sample standard deviation 
s²= Sample variance

σ= Population standard deviation 
σ²= Population variance

Rule of thumb: For roughly normally distributed data, about 95% of the data should lie within 2 standard deviations of the mean
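
A rough check of the rule of thumb, assuming simulated data drawn from a normal distribution (the mean of 100, standard deviation of 15, sample size and seed are arbitrary choices for illustration). The rule need not hold for strongly skewed data.

```python
import random
from statistics import mean, stdev

rng = random.Random(7)  # fixed seed so the simulation is repeatable

# Simulated, roughly normal data (hypothetical values).
data = [rng.gauss(100, 15) for _ in range(10_000)]

x_bar = mean(data)
s = stdev(data)  # sample standard deviation = square root of sample variance

# Count the observations within 2 standard deviations of the mean.
lower, upper = x_bar - 2 * s, x_bar + 2 * s
within = sum(lower <= x <= upper for x in data)
fraction_within = within / len(data)

print(fraction_within)  # expected to be close to 0.95 for normal data
```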

Question 17

Q

What does statistical inference make use of?

Answer

A

Information from a sample to draw conclusions about the population from which the sample was taken

Problem: Wish to know something about a population of interest, but cannot examine every member of the population

Question 18

Q

How can you increase the likelihood of the sample mean being a better estimate?

Answer

A

1) Increase sample size

2) Have low variation in the observed data
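
A simulation sketch of the first point, assuming a hypothetical normal population with mean 50 and standard deviation 10. It repeats the sampling many times at two sample sizes and compares how much the sample means vary; the helper name `sample_means` is made up for this example.

```python
import random
from statistics import mean, stdev

rng = random.Random(1)  # fixed seed so the simulation is repeatable

def sample_means(n, repeats=500):
    """Return the means of `repeats` simulated samples, each of size n."""
    return [mean([rng.gauss(50, 10) for _ in range(n)]) for _ in range(repeats)]

# Spread of the sample means around the true mean for small and large samples.
spread_small = stdev(sample_means(5))
spread_large = stdev(sample_means(100))

# Larger samples give sample means that cluster more tightly around 50.
print(spread_small, spread_large)
```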

Question 19

Q

What test statistic would you calculate if the population mean was claimed to be μ and you knew the sample mean?

Answer

A

t statistic

If the data really were generated from a distribution having mean μ,
then the test statistic follows the t-distribution

Expect t to be close to zero (it may be positive or negative)
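
A minimal sketch of the calculation, assuming the standard one-sample t statistic t = (x̅ − μ) / (s / √n), which the card does not spell out. The data values, the claimed means and the function name `t_statistic` are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(data, claimed_mu):
    """One-sample t statistic: (sample mean - claimed mean) / (s / sqrt(n))."""
    n = len(data)
    return (mean(data) - claimed_mu) / (stdev(data) / sqrt(n))

# Hypothetical sample with mean 5 and sample variance 32/7.
data = [2, 4, 4, 4, 5, 5, 7, 9]

t_consistent = t_statistic(data, claimed_mu=5)  # 0.0: sample agrees with the claim
t_far = t_statistic(data, claimed_mu=3)         # sqrt(7), about 2.646: further from zero

print(t_consistent, t_far)
```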

Question 20

Q

What do parametric and non-parametric tests assume?

Answer

A

Parametric: Some aspect of the data follows a normal distribution

Non-parametric: Usually do not make such strict assumptions about the shape of the data. As a result they are less likely to detect patterns in the data, and therefore less likely to reject the null hypothesis even when it is false