Introduction to Statistics Flashcards

1
Q

What is ‘statistics’?

A

Statistics is the study of a representative sample in order to draw conclusions about an entire population.

2
Q

When do we use statistics? What is the objective?

A

When we want to study a characteristic but can only access a sample of the population, not everyone in it. The objective is to explain variability and reduce uncertainty.

3
Q

What is a ‘variable’?

A

A variable is a characteristic that varies. Why this variable varies is what we study in statistics.

4
Q

What is probability theory?

A

This is the theory underlying statistics. The idea is that we can make an educated guess about the population based on our sample.

5
Q

Why do we use descriptive statistics?

A

To familiarise ourselves with the data and to clean it: checking that there are no errors, typos or mistyped values.

6
Q

What is a categorical variable?

A

Qualitative: the values are words (categories). Can be nominal or ordinal.

7
Q

What is a numeric variable?

A

Quantitative: the values are numbers. Can be discrete or continuous.

8
Q

What is an ordinal variable? Give an example.

A

Categories (words) with a rank order. For example, income: poor, average, wealthy.

9
Q

What is a nominal variable? Give an example.

A

Categories (words) with no order or rank. For example, gender, eye colour, etc.

10
Q

What is a binary variable and what is a polytomous (multi-category) variable?

A

A binary variable has only TWO categories, e.g. gender recorded as male or female. It can be nominal or ordinal, but must have exactly two categories.
A polytomous variable has more than two categories. It can also be nominal or ordinal.

11
Q

What is a discrete variable?

A

Distinct, countable values, usually whole numbers rather than decimals. For example, shoe size, or age in whole years (1, 2, 3, etc.).

12
Q

What is a continuous variable?

A

A numeric value measured on a scale, which can take any value within an interval and can be ranked. For example, weight, height, IQ, BMI, etc.

13
Q

If we have categorical data, what kind of descriptive statistic should we use?

A

We are interested in the frequency: how many people are in each category. Hence, we need to create a frequency table. We can also create a bar chart or a pie chart. This helps us visualise the data and see what we have.
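
As a rough illustration of a frequency table, here is a minimal Python sketch. The eye-colour data are made up for the example, and the text bar chart is only a stand-in for a proper bar chart.

```python
from collections import Counter

# Hypothetical sample: eye colour recorded for eight participants.
eye_colour = ["brown", "blue", "brown", "green", "brown", "blue", "brown", "green"]

# Count how many participants fall into each category.
frequency_table = Counter(eye_colour)

# Print the table from most to least common, with a crude text bar per category.
for category, count in frequency_table.most_common():
    print(f"{category:<6} {count:>2} {'#' * count}")
```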

14
Q

A frequency table shows us the ‘percentage’, the ‘valid percentage’ and the ‘cumulative percentage’. What does each of these percentages mean, and which one should we pay the most attention to?

A

The percentage is calculated out of all cases, including the missing values.
The valid percentage is calculated after removing the missing values. This is usually the one to pay most attention to, because it is not distorted by missing data.
The cumulative percentage adds up the percentages of each category from the top of the table to the bottom, culminating in 100%. This is more useful when the variable of analysis is ranked or ordinal, as it makes it easy to get a sense of what percentage of cases fall at or below each rank. (Wikipedia)
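
The three percentages can be worked through by hand on a small example. The sketch below uses made-up ordinal responses, with `None` standing in for a missing value, and assumes the cumulative column is built from the valid percentages.

```python
from collections import Counter

# Hypothetical ordinal responses; None marks a missing value.
responses = ["low", "medium", "high", "medium", None,
             "low", "medium", None, "high", "medium"]
order = ["low", "medium", "high"]  # rank order of the categories

total = len(responses)                           # all cases, including missing
valid = [r for r in responses if r is not None]  # cases with an answer
counts = Counter(valid)

table = []
cumulative = 0.0
for level in order:
    percent = 100 * counts[level] / total             # out of all cases
    valid_percent = 100 * counts[level] / len(valid)  # out of non-missing cases
    cumulative += valid_percent                       # running total, top to bottom
    table.append((level, counts[level], percent, valid_percent, cumulative))

for level, n, pct, vpct, cum in table:
    print(f"{level:<7} n={n}  percent={pct:5.1f}  valid={vpct:5.1f}  cumulative={cum:5.1f}")
```

With two of the ten responses missing, each valid percentage is larger than the matching percentage, and the cumulative column ends at 100.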

15
Q

If we use a frequency table and realise there is a missing value or a piece of incorrect data, what are our two options?

A
  1. Remove the participant with the missing or incorrect data. This loses the information on all of their other variables, so we want to avoid it where possible. Always keep a copy of the original data file.
  2. Fill in the value with a prediction of what it is supposed to be.
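
The two options can be sketched in Python as follows. The ages are made up, and using the mean of the observed values as the "predicted value" is an assumption made for illustration; it is only one simple way of predicting a missing value.

```python
import statistics

# Hypothetical ages for five participants; None marks a missing value.
ages = [23, 31, None, 27, 35]

# Option 1: remove the participant with the missing value.
complete_cases = [a for a in ages if a is not None]

# Option 2: fill in a predicted value. Here the prediction is simply the
# mean of the observed ages (an illustrative assumption).
observed_mean = statistics.mean(complete_cases)
imputed = [a if a is not None else observed_mean for a in ages]

print("After removal:", complete_cases)
print("After filling in:", imputed)
```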
16
Q

What type of descriptive statistic would we use for numerical data?

A

Histograms and box plots. This is because we are interested in the VARIANCE, not the frequency. We would also look at the measures of central tendency and the measures of dispersion.

17
Q

With numerical scale data, we are not interested in the frequency of the categories; we are interested in ….

A

Variance

18
Q

What are the mean, median and mode?

A

Measures of central tendency. The median is the middle value in the ordered data, the mode is the most common value, and the mean is the average of all the values.
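
A small worked example, using Python's standard `statistics` module on made-up scores:

```python
import statistics

# Hypothetical scores, already in order.
scores = [2, 3, 3, 5, 7, 10]

mean_value = statistics.mean(scores)      # sum of values / number of values
median_value = statistics.median(scores)  # middle of the ordered values
mode_value = statistics.mode(scores)      # most common value

print("mean:", mean_value)
print("median:", median_value)
print("mode:", mode_value)
```

Because there is an even number of scores, the median is the average of the two middle values (3 and 5).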

19
Q

Why is the mean sometimes not useful?

A

The mean is sensitive to extreme values: if the range is really big, a few very large or very small values can pull the mean away from the typical value. It also does not tell us about the distribution of the values.
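
To see how a single extreme value affects the mean but barely moves the median, here is a minimal sketch with made-up incomes (in thousands):

```python
import statistics

# Hypothetical incomes, in thousands.
incomes = [20, 22, 25, 27, 30]
# The same data with one extreme value added.
incomes_with_outlier = incomes + [500]

mean_before = statistics.mean(incomes)
median_before = statistics.median(incomes)
mean_after = statistics.mean(incomes_with_outlier)
median_after = statistics.median(incomes_with_outlier)

print("mean:", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```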

20
Q

How do you work out the variance?

A

Calculate the difference between the mean and each individual value, then square each difference. The variance is the average of these squared distances from the mean. This tells us how much the data vary around the mean.
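
The calculation can be followed step by step on made-up values. The sketch shows both the population version (divide by n) and the sample version (divide by n - 1); which one a given course or software package reports is an assumption to check.

```python
import statistics

# Hypothetical values.
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Step 1: the mean.
mean_value = sum(values) / len(values)

# Step 2: squared distance of each value from the mean.
squared_deviations = [(x - mean_value) ** 2 for x in values]

# Step 3: average the squared distances.
population_variance = sum(squared_deviations) / len(values)    # divide by n
sample_variance = sum(squared_deviations) / (len(values) - 1)  # divide by n - 1

print("population variance:", population_variance)
print("sample variance:", sample_variance)

# The standard library gives the same answers.
print(statistics.pvariance(values), statistics.variance(values))
```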

21
Q

How would you work out the standard deviation?

A

The standard deviation is the typical distance of the values either side of the mean. It is the square root of the variance, so it is in the same units as the data.
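
A minimal sketch on made-up values, taking the square root of the population variance (dividing by n, which is an assumption; many packages divide by n - 1 for samples):

```python
import math
import statistics

# Hypothetical values.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean_value = sum(values) / len(values)
# Population variance: average squared distance from the mean.
variance = sum((x - mean_value) ** 2 for x in values) / len(values)
# Standard deviation: square root of the variance.
standard_deviation = math.sqrt(variance)

print("variance:", variance)
print("standard deviation:", standard_deviation)
```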

22
Q

What are the measures of dispersion?

A

Standard deviation (SD), variance, and the min/max values (the range)

23
Q

What are the interquartile values?

A

We put the values in order, then split them at 50% (the median), and then at 25% and 75%. The values at these percentages are the interquartile values (the quartiles).
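
A small example using `statistics.quantiles` from the Python standard library on made-up data. Different software uses slightly different rules for locating quartiles, so the exact cut points can vary between packages.

```python
import statistics

# Hypothetical values (the function sorts them internally).
values = [7, 1, 4, 2, 6, 3, 5]

# n=4 splits the ordered data into four parts, giving the 25%, 50% and 75% points.
q1, q2, q3 = statistics.quantiles(values, n=4)

# The interquartile range is the distance between the 25% and 75% points.
iqr = q3 - q1

print("25%:", q1, "50%:", q2, "75%:", q3)
print("interquartile range:", iqr)
```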

24
Q

What is a standardised value?

A

A standardised value (z-score) is what you get when you take a data point, subtract the mean, and divide by the standard deviation. It tells us how far from the mean we are in terms of standard deviations.
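
A minimal sketch of standardising two data points. The values are made up, the helper name `standardise` is hypothetical, and the population standard deviation is used as an assumption.

```python
import statistics

# Hypothetical values with mean 5 and population standard deviation 2.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean_value = statistics.mean(values)
sd = statistics.pstdev(values)


def standardise(x):
    """Return how many standard deviations x lies from the mean."""
    return (x - mean_value) / sd


z_high = standardise(9)  # above the mean, so positive
z_low = standardise(2)   # below the mean, so negative

print("z for 9:", z_high)
print("z for 2:", z_low)
```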

25
Q

What does the central limit theorem in probability theory mean?

A

Even if the individual values do not follow a normal distribution, the means of sufficiently large samples do: given enough data, the distribution of the sample means becomes approximately normal, that is, it ends up looking bell shaped.
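
A small simulation can make this concrete. The sketch below draws many samples from a strongly skewed (exponential) distribution and looks at their means; the sample size, number of samples and seed are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 50    # observations per sample (arbitrary choice)
NUM_SAMPLES = 2000  # number of samples drawn (arbitrary choice)

# Each observation comes from an exponential distribution with mean 1,
# which is right-skewed rather than bell shaped.
sample_means = []
for _ in range(NUM_SAMPLES):
    sample = [random.expovariate(1.0) for _ in range(SAMPLE_SIZE)]
    sample_means.append(statistics.mean(sample))

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)

# For a bell-shaped curve, roughly 68% of values lie within 1 SD of the mean.
within_one_sd = sum(
    abs(m - mean_of_means) <= sd_of_means for m in sample_means
) / NUM_SAMPLES

print("mean of sample means:", round(mean_of_means, 3))
print("SD of sample means:", round(sd_of_means, 3))
print("share within 1 SD:", round(within_one_sd, 3))
```

The sample means cluster around the population mean of 1, with a spread close to 1 divided by the square root of the sample size, even though the individual observations are skewed.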

26
Q

If the data is symmetrical you report the…

A

mean and SD

27
Q

If the data is asymmetrical/skewed you report the…

A

median and range/min/max

28
Q

If the distribution is …… skewed, then the order from left to right is mode, median, mean

A

Positive
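
A quick check of this ordering on made-up data with a long tail to the right (positive skew):

```python
import statistics

# Hypothetical values: most are small, with a few large ones in the right tail.
values = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]

mode_value = statistics.mode(values)
median_value = statistics.median(values)
mean_value = statistics.mean(values)

# The large values pull the mean furthest to the right.
print("mode:", mode_value, "median:", median_value, "mean:", mean_value)
```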

29
Q

If the distribution is …. skewed, then the order from left to right is mean, median, mode

A

Negative

30
Q

If a box plot’s 50% (median) line sits towards the left of the box, what kind of skew is this?

A

Positive

31
Q

If a box plot’s 50% (median) line sits towards the right of the box, what kind of skew is this?

A

Negative

32
Q

What are the descriptive indices for:
A) Categorical variables
B) Numerical variables

A

A) Frequencies and proportions

B) Measures of central tendency and measures of dispersion

33
Q

What are the plots/displays for:
A) Categorical variables
B) Numerical variables

A

A) Pie charts, bar charts

B) Histograms, box plots

34
Q

Ordinal data can be considered either categorical or discrete numerical.
If there are fewer than 4 points -> treat them as ?????
If there are 5 or more points -> treat them as ?????

A
  1. Categorical
  2. Numerical