Lecture 2: descriptive statistics Flashcards

1
Q

What are the goals of analysis?

A

1- To summarise data from a sample included in an experiment or observational study
2- To test hypotheses and make inferences about the larger population from which the sample was drawn

2
Q

What are the types of statistical analysis?

A

Descriptive statistics

Inferential statistics

3
Q

What is descriptive statistics?

A

Methods used to summarise or describe the main features of a collection of data

Describe the characteristics of a sample

4
Q

What is Inferential statistics?

A

Methods used to make inferences from the sample to the larger population

5
Q

What are the methods of Descriptive statistics?

A

Graphical techniques (diagrams): histograms, box-and-whisker plots, scatterplots, bar charts, pie charts

Numerical techniques (summary statistics): mean, standard deviation, range, median, interquartile range (IQR), mode, frequencies, percentages (incl. incidence, prevalence, risk, odds)

6
Q

What types of diagrams are used for numerical data in Descriptive Statistics?

A

–Histogram
–Box-and-whisker plots (boxplots) for comparison by a categorical variable (e.g. sex)
–Scatterplots – relationship between two interval variables

7
Q

What types of diagrams are used for categorical data in Descriptive Statistics?

A

–Bar charts, pie charts

–Clustered or stacked bar charts for comparison by a second categorical variable

8
Q

What types of diagrams are used for numerical data?

A
  • Histogram (for continuous data)
  • Box-and-whisker plots (boxplots)
  • Scatterplots
9
Q

What are the characteristics of a Normal distribution?

A

–Symmetrical or bell-shaped

–Exactly half of the values are to the left of the center and the other half to the right

10
Q

What are the characteristics of a Skewed distribution?

A

–Asymmetric distribution
–Right or positive skew – extreme values to the right
–Left or negative skew – extreme values to the left

11
Q

What is the 5-number summary used in a box-and-whisker plot?

A
Minimum = Min
1st Quartile = Q1
Median = Q2
3rd Quartile = Q3
Maximum = Max
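
As a rough illustration, the five numbers can be computed with Python's standard `statistics` module. The data set is hypothetical, and the quartile convention (`method="inclusive"`) is an assumption: other conventions, including the one used in the lecture, may give slightly different Q1 and Q3 values.

```python
import statistics

# Hypothetical sample of nine measurements (not from the lecture).
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# quantiles(n=4) returns the three cut points Q1, Q2 (median) and Q3.
# The "inclusive" method is one of several quartile conventions.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

# Min, Q1, median, Q3, max: the values a boxplot draws.
five_number_summary = (min(data), q1, q2, q3, max(data))
print(five_number_summary)
```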
12
Q

Boxplots are useful for?

A

Comparing groups

13
Q

Scatter plots are useful for?

A

Showing correlation

14
Q

What diagrams are used for categorical data (and discrete quantitative data)?

A
  • Bar charts
  • Clustered or stacked bar charts
  • Pie charts
15
Q

What is the simplest way to present data?

A

By using Frequencies (counts) or percentages

16
Q

What are the different ways you can display frequencies and percentages?

A

Table
Bar chart
Frequency distribution
Pie chart

17
Q

What are the two preferred methods for numerical summaries in Descriptive Statistics?

A
  • Measures of central tendency
  • Measures of dispersion/spread

18
Q

What are Measures of central tendency?

A

Also known as averages

Used to identify the “centre” around which data are distributed.
–Mean: arithmetic average
–Median: middle value of a data set
–Mode: most frequently occurring value
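
A minimal sketch of the three measures, using Python's standard `statistics` module on a made-up data set (the variable names and values are illustrative only):

```python
import statistics

# Hypothetical data: number of clinic visits for seven patients.
visits = [2, 3, 3, 4, 5, 3, 8]

mean_visits = statistics.mean(visits)      # sum of values / number of values
median_visits = statistics.median(visits)  # middle value after sorting
mode_visits = statistics.mode(visits)      # most frequently occurring value

print(mean_visits, median_visits, mode_visits)
```

Here the single large value (8) pulls the mean above the median and mode.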

19
Q

What is the Mean?

A

Arithmetic average

Mean = sum of data points / number of data points

20
Q

What is the Median?

A

Middle value of a data set

Divides the data into 2 equal sets

  • If there is an odd # of elements, median is the middle number
  • If there is an even # of elements, median is the average of 2 middle numbers
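
The odd/even rule can be sketched as a small Python function. The function name and test values are illustrative; in practice `statistics.median` does the same job.

```python
def median(values):
    """Return the median using the odd/even rule."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)   # the rule only works on sorted data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of elements: the single middle value.
        return ordered[mid]
    # Even number of elements: average of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 5]))     # 5
print(median([7, 1, 5, 3]))  # 4.0
```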
21
Q

What is the Mode?

A

Most frequently occurring value

22
Q

What do numerical descriptive statistics measure?

A

Measures of central tendency
–Mean
–Median
–Mode

23
Q

The choice of summary measure is determined by?

A

The distribution of the data

24
Q

In a symmetric distribution, mean and median?

A

Are the same

25
Q

If median and mean are different, this indicates that?

A

The data are Skewed
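
A small illustration with made-up data: one extreme value on the right pulls the mean above the median, which is the pattern expected with right (positive) skew. Treat the comparison as a rule of thumb rather than a formal test.

```python
import statistics

# Hypothetical lengths of hospital stay in days; the single long stay
# creates a right (positive) skew.
stays = [2, 3, 3, 4, 4, 5, 30]

mean_stay = statistics.mean(stays)      # pulled upwards by the value 30
median_stay = statistics.median(stays)  # unaffected by how extreme 30 is

print(mean_stay, median_stay)
if mean_stay > median_stay:
    print("mean > median: consistent with right skew")
elif mean_stay < median_stay:
    print("mean < median: consistent with left skew")
else:
    print("mean == median: consistent with a symmetric distribution")
```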

26
Q

What are Measures of variability/dispersion?

A

The spread of the distribution - how widely the observations are spread out around the measure of central tendency

27
Q

What are the commonly used measures of dispersion to indicate how spread-out the data is?

A
  • Range (min, max)
  • Interquartile Range IQR (the 25th and 75th percentiles)
  • Standard Deviation SD (measure of variability around the mean)
28
Q

What is the Range?

A

The difference between the highest and lowest value

Max - min

29
Q

What are the cons of the Range?

A

Not very representative: it depends only on the two most extreme values, so a single outlier can change it a lot

30
Q

What is the Interquartile range (IQR)?

A

Splits ordered data into 4 quarters and measures the range covered by the middle 50% of the distribution.

IQR = Q3 - Q1
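
The sketch below compares the range and the IQR on two hypothetical data sets that differ only in one extreme value. The helper name `spread` and the quartile convention (`method="inclusive"`) are assumptions made for illustration.

```python
import statistics


def spread(values):
    """Return (range, IQR) for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return max(values) - min(values), q3 - q1


without_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 90]

print(spread(without_outlier))
print(spread(with_outlier))
```

The range jumps when the largest value becomes 90, while the IQR stays the same because it only looks at the middle half of the data.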

31
Q

What is the standard deviation?

A

A measure of the typical distance of data points from the sample mean (the square root of the average squared deviation from the mean)
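
A step-by-step sketch of the calculation on made-up data. It uses the sample version of the formula, which divides by n - 1; whether the lecture uses n or n - 1 is not stated, so treat that choice as an assumption.

```python
import math
import statistics

# Hypothetical weight losses in kg for five patients.
losses = [8, 9, 10, 11, 12]

mean_loss = sum(losses) / len(losses)

# Squared distance of each data point from the mean.
squared_deviations = [(x - mean_loss) ** 2 for x in losses]

# Sample SD: divide by n - 1, then take the square root.
sample_sd = math.sqrt(sum(squared_deviations) / (len(losses) - 1))

print(sample_sd)
print(statistics.stdev(losses))  # the standard library's sample SD
```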

32
Q

What is the Empirical rule?

A

For data with a symmetric, bell-shaped (approximately normal) distribution:
•About 68% of sample data fall within ± 1 SD of the mean
•About 95% of sample data fall within ± 2 SD of the mean

33
Q

For data with symmetric shape, the standard deviation has the following characteristics?

A
  • About 68% of sample data fall within ± 1 SD of the mean
  • About 95% of sample data fall within ± 2 SD of the mean

34
Q

Empirical rule example
•At a 1 year review weight loss meeting, the mean weight lost by patients was 10kg

•The standard deviation of the group was calculated to be 2.5kg

A

Assuming weight loss is roughly symmetric and bell-shaped:

± 1 SD
= 10 kg ± 2.5 kg
= 7.5 kg to 12.5 kg

Therefore, about 68% of patients lost between 7.5 kg and 12.5 kg

± 2 SD
= 10 kg ± 2(2.5 kg)
= 5 kg to 15 kg

Therefore, about 95% of patients lost between 5 kg and 15 kg
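
The same arithmetic can be checked in Python. The sketch assumes weight loss is approximately normally distributed, which the card does not state outright; the helper names `interval` and `share_within` are made up for illustration.

```python
from statistics import NormalDist

mean_kg, sd_kg = 10.0, 2.5  # values from the weight-loss example


def interval(k):
    """Return (lower, upper) bounds for mean ± k SD."""
    return mean_kg - k * sd_kg, mean_kg + k * sd_kg


# Assumption: weight loss follows a normal distribution.
dist = NormalDist(mu=mean_kg, sigma=sd_kg)


def share_within(k):
    """Proportion of a normal distribution lying within mean ± k SD."""
    low, high = interval(k)
    return dist.cdf(high) - dist.cdf(low)


for k in (1, 2):
    print(k, interval(k), round(share_within(k), 3))
```

Under the normality assumption the proportions come out at roughly 0.68 and 0.95, which is where the rounded figures in the empirical rule come from.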

35
Q

Which kind of distribution is summarised with the mean and standard deviation?

A

Normal distribution

36
Q

Which kind of distribution is summarised with the median and interquartile range?

A

Skewed

–Left skew – extreme values on the left
–Right skew – extreme values on the right