Descriptive Statistics Flashcards

1
Q

Tools for continuous data

A

—Graphs
—Histograms
—Cumulative Relative Frequency plots (ogives)
—Stem and leaf plots
—Boxplots
—Line chart for data against time
—Numeric
—Middle – mean, median
—Spread – variance, range, quartiles
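The numeric summaries listed above can be sketched in Python with the standard-library `statistics` module. The data values are made up for illustration, and `statistics.quantiles` is called with its default method, which may give slightly different quartiles from Minitab or a textbook.

```python
import statistics

# Hypothetical continuous observations (illustrative values only)
data = [2.1, 3.4, 3.9, 4.4, 5.0, 5.6, 6.2, 7.8]

mean = statistics.mean(data)          # middle: arithmetic mean
median = statistics.median(data)      # middle: middle value of the sorted data
variance = statistics.variance(data)  # spread: sample variance (divides by n - 1)
data_range = max(data) - min(data)    # spread: largest minus smallest
q1, q2, q3 = statistics.quantiles(data, n=4)  # spread: the three quartiles

print(mean, median, variance, data_range, (q1, q2, q3))
```

The second quartile `q2` is the median, so it matches `median` here.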

2
Q

types of data

A

—categorical or numerical

—Other sorts of information – e.g. comments in interview/survey – qualitative

3
Q

what does categorical data consist of

A

—nominal

—ordinal

4
Q

what does numerical data consist of?

A

—discrete (count data) or continuous (also called interval)

5
Q

what is nominal data?

A

—a type of categorical data with no natural order
—e.g. faculty of study, eye colour, job

6
Q

what is ordinal data?

A

—a type of categorical data
—the categories have an order, e.g. rating teaching as poor/fair/good/very good

7
Q

tools for categorical data

A

—Graphs
—Bar charts (different to histograms – you can mix up order of columns and it still makes sense)
—Pie charts (use with caution)
—Numeric
—Mode – most frequently occurring observation
—Frequency of each category
—Examples in Minitab
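As a rough illustration of the numeric tools above, the frequency of each category and the mode can be computed with `collections.Counter`. The eye-colour observations are hypothetical.

```python
from collections import Counter

# Hypothetical eye-colour observations (nominal data, illustrative only)
eye_colour = ["brown", "blue", "brown", "green", "brown", "blue", "hazel"]

frequency = Counter(eye_colour)                 # frequency of each category
mode, mode_count = frequency.most_common(1)[0]  # most frequently occurring category

for category, count in frequency.items():
    print(category, count)
print("mode:", mode)
```

The same counts are what a bar chart would display, one bar per category.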

8
Q

what are different types of modality a histogram can show?

A

—uni-modal, bi-modal, tri-modal, multi-modal

9
Q

bimodal histogram

A
10
Q

how to interpret histogram?

A
—Modality – uni-modal, bi-modal, tri-modal, multi-modal
—Modal class – class with the highest number of observations (“modal class is centred at approximately…”)
—Skewness vs symmetry

—A relative frequency histogram can be used instead: replace the frequency of each class by class frequency / total number of observations
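A minimal sketch of the relative frequency calculation, assuming made-up observations and a class width of 10 (both chosen only for illustration):

```python
# Hypothetical observations and class width (illustrative only)
data = [12, 15, 21, 22, 25, 28, 31, 33, 34, 47]
class_width = 10

# Count observations in each class [lower, lower + class_width)
frequency = {}
for x in data:
    lower = (x // class_width) * class_width
    frequency[lower] = frequency.get(lower, 0) + 1

# Relative frequency = class frequency / total number of observations
total = len(data)
relative_frequency = {
    lower: count / total for lower, count in sorted(frequency.items())
}

# Modal class = class with the highest number of observations
modal_class = max(frequency, key=frequency.get)

print(relative_frequency)
print("modal class starts at", modal_class)
```

The relative frequencies sum to 1, and the shape of the histogram is unchanged; only the vertical scale differs.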

11
Q

what is a disadvantage of using a histogram?

how may this be resolved?

A

—the actual observations are lost when they are grouped into classes

—a stem and leaf display gives similar information to a histogram without losing the actual observations

12
Q

steps to creating a stem and leaf plot

A

—Step 1: split each observation into a stem and a leaf, e.g.
—If observations are (1.2, 1.5, 2.9 ….); stem = unit; leaf = decimal
—If observations are (42.1, 38.4, 53.8….); stem = tens, leaf = units (or unit-decimal)

—Step 2: write stems in left column; put leaves in right column.
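The two steps can be sketched as follows. This is a simplified version that assumes non-negative whole-number observations, with stem = tens and leaf = units; the function name `stem_and_leaf` and the data are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(observations):
    """Return {stem: [leaves]} with stem = tens, leaf = units.

    Assumes non-negative whole-number observations (a simplification).
    """
    plot = defaultdict(list)
    for x in sorted(observations):
        stem, leaf = divmod(x, 10)  # Step 1: split into stem and leaf
        plot[stem].append(leaf)
    return dict(plot)

# Hypothetical observations (illustrative only)
data = [42, 38, 53, 47, 31, 45, 58, 39]

# Step 2: stems in the left column, leaves in the right column
for stem, leaves in sorted(stem_and_leaf(data).items()):
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Every original observation can be read back from the display, for example stem 3 with leaf 8 is 38.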

13
Q

features of stem and leaf plot

A
14
Q

what is a cumulative relative frequency distribution?

A

—Relative frequency distribution histogram – proportion in each class
—Cumulative relative frequency distribution – proportion up to and including that class
—Ogive – graph of cumulative relative frequencies; also called the empirical cumulative distribution function
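A short sketch of how the cumulative relative frequencies behind an ogive could be computed. The class frequencies, keyed by class upper limit, are hypothetical.

```python
from itertools import accumulate

# Hypothetical class frequencies, keyed by class upper limit (illustrative only)
class_frequency = {20: 2, 30: 4, 40: 3, 50: 1}

total = sum(class_frequency.values())
upper_limits = list(class_frequency)
cumulative_counts = list(accumulate(class_frequency.values()))

# Proportion of observations up to and including each class
cumulative_relative = [count / total for count in cumulative_counts]

for upper, proportion in zip(upper_limits, cumulative_relative):
    print(upper, proportion)
```

Plotting `cumulative_relative` against `upper_limits` gives the ogive; it never decreases and ends at 1.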

15
Q

what does a cumulative distribution function look like?

A
16
Q

how to calculate length of whisker?

A

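At most 1.5 × the interquartile range (IQR): each whisker runs from the box to the most extreme observation within 1.5 × IQR of the quartile (Q1 for the lower whisker, Q3 for the upper)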
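A sketch of the 1.5 × IQR rule under one common convention, in which observations beyond the limits are plotted as outliers. The data are made up, and the quartiles use the "inclusive" method of `statistics.quantiles`; other software may use a different quartile definition and so give slightly different limits.

```python
import statistics

# Hypothetical observations with one extreme value (illustrative only)
data = [3, 5, 6, 7, 8, 9, 10, 12, 30]

q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range

# Limits beyond which a point is treated as an outlier
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Whiskers stop at the most extreme observations inside the limits
lower_whisker = min(x for x in data if x >= lower_limit)
upper_whisker = max(x for x in data if x <= upper_limit)
outliers = [x for x in data if x < lower_limit or x > upper_limit]

print(q1, q3, iqr, lower_whisker, upper_whisker, outliers)
```

Here the whiskers end at 3 and 12, and the value 30 is flagged as an outlier rather than stretching the upper whisker.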

17
Q

how to calculate location of percentiles in boxplot?

A
18
Q
A
19
Q

Which measure of centre is best for continuous data?

A

—The mean is the most commonly used, but it is sensitive to extreme values
—If the data are skewed or extreme values are present, the median is better, e.g. real estate prices (we say the median is robust to outliers)
—The mode is generally best for categorical data, e.g. restaurant service quality (ordinal), where the mode is “very good”
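The sensitivity of the mean to extreme values can be seen in a small sketch. The house prices (in thousands) are invented for illustration.

```python
import statistics

# Hypothetical house prices in thousands (illustrative only)
prices = [310, 325, 340, 355, 370]
prices_with_outlier = prices + [2500]  # add one very expensive property

mean_before = statistics.mean(prices)
median_before = statistics.median(prices)
mean_after = statistics.mean(prices_with_outlier)
median_after = statistics.median(prices_with_outlier)

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

One extreme price moves the mean from 340 to 700, while the median only moves from 340 to 347.5.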

20
Q

differences between population and sample

A

—Populations have parameters: true values that describe them. E.g. if we measure every individual, we can calculate the exact mean and exact variance of the population. These are called population parameters.

—From a sample we calculate statistics, which are estimates of the parameters. We use sample statistics to approximate population parameters. E.g. we can estimate the true population mean with the sample mean, and the population standard deviation with the sample standard deviation.
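To make the distinction concrete, the sketch below simulates a hypothetical population, computes its parameters, then estimates them from a random sample. The population values, sample size, and variable names are all assumptions for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical "population": every individual is measured
population = [random.gauss(170, 10) for _ in range(10_000)]
mu = statistics.fmean(population)      # population parameter: mean
sigma = statistics.pstdev(population)  # population parameter: standard deviation (divides by N)

# A sample gives statistics that estimate the parameters
sample = random.sample(population, 50)
x_bar = statistics.fmean(sample)       # sample mean, estimates mu
s = statistics.stdev(sample)           # sample standard deviation (divides by n - 1), estimates sigma

print("parameters:", mu, sigma)
print("estimates: ", x_bar, s)
```

The estimates are close to the parameters but not equal to them, and a different sample would give different estimates.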

21
Q

calculate covariance and correlation

A
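A minimal sketch of the calculation, assuming the usual sample definitions (dividing by n - 1). The function names and the paired data are hypothetical.

```python
import math

def sample_covariance(x, y):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

def sample_correlation(x, y):
    """Sample correlation: covariance divided by the product of standard deviations."""
    s_x = math.sqrt(sample_covariance(x, x))  # covariance of x with itself is its variance
    s_y = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (s_x * s_y)

# Hypothetical paired observations (illustrative only)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

cov_xy = sample_covariance(x, y)
r = sample_correlation(x, y)
print(cov_xy, r)
```

Covariance depends on the units of x and y, whereas correlation is unit-free and always lies between -1 and 1.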
22
Q

correlation formula for population and for sample

A
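The standard textbook definitions are given below. The notation is the usual one and is an assumption here, since the course's own symbols are not shown on the card.

```latex
% Population correlation: population covariance over the product of
% population standard deviations
\rho = \frac{\sigma_{xy}}{\sigma_x \, \sigma_y},
\qquad
\sigma_{xy} = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)

% Sample correlation: sample covariance over the product of
% sample standard deviations
r = \frac{s_{xy}}{s_x \, s_y},
\qquad
s_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
```

Both quantities lie between -1 and 1; r is the sample statistic used to estimate the population parameter ρ.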