STAT MOD 1: Chapter 2 Flashcards

Graphical Methods for Describing Data Distributions

1
Q

Identifying the type of data (know for type of graphical display):

What is univariate?

A

data that consists of observations on a single variable made on individuals

  • One variable collected for each subject

Ex: height

2
Q

Identifying the type of data (know for type of graphical display):

What is bivariate?

A

data that consists of pairs of numbers from two variables for each individual

  • Collected two variables

Ex: height and weight

3
Q

Identifying the type of data (know for type of graphical display):

What is multivariate?

A

data that consist of observations on two or more variables

  • Many variables for each subject

Ex: height, weight, sit-ups, mile time, distance

4
Q

Identifying the type of variable:

What is categorical (qualitative)?

A

categorical or non-numerical responses

Ex: eye color, major, neighborhood; numbers that serve only as labels are also qualitative (room number, section number)

5
Q

Identifying the type of variable:

What is numeric (quantitative)?

A

measurements that take on numerical values

Ex: number of units, hours, minutes

6
Q

What are the types of numeric variables?

A

1) Discrete
2) Continuous

7
Q

Identifying the type of variable:

What is discrete?

A

variable whose possible values are isolated points along a number line; typically counts of items

Ex: stop lights, units, number of jobs

8
Q

Identifying the type of variable:

What is continuous?

A

variable that can take any value in a given interval

Ex: time, height, weight, age

9
Q

Know how to choose appropriate type of display for given data/variable type

A

Refer to graphical display chart with notes

10
Q

What is distribution?

A

describes how often the possible responses occur

  • How often the variable takes on a value
11
Q

What is frequency?

A

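the number of times (count) a category occurs; a frequency distribution is a listing of all categories along with their frequencies (counts)

  • How many individuals fall in each category (like the counts in a pivot table)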
12
Q

What is relative frequency?

A

a listing of all categories along with their relative frequencies, expressed as either
- a proportion (between 0 and 1)
- a percent (between 0 and 100)
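
A minimal Python sketch of how frequencies and relative frequencies relate, assuming the data are a plain list of categorical responses. The function name `frequency_table` and the eye-color sample are made up for illustration and are not from the course notes.

```python
from collections import Counter

def frequency_table(responses):
    """Map each category to (frequency, proportion, percent)."""
    counts = Counter(responses)   # frequency: how many times each category occurs
    n = len(responses)            # size of the data set
    return {
        category: (count, count / n, 100 * count / n)
        for category, count in counts.items()
    }

# Hypothetical sample, invented for this example
eye_colors = ["brown", "blue", "brown", "green", "brown", "blue", "brown", "brown"]
table = frequency_table(eye_colors)

for category, (freq, prop, pct) in table.items():
    print(f"{category:<6} frequency={freq}  proportion={prop:.3f}  percent={pct:.1f}%")
```

The proportions always sum to 1 and the percents to 100, which is a quick check that no category was missed.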

13
Q

What charts do you use for categorical variables?

A

bar charts, comparative bar charts, pie charts

14
Q

How do you get the frequency of a bar chart?

A

if asked for the frequency of a certain category, read the height of its bar (when the vertical axis shows counts); to get the relative frequency, divide that height by the size of the data set

15
Q

What are the characteristics of a pie chart?

A
  • the area of each slice corresponds to the fraction of the data in that category
  • the slices need to add up to 100%
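
As a rough illustration of the two points above, the sketch below converts category counts into slice percentages and slice angles. The angle is an added detail: a slice's area is proportional to its central angle, so a fraction f of the data gets 360 × f degrees. The function name `pie_slices` and the counts of majors are hypothetical.

```python
def pie_slices(counts):
    """Map each category to (percent of the pie, central angle in degrees)."""
    total = sum(counts.values())
    return {
        category: (100 * count / total, 360 * count / total)
        for category, count in counts.items()
    }

# Hypothetical counts, invented for this example
majors = {"biology": 10, "business": 20, "statistics": 10}
slices = pie_slices(majors)

for category, (percent, angle) in slices.items():
    print(f"{category:<10} {percent:5.1f}%  {angle:5.1f} degrees")

# The slices should account for the whole pie
print("total percent:", sum(percent for percent, _ in slices.values()))
```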
16
Q

What graphical displays do you use for small data sets?

A
  • dot plots
  • stem and leaf displays
  • comparative stem and leaf
  • split stem and leaf (used when there are a lot of leaves on a stem)
17
Q

What graphical displays do you use for large data sets?

A
  • histogram
  • comparative histograms
18
Q

What must you be able to interpret from these graphical displays (dotplot, stem and leaf display, histograms)?

A

center, spread, shape, gaps, outliers

19
Q

How do you construct a histogram (general)?

A
  • Create bins
  • Count how many of your data observations fall into each bin
  • Plot the frequencies or relative frequencies on the y-axis of the histogram
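
The steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the bins have equal width, each bin includes its left edge but not its right edge, and the function name `histogram_counts` and the height data are invented for the example.

```python
def histogram_counts(data, start, width, num_bins):
    """Count observations in equal-width bins [start, start + width), and so on."""
    counts = [0] * num_bins
    for x in data:
        index = int((x - start) // width)   # which bin x falls into
        if 0 <= index < num_bins:           # ignore values outside the bins
            counts[index] += 1
    return counts

# Hypothetical heights in inches, invented for this example
heights = [61, 63, 64, 66, 67, 67, 68, 70, 71, 74]

# Step 1: create bins 60-65, 65-70, 70-75; step 2: count observations per bin
counts = histogram_counts(heights, start=60, width=5, num_bins=3)

# Step 3: the y-axis can show either frequencies or relative frequencies
relative = [c / len(heights) for c in counts]

for i, (c, r) in enumerate(zip(counts, relative)):
    low = 60 + 5 * i
    print(f"{low}-{low + 5}: {'*' * c}  frequency={c}  relative frequency={r:.1f}")
```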
20
Q

Shape of distribution:

What is symmetric?

A

distribution looks similar on both sides

  • Bell-shape
21
Q

Shape of distribution:

What is skewed right?

A

more data on the left side of the distribution, so the right tail is longer

  • Positive skew (has longer tail or outliers on right side)
22
Q

Shape of distribution:

What is left skewed?

A

more data on the right side of the distribution, so the left tail is longer

  • Negative skew (has longer tail or outliers on left side)
23
Q

What is mode? What are the types?

A

the most frequent value of a data set

unimodal - shape in which there is one prominent peak

bimodal - shape in which there are two prominent peaks
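
A small Python sketch of both ideas, offered as an approximation rather than a formal definition. `statistics.multimode` (Python 3.8+) returns the most frequent value or values. The helper `count_peaks` is hypothetical and simply counts histogram bins that are taller than both neighbors; judging which peaks are "prominent" on a real plot still takes a look at the graph.

```python
from statistics import multimode

def count_peaks(bin_counts):
    """Count bins strictly taller than both neighbors (edges compare against 0)."""
    padded = [0] + list(bin_counts) + [0]
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    )

# Hypothetical data, invented for this example
values = [2, 3, 3, 4, 3, 5]
modes = multimode(values)                    # most frequent value(s)

one_peak = count_peaks([1, 3, 6, 3, 1])      # unimodal-looking bin counts
two_peaks = count_peaks([2, 6, 3, 1, 4, 7, 2])  # bimodal-looking bin counts

print("mode(s):", modes)
print("peaks:", one_peak, two_peaks)
```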

24
Q

Give examples of variables you would expect to have each shape.

A
  • symmetric (scores)
  • right skewed (income)
  • left skewed (grade)
25
Q

Where is skewness situated?

A

Skewness is in the direction of the longer tail, where outliers fall
26
Q

How do you construct and interpret a scatter/time plot?

A
  • Two numeric variables for one unit (one dot)
  • Be able to recognize trends (going up or down) for a time plot
27
Q

What are some poor practices that produce misleading plots?

A
  • Violating the area principle
  • Using a type of plot that is not appropriate for the data type (refer to chart)
  • Unequal time spacing in time series plots
  • Graphs with broken axes (axes that don’t start at 0)
  • Breaking the vertical axis of a bar chart or histogram (it should never be broken)
28
Q

What is the area principle?

A

the area should be proportional to the frequency, relative frequency, or magnitude of the number being represented

Ex: bars and rectangles are good, whereas pictures such as money bags used to represent money are not
29
Q

What are broken axes?

A

axes that don't start at 0

  • Acceptable for scatter plots, but not for time series plots (they exaggerate the magnitude of change over time)
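
To see why an axis that doesn't start at 0 can mislead, the sketch below compares how tall two bars are drawn relative to each other under different axis starting points. The function name `drawn_height_ratio` and the values 50 and 55 are made up for illustration.

```python
def drawn_height_ratio(small, large, axis_start=0.0):
    """Ratio of the drawn bar heights when the vertical axis begins at axis_start."""
    return (large - axis_start) / (small - axis_start)

# Hypothetical values, invented for this example: 55 is 10% larger than 50
honest = drawn_height_ratio(50, 55)                 # axis starts at 0
broken = drawn_height_ratio(50, 55, axis_start=45)  # axis starts at 45

print(f"axis from 0:  larger bar is drawn {honest:.2f} times as tall")
print(f"axis from 45: larger bar is drawn {broken:.2f} times as tall")
```

With the axis starting at 45, a 10% difference is drawn as a bar twice as tall, so the drawn area no longer matches the magnitude being represented.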