Chapter 2-Summarizing Data Flashcards

1
Q

Non linear

A

Describes a plot whose data points follow a curved trend rather than a straight line

2
Q

Scatter plot

A

A plot that shows each individual observation as a point, typically with one numerical variable plotted against another

3
Q

Dot plot

A

A one variable scatter plot

4
Q

Mean

A

Also commonly referred to as the average. It is a common way to measure the center of a distribution. Compute it by adding up all the individual data points and dividing by the number of observations.
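The computation described on this card can be sketched in a few lines of Python. The function name and the sample values are made up for illustration.

```python
def mean(values):
    """Arithmetic mean: sum of the observations divided by their count."""
    if not values:
        raise ValueError("mean requires at least one observation")
    return sum(values) / len(values)


data = [2, 4, 4, 5, 10]  # made-up observations
print(mean(data))  # 25 / 5 = 5.0
```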

5
Q

Histogram

A

A type of plot that groups numerical data into bins and shows the frequency of observations within each bin. Provides a view of data density. Useful for showing the shape of the data distribution.
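As a rough illustration of the binning idea, the sketch below counts observations per bin and prints a text histogram. The bin width, starting value, and data are assumptions chosen for the example, not values from the chapter.

```python
def histogram_counts(values, bin_width, start):
    """Count observations in bins [start + k*w, start + (k+1)*w)."""
    counts = {}
    for v in values:
        k = int((v - start) // bin_width)   # index of the bin holding v
        left = start + k * bin_width        # left edge of that bin
        counts[left] = counts.get(left, 0) + 1
    return dict(sorted(counts.items()))


data = [1, 2, 2, 3, 7, 8, 12, 13, 14, 14]  # made-up observations
counts = histogram_counts(data, bin_width=5, start=0)

# One row per bin; the number of stars is the frequency (data density).
for left, n in counts.items():
    print(f"{left:>2}-{left + 5:<2} | {'*' * n}")
```

Two tall bars separated by a shorter one, as in this made-up data, is the kind of shape a histogram makes visible.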

6
Q

Data density

A

Frequency of data in a given bin (category). This is what the bar heights in a histogram show.

7
Q

Right or Left Skewed

A

When the data trail off (diminish) toward the right, with a long, thin tail on the right, the distribution is right skewed.

When the data trail off toward the left, with a long, thin tail on the left, the distribution is left skewed.

8
Q

Mode

A

A prominent peak in the distribution. A distribution can be unimodal (1 peak), bimodal (2 peaks), or multimodal (more than 2 peaks).

9
Q

Deviation

A

How far an observation/data point is from the mean

10
Q

Standard deviation

A

How far the typical data point/observation is from the mean. It differs from a deviation in that it describes the “typical” observation, while a deviation describes an individual datum. Computed as the square root of the variance.

About 70% of the data usually fall within 1 standard deviation of the mean and about 95% within 2 standard deviations; this is a rule of thumb, not a hard rule.

11
Q

Variance

A

Average squared distance from the mean.

(Sum of all (deviation)^2) / (# observations - 1)
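The deviation, variance, and standard deviation cards fit together as one calculation. A minimal sketch, assuming the sample (n - 1) form of the variance given on this card; the function names and data are illustrative only.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    m = sum(values) / n
    deviations = [x - m for x in values]  # one deviation per observation
    return sum(d ** 2 for d in deviations) / (n - 1)


def sample_sd(values):
    """Standard deviation: square root of the variance."""
    return math.sqrt(sample_variance(values))


data = [2, 4, 4, 5, 10]       # made-up observations, mean = 5
print(sample_variance(data))  # (9 + 1 + 1 + 0 + 25) / 4 = 9.0
print(sample_sd(data))        # sqrt(9.0) = 3.0
```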

12
Q

Box plot

A

Summarizes a data set using 5 statistics (median, first quartile, third quartile, and the ends of the lower and upper whiskers) while also plotting unusual observations (outliers). The box itself spans the interquartile range.

13
Q

Median

A

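Splits the data in half: 50% of the observations fall below it and 50% above. Often confused with the mean. With the data sorted, it is the value of the middle observation (or the average of the two middle observations when the count is even).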
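Since the median is often confused with the mean, a short sketch may help show the difference. The data are made up, and averaging the two middle values for an even count is the usual convention, assumed here.

```python
def median(values):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    if not values:
        raise ValueError("median requires at least one observation")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


data = [2, 4, 4, 5, 10]       # made-up observations with one large value
print(median(data))           # 4: the middle of the sorted list
print(sum(data) / len(data))  # 5.0: the mean is pulled up by the 10
print(median([2, 4, 5, 10]))  # 4.5: average of 4 and 5
```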

14
Q

Interquartile range

A

The range spanned by the middle 50% of the data/observations. IQR for short.

IQR = Q3 - Q1
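A small sketch of the IQR formula using Python's standard `statistics.quantiles`. Quartile conventions differ between textbooks and software; the `method="inclusive"` choice and the data below are assumptions made for this example.

```python
import statistics

data = [1, 3, 5, 7, 9, 11, 13, 15, 17]  # made-up observations

# n=4 splits the data into quarters and returns the three cut points.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(q1, q2, q3)  # 5.0 9.0 13.0
print(iqr)         # 8.0
```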

15
Q

First quartile

A

The 25th percentile: 25% of the observations fall below this value

16
Q

Third Quartile

A

The 75th percentile: 75% of the observations fall below this value and the last 25% fall above it

17
Q

Whiskers

A

Lines extending from the box that capture the data outside the box, reaching no farther than 1.5 x IQR beyond the quartiles

18
Q

Outliers

A

Data beyond the whiskers, that is, more than 1.5 x IQR beyond the quartiles

Purposes of looking for outliers:

  • ID strong skew in the distribution
  • ID possible data collection or data entry errors
  • insight into interesting properties of the data
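The whisker and outlier cards can be sketched together. The quartiles are passed in by the caller because quartile conventions vary; the function names, data, and quartile values below are assumptions for illustration.

```python
def box_plot_fences(q1, q3, k=1.5):
    """Farthest reach of the whiskers: k * IQR beyond each quartile."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def whiskers_and_outliers(values, q1, q3):
    """Whisker ends are the most extreme observations inside the fences."""
    low, high = box_plot_fences(q1, q3)
    inside = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return (min(inside), max(inside)), outliers


data = [1, 3, 5, 7, 9, 11, 13, 15, 40]  # made-up observations
whiskers, outliers = whiskers_and_outliers(data, q1=5, q3=13)

print(box_plot_fences(5, 13))  # (-7.0, 25.0)
print(whiskers)                # (1, 15)
print(outliers)                # [40]
```

Note that the whiskers stop at actual observations (1 and 15), not at the fences themselves.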
19
Q

Transformations

A

Rescaling the data using a function (for example, a logarithm) so that statistical models are easier to fit, without destroying the statistical integrity of the data
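One common transformation is the logarithm, which pulls in very large values. A minimal sketch, with the base-10 log and the data chosen purely for illustration:

```python
import math


def log10_transform(values):
    """Apply a base-10 log to every observation (defined only for positive values)."""
    if any(v <= 0 for v in values):
        raise ValueError("log transform requires strictly positive values")
    return [math.log10(v) for v in values]


populations = [1_000, 10_000, 100_000, 1_000_000]  # made-up, strongly right skewed
logged = log10_transform(populations)
print(logged)  # roughly 3, 4, 5, 6: evenly spaced after the transform
```

The ordering of the observations is unchanged, which is part of why such a rescaling does not damage the data.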

20
Q

Intensity Map

A

A map with colors to show variations of intensity

21
Q

Contingency Table

A

Summarizes data for two (categorical) variables. Each value in the table represents the number of times that particular combination of variable outcomes occurred.
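Counting combinations is all a contingency table does, which the sketch below shows with `collections.Counter`. The two variables and their values are invented for the example.

```python
from collections import Counter

# Made-up observations: (email type, message length) pairs.
records = [
    ("spam", "short"), ("spam", "long"), ("not spam", "short"),
    ("not spam", "short"), ("not spam", "long"), ("spam", "short"),
]


def contingency_table(pairs):
    """Count how often each combination of the two variables occurs."""
    return Counter(pairs)


table = contingency_table(records)
rows = sorted({r for r, _ in records})
cols = sorted({c for _, c in records})

# Print one row per value of the first variable, one column per value of the second.
print("columns:", cols)
for r in rows:
    print(r, [table[(r, c)] for c in cols])
```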

22
Q

Stacked bar plot

A

Graphical display of a contingency table.

Most useful when one variable is explanatory and the other is the response.

23
Q

Pie chart

A

A circular chart that shows the same category proportions a bar plot would. Useful for giving a high-level overview.

24
Q

Side-by-side box plot

A

Traditional tool for comparing across groups

25
Q

Null hypothesis (H-sub0)

A

Represents the status quo: a general statement or default position that there is no difference between two measured phenomena, or that two samples derive from the same general population

Ex: H-sub0 : p >= 0.6

26
Q

Alternative Hypothesis (H-subA)

A

Position which states that something is happening; a new theory is preferred over the old one.

Ex: H-subA : p < 0.6