Unit 1 - Exploring One-Variable Data Flashcards by Gary Zheng

How can statistics be used to help answer important, real-world questions based on data that vary?

Collect data
Analyze data
Interpret results

Individuals

May be people, animals, or things described by a set of data

Variable

Characteristic that changes from one individual to another

Categorical variable

Takes values that are category names or labels

Quantitative variable

Takes numerical values for a measured or counted quantity

Not all variables that take numerical values are

quantitative

It is possible to make a quantitative variable categorical by

grouping values
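
A minimal sketch of grouping, assuming made-up ages and hypothetical cutoffs (the `age_group` helper and its labels are illustrative and not part of the flashcards):

```python
def age_group(age):
    """Map a numeric age to a category label (hypothetical cutoffs)."""
    if age < 18:
        return "minor"
    if age < 65:
        return "adult"
    return "senior"


ages = [12, 17, 18, 34, 64, 65, 80]        # made-up quantitative data
groups = [age_group(a) for a in ages]      # now a categorical variable
print(groups)
```

The grouped values are labels, so they would be summarized with counts or proportions rather than a mean.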

How can we represent categorical data in tabular form?

With a frequency table or relative frequency table

How do these tabular representations help us describe categorical data?

Counts & relative frequencies of categorical data reveal information that can be used to justify claims about data in context

Frequency table

Gives the number of individuals (the count) in each category

Relative frequency table

Gives the proportion or percent of individuals (cases) in each category
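
As an illustration, both tables can be built from a list of category labels. This is a sketch with made-up data; the `colors` list is an assumption used only for the example.

```python
from collections import Counter

colors = ["red", "blue", "red", "green", "blue", "red", "red", "blue"]  # made-up data

counts = Counter(colors)                       # frequency table: category -> count
total = sum(counts.values())
relative = {category: n / total for category, n in counts.items()}  # proportions

# Print one row per category: label, count, percent
for category, n in counts.most_common():
    print(f"{category:<6} {n:>3} {relative[category]:>7.1%}")
```

The relative frequencies always add up to 1 (100%), which is a quick check on the table.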

How to represent categorical data

Bar chart
Pie chart
Frequency/relative frequency table

Making bar charts for categorical data

Label axes
Scale axes
Draw bars

Label axes

Variable name on horizontal axis

Frequency/relative frequency on vertical axis

Scale axes

Category labels spread out along horizontal axis

Start scaling vertical axis at 0 and go up in equal increments until you equal or exceed maximum frequency or relative frequency

Draw bars

Make the bars equal in width and leave gaps between them

Heights of the bars represent the category frequencies or relative frequencies

Pie charts

Include legend or key to indicate what each part means

Relative frequencies can make it easier to compare

distributions of data sets with different numbers of individuals

Charts can be based on whichever variable is

stronger or more supportive of the claim being made

Discrete variable

Usually involves counting

Variable that can take on a countable number of values, with gaps between them

Example of discrete variable

Number of siblings

Continuous variable

Usually involves measuring

Variable that can take on infinitely many values within an interval, with no gaps between them

Example of continuous variable

Height

Dotplot

Shows every single point in a data set
Easy to see shape of distribution
May be difficult to make for large data sets

Stem-and-leaf plot

Shows all points in a data set
Easy to see shape of distribution

Histogram

Easier to make for large data sets
Easy to see shape of distribution
Individual data values are not shown

4 factors to consider when describing distribution

Shape
Unusual features
Center
Variability

Shape

Symmetric -> about the same amount of data on both sides
Skewed left -> more data on the high end (tail toward low values)
Skewed right -> more data on the low end (tail toward high values)
Unimodal -> one peak
Bimodal -> two peaks
Uniform -> about the same amount of data across all values

Unusual Features

Outliers, gaps, or clusters

Center

Which value in the distribution best describes the typical response?

Variability

Are the values in the distribution packed close together or spread far apart?

Mean

Sum of all data values divided by number of values

Median

Middle value of an ordered data set (odd number of values)
Average of the two middle values of an ordered data set (even number of values)
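
The two definitions of center can be sketched directly in code. The helper names `mean_of` and `median_of` and the data are made up for illustration; Python's built-in `statistics.mean` and `statistics.median` do the same job.

```python
def mean_of(values):
    """Sum of all data values divided by the number of values."""
    return sum(values) / len(values)


def median_of(values):
    """Middle value of the ordered data, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                       # odd number of values
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even number of values


odd_count = [3, 1, 7, 5, 9]    # made-up data, 5 values
even_count = [3, 1, 7, 5]      # made-up data, 4 values

print(mean_of(odd_count), median_of(odd_count))
print(mean_of(even_count), median_of(even_count))
```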

First quartile (Q1) is the median of the lower half of the ordered data set

Third quartile (Q3) is the median of the upper half of the ordered data set

Range

Difference between the max and min

Interquartile Range

Difference between third and first quartiles
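
A short sketch of the quartiles, range, and IQR, assuming the "median of each half" convention from the cards, with the middle value left out of both halves when the number of values is odd. Calculators and software sometimes use other quartile conventions, so results can differ slightly. The `quartiles` helper and the data are made up for illustration.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumes at least two values. When the count is odd, the middle
    value is excluded from both halves.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[-half:]
    return median(lower), median(upper)


data = [2, 4, 4, 5, 7, 8, 10, 12, 15]   # made-up data, 9 values

q1, q3 = quartiles(data)
data_range = max(data) - min(data)      # difference between max and min
iqr = q3 - q1                           # difference between Q3 and Q1

print(q1, q3, data_range, iqr)
```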

Standard Deviation

Typical distance of the values from the mean
$s_x = \sqrt{\dfrac{1}{n-1}\sum (x_i - \bar{x})^2}$

Square of standard deviation is

variance
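
The sample standard deviation formula can be followed step by step and checked against Python's `statistics.stdev` and `statistics.variance`, which both use the same n - 1 divisor. The data are made up for illustration.

```python
from math import sqrt
from statistics import stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up data

n = len(data)
x_bar = sum(data) / n                                   # mean
sum_sq_dev = sum((x - x_bar) ** 2 for x in data)        # sum of squared deviations
s_x = sqrt(sum_sq_dev / (n - 1))                        # sample standard deviation
var_x = s_x ** 2                                        # variance is the square of the SD

print(round(s_x, 3), round(var_x, 3))
print(round(stdev(data), 3), round(variance(data), 3))  # library versions for comparison
```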

What summary stats can be used to describe the center of a distribution of quantitative data?

Mean
Median
(Q1 and Q3 describe position within the distribution rather than its center)

What summary stats can be used to describe the variability of a distribution of quantitative data?

Range
IQR
Standard deviation

What is the 5 number summary?

Minimum
Q1
Median
Q3
Maximum

How do we use the 5 number summary to make a boxplot?

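Draw a box from Q1 to Q3 with a line at the median, then extend whiskers to the minimum and maximum (or to the most extreme values that are not outliers); this splits the data into quartiles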

In a skewed right distribution, how do the mean and median compare?

Usually mean > median

In a skewed left distribution, how do the mean and median compare?

Usually mean < median

In a symmetric distribution, how do the mean and median compare?

Mean and median are about equal
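
A quick numerical check of these comparisons, using small made-up data sets in which the tail pulls the mean toward it while the median stays near the bulk of the data:

```python
from statistics import mean, median

right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]         # made-up data, long tail toward high values
left_skewed = [1, 17, 18, 18, 19, 19, 19, 20]    # made-up data, long tail toward low values
symmetric = [1, 2, 3, 4, 5]                      # made-up data, balanced around 3

for name, data in [("right", right_skewed), ("left", left_skewed), ("symmetric", symmetric)]:
    print(name, mean(data), median(data))
```

These are single examples, not a proof; the pattern is typical of skewed data rather than guaranteed for every data set.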

Boxplot

Shows the 5 number summary and outliers
Splits the data into quartiles
Does not show every individual value
Can hide certain features of the shape of a distribution

How can we determine if a value in a data set is an outlier?

More than 1.5 IQR below Q1 or more than 1.5 IQR above Q3
2 or more standard deviations away from the mean
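
Both outlier rules can be sketched as a pair of cutoffs ("fences"). The helper names and data below are made up for illustration, and the quartiles are assumed to be the medians of the lower and upper halves of the ordered data.

```python
from statistics import mean, median, stdev


def iqr_fences(values):
    """Cutoffs for the 1.5 x IQR rule: (Q1 - 1.5 IQR, Q3 + 1.5 IQR)."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[-half:])
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def sd_fences(values):
    """Cutoffs for the 2 SD rule: (mean - 2 SD, mean + 2 SD)."""
    m, s = mean(values), stdev(values)
    return m - 2 * s, m + 2 * s


data = [10, 12, 13, 14, 15, 15, 16, 17, 18, 40]   # made-up data

low, high = iqr_fences(data)
iqr_outliers = [x for x in data if x < low or x > high]

sd_low, sd_high = sd_fences(data)
sd_outliers = [x for x in data if x < sd_low or x > sd_high]

print(low, high, iqr_outliers)
print(round(sd_low, 2), round(sd_high, 2), sd_outliers)
```

The two rules agree on this data set, but they can disagree on others because the mean and SD are themselves affected by extreme values.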

Which summary statistics are resistant and which are not?

Resistant - median, IQR
Nonresistant - mean, SD, range

Which measures of center & variability are best for describing a skewed distribution?

Median and IQR

Which measures of center & variability are best for describing a symmetric distribution?

Mean and SD

Low outlier

< Q1 - 1.5 IQR OR < mean - 2SD

High outlier

> Q3 + 1.5 IQR OR > mean + 2SD

What are important characteristics to discuss when comparing distributions of quantitative data?

Think SOCS:
Shape
Outliers / unusual features
Center
Spread / variability

What is needed for a complete response when comparing distributions of quantitative data?

Address the 4 important characteristics
Use comparative words
Include context

Percentile

Percent of data values less than or equal to a given value

How to interpret the percentile

"The value of _____ is at the pth percentile. About (p) percent of the values are less than or equal to _____"

Standardized score

Calculated as (data value - mean) / standard deviation
$z = \dfrac{x_i - \bar{x}}{s_x}$
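
A minimal sketch of the calculation, assuming the mean and the sample standard deviation are computed from the data themselves (use `statistics.pstdev` instead if the data are a whole population). The `z_score` helper and the data are made up for illustration.

```python
from statistics import mean, stdev


def z_score(value, values):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean(values)) / stdev(values)


data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up data with mean 5

for value in (2, 5, 9):
    print(value, round(z_score(value, data), 2))
```

A value equal to the mean has a z-score of 0, values above the mean have positive z-scores, and values below have negative ones.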

How to interpret the z-score

"The value of _____ is (z-score) standard deviations above or below the mean."

Percentiles and z-scores can be calculated for

Distributions with any shape

If a number repeats, use the last value of the repeated number to

calculate the percentile. Example: for 2 2 2 2 2 3 4, use the 5th "2", so 5 of the 7 values (about 71%) are less than or equal to 2
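
The "less than or equal to" definition handles repeated values automatically, as this sketch shows. The `percentile_of` helper is a made-up name for illustration.

```python
def percentile_of(value, values):
    """Percent of data values less than or equal to the given value."""
    at_or_below = sum(1 for x in values if x <= value)
    return 100 * at_or_below / len(values)


data = [2, 2, 2, 2, 2, 3, 4]   # the example from the card

for value in (2, 3, 4):
    print(value, round(percentile_of(value, data), 1))
```

All five 2s count toward the total, which matches counting up to the last repeated value.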

Normal distribution

Mound-shaped and symmetric
Completely determined by its mean and standard deviation

Many quantitative variables in the real world can be modeled by

a normal distribution

In a normal distribution, the percent of data within 1 SD of the mean is about

68%

In a normal distribution, the percent of data within 2 SD of the mean is about

95%

In a normal distribution, the percent of data within 3 SD of the mean is about

99.7%

Empirical rule

68-95-99.7
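
The three percentages can be checked with Python's `statistics.NormalDist`, used here as a software stand-in for a normal table. This is a sketch for a standard normal distribution; the rule gives rounded values.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

within = {}
for k in (1, 2, 3):
    # Area between -k and +k standard deviations from the mean
    within[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within {k} SD: {within[k]:.1%}")
```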

How can we use the z-score to find the percent of data values in a given interval for a normal distribution?

Calculate a z-score and then use Table A

How can we use the z-score to find the percent of data values in a given interval for a normal distribution?

Left: area from Table A
Right: 1 - area from Table A
Between: subtract the two areas from Table A
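
The three cases can be sketched in code, with `statistics.NormalDist().cdf` playing the role of Table A (it returns the area to the left of a z-score). The mean of 100, standard deviation of 15, and the boundaries 85 and 130 are hypothetical values chosen only for the example.

```python
from statistics import NormalDist

mu, sigma = 100, 15            # hypothetical normal distribution
table_a = NormalDist()         # standard normal: cdf(z) is the area to the left of z

z_low = (85 - mu) / sigma      # z-score for 85
z_high = (130 - mu) / sigma    # z-score for 130

left = table_a.cdf(z_low)                          # percent of values below 85
right = 1 - table_a.cdf(z_high)                    # percent of values above 130
between = table_a.cdf(z_high) - table_a.cdf(z_low) # percent between 85 and 130

print(f"{left:.4f} {right:.4f} {between:.4f}")
```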

How do we find a value, given an area from a normal distribution?

Use Table A to find the z-score
Set up the z-score equation and solve for the value
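
Working backward can be sketched the same way, with `NormalDist().inv_cdf` standing in for a reverse lookup in Table A. The mean of 100, standard deviation of 15, and area of 0.90 are hypothetical values chosen only for the example.

```python
from statistics import NormalDist

mu, sigma = 100, 15        # hypothetical normal distribution
area_to_left = 0.90        # hypothetical area (the 90th percentile)

z = NormalDist().inv_cdf(area_to_left)   # z-score with that area to its left
value = mu + z * sigma                   # solve z = (value - mu) / sigma for value

print(round(z, 2), round(value, 1))
```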

(70 cards)