Exam 1 (Modules 1-3) Flashcards by Brittany Harden

What should be avoided in constructing “good” graphs?

excess white space, clutter on the graph, and 3D effects

Determine the five-number summary

(put data set in ascending order): minimum, Q1, Median (Q2), Q3, maximum
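
A minimal sketch of the five-number summary in Python. It assumes the "median of each half" convention for quartiles (the overall median is left out of both halves when n is odd); other textbooks and libraries use slightly different quartile rules. The function name and the sample data are illustrative only.

```python
from statistics import median

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for at least two values."""
    xs = sorted(data)                 # put the data set in ascending order
    n = len(xs)
    half = n // 2
    lower = xs[:half]                 # values below the median position
    # When n is odd, skip the middle value so it belongs to neither half.
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

summary = five_number_summary([7, 1, 3, 9, 5, 11, 13])
print(summary)
```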

Define “statistics”

Science of collecting, organizing, summarizing, and analyzing information. To describe and understand sources of variation in data.

Define “the lurking variable”

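A variable that is not accounted for in a study but affects the variables being studied, so an observed association may not be causal. “Correlation does not equal causation!”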

Define “statistic”

numerical summary of a SAMPLE (Roman letters)

Define “descriptive statistics”

organizing and summarizing data (numerical summaries, graphs, tables)

Define “inferential statistics”

take result from a sample, extend it to the population, and measure the reliability of the result

Define “parameter”

numerical summary of a POPULATION (Greek letters)

Discrete variable

quantitative variable that has either a finite number of possible values OR a countable number of possible values. *Count to get the value. EX: number of pets, number of college credits, number of seats in an auditorium

Continuous variable

quantitative variable that has an infinite number of possible values that are not countable. *Measure to get the value. EX: distance, total rainfall, age, data use on a cell phone per month

The Process of Statistics

1) Identify the research objective (what questions need to be answered?). 2) Formulate the research question (with at least 1 variable). 3) Collect the data needed to answer the question(s). 4) Describe the data. 5) Perform inference.

Define “statistical thinking”

using statistics to analyze and critique information you come across, in order to be an informed consumer of information

Qualitative variable

contains a classification system for its variable values. May be text or numeric. EX: gender, zip code, nationality, phone number, numbers on team shirts

Quantitative variable

the variable values are a numerical range that can be added or subtracted to provide meaningful results. Equal interval magnitude scale. Can be discrete OR continuous. EX: height, weight

Frequency distribution

lists each category of data and the number of occurrences in each category of data. Frequency column = number of observations.

Relative frequency

proportion/percent of observations within a category. RF = frequency / number of observations
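
As a rough illustration, frequencies and relative frequencies can be tallied with the standard library. The blood-type data and the function name are made up for the example.

```python
from collections import Counter

def relative_frequencies(observations):
    """Map each category to frequency / number of observations."""
    counts = Counter(observations)    # frequency of each category
    n = len(observations)             # total number of observations
    return {category: count / n for category, count in counts.items()}

blood_types = ["A", "B", "A", "O", "A", "B", "AB", "O"]
rf = relative_frequencies(blood_types)
print(rf)
```

The relative frequencies should always sum to 1 (up to floating-point rounding).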

Pareto chart

bar graph whose bars are drawn in decreasing order of frequency or relative frequency
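
A small sketch of the ordering step behind a Pareto chart: count each category, then list the categories from most to least frequent. Only the ordering is shown, not the plotting, and the defect data are hypothetical.

```python
from collections import Counter

def pareto_order(observations):
    """Return (category, frequency) pairs in decreasing order of frequency."""
    return Counter(observations).most_common()

defects = ["scratch", "dent", "dent", "crack", "dent", "scratch"]
bars = pareto_order(defects)
print(bars)
```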

Classes

categories in which data are grouped (e.g., 25-34, 35-44). Class width = difference between consecutive lower class limits.

Class width value (CWV)

CWV = (largest data value - smallest data value) / number of classes (typically between 5 and 20)
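
A quick sketch of the class width calculation. Rounding the result up to a whole number is an assumption added here (a common convention, not stated on the card), and the data are invented.

```python
import math

def class_width(data, num_classes):
    """(largest value - smallest value) / number of classes, rounded up."""
    raw_width = (max(data) - min(data)) / num_classes
    return math.ceil(raw_width)       # assumption: round up to a whole number

ages = [12, 47, 23, 35, 50]
width = class_width(ages, 5)          # (50 - 12) / 5 = 7.6, rounded up
print(width)
```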

Describe what can make a graph misleading or deceptive

scale of the graph, inconsistent scale, misplaced origin (aka not starting at 0), use of 3D effects

What makes a “good” graph?

Not too much white space, avoid “prettifying,” avoid 3D effects

3 characteristics of distribution

shape (bell-shaped, skewed), center (average value), spread (how far data goes from average value)

Population arithmetic mean

(μ, "mu"; N = size of population). μ = (x1 + x2 + … + xN) / N

Sample arithmetic mean

(x-bar; n = size of sample). x-bar = (x1 + x2 + … + xn) / n

Median ("typical value")

(n = number of observations; sort the data in ascending order first). If n is odd, M = the value at position (n + 1) / 2 ||| If n is even, M = the mean of the values at positions n/2 and n/2 + 1
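
The formulas on this card give positions in the sorted data, not the median itself. A minimal sketch that turns those positions into a value (the function name is illustrative; Python lists are 0-based, so each 1-based position is shifted down by one):

```python
def median_by_position(data):
    """Median found from the 1-based positions (n + 1)/2 or n/2 and n/2 + 1."""
    xs = sorted(data)                 # positions only make sense on sorted data
    n = len(xs)
    if n % 2 == 1:
        return xs[(n + 1) // 2 - 1]   # value at position (n + 1) / 2
    # n is even: average the values at positions n/2 and n/2 + 1
    return (xs[n // 2 - 1] + xs[n // 2]) / 2

print(median_by_position([5, 1, 3]))
print(median_by_position([4, 1, 3, 2]))
```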

Resistant

numerical summary of data is resistant if extreme values (very large or very small) relative to the data do not affect its value substantially. EX: median, quartiles, IQR = resistant, mean = NOT resistant
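
To see resistance in action, compare the mean and median of a small made-up data set before and after adding one extreme value:

```python
from statistics import mean, median

data = [2, 3, 4, 5, 6]
with_outlier = data + [100]           # add one very large value

mean_before, median_before = mean(data), median(data)
mean_after, median_after = mean(with_outlier), median(with_outlier)

# The mean jumps from 4 to 20, while the median only moves from 4 to 4.5.
print(mean_before, mean_after)
print(median_before, median_after)
```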

Mode

most frequent observation of the variable that occurs in the data set. "No mode," "bimodal," or "multimodal" (not usually reported). Only measure of central tendency that can be determined for nominal data (e.g., location of injuries)
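
A short sketch using `statistics.multimode` (Python 3.8+), which works on nominal data and returns every most-frequent value. The injury data are hypothetical. Note that when every value occurs exactly once, `multimode` returns all of the values, whereas the card's convention would report "no mode".

```python
from statistics import multimode

injury_sites = ["knee", "ankle", "knee", "wrist", "ankle", "shoulder"]
modes = multimode(injury_sites)       # all values tied for the highest count

print(modes)                          # two modes here, so the data are bimodal
```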

Define "dispersion"

the degree to which the data are spread out

Range

the difference between the largest and the smallest data value. Simplest measure of dispersion. NOT RESISTANT

Standard deviation

Typical spread from the mean. The farther an observation is from the mean, the larger the absolute value of its deviation. xi - μ = deviation about the mean for the ith value of a population. xi - x-bar = deviation about the mean for the ith value of a sample.

Population standard deviation

(σ) = square root of { [ (sum of xi**2) - (sum of xi)**2 / N ] / N }

Sample standard deviation

(s) = square root of { [ (sum of xi**2) - (sum of xi)**2 / n ] / (n - 1) }
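
A minimal sketch of the two computational formulas, checked against the standard library's `pstdev` and `stdev`. The function names and data are illustrative; the only difference between the two is the final divisor (N for a population, n - 1 for a sample).

```python
import math
from statistics import pstdev, stdev

def population_sd(xs):
    """sqrt of [sum(x^2) - (sum x)^2 / N] / N."""
    n = len(xs)
    return math.sqrt((sum(x**2 for x in xs) - sum(xs)**2 / n) / n)

def sample_sd(xs):
    """sqrt of [sum(x^2) - (sum x)^2 / n] / (n - 1)."""
    n = len(xs)
    return math.sqrt((sum(x**2 for x in xs) - sum(xs)**2 / n) / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_sd(data), pstdev(data))
print(sample_sd(data), stdev(data))
```

Squaring either result gives the corresponding variance.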

Variance

SQUARE of the standard deviation (so the answer BEFORE you take the square root for the standard deviation formula)

the Empirical Rule

*bell-shaped distributions only. Approximately 68% of the data lie within 1 standard deviation of the mean, 95% within 2 standard deviations, and 99.7% within 3 standard deviations.
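
A small sketch that turns the rule into intervals around the mean. The mean of 100 and standard deviation of 15 are hypothetical values chosen for the example, and the percentages are approximate.

```python
def empirical_rule_intervals(mean, sd):
    """Map each approximate percentage to its (low, high) interval."""
    rule = ((1, 68), (2, 95), (3, 99.7))   # (standard deviations, percent)
    return {pct: (mean - k * sd, mean + k * sd) for k, pct in rule}

intervals = empirical_rule_intervals(100, 15)
for pct, (low, high) in intervals.items():
    print(f"about {pct}% of values fall between {low} and {high}")
```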

Chebyshev's Inequality

*any shape of distribution. AT LEAST (1 - 1/k**2) x 100% of the observations lie within k standard deviations of the mean, where k > 1.
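
A quick sketch of the bound as a function of k (the function name is illustrative):

```python
def chebyshev_min_percent(k):
    """Minimum percent of observations within k standard deviations."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return (1 - 1 / k**2) * 100

print(chebyshev_min_percent(2))       # at least 75% within 2 standard deviations
print(chebyshev_min_percent(3))       # at least about 88.9% within 3
```

For bell-shaped data these bounds are much looser than the Empirical Rule's 95% and 99.7%, which is the price of applying to any shape.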

The variance of a population is the arithmetic average of the squared deviations about the population mean. (T/F)

TRUE

z-score

represents the distance that a data value is from the mean in terms of the number of standard deviations. Can be positive, negative, or zero. Provides a way to compare apples to oranges.

z-score formulas (population & sample)

population: z = (x - μ) / σ ||| sample: z = (x - x-bar) / s
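
A minimal sketch of the "apples to oranges" comparison. The same function serves both formulas, since only the source of the mean and standard deviation changes. The exam scores, means, and standard deviations are hypothetical.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

z_exam_a = z_score(80, 70, 5)         # score 80 on an exam with mean 70, sd 5
z_exam_b = z_score(85, 75, 10)        # score 85 on an exam with mean 75, sd 10

# The lower raw score is the stronger result relative to its own exam.
print(z_exam_a, z_exam_b)
```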

five-number summary

minimum, Q1, M (Q2), Q3, maximum

find lower and upper fences

lower = Q1 - 1.5(IQR) ||| upper = Q3 + 1.5(IQR)
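
A short sketch of the fences, taking Q1 and Q3 as given and flagging values that fall outside them. The function names and data are illustrative.

```python
def fences(q1, q3):
    """Return (lower fence, upper fence) from the first and third quartiles."""
    iqr = q3 - q1                     # interquartile range
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def outliers(data, q1, q3):
    """Values below the lower fence or above the upper fence."""
    lower, upper = fences(q1, q3)
    return [x for x in data if x < lower or x > upper]

data = [1, 3, 5, 7, 9, 11, 40]        # Q1 = 3 and Q3 = 11 for this data set
print(fences(3, 11))
print(outliers(data, 3, 11))
```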

Which variable has more dispersion? Why?

Variable y, because the interquartile range of variable y is larger than that of variable x.

(41 cards)