Unit 1: Exploring One-Variable Data Flashcards

1
Q

Categorical Variable/Qualitative Data

A

A variable that takes on values that are category names or group labels. (Think: WORDS)
(EX: Dominant hand, name, college degree)

2
Q

Quantitative Variable/Data

A

A variable that takes on numerical values for a measured or counted quantity. (Think: NUMBERS)
(EX: Age, height, count)
Can be discrete or continuous

3
Q

Frequency table

A

Gives the number (count) of cases in each category

4
Q

Relative frequency table

A

Gives the proportion (or percentage) of cases in each category
[Note: percentage, relative frequency, and rates provide the same information as proportions]
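A minimal Python sketch of both tables, assuming the raw data is a list of category labels (the dominant-hand data below is hypothetical). Each relative frequency is just the count divided by the total number of cases, so the proportions sum to 1.

```python
from collections import Counter

def frequency_table(values):
    """Return {category: count} for a list of category labels."""
    return dict(Counter(values))

def relative_frequency_table(values):
    """Return {category: proportion}; the proportions sum to 1."""
    n = len(values)
    return {cat: count / n for cat, count in Counter(values).items()}

# Hypothetical data: dominant hand of 8 students
hands = ["right", "right", "left", "right", "right", "left", "right", "right"]
print(frequency_table(hands))           # {'right': 6, 'left': 2}
print(relative_frequency_table(hands))  # {'right': 0.75, 'left': 0.25}
```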

5
Q

Bar Chart/Graph

A

A graph to display counts or proportions for a categorical variable only

6
Q

Pie Chart

A

A circular chart that displays each category's proportion of the whole for a categorical variable

7
Q

Discrete Quantitative Variable

A

A variable that can take on a countable (finite or countably infinite) number of values.

8
Q

Continuous Quantitative Variable

A

A variable that can take on any value in an interval of numbers, so there are infinitely many possible values and they cannot be counted. (EX: height, weight, time)

9
Q

Dot Plots

A

Each data value is shown as a dot above a number line; best for small sets of discrete quantitative data

10
Q

Stem-and-Leaf Plots

A

Stem: all digits except the last, written on the left of the plot (EX: stem of 34 is 3)
Leaf: the final digit, written on the right of the plot (EX: leaf of 34 is 4)
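A minimal sketch, assuming the data are non-negative whole numbers and each leaf is a single digit. `divmod(x, 10)` splits a number into its stem (all but the last digit) and its leaf (the last digit). The scores are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Map each stem to its sorted list of leaves (leaf = last digit)."""
    plot = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)   # 34 -> stem 3, leaf 4
        plot[stem].append(leaf)
    return dict(plot)

def print_stem_and_leaf(data):
    plot = stem_and_leaf(data)
    for stem in range(min(plot), max(plot) + 1):   # show empty stems too
        leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
        print(f"{stem} | {leaves}")

scores = [34, 38, 41, 29, 45, 52, 36]   # hypothetical data
print_stem_and_leaf(scores)
# 2 | 9
# 3 | 4 6 8
# 4 | 1 5
# 5 | 2
```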

11
Q

Histogram

A

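[NOT a bar graph]
Displays the distribution of a quantitative variable: the x-axis is a continuous number line split into equal-width intervals (bins), so the bars touch, and the y-axis shows the count (frequency) or relative frequency of values in each bin.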
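A minimal sketch of how histogram bar heights are counted, assuming equal-width bins that include their left endpoint and exclude their right endpoint (one common convention; software varies). The heights are hypothetical.

```python
def histogram_counts(data, start, width, n_bins):
    """Count values in bins [start + i*width, start + (i+1)*width).

    A value sitting exactly on a boundary is counted in the higher bin.
    """
    counts = [0] * n_bins
    for x in data:
        i = int((x - start) // width)   # index of the bin x falls into
        if 0 <= i < n_bins:             # ignore values outside the chosen range
            counts[i] += 1
    return counts

heights = [61, 63, 64, 66, 67, 67, 69, 72]   # hypothetical heights (inches)
counts = histogram_counts(heights, start=60, width=5, n_bins=3)
for i, c in enumerate(counts):
    left = 60 + i * 5
    print(f"[{left}, {left + 5}): {'#' * c}")
# [60, 65): ###
# [65, 70): ####
# [70, 75): #
```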

12
Q

Population

A

The collection of all individuals or items under consideration in a statistical study

13
Q

Sample

A

A part (subset) of the population from which data are actually collected

14
Q

Inferential statistics

A

Drawing and measuring the reliability of conclusions about a population based on information obtained from a sample of the population

15
Q

Skewed Right

A

Most of the data is on the left, with a long tail stretching to the right

16
Q

Skewed Left

A

Most of the data is on the right, with a long tail stretching to the left

17
Q

Symmetric Data

A

A distribution whose left and right halves are roughly mirror images. It can have one peak in the middle (unimodal) or one peak on each side (bimodal).

18
Q

Uniform Data

A

All values (or all bins) occur with roughly the same frequency, so the graph looks flat

19
Q

Census

A

Data collected from every individual in the entire population of interest.

20
Q

Sampling

A

The process of selecting an appropriate subset of people/items from the population. Simple random sampling comes in 2 types: with replacement and without replacement.

21
Q

[SRSWR] Simple random sampling with replacement

A

Where a member of the population can be selected more than once.

22
Q

[SRS] Simple random sampling without replacement

A

Where a member of the population can be selected at most once.
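The difference between the two kinds of simple random sampling maps directly onto two functions in Python's standard `random` module: `choices` draws with replacement (repeats allowed) and `sample` draws without replacement (each member at most once). The names are hypothetical, and the fixed seed only makes the sketch reproducible.

```python
import random

population = ["Ana", "Ben", "Cal", "Dee", "Eli", "Fay"]   # hypothetical population
rng = random.Random(2024)   # fixed seed so the example is reproducible

with_replacement = rng.choices(population, k=4)    # SRSWR: the same name may appear twice
without_replacement = rng.sample(population, k=4)  # SRS: four different names

print(with_replacement)
print(without_replacement)
```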

23
Q

Statistic vs Parameter

A

Statistic: value from sample
Parameter: value from population

24
Q

Systematic Random Sampling

A

Elements from a larger population are selected at regular intervals after choosing a random starting point.
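A minimal sketch, assuming the population is already an ordered list and the interval k is chosen in advance (here, every 10th of 100 hypothetical customers). Only the starting point is random; everything after it follows the fixed interval.

```python
import random

def systematic_sample(population, k, rng=random):
    """Pick a random start among the first k members, then take every k-th member."""
    start = rng.randrange(k)      # random starting point: 0, 1, ..., k-1
    return population[start::k]   # regular interval of k after the start

customers = list(range(1, 101))   # hypothetical customers numbered 1-100
sample = systematic_sample(customers, 10, random.Random(42))
print(sample)
```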

25
Q

Cluster Sampling

A

A population is divided into groups/clusters, and entire clusters are randomly selected for study. Often used when a population is too large or widely spread out for SRS.
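A minimal sketch, assuming the clusters are given as a dictionary from cluster name to member list (the homerooms below are hypothetical). The key point is that randomness picks *whole clusters*, and every member of a chosen cluster is included.

```python
import random

def cluster_sample(clusters, n_clusters, rng=random):
    """Randomly choose n_clusters whole clusters and keep all of their members."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical homerooms acting as clusters
homerooms = {
    "Room 101": ["Ana", "Ben", "Cal"],
    "Room 102": ["Dee", "Eli"],
    "Room 103": ["Fay", "Gus", "Hal"],
    "Room 104": ["Ivy", "Jon"],
}
sample = cluster_sample(homerooms, 2, random.Random(7))
print(sample)
```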

26
Q

Stratified Sampling

A

A population is divided into subgroups (strata) based on a shared characteristic, and then a random sample is taken from each stratum. This ensures every group is represented in the sample.
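A minimal sketch for contrast with cluster sampling: here *every* stratum is sampled, but only some members of each. It assumes the same number is drawn from each stratum (equal allocation); drawing in proportion to stratum size is another common choice. The grade-level strata are hypothetical.

```python
import random

def stratified_sample(strata, n_per_stratum, rng=random):
    """Take a separate SRS (without replacement) of the same size from every stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical strata: students grouped by grade level
grades = {
    "9th": ["A1", "A2", "A3", "A4"],
    "10th": ["B1", "B2", "B3"],
    "11th": ["C1", "C2", "C3", "C4", "C5"],
}
sample = stratified_sample(grades, 2, random.Random(3))
print(sample)
```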

27
Q

Mean (x̄)

A

Average of a dataset
Affected by outliers (not resistant)
Pulled toward the tail in a skewed distribution

28
Q

Median

A

Middle value of an ordered dataset
AKA 2nd Quartile (Q₂)
Resistant to outliers
[Position of the median in the ordered list: (n+1)/2]

29
Q

Percentile

A

Percent of data values less than or equal to a certain value
EX: The p-th percentile is the value with p% of the data at or below it.
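A minimal sketch of a percentile rank using the "less than or equal to" definition on this card; some textbooks count only values strictly below, which can give a slightly different answer. The scores are hypothetical.

```python
def percentile_rank(data, value):
    """Percent of data values less than or equal to `value`."""
    at_or_below = sum(1 for x in data if x <= value)
    return 100 * at_or_below / len(data)

scores = [55, 62, 70, 70, 78, 81, 85, 90, 94, 99]   # hypothetical scores
print(percentile_rank(scores, 81))   # 60.0 -> 6 of 10 scores are at or below 81
```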

30
Q

Standardized Score / Z-Score

A

A measure of how many standard deviations a data point is above or below the mean. It can be computed for a dataset of any shape (not exclusive to the normal distribution).

31
Q

Normal Distribution

A

Bell-shaped, symmetric, and unimodal
Centered at the mean μ
Spread measured by the standard deviation σ
Empirical rule (68%-95%-99.7% rule)

32
Q

How to read a Z-score table

A

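Rows (left column): ones and tenths digits of the z-score
Columns (top row): hundredths digit of the z-score
On the standard (cumulative) table, the entry where they meet is the proportion of area under the standard normal curve to the LEFT of z. (EX: row 1.9, column 0.06 gives z = 1.96 → 0.9750)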
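The table entries can be reproduced with the standard normal cumulative distribution function, Φ(z) = ½(1 + erf(z/√2)). This sketch uses `math.erf` from Python's standard library; `statistics.NormalDist().cdf(z)` gives the same values in Python 3.8+.

```python
import math

def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

print(round(normal_cdf(1.96), 4))    # 0.975  (row 1.9, column 0.06)
print(round(normal_cdf(-1.00), 4))   # 0.1587
```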

33
Q

Skewed Right/Positive Shape Properties

A

Right tail
Highest bar on the left
Mean > Median

34
Q

Skewed Left/Negative Shape Properties

A

Left tail
Highest bar on the right
Mean < Median
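The mean-versus-median patterns on the two skew cards can be checked on small datasets. Both lists below are hypothetical: one has a single large value forming a right tail, the other a single small value forming a left tail. The mean is pulled toward the tail while the median barely moves.

```python
from statistics import mean, median

right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]       # long tail to the right
left_skewed = [1, 17, 18, 18, 19, 19, 19, 20]  # long tail to the left

print(mean(right_skewed), median(right_skewed))   # 4.75 3.0    -> mean > median
print(mean(left_skewed), median(left_skewed))     # 16.375 18.5 -> mean < median
```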

35
Q

Symmetrical Shape Properties

A

Highest bar in the middle
Mean ≈ Median ≈ Mode

36
Q

Mean formula

A

x̄ = Σxᵢ / n
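The formula translates directly into code: add up every value (Σxᵢ), then divide by how many values there are (n). The data are hypothetical.

```python
def sample_mean(xs):
    """x̄ = Σxᵢ / n"""
    return sum(xs) / len(xs)

data = [4, 8, 15, 16, 23, 42]   # hypothetical data; sum is 108, n is 6
print(sample_mean(data))        # 18.0
```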

37
Q

Median formula

A

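Order the data from smallest to largest.
If n is odd: the median is the value at position (n+1)/2.
If n is even: the median is the average of the two middle values (positions n/2 and n/2 + 1).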
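A minimal sketch of the odd/even rule for finding the median. Python list indexes start at 0, so position (n+1)/2 counting from 1 is index n // 2 when n is odd. The function name `find_median` is just an illustrative choice; the standard library's `statistics.median` does the same job.

```python
def find_median(xs):
    ordered = sorted(xs)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                          # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2     # even n: average of the two middle values

print(find_median([7, 1, 3]))      # 3
print(find_median([7, 1, 3, 5]))   # 4.0
```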

38
Q

Range formula

A

Range = Maximum − Minimum

39
Q

IQR formula

A

IQR = Q₃ − Q₁

40
Q

Outlier rule

A

Outlier IF:
x < Q₁ − 1.5×IQR
OR
x > Q₃ + 1.5×IQR
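A minimal sketch of the IQR and the 1.5×IQR outlier rule. Quartiles can be computed several ways; this sketch assumes the common classroom method of taking Q₁ and Q₃ as the medians of the lower and upper halves, leaving out the overall median when n is odd. Other software may give slightly different quartiles. The data are hypothetical.

```python
def median_of(sorted_xs):
    n = len(sorted_xs)
    mid = n // 2
    if n % 2:
        return sorted_xs[mid]
    return (sorted_xs[mid - 1] + sorted_xs[mid]) / 2

def quartiles(xs):
    """Q1 and Q3 as medians of the lower and upper halves (median excluded when n is odd)."""
    s = sorted(xs)
    n = len(s)
    lower = s[: n // 2]
    upper = s[(n + 1) // 2 :]
    return median_of(lower), median_of(upper)

def outliers(xs):
    q1, q3 = quartiles(xs)
    iqr = q3 - q1                    # IQR = Q3 - Q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in xs if x < low_fence or x > high_fence]

data = [2, 4, 5, 6, 7, 8, 9, 30]   # hypothetical data
print(quartiles(data))   # (4.5, 8.5) -> IQR = 4, fences at -1.5 and 14.5
print(outliers(data))    # [30]
```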

41
Q

Standard deviation (sample) formula

A

s: sample standard deviation
xᵢ: each individual data value
x̄: sample mean
n: sample size (number of data values)
s = √[ Σ(xᵢ − x̄)² / (n − 1) ]
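The formula step by step: find x̄, square each deviation from it, add them up, divide by n − 1 (not n), and take the square root. The data are hypothetical, and the result should match the standard library's `statistics.stdev`, which also uses n − 1.

```python
import math
import statistics

def sample_sd(xs):
    n = len(xs)
    x_bar = sum(xs) / n
    squared_deviations = sum((x - x_bar) ** 2 for x in xs)
    return math.sqrt(squared_deviations / (n - 1))   # divide by n - 1 for a sample

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data; x̄ = 5, Σ(xᵢ − x̄)² = 32
print(round(sample_sd(data), 3))  # 2.138
print(math.isclose(sample_sd(data), statistics.stdev(data)))  # True
```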

42
Q

Z-score formula

A

x: the data value being standardized
μ: mean, σ: standard deviation (for sample data, use x̄ and s)
z = (x − μ) / σ
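A one-line translation of the formula. The example assumes a hypothetical test with mean 70 and standard deviation 10: a score of 85 is 1.5 standard deviations above the mean, and a score of 62 is 0.8 below it.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

print(z_score(85, 70, 10))   # 1.5
print(z_score(62, 70, 10))   # -0.8
```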

43
Q

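Chebyshev’s Theorem

A

For a distribution of ANY shape, at least 1 − 1/k² of the data lie within k standard deviations of the mean (for any k > 1)
(EX: k = 2 gives at least 1 − 1/4 = 75%)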

44
Q

Empirical Rule

A

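For a roughly normal (bell-shaped) distribution, about 68% of values fall within 1σ of μ, 95% within 2σ, and 99.7% within 3σ
Related rough estimate of the standard deviation (range rule of thumb, since about 95% of values span roughly 4σ): s ≈ range / 4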
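The 68-95-99.7 percentages are rounded values of the exact normal-curve areas within k standard deviations of the mean, which equal erf(k/√2). The sketch below checks them and includes the range-based estimate as a rule of thumb, not an exact formula. The function names are illustrative choices.

```python
import math

def normal_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {normal_within(k):.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%

def range_rule_sd(data):
    """Rough estimate s ≈ range / 4 (rule of thumb only)."""
    return (max(data) - min(data)) / 4

print(range_rule_sd([10, 18, 30]))   # 5.0
```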

45
Q

SOCS

A

Shape, Outliers, Center, Spread

46
Q

Socs

A

Shape: Symmetric, Skewed, Uniform, and Bell-shaped

47
Q

sOcs

A

Outliers: values that fall outside the overall pattern of the data

48
Q

soCs

A

Center: the median or mean of the distribution

49
Q

socS

A

Spread/variability: how much the values vary, described with the range, IQR, or standard deviation

50
Q

Cluster

A

A subgroup that values fall into based on a shared category (e.g., age range, school, sex, tax rate)