Chapter 2 Flashcards
What are the two types of variables?
1) Categorical. 2) Quantitative.
Categorical Variable
Variables that sort observations into different categories or classifications are called categorical variables.
T/F A single categorical variable has multiple categories.
True.
Name the variable and the categories for the following:
Sex (male or female)
Sex is the categorical variable. The categories are female and male.
Quantitative variables
A variable whose values are numerical measurements of some quantity is called quantitative.
T/F All quantitative variables are numeric, but not all numeric variables are quantitative.
True. If it makes sense to take an average, it's probably quantitative. If not (e.g., ZIP codes), it's likely categorical.
What are the two types of quantitative variables?
1) Continuous
2) Discrete.
Continuous variables
Can take on any value within an interval, so there is no minimum distance between two possible values. Ex: height, weight, temperature.
Discrete Variable
Has a minimum distance between its possible values (typically whole-number counts). Ex: number of births (4, 5, 6), number of students in a class.
Frequency
The number of observations that fall in a particular category of a categorical variable. Ex: 12 students had red hair.
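A short Python sketch of frequency counting, assuming a small hypothetical list of hair colors (not data from the text). It also computes each category's proportion (relative frequency), which is what a pie chart shows as percentages.

```python
from collections import Counter

# Hypothetical observations of a categorical variable (hair color).
hair_colors = ["red", "brown", "black", "brown", "red", "blond", "brown"]

# Frequency: number of observations in each category.
freq = Counter(hair_colors)

# Relative frequency: each category's frequency divided by the total count.
n = len(hair_colors)
rel_freq = {category: count / n for category, count in freq.items()}

# List categories from most to least frequent.
for category, count in freq.most_common():
    print(f"{category}: {count} ({rel_freq[category]:.0%})")
```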
Bar Graph
Displays categorical variables. Categories go on the horizontal axis and frequencies on the vertical axis; the height of each rectangle equals the category's frequency.
Pie Chart
Displays categorical data as slices of a circle, with each slice's size proportional to the category's percentage.
What are the weaknesses of pie charts?
If a subject can fall into more than one category, the percentages can sum to more than 100 percent, so the slices no longer form one whole. If there are too many categories, it can be difficult to tell which slices are larger.
Contingency Table
Shows how two categorical variables relate to each other. Ex: country vs. regional food preference.
Conditional Distribution
The distribution of a categorical variable restricted to just the cases that satisfy a certain condition, such as a single row of a contingency table. For instance, the proportions murdered and not murdered within a given group.
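A minimal Python sketch of a contingency table and a conditional distribution, using hypothetical (region, food preference) pairs invented for illustration. The helper `conditional_distribution` is a hypothetical name: it keeps only the cases in one row category and converts their counts to proportions.

```python
from collections import Counter

# Hypothetical paired observations of two categorical variables:
# (region, food preference). Illustrative data only.
pairs = [
    ("North", "spicy"), ("North", "mild"), ("North", "mild"),
    ("South", "spicy"), ("South", "spicy"), ("South", "mild"),
    ("South", "spicy"),
]

# Contingency table: count of each (row category, column category) combination.
table = Counter(pairs)
rows = sorted({r for r, _ in pairs})
cols = sorted({c for _, c in pairs})

print("region", *cols, sep="\t")
for r in rows:
    print(r, *(table[(r, c)] for c in cols), sep="\t")

def conditional_distribution(table, row, cols):
    """Proportion of each column category among cases in the given row category."""
    row_total = sum(table[(row, c)] for c in cols)
    return {c: table[(row, c)] / row_total for c in cols}

print(conditional_distribution(table, "South", cols))
```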
Dot Plot
The simplest way to display quantitative data. Each observation is a dot placed above its value on the x-axis, and dots stack up when values repeat.
Histogram
A graph that uses bars to portray the frequencies of the possible outcomes of a quantitative variable. The x-axis shows ranges of values the variable can take; the y-axis shows the number of observations within each range. Ex: IQ frequencies for 7th graders.
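A rough Python sketch of how histogram bar heights are computed: observations are sorted into equal-width ranges (bins) and counted. The scores and the bin settings (start 60, width 10) are hypothetical, and the text bars stand in for a real plot.

```python
# Hypothetical quantitative data (e.g., test scores); illustrative only.
scores = [62, 67, 71, 74, 75, 78, 81, 83, 85, 88, 90, 94]

def histogram_counts(data, start, width, num_bins):
    """Count observations in bins [start + k*width, start + (k+1)*width).

    Values that fall outside all bins are ignored.
    """
    counts = [0] * num_bins
    for x in data:
        k = int((x - start) // width)
        if 0 <= k < num_bins:
            counts[k] += 1
    return counts

counts = histogram_counts(scores, start=60, width=10, num_bins=4)
for k, c in enumerate(counts):
    low = 60 + k * 10
    print(f"{low} to under {low + 10}: {'#' * c}")
```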
Distribution
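The overall pattern of a quantitative data set: which values occur and how often, usually seen as the shape (curve) of its histogram.
Modality
The number of high points (modes) in a distribution. A mode is a peak: the most frequent value or range of values.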
Unimodal
Has only one high point
Bimodal
Two high points.
Uniform
Relatively flat data with no peaks.
Symmetric
The two sides of the distribution, on either side of the center, are close to mirror images of each other.
Skewed Left
Distribution has a long left tail. Many high observations but few low observations.
Skewed right
Long right tail. Many low observations, but few high observations.
Mean
Sum of the observations divided by the number of observations.
Median
The middle value that splits the ordered data in two: half of the observations are below it and half are above it. With an even number of observations, it is the average of the two middle values.
T/F The mean is resistant to skewness and extreme values, while the median is sensitive to extreme values in the data.
False, the MEDIAN is generally resistant to extreme values in the data. The MEAN is sensitive to extreme values in the data set.
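A Python sketch computing the mean and median from their definitions on hypothetical data, then adding one extreme value to show which measure is resistant. Python's standard library also provides `statistics.mean` and `statistics.median`; they are written out here only to mirror the definitions.

```python
def mean(data):
    """Sum of the observations divided by the number of observations."""
    return sum(data) / len(data)

def median(data):
    """Middle value of the sorted data (average of the two middle values when n is even)."""
    s = sorted(data)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

# Hypothetical data, then the same data with one extreme value added.
data = [10, 12, 13, 15, 20]
with_outlier = data + [200]

print(mean(data), median(data))                  # 14.0 13
print(mean(with_outlier), median(with_outlier))  # 45.0 14.0
```

The single extreme value more than triples the mean but moves the median by only one unit.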
Range
The difference between the largest and smallest observations. Not resistant to extreme values, since by definition it depends only on the two most extreme observations.
Standard Deviation
Summarizes how far observations in the data set typically fall from the mean. Each observation's distance from the mean (x - x̄) is called its deviation.
Variance
Square of the standard deviation.
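A Python sketch of range, variance, and standard deviation on a hypothetical data set. It assumes the usual convention that the sample variance s^2 divides the sum of squared deviations by n - 1, while the population variance σ^2 divides by N; each standard deviation is the square root of its variance.

```python
import math

def data_range(data):
    """Largest observation minus smallest observation."""
    return max(data) - min(data)

def sample_variance(data):
    """s^2: sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    xbar = sum(data) / n
    return sum((x - xbar) ** 2 for x in data) / (n - 1)

def population_variance(data):
    """sigma^2: sum of squared deviations from the mean, divided by N."""
    N = len(data)
    mu = sum(data) / N
    return sum((x - mu) ** 2 for x in data) / N

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
s = math.sqrt(sample_variance(data))          # sample standard deviation
sigma = math.sqrt(population_variance(data))  # population standard deviation
print(data_range(data), s, sigma)
```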
T/F Standard deviation and variance are not resistant to outliers.
True.
Standard deviation sample
s
Standard deviation population
σ
Variance Sample
s^2
Variance Population
σ^2
Sample Size
n
Population Size
N
Empirical Rule
If a distribution is bell-shaped, then approximately:
1) 68 percent of the observations fall within 1 standard deviation of the mean.
2) 95 percent of the observations fall within 2 standard deviations of the mean.
3) Nearly all observations (about 99.7 percent) fall within 3 standard deviations of the mean.
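A Python sketch checking the empirical rule's percentages against the normal curve, using `statistics.NormalDist` (Python 3.8+). Treating "bell-shaped" as exactly normal is an assumption; real data only approximate it. The mean 100 and SD 15 used for the intervals are hypothetical.

```python
from statistics import NormalDist

# Standard normal model for a bell-shaped distribution (an assumption).
std_normal = NormalDist(mu=0, sigma=1)

def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

def empirical_interval(mean, sd, k):
    """Interval from mean - k*sd to mean + k*sd."""
    return (mean - k * sd, mean + k * sd)

for k in (1, 2, 3):
    low, high = empirical_interval(100, 15, k)  # hypothetical mean 100, SD 15
    print(f"within {k} SD ({low} to {high}): {fraction_within(k):.1%}")
```

The printed fractions come out close to 68, 95, and 99.7 percent.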
The pth percentile
The value such that p percent of the observations fall at or below it. Ex: a score at the 90th percentile on the SAT is at or above 90 percent of scores.
Quartiles
Q1 (first quartile): the 25th percentile.
Q2 (second quartile): the 50th percentile, i.e., the median.
Q3 (third quartile): the 75th percentile.
Interquartile Range
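IQR = Q3 - Q1, the distance between the first and third quartiles. It spans the middle 50 percent of the data.
T/F The interquartile range is not resistant to skewness and extreme values.
False. It is resistant to skewness and extreme values because it depends only on the middle 50 percent of the data, not on the smallest or largest observations.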
Outlier
An unusually small or unusually large observation.
How do you flag an observation as an outlier?
1.5 × IQR criterion: an observation is an outlier if it falls more than 1.5 × IQR below Q1 or above Q3, i.e., below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR.
Outlier Check 2
Any observation that falls more than 3 standard deviations above or below the mean.
Five-Number Summary of a Data Set
1) Minimum value
2) First quartile (Q1)
3) Median
4) Third quartile (Q3)
5) Maximum value
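A Python sketch of the five-number summary together with the 1.5 × IQR outlier check. It assumes one common textbook quartile rule (Q1 and Q3 are the medians of the lower and upper halves, excluding the overall median when n is odd); software packages may use other rules and report slightly different quartiles. The data are hypothetical.

```python
def median(values):
    """Middle value of the sorted values (average of the two middle ones if n is even)."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

def five_number_summary(data):
    """(min, Q1, median, Q3, max); needs at least 2 observations.

    Assumed quartile rule: Q1 and Q3 are medians of the lower and upper halves,
    and the overall median is left out of both halves when n is odd.
    """
    s = sorted(data)
    n = len(s)
    lower = s[: n // 2]
    upper = s[(n + 1) // 2 :]
    return (s[0], median(lower), median(s), median(upper), s[-1])

def iqr_outliers(data):
    """Observations below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

data = [1, 3, 4, 5, 6, 7, 8, 9, 30]  # hypothetical data
print(five_number_summary(data))  # (1, 3.5, 6, 8.5, 30)
print(iqr_outliers(data))         # [30]
```

Here IQR = 8.5 - 3.5 = 5, so the fences are -4 and 16, and only 30 falls outside them. These five numbers are also what a box plot draws.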
Box Plot
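A graphical depiction of the five-number summary: the box runs from Q1 to Q3, a line inside the box marks the median, and whiskers extend toward the minimum and maximum.
T/F Use the mean and standard deviation only for reasonably symmetric distributions.
True. Otherwise, use the five-number summary.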
Match the Following:
Standard Deviation Mean
IQR Median
Pair the mean with the standard deviation and the median with the IQR. The mean and standard deviation are not resistant to skewness and extreme values, so use them only for unimodal, roughly symmetric data sets. Use the median and IQR for skewed or non-unimodal data, since they are resistant to skewness and extreme values.
Side-by-Side Box Plots
Compare two or more groups on a quantitative variable.
Z-Score
The z-score of an observation tells us how many standard deviations the observation falls from the mean.
z = (observation - mean) / standard deviation = (x - x̄) / s
Negative if the observation is below the mean, positive if it is above the mean. If z is greater than 3 or less than -3, the observation is an outlier.
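A Python sketch of z-scores and the |z| > 3 outlier check on hypothetical data. It assumes z is computed with the sample mean x̄ and the sample standard deviation s (divisor n - 1); `z_score`, `z_scores`, and `flag_by_z` are illustrative helper names.

```python
import math

def z_score(x, mean, sd):
    """How many standard deviations x falls from the mean."""
    return (x - mean) / sd

def z_scores(data):
    """z-score of each observation, using the sample mean and sample SD (n - 1)."""
    n = len(data)
    xbar = sum(data) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))
    return [z_score(x, xbar, s) for x in data]

def flag_by_z(data, cutoff=3):
    """Observations whose z-score is greater than cutoff or less than -cutoff."""
    return [x for x, z in zip(data, z_scores(data)) if abs(z) > cutoff]

# Hypothetical data: 18 ordinary values plus one extreme value.
data = [9, 10, 11] * 6 + [30]
print(z_score(130, 100, 15))  # 2.0
print(flag_by_z(data))        # [30]
```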
Variable
Any characteristic observed in a study.
Pareto Chart
A bar graph with categories ordered from highest to lowest frequency.
Pareto Principle
A small subset of categories often contains most of the observations.
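A Python sketch of the ordering behind a Pareto chart, with a running cumulative percentage that makes the Pareto principle visible. The defect categories and counts are hypothetical.

```python
from collections import Counter

# Hypothetical categorical observations (reported defect types); illustrative only.
defects = ["scratch"] * 12 + ["dent"] * 5 + ["crack"] * 2 + ["stain"] * 1

# Pareto ordering: categories sorted from highest to lowest frequency.
ordered = Counter(defects).most_common()

total = len(defects)
cumulative = 0
for category, count in ordered:
    cumulative += count
    print(f"{category:8s} {count:3d}  cumulative {cumulative / total:.0%}")
```

In this made-up example, the top two categories account for 85 percent of the observations.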
Time Series
A data set collected over time.
Time Plot
Charts each observation on the vertical scale against the time it was measured on the horizontal scale. Ex: population over time.