Intro to Statistics Flashcards
What are the two categories used to classify data?
Numerical and categorical
What are the two types of numerical variables?
Continuous and discrete
What are the two types of categorical variables?
Ordinal and nominal
Describe continuous variables.
When a continuum of values is possible. For example, height (m). E.g. 1.87 m, 1.58 m, 1.77 m.
Describe discrete variables.
When only distinct, countable values are possible (typically whole numbers). For example, number of people. E.g. 0, 1, 2.
Describe ordinal variables.
Categories that have an order. For example, size. E.g. small, medium, large.
Describe nominal variables.
Categories that have no order. For example, eye color. E.g. brown, blue, hazel.
What graph is most suitable to represent nominal data?
A Pareto chart.
What graph is most suitable to represent ordinal or discrete data?
A bar chart.
What graph is most suitable to represent continuous data?
A histogram or bar chart.
What are five ways the shape of a histogram's distribution can be described?
- Symmetrical and bell-shaped (uni-modal, one peak)
- Skewed to the left (left side is the tail)
- Skewed to the right (right side is the tail)
- Symmetrical and bi-modal (two peaks)
- Symmetrical and uniform (flat)
What are the three numerical summaries for center or location?
Mode, median and mean.
What are the three numerical summaries for spread?
Range, inter-quartile range (IQR) and standard deviation.
What is the mode?
The value that occurs most often.
What is the median?
The middle value once the values are arranged in order. Defined for ordinal, discrete and continuous data. If there is an even number of values, there are two middle values; for numerical data the median (M) is usually taken as their mean.
What is the mean?
The average: the sum of the values divided by the number of values.
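The three summaries of center can be compared on one small data set. The sketch below uses Python's standard statistics module; the household sizes are made-up values chosen only for illustration.

```python
import statistics

# Hypothetical sample: number of people per household (discrete data).
values = [2, 4, 4, 1, 3, 4, 2, 5]

mode = statistics.mode(values)      # value that occurs most often
median = statistics.median(values)  # middle of the sorted data; mean of the two middle values when n is even
mean = statistics.mean(values)      # sum of the values divided by their count

print("mode:", mode)
print("median:", median)
print("mean:", mean)
```

With an even number of values, statistics.median averages the two middle values, which is one common convention rather than the only one.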
What is range when measuring spread?
The difference between the largest value and the smallest value.
What are quartiles?
The values (Q1, Q2 and Q3) that divide ordered data into four equal parts.
What is inter-quartile range when measuring spread?
The range spanned by the 1st quartile (Q1) and the 3rd quartile (Q3), i.e. IQR = Q3 - Q1.
What is the 1.5 IQR rule?
It identifies outliers: values lower than Q1 - 1.5 × IQR (the lower threshold) or higher than Q3 + 1.5 × IQR (the upper threshold).
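As a rough illustration of the 1.5 IQR rule, the sketch below computes the thresholds Q1 - 1.5 × IQR and Q3 + 1.5 × IQR and flags anything outside them. The function name iqr_outliers and the height values are hypothetical, and the "inclusive" quartile method is an assumption: textbooks and software use several slightly different quartile conventions, so thresholds can differ a little.

```python
import statistics

def iqr_outliers(values):
    """Return (lower_threshold, upper_threshold, outliers) under the 1.5 IQR rule."""
    # With n=4, statistics.quantiles returns the three cut points Q1, Q2, Q3.
    q1, _q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr  # lower threshold, measured below Q1
    upper = q3 + 1.5 * iqr  # upper threshold, measured above Q3
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers

# Hypothetical heights in cm, with one suspicious entry.
heights_cm = [158, 162, 165, 168, 170, 172, 175, 177, 187, 230]
lower, upper, outliers = iqr_outliers(heights_cm)
print(lower, upper, outliers)
```

Here 187 stays inside the thresholds while 230 is flagged, which shows that the rule marks values as unusual relative to the middle half of the data rather than relative to the mean.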
What is a 5-number summary?
A summary of data using the minimum, Q1, median, Q3 and the maximum.
How are 5-number summaries represented?
A boxplot
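A 5-number summary can be sketched in a few lines. The helper name five_number_summary and the sample scores are made up for illustration, and the "inclusive" quartile method is an assumption, since other quartile conventions give slightly different Q1 and Q3.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    # With n=4, statistics.quantiles returns the cut points Q1, Q2 (the median), Q3.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

# Hypothetical scores.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(five_number_summary(scores))
```

These five numbers are the ones a boxplot draws: the box runs from Q1 to Q3 with a line at the median, and the whiskers extend toward the minimum and maximum.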
What is standard deviation?
Describes the variation about the mean.
Calculated by dividing the sum of squared deviations, (value minus the mean)², by the degrees of freedom (n - 1) and then taking the square root of the result.
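The calculation can be followed step by step in code. This is a minimal sketch with a hypothetical function name, sample_std, and made-up data; it is checked against statistics.stdev, which also divides by n - 1.

```python
import math
import statistics

def sample_std(values):
    """Sample standard deviation: sqrt(sum of squared deviations / (n - 1))."""
    n = len(values)
    mean = sum(values) / n
    # Squared deviation of each value from the mean.
    squared_deviations = [(x - mean) ** 2 for x in values]
    # Divide by the degrees of freedom (n - 1), then take the square root.
    variance = sum(squared_deviations) / (n - 1)
    return math.sqrt(variance)

# Hypothetical data.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_std(data))
print(statistics.stdev(data))  # standard-library equivalent
```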
Does correlation imply causation?
No.
What is correlation?
The strength of the linear relationship between two continuous variables x and y.
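The card does not say which measure of correlation is meant. Assuming the usual Pearson correlation coefficient r, a minimal sketch looks like this; the function name pearson_r and the height and weight values are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of paired deviations from the means.
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_deviation / math.sqrt(ss_x * ss_y)

# Hypothetical paired measurements: height (m) and weight (kg).
heights_m = [1.58, 1.65, 1.77, 1.80, 1.87]
weights_kg = [55.0, 61.0, 70.0, 74.0, 82.0]
print(pearson_r(heights_m, weights_kg))
```

r lies between -1 and 1: values near 1 or -1 indicate a strong linear relationship, and values near 0 indicate a weak one. A high r still says nothing about whether one variable causes the other.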