Week 1 Flashcards
Data
Morsel of information describing a population
Primary Data
Data you or your organization has collected.
Secondary Data
Data that somebody else has collected and made available for others to use
Parameter
A value that describes an interesting attribute of the entire population.
Statistic
A value that describes an interesting attribute of a sample.
True Zero Point
A zero that means the quantity being measured is completely absent. (ex., $0 means no money at all)
Population
Entire group of things being studied or calculated
Sample
A subset of a population: a smaller group that is studied so the results can be generalized to the entire population.
Qualitative Data
Data that uses descriptions.
Quantitative Data
Data that uses numerical values.
Nominal Level of Measurement
Qualitative data. Descriptive labels with no meaningful order or ranking.
Examples:
- Male or Female
- Dog Breeds
- State Names
Ordinal Level of Measurement
Qualitative Data. Descriptive observations that can be ranked.
Examples:
- 1st, 2nd, and 3rd
- Education Level Reached (High School, College, etc.)
- Star Rating (*, **, ***, ****)
Interval Level of Measurement
Quantitative data. Ranked, with equal and meaningful differences between values. Does not have a true zero.
Examples:
- Temperature (°F or °C)
- IQ Score
- Calendar Year
Ratio Level of Measurement
Quantitative data. Has a true zero point.
Examples:
- Amounts
- Length
- Height
- Distance
- Price
Time Series Data
Looks at data for one population over a spread of time (years, months, days, etc.)
Cross-Sectional Data
Data collected from several subjects or populations at one specific moment in time
Descriptive Statistics
Summarizes data and facts. Does not look to draw conclusions.
Inferential Statistics
Uses sample data to arrive at conclusions about an entire population.
Class
A category or interval of values. What each bar represents in a bar graph or histogram.
Continuous Data
Data that can take any value within a range, including fractions and decimals, so classes continue across a graph without a break. (ex., 1-under 5, 5-under 6).
There are no gaps between bars in a histogram when the data is continuous.
Cumulative Percentage Polygon (Ogive)
A line graph charting the cumulative relative frequency distribution of a population
Cumulative Relative Frequency Distribution
Takes a frequency distribution chart and calculates the accumulating proportion of observations up to and including each class. (ex. 1=0.10, 1&2 = 0.35, 1&2&3= 0.80, 1&2&3&4=1.0)
Discrete Data
Data that can take only separate, countable values, usually whole-number counts with nothing in between. In histograms, the bars do not touch.
Frequency Distributions
Data organized into a chart that shows how many observations fall into each class.
Histogram
A frequency distribution in graph form, with one bar per class.
Percentage Polygon
A histogram drawn as a line graph. Usually compares multiple populations.
Relative Frequency Distribution
Takes a frequency distribution chart and calculates the proportion of observations in each class. The proportions add up to 1. (ex. 1=0.10, 2=0.25, 3=0.45, 4=0.20)
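The relationship between a frequency distribution, its relative frequencies, and its cumulative relative frequencies can be sketched in a few lines of Python. The counts below are hypothetical, chosen so the results match the example proportions on these cards.

```python
from itertools import accumulate

# Hypothetical counts for four classes (20 observations in total).
frequencies = [2, 5, 9, 4]
total = sum(frequencies)

# Relative frequency: each class's share of the total.
relative = [count / total for count in frequencies]

# Cumulative relative frequency: running total of the counts, divided by the total.
cumulative = [running / total for running in accumulate(frequencies)]

for cls, (rel, cum) in enumerate(zip(relative, cumulative), start=1):
    print(f"class {cls}: relative = {rel:.2f}, cumulative = {cum:.2f}")
```

The last cumulative value is always 1.0, which is a quick check that no class was skipped.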
Symmetrical Distribution
When the left and right sides of a histogram mirror each other. A bell curve is one example.
Clustered Bar Chart
|     _      _
|  _ |O|    |X|
| |X||O|    |X| _
| |X||O|    |X||O|
--------------------
Bar Chart
|
|  |X|
|  |X|  |X|
|  |X|  |X|
---------------
Horizontal Bar Chart
|
|XXXXXX
|XXXX
|XXXXXXXXX
|
Stacked Bar Chart
|
|
|  X   X
|  O   X   X
|  O   O   O
-----------------------
Line Chart
|            /
|     /\    /
|  /\/  \  /
| /      \/
----------------
Scatter Plot
|        .   .  .   .
|     .  .    .  .
|   .  ..
| .  .
-------------
Dependent Variable
Placed on the vertical axis of a scatter plot.
Independent Variable
Placed on the horizontal axis of a scatter plot
Pie Chart
A relative frequency distribution drawn as a circle graph that resembles a pie. The size of each slice is proportional to the relative frequency of its class.
Pareto Chart
Used in quality control.
Contains a bar chart, with categories sorted from most to least frequent, and a line chart of the cumulative percentage.
Contingency Table
A frequency distribution type chart that cross-classifies observations by two categorical variables at once (ex., sex and dominant hand)
            | Right Hand | Left Hand | Total
----------- |------------|-----------|-------
Males       |     43     |     9     |   52
----------- |------------|-----------|-------
Females     |     44     |     4     |   48
----------- |------------|-----------|-------
Totals      |     87     |    13     |  100
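As a rough sketch, the row, column, and grand totals of this table can be recomputed from the four inner counts. The dictionary layout is just one convenient way to hold the table, not a required structure.

```python
# Inner counts from the contingency table: rows are sex, columns are dominant hand.
counts = {
    "Males": {"Right Hand": 43, "Left Hand": 9},
    "Females": {"Right Hand": 44, "Left Hand": 4},
}

# Row totals: add across each row.
row_totals = {row: sum(cells.values()) for row, cells in counts.items()}

# Column totals: add down each column.
column_totals = {}
for cells in counts.values():
    for column, value in cells.items():
        column_totals[column] = column_totals.get(column, 0) + value

# Grand total: every observation is counted exactly once.
grand_total = sum(row_totals.values())

print(row_totals, column_totals, grand_total)
```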
Stem & Leaf Display
You split each multiple digit number into a stem (the [tens] digit) and a leaf (the [ones] digit). Each stem is written once, followed by its leaves in order.
(61,72,60,60,78,63,74)
6|0013
7|248
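The display above can be reproduced with a short Python sketch. It assumes every value is a two-digit whole number, so the stem is the tens digit and the leaf is the ones digit.

```python
from collections import defaultdict

data = [61, 72, 60, 60, 78, 63, 74]

# Group the ones digits (leaves) under their tens digit (stem).
stems = defaultdict(list)
for value in sorted(data):
    stem, leaf = divmod(value, 10)
    stems[stem].append(leaf)

# One line per stem, with the leaves written in ascending order.
display = [
    f"{stem}|{''.join(str(leaf) for leaf in leaves)}"
    for stem, leaves in sorted(stems.items())
]
for line in display:
    print(line)
```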
Categorical Data
Data classified into categories.
Examples:
- Male or Female
- Married, Single, or Divorced
- Yes or No
Index Point
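Determines the position of the median (or any other percentile) in the sorted data set. For the median, the index point is half the number of data points.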
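One common textbook convention for the index point is i = (P / 100) x n, where P is the percentile (50 for the median) and n is the number of data points. The sketch below assumes that convention: if i is not a whole number, round up and take the value at that position; if i is whole, average the values at positions i and i + 1. The function name is hypothetical, and other courses may use a slightly different rule.

```python
import math

def value_at_percentile(data, p):
    """Locate the p-th percentile with the index-point rule (assumed convention)."""
    if not 0 < p < 100:
        raise ValueError("p must be between 0 and 100 (exclusive)")
    ordered = sorted(data)
    n = len(ordered)
    i = p * n / 100  # index point
    if i != int(i):
        position = math.ceil(i)       # not whole: round up to the next position
        return ordered[position - 1]  # positions are 1-based, lists are 0-based
    position = int(i)                 # whole: average this position and the next
    return (ordered[position - 1] + ordered[position]) / 2

odd_median = value_at_percentile([61, 72, 60, 60, 78, 63, 74], 50)
even_median = value_at_percentile([61, 72, 60, 60, 63, 74], 50)
print(odd_median, even_median)
```

With P = 50 this rule gives the same answer as the usual "middle value, or average of the two middle values" definition of the median.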
Left Skewed Distribution
Where the mean is smaller than the median. The long tail is on the left.
|          X
|       X  X
|    X  X  X
| X  X  X  X
----------------------
Mean
The average of the data set: the sum of all data points divided by the number of data points.
Measures of Central Tendency
One number that gives us the central point of the data. There are many options for this number (mean, median, etc.)
Median
Midway point in the sorted data set. Arrived at by placement, not numerical value
Mode
The number (or data point) most repeated. It's possible to have a data set with two modes (bimodal).
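A minimal sketch of the three measures using Python's standard `statistics` module. The scores are made up for illustration.

```python
import statistics

scores = [60, 60, 61, 63, 72, 74, 78]

mean = statistics.mean(scores)        # sum of the scores divided by how many there are
median = statistics.median(scores)    # middle value of the sorted scores
modes = statistics.multimode(scores)  # list of the most repeated value(s)

# A bimodal data set has two values tied for most repeated.
bimodal_modes = statistics.multimode([1, 1, 2, 3, 3])

print(mean, median, modes, bimodal_modes)
```

`multimode` returns a list rather than a single number, so it handles the bimodal case without raising an error.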
Outliers
Extremely high or low data points
Right Skewed Distribution
Where the mean is higher than the median. The long tail is on the right.
| X
| X  X
| X  X  X
| X  X  X  X
---------------------
Weighted Mean
Average of a data set, but where some of the data points are given more weight than others. (Like a class grade.)
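For the class grade example, a weighted mean might be computed as below. The category names, scores, and weights are hypothetical.

```python
# (score, weight) pairs: the weights are percentages of the final grade.
grades = {
    "homework": (90, 20),
    "midterm": (80, 30),
    "final exam": (70, 50),
}

weighted_sum = sum(score * weight for score, weight in grades.values())
total_weight = sum(weight for _, weight in grades.values())
weighted_mean = weighted_sum / total_weight

# For comparison: the plain mean treats every score equally.
simple_mean = sum(score for score, _ in grades.values()) / len(grades)

print(weighted_mean, simple_mean)
```

The weighted mean (77) is lower than the simple mean (80) because the lowest score carries the most weight.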
Measures of Variability
Shows how spread out (or not spread out) the data set is.
Range
The numerical difference between the highest data point and the lowest data point.
Standard Deviation
The square root of the variance. Squaring the deviations gets rid of all negative numbers, and taking the square root puts the result back into the same units as the data.
Variance
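Measurement of how spread out the data is, based on the squared distance of each data point from the mean. Different methods are used for a population (divide by N) and for a sample (divide by n - 1).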
Coefficient of Variation
Ratio of the standard deviation to the mean, usually written as a percentage. A lower number indicates that a data set has better consistency between data points.
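The measures of variability on these cards can be sketched with the standard `statistics` module. The data set is hypothetical, and the coefficient of variation here uses the population standard deviation, which is an assumption; a course may use the sample version instead.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)                # highest minus lowest
population_variance = statistics.pvariance(data)  # divides by N
sample_variance = statistics.variance(data)       # divides by n - 1
population_std = statistics.pstdev(data)          # square root of the population variance

# Coefficient of variation: standard deviation relative to the mean, as a percentage.
cv_percent = population_std / statistics.mean(data) * 100

print(data_range, population_variance, sample_variance, population_std, cv_percent)
```

The sample variance is always a little larger than the population variance for the same numbers, because it divides by one less.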
z-Score
Tells us how many standard deviations a particular data point is above or below the mean. (Ex., Your GRE test score against all GRE test scores.)
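A z-score is the distance from the mean divided by the standard deviation. A small sketch, with a made-up test score, mean, and standard deviation:

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / std_dev

# Hypothetical numbers: your score, the average score, and the standard deviation.
above = z_score(160, mean=150, std_dev=8)
below = z_score(146, mean=150, std_dev=8)

print(above, below)
```

A positive z-score means the data point is above the mean, and a negative one means it is below.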
Midpoint
When a class contains a numerical spread (like 20-25 year olds vs. 26-30 year olds), this gives you the center of the class (the average of its lowest and highest values) so you can make calculations with the data.
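As an illustration, the midpoints of the two age classes can stand in for the individual ages when estimating a mean from grouped data. The class frequencies below are hypothetical.

```python
# (lower limit, upper limit, frequency) for each class; the frequencies are made up.
classes = [
    (20, 25, 4),
    (26, 30, 6),
]

# Midpoint: the average of the class's lower and upper limits.
midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]

# Approximate mean: treat every observation as if it sat at its class midpoint.
total_count = sum(freq for _, _, freq in classes)
grouped_mean = sum(mid * freq for mid, (_, _, freq) in zip(midpoints, classes)) / total_count

print(midpoints, grouped_mean)
```

The result is only an estimate, since the real ages inside each class are unknown.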
Box and Whisker Plot
A line is drawn showing the spread of the entire data set. A box is drawn around the spread of data between Q1 & Q3, so the length of the box is the IQR. The box is split with a vertical line showing Q2. Then, a dotted line (whisker) is drawn horizontally from each side of the box out to the lowest and highest data points that are not outliers. If there are any outliers, these are marked with asterisks.
Five Number Summary
Min Q1 Q2 Q3 Max
Interquartile Range (IQR)
Q3 - Q1
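A sketch of the five number summary and the IQR using `statistics.quantiles` from the standard library. Textbooks and software use slightly different rules for locating quartiles, so the `"inclusive"` method chosen here is an assumption and may not match a hand calculation exactly. The data set is hypothetical.

```python
import statistics

data = [2, 4, 5, 7, 8, 10, 11, 13, 15]

# n=4 splits the data into quarters and returns the three cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

five_number_summary = (min(data), q1, q2, q3, max(data))
iqr = q3 - q1

print(five_number_summary, iqr)
```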
Measures of Relative Position
Compares one particular data point against the entire data set.
Percentiles
Measure the approximate percentage of data values below the value of interest.
Percentile Rank
Tells you what percentile a particular data point falls into
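A simplified sketch of a percentile rank, taken here as the percentage of data values below the value of interest. Some textbooks also count half of any tied values, so treat this as one possible convention rather than the only formula. The scores are hypothetical.

```python
def percentile_rank(data, value):
    """Percentage of data points strictly below value (simplified convention)."""
    below = sum(1 for x in data if x < value)
    return 100 * below / len(data)

scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
rank = percentile_rank(scores, 85)

print(rank)
```

A score of 85 has six scores below it out of ten, so it sits at the 60th percentile under this convention.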
Pth Percentile
The value below which approximately P% of the data falls, where P is any number between 1-100
Quartiles
Three values (Q1, Q2, Q3) that divide the sorted data into quarters.
Central Tendency
Tells the center point of data set