Lesson 1: Data Collection and Presentation Flashcards
is the science of collecting, organizing, analyzing, and interpreting numerical data to assist in making effective decisions
Statistics
uses the data to provide descriptions of the population through numerical calculations, graphs, or tables.
Descriptive Statistics
makes inferences and predictions about a population based on a sample of data taken from the population in question.
Inferential Statistics
is a collection of all possible individuals, objects, or measurements of interest
Population
is a portion, or part, of the population of interest
sample
the total number of observations in the sample
sample size
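The population, sample, and sample size cards above can be illustrated with a minimal sketch. The exam scores are made-up data, and drawing a simple random sample is only one possible way to take a sample.

```python
import random

# Hypothetical population: exam scores of 500 students (made-up data).
random.seed(0)
population = [random.randint(50, 100) for _ in range(500)]

# A sample is a portion of the population; here, 30 values drawn without replacement.
sample = random.sample(population, k=30)

# The sample size is the number of observations in the sample.
sample_size = len(sample)
print(sample_size)  # 30
```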
Types of Variables
Quantitative and Qualitative
examples of Qualitative Variables
Brand of PC, Marital status, Hair color
Types of Quantitative variables
Discrete and Continuous
examples of discrete variables
Children in a family, strokes on a golf hole, TV sets owned.
Examples of continuous variables
amount of income tax paid, weight of a student, yearly rainfall in Tampa, FL
Result when a single variable is measured on an experimental unit
Univariate data
result when two variables are measured on a single experimental unit
Bivariate data
result when more than two variables are measured on a single experimental unit
multivariate data
Qualitative data are popularly summarized using
Bar graph and Pareto chart
is a simple technique for prioritizing possible changes by identifying the problems that will be resolved by making these changes. By using this approach, you can prioritize the individual changes that will most improve the situation.
Pareto Analysis
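A Pareto chart is built from categories sorted by frequency together with their cumulative percentages. The sketch below computes that table. The complaint categories and counts are hypothetical, and `pareto_table` is an illustrative helper name, not a library function.

```python
# Hypothetical complaint counts by cause (made-up data for illustration).
complaints = {
    "late delivery": 45,
    "damaged item": 25,
    "wrong item": 15,
    "billing error": 10,
    "other": 5,
}

def pareto_table(counts):
    """Return (category, count, cumulative percent) rows, largest count first."""
    total = sum(counts.values())
    rows = []
    running = 0
    for category, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += count
        rows.append((category, count, 100 * running / total))
    return rows

for category, count, cum_pct in pareto_table(complaints):
    print(f"{category:15s} {count:3d} {cum_pct:6.1f}%")
```

In this made-up example the first two causes account for 70% of all complaints, so changes that address them would be prioritized first.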
Quantitative data are commonly summarized using
Histograms and dotplots
What are the shapes of distribution?
Bell-shaped, Uniform, Right-skewed, Left-skewed, Bimodal, U-shaped
Bivariate quantitative data are summarized using
scatterplots
Multivariate quantitative data are summarized using
Scatterplots
Data collected over time are generally summarized using
Time-series plots
the data value located exactly at the centermost position when the data set is arranged in order (the average of the two middle values when the number of values is even).
Median
TRUE or FALSE: The median may be preferred to the mean if the data are highly skewed.
True
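A short sketch of why the median may be preferred for skewed data, using Python's standard `statistics` module. The income figures are hypothetical.

```python
import statistics

# Hypothetical incomes (in thousands); one very large value makes the data right-skewed.
incomes = [28, 30, 32, 35, 400]

mean_income = statistics.mean(incomes)      # pulled upward by the extreme value
median_income = statistics.median(incomes)  # middle value of the ordered data

print(mean_income, median_income)  # 105 32
```

The mean of 105 describes none of the five incomes well, while the median of 32 stays close to the typical value.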
the most frequently occurring data value
Mode
If all the elements in the data set have the same frequency of occurrence, then the data set is said to have
No mode
If the data set has one value that occurs more frequently than the rest of the values, then the data set is said to be
unimodal
If two elements of the data set are tied for the highest frequency of occurrence, then the data set is said to be
bimodal.
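The mode cards above can be sketched as a small classifier. `describe_modes` is a hypothetical helper name, and the "multimodal" label for three or more tied values is an added assumption not stated on the cards.

```python
from collections import Counter

def describe_modes(data):
    """Classify a data set as having no mode, or as unimodal, bimodal, or multimodal."""
    counts = Counter(data)
    top = max(counts.values())
    modes = sorted(value for value, count in counts.items() if count == top)
    # Every distinct value occurs equally often: no mode.
    if len(counts) > 1 and len(modes) == len(counts):
        return "no mode", []
    if len(modes) == 1:
        return "unimodal", modes
    if len(modes) == 2:
        return "bimodal", modes
    return "multimodal", modes  # assumption: label for three or more tied values

print(describe_modes([1, 2, 3]))        # ('no mode', [])
print(describe_modes([1, 2, 2, 3]))     # ('unimodal', [2])
print(describe_modes([1, 1, 2, 2, 3]))  # ('bimodal', [1, 2])
```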
measures the spread of the middle 50% of an ordered data set.
Interquartile Range
largest value – smallest value
Range
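A minimal sketch of the range and the interquartile range on hypothetical data. Textbooks use slightly different conventions for locating quartiles; this sketch relies on the default ("exclusive") method of `statistics.quantiles`, so other methods may give slightly different values.

```python
import statistics

data = [2, 4, 4, 5, 7, 8, 10, 12, 15]  # hypothetical values, already in order

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

# Quartile cut points; the IQR is the spread of the middle 50% of the data.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(data_range, q1, q3, iqr)  # 13 4.0 11.0 7.0
```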
the average of the squared deviations from the mean; a way to measure how far a set of numbers is spread out.
Variance
the square root of the variance; roughly, the typical distance of the data values from the mean
Standard Deviation
means that most of the numbers are close to the average.
Low standard deviation
means that the numbers are more spread out.
High standard deviation
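The variance and standard deviation cards above can be checked numerically with the standard `statistics` module. All data values are hypothetical. Note that the population formulas divide by n, while the sample formulas divide by n - 1.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values with mean 5

pop_variance = statistics.pvariance(data)    # mean of squared deviations (divide by n)
pop_sd = statistics.pstdev(data)             # square root of the population variance
sample_variance = statistics.variance(data)  # divides by n - 1 instead of n
sample_sd = statistics.stdev(data)

print(pop_variance, pop_sd)  # 4 2.0

# Two hypothetical data sets with the same mean (10) but different spread.
tight = [9, 10, 10, 10, 11]   # values close to the mean: low standard deviation
spread = [0, 5, 10, 15, 20]   # values far from the mean: high standard deviation
print(statistics.pstdev(tight) < statistics.pstdev(spread))  # True
```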
measures the distance between an observation and the mean, measured in units of standard deviation.
Z-score
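As a sketch, the z-score is the observation minus the mean, divided by the standard deviation. The data are hypothetical, and `z_score` is an illustrative helper name. The population standard deviation is used here as an assumption; a course may use the sample version instead.

```python
import statistics

def z_score(x, mean, sd):
    """Distance of x from the mean, in units of standard deviation."""
    return (x - mean) / sd

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical values
mean = statistics.mean(data)      # 5
sd = statistics.pstdev(data)      # 2.0 (population standard deviation)

print(z_score(9, mean, sd))  # 2.0  (two standard deviations above the mean)
print(z_score(4, mean, sd))  # -0.5 (half a standard deviation below the mean)
```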
divide a data set into 100 equal parts. A percentile tells us what percent of the observations in a data set fall at or below a given value.
Percentiles
As the name suggests, quartiles break the data set into 4 equal parts.
Quartiles
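A minimal sketch of the "percent at or below" idea behind percentiles. The scores are hypothetical and `percentile_rank` is an illustrative helper name. The quartiles correspond to the 25th, 50th, and 75th percentiles.

```python
def percentile_rank(data, value):
    """Percent of observations in data that are at or below value."""
    at_or_below = sum(1 for x in data if x <= value)
    return 100 * at_or_below / len(data)

scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]  # hypothetical exam scores

print(percentile_rank(scores, 80))   # 60.0: six of the ten scores are at or below 80
print(percentile_rank(scores, 100))  # 100.0
```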
steps in constructing a frequency distribution table
- Find the range.
- Identify the number of classes using Sturges' rule.
- Determine the interval size by dividing the range by the desired number of classes.
- Determine the class limits of the class intervals.
- Determine the midpoints by averaging the lower and the upper class limits of each class.
- Tally the frequencies for each interval then get the sum.
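The steps above can be sketched as follows for integer data. The scores are hypothetical and `frequency_table` is an illustrative helper name. Sturges' rule is taken here in its common form k = 1 + 3.322 log10(n), and rounding both the number of classes and the class size up is an assumption, since rounding conventions vary.

```python
import math

def frequency_table(data):
    """Build (lower limit, upper limit, midpoint, frequency) rows for integer data."""
    n = len(data)
    data_range = max(data) - min(data)              # Step 1: find the range
    k = math.ceil(1 + 3.322 * math.log10(n))        # Step 2: Sturges' rule, rounded up
    width = max(1, math.ceil(data_range / k))       # Step 3: class size, rounded up
    rows = []
    lower = min(data)
    while lower <= max(data):                       # Step 4: class limits
        upper = lower + width - 1
        midpoint = (lower + upper) / 2              # Step 5: midpoint of the class
        freq = sum(1 for x in data if lower <= x <= upper)  # Step 6: tally
        rows.append((lower, upper, midpoint, freq))
        lower += width
    return rows

# Hypothetical exam scores (n = 20).
scores = [52, 55, 58, 61, 63, 64, 67, 68, 70, 71,
          73, 74, 75, 78, 80, 82, 85, 88, 91, 95]

table = frequency_table(scores)
for lower, upper, midpoint, freq in table:
    print(f"{lower}-{upper}  midpoint={midpoint}  f={freq}")
print("total:", sum(row[3] for row in table))  # total: 20
```

With these 20 scores the rule gives 6 classes of size 8, starting at 52-59.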
the range of values defining a class, bounded by the end numbers called the class limits (lower limit and upper limit)
Class Interval
shows the number of observations falling in the class
Class Frequency (f)
these are the so-called “true class limits”
Class Boundaries
middle value between the lower class limit of the class and the upper class limit of the preceding class
Lower Class Boundary (LCB)
middle value between the upper class limit of the class and the lower class limit of the next class
Upper Class Boundary (UCB)
the difference between two consecutive upper limits or two consecutive lower limits
Class size
midpoint or the middle value of a class interval
Class Marks (CM)
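The class boundary, class size, and class mark cards above can be worked through on three consecutive class intervals. The intervals are hypothetical.

```python
# Hypothetical consecutive class intervals, each given as (lower limit, upper limit).
classes = [(52, 59), (60, 67), (68, 75)]
prev_cls, cls, next_cls = classes

# Lower class boundary: halfway between this lower limit and the preceding upper limit.
lcb = (prev_cls[1] + cls[0]) / 2
# Upper class boundary: halfway between this upper limit and the next lower limit.
ucb = (cls[1] + next_cls[0]) / 2
# Class size: difference between two consecutive lower limits.
class_size = next_cls[0] - cls[0]
# Class mark: midpoint of the class interval.
class_mark = (cls[0] + cls[1]) / 2

print(lcb, ucb, class_size, class_mark)  # 59.5 67.5 8 63.5
```

The class size also equals the upper class boundary minus the lower class boundary (67.5 - 59.5 = 8).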
what are the MEASURES OF CENTER OF DATA DISTRIBUTION
Mean, Median, Mode
what are the MEASURES OF VARIABILITY
RANGE & INTERQUARTILE RANGE, VARIANCE, and STANDARD DEVIATION
SIGNIFICANCE OF STANDARD DEVIATION
Chebyshev’s Theorem and The Empirical (Normal) Rule
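As a brief sketch of the two results named above: Chebyshev's Theorem says that for any data set, at least 1 - 1/k^2 of the values lie within k standard deviations of the mean (for k > 1), while the Empirical Rule gives approximate proportions that apply only to bell-shaped (normal) data. `chebyshev_bound` and `EMPIRICAL_RULE` are illustrative names.

```python
def chebyshev_bound(k):
    """Minimum proportion of data within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is informative only for k > 1")
    return 1 - 1 / k**2

# Empirical Rule: approximate proportions for bell-shaped (normal) data only.
EMPIRICAL_RULE = {1: 0.68, 2: 0.95, 3: 0.997}

for k in (2, 3):
    print(k, chebyshev_bound(k), EMPIRICAL_RULE[k])
```

For k = 2, Chebyshev guarantees at least 75% for any distribution, while the Empirical Rule gives about 95% when the data are approximately normal.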
what are the MEASURES OF RELATIVE STANDING
Z-SCORE