Descriptive part 2 Flashcards
Three concepts of traditional statistics
Measures of Central Tendency
Measures of Variation
Measures of Position
the statistical and parametric measurements of how the data are centered
Measures of Central Tendency
Measures of Central Tendency
mean, median, or mode
statistical and parametric measurement of how dispersed the data are
Measures of Variation
describe the position of a data value in relation to the rest of the data set
Measures of Position
a characteristic or measure obtained by using the data values from a sample.
statistic
a characteristic or measure obtained by using all the data values from a specific population.
parameter
Population size
N
Sample size
n
describes where the distribution may be ‘centered’
Measures of Central Tendency
Measures of Central Tendency are also known as…
Average
center of gravity
Mean
value in the middle
Median
most typical value
Mode
the average of the values: equal to the sum total of all values divided by the number of values
Mean
The measure of central tendency that is affected by the presence of outliers in the data
Mean
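A minimal Python sketch of the mean (sum of all values divided by the number of values), using made-up data to show how a single extreme value pulls it upward. The function name `mean` is only illustrative.

```python
def mean(values):
    """Sum of all values divided by the number of values."""
    if not values:
        raise ValueError("mean of empty data is undefined")
    return sum(values) / len(values)

data = [10, 12, 11, 13, 14]        # hypothetical data
with_outlier = data + [100]        # same data plus one extreme value

print(mean(data))                   # 12.0
print(round(mean(with_outlier), 2)) # 26.67, dragged upward by the outlier
```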
________ letters are used to denote parameters
Greek
________ letters are used to denote statistics
Roman
What are the two types of mean?
parametric/population mean
statistical/sample mean
Mean computed when the values in the data set come from the whole population
parametric/population mean
parametric/population mean is represented by the Greek letter ______
μ (mu)
Mean computed when the values come from a sample
statistical/sample mean
statistical/sample mean is represented by the Roman letter ______
x̄ (x bar)
TWO WAYS OF COMPUTING THE MEAN
Mean for Ungrouped Data
Mean for Grouped Data
Mean: comes from the raw data
Mean for Ungrouped Data
ROUNDING RULE FOR THE MEAN
The mean should be rounded to one more decimal place than occurs in the raw data.
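A sketch of the rounding rule applied to ungrouped (raw) data. It assumes the raw values are supplied as text so that trailing zeros like "4.0" are preserved, and that the most precise raw value sets the decimal count; `decimal_places` and `rounded_mean` are hypothetical helper names.

```python
from decimal import Decimal

def decimal_places(value_text):
    """Digits after the decimal point in a raw value written as text (assumption: input is text)."""
    exponent = Decimal(value_text).as_tuple().exponent
    return max(0, -exponent)

def rounded_mean(raw_values):
    """Mean of ungrouped data, rounded to one more decimal place than the raw data."""
    places = max(decimal_places(v) for v in raw_values)
    mean = sum(float(v) for v in raw_values) / len(raw_values)
    return round(mean, places + 1)

print(rounded_mean(["12", "15", "17", "20", "19"]))  # 16.6 (whole-number data -> 1 decimal place)
```

Python's built-in `round()` resolves exact ties to the even digit, which can differ from the round-half-up rule usually taught by hand.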
Mean: comes from the frequency distribution table
Mean for Grouped Data
the procedure for finding the mean for grouped data uses the _________ of the classes.
midpoints
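For grouped data, each class is represented by its midpoint X_m, and the mean is Σ(f · X_m) / n. A small sketch, assuming the frequency table is given as hypothetical (lower limit, upper limit, frequency) tuples:

```python
def grouped_mean(classes):
    """Mean from a frequency distribution: sum(f * Xm) / n, with Xm the class midpoint.

    `classes` is a list of (lower_limit, upper_limit, frequency) tuples.
    """
    n = sum(f for _, _, f in classes)
    total = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total / n

# Hypothetical table: classes 5-9, 10-14, 15-19 with frequencies 3, 5, 2
table = [(5, 9, 3), (10, 14, 5), (15, 19, 2)]
print(grouped_mean(table))  # 11.5  ->  (3*7 + 5*12 + 2*17) / 10
```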
the middlemost value
Median
the midpoint of the data array
Median
the symbol for the median
MD
obtained by sorting the values from lowest to highest and getting the value in the middle (halfway point)
Median
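A minimal sketch of finding the median by sorting. When n is even there is no single middle value, so this follows the usual convention of averaging the two middle values. The data are made up.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 3, 9, 1, 5]))       # 5    (odd n: the single middle value)
print(median([7, 3, 9, 1, 5, 100]))  # 6.0  (even n: (5 + 7) / 2; the outlier barely matters)
```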
preferred over the mean as a typical value (or center) when the distribution is skewed or contains outliers
Median
most frequently occurring value in a data set, most typical
Mode
most descriptive when distributions are highly peaked (leptokurtic), suggesting a large concentration on a single value
Mode
one value occurs with the greatest frequency
Unimodal
two values with the same greatest frequency
Bimodal
more than two values occurring at the same greatest frequency
Multimodal
no data value occurs more than once
No mode
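The four mode cases above can be sketched by counting frequencies. This follows the data-value cards only (no value repeats means no mode); it does not apply the separate histogram card where equal-height bars are read as "no mode". `classify_mode` is a hypothetical name.

```python
from collections import Counter

def classify_mode(values):
    """Return (label, modes) using the unimodal / bimodal / multimodal / no-mode rules."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:                      # no data value occurs more than once
        return "no mode", []
    modes = sorted(v for v, c in counts.items() if c == top)
    if len(modes) == 1:
        return "unimodal", modes
    if len(modes) == 2:
        return "bimodal", modes
    return "multimodal", modes

print(classify_mode([4, 5, 5, 6, 6, 7]))  # ('bimodal', [5, 6])
```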
a rough estimate of the middle
Midrange
found by adding the lowest and highest values in the data set and dividing by 2
Midrange
a very rough estimate of the average and can be affected by one extremely high or low value
Not reliable, because the values between the lowest and highest are not taken into consideration
Midrange
What is the symbol used for midrange?
MR
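A short sketch of the midrange, MR = (lowest + highest) / 2, with hypothetical data showing how one extreme value moves it:

```python
def midrange(values):
    """MR = (lowest value + highest value) / 2; all values in between are ignored."""
    return (min(values) + max(values)) / 2

print(midrange([2, 3, 6, 8, 4, 1]))    # 4.5
print(midrange([2, 3, 6, 8, 4, 100]))  # 51.0, one extreme value shifts it drastically
```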
used to find the mean of a data set in which not all values are equally represented
Weighted mean
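The weighted mean is Σ(w · X) / Σw. A sketch using hypothetical course grades weighted by hypothetical unit counts:

```python
def weighted_mean(values, weights):
    """Sum of (w * x) divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

grades = [90, 80, 70]   # hypothetical grades
units = [3, 2, 5]       # hypothetical weights (units per course)
print(weighted_mean(grades, units))  # 78.0  ->  (270 + 160 + 350) / 10
```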
Symbol for Mode
None
a single bar stands out
Unimodal
two bars stand out
Bimodal
more than two bars stand out
Multimodal
all bars are of the same height
No mode
describes the symmetry of a histogram
Skewness
right-side is mirror image of the left-side
Symmetric
an asymmetric distribution, in which the tapering of the two sides (tails) differs
Skewed
the right tail is longer; more values concentrated on the left (more lower values)
if the mean is greater than the median
Skewed to the right (positively skewed)
the left tail is longer; more values concentrated on the right (more high values)
if the mean is less than the median
Skewed to the left (negatively skewed)
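The mean-versus-median rule on these cards can be sketched with Python's standard `statistics` module. Treat it as a rule of thumb: real data sets can violate it, and mean equal to median does not prove symmetry. `skew_direction` is a hypothetical name and the data are made up.

```python
from statistics import mean, median

def skew_direction(values):
    """Classify skew by comparing the mean with the median (rule of thumb)."""
    m, md = mean(values), median(values)
    if m > md:
        return "skewed to the right (positively skewed)"
    if m < md:
        return "skewed to the left (negatively skewed)"
    return "symmetric (mean equals median)"

print(skew_direction([1, 2, 2, 3, 3, 3, 20]))  # mean ~4.86 > median 3
```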
measures how flat or how sharply peaked the distribution of the data values is
Kurtosis
flat, broad peak (lighter tails than a normal curve)
Flat (Platykurtic)
sharp peak (heavier tails than a normal curve)
Highly-peaked (Leptokurtic)
rounded peak, symmetric tails
Bell-shaped (Mesokurtic)
measure the spread or variability of the values from each other
Measures of variation
What are the 5 measures of variation?
range
variance
standard deviation
coefficient of variation
interquartile range (IQR)
simplest measure of dispersion, used to get a quick idea of the spread
Range
the difference between the highest and lowest value
range
rest of the values are not used in the calculation
waste of information
one of the weakest measures of dispersion
Range
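A sketch of the range, with two hypothetical data sets that share the same range but spread their middle values differently, which is the "waste of information" the card describes:

```python
def data_range(values):
    """Range = highest value - lowest value; only these two values are used."""
    return max(values) - min(values)

print(data_range([12, 15, 9, 20, 11]))  # 11

# Same range (8), yet the middle values are spread very differently
print(data_range([1, 5, 5, 5, 9]), data_range([1, 1, 5, 9, 9]))  # 8 8
```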
average of the squared deviations of values from the mean
Variance
takes into consideration all of the values
measured in the square of the original units
which makes it difficult to interpret
Variance
square root of the variance
measured in the same units as the data values
Standard deviation
The symbol ‘__’ represents the population standard deviation.
σ
population variance is symbolically represented by
σ^2
ROUNDING RULE FOR SD
The rounding rule for the standard deviation is the same as that for the mean. The final answer should be rounded to one more decimal place than that of the original data.
an estimate of the population variance/standard deviation
Sample variance and SD
Sample variance is denoted by
s^2
standard deviation of a sample is denoted by
s
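A sketch contrasting population (σ², σ) and sample (s², s) variance and standard deviation. It uses the standard textbook convention of dividing by N for a population and by n − 1 for a sample estimate; the `sample` flag and the data are hypothetical.

```python
import math

def variance(values, sample=True):
    """Average squared deviation from the mean.

    Population (sigma^2): divide by N.  Sample (s^2): divide by n - 1.
    """
    n = len(values)
    m = sum(values) / n
    squared_deviations = sum((x - m) ** 2 for x in values)
    return squared_deviations / (n - 1 if sample else n)

def std_dev(values, sample=True):
    """Square root of the variance, back in the original units."""
    return math.sqrt(variance(values, sample))

data = [2, 4, 4, 4, 5, 5, 7, 9]          # mean is 5, squared deviations sum to 32
print(variance(data, sample=False))     # 4.0   (sigma^2 = 32 / 8)
print(std_dev(data, sample=False))      # 2.0   (sigma)
print(round(variance(data), 3))         # 4.571 (s^2 = 32 / 7)
```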
ratio of the standard deviation to the mean
Coefficient of Variation
used to compare the measure of spread between sets of data that are measured in different units
Coefficient of Variation
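A sketch of the coefficient of variation, CV = s / x̄, expressed as a percentage (the ×100 is a common convention; use σ / μ for a population). The heights and weights below are hypothetical and measured in different units, which is exactly where CV is useful.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """CV = (sample standard deviation / sample mean) * 100%."""
    return stdev(values) / mean(values) * 100

heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 55, 60, 65, 70]
print(round(coefficient_of_variation(heights_cm), 1))  # 9.3
print(round(coefficient_of_variation(weights_kg), 1))  # 13.2 -> relatively more variable
```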
measures that describe the position or location of particular values along the cumulative distribution
Measures of position
they are sometimes useful for determining cut-off points for certain categories
Measures of position
3 types of measures of position
Standard Score (z Score)
Percentile (quantiles)
Quartiles and Deciles (quantiles)
number of standard deviations that a data value is above or below the mean
Standard score
Standard score is also known as
Z score
if a standard score is zero, then the data value is the same as the
Mean
in a normal distribution curve, _______ measures how far a value is from the mean
z-score
if the z-score is 2, the value is 2 standard deviations above the _____
mean
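The z-score is z = (X − x̄) / s for a sample, or (X − μ) / σ for a population. A sketch with a hypothetical mean of 70 and standard deviation of 7.5:

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

print(z_score(85, 70, 7.5))    # 2.0  -> 2 SDs above the mean
print(z_score(70, 70, 7.5))    # 0.0  -> equal to the mean
print(z_score(62.5, 70, 7.5))  # -1.0 -> 1 SD below the mean
```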
divide the data into 100 equal parts
Percentile
indicate the position of an individual in a group
Percentile
divide the data into 10 equal parts
Decile
divide the data into 4 equal parts
Quartile
the range of values bounded by the 25th and 75th percentiles (P25 and P75)
Interquartile Range
it gives information on the values of the middle 50% of the data
Interquartile range
the higher the IQR, the larger the _______ in the middlemost values of the data
variation
IQR formula
IQR = Q3-Q1
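Quartiles can be computed in several ways, and different textbooks and libraries give slightly different Q1 and Q3. This sketch assumes the "median of each half" method (the overall median is left out of both halves when n is odd), which may not match the method your course or `statistics.quantiles` uses. The data are hypothetical.

```python
from statistics import median

def quartiles(values):
    """Q1, Q2, Q3 by the median-of-each-half method (assumed method)."""
    s = sorted(values)
    n = len(s)
    if n < 2:
        raise ValueError("need at least two values")
    lower = s[: n // 2]          # values below the median
    upper = s[(n + 1) // 2 :]    # values above the median
    return median(lower), median(s), median(upper)

def iqr(values):
    """IQR = Q3 - Q1, the spread of the middle 50% of the data."""
    q1, _, q3 = quartiles(values)
    return q3 - q1

data = [5, 7, 8, 12, 13, 14, 18, 21, 23]
print(quartiles(data))  # (7.5, 13, 19.5)
print(iqr(data))        # 12.0
```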
an extremely high or an extremely low data value when compared with the rest of the data values
Outlier
relatively less affected by outliers than a nonresistant statistic
resistant statistic
when a distribution is skewed or contains outliers, ______ may more accurately summarize the data than traditional statistics
EDA - Exploratory Data Analysis
A ________ can be used to graphically represent the data set. These plots involve five specific values called the five-number summary of the data set
boxplot
a graph that shows some of the most important statistics in the data set, specifically:
- the median (central tendency)
- P25 and P75 (location and variation)
- some extreme values (outliers)
Boxplot
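A sketch of the five-number summary a boxplot is drawn from (minimum, Q1, median, Q3, maximum). Quartiles use the assumed median-of-halves method, and outliers are flagged with the 1.5 × IQR fences (Tukey's rule), a common convention that these cards do not state. `five_number_summary` and `flag_outliers` are hypothetical names; the data are made up.

```python
from statistics import median

def five_number_summary(values):
    """(minimum, Q1, median, Q3, maximum), quartiles by the median-of-halves method."""
    s = sorted(values)
    n = len(s)
    if n < 2:
        raise ValueError("need at least two values")
    q1 = median(s[: n // 2])
    q3 = median(s[(n + 1) // 2 :])
    return s[0], q1, median(s), q3, s[-1]

def flag_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] are flagged as possible outliers."""
    _, q1, _, q3, _ = five_number_summary(values)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [x for x in values if x < low or x > high]

data = [5, 7, 8, 12, 13, 14, 18, 21, 60]
print(five_number_summary(data))  # (5, 7.5, 13, 19.5, 60)
print(flag_outliers(data))        # [60], above the upper fence 19.5 + 1.5 * 12 = 37.5
```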
it is a very versatile graph for showing distributions, comparisons and associations between variables
Boxplot