MMW FINALS Flashcards
Arrangement of raw data into class intervals and their frequencies
Frequency Distribution Table
either nominal or ordinal level of measurement
Qualitative Data
Vertical bars that have no gaps between them because adjacent bars meet at the class boundaries
Histogram
Five-number summary: minimum, lower quartile, median, upper quartile, maximum
Box-and-Whisker Plot
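A minimal Python sketch of the five-number summary that a box-and-whisker plot is drawn from. The function name `five_number_summary` is hypothetical, and the quartiles come from `statistics.quantiles` (Python 3.8+), whose default "exclusive" method is only one of several quartile conventions; a textbook rule may give slightly different Q1 and Q3.

```python
from statistics import median, quantiles


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    values = sorted(data)
    # quantiles(..., n=4) returns the three cut points Q1, Q2 and Q3.
    q1, _, q3 = quantiles(values, n=4)
    return (values[0], q1, median(values), q3, values[-1])


print(five_number_summary([7, 1, 9, 3, 5, 2, 8, 4, 6]))
# (1, 2.5, 5, 7.5, 9)
```

The box spans Q1 to Q3 with a line at the median, and the whiskers reach toward the minimum and maximum.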
Used when there are extreme values in the dataset
Outliers
Outliers lie beyond the range of typical values and can be determined by using:
lower quartile − 1.5 × IQR
upper quartile + 1.5 × IQR
presents the score values and their frequency of occurrence. When presented in a table, the score values are listed in rank order, with the lowest score value usually at the bottom of the table.
frequency distribution
The steps for constructing a frequency distribution of grouped scores are as follows:
- Find the range of the scores: $R = \text{Highest Score} - \text{Lowest Score}$
- Determine the tentative number of classes (K): $K = 1 + 3.322 \log_{10} N$, where N is the number of scores.
- Determine the width of each class interval (i): $i = \frac{R}{K}$
- List the intervals, placing the interval containing the lowest score value at the bottom.
- Tally the raw scores into the appropriate class intervals.
- Add the tallies for each interval to obtain the interval frequency.
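The steps above can be sketched in Python as follows. The function name is hypothetical, and three details the steps leave open are assumptions: K is rounded to the nearest whole number, i is rounded up to a whole number, and the first interval starts at the lowest score.

```python
import math


def frequency_distribution(scores):
    """Group whole-number scores into class intervals (lowest interval last)."""
    n = len(scores)
    low, high = min(scores), max(scores)
    r = high - low                            # range
    k = round(1 + 3.322 * math.log10(n))      # tentative number of classes
    i = max(1, math.ceil(r / k))              # class width (at least 1)

    # Build intervals upward until the highest score is covered.
    intervals = []
    start = low
    while start <= high:
        intervals.append((start, start + i - 1))
        start += i

    # Tally each score into its interval and count the tallies.
    table = [(lo, hi, sum(lo <= s <= hi for s in scores)) for lo, hi in intervals]
    return table[::-1]                        # lowest interval at the bottom


scores = [12, 15, 22, 25, 31, 34, 38, 41, 45, 47]   # hypothetical raw scores
for lo, hi, f in frequency_distribution(scores):
    print(f"{lo}-{hi}: {f}")
```

For these hypothetical scores, N = 10 gives K = 4 and i = 9, so the table lists 39-47 (3), 30-38 (3), 21-29 (2) and 12-20 (2). Building intervals until the highest score is covered avoids dropping it when R divides evenly by K.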
Displays data by using bars of equal width on a grid. The bars may be vertical or horizontal. Bar graphs are used for comparisons.
Displays a bar for each category, with the length of each bar representing the frequency of that category.
Bar Graph
A bar graph with the bars ordered from highest to lowest frequency.
Pareto Chart
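Ordering the bars for a Pareto chart is just a sort by frequency, highest first. A minimal sketch; the function name and the category counts are hypothetical.

```python
def pareto_order(frequencies):
    """Return (category, frequency) pairs from highest to lowest frequency."""
    return sorted(frequencies.items(), key=lambda item: item[1], reverse=True)


# Hypothetical counts per category
print(pareto_order({"Late": 5, "Absent": 12, "Tardy": 8}))
# [('Absent', 12), ('Tardy', 8), ('Late', 5)]
```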
Used to show how data represent portions of one whole or one group.
Circle Graph (Pie Chart)
Notice that each sector is represented by a percentage (%).
Circle Graph (Pie Chart)
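Each sector's percentage is its frequency divided by the total, times 100. The card mentions only percentages; the angle column below uses the fact that a full circle is 360°. The function name and the data are hypothetical.

```python
def pie_sectors(frequencies):
    """Map each category to (percentage, central angle in degrees)."""
    total = sum(frequencies.values())
    return {
        category: (100 * f / total, 360 * f / total)
        for category, f in frequencies.items()
    }


print(pie_sectors({"Rice": 10, "Bread": 5, "Noodles": 5}))
# {'Rice': (50.0, 180.0), 'Bread': (25.0, 90.0), 'Noodles': (25.0, 90.0)}
```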
Data points joined by line segments to show trends over time.
Broken Line Graph
A line graph in which points on the line between the plotted points also have meaning. Sometimes, this is a “best fit” graph, where a straight line is drawn to fit the data points.
Continuous Line Graph
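One common way to draw the "best fit" straight line is the least-squares method; the card does not name a method, so least squares is an assumption here. A minimal sketch with a hypothetical function name and data:

```python
def best_fit_line(xs, ys):
    """Least-squares line y = a + b*x through the points (xs[k], ys[k])."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    a = sum_y / n - b * sum_x / n                                  # intercept
    return a, b


a, b = best_fit_line([1, 2, 3], [1, 2, 2])
print(f"y = {a:.3f} + {b:.3f}x")
# y = 0.667 + 0.500x
```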
Notice that the independent variable is on the x-axis and the dependent variable is on the y-axis.
Continuous Line Graph
Uses pictures and symbols to display data; each picture or symbol can represent more than one object; a key tells what each picture represents.
Pictograph
A graph of data that is a set of points.
Scatter Plot
IQR (Interquartile Range) = Q3 − Q1
Q1 – Lower Quartile
Q3 – Upper Quartile
A value that “lies outside” (is much smaller or larger than) most of the other values in a set of data
outlier
One way to determine if a data point is an outlier is to use the interquartile range (IQR) method.
Lower Boundary: Q1 − 1.5 × IQR
Upper Boundary: Q3 + 1.5 × IQR
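The IQR method can be sketched in a few lines of Python. The function name and data are hypothetical, and the quartiles use the default "exclusive" method of `statistics.quantiles`; a hand-computed textbook quartile may differ slightly.

```python
from statistics import quantiles


def iqr_outliers(data):
    """Return (lower boundary, upper boundary, outliers) using the IQR method."""
    q1, _, q3 = quantiles(data, n=4)   # "exclusive" method; conventions vary
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in data if x < lower or x > upper]
    return lower, upper, outliers


print(iqr_outliers([2, 3, 4, 5, 6, 7, 8, 9, 40]))
# (-4.0, 16.0, [40])
```

Here Q1 = 3.5 and Q3 = 8.5, so IQR = 5 and only 40 falls outside the boundaries.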
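formula Variance
$S^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Where:
x – scores
$\bar{x}$ – mean
n – number of samples
formula Standard Deviation
$S = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Where:
x – scores
$\bar{x}$ – mean
n – number of samples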
The variance can be found by following these four steps:
- Find the mean.
- Subtract the mean from each of the samples/observations.
- Square these deviations from the mean.
- Take the average of these squared deviations.
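The four steps map directly onto code. This sketch divides by n − 1, the usual sample formula (an assumption here; dividing by n instead gives the population variance). Function names and data are hypothetical.

```python
import math


def sample_variance(xs):
    """Sample variance following the four steps, dividing by n - 1."""
    n = len(xs)
    mean = sum(xs) / n                        # 1. find the mean
    deviations = [x - mean for x in xs]       # 2. subtract the mean
    squared = [d ** 2 for d in deviations]    # 3. square the deviations
    return sum(squared) / (n - 1)             # 4. "average" them over n - 1


def sample_std_dev(xs):
    """Standard deviation is the square root of the variance."""
    return math.sqrt(sample_variance(xs))


data = [2, 4, 6, 8, 10]                  # five hypothetical observations
print(sample_variance(data))             # 10.0
print(round(sample_std_dev(data), 4))    # 3.1623
```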
These are unit-less and are used when one wishes to compare the scatter of one distribution with another distribution.
Measures of Relative Dispersion
It measures how many standard deviations a value is above or below the mean.
Standard Score
It is computed as $z = \frac{x - \mu}{\sigma}$
and the sample counterpart is $z = \frac{x - \bar{x}}{S}$
Standard Score
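A minimal sketch of the standard score in both forms; the function name, the mean of 75 and standard deviation of 5, and the sample are hypothetical. The sample form uses `statistics.mean` and `statistics.stdev` (sample standard deviation).

```python
from statistics import mean, stdev


def z_score(x, center, spread):
    """How many standard deviations x lies above (+) or below (-) the center."""
    return (x - center) / spread


# Population form: z = (x - mu) / sigma, with hypothetical mu = 75, sigma = 5
print(z_score(85, 75, 5))   # 2.0
print(z_score(70, 75, 5))   # -1.0

# Sample form: z = (x - x_bar) / S, using the sample mean and standard deviation
sample = [2, 4, 6, 8, 10]
print(round(z_score(10, mean(sample), stdev(sample)), 4))  # 1.2649
```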
occurs when the values of variables appear at regular frequencies and often the mean, median, and mode all occur at the same point. If a line were drawn dissecting the middle of the graph, it would reveal two sides that mirror one another.
symmetrical distribution
lack of symmetry; can be a right-skewed distribution or a left-skewed distribution
asymmetric distribution
is a measure of how asymmetric the distribution of data is about the mean.
Skewness
is a method developed by Karl Pearson to find skewness in a sample using descriptive statistics like the mean and mode.
Pearson Coefficient of skewness
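A sketch of Pearson's mode-based coefficient, (mean − mode) / standard deviation. Using the sample standard deviation (`statistics.stdev`) is an assumption, and `statistics.mode` returns the first mode it meets if the data have several. The function name and data are hypothetical.

```python
from statistics import mean, mode, stdev


def pearson_mode_skewness(data):
    """Pearson's mode skewness: (mean - mode) / standard deviation."""
    return (mean(data) - mode(data)) / stdev(data)


data = [1, 2, 2, 2, 3, 4, 7]        # hypothetical scores
print(pearson_mode_skewness(data))  # 0.5
```

A positive result suggests a right (positively) skewed distribution, a negative result a left (negatively) skewed one, and a result near zero a roughly symmetrical one.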
Symmetrical distribution occurs when the values of variables occur at regular frequencies and the mean, median, and mode occur at the same point
Symmetrical distribution
is a type of distribution in which most values are clustered toward the left of the distribution while the right tail of the distribution is longer
Positively skewed (or right-skewed) distribution
is a type of distribution in which more values are concentrated on the right side (tail) of the distribution graph while the left tail of the distribution graph is longer.
Negatively skewed
is a measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution
Kurtosis
Data sets with high kurtosis tend to have heavy tails, or outliers.
Data sets with low kurtosis tend to have light tails, or a lack of outliers.
indicates a positive excess kurtosis. The leptokurtic distribution shows heavy tails on either side, indicating large outliers.
Leptokurtic
shows a negative excess kurtosis. The platykurtic distribution reveals a distribution with flat tails.
Platykurtic
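Excess kurtosis can be estimated from the moments of the data. This sketch uses the simple population moment formula m4 / m2² − 3 (an assumption; statistics software often applies a bias correction), and "mesokurtic" is the usual name for the zero case, which the cards do not mention. Function names and data are hypothetical.

```python
def excess_kurtosis(data):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (population formula)."""
    n = len(data)
    avg = sum(data) / n
    m2 = sum((x - avg) ** 2 for x in data) / n   # second central moment
    m4 = sum((x - avg) ** 4 for x in data) / n   # fourth central moment
    return m4 / m2 ** 2 - 3


def kurtosis_type(excess):
    if excess > 0:
        return "leptokurtic"   # heavier tails than the normal curve
    if excess < 0:
        return "platykurtic"   # lighter, flatter tails
    return "mesokurtic"        # same tailedness as the normal curve


k = excess_kurtosis([1, 2, 3, 4, 5])
print(round(k, 2), kurtosis_type(k))  # -1.3 platykurtic
```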
The characteristic of a frequency distribution that ascertains its symmetry about the mean is called ___.
skewness
means the relative pointedness of the standard bell curve, defined by the frequency distribution.
Kurtosis
is a measure of the degree of lopsidedness in the frequency distribution.
Skewness
is a measure of the degree of tailedness in the frequency distribution.
kurtosis
is an indicator of lack of symmetry, i.e., the left and right sides of the curve are unequal with respect to the central point.
Skewness
As against this, ___ is a measure of whether the data are peaked or flat with respect to the probability distribution.
kurtosis
Continuous probability distribution
* Uses interval or ratio level of measurement
Normal Distribution (z-distribution)
Normal Distribution is a unique arrangement of values in that, if the values are graphed, the curve takes a distinct bell-shaped and symmetrical form.
Normal Distribution
CHARACTERISTICS OF NORMAL DISTRIBUTION
- The curve is continuous.
- The curve is bell-shaped.
- The curve is symmetrical about the mean.
- The mean, median, and mode are located at the center of the distribution and are equal to each other.
- The curve is unimodal.
- The curve never touches the x-axis (it is asymptotic to the x-axis).
- The total area under the normal curve is equal to 1.
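Several of these characteristics can be checked numerically with `statistics.NormalDist` (Python 3.8+). The mean of 75 and standard deviation of 5 are hypothetical, and "total area" is approximated by the area within 10 standard deviations of the mean.

```python
from statistics import NormalDist

# Hypothetical normally distributed scores with mean 75 and standard deviation 5
scores = NormalDist(mu=75, sigma=5)

# Symmetry about the mean: equal heights at equal distances on either side
print(scores.pdf(70) == scores.pdf(80))            # True

# The mean is also the median: half of the area lies below it
print(scores.cdf(75))                              # 0.5

# Area within one standard deviation of the mean (about 68%)
print(round(scores.cdf(80) - scores.cdf(70), 4))   # 0.6827

# Total area under the curve is (numerically) 1
print(round(scores.cdf(125) - scores.cdf(25), 6))  # 1.0
```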
Is a point in the distribution below which a given percentage of cases falls.
Is a measure of relative standing.
It is a descriptive measure of the relationship of a measurement to the rest of the data.
PERCENTILES
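A sketch of both directions: the P-th percentile (the value below which about P% of the scores fall) and the percentile rank of a given score. Function names and scores are hypothetical; `statistics.quantiles(..., n=100)` uses its default "exclusive" method, and counting only scores strictly below x is one of several percentile-rank conventions.

```python
from statistics import quantiles


def percentile(scores, p):
    """Value below which about p% of the scores fall (1 <= p <= 99)."""
    # quantiles(..., n=100) returns the 99 cut points P1..P99
    return quantiles(scores, n=100)[p - 1]


def percentile_rank(scores, x):
    """Percentage of scores strictly below x."""
    return 100 * sum(1 for s in scores if s < x) / len(scores)


scores = list(range(1, 21))          # hypothetical scores 1..20
print(percentile(scores, 75))        # 15.75
print(percentile_rank(scores, 16))   # 75.0
```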
He was an influential English mathematician and biostatistician.
Karl Pearson (1857-1936)
It is a statistical method used to determine whether a relationship between two variables exists.
It is also a measure of the direction and strength of the linear relationship between two variables.
Direction may be positive, negative, or zero.
Correlation
3 Types of Correlation
Positive correlation
Negative correlation
Zero correlation
exists when high values of one variable correspond to high values in the other variable or low values in one variable correspond to low values in the other variable.
positive correlation
exists when high values of one variable correspond to low values in the other variable or low values in one variable correspond to high values in the other variable.
negative correlation
exists when high values in one variable correspond to either high or low values in the other variable.
zero correlation
can be perfect, strong or high, moderate, low, zero or no correlation.
Strength
A ___ is used to show how the points collected from a set of bivariate data are scattered on the Cartesian plane
scatter plot (or scatter diagram)
It gives a good visual picture of the relationship between the two variables.
It is a graphical representation of the relationship between two variables.
scatter plot
The most widely used measure in statistics of the degree of relationship between linearly related variables.
Pearson Product-Moment Correlation
The ___ requires both variables to be normally distributed
Pearson r correlation
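A sketch of Pearson r using the common computational formula $r = \frac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$; the formula is standard but not written out in these cards, and the function name and paired scores are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient (computational formula)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


x = [1, 2, 3, 4, 5]   # hypothetical paired scores
y = [2, 1, 4, 3, 5]
print(pearson_r(x, y))   # 0.8
```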
Plots the observed values against their frequency
histogram
Plots the observed values against the values expected if the data were normally distributed
q-q probability plots
If the data are normally distributed, the points in a q-q plot will lie approximately on a straight diagonal line
q-q probability plots
are typically not very useful when the sample size is small.
Graphical methods
One of the most popular tests for normality assumption diagnostics; it has good power properties and is based on the correlation between the given observations and their associated normal scores
Shapiro-Wilk Test
(Numerical method)
The sample data follows a normal distribution
Ho: null hypothesis
The sample data does not follow a normal distribution.
Ha: alternative hypothesis
When testing normality:
- If the p-value is greater than alpha, we fail to reject Ho, so the data can be treated as normal.
- If the p-value is less than alpha, we reject Ho, so the data are NOT normal.
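The decision rule itself is a simple comparison. In practice the p-value would come from a test such as `scipy.stats.shapiro`, which returns the test statistic and the p-value; this stdlib-only sketch takes the p-value as given. The function name is hypothetical, and treating p equal to alpha as "reject" is an assumption, since the cards cover only "greater than" and "less than".

```python
def normality_decision(p_value, alpha=0.05):
    """Apply the p-value rule for a normality test such as Shapiro-Wilk."""
    if p_value > alpha:
        return "fail to reject Ho: the data can be treated as normal"
    # p_value <= alpha (ties treated as reject here)
    return "reject Ho: the data are not normal"


print(normality_decision(0.32))  # fail to reject Ho: the data can be treated as normal
print(normality_decision(0.01))  # reject Ho: the data are not normal
```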
correlation coefficient and strength of relationships
0.00 no correlation, no relationship
±0.01 to ±0.20 very low correlation, almost negligible relationship
±0.21 to ±0.40 slight correlation, definite but small relationship
±0.41 to ±0.70 moderate correlation, substantial relationship
±0.71 to ±0.90 high correlation, marked relationship
±0.91 to ±0.99 very high correlation, very dependable relationship
±1.00 perfect correlation, perfect relationship
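The table can be turned into a lookup on the absolute value of r (the sign only gives the direction). Rounding to two decimals first, so values such as 0.205 land in a row, is an assumption; the function name is hypothetical.

```python
def interpret_r(r):
    """Describe the strength of a correlation coefficient using the table."""
    a = round(abs(r), 2)   # the table works with two decimal places
    if a == 0:
        return "no correlation, no relationship"
    if a <= 0.20:
        return "very low correlation, almost negligible relationship"
    if a <= 0.40:
        return "slight correlation, definite but small relationship"
    if a <= 0.70:
        return "moderate correlation, substantial relationship"
    if a <= 0.90:
        return "high correlation, marked relationship"
    if a < 1.00:
        return "very high correlation, very dependable relationship"
    return "perfect correlation, perfect relationship"


print(interpret_r(0.8))    # high correlation, marked relationship
print(interpret_r(-0.35))  # slight correlation, definite but small relationship
```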
To identify if there is a significant relationship between two sets of scores:
- Step 1. State the null and alternative hypotheses.
- Step 2. Determine the value of alpha.
- Step 3. Identify the test statistic.
- Step 4. Determine the degrees of freedom, the computed t-value, and the critical t-value.
- Step 5. If the computed t is greater than or equal to the critical value of t, reject the null hypothesis. If the computed t is less than the critical value of t, accept (fail to reject) the null hypothesis.
- Step 6. Formulate your conclusion and interpretation.
t-test
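Steps 4 and 5 can be sketched for the usual significance test of Pearson r, $t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$ with $df = n - 2$. That formula is standard but is an assumption here, since the cards do not spell out the test statistic. The critical t still comes from a t table: 3.182 is the table value for df = 3 at two-tailed alpha = 0.05. Comparing |t| (so a negative r is handled) is also an assumption beyond Step 5's wording. Function names are hypothetical.

```python
import math


def t_for_correlation(r, n):
    """Computed t and degrees of freedom for testing whether r differs from 0."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    return t, df


def is_significant(t, critical_t):
    """Step 5: reject Ho when the computed t reaches the critical t."""
    return abs(t) >= critical_t


t, df = t_for_correlation(0.8, 5)      # r = 0.8 from five hypothetical pairs
print(round(t, 3), df)                 # 2.309 3
print(is_significant(t, 3.182))        # False, so Ho is not rejected
```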
This tells us how much of the variation in the dependent variable (y) is due to, or can be attributed to, the independent variable (x).
This is denoted as $r^2$.
Coefficient of Determination
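The coefficient of determination is simply r squared, often read as a percentage. A minimal sketch with a hypothetical function name and r = 0.8:

```python
def coefficient_of_determination(r):
    """Share of the variation in y that can be attributed to x."""
    return r ** 2


r2 = coefficient_of_determination(0.8)
print(f"{r2:.0%}")  # 64%
```

So with r = 0.8, about 64% of the variation in y can be attributed to x.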
is one type of fee paid for the use of money
Simple interest
the percentage charged or earned
rate of interest
the amount of money borrowed or invested
principal
the length of time the money is borrowed or invested (in years)
time
amount formula
P + I = A
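A sketch of simple interest using the standard formula I = P × r × t (not written out on these cards, so treat it as an added assumption), followed by the amount formula A = P + I. The function name and loan values are hypothetical.

```python
def simple_interest(principal, rate, years):
    """I = P * r * t, with the rate written as a decimal (5% -> 0.05)."""
    return principal * rate * years


P, r, t = 10_000, 0.05, 3          # hypothetical loan
I = simple_interest(P, r, t)
A = P + I                          # amount formula: A = P + I
print(round(I, 2), round(A, 2))    # 1500.0 11500.0
```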
formula for Rate of Interest
P + I = A
after that, subtract P from both sides to get I = A − P; then $r = \frac{I}{Pt}$, where t is the time in years
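A sketch of solving for the rate: rearrange P + I = A to get the interest, then divide by P × t (this step assumes the standard simple-interest relation I = Prt). The function name and values are hypothetical.

```python
def rate_of_interest(principal, amount, years):
    """r = I / (P * t), where I = A - P comes from rearranging P + I = A."""
    interest = amount - principal
    return interest / (principal * years)


r = rate_of_interest(10_000, 11_500, 3)   # hypothetical values
print(f"{r:.2%}")                         # 5.00%
```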
It is calculated on the principal amount and also on the accumulated interest of previous periods, and can thus be regarded as “interest on interest”.
Compound Interest
compounding frequency: number of compounding periods per year
annually: 1
semi-annually: 2
quarterly: 4
monthly: 12
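The periods in the table plug into the standard compound amount formula $A = P\left(1 + \frac{r}{n}\right)^{nt}$, where n is the number of compounding periods per year. The formula is standard but not stated on these cards, so treat it as an added assumption; the dictionary name, function name and loan values are hypothetical.

```python
PERIODS_PER_YEAR = {"annually": 1, "semi-annually": 2, "quarterly": 4, "monthly": 12}


def compound_amount(principal, rate, years, frequency):
    """A = P * (1 + r/n) ** (n * t), with n taken from the table."""
    n = PERIODS_PER_YEAR[frequency]
    return principal * (1 + rate / n) ** (n * years)


for freq in PERIODS_PER_YEAR:
    # hypothetical: P = 10,000 at 6% per year for 2 years
    print(freq, round(compound_amount(10_000, 0.06, 2, freq), 2))
```

With these values the amount is 11,236.00 when compounded annually and grows slightly as compounding becomes more frequent, because interest is earned on previously accumulated interest more often.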