descriptive statistics lec Flashcards
Observation of one variable may be shown visually by putting the variable’s values on one axis and the frequency on the other.
Visual Presentation of Data
A bar graph wherein the number of units observed is on the y-axis (_____) while the measurement levels are on the _____
frequency; x-axis; histogram
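A minimal sketch of the idea behind a histogram, assuming a small set of hypothetical observations (the data and variable names are illustrative, not from the lecture). It counts how often each measurement level occurs and prints one text bar per level, so the bar length plays the role of the frequency axis.

```python
from collections import Counter

# Hypothetical observations: number of clinic visits per patient.
visits = [1, 2, 2, 3, 3, 3, 4, 4, 5]

# Frequency (y-axis) of each measurement level (x-axis).
frequency = Counter(visits)

for level in sorted(frequency):
    # One text "bar" per level; its length is proportional to the frequency.
    print(f"{level}: {'#' * frequency[level]}")
```

A plotting library would draw the same counts as bars; the counting step is the part that matters here.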
In a histogram, the bars are...
The bars are visually proportional to each other.
A figure that presents a histogram in shorthand.
Frequency Polygon
A ___ is placed at the center of the top of each bar, and these are connected to form a polygon. This better enunciates the shape of the data.
dot
Basic graphs that can illustrate one or more data sets in one graph.
Line Graph
Two types of line graph and their differences
● Arithmetic line graph
○ Has both the x- and y-axes on an arithmetic scale.
○ Both values are numerical.
● Semi-logarithmic line graph
○ Has the y-axis as a logarithmic axis.
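A minimal sketch of why a logarithmic y-axis can help, assuming hypothetical counts that grow tenfold at each step (the data and names are illustrative, not from the lecture). On an arithmetic scale the steps between values keep growing; on a logarithmic scale the same values rise in roughly equal steps, which a semi-logarithmic graph would show as a straight line.

```python
import math

# Hypothetical counts that grow tenfold each period.
counts = [10, 100, 1000, 10000]

# Positions on a logarithmic y-axis are the base-10 logarithms of the values.
log_positions = [math.log10(c) for c in counts]

# Step sizes between consecutive points on each kind of axis.
arithmetic_steps = [b - a for a, b in zip(counts, counts[1:])]
log_steps = [b - a for a, b in zip(log_positions, log_positions[1:])]

print(arithmetic_steps)  # unequal steps: 90, 900, 9000
print(log_steps)         # roughly equal steps of about 1
```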
Parameters of a Frequency Distribution
central tendency and dispersion
Frequency distributions from continuous data are defined by types of descriptors, known as _____.
parameters
● Defined as the value used to represent the center or the middle (average) of a set of data values.
● Locates observations on a measurement scale.
Central Tendency
● Describes the spread of values in a given data set.
● Suggests how widely spread out the observations are.
Dispersion
Dispersion prefers…
Low values: a low variance and a low standard deviation mean the data are not spread out, so the results are not far from each other.
Measures of Central Tendency
mean, median, mode
Average value, or the sum (Σ) of all the observed values (𝑥𝑖) divided by the number of observations (N).
mean
Has the most mathematical properties and is the most representative of the dataset, unless outliers are present.
mean
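A minimal sketch of the mean and its sensitivity to outliers, assuming hypothetical scores (the data and variable names are illustrative). It computes the mean by hand as sum divided by count, checks it against `statistics.mean`, and then shows how a single extreme value shifts the result.

```python
from statistics import mean

# Hypothetical exam scores.
scores = [4, 5, 5, 6, 7]

# Mean = sum of all observed values divided by the number of observations.
mean_manual = sum(scores) / len(scores)

# Adding one extreme value (an outlier) pulls the mean away from the bulk.
scores_with_outlier = scores + [51]
mean_with_outlier = mean(scores_with_outlier)

print(mean_manual, mean(scores), mean_with_outlier)
```

Here one outlier moves the mean from 5.4 to 13, a value larger than every original score.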
The middle observation when the data have been arranged from ______. When the dataset has an even number of observations (hence no natural middle point), the two middle values are averaged to find the median.
highest to lowest, median
Rarely used to draw inferential conclusions, but frequently used in healthcare and economics.
median
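A minimal sketch of finding the median for odd and even numbers of observations, assuming hypothetical data (`median_manual` is an illustrative helper, not a library function). The values are sorted first; sorting in either direction gives the same middle.

```python
from statistics import median

def median_manual(values):
    """Return the middle value, averaging the two middle values if needed."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: there is a natural middle observation.
        return ordered[mid]
    # Even count: average the two middle observations.
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_data = [7, 1, 5, 3, 9]   # sorted: 1, 3, 5, 7, 9
even_data = [7, 1, 5, 3]     # sorted: 1, 3, 5, 7

print(median_manual(odd_data), median(odd_data))
print(median_manual(even_data), median(even_data))
```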
Most commonly observed value
(the value most frequently
observed).
mode
The downside to using the mode
a set of data may have no mode, or it may have more than one mode.
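A minimal sketch of that downside, assuming hypothetical data. The helper `modes` is illustrative and adopts one possible convention: when every value occurs exactly once, it reports no mode by returning an empty list.

```python
from collections import Counter

def modes(values):
    """Return all most-frequent values, or [] if no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1 and len(counts) > 1:
        # Assumed convention: all values equally rare means "no mode".
        return []
    return [value for value, count in counts.items() if count == top]

print(modes([4, 4, 5]))        # one mode
print(modes([1, 2, 2, 3, 3]))  # more than one mode
print(modes([1, 2, 3]))        # no mode under the convention above
```

The standard library's `statistics.multimode` follows a different convention and returns every value when all are tied.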
Measures of Dispersion
Variance and standard deviation (SD), mean deviation
A statistical measurement of the
spread between numbers in a data
set.
● It measures how far each number
in the set is from the mean (average), and thus from every other number in the set.
Variance
formula for variance
You already know this one.
Sample, degrees of freedom
N-1
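A minimal sketch of the sample variance, assuming hypothetical data (`sample_variance` is an illustrative helper). It uses the standard formula: the sum of squared deviations from the mean, divided by N-1 for a sample. The standard library's `statistics.variance` uses the same N-1 divisor, while `statistics.pvariance` divides by N.

```python
from statistics import pvariance, variance

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by N - 1."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / (n - 1)

# Hypothetical data with mean 5 and squared deviations summing to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(sample_variance(data))  # 32 / 7
print(variance(data))         # same N - 1 divisor
print(pvariance(data))        # population version: 32 / 8
```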
Average amount of variability in
your dataset.
● It tells you, on average, how far
each value lies from the mean.
Standard Deviation
A high standard deviation means that values are generally far from the mean;
a low standard deviation means that values are clustered close to the mean.
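A minimal sketch comparing two hypothetical datasets that share the same mean but differ in spread (the data and names are illustrative). `statistics.stdev` returns the sample standard deviation, which is the square root of the sample variance.

```python
from statistics import mean, stdev

# Both hypothetical datasets have a mean of 10.
clustered = [9, 10, 10, 11]
spread_out = [1, 5, 15, 19]

sd_clustered = stdev(clustered)    # values sit close to the mean
sd_spread_out = stdev(spread_out)  # values sit far from the mean

print(mean(clustered), mean(spread_out))
print(sd_clustered, sd_spread_out)
```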
The difference between the observed value of a data point and the expected value is known as deviation in statistics.
Mean Deviation
The average absolute deviation of the data points from the mean, median, or mode of the data set.
mean deviation or mean absolute deviation
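A minimal sketch of the mean absolute deviation, assuming hypothetical data (`mean_absolute_deviation` is an illustrative helper). The center is passed in explicitly so the same function works around the mean or the median.

```python
from statistics import mean, median

def mean_absolute_deviation(values, center):
    """Average of the absolute distances between each value and the center."""
    return sum(abs(x - center) for x in values) / len(values)

# Hypothetical data: mean is 5, median is 4.5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mad_about_mean = mean_absolute_deviation(data, mean(data))
mad_about_median = mean_absolute_deviation(data, median(data))

print(mad_about_mean, mad_about_median)
```

Taking absolute values keeps positive and negative deviations from cancelling each other out.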
Values that split sorted data or a probability distribution into equal parts.
Quantiles
A statistical term that
describes a division of observations into four
defined intervals based on the values of the data and how they compare to the entire set of observations.
Quartiles
A type of quantile, obtained by subdividing the data into 100 groups.
Percentiles
Calculated by dividing an ordered set of data into 100 equal parts.
percentiles
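A minimal sketch of quartiles and percentiles using `statistics.quantiles` (Python 3.8+), assuming hypothetical data. The function returns the cut points between groups, so 4 groups give 3 quartile cut points and 100 groups give 99 percentile cut points. Different interpolation methods exist, so other tools may report slightly different values for the same data.

```python
from statistics import median, quantiles

# Hypothetical sorted measurements (11 observations).
data = [12, 15, 17, 20, 22, 25, 28, 30, 33, 35, 40]

# Quartiles: 3 cut points that split the data into 4 groups.
quartiles = quantiles(data, n=4)

# Percentiles: 99 cut points that split the data into 100 groups.
percentiles = quantiles(data, n=100)

print(quartiles)
print(percentiles[49], median(data))  # the 50th percentile is the median
```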
Difference between the highest and lowest values.
○ Size of the narrowest interval which contains all the data.
range
○ Difference between the third and the first quartile.
○ Size of the interval which contains the central 50% of the data.
Interquartile Range
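A minimal sketch contrasting the range with the interquartile range, assuming hypothetical data and the default method of `statistics.quantiles` (other quartile methods may give slightly different results).

```python
from statistics import quantiles

# Hypothetical measurements.
data = [12, 15, 17, 20, 22, 25, 28, 30, 33, 35, 40]

# Range: difference between the highest and lowest values.
data_range = max(data) - min(data)

# Interquartile range: difference between the third and first quartiles.
q1, _, q3 = quantiles(data, n=4)
iqr = q3 - q1

print(data_range, iqr)
```

The range depends only on the two most extreme values, while the interquartile range ignores the lowest and highest quarters of the data.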
A measure of the
asymmetry of a
distribution.
skewness
A distribution is
asymmetrical when
its left and right side are not mirror images.
T/F: A distribution can have right (or positive), left (or negative), or zero skewness.
true
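A minimal sketch of the three cases, assuming one common definition of skewness (the third central moment divided by the 1.5 power of the second central moment). The lecture does not give a formula, so this choice, the helper `skewness`, and the data are assumptions for illustration.

```python
def skewness(values):
    """Moment-based skewness: m3 / m2 ** 1.5 (assumed definition)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 10]                # long tail toward high values
left_skewed = [-x for x in right_skewed]       # mirror image of the above

print(skewness(symmetric))     # zero skewness
print(skewness(right_skewed))  # positive skewness
print(skewness(left_skewed))   # negative skewness
```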
other term for skewness
horizontal imbalance
A descriptive statistic used to help
measure how data disperse between a distribution’s center and tails, with larger values indicating a data distribution may have “heavy” tails that are thickly concentrated with observations or that are long with extreme observations.
Kurtosis
other term for Kurtosis
vertical imbalance
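A minimal sketch of kurtosis, assuming the moment-based definition (the fourth central moment divided by the square of the second central moment), under which a normal distribution scores 3. The lecture does not give a formula, so the definition, the helper `kurtosis`, and the data are assumptions for illustration.

```python
def kurtosis(values):
    """Moment-based kurtosis: m4 / m2 ** 2 (assumed definition)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - m) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2

# Evenly spread values with no extreme observations.
light_tails = [1, 2, 3, 4, 5]

# Mostly identical values plus two extreme observations in the tails.
heavy_tails = [3, 3, 3, 3, 3, 3, 3, 3, -7, 13]

print(kurtosis(light_tails))  # smaller value
print(kurtosis(heavy_tails))  # larger value, driven by the extremes
```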
Xi means
For each individual observation
The difference between the observed value of a data point and the expected value is known as
Deviation
Uses boxes and lines to depict the distributions of one or more groups of numeric data.
Box plot
Its boxes indicate the range of the central 50% of the data, with a central line marking the median value.
Box plot
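A minimal sketch of the numbers a basic box plot is typically drawn from, assuming hypothetical data and the default method of `statistics.quantiles` (`five_number_summary` is an illustrative helper). The box spans the first to third quartile, the central line sits at the median, and in the simplest form the lines extend to the minimum and maximum.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, q2, q3 = quantiles(values, n=4)
    return min(values), q1, q2, q3, max(values)

# Hypothetical measurements for one group.
data = [12, 15, 17, 20, 22, 25, 28, 30, 33, 35, 40]

summary = five_number_summary(data)
print(summary)
```

Many plotting tools shorten the lines and draw far-out points separately as outliers, so a rendered box plot may differ from this simplest form.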