Descriptive Statistics(2) Flashcards
variable
variable: a population characteristic
which takes on different values for
the elements comprising the
population
define:
population
sample
parameter
statistic
population: the total set of elements (objects, persons, regions,
neighbourhoods, rivers, etc.) under examination in a particular study
sample: a subset of the elements in a population, which is used to make
inferences about certain characteristics of the population as a whole
parameter: a quantity that defines a certain characteristic of a population
- if you take the average of an entire POPULATION, that average is a PARAMETER
statistic: a quantity that defines a certain characteristic of a sample
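A minimal Python sketch of the parameter/statistic distinction. The tree heights, the sample size of 4, and the fixed random seed are all made-up assumptions for illustration.

```python
import random
import statistics

# Hypothetical population: heights (cm) of all 10 trees in a small study plot.
population = [150, 162, 171, 158, 180, 167, 175, 169, 155, 173]

# Parameter: computed from every element of the population.
population_mean = statistics.mean(population)

# Sample: a subset of the population (here 4 elements chosen at random).
random.seed(0)  # fixed seed so the sketch is repeatable
sample = random.sample(population, 4)

# Statistic: computed from the sample only; it estimates the parameter.
sample_mean = statistics.mean(sample)

print("population mean (parameter):", population_mean)
print("sample mean (statistic):", sample_mean)
```

The sample mean will usually differ a little from the population mean, which is why it is treated as an estimate.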
two ways to present descriptive statistics (describe both)
tables are great for summarizing large quantities of complex data, but
can be a challenge to read and interpret
-frequency tables
graphs can be used for simpler datasets, and are easily interpreted by
the reader
Difference between a bar graph and a histogram:
-a bar graph is good for categorical (nominal/ordinal) data
-a bar graph has gaps between its bars because the categories are distinct
-a histogram has continuous data along the bottom axis, for interval and ratio data
-the boxplot is not used too often because it isn't nice to look at and doesn't convey much
natural break points
natural break points: places in the data where the frequency is 0 or low
Frequency Tables: 5 important things to note when making one
- use intervals with simple bounds
- respect natural breakpoints
- the intervals must not overlap and must include all observations
- all intervals should be the same width
- select an appropriate number of classes
• this is hardest to determine
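The rules above can be sketched as a small Python function. This is one possible approach, not a standard recipe: the function name `frequency_table`, the half-open intervals [lower, upper), and the rainfall data are assumptions made for the example.

```python
def frequency_table(values, start, width, n_classes):
    """Count observations in n_classes equal-width intervals [lower, upper)."""
    counts = [0] * n_classes
    for v in values:
        # Half-open intervals cannot overlap: each value maps to one index.
        index = int((v - start) // width)
        if not 0 <= index < n_classes:
            # The intervals must include all observations.
            raise ValueError(f"{v} falls outside the table; widen the intervals")
        counts[index] += 1
    return [
        (start + i * width, start + (i + 1) * width, counts[i])
        for i in range(n_classes)
    ]


# Hypothetical data: daily rainfall totals in mm.
rainfall = [0, 2, 3, 7, 8, 11, 12, 12, 14, 19, 21, 24]

# Simple bounds (multiples of 5), equal widths, 5 classes.
table = frequency_table(rainfall, start=0, width=5, n_classes=5)
for lower, upper, count in table:
    print(f"{lower:>3} to <{upper:<3} {count}")
```

Choosing `width` and `n_classes` is still a judgment call; the function only enforces that the intervals are equal, non-overlapping, and cover every observation.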
a histogram must have _____ breaks
equal
Line Graphs vs. Scatterplots
-a line graph is used for categorical data
-you CANNOT use the line to guess values between points, only to see the pattern
-a scatterplot is used for interval/ratio (continuous) data
The rose diagram is used for
o directional data, which has its own specific visual descriptive: the rose
diagram
-making the intervals of the wedges increase toward the outside makes more sense, because the wedges are bigger toward the outside
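Before a rose diagram can be drawn, the directional data has to be counted by direction sector. A minimal sketch of that counting step in Python, assuming eight 45-degree compass sectors and made-up wind bearings (neither is specified in the cards):

```python
SECTORS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def rose_counts(bearings_deg):
    """Count bearings (degrees clockwise from north) in eight 45-degree sectors."""
    counts = dict.fromkeys(SECTORS, 0)
    for b in bearings_deg:
        # Shift by half a sector so that "N" covers 337.5 to 22.5 degrees,
        # then wrap around with modulo so 350 and 10 land in the same wedge.
        index = int(((b + 22.5) % 360) // 45)
        counts[SECTORS[index]] += 1
    return counts


# Hypothetical wind bearings in degrees.
wind = [5, 350, 10, 44, 90, 100, 181, 270, 359, 225]
counts = rose_counts(wind)
print(counts)
```

Each count would then set the length of the matching wedge. The wrap-around at north is the part that ordinary histograms do not have to handle.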
CENTRAL TENDENCY:
define each one below
midrange
mode
median
mean
o midrange: the midpoint between the largest and smallest values of a variable
in the data set
-the midrange is strongly affected by extreme values
o mode: the most common/frequent value of a variable in the data
set
-what is the mode of a data set with no repeating values? No mode
-the midrange and mode are crude statistics, and often do not provide an accurate
measure of centrality
o median: the value of a variable that divides the ordered observations in half
o mean: the average value of a variable in the data set
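The four measures can be computed side by side in Python. This is a sketch on a made-up data set; the helper `mode_or_none` is a hypothetical name, written so that a data set with no repeating values reports no mode, as the card above says.

```python
import statistics
from collections import Counter


def mode_or_none(values):
    """Return the most frequent value, or None if no value repeats.

    If several values tie for most frequent, the first one seen is returned.
    """
    value, freq = Counter(values).most_common(1)[0]
    return value if freq > 1 else None


# Hypothetical data set with one extreme value (40).
data = [2, 3, 3, 5, 7, 10, 40]

midrange = (min(data) + max(data)) / 2  # (2 + 40) / 2
mode = mode_or_none(data)               # 3 appears twice
median = statistics.median(data)        # middle of the 7 ordered values
mean = statistics.mean(data)            # 70 / 7

print(midrange, mode, median, mean)
print(mode_or_none([1, 2, 3]))  # no repeating values
```

Here the extreme value 40 pulls the midrange up to 21 while the median stays at 5, which illustrates why the midrange is called strongly affected by extreme values.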
Arithmetic Mean:
population mean symbol
sample mean symbol
μ (Greek letter mu: looks like a u with an extended vertical line at the front)
x̄ (x-bar: an x with a horizontal line over top)
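For reference, the two means written out. The symbols N (population size), n (sample size), and x_i (the i-th value) are the usual textbook conventions and are assumed here, since the cards only name the mean symbols.

```latex
\mu = \frac{1}{N} \sum_{i=1}^{N} x_i
\qquad\qquad
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
```

The calculation is the same in both cases; only the set of values (whole population or sample) and the symbol change.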
Geometric Mean
notice that the
values are not evenly influenced (they are weighted differently), so we need a geometric mean
-note: the frequency-weighted average these cards call the "geometric mean" is usually called the weighted (arithmetic) mean; in standard usage the geometric mean is the nth root of the product of n values
Arithmetic vs. Geometric Mean
o the arithmetic mean is used when each data point has the same influence or
“weight” as all the other points
o the geometric mean (as used in these cards) is used when each data point has an associated frequency,
influence, or weight attached to it, such that some data points are more important
than others
in geometric mean the f(i) stands for the …
weight of the value
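The weighting idea in the cards above can be sketched in Python. The formula assumed here is sum(f(i) * x(i)) / sum(f(i)), which is conventionally called the weighted (arithmetic) mean; the function name `weighted_mean` and the grade data are hypothetical.

```python
def weighted_mean(values, weights):
    """Return sum(f_i * x_i) / sum(f_i) for values x_i with weights f_i."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(f * x for x, f in zip(values, weights)) / total_weight


# Hypothetical example: three grades, the last one counts double.
grades = [60, 80, 90]
weights = [1, 1, 2]

weighted = weighted_mean(grades, weights)  # (60 + 80 + 180) / 4
unweighted = sum(grades) / len(grades)     # every grade counts equally

print(weighted, unweighted)
```

With equal weights the function gives the same answer as the ordinary arithmetic mean, so the arithmetic mean can be seen as the special case where every f(i) is 1.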
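Ranges
ex: $0-10000
when the data are given as ranges, we need to start making assumptions: first, assume that the midpoint
of each range is a suitable stand-in for the values in that range (is it always?)
-then we can determine the geometric (weighted) mean from the midpoints and their frequencies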
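A short Python sketch of the midpoint assumption. The three income ranges and their frequencies are invented for illustration; only the $0-10000 range appears in the cards.

```python
# Hypothetical grouped data: ((lower, upper), frequency) for income in dollars.
income_classes = [
    ((0, 10000), 4),
    ((10000, 20000), 10),
    ((20000, 30000), 6),
]

# Assumption: every observation in a range sits at the range's midpoint.
midpoints = [(lower + upper) / 2 for (lower, upper), _ in income_classes]
frequencies = [freq for _, freq in income_classes]

# Frequency-weighted mean of the midpoints: sum(f_i * m_i) / sum(f_i).
grouped_mean = sum(f * m for m, f in zip(midpoints, frequencies)) / sum(frequencies)

print(midpoints)
print(grouped_mean)
```

The result is only an estimate: if the values inside a range cluster near one end, the midpoint misrepresents them, which is the point of the "is it always?" question.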
Which measure of central tendency is best?
o the centrality statistic should represent the typical value of the data set
o only the mean considers all of the values in the data set; the other statistics
only rely on specific values
o if you change any value in the set, the mean will also change
o usually, the mean is considered the best because of this property, but there
are some exceptions
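One commonly cited exception, offered here as an illustration rather than something the cards spell out, is a data set with an extreme value. The sketch below uses made-up incomes (in thousands) to show that changing one value changes the mean but can leave the median untouched.

```python
import statistics

# Hypothetical incomes in thousands of dollars.
incomes = [30, 32, 35, 38, 40]

# Same data, but the largest value is replaced by an extreme one.
with_outlier = [30, 32, 35, 38, 400]

mean_before = statistics.mean(incomes)         # 175 / 5
mean_after = statistics.mean(with_outlier)     # 535 / 5
median_before = statistics.median(incomes)
median_after = statistics.median(with_outlier)

print("mean:", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

After the change the mean (107) is larger than four of the five values, so it no longer represents the typical value, while the median still does.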