midterm 1 Flashcards
what are the 4 meanings of the word “statistics”?
1) data
2) functions of data (ex: mean and range)
3) techniques for collecting, analyzing and interpreting data for subsequent decision making
4) the science of creating and applying these techniques
4 categories of users of statistics
- people who need to be able to read/understand statistical presentations used in their field. *
- people who need to select, apply, and interpret statistical procedures in their work. *
- applied statisticians
- mathematical statisticians
population
a collection of all people, objects, or events that share one or more specified characteristics
(ex from ch3 hw: white women students in the university)
concrete population
population that has a finite number of elements (all the people listed in a telephone directory)
conceptual population
population that has an infinite number of elements (flipping a coin for eternity)
element
an element is a single person, object, or event of the population (ex from ch 3 hw: a white woman student)
*there is either a finite or infinite number of elements within a population
observation (datum)
the observation is the number or label used to represent an element of the population.
-this is a measurable characteristic of the element
(ex from ch 3 hw: career ambivalence)
sample
a subset of a population
descriptive statistics
tools for depicting or summarizing data so that they can be more readily comprehended
(reduces data to a few comprehensible numbers, such as a mean or a range)
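A minimal Python sketch of descriptive statistics. It reduces a made-up set of scores to two "functions of data": the mean and the range. Here the range is taken as largest score minus smallest score. The scores are hypothetical and not from the course.

```python
import statistics

# Hypothetical exam scores (illustrative data only)
scores = [72, 85, 90, 66, 85, 78, 94]

mean = statistics.mean(scores)           # arithmetic mean of the scores
score_range = max(scores) - min(scores)  # range = largest minus smallest score

print(f"mean = {mean:.2f}, range = {score_range}")
```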
induction
a process in which the researcher reasons from a sample to draw a general conclusion about a population
inferential statistics
tools for inferring the properties of one or more populations by inspecting samples of the population
(makes it easier to draw conclusions despite variations in observations)
variations can be due to:
4 possibilities
- inherent variability in the phenomenon being observed/differences among participants
- errors of measurement
- undetected changes in conditions
- combination of the 1st 3
random sampling
method of drawing samples from a population so that every possible sample of a given size (n) has an equal chance of being selected
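A small sketch of simple random sampling, assuming a hypothetical population of 100 student ID numbers. Python's `random.sample` draws without replacement, so every subset of size n is equally likely. The seed only makes the example reproducible.

```python
import random

# Hypothetical population: ID numbers of 100 students
population = list(range(1, 101))

rng = random.Random(42)  # seeded so the example gives the same sample each run
# Draw n = 10 elements without replacement; every 10-element subset is equally likely
sample = rng.sample(population, k=10)
print(sample)
```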
non random sampling
method of sampling based on haphazard or purposeless choices (this is not good for making inferences about a population)
large sample of a population
the larger a random sample, the more closely it tends to resemble the population.
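An illustrative simulation, not a proof. It uses a hypothetical population of the scores 1 to 1000 and repeatedly draws random samples of different sizes. It then reports how far the sample mean lands from the population mean on average. The population, sample sizes, and number of repetitions are arbitrary choices.

```python
import random
import statistics

population = list(range(1, 1001))   # hypothetical population of 1000 scores
mu = statistics.mean(population)    # population mean
rng = random.Random(0)              # seeded for reproducibility

def mean_abs_error(n, reps=200):
    """Average distance between a random sample's mean and the population mean."""
    errors = [abs(statistics.mean(rng.sample(population, n)) - mu)
              for _ in range(reps)]
    return statistics.mean(errors)

for n in (5, 50, 500):
    print(f"n = {n:>3}: average error of sample mean = {mean_abs_error(n):.1f}")
```

The average error shrinks as n grows, which is the sense in which bigger random samples "resemble" the population more closely.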
variable
characteristic that can take on different values
represented by a letter at the end of the alphabet (e.g., X, Y, Z)
range- set of elements for which the variable stands
value- each element of the range
constant
characteristic that does not vary
represented by a letter at the beginning of the alphabet (e.g., a, b, c)
range of a constant consists of a single element
2 classifications of variable
qualitative variable and quantitative variable
qualitative variables
symbol whose range consists of attributes or non quantitative characteristics of people, objects or events
-the categories of these variables are mutually exclusive and collectively exhaustive
(can be ordered or unordered)
ordered variables
categories suggest an order or rank
(freshman, sophomore, junior, senior)
unordered variables
categories do not suggest an order or rank
(race categories, or gender: men and women)
quantitative variables
symbol whose range consists of a count or a numerical measurement of a characteristic.
these can be discrete or continuous
discrete variables
range can assume only a finite number of values or a number of values that is countably infinite. (measurements are exact)
continuous variables
range is uncountably infinite (measurements are approximations)
what are the four levels of measurement? (keyword levels)
nominal
ordinal
interval
ratio
nominal measurement
elements assigned to mutually exclusive/exhaustive equivalence classes
properties: distinctness (elements in different classes are different from one another)
transformation: one-to-one
- you can substitute class labels without changing the meaning of the classes
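A sketch of a one-to-one relabeling of nominal data, assuming hypothetical blood-type data. The numbers 1 to 4 are just labels, not quantities. The check confirms that who shares a class with whom is the same before and after relabeling.

```python
# Hypothetical nominal data: blood types of five people
blood_types = ["A", "O", "B", "O", "AB"]

# A one-to-one relabeling: distinct old labels map to distinct new labels
relabel = {"A": 1, "B": 2, "AB": 3, "O": 4}
assert len(set(relabel.values())) == len(relabel)  # confirm the map is one-to-one

recoded = [relabel[t] for t in blood_types]

# Class membership (same class or not, for every pair) is unchanged
same_before = [[a == b for b in blood_types] for a in blood_types]
same_after = [[a == b for b in recoded] for a in recoded]
print(same_before == same_after)  # True
```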
ordinal measurement
elements are assigned to mutually exclusive and exhaustive equivalence classes that are ranked or ordered with respect to one another
label properties: distinctness and order (denoted by numbers or other ordered symbols)
transformation: strictly increasing monotonic (new labels must retain rank/order of the classes)
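A sketch of a strictly increasing monotonic transformation on ordinal labels, using the class-standing example from the ordered-variables card. The particular function `x**2 + 10` is an arbitrary choice. Any strictly increasing function would do. The gaps between labels change, but the rank order of the classes does not.

```python
# Ordinal labels for class standing
standing = {"freshman": 1, "sophomore": 2, "junior": 3, "senior": 4}

def transform(x):
    """Strictly increasing for positive x (arbitrary example)."""
    return x ** 2 + 10

new_labels = {k: transform(v) for k, v in standing.items()}  # 11, 14, 19, 26

# Sorting the classes by old or new labels gives the same rank order
by_old = sorted(standing, key=standing.get)
by_new = sorted(new_labels, key=new_labels.get)
print(by_old == by_new)  # True
```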
interval measurement
elements are assigned numbers such that equal differences between the numbers correspond to equal differences in the magnitude of the characteristic measured
-numbers have the properties of distinctness, order, and equivalence of intervals.
transformation: positive linear (a variable's values can be multiplied by a positive constant b and then have a constant a added, and they still retain their meaning: X’ = a + bX, with b > 0)
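A sketch of a positive linear transformation, X’ = a + bX. It uses Celsius to Fahrenheit (a = 32, b = 1.8) as a familiar interval-scale example. That example is an assumption, not from the cards. Equal intervals stay equal after the transformation. Ratios do not, because the zero point is arbitrary.

```python
def to_fahrenheit(c):
    """Positive linear transformation X' = a + bX with a = 32, b = 1.8."""
    return 32 + 1.8 * c

temps_c = [10.0, 20.0, 30.0]
temps_f = [to_fahrenheit(c) for c in temps_c]   # [50.0, 68.0, 86.0]

# Equal intervals are preserved: both 10-degree C steps are 18 degrees F
print(temps_f[1] - temps_f[0], temps_f[2] - temps_f[1])  # 18.0 18.0

# Ratios are not preserved on an interval scale
print(temps_c[1] / temps_c[0], temps_f[1] / temps_f[0])  # 2.0 1.36
```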
ratio measurement
similar to interval except that the origin of the scale (zero) represents the absence of the measured characteristic
properties: distinctness, order, equivalence of intervals, and zero as the absence of the characteristic measured
transformation: multiplication by a positive constant (transformation that preserves all the properties of a ratio scale)
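A sketch of the ratio-scale transformation X’ = bX. It converts weights from kilograms to pounds with b ≈ 2.20462, an approximate conversion factor and an assumed example. Because zero means "no weight" on both scales, the ratio between two weights survives the change of units. Adding a constant would move the zero and break this.

```python
LB_PER_KG = 2.20462  # approximate conversion factor (the positive constant b)

weights_kg = [40.0, 80.0]
weights_lb = [LB_PER_KG * w for w in weights_kg]

# Twice as heavy in kg is still twice as heavy in lb
ratio_kg = weights_kg[1] / weights_kg[0]
ratio_lb = weights_lb[1] / weights_lb[0]
print(ratio_kg, ratio_lb)  # 2.0 2.0
```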
national statistics- oldest of the three types
- enumerative and descriptive in character
- used to keep records of taxes and other state resources
- Caesar ordered all citizens to report to the tax collector
- John Graunt kept records of birth and death statistics
- his work was the first to summarize and interpret these data
probability theory (more info on history in notes)
earliest traces found in the Orient, concerning the gender of the expected child
experimental statistics (historical people in notes)
statistical procedures and principles that help guide the design of experiments.
graphs for qualitative variables
bar graph
pie chart
graphs for quantitative variables
histogram
frequency polygon
cumulative polygon
stem-and-leaf display (Tukey)
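A minimal sketch of a stem-and-leaf display, assuming non-negative whole-number scores. The stem is the tens digit (and higher) and the leaf is the units digit. This is one common convention, not the only one. The function name and the scores are hypothetical. Empty stems are kept so that gaps in the data stay visible.

```python
from collections import defaultdict

def stem_and_leaf(scores):
    """Print and return a stem-and-leaf display for non-negative integer scores."""
    stems = defaultdict(list)
    for s in sorted(scores):              # sorting puts leaves in ascending order
        stems[s // 10].append(s % 10)     # stem = tens and above, leaf = units digit
    lines = []
    for stem in range(min(stems), max(stems) + 1):   # include empty stems
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}")
    print("\n".join(lines))
    return lines

stem_and_leaf([72, 85, 90, 66, 85, 78, 94, 61, 83])  # hypothetical scores
```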
frequency distribution
tables showing the equivalence classes and the frequency with which their score values occur
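A small sketch of an ungrouped frequency distribution. Each score value is its own equivalence class, and its frequency f is the number of times it occurs. The quiz scores are hypothetical. Values are listed from highest to lowest, matching the guideline below about putting the largest value at the top.

```python
from collections import Counter

scores = [3, 5, 4, 5, 2, 5, 4, 3, 5, 1]   # hypothetical quiz scores

freq = Counter(scores)   # maps each score value to its frequency f

print("X  f")
for x in sorted(freq, reverse=True):      # largest score value at the top
    print(f"{x}  {freq[x]}")
```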
equivalence class for a single score value (quantitative)
is ungrouped
equivalence class for a collection of scores (quantitative)
is grouped
equivalence class for a qualitative category
is ungrouped
class intervals
are the equivalence classes of a frequency distribution
problems with frequency distributions
qualitative information is lost
advantage of grouped frequency distributions for quantitative variables
they are easier to interpret than ungrouped frequency distributions when there is a large amount of data
nominal limits (upper and lower)
used to represent the class intervals.
real limits
-used to show that there are no gaps between the class intervals.
-used to compute class interval size
real upper limit = nominal upper limit + .5 (half a unit, for whole-number scores)
real lower limit = nominal lower limit - .5
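A sketch of computing real limits and the class interval size i. The card's ±.5 assumes whole-number scores. The `unit` parameter is a hypothetical generalization to half of whatever unit the scores are measured in. The interval 60-64 is a made-up example.

```python
def real_limits(nominal_lower, nominal_upper, unit=1):
    """Real limits lie half a unit of measurement beyond the nominal limits."""
    half = unit / 2
    return nominal_lower - half, nominal_upper + half

lower, upper = real_limits(60, 64)   # hypothetical class interval 60-64
size = upper - lower                 # class interval size i = real upper - real lower
print(lower, upper, size)            # 59.5 64.5 5.0
```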
7 guidelines for constructing a frequency distribution
- class intervals should be mutually exclusive
- for quantitative variables, there should be no gaps between the class intervals
- all quantitative class intervals should have the same width or size
- the distribution should have 10 to 20 class intervals unless the number of scores is very small
- preferred class interval sizes are 1, 2, 3, 5, 10, 15, 20…
- nominal lower limit of each quantitative class interval should be divisible by the class interval size
- the class interval containing the largest score value should be placed at the top left.
- for qualitative variables, the order of the class intervals should reflect the order inherent in the variable
- if the variable is unordered, the class intervals can be ordered alphabetically
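A sketch that follows several of the quantitative guidelines above, assuming whole-number scores. The intervals are equal-width with no gaps. The interval size is the smallest preferred size that gives at most 20 intervals. Each nominal lower limit is divisible by the size, and the largest scores go at the top. Only the sizes spelled out on the card are used. If even size 20 gives more than 20 intervals, the function just returns the longer table. The function name and scores are hypothetical.

```python
from collections import Counter

# Preferred sizes listed on the card (the card's list continues with "…")
PREFERRED_SIZES = [1, 2, 3, 5, 10, 15, 20]

def grouped_distribution(scores, max_intervals=20):
    """Grouped frequency distribution for whole-number scores."""
    lo, hi = min(scores), max(scores)
    for size in PREFERRED_SIZES:
        first = lo - lo % size                   # lower limit divisible by size
        n_intervals = (hi - first) // size + 1
        if n_intervals <= max_intervals:
            break                                # smallest size that fits
    counts = Counter((s - first) // size for s in scores)
    table = []
    for k in reversed(range(n_intervals)):       # largest scores at the top
        lower = first + k * size
        upper = lower + size - 1                 # nominal upper limit (whole numbers)
        table.append((f"{lower}-{upper}", counts.get(k, 0)))
    return size, table

scores = [43, 57, 61, 68, 72, 75, 75, 79, 81, 84, 88, 90, 93, 97]  # hypothetical
size, table = grouped_distribution(scores)
print("interval size:", size)
for interval, f in table:
    print(f"{interval:>7}  {f}")
```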
grouping scores results in
loss of information
-grouped distributions are typically used if the spread of the scores is large.
relative frequency distributions
a distribution that shows the proportionate or percentage frequency for each class interval
proportionate frequency (prop f): prop f = f/n
percentage frequency (%f): %f = (f/n) x 100
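A sketch that adds prop f = f/n and %f = (f/n) x 100 columns to a frequency table. The quiz scores are hypothetical. The proportions always sum to 1 and the percentages to 100 regardless of n, which is what makes tables with different sample sizes comparable.

```python
from collections import Counter

scores = [3, 5, 4, 5, 2, 5, 4, 3, 5, 1]   # hypothetical quiz scores
n = len(scores)
freq = Counter(scores)

print("X  f  prop f  %f")
for x in sorted(freq, reverse=True):
    f = freq[x]
    prop_f = f / n        # prop f = f/n
    pct_f = f / n * 100   # %f = (f/n) x 100
    print(f"{x}  {f}  {prop_f:.2f}  {pct_f:.0f}")
```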
why are relative frequency distributions useful
they are useful for comparing two frequency distributions with different sample sizes (n)
cumulative frequency distribution
shows the number, proportion, or percentage of scores that occur below the real upper limit of each class interval
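A sketch of a cumulative frequency distribution built from hypothetical whole-number quiz scores. Frequencies are accumulated from the lowest score value upward. Each cum f is then the number of scores below that value's real upper limit (x + .5 here). The table is printed with the highest value at the top.

```python
from collections import Counter
from itertools import accumulate

scores = [3, 5, 4, 5, 2, 5, 4, 3, 5, 1]   # hypothetical quiz scores
n = len(scores)
freq = Counter(scores)

values = sorted(freq)                                # lowest score value first
cum_f = list(accumulate(freq[x] for x in values))    # running totals of f

print("X  f  cum f  cum %")
for x, cf in reversed(list(zip(values, cum_f))):     # highest value at the top
    print(f"{x}  {freq[x]}  {cf}  {cf / n * 100:.0f}")
```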
equivalence classes are ordered alphabetically if
there is no inherent order to the categories of the variable