Mid-Term Exam Flashcards
population
the group of all items (data) of interest.
- frequently very large; sometimes infinite.
sample
a set of items (data) drawn from the population of interest.
- potentially large, but much smaller than the population.
- the sample is a subset of the population.
parameter
a descriptive measure of a population.
- Ex. population mean
statistic
a descriptive measure of a sample.
- Ex. sample mean
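A minimal Python sketch contrasting a parameter with a statistic. It assumes a small made-up population of 10 values (illustrative only, not from the source) and a simple random sample of 4 items drawn with a fixed seed so the result is reproducible.

```python
import random
from statistics import mean

# Hypothetical population: every item of interest (illustrative values only)
population = [12, 15, 9, 20, 18, 11, 14, 16, 10, 15]

# Parameter: a descriptive measure of the whole population
population_mean = mean(population)

# Statistic: the same measure computed on a sample drawn from the population
rng = random.Random(42)  # fixed seed so the sample is reproducible
sample = rng.sample(population, k=4)
sample_mean = mean(sample)

print(f"population mean (parameter): {population_mean}")
print(f"sample mean (statistic):     {sample_mean}")
```

The sample mean is an estimate of the population mean. It will usually differ from it, and a different seed gives a different sample and a different estimate.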
statistical inference
the process of using sample statistics to make inferences about population parameters: an estimate, prediction, or decision about a population is produced from sample data, so what is learned from a sample is applied to the larger population.
numerical data
- values are real numbers
- all calculations are valid
- data may be treated as ordinal or nominal
nominal data
- values are the arbitrary numbers that represent categories
- only calculations based on the frequencies of occurrence, such as proportions, are valid
- data may not be treated as ordinal or numerical
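For nominal data the codes are arbitrary labels, so the meaningful calculation is counting how often each category occurs. The sketch below assumes hypothetical survey responses coded 1 = yes, 2 = no, 3 = undecided.

```python
from collections import Counter

# Hypothetical nominal responses: 1 = yes, 2 = no, 3 = undecided (codes are arbitrary labels)
responses = [1, 2, 1, 3, 1, 2, 1, 3, 2, 1]

counts = Counter(responses)  # frequency of each category
n = len(responses)

# Proportion of occurrences per category: a valid calculation for nominal data
proportions = {code: freq / n for code, freq in sorted(counts.items())}
print(proportions)

# Averaging the codes (sum(responses) / n) would run, but the result has no meaning here.
```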
ordinal data
- values must represent the ranked order of the data
- calculations based on an ordering process are valid
- data may be treated as nominal but not as numerical
bar chart
a bar chart is mainly used for nominal data and graphically represents the frequency of each category as a bar rising vertically from the horizontal axis.
- bar height is proportional to frequency of the corresponding category
pie chart
a circle subdivided into slices whose areas are proportional to the category frequencies, thereby displaying the proportion of occurrences of each category.
- popular tool for representing the proportions of occurrence in nominal data
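Because a slice's area is proportional to its frequency, its central angle is 360 degrees times the category's relative frequency. A short sketch with hypothetical frequencies for a nominal variable (mode of travel):

```python
# Hypothetical frequencies for a nominal variable
frequencies = {"car": 45, "bus": 30, "bike": 15, "walk": 10}
total = sum(frequencies.values())

# Each slice's central angle (and so its area) is proportional to its frequency
angles = {category: 360 * f / total for category, f in frequencies.items()}

for category, angle in angles.items():
    share = frequencies[category] / total
    print(f"{category:>4}: {share:.0%} of occurrences -> {angle:.1f} degrees")
```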
steps to building a histogram (3)
1) collect the data
2) create a frequency distribution for the data
- determine number of classes
- determine class width
3) draw a histogram of rectangular bars using the class intervals and the corresponding frequencies
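The three steps can be sketched with only the Python standard library. This assumes a small made-up data set and 4 classes of equal width. Each class includes its lower limit and excludes its upper limit, except the last class, which also includes the largest value; that boundary convention is an assumption, since textbooks differ.

```python
# Step 1: collect the data (hypothetical values for illustration)
data = [23, 29, 31, 35, 37, 41, 42, 44, 48, 52, 55, 59, 60, 63]

# Step 2: build a frequency distribution
num_classes = 4                              # chosen number of classes
low, high = min(data), max(data)
width = (high - low) / num_classes           # equal class width

frequencies = [0] * num_classes
for x in data:
    # index of the class containing x; the maximum value goes into the last class
    i = min(int((x - low) // width), num_classes - 1)
    frequencies[i] += 1

# Step 3: "draw" the histogram as text, one bar of # per class
for i, f in enumerate(frequencies):
    lower = low + i * width
    upper = lower + width
    print(f"{lower:6.2f} to {upper:6.2f} | {'#' * f}")
```

A plotting library would draw real rectangles. The text bars keep the sketch dependency-free while showing the same class intervals and frequencies.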
class width
generally best to use equal class widths. unequal class widths are used when the frequency associated with some classes is too low; then:
- several classes are combined to form a wider and more populated class
- an open-ended class can be formed at the higher or lower end of the histogram
relative frequency
the proportion of observations falling into each class. it should be used when comparing two or more histograms based on different numbers of observations.
- often preferable to the frequency itself
class relative frequency (formula)
(class frequency) divided by (total number of observations)
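Applying the formula to two hypothetical frequency distributions of different sizes shows why relative frequencies make histograms comparable: each set sums to 1 regardless of the number of observations. `Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction

# Hypothetical class frequencies from two data sets of different sizes
freq_a = [3, 4, 3, 4]        # 14 observations
freq_b = [15, 30, 35, 20]    # 100 observations

def relative_frequencies(freqs):
    """Class relative frequency = class frequency / total number of observations."""
    total = sum(freqs)
    return [Fraction(f, total) for f in freqs]

rel_a = relative_frequencies(freq_a)
rel_b = relative_frequencies(freq_b)

print([round(float(r), 3) for r in rel_a])
print([round(float(r), 3) for r in rel_b])
```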
equal class width (formula)
(largest value - smallest value) divided by (number of classes)
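A worked instance of the formula, assuming made-up extremes (smallest value 12, largest 87) and 6 classes. Rounding the result up to a convenient whole number is a common practical convention, not part of the formula itself.

```python
import math

# Hypothetical data extremes and chosen number of classes
smallest, largest = 12, 87
num_classes = 6

exact_width = (largest - smallest) / num_classes   # 75 / 6 = 12.5
width = math.ceil(exact_width)                      # round up to a convenient whole number

# Lower limits of the equal-width classes, starting at the smallest value
lower_limits = [smallest + i * width for i in range(num_classes)]
print(exact_width, width, lower_limits)
```

Rounding up (never down) guarantees that the last class still reaches the largest value.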
cumulative frequency of a class
the number of measurements less than the upper limit of that class.
to obtain the cumulative frequency of a class
add the frequency of that class to the frequencies of all previous classes.
cumulative relative frequency of a particular class
the proportion of measurements that are less than the upper limit of that class.
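Cumulative frequencies are a running total of the class frequencies, and dividing each by the number of observations gives the cumulative relative frequencies. A sketch with made-up class frequencies (listed from the lowest class to the highest), using `itertools.accumulate` for the running sum:

```python
from itertools import accumulate

# Hypothetical class frequencies, lowest class first
frequencies = [3, 4, 3, 4]
n = sum(frequencies)

# Cumulative frequency: this class's frequency plus those of all previous classes
cumulative = list(accumulate(frequencies))

# Cumulative relative frequency: proportion of measurements below each class's upper limit
cumulative_relative = [c / n for c in cumulative]

print(cumulative)
print(cumulative_relative)
```

The last cumulative frequency always equals the total number of observations, so the last cumulative relative frequency is 1.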
arithmetic mean
most popular and useful measure of central location.
- all values are used
- it is unique
- the sum of the deviations from the mean is 0
- calculated by summing the values and dividing by the number of values
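A small check of two properties listed above: the mean uses all values, and the deviations from it sum to 0. The measurements are hypothetical, and `Fraction` is used so the zero sum is exact rather than subject to floating-point rounding.

```python
from fractions import Fraction

values = [4, 8, 15, 16, 23, 42]   # hypothetical measurements

# Sum the values and divide by the number of values
mean = Fraction(sum(values), len(values))

# Deviations from the mean always sum to 0
deviations = [x - mean for x in values]

print(mean, sum(deviations))
```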
median of a set of measurements
the value that falls in the middle when the measurements are arranged in order of magnitude (for an even number of measurements, the mean of the two middle values).
- unique median for each data set
- commonly used measure of central location
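When the number of measurements is even, the usual convention is to take the mean of the two middle values. The hypothetical helper `middle_value` below follows that convention and is cross-checked against the standard library's `statistics.median`; the data sets are made up.

```python
from statistics import median

def middle_value(measurements):
    """Median: the middle value once the measurements are sorted."""
    ordered = sorted(measurements)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: average the two middle values

odd = [7, 1, 5, 3, 9]          # hypothetical data, odd count
even = [7, 1, 5, 3, 9, 11]     # hypothetical data, even count

print(middle_value(odd), middle_value(even))
print(median(odd), median(even))   # standard-library cross-check
```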
mode of a set of observations
the value that occurs most frequently.
- data set may have one, two or more modes (modal classes)
- useful for all types of data, but mainly used for nominal data
- for large data sets, modal class is more relevant than a single-value mode
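Because the mode only needs counts, it works even for non-numeric (nominal) values, and a data set can have more than one. A sketch using `statistics.multimode` (Python 3.8+) on hypothetical colour preferences:

```python
from statistics import multimode

# Hypothetical nominal data: favourite colour of 9 respondents
colours = ["red", "blue", "red", "green", "blue", "red", "blue", "yellow", "green"]

# multimode returns every most-frequent value, so a bimodal data set yields two modes
modes = multimode(colours)
print(modes)
```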
which measure of central location?
- the mean is generally the first choice; if outliers are present in the data set, the median should be used instead.
- the mode is seldom the best measure of central location.
- the median is not as sensitive to extreme values as the mean is.
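A quick illustration of why outliers favour the median, assuming hypothetical salaries (in thousands) with one extreme value:

```python
from statistics import mean, median

# Hypothetical annual salaries (in thousands); the last value is an extreme outlier
salaries = [42, 45, 47, 50, 52, 55, 900]

print(mean(salaries))     # pulled far upward by the outlier
print(median(salaries))   # barely affected

# Without the outlier, the two measures agree closely
print(mean(salaries[:-1]), median(salaries[:-1]))
```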
variance
this measure of dispersion reflects the values of all the measurements.
standard deviation
the square root of the variance of the measurements.
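A sketch of variance as the average squared deviation from the mean, with the standard deviation as its square root. The card does not say which divisor is meant, so both common versions are shown: the population variance (divide by N) and the sample variance (divide by n - 1). The data are hypothetical, and `statistics.pvariance` / `statistics.variance` serve as a cross-check.

```python
import math
from statistics import pvariance, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical measurements

n = len(data)
mu = sum(data) / n                          # arithmetic mean
squared_devs = [(x - mu) ** 2 for x in data]

pop_var = sum(squared_devs) / n             # population variance: divide by N
samp_var = sum(squared_devs) / (n - 1)      # sample variance: divide by n - 1

pop_sd = math.sqrt(pop_var)                 # standard deviation = square root of variance
samp_sd = math.sqrt(samp_var)

print(pop_var, pop_sd, samp_var, samp_sd)
print(pvariance(data), variance(data))      # standard-library cross-check
```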
empirical rule (for bell-shaped histograms)
- approximately 68% of all observations fall within 1 standard deviation of the mean
- approximately 95% of all observations fall within 2 standard deviations of the mean
- approximately 99.7% of all observations fall within 3 standard deviations of the mean
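The percentages approximate areas under a normal curve. A quick check with `statistics.NormalDist` (Python 3.8+), treating the bell-shaped distribution as exactly normal, which is the assumption behind the rule:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

for k in (1, 2, 3):
    within = z.cdf(k) - z.cdf(-k)   # proportion within k standard deviations of the mean
    print(f"within {k} sd: {within:.4f}")
```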
probability of an event
the probability P(A) of event A is the sum of the probabilities assigned to the simple events contained in A.
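A minimal example of the definition, assuming a fair six-sided die (an illustrative assumption). Each simple event is one face with probability 1/6, and P(A) is the sum over the faces contained in A.

```python
from fractions import Fraction

# Sample space of one fair die: each simple event has probability 1/6
simple_events = {face: Fraction(1, 6) for face in range(1, 7)}

# Event A: "roll an even number" contains the simple events 2, 4 and 6
A = {2, 4, 6}
p_A = sum(simple_events[e] for e in A)

print(p_A)
```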
intersection of events A and B
the event that occurs when both A and B occur.
joint probability of A and B
the probability of the intersection of A and B, written P(A and B).
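An illustration with one roll of a fair die (assumed for the example). A = "even number" and B = "greater than 3"; their intersection contains the outcomes in both events.

```python
from fractions import Fraction

# One roll of a fair die (assumed): every outcome has probability 1/6
outcomes = range(1, 7)
p = Fraction(1, 6)

A = {x for x in outcomes if x % 2 == 0}   # even number: {2, 4, 6}
B = {x for x in outcomes if x > 3}        # greater than 3: {4, 5, 6}

intersection = A & B                      # both A and B occur: {4, 6}
p_A_and_B = p * len(intersection)         # joint probability P(A and B)

print(intersection, p_A_and_B)
```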
conditional probability
conditional probability is used to determine how two events are related; that is, it gives the probability of one event given that another related event has occurred: P(A | B) = P(A and B) / P(B), where P(B) > 0.
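The standard formula P(A | B) = P(A and B) / P(B), defined when P(B) > 0, can be sketched with a fair die (an illustrative assumption), A = "even" and B = "greater than 3". The helpers `prob` and `conditional` are hypothetical names for this example.

```python
from fractions import Fraction

# Fair die (assumed): each of the six outcomes has probability 1/6
p = Fraction(1, 6)
A = {2, 4, 6}   # event A: even number
B = {4, 5, 6}   # event B: greater than 3

def prob(event):
    """Probability of an event as the sum of its equally likely outcomes."""
    return p * len(event)

def conditional(a, b):
    """P(a | b) = P(a and b) / P(b), defined only when P(b) > 0."""
    return prob(a & b) / prob(b)

print(conditional(A, B))   # probability of an even roll, given the roll exceeded 3
print(prob(A))             # unconditional probability, for comparison
```

Knowing that B occurred changes the probability of A from 1/2 to 2/3, which is the sense in which the two events are related.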
discrete random variable
one that takes on a countable number of values (e.g., integers).
continuous random variable
one that takes on an uncountable number of values (e.g., any real number in an interval).
discrete probability distribution
a table, formula or graph that lists all possible values a discrete random variable can assume, together with their associated probabilities.
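A sketch that builds such a table for X = number of heads in three tosses of a fair coin (an illustrative assumption) by enumerating the equally likely outcomes. In any discrete probability distribution, each probability lies between 0 and 1 and the probabilities sum to 1.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# X = number of heads in three tosses of a fair coin (illustrative assumption)
tosses = list(product("HT", repeat=3))                     # 8 equally likely outcomes
counts = Counter(outcome.count("H") for outcome in tosses)

# The distribution: every value X can take, with its probability
distribution = {x: Fraction(c, len(tosses)) for x, c in sorted(counts.items())}

for x, px in distribution.items():
    print(f"P(X = {x}) = {px}")
```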