Stats Midterm 1 Flashcards
What is Statistics
Collecting Data
e.g. Survey
Characterizing Data
e.g. Mean and Median
Analyzing Data
e.g. Trends and Patterns
Interpreting Data
e.g. Conclusions and Decisions
What is Statistics (cont’d)
Statistics is the science of data. It involves
collecting, classifying, summarizing,
organizing, analyzing, and interpreting
numerical information.
Descriptive Statistics
consists of methods for organizing and summarizing information.
includes the construction of graphs, charts, and tables and the calculation of various descriptive measures such as averages, measures of variation, and percentiles
Population
The collection of all individuals or
items under consideration in a statistical study
Sample
That part of the population from which
information is obtained.
Census
The collection of data from every member
of a population
Inferential statistics
consists of methods for drawing and
measuring the reliability of conclusions about a population based on information obtained from a sample of the population.
(draw conclusions)
observational study
is a data-collection method where the experimental units sampled are observed in their natural setting
designed experiment
is a data-collection method where the researcher exerts full control over the characteristics of the experimental units sampled.
Simple random sampling
A sampling procedure for which each possible sample of a given size is equally likely to be the one obtained. It is the most basic type of probability sampling.
Simple random sample
A sample obtained by simple random sampling
representative sample
exhibits characteristics typical of those possessed by the population of interest
Simple random sampling with replacement
(SRSWR)
whereby a member of the population can be selected more than once
Simple random sampling without replacement (SRS)
whereby a member of the population can be selected at most once
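A minimal Python sketch contrasting the two schemes, assuming the population is a Python list. The helper names `srs_without_replacement` and `srs_with_replacement` are hypothetical; they wrap the standard-library `random` module, where `sample` draws without replacement and `choices` draws with replacement. The `seed` argument only makes runs repeatable.

```python
import random

def srs_without_replacement(population, n, seed=None):
    """SRS: each member of the population can be selected at most once."""
    rng = random.Random(seed)
    return rng.sample(population, n)  # raises ValueError if n > len(population)

def srs_with_replacement(population, n, seed=None):
    """SRSWR: a member of the population can be selected more than once."""
    rng = random.Random(seed)
    return rng.choices(population, k=n)

students = ["Ana", "Ben", "Cai", "Dee", "Eli", "Fay"]  # made-up population
print(srs_without_replacement(students, 3, seed=1))
print(srs_with_replacement(students, 3, seed=1))
```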
experimental units
In a designed experiment, the individuals or items on which the experiment is performed are
subject
When the experimental units are humans
Principles of Experimental Design
Control: Two or more treatments should be compared.
Randomization: The experimental units should be randomly divided into groups to avoid unintentional selection bias in constituting the groups.
Replication: A sufficient number of experimental units should be used to ensure that randomization creates groups that resemble each other closely and to increase the chances of detecting any differences among the treatments.
The group receiving the specified treatment is
treatment group
The group receiving a placebo (or no treatment) is
control group
Response variable
The characteristic of the experimental outcome that is to be measured or observed
Factor
A variable whose effect on the response variable is of interest in the experiment
Levels
The possible values of a factor
Treatment
Each experimental condition
completely randomized design
all the experimental units are assigned randomly among all the treatments
randomized block design
the experimental units are assigned randomly among all the treatments separately within each block
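A sketch of the two designs, assuming units are labeled with strings and blocks are given as a dictionary of block name to units. The function names are hypothetical, and dealing shuffled units to the treatments in turn (round-robin) is just one simple way to randomize while keeping group sizes equal. The only difference between the designs is whether the shuffle covers all units at once or happens separately within each block.

```python
import random

def completely_randomized(units, treatments, seed=None):
    """Shuffle ALL units together, then deal them out to the treatments in turn."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    return {unit: treatments[i % len(treatments)] for i, unit in enumerate(shuffled)}

def randomized_block(blocks, treatments, seed=None):
    """Shuffle and deal separately WITHIN each block (e.g. blocks formed by gender)."""
    rng = random.Random(seed)
    assignment = {}
    for block_units in blocks.values():
        shuffled = list(block_units)
        rng.shuffle(shuffled)
        for i, unit in enumerate(shuffled):
            assignment[unit] = treatments[i % len(treatments)]
    return assignment

blocks = {"men": ["m1", "m2", "m3", "m4"], "women": ["w1", "w2", "w3", "w4"]}  # made-up units
print(completely_randomized(["u1", "u2", "u3", "u4", "u5", "u6"], ["A", "B"], seed=0))
print(randomized_block(blocks, ["A", "B"], seed=0))
```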
histogram
displays the classes of the quantitative
data on a horizontal axis and the frequencies (relative frequencies, percents) of those classes on a vertical axis
Important Uses of a Histogram
- Visually displays the shape of the distribution of the data
- Shows the location of the center of the data
- Shows the spread of the data
- Identifies outliers
dotplot
is a graph in which each observation is
plotted as a dot at an appropriate place above a horizontal axis
Features of Dotplot
– Displays the shape of the distribution of the data.
– It is usually possible to recreate the original list of data values
stem-and-leaf diagram (or stemplot)
each observation is separated into two parts, namely, a stem (consisting of all but the rightmost digit) and a leaf (the rightmost digit).
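A small text stemplot in Python, assuming the data are non-negative integers so that `divmod(x, 10)` splits each value into a stem (all but the rightmost digit) and a leaf (the rightmost digit). The function name and the sample data are made up for illustration. Because the data are sorted first, the leaves come out in order, and empty stems are kept so gaps stay visible.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Return stemplot rows 'stem | leaves' for non-negative integers."""
    groups = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)  # e.g. 57 -> stem 5, leaf 7
        groups[stem].append(leaf)
    rows = []
    for stem in range(min(groups), max(groups) + 1):  # include empty stems
        leaves = "".join(str(leaf) for leaf in groups.get(stem, []))
        rows.append(f"{stem} | {leaves}")
    return rows

for row in stem_and_leaf([63, 48, 59, 72, 57, 61, 55, 64, 70]):
    print(row)
```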
Features of Stem-and-leaf
– Shows the shape of the distribution of the data.
– Retains the original data values.
– The sample data are sorted (arranged in order).
distribution of a data set
is a table, graph, or formula that provides the values of the observations and how often they occur
Population data
The values of a variable for the entire population
Sample data
The values of a variable for a sample of the population
population distribution, or the distribution of the variable
The distribution of population data is
sample distribution
The distribution of sample data is
Drawings of objects
pictographs
Variable
a characteristic that varies from one person (item/object) to another
Data
all the values of the variable
Data set
A collection of the observations of one or more variables
Quantitative (or numerical) data
consist of numbers representing counts or
measurements
Categorical (or qualitative or attribute) data
consist of names or labels (not numbers that represent counts or measurements)
Discrete data
data values are quantitative, and the number of values is finite, or “countable.”
Example of Discrete data
Examples: The number of tosses of a coin before getting tails or the number of students in this class
Continuous data
result from infinitely many possible quantitative values, where the collection of values is not countable
Examples of Continuous Data
Examples: Heights, weights, lengths, temperature
frequency distribution
of qualitative data is a listing of the distinct values and their frequencies
relative-frequency distribution
of qualitative data is a listing of the distinct values and their relative frequencies
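Both distributions can be sketched with the standard-library `collections.Counter`; the relative frequency of a value is its frequency divided by the total number of observations. The function names and the party-affiliation data below are made up for illustration.

```python
from collections import Counter

def frequency_distribution(values):
    """Each distinct value with its frequency (count)."""
    return dict(Counter(values))

def relative_frequency_distribution(values):
    """Each distinct value with frequency / total number of observations."""
    n = len(values)
    return {value: count / n for value, count in Counter(values).items()}

parties = ["D", "R", "D", "O", "D", "R", "D", "R"]  # made-up qualitative data
print(frequency_distribution(parties))           # {'D': 4, 'R': 3, 'O': 1}
print(relative_frequency_distribution(parties))  # {'D': 0.5, 'R': 0.375, 'O': 0.125}
```

The relative frequencies always sum to 1, which is why they can be drawn directly as the wedges of a pie chart.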
pie chart
is a disk divided into wedge-shaped pieces proportional to the relative frequencies of
the qualitative data
bar chart
displays the distinct values of the qualitative data on a horizontal axis and the relative frequencies (or frequencies or percents) of those values on a vertical axis.
Lower class limits
The smallest numbers that can belong to each of the different classes (categories)
Upper class limits
The largest numbers that can belong to each of the different classes (categories)
Class mark or midpoint
– The values in the middle of the classes.
– Each class midpoint can be found by:
▪ adding the lower class limit to the upper class limit and dividing the sum by 2.
Class width
The difference between two consecutive lower class limits (or two consecutive lower class boundaries) in a frequency distribution.
Lower class cutpoint
The smallest value that could go in a class
Upper class cutpoint
The smallest value that could go in the next-higher class (equivalent to the lower cutpoint of the next-higher class)
Class width
The difference between the cutpoints of a class
Class midpoint
The average of the two cutpoints of a
class
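A sketch of cutpoint grouping, assuming a value x belongs to a class when lower cutpoint ≤ x < upper cutpoint, so the upper cutpoint of one class is the lower cutpoint of the next. The function name, the starting cutpoint, the class width of 10, and the weights are all made up for illustration. Each row gives the two cutpoints, the midpoint (their average), and the class frequency.

```python
def cutpoint_classes(data, first_cutpoint, width, num_classes):
    """Return (lower cutpoint, upper cutpoint, midpoint, frequency) per class."""
    classes = []
    for k in range(num_classes):
        lower = first_cutpoint + k * width
        upper = lower + width                    # class width = upper - lower
        midpoint = (lower + upper) / 2           # average of the two cutpoints
        freq = sum(1 for x in data if lower <= x < upper)
        classes.append((lower, upper, midpoint, freq))
    return classes

weights = [152, 158, 161, 165, 167, 172, 175, 178, 181, 185]  # made-up data
for row in cutpoint_classes(weights, 150, 10, 4):
    print(row)
```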
Gaps
The presence of gaps can show that the data are from two or more different populations.
central tendency
of a set of measurements, that is, the tendency of the data to cluster, or center, about certain numerical values
mean
is the sum of the measurements divided by the number of measurements for the variable
A statistic is resistant
if the presence of extreme values (outliers) does not cause it to change very much.
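A quick check with Python's `statistics` module shows why the median is resistant and the mean is not: replacing one value with an outlier moves the mean a lot but leaves the median unchanged. The two data sets are made up for illustration.

```python
from statistics import mean, median

data = [2, 3, 3, 4, 5]
with_outlier = [2, 3, 3, 4, 50]  # same data, largest value replaced by an outlier

print(mean(data), median(data))                  # 3.4 3
print(mean(with_outlier), median(with_outlier))  # 12.4 3
```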
mode
is the measurement that occurs most frequently in the data set
bimodal
When two data values occur with the same greatest frequency, each one is a mode, and the data set is
multimodal
When more than two data values occur with the same greatest frequency, each is a mode, and the data set is
no mode
When no data value is repeated, we say
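The mode rules above can be sketched as a small Python function, assuming a non-empty list of mutually comparable values. The function names are hypothetical, and "no mode" is reported whenever no value is repeated, exactly as the card states.

```python
from collections import Counter

def modes(values):
    """All values with the greatest frequency; [] when no value is repeated."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # no data value is repeated
    return sorted(v for v, c in counts.items() if c == top)

def describe_modality(values):
    m = modes(values)
    if not m:
        return "no mode"
    if len(m) == 1:
        return "one mode"
    if len(m) == 2:
        return "bimodal"
    return "multimodal"

print(describe_modality([1, 2, 2, 3]))              # one mode
print(describe_modality([1, 1, 2, 2, 3]))           # bimodal
print(describe_modality([1, 1, 2, 2, 3, 3, 4]))     # multimodal
print(describe_modality([1, 2, 3]))                 # no mode
```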
A data set is said to be skewed
if one tail of the distribution has more extreme observations than the other tail.
variability
the spread of the data
range
Max - Min
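A one-line Python sketch of the range (the function name is hypothetical). Because it uses only the two most extreme values, a single outlier changes it sharply, as the made-up data below show.

```python
def data_range(values):
    """Range = Max - Min."""
    return max(values) - min(values)

print(data_range([2, 3, 3, 4, 5]))   # 3
print(data_range([2, 3, 3, 4, 50]))  # 48
```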
sample variance
for a sample of n measurements (n is the number of data values in the sample) is equal to the sum of the squared deviations from the mean divided by (n – 1).
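The definition translates directly into Python: s² = Σ(x − x̄)² / (n − 1). The function name and the sample are made up for illustration; the standard-library `statistics.variance` also divides by n − 1, so it serves as a cross-check.

```python
import statistics

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up sample with mean 5
print(sample_variance(data))       # 32 / 7, about 4.571
print(statistics.variance(data))   # same value from the standard library
```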
standard deviation
is the square root of the sample variance; it measures how far the data values typically
deviate from the mean
Smaller values tell us that our data is not spread out
They indicate that most of the data values are clustered around the mean
Larger values tell us that our data is spread out
They indicate that most of the data values are not clustered around the mean
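A sketch comparing two made-up data sets with the same mean of 50: one clustered around the mean, one spread out. The standard deviation is taken as the square root of the sample variance; the function name is hypothetical.

```python
import math
import statistics

def sample_std(values):
    """Standard deviation = square root of the sample variance."""
    return math.sqrt(statistics.variance(values))

tight = [48, 49, 50, 51, 52]   # clustered around the mean
spread = [30, 40, 50, 60, 70]  # same mean, values far from it
print(sample_std(tight))   # about 1.58 (small: not spread out)
print(sample_std(spread))  # about 15.8 (large: spread out)
```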
Three-Standard-Deviations Rule
Almost all the observations in any data set lie within three standard deviations to either side of the mean
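A sketch that measures how much of one data set lies within k sample standard deviations of the mean. The function name and the data (the integers 1 to 100) are chosen for illustration; running it checks the rule on this one data set and does not prove it in general.

```python
import statistics

def fraction_within(values, k=3):
    """Share of observations within k sample standard deviations of the mean."""
    xbar = statistics.mean(values)
    s = statistics.stdev(values)
    inside = sum(1 for x in values if abs(x - xbar) <= k * s)
    return inside / len(values)

data = list(range(1, 101))          # the integers 1 to 100
print(fraction_within(data, k=3))   # 1.0
print(fraction_within(data, k=1))   # 0.58
```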