Introduction to statistics Flashcards
[used with a plural verb] Facts or data, either numerical or
nonnumerical, organized and summarized so as to
provide useful and accessible information about a
particular subject.
Statistics
[used with a singular verb] the science of organizing and
summarizing numerical or nonnumerical information.
Statistics
The collection of numerical information often leads to
large masses of data which, if they are to be understood,
or presented effectively, must be summarised and
analysed in some way.
Statistics
The art of learning from data. It is concerned with the
collection of data, their subsequent description, and their
analysis, which often leads to the drawing of conclusions.
Statistics
The collection of all individuals or items
under consideration in a statistical study.
Population
That part of the population from which
information is obtained
Sample
Two types of statistics
descriptive and inferential
Concerns the summarization of data. This entails
calculating numbers from the data, called
descriptive measures, such as percentages, sums,
averages, and so forth.
Descriptive Statistics
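The descriptive measures named on this card (sums, averages, percentages) can be sketched in a few lines of Python. The scores and the pass mark of 75 are made-up assumptions for illustration only.

```python
from statistics import mean

# Hypothetical exam scores, invented for this example.
scores = [72, 85, 90, 66, 85, 78]

total = sum(scores)        # sum of all observations
average = mean(scores)     # arithmetic mean

# Percentage of scores at or above an assumed pass mark of 75.
passed = [s for s in scores if s >= 75]
percent_passed = 100 * len(passed) / len(scores)

print(total, average, percent_passed)
```

Each result is a single number that summarizes the data set without drawing any conclusion beyond it.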
Consists of methods for drawing conclusions about the population
from which the data originated: data collected from a sample
are used to infer properties of the larger population, and the
reliability of those conclusions is measured.
Inferential Statistics
Descriptive statistics and Inferential
statistics are
Interrelated
Aims to describe a chunk of raw data using
summary statistics, graphs, and tables
Descriptive statistics
It uses samples to draw
inferences about larger populations
Inferential statistics
The tools used in inferential statistics:
•hypothesis testing
•regression analysis
used to assess whether sample data are consistent with a
hypothesis about a population
hypothesis testing
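One simple instance of hypothesis testing is an exact two-sided binomial test of whether a coin is fair. This is only a sketch: the function name binomial_two_sided_p, the data (9 heads in 10 tosses), and the 0.05 significance level are assumptions chosen for illustration, not part of the original cards.

```python
from math import comb

def binomial_two_sided_p(successes: int, n: int) -> float:
    """Exact two-sided p-value for H0: success probability = 0.5."""
    # Under H0 the distribution is symmetric about n/2, so count every
    # outcome at least as far from n/2 as the observed one.
    distance = abs(successes - n / 2)
    extreme = sum(comb(n, k) for k in range(n + 1)
                  if abs(k - n / 2) >= distance)
    return extreme / 2 ** n

p_value = binomial_two_sided_p(9, 10)   # hypothetical: 9 heads in 10 tosses
alpha = 0.05                            # assumed significance level
reject_null = p_value < alpha

print(p_value, reject_null)
```

A small p-value means data this extreme would be unusual if the hypothesis were true; it does not prove the hypothesis false.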
the purpose of the study is to examine and explore
information for its own intrinsic interest only
descriptive
the information is obtained from a sample of a
population and the purpose of the study is to use that
information to draw conclusions about the population
inferential
types of data
•quantitative data
•qualitative data
___ is any piece of collected
information
datum
is a collection of
data related to each other in some way
data set
Are any data that measure or are associated with a measurement
of the quantity of something. They invariably assume numerical values
Quantitative data
2 categories of quantitative data:
•discrete data
•continuous data
Take values in a finite or countably infinite set of
numbers; that is, all possible values could (at least in principle)
be written down in an ordered list.
Discrete data
Take values in an interval of numbers. These
are also known as scale data, interval data, or measurement
data.
Examples: height, weight, length, time, etc.
They are often characterized by fractions or decimals, such as
3.82 or 7.0001.
Continuous data
Are any type of data that are not
numerical, or do not represent numerical
quantities.
Examples:
subject’s name, gender, race/ethnicity,
political party, socioeconomic status, class
rank, driver’s license number, and social
security number.
Qualitative data
2 types of factors of qualitative data
•nominal
•ordinal
What do you call the possible values of a factor?
Levels
Factors that have levels that correspond to
names of the categories with no implied
ordering. (hair color, gender, race, or political party)
Nominal
Factors that have some sort of ordered
structure to the underlying factor levels.
(socioeconomic status, whose levels correspond to ranks
associated with income, education, and occupation;
class rank; cancer stage)
Ordinal
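The practical difference between nominal and ordinal factors is that ordinal levels can be ranked. A minimal sketch, assuming three made-up socioeconomic levels ("low", "middle", "high") and invented observations:

```python
# Hypothetical ordinal factor with an explicit level order.
levels = ["low", "middle", "high"]
rank = {level: i for i, level in enumerate(levels)}

# Invented observations of that factor.
observations = ["middle", "high", "low", "middle", "low"]

# Sorting by rank respects the ordering of the levels;
# a plain alphabetical sort would put "high" first.
ordered = sorted(observations, key=rank.__getitem__)
highest = max(observations, key=rank.__getitem__)

print(ordered, highest)
```

For a nominal factor such as hair color there is no meaningful rank mapping, so only counting and equality comparisons make sense.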
It normally has columns which show the class intervals, class mid-points, class frequencies, and cumulative frequencies, the last of these being a running total of the frequencies themselves.
Grouped frequency distribution table
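The columns described on this card can be built directly from raw data. The sketch below assumes made-up measurements and an assumed class layout (width 10, each class including its lower limit and excluding its upper limit).

```python
# Hypothetical raw data, invented for this example.
data = [12, 17, 21, 24, 25, 28, 31, 33, 34, 38, 41, 47]

# Assumed class layout: [10, 20), [20, 30), [30, 40), [40, 50).
start, width, n_classes = 10, 10, 4

rows = []
running_total = 0
for i in range(n_classes):
    lower = start + i * width
    upper = lower + width
    frequency = sum(1 for x in data if lower <= x < upper)
    running_total += frequency          # cumulative frequency so far
    midpoint = (lower + upper) / 2      # class mid-point
    rows.append((lower, upper, midpoint, frequency, running_total))

for lower, upper, midpoint, frequency, cumulative in rows:
    print(f"{lower}-{upper}  mid={midpoint}  f={frequency}  cum={cumulative}")
```

The last cumulative value always equals the number of observations, which is a quick check that no value fell outside the classes.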
What table is constructed directly from the raw data, without first arranging the values in rank order?
Tallied frequencies
___ shows, at a glance, how many items in the data are less than a specified value
cumulative frequency
It is sometimes more useful to use the ratio of the cumulative frequency to the total number of observations. This ratio is called the ___
relative cumulative frequency
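Cumulative and relative cumulative frequencies follow from the class frequencies by a running total and one division. The frequencies and class boundaries below are assumed values for illustration.

```python
from itertools import accumulate

# Hypothetical class upper boundaries and class frequencies.
upper_boundaries = [20, 30, 40, 50]
frequencies = [2, 4, 4, 2]

cumulative = list(accumulate(frequencies))            # running totals
total = cumulative[-1]                                # number of observations
relative_cumulative = [c / total for c in cumulative]

for bound, c, rc in zip(upper_boundaries, cumulative, relative_cumulative):
    print(f"less than {bound}: {c} items, proportion {rc:.3f}")
```

Reading one row answers the "how many items are less than this value" question at a glance.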
It presents each distinct value along with
its frequency of occurrence.
frequency table
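A frequency table for ungrouped data can be sketched with the standard library's Counter. The data (number of siblings reported by ten students) are invented for this example.

```python
from collections import Counter

# Hypothetical raw data: siblings reported by ten students.
data = [1, 0, 2, 1, 3, 1, 0, 2, 1, 2]

counts = Counter(data)                     # value -> frequency
frequency_table = sorted(counts.items())   # list distinct values in order

for value, frequency in frequency_table:
    print(value, frequency)
```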
A diagram which is directly related to a grouped frequency
distribution table and consists of a collection of rectangles
whose height represents the class frequency (to some suitable
scale) and whose breadth represents the class width.
Histogram
Data from a frequency table can be graphically pictured
by a line graph, which plots the successive values on the
horizontal axis and indicates the corresponding frequency by the
height of a vertical line.
Line graph
Sometimes the frequencies are represented not by lines
but rather by bars having some thickness. These graphs, called
bar graphs, are often utilized.
Bar graph
Another type of graph used to represent a frequency
table is the frequency polygon, which plots the frequencies of
the different data values and then connects the plotted points
with straight lines.
Frequency polygon
Another type of graph used to represent a frequency
table is the ___, which plots the frequencies of
the different data values and then connects the plotted points
with straight lines.
Frequency polygon
A plot of the relative frequencies looks
exactly like a plot of the absolute frequencies,
except that the labels on the vertical axis are
the old labels divided by the total number of
observations in the data set.
Relative frequency graphs
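The rescaling described on this card is a single division per value. A minimal sketch, assuming a made-up frequency table:

```python
# Hypothetical frequency table: value -> absolute frequency.
frequencies = {0: 2, 1: 4, 2: 3, 3: 1}

n = sum(frequencies.values())   # total number of observations

# Each relative frequency is the absolute frequency divided by n.
relative = {value: f / n for value, f in frequencies.items()}

for value, rf in relative.items():
    print(value, rf)
```

The relative frequencies sum to 1, so the shape of the plot is unchanged and only the vertical scale differs.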
Often used to plot relative
frequencies when the data are nonnumeric.
Pie charts
Used to show the
relationship between two variables by plotting
each pair of values as a point. It is a direct
way to display correlations. However,
some analysts prefer to use bar charts, as
scatter plots can be difficult to interpret.
Scatter plot
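A scatter plot is drawn from (x, y) pairs, and the correlation it displays can be summarized numerically. The sketch below computes Pearson's correlation coefficient by hand; the function name pearson_r and the data (hours studied versus score) are assumptions made for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: hours studied and exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

points = list(zip(hours, scores))   # the points a scatter plot would show
r = pearson_r(hours, scores)

print(points, round(r, 3))
```

A value of r close to 1 or -1 corresponds to points lying near a straight line; a value near 0 corresponds to no linear pattern.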