Review Slides Flashcards
Statistics
A collection of methods for collecting, displaying, analyzing and drawing conclusions from data.
Descriptive Statistics
the branch of statistics that involves organizing, displaying and describing data.
Inferential Statistics
the branch of statistics that involves drawing conclusions about a population based on information contained in a sample taken from that population.
Population
any specific collection of objects of interest.
Sample
any subset of the population.
Census
A sample that consists of the whole population.
Measurement
a number or attribute computed for each member of a population or sample.
Sample data
The collective measurements of sample elements.
Parameter
number that summarizes some aspect of the population as a whole.
Statistic
number computed from the sample data.
Individuals
the individual units of observation (members of a population or a sample)
variable
any characteristic of an individual.
Distribution
tells us what values the variable takes, and how often it takes these values.
Qualitative data/variables
measurements for which there is no natural numerical scale, but which consist of attributes or other non-numerical characteristics.
Quantitative variables/data
measurements for which there is a numerical scale.
Categorical Variable
codes whether each one in a set of observations is in a particular category.
Nominal variable
assigns numerical labels to qualitative variables that represent different categories that cannot be ranked.
Ordinal variable
assigns numerical labels to qualitative variables that represent different categories that can be ranked.
data list
explicit listing of all the individual measurements made on a sample.
data frequency table
table listing each distinct value (x) and its frequency.
Frequency of a value x
is the number of times it appears in the data set.
Sturges’ Rule
the desirable number of classes is k, the closest integer to:
1 + 3.3 log10(n), where n is the sample size
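As an illustration of Sturges' Rule, here is a minimal Python sketch. It assumes the logarithm is base 10 (the coefficient 3.3 only makes sense with that base); the function name `sturges_classes` is made up for this example.

```python
import math

def sturges_classes(n: int) -> int:
    """Closest integer to 1 + 3.3 * log10(n), assuming a base-10 logarithm."""
    if n < 1:
        raise ValueError("sample size must be at least 1")
    return round(1 + 3.3 * math.log10(n))

# For a sample of 100 measurements: 1 + 3.3 * 2 = 7.6, which rounds to 8 classes.
k = sturges_classes(100)
```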
Sample size
The number of individuals in a sample
Absolute class frequency (or class frequency)
the number of measurements in the data set that are in the class
absolute frequency distribution
a tabular summary of a data set that shows the absolute class frequency for each class.
Relative Frequency
proportion of all measurements in the data set that are in the class.
Relative frequency distributions
tabular summary of a data set that shows the relative class frequency for each class.
cumulative class frequency
the sum of all class frequencies up to and including the class in question
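The three kinds of frequency can be computed side by side. The sketch below uses a hypothetical data list and, for simplicity, treats each distinct value as its own class; the variable names are illustrative only.

```python
from collections import Counter
from itertools import accumulate

data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]  # hypothetical data list, n = 10

counts = Counter(data)                  # frequency of each value x
values = sorted(counts)                 # distinct values in increasing order

absolute = [counts[v] for v in values]          # [1, 2, 3, 4]
relative = [f / len(data) for f in absolute]    # [0.1, 0.2, 0.3, 0.4]
cumulative = list(accumulate(absolute))         # [1, 3, 6, 10]
```

The relative frequencies always sum to 1, and the last cumulative frequency always equals the sample size.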
Frequency Histogram
Graphical device showing how data are distributed across the range of their values by collecting them into classes and indicating the number of measurements in each class.
Relative Frequency histogram
Graphical device showing how data are distributed across the range of their values by collecting them into classes and indicating the proportion of measurements in each class.
Internal Data
Data that is created as by-products of regular activities
External Data
Data that is created by entities other than the person, firm or government that wants to use the data.
Survey
To survey is to ask individuals a question or a series of questions in order to gather information about what they do or what they believe.
Nonprobability Sample
a sample taken from a population in a haphazard fashion, without the use of some randomizing device assigning each member a known probability of selection.
probability sample
is a sample taken with the help of a randomizing device that assures each member a known probability of selection.
random sample
is a sample obtained using a randomizing device that assures each member of the population has an equal chance of being in the sample.
Nonresponse Bias
a systematic tendency for elementary units with particular characteristics not to contribute to data.
response bias
the tendency for answers to survey questions to be systematically wrong.
sample mean
x bar = (sum of X) / n, where n is the sample size
Population mean
μ = (sum of X) / N, where N is the population size
Sample median:
odd
even
Odd number of measurements: the median is the middle measurement when the data are arranged in numerical order.
Even number of measurements: the median is the mean of the middle two measurements when the data are arranged in numerical order.
Sample mode
most frequently occurring value.
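A short Python sketch of the three measures of center, using the standard-library `statistics` module. The sample values are hypothetical and chosen so the arithmetic is easy to check by hand.

```python
import statistics

sample = [2, 3, 3, 5, 7, 10]            # hypothetical sample, n = 6 (even)

mean = sum(sample) / len(sample)        # 30 / 6 = 5.0
median_even = statistics.median(sample) # mean of the middle two: (3 + 5) / 2 = 4.0
mode = statistics.mode(sample)          # 3 occurs most often

# With an odd number of measurements the median is the single middle value.
median_odd = statistics.median([2, 3, 5, 7, 10])  # 5
```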
Range
R = Xn - X1, where X1 is the smallest and Xn the largest measurement
Skewness
a measure of the frequency distribution's deviation from symmetry.
Kurtosis
a measure of the heavy-tailedness of a frequency distribution.
Coefficient of variation is the ratio…
of the standard deviation to the mean
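The range and the coefficient of variation can be sketched as follows. The card does not say whether the sample or the population standard deviation is meant; this example assumes the population version (`statistics.pstdev`) and hypothetical data.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]        # hypothetical measurements

data_range = max(data) - min(data)     # largest minus smallest: 9 - 2 = 7

mean = sum(data) / len(data)           # 40 / 8 = 5.0
st_dev = statistics.pstdev(data)       # population st. dev. (assumption): 2.0
cv = st_dev / mean                     # 2.0 / 5.0 = 0.4
```

Because the coefficient of variation is a ratio, it has no units, which makes it handy for comparing spread across data sets measured on different scales.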
Sample Proportion (P)
the frequency of observations in a particular category as a fraction of the sample size.
Given an observed value of X in a data set, X is the Pth percentile of the data if
the percentage of the data that are less than or equal to X is P.
If X is the Pth percentile of the data, then the number P
is the percentile rank of X.
Quartiles
3 percentiles that cut the data into fourths.
Q2
Q1
Q3?
Q2 is the median
Q1 is the 25th percentile (first quartile)
Q3 is the 75th percentile (third quartile)
IQR
Q3-Q1
Five number summary consists of
smallest, Q1, median (Q2), Q3, largest.
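Quartiles, the IQR and the five-number summary can be sketched in Python. Several conventions exist for locating quartiles and they can give slightly different answers; this example assumes the "inclusive" method of `statistics.quantiles`, which may differ from the rule used in the slides. The data are hypothetical.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]     # hypothetical data, already sorted

# n=4 asks for the three cut points that split the data into fourths.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

iqr = q3 - q1                                   # 7.0 - 3.0 = 4.0
five_number = (min(data), q1, q2, q3, max(data))  # (1, 3.0, 5.0, 7.0, 9)
```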
Outlier
a measurement that is far removed from most or all of the remaining measurements.
Boxplot
Graphical summary of the distribution of the data based on the five-number summary
z-score
z = (X - μ) / σ, where μ is the mean and σ is the standard deviation
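A minimal sketch of the z-score formula. The function name and the numbers (a mean of 100 and a standard deviation of 15) are hypothetical.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sigma <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mu) / sigma

z_above = z_score(130, mu=100, sigma=15)   # (130 - 100) / 15 = 2.0
z_below = z_score(85, mu=100, sigma=15)    # (85 - 100) / 15 = -1.0
```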
empirical rule says that if a data set
hint: 68, 95, 99.7
has an approximately bell shaped frequency histogram, then approximately
- 68% of the data lie within 1 standard deviation of the mean
- 95% of the data lie within 2 st. dev. of the mean
- 99.7% of the data lie within 3 st. dev. of the mean.
Chebyshev’s Theorem says that for any numerical data set
at least
- 3/4 of the data lie within 2 st. dev. of the mean
- 8/9 of the data lie within 3 st. dev. of the mean
- 1 - 1/k^2 of the data lie within k st. dev. of the mean,
where k is any positive whole number greater than 1.
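Chebyshev's bound can be checked against a concrete data set. The sketch below uses hypothetical, deliberately skewed data and assumes the population standard deviation; the helper names are made up for this example.

```python
import statistics

def chebyshev_bound(k: float) -> float:
    """Minimum proportion of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2

def fraction_within(data, k):
    """Observed proportion of data within k population st. dev. of the mean."""
    mean = statistics.fmean(data)
    st_dev = statistics.pstdev(data)
    inside = [x for x in data if abs(x - mean) <= k * st_dev]
    return len(inside) / len(data)

data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]   # hypothetical data with one outlier

guaranteed = chebyshev_bound(2)          # 1 - 1/4 = 0.75
observed = fraction_within(data, 2)      # 9 of 10 values, so 0.9
```

The observed proportion is at least the guaranteed one, as the theorem promises for any data set, bell-shaped or not.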
Probability Theory
a branch of mathematics concerned with the analysis of random phenomena.
Statistical Inference
a set of techniques to turn sample evidence into valid conclusions about populations of interest.
Experiment
any repeatable process from which an outcome, measurement or result is obtained.
Random Experiment
An experiment that produces a definite outcome that cannot be predicted with certainty
Trial
one repetition of a random experiment
sample space (outcome space) associated with a random experiment
is the set of all possible outcomes
event
a subset of the sample space.
an event E is said to occur on a particular trial of an experiment if
the outcome observed is an element of the set E.
simple event
is any basic outcome from a random experiment
composite event
any combination of 2 or more basic outcomes from a random experiment.
probability of an outcome e in a sample space S is
the number p between 0 and 1 that measures the likelihood that e will occur on a single trial of the experiment.
the probability of an event A (p(A)) is
the sum of the probabilities of the individual outcomes of which it is composed.
Factorials
n! = n x (n-1) x (n-2) x … x 1
0! = 1
A permutation of n different things taken x at a time is
an arrangement in a specific order of any x of the n things
nPx = n! / (n-x)!
Combination of n things taken x at a time is
an arrangement of any x of these things without regard to order
nCx = n! / ((n-x)! x!)
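The factorial, permutation and combination formulas can be checked with Python's `math` module (`math.perm` and `math.comb` require Python 3.8 or later). The values n = 5 and x = 2 are arbitrary.

```python
import math

n, x = 5, 2

# Formulas from the cards, written with factorials.
n_p_x = math.factorial(n) // math.factorial(n - x)                         # 120 / 6 = 20
n_c_x = math.factorial(n) // (math.factorial(n - x) * math.factorial(x))   # 120 / 12 = 10

# The standard library offers the same counts directly.
perm_builtin = math.perm(n, x)   # ordered arrangements
comb_builtin = math.comb(n, x)   # unordered selections
```

Each combination of x things can be ordered in x! ways, which is why nPx = nCx times x!.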
Intersection of events A and B is the collection
of all outcomes that are elements of both A and B
Events A and B are mutually exclusive or disjoint if
they have no elements in common
probability rule for mutually exclusive events is that events A and B are
mutually exclusive only if p(A inter B) = 0
union of events A and B is the
collection of all outcomes that are elements of one or the other of the sets A and B or both of them
the special addition law says that
for any 2 mutually exclusive events A and B,
p(A union B) = p(A) + p(B)
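Events map naturally onto Python sets, with `&` for intersection and `|` for union. The sketch below assumes one roll of a fair six-sided die, so every outcome has probability 1/6; the events A and B are chosen for illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # assumed experiment: one roll of a fair die

def prob(event):
    """p(event): sum of the probabilities (1/6 each) of the outcomes in it."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {1, 2}
B = {5, 6}

intersection = A & B                 # empty set, so A and B are mutually exclusive
union = A | B                        # {1, 2, 5, 6}

p_union = prob(union)                # 4/6 = 2/3
p_sum = prob(A) + prob(B)            # 1/3 + 1/3 = 2/3
```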
unconditional probability is the
likelihood that a particular event will occur regardless of whether another event occurs
Joint probability p(A inter B) is the
likelihood that 2 or more events will simultaneously occur (jointly)
Conditional probability of A given B is the
probability that A has occurred in a trial of a random experiment, given that B has also occurred.
collectively exhaustive events
events whose union contains all the basic outcomes of the sample space
partition
a set of events that are mutually exclusive and collectively exhaustive
unconditional from joint rule
to obtain an unconditional probability from joint probabilities, we sum the joint probabilities over all possible events in a partition
joint probability table shows
Frequencies or relative frequencies for joint events
in the context of joint probability tables, we also refer to _____ as _______
unconditional probability as marginal probability
General multiplication law
p(A inter B) = p(A) x p(B|A)
joint = unconditional x conditional
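The relationships among joint, marginal (unconditional) and conditional probabilities can be sketched with a small joint probability table. The four probabilities below are hypothetical, and the helper names are made up for this example.

```python
from fractions import Fraction

# Hypothetical joint probability table; keys are (row event, column event).
joint = {
    ("A", "B"): Fraction(20, 100),
    ("A", "not B"): Fraction(20, 100),
    ("not A", "B"): Fraction(10, 100),
    ("not A", "not B"): Fraction(50, 100),
}

def marginal_row(row):
    """Unconditional probability of a row event: sum the joints across the row."""
    return sum(p for (r, _), p in joint.items() if r == row)

def marginal_col(col):
    """Unconditional probability of a column event: sum the joints down the column."""
    return sum(p for (_, c), p in joint.items() if c == col)

p_A = marginal_row("A")                    # 20/100 + 20/100 = 2/5
p_B = marginal_col("B")                    # 20/100 + 10/100 = 3/10
p_B_given_A = joint[("A", "B")] / p_A      # (1/5) / (2/5) = 1/2

# General multiplication law: joint = unconditional x conditional.
p_A_and_B = p_A * p_B_given_A              # 2/5 x 1/2 = 1/5
```

Here p(B|A) differs from p(B), so in this hypothetical table A and B are dependent.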
events A and B are dependent when
the probability of occurrence of A is affected by the occurrence of B so that p(A) not equal to p(A|B)
A and B are independent only if
p(A) = p(A|B)
special multiplication law
for independent events A and B, the joint probability of A and B is the product of the unconditional probabilities of A and B: p(A inter B) = p(A) x p(B)
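Independence can be tested by comparing the joint probability with the product of the unconditional probabilities. This sketch again assumes one roll of a fair die; the events A, B and C are chosen for illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # assumed experiment: one roll of a fair die

def prob(event):
    return Fraction(len(event & sample_space), len(sample_space))

def are_independent(e1, e2):
    """True when p(e1 inter e2) equals p(e1) x p(e2)."""
    return prob(e1 & e2) == prob(e1) * prob(e2)

A = {2, 4, 6}        # even number,       p(A) = 1/2
B = {1, 2, 3, 4}     # at most four,      p(B) = 2/3
C = {4, 5, 6}        # at least four,     p(C) = 1/2

a_b_independent = are_independent(A, B)   # p(A inter B) = 1/3 = 1/2 x 2/3
a_c_independent = are_independent(A, C)   # p(A inter C) = 1/3, but 1/2 x 1/2 = 1/4
```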
posterior probability
is a probability that has been revised in light of new information
Random variable (RV)
a numerical quantity that is generated by a random experiment
a set of possible values is countable if
all possible values can be listed one after the other
Discrete RV
a RV that has either a finite or a countably infinite number of possible values
continuous RV
a RV for which the possible values contain a whole interval of real numbers
cumulative distribution function (CDF) of a RV X is
the probability that X is less than or equal to a particular value x
the probability distribution of a discrete RV X is a
list of each possible value of X together with the probability that X takes that value in one trial of the experiment
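A discrete probability distribution and its CDF can be built by enumerating outcomes. The sketch assumes two tosses of a fair coin, with X counting the heads; the names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Assumed experiment: two tosses of a fair coin; X = number of heads.
outcomes = list(product("HT", repeat=2))      # 4 equally likely outcomes

distribution = {}                             # maps each value x to p(X = x)
for outcome in outcomes:
    x = outcome.count("H")
    distribution[x] = distribution.get(x, 0) + Fraction(1, len(outcomes))

def cdf(x):
    """p(X <= x): add up the probabilities of all values not exceeding x."""
    return sum(p for value, p in distribution.items() if value <= x)

p_at_most_one = cdf(1)                        # 1/4 + 1/2 = 3/4
```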
a Bernoulli process with parameters p and n consists of a sequence of n identical trials of a random experiment such that each trial
- produces one of two possible complementary outcomes, success with probability p and failure with probability q = 1 - p
- is independent of every other trial
Binomial RV with parameters n and p
a discrete RV that counts the number of successes in a Bernoulli process with parameters n and p
binomial probability distribution is a
list of each possible value of a binomial RV X together with the probability that X takes that value
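The cards do not give the binomial formula, so the sketch below relies on the standard result p(X = x) = nCx p^x q^(n-x) with q = 1 - p. The parameters n = 3 and p = 0.5 are hypothetical.

```python
import math

def binomial_pmf(x: int, n: int, p: float) -> float:
    """p(X = x) for a binomial RV: nCx * p^x * q^(n - x), with q = 1 - p."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 3, 0.5   # hypothetical Bernoulli process: 3 trials, success probability 0.5

# The binomial probability distribution: every possible value with its probability.
distribution = {x: binomial_pmf(x, n, p) for x in range(n + 1)}
# {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}
```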
probability distribution of a continuous RV X is an assignment of probabilities to intervals of decimal numbers using a function f(x) called a ________ in the following way:
density function
the probability that X assumes a value in the interval [a, b] = the area under the curve of f(x), bounded below by the x-axis and on the left and right by x = a and x = b
normal distribution with mean μ and standard deviation σ is
the probability distribution corresponding to the density function for the bell curve with parameters μ and σ.
normally distributed RV
a continuous RV whose probabilities are described by the normal distribution with mean μ and standard deviation σ
standard normal RV is a
normally distributed RV with mean = 0 and st dev = 1
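Areas under the normal curve can be computed with `statistics.NormalDist` (Python 3.8 or later). The sketch below recovers the 68, 95, 99.7 figures of the empirical rule from the standard normal RV; the second distribution, with mean 100 and standard deviation 15, is hypothetical.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)          # standard normal RV

def prob_between(dist, a, b):
    """p(a <= X <= b): area under the density curve between x = a and x = b."""
    return dist.cdf(b) - dist.cdf(a)

# Probability of falling within k standard deviations of the mean.
within = {k: prob_between(Z, -k, k) for k in (1, 2, 3)}
# approximately 0.6827, 0.9545 and 0.9973

# Any normal RV gives the same areas once distances are measured in st. dev.
X = NormalDist(mu=100, sigma=15)       # hypothetical normally distributed RV
p_x = prob_between(X, 85, 115)         # within 1 st. dev. of the mean
```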