Exam 1 Flashcards
Range = ?
Range = Maximum - Minimum
Define Sample
A sample is a set of data drawn from the population.
Define Population
A population is the group of all items of interest to a statistics practitioner.
Define Parameter
A parameter is a descriptive measure of a population.
Define Statistic
A descriptive measure of a sample.
define descriptive statistics
Descriptive statistics deals with methods of organizing, summarizing, and presenting data in a convenient and informative way.
Define inferential statistics
Inferential statistics is a body of methods used to draw conclusions or inferences about characteristics of populations based on sample data.
We use __________ to make inferences about _____________.
We use statistics to make inferences about parameters.
Define confidence level
The confidence level is the proportion of times that an estimating procedure will be correct.
define significance level
the significance level measures how frequently the conclusion will be wrong in the long run.
_______ and _________ are popular numerical techniques to describe the location of the data.
The mean and median are popular numerical techniques to describe the location of the data.
The _______, ________, and ______ _______ measure the variability of the data
The range, variance, and standard deviation measure the variability of the data
Define Variable
A variable is some characteristic of a population or sample. It is usually represented by an uppercase letter such as X, Y, or Z.
Define values of a variable
The values of a variable are the possible observations of that variable.
Three types of data and information
Interval Data, Nominal Data, Ordinal Data
Define Interval Data
Real numbers, e.g. heights, weights, prices. The intervals between values are equal and meaningful, so arithmetic operations can be performed on interval data.
Define Nominal Data
The values of nominal data are categories. Example: marital status coded as Single = 1, Married = 2, Divorced = 3, Widowed = 4. Each observation falls into exactly one classification category.
Nominal data are also called _________ or _________.
Nominal data are also called qualitative or categorical.
Interval data are also called _________ or ____________.
Interval data are also called quantitative or numerical.
Define Ordinal Data
Ordinal data appear to be categorical in nature, but their values have an order (a ranking). Example: a college course rating system with poor = 1, fair = 2, good = 3, very good = 4, excellent = 5.
______ _____ refers to quantities that have a natural ordering.
Ordinal data refers to quantities that have a natural ordering. With ordinal data you cannot state with certainty whether the intervals between values are equal. Example: Small, Medium, Large (small may not be the same distance from medium as medium is from large).
Interval Data Summary
Interval Values are real numbers. All calculations are valid. Data may be treated as ordinal or nominal.
Ordinal Data Summary
Ordinal Values must represent the ranked order of the data. Calculations based on an ordering process are valid. Data may be treated as nominal but not as interval.
Nominal Data Summary
Nominal Values are the arbitrary numbers that represent categories. Only calculations based on the frequencies of occurrence are valid. Data may not be treated as ordinal or interval.
The only allowable calculation on nominal data is to ______ ___ ________ of each value of the variable.
The only allowable calculation on nominal data is to count the frequency of each value of the variable.
What does a relative frequency distribution do? (%)
A relative frequency distribution lists the categories and the proportion with which each occurs.
What is a frequency distribution? (How frequently each category was chosen)
We can summarize the data in a table that presents the categories and their counts called a frequency distribution.
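The two cards above can be illustrated with a short sketch. It assumes a small, made-up list of marital-status responses (the data and the variable names are hypothetical) and builds both a frequency distribution and a relative frequency distribution from it.

```python
from collections import Counter

# Hypothetical nominal data: marital status of eight survey respondents.
responses = ["Single", "Married", "Married", "Divorced",
             "Single", "Married", "Widowed", "Married"]

# Frequency distribution: each category with its count.
frequency = Counter(responses)

# Relative frequency distribution: each category with its proportion.
total = sum(frequency.values())
relative_frequency = {category: count / total
                      for category, count in frequency.items()}

for category, count in frequency.items():
    print(f"{category:<10}{count:>3}{relative_frequency[category]:>8.1%}")
```

Counting is the only calculation used here, which is why the same approach works for nominal data.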
Bar Charts show ___________.
Bar Charts show frequencies
Pie Charts show __________.
Pie Charts show relative frequencies.
Histograms and stem & leaf displays are used to graphically describe ________ ____.
Histograms and stem & leaf displays are used to graphically describe interval data.
Define a Histogram
A histogram is a graphical display of data using bars of different heights. It is similar to a bar chart, but a histogram groups numbers into ranges. Histograms suit continuous data (bars drawn with no gaps); for categorical data, use a bar chart (bars drawn with gaps).
Observations measured at successive points in time are called _________ data. _________ data are graphed on a line chart.
Observations measured at successive points in time are called time-series data. Time-series data are graphed on a line chart.
What does a scatter diagram do?
A scatter diagram plots two variables against one another. It describes how two interval variables are related.
The independent variable is labeled ___ and is plotted on the ________ axis.
X; horizontal
The dependent variable is labeled ___ and is plotted on the ________ axis.
Y; vertical
Three patterns of scatter diagrams
positive linear relationship, negative linear relationship, weak or non-linear relationship
What kind of data do you use histograms for
Interval data
Measures of central location
Mean, Median, Mode
Measures of Variability
Range, Standard Deviation, Variance, Coefficient of Variation
Measures of relative standing
Percentiles, Quartiles
Measures of Linear Relationship
Covariance, Coefficient of Correlation, Coefficient of Determination, Least Squares Line
Mean = ?
Mean = Sum of the Observations/Number of observations
When referring to the number of observations in a population, we use ___________
When referring to the number of observations in a population, we use uppercase letter N
When referring to the number of observations in a sample, we use __________
When referring to the number of observations in a sample, we use lower case letter n
The arithmetic mean for a population is denoted with Greek letter “mu”:
The arithmetic mean for a population is denoted with the Greek letter “mu”: μ
The arithmetic mean for a sample is denoted with an “x-bar”:
x̄ (read "x-bar")
Population Mean Formula
$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
Sample Mean Formula
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
The _______ is calculated by placing all the observations in order; the observation that falls in the middle is the ________.
The median is calculated by placing all the observations in order; the observation that falls in the middle is the median.
The ____ of a set of observations is the value that occurs most frequently. ____ is useful for all data types, though mainly used for nominal data.
The mode of a set of observations is the value that occurs most frequently. Mode is useful for all data types, though mainly used for nominal data.
Compute the Mean to
Describe the central location of a single set of interval data
Compute the Median to
Describe the central location of a single set of interval or ordinal data
Compute the Mode to
Describe a single set of nominal data
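As a quick illustration of the three measures of central location, the sketch below uses Python's standard `statistics` module on a made-up set of seven exam marks (the data are hypothetical).

```python
import statistics

# Hypothetical interval data: seven exam marks.
marks = [55, 62, 70, 70, 81, 88, 92]

mean = statistics.mean(marks)      # sum of observations / number of observations
median = statistics.median(marks)  # middle observation once the data are ordered
mode = statistics.mode(marks)      # value that occurs most frequently

print(mean, median, mode)
```

With an even number of observations, `statistics.median` averages the two middle values.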
The range is the simplest measure of ______, calculated as: Range = ?
The range is the simplest measure of variability, calculated as: Range = Largest observation – Smallest observation
_______ and its related measure, _______ ________, are arguably the most important statistics. Used to measure variability, they also play a vital role in almost all statistical inference procedures.
Variance and its related measure, standard deviation, are arguably the most important statistics. Used to measure variability, they also play a vital role in almost all statistical inference procedures.
Population variance is denoted by
Population variance is denoted by σ² (lowercase Greek letter “sigma” squared)
Sample variance is denoted by
Sample variance is denoted by s² (lowercase “s” squared)
The variance of a population is: EQUATION
The variance of a population is: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
The variance of a sample is: EQUATION
The variance of a sample is: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
The _______ __________ is simply the square root of the __________
The standard deviation is simply the square root of the variance
Population standard deviation looks like
Population standard deviation looks like: $\sigma = \sqrt{\sigma^2}$
Sample standard deviation looks like:
Sample standard deviation looks like: $s = \sqrt{s^2}$
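The measures of variability can be checked numerically. This is a minimal sketch on made-up observations (the data are hypothetical). It relies on the standard convention that a population variance divides the sum of squared deviations by N, while a sample variance divides by n - 1.

```python
import math
import statistics

data = [4, 8, 6, 5, 3, 10]  # hypothetical observations

# Range = largest observation - smallest observation.
data_range = max(data) - min(data)

# Treating the data as an entire population: divide by N.
pop_var = statistics.pvariance(data)
pop_sd = statistics.pstdev(data)

# Treating the data as a sample: divide by n - 1.
sample_var = statistics.variance(data)
sample_sd = statistics.stdev(data)

# The standard deviation is the square root of the variance.
print(data_range, pop_var, sample_var)
print(math.isclose(pop_sd, math.sqrt(pop_var)),
      math.isclose(sample_sd, math.sqrt(sample_var)))
```

The sample variance is always a little larger than the population variance computed from the same numbers, because its divisor is smaller.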

Empirical Rule, which states (for a bell-shaped histogram):
Approximately 68% of all observations fall within one standard deviation of the mean.
Approximately 95% of all observations fall within two standard deviations of the mean.
Approximately 99.7% of all observations fall within three standard deviations of the mean.
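The three percentages can be reproduced from a normal distribution. The sketch below assumes the data follow an exactly normal (bell-shaped) curve, which is an idealization; real data will only match approximately.

```python
from statistics import NormalDist

# Assumption: a perfectly normal distribution (mean 0, standard deviation 1).
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```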
_______: the Pth percentile is the value for which P percent are less than that value and (100-P)% are greater than that value.
Percentile
We have special names for the 25th, 50th, and 75th percentiles, namely __________.
quartiles
The three quartiles are as follows:
The first or lower quartile is labeled Q1 = 25th percentile.
The second quartile, Q2 = 50th percentile (which is also the median).
The third or upper quartile, Q3 = 75th percentile.
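Location of Percentiles: EQUATION
Location of Percentiles: $L_P = (n + 1)\frac{P}{100}$, where $L_P$ is the position of the Pth percentile in the ordered data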
Interquartile Range = ?
Interquartile Range = Q3 - Q1
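Quartiles and the interquartile range can be computed by hand-rolled code. This sketch assumes one common textbook convention: the Pth percentile sits at position (n + 1) × P / 100 in the ordered data, with linear interpolation between neighbouring observations. Other conventions exist and give slightly different answers. The function name and the data are hypothetical.

```python
def percentile(data, p):
    """Pth percentile, assuming the location rule L_P = (n + 1) * P / 100."""
    ordered = sorted(data)
    n = len(ordered)
    location = (n + 1) * p / 100          # 1-based position in the ordered data
    location = min(max(location, 1), n)   # keep the position inside the data
    lower = int(location)                 # whole part of the position
    fraction = location - lower           # fractional part of the position
    if lower == n:
        return ordered[-1]
    # Interpolate between the two neighbouring observations.
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

data = [0, 0, 5, 7, 8, 9, 12, 14, 22, 33]  # hypothetical observations

q1, q2, q3 = (percentile(data, p) for p in (25, 50, 75))
iqr = q3 - q1  # Interquartile Range = Q3 - Q1
print(q1, q2, q3, iqr)
```

Here Q2 comes out equal to the median of the data, as the quartile cards state.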
Two numerical measures of linear relationship that provide information about the strength and direction of a linear relationship between two variables
They are the covariance and the coefficient of correlation.
Population Covariance looks like
Population covariance: $\sigma_{xy} = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}$
Sample Covariance looks like
Sample covariance: $s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
When two variables move in the same direction (both increase or both decrease), the covariance will be a _____ _______ number.
When two variables move in the same direction (both increase or both decrease), the covariance will be a large positive number.
When two variables move in opposite directions, the covariance is a ______ _______ number.
When two variables move in opposite directions, the covariance is a large negative number.
When there is no particular pattern, the covariance is a ______ number.
When there is no particular pattern, the covariance is a small number.
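The sign behaviour described in these cards can be seen on three tiny data sets. This sketch assumes the usual sample covariance (sum of the products of deviations, divided by n - 1). The function name and the data are hypothetical.

```python
def sample_covariance(x, y):
    """Sample covariance: sum of products of deviations divided by n - 1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

x = [2, 6, 7]
same_direction = [13, 20, 27]      # y rises as x rises
opposite_direction = [27, 20, 13]  # y falls as x rises
no_pattern = [20, 27, 13]          # no clear pattern

print(sample_covariance(x, same_direction))
print(sample_covariance(x, opposite_direction))
print(sample_covariance(x, no_pattern))
```

The first value is large and positive, the second is large and negative, and the third is close to zero by comparison.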
Define Coefficient of Correlation
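The coefficient of correlation is defined as the covariance divided by the product of the standard deviations of the variables. Population coefficient of correlation: $\rho = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$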
Sample Coefficient of Correlation looks like:
Sample coefficient of correlation: $r = \frac{s_{xy}}{s_x s_y}$
Why use the coefficient of correlation instead of the covariance?
The advantage of the coefficient of correlation over covariance is that it has fixed range from -1 to +1, thus:
If the two variables are very strongly positively related, the coefficient value is close to +1 (strong positive linear relationship).
If the two variables are very strongly negatively related, the coefficient value is close to -1 (strong negative linear relationship).
No straight line relationship is indicated by a coefficient close to zero.
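To see the fixed range from -1 to +1 in practice, the sketch below computes the sample coefficient of correlation as the sample covariance divided by the product of the two sample standard deviations. The function name and the data are hypothetical.

```python
import math

def sample_correlation(x, y):
    """Sample covariance divided by the product of the sample standard deviations."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    return s_xy / (s_x * s_y)

x = [2, 6, 7]
r_positive = sample_correlation(x, [13, 20, 27])  # strong positive relationship
r_negative = sample_correlation(x, [27, 20, 13])  # strong negative relationship
r_weak = sample_correlation(x, [20, 27, 13])      # little linear relationship

print(round(r_positive, 3), round(r_negative, 3), round(r_weak, 3))
```

Unlike the covariance, these values do not change if the variables are measured in different units.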
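Symbol Table:
Measure: population symbol, sample symbol
Size: N, n
Mean: μ, x̄
Variance: σ², s²
Standard deviation: σ, s
Covariance: σ_xy, s_xy
Coefficient of correlation: ρ, r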
A survey ……
A survey solicits information from people
Key design principles of a survey:
Keep the questionnaire as short as possible
Ask short, simple, and clearly worded questions
Start with demographic questions to help respondents get started comfortably
Use dichotomous (yes/no) and multiple choice questions
Use open-ended questions cautiously
Avoid using leading questions
The ______ population and the ______ population should be similar to one another.
The sampled population and the target population should be similar to one another.
A ______ ______ is a method or procedure for specifying how a sample will be taken from a population.
A sampling plan is a method or procedure for specifying how a sample will be taken from a population.
3 common methods of sampling plans
Simple random sampling
Stratified random sampling
Cluster sampling
Define Simple Random Sampling
A simple random sample is a sample selected in such a way that every possible sample of the same size is equally likely to be chosen.
Example: Drawing three names from a hat containing the names of all the students in the class gives a simple random sample (any group of three names is as likely to be drawn as any other group of three names).
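The names-from-a-hat example can be sketched with Python's standard `random` module. The class list is hypothetical, and the fixed seed is there only to make the run repeatable.

```python
import random

# Hypothetical class list of 30 students.
students = [f"Student {i}" for i in range(1, 31)]

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Draw three names without replacement; every group of three is equally likely.
sample = rng.sample(students, k=3)
print(sample)
```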
Define Stratified Random Sampling
A stratified random sample is obtained by separating the population into mutually exclusive sets, or strata, and then drawing simple random samples from each stratum.
Divide population into two or more subgroups (called strata) according to some common characteristic
A simple random sample is selected from each subgroup, with sample sizes proportional to strata sizes
Samples from subgroups are combined into one
This is a common technique when sampling a population of voters, stratifying along racial or socio-economic lines
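The steps above can be sketched as follows. The example assumes proportional allocation (each stratum contributes in proportion to its size), and the function name, strata, and sizes are all hypothetical.

```python
import random

def stratified_sample(strata, total_size, rng):
    """Draw a simple random sample from each stratum, sized in proportion to it."""
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Proportional allocation; with less convenient stratum sizes, rounding
        # can make the combined sample differ slightly from total_size.
        k = round(total_size * len(members) / population_size)
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical population of 100 students split into three strata.
strata = {
    "first-year": [f"F{i}" for i in range(60)],
    "second-year": [f"S{i}" for i in range(30)],
    "third-year": [f"T{i}" for i in range(10)],
}

by_stratum = stratified_sample(strata, total_size=10, rng=random.Random(0))

# Combine the samples from the subgroups into one.
combined = [member for members in by_stratum.values() for member in members]
print(combined)
```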
Define Cluster Sampling
A cluster sample is a simple random sample of groups or clusters of elements (vs. a simple random sample of individual objects).
This method is useful when it is difficult or costly to develop a complete list of the population members or when the population elements are widely dispersed geographically
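A minimal sketch of cluster sampling, assuming a made-up population of households grouped into city blocks: the random draw is made over the blocks, and every household in a chosen block enters the sample. All names are hypothetical.

```python
import random

# Hypothetical population: six city blocks (clusters) of five households each.
clusters = {
    f"Block {b}": [f"Block {b} / Household {h}" for h in range(1, 6)]
    for b in "ABCDEF"
}

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Simple random sample of clusters, not of individual households.
chosen_blocks = rng.sample(sorted(clusters), k=2)

# Every household in a chosen block is included.
sample = [household for block in chosen_blocks for household in clusters[block]]
print(chosen_blocks)
print(sample)
```

Only a list of blocks is needed to draw the sample, which is what makes the method practical when a complete list of households is unavailable.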
Compare the sampling methods
Simple random sample
Simple to use
May not be a good representation of the population’s underlying characteristics
Stratified random sample
Ensures representation of individuals across the entire population
Cluster sample
More cost effective
Less efficient (need larger sample to acquire the same level of precision)
The ______ the sample size is, the more accurate we can expect the sample estimates to be
The larger the sample size is, the more accurate we can expect the sample estimates to be
Define Sampling Error
Sampling error refers to differences between the sample and the population that exist only because of the observations that happened to be selected for the sample.
Increasing the sample size will reduce this error.
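The effect of sample size on sampling error can be seen in a small simulation. This sketch assumes a made-up, roughly normal population of 10,000 values and measures how far sample means land from the population mean, on average. All names and numbers are hypothetical.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population of 10,000 values.
population = [rng.gauss(50, 10) for _ in range(10_000)]
mu = statistics.mean(population)

def average_sampling_error(n, repetitions=200):
    """Average distance between a sample mean and the population mean."""
    errors = [abs(statistics.mean(rng.sample(population, n)) - mu)
              for _ in range(repetitions)]
    return statistics.mean(errors)

for n in (10, 100, 1000):
    print(n, round(average_sampling_error(n), 3))
```

The printed errors shrink as n grows. Nonsampling errors are not modeled here and would not shrink this way.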
Define Nonsampling errors
Nonsampling errors are more serious and are due to mistakes made in the acquisition of data or due to the sample observations being selected improperly.
(Note: increasing the sample size will not reduce this type of error.)
3 types of nonsampling errors:
Errors in data acquisition
Nonresponse errors
Selection bias
Errors in data acquisition
…arises from the recording of incorrect responses
Define Selection Bias
…occurs when the sampling plan is such that some members of the target population cannot possibly be selected for inclusion in the sample
______ occurs when the sampling plan is such that some members of the target population cannot possibly be selected for inclusion in the sample
Selection Bias occurs when the sampling plan is such that some members of the target population cannot possibly be selected for inclusion in the sample