Module 1 Notes - Defining & Collecting Data Flashcards
Categorical (Qualitative)
Variables take categories as their values such as “yes”, “no”, or “blue”, “brown”, “green”.
Numerical (Quantitative)
Variables have values that represent a counted or measured quantity
Discrete Numerical Variables
Variables arise from a counting process
Continuous Numerical Variables
Variables arise from a measuring process
Variable Type EX: Do you have a car?
Categorical
Variable Type EX: How many classes are you taking this semester?
Numerical (Discrete)
Variable Type EX: How long did you watch TV last night?
Numerical (Continuous)
Data classified into distinct categories in which no ranking is implied
Nominal Scale (Measurement Scales)
Data classified into distinct categories in which ranking is implied
Ordinal Scale (Measurement Scales)
Measurement Scales EX: Cellular Providers - AT&T, Sprint, Verizon, Other, None
Nominal Scale
Measurement Scales EX: Student Class Designation - Freshman, Sophomore, Junior, Senior
Ordinal Scale
An ordered scale in which the difference between measurements is a meaningful quantity but the measurements do not have a true zero point
Interval Scale (Measurement Scales)
An ordered scale in which the difference between the measurements is a meaningful quantity and the measurements have a true zero point or character of origin
Ratio Scale (Measurement Scales)
Numerical Variable EX: Temperature (in degrees Celsius or Fahrenheit)
Level of Measurement - Interval
Numerical Variable EX: Salary (in USD or yen)
Level of Measurement - Ratio
Nominal (Defined Categories) & Ordinal (Ordered Categories)
Categorical Variables
Discrete (Counted Items) & Continuous (Measured Characteristics)
Numerical Variables
A __________ contains all of the items or individuals of interest that you seek to study
Population
A ______ contains only a portion of a population of interest
Sample
Collecting data via sampling is used when doing so is (1.)
- Less time-consuming than selecting every item in the population
Collecting data via sampling is used when doing so is (2.)
- Less costly than selecting every item in the population
Collecting data via sampling is used when doing so is (3.)
- Less cumbersome and more practical than analyzing the entire population
A __________ _________ summarizes the value of a specific variable for a population
Population Parameter
A ______ _________ summarizes the value of a specific variable for sample data
Sample Statistic
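A minimal Python sketch of the parameter/statistic distinction. The population here (commute times for 1,000 employees) and the names `population` and `sample` are made up for illustration only.

```python
import random
from statistics import mean

# Hypothetical population: commute times (minutes) for N = 1,000 employees.
rng = random.Random(42)  # fixed seed so the run is repeatable
population = [rng.randint(5, 90) for _ in range(1000)]

# Population parameter: summarizes the variable using every item in the population.
population_mean = mean(population)

# Sample statistic: summarizes the same variable using only a portion of the population.
sample = rng.sample(population, 50)
sample_mean = mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

The two printed values will usually be close but not identical, which is why a statistic is treated as an estimate of the parameter.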
Sources of data arise from the following activities (1.)
- Capturing data generated by ongoing business activities
Sources of data arise from the following activities (2.)
- Distributing data compiled by an organization or individual
Sources of data arise from the following activities (3.)
- Compiling the responses from a survey
Sources of data arise from the following activities (4.)
- Conducting a designed experiment & recording the outcomes
Sources of data arise from the following activities (5.)
- Conducting an observational study & recording the results
A bank studies years of financial transactions to help them identify patterns of fraud.
Ex of data collected from ongoing business activities
Marketing companies' use of tracking data to evaluate the effectiveness of a website
Ex of data collected from ongoing business activities
Financial Data on a company provided by investment services
Ex of data distributed by an organization or individual
Stock prices, weather conditions, and sports statistics in daily newspapers.
Ex of data distributed by an organization or individual
A survey asking people which laundry detergent has the best stain-removing abilities
Ex of Survey Data
Political polls of registered voters during political campaigns
Ex of Survey Data
Consumer testing different versions of a product to help determine which product should be pursued further
Ex of data from a designed experiment
Market testing of alternative product promotions to determine which promotion to use more broadly
Ex of data from a designed experiment
Measuring the time it takes for customers to be served in a fast food establishment
Ex of Data collected from observational studies
Market researchers utilizing focus groups to elicit unstructured responses to open-ended questions
Ex of Data collected from observational studies
(Sources of data) - The data collector is the one using the data for analysis
Primary source
(Sources of data) - The person performing data analysis is not the data collector
Secondary source
- Data used from a political survey
- Data collected from an experiment
- Observed data
Primary source
- Analyzing census data
- Examining data from print journals or data published on the Internet.
Secondary sources
The ________ _____ is a listing of items that make up the population
Sampling Frame
Inaccurate or biased results can result if a frame excludes certain groups or portions of the population
Sampling Frame
In a ______________ sample, items included are chosen without regard to their probability of occurrence
Nonprobability
In ___________ sampling, items are selected based only on the fact that they are easy, inexpensive, or convenient to sample
convenience (nonprobability)
In a _________ sample, you can get the opinions of pre-selected experts on the subject matter.
Judgement (nonprobability)
Simple Random, Systematic, Stratified, Cluster
Probability Sample
In a ___________ sample, items in the sample are chosen on the basis of known probabilities
probability
-Every individual or item from the frame has an equal chance of being selected
-Selection may be with replacement or without replacement.
-Samples obtained from a table of random numbers or computer random number generators.
Simple Random Sample
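A short Python sketch of a simple random sample drawn with a computer random number generator, both without and with replacement. The frame of 100 numbered items is hypothetical.

```python
import random

rng = random.Random(7)       # fixed seed so the run is repeatable
frame = list(range(1, 101))  # hypothetical frame: items numbered 1..100

# Without replacement: an item can be selected at most once.
srs_without = rng.sample(frame, 10)

# With replacement: a selected item goes back into the frame and may be selected again.
srs_with = rng.choices(frame, k=10)

print("without replacement:", srs_without)
print("with replacement:   ", srs_with)
```

Both calls give every item in the frame the same chance of selection on each draw; only `choices` can return the same item twice.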
-Decide on the sample size (n)
-Divide the frame of N individuals into groups of k individuals: k = N/n
-Randomly select one individual from the 1st group
-Select every k-th individual thereafter
Systematic Sample
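The systematic procedure can be sketched in Python as follows. The function name `systematic_sample` and the class of 60 numbered students are assumptions for illustration; when N is not a multiple of n, this sketch rounds k down.

```python
import random

def systematic_sample(frame, n, rng):
    """Return a systematic sample of n items from frame."""
    N = len(frame)
    k = N // n                 # group size, k = N/n (rounded down)
    start = rng.randrange(k)   # random position within the 1st group
    # Take the randomly chosen item, then every k-th item after it.
    return [frame[start + i * k] for i in range(n)]

rng = random.Random(1)         # fixed seed so the run is repeatable
frame = list(range(1, 61))     # hypothetical frame: students numbered 1..60
sample = systematic_sample(frame, 4, rng)
print(sample)
```

With N = 60 and n = 4, k is 15, so the selected students are always exactly 15 positions apart.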
-Divide population into two or more subgroups (called strata) according to some common characteristics.
-A simple random sample is selected from each subgroup, with sample sizes proportional to strata sizes.
-Samples from subgroups are combined into one.
-This is a common technique when sampling populations of voters, stratifying across racial or socio-economic lines.
Stratified Sample
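A rough Python sketch of stratified sampling with sample sizes proportional to strata sizes. The function name `stratified_sample` and the three strata are hypothetical; the sizes are chosen so the proportional allocation comes out to whole numbers, and with other sizes the rounding could make the total differ slightly from n.

```python
import random

def stratified_sample(strata, n, rng):
    """Draw a simple random sample from each stratum, sized in proportion to the stratum."""
    N = sum(len(items) for items in strata.values())
    combined = []
    for items in strata.values():
        n_stratum = round(n * len(items) / N)       # proportional allocation
        combined.extend(rng.sample(items, n_stratum))
    return combined                                  # subgroup samples combined into one

rng = random.Random(5)  # fixed seed so the run is repeatable
# Hypothetical strata: (stratum name, item id) pairs.
strata = {
    "A": [("A", i) for i in range(600)],
    "B": [("B", i) for i in range(300)],
    "C": [("C", i) for i in range(100)],
}
sample = stratified_sample(strata, 50, rng)
counts = {name: sum(1 for s, _ in sample if s == name) for name in strata}
print(counts)
```

Every stratum is guaranteed to appear in the combined sample, which is the sense in which stratification ensures representation across the population.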
-Population is divided into several “clusters”, each representative of the population.
-A simple random sample of clusters is selected
-All items in the selected clusters can be used, or items can be chosen from a cluster using another probability sampling technique
-A common application of sampling involves election exit polls where certain election districts are selected and sampled.
Cluster Sample
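A small Python sketch of cluster sampling, showing both options from the card: using all items in the selected clusters, or sampling again within each selected cluster. The 20 districts of 50 voters each are invented for illustration.

```python
import random

rng = random.Random(3)  # fixed seed so the run is repeatable

# Hypothetical population: 20 election districts (clusters) of 50 voters each.
clusters = {
    f"district_{d}": [f"voter_{d}_{v}" for v in range(50)]
    for d in range(20)
}

# Step 1: take a simple random sample of clusters.
chosen = rng.sample(sorted(clusters), 4)

# Option A: use every item in the selected clusters.
one_stage = [voter for d in chosen for voter in clusters[d]]

# Option B: draw a simple random sample of 10 within each selected cluster.
two_stage = [voter for d in chosen for voter in rng.sample(clusters[d], 10)]

print(chosen, len(one_stage), len(two_stage))
```

Only the 4 chosen districts are ever visited, which is where the cost saving of cluster sampling comes from.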
-Simple to use.
-May not be a good representation of the population’s underlying characteristics.
Simple random sample & Systematic sample
Ensures representation of individuals across the entire population
Stratified Sample
-More cost effective
-Less efficient (need larger sample to acquire the same level of precision).
Cluster sample
Judgement & Convenience Samples
Non-probability Samples
Simple Random, Stratified, Systematic, Cluster
Probability Samples
Exists if some groups are excluded from the frame and have no chance of being selected.
Coverage error or selection bias
People who do not respond may be different from those who do respond.
Nonresponse error or bias
Variation from sample to sample will always exist
Sampling error
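Sampling error can be seen by drawing several samples from the same population and comparing their means. This is only an illustrative sketch; the population of 1,000 values is made up.

```python
import random
from statistics import mean

rng = random.Random(11)  # fixed seed so the run is repeatable
# Hypothetical population of 1,000 measured values.
population = [rng.uniform(0, 100) for _ in range(1000)]

# Draw 5 independent simple random samples of size 30 and record each sample mean.
sample_means = [mean(rng.sample(population, 30)) for _ in range(5)]

print(f"population mean: {mean(population):.2f}")
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i} mean:   {m:.2f}")
```

The sample means differ from one another and from the population mean even though every sample was drawn the same way; that variation is the sampling error.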
Due to weaknesses in question design and/or respondent error
Measurement error
What would be considered a discrete quantitative (numerical) variable?
The number of employees of an insurance company
To monitor campus security, the campus police office is taking a survey of the number of students in a parking lot each 30 minutes of a 24-hour period with the goal of determining when patrols of the lot would serve the most students. If X is the number of students in the lot each period, then X is an example of
a discrete variable
Researchers are concerned that the weight of the average American school child is increasing, implying, among other things, that children’s clothing should be manufactured and marketed in larger sizes. If X is the weight of school children sampled in a nationwide study without rounding, then X is an example of
a continuous variable.
The chancellor of a major university was concerned about alcohol abuse on her campus and wanted to find out the proportion of students at her university who visited campus bars on the weekend before the final exam week. Her assistant took a random sample of 250 students. The answer on “whether you visited campus bars on the weekend before the final exam week” from students in the sample is an example of __________.
a categorical variable.
The manager of the customer service division of a major consumer electronics company is interested in determining whether the customers who have purchased a Blu-ray player made by the company over the past 12 months are satisfied with their products. Referring to this scenario, the possible responses to the question “What is your annual income rounded to the nearest thousands?” are values from a
discrete numerical variable
The manager of the customer service division of a major consumer electronics company is interested in determining whether the customers who have purchased a Blu-ray player made by the company over the past 12 months are satisfied with their products. Referring to this scenario, the possible responses to the question “How would you rate the quality of your purchase experience with 1 = excellent, 2 = good, 3 = decent, 4 = poor, 5 = terrible?” are values from a
categorical variable
The manager of the customer service division of a major consumer electronics company is interested in determining whether the customers who have purchased a Blu-ray player made by the company over the past 12 months are satisfied with their products. Referring to this scenario, the possible responses to the question “Out of a 100-point score with 100 being the highest and 0 being the lowest, what is your satisfaction level on the Blu-ray player that you purchased?” are values from a
discrete numerical variable
The possible responses to the question “How many times in the past three months have you visited a city park?” are values from a discrete variable. (T or F)
True.
The amount of coffee consumed by an individual in a day is an example of a discrete numerical variable. (T or F)
False
Whether a university is private or public is an example of a nominal scaled variable. (T or F)
True
Marital status is an example of an ordinal scaled variable. (T or F)
False
A Wall Street Journal poll asked 2,150 adults in the U.S. a series of questions to find out their view on the U.S. economy. Referring to this scenario, the population of interest is
all adults living in the U.S. when the poll was taken.
A Wall Street Journal poll asked 2,150 adults in the U.S. a series of questions to find out their view on the U.S. economy. Referring to this scenario, the 2,150 adults make up
the sample
A Wall Street Journal poll asked 2,150 adults in the U.S. a series of questions to find out their view on the U.S. economy. Referring to this scenario, the possible responses to the question “How many people in your household are unemployed currently?” result in
a ratio scale variable
A Wall Street Journal poll asked 2,150 adults in the U.S. a series of questions to find out their view on the U.S. economy. Referring to this scenario, the possible responses to the question “What do you think is the current unemployment rate?” result in
a ratio scale variable
A Wall Street Journal poll asked 2,150 adults in the U.S. a series of questions to find out their view on the U.S. economy. Referring to this scenario, the possible responses to the question “In which year do you think the last recession in the U.S. started?” result in
an interval scale variable
What is most likely a population as opposed to a sample?
registered voters in a county
The manager of the customer service division of a major consumer electronics company is interested in determining whether the customers who have purchased a Blu-ray player made by the company over the past 12 months are satisfied with their products. The population of interest is
all the customers who have bought a Blu-ray player made by the company over the past 12 months
A summary measure that is computed to describe a characteristic from only a sample of the population is called
A sample statistic
A summary measure that is computed to describe a characteristic of an entire population is called
a population parameter
Jared was working on a project to look at global warming and accessed an Internet site where he captured average global surface temperatures from 1866. Which of the four methods of data collection was he using?
Published sources
The British Airways Internet site provides a questionnaire instrument that can be answered electronically. Which of the 4 methods of data collection is involved when people complete the questionnaire?
Surveying
A marketing research firm, in conducting a comparative taste test, provided three types of peanut butter to a sample of households randomly selected within the state. Which of the 4 methods of data collection is involved when people are asked to compare the three types of peanut butter?
Experimentation
Which of the 4 methods of data collection is involved when a person counts the number of cars passing designated locations on the Los Angeles freeway system?
Observation
To obtain a sample of 10 books in the store, the manager walked to the first shelf next to the cash register to pick the first 10 books on that shelf. This is an example of a
convenience sample
To find out the potential impact of a new zoning law on a neighborhood, the legislators conduct a focus group interview by inviting the members of the housing owner’s association of that neighborhood. This is an example of a
judgement sample
All students in a class are divided into groups of 15. One student is randomly chosen from the 1st group, the remaining observations are every 15th student thereafter. What sample is this?
Systematic sample
All students in a class are grouped according to their gender. A random sample of 8 is selected from the males and a separate random sample of 7 is drawn from the females. Which sample is this?
Stratified sample
All students in a class are divided into groups according to the rows in which they are seated. One of the groups is randomly selected. Which sample is this?
cluster sample
Which of the following can be reduced by proper interviewer training?
Measurement error