Chapter 1: Intro to Business Statistics and Analytics Flashcards
facts and figures from which conclusions can be drawn; an elementary description of things, events, activities and transactions that are recorded, classified, and stored but are not organized to convey any specific meaning
data
T or F: data is organized to convey a specific meaning
False; is not organized to convey any specific meaning
Data can be:
a. numbers
b. characters (a,b,c/ A,B,C)
c. words (name, address, text)
d. all of the above
d. all of the above (data can be ANYTHING)
the data that is collected for a particular study
data set (elements: may be people, objects, events, or other entries)
any characteristic of an element; describes data
variable (ex: model design, lot type, list price, selling price)
refers to the data that have been organized, so that we can derive meaningful and valuable info.
information
T or F: x and y are variables.
True
a way to assign a value of a variable to the element; something we can measure and assign value
measurement (ex: bill paid, profit loss)
the possible measurements of the values of a variable are numbers that represent quantities
quantitative
Quantitative or qualitative data:
numbers and can apply mathematical functions (add, subtract, etc.)
Quantitative
(Quantitative/qualitative) data must have UNITS; used when we can measure something (ex: height, weight)
Quantitative
the possible measurements fall into several categories; can be in words or numbers
Qualitative
T or F: for qualitative data, you can’t apply mathematical functions; the only thing we can do is COUNT.
True
Are zip codes, phone numbers, and Social Security numbers quantitative or qualitative data?
Qualitative (ex: can count the number of people with ____ zip code, but you don’t add these numbers together)
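A quick sketch of why zip codes count as qualitative: counting categories is meaningful, adding the codes is not. The zip codes below are made-up illustration data, not from the course.

```python
from collections import Counter

# Hypothetical zip codes recorded for six customers (illustrative data only).
zip_codes = ["36830", "36832", "36830", "36849", "36830", "36832"]

# Counting how often each category occurs is meaningful for qualitative data.
counts = Counter(zip_codes)
print(counts["36830"])  # 3

# Summing the codes as numbers would run, but the total would have no
# interpretation, which is why zip codes are qualitative despite being digits.
```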
What are 2 graphs you can make for qualitative data?
Bar graph and pie chart
What are 4 graphs you can make with quantitative data?
- Histogram
- Line graphs
- Stem and leaf plot
- Scatter plot
T or F: Dot plots are very easy to read and sometimes very easy to compare.
True
A bar graph (does/does not) have spaces between bars.
does
data collected at the same or approximately the same point in time
cross-sectional data (ex: when we select data from one month or one year; examples: GDP data for 2021, gross income for 2024, balance sheet for 2023)
data collected over different time periods
time series data (ex: we collect data from 2000-2024)
What is the first thing you should do with data?
Figure out if it is quantitative or qualitative.
Car type, car color, and sales month are examples of:
a. quantitative data
b. qualitative data
qualitative data
Car cost, sale price, and profit are examples of:
a. quantitative data
b. qualitative data
quantitative data
data already gathered by public or private sources (such as the internet, library, US government, or data collection agency)
existing sources
(is SECONDARY data; ex: unemployment rate)
data we collect ourselves for a specific purpose
experimental and observational studies
What are the terms that these definitions are describing?
1. variable of interest (ex: getting into Auburn)
2. other variables related to response variable (ex: GPA, extra activities, etc.)
- response variable
- factors (other variables we’re interested in; what your response variable is dependent on)
when an individual or company collects data directly (ex: Meta, credit card companies, social media, GPS, maps, etc.)
primary data
- y is the (dependent/independent) variable.
- x is the (dependent/independent) variable.
- dependent (is a response variable)
- independent (can give any value to x)
When initiating a study, here are the steps:
1. Define the ______ ____ _____, called a response variable.
2. Next, define other variables that may be related to the variable of interest and will be measured, called _______ _______.
3. If we manipulate the independent variables, we have an _______ ______.
4. If unable to control independent variables, the study is ________.
- variable of interest (figure out what we’re trying to do)
- independent variables
- experimental study (something we can manipulate/control)
- observational (can’t control; don’t interfere, just write down observations)
Someone is measuring the waiting time to see a doctor. This is an example of a ________ study.
a. experimental
b. observational
b. observational
Professor evaluations at the end of the semester are an example of:
a. observational study
b. experimental study
c. survey data
c. survey data (can be anything– multiple choice, open-ended, etc.)
T or F: Companies hope to use past behavior and other information to predict customer responses
True
a process of centralized data management and retrieval; its objective is the creation and maintenance of a central repository for all of an organization’s data
data warehousing
massive amounts of data; often collected in real time in different forms; sometimes needing quick analysis
big data (where companies receive data in higher volume; ex: social media)
What are the 4 V’s of big data?
- Velocity
- Veracity (accuracy)
- Volume
- Variety
a set of all elements about which we wish to draw conclusions
population (ex: ALL Netflix customers, ALL banking customers, ALL students at auburn, etc.)
T or F: Population is usually very large
True
an examination of 100% of the population of measurements
census
a subset of the elements of a population
sample (doesn’t include everything; is a subset)
measurement of the variable of interest for each and every population unit; sometimes called observations
population of measurements (ex: annual starting salaries of all graduates from last year’s MBA program)
If the population is too large, analyze a _________.
subset
(ex: if we have 1 million residents, we will take a random sample (subset) to analyze that group)
the science of describing the important aspects of a set of measurements
descriptive statistics
Mean, median, mode, maximum number, minimum number, range, standard deviation, and variance are all examples of _________ __________.
descriptive statistics (they DESCRIBE our data)
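A minimal sketch of these descriptive statistics using Python's standard `statistics` module. The prices are hypothetical, and `variance`/`stdev` here are the sample versions (dividing by n - 1), which is an assumption about which version the course uses.

```python
import statistics

# Hypothetical selling prices in thousands of dollars (illustrative data only).
prices = [250, 300, 300, 410, 520]

mean = statistics.mean(prices)          # 356
median = statistics.median(prices)      # 300
mode = statistics.mode(prices)          # 300
maximum = max(prices)                   # 520
minimum = min(prices)                   # 250
data_range = maximum - minimum          # 270
variance = statistics.variance(prices)  # sample variance: 11830
std_dev = statistics.stdev(prices)      # square root of the sample variance
```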
the science of using a sample of measurements to make generalizations about the important aspects of a population of measurements
statistical inference
(making generalizations about the whole population based on the sample)
a sequence of operations that takes inputs and turns them into outputs
process (ex: turning raw materials into finished goods, like cars, laptops, etc.)
a population of limited size
finite population (ex: number of cars in parking lot, number of students in business analytics class, number of customers that have MasterCard)
T or F: you cannot count a finite population.
False; CAN count
a population of unlimited size
infinite population (ex: number of stars in the sky, number of red blood cells, etc.)
T or F: you can count an infinite population.
False; CANNOT
sampling where we know the chance that each element in the population will be included in the sample
probability sampling
________ sampling is required for statistical inference.
Probability
sampling where we select elements because they are convenient to sample; easy to sample
convenience sampling (not a type of probability sampling)
What are two other types of non-probability sampling besides convenience sampling?
- Voluntary response sampling
- Judgement sampling
samples in which participants self-select; frequently used in radio and television
Voluntary response sampling
T or F: Voluntary response sampling overrepresents people with strong opinions (ex: politics).
True
samples in which a person who is extremely knowledgeable about the population selects population elements he or she feels are most representative
judgement sampling (ex: deliberately selecting smart students, then making the conclusion the whole class is smart. This is WRONG)
_______ sampling includes all non-probability sampling, which includes convenience, judgement, and volunteer sampling.
Improper
T or F: Big data often needs quick analysis to support business decision making.
True
the use of traditional and newly developed statistical methods, advances in IS, and techniques from management science to explore and investigate past performance.
business analytics
What are the 3 categories of business analytics?
- Descriptive analytics
- Predictive analytics
- Prescriptive analytics
the use of traditional and newer graphics to represent easy-to-understand visual summaries of up-to-the-minute data
Descriptive analytics (historical/past data)
methods used to find anomalies, patterns, and associations in data sets to predict future outcomes
Predictive analytics
the use of predictive analytics, algorithms, and IS techniques to extract useful knowledge from huge amounts of data; the process of discovering useful knowledge in extremely large data sets
data mining (insights)
looks at variables and constraints, along with predictions from predictive analytics, to recommend courses of action
Prescriptive analytics
5 Steps for Business Analytics:
1. Define ______/ what we’re trying to solve.
2. Collect _______
3. _______ analytics– tell story about past, make visualizations
4. _______ analytics (ex: what will be our profit in April, May, etc.)
5. _______ analytics– gives us recommendations (ex: if the company needs to make profit of $50,000, what should they do?)
- goal
- data
- descriptive
- predictive
- prescriptive
Applications of Predictive Analytics:
________ and _______ _______ put customers in different clusters and send them ads that fit their needs (ex: moms getting ads about diapers)
classification and cluster detection
What are the 6 applications of predictive analytics?
- Anomaly (outlier) detection
- Association learning (ex: if going to the store to buy bread, what’s the probability you will also buy butter/jelly, etc.)
- Classification
- Cluster detection
- Prediction
- Factor detection
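To make the association learning example concrete, here is a rough sketch that estimates the chance a shopper buys butter given that they bought bread. The baskets are invented for illustration.

```python
# Hypothetical shopping baskets (illustrative data only).
baskets = [
    {"bread", "butter"},
    {"bread", "jelly"},
    {"bread", "butter", "jelly"},
    {"milk"},
    {"bread"},
]

# Keep only the baskets that contain bread.
with_bread = [basket for basket in baskets if "bread" in basket]

# Share of bread baskets that also contain butter: 2 out of 4.
p_butter_given_bread = sum("butter" in b for b in with_bread) / len(with_bread)
print(p_butter_given_bread)  # 0.5
```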
a predictive analytics technique where we observe values of a response variable and corresponding predictor variables
supervised learning (ex: linear regression, logistic regression, neural networks, decision trees)
where we observe values of variables but not a response variable
unsupervised learning (ex: cluster analysis, factor analysis, association rules)
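As a small illustration of supervised learning, the sketch below fits a simple linear regression by least squares: we observe both a predictor (x) and a response (y). The ad-spend and profit numbers are hypothetical, and the formulas are the standard least-squares ones rather than anything specific to the course.

```python
# Hypothetical data: x = ad spend, y = profit (illustrative units only).
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Least-squares slope = sum of cross-deviations / sum of squared x-deviations.
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# Use the fitted line to predict the response for a new predictor value.
predicted = intercept + slope * 5
print(slope, intercept, predicted)  # 2.0 1.0 11.0
```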
Nominative and ordinal scales of measurement are (quantitative/qualitative) data.
qualitative (we can count)
Interval and ratio scales of measurement are (quantitative/qualitative) data.
quantitative (we can measure something; ex: how much, how many)
a qualitative variable for which there is no meaningful ordering, or ranking, of the categories (ex: gender, car color, zip codes)
nominative
T or F: nominative variables can be numerical or non-numerical.
True
a qualitative variable for which there is a meaningful ordering, or ranking, of the categories (ex: teaching effectiveness)
ordinal
T or F: ordinal variables can be numerical or non-numerical.
True (ex: survey that has the option of choosing 1 through 5, one being the worst and 5 being the best… OR grades: A,B,C,D,F)
A golf leaderboard is an example of an (ordinal/nominative) qualitative variable.
ordinal (because it has a meaningful ordering/ranking)
________ variable is a quantitative variable which has all of the characteristics of ordinal plus: measurements are on a numerical scale with an arbitrary zero point (where zero has no meaning)
interval
(ex: temperature– 0 degrees F means “cold”, not “no heat”; 60 degrees F is NOT twice as warm as 30 degrees F)
T or F: Interval variables can only meaningfully compare values by the interval between them. Cannot compare values by taking their ratios.
True
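A short sketch of why ratios are not meaningful on an interval scale: converting the same temperatures from Fahrenheit to Celsius keeps equal gaps equal but changes the ratio completely. The temperatures are chosen to match the 30/60 degree example.

```python
def fahrenheit_to_celsius(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

cool_f, warm_f, hot_f = 30.0, 60.0, 90.0
cool_c, warm_c, hot_c = (fahrenheit_to_celsius(t) for t in (cool_f, warm_f, hot_f))

# Intervals stay comparable: two equal gaps in F are still equal gaps in C.
gap1_c = warm_c - cool_c
gap2_c = hot_c - warm_c

# Ratios do not survive the change of scale, so "twice as warm" has no meaning.
ratio_f = warm_f / cool_f  # 2.0
ratio_c = warm_c / cool_c  # negative, because 30 F is below 0 C
```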
________ is the arithmetic difference between the values.
interval
(ex: age intervals– can do [0-12 months]= infant, [2-4]= toddler, [5-12]= child, and [13-19]= teen. These are intervals)
________ variable is a quantitative variable that has all of the characteristics of interval plus measurements are on a numerical scale with a MEANINGFUL zero point (zero means “none” or “nothing”)
ratio (something you can measure)
T or F: With ratio variables, values can be compared by their intervals and ratios.
True (ex: $30 is $20 more than $10; $0 means no money)
T or F: In business and finance, most quantitative variables are ratio variables, such as anything related to money.
True (ex: earnings, profit, loss, age, distance, height, salary)
methods for obtaining a sample
sampling designs
the sample we take
sample survey
T or F: All sampling is random sampling.
False; NOT ALL
T or F: One common sampling design involves separately sampling important groups within a population.
True
What 3 types of sampling methods are non-probability sampling?
- Convenience
- Judgement
- Volunteer
What 4 types of sampling methods are probability sampling?
- Simple, random sample
- Stratified sampling
- Multi-stage cluster sampling
- Systematic sampling
a type of probability sampling where there is an equal chance of getting selected:
a. stratified sampling
b. systematic sampling
c. multi-stage cluster sampling
d. simple, random sample
d. simple, random sample
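A minimal sketch of a simple random sample using Python's standard `random` module. The population of 100 ID numbers and the seed are assumptions made for illustration.

```python
import random

# Hypothetical population: ID numbers 1 through 100 (illustrative only).
population = list(range(1, 101))

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Every element has the same chance of selection; no element is picked twice.
simple_random_sample = rng.sample(population, k=10)
```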
divide population into non-overlapping groups (strata) then select a random sample from each stratum
stratified random sample
(ex: the population is high school students. Divide this population into non-overlapping groups of freshmen, sophomores, juniors, and seniors. Then randomly select 20 students from each group)
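The high school example can be sketched roughly as follows. The roster of 100 students per grade is invented; only the "20 from each group" step comes from the card.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical school roster: 100 student IDs per grade (illustrative only).
strata = {
    grade: [f"{grade}-{i}" for i in range(100)]
    for grade in ("freshman", "sophomore", "junior", "senior")
}

# Draw a separate simple random sample of 20 from each stratum.
stratified_sample = {
    grade: rng.sample(students, k=20) for grade, students in strata.items()
}
total_selected = sum(len(s) for s in stratified_sample.values())  # 80
```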
divide population into clusters and then randomly select clusters to sample
multistage cluster sampling
(ex: the population is people from all 50 U.S. states. Divide this population into non-overlapping groups, such as Alabama, Georgia, Florida, etc. Then, randomly select 2 or 3 clusters, and EVERYONE in the cluster will get selected, but not every cluster will be)
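A rough sketch of the state example as described on the card: pick whole clusters at random, then keep everyone inside the chosen clusters. The residents listed are placeholders, and real multistage designs may sample again within each cluster.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is reproducible

# Hypothetical clusters: a few residents per state (illustrative only).
clusters = {
    "Alabama": ["AL-1", "AL-2", "AL-3"],
    "Georgia": ["GA-1", "GA-2"],
    "Florida": ["FL-1", "FL-2", "FL-3", "FL-4"],
    "Texas": ["TX-1", "TX-2"],
}

# Randomly choose which clusters are sampled; not every cluster is picked.
chosen_states = rng.sample(sorted(clusters), k=2)

# Everyone in a chosen cluster is included in the sample.
cluster_sample = [person for state in chosen_states for person in clusters[state]]
```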
list population, select random starting point, and sample each nth element
systematic sampling
What is the formula for systematic sampling?
n = total population / # of samples we desire
Round this number DOWN. (Look over picture of systematic sampling problem in camera roll)
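A minimal sketch of systematic sampling, assuming the step n is the population size divided by the desired sample size, rounded down, with a random start inside the first n elements. The function name and the population of 2,136 elements are hypothetical.

```python
import random

def systematic_sample(population, sample_size, rng):
    """Pick a random start, then take every step-th element of the list."""
    step = len(population) // sample_size  # population / samples, rounded DOWN
    start = rng.randrange(step)            # random start within the first step
    return [population[start + i * step] for i in range(sample_size)]

# Hypothetical listed population of 2,136 elements; we want 100 of them.
population = list(range(2136))
sample = systematic_sample(population, 100, random.Random(7))
step_used = sample[1] - sample[0]  # 2136 // 100 = 21
```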
In relation to surveys, ________ questions have only two choices (ex: Yes or No, True or False)
dichotomous
Which of the following describe dichotomous questions?
a. clearly stated
b. easy to answer
c. easy to analyze (can count)
d. limited information
e. all of the above
f. none of the above
e. all of the above
T or F: In relation to surveys, multiple choice questions are usually analyzed with averages.
True (ex: basically 4-5 choices, then take the averages)
What type of questions on a survey are most honest and give complete information, and cannot be readily summarized?
open-ended questions
What are 4 types of surveys?
- Phone survey
- Mail surveys
- Web surveys
- Personal interviews
Match the following descriptions to which type of survey they describe:
a. inexpensive, low response rate
b. cheaper still, same problems as mail surveys
c. inexpensive, low response rates (20-30%), requires multiple mailings
d. more expensive, more control, higher response rates
a. Phone survey
b. Web surveys
c. Mail surveys
d. Personal interviews
Survey Terms…
the entire population of interest
target population (all the elements)
Survey Terms…
list from which the sample will be selected
sample frame (subset of population)
Survey Terms…
the difference between a numerical descriptor of the population and the corresponding descriptor of the sample
Sampling error
(ex: the mean for the population is saying 5, while the mean for the sample is saying 4)
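The example above can be reproduced with a small sketch. The population and sample values are made up so that the means come out to 5 and 4.

```python
import statistics

# Hypothetical population of measurements and one sample drawn from it.
population = [2, 4, 4, 5, 6, 6, 8]
sample = [2, 4, 6]

population_mean = statistics.mean(population)  # 5
sample_mean = statistics.mean(sample)          # 4

# Sampling error: population descriptor minus the matching sample descriptor.
sampling_error = population_mean - sample_mean
print(sampling_error)  # 1
```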
Survey Terms…
when some population elements are excluded from the process of selecting the sample
undercoverage
Survey Terms…
some of the individuals who were supposed to be included in the sample are not
nonresponse
Survey Terms…
when the opinions of those who complete a survey vary dramatically from those who do not
selection bias
Survey Terms…
when data values are recorded incorrectly
errors of observation
Survey Terms…
when either the respondent or interviewer incorrectly marks an answer
recording error
Survey Terms…
when respondents do not tell the truth; also occurs when biased questions are used
response bias
(ex: if the teacher asked who your favorite professor is, the respondents will probably lie and say her)