BUSINESS ANALYTICS Flashcards
Data
facts and figures from which conclusions can be drawn
Data set
the data that are collected for a particular study
Elements
people, objects, events, or other entries
Variable
any characteristic of an element
Measurement
a way to assign a value of a variable to the element
Quantitative variable
the possible measurements are numbers that represent quantities; numeric; mathematical operations are meaningful
Qualitative variable
the possible measurements fall into several categories; categorical; labels or names used to identify an attribute of each element
Cross-sectional data
data collected at the same or approx. the same point in time
Time-series data
data collected over different time periods
Existing sources
data already gathered by public or private sources; EX: internet, library, US gov’t, data collection agency
Experimental and observational studies
data we collect ourselves for a specific purpose
Response variable
variable of interest (Y)
Factor/independent variable
other variables related to response variable (X)
Transactional data
data generated by business transactions; companies hope to use this past behavior and other information to predict customer responses
Data warehousing
a process of centralized data management and retrieval; its objective is the creation and maintenance of a central repository for all of an organization’s data
Big data
massive amounts of data; often collected in real time in different forms; sometimes needing quick analysis
Census
an examination of the entire population of measurements
Population
a set of all elements about which we wish to draw conclusions; size = N
Sample
a subset of the elements of a population; comes from the population; size = n
Descriptive statistics
the science of describing the important aspects of a set of measurements
Statistical inference
the science of using a sample of measurements to make generalizations about the important aspects of a population of measurements
Sample size
number of elements (n)
Random sample
a sample in which every element of the population has the same chance of being selected
Random selection
sample with replacement; sample without replacement
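The difference between the two selection schemes can be sketched in Python. This is a minimal illustration using the standard `random` module; the function names and the ten-element population are made up for the example.

```python
import random

def sample_with_replacement(population, n, seed=None):
    """Draw n elements; the same element may be chosen more than once."""
    rng = random.Random(seed)
    return [rng.choice(population) for _ in range(n)]

def sample_without_replacement(population, n, seed=None):
    """Draw n elements; each element can be chosen at most once (n <= N)."""
    rng = random.Random(seed)
    return rng.sample(population, n)

population = list(range(1, 11))  # hypothetical population, N = 10
with_repl = sample_with_replacement(population, 5, seed=1)
without_repl = sample_without_replacement(population, 5, seed=1)
```

With replacement, repeated values are possible; without replacement, all selected elements are distinct.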
Finite population
a population of limited size
Infinite population
a population of unlimited size
Non-probability sampling
convenience sampling, voluntary sampling, judgment sampling
Probability sampling
sampling where we know the chance that each element in the population will be included in the sample; required for statistical inference; EX: cluster, systematic, stratified
Convenience sampling
sampling where we select elements because they are convenient to sample; not a probability sample
Voluntary response sampling
samples in which participants self-select; frequently used by radio and television; overrepresents people with strong opinions
Judgment sampling
samples in which a person who is extremely knowledgeable about the population selects population elements he or she feels are most representative; the quality of the sample is completely dependent on the researcher's knowledge
Business analytics
the use of traditional and newly developed statistical methods, advances in IS, and techniques from management science to explore and investigate past performance; descriptive, predictive, and prescriptive analytics
Descriptive analytics
the use of traditional and newer graphics to present easy-to-understand visual summaries of up-to-the-minute data
Predictive analytics
methods used to find anomalies, patterns, and associations in data sets to predict future outcomes
Prescriptive analytics
looks at variables and constraints, along with predictions from predictive analytics, to recommend courses of action
Data mining
the use of predictive analytics, algorithms, and IS techniques to extract useful knowledge from huge amounts of data
Nominative
a qualitative variable for which there is no meaningful order, or ranking, of categories; EX: gender, car color
Ordinal
a qualitative variable for which there is a meaningful order, or ranking, of the categories; EX: teaching effectiveness
Interval
all the characteristics of ordinal plus measurements are on a numerical scale with an arbitrary zero point; can only meaningfully compare values by the interval between them; EX: temperature
Ratio
all the characteristics of interval plus measurements are on a numerical scale with a meaningful zero point; values can be compared by their intervals and ratios; in business and finance most quantitative variables are ratio variables, such as anything related to money; EX: earnings, profit, loss, age, distance, height
Sampling designs
methods for obtaining a sample
Sample survey
the sample we take
Stratified random sampling
divide the population into non-overlapping groups called strata, then select a random sample of a certain number of elements from each stratum
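A minimal Python sketch of this idea, assuming every stratum contributes the same number of elements. The function name, the `stratum_of` helper, and the employee data are hypothetical.

```python
import random

def stratified_sample(elements, stratum_of, per_stratum, seed=None):
    """Group elements into strata, then randomly sample from each stratum."""
    rng = random.Random(seed)
    strata = {}
    for element in elements:
        strata.setdefault(stratum_of(element), []).append(element)
    sample = []
    for name in sorted(strata):  # fixed order so a seed gives repeatable results
        sample.extend(rng.sample(strata[name], per_stratum))
    return sample

# Hypothetical population: (employee id, department)
employees = [(f"E{i}", "sales") for i in range(6)] + \
            [(f"E{i}", "finance") for i in range(6, 10)]
chosen = stratified_sample(employees, lambda e: e[1], per_stratum=2, seed=7)
```

Every stratum is represented in the result, which is the point of stratifying.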
Multistage cluster sampling
divide the population into clusters (groups), then randomly select some of the clusters to sample
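A minimal two-stage sketch in Python: first randomly select clusters, then randomly select elements inside each chosen cluster. The second stage and all names and data here are assumptions made for illustration.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: pick clusters at random. Stage 2: sample within each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], n_per_cluster) for name in chosen}

# Hypothetical clusters: customer ids grouped by store
stores = {
    "store_A": [1, 2, 3],
    "store_B": [4, 5, 6],
    "store_C": [7, 8, 9],
    "store_D": [10, 11, 12],
}
picked = two_stage_cluster_sample(stores, n_clusters=2, n_per_cluster=2, seed=5)
```

Unlike stratified sampling, only some of the groups appear in the sample.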
Systematic sampling
list the population, randomly select a starting point, and take every n^th element from the listing
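A minimal Python sketch of the procedure. The function name, the `step` parameter (the "n" in "every n^th"), and the listing of 20 items are made up for the example.

```python
import random

def systematic_sample(listing, step, seed=None):
    """Pick a random start among the first `step` items, then take every step-th item."""
    rng = random.Random(seed)
    start = rng.randrange(step)  # random starting position: 0 .. step-1
    return listing[start::step]

listing = list(range(1, 21))  # hypothetical population listing of 20 items
sample = systematic_sample(listing, step=5, seed=3)
```

The selected items are always exactly `step` positions apart in the listing.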
Dichotomous questions
questions with only two possible answers (EX: yes/no); clearly stated; easy to answer; easy to analyze; limited information
Types of surveys
phone surveys, mail surveys, web surveys, personal interviews
Phone surveys
inexpensive, low response rate
Mail surveys
inexpensive, low response rates (20-30%), requires multiple mailings
Web surveys
cheaper still, same problem as mail surveys (low response rates and requires multiple surveys)
Personal interviews
more expensive, more control, higher response rates
Frequency distribution
a table that summarizes the number of items in each of several nonoverlapping classes
Relative frequency
summarizes the proportion of items in each class; for each class, divide the frequency of the class by the total number of observations
Formula for relative frequency
frequency of each class / data size (total); multiply by 100 for percent frequency
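The formula can be sketched in Python as follows. The ratings data are made up, and `relative_frequencies` is a hypothetical helper name.

```python
from collections import Counter

def relative_frequencies(observations):
    """Relative frequency of each class = class frequency / total observations."""
    counts = Counter(observations)
    total = len(observations)
    return {category: count / total for category, count in counts.items()}

# Hypothetical data: 8 survey ratings
ratings = ["good", "fair", "good", "poor", "good", "fair", "good", "good"]
rel = relative_frequencies(ratings)           # good: 0.625, fair: 0.25, poor: 0.125
pct = {c: 100 * r for c, r in rel.items()}    # good: 62.5, fair: 25.0, poor: 12.5
```

The relative frequencies sum to 1 and the percent frequencies sum to 100.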
Bar chart
a vertical or horizontal rectangle represents the frequency for each category; height can be frequency, relative frequency, or percent frequency
Pie chart
a circle divided into slices where the size of each slice represents its relative frequency or percent frequency; central angle of each slice = relative frequency × 360 degrees
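As a small worked example in Python, the slice angles follow directly from the relative frequencies. The categories and values below are hypothetical.

```python
def slice_degrees(relative_frequency):
    """Angle of a pie slice = relative frequency x 360 degrees."""
    return relative_frequency * 360

# Hypothetical relative frequencies for three categories
rel = {"good": 0.625, "fair": 0.25, "poor": 0.125}
degrees = {category: slice_degrees(r) for category, r in rel.items()}
# good: 225.0, fair: 90.0, poor: 45.0 (total 360.0)
```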
How to construct a frequency distribution
- find the number of classes
- find the class length
- form nonoverlapping classes of equal width
- tally and count
- graph the histogram
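The first four steps above can be sketched in Python. Two assumptions are not stated in the card: the number of classes K is taken as the smallest K with 2^K >= n, and the class length is (largest - smallest) / K rounded up to a whole number. The function name and data are made up, and the histogram itself is not drawn.

```python
import math

def frequency_distribution(data, num_classes=None):
    """Return (lower boundary, upper boundary, count) for equal-width classes."""
    n = len(data)
    if num_classes is None:
        # Assumption: smallest K such that 2^K >= n
        num_classes = 1
        while 2 ** num_classes < n:
            num_classes += 1
    low, high = min(data), max(data)
    # Assumption: class length = range / K, rounded up (at least 1)
    class_length = max(1, math.ceil((high - low) / num_classes))
    counts = [0] * num_classes
    for x in data:
        # Each class includes its lower boundary; the largest value is
        # placed in the last class so that no measurement is left out.
        index = min(int((x - low) // class_length), num_classes - 1)
        counts[index] += 1
    return [(low + i * class_length, low + (i + 1) * class_length, counts[i])
            for i in range(num_classes)]

data = [12, 15, 21, 22, 25, 28, 31, 34, 38, 40]  # hypothetical measurements
classes = frequency_distribution(data)
# [(12, 19, 2), (19, 26, 3), (26, 33, 2), (33, 40, 3)]
```

The class counts add up to the number of measurements, which is a quick check that the classes do not overlap or leave gaps.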
Cumulative distribution
another way to summarize a distribution; use the same number of classes, class lengths, and class boundaries used for the frequency distribution; rather than the count in each class, we record the number of measurements that are LESS THAN the upper boundary of that class
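A minimal Python sketch: the cumulative frequencies are running totals of the class frequencies. The boundaries and counts below are hypothetical.

```python
from itertools import accumulate

# Hypothetical classes: upper boundaries and the frequency of each class
upper_boundaries = [19, 26, 33, 40]
class_counts = [2, 3, 2, 3]

# Cumulative frequency: measurements below each upper boundary
cumulative = list(accumulate(class_counts))               # [2, 5, 7, 10]
total = cumulative[-1]
cumulative_relative = [c / total for c in cumulative]     # [0.2, 0.5, 0.7, 1.0]
```

The last cumulative frequency always equals the total number of measurements.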
Ogive
a graph of a cumulative distribution; plot a point above each upper class boundary at a height of the cumulative frequency; connect the points with line segments; can also be drawn using cumulative relative or percent distributions
Stem-and-leaf display
the purpose is to see the overall pattern of the data, by grouping the data into classes; best for small to moderately sized data distributions
How to construct a stem-and-leaf display
- decide what units will be used
- each leaf must be a single digit and stem values will consist of appropriate leading digits
- place the stem values
- enter the leaf values (each leaf should be single digit)
- rearrange the leaves in increasing order
- can split the stems as needed
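The construction steps can be sketched in Python, assuming non-negative values and no stem splitting. The function name and data are made up; stems with no leaves are simply omitted in this sketch.

```python
from collections import defaultdict

def stem_and_leaf(values, leaf_unit=1):
    """Map each stem to its sorted single-digit leaves (non-negative values only)."""
    display = defaultdict(list)
    for v in values:
        scaled = round(v / leaf_unit)    # express the value in leaf units
        stem, leaf = divmod(scaled, 10)  # last digit is the leaf, the rest is the stem
        display[stem].append(leaf)
    # Place stems in order and rearrange the leaves in increasing order
    return {stem: sorted(leaves) for stem, leaves in sorted(display.items())}

values = [23, 31, 27, 45, 38, 21, 36, 42, 29, 33]  # hypothetical data
display = stem_and_leaf(values)
# {2: [1, 3, 7, 9], 3: [1, 3, 6, 8], 4: [2, 5]}
for stem, leaves in display.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```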
Leaf units
in general, leaf units can be any power of 10; EX: 0.1, 1, 10, 100, 1000…; if no leaf unit is given for a stem-and-leaf display, we assume its value is 1.0
Original data value formula
(stem digits followed by the leaf digit) × leaf unit
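A small Python illustration of the formula. Because each leaf is a single digit, writing the stem followed by the leaf is the same as stem × 10 + leaf; `original_value` is a hypothetical helper name.

```python
def original_value(stem, leaf, leaf_unit=1):
    """Recover a data value: (stem followed by leaf) x leaf unit."""
    return (stem * 10 + leaf) * leaf_unit

value_unit_1 = original_value(2, 7)        # 27
value_unit_10 = original_value(2, 7, 10)   # 270
value_unit_tenth = original_value(2, 7, 0.1)  # about 2.7 (floating point)
```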
Contingency table
classifies data on two dimensions; rows classify according to one dimension; columns classify according to a second dimension
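A minimal Python sketch that builds such a table by counting (row, column) pairs. The function name and the fund-type and satisfaction data are hypothetical.

```python
from collections import Counter

def contingency_table(pairs):
    """Count observations for each (row category, column category) combination."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}

# Hypothetical observations: (fund type, satisfaction level)
observations = [
    ("bond", "high"), ("bond", "low"), ("stock", "high"),
    ("stock", "high"), ("stock", "low"), ("bond", "high"),
]
table = contingency_table(observations)
# {'bond': {'high': 2, 'low': 1}, 'stock': {'high': 2, 'low': 1}}
```

Each cell holds the number of elements that fall in both the row category and the column category.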