Data Driven Decisions Flashcards
Descriptive analytics
Depict and then describe the characteristics of what is being studied
Predictive analytics
Use data from the past to predict the future
Prescriptive analytics
Include experimental design and optimization to suggest a course of action
True or False?
From data mining, someone is able to make conclusions about the underlying causes of certain variables.
False
Correct. This is a false statement. Data mining is often able to find trends, but it will usually overlook the underlying causes.
True or False?
As technology improves, there will be a greater amount of raw data.
True
Correct. This statement is true. Data collection will become easier as technology improves, which will lead to an increase in raw data.
Davenport-Kim three-stage model
A decision-making model developed by Thomas Davenport and Jinho Kim that consists of three stages: framing the problem, solving the problem, and communicating results
Stage 1: Problem recognition consists of the following steps:
Identifying stakeholders
Focusing on decisions
Identifying the kind of story you’re going to tell
Determining the scope of the problem
Getting specific about what you’re trying to find out
Stage 2: Solving the problem
The modeling step
The data collection step
The data analysis step
True or False?
The first step in the Davenport-Kim three-stage model is to frame the problem by recognizing what the problem is and then reviewing previous findings to begin to structure the analysis.
True
Correct. This statement is true. Stage #1 is to frame the problem by recognizing what the problem is and then reviewing previous findings to begin to structure the analysis. Stage #2 is to solve the problem. Stage #3 is to communicate your findings.
True or False?
The stage that involves the most intense statistics and data work is stage 3, communicating results.
False
Correct. This statement is false. The stage that involves the most intense statistics and data work is stage 2, solving the problem. This step includes data modeling, data collection, and data analysis.
Continuous data
Data that can lie at any point within a range of values
Discrete data
Data that can only take on whole values and has clear boundaries
Nominal data
Data used to label subjects in a study; sometimes called categorical data. Nominal data is a type of discrete data.
Ex: The choice of crayon color: burnt sienna, Prussian blue, periwinkle, apricot
Ex: Type of tape: masking, packing, Scotch, electrical
Ordinal data
A type of discrete data that places data objects in order according to some quality. The higher a data object sits on the scale, the more of that quality it has.
Ex: small, medium, and large paperclips
Ex: Level of education: some HS, HS degree/GED, some college, Bachelor’s, Master’s
Interval data
Data that is ordered within a range, with each data point an equal interval apart
Ex: Daily temperature (in Fahrenheit or Celsius)
Ex: The number that signifies the year: 2000, 1987, etc.
Ratio data
Data that, like interval data, is ordered within a range with each data point an equal interval apart, but that also has a natural zero point indicating none of the given quality
Ex: Heights of people in your family
Ex: The time it takes the Space Shuttle to orbit once around the earth
True or False?
The following are examples of nominal data:
male/female
red/blue
living/deceased
True
Correct. This statement is true. Nominal data, sometimes called categorical data, places objects into a category.
True or False?
Interval data has an order and all the objects are an equal interval apart.
True
Correct. This statement is true. Interval data has an order and all the objects are an equal interval apart. You cannot have a natural zero point in interval data.
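The difference between interval and ratio data can be checked with a small sketch. The temperatures and heights below are made-up illustration values, not data from the course. The idea is that a ratio between two interval values changes when the unit changes (because zero is arbitrary), while a ratio between two ratio values does not.

```python
# Hypothetical example values, chosen only for illustration.

def celsius_to_fahrenheit(c: float) -> float:
    """Convert a Celsius temperature to Fahrenheit."""
    return c * 9 / 5 + 32


def cm_to_inches(cm: float) -> float:
    """Convert a length in centimeters to inches."""
    return cm / 2.54


# Interval data: temperature. Differences are meaningful, ratios are not,
# because 0 degrees does not mean "no temperature".
warm_c, cool_c = 20.0, 10.0
ratio_c = warm_c / cool_c
ratio_f = celsius_to_fahrenheit(warm_c) / celsius_to_fahrenheit(cool_c)

# Ratio data: height. Zero means "no height", so the ratio survives
# a change of unit.
tall_cm, short_cm = 180.0, 90.0
ratio_cm = tall_cm / short_cm
ratio_in = cm_to_inches(tall_cm) / cm_to_inches(short_cm)

print(f"Temperature ratio: {ratio_c:.2f} in Celsius, {ratio_f:.2f} in Fahrenheit")
print(f"Height ratio: {ratio_cm:.2f} in cm, {ratio_in:.2f} in inches")
```

In this sketch "twice as warm" holds only in Celsius, which is why such statements are avoided for interval data, while "twice as tall" holds in both units.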
Data Management
The management, including cleaning and storage, of collected data.
Analytics
The discovery, analysis, and communication of meaningful patterns in data.
Big Data
A catch-phrase that describes a massive volume of data that is so large that it’s difficult to process using traditional database and software techniques.
Blind Study
A study performed where the participants are not told if they are in the treatment group or control group
Omission Error
An error because something (for example, data or survey response) is missing.
Reliable Data
Data that is consistent and repeatable
Benchmarks
Standards or points of reference for an industry or sector that can be used for comparison and evaluation.
Valid Data
Data resulting from a test that accurately measures what it is intended to measure
Data Set
A collection of related data records on a storage device.
Systematic Errors
Errors in measurement that are constant within a data set, sometimes caused by faulty equipment or bias
Relational Database
A database structured to recognize relations among stored items of information.
Statistics
The science that deals with the interpretation of numerical facts or data through theories of probability. Also, the numerical facts or data themselves.
Information Bias
A prejudice in the data that results when either the interviewer or the respondent has an agenda and is not presenting impartial questions or giving truly honest responses, respectively
Random Errors
Errors in measurement caused by unpredictable statistical fluctuations
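A short simulation may help separate systematic errors from random errors. This is only a sketch under stated assumptions: the true value, the constant offset, and the noise level are hypothetical numbers picked for illustration.

```python
import random
import statistics

TRUE_VALUE = 98.6  # assumed true temperature (degrees Fahrenheit)
OFFSET = 2.0       # hypothetical constant miscalibration of the instrument
NOISE_SD = 0.5     # hypothetical spread of the random fluctuations

rng = random.Random(0)  # fixed seed so the run is repeatable

# Systematic error: every reading is shifted by the same amount.
systematic = [TRUE_VALUE + OFFSET for _ in range(1000)]

# Random error: each reading fluctuates unpredictably around the true value.
noisy = [TRUE_VALUE + rng.gauss(0, NOISE_SD) for _ in range(1000)]

systematic_bias = statistics.mean(systematic) - TRUE_VALUE
random_bias = statistics.mean(noisy) - TRUE_VALUE

print(f"Systematic: bias {systematic_bias:+.2f}, "
      f"spread {statistics.pstdev(systematic):.2f}")
print(f"Random:     bias {random_bias:+.2f}, "
      f"spread {statistics.pstdev(noisy):.2f}")
```

Averaging many readings shrinks the effect of random error but leaves the systematic offset untouched, which is why a constant error has to be fixed at its source (for example, by recalibrating the equipment).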
Measurement Bias
A prejudice in the data that results when the sample is not representative of the population being tested
Double-Blind Study
A study performed where neither the treatment allocator nor the participant knows which group the participant is in
Triple-Blind Study
A study performed where neither the treatment allocator nor the participant nor the response gatherer knows which group the participant is in
If you were to take your temperature 10 times in a row using the same thermometer and got the same result every time, you could say that the thermometer is __________.
a) valid
b) reliable
c) accurate
d) measurable
b) reliable
Feedback: The correct answer is B. A test is reliable if it is consistent and repeatable.
According to the 2000 census, the average number of people in a family in the U.S. was 3.17. Since it isn’t possible to have 0.17 of a person, you would use a __________ data point to describe the number of people in your family.
a) continuous
b) discrete
c) valid
d) ordinal
b) discrete
Feedback: The correct answer is B. You would use a discrete number such as one, three, or five to describe the number of people in your family.
You survey 100 New Yorkers about their preference for New York-style or Chicago-style pizza. What would be wrong with this?
a) You would encounter information bias.
b) You would encounter gender bias.
c) You would encounter random error.
d) You would encounter measurement bias.
d) You would encounter measurement bias.
Feedback: The correct answer is D. Asking 100 New Yorkers about their preferences would most likely result in measurement bias. The same would occur if you were to ask the question of 100 Chicagoans.
Rankings are an example of which kind of data?
a) nominal
b) continuous
c) ordinal
d) discrete
c) ordinal
Feedback: The correct answer is C. Ordinal numbers place subjects in order according to some quality. So, if you came in first, second, or third in a race, this would be an example of ordinal data.
The science of using mathematical procedures to describe data is __________.
a) statistics
b) mathematics
c) descriptive data
d) analytics
a) statistics
Feedback: The correct answer is A. Statistics uses mathematical procedures to describe data. Analytics makes use of statistical analysis.
The third stage of Davenport and Kim’s Three-Stage Model of quantitative decision making is which of the following?
a) solving the problem
b) framing the problem
c) communicating results
d) None of the above
c) communicating results
Feedback: The correct answer is C. The third stage in Davenport and Kim’s Three-Stage model is communicating results.
Cleaning and organizing collected raw data refers to which of the following?
a) data collection
b) data management
c) data discovery
d) rectangular data
b) data management
Feedback: The correct answer is B. Cleaning and organizing raw data is known as data management. The result is sometimes a rectangular data file.
Suppose you wanted to determine the ratio of cyclists to drivers in cities with higher versus lower air quality. What kind of study might you use?
a) observational study
b) experimental study
c) double-blind study
d) triple-blind study
a) observational study
Feedback: The correct answer is A. Because you cannot control for all variables, you would not be able to use an experimental study or blind studies.
Suppose you were to use analytics in an experiment to determine how many salespeople to assign to particular sales territories based on the makeup and performance of the territories in the results of the experiment. You would be using which kind of analytics?
a) predictive
b) prescriptive
c) descriptive
d) proactive
b) prescriptive
Feedback: The correct answer is B. Prescriptive analytics determines a course of action.
Suppose you employed analytics to determine which sales territories had shown the most profitable growth in the last four quarters and would most likely do so again in the future. You would be using which kind of analytics?
a) predictive
b) prescriptive
c) descriptive
d) proactive
a) predictive
Feedback: The correct answer is A. Using past information to make decisions about the future is called predictive analytics.
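As a rough illustration of the descriptive versus predictive distinction, the sketch below summarizes four quarters of profit for one territory and then extends a straight-line trend one quarter ahead. The figures are hypothetical, and a least-squares line is just one simple way to project a trend; it is an assumption of this example, not a method named in the course.

```python
import statistics

# Hypothetical quarterly profit (in $1,000s) for one sales territory.
quarters = [1, 2, 3, 4]
profit = [100.0, 110.0, 120.0, 130.0]

# Descriptive analytics: summarize what has already happened.
mean_profit = statistics.mean(profit)
total_growth = profit[-1] - profit[0]

# Predictive analytics: fit a straight line by least squares
# and extend it to the next quarter.
mean_q = statistics.mean(quarters)
slope = (
    sum((q - mean_q) * (p - mean_profit) for q, p in zip(quarters, profit))
    / sum((q - mean_q) ** 2 for q in quarters)
)
intercept = mean_profit - slope * mean_q
forecast_q5 = intercept + slope * 5

print(f"Descriptive: mean {mean_profit:.1f}, growth {total_growth:.1f}")
print(f"Predictive:  forecast for quarter 5 is {forecast_q5:.1f}")
```

With these made-up figures the profit grows by 10 each quarter, so the line projects 140 for the fifth quarter. A prescriptive step would go further and recommend an action, such as how many salespeople to assign.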
Of the following, which is considered the most serious kind of data error?
a) poorly formatted data
b) number transposition
c) out-of-range data
d) missing data
d) missing data
Feedback: The correct answer is D. Missing data can severely compromise the results of your study.
If you designed a drug trial in which the subject, the data gatherer, and the treatment allocator did not know who was in the control group, then you created a __________ study.
a) blind
b) biased
c) double-blind
d) triple-blind
d) triple-blind
Feedback: The correct answer is D. A study in which none of the parties knows who is in the control group and who is in the treatment group is a triple-blind study. If the treatment allocator and data gatherer are the same person, this would be a double-blind study.
If you were making a simplified representation of a complex problem in order to solve it, which stage of the Three-Stage Model would you be in?
a) framing the problem
b) data collection
c) solving the problem
d) communicating results
c) solving the problem
Feedback: The correct answer is C. The modeling step is part of the solving the problem stage.
Assume you are measuring the returns on investment, over the past year, for five different stocks in your portfolio. You find the following values (each as a percent of your investment): 4.68, 5.65, 3.78, -0.46, 6.91. What kind of data are these data points?
a) continuous data
b) nominal data
c) discrete data
d) ordinal data
a) continuous data
Feedback: The correct answer is A. In a set of continuous data, a value can lie at any point within a range.
If you were to take your temperature 10 times in a row using the same thermometer and get the following results (in degrees Fahrenheit), what could you assume about the thermometer? 34, 99, 108, 45, 66, 21, 78, 53, 94, 102
a) It is reliable but not valid.
b) It is valid but not reliable.
c) It is neither reliable nor valid.
d) It is both reliable and valid.
c) It is neither reliable nor valid.
Feedback: The correct answer is C. Because the average temperature for human beings is 98.6 degrees Fahrenheit, you can assume the results are not valid. You can also assume they are unreliable, because of the wildly varying results.
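The reasoning in this feedback can be checked numerically. The sketch below uses the ten readings from the question; using the mean to judge validity and the range and standard deviation to judge reliability is an illustrative choice for this example, not a formal test.

```python
import statistics

# The ten readings from the question (degrees Fahrenheit).
readings = [34, 99, 108, 45, 66, 21, 78, 53, 94, 102]
EXPECTED = 98.6  # average human temperature cited in the feedback

mean_reading = statistics.mean(readings)
spread = max(readings) - min(readings)
std_dev = statistics.stdev(readings)

# Validity: does the instrument measure what it is supposed to measure?
print(f"Mean reading {mean_reading:.1f} vs expected {EXPECTED}")

# Reliability: are repeated readings consistent with each other?
print(f"Range {spread}, standard deviation {std_dev:.1f}")
```

The mean of these readings is 70, far from 98.6, and the readings span 87 degrees, so the thermometer looks neither valid nor reliable. A reliable but invalid thermometer would instead give nearly identical readings clustered around the wrong value.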
For companies to attract and retain their best customers they need a complete portrait of who they are. To develop this portrait companies turn to…
A. Statistics
B. Analytics
C. Management Science
D. Histograms
B. Analytics
A manufacturer wants to maximize their factory output while specifically minimizing labor costs. What type of analytics might they employ to achieve this goal?
A. Descriptive Analytics
B. Predictive Analytics
C. Prescriptive Analytics
D. Diagnostic Analytics
C. Prescriptive Analytics
What type of measurement error is constant within a data set and is sometimes caused by faulty equipment or bias?
A. Random
B. Omission
C. Outlier
D. Systematic
D. Systematic
An educator develops a new standardized test to measure the math skills of ninth graders. She has students in her home state of Ohio take the test. If the test is to be used on a national level, what type of error might be found in her data?
A. Omission Error
B. Systematic Error
C. Measurement Bias
D. Information Bias
C. Measurement Bias
A city government is trying to determine the national origins of its recent immigrant population. If a survey of the immigrant population is conducted in English, what type of error might be present in the data?
A. Random
B. Omission
C. Outlier
D. Accuracy
B. Omission
The use of Big Data is increasingly important to businesses in competitive markets. Which of the following characteristics is not true of big data?
A. Requires the use of analytics
B. Contains structured data
C. Contains unstructured data
D. Can be analyzed with traditional spreadsheets
D. Can be analyzed with traditional spreadsheets
The Davenport-Kim three-stage model consists of framing the problem, solving the problem, and communicating results. Which two of the following are part of the framing the problem stage?
A. Determine the scope of the problem
B. Data collection
C. Review of previous findings
D. Presenting a recommendation
A. Determine the scope of the problem
C. Review of previous findings
A healthcare provider is researching blood glucose levels before and after exercising. What two elements should be part of any experimental study such as this?
A. Treatment procedures
B. Patient observation
C. Statistical validity
D. Experimental response
A. Treatment procedures
D. Experimental response
Runners cover 26.2 miles in the Olympic marathon. What level of measurement is this?
A. Nominal
B. Ordinal
C. Interval
D. Ratio
D. Ratio
What level of measurement is the type of car produced in a Ford factory?
A. Nominal
B. Ordinal
C. Interval
D. Ratio
A. Nominal
What level of measurement is a ranking of the 10 best cities in the U.S. to retire in?
A. Nominal
B. Ordinal
C. Interval
D. Ratio
B. Ordinal
What level of measurement are women’s dress sizes (2, 4, 6, etc.)?
A. Nominal
B. Ordinal
C. Interval
D. Ratio
C. Interval
A local school board is studying the impact of a proposed change in testing on math scores. Bias can be introduced into the study by both students and teachers. Which research technique would eliminate this type of bias?
A. Observational study
B. Blind study
C. Cohort study
D. Double-blind study
D. Double-blind study
A company’s product development team tests 3 new car waxes by waxing 5 cars with each wax and then running them through a car wash. They then record the number of washes it takes before the wax begins to deteriorate. What is the term for the five cars?
A. The response
B. The construct validity
C. The experimental unit
D. The treatment
C. The experimental unit