Module 1 Flashcards
Define the DCOVA framework.
The DCOVA framework consists of the following tasks:
• Define the data that you want to study in order to solve a problem or meet an objective.
• Collect the data from appropriate sources.
• Organize the data collected by developing tables.
• Visualize the data collected by developing charts.
• Analyze the data collected to reach conclusions and present those results.
What is data?
Data are “the values associated with a trait or property that help distinguish the occurrences of something.”
The set of individual values associated with a variable.
What is a variable?
The trait or property of something that values (data) are associated with.
A characteristic of an item or individual.
What is statistics?
The methods that help transform data into useful information for decision makers.
What is big data?
Data that are being collected in huge volumes and at very fast rates (typically in real time) and data that arrive in a variety of forms, organized and unorganized. These attributes of “volume, velocity, and variety,” first identified in 2001 (see reference 5), make big data different from any of the data sets used in this book.
Operational Definition
A universally accepted meaning that is clear to all associated with an analysis. This definition should clearly identify the values of the variable necessary to ensure that collected data are acceptable and appropriate for analysis.
Categorical variables (also known as qualitative variables)
have values that can only be placed into categories, such as yes and no.
Numerical variables (also known as quantitative variables)
have values that represent a counted or measured quantity.
Discrete variables
Numerical values that arise from a counting process. “Number of items purchased” is an example of a discrete numerical variable.
Continuous variables
Numerical values that arise from a measuring process. “The time spent waiting on a checkout line” is an example of a continuous numerical variable because its values can represent a measurement with a stopwatch.
Can variables be categorical AND numerical?
Yes. For example, “age” would seem to be an obvious numerical variable, but what if you are
interested in comparing the buying habits of children, young adults, middle-aged persons, and
retirement-age people? In that case, defining “age” as a categorical variable would make better
sense. The answer depends on the operational definition of the variable.
Nominal Scale
A nominal scale classifies data into distinct categories in which no ranking is implied. Examples of nominal-scaled variables are your favorite soft drink, your political party affiliation, and your gender. Nominal scaling is the weakest form of measurement because you cannot specify any ranking across the various categories.
Ordinal Scale
An ordinal scale classifies values into distinct categories in which ranking is implied.
For example, suppose that GT&M conducted a survey of customers who made a purchase and
asked the question “How do you rate the overall service provided by Good Tunes & More during your most recent purchase?” to which the responses were “excellent,” “very good,” “fair,”
and “poor.”
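A minimal sketch of how the ranking implied by an ordinal scale can be made explicit. Only the four rating labels come from the survey question above; the rank numbers and the sample responses are assumptions for illustration. The ranks encode order only, so the gaps between them are not meaningful quantities.

```python
# Rating labels from the survey question, listed from worst to best.
RATING_ORDER = ["poor", "fair", "very good", "excellent"]

# Assumed rank numbers: they encode order only, not distance between ratings.
RANK = {label: position for position, label in enumerate(RATING_ORDER)}

# Hypothetical responses, not data from the source.
responses = ["fair", "excellent", "poor", "very good", "fair"]

# Sorting by rank is meaningful on an ordinal scale.
sorted_responses = sorted(responses, key=RANK.__getitem__)
print(sorted_responses)
```

A nominal variable such as favorite soft drink has no equivalent of RATING_ORDER, which is why sorting its categories by rank would be meaningless.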
Interval Scale
An interval scale is an ordered scale in which the difference between measurements is a meaningful quantity but does not involve a true zero point. For example, a noontime temperature reading of 67° Fahrenheit is 2 degrees warmer than a noontime reading of 65°.
Ratio Scale
A ratio scale is an ordered scale in which the difference between the measurements involves a true zero point, as in height, weight, age, or salary measurements. If GT&M conducted a survey and asked how much money you expected to spend on audio equipment in
the next year, the responses to such a question would be an example of a ratio-scaled variable.
A person who expects to spend $1,000 on audio equipment expects to spend twice as much
money as someone who expects to spend $500. As another example, a person who weighs
240 pounds is twice as heavy as someone who weighs 120 pounds.
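The contrast between interval and ratio scales can be checked numerically. The sketch below reuses the temperature and weight figures from the examples above; the unit conversions (Fahrenheit to Celsius, pounds to kilograms) are added here purely for illustration. On a ratio scale, "twice as heavy" holds in any unit, whereas on an interval scale the ratio of two readings changes with the unit because the zero point is arbitrary.

```python
def fahrenheit_to_celsius(f):
    """Convert a Fahrenheit temperature to Celsius."""
    return (f - 32) * 5 / 9

LB_TO_KG = 0.45359237  # kilograms per international pound

# Interval scale: differences are meaningful, ratios are not.
warm_f, cool_f = 67, 65
diff_f = warm_f - cool_f  # 2 degrees Fahrenheit
ratio_f = warm_f / cool_f
ratio_c = fahrenheit_to_celsius(warm_f) / fahrenheit_to_celsius(cool_f)

# Ratio scale: a true zero makes ratios meaningful in any unit.
heavy_lb, light_lb = 240, 120
ratio_lb = heavy_lb / light_lb
ratio_kg = (heavy_lb * LB_TO_KG) / (light_lb * LB_TO_KG)

print(diff_f)
print(ratio_f, ratio_c)    # the two ratios differ: the unit changes the answer
print(ratio_lb, ratio_kg)  # the two ratios agree: the unit does not matter
```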
primary data source
You are using a primary data source if you collect your own data for analysis.
secondary data source
You are using a secondary data source if the data for your analysis have been collected by someone else.
population
A population consists of all the items or individuals about which you want to reach conclusions.
sample
A sample is a portion of a population selected for analysis.
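A minimal sketch of drawing a sample from a population. The definitions above do not say how the portion is selected; simple random sampling without replacement is assumed here, and the population of customer ID numbers is hypothetical.

```python
import random

# Hypothetical population: ID numbers for 1,000 customers (not from the source).
population = list(range(1, 1001))

rng = random.Random(42)  # fixed seed so the sketch is repeatable
sample = rng.sample(population, k=50)  # 50 distinct items, drawn without replacement

print(len(sample), len(set(sample)))
```

Conclusions are drawn about all 1,000 items in the population, but only the 50 sampled items are actually analyzed.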
Structured data
Data that follow some organizing principle or plan, typically a repeating pattern.
unstructured data
Data that follow no repeating pattern. For example, if five different persons sent you an email message concerning the stock trades of a specific company, those data could be anywhere in the message.
outliers
Values that seem excessively different from most of the rest of the values.
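The definition above does not give a numerical rule for "excessively different." As one possible sketch, the code below flags values that lie more than 1.5 times the interquartile range beyond the quartiles, a common convention that is assumed here rather than taken from the source. The data values are hypothetical.

```python
from statistics import quantiles

# Hypothetical measurements, not data from the source.
values = [12, 14, 15, 15, 16, 17, 18, 19, 95]

# Quartiles of the data; n=4 splits the data into four equal groups.
q1, _median, q3 = quantiles(values, n=4)
iqr = q3 - q1

# Assumed rule: flag anything beyond 1.5 * IQR from the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < lower_fence or v > upper_fence]

print(outliers)
```

A flagged value is a candidate for review, not automatically an error; whether to keep it is a separate decision.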
missing value
A value that could not be collected (and is therefore not available for analysis).
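A minimal sketch of handling missing values in collected data. Using None as the marker and leaving missing values out of a summary are assumptions made for illustration; other treatments are possible. The survey amounts are hypothetical.

```python
from statistics import mean

# Hypothetical survey answers; None marks a value that could not be collected.
amounts = [500.0, None, 1000.0, 250.0, None]

n_missing = sum(value is None for value in amounts)
available = [value for value in amounts if value is not None]

# The summary is computed from the available values only.
mean_available = mean(available)
print(n_missing, mean_available)
```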
recoded variable
After you have collected data, you may discover that you need to reconsider the categories that
you have defined for a categorical variable or that you need to transform a numerical variable
into a categorical variable by assigning the individual numeric data values to one of several
groups. In either case, you can define a recoded variable that supplements or replaces the
original variable in your analysis.
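A minimal sketch of recoding a numerical variable into a categorical one, using the age groups mentioned earlier in these cards. The cut points (18, 40, and 65), the group labels, and the sample ages are all assumptions for illustration; the source does not define them.

```python
from bisect import bisect_right

# Assumed lower bounds of the second, third, and fourth groups (hypothetical).
CUT_POINTS = [18, 40, 65]
GROUPS = ["child", "young adult", "middle-aged", "retirement-age"]

def recode_age(age):
    """Map a numerical age to one of the categorical age groups."""
    return GROUPS[bisect_right(CUT_POINTS, age)]

ages = [7, 18, 39, 40, 64, 65, 82]  # hypothetical values
age_group = [recode_age(a) for a in ages]
print(age_group)
```

Here age_group supplements the original ages list rather than replacing it, so both the numerical and the categorical version remain available for analysis.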