Chapter 2 - Descriptive Analytics I Flashcards
What are the Characteristics that Define the Readiness Level of Data for an Analytics Study? (1-5)
Data source reliability - appropriateness of the medium where the data was obtained
Data content accuracy - data are correct and a good match for the analytics problem
Data accessibility - data are readily and easily obtainable
Data security and data privacy - only those who have authority and need to access the data can access it
Data richness - all the required data elements are included in the data set
What are the Characteristics that Define the Readiness Level of Data for an Analytics Study? (6-10)
Data consistency - means data are accurately collected and combined/merged
Data currency/data timeliness - means the data should be up to date for a given analytics model
Data granularity - requires that the variables and data values be defined at the lowest level of detail for the intended use of the data.
Data validity - the term used to describe a match/mismatch between the actual and expected data values of a given variable.
Data relevancy - means the variables in the data set are all relevant to the study being conducted.
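A data validity check can be sketched in Python as a comparison of actual values against an expected set. This is a minimal illustration only: the variable name `marital_status`, its allowed values, and the helper `find_invalid` are hypothetical and do not come from the chapter.

```python
# Hypothetical expected values for a "marital_status" variable
# (an assumption made for illustration only).
EXPECTED_VALUES = {"single", "married", "divorced"}


def find_invalid(values, expected):
    """Return (index, value) pairs whose value is not in the expected set."""
    return [(i, v) for i, v in enumerate(values) if v not in expected]


actual = ["single", "married", "maried", "divorced", "unknown"]
invalid = find_invalid(actual, EXPECTED_VALUES)
print(invalid)  # [(2, 'maried'), (4, 'unknown')]
```

Each mismatch between actual and expected values is reported with its position so it can be corrected or investigated.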
What must data match in order to be usable?
The data must match the task for which it is intended to be used
What does it mean to have data “analytics ready”?
It means the data has been transformed into a flat-file format and is ready for ingestion into predictive algorithms.
What are the three broad sources from which business analytics data come?
Unstructured data, categorical structured data, and numerical structured data
How is datum defined?
Datum is the singular form of data. Data is a collection of facts usually obtained as the result of experiments, observations, transactions, or experiences.
What is unstructured data?
Data composed of any combination of textual, imagery, voice, and Web content.
What is structured data?
Categorical or numeric data that is used in data mining algorithms.
What are the six elements of the data taxonomy?
Categorical Data
Nominal Data
Ordinal Data
Numeric Data
Interval Data
Ratio Data
What are the two main categories of structured data and the two subcategories below them?
Structured data is either categorical or numerical.
Categorical data is either nominal or ordinal
Numerical data is either interval or ratio
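The four measurement levels differ in which summaries are meaningful. The Python sketch below illustrates this with made-up sample values; the example variables (blood type, credit rating, Celsius temperature, weight) are assumptions chosen for illustration, not examples taken from the chapter.

```python
from statistics import mean, median, mode

# Nominal: labels with no order, so only counting and the mode make sense.
blood_types = ["A", "O", "B", "O", "AB"]
nominal_mode = mode(blood_types)

# Ordinal: labels with a rank order, so the median rank makes sense,
# but the distance between ranks does not.
RANK = {"low": 0, "medium": 1, "high": 2}
credit = ["low", "high", "medium", "medium", "high"]
median_rank = median(RANK[c] for c in credit)
ordinal_median = next(k for k, v in RANK.items() if v == median_rank)

# Interval: differences are meaningful, but there is no true zero,
# so ratios are not (20 degrees C is not "twice as hot" as 10 degrees C).
celsius = [10, 20, 30]
interval_mean = mean(celsius)
interval_diff = celsius[1] - celsius[0]

# Ratio: a true zero exists, so ratios are meaningful.
weights_kg = [20.0, 40.0]
ratio = weights_kg[1] / weights_kg[0]

print(nominal_mode, ordinal_median, interval_mean, interval_diff, ratio)
```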
What are the subcategories for unstructured data?
Textual
Multimedia
XML/JSON
What are the four steps of data preprocessing?
Data consolidation
Data cleaning
Data transformation
Data reduction
What is dimensional reduction or variable selection as it relates to data preparation?
The reduction of variables that describe the phenomenon from different perspectives down to a manageable size
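One simple form of variable selection is to drop variables that barely vary, since they carry little information. The sketch below uses a variance threshold, which is only one of many possible techniques and is an assumption here, not the method the chapter prescribes. The function `select_variables` and the sample columns are hypothetical.

```python
from statistics import pvariance


def select_variables(columns, min_variance):
    """Keep only columns whose population variance is at least min_variance."""
    return {
        name: values
        for name, values in columns.items()
        if pvariance(values) >= min_variance
    }


data = {
    "age": [23, 35, 47, 52],
    "country_code": [1, 1, 1, 1],  # constant column, carries no information
    "income": [30.0, 42.5, 58.0, 61.0],
}
reduced = select_variables(data, min_variance=0.01)
print(sorted(reduced))  # ['age', 'income']
```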
What are the main tasks and methods for Data Consolidation?
Tasks: Access and collect the data, select and filter the data
Methods: SQL Queries, domain expertise, software agents
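The "access and collect" and "select and filter" tasks can be sketched with a SQL query run from Python's built-in `sqlite3` module. The tables `customers` and `orders`, their columns, and the sample rows are hypothetical and exist only for this illustration.

```python
import sqlite3

# In-memory database with two hypothetical source tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'east'), (2, 'west'), (3, 'east');
    INSERT INTO orders VALUES (1, 20.0), (1, 35.0), (2, 50.0), (3, 10.0);
""")

# Collect: join the two sources. Select and filter: keep one region only
# and aggregate to one row per customer.
rows = conn.execute("""
    SELECT c.id, c.region, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    WHERE c.region = 'east'
    GROUP BY c.id, c.region
    ORDER BY c.id
""").fetchall()
conn.close()

print(rows)  # [(1, 'east', 55.0), (3, 'east', 10.0)]
```

The result is one consolidated row per customer, which is closer to the flat-file shape that later preprocessing steps work on.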
What are the main tasks and methods for Data Cleaning?
Tasks: Handle missing values, identify and reduce noise, find and eliminate errors
Methods: Fill in missing values, identify outliers and either remove or smooth them, identify erroneous data
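Two of these cleaning methods can be sketched in Python. The choices below are assumptions made for illustration: missing values are filled with the median of the observed values, and outliers are flagged with the common 1.5 x IQR rule. The chapter does not prescribe either technique, and the function names and sample values are hypothetical.

```python
from statistics import median, quantiles


def fill_missing(values):
    """Replace None with the median of the observed (non-missing) values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Return the values that fall outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


raw = [10, 12, None, 11, 13, 12, 95]
filled = fill_missing(raw)
outliers = iqr_outliers(filled)
print(outliers)  # [95]
```

The median is used for filling because, unlike the mean, it is not pulled upward by the extreme value 95. Whether a flagged outlier should be removed, smoothed, or kept still depends on domain knowledge.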