Week 2 Flashcards
Data Science
the exploration and quantitative analysis of all available structured and unstructured data to develop understanding, extract knowledge and formulate actionable results
Business Intelligence
strategies and technologies used by enterprises for the data analysis of business information.
CRISP-DM
a popular approach to data mining that provides useful input on ways to frame analytics problems. Its six steps are business understanding, data understanding, data preparation, modeling, evaluation, and deployment.
Framing a Decision
outline what decision is being considered, why it is important, what data is needed, and who will provide input. Corresponds to the Business Understanding, Data Understanding, and Data Preparation phases of CRISP-DM.
Analyzing a Decision
what kind of analytical approach is needed, what does it show, and what does it mean. Corresponds to the Modeling phase of CRISP-DM.
Implementing a Decision
how do I make use of the decision, what can I expect, what else should be considered, and how do I “sell” the result. Corresponds to the Evaluation and Deployment phases of CRISP-DM.
Data Modeling Blocks
1. Data, 2. Build Model, 3. Infer hidden variables, 4. Predict & Explore
Interpretation Error and Inconsistencies
Interpretation errors: taking the values in your data for granted. Inconsistencies: differences between data sources, or between the data and the company’s standardized values.
Cleansing Data
fixing Interpretation Errors and Inconsistencies. Covers Data Entry Errors, Redundant Whitespace, Capital Letter Mismatches, Outliers, Missing Values, Different Units of Measurement, Different Levels of Aggregation, Deviations from a Code Book, Impossible Values, and Sanity Checks.
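Example (a minimal sketch, not a definitive recipe): a few of the cleansing steps on this card, namely redundant whitespace, capital letter mismatches, missing values, and a sanity check for impossible values. The records, the field names, and the `max_age` cutoff of 120 are all made up for illustration.

```python
# Hypothetical raw records; field names and values are assumptions.
raw_rows = [
    {"name": "  alice ", "city": "BOSTON", "age": "34"},
    {"name": "Bob", "city": "boston", "age": "350"},   # impossible age
    {"name": "carol", "city": " Boston", "age": ""},   # missing age
]


def clean_row(row, max_age=120):
    """Return a cleaned copy of one record."""
    cleaned = {
        # Strip redundant whitespace and fix capital letter mismatches.
        "name": row["name"].strip().title(),
        "city": row["city"].strip().title(),
    }
    # Missing or non-numeric ages become None instead of raising an error.
    age_text = row["age"].strip()
    age = int(age_text) if age_text.isdigit() else None
    # Sanity check: treat impossible values as missing.
    if age is not None and not (0 <= age <= max_age):
        age = None
    cleaned["age"] = age
    return cleaned


clean_rows = [clean_row(row) for row in raw_rows]
```

Here an impossible value is set to missing rather than dropped, so the rest of the record stays available; whether that is the right choice depends on the analysis.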
Integrating Data
Combining data from different data sources. Joining/Appending Data, Appending Tables, Using Views to Simulate Data Joins and Appends, Enriching Aggregated Measures.
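Example (a sketch under assumed data): appending two tables that share the same columns, then enriching the result with an aggregated measure, here each row's share of its month's total sales. The table names, columns, and figures are hypothetical.

```python
# Two hypothetical monthly tables with identical columns.
january = [{"product": "A", "sales": 100}, {"product": "B", "sales": 300}]
february = [{"product": "A", "sales": 150}, {"product": "B", "sales": 250}]

# Appending: stack the rows, tagging each with the table it came from.
combined = (
    [dict(row, month="Jan") for row in january]
    + [dict(row, month="Feb") for row in february]
)

# Enriching aggregated measures: compute each month's total,
# then add every row's share of that total.
totals = {}
for row in combined:
    totals[row["month"]] = totals.get(row["month"], 0) + row["sales"]

for row in combined:
    row["share_of_month"] = row["sales"] / totals[row["month"]]
```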
Transforming Data
putting data into a shape suitable for models. Includes reducing the number of variables and turning variables into dummy variables.
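Example (a minimal sketch): turning a categorical variable into dummy variables, one 0/1 column per category. The helper name `to_dummies` and the sample rows are hypothetical.

```python
def to_dummies(rows, column):
    """Replace one categorical column with a 0/1 column per category."""
    categories = sorted({row[column] for row in rows})
    result = []
    for row in rows:
        # Keep every other column unchanged.
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        result.append(new_row)
    return result


rows = [
    {"id": 1, "weekday": "Mon"},
    {"id": 2, "weekday": "Tue"},
    {"id": 3, "weekday": "Mon"},
]
encoded = to_dummies(rows, "weekday")
```

Some models need one dummy column dropped to avoid redundancy; this sketch keeps all of them.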
Data Retrieval
data stored within the company, data from outside the organization, and data quality checks.
Data Preparation
fix problems in the data; create derived variables.
Exploratory Data Analysis
the use of graphical techniques to gain an understanding of your data and the interactions between variables.
Joining
enriching an observation from one table with information from another.
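Example (a sketch, assuming two small hypothetical tables): each order is enriched with the region of its customer by matching on a shared key, `customer_id`. Orders with no matching customer are kept and get `None`, which mirrors a left join.

```python
# Hypothetical tables; names, keys, and values are assumptions.
orders = [
    {"order_id": 1, "customer_id": 10, "amount": 25.0},
    {"order_id": 2, "customer_id": 11, "amount": 40.0},
    {"order_id": 3, "customer_id": 99, "amount": 5.0},  # no matching customer
]
customers = [
    {"customer_id": 10, "region": "North"},
    {"customer_id": 11, "region": "South"},
]

# Build a lookup from the key to the information we want to add.
region_by_customer = {c["customer_id"]: c["region"] for c in customers}

# Enrich every order; .get returns None when the key has no match.
enriched = [
    dict(order, region=region_by_customer.get(order["customer_id"]))
    for order in orders
]
```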