Final Flashcards
Business Intelligence
- The umbrella term that includes the applications, infrastructure and tools, and best practices that enable access to and analysis of information to improve and optimize decisions and performance.
- The use of data visualization and reporting to become aware of and understand “what happened and what is happening”
- Done with charts, tables, and dashboards to display, examine, and explore data
- The process of going from raw data to interpreted, intelligent information
Business Analytics
- Extensive use of data, statistical and quantitative analysis, explanatory and predictive models, and fact-based management to drive decisions and actions
- Practice and art of bringing quantitative data to bear on decision making
- Subset of business intelligence
- Relies on a number of different disciplines to collect and analyze data
- Ex: upselling to customers
Data Mining
- Business analytics methods that go beyond counts, descriptive techniques, reporting, and methods based on business rules.
- Extracts useful info from large data sets (finding gold)
- Process of exploration and analysis of large quantities of data in order to discover meaningful patterns and rules
- Employs pattern recognition technologies as well as statistical and mathematical techniques
- Ex: evaluating which customers are going to switch, for customer retention (phone provider)
- Subset of business intelligence
- Intersection of IT and statistics
Unsupervised Learning
- Search for patterns and structure among all variables
- No predefined outcome groups (no dependent or outcome variables)
- Define groups of cases with similar characteristics
- Find out the structure of the data
- Ex: average characteristics of measures of data in groups or clusters
- Find out classifier
- Ex: cluster analysis
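A minimal sketch of cluster analysis, assuming a one-dimensional k-means with two hand-picked starting centers. The function name `kmeans_1d` and the spend figures are hypothetical and only illustrate grouping cases without an outcome variable.

```python
def kmeans_1d(values, centers, iterations=10):
    """Group numbers around the nearest of the given starting centers."""
    centers = list(centers)
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical monthly spend for eight customers (made-up numbers).
spend = [12, 15, 14, 90, 95, 88, 13, 92]
centers, clusters = kmeans_1d(spend, centers=[0, 100])
print(centers)   # average characteristic of each group: [13.5, 91.25]
print(clusters)  # [[12, 15, 14, 13], [90, 95, 88, 92]]
```

No group labels were supplied; the low-spend and high-spend groups come out of the data itself.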
Supervised Learning
- Have a target variable
- Ex: regression analysis
- Predefined outcome groups or variable (know dependent variable)
- Decide to which class each case belongs (by calculating membership score of each case)
- Find out major characteristics that differentiate predefined groups
Data Mining Techniques
- Prediction
- Classification
- Association
Prediction
- Dependent (response) variable is a continuous variable
- Formula or model to predict future observations
- Ex: multiple regression and decision trees
- Ex: predicting amount of time to sort when using gloves
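A minimal sketch of prediction with a continuous response, assuming simple least-squares regression with one predictor rather than the multiple regression named above. The function `fit_line` and the sorting times are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: number of items sorted and minutes taken.
items = [10, 20, 30, 40]
minutes = [7, 12, 17, 22]

slope, intercept = fit_line(items, minutes)
predicted = slope * 50 + intercept  # predict a future observation (50 items)
print(slope, intercept, predicted)  # 0.5 2.0 27.0
```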
Classification
- Dependent variable is a categorical variable
- Identify categories of data (buy vs. not buy)
- Ex: logistic regression, decision trees, cluster analysis
- Ex: will someone buy or not buy products
Association
- Relationship among entities
- Ex: if you bought cornflakes did you also buy bananas
- Ex: market basket analysis (not in class)
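A minimal sketch of the cornflakes and bananas question. It assumes the two standard market basket measures, support and confidence, which the cards do not name. The baskets are made up.

```python
# Hypothetical shopping baskets (made-up data).
baskets = [
    {"cornflakes", "bananas", "milk"},
    {"cornflakes", "bananas"},
    {"cornflakes", "milk"},
    {"bananas", "bread"},
    {"bread", "milk"},
]


def support(itemset, baskets):
    """Share of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets with the antecedent, the share that also has the consequent."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


rule_support = support({"cornflakes", "bananas"}, baskets)       # 2 of 5 baskets
rule_confidence = confidence({"cornflakes"}, {"bananas"}, baskets)  # 2 of 3 cornflakes baskets
print(round(rule_support, 2), round(rule_confidence, 2))  # 0.4 0.67
```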
Business Intelligence
- Umbrella term that spans people, processes, and tools
- Organize data/information, enable access to it, analyze it, improve decisions, and manage performance
Business Analytics
- Process of “doing” analysis in a particular domain
- Uses analytical techniques (data mining)
Data Mining
Process of discovering new patterns from large data sets using methods from artificial intelligence, statistics, and database systems
CRISP-DM
- Cross-Industry Standard Process for Data Mining
- Fits data mining into the general problem-solving strategy of a business/research unit
CRISP-DM Phases (stages of the data mining process)
- 1) Business Understanding
- 2) Data Understanding
- 3) Data Preparation
- 4) Modeling
- 5) Evaluation
- 6) Deployment
Business Understanding
- Determine business objectives (why study: a specific problem or knowledge discovery, e.g. increase sales of a new shirt)
- Assess situation (set up a concise and clear description of the problem)
- Determine data mining goals (what to achieve in technical terms and what the success criteria are)
- Produce project plan (establish a budget)
Data Understanding
- Collect initial data
- Describe data
- Explore data
- Verify data quality
Data Preparation
- Select data
- Clean data (handle outliers, transform variables)
- Construct data
- Integrate data
- Format data
Modeling
- Select modeling technique
- Generate test design
- Build model
- Assess model
Modeling Techniques
- Classification
- Clustering
- Predictions
- Sequential patterns
Classification
Map each data item into one of a set of classes
Clustering
Group data with no predefined classes
Predictions
Predict the value of a variable (regression analysis)
Sequential Patterns
Analyze time series data, for example to find a seasonal pattern
Evaluation
- Evaluate the results (interpret the results and check whether business objectives are met)
- Review process
- Determine next steps
Deployment
The knowledge gained must be reported to managers so they can reflect on it, tie it to business processes, and enhance performance or solve issues.
Types of Data
- Qualitative (categorical)
- Quantitative (numerical)
Qualitative Types of Data
Nominal and ordinal
Nominal
- Categorically discrete data (name of school, type of car, number assigned to country)
- Nominal sounds like name
- Gender, political party
Ordinal Data
Categories that have a natural ordering (class rank, place in line)
Sounds like “order”
Quantitative Types of Data
Interval and Ratio
Interval
Like ordinal except the intervals between each value are equal (temperature)
Ratio
Interval data with a natural (true) zero point, so ratios between values are meaningful
(time, weight, height, age)
Data Quality Characteristics
- Accuracy
- Completeness
- Consistency
- Uniqueness (each only represented once)
- Timeliness
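A minimal sketch of checking two of these characteristics, completeness and uniqueness, on a small table. The records, field names, and helper functions are hypothetical.

```python
# Hypothetical customer records (made-up data); id 2 appears twice.
records = [
    {"id": 1, "name": "Ana", "email": "ana@example.com"},
    {"id": 2, "name": "Ben", "email": None},
    {"id": 2, "name": "Ben", "email": None},
    {"id": 3, "name": "Cy", "email": "cy@example.com"},
]


def completeness(records, field):
    """Share of records where the field is filled in."""
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def duplicate_keys(records, key="id"):
    """Key values that are represented more than once."""
    seen, dupes = set(), set()
    for r in records:
        if r[key] in seen:
            dupes.add(r[key])
        seen.add(r[key])
    return sorted(dupes)


email_completeness = completeness(records, "email")
repeated_ids = duplicate_keys(records)
print(email_completeness)  # 0.5
print(repeated_ids)        # [2]
```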
Types of Visualization Charts
- Frequency tables
- Bar chart, line graph, scatterplot
- Distribution plots
- Histograms
- Stem and leaf
- Box plots
- Pareto chart
- Maps
- Cross tabulations
Bar chart, line graph, scatterplot
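Bar charts compare categorical data; line graphs and scatterplots plot numeric values (over time or against another variable)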
Histograms
- Show shape of distribution
- Use continuous data
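A minimal sketch of how a histogram is built: continuous values are counted into equal-width bins, and the counts show the shape of the distribution. The ages and the helper `histogram` are made up for illustration.

```python
def histogram(values, bin_width):
    """Count how many values fall in each bin, keyed by the bin's lower edge."""
    counts = {}
    for v in values:
        start = (v // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical customer ages (made-up data).
ages = [21, 23, 25, 31, 34, 35, 36, 38, 42, 58]
width = 10
bins = histogram(ages, width)

# Text version of the chart: one star per observation in the bin.
for start, count in bins.items():
    print(f"{start}-{start + width - 1}: {'*' * count}")
```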
Stem and Leaf
- Used to visualize the data (not used if large data set)
- More meaningful than histogram (can still see the actual numbers)
Box plots
- Helpful to give more details about data set than you would get just from a histogram
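- Whiskers extend from the box toward the smallest and largest values that are not outliers
- Can summarize numeric data, optionally split by a nominal grouping variable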
- Used to identify outliers
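A minimal sketch of the numbers behind a box plot. It assumes the common 1.5 × IQR convention for flagging outliers, which the cards do not state. The wait times and the helper `box_plot_summary` are hypothetical.

```python
import statistics


def box_plot_summary(values):
    """Quartiles, range, and points outside the 1.5 * IQR fences."""
    data = sorted(values)
    q1, q2, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "min": data[0],
        "q1": q1,
        "median": q2,
        "q3": q3,
        "max": data[-1],
        "outliers": outliers,
    }


# Hypothetical wait times in minutes (made-up data).
wait_times = [12, 14, 15, 15, 16, 17, 18, 19, 45]
summary = box_plot_summary(wait_times)
print(summary["outliers"])  # [45]
```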
Central Tendency
- Measure represents the center or middle of the data
- May or may not be a typical value
Measures of Central Tendency
Mean, median, mode
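A minimal sketch using Python's standard `statistics` module. The salary figures are made up; the one large value shows why a measure of center may or may not be a typical value.

```python
import statistics

# Hypothetical salaries in thousands (made-up data, one very large value).
salaries = [30, 32, 32, 35, 38, 40, 150]

mean = statistics.mean(salaries)      # 51, pulled upward by the 150
median = statistics.median(salaries)  # 35, the middle value
mode = statistics.mode(salaries)      # 32, the most frequent value
print(mean, median, mode)
```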
Relationship with Mean Median and Mode in Normal Curve
In a normal curve the mean, median, and mode are equal and all sit at the center
Empirical Rule
- About 68% of the data fall within 1 standard deviation of the mean
- About 95% fall within 2 standard deviations
- About 99.7% fall within 3 standard deviations
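A minimal sketch that checks these percentages against the standard normal distribution, using `statistics.NormalDist` from the standard library. The helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1


def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(share_within(k), 4))  # about 0.6827, 0.9545, 0.9973
```

The rule rounds these values to 68%, 95%, and 99.7%.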