Exam One Flashcards
What is Analytics
Transforms data into insight for making decisions (INFORMS definition)
What do data analysts do?
- Collect and interpret data
- Analyze results
- Report results back to the relevant members
- Identify patterns and trends in data sets
- Work alongside teams within the business or the management team to establish business needs
Applications of Business Analytics
- Customer relationship
- Sports game strategies
- Pricing Decision
- Health care
- Human resource planning
- Supply Chain Management
- Financial and Marketing
Importance of Business Analytics
- Profitability of businesses
- Revenue of businesses
- Shareholder return
- Enhances understanding of the data
- Vital to remain competitive
- Enables creation of informative reports
Descriptive analytics
- Uses data to understand past and present
- Summarizes data into meaningful charts and reports
- Identifies patterns and trends in data
(Pie chart showing sales of product X and Y by region)
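A minimal sketch of a descriptive summary like the one behind that pie chart. The sales records are made-up illustration data, not figures from the course.

```python
from collections import defaultdict

# Hypothetical sales records: (region, product, units sold)
sales = [
    ("North", "X", 120), ("North", "Y", 80),
    ("South", "X", 60), ("South", "Y", 140),
]

# Summarize the past: total units per product
totals = defaultdict(int)
for region, product, units in sales:
    totals[product] += units

# Share of total sales per product (what each pie slice would show)
grand_total = sum(totals.values())
shares = {product: units / grand_total for product, units in totals.items()}

for product in sorted(shares):
    print(product, totals[product], f"{shares[product]:.0%}")
```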
Predictive analytics
- Analyzes past performance
- Extrapolates to the future
- Predicts risk
(Line chart of a linear demand prediction model: as price increases, demand falls)
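A minimal sketch of a linear demand prediction model, fitted by ordinary least squares. The price and demand values are invented for illustration, and `fit_line` is a hypothetical helper name.

```python
# Hypothetical past observations (price, units demanded)
prices = [10.0, 12.0, 14.0, 16.0]
demand = [100.0, 90.0, 80.0, 70.0]

def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

intercept, slope = fit_line(prices, demand)

# Extrapolate to a price that was never observed
predicted = intercept + slope * 18.0
print(intercept, slope, predicted)
```

The negative slope is what the chart shows: demand falls as price rises.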
Prescriptive analytics
- Uses optimization techniques to identify best alternatives
- Often combined with predictive analytics to account for risk
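A minimal sketch of the prescriptive idea: given a predictive model, search the alternatives for the best one. The demand function and the candidate price range are assumptions for illustration, and plain enumeration stands in here for a real optimization technique.

```python
def demand(price):
    # Assumed predictive model: demand falls linearly as price rises
    return 150 - 5 * price

def revenue(price):
    return price * demand(price)

# Candidate decisions: whole-dollar prices from 1 to 29 (assumed range)
candidates = range(1, 30)

# Pick the alternative with the highest predicted revenue
best_price = max(candidates, key=revenue)
best_revenue = revenue(best_price)
print(best_price, best_revenue)
```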
For analysis and Decision making, you need
Metrics to quantify performance
Measures are the values of metrics
Discrete metrics involve counting (on time or not, number of on-time deliveries)
Continuous metrics are measured on a continuum (Delivery time, package weight, purchase price)
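A minimal sketch contrasting the two kinds of metric on the same delivery records. The records are hypothetical.

```python
# Hypothetical delivery records
deliveries = [
    {"on_time": True, "hours": 20.5},
    {"on_time": False, "hours": 52.0},
    {"on_time": True, "hours": 30.25},
    {"on_time": True, "hours": 41.25},
]

# Discrete metric: a count of on-time deliveries
on_time_count = sum(1 for d in deliveries if d["on_time"])

# Continuous metric: average delivery time, measured on a continuum
avg_hours = sum(d["hours"] for d in deliveries) / len(deliveries)

print(on_time_count, avg_hours)
```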
Categorical data
Data that helps sort things into groups or types. Doesn’t involve numbers but rather labels or names
Ordinal Data
Involves categories that can be arranged in a specific order or rank. Rating experience at restaurant as “bad”, “okay”, “good”. You know that one is better than the other but not by a certain amount.
Interval data
Has order and measurable differences between values but no true zero point. Example: temperature in degrees Celsius or Fahrenheit, where 0 does not mean “no temperature”.
Ratio
Has all the features of interval data plus a true zero. With ratio data you can add, subtract, and make comparisons like “twice as much”.
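A minimal sketch of why “twice as much” works for ratio data but not for interval data. The temperatures and weights are illustrative values, and `celsius_to_kelvin` is a helper written for this example.

```python
def celsius_to_kelvin(c):
    return c + 273.15

# Interval data: the ratio depends on where the arbitrary zero sits
celsius_ratio = 20.0 / 10.0
kelvin_ratio = celsius_to_kelvin(20.0) / celsius_to_kelvin(10.0)

# Ratio data: the ratio is the same in any unit because zero is a true zero
kg_ratio = 4.0 / 2.0
gram_ratio = 4000.0 / 2000.0

print(celsius_ratio, round(kelvin_ratio, 3))  # the two ratios disagree
print(kg_ratio, gram_ratio)                   # the two ratios agree
```

20 °C is not “twice as hot” as 10 °C, but 4 kg is twice as heavy as 2 kg.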
Good decision making
requires a mixture of skills: creative development and identification of options, clarity of judgment, firmness of decision, effective implementation
Steps to problem solving
- Recognize problem
- Define problem
- Structure the problem
- Analyze the problem (Role of BA)
- Interpreting results and making decisions
- Implement the solution
Recognizing the Problem
A problem exists when there is a gap between what is happening and what we think should be happening
(Distribution costs being too high)
Defining the problem
Clearly defining the problem
ex. High distribution costs stem from:
- Inefficiencies in routing trucks
- Poor location of distribution centers
- External factors such as increasing fuel costs
Structuring the problem
- Stating the goals and objectives (minimizing the total delivered costs of the product)
- Characterizing the possible decisions (new manufacturing plants, new locations for warehouses)
- Identifying any constraints or restrictions (Deliver orders within 48 hrs)
Analyzing the Problem
Identifying and applying appropriate BA techniques
Interpreting results and Making Decision
- Managers interpret results from the analysis phase
- Incorporate subjective judgment as needed
- Understand limitations and model assumptions
- Make a decision utilizing the information
Implementing the solution
- Translate the results of the model back to the real world
- Make solution work in the organization by:
– Providing adequate resources
– Motivating Employees
– Eliminating resistance to change
– Modifying organizational policies
– Developing Trust
Experiment (random)
Process of observation that leads to a single outcome that cannot be predicted with certainty
Sample point
The most basic outcome of a random experiment
Sample Space
Collection of all possible outcomes (Depends on experimenter)
Event
Set of outcomes of a probability experiment
Steps for calculating probability
- Define experiment; describe the process used to make an observation and the type of observation that will be recorded
- List sample points
- Assign probabilities to sample points
- Determine collection of sample points contained in the event of interest
- Sum the sample points’ probabilities to get the probability of the event
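The steps above can be sketched for a simple experiment. The choice of experiment (rolling two fair dice) and of event (the faces sum to 7) is an assumption for illustration.

```python
from fractions import Fraction
from itertools import product

# Step 1: the experiment is rolling two fair dice and recording both faces
# Step 2: list the sample points
sample_points = list(product(range(1, 7), repeat=2))

# Step 3: assign probabilities (fair dice, so every point is equally likely)
prob = {pt: Fraction(1, len(sample_points)) for pt in sample_points}

# Step 4: collect the sample points contained in the event of interest
event = [pt for pt in sample_points if sum(pt) == 7]

# Step 5: sum their probabilities
p_event = sum(prob[pt] for pt in event)
print(len(sample_points), len(event), p_event)
```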
Union
Outcomes in either events A or B or both
- Denoted by ∪: A ∪ B
- ‘Or’ Statement
Intersection
Outcomes in both events A and B
- ‘AND’ Statement
- Denoted by ∩: A ∩ B
P(A|B)
Conditional probability of A given B: P(A ∩ B) / P(B), defined when P(B) > 0
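A minimal sketch of union, intersection, and conditional probability using Python sets. The experiment (one roll of a fair die) and the events A and B are assumptions for illustration, and `p` is a helper defined for this example.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # one roll of a fair die
A = {2, 4, 6}                       # event A: the roll is even
B = {4, 5, 6}                       # event B: the roll is greater than 3

def p(event):
    # Equally likely outcomes: count the outcomes in the event
    return Fraction(len(event), len(sample_space))

p_union = p(A | B)           # outcomes in A or B or both
p_intersection = p(A & B)    # outcomes in both A and B
p_a_given_b = p(A & B) / p(B)

print(p_union, p_intersection, p_a_given_b)
```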
Data preprocessing
- Transforming raw data into an understandable format
- Helps us understand the data and supports knowledge discovery from it
Why is data preprocessing needed?
Real-world data tends to be incomplete, noisy, and inconsistent
- This leads to poor-quality data and poor models built on that data
Preprocessing provides operations that help organize data into a proper form for better understanding in the data mining process
Examples of poor-quality data
Incomplete - Lacking attribute values, lacking certain attributes of interest or containing only aggregate data
Noisy - Contains errors or outliers
Intentional - Disguised missing data
Why preprocess data?
Accuracy
Completeness
Consistency
Timeliness
Believability
Interpretability
Data Cleaning
- Handling missing data
- Outlier detection and removal
- Noise reduction
Data Transformation
- Scaling
- Smoothing
- Aggregation
- Generalization
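A minimal sketch of two of these transformations. Min-max scaling and a moving average are common choices for scaling and smoothing, picked here as assumptions; the values are illustrative.

```python
values = [2.0, 4.0, 6.0, 10.0]

# Scaling: min-max scaling maps the values onto the range [0, 1]
lowest, highest = min(values), max(values)
scaled = [(v - lowest) / (highest - lowest) for v in values]

# Smoothing: moving average over a window of 2 neighbouring values
window = 2
smoothed = [
    sum(values[i:i + window]) / window
    for i in range(len(values) - window + 1)
]

print(scaled)
print(smoothed)
```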
Data reduction
- Feature selection
- Dimensionality reduction
- Numerosity reduction
Handling Imbalance
- Oversampling
- Under-sampling
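A minimal sketch of both approaches using random resampling, which is only one of several ways to rebalance classes. The rows and labels are hypothetical.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical labelled rows: (row id, class label); class 1 is the rare class
majority = [(f"row{i}", 0) for i in range(6)]
minority = [(f"row{i}", 1) for i in range(6, 8)]

# Oversampling: duplicate minority rows (sampling with replacement)
extra = rng.choices(minority, k=len(majority) - len(minority))
oversampled = majority + minority + extra

# Under-sampling: keep only a random subset of the majority rows
undersampled = rng.sample(majority, k=len(minority)) + minority

print(len(oversampled), len(undersampled))
```

Oversampling keeps every original row but repeats some; under-sampling discards majority rows, so information is lost.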
Data Integration
Combining tables
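A minimal sketch of combining two tables on a shared key (an inner join). The table contents and field layout are hypothetical.

```python
# Hypothetical customer table: customer id -> name
customers = {1: "Ana", 2: "Ben", 3: "Cy"}

# Hypothetical order table: (order id, customer id, amount)
orders = [
    (101, 1, 25.0),
    (102, 2, 40.0),
    (103, 1, 15.0),
    (104, 4, 9.0),   # customer 4 is not in the customer table
]

# Inner join: keep only orders whose customer id appears in both tables
joined = [
    (order_id, customers[cust_id], amount)
    for order_id, cust_id, amount in orders
    if cust_id in customers
]

for row in joined:
    print(row)
```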
Tasks of data cleaning
- Fill in missing values
- Identify outliers
- Smooth out noisy data
- Correct inconsistent data
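A minimal sketch of two of these tasks. The outlier rule (more than 3 median absolute deviations from the median) is one possible choice and an assumption here; the readings, labels, and `canonical` mapping are made up.

```python
import statistics

# Identify outliers: flag values far from the median
readings = [10.0, 12.0, 11.0, 95.0, 13.0]
med = statistics.median(readings)
mad = statistics.median(abs(r - med) for r in readings)  # median absolute deviation
outliers = [r for r in readings if abs(r - med) > 3 * mad]

# Correct inconsistent data: map different spellings to one canonical label
states = ["NY", "ny", "New York", "CA", "ca "]
canonical = {"ny": "NY", "new york": "NY", "ca": "CA"}  # assumed mapping
cleaned = [canonical[s.strip().lower()] for s in states]

print(outliers)
print(cleaned)
```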
Handling missing data
Data is not always available
- Many tuples have no recorded value for several attributes
Missing data may be due to
- Equipment malfunction
- Data inconsistent with other recorded data and therefore deleted
- Data not entered due to misunderstanding
- Certain data not considered important at the time of entry
Missing data may need to be inferred
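A minimal sketch of two ways to handle missing values: dropping incomplete tuples, or inferring the missing value from the attribute mean. Mean imputation is just one simple inference method, chosen here as an assumption; the rows are hypothetical and `mean_of` is a helper defined for this example.

```python
# Hypothetical tuples; None marks a value that was never recorded
rows = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 61000},
    {"age": 45, "income": None},
    {"age": 29, "income": 48000},
]

# Option 1: drop any tuple with a missing value
complete = [r for r in rows if all(v is not None for v in r.values())]

def mean_of(attr):
    # Mean of the values that were actually recorded for this attribute
    observed = [r[attr] for r in rows if r[attr] is not None]
    return sum(observed) / len(observed)

# Option 2: infer each missing value from the attribute mean
imputed = [
    {k: (mean_of(k) if v is None else v) for k, v in r.items()}
    for r in rows
]

print(len(complete))
print(imputed[1]["age"], round(imputed[2]["income"], 2))
```

Dropping tuples halves this small data set, which is why inferring values is often preferred when many tuples are incomplete.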