Midterm Exam Flashcards
3 Primary Methods of Business Analytics
Descriptive Analytics
Predictive Analytics
Prescriptive Analytics
Descriptive Analytics
The interpretation of historical data to identify trends and patterns
Descriptive analytics, also referred to as exploratory data analysis (EDA), explains the patterns hidden in the data.
These patterns can be:
– the number of market segments
– sales numbers based on regions
– groups of products based on reviews
– software bug patterns in a defect database
– behavioral patterns in an online gaming user database, and more.
These patterns are based purely on historical data and are uncovered using basic statistics and data visualization techniques.
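As a minimal sketch of descriptive analytics, the snippet below summarizes hypothetical historical sales by region using only counts, totals, and means. The records and region names are invented for illustration; in practice the same summaries would often be produced in a BI tool or spreadsheet.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical historical sales records: (region, amount)
sales = [
    ("North", 120.0), ("South", 80.0), ("North", 150.0),
    ("East", 95.0), ("South", 110.0), ("East", 105.0),
]


def summarize_by_region(records):
    """Group historical amounts by region and return basic summary statistics."""
    groups = defaultdict(list)
    for region, amount in records:
        groups[region].append(amount)
    return {
        region: {"count": len(v), "total": sum(v), "mean": mean(v)}
        for region, v in groups.items()
    }


summary = summarize_by_region(sales)
for region, stats in sorted(summary.items()):
    print(region, stats)
```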
Predictive Analytics
The use of statistics to forecast future outcomes
Prescriptive Analytics
The application of testing and other techniques to determine which outcome will yield the best result in a given scenario
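A simplified prescriptive sketch: it tests several candidate prices against an assumed demand model and recommends the price with the highest revenue. The demand formula and the candidate prices are hypothetical; in a real project the model would first be estimated from data (the predictive step).

```python
# Hypothetical demand model (an assumption for illustration):
# units sold fall by 20 for every $1 increase in price.
def predicted_demand(price):
    return max(0, 1000 - 20 * price)


def revenue(price):
    return price * predicted_demand(price)


# Candidate actions (prices) to test against the model
candidate_prices = [10, 20, 25, 30, 40]

# Evaluate every option and recommend the one with the best outcome
results = {p: revenue(p) for p in candidate_prices}
best_price = max(results, key=results.get)
print(results)
print("Recommended price:", best_price)
```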
The Big Idea
- “The Unicorn”
The Big Idea = The most interesting idea! Something the client MUST continue looking into!
- A single sentence
- It must articulate your unique point of view
- Must convey what is at stake
- Must be a complete sentence
3-Minute Story
If you only had 3 minutes to tell your audience EXACTLY what they need to know, what would you say?
Being able to do this frees you from depending on your slides or visuals for a presentation.
– What if the boss asks you what you are working on?
– What if your 30-minute presentation gets cut to 10 minutes?
Population Data
In simple terms, population means the complete set of data. This also means that all the possible values are taken into consideration. When we consider the entire possible set of values, we say that we are considering the “population.”
* The following are examples of population:
– the population of all the employees of the entire information technology (IT) industry
– population of all the employees of a company
– population of all the transaction data of an application
– population of all the people in a country (census)
– population of all the people in a state
– population of all the Internet users
– population of all the users of e-commerce sites.
* The list of examples is unlimited.
Sample Data
In simple terms, sample means a section or subset of the population selected for analysis. Examples of samples are the following:
– randomly selected 100,000 employees from the entire IT industry
– randomly selected 1,000 employees of a company
– randomly selected 1,000,000 transactions of an application
– randomly selected 10,000,000 Internet users
– randomly selected 5,000 users from each e-commerce site, and so on.
* A sample can also be selected using stratification (i.e., based on some rules of interest). For example:
– all the employees of the IT industry whose income is greater than $100,000
– all the employees of a company whose salary is greater than $50,000
– the top 100,000 transactions by amount per transaction (e.g., minimum $1,000 per transaction)
– all Internet users who spend more than two hours per day, etc.
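The sketch below contrasts three ways of drawing a subset from a hypothetical employee population: a simple random sample, an equal-size sample from each group (like selecting 5,000 users from each site), and a rule-based selection. The population size, salary range, and group sizes are assumptions for illustration; the fixed seed only makes the run reproducible.

```python
import random

# Hypothetical population: 10,000 employees as (employee_id, company, salary)
rng = random.Random(42)  # fixed seed so the example is reproducible
companies = ["A", "B", "C"]
population = [
    (i, companies[i % 3], rng.randint(30_000, 150_000)) for i in range(10_000)
]

# 1) Simple random sample: every employee has the same chance of selection
simple_sample = rng.sample(population, k=1_000)


# 2) Equal-size random sample from each group (e.g., per company or per site)
def sample_per_group(rows, group_index, k, rng):
    groups = {}
    for row in rows:
        groups.setdefault(row[group_index], []).append(row)
    return {g: rng.sample(members, k) for g, members in groups.items()}


per_company = sample_per_group(population, group_index=1, k=200, rng=rng)

# 3) Rule-based selection: everyone who meets a criterion of interest
high_earners = [row for row in population if row[2] > 100_000]

print(len(simple_sample), {g: len(s) for g, s in per_company.items()}, len(high_earners))
```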
Qualitative or Categorical Data
Qualitative data is not numerical. It is collected through observations, conversations, surveys, and discussions, or it may simply be demographic data.
-type of car
-favorite color
-favorite food
Nominal Data
The order of the data is arbitrary, or no order is associated with the data. For example, eye color: Blue, Brown, Green, and so forth; no order is associated with the data.
Bar and pie charts are most commonly used.
-Status of an application (pending, not pending)
-Gender (Male, Female)
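A minimal sketch of how nominal data is typically summarized, using hypothetical responses: count each category (what a bar or pie chart displays) and report the mode.

```python
from collections import Counter

# Hypothetical survey responses: eye color is nominal (no inherent order)
eye_colors = ["Brown", "Blue", "Brown", "Green", "Blue", "Brown"]

# Frequency counts and shares are exactly what a bar or pie chart displays
counts = Counter(eye_colors)
shares = {color: n / len(eye_colors) for color, n in counts.items()}

# The mode (most frequent category) is a meaningful summary for nominal data;
# a mean or median is not, because the categories have no order.
mode_color, mode_count = counts.most_common(1)[0]
print(counts, shares, mode_color)
```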
Ordinal Data
This data is in a particular defined order. Examples include Olympic medals, such as Gold, Silver, and Bronze, and Likert scale surveys, such as disagree, agree, strongly agree. With ordinal data, you cannot state, with certainty, whether the intervals between values are equal.
Ordinal data only shows the sequence of values, so arithmetic statistics such as the mean are not meaningful; order-based statistics such as the median and mode can be used instead.
Compared to nominal data, ordinal data has an order that is not present in nominal data.
Values of Education level - none, primary education, secondary education, higher education
Satisfaction with a product - unsatisfied, satisfied, very satisfied
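This sketch, using hypothetical survey responses, ranks ordinal categories by their defined order and summarizes them with a median that maps back to an actual category. It relies only on the order of the levels, not on the distance between them, since those intervals are not known to be equal.

```python
from statistics import median_low

# The defined order of the categories (lowest to highest)
LEVELS = ["unsatisfied", "satisfied", "very satisfied"]
RANK = {level: i for i, level in enumerate(LEVELS)}

# Hypothetical responses
responses = ["satisfied", "very satisfied", "unsatisfied", "satisfied", "satisfied"]

# Sorting by rank respects the defined order; alphabetical sorting would not
ordered = sorted(responses, key=RANK.__getitem__)

# median_low always returns an observed rank, so it maps back to a category
median_level = LEVELS[median_low(RANK[r] for r in responses)]
print(ordered, median_level)
```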
Quantitative Data
- Quantitative data is numeric. Additionally, quantitative data can be divided into categories of discrete or continuous data.
- Quantitative data is often referred to as measurable data. This type of data allows statisticians to perform various arithmetic operations, such as addition and multiplication, and to find population parameters, such as mean or variance. The observations represent counts or measurements, and thus all values are numerical. Each observation represents a characteristic of the individual data points in a population or a sample.
- Quantitative data can be used for statistical manipulation. These data can be represented on a wide variety of graphs and charts, such as bar graphs, histograms, scatter plots, boxplots, pie charts, line graphs, etc.
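Because quantitative values support arithmetic, mean and variance can be computed directly. This sketch applies Python's `statistics` module to a small hypothetical data set and shows the difference between the population variance (divide by N) and the sample variance (divide by N - 1).

```python
from statistics import mean, pstdev, pvariance, variance

# Hypothetical measurements (e.g., daily orders); arithmetic is meaningful here
values = [2, 4, 4, 4, 5, 5, 7, 9]

avg = mean(values)
# Population variance divides by N: use when the data IS the whole population
pop_var = pvariance(values)
# Sample variance divides by N - 1: use when the data is a sample of a larger population
samp_var = variance(values)

print(avg, pop_var, pstdev(values), samp_var)
```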
Discrete Data
A variable can take a specific value that is separate and distinct. Each value is not
related to any other value. Some examples of discrete data types include the number of cars per family, the number of times a person drinks water during a day, or the
number of defective products on a production line.
* Discrete data are countable, and their values cannot be meaningfully subdivided. These data are represented mainly by a bar graph, number line, or frequency table.
-shoe sizes
-number of semesters completed
Continuous Data
A variable can take numeric values within a specific range or interval. Continuous data can take any possible value that the observations in a set can take. For example, with temperature readings, each reading can take on any real number value on a thermometer.
* The key difference between discrete and continuous data is that discrete data contains separate, countable values (often integers or whole numbers), whereas continuous data can store fractional numbers to record different types of data such as temperature, height, width, time, speed, etc. Bar graphs, line graphs, and histograms are often used for continuous data.
-Time it takes to travel to work
-Distance between two planets
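A short sketch contrasting the two types, with hypothetical values: discrete counts go straight into a frequency table, while continuous measurements are grouped into intervals (bins) the way a histogram does. The 10-minute bin width is an arbitrary choice.

```python
from collections import Counter

# Discrete: number of cars per family (countable, distinct values)
cars_per_family = [0, 1, 2, 1, 3, 2, 1]
frequency_table = Counter(cars_per_family)  # one bar per distinct value

# Continuous: commute times in minutes (any value within a range)
commute_minutes = [12.5, 18.2, 25.0, 31.7, 22.4, 9.8]


def histogram_bins(values, width):
    """Count continuous values in equal-width intervals [start, start + width)."""
    bins = Counter(int(v // width) * width for v in values)
    return {f"{start}-{start + width}": bins[start] for start in sorted(bins)}


print(sorted(frequency_table.items()))
print(histogram_bins(commute_minutes, width=10))
```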
First step of charting data
Start with the function (the trend, pattern, or vital piece of information you’re trying to impart at a glance), then consider the user (how they navigate and interact with the data), and only then move to the final step: making it as clean and beautiful as possible.
Trend
A trend is usually the result of long-term factors such as population increases or decreases, shifting demographic characteristics of the population, improving technology, changes in the competitive landscape, and/or changes in consumer preferences.
* A trend shows the general direction in which something is changing.
* Uptrends are marked by rising data points, such as higher swing highs and higher swing lows.
* Downtrends are marked by falling data points, such as lower swing lows and lower swing highs.
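One simple way to label the overall direction of a series is the sign of its least-squares slope. This is a simplification: the swing-high/swing-low definition above comes from price charts and looks at individual peaks and troughs, which this sketch does not. The `tolerance` for calling a series flat is an assumed parameter.

```python
def trend_slope(values):
    """Least-squares slope of the values against their position 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


def classify_trend(values, tolerance=1e-9):
    """Label a series as an uptrend, downtrend, or flat from the sign of its slope."""
    slope = trend_slope(values)
    if slope > tolerance:
        return "uptrend"
    if slope < -tolerance:
        return "downtrend"
    return "flat"


# Hypothetical quarterly figures with some noise
print(classify_trend([10, 12, 11, 14, 13, 16]))
```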
Pattern
A business pattern is a set of recurring and/or related elements (business activities, events, weak or strong signals) that indicates a business opportunity or threat.
* A pattern is a repeated occurrence or sequence.
– Center, Spread, Shape and Unusual features
– Symmetric, Bell-Shaped and Skewed
* We often collect data so that we can find patterns in the data, like numbers trending upwards or correlations between two sets of numbers.
* Depending on the data and the patterns, sometimes we can see the pattern in a simple tabular presentation of the data. Other times, it helps to visualize the data in a chart, like a time series, line graph, or scatter plot.
Pattern Options
Symmetric, unimodal
Skewed right
Skewed left
Symmetric, bimodal
Non-symmetric, bimodal
Uniform
Symmetric, unimodal - bell curve
Skewed right - tail to the right
Skewed left - tail to the left
Symmetric, bimodal - two equal peaks
Non-symmetric, bimodal - two unequal peaks
Uniform - flat, with roughly equal frequency across all values
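Skewness can also be checked numerically. The sketch below computes the Fisher-Pearson moment coefficient of skewness: positive values indicate a tail to the right and negative values a tail to the left. The ±0.5 threshold is a common rule of thumb rather than a fixed standard, and this check does not detect bimodal or uniform shapes, which are easier to spot in a histogram.

```python
from statistics import fmean


def skewness(values):
    """Fisher-Pearson moment coefficient of skewness (population form).

    Raises ZeroDivisionError if all values are identical (zero spread).
    """
    m = fmean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def describe_shape(values, threshold=0.5):
    """Label a distribution using an assumed rule-of-thumb threshold."""
    g1 = skewness(values)
    if g1 > threshold:
        return "skewed right"
    if g1 < -threshold:
        return "skewed left"
    return "approximately symmetric"


# Hypothetical data sets
right_tail = [1, 2, 2, 3, 3, 3, 4, 10]
print(describe_shape([1, 2, 3, 4, 5]), describe_shape(right_tail))
```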
Leadership topics
Communication
Effectiveness
Remember their good days
Influence
Priorities
Process & Navigation
Passion
Momentum
Motivation
Vision
Why business?
Pattern Types:
Gaps
Outlier
Gaps - no data in between periods of data
Outlier - a piece of data that is far away from the rest
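A minimal sketch for spotting both pattern types in hypothetical data: outliers are flagged with the common 1.5 × IQR fence rule (one of several possible outlier rules), and gaps are the missing periods between the first and last observed period. `statistics.quantiles` is used with its default `exclusive` method.

```python
from statistics import quantiles


def find_outliers(values, k=1.5):
    """Flag values outside the fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def find_gaps(periods):
    """Return missing integer periods between the first and last observed period."""
    observed = set(periods)
    return [p for p in range(min(observed), max(observed) + 1) if p not in observed]


daily_orders = [10, 12, 11, 13, 12, 11, 95]   # hypothetical
months_with_data = [1, 2, 3, 6, 7]            # hypothetical
print(find_outliers(daily_orders), find_gaps(months_with_data))
```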
Relationships in Data
- A relationship shows connections or associations between concepts or ideas.
- The relationship between two data columns shows you what learning about one variable tells you about the other.
- The most commonly used chart is a scatterplot, which works best with quantitative data.
SWOT Analysis
Strengths, Weaknesses, Opportunities & Threats
A SWOT analysis is designed to facilitate a realistic, fact-based, data-driven look at the strengths and weaknesses of an organization, its initiatives, or its industry. The organization needs to keep the analysis accurate by avoiding preconceived beliefs or gray areas and instead focusing on real-life contexts. Companies should use it as a guide and not necessarily as a prescription.
* A SWOT analysis pulls internal information (strengths or weaknesses of the specific company) as well as external forces that may have uncontrollable impacts on decisions (opportunities and threats).
* SWOT analysis works best when diverse groups or voices within an organization are free to provide realistic data points rather than prescribed messaging.
* Findings of a SWOT analysis are often synthesized to support a single objective or decision that a company is facing.
The Analytics Process
- Identify business problem: preprocessing
- Identify data sources: preprocessing
- Select the data: preprocessing
- Clean the data: preprocessing
- Transform the data: preprocessing
- Analyze the data: analytics
- Interpret, evaluate, and deploy the model: post-processing
First steps in the analytics process
- Initial Contact with the Client
- Business Request
- Convert to a Business Problem
- Frame the Problem
– Objectives/Goals
– Data
– Models
Initial contact with the client
- This can be assigned by your firm
- It can be acquired through marketing depending on the business
- The client comes to you with the problem
- All of the above and more…
- Meetings take place to define the problem
Business Request is Created
- This is a letter of intent written by you/your firm to the client
- It will include the following:
– Sender’s name and contact details, unless shown on a letterhead
– Date
– The recipient’s name and contact details
– Greeting
– Purpose of the letter
– Body of the letter
– Professional closing
– Signature
– The sender’s name printed
Convert to a business problem
- Understand the problem
– Break it down into smaller chunks
– Prioritize
- Develop objectives
- Gather resources
– Create your team
– Online resources, previous projects that are similar
- Gather data
- Establish a plan (MVP) and set a timeline
MVP
Minimum viable plan - the minimum required to satisfy the assignment
Identify business problem
Identify the business problem
* What is the Business Problem?
* What are the Objectives?
* What are the Requirements of the Stakeholders?
* What are the Requirements of the Business?
– Start with the Data to understand what patterns you see in the data and what knowledge you can decipher from the data
– Gather Resources, Establish the MVP – then we get
Clean the data
Data Cleansing
Data cleansing, also referred to as data cleaning or data scrubbing, is the process of fixing incorrect, incomplete, duplicate or otherwise erroneous data in a data set. It involves identifying data errors and then changing, updating or removing data to correct them.
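A minimal cleansing sketch over hypothetical customer records: it trims and standardizes text, removes incomplete rows, removes rows with impossible values, and drops duplicates. The 0 to 120 age range is an assumed validation rule, and removing rows is only one option; some projects would correct or impute the values instead.

```python
# Hypothetical raw customer records with typical errors
raw = [
    {"id": 1, "name": " Alice ", "state": "ny", "age": "34"},
    {"id": 2, "name": "Bob", "state": "CA", "age": ""},        # incomplete
    {"id": 1, "name": "Alice", "state": "NY", "age": "34"},    # duplicate of id 1
    {"id": 3, "name": "Carol", "state": "Tx", "age": "-5"},    # impossible age
    {"id": 4, "name": "dave", "state": "WA", "age": "41"},
]


def clean(records):
    cleaned, seen = [], set()
    for rec in records:
        # Fix formatting: trim whitespace and standardize capitalization
        name = rec["name"].strip().title()
        state = rec["state"].strip().upper()
        # Remove incomplete rows (missing age)
        if not rec["age"].strip():
            continue
        age = int(rec["age"])
        # Remove rows that fail the assumed validation rule
        if not 0 <= age <= 120:
            continue
        # Remove duplicates (this id was already kept)
        if rec["id"] in seen:
            continue
        seen.add(rec["id"])
        cleaned.append({"id": rec["id"], "name": name, "state": state, "age": age})
    return cleaned


print(clean(raw))
```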
The power of clean data
A decision is only as good as the data that informs it. With massive amounts of data streaming in from multiple sources, a data cleansing tool is more important than ever for ensuring accuracy of information, process efficiency, and your company’s competitive edge. Some of the primary benefits of data scrubbing include:
* Improved Decision Making: Data quality is critical because it directly affects your company’s ability to make sound decisions and calculate effective strategies.
* Boosted Efficiency: Utilizing clean data isn’t just beneficial for your company’s external needs; it can also improve in-house efficiency and productivity. When information is cleaned properly, it reveals valuable insights into internal needs and processes.
* Competitive Edge: The better a company meets its customers’ needs, the faster it will rise above its competitors. A data cleansing tool helps provide reliable, complete insights so that you can identify evolving customer needs and stay on top of emerging trends.
Data cleansing can produce faster response rates, generate quality leads, and improve the customer experience.
Data transformation
- Data transformation is the process of converting and structuring data into a usable format that can be analyzed to support decision-making processes and to propel the growth of an organization.
- Data transformation is used when data needs to be converted to match the format of the destination system or to make the data more usable.
- Many organizations today use cloud-based data warehouses because they can scale their computing and storage resources in seconds. Cloud-based organizations typically load the raw data into the warehouse first and transform it there, a process called extract, load, and transform (ELT). The process of data transformation can be handled manually, automated, or a combination of both.
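A small sketch of converting data to match a destination system. It assumes, hypothetically, that the source stores US-style dates and currency strings, while the destination expects ISO 8601 date strings and numeric amounts.

```python
from datetime import datetime

# Hypothetical source rows: US-style dates and currency strings
source_rows = [
    {"order_date": "03/15/2024", "amount": "$1,200.50"},
    {"order_date": "11/02/2023", "amount": "$89.99"},
]


def to_destination_format(row):
    """Convert one source row to a hypothetical destination schema:
    ISO 8601 date string and numeric amount."""
    date = datetime.strptime(row["order_date"], "%m/%d/%Y").date()
    amount = float(row["amount"].replace("$", "").replace(",", ""))
    return {"order_date": date.isoformat(), "amount": amount}


transformed = [to_destination_format(r) for r in source_rows]
print(transformed)
```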
Example when transformation would be required
- After a preliminary analysis of data, you may realize that the raw data does not provide good results or doesn’t seem to make any sense.
- For example, data may be skewed, data may not be normally distributed, or measurement scales may be different for different variables. In such cases, data may require transformation.
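A sketch of two common transformations for the situations above, with hypothetical values: a log transform to compress a long right tail, and min-max scaling so variables measured on different scales share a 0 to 1 range. Which transformation fits best depends on the data and the analysis.

```python
import math

# Hypothetical right-skewed variable (e.g., customer spend in dollars)
spend = [20, 35, 50, 80, 120, 5000]
# A second hypothetical variable on a very different scale (visits per month)
visits = [1, 2, 2, 3, 4, 6]

# Log transform compresses the long right tail (log1p also handles zeros)
log_spend = [math.log1p(v) for v in spend]


def min_max(values):
    """Rescale values to 0-1 so variables on different scales are comparable.

    Assumes the values are not all identical (otherwise the range is zero).
    """
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


scaled_spend = min_max(log_spend)
scaled_visits = min_max(visits)
print(scaled_spend, scaled_visits)
```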
Benefits of Data transformation
- Data is transformed to make it better organized. Transformed data may be easier for both humans and computers to use.
- Properly formatted and validated data improves data quality and protects applications from potential landmines such as null values, unexpected duplicates, incorrect indexing, and incompatible formats.
- Data transformation facilitates compatibility between applications, systems, and types of data. Data used for multiple purposes may need to be transformed in different ways.
Data Model
- Data models are visual representations of an enterprise’s data elements and the connections between them. By helping to define and structure data in the context of relevant business processes, models support the development of effective information systems. They enable business and technical resources to collaboratively decide how data will be stored, accessed, shared, updated and leveraged across an organization.
Data Model terms
Lookup Table or Dimension tables
Primary Keys
Data Table
Normalization
Lookup or Dimension Table
a table with a field (column) of unique values, one for each row/record. Lookup tables generally have a primary key, which is a field that uniquely identifies each row of a table.
Data or fact Table
a table that contains numbers or values, typically at the most granular level possible. Data tables contain foreign key columns that can be used to connect to each lookup table. Foreign keys will not be unique.
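The sketch below, with hypothetical tables and column names, checks the two rules from these cards: a primary key must be unique in the lookup table, while the matching foreign key in the data table may repeat but should always point to an existing lookup row.

```python
# Hypothetical lookup (dimension) table: product_id is the primary key
products = [
    {"product_id": "P1", "name": "Laptop", "category": "Electronics"},
    {"product_id": "P2", "name": "Desk", "category": "Furniture"},
]
# Hypothetical data (fact) table: product_id here is a foreign key and repeats
sales = [
    {"sale_id": 1, "product_id": "P1", "amount": 1200.0},
    {"sale_id": 2, "product_id": "P2", "amount": 300.0},
    {"sale_id": 3, "product_id": "P1", "amount": 1150.0},
]


def is_valid_primary_key(rows, key):
    """A primary key must be present and unique in every row."""
    values = [r.get(key) for r in rows]
    return None not in values and len(values) == len(set(values))


def orphan_foreign_keys(fact_rows, lookup_rows, key):
    """Foreign key values in the fact table that have no matching lookup row."""
    valid = {r[key] for r in lookup_rows}
    return [r[key] for r in fact_rows if r[key] not in valid]


print(is_valid_primary_key(products, "product_id"))   # unique in the lookup table
print(is_valid_primary_key(sales, "product_id"))      # repeats in the fact table
print(orphan_foreign_keys(sales, products, "product_id"))
```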
Normalization
the process of organizing the tables and columns in a relational database to reduce redundancies and preserve data integrity.
* Joining tables based on like fields
– Creating Index columns
– Power BI = Drag and Drop
– SQL = Inner Join code
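A minimal SQL sketch using Python's built-in `sqlite3` module with an in-memory database; the table and column names are hypothetical. Category is stored once per product in the lookup table rather than repeated on every sale (normalization), and an INNER JOIN on the shared `product_id` field brings it back for analysis.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Lookup (dimension) table: one row per product, product_id is the primary key
    CREATE TABLE products (
        product_id TEXT PRIMARY KEY,
        category   TEXT NOT NULL
    );
    -- Data (fact) table: product_id is a foreign key that may repeat
    CREATE TABLE sales (
        sale_id    INTEGER PRIMARY KEY,
        product_id TEXT REFERENCES products(product_id),
        amount     REAL NOT NULL
    );
    INSERT INTO products VALUES ('P1', 'Electronics'), ('P2', 'Furniture');
    INSERT INTO sales VALUES (1, 'P1', 1200.0), (2, 'P2', 300.0), (3, 'P1', 1150.0);
""")

# INNER JOIN on the shared field, then aggregate the fact rows by a lookup attribute
rows = conn.execute("""
    SELECT p.category, SUM(s.amount) AS total
    FROM sales AS s
    INNER JOIN products AS p ON s.product_id = p.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(rows)
conn.close()
```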
Matrix visualization
- The matrix visual is similar to a table. A table supports two dimensions and the data is flat, meaning duplicate values are displayed and not aggregated. A matrix makes it easier to display data meaningfully across multiple dimensions; it supports a stepped layout. The matrix automatically aggregates the data and enables you to drill down.
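A plain-Python analogy of what the matrix visual does (this is not Power BI code): flat rows, including repeated region and product combinations, are aggregated into row-by-column cells, with a subtotal for each row header. The data is hypothetical.

```python
from collections import defaultdict

# Hypothetical flat table rows: (region, product, amount); repeats stay as separate rows
rows = [
    ("North", "Laptop", 1200.0), ("North", "Desk", 300.0),
    ("North", "Laptop", 1100.0), ("South", "Desk", 250.0),
    ("South", "Laptop", 900.0),
]


def build_matrix(rows):
    """Aggregate amounts by row header (region) and column header (product)."""
    matrix = defaultdict(lambda: defaultdict(float))
    for region, product, amount in rows:
        matrix[region][product] += amount
    return {region: dict(cols) for region, cols in matrix.items()}


def row_totals(matrix):
    """Subtotal for each row header, as shown when the row is collapsed."""
    return {region: sum(cols.values()) for region, cols in matrix.items()}


m = build_matrix(rows)
print(m, row_totals(m))
```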
Calculated Measures
- Measures in Power BI are a summarization of data, giving us a summary or representation of that data. Power BI provides tools to create our own measures based on the data itself, and we can name the measures however we want.
Calculated Measures in Power BI
- Power BI measures are a way of defining calculations in a DAX model. Rather than calculating a value for each row (which is what a calculated column does), a measure returns an aggregate value computed from multiple rows of a table.
- Creating Power BI measures is often called “calculated measures”; these use DAX expressions to calculate new values from the existing table.
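A plain-Python analogy (not DAX) of the difference described above, using a hypothetical Sales table: a calculated column produces one value per row, while a measure returns a single aggregate over whichever rows pass the current filter, roughly what a DAX measure such as `SUM(Sales[line_total])` would do in a filtered visual. The column names are hypothetical.

```python
# Hypothetical Sales table rows
sales = [
    {"region": "North", "quantity": 2, "unit_price": 600.0},
    {"region": "North", "quantity": 1, "unit_price": 300.0},
    {"region": "South", "quantity": 3, "unit_price": 250.0},
]

# Calculated-column analogue: a new value computed for EACH row
for row in sales:
    row["line_total"] = row["quantity"] * row["unit_price"]


# Measure analogue: one aggregate over all rows that pass the current filter
def total_sales(rows, **filters):
    selected = [r for r in rows if all(r[k] == v for k, v in filters.items())]
    return sum(r["line_total"] for r in selected)


print(total_sales(sales), total_sales(sales, region="North"))
```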