The Different Data Science Fields Flashcards
1.2 Analysis vs analytics
Analysis
– Dividing data into digestible components that are easier to understand
and examining how the different parts relate to each other. Performed on past data,
it explains why the story ended the way it did: we want to explain ‘how’ and
‘why’ something happened.
Analytics
– Explores the future. The application of logical and computational
reasoning to the component parts obtained in an analysis. In doing this, you are
looking for patterns and exploring what you can do with them in the future.
Qualitative Analytics
- Using intuition and experience in conjunction with analysis to plan your next business move
- Using intuition and knowledge of the market for the future planning process
- Turns into Business Analytics
1.3 Intro to Business Analytics, Data Analytics, and Data Science [placeholder]
Data Science
- A discipline reliant on data availability, while business analytics does not completely rely on data
- Uses the parts of data analytics that use complex mathematical, statistical, and programming tools
- Can improve the accuracy of analytics (predictions) using data extracted from drilling activities (Ex: optimization of drilling operations)
- Asks how tools from machine learning can help us improve the accuracy of our estimations
Data Analytics
Ex: Digital Signal Processing
1.4 Adding Business Intelligence (BI), Machine Learning, and Artificial
Intelligence (AI) [placeholder]
Business Intelligence (BI)
- The process of analyzing and reporting historical business data
- After reports and dashboards have been prepared, they can be used to make informed strategic business decisions by end-users such as a general manager
- Does not work with unstructured data
- Aims to explain past events using business data
- The preliminary step of predictive analytics
- Analyze past data and extract useful insights
- Create appropriate models/dashboards
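The reporting side of BI described above can be illustrated with a minimal sketch. The sales records, the field layout, and the function name `revenue_by_month` are all hypothetical; the point is only that BI summarizes historical data into figures a dashboard could show.

```python
from collections import defaultdict

# Hypothetical historical sales records: (month, region, revenue)
sales = [
    ("2023-01", "North", 1200.0),
    ("2023-01", "South", 800.0),
    ("2023-02", "North", 1500.0),
    ("2023-02", "South", 700.0),
]

def revenue_by_month(records):
    """Sum past revenue per month, the kind of figure a report would display."""
    totals = defaultdict(float)
    for month, _region, revenue in records:
        totals[month] += revenue
    return dict(totals)

report = revenue_by_month(sales)
for month, total in sorted(report.items()):
    print(month, total)
```

Nothing here predicts anything: the output only describes what already happened, which is why BI is treated as the step before predictive analytics.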
Machine Learning (Subfield of AI)
The ability of machines to predict outcomes without being explicitly programmed
- Creating and implementing algorithms that let machines receive data and use this data to:
- make predictions
- analyze patterns
- give recommendations on their own
- Can hold data from third-party companies, detect new patterns in that data, and suggest real-time recommendations and insights to managers and other decision-makers
- Helps develop models that predict what a client’s next purchase would be
- Fraud Protection
Artificial Intelligence (AI)
Simulating human knowledge and decision making with computers
Ex: Symbolic reasoning - high-level, human-readable representations of problems and logic (now largely out of use)
However, machine learning is currently the main form of AI being applied in practice
Advanced Analytics
All forms of analytics
1.5 An Overview of the 365 Data Science Infographic [placeholder]
From a data scientist’s perspective, the solution to every task comes with having
A proper dataset
The 5 steps in solving a business task:
1) Working with traditional data
2) Working with big data
3) Doing business intelligence
4) Applying traditional data science techniques
5) Using ML techniques
7 Important Questions for Data Science
- When is this part of the process applied?
- Why do we need it?
- What are the techniques?
- Where and in which real-life cases can it be applied?
- How is it implemented? Using what tools?
- Who is doing this?
- The relationship between different data science fields [placeholder]
Data
- can be defined as information stored in a digital format, which can then be
used as a base for performing analysis and decision making. We can distinguish
between two types of data:
Traditional data
- Data in the form of tables containing numeric or text
values; Data that is structured and stored in databases
Unstructured Data
- text, images, video, and audio
Big data
- Extremely large data; Humongous in terms of volume. It can be in various formats:
- structured
- semi-structured
- unstructured
How is big data characterized?
- Under different frameworks we may have 3, 5, 7, or even 11 Vs of big data.
The main ones are volume, variety, and velocity
The 365 Data Science infographic divides data science into 3 segments:
- business intelligence (analyzing past data that you have acquired), traditional methods, and machine
learning (forecasting future performance).
Business Intelligence
- Includes all technology-driven tools involved in the process of analyzing, understanding, and reporting available past data. It allows you to extract insights and ideas and to make decisions
Traditional methods
- A set of methods that are derived mainly from statistics and are adapted for business.
Machine learning
- Is all about creating algorithms that let machines receive data, perform calculations, and apply statistical analysis to make predictions with unprecedented accuracy.
- What is the purpose of each data science field
Difference between traditional methods and machine learning
Traditional methods relate to traditional data. They were designed before big data existed, when technology was not as advanced as it is today. They involve applying statistical approaches to create predictive models.
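As one illustration of a statistical approach used to build a predictive model, the sketch below fits a straight line by ordinary least squares. Linear regression is chosen here only as a familiar example; the card does not name a specific technique, and the variable names and figures (`ad_spend`, `sales`) are made up.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel in the ratio)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical small, structured dataset
ad_spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(ad_spend, sales)
forecast = slope * 5.0 + intercept  # predicted sales for a new spend level
print(slope, intercept, forecast)
```

The whole model is two numbers estimated from a small table, which is typical of methods designed for traditional data.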
- Common data science techniques
4.1 Traditional data: Techniques {info dump}
Raw data
- Also called ‘primary data’. Data that cannot be analyzed straight away: untouched data you have accumulated and stored on a server. Gathering raw data is referred to as data collection
Data Preprocessing
- Needs to be performed on raw data to obtain meaningful information. A group of operations that converts raw data into a more understandable format
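A minimal sketch of what such operations might look like, assuming raw records arrive as text fields. The rows, field names, and the choice to drop unparseable rows are illustrative assumptions, not a prescribed procedure.

```python
# Hypothetical raw records as collected: everything is a string
raw_rows = [
    {"customer": " Alice ", "age": "34", "spend": "120.50"},
    {"customer": "bob", "age": "", "spend": "80"},       # missing age
    {"customer": "Carol", "age": "29", "spend": "n/a"},  # unusable spend
]

def preprocess(rows):
    """Tidy names, convert types, and drop rows that cannot be converted."""
    clean = []
    for row in rows:
        try:
            clean.append({
                "customer": row["customer"].strip().title(),
                "age": int(row["age"]),
                "spend": float(row["spend"]),
            })
        except ValueError:
            # Skip rows with missing or malformed numeric fields
            continue
    return clean

clean_rows = preprocess(raw_rows)
print(clean_rows)
```

Dropping bad rows is only one option; filling in missing values is another common choice, depending on the task.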
Class labelling
- Labelling each data point with the correct data type (or arranging data by category).
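One simple way to picture class labelling is to tag each column of a table as numerical or categorical. The sketch below does this with a deliberately naive rule (a column is numerical if every value parses as a number); the table and the function name `label_column` are hypothetical.

```python
def label_column(values):
    """Return 'numerical' if every value parses as a number, else 'categorical'."""
    for value in values:
        try:
            float(value)
        except (TypeError, ValueError):
            return "categorical"
    return "numerical"

# Hypothetical table stored column by column
table = {
    "age": ["34", "29", "41"],
    "city": ["Oslo", "Lima", "Oslo"],
}

labels = {name: label_column(column) for name, column in table.items()}
print(labels)
```

A real dataset needs more care, for example numeric codes such as postal codes that are really categories.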