The Different Data Science Fields Flashcards

1
Q

1.2 Analysis vs analytics

A

.

2
Q

Analysis

A

Analysis: dividing data into digestible components that are easier to understand
and examining how the different parts relate to each other. It is performed on past data
and explains why the story ended the way it did. We want to explain ‘how’ and
‘why’ something happened.

3
Q

Analytics

A

Explores the future. Analytics is the application of logical and computational
reasoning to the component parts obtained in an analysis. In doing this, you are
looking for patterns and exploring what you can do with them in the future.

4
Q

Qualitative Analytics

A

Using intuition and experience in conjunction with analysis to plan your next business move

5
Q

Qualitative Analytics

A
  • Using intuition and knowledge of the market for the future planning process
  • Turns into Business Analytics
6
Q

1.3 Intro to Business Analytics, Data Analytics, and Data Science [placeholder]

A

.

7
Q

Data Science

A
  • A discipline reliant on data availability, while business analytics does not completely rely on data
  • Uses the parts of data analytics that use complex mathematical, statistical, and programming tools
  • Data Science can improve the accuracy of analytics (predictions) using data extracted from operational activities. Ex: Optimization of Drilling Operations, where data from drilling activities is used to improve drilling efficiency
  • Key question: how can tools from machine learning help us improve the accuracy of our estimations?
8
Q

Data Analytics

A

Ex: Digital Signal Processing

9
Q

1.4 Adding Business Intelligence (BI), Machine Learning, and Artificial
Intelligence (AI) [placeholder]

A

.

10
Q

Business Intelligence (BI)

A
  • The process of analyzing and reporting historical business data
  • After reports and dashboards have been prepared, they can be used to make informed strategic business decisions by end-users such as a general manager
  • Does not work with unstructured data
  • Aims to explain past events using business data
  • The preliminary step of predictive analytics
  1. Analyze past data and extract useful insights
  2. Create appropriate models/dashboards
11
Q

Machine Learning (Subfield of AI)

A

The ability of machines to predict outcomes without being explicitly programmed

  • Creating and implementing algorithms that let machines receive data and use this data to:
  • make predictions
  • analyze patterns
  • give recommendations on their own
  • Can hold data from third-party companies, detect new patterns in their data, and suggest real-time recommendations and insights to managers and other decision-makers
  • Helps develop models that predict what a client’s next purchase would be
  • Fraud Protection
12
Q

Artificial Intelligence (AI)

A

Simulating human knowledge and decision making with computers

Ex: Symbolic Reasoning - High-level human-readable representations of problems and logic (This is extinct though)

However, machine learning is the only form of AI that is being widely applied and practiced

13
Q

Advanced Analytics

A

All forms of analytics

14
Q

1.5 An Overview of the 365 Data Science Infographic [placeholder]

A

.

15
Q

From a data scientist’s perspective, the solution to every task comes with having

A

A proper dataset

16
Q

The 5 processes of solving a business task:

A

1) Working with traditional data
2) Working with big data
3) Doing business intelligence
4) Applying traditional data science techniques
5) Using ML techniques

17
Q

7 Important Questions for Data Science

A
  • When is this part of the process applied?
  • Why do we need it?
  • What are the techniques?
  • Where and in which real-life cases can it be applied?
  • How is it implemented? Using what tools?
  • Who is doing this?
18
Q
  1. The relationship between different data science fields [placeholder]
A

.

19
Q

Data

A
  • Can be defined as information stored in a digital format, which can then be
    used as a base for performing analysis and decision making. We can distinguish
    between two types of data: traditional data and big data.
20
Q

Traditional data

A
  • Data in the form of tables containing numeric or text
    values; Data that is structured and stored in databases
21
Q

Unstructured Data

A
  • text, images, video, and audio
22
Q

Big data

A
  • Extremely large data; Humongous in terms of volume. It can be in various formats:
  • structured
  • semi-structured
  • unstructured
23
Q

How is big data characterized?

A
  • Under different frameworks we
    may have 3, 5, 7, and even 11 Vs of big data. The main ones are volume, variety,
    and velocity.
24
Q

The 365 Data Science infographic divides data science into 3 segments:

A
  • Business intelligence (analyze the past data that you have acquired), traditional methods, and machine
    learning (forecast future performance).
25
Q

Business Intelligence

A
  • Includes all technology-driven tools involved in the process of analyzing, understanding, and reporting available past data. It allows you to make decisions, extract insights, and extract ideas
26
Q

Traditional methods

A
  • A set of methods that are derived mainly from statistics and are adapted for business.
27
Q

Machine learning

A
  • Is all about creating algorithms that let machines receive data, perform calculations, and apply statistical analysis to make predictions with unprecedented accuracy.
28
Q
  1. What is the purpose of each data science field
A

.

29
Q

Difference between traditional methods and machine learning

A

Traditional methods relate to traditional data. They were designed prior to the existence of big data, when technology simply wasn’t as advanced as it is today. They involve applying statistical approaches to create predictive models.

30
Q
  1. Common data science techniques
A

.

31
Q

4.1 Traditional data: Techniques {info dump}

A

.

32
Q

Raw data

A
  • Also called ‘primary data’, raw data is data that cannot be analyzed straight away. It is untouched data you have accumulated and stored on the server. The gathering of raw data is referred to as data collection
33
Q

Data Preprocessing

A
  • Needs to be performed on raw data to obtain meaningful information. This is a group of operations that convert your raw data into a more understandable format
34
Q

Class labelling

A
  • Labelling each data point with the correct data type (or arranging data by category).
35
Q

Data cleansing: (‘data cleaning’, ‘data scrubbing’)

A
  • Dealing with inconsistent data. For
    example, working on a dataset containing US states and finding that some of the names are misspelled.
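
A minimal sketch of the misspelled-state example, assuming a short hypothetical list of valid names and using fuzzy matching from Python's standard `difflib` module. The `clean_state` helper and the 0.8 cutoff are illustrative choices, not part of the original course material.

```python
import difflib

# Hypothetical reference list; a real one would contain every valid state name.
VALID_STATES = ["California", "Texas", "New York", "Florida"]

def clean_state(name, valid=VALID_STATES, cutoff=0.8):
    """Return the closest valid state name, or None if nothing is close enough."""
    matches = difflib.get_close_matches(name.strip(), valid, n=1, cutoff=cutoff)
    return matches[0] if matches else None

raw = ["Califronia", "Texas ", "New Yrok", "Atlantis"]
cleaned = [clean_state(s) for s in raw]
print(cleaned)
```

Values that match nothing come back as `None`, so they can be reviewed by hand rather than silently "corrected".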
36
Q

Data balancing

A
  • Ensuring that the sample gives equal priority to each class. For example, if you work with a dataset that contains 80% male and 20% female data, and you know that the population contains approximately 50% men and 50% women, then you need to apply a balancing technique to counteract this problem (using an equal number of data from each group).
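
One possible balancing technique is random downsampling of the larger class. The sketch below assumes a hypothetical list of records with a `sex` field that mirrors the 80/20 example above. Oversampling the smaller class would be an equally valid alternative.

```python
import random
from collections import defaultdict

def balance_by_downsampling(rows, key, seed=0):
    """Keep an equal number of rows per class by randomly dropping extras."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    target = min(len(g) for g in groups.values())  # size of the smallest class
    rng = random.Random(seed)                      # fixed seed for reproducibility
    balanced = []
    for group in groups.values():
        balanced.extend(rng.sample(group, target))
    return balanced

# Hypothetical sample: 80% male, 20% female.
rows = [{"id": i, "sex": "male"} for i in range(80)]
rows += [{"id": i, "sex": "female"} for i in range(80, 100)]

balanced = balance_by_downsampling(rows, key="sex")
counts = {s: sum(r["sex"] == s for r in balanced) for s in ("male", "female")}
print(counts)  # {'male': 20, 'female': 20}
```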
37
Q

Data shuffling

A

Shuffling the observations from your dataset just like shuffling a deck of cards. This will ensure your dataset is free from unwanted patterns caused by problematic data collection.
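
A small sketch of shuffling, assuming a hypothetical dataset that arrived sorted by collection date. It uses `random.Random.shuffle` from the standard library with a fixed seed so the result is reproducible.

```python
import random

# Hypothetical dataset that arrived sorted by collection date.
observations = [("2024-01-%02d" % day, day * 10) for day in range(1, 11)]

rng = random.Random(42)      # fixed seed so the shuffle is reproducible
shuffled = observations[:]   # copy, so the original order is preserved
rng.shuffle(shuffled)        # in-place shuffle of the copy

print(shuffled[:3])
```

Shuffling changes only the order: every observation is still present exactly once.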

38
Q

4.2 Traditional data: Real-life examples

A

.

39
Q

Numerical variable

A

Numbers that are easily manipulated (for ex. Added), which gives us useful information

40
Q

Categorical variable

A
  • Numbers that hold no numerical value can be considered
    categorical data. Dates are also considered categorical data.
41
Q

4.3 Big data: Techniques

A
  • Examples of big data: text data, digital image data, digital video data, digital audio
    data, etc. With a wide variety of data types comes a wider range of data cleansing methods.
42
Q

Text data mining

A
  • The process of deriving valuable, unstructured data from text
43
Q

Data masking

A
  • As a business, when you work with user private data, you must be able to preserve confidential information. However, this doesn’t mean that the data can’t be touched or used for analysis.
  • In essence, data masking conceals the original data with random and false
    data, allowing you to conduct analysis and keep confidential information in a
    secure place.
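
A toy sketch of the idea, assuming hypothetical customer records with a `name` column. The `mask_column` helper is illustrative only: it uses Python's non-cryptographic `random` module, so a production system would rely on a vetted masking tool instead.

```python
import random
import string

def mask_column(rows, column, seed=0):
    """Replace each distinct value in `column` with a random fake token.

    The same original value always maps to the same token, so grouping and
    counting still work on the masked data.
    """
    rng = random.Random(seed)  # not cryptographically secure; illustration only
    mapping = {}               # original value -> fake token (keep this secure)
    used = set()
    masked = []
    for row in rows:
        original = row[column]
        if original not in mapping:
            token = "user_" + "".join(rng.choices(string.ascii_lowercase, k=8))
            while token in used:  # regenerate on the rare collision
                token = "user_" + "".join(rng.choices(string.ascii_lowercase, k=8))
            used.add(token)
            mapping[original] = token
        masked.append({**row, column: mapping[original]})
    return masked, mapping

rows = [
    {"name": "Alice Smith", "spend": 120},
    {"name": "Bob Jones", "spend": 80},
    {"name": "Alice Smith", "spend": 45},
]
masked, mapping = mask_column(rows, "name")
print(masked)
```

Analysts work with `masked`, while `mapping` is the confidential part that stays in a secure place.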
44
Q

4.4 Big data: Real-life examples

A
  • Probably the most notable example of a company leveraging the true potential of
    big data is Facebook. The company keeps track of its users’ names, personal data,
    photos, videos, recorded messages and so on. This means their data has a lot of
    variety. And with 2 billion users worldwide, the volume of data stored on their
    servers is tremendous
45
Q

4.5 Business Intelligence: Techniques

A
  • Business intelligence requires the combination of data skills and business knowledge in an effort to explain the past performance of your company. It answers the questions:
    “What happened?”
    “When did it happen?”
    “How many units did we sell?”
    “In which region did we sell the most goods?” etc.
46
Q

What is the job of a business intelligence analyst?

A
  • The job of a business intelligence analyst requires her to understand the essence
    of a business and strengthen that business through the power of data.
47
Q

Metric

A
  • Refers to a value that derives from the measures you have obtained and aims at gauging business performance or progress. Has a business meaning attached to it
  • Metric = Measure + Business meaning
48
Q

Measure

A
  • Simple descriptive statistics of past performance
49
Q

KPIs

A
  • It doesn’t make sense to keep track of all metrics. So, companies choose to
    focus on the most important ones.
  • KPIs = metrics + Business objective

Filtering out the boring metrics and turning the interesting and informative KPIs
into easily understood and comparable visualizations is an important part of the
business intelligence analyst job
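
The chain from measure to metric to KPI can be sketched with a few lines of arithmetic. All figures and the target below are hypothetical values for an imaginary online shop.

```python
# Measures: simple descriptive statistics of past performance (hypothetical).
visits = 12_000
orders = 480
revenue = 21_600.0  # in dollars

# Metric = measure + business meaning.
conversion_rate = orders / visits        # share of visits that end in an order
average_order_value = revenue / orders   # revenue earned per order

# KPI = metric + business objective (the target is an assumed goal).
TARGET_CONVERSION_RATE = 0.05
kpi_met = conversion_rate >= TARGET_CONVERSION_RATE

print(f"conversion rate: {conversion_rate:.1%}, target met: {kpi_met}")
print(f"average order value: ${average_order_value:.2f}")
```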

50
Q

4.6 Business Intelligence: Real-life examples

A
  • BI allows you to adjust your strategy to past data as soon as it is available. If done right, Business Intelligence will help to efficiently manage your shipment logistics and, in turn, reduce costs and increase profit.
51
Q

4.7 Traditional methods: Techniques {info dump}

A
  • There are two branches of predictive analytics – traditional methods (classical statistical methods for forecasting) and machine learning.
  • In business and statistics, a regression is a model used for quantifying causal relationships among the different variables included in your analysis. A logistic regression is a common example of a non-linear model: the values on the vertical axis will be 1s and 0s only.
52
Q

Regression

A

Used to find an association between variables
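
As an illustration, here is a minimal ordinary least squares fit of a straight line, written with the standard library only. The `fit_line` helper and the advertising data are hypothetical, and the data is chosen to lie exactly on a line so the result is easy to check.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: advertising spend (in $1000s) vs. units sold.
ad_spend = [1, 2, 3, 4, 5]
units_sold = [12, 14, 16, 18, 20]  # exactly 10 + 2 * spend

slope, intercept = fit_line(ad_spend, units_sold)
print(slope, intercept)  # 2.0 10.0
```

The slope says how much the dependent variable changes, on average, per unit change in the explanatory variable.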

53
Q

Clustering/Cluster Analysis

A
  • Grouping different observations together into neighborhoods (clusters) to find and analyse meaningful patterns
  • Cluster analysis is typically used when there is no assumption made about the likely relationships within the data. It provides information about where associations and patterns in data exist, but not what those might be or what they mean.
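
A tiny sketch of one common clustering technique, k-means, on one-dimensional data. The scores and starting centroids are hypothetical, and the `k_means_1d` helper is a simplified illustration rather than a production implementation.

```python
def k_means_1d(values, centroids, iterations=10):
    """Tiny k-means on 1-D data, starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical customer satisfaction scores with two visible groups.
scores = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
centroids, clusters = k_means_1d(scores, centroids=[0.0, 10.0])
print(centroids)  # [1.5, 8.5]
print(clusters)   # [[1.0, 1.5, 2.0], [8.0, 8.5, 9.0]]
```

The algorithm reports where the groups are, but not what they mean; interpreting them is left to the analyst.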
54
Q

Factor Analysis

A

Grouping explanatory variables (independent variables) together to find a meaning

55
Q

Time series

A
  • used in economics and finance, showing the development of certain values over time, such as stock prices or sales volume.
56
Q

4.8 Traditional methods: Real-life examples

A
  • Forecasting sales data: using time series data to predict a firm’s future expected sales
  • UX: plot customer satisfaction and customer revenue to find that each cluster represents a different geographical location
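
The sales-forecasting example can be sketched with one of the simplest time series techniques, a moving average. The sales figures and the window size are hypothetical, and real forecasting would usually account for trend and seasonality as well.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if len(series) < window:
        raise ValueError("series is shorter than the window")
    recent = series[-window:]
    return sum(recent) / window

# Hypothetical monthly sales volume, ordered from oldest to newest.
monthly_sales = [100, 110, 120, 130, 125, 135]

forecast = moving_average_forecast(monthly_sales, window=3)
print(forecast)  # 130.0
```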
57
Q

4.9 Machine Learning: Techniques

A

.

58
Q

Machine learning

A
  • Creating an algorithm, which the computer then uses to find a model that fits the data as best as possible to make very accurate predictions.
  • In most situations, a trial-and-error process, but the special thing about it is that each consecutive trial is at least as good as the previous one.
  • There are four ingredients for machine learning: data, model, objective function,
    optimization algorithm
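
The four ingredients can be seen together in a minimal sketch: a one-parameter model fitted by plain gradient descent. The data points, learning rate, and number of steps are assumptions chosen for illustration.

```python
# Data: hypothetical (x, y) pairs that roughly follow y = 3 * x.
data = [(1.0, 3.1), (2.0, 5.9), (3.0, 9.2), (4.0, 11.8)]

def model(w, x):
    """Model: a one-parameter line through the origin."""
    return w * x

def objective(w):
    """Objective function: mean squared error, to be minimized."""
    return sum((model(w, x) - y) ** 2 for x, y in data) / len(data)

def gradient(w):
    """Derivative of the objective with respect to w."""
    return sum(2 * (model(w, x) - y) * x for x, y in data) / len(data)

# Optimization algorithm: gradient descent with a small fixed step.
w = 0.0
learning_rate = 0.01
losses = [objective(w)]
for _ in range(200):
    w -= learning_rate * gradient(w)
    losses.append(objective(w))

print(round(w, 3), round(losses[-1], 4))
```

With this small step size, each trial's loss is no worse than the previous one, which mirrors the trial-and-error description above.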
59
Q

Model

A
  • The computer uses a mathematical algorithm to recognize certain types of patterns

Ex: The tennis racket is your model. You need to figure out how to use the racket in the best possible way for the incoming ball

The data is your shot selection (forehand, backhand, etc.)

60
Q

Objective function

A
  • Specification of the machine learning problem: is it a function that needs to be maximized (e.g., a reward) or minimized (e.g., an error)? This depends on the task at hand

Ex: Tennis, how far away from the target (the line) your shot is

  • How you go about hitting your goal.
61
Q

Optimization algorithm

A
  • A process in which previous solutions of the problem are compared until reaching an optimal solution

Ex: Tennis (your coach physically correcting your mechanics without giving explicit instruction)

62
Q

4.10 Machine Learning: Types

A

.

63
Q

Supervised learning

A
  • Training an algorithm resembles a teacher supervising her students: she provides feedback every step of the way, telling students whether they did well or whether they need to improve their performance.
  • When using supervised learning you use labelled data (every data point is
    categorized as ‘good’ performance or as ‘performance that needs improvement’ in
    our example)

Think Tennis:

  • You work with labelled data:

You know the target prior to the shot

You associate the shot with the target

Labeled data allows us to measure the inaccuracy of the shot through the objective function and improve the way the robot shoots through the optimization algorithm (coaching)

Goal: get as close to the target as possible with optimization algorithm (coaching) help. You want error to be minimal (so minimize objective function)
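
A toy sketch of learning from labelled data, using a 1-nearest-neighbour rule. This technique is swapped in for simplicity and is not one of the approaches named in these cards. The training values and labels are hypothetical.

```python
# Labelled data: (hours of practice per week, label). Values are hypothetical.
training = [
    (1.0, "needs improvement"),
    (2.0, "needs improvement"),
    (3.0, "needs improvement"),
    (7.0, "good"),
    (8.0, "good"),
    (9.0, "good"),
]

def predict(hours):
    """1-nearest-neighbour: copy the label of the closest labelled example."""
    _, label = min(training, key=lambda pair: abs(pair[0] - hours))
    return label

print(predict(2.5))  # needs improvement
print(predict(7.5))  # good
```

Because every training example carries a label, predictions can be checked against known answers, which is what makes the learning supervised.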

64
Q

Supervised Learning Notable Approaches

A

SVMs - Support Vector Machines

NNs - Neural Networks

Deep Learning

Random Forest Models

Bayesian Networks

65
Q

Unsupervised learning

A
  • In this case, the algorithm trains itself. There isn’t a teacher who provides feedback. The algorithm uses unlabelled data that is not categorized as ‘good’ or as ‘performance that needs improvement’
  • The unsupervised ML model simply takes the data and sorts it into different groups. In our example, it will be able to show us two groups, ‘good performing’ and ‘performance that needs to be improved’; however, the ML model would not be able to tell us which one is which

Think Tennis:

  • Goal:

Step 1: You don’t know what the differences between tennis shots are; the goal is to figure out how many different types there are

Step 2: Once the differences are found, play around with the shots to see what the different types of shots are and what targets they can hit

66
Q

Reinforcement learning

A
  • A reward system is introduced. Every time a tennis player hits closer to a line than they used to in the past they will receive a reward (and nothing if the task is not performed better)
  • Instead of minimizing an error, we maximize a reward; in other words,
    we maximize the objective function

Goal: To maximize the reward (maximize objective function) to get the player to hit closer to the line

67
Q

Deep learning

A
  • The modern state-of-the-art approach to machine learning – leverages the power of neural networks and can be placed in both categories – supervised and unsupervised learning.
68
Q
  1. Common data science tools
A
  • There are two main types of tools one can use in data science
  • Programming languages and software
69
Q

Programming languages and Software

A
  • Enable you to devise programs that can execute specific operations. Moreover, you can reuse these programs whenever you need to execute the same action.
  • Python and R are the most popular programming languages for data science
  • Used for mathematical computations as well as general purpose
  • Cannot address problems specific to some domains
  • Languages that you should learn: Python, R
  • SQL for relational database management systems
  • Excel for complex computations and good visualizations quickly
  • SPSS for traditional data and applying statistical analysis
  • Big data: Apache Hadoop, Apache HBase, and MongoDB
  • PowerBI, Qlik, and Tableau for business intelligence visualizations.
70
Q
  1. Data science job positions
A

.

71
Q

Database administrator

A
  • handles the control of data; works with traditional data
72
Q

Data architect

A
  • designs the way data will be retrieved, processed, and consumed
73
Q

Data engineer

A
  • processes the obtained data so that it is ready for analysis
74
Q

BI analyst

A
  • performs analyses and reporting of historical data
75
Q

BI consultant

A
  • ‘external BI analyst’
76
Q

BI developer

A
  • performs analyses specifically designed for the company
77
Q

Unsupervised Learning Notable Approaches

A

k-means

Deep Learning

78
Q

Data scientist

A
  • employs traditional statistical methods or unconventional machine learning techniques for making predictions
79
Q

Data analyst

A
  • prepares advanced analyses
80
Q

Machine learning engineer

A

applies state-of-the-art ML techniques

81
Q
  1. Dispelling common misconceptions (1)
A
  • Misconception: 200,000 lines of data constitute big data. In fact, it is not just the volume that defines a data set as ‘big’

Variety, variability, velocity, veracity, and other characteristics play an important role as well

82
Q
  1. Dispelling common misconceptions (2)
A
  • Qualitative analyses such as SWOT are not used for quantitative analysis. Hence, they are not part of business intelligence
83
Q
  1. Dispelling common misconceptions (3)
A

Software like Excel, SPSS, and Stata can be successfully used by data science teams in many companies

84
Q
  1. Dispelling common misconceptions (4)
A
  • In deep learning, there is still a debate on WHY the algorithms used outperform all conventional methods
85
Q

Text Data Mining

A

The process of deriving valuable unstructured data from a text

Ex: A database that holds information on marketing expenditure drawn from academic papers, blog articles, private Excel files, online platforms, etc.