DataCamp: Data Science for Business Flashcards

1
Q

What is data science?

A

A set of methodologies for taking in the thousands of forms of data available today and using them to draw meaningful conclusions.

2
Q

What can data do?

A

Describe the current state of an organisation or process
Detect anomalous events
Diagnose the causes of events and behaviours
Predict future events

3
Q

Data Science Workflow

A

Data Collection
Exploration and Visualization
Experimentation and Prediction

4
Q

Applications of Data Science:

A

Traditional Machine Learning

Internet of Things - Refers to gadgets that are not standard computers but have the ability to transmit data, such as smart watches and home appliances.

Image classification using Deep Learning

5
Q

Data Pseudonymization

A

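Replacing personally identifying fields with artificial identifiers (pseudonyms) to protect privacy. Unlike full anonymization, records can still be linked back to an individual by someone who holds the mapping or key.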
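
As an illustration, one common way to pseudonymize a field is to replace it with a keyed hash. The sketch below assumes a hypothetical secret key and made-up customer records; it is not a complete privacy solution, only a small example of the idea.

```python
import hashlib
import hmac

# Hypothetical secret key (an assumption for this sketch); in practice it
# would be stored separately from the data it protects.
SECRET_KEY = b"example-key-not-for-production"


def pseudonymize(value: str) -> str:
    """Replace an identifier with a stable pseudonym derived from a keyed hash."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


# Made-up records containing PII (an email address).
records = [
    {"email": "ann@example.com", "purchase": 30},
    {"email": "bob@example.com", "purchase": 12},
    {"email": "ann@example.com", "purchase": 8},
]

# The email is dropped and replaced by a pseudonymous customer_id.
pseudonymized = [
    {"customer_id": pseudonymize(r["email"]), "purchase": r["purchase"]}
    for r in records
]
```

The same email always maps to the same pseudonym, so purchases can still be grouped per customer without exposing the address itself.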

6
Q

PII - Personally Identifiable Information

A

Data that can be linked to an individual

7
Q

Data Sources

A

Web events, logistics data, financial transactions, customer data

8
Q

Solicited Data (SD)

A
Obtained by requesting opinions from customers, for example through surveys, in-app questionnaires, focus groups and customer reviews.
Solicited data is used to:
1. De-risk decision making
2. Monitor quality
3. Create marketing collateral
9
Q

Types of Solicited Data

A

Qualitative (subjective): conversations, open-ended questions. Helps to generate hypotheses.
Quantitative: multiple choice, rating scales. Used to validate hypotheses.

10
Q

Net Promoter Score

A

A quantitative method of measuring stated preference: customers rate, on a scale of 0 to 10, how likely they are to recommend the product.
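
A minimal sketch of how the score is usually computed, assuming the standard convention that ratings of 9-10 count as promoters and 0-6 as detractors. The function name and the sample ratings are made up for illustration.

```python
def net_promoter_score(ratings):
    """Return NPS: percentage of promoters (9-10) minus percentage of detractors (0-6)."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100.0 * (promoters - detractors) / len(ratings)


# Made-up survey answers to "How likely are you to recommend us?" (0-10).
ratings = [10, 9, 9, 8, 7, 6, 3, 10, 5, 9]
score = net_promoter_score(ratings)  # 5 promoters, 3 detractors out of 10 -> 20.0
```

Ratings of 7 and 8 (passives) count towards the total number of responses but towards neither group, so the score ranges from -100 to 100.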

11
Q

Other data sources

A

APIs (a way of requesting data from third parties over the internet), such as those of Twitter, Wikipedia, Google Maps and Yahoo Finance
Public records

Mechanical Turk - Manual data input by humans

12
Q

Unstructured data

A

Data such as emails, text, video and audio files, web pages and social media posts. It is typically stored in document databases and retrieved with NoSQL queries.

13
Q

Tabular data

A

Data organised in rows and columns, stored in a relational database. Use SQL for data retrieval.
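
A small sketch of SQL retrieval, using Python's built-in sqlite3 module with an in-memory database. The table name, columns and rows are made up for illustration.

```python
import sqlite3

# In-memory relational database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
)
conn.executemany(
    "INSERT INTO customers (name, country) VALUES (?, ?)",
    [("Ann", "UK"), ("Bob", "US"), ("Cara", "UK")],
)

# Retrieve the names of all UK customers with a SQL query.
rows = conn.execute(
    "SELECT name FROM customers WHERE country = ? ORDER BY name", ("UK",)
).fetchall()
conn.close()
```

Each result row comes back as a tuple, one element per selected column.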

14
Q

Dashboards

A

A dashboard is a set of metrics, usually in the form of graphs, that update at a pre-defined frequency (daily, weekly, in real time, etc.). Dashboards help to visualize and explore collected data. A common dashboard element is a time series plot, which tracks a value over time.

15
Q

Ad hoc request

A

A one-off request for data that does not need to be repeated on a daily or weekly basis. Such requests should be specific and include context and a priority.

16
Q

A/B Testing

A

A/B testing is a type of experiment for de-risking a choice between two options, such as a change to a website, the addition of a new feature or the wording of an email subject line.

17
Q

A/B Testing steps

A
  1. Picking a metric to track
  2. Calculating the sample size using the baseline metric and the desired test sensitivity.
  3. Running the experiment
  4. Determining the significance
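
The last step can be sketched with a two-proportion z-test, one common way (not the only one) to check whether a difference in conversion rates is statistically significant. The function name and the visitor and conversion counts below are made up for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates B - A."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up results: variant A converts 200 of 2000 visitors, B converts 260 of 2000.
z, p_value = two_proportion_z_test(200, 2000, 260, 2000)
significant = p_value < 0.05  # 0.05 is a conventional threshold, not a rule
```

A small p-value suggests the observed difference is unlikely to be due to chance alone; the threshold should be chosen before running the experiment.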
18
Q

Machine Learning

A

A set of methods for making predictions based on existing data.

19
Q

Supervised learning

A

A subset of machine learning in which the data has both features and labels; a model learns from the features to predict the labels. Used for problems such as recommendation systems, email subject optimization and churn prediction.
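
To make features and labels concrete, here is a toy churn predictor using a 1-nearest-neighbour rule, chosen only because it is short. The feature names, training rows and labels are hypothetical.

```python
import math

# Hypothetical labelled training data.
# Features: (monthly_logins, support_tickets); label: what the customer did.
training = [
    ((25, 0), "stayed"),
    ((30, 1), "stayed"),
    ((2, 4), "churned"),
    ((1, 3), "churned"),
]


def predict(features):
    """Return the label of the closest training example (1-nearest neighbour)."""
    nearest = min(training, key=lambda example: math.dist(example[0], features))
    return nearest[1]


# Predict labels for two new, unlabelled customers.
prediction_low_activity = predict((3, 5))
prediction_high_activity = predict((28, 0))
```

The labels in the training data are what make this supervised: the model can only predict outcomes it has seen labelled examples of.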

20
Q

Clustering

A

ML algorithms that divide data into clusters. Applications include customer segmentation (where customers are divided into groups with common attributes), image categorisation and anomaly detection. Clustering is part of unsupervised learning, which uses data with only features and no labels.
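
As a rough sketch of customer segmentation, the example below runs a very small one-dimensional k-means on made-up monthly spend figures. The function, the data and the choice of starting centers are assumptions for illustration; real projects would normally use a library implementation.

```python
def kmeans_1d(values, centers, iterations=10):
    """Tiny 1-D k-means: assign each value to its nearest center, then recompute centers."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[idx].append(v)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Made-up monthly spend per customer; note there are no labels, only features.
monthly_spend = [12, 15, 14, 90, 95, 88, 11, 92]
centers, clusters = kmeans_1d(
    monthly_spend, centers=[min(monthly_spend), max(monthly_spend)]
)
```

With this data the algorithm separates low spenders from high spenders without ever being told which customer belongs to which group.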

21
Q

Special topics in Machine Learning: Time Series

A

Time series forecasting is any type of ML where time is an important feature. Time series data often shows periodic patterns, so analysing it can help spot seasonality.
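
One simple way to look past a seasonal pattern is a moving average whose window matches the length of the season. The sketch below uses made-up quarterly sales with a spike every fourth quarter; it illustrates smoothing only, not a forecasting model.

```python
def moving_average(series, window):
    """Return the simple moving average of `series` over `window` consecutive points."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


# Made-up quarterly sales: every fourth quarter spikes (seasonality).
quarterly_sales = [10, 12, 14, 30, 11, 13, 15, 31]

# A window of 4 averages over one full season, leaving the underlying trend.
trend = moving_average(quarterly_sales, window=4)
```

The raw series jumps up and down with the season, while the smoothed values rise steadily, which makes the underlying growth easier to see.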

22
Q

Natural Language Processing (NLP)

A

Refers to ML problems where the input data is text, such as customer reviews, tweets, medical records and email subjects. Example applications include classifying the sentiment of reviews and clustering medical records.

23
Q

Deep Learning (Neural Networks)

A

An area of ML used to solve more complex problems. It requires more data than traditional ML and works best where inputs are less structured, such as large amounts of text or images. It can make highly accurate predictions, but its main drawback is that those predictions are hard to explain.

24
Q

Explainable AI

A

Refers to methods that allow humans to understand the factors behind each prediction.
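
For a simple linear model, the factors behind a single prediction can be read off directly: each feature contributes its weight times its value. The weights, bias and customer values below are hypothetical, and more complex models need dedicated explanation methods.

```python
# Hypothetical linear churn-risk model (weights and bias invented for this sketch).
weights = {
    "support_tickets": 0.5,
    "monthly_logins": -0.25,
    "months_as_customer": -0.125,
}
bias = 1.0

# One made-up customer to explain.
customer = {"support_tickets": 4, "monthly_logins": 2, "months_as_customer": 8}

# Contribution of each feature to this customer's score.
contributions = {name: weights[name] * customer[name] for name in weights}
score = bias + sum(contributions.values())

# Rank the features by how strongly they pushed the prediction up or down.
ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
```

Here the number of support tickets pushes the risk score up the most, while tenure and login activity pull it down.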