ML Flashcards

1
Q

Data Analysis vs. Data Science

A

Data Analysis/Analytics - analyzes existing data to answer known questions

Data Science - explores the data to generate new ideas and questions

2
Q

Weak AI vs. Strong AI (artificial general intelligence, AGI)

A

Weak AI (Narrow AI) - focuses on performing specific tasks

Strong AI (artificial general intelligence, AGI) - aims to create intelligent machines that are indistinguishable from the human mind.

Artificial Superintelligence (ASI), also called super AI or superintelligence - a hypothetical form of AI that would surpass human intelligence.

3
Q

Loebner Prize competition

A

Loebner Prize competition - an annual competition based on the Turing test, in which human judges guess whether each response was produced by a human or by a computer program.

4
Q

Turing test

A

Turing test - A person known as the “interrogator” asks a series of questions and tries to tell computer-generated responses apart from human-generated ones. If the interrogator cannot reliably distinguish the machine from the human subjects, the machine passes the test. If the interrogator correctly identifies the human responses, the machine is not considered intelligent.

5
Q

Chinese Room Argument (CRA)

A

Imagine that a person who does not speak Chinese sits in a closed room. In the room, there is a book of Chinese language rules, phrases, and instructions. Another person, who is fluent in Chinese, passes notes written in Chinese into the room. With the help of the phrasebook, the person inside can select the appropriate response and pass it back to the Chinese speaker. The answers look fluent, yet the person understands no Chinese. John Searle's point: a computer that follows rules in the same way could pass the Turing test without truly understanding language.

6
Q

Supervised learning

A

Most common method. Trained on labeled data (inputs paired with known outputs). Predicts outcomes and trends. Best when you know what your output should be.
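A minimal sketch of learning from labeled data, using a 1-nearest-neighbor classifier as one simple supervised technique. The features, labels, and the `predict` helper are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical labeled training data: (features, label) pairs.
# Features are (hours_studied, hours_slept); the label is the known output.
TRAINING_DATA = [
    ((1.0, 4.0), "fail"),
    ((2.0, 5.0), "fail"),
    ((6.0, 7.0), "pass"),
    ((8.0, 8.0), "pass"),
]

def predict(features):
    """1-nearest-neighbor: return the label of the closest labeled example."""
    _, label = min(
        TRAINING_DATA,
        key=lambda example: math.dist(example[0], features),
    )
    return label

print(predict((7.0, 6.5)))  # → pass
print(predict((1.5, 4.5)))  # → fail
```

Because every training example carries a label, the model can only predict outputs it has seen, which is why supervised learning suits problems where the expected output is known.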

8
Q

Unsupervised learning

A

Data is not labeled; the machine must find the categorization itself. Identifies patterns. Good for clustering similar things together and detecting anomalies.
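A minimal sketch of clustering unlabeled data with 1-D k-means. The function name and the deterministic initialization (the first k distinct sorted values) are assumptions made for reproducibility; practical implementations usually use random or k-means++ initialization.

```python
def k_means_1d(values, k, iterations=10):
    """Minimal 1-D k-means: group unlabeled values into k clusters."""
    # Deterministic start: the k smallest distinct values act as initial centroids.
    centroids = sorted(set(values))[:k]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        # Assign each value to its nearest centroid.
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters

centroids, clusters = k_means_1d([1, 2, 3, 10, 11, 12], k=2)
print(centroids)  # → [2.0, 11.0]
print(clusters)   # → [[1, 2, 3], [10, 11, 12]]
```

No labels are given; the two groups emerge from the distances between values alone.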

9
Q

Reinforcement learning

A

You define the goal (a reward) and the rules of the environment; the machine learns how to achieve the goal through trial and error. Used in robotics and games (e.g., AlphaGo).
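A minimal sketch of reinforcement learning using tabular Q-learning on a hypothetical 5-cell corridor: reaching the last cell gives reward +1, and the agent is never told which direction to move. The environment, hyperparameters (`ALPHA`, `GAMMA`, `EPSILON`), and helper names are illustrative assumptions.

```python
import random

N_STATES = 5          # hypothetical corridor: cells 0..4, cell 4 is the goal
ACTIONS = (-1, +1)    # move left or move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2  # learning rate, discount, exploration rate

def step(state, action):
    """Environment rules: move along the corridor; reaching the goal pays +1 and ends the episode."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def train(episodes=200, max_steps=100, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action_index]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            if rng.random() < EPSILON:
                a = rng.randrange(2)  # explore: random action
            else:
                best = max(q[state])  # exploit: best known action, ties broken randomly
                a = rng.choice([i for i in range(2) if q[state][i] == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update toward reward + discounted best future value.
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][a] += ALPHA * (target - q[state][a])
            state = next_state
            if done:
                break
    return q

q_table = train()
policy = ["left" if q[0] > q[1] else "right" for q in q_table[:-1]]
print(policy)  # → ['right', 'right', 'right', 'right']
```

Only the reward and the movement rules are specified; the "always move right" strategy is learned from experience.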

10
Q

Insight generation

A

Analyze more data. Faster. Uncover patterns in the data.

11
Q

ETL system

A

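Extract, Transform, Load - extract data from its sources, transform it into a clean, consistent format, and load it into a target store (e.g., a data warehouse)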
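A minimal sketch of the three ETL steps, assuming a small hypothetical CSV source and an in-memory SQLite database as the target store. The column names and cleaning rules are illustrative only.

```python
import csv
import io
import sqlite3

# Hypothetical raw source data (in practice: files, APIs, databases).
RAW_CSV = """order_id,amount,currency
1, 19.99 ,usd
2,5.00,USD
3,,usd
"""

def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: clean and standardize; drop rows with a missing amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue
        cleaned.append((int(row["order_id"]), float(amount), row["currency"].strip().upper()))
    return cleaned

def load(rows, conn):
    """Load: write the cleaned rows into the target store."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, amount REAL, currency TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT * FROM orders").fetchall())  # → [(1, 19.99, 'USD'), (2, 5.0, 'USD')]
```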

13
Q

Data Warehouse, Data Lake

A

Both: load (via ETL) data into one place for easier analysis
Data warehouse - structured, processed data stored for a specific purpose
Data lake - raw data of any type, stored for any future use

14
Q

AI flywheel

A

More data -> Better predictions -> Better customer experience -> More traffic -> More data (the cycle repeats)

15
Q

EMUC

A

Employees, Metrics, Users, Customer

16
Q

Scrum vs Kanban

A

Kanban is usually more suitable for ML projects because we cannot predict how long tasks will take, which makes fixed-length Scrum sprints hard to plan

17
Q

Training data

A

Data used to train the model (typically about 80% of all data)

18
Q

Validation data

A

Data used to evaluate model performance during development, e.g., to tune hyperparameters (about 10% of all data)

19
Q

Testing data

A

Data used to test the final model once it has been developed and fine-tuned (about 10% of all data)
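A minimal sketch of an 80/10/10 train/validation/test split. The function name and the fixed seed are assumptions; shuffling first keeps any ordering in the source data from leaking into one of the sets.

```python
import random

def train_val_test_split(data, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle, then split into training, validation, and test sets (the rest goes to test)."""
    items = list(data)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * train_frac)
    n_val = int(len(items) * val_frac)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test

train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # → 80 10 10
```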

20
Q

Precision

A

Confusion matrix (TP, FP, TN, FN = true/false positives/negatives):
actual \ predicted   NO    YES
NO                   TN    FP
YES                  FN    TP

Precision = TP / (TP + FP) - optimize it when false positives (FP) are costly
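A minimal sketch that fills in the confusion-matrix cells from actual and predicted labels and then computes precision. The labels are hypothetical, with `True` standing for YES; returning 0.0 when there are no positive predictions is a convention assumed here.

```python
def confusion_counts(actual, predicted):
    """Count TN, FP, FN, TP for binary labels (True = YES / positive)."""
    tn = fp = fn = tp = 0
    for a, p in zip(actual, predicted):
        if a and p:
            tp += 1
        elif a and not p:
            fn += 1
        elif not a and p:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp

def precision(tp, fp):
    """Of everything predicted YES, the fraction that was actually YES."""
    return tp / (tp + fp) if tp + fp else 0.0

actual    = [True, True, True, False, False, False, False, True]
predicted = [True, True, False, True, False, False, False, True]
tn, fp, fn, tp = confusion_counts(actual, predicted)
print(tn, fp, fn, tp)     # → 3 1 1 3
print(precision(tp, fp))  # → 0.75
```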

21
Q

Recall

A

Recall = TP / (TP + FN) - optimize it when false negatives (FN) are costly
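A minimal sketch of recall from confusion-matrix counts; the counts are hypothetical. The second call shows why recall alone can mislead: a model that predicts YES for everything never produces a false negative.

```python
def recall(tp, fn):
    """Of all actual YES cases, the fraction the model found: TP / (TP + FN)."""
    return tp / (tp + fn) if tp + fn else 0.0

# Hypothetical counts: 10 actual positives, the model finds 8 of them.
print(recall(tp=8, fn=2))   # → 0.8
# Predicting YES for everything finds all 10, so FN = 0 (precision would suffer instead).
print(recall(tp=10, fn=0))  # → 1.0
```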

22
Q

F1 score

A

F1 = 2 × Precision × Recall / (Precision + Recall), the harmonic mean of precision and recall. Use it when both FP and FN are important.
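A minimal sketch computing F1 directly from TP, FP, and FN counts; the counts are hypothetical. Returning 0.0 when precision and recall are both zero is an assumed convention to avoid division by zero.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f1_score(tp=3, fp=1, fn=1))  # precision 0.75, recall 0.75 → 0.75
print(f1_score(tp=8, fp=8, fn=2))  # precision 0.5, recall 0.8 → about 0.615
```

Because F1 is a harmonic mean, a weak precision or a weak recall pulls the score down more than an arithmetic mean would.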

23
Q

Hierarchy of needs

A

From the top of the pyramid to its base:
1) AI
2) Data Science
3) Data Analytics
4) Data Infrastructure
5) Data Collection

24
Q

Triple track agile

A

Three simultaneous tracks: Discovery, Data, and Delivery (run in parallel so the data scientists are not blocked)

25
Q

GDPR

A

General Data Protection Regulation (EU)

26
Q

COPPA

A

Children's Online Privacy Protection Act (US)

27
Q

FOIP

A

Freedom of Information and Protection of Privacy