ML Flashcards
Data Analysis vs. Data Science
Data Analysis/Analytics - analyzes existing data to answer known questions
Data Science - explores new ideas and questions using data
Weak AI vs. Strong AI (artificial general intelligence, AGI)
Weak AI (Narrow AI) - focuses on performing specific tasks
Strong AI (artificial general intelligence, AGI) - aims to create intelligent machines that are indistinguishable from the human mind.
Artificial Super Intelligence (ASI), super AI, superintelligence - a hypothetical form of AI that would surpass human intelligence
Loebner Prize competition
Loebner Prize competition - a human judge guesses whether the output was created by a human or a computer.
Turing test
Turing test - An “interrogator” asks a series of questions and tries to tell computer-generated responses from human-generated ones. If the interrogator cannot reliably tell the machine from the human subjects, the machine passes the test. If the interrogator can reliably identify which responses came from the machine, the machine fails and is not considered intelligent.
Chinese Room Argument (CRA)
Imagine a person who does not speak Chinese sitting in a closed room. In the room there is a book of Chinese language rules, phrases and instructions. Another person, who is fluent in Chinese, passes notes written in Chinese into the room. With the help of the phrasebook, the person inside the room can select the appropriate response and pass it back to the Chinese speaker. The replies look fluent, yet the person inside understands no Chinese. John Searle uses this to argue that a program that manipulates symbols by following rules does not thereby understand them.
Supervised learning
Most common method. Labeled data. Predict trends. Best when you know what your output should be.
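A minimal sketch of the idea of learning from labeled data, using a 1-nearest-neighbour classifier. The training data, the feature meanings, and the function name predict are hypothetical and chosen only for illustration; real projects would normally use a library and far more data.

```python
import math

# Hypothetical labeled training data: (height_cm, weight_kg) -> label.
training_data = [
    ((150.0, 50.0), "small"),
    ((155.0, 55.0), "small"),
    ((180.0, 85.0), "large"),
    ((185.0, 90.0), "large"),
]


def predict(features):
    """Return the label of the closest training example (1-nearest neighbour)."""
    _, nearest_label = min(
        training_data,
        key=lambda example: math.dist(example[0], features),
    )
    return nearest_label


print(predict((152.0, 52.0)))  # closest examples are labeled "small"
print(predict((182.0, 88.0)))  # closest examples are labeled "large"
```

The labels are what make this supervised: the known outputs in the training data decide every prediction.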
Unsupervised learning
Data is not labeled; the machine has to find the categories itself. Good for clustering similar items, detecting anomalies, and identifying patterns.
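Clustering without labels can be sketched with a tiny one-dimensional k-means. This is an illustrative sketch only: the function name k_means_1d, the sample values, and the choice to start the centroids at the minimum and maximum are assumptions, not part of the flashcards.

```python
def k_means_1d(values, centroids, iterations=10):
    """Group 1-D values around the given starting centroids.

    Returns (final_centroids, clusters), where clusters[i] holds the
    values assigned to centroid i.
    """
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for value in values:
            nearest = min(
                range(len(centroids)),
                key=lambda i: abs(value - centroids[i]),
            )
            clusters[nearest].append(value)
        # Update step: move each centroid to the mean of its cluster.
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its old centroid
                centroids[i] = sum(members) / len(members)
    return centroids, clusters


# Hypothetical unlabeled measurements with two obvious groups.
values = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
centroids, clusters = k_means_1d(values, centroids=[min(values), max(values)])
print(clusters)
```

No labels are supplied anywhere; the two groups emerge from the distances between the values alone.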
Reinforcement learning
You define the goal and the rules (rewards); the machine learns how to achieve the goal through trial and error. Used in robotics and games (e.g. AlphaGo).
Insight generation
Analyze more data. Faster. Uncover patterns in the data.
ETL system
Extract, Transform, Load - pull data from source systems, clean and reshape it, and load it into a target store
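The three ETL steps can be sketched with the Python standard library alone. Everything specific here is hypothetical: the sample CSV, the payments table, and the cleaning rules (lower-casing names, dropping rows whose amount does not parse) are assumptions made for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical source data; in practice this would come from a file, API, or database.
RAW_CSV = """name,amount
alice, 10.5
BOB,20
carol,not_a_number
"""


def extract(text):
    """Extract: read raw rows from the CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalise names, parse amounts, drop rows that fail to parse."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows with an unparseable amount
        cleaned.append((row["name"].strip().lower(), amount))
    return cleaned


def load(rows, connection):
    """Load: write the cleaned rows into the target table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS payments (name TEXT, amount REAL)"
    )
    connection.executemany("INSERT INTO payments VALUES (?, ?)", rows)
    connection.commit()


connection = sqlite3.connect(":memory:")  # in-memory stand-in for a warehouse
load(transform(extract(RAW_CSV)), connection)
loaded = connection.execute(
    "SELECT name, amount FROM payments ORDER BY name"
).fetchall()
print(loaded)
```

Keeping the three steps as separate functions makes each one easy to test and replace on its own.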
Data Warehouse, Data Lake
Load (ETL) data to one place for easier analysis
Data warehouse - structured, processed data stored for a specific purpose
Data lake - raw data in any format, kept for any future use
AI flywheel
Data -> Predictions -> Customer Experience -> Traffic -> Data (the cycle repeats)
EMUC
Employees, Metrics, Users, Customers
Scrum vs Kanban
Kanban is more suitable because we cannot predict how much time a task will take to complete
Training data
Data used to train a model (80%)
Validation data
Data used to validate model performance (10% of all data)
Testing data
Data used to test the model once it is developed and fine-tuned (10% of all data)
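The 80/10/10 split from the training, validation, and testing cards can be sketched as below. The function name split_dataset, its parameters, and the fixed seed are assumptions for illustration; the ratios are the ones given on the cards and are a common convention, not a fixed rule.

```python
import random


def split_dataset(records, train_fraction=0.8, validation_fraction=0.1, seed=42):
    """Shuffle records and split them into train, validation, and test lists.

    Whatever is left after the train and validation slices becomes the test set.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    train_end = round(len(shuffled) * train_fraction)
    validation_end = train_end + round(len(shuffled) * validation_fraction)
    return (
        shuffled[:train_end],
        shuffled[train_end:validation_end],
        shuffled[validation_end:],
    )


train, validation, test = split_dataset(range(100))
print(len(train), len(validation), len(test))
```

Shuffling before slicing avoids putting, for example, only the oldest records in the training set.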
Precision
True/False - Positive/Negative
Confusion matrix (rows: actual, columns: predicted):
actual \ predicted   No   Yes
No                   TN   FP
Yes                  FN   TP
Precision = TP / (TP + FP) - optimize it when false positives are costly
Recall
Recall = TP / (TP + FN) - optimize it when false negatives are costly
F1 score
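F1 = 2 * Precision * Recall / (Precision + Recall), the harmonic mean of precision and recall. Use it when both false positives and false negatives are important.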
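A small worked example that ties the precision, recall, and F1 cards together. The label lists are made up, and returning 0.0 when a denominator is zero is an assumed convention for this sketch, not something the cards specify.

```python
def confusion_counts(actual, predicted):
    """Count TP, FP, FN, TN for binary labels (1 = yes, 0 = no)."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if a and p:
            tp += 1
        elif not a and p:
            fp += 1
        elif a and not p:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision(tp, fp):
    # Assumed convention: 0.0 when nothing was predicted positive.
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    # Assumed convention: 0.0 when there are no actual positives.
    return tp / (tp + fn) if tp + fn else 0.0


def f1_score(p, r):
    # Harmonic mean of precision and recall.
    return 2 * p * r / (p + r) if p + r else 0.0


# Hypothetical labels: 4 actual positives, 3 predicted positives.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(actual, predicted)
p = precision(tp, fp)  # 2 / (2 + 1)
r = recall(tp, fn)     # 2 / (2 + 2)
print(tp, fp, fn, tn)
print(p, r, f1_score(p, r))
```

Here precision is 2/3 and recall is 1/2, so F1 works out to 4/7, which sits between the two and closer to the lower one.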
Hierarchy of needs
1) AI
2) Data Science
3) Data Analytics
4) Data Infrastructure
5) Data Collection
Triple track agile
Three simultaneous tracks: Discovery, Data, and Delivery (so that the data scientists are not blocked)
GDPR
General Data Protection Regulation (EU)
COPPA
Children's Online Privacy Protection Act (US)
FOIP
Freedom of Information and Protection of Privacy