Module 1 Flashcards

1
Q

_______ is about extracting knowledge from data.

A

Machine Learning

2
Q

It is a research field at the intersection of statistics, artificial intelligence, and computer science.

A

Machine Learning

3
Q

It is also known as predictive analytics or statistical learning.

A

Machine Learning

4
Q

A field of study concerned with giving computers the ability to learn without being explicitly programmed.

A

Machine Learning

5
Q

It is a discipline of Artificial Intelligence (AI) that provides machines with the ability to automatically learn from data and past experiences while identifying patterns to make predictions with minimal human intervention.

A

Machine Learning

6
Q

True or False.
Machine Learning (excluding deep learning) is a study of learning algorithms.

A

False.
(INCLUDING deep learning)

7
Q

What is Data Science concerned with?

A
  • Collection, preparation, and analysis of data.
  • Leverages AI/ML, research, industry expertise, and statistics to make business decisions.
8
Q

What is Artificial Intelligence concerned with?

A
  • Technology for machines to understand/interpret, learn, and make “intelligent” decisions.
  • Includes Machine Learning among many fields.
9
Q

What is Machine Learning concerned with?

A
  • Algorithms that help machines improve through supervised, unsupervised, and reinforcement learning.
  • Subset of AI and Data Science
10
Q

Beyond that, automation by machine learning can _________ risks caused by fatigue or inattention.

A

Mitigate

11
Q

__________ are better suited than humans for tasks that are routine, repetitive, or tedious.

A

Learning Machines

12
Q

Machines driven by algorithms designed by humans are able to learn __________ and _________ and to fulfill tasks desired by humans.

A

Latent Rules and Inherent Patterns

12
Q

This is a collection of data used in machine learning tasks.

A

Dataset

12
Q

Machine learning methods enable computers to operate ___________ without explicit programming.

A

Autonomously

12
Q

Each data record is called a “_____”.

A

Sample

13
Q

This is the process of creating a model from data.

A

Learning (Training)

13
Q

What are the different types of machine learning?

A
  • Supervised Machine Learning
  • Unsupervised Machine Learning
  • Semi-supervised Machine Learning
  • Reinforcement Learning
13
Q

It is a dataset used in the training process.

A

Training Set

14
Q

The dataset used in the testing process is called “________”, and each sample is called a/an “_______”.

A

Test Set; Test Sample

14
Q

Events or attributes that reflect the performance or nature of a sample in particular aspects are called “_______”.

A

Features

14
Q

In training sets, each sample is referred to as a “_________”.

A

Training Sample

15
Q

This refers to the process of using the model obtained after learning for prediction.

A

Testing

16
Q

What is the Machine Learning Workflow?

A

(PayDay MayDay)
- Project Setup
- Data Preparation
- Modeling
- Deployment

17
Q

What steps are under Project Setup in the Machine Learning Workflow?

A
  • Understand the business goals.
  • Choose the solution to your problem.
18
Q

What steps are under Data Preparation in the Machine Learning Workflow?

A

(C CED)
- Data Collection
- Data Cleaning
- Feature Engineering
- Split the Data
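
The last step, splitting the data, can be sketched in plain Python. This is only an illustration: the function name `train_test_split`, the 25% test ratio, and the toy dataset are assumptions made for the example, not part of the module.

```python
import random

def train_test_split(samples, test_ratio=0.25, seed=0):
    """Shuffle the samples and split them into a training set and a test set."""
    if not 0 < test_ratio < 1:
        raise ValueError("test_ratio must be between 0 and 1")
    shuffled = list(samples)                 # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)    # fixed seed keeps the split repeatable
    n_test = max(1, round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical dataset: each sample is a (feature, label) pair.
dataset = [(x, 2 * x + 1) for x in range(8)]
train_set, test_set = train_test_split(dataset)
```

Every sample lands in exactly one of the two sets, so the model is later tested on samples it did not see during training.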

19
Q

What steps are under Modeling in the Machine Learning Workflow?

A

(Hi There, My Ass)
- Hyperparameter tuning
- Train your Models
- Make predictions
- Assess Model Performance

19
Q

What occurs in Phase 1 of Machine Learning?

A

(PLT)
- Processing
- Learning
- Testing

20
Q

What steps are under Deployment in the Machine Learning Workflow?

A

(DMI)
- Deploy the model
- Monitor model performance
- Improve your model

21
Q

What occurs in Phase 2 of Machine Learning?

A

New Data + Trained Model -> Prediction -> Predicted Data

21
Q

How does Machine Learning work? (What are the two phases in Machine Learning?)

A

Phase 1: Learning
Phase 2: Prediction
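
A minimal sketch of the two phases, assuming a straight-line model fitted by least squares (the module does not prescribe a model; `learn`, `predict`, and the training data are hypothetical names and values chosen for illustration):

```python
def learn(training_set):
    """Phase 1 (learning): fit y = slope * x + intercept by least squares."""
    n = len(training_set)
    mean_x = sum(x for x, _ in training_set) / n
    mean_y = sum(y for _, y in training_set) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in training_set)
    sxx = sum((x - mean_x) ** 2 for x, _ in training_set)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept          # the "trained model"

def predict(model, new_x):
    """Phase 2 (prediction): apply the trained model to new data."""
    slope, intercept = model
    return slope * new_x + intercept

# Hypothetical training samples that follow y = 2x + 1.
training_set = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]
model = learn(training_set)
prediction = predict(model, 10.0)    # new data + trained model -> prediction
```

The model is created once from the training set and can then be reused on any number of new inputs.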

22
Q

What are some examples of Machine Learning Languages?

A

PRC
- Python
- R
- C++

23
Q

What are some examples of Big Data Tools?

A
  • MemSQL
  • Apache Spark
24
Q

What are some examples of General Machine Learning Frameworks?

A

NSN
- NumPy
- Scikit-learn
- NLTK

25
Q

What are some examples of Data Analysis and Visualization Tools?

A

PM JWT
- Pandas
- Matplotlib
- Jupyter Notebook
- Weka
- Tableau

26
Q

What are some examples of ML Frameworks for Neural Network Modelling?

A

PK CTT
- PyTorch
- Keras
- Caffe2
- TensorFlow & TensorBoard

27
Q

What are the Top Programming languages for Machine Learning?

A
  • Python
  • R
  • Java
  • Julia
  • Scala
  • C++
  • JavaScript
  • Lisp
  • Haskell
  • Go
28
Q

Why use Python for Data Science?

A

EEF CSS
- Easy-to-read Syntax
- Extensive Libraries and Frameworks
- Flexibility
- Compatibility with other Languages
- Strong Community Support
- Scalability and Performance

29
Q

This Python library is a collection of functions for scientific computing.

A

SciPy

29
Q

This Python library is one of the fundamental packages for scientific computing.

A

NumPy

29
Q

This Python library is a very popular tool and the most prominent Python library for Machine Learning.

A

Scikit-learn

29
Q

This is an interactive environment for running code in the browser.

A

Jupyter Notebook

29
Q

This is a Python distribution made for large-scale data processing, predictive analysis, and scientific computing.

A

Anaconda

30
Q

This Python library is a library for data wrangling and analysis.

A

Pandas

30
Q

This Python library is used for Machine Learning.

A

PyTorch/TensorFlow

30
Q

This Python library is the prime scientific plotting library.

A

Matplotlib

31
Q

This Python library is used in interacting with SQL databases.

A

SQLModel

32
Q

This Python library is used for Web Crawling.

A

Scrapy

33
Q

This Python library is used for Deep Learning.

A

Keras

34
Q

What are the applications for Machine Learning?

A

AI METH
- Automobile
- Insurance
- Manufacturing
- E-commerce
- Transportation
- Healthcare

35
Q

What are some quality problems that real data possess?

A
  • Incompleteness
  • Noise
  • Inconsistency
36
Q

It is an observation that seems to be distant from other observations.

A

Outlier

37
Q

It is one observation that follows a different logic or generative process than the other observations.

A

Outlier
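
One common way to flag observations that sit far from the rest is the interquartile-range (IQR) rule. The sketch below is an assumption for illustration (the module does not name a detection method); `find_outliers`, the 1.5 multiplier, and the sample values are hypothetical.

```python
import statistics

def find_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the first/third quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)   # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical measurements: one reading is far away from the others.
readings = [10, 12, 11, 13, 12, 11, 95]
outliers = find_outliers(readings)
```

Whether a flagged value is an error or a genuine but rare observation still has to be judged from the data's context.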

38
Q

_________ is the practice of cleaning, altering, and reorganizing raw data prior to processing and analysis.

A

Preprocessing

39
Q

Data Preprocessing is also known as “__________”

A

Data Preparation

40
Q

It is an important step before processing to prepare the data for analysis or modeling by cleaning and transforming it.

A

Data Preprocessing

41
Q

What are the key steps in Data Preprocessing?

A

PCR TEV
- Data Profiling
- Data Cleansing
- Data Reduction
- Data Transformation
- Data Evaluation
- Data Validation

42
Q

What are the two main categories of Preprocessing?

A

Data Cleansing and Feature Engineering

43
Q

This is composed of techniques for cleaning messy data.

A

Data Cleansing

44
Q

This features techniques used by data scientists to organize the data in ways that make it more efficient to train data models and run inferences against them.

A

Feature Engineering

45
Q

What is/are done during Data Cleansing?

A
  • Identify and sort out missing data
  • Reduce noisy data
  • Identify and remove duplicates
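
As a rough sketch of two of these cleansing tasks (removing duplicates and identifying missing data), assuming each record is a tuple and a missing field is stored as `None`; the function name and the records are hypothetical:

```python
def remove_duplicates(records):
    """Drop repeated records while keeping the first occurrence in order."""
    seen = set()
    unique = []
    for record in records:
        if record not in seen:
            seen.add(record)
            unique.append(record)
    return unique

# Hypothetical (name, age) records: one exact duplicate, one missing age.
records = [("ana", 31), ("ben", 45), ("ana", 31), ("cy", None)]
unique_records = remove_duplicates(records)
records_with_missing = [r for r in unique_records if None in r]
```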
46
Q

What is/are done during Feature Engineering?

A
  • Feature scaling or normalization
  • Data Reduction
  • Discretization
  • Feature Encoding
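
Two of these techniques, feature scaling and feature encoding, can be sketched as follows. Min-max scaling and one-hot encoding are chosen here as assumptions (the card does not name specific methods), and the helper names and sample data are hypothetical.

```python
def min_max_scale(values):
    """Rescale numeric values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant feature: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def one_hot_encode(labels):
    """Turn each categorical label into a 0/1 vector, one column per category."""
    categories = sorted(set(labels))  # fixed column order
    return [[1 if label == c else 0 for c in categories] for label in labels]

ages = [20, 30, 40]
colors = ["red", "green", "red"]
scaled_ages = min_max_scale(ages)          # [0.0, 0.5, 1.0]
encoded_colors = one_hot_encode(colors)    # columns: green, red
```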
47
Q

This is used to understand the main characteristics of the data, discover patterns, spot anomalies, test a hypothesis, or check assumptions.

A

Exploratory Data Analysis (EDA)

48
Q

This often employs data visualization methods.

A

Exploratory Data Analysis (EDA)

49
Q

What are the different activities done during Exploratory Data Analysis?

A

VSAUCE (VSOCH)
- Visualization
- Summary Statistics
- Outlier Detection
- Correlation Analysis
- Hypothesis Testing

50
Q

This activity in EDA involves creating plots and charts to visualize data distributions and relationships.

A

Visualization

51
Q

This activity in EDA involves calculating measures like mean, median, variance, and standard deviation.

A

Summary Statistics
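
These measures can be computed with Python's standard `statistics` module. The sample values are made up for illustration, and the population forms of variance and standard deviation are used here as an assumption (the sample forms are `statistics.variance` and `statistics.stdev`).

```python
import statistics

# Hypothetical feature values.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(values)            # arithmetic average
median = statistics.median(values)        # middle value of the sorted data
variance = statistics.pvariance(values)   # population variance
std_dev = statistics.pstdev(values)       # population standard deviation
```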

52
Q

This activity in EDA involves identifying unusual data points.

A

Outlier Detection

53
Q

This activity in EDA involves examining relationships between variables.

A

Correlation Analysis
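
One way to quantify a linear relationship between two variables is the Pearson correlation coefficient. The sketch below assumes that measure (the card does not specify one) and uses hypothetical data.

```python
import math

def pearson_correlation(xs, ys):
    """Pearson correlation: +1 perfect positive, -1 perfect negative, 0 no linear relation."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical variables: sales rise with hours, stock falls as sales rise.
hours = [1, 2, 3, 4, 5]
sales = [2, 4, 6, 8, 10]
stock = [10, 8, 6, 4, 2]
r_positive = pearson_correlation(hours, sales)
r_negative = pearson_correlation(sales, stock)
```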

54
Q

This activity in EDA involves testing initial assumptions about the data.

A

Hypothesis Testing

55
Q

It is the act of filling in missing values by estimating them.

A

Imputation
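
A minimal sketch of imputation, assuming missing values are stored as `None` and are estimated with the mean of the observed values (mean imputation is only one possible estimator; `impute_mean` and the data are hypothetical):

```python
def impute_mean(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

# Hypothetical column with two missing entries.
heights = [4.0, None, 6.0, None, 8.0]
filled_heights = impute_mean(heights)
```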

56
Q

This refers to the process of taking a trained ML model and making it available for use in real-world applications.

A

Machine Learning Model Deployment

57
Q

What are the different steps in Machine Learning Model Deployment?

A
  • Training
  • Validation
  • Deployment
  • Monitoring
58
Q

This involves data preprocessing, feature engineering, and rigorous testing to ensure the model is robust and ready for real-world scenarios.

A

Training

59
Q

This step ensures that the infrastructure can handle the model’s computational requirements, which requires validation and effective scalability testing before models are deployed.

A

Validation

60
Q

What are the different steps involved in the Deployment of a model?

A

DDC SCC
- Defining how to extract or process data in real time.
- Determining the storage required for these processes.
- Collection and predictions of model and data patterns.
- Setting up APIs, tools and other software environments to support and improve predictions.
- Configuring the hardware (cloud or on-prem environments) to help support the ML model.
- Creating a pipeline for continuous training and parameter tuning.

61
Q

What are the best practices for successful ML Model Deployment?

A
  • Choosing the right infrastructure
  • Effective Versioning and Tracking
  • Robust Testing and Validation
  • Implementing Monitoring and Alerting
62
Q

It covers the ethical and moral obligations of sharing, collecting, and using data, focused on ensuring that data is used fairly, for good.

A

Data Ethics

63
Q

This principle of Data Ethics goes as follows:

The first principle of data ethics is that an individual has ownership over their personal information.

A

Ownership

63
Q

This principle of Data Ethics goes as follows:

Just as it’s considered stealing to take an item that doesn’t belong to you, it’s unlawful and unethical to collect someone’s personal data without their consent.

A

Ownership

63
Q

What are the 5 principles of Data Ethics?

A
  • Ownership
  • Transparency
  • Privacy
  • Intention
  • Outcomes
63
Q

This principle of Data Ethics goes as follows:
In addition to owning their personal information, data subjects have a right to know how you plan to collect, store, and use it.

A

Transparency

63
Q

This principle of Data Ethics goes as follows:
Before collecting data, ask yourself why you need it, what you’ll gain from it, and what changes you’ll be able to make after analysis.

A

Intention

63
Q

This principle of Data Ethics goes as follows:
Another ethical responsibility that comes with handling data is ensuring data subjects’ ___________. Even if a customer gives your company consent to collect, store, and analyze their personally identifiable information (PII), that does not mean they want it publicly available.

A

Privacy

64
Q

This principle of Data Ethics goes as follows:
If your intention is to hurt others, profit from your subjects’ weaknesses, or any other malicious goal, it’s not ethical to collect their data.

A

Intention

64
Q

This principle of Data Ethics goes as follows:
Even when intentions are good, the outcome of data analysis can cause inadvertent harm to individuals or groups of people. This is called a disparate impact.

A

Outcomes

65
Q

What are the Data Privacy Regulations (New Rules of Data)?

A
  1. Trust over Transactions - this first rule is all about consent.
  2. Insight over Identity - Avoid compromising both privacy and security.
  3. Flows over Silos - No need to work in silos; rather, CIOs and CDOs can work together to facilitate the flow of insights.
66
Q

What are the Data Subject Rights?

A

The right:
- To be informed
- To file a complaint
- To damages
- To object
- To access
- To rectify
- To erasure or blocking
- To data portability

67
Q

It is a set of principles and processes for data collection, management, and use.

A

Data Governance

67
Q

It ensures data is accurate, consistent, and available while protecting data privacy and security.

A

Data Governance

68
Q

It is a set of policies, procedures, and standards that implements data governance for an organization.

A

Data Governance Framework

69
Q

_____________ describes what to do, the ______________ describes how to do it.

A

Data Governance; Data Governance Framework

69
Q

What are the pillars of Data Governance?

A
  • Ownership and Accountability
  • Data Quality
  • Data Protection and Safety
  • Data Use and Availability
  • Data Management