MODULE 1 Flashcards

M1S1 - M1S2

1
Q

Algorithm where samples are used for training.

A

Machine Learning Algorithm

2
Q

It is a research field at the intersection of statistics, artificial intelligence, and computer science.

A

Machine Learning

3
Q

It is the practice of cleaning, altering, and reorganizing raw data prior to processing and analysis

A

Preprocessing

4
Q

Data that contains inconsistent records.

A

Inconsistency

5
Q

Data that contains incorrect records or exceptions.

A

Noise

6
Q

Creating plots and charts to visualize data distributions and relationships.

A

Visualization

7
Q

T/F
The performance of ML algorithms adaptively improves with an increase in the number of available samples during the ‘training’ processes.

A

FALSE (the process is called ‘learning’, not ‘training’)

8
Q

T/F
Data reduction is a data cleansing technique.

A

FALSE

9
Q

T/F
Reducing noise in data is a feature engineering technique.

A

FALSE

10
Q

It covers the ethical and moral obligations of collecting, sharing, and using data, focused on ensuring that data is used fairly, for good.

A

Data Ethics

11
Q

Best Practices for Successful ML Model Deployment

A
  1. Choosing the Right Infrastructure.
  2. Effective Versioning and Tracking
  3. Robust Testing and Validation
  4. Implementing Monitoring and Alerting
12
Q

Data: _____________
Learning Algorithms: ______________
Basic Understanding: ______________

A

Experience (E)
Task (T)
Measure (P)

13
Q

Even when intentions are good, the ___________ of data analysis can cause inadvertent harm to individuals or groups of people.

A

Outcome

14
Q

Once deployed, models need to be continuously monitored.

A

Monitoring

15
Q

A field of study concerned with giving computers the ability to learn without being explicitly programmed.

A

Machine Learning

16
Q

It is a collection of data used in machine learning tasks.

A

Dataset

17
Q

Feature Engineering techniques

A
  • Feature scaling or normalization
  • Data reduction
  • Discretization
  • Feature encoding
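
Three of the techniques above (scaling, discretization, and encoding) can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the helper names, the age values, and the bin edges are all hypothetical, and data reduction is left out.

```python
def min_max_scale(values):
    """Feature scaling: map values linearly onto the range [0, 1].

    Assumes the values are not all identical (otherwise hi - lo is zero).
    """
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def discretize(value, edges, labels):
    """Discretization: return the label of the first bin whose upper edge
    is >= value; values above every edge fall into the last label."""
    for edge, label in zip(edges, labels):
        if value <= edge:
            return label
    return labels[-1]


def one_hot(values):
    """Feature encoding: one 0/1 column per category, in sorted order."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


# Hypothetical feature values, chosen only for illustration.
ages = [20, 30, 40, 60]
colors = ["red", "green", "red"]

scaled = min_max_scale(ages)
binned = [discretize(a, edges=[29, 49], labels=["young", "middle", "senior"])
          for a in ages]
encoded = one_hot(colors)

print(scaled)   # [0.0, 0.25, 0.5, 1.0]
print(binned)   # ['young', 'middle', 'middle', 'senior']
print(encoded)  # [[0, 1], [1, 0], [0, 1]]
```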
18
Q

Calculating measures like mean, median, variance, and standard deviation.

A

Summary Statistics
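
As a quick illustration, the four measures named in this card can be computed with Python's standard `statistics` module. The sample values are hypothetical, and the sketch assumes the population (not sample) variance and standard deviation are wanted.

```python
import statistics

# Hypothetical sample values, chosen only for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]

summary = {
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "variance": statistics.pvariance(values),  # population variance
    "std_dev": statistics.pstdev(values),      # population standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

For sample-based estimates, `statistics.variance` and `statistics.stdev` would be used instead.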

19
Q

Data Cleansing techniques

A
  • Identify and sort out missing data
  • Reduce noisy data
  • Identify and remove duplicates
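
A minimal sketch of two of these techniques (removing duplicates and filling in missing data) follows. The records, the field names, and the choice of median imputation are assumptions made for illustration; noise reduction is not shown.

```python
import statistics

# Hypothetical raw records; None marks a missing value.
raw = [
    {"id": 1, "age": 25},
    {"id": 2, "age": None},
    {"id": 3, "age": 31},
    {"id": 3, "age": 31},  # exact duplicate of the previous record
    {"id": 4, "age": 28},
]

# Identify and remove duplicates, keeping the first occurrence of each record.
seen = set()
deduplicated = []
for record in raw:
    key = (record["id"], record["age"])
    if key not in seen:
        seen.add(key)
        deduplicated.append(record)

# Identify missing data and fill it with the median of the known values.
known_ages = [r["age"] for r in deduplicated if r["age"] is not None]
fill_value = statistics.median(known_ages)
cleaned = [
    {**r, "age": fill_value if r["age"] is None else r["age"]}
    for r in deduplicated
]

print(cleaned)
```

Duplicates are removed before imputation here so that a repeated record does not skew the median.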
20
Q

It is used to understand the main characteristics of the data, discover patterns, spot anomalies, test a hypothesis, or check assumptions.

A

Exploratory Data Analysis (EDA)

21
Q

The process of creating a model from data is called ___________

A

Learning (training)

22
Q

Rule-based algorithms: Condition
Machine Learning: _________.

A

Model
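
The contrast between a hand-written condition and a learned model can be sketched as below. The spam scenario, the function names, and the labelled samples are hypothetical, and the "model" is deliberately tiny: a single threshold chosen from the data.

```python
def rule_based_is_spam(num_links):
    # Rule-based: a human writes the condition explicitly.
    return num_links > 3


def learn_threshold(samples):
    """Machine learning (toy version): pick the threshold that classifies
    the most labelled samples correctly. The threshold is the model."""
    candidates = sorted({x for x, _ in samples})
    best_threshold, best_correct = None, -1
    for t in candidates:
        correct = sum((x > t) == label for x, label in samples)
        if correct > best_correct:
            best_threshold, best_correct = t, correct
    return best_threshold


# Hypothetical labelled samples: (number of links, is_spam).
samples = [(0, False), (1, False), (2, False), (5, True), (7, True), (9, True)]

threshold = learn_threshold(samples)


def learned_is_spam(num_links):
    # The decision now comes from the learned model, not a hand-written rule.
    return num_links > threshold


print(threshold)           # 2
print(learned_is_spam(6))  # True
```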

23
Q

Algorithm where explicit programming is used.

A

Rule-Based Algorithm

24
Q

It refers to the process of using the model obtained after learning for prediction.

A

Testing

25
Q

It is the crucial process of integrating the ML model into its production environment. It is also among the most challenging, involving several moving pieces and tools, and requiring data scientists and ML engineers to collaborate and strategize.

A

Model Deployment

26
Q

In addition to owning their personal information, data subjects have a right to know how you plan to collect, store, and use it.

A

Transparency

27
Q

Another ethical responsibility that comes with handling data is ensuring data subjects’ ____________

A

Privacy

28
Q

Machine Learning is a field of study concerned with giving computers the ability to ________ without being explicitly programmed.

A

Learn

29
Q

___________ matters. Before collecting data, ask yourself why you need it, what you’ll gain from it, and what changes you’ll be able to make after analysis.

A

Intention

30
Q

Data Preprocessing Techniques

A

Data Cleansing
Feature Engineering

31
Q

Machine Learning Workflow

A
  1. Project Setup
  2. Data Preparation
  3. Modelling
  4. Deployment
32
Q

Before deployment, models need to be thoroughly trained and evaluated. This involves data preprocessing, feature engineering, and rigorous testing to ensure the model is robust and ready for real-world scenarios.

A

Training

33
Q

EDA Activities

A

Visualization
Summary Statistics
Outlier Detection
Correlation Analysis
Hypothesis Testing

34
Q

Data that contains missing values or lacks attributes.

A

Incompleteness

35
Q

It is a discipline of artificial intelligence (AI) that provides machines with the ability to automatically learn from data and past experiences while identifying patterns to make predictions with minimal human intervention.

A

Machine Learning

36
Q

Which data preprocessing task is the most time-consuming?

A

Data cleaning

37
Q

Phases of ML

A
  1. Learning
  2. Prediction
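
The two phases can be illustrated with a very small model. The sketch below assumes a straight-line model fitted by ordinary least squares; the function names and training data are hypothetical.

```python
def learn(xs, ys):
    """Learning phase: fit y = slope * x + intercept by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Prediction phase: apply the learned model to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training samples that follow y = 2x + 1.
train_x = [1, 2, 3, 4]
train_y = [3, 5, 7, 9]

model = learn(train_x, train_y)   # phase 1: learning
prediction = predict(model, 5)    # phase 2: prediction

print(model)       # (2.0, 1.0)
print(prediction)  # 11.0
```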
38
Q

5 Principles of Data Ethics

A

Ownership
Transparency
Privacy
Intention
Outcomes

39
Q

It is an observation that seems to be distant from other observations.

A

Outlier
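
One common way to flag such observations is the interquartile range (IQR) rule, sketched below. The 1.5 × IQR cutoff is a convention rather than something stated in this card, and the data values are hypothetical.

```python
import statistics

# Hypothetical observations; one value lies far from the rest.
data = [10, 12, 11, 13, 12, 11, 95]

# Quartiles from the standard library (Python 3.8+).
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Conventional fences: anything beyond 1.5 * IQR from the quartiles.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

outliers = [x for x in data if x < lower or x > upper]
print(outliers)  # [95]
```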

40
Q

Without good ________, there is no good _________.

A

data, model

41
Q

Algorithm where the decision-making rules are complex and difficult to describe.

A

Machine Learning Algorithm

42
Q

Algorithm where rules are automatically learned by the machines.

A

Machine Learning Algorithm

43
Q

Examining relationships between variables.

A

Correlation Analysis
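
A minimal sketch of one such measure, the Pearson correlation coefficient, is shown below. The choice of Pearson's r and the study-hours data are assumptions for illustration; other measures (for example, rank correlations) are equally valid forms of correlation analysis.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical variables: hours studied and exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]

r = pearson(hours, scores)
print(round(r, 3))
```

A value near 1 indicates a strong positive linear relationship, a value near -1 a strong negative one, and a value near 0 little linear relationship.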

44
Q

Important step before processing

A

To prepare the data for analysis or modeling by cleaning and transforming it.

45
Q

Continuous data is:

A

Quantitative

46
Q

T/F
Machines driven by algorithms designed by humans are able to learn latent rules and inherent patterns and to fulfill tasks desired by humans.

A

TRUE

47
Q

Steps for Data Preprocessing

A
  1. Data Profiling
  2. Data Cleansing
  3. Data Reduction
  4. Data Transformation
  5. Data Enrichment
  6. Data Validation
48
Q

Events or attributes that reflect the performance or nature of a sample in a particular aspect are called ____________

A

Features

49
Q

It is a dataset used in the training process, where each sample is referred to as a training sample.

A

Training set
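
In practice a dataset is often split so that some samples are held back for testing. The sketch below shows one simple way to do this; the function name, the 25% test fraction, and the fixed seed are assumptions, not part of the card.

```python
import random


def train_test_split(samples, test_fraction=0.25, seed=0):
    """Shuffle a copy of the samples and split it into
    (training_set, test_set)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical dataset of eight samples.
dataset = list(range(8))

training_set, test_set = train_test_split(dataset)

print(len(training_set), len(test_set))  # 6 2
```

Each element of `training_set` is a training sample; the held-back `test_set` is not used during learning.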

50
Q

Also known as predictive analytics or statistical learning.

A

Machine Learning

51
Q

Rule Complexity : Scale of the Problem
Simple : Small = ____________
Complex : Small = ____________
Simple : Large = _____________
Complex : Large = ____________

A

Simple Problems
Manual Rules
Rule-Based Algorithm
Machine Learning Algorithm

52
Q

Algorithm where rules can be specified.

A

Rule-Based Algorithm

53
Q

Each data record is called a __________

A

Sample

54
Q

Testing initial assumptions about the data.

A

Hypothesis Testing

55
Q

It is about extracting knowledge from data.

A

Machine Learning

56
Q

Data preprocessing is also called ___________

A

Data Preparation

57
Q

T/F
Machine Learning methods enable computers to operate autonomously without explicit programming.

A

TRUE

58
Q

It refers to the process of taking a trained ML model and making it available for use in real-world applications

A

Machine Learning Model Deployment

59
Q

Identifying unusual data points.

A

Outlier Detection

60
Q

T/F
Sorting out missing data is a data cleansing technique.

A

TRUE

61
Q

It is a set of policies, procedures, and standards that implements data governance in an organization.

A

Data Governance Framework

62
Q

Types of Machine Learning

A

Supervised
Unsupervised
Semi-Supervised
Reinforcement

63
Q

It is a study of learning algorithms.

A

Machine Learning

64
Q

It is a set of principles and processes for data collection, management, and use. The goal is to ensure that data is accurate, consistent, and available for use, while protecting data privacy and security.

A

Data Governance

65
Q

Nominal data is:

A

Qualitative

66
Q

ML models should be able to handle increased loads and continue to deliver results efficiently. Ensuring the infrastructure can handle the model’s computational requirements is vital, requiring validation and effective testing for scalability before deploying models.

A

Validation

67
Q

The first principle of data ethics is that an individual has __________ over their personal information.

A

Ownership

68
Q

Machine Learning Deployment Model

A

Training
Validation
Deployment
Monitoring