MODULE 1 Flashcards

Algorithm where samples are used for training.

Machine Learning Algorithm

It is a research field at the intersection of statistics, artificial intelligence, and computer science.

Machine Learning

It is the practice of cleaning, altering, and reorganizing raw data prior to processing and analysis

Preprocessing

The data contains inconsistent records.

Inconsistency

The data contains incorrect records or exceptions.

Noise

Creating plots and charts to visualize data distributions and relationships.

Visualization

T/F
The performance of ML algorithms adaptively improves with an increase in the number of available samples during the ‘training’ processes.

FALSE (the correct term is ‘learning’, not ‘training’)

T/F
Data reduction is a data cleansing technique.

FALSE

T/F
Reducing noise in data is a feature engineering technique.

FALSE

It covers the ethical and moral obligations of collecting, sharing, and using data, focused on ensuring that data is used fairly, for good.

Data Ethics

Best Practices for Successful ML Model Deployment

Choosing the Right Infrastructure
Effective Versioning and Tracking
Robust Testing and Validation
Implementing Monitoring and Alerting

Data: _____________
Learning Algorithms: ______________
Basic Understanding: ______________

Experience (E)
Task (T)
Measure (P)

Even when intentions are good, the ___________ of data analysis can cause inadvertent harm to individuals or groups of people.

Outcome

Once deployed, models need to be continuously monitored.

Monitoring
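
One way continuous monitoring might look in practice is a rolling accuracy check that raises an alert when recent performance drops. This is only a sketch: the class name AccuracyMonitor, the window size, and the alert threshold are illustrative assumptions, not part of the module.

```python
from collections import deque


class AccuracyMonitor:
    """Tracks accuracy over the most recent predictions (hypothetical helper)."""

    def __init__(self, window_size=100, alert_threshold=0.8):
        self.window = deque(maxlen=window_size)  # keeps only the latest outcomes
        self.alert_threshold = alert_threshold

    def record(self, predicted, actual):
        """Store whether one deployed prediction matched the true label."""
        self.window.append(predicted == actual)

    def accuracy(self):
        """Rolling accuracy, or None before any outcome is recorded."""
        if not self.window:
            return None
        return sum(self.window) / len(self.window)

    def needs_alert(self):
        """True when rolling accuracy has dropped below the threshold."""
        acc = self.accuracy()
        return acc is not None and acc < self.alert_threshold


monitor = AccuracyMonitor(window_size=4, alert_threshold=0.75)
for predicted, actual in [(1, 1), (0, 0), (1, 0), (0, 1)]:
    monitor.record(predicted, actual)

print(monitor.accuracy())     # 0.5, since 2 of the last 4 predictions were correct
print(monitor.needs_alert())  # True, since 0.5 is below the 0.75 threshold
```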

A field of study concerned with giving computers the ability to learn without being explicitly programmed.

Machine Learning

It is a collection of data used in machine learning tasks.

Dataset

Feature Engineering techniques

Feature scaling or normalization
Data reduction
Discretization
Feature encoding
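
Three of the techniques listed above can be sketched in plain Python. The function names, bin edges, and sample values are assumptions made for illustration; data reduction is left out because it depends heavily on the dataset.

```python
def min_max_scale(values):
    """Feature scaling: map values linearly onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: no spread to scale
    return [(v - lo) / (hi - lo) for v in values]


def discretize(values, edges, labels):
    """Discretization: replace each number with the label of its bin.

    edges holds the upper bound of every bin except the last one.
    """
    result = []
    for v in values:
        for edge, label in zip(edges, labels):
            if v < edge:
                result.append(label)
                break
        else:
            result.append(labels[-1])  # value is beyond the last edge
    return result


def one_hot_encode(values):
    """Feature encoding: one 0/1 column per distinct category."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


ages = [20, 35, 50, 80]
colors = ["red", "green", "red"]

scaled = min_max_scale(ages)    # [0.0, 0.25, 0.5, 1.0]
binned = discretize(ages, edges=[30, 60], labels=["young", "middle", "senior"])
encoded = one_hot_encode(colors)  # columns are ordered: green, red
```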

Calculating measures like mean, median, variance, and standard deviation.

Summary Statistics
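
As a quick illustration, Python's standard statistics module computes all four measures. The sample values are made up, and the population versions (pvariance, pstdev) are used here; statistics.variance and statistics.stdev give the sample versions instead.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(values)           # 5
median = statistics.median(values)       # 4.5
variance = statistics.pvariance(values)  # 4 (population variance)
std_dev = statistics.pstdev(values)      # 2.0 (population standard deviation)

print(mean, median, variance, std_dev)
```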

Data Cleansing techniques

Identify and sort out missing data
Reduce noisy data
Identify and remove duplicates
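
The three techniques above might be sketched as follows. The records are toy data, the helper names are hypothetical, and the moving average is just one possible way to reduce noise, not necessarily the one the module has in mind.

```python
records = [
    {"id": 1, "age": 34, "city": "A"},
    {"id": 2, "age": None, "city": "B"},  # missing value
    {"id": 1, "age": 34, "city": "A"},    # exact duplicate of the first record
    {"id": 3, "age": 29, "city": "C"},
]


def drop_missing(rows):
    """Keep only the rows in which every field has a value."""
    return [row for row in rows if all(v is not None for v in row.values())]


def drop_duplicates(rows):
    """Keep the first occurrence of each distinct row, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the row
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def moving_average(values, window=3):
    """Reduce noise by averaging each value with its neighbours."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        neighbours = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(neighbours) / len(neighbours))
    return smoothed


clean = drop_duplicates(drop_missing(records))
smoothed = moving_average([10, 12, 50, 11, 9])  # the spike at 50 is damped

print([row["id"] for row in clean])  # [1, 3]
```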

It is used to understand the main characteristics of the data, discover patterns, spot anomalies, test hypotheses, or check assumptions.

Exploratory Data Analysis (EDA)

The process of creating a model from data is called ___________

Learning (training)

Rule-based algorithms: Condition
Machine Learning: _________.

Model

Algorithm where explicit programming is used.

Rule Based Algorithm

It refers to the process of using the model obtained after learning for prediction.

Testing

It is the most crucial process: integrating the ML model into its production environment. It is also the most challenging, involving several moving pieces and tools and requiring data scientists and ML engineers to collaborate and strategize.

Model Deployment

In addition to owning their personal information, data subjects have a right to know how you plan to collect, store, and use it.

Transparency

Another ethical responsibility that comes with handling data is ensuring data subjects’ ____________

Privacy

Machine Learning is a field of study concerned with giving computers the ability to ________ without being explicitly programmed.

Learn

___________ matter. Before collecting data, ask yourself why you need it, what you’ll gain from it, and what changes you’ll be able to make after analysis

Intention

Data Preprocessing Techniques

Data Cleansing
Feature Engineering

Machine Learning Workflow

1. Project Setup
2. Data Preparation
3. Modelling
4. Deployment

Before deployment, models need to be thoroughly trained and evaluated. This involves data preprocessing, feature engineering, and rigorous testing to ensure the model is robust and ready for real-world scenarios.

Training

EDA Activities

Visualization
Summary Statistics
Outlier Detection
Correlation Analysis
Hypothesis Testing

The data contains missing values or lacks attributes.

Incompleteness

It is a discipline of artificial intelligence (AI) that provides machines with the ability to automatically learn from data and past experiences while identifying patterns to make predictions with minimal human intervention.

Machine Learning

Which data preprocessing task is the most time consuming?

Data cleaning

Phases of ML

1. Learning
2. Prediction
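
The two phases can be illustrated with a deliberately tiny classifier. This is a sketch under assumptions: the nearest-class-mean method, the function names, and the toy data (hours studied, pass/fail) are chosen for illustration and are not taken from the module.

```python
def learn(samples, labels):
    """Learning phase: build a model (one mean value per class) from training samples."""
    totals, counts = {}, {}
    for x, y in zip(samples, labels):
        totals[y] = totals.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: totals[y] / counts[y] for y in totals}


def predict(model, x):
    """Prediction phase: assign the class whose learned mean is closest to x."""
    return min(model, key=lambda y: abs(model[y] - x))


# Toy training set: one feature (hours studied) and one label per sample.
train_x = [1.0, 2.0, 3.0, 8.0, 9.0, 10.0]
train_y = ["fail", "fail", "fail", "pass", "pass", "pass"]

model = learn(train_x, train_y)  # {"fail": 2.0, "pass": 9.0}

print(predict(model, 2.5))  # fail
print(predict(model, 7.0))  # pass
```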

5 Principles of Data Ethics

Ownership
Transparency
Privacy
Intention
Outcomes

It is an observation that seems to be distant from other observations.

Outlier

Without good ________, there is no good _________.

data, model

Algorithm where the decision-making rules are complex and difficult to describe.

Machine Learning Algorithm

Algorithm where rules are automatically learned by the machines.

Machine Learning Algorithm
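
The contrast with a rule-based algorithm can be sketched with a one-feature spam example. Everything here is a hypothetical illustration: the feature (number of links), the hand-picked threshold, and the simple threshold search stand in for real rules and real learning algorithms.

```python
def rule_based_is_spam(num_links):
    """Rule-based algorithm: a human writes the condition explicitly."""
    return num_links > 5  # threshold chosen by hand


def learn_threshold(samples, labels):
    """Machine learning: pick the threshold that misclassifies the fewest training samples."""
    best_threshold, best_errors = None, len(samples) + 1
    for candidate in sorted(set(samples)):
        errors = sum((x > candidate) != y for x, y in zip(samples, labels))
        if errors < best_errors:
            best_threshold, best_errors = candidate, errors
    return best_threshold


# Toy training samples: link count per message and whether it was spam.
link_counts = [0, 1, 2, 3, 7, 8, 9]
is_spam = [False, False, False, False, True, True, True]

threshold = learn_threshold(link_counts, is_spam)  # 3, learned from the samples


def learned_is_spam(num_links):
    """The model: the same kind of rule, but with a learned threshold."""
    return num_links > threshold


print(rule_based_is_spam(5), learned_is_spam(5))  # False True
```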

Examining relationships between variables.

Correlation Analysis
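
A common measure for this is the Pearson correlation coefficient, sketched below in plain Python. The choice of Pearson, the function name, and the sample data are assumptions; the card itself does not name a specific measure.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]

r = pearson_correlation(hours, scores)
print(round(r, 3))  # close to 1: the two variables rise together
```

A value near 1 indicates a strong positive linear relationship, a value near -1 a strong negative one, and a value near 0 little or no linear relationship.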

Purpose of preprocessing, an important step before processing

To prepare the data for analysis or modeling by cleaning and transforming it.

Continuous data is:

Quantitative

T/F Machines driven by algorithms designed by humans are able to learn latent rules and inherent patterns and to fulfill tasks desired by humans.

TRUE

Steps for Data Preprocessing

1. Data Profiling
2. Data Cleansing
3. Data Reduction
4. Data Transformation
5. Data Enrichment
6. Data Validation

Events or attributes that reflect the performance or nature of a sample in a particular aspect are called ____________

Features

It is a dataset used in the training process, where each sample is referred to as a training sample.

Training set

Also known as predictive analytics or statistical learning.

Machine Learning

Rule Complexity : Scale of the Problem
Simple : Small = ____________
Complex : Small = ____________
Simple : Large = ____________
Complex : Large = ____________

Simple Problems
Manual Rules
Rule Based Algorithm
Machine Learning Algorithm

Algorithm where rules can be specified.

Rule Based Algorithm

Each data record is called a __________

Sample

Testing initial assumptions about the data.

Hypothesis Testing
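
As one possible illustration, an exact permutation test checks the assumption that two groups share the same mean. The permutation test is a technique chosen here for the sketch, not one named by the card, and the data is made up.

```python
from itertools import combinations


def permutation_test(group_a, group_b):
    """Exact two-sided permutation test on the difference in group means.

    Returns the p-value: the fraction of all possible regroupings whose
    difference in means is at least as large as the observed one.
    """
    pooled = group_a + group_b
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))

    extreme = total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        diff = abs(sum(a) / len(a) - sum(b) / len(b))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
        total += 1
    return extreme / total


group_a = [1, 2, 3, 4]
group_b = [5, 6, 7, 8]

p_value = permutation_test(group_a, group_b)
print(round(p_value, 4))  # 0.0286, i.e. 2 of the 70 possible regroupings
```

A small p-value suggests the observed difference would be unlikely if the initial assumption (no difference between the groups) were true.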

It is about extracting knowledge from data.

Machine Learning

Data Preprocessing is also called ___________

Data Preparation

T/F Machine Learning methods enable computers to operate autonomously without explicit programming.

TRUE

It refers to the process of taking a trained ML model and making it available for use in real-world applications

Machine Learning Model Deployment

Identifying unusual data points.

Outlier Detection
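
One widely used heuristic is the 1.5 x IQR rule, sketched below with the standard statistics module. The rule, the multiplier k, and the sample values are assumptions for illustration; the card does not prescribe a specific method.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


readings = [10, 12, 11, 13, 12, 11, 95]

outliers = iqr_outliers(readings)
print(outliers)  # [95]
```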

T/F Sorting out missing data is a data cleansing technique.

TRUE

It is a set of policies, procedures, and standards that implements an organization's data governance.

Data Governance Framework

Types of Machine Learning

Supervised
Unsupervised
Semi-Supervised
Reinforcement

It is a study of learning algorithms.

Machine Learning

It is a set of principles and processes for data collection, management, and use. The goal is to ensure that data is accurate, consistent, and available for use, while protecting data privacy and security.

Data Governance

Nominal data is:

Qualitative

ML models should be able to handle increased loads and continue to deliver results efficiently. Ensuring the infrastructure can handle the model's computational requirements is vital, requiring validation and effective testing for scalability before deploying models.

Validation

The first principle of data ethics is that an individual has __________ over their personal information.

Ownership

Machine Learning Deployment Model

Training
Validation
Deployment
Monitoring

M1S1 - M1S2 (68 cards)