MODULE 1 Flashcards
M1S1 - M1S2
Algorithm where samples are used for training.
Machine Learning Algorithm
It is a research field at the intersection of statistics, artificial intelligence, and computer science.
Machine Learning
It is the practice of cleaning, altering, and reorganizing raw data prior to processing and analysis.
Preprocessing
Data that contains inconsistent records
Inconsistency
Data that contains incorrect records or exceptions
Noise
Creating plots and charts to visualize data distributions and relationships.
Visualization
T/F
The performance of ML algorithms adaptively improves with an increase in the number of available samples during the ‘training’ processes.
FALSE (the correct term is ‘learning’ processes, not ‘training’)
T/F
Data reduction is a data cleansing technique.
FALSE
T/F
Reducing noise in data is a feature engineering technique.
FALSE
It covers the ethical and moral obligations of collecting, sharing, and using data, focused on ensuring that data is used fairly, for good.
Data Ethics
Best Practices for Successful ML Model Deployment
- Choosing the Right Infrastructure
- Effective Versioning and Tracking
- Robust Testing and Validation
- Implementing Monitoring and Alerting
Data: _____________
Learning Algorithms: ______________
Basic Understanding: ______________
Experience (E)
Task (T)
Measure (P)
Even when intentions are good, the ___________ of data analysis can cause inadvertent harm to individuals or groups of people.
Outcome
Once deployed, models need to be continuously monitored.
Monitoring
A field of study concerned with giving computers the ability to learn without being explicitly programmed.
Machine Learning
It is a collection of data used in machine learning tasks.
Dataset
Feature Engineering techniques
- Feature scaling or normalization
- Data reduction
- Discretization
- Feature encoding
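A minimal sketch of three of the techniques above (scaling, discretization, and encoding) in plain Python. The function names, bin edges, and sample values are illustrative assumptions, not part of the module.

```python
def min_max_scale(values):
    """Feature scaling: map values linearly onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize(values, edges, labels):
    """Discretization: replace each number with the label of its bin.

    A value gets labels[i] for the first edges[i] it is below;
    values at or above the last edge get the final label.
    """
    out = []
    for v in values:
        for edge, label in zip(edges, labels):
            if v < edge:
                out.append(label)
                break
        else:
            out.append(labels[-1])
    return out


def one_hot_encode(values):
    """Feature encoding: turn each category into a 0/1 indicator vector."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]


ages = [20, 35, 50]  # hypothetical numeric feature
scaled_ages = min_max_scale(ages)
age_groups = discretize(ages, edges=[30, 45], labels=["young", "middle", "senior"])
categories, encoded = one_hot_encode(["red", "blue", "red"])
```

With these inputs, scaled_ages is [0.0, 0.5, 1.0] and age_groups is ["young", "middle", "senior"].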
Calculating measures like mean, median, variance, and standard deviation.
Summary Statistics
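The four measures named on this card can be computed with Python's standard statistics module. This is a small sketch; the sample values are made up, and the population variants (pvariance, pstdev) are one possible choice, with variance and stdev being the sample-based alternatives.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements

mean = statistics.mean(values)            # 5
median = statistics.median(values)        # 4.5 (average of the two middle values)
variance = statistics.pvariance(values)   # 4 (population variance)
std_dev = statistics.pstdev(values)       # 2.0 (square root of the variance)
```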
Data Cleansing techniques
- Identify and sort out missing data
- Reduce noisy data
- Identify and remove duplicates
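The three techniques above can be sketched on a small list of records. Everything here is an assumption for illustration: the record layout, the plausible age range used to flag noise, and the choice to fill missing values with the median.

```python
import statistics


def clean(records, min_age=0, max_age=120):
    """Remove duplicates, treat implausible ages as missing, then fill gaps."""
    seen = set()
    cleaned = []
    for r in records:
        key = (r["id"], r["age"])
        if key in seen:
            continue  # exact duplicate: keep only the first occurrence
        seen.add(key)
        age = r["age"]
        if age is not None and not (min_age <= age <= max_age):
            age = None  # noisy value: handle it like a missing one
        cleaned.append({"id": r["id"], "age": age})

    # Fill missing ages with the median of the known ages.
    known = [r["age"] for r in cleaned if r["age"] is not None]
    fill = statistics.median(known)
    return [
        {"id": r["id"], "age": fill if r["age"] is None else r["age"]}
        for r in cleaned
    ]


records = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},  # missing value
    {"id": 3, "age": 29},
    {"id": 3, "age": 29},    # duplicate record
    {"id": 4, "age": 250},   # implausible value (noise)
]
cleaned_records = clean(records)
```

Filling with the median is only one option; dropping incomplete records is another, depending on how much data can be spared.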
It is used to understand the main characteristics of the data, discover patterns, spot anomalies, test a hypothesis, or check assumptions.
Exploratory Data Analysis (EDA)
The process of creating a model from data is called ___________
Learning (training)
Rule-based algorithms: Condition
Machine Learning: _________.
Model
Algorithm where explicit programming is used.
Rule-Based Algorithm
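The contrast between an explicitly programmed condition and a learned model can be sketched as follows. The temperature task, the threshold of 30, and the midpoint-of-class-means learner are all hypothetical choices made for illustration.

```python
def is_hot_rule(temperature):
    """Rule-based: the condition (> 30) is written explicitly by a person."""
    return temperature > 30


def learn_threshold(samples):
    """Machine learning: derive the decision threshold from labeled samples.

    The learned model is a single number: the midpoint between the mean
    temperature of the "hot" samples and the mean of the others.
    """
    hot = [t for t, label in samples if label]
    not_hot = [t for t, label in samples if not label]
    return (sum(hot) / len(hot) + sum(not_hot) / len(not_hot)) / 2


# Hypothetical labeled samples: (temperature, is_hot)
samples = [(18, False), (22, False), (31, True), (35, True)]
threshold = learn_threshold(samples)  # 26.5, learned from the data


def is_hot_learned(temperature):
    """Prediction using the learned model instead of a hand-written rule."""
    return temperature > threshold
```

Changing the samples changes the learned threshold, while the rule-based function stays fixed until someone edits the code.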
It refers to the process of using the model obtained after learning for prediction.
Testing
It is the crucial process of integrating the ML model into its production environment. It is also the most challenging, since it involves several moving pieces and tools and requires data scientists and ML engineers to collaborate and strategize.
Model Deployment
In addition to owning their personal information, data subjects have a right to know how you plan to collect, store, and use it.
Transparency
Another ethical responsibility that comes with handling data is ensuring data subjects’ ____________
Privacy
Machine Learning is a field of study concerned with giving computers the ability to ________ without being explicitly programmed.
Learn
___________ matter. Before collecting data, ask yourself why you need it, what you’ll gain from it, and what changes you’ll be able to make after analysis.
Intention
Data Preprocessing Techniques
Data Cleansing
Feature Engineering
Machine Learning Workflow
- Project Setup
- Data Preparation
- Modelling
- Deployment
Before deployment, models need to be thoroughly trained and evaluated. This involves data preprocessing, feature engineering, and rigorous testing to ensure the model is robust and ready for real-world scenarios.
Training
EDA Activities
Visualization
Summary Statistics
Outlier Detection
Correlation Analysis
Hypothesis Testing
Data that contains missing values or lacks attributes
Incompleteness
It is a discipline of artificial intelligence (AI) that provides machines with the ability to automatically learn from data and past experiences while identifying patterns to make predictions with minimal human intervention.
Machine Learning
Which data preprocessing task is the most time-consuming?
Data cleaning
Phases of ML
- Learning
- Prediction
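The two phases can be illustrated with a tiny least-squares line fit in plain Python. This is a sketch under assumptions: the function names and the training samples are invented, and a straight line is just one simple kind of model.

```python
def learn(xs, ys):
    """Learning phase: fit y = slope * x + intercept by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Prediction phase: apply the learned model to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training samples that follow y = 2x + 1
train_x = [1, 2, 3, 4]
train_y = [3, 5, 7, 9]

model = learn(train_x, train_y)   # (2.0, 1.0)
prediction = predict(model, 5)    # 11.0
```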
5 Principles of Data Ethics
Ownership
Transparency
Privacy
Intention
Outcomes
It is an observation that seems to be distant from other observations.
Outlier
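One common way to decide what counts as "distant" is the interquartile range (IQR) rule. The module does not name a specific method, so the rule, the multiplier of 1.5, and the readings below are assumptions for illustration.

```python
import statistics


def find_outliers(values, k=1.5):
    """Flag values more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


readings = [10, 12, 11, 13, 12, 11, 95]  # hypothetical data with one distant value
outliers = find_outliers(readings)       # [95]
```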
Without good ________, there is no good _________.
data, model
Algorithm where the decision-making rules are complex and difficult to describe.
Machine Learning Algorithm
Algorithm where rules are automatically learned by the machines.
Machine Learning Algorithm
Examining relationships between variables.
Correlation Analysis
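A widely used measure for this is the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below computes it by hand in plain Python; the variable names and data are made up, and Pearson is only one of several correlation measures.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


hours = [1, 2, 3, 4, 5]             # hypothetical study hours
scores = [2, 4, 6, 8, 10]           # rises in step with hours
remaining = [10, 8, 6, 4, 2]        # falls in step with hours

r_positive = pearson(hours, scores)     # close to 1: strong positive relationship
r_negative = pearson(hours, remaining)  # close to -1: strong negative relationship
```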
Why data preprocessing is an important step before processing
To prepare the data for analysis or modeling by cleaning and transforming it.
Continuous data is:
Quantitative
T/F
Machines driven by algorithms designed by humans are able to learn latent rules and inherent patterns and to fulfill tasks desired by humans.
TRUE
Steps for Data Preprocessing
- Data Profiling
- Data Cleansing
- Data Reduction
- Data Transformation
- Data Enrichment
- Data Validation
Events or attributes that reflect the performance or nature of a sample in a particular aspect are called ____________
Features
It is a dataset used in the training process, where each sample is referred to as a training sample.
Training set
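In practice a dataset is often divided so that some samples are used for training and the rest are held back for testing. The sketch below shows one simple way to do that; the 80/20 ratio, the fixed seed, and the function name are assumptions rather than anything the module prescribes.

```python
import random


def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the samples, then split into training and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


samples = list(range(10))  # stand-ins for ten data records
training_set, test_set = train_test_split(samples)
```

Every sample lands in exactly one of the two sets, so the model is never tested on a sample it was trained on.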
Also known as predictive analytics or statistical learning.
Machine Learning
Rule Complexity : Scale of the Problem
Simple : Small = ____________
Complex : Small = ____________
Simple : Large = _____________
Complex : Large = ____________
Simple Problems
Manual Rules
Rule-Based Algorithm
Machine Learning Algorithm
Algorithm where rules can be specified.
Rule-Based Algorithm
Each data record is called a __________
Sample
Testing initial assumptions about the data.
Hypothesis Testing
It is about extracting knowledge from data.
Machine Learning
Data Preprocessing is also called ___________
Data Preparation
T/F
Machine Learning methods enable computers to operate autonomously without explicit programming.
TRUE
It refers to the process of taking a trained ML model and making it available for use in real-world applications.
Machine Learning Model Deployment
Identifying unusual data points.
Outlier Detection
T/F
Sorting out missing data is a data cleansing technique.
TRUE
It is a set of policies, procedures, and standards that implements an organization’s data governance.
Data Governance Framework
Types of Machine Learning
Supervised
Unsupervised
Semi-Supervised
Reinforcement
It is a study of learning algorithms.
Machine Learning
It is a set of principles and processes for data collection, management, and use. The goal is to ensure that data is accurate, consistent, and available for use, while protecting data privacy and security.
Data Governance
Nominal data is:
Qualitative
ML models should be able to handle increased loads and continue to deliver results efficiently. Ensuring the infrastructure can handle the model’s computational requirements is vital, requiring validation and effective testing for scalability before deploying models.
Validation
The first principle of data ethics is that an individual has __________ over their personal information.
Ownership
Machine Learning Deployment Model
Training
Validation
Deployment
Monitoring