Midterms - MLA Flashcards
Movie ratings, Military rank are samples of:
Group of answer choices
Discrete data
Ordinal data
Continuous data
Nominal data
Ordinal data
Choose all the most popular Python Libraries that are used in data science.
Group of answer choices
NUMPY
ANACONDA
SCIPY
JUPYTER
PANDAS
SQL
NUMPY
SCIPY
JUPYTER
PANDAS
ANACONDA
Which processes are involved in data preparation?
Group of answer choices
Not in the options
All the given options
Data Cleaning, Feature Engineering
Splitting of dataset
Data collection, Data Cleaning
All the given options
A continuous data is:
Group of answer choices
Qualitative
Quantitative
Quantitative
Temperature range is a sample of:
Group of answer choices
Discrete data
Continuous data
Continuous data
Sorting out missing data is a data cleansing technique.
Group of answer choices
True
False
True
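As an illustration of handling missing data, here is a minimal sketch of two common options: dropping incomplete rows, or filling gaps with the column mean. The function names and the use of `None` as the missing-value marker are assumptions made for this example, not part of the course material.

```python
def drop_missing(rows):
    """Keep only the rows that have no missing (None) entries."""
    return [row for row in rows if None not in row]


def fill_missing_with_mean(column):
    """Replace None entries in a numeric column with the mean of the known values."""
    present = [v for v in column if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in column]


rows = [(1.0, 2.0), (None, 3.0), (4.0, 5.0)]
print(drop_missing(rows))                        # rows without gaps
print(fill_missing_with_mean([1.0, None, 3.0]))  # gap replaced by the mean of 1.0 and 3.0
```

Dropping rows loses data, while mean filling keeps every row but flattens variation, so which one fits depends on the dataset.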
Based on the ML application table scenario, when rule complexity is simple and problem scale is large, ML application is:
Group of answer choices
ML Algorithms
Simple Problem
Manual Rules
Rule-based Algorithms
Rule-based Algorithms
Machine Learning is a field of study concerned with giving computers the ability to ________ without being explicitly programmed.
LEARN
A nominal data is:
Group of answer choices
Quantitative
Qualitative
Qualitative
Which is not true about Machine Learning?
Group of answer choices
Their maintenance is much lower than a human’s and costs a lot less in the long run.
Enable computers to operate autonomously with explicit programming.
Machines driven by algorithms designed by humans are able to learn latent rules and inherent patterns and to fulfill tasks desired by humans.
Automation by machine learning can mitigate risks caused by fatigue or inattention.
Enable computers to operate autonomously with explicit programming.
Reducing noise in data is a feature engineering technique.
Group of answer choices
True
False
False
Rule-based algorithms: Condition
Machine Learning: _________.
MODEL
ML is a research field at the intersection of _________, artificial intelligence, and computer science.
STATISTICS
Data reduction is a data cleansing technique.
Group of answer choices
True
False
False
In EDA, this process identifies unusual data points. __________
OUTLIER DETECTION
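One common way to flag unusual data points is the interquartile range (IQR) rule. The sketch below assumes that rule with the conventional factor of 1.5; the flashcard itself does not name a specific method, and `iqr_outliers` is a name chosen for this example.

```python
import statistics


def iqr_outliers(values, factor=1.5):
    """Return the values outside [Q1 - factor*IQR, Q3 + factor*IQR]."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if v < low or v > high]


data = [10, 12, 11, 13, 12, 11, 95]
print(iqr_outliers(data))  # [95]
```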
Dataset is divided into _______ set and test set.
TRAINING
These concepts help to understand how well a model performs: Overfitting, Underfitting, _________.
GENERALIZATION
Logistic Regression is an example of a regression algorithm.
False
This refers to the error resulting from sensitivity to the noise in the training data.
Group of answer choices
Not in the options
Overfitting
Underfitting
Generalization
Not in the options (the term described is variance)
In supervised learning, market trend analysis is an example of:
Group of answer choices
Classification
Correlation
Prediction
Regression
Regression
When the model fits too closely to the training dataset.
Group of answer choices
Overfitting
Underfitting
Generalization
Overfitting (Canvas marked Generalization as correct, but the answer is really Overfitting)
The _____ refers to the error from having wrong / too simple assumptions in the learning algorithm.
BIAS
Classification algorithms address classification problems where the output variable is categorical.
Group of answer choices
True
False
True
There is a regression variant of the k-nearest neighbors algorithm.
Group of answer choices
True
False
True
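The regression variant of k-NN can be sketched as follows: instead of voting on a class, the prediction is the average target value of the k nearest training points. This is a from-scratch illustration with hypothetical names (`knn_regress`) and made-up data, not a library API.

```python
import math


def knn_regress(train_X, train_y, query, k=3):
    """Predict a number as the mean target of the k nearest training points."""
    # Sort training samples by Euclidean distance to the query, keep the first k.
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    return sum(y for _, y in nearest) / len(nearest)


train_X = [(1.0,), (2.0,), (3.0,), (10.0,)]
train_y = [1.0, 2.0, 3.0, 10.0]

print(knn_regress(train_X, train_y, (2.2,), k=2))  # mean of targets 2.0 and 3.0 -> 2.5
```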
In k-NN, High Model Complexity is:
Group of answer choices
Overfitting
Underfitting
Overfitting
The ‘k’ in k-Nearest neighbors refers to the new closest data point.
Group of answer choices
True
False
False
K-nearest neighbors make a prediction for a new data point by finding the data that match from the training dataset.
Group of answer choices
True
False
False
In k-NN, High Model Complexity is underfitting.
Group of answer choices
True
False
False
In k-NN, Euclidean distance (by default) is used to choose the right distance measure.
Group of answer choices
True
False
True
In k-NN, Low Model Complexity is:
Group of answer choices
Overfitting
Underfitting
Underfitting
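The k-NN cards above can be made concrete with a small from-scratch classifier. This is a minimal sketch using Euclidean distance and majority voting; the function name `knn_predict` and the toy data are assumptions for illustration only.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point.
    distances = [(math.dist(x, query), label)
                 for x, label in zip(train_X, train_y)]
    # Keep the k closest neighbors and count their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (1.1, 4.0)]
train_y = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_X, train_y, (1.0, 3.0), k=1))  # b
print(knn_predict(train_X, train_y, (1.0, 3.0), k=3))  # a
```

With k=1 the prediction follows the single closest point, even if it is an isolated one (high model complexity, prone to overfitting). A larger k averages over more neighbors, which gives a simpler model that can underfit if k grows too large.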
Linear models make a prediction using a linear function of the input features.
Group of answer choices
True
False
True
Linear Regression is also known as Ordinary Least Squares.
Group of answer choices
True
False
True
The ________ is the sum of the squared differences between the predictions and the true values, divided by the number of samples.
Group of answer choices
Mean error
Median error
Total R
Mean Squared Error
Not in the options
Mean Squared Error
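A short sketch of how the mean squared error is computed: square each difference between a true value and its prediction, then average over the samples. The function name and the sample numbers are made up for this example.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between true values and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(squared) / len(squared)


# Squared differences are 1, 0 and 4, so the result is 5 / 3.
print(mean_squared_error([3.0, 5.0, 2.0], [2.0, 5.0, 4.0]))
```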
The ‘offset’ parameter is also called slope.
Group of answer choices
True
False
False
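To see why the offset is not the slope, consider the linear prediction function itself. In the sketch below (names `w`, `b` and `linear_predict` are chosen for this example), `w` holds the slopes, one per feature, and `b` is the offset, also called the intercept.

```python
def linear_predict(x, w, b):
    """Return w[0]*x[0] + ... + w[p-1]*x[p-1] + b for one sample x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


# One feature: slope w = 2.0, intercept (offset) b = 1.0
print(linear_predict([3.0], [2.0], 1.0))  # 2*3 + 1 = 7.0
print(linear_predict([0.0], [2.0], 1.0))  # at x = 0 the prediction equals the intercept: 1.0
```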
Lasso uses L1 Regularization.
Group of answer choices
True
False
True
In Ridge regression, if α (alpha) is smaller, the penalty becomes larger.
Group of answer choices
True
False
False
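The two regularization cards above can be illustrated by computing the penalty terms directly. This is a sketch under the usual textbook definitions (L1 penalty for Lasso, L2 penalty for Ridge, each scaled by alpha); the exact scaling used inside a given library may differ, and the coefficient values are made up.

```python
def l2_penalty(w, alpha):
    """Ridge-style penalty: alpha times the sum of squared coefficients."""
    return alpha * sum(wi ** 2 for wi in w)


def l1_penalty(w, alpha):
    """Lasso-style penalty: alpha times the sum of absolute coefficients."""
    return alpha * sum(abs(wi) for wi in w)


w = [0.5, -2.0, 0.0]  # hypothetical model coefficients
for alpha in (0.1, 1.0, 10.0):
    # Both penalties grow as alpha grows, pushing coefficients toward zero.
    print(alpha, l2_penalty(w, alpha), l1_penalty(w, alpha))
```

The penalty grows with alpha, so a smaller alpha means a weaker penalty and a model closer to plain linear regression.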
Dichotomous classes means Yes or No.
Group of answer choices
True
False
True
Its primary objective is to map the input variable with the output variable.
Group of answer choices
Unsupervised Learning
Classification
Correlation
Supervised Learning
Supervised Learning
In k-NN, when you choose a small value of k (e.g., k=1), the model becomes more complex.
Group of answer choices
True
False
True
Ridge is generally preferred over Lasso, but if you want a model that is easy to analyze and understand then use Lasso.
Group of answer choices
True
False
True
When comparing training set and test set scores, we find that we predict very accurately on the training set, but the R2 on the test set is much worse. This is a sign of:
Group of answer choices
Underfitting
Overfitting
Overfitting
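The comparison described in this card can be sketched by computing R² on both sets. The helper below is a from-scratch version named `r_squared` for this example, and the prediction values are invented to show the pattern; they do not come from a real model.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes y_true is not constant (otherwise SS_tot would be zero).
    """
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical predictions: perfect on the training set, poor on the test set.
train_score = r_squared([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0])
test_score = r_squared([1.0, 2.0, 3.0, 4.0], [3.0, 1.0, 4.0, 2.0])
print(train_score, test_score)  # 1.0 -1.0
```

A large gap between the two scores, with the training score much higher, is the overfitting pattern the card describes.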
Ridge regression is a linear regression model that controls complexity to avoid overfitting.
Group of answer choices
True
False
True
The two phases of supervised ML process: Training, ________.
PREDICTION (the notes list Phase 1: Learning and Phase 2: Prediction; VALIDATION / TESTING were the other guesses)
is about extracting knowledge from data
Machine Learning
It is a research field at the intersection of statistics, artificial intelligence, and computer science and is also known as predictive analytics or statistical learning
Machine Learning
A field of study concerned with giving computers the ability to learn without being explicitly programmed
Machine Learning
is a discipline of artificial intelligence (AI) that provides machines with the ability to automatically learn from data and past experiences while identifying patterns to make predictions with minimal human intervention
Machine Learning (ML)
Machine Learning (ML) is a discipline of _____ that provides machines with the ability to automatically learn from data and past experiences while identifying patterns to make predictions with minimal human intervention
artificial intelligence (AI)
is a study of learning algorithms. A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E
Machine learning (including deep learning)
Collection, preparation, and analysis of data
Data Science
Leverages AI/ML, research, industry expertise, and statistics to make business decisions
Data Science
Technology for machines to understand/interpret, learn, and make ‘intelligent’ decisions. Includes Machine Learning among many other fields
Artificial Intelligence
Algorithms that help machines improve through supervised, unsupervised, and reinforcement learning
Machine Learning
Subset of AI and Data Science tool
Machine Learning
Explicit programming is used to solve problems
Rules can be manually specified
Rule-based algorithms
Samples are used for training
The decision-making rules are complex or difficult to describe
Rules are automatically learned by machines
Machine Learning
Small Scale Simple Rule Complexity
Simple Problems
Large Scale Simple Rule Complexity
Rule-based algorithms
Small Scale Complex Rule Complexity
Manual Rules
Large Scale Complex Rule Complexity
Machine Learning Algorithms
enable computers to operate autonomously without explicit programming. ML applications are fed with new data, and they can independently learn, grow, develop, and adapt
Machine learning methods
adaptively improves with an increase in the number of available samples during the ‘learning’ process
performance of ML algorithms
______ can work 24/7 and don’t get tired, need breaks, call in sick, or go on strike
Computers and robots
Machines driven by algorithms designed by humans are able to learn ______ and inherent patterns and to fulfill tasks desired by humans
latent rules
______ are better suited than humans for tasks that are routine, repetitive, or tedious
Learning machines
______ can mitigate risks caused by fatigue or inattention
automation by machine learning
Types of Machine Learning
Supervised Machine Learning
Unsupervised Machine Learning
Semi-Supervised Learning
Reinforcement Learning
a collection of data used in machine learning tasks. Each data record is called a sample
Dataset
Events or attributes that reflect the performance or nature of a sample in a particular aspect are called ______
features
dataset used in the training process, where each sample is referred to as a training sample.
Training set
The process of creating a model from data is called _____
learning (training).
Testing refers to the process of using the model obtained after learning for prediction.
Test set
The dataset used for testing is called a _____, and each sample is called a _____
test set, test sample
(1) Project Setup
Understand the business goals
Choose the solution to your problem.
Speak with your stakeholders and deeply understand the business goal behind the model being proposed. A deep understanding of your business goals will help you scope the necessary technical solution, data sources to be collected, how to evaluate model performance, and more
Understand the business goals
Once you have a deep understanding of your problem - focus on which category of models drives the highest impact.
Choose the solution to your problem.
(2) Data Preparation
Data Collection
Data Cleaning
Feature Engineering
Split the data
Collect all the data you need for your models, whether from your own organization, public, or paid sources
Data Collection
Turn the messy raw data into clean, tidy data ready for analysis.
Data Cleaning
Manipulate the datasets to create variables (features) that improve your model’s prediction accuracy. Create the same features in both the training set and the testing set
Feature Engineering
Randomly divide the records in the dataset into a training set and a testing set. For a more reliable assessment of model performance, generate multiple training and testing sets using cross-validation
Split the data
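The split step can be sketched with the standard library alone. The helper names `split_records` and `k_fold_indices` are hypothetical and written for this example; in practice scikit-learn offers `train_test_split` and `KFold` in `sklearn.model_selection` for the same purpose.

```python
import random


def split_records(records, test_fraction=0.25, seed=0):
    """Randomly divide records into a training set and a testing set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


train, test = split_records(range(20))
print(len(train), len(test))  # 15 5

for train_idx, test_idx in k_fold_indices(6, k=3):
    print(train_idx, test_idx)
```

Cross-validation repeats the train and test cycle once per fold, so the performance estimate does not hinge on a single random split.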
(3) Modeling
Hyperparameter tuning
Train your models
Make predictions
Assess model performance
For each model, use ______ techniques to improve model performance.
Hyperparameter tuning
Fit each model to the training set
Train your models
Make predictions on the testing set
Make predictions
For each model, calculate performance metrics on the testing set such as accuracy, recall, and precision
Assess model performance
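The three metrics named in this card can be computed from counts of true positives, false positives, and false negatives. The sketch below assumes binary labels with `1` as the positive class; the function name and the label lists are made up for illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
accuracy, precision, recall = classification_metrics(y_true, y_pred)
print(accuracy)   # 6 correct out of 8 -> 0.75
print(precision)  # 2 true positives out of 3 predicted positives
print(recall)     # 2 true positives out of 3 actual positives
```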
(4) Deployment
Deploy the model
Monitor model performance
Improve your model
Embed the model you choose in dashboards, applications, or wherever you need it
Deploy the model
Regularly test the performance of your model as your data changes to avoid model drift
Monitor model performance
Continuously iterate and improve your model post-deployment. Replace your model with an updated version to improve performance
Improve your model
Phase 1: Learning
Preprocessing
Learning
Testing
Preprocessing:
Clean Data
Format Data
Learning:
Supervised
Unsupervised
Reinforcement
Testing:
Measure Performance
Test Algorithm
Phase 2: Prediction
New Data + Trained Model = Prediction -> Predicted Data
Machine Learning Languages
Python, R, C++
Big Data Tools
MemSQL
Apache Spark
General Machine Learning Frameworks
NumPy
Scikit-learn
NLTK
Data Analysis & Visualization Tools
Pandas
Matplotlib
Jupyter Notebook
Weka
Tableau
Machine Learning Frameworks for Neural Network Modeling
PyTorch
Keras
Caffe2
TensorFlow & TensorBoard
Top Programming Languages for ML
Python
R
Java
Julia
Scala
C++
JavaScript
Lisp
Haskell
Go
Why Python?
Easy-to-Read Syntax
Extensive Libraries and Frameworks
Strong Community Support
Flexibility
Compatibility with Other Languages
Scalability and Performance
Most popular ______ that are used in data analysis, data science, machine learning (ML), artificial intelligence (AI), natural language processing (NLP), deep learning, and by data scientists:
Python libraries
Top 10 Python Libraries
Pandas
Matplotlib
TensorFlow
SciPy
Scrapy
NumPy
Seaborn
Keras
PyTorch
SQLModel
A very popular tool and the most prominent Python library for ML
Scikit-learn
is one of the fundamental packages for scientific computing
NumPy
is a collection of functions for scientific computing
SciPy
is the primary scientific plotting library
Matplotlib
is a library for data wrangling and analysis
Pandas
A Python distribution made for large-scale data processing, predictive analysis, and scientific computing
Anaconda