Midterms - MLA Flashcards by Ralph Gamboa

Movie ratings and military rank are examples of:
Group of answer choices

Discrete data

Ordinal data

Continuous data

Nominal data

Choose all the most popular Python Libraries that are used in data science.
Group of answer choices

NUMPY

ANACONDA

SCIPY

JUPYTER

PANDAS

SQL

NUMPY

SCIPY

JUPYTER

PANDAS

ANACONDA

Which processes are involved in data preparation?
Group of answer choices

Not in the options

All the given options

Data Cleaning, Feature Engineering

Splitting of dataset

Data collection, Data Cleaning

All the given options

Continuous data is:
Group of answer choices

Qualitative

Quantitative

Temperature range is an example of:
Group of answer choices

Discrete data

Continuous data

Sorting out missing data is a data cleansing technique.
Group of answer choices

True

False

True

Based on the ML application table scenario, when rule complexity is simple and problem scale is large, ML application is:
Group of answer choices

ML Algorithms

Simple Problem

Manual Rules

Rule-based Algorithms

Machine Learning is a field of study concerned with giving computers the ability to ________ without being explicitly programmed.

LEARN

Nominal data is:
Group of answer choices

Quantitative

Qualitative

Which is not true about Machine Learning?
Group of answer choices

Their maintenance is much lower than a human’s and costs a lot less in the long run.

Enable computers to operate autonomously with explicit programming.

Machines driven by algorithms designed by humans are able to learn latent rules and inherent patterns and to fulfill tasks desired by humans.

Automation by machine learning can mitigate risks caused by fatigue or inattention.

Enable computers to operate autonomously with explicit programming.

Reducing noise in data is a feature engineering technique.
Group of answer choices

True

False

Rule-based algorithms: Condition

Machine Learning: _________.

MODEL

ML is a research field at the intersection of _________, artificial intelligence, and computer science.

STATISTICS

Data reduction is a data cleansing technique.
Group of answer choices

True

False

In EDA, this process identifies unusual data points. __________

OUTLIER DETECTION

Dataset is divided into _______ set and test set.

TRAINING

These concepts help to understand how well a model performs: Overfitting, Underfitting, _________.

GENERALIZATION

Logistic Regression is an example of a regression algorithm.

False

This refers to the error resulting from sensitivity to the noise in the training data.
Group of answer choices

Not in the options

Overfitting

Underfitting

Generalization

Not in the options

In supervised learning, market trend analysis is an example of:
Group of answer choices

Classification

Correlation

Prediction

Regression

When the model fits too closely to the training dataset.
Group of answer choices

Overfitting

Underfitting

Generalization

Canvas marks Generalization as correct, but the answer is really Overfitting.

The _____ refers to the error from having wrong / too simple assumptions in the learning algorithm.

BIAS

Classification algorithms address classification problems where the output variable is categorical.
Group of answer choices

True

False

True

There is a regression variant of the k-nearest neighbors algorithm.
Group of answer choices

True

False

True

In k-NN, High Model Complexity is: Group of answer choices Overfitting Underfitting

Overfitting

The ‘k’ in k-Nearest neighbors refers to the new closest data point. Group of answer choices True False

False

K-nearest neighbors make a prediction for a new data point by finding the data that match from the training dataset. Group of answer choices True False

False

In k-NN, High Model Complexity is underfitting. Group of answer choices True False

False

In k-NN, Euclidean distance is used as the distance measure by default. Group of answer choices True False

True

In k-NN, Low Model Complexity is: Group of answer choices Overfitting Underfitting

Underfitting

Linear models make a prediction using a linear function of the input features. Group of answer choices True False

True

Linear Regression is also known as Ordinary Least Squares. Group of answer choices True False

TRUE

The ________ is the sum of the squared differences between the predictions and the true values, divided by the number of samples. Group of answer choices Mean error Median error Total R Mean Squared Error Not in the options

Mean Squared Error

The ‘offset’ parameter is also called slope. Group of answer choices True False

False

Lasso uses L1 Regularization. Group of answer choices True False

True

In Ridge regression, if α (alpha) is smaller, the penalty becomes larger. Group of answer choices True False

False

Dichotomous classes means Yes or No. Group of answer choices True False

True

Its primary objective is to map the input variable with the output variable. Group of answer choices Unsupervised Learning Classification Correlation Supervised Learning

Supervised Learning

In k-NN, when you choose a small value of k (e.g., k=1), the model becomes more complex. Group of answer choices True False

True

Ridge is generally preferred over Lasso, but if you want a model that is easy to analyze and understand then use Lasso. Group of answer choices True False

True

When comparing training set and test set scores, we find that we predict very accurately on the training set, but the R2 on the test set is much worse. This is a sign of: Group of answer choices Underfitting Overfitting

Overfitting

Ridge regression is a linear regression model that controls complexity to avoid overfitting. Group of answer choices True False

True

The two phases of supervised ML process: Training, ________.

VALIDATION / TESTING? / PREDICTING?

is about extracting knowledge from data

Machine Learning

It is a research field at the intersection of statistics, artificial intelligence, and computer science and is also known as predictive analytics or statistical learning

Machine Learning

A field of study concerned with giving computers the ability to learn without being explicitly programmed

Machine Learning

is a discipline of artificial intelligence (AI) that provides machines with the ability to automatically learn from data and past experiences while identifying patterns to make predictions with minimal human intervention

Machine Learning (ML)

Machine Learning (ML) is a discipline of _____ that provides machines with the ability to automatically learn from data and past experiences while identifying patterns to make predictions with minimal human intervention

artificial intelligence (AI)

is a study of learning algorithms. A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E

Machine learning (including deep learning)

Collection, preparation, and analysis of data

Data Science

Leverages AI/ML, research, industry expertise, and statistics to make business decisions

Data Science

Technology for machines to understand/interpret, learn, and make ‘intelligent’ decisions. Includes Machine Learning among many other fields

Artificial Intelligence

Algorithms that help machines improve through supervised, unsupervised, and reinforcement learning

Machine Learning

Subset of AI and Data Science tool

Machine Learning

Explicit programming is used to solve problems. Rules can be manually specified.

Rule-based algorithms

Samples are used for training. The decision-making rules are complex or difficult to describe. Rules are automatically learned by machines.

Machine Learning

Small Scale Simple Rule Complexity

Simple Problems

Large Scale Simple Rule Complexity

Rule-based algorithms

Small Scale Complex Rule Complexity

Manual Rules

Large Scale Complex Rule Complexity

Machine Learning Algorithms

enable computers to operate autonomously without explicit programming. ML applications are fed with new data, and they can independently learn, grow, develop, and adapt

Machine learning methods

adaptively improves with an increase in the number of available samples during the ‘learning’ process

performance of ML algorithms

______ can work 24/7 and don’t get tired, need breaks, call in sick, or go on strike

Computers and robots

Machines driven by algorithms designed by humans are able to learn ______ and inherent patterns and to fulfill tasks desired by humans

latent rules

______ are better suited than humans for tasks that are routine, repetitive, or tedious

Learning machines

______ can mitigate risks caused by fatigue or inattention

automation by machine learning

Types of Machine Learning

Supervised Machine Learning Unsupervised Machine Learning Semi-Supervised Learning Reinforcement Learning

a collection of data used in machine learning tasks. Each data record is called a sample

Dataset

Events or attributes that reflect the performance or nature of a sample in a particular aspect are called ______

features

dataset used in the training process, where each sample is referred to as a training sample.

Training set

The process of creating a model from data is called _____

learning (training).

Testing refers to the process of using the model obtained after learning for prediction.

Test set

The dataset used is called a _____, and each sample is called a _____

test set, test sample

(1) Project Setup

Understand the business goals Choose the solution to your problem.

Speak with your stakeholders and deeply understand the business goal behind the model being proposed. A deep understanding of your business goals will help you scope the necessary technical solution, data sources to be collected, how to evaluate model performance, and more

Understand the business goals

Once you have a deep understanding of your problem - focus on which category of models drives the highest impact.

Choose the solution to your problem.

(2) Data Preparation

Data Collection Data Cleaning Feature Engineering Split the data

Collect all the data you need for your models, whether from your own organization, public, or paid sources

Data Collection

Turn the messy raw data into clean, tidy data ready for analysis.

Data Cleaning

Manipulate the datasets to create variables (features) that improve your model’s prediction accuracy. Create the same features in both the training set and the testing set

Feature Engineering

Randomly divide the records in the dataset into a training set and a testing set. For a more reliable assessment of model performance, generate multiple training and testing sets using cross-validation

Split the data
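
Note: the splitting step above can be sketched in plain Python. This is a minimal illustration that uses only the standard library; the helper names `split_dataset` and `k_fold_indices` and the 25% test fraction are assumptions made for this example, not part of the course material.

```python
import random

def split_dataset(records, test_fraction=0.25, seed=0):
    """Shuffle a copy of the records and split it into (train, test)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

records = list(range(20))
train, test = split_dataset(records, test_fraction=0.25, seed=42)
folds = list(k_fold_indices(len(records), k=5))
```

Each fold serves as the test set exactly once, which is what makes the cross-validated estimate more reliable than a single split.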

(3) Modeling

Hyperparameter tuning Train your models Make predictions Assess model performance

For each model, use ______ techniques to improve model performance.

Hyperparameter tuning

Fit each model to the training set

Train your models

Make predictions on the testing set

Make predictions

For each model, calculate performance metrics on the testing set such as accuracy, recall, and precision

Assess model performance
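
Note: the three metrics named above can be computed directly from the true and predicted labels. The sketch below assumes a binary problem with `1` as the positive class; the function names and the sample labels are made up for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)

def precision(y_true, y_pred):
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0  # of predicted positives, how many are right

def recall(y_true, y_pred):
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0  # of actual positives, how many were found

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
scores = (accuracy(y_true, y_pred), precision(y_true, y_pred), recall(y_true, y_pred))
```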

(4) Deployment

Deploy the model Monitor model performance Improve your model

Embed the model you choose in dashboards, applications, or wherever you need it

Deploy the model

Regularly test the performance of your model as your data changes to avoid model drift

Monitor model performance

Continuously iterate and improve your model post-deployment. Replace your model with an updated version to improve performance

Improve your model

Phase 1: Learning

Preprocessing Learning Testing

Preprocessing:

Clean Data Format Data

Learning:

Supervised Unsupervised Reinforcement

Testing:

Measure Performance Test Algorithm

Phase 2: Prediction

New Data + Trained Model = Prediction -> Predicted Data

Machine Learning Languages

Python R C++

Big Data Tools

MemSQL Apache Spark

General Machine Learning Frameworks

Numpy Scikit-learn NLTK

Data Analysis & Visualization Tools

Pandas Matplotlib Jupyter Notebook Weka Tableau

Machine Learning Frameworks for Neural Network Modeling

PyTorch Keras Caffe2 TensorFlow & TensorBoard

Top Programming Languages for ML

Python R Java Julia Scala C++ JavaScript Lisp Haskell Go

Why Python?

Easy-to-Read Syntax Extensive Libraries and Frameworks Strong Community Support Flexibility Compatibility with Other Languages Scalability and Performance

Most popular ______ that are used in data analysis, data science, machine learning (ML), artificial intelligence (AI), natural language processing (NLP), deep learning, and by data scientists:

Python libraries

Top 10 Python Libraries

Pandas Matplotlib Tensorflow SciPy Scrapy NumPy SeaBorn Keras Pytorch SQLModel

A very popular tool and the most prominent Python library for ML

Scikit-learn

is one of the fundamental packages for scientific computing

Numpy

is a collection of functions for scientific computing

Scipy

is the primary scientific plotting library

Matplotlib

is a library for data wrangling and analysis

Pandas

A Python distribution made for large-scale data processing, predictive analysis, and scientific computing

Anaconda

is an interactive environment for running code in the browser

Jupyter Notebook

Applications of Machine Learning

Manufacturing Healthcare E-commerce Automobile Insurance Transportation

credit scoring, algorithmic trading

Computational finance

facial recognition, motion tracking, object detection

Computer vision

DNA sequencing, brain tumor detection, drug discovery

Computational biology

predictive maintenance

Automotive, aerospace, and manufacturing

voice recognition

Natural language processing

contains missing values or the data that lacks attributes

Incompleteness

contains incorrect records or exceptions.

Noise

contains inconsistent records

Inconsistency

Without good data, there is no

good model

is an observation that seems to be distant from other observations or, more specifically, one observation that follows a different logic or generative process than the other observations

Outlier

is the practice of cleaning, altering, and reorganizing raw data prior to processing and analysis, which is also known as data preparation

Preprocessing

Preprocessing - is the practice of cleaning, altering, and reorganizing raw data prior to processing and analysis, which is also known as ______

data preparation

It is an important step before processing to _____

prepare data for analysis and modeling by cleaning and transforming it

Key steps in Data Preprocessing

Data Profiling Data Cleansing Data Reduction Data Transformation Data Enrichment Data Validation

Data Preprocessing Techniques

Data Cleansing Feature Engineering

Identify and sort out missing data Reduce noisy data Identify and remove duplicates

Data Cleansing

Involves techniques used by data scientists to organize the data in ways that make it more efficient to train data models and run inferences against them

Feature Engineering

Feature scaling or normalization Data reduction Discretization Feature encoding

Feature Engineering
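
Note: three of the techniques listed above (scaling, discretization, encoding) are easy to show on toy data. This is a rough sketch in plain Python; the function names, the bin edges, and the sample values are all assumptions chosen for illustration.

```python
def min_max_scale(values):
    """Feature scaling: map values linearly onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature, nothing to scale
    return [(v - lo) / (hi - lo) for v in values]

def discretize(values, bin_edges):
    """Discretization: return the index of the bin each value falls into."""
    return [sum(1 for edge in bin_edges if v >= edge) for v in values]

def one_hot_encode(categories):
    """Feature encoding: turn each category into a 0/1 indicator vector."""
    levels = sorted(set(categories))
    return levels, [[1 if c == level else 0 for level in levels] for c in categories]

ages = [18, 30, 42, 66]
scaled = min_max_scale(ages)
bins = discretize(ages, bin_edges=[25, 60])  # bins: under 25, 25 to 59, 60 and over
levels, encoded = one_hot_encode(["red", "green", "red", "blue"])
```

Whatever parameters are learned here (the minimum and maximum, the category levels) should come from the training set and then be reused on the test set, so that both sets get the same features.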

To understand the main characteristics of the data, discover patterns, spot anomalies, test a hypothesis, or check assumptions

Exploratory Data Analysis (EDA)

Data Visualization Methods

Visualization Summary Statistics Outlier Detection Correlation Analysis

Creating plots and charts to visualize data distributions and relationships

Visualization

Calculating measures like mean, median, variance, and standard deviation.

Summary Statistics

Identifying unusual data points

Outlier Detection
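
Note: summary statistics and a simple outlier check can both be done with Python's standard `statistics` module. The 1.5 × IQR rule used below is one common convention and is an assumption here, since the cards do not name a specific detection method; the helper names and data are illustrative.

```python
import statistics

def summarize(values):
    """Summary statistics: mean, median, sample variance, sample std. deviation."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "variance": statistics.variance(values),
        "stdev": statistics.stdev(values),
    }

def iqr_outliers(values, k=1.5):
    """Flag values more than k * IQR below the first or above the third quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

data = [10, 12, 11, 13, 12, 11, 95]
summary = summarize(data)
outliers = iqr_outliers(data)
```

The single extreme value pulls the mean well above the median, which is a quick hint that an outlier is present even before running the check.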

Examining relationships between variables

Correlation Analysis

Testing initial assumptions about the data

Hypothesis Testing

are useful for visualizing the “count” of values in the data set

Bar plots and Histograms

Machine Learning Model Deployment

Training Validation Deployment Monitoring

refers to the process of taking a trained ML model and making it available for use in real-world applications

Machine Learning Model Deployment

Before deployment, models need to be thoroughly trained and evaluated. This involves data preprocessing, feature engineering, and rigorous testing to ensure the model is robust and ready for real-world scenarios

Training

ML models should be able to handle increased loads and continue to deliver results efficiently. Ensuring the infrastructure can handle the model’s computational requirements is vital, requiring validation and effective testing for scalability before deploying models

Validation

Model deployment is the most crucial process of integrating the ML model into its production environment.

Deployment

Deployment process entails:

Defining how to extract or process the data in real time
Determining the storage required for these processes
Collection and predictions of model and data patterns
Setting up APIs, tools, and other software environments to support and improve predictions
Configuring the hardware (cloud or on-prem environments) to help support the ML model
Creating a pipeline for continuous training and parameter tuning

This process is the most challenging, involving several moving pieces, tools, data scientists, and ML engineers to collaborate and strategize

Deployment

Once deployed, models need to be continuously _____

monitored.

Real world data can evolve, and models may drift in their performance.

Monitoring

Implementing ______ systems to help to detect deviations and make necessary adjustments in a timely manner

monitoring

Best Practices for Successful ML Model Deployment

Choosing the Right Infrastructure Effective Versioning and Tracking Robust Testing and Validation Implementing Monitoring and Alerting

covers the ethical and moral obligations of collecting, sharing, and using data, focused on ensuring that data is used fairly, for good

Data Ethics

5 Principles of Data Ethics

Ownership Transparency Privacy Intention Outcomes

the first principle of data ethics is that an individual has ownership over their personal information. Just as it’s considered stealing to take an item that doesn’t belong to you, it’s unlawful and unethical to collect someone’s personal data without their consent

Ownership

In addition to owning their personal information, data subjects have a right to know how you plan to collect, store, and use it. When gathering data, exercise ______

transparency

Another ethical responsibility that comes with handling data is ensuring data subjects’ _____. Even if a customer gives your company consent to collect, store, and analyze their personally identifiable information (PII), that doesn’t mean they want it publicly available

privacy

Before collecting data, ask yourself why you need it, what you’ll gain from it, and what changes you’ll be able to make after analysis. If your intention is to hurt others, profit from your subjects’ weaknesses, or any other malicious goal, it’s not ethical to collect their data

Intention

even when intentions are good, the outcome of data analysis can cause inadvertent harm to individuals or groups of people.

Outcomes

the outcome of data analysis can cause inadvertent harm to individuals or groups of people. This is called a ______

disparate impact

Data Privacy Regulation (New Rules of Data)

Rule 1: Trust over Transactions Rule 2: Insight over Identity Rule 3: Flows over silos

This first rule is all about consent. Until now, companies have been gathering as much data as possible on their current and prospective customers’ preferences, habits, and identities, transaction by transaction, often without customers understanding what is happening

Rule 1: Trust over Transactions

Firms need to re-think not only how they acquire data from their customers but from each other as well. Currently, companies routinely transfer large amounts of personally identifiable information (PII) through a complex web of data agreements, compromising both privacy and security

Rule 2: Insight over Identity

New organizing principle for internal data teams. Once all your customer data has meaningful consent and you are acquiring insight without transferring data, CIOs and CDOs no longer need to work in silos, with one trying to keep data locked up while the other is trying to break it out. Instead, CIOs and CDOs can work together to facilitate the flow of insights

Rule 3: Flows over silos

Data Subject Rights

Right to be Informed
Right to Damages
Right to Access
Right to Erasure or Blocking
Right to File a Complaint
Right to Object
Right to Rectify
Right to Data Portability

is a set of principles and processes for data collection, management, and use. The goal is to ensure that data is accurate, consistent, and available for use, while protecting data privacy and security

Data Governance

is a set of policies, procedures, and standards that implements data governance for an organization.

Data Governance Framework

The Pillars of Data Governance

Ownership & Accountability Data Quality Data Protection & Safety Data use & Availability Data Management

10 Questions to Answer before using AI in Public Sector Algorithmic Decision Making

Objective Use Impacts Assumptions Data Inputs Mitigation Ethics Oversight Evaluation

why is the algorithm needed and what outcomes is it intended to enable

Objective

In what processes and circumstances is the algorithm appropriate to be used?

Use

what impacts - good and bad - could the use of the algorithm have on people?

Impacts

what assumptions is the algorithm based on and what are their limitations and potential biases?

Assumptions

what datasets is/was the algorithm trained on and what are their limitations and potential biases?

Data

what new data does the algorithm use when making decisions?

Inputs

what actions have been taken to mitigate the negative impacts that could result from the algorithm’s limitations and potential biases?

Mitigation

what assessment has been made of the ethics of using this algorithm?

Ethics

what human judgement is needed before acting on the algorithm’s output and who is responsible for ensuring its proper use?

Oversight

how, and by what criteria, will the effectiveness of the algorithm be assessed, and by whom?

Evaluation

Each example in the dataset is a pair consisting of an input object (such as a _____) and a desired output value (____).

feature vector, label

The primary objective of the supervised learning technique is to ______

map the input variable with the output variable

Supervised machine learning is further classified into two broad categories:

Regression Classification

Regression: target is a _____ variable

continuous

Regression Examples

Forecasting future stock price
Forecasting energy resources
Weather prediction
Market trend analysis
Predicting the environmental impact of pollutants

Classification: target is a ____ variable

categorical

Classification Examples

Classifying objects in images
Classifying chest X-ray images into COVID positive/negative
Handwritten digit recognition
Filtering emails into spam or not spam
Activity recognition for wearable devices

Refer to algorithms that address classification problems where the output variable is categorical; for example, yes or no, true or false, male or female.

Classification

Predicts one of the possible class labels

Classification

classification of two classes (yes/no, negative/positive, 0/1)

Binary Classification

classification of three or more classes

Multiple Classification

Classification algorithms include:

Random Forest Algorithm Decision Tree Algorithm Logistic Regression Algorithm Support Vector Machine Algorithm

_____ algorithms handle _____ problems where input and output variables have a linear relationship

Regression

Regression algorithms include:

Simple Linear Regression Algorithm, Multivariate Regression Algorithm, Decision Tree Algorithm, and Lasso Regression

As with any ML process, supervised ML has two phases: the usual ____ and _____, followed by _____

training, validation, prediction

The larger the variety of data points your dataset contains, the more complex a model you can use without ____

overfitting

how well a model performs:

Generalization Overfitting Underfitting

If a model is able to make accurate predictions on unseen data, we say it is able to _____ from the training set to the test set

generalize

Occurs when a model learns the training data too well, including its noise and outliers

Overfitting

occurs when you fit a model too closely to the particularities of the training set and obtain a model that works well on the training set but is not able to generalize to new data

Overfitting

performs exceptionally well on training data but poorly on new, unseen data because it has essentially memorized the training data rather than learning the underlying patterns

overfitted model

If your model is too simple then you might not be able to capture all the aspects and variability in the data, and your model will do badly even on the training set. Choosing too simple a model is called underfitting

underfitting

performs poorly on both training and new data because it hasn’t learned enough from the training data

underfitted

The more complex we allow our model to be, the better we will be able to predict on the training data

Model Complexity Curve

error from having wrong / too simple assumptions in the learning algorithm

Bias

error resulting from sensitivity to the noise / fluctuations in the training data

Variance

Low Bias and Low Variance = ?

Good Model

the k-NN algorithm is arguably the simplest machine learning algorithm.

k-Nearest Neighbors

Building the model consists only of storing the training dataset.

k-Nearest Neighbors

To make a prediction for a new data point, the algorithm finds the closest data points in the training dataset - its _______

“nearest neighbors”

in its simplest version, the k-NN algorithm only considers exactly one nearest neighbor, which is the closest training data point to the point we want to make a prediction for

k-Neighbors classification

Instead of considering only the closest neighbor, we can also consider an _______. This is where the name of the k-nearest neighbors algorithm comes from

arbitrary number, k, of neighbors
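
Note: the k-NN cards above can be tied together with a small from-scratch sketch. It is not scikit-learn's implementation; `knn_predict`, `knn_regress`, and the toy data are hypothetical names and values chosen for this example. "Training" is just keeping the data, and a prediction comes from the k closest points by Euclidean distance.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(X_train, y_train, query, k):
    """Return the targets of the k training points closest to the query."""
    ranked = sorted(zip(X_train, y_train), key=lambda pair: euclidean(pair[0], query))
    return [target for _, target in ranked[:k]]

def knn_predict(X_train, y_train, query, k=3):
    """Classification: majority vote among the k nearest neighbors."""
    return Counter(nearest(X_train, y_train, query, k)).most_common(1)[0][0]

def knn_regress(X_train, y_train, query, k=3):
    """Regression variant: average the targets of the k nearest neighbors."""
    targets = nearest(X_train, y_train, query, k)
    return sum(targets) / len(targets)

X_train = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.0), (6.5, 5.5)]
y_train = ["a", "a", "a", "b", "b", "b"]
pred_k1 = knn_predict(X_train, y_train, (2.0, 2.0), k=1)
pred_k3 = knn_predict(X_train, y_train, (6.0, 5.0), k=3)
reg = knn_regress([(1.0,), (2.0,), (3.0,), (10.0,)], [1.0, 2.0, 3.0, 10.0], (2.1,), k=3)
```

With k=1 the prediction follows a single training point, which is the high-complexity end of the scale; larger k smooths the decision over more neighbors.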

There is also a regression variant of the _____

k-nearest neighbors algorithm.

The k-nearest neighbors algorithm for regression is implemented in the KNeighborsRegressor class in scikit-learn. It’s used similarly to KNeighborsClassifier.

k-NN Estimator

_______, also known as the coefficient of determination, is a measure of goodness of a prediction for a regression model, and yields a score that is usually between 0 and 1.

The R-squared score (R^2)

A value of 1 corresponds to the perfect prediction, and a value of 0 corresponds to a constant model that just predicts the mean of the training set responses, y_train:

The R-squared score (R^2)

The regression model’s score() function returns the coefficient of determination, R^2.

Estimation of the Regression Model

Perfect prediction: target value == prediction -> numerator (sum of squared errors) == 0

R^2 = 1

Predicting the mean of the target values: numerator == denominator

R^2 = 0

Predicting worse than the average can result in

negative numbers
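
Note: the three R^2 cases above can be checked numerically. The sketch assumes the usual definition R^2 = 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the targets from their mean; the function name and data are illustrative.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # numerator
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # denominator
    return 1 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]
perfect = r2_score(y_true, y_true)                  # numerator is 0
mean_only = r2_score(y_true, [2.5] * 4)             # numerator equals denominator
worse = r2_score(y_true, [4.0, 3.0, 2.0, 1.0])      # worse than predicting the mean
```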

Two important parameters to the KNeighbors classifier:

The number of neighbors and how you measure distance between data points

By default, _____ is used as the distance measure

Euclidean distance

Strengths/Advantages of KNN

Easy to understand Works well without any special adjustments Suitable as a first model to try

Weaknesses/Disadvantages of KNN

If the number of features or samples is large, the prediction is slow and data preprocessing is important. Does not work well with sparse datasets

generate a formula to create a best-fit line to predict unknown values

Linear models

make a prediction using a linear function of the input features

Linear models

They are called _____ because they assume there is a ___ relationship between the outcome variable and each of its predictors

linear

several real-life scenarios follow linear relations between dependent and independent variables.

Application of Linear Models

Application of Linear Models Example

The relationship between the boiling point of water and change in altitude
The relationship between spending on advertising and the revenue of an organization
The relationship between the amount of fertilizer used and crop yields
Performance of athletes and their training regimen

Types of Linear Models

Linear Regression Logistic Regression

The algorithm is used for solving regression problems

Linear Regression

Final output of the model is numeric value (numerical predictions).

Linear Regression

The algorithm maps a linear relationship between the input features(X) and the output (y)

Linear Regression

Linear model for classification problems

Logistic Regression

It generates a probability between 0 and 1. This happens by fitting a logistic function, also known as the sigmoid function.

Logistic Regression

Logistic Regression generates a probability between 0 and 1. This happens by fitting a logistic function, also known as the _____. The function first transforms the linear model's output into a value between 0 and 1. After that, a predefined threshold determines which class the output is assigned to

sigmoid function
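
Note: the two steps described above (squash the linear output, then apply a threshold) look like this in code. The weights, bias, and the 0.5 threshold are made-up values for illustration, and `predict_class` is a hypothetical helper name.

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_class(features, weights, bias, threshold=0.5):
    """Linear score -> probability via sigmoid -> class label via threshold."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    probability = sigmoid(z)
    return probability, int(probability >= threshold)

# Linear score here is 1.5 * 2.0 - 0.5 * 1.0 - 1.0 = 1.5
probability, label = predict_class([2.0, 1.0], weights=[1.5, -0.5], bias=-1.0)
```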

is the simplest and most classic linear method for regression

Linear Regression (aka Ordinary Least Squares)

Linear regression finds the parameters w and b that minimize the _____ between predictions and the true regression targets, y, on the training set.

mean squared error

The ______ is the sum of the squared differences between the predictions and the true values, divided by the number of samples.

mean squared error

The “slope” parameters (w), also called _______, are stored in the coef_ attribute,

weights or coefficients

the offset or ______ is stored in the intercept_ attribute:

intercept (b)
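
Note: for a single input feature, the w and b that minimize the mean squared error have a closed form, which makes the roles of slope and intercept easy to see. This is a from-scratch sketch rather than scikit-learn's LinearRegression; the function names and toy data are assumptions.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares for one feature: returns (slope w, intercept b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    w = sxy / sxx               # slope (weight / coefficient)
    b = mean_y - w * mean_x     # offset (intercept)
    return w, b

def mean_squared_error(y_true, y_pred):
    """Sum of squared differences divided by the number of samples."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

x = [1.0, 2.0, 3.0, 4.0]
y = [3.1, 4.9, 7.2, 8.8]
w, b = fit_simple_linear_regression(x, y)
train_mse = mean_squared_error(y, [w * xi + b for xi in x])
```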

a model that allows us to control complexity. One of the most commonly used alternatives to standard linear regression is ____

ridge regression

is also a linear model for regression, so the formula it uses to make predictions is the same one used for OLS

Ridge Regression

Each feature should have as little effect on the outcome as possible (which translates to having a small slope), while still predicting well. This constraint is an example of what is called ______

regularization.

Regularization means explicitly restricting a model to avoid _____

overfitting.

The particular kind of Regularization used by ridge regression is known as

L2 regularization

Ridge regression is implemented in the Ridge class of scikit-learn's ___ module.

linear_model

a higher alpha means a more restricted model, so we expect the entries of coef_ to have smaller magnitude for a high value of alpha than for a low value of alpha

Ridge Coef

a higher alpha means ______, so we expect the entries of coef_ to have smaller magnitude for a high value of alpha than for a low value of alpha

a more restricted model
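
Worked example for the alpha cards: a sketch that fits `Ridge` with three alpha values on synthetic data and compares the size of `coef_`. The data, the "true" weights, and the alpha values are assumptions made up for the demonstration.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.normal(size=(50, 5))
true_w = np.array([4.0, -3.0, 2.0, 0.5, -1.0])  # hypothetical weights
y = X @ true_w + rng.normal(scale=0.5, size=50)

# Overall magnitude (L2 norm) of the learned coefficients per alpha.
norms = {}
for alpha in (0.1, 10.0, 1000.0):
    ridge = Ridge(alpha=alpha).fit(X, y)
    norms[alpha] = np.linalg.norm(ridge.coef_)
    print(f"alpha={alpha:>7}: ||coef_|| = {norms[alpha]:.3f}")
```

The printed norm shrinks as alpha grows, which is the "more restricted model" the card refers to.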

plots that show model performance as a function of dataset size are called _____

learning curves
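
Worked example for this card: a sketch that computes the numbers behind a learning curve with scikit-learn's `learning_curve` helper. The synthetic data, the choice of `Ridge`, and the training-set fractions are assumptions; plotting is left out to keep the example short.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import learning_curve

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=1.0, size=100)

# With cv=5, each split trains on at most 80 of the 100 samples.
train_sizes, train_scores, test_scores = learning_curve(
    Ridge(alpha=1.0), X, y, train_sizes=[0.2, 0.4, 0.6, 0.8, 1.0], cv=5
)

# Average the per-fold R^2 scores for each training-set size.
for n, tr, te in zip(train_sizes, train_scores.mean(axis=1), test_scores.mean(axis=1)):
    print(f"{int(n):3d} samples: train R^2 = {tr:.3f}, test R^2 = {te:.3f}")
```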

An alternative to Ridge for regularizing linear regression is _____

Lasso

As with ridge regression, using the lasso also restricts coefficients to be close to zero, but in a slightly different way, called _____

L1 regularization
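
Side note for the L1 and L2 cards: the two penalties differ only in how they measure the size of the weight vector. The objectives below are a sketch assuming the formulation used by scikit-learn's Ridge and Lasso as I understand it (n is the number of training samples, alpha the regularization strength); other textbooks scale the terms differently.

```latex
% Ridge (L2): penalize the sum of squared weights
\min_{w,\,b} \; \sum_{i=1}^{n} \bigl(y_i - (w \cdot x_i + b)\bigr)^2 + \alpha \sum_{j} w_j^2

% Lasso (L1): penalize the sum of absolute weights
\min_{w,\,b} \; \frac{1}{2n} \sum_{i=1}^{n} \bigl(y_i - (w \cdot x_i + b)\bigr)^2 + \alpha \sum_{j} \lvert w_j \rvert
```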

The consequence of L1 is that when using lasso, some coefficients are exactly zero. This means some features are ______ by the model

entirely ignored
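
Worked example for this card: a sketch showing Lasso setting coefficients to exactly zero. The synthetic target depends on only the first two of ten features, and `alpha=0.5` is an assumed value picked for the demonstration.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 10))
# Only features 0 and 1 influence the target; the other eight are irrelevant.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

lasso = Lasso(alpha=0.5).fit(X, y)

n_used = int(np.sum(lasso.coef_ != 0))
print("coefficients:", np.round(lasso.coef_, 3))
print("features used:", n_used, "of", X.shape[1])
```

Features whose coefficient is exactly zero play no part in the prediction, so Lasso doubles as a form of automatic feature selection.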

A ____ allowed us to fit a more complex model, which worked better on both the training and the test data.

lower alpha

If only a few of the many features are expected to be important, ____

Lasso

When you want a model that is easy to analyze and understand, ___

Lasso

The most common linear classification algorithms are:

Logistic Regression and Linear Support Vector Machines

LinearSVC =

support vector classifier

Despite its name, ____ is a classification algorithm and not a regression algorithm, and it should not be confused with LinearRegression

LogisticRegression

For LogisticRegression and LinearSVC, the trade-off parameter that determines the strength of the regularization is called ____ and higher values of __ correspond to ______

C, C, less regularization.

In other words, when you use a ____ for the parameter C, LogisticRegression and LinearSVC try to fit the training set as best as possible, while with ____ of the parameter C, the models put more emphasis on finding a _____ that is close to zero

high value, low values, coefficient vector (w)

Using ____ of C will cause the algorithms to try to adjust to the “majority” of data points

low values

using a _____ of C stresses the importance that each individual data point be classified correctly

higher value
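
Worked example for the C cards: a sketch that fits `LogisticRegression` with several values of C on two overlapping synthetic classes and compares the size of the coefficient vector. The data and the C values are assumptions for illustration; `LinearSVC` accepts a `C` parameter in the same way.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
# Two overlapping 2-D point clouds, 50 samples per class.
X = np.vstack([
    rng.normal(loc=-1.0, size=(50, 2)),
    rng.normal(loc=1.0, size=(50, 2)),
])
y = np.array([0] * 50 + [1] * 50)

coef_norms = {}
for C in (0.01, 1.0, 100.0):
    clf = LogisticRegression(C=C, max_iter=1000).fit(X, y)
    coef_norms[C] = np.linalg.norm(clf.coef_)
    print(f"C={C:>6}: ||w|| = {coef_norms[C]:.3f}, train accuracy = {clf.score(X, y):.2f}")
```

A low C keeps w close to zero (strong regularization); a high C lets the weights grow so the model can fit individual training points more closely.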

are a family of classifiers that are quite similar to the linear models

Naive Bayes classifiers

Training is faster than for linear classifiers

Naive Bayes Classifier

Generalization performance is slightly worse than that of linear classifiers

Naive Bayes Classifier

The reason that ______ are so efficient is that they learn parameters by looking at each feature individually and collect simple per-class statistics from each feature

naive Bayes models
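
Worked example for this card: a sketch of the "simple per-class statistics" idea for binary features, counting how often each feature is nonzero within each class. The tiny dataset is made up for illustration.

```python
import numpy as np

# Four samples with four binary features each, and their class labels.
X = np.array([[0, 1, 0, 1],
              [1, 0, 1, 1],
              [0, 0, 0, 1],
              [1, 0, 1, 0]])
y = np.array([0, 1, 0, 1])

# For each class, count the nonzero entries per feature (one pass over the data).
counts = {}
for label in np.unique(y):
    counts[int(label)] = X[y == label].sum(axis=0)

print(counts)
```

Each feature is summarized independently of the others, which is why training needs only a single pass over the data.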

Three kinds of naive Bayes classifiers in scikit-learn:

GaussianNB, BernoulliNB, MultinomialNB

Continuous data: which NB?

GaussianNB

Binary data (often text data): which NB?

BernoulliNB

Integer count data (often text data): which NB?

MultinomialNB
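
Worked example for the three cards above: a sketch that pairs each scikit-learn naive Bayes class with the kind of data it assumes. All three datasets are synthetic and exist only to show which input type goes with which class.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

rng = np.random.RandomState(0)
y = np.array([0] * 20 + [1] * 20)

# Continuous features -> GaussianNB
X_cont = np.vstack([rng.normal(0.0, 1.0, size=(20, 3)),
                    rng.normal(3.0, 1.0, size=(20, 3))])

# Integer counts (e.g. word counts) -> MultinomialNB
X_counts = np.vstack([rng.poisson([5, 1, 1], size=(20, 3)),
                      rng.poisson([1, 1, 5], size=(20, 3))])

# Binary presence/absence features -> BernoulliNB
X_bin = (X_counts > 0).astype(int)

scores = {
    "GaussianNB": GaussianNB().fit(X_cont, y).score(X_cont, y),
    "MultinomialNB": MultinomialNB().fit(X_counts, y).score(X_counts, y),
    "BernoulliNB": BernoulliNB().fit(X_bin, y).score(X_bin, y),
}
for name, score in scores.items():
    print(f"{name}: training accuracy = {score:.2f}")
```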

Control model complexity with alpha parameter

Naive Bayes Classifier

Smooths the statistics by adding alpha virtual data points that have positive values for all features

Naive Bayes Classifier

A large alpha decreases the complexity of the model, but performance is relatively robust to the setting of alpha

Naive Bayes Classifier
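
Worked example for the alpha cards: a sketch of additive smoothing on per-class feature counts, assuming the usual formula (count + alpha) / (total + alpha * number of features). The counts are made up, and `smoothed_probabilities` is a hypothetical helper, not a scikit-learn function.

```python
def smoothed_probabilities(counts, alpha):
    """Additive smoothing: pretend every feature was seen alpha extra times."""
    total = sum(counts) + alpha * len(counts)
    return [(c + alpha) / total for c in counts]


counts = [0, 1, 0, 2]  # feature counts observed for one class (illustrative)

for alpha in (0.01, 1.0, 100.0):
    probs = smoothed_probabilities(counts, alpha)
    spread = max(probs) - min(probs)
    print(f"alpha={alpha:>6}: {[round(p, 3) for p in probs]} (spread {spread:.3f})")
```

With a small alpha the estimates follow the raw counts closely; with a large alpha they are pulled toward a uniform distribution, which is the "less complex model" the card describes.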

_____ is mostly used on very high-dimensional data

GaussianNB

_____ and MultinomialNB are widely used for sparse count data such as text.

BernoulliNB

BernoulliNB and _____ are widely used for sparse count data such as text.

MultinomialNB

______ and _____ are widely used for sparse count data such as text.

BernoulliNB, MultinomialNB

Training and testing are fast and easy to understand and process

Naive Bayes Classifier

Works well with sparse high-dimensional datasets and is relatively insensitive to parameter settings

Naive Bayes Classifier

Naive Bayes Classifier Strengths, Weaknesses, and Parameters

Control model complexity with the alpha parameter.
Alpha smooths the statistics by adding alpha virtual data points that have positive values for all features.
A large alpha decreases the complexity of the model, but performance is relatively robust to the setting of alpha.
GaussianNB is mostly used on very high-dimensional data.
BernoulliNB and MultinomialNB are widely used for sparse count data such as text.
Training and testing are fast, and the training procedure is easy to understand.
Works well with sparse high-dimensional datasets and is relatively insensitive to parameter settings.