Final Exam Flashcards by Moriah Frazier

An _____ is something that perceives and acts in an environment

Agent

The _____ for an agent specifies the action taken by the agent in response to any percept sequence

Agent Function

The _____ evaluates the behavior of the agent in an environment

Performance Measure

A ______ acts so as to maximize the expected value of the performance measure, given the percept sequence it has so far

Rational Agent

A ______ specification includes the performance measure, the external environment, the actuators, and the sensors

Task Environment

What should be the first step when designing an agent?

Specify the task environment

The _____ implements the agent function. The design also depends on the nature of the environment.

Agent Program

_______ respond directly to percepts

Simple reflex agents
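
As a rough illustration of "responding directly to percepts", here is a minimal sketch of a simple reflex agent. The two-square vacuum world (locations "A" and "B", statuses "dirty" and "clean") is a hypothetical example chosen for illustration and is not part of these cards.

```python
# Hypothetical two-square vacuum world, used only for illustration.
def simple_reflex_agent(percept):
    """Map the current percept straight to an action; no memory is kept."""
    location, status = percept
    if status == "dirty":
        return "suck"
    # Nothing to clean here, so move to the other square.
    return "right" if location == "A" else "left"


for percept in [("A", "dirty"), ("A", "clean"), ("B", "clean")]:
    print(percept, "->", simple_reflex_agent(percept))
```

Because the action depends only on the current percept, the agent cannot track anything it is not sensing right now; that is the gap a model-based reflex agent fills with internal state.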

_______ maintain internal state to track aspects of the world that are not evident in the current percept

Model-based reflex agents

______ act to achieve their goals

Goal-based agents

______ try to maximize their own expected “happiness”

Utility-based agents

All agents can improve their performance through _____

Learning

PEAS =

Task Environment:
Performance
Environment
Actuators
Sensors

______ extracts general rules from single examples by explaining the examples and generalizing the explanation

Explanation Based Learning (EBL)

______ uses prior knowledge in the form of determinations to identify the relevant attributes, thereby generating a reduced hypothesis space and speeding up learning

Relevance Based Learning (RBL)

_______ finds inductive hypotheses that explain sets of observations with the help of background knowledge

Knowledge Based Inductive Learning (KBIL)

________ techniques perform KBIL on knowledge that is expressed in first order logic

Inductive Logic Programming (ILP)

An _______ is a relationship of the form Hypothesis ∧ Descriptions ⊨ Classifications, in which the hypothesis is the “unknown”

Entailment Constraint

____ is the field of study that gives computers the ability to learn without being explicitly programmed

Machine Learning

The examples that the system uses to learn are called the ______

Training Set

The training set you feed to the algorithm includes the desired solutions, called labels

Supervised Learning

The training data is unlabeled. The system tries to learn without a teacher

Unsupervised Learning

The training data is partially labeled

Semi-Supervised Learning

A fully labeled dataset is generated from a fully unlabeled one, and the model is then trained on the generated labels

Self-Supervised Learning

ML training is done on all of the available data at once, typically offline

Batch Learning

ML training is done continuously over time

Online Learning

Learns by comparing new data to properties of old data. Classifies based on similarity

Instance Based ML

Generalization (prediction) is done by "exercising" that model on new data

Model-Based ML

- Insufficient Volume of Training Data
- Low Quality Data
- "Bad" Features

Challenges in ML

What is the Cardinal Rule?

Never use test data for anything but final model testing

Train the ML agent on the training data, then test the effectiveness of the agent's learning on the test data

Test and Validation

A portion of the training data held out to evaluate candidate models, make adjustments, and retrain

Validation Set
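
To make the training/validation/test roles concrete, here is a minimal sketch of a three-way split. The function name `split_dataset` and the 60/20/20 proportions are assumptions for illustration, not something these cards prescribe.

```python
import random


def split_dataset(examples, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle the examples and split them into train, validation, and test."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]                  # touched only for final testing
    val = shuffled[n_test:n_test + n_val]     # used to adjust and compare models
    train = shuffled[n_test + n_val:]         # used to fit the model
    return train, val, test


train, val, test = split_dataset(range(10))
print(len(train), len(val), len(test))
```

The test slice is set aside first and never consulted while tuning, which is the point of the Cardinal Rule.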

_____ is when a model fails to perform well even on the training data

Underfitting

_____ is when ML performs well on training data, but poorly on test data

Overfitting

- Determine or predict an outcome (leaf node) based on a set of inputs
- Each (non-leaf) node is a test of one of the inputs

Decision Tree Learning

The goal of _______ is to:
- Construct the shallowest tree possible
- Arrive at a decision using the fewest number of features
- Deeper tree = more time, more features

Decision Tree Learning

____ is the degree of uncertainty in information

Entropy

_____ is the reduction in entropy achieved by splitting on a variable in the decision tree

Information Gain
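
Entropy and information gain can be sketched in a few lines. This assumes Shannon entropy in bits (log base 2) and a split given as a list of child label lists; the toy "yes"/"no" labels are made up for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = ["yes", "yes", "no", "no"]
perfect_split = [["yes", "yes"], ["no", "no"]]
useless_split = [["yes", "no"], ["yes", "no"]]
print(information_gain(parent, perfect_split))
print(information_gain(parent, useless_split))
```

A split that separates the classes completely recovers all of the parent's entropy, while a split that leaves each child as mixed as the parent gains nothing.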

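_____ measures how mixed the classes in a node are: the probability that a randomly chosen sample would be mislabeled if it were labeled at random according to the node's class distribution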

Gini Impurity

CART

Classification and Regression Tree Algorithm

The methods and algorithms by which an agent learns a function that maps inputs to a numeric output are called ______

Regression Learning

_____ is an estimate of the true function

Hypothesis

____ is a set of possible hypotheses

Hypothesis Space

A function to define a measure of error in the agent's hypothesis

Error/Loss/Cost Function

_____ is iteratively adjusting the parameters of the hypothesis, in the direction that reduces the cost function, until the error approaches a minimum (converges)

Gradient Descent
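
A minimal sketch of gradient descent, assuming a one-feature linear hypothesis h(x) = w*x + b and mean squared error as the cost function. The learning rate, step count, and toy data are arbitrary choices for illustration.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit h(x) = w*x + b by repeatedly stepping against the MSE gradient."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        w -= learning_rate * grad_w   # move each parameter downhill
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1.
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]
w, b = gradient_descent(xs, ys)
print(round(w, 3), round(b, 3))
```

If the learning rate is too large the updates overshoot and the error grows instead of converging; too small and convergence takes many more steps.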

Adjusting or constraining models to fit better or generalize better (NOT ADJUSTING THE DATA)

Regularization

For an ML agent to find a good model that does not overfit and generalizes well, it finds a simple model by ___________

Penalizing overly complex models

L1 regularization with regression learning is called _______

LASSO Regression

What does LASSO stand for?

Least Absolute Shrinkage and Selection Operator

- Complexity term = alpha * (sum of the squared values of the feature coefficients)
- Good for models that don't have a lot of features

Ridge Regression

- Complexity term = combination of LASSO Regression and Ridge Regression
- Good for models with a lot of features, and when there are more features than training cases

Elastic Net Regression
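
The three complexity terms can be compared side by side. This is a simplified sketch: the mixing parameter `l1_ratio` and the plain weighted sum are assumptions for illustration, and real libraries may scale the terms differently.

```python
def l1_penalty(coefs, alpha):
    """LASSO-style term: alpha times the sum of absolute coefficients."""
    return alpha * sum(abs(c) for c in coefs)


def l2_penalty(coefs, alpha):
    """Ridge-style term: alpha times the sum of squared coefficients."""
    return alpha * sum(c * c for c in coefs)


def elastic_net_penalty(coefs, alpha, l1_ratio=0.5):
    """A weighted mix of the L1 and L2 terms."""
    return (l1_ratio * l1_penalty(coefs, alpha)
            + (1 - l1_ratio) * l2_penalty(coefs, alpha))


coefs = [3.0, -4.0]
print(round(l1_penalty(coefs, 0.1), 6))
print(round(l2_penalty(coefs, 0.1), 6))
print(round(elastic_net_penalty(coefs, 0.1), 6))
```

The penalty is added to the loss, so larger coefficients cost more; the L1 term tends to push some coefficients to exactly zero, which is why LASSO also acts as feature selection.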

An agent attempts to define a hypothesis function that approximates the true function giving the probability that an instance belongs to a class

Logistic Regression Learning

A type of Unsupervised ML: Reducing the number of dimensions (features) in a dataset

Dimensionality Reduction

A type of Unsupervised ML: Finding the commonality in data

Clustering

______ is finding data or subsets of a dataset that "don't fit" (are not normal)

Anomaly Detection

_______ estimates the probabilities of outcomes, which helps in finding outliers and unusual data

Density Estimation

_________ is a methodology for identifying the principal components in a dataset

Principal Components Analysis (PCA)

A ________ is a vector or axis in data that accounts for some amount of variance in the data

Component

_______ models brain cells and assemblies of brain cells

Artificial Neural Networks

What are the two components of an artificial neuron?

A summer (which computes the weighted sum of the inputs) and an activation function
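
The two components can be sketched directly: a weighted sum followed by an activation function. The step activation and the specific weights and bias below are hypothetical values picked so that the neuron behaves like a logical AND.

```python
def step(z):
    """A simple threshold activation: fire (1) when z is non-negative."""
    return 1 if z >= 0 else 0


def neuron(inputs, weights, bias, activation=step):
    """Sum the weighted inputs plus bias, then apply the activation."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(total)


weights, bias = [0.5, 0.5], -0.7   # hypothetical values giving AND behavior
for inputs in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(inputs, "->", neuron(inputs, weights, bias))
```

Training a network amounts to adjusting `weights` and `bias` rather than hand-picking them as done here.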

How are weights adjusted in a deep neural network?

Gradient descent (with the gradients computed by backpropagation)

A relatively small area of the cerebral cortex that processes visual "input"

Visual Cortex

In humans, groups of neurons process specific parts of the visual input called local _______

Receptive Fields

The process of sweeping or scanning across data and applying the filters as it goes

Convolution
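
The "sweeping" can be sketched with plain nested loops. This assumes stride 1, no padding, and the unflipped-kernel convention common in CNNs (strictly a cross-correlation); the image and the 1x2 edge filter are toy values for illustration.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image and sum the elementwise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        output.append(row)
    return output


image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
edge_filter = [[-1, 1]]   # responds where brightness increases left to right
feature_map = convolve2d(image, edge_filter)
print(feature_map)
```

The output is nonzero only where the filter's pattern (here a dark-to-light vertical edge) appears in the input.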

An algorithm for choosing the "best" action in a reinforcement learning situation is called a _____

Policy

__________ is the process of learning optimal policies

Policy Search

The policy that gives the best expected utility is referred to as the _______

Optimal Policy

A machine learning strategy that involves building multiple decision trees and combining their collective results is known as

Random Forest

In building a decision tree from a dataset with multiple features, a decision tree algorithm selects the feature for the root of the decision tree based on

the feature with the lowest Gini impurity value
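
A minimal sketch of choosing a root feature by Gini impurity, assuming categorical features and scoring each feature by the size-weighted impurity of the groups it produces. The tiny "windy"/"sunny"/"play" dataset is made up for illustration.

```python
from collections import Counter


def gini_impurity(labels):
    """1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


def weighted_gini(groups):
    """Size-weighted average impurity of the groups produced by a split."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * gini_impurity(g) for g in groups)


def best_root_feature(rows, features, target):
    """Return the feature whose split has the lowest weighted Gini impurity."""
    scores = {}
    for feature in features:
        groups = {}
        for row in rows:
            groups.setdefault(row[feature], []).append(row[target])
        scores[feature] = weighted_gini(list(groups.values()))
    return min(scores, key=scores.get), scores


rows = [
    {"windy": "yes", "sunny": "yes", "play": "no"},
    {"windy": "yes", "sunny": "no", "play": "no"},
    {"windy": "no", "sunny": "yes", "play": "yes"},
    {"windy": "no", "sunny": "no", "play": "yes"},
]
best, scores = best_root_feature(rows, ["windy", "sunny"], "play")
print(best, scores)
```

Here "windy" separates the outcomes perfectly (impurity 0), while "sunny" leaves each group evenly mixed (impurity 0.5), so "windy" becomes the root.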

Decision tree algorithms learning to predict outcomes of quantitative variables (as opposed to categorical variables) use a measure to express the inaccuracy (loss) of the model's predictions. In decision trees with quantitative outcome variables, one such measure of loss is -

mean squared error

In many machine learning algorithms there are potentially many possible models for the algorithm to explore to determine the best model for the problem of interest. The entire set of possible models that might be used in the machine learning process is referred to as a

Hypothesis Space

In classification machine learning problems, one strategy is to train multiple classifiers and combine the results of these classifiers to predict the class of test or new data instances. This strategy is known as

ensemble learning
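
One common way to combine the classifiers is a majority vote. In this sketch the three "classifiers" are hypothetical hand-written rules standing in for trained models, and the feature names `a` and `b` are invented for illustration.

```python
from collections import Counter


def majority_vote(classifiers, instance):
    """Ask every classifier for a prediction and return the most common one."""
    votes = [classify(instance) for classify in classifiers]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical stand-ins for three independently trained classifiers.
classifiers = [
    lambda x: "pos" if x["a"] > 3 else "neg",
    lambda x: "pos" if x["b"] > 3 else "neg",
    lambda x: "pos" if x["a"] + x["b"] > 4 else "neg",
]

prediction = majority_vote(classifiers, {"a": 5, "b": 1})
print(prediction)
```

Using an odd number of classifiers avoids ties in a two-class problem. A random forest applies the same idea with decision trees as the individual voters.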

In logistic regression machine learning, the use of a sigmoid function is to -

map the model's output to a probability between 0 and 1, which is compared against a threshold (commonly 0.5) to classify a data instance in one class vs another class.
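
A short sketch of how the sigmoid is used for classification: it squashes a weighted sum into a value between 0 and 1, and that value is compared with a threshold. The weights, bias, and 0.5 threshold below are assumptions for illustration.

```python
import math


def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_class(features, weights, bias, threshold=0.5):
    """Return (class, probability) for a single data instance."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    probability = sigmoid(z)
    return (1 if probability >= threshold else 0), probability


print(predict_class([2.0], [1.0], 0.0))
print(predict_class([-2.0], [1.0], 0.0))
```

Raising or lowering `threshold` trades false positives against false negatives without retraining the model.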

In regression learning the inclusion of high-degree polynomial features can make the models prone to

overfitting

In machine learning the concept of regularization refers to -

methods to adjust models to minimize loss function values by constraining the model

Given a machine learning project intended to train an agent to predict a binary outcome class (for example, whether a house in a local market will sell within a specific timeframe or not) using a set of quantitative predictors (like square feet of floor space, number of bathrooms, number of bedrooms, listing price, and size of the house's lot), what type of machine learning algorithm would best fit this project? A. Linear regression B. Normative scaling C. Bidirectional search D. Logistic regression

D. Logistic Regression

A Principal Component Analysis searches a dataset to find components that are orthogonal. What does orthogonal mean?

that the identified components are uncorrelated with each other

An algorithm that tries to find groupings in multidimensional datasets

K-means Clustering

K-means clustering uses the concept of centroids. What are centroids?

initially arbitrary points in n-dimensional data space used to calculate the proximity of data points in the dataset to each of the respective centroids.

In K-means clustering what is K?

the number of clusters the algorithm should try to find
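
The roles of K and the centroids can be seen in a minimal K-means sketch. It assumes squared Euclidean distance, a fixed number of iterations, and the first K points as the arbitrary starting centroids; the sample points are made up for illustration.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    return tuple(sum(coord) / len(cluster) for coord in zip(*cluster))


def k_means(points, k, iterations=10):
    centroids = [points[i] for i in range(k)]   # arbitrary initial centroids
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [mean_point(cl) if cl else centroids[i]
                     for i, cl in enumerate(clusters)]
    return centroids, clusters


points = [(1, 1), (1, 2), (8, 8), (9, 8), (2, 1), (8, 9)]
centroids, clusters = k_means(points, k=2)
print(centroids)
print(clusters)
```

Even though both starting centroids sit in the same corner, the assign-then-update loop pulls one of them toward the second group within a couple of iterations.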

The use of pretrained weights from a previously trained neural network as the initial weights for a new neural network being trained on a different but similar problem is referred to as

Transfer Learning

Applying a kernel with no weights to input data and returning the highest value in the receptive field to which the kernel was applied is known as

Maxpooling

In a convolutional neural network, a convolutional layer produces what?

Feature Maps

A pooling layer using a 2x2 kernel with a stride of 2 produces what kind of output?

output that is a 75% reduction in size relative to its input
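
The 75% figure follows from halving both the height and the width. A minimal max-pooling sketch, assuming a 2D feature map stored as a list of lists with toy values chosen for illustration:

```python
def max_pool(feature_map, size=2, stride=2):
    """Keep the largest value in each size x size window."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, stride):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, stride):
            window = [feature_map[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            row.append(max(window))
        pooled.append(row)
    return pooled


feature_map = [[1, 3, 2, 0],
               [4, 6, 5, 1],
               [7, 2, 9, 8],
               [0, 1, 3, 4]]
pooled = max_pool(feature_map)
print(pooled)  # [[6, 5], [7, 9]]
```

The 4x4 input holds 16 values and the 2x2 output holds 4, so three quarters of the values are discarded.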

What is the number of feature maps that can be generated by one convolutional layer?

it depends on the number of filters defined for the layer