Test 2 Flashcards
What is a state space search?
it is a process in which successive states are explored with the aim of finding the goal state
in a state space search what are the variables and functions
S: the set of all possible states
A: the set of all possible actions for a state
Actions(s): the actions allowed to be performed when the state is s
Results(s,a): the state that results when action a is taken in state s
Costs(s,a): the cost of doing a in state s
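A minimal Python sketch of this interface (the GridProblem example and all its names are illustrative, not from the course):

```python
# Hypothetical example of a state-space search problem: move on a small grid
# from a start cell to a goal cell.
class GridProblem:
    def __init__(self, start, goal, size=4):
        self.start, self.goal, self.size = start, goal, size

    def actions(self, s):                      # Actions(s): moves allowed in state s
        moves = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
        x, y = s
        return [a for a, (dx, dy) in moves.items()
                if 0 <= x + dx < self.size and 0 <= y + dy < self.size]

    def result(self, s, a):                    # Results(s, a): state reached by doing a in s
        dx, dy = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}[a]
        return (s[0] + dx, s[1] + dy)

    def cost(self, s, a):                      # Costs(s, a): cost of doing a in s
        return 1

    def is_goal(self, s):
        return s == self.goal
```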
What are the three examples of state space search?
depth first search
breadth first search
A*-heuristic search
In depth first search, how does the algorithm proceed?
starting from the root node of the tree, each branch is explored as deeply as possible, in order, before backtracking, until the goal state is found
breadth-first search
explores nodes at each level before moving to the next
Pros and cons of depth first search
pro: low memory requirement
con: slow, may not find a solution
pros and cons of breadth first search
pro: guarantees a solution (if one exists)
cons: high memory cost
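A small sketch showing that DFS and BFS differ only in how the frontier is popped (stack vs. queue), reusing the hypothetical GridProblem interface sketched above:

```python
from collections import deque

def search(problem, frontier_pop):
    """Generic uninformed search; frontier_pop decides DFS vs BFS."""
    frontier = deque([problem.start])
    parents = {problem.start: None}
    while frontier:
        s = frontier_pop(frontier)          # DFS: pop() (stack), BFS: popleft() (queue)
        if problem.is_goal(s):
            path = []
            while s is not None:            # rebuild the path back to the start
                path.append(s)
                s = parents[s]
            return path[::-1]
        for a in problem.actions(s):
            nxt = problem.result(s, a)
            if nxt not in parents:          # avoid revisiting states
                parents[nxt] = s
                frontier.append(nxt)
    return None

dfs = lambda p: search(p, lambda f: f.pop())       # explores one branch fully first
bfs = lambda p: search(p, lambda f: f.popleft())   # explores level by level

print(bfs(GridProblem(start=(0, 0), goal=(3, 3))))
```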
What is A* heuristic search?
it is an informed search algorithm that aims to minimize the path cost from the start node to the goal node, while keeping the search effort (memory and time) low
In an A* heuristic search the cost formula is f(n) = g(n) + h(n). What do h(n) and g(n) mean?
h(n) is the heuristic function, in this case an estimate of the cost from n to the goal
g(n) is the cost of the path from the start node to n
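A minimal A* sketch along the same lines, expanding the node with the lowest f(n) = g(n) + h(n); the Manhattan-distance heuristic is an illustrative choice for the grid example above:

```python
import heapq

def a_star(problem, h):
    g = {problem.start: 0}                            # g(n): best known cost from start to n
    frontier = [(h(problem.start), problem.start)]    # priority queue ordered by f(n)
    parents = {problem.start: None}
    while frontier:
        _, s = heapq.heappop(frontier)
        if problem.is_goal(s):
            path = []
            while s is not None:
                path.append(s)
                s = parents[s]
            return path[::-1]
        for a in problem.actions(s):
            nxt = problem.result(s, a)
            new_g = g[s] + problem.cost(s, a)
            if new_g < g.get(nxt, float("inf")):      # found a cheaper path to nxt
                g[nxt] = new_g
                parents[nxt] = s
                heapq.heappush(frontier, (new_g + h(nxt), nxt))
    return None

# Example heuristic for the grid problem: Manhattan distance to the goal.
goal = (3, 3)
print(a_star(GridProblem((0, 0), goal), lambda s: abs(s[0] - goal[0]) + abs(s[1] - goal[1])))
```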
In machine learning, when the classes are unknown, what type of classification is used?
clustering
Check distance formulas
check them
what is cross validation
a way of training and testing your classification method in machine learning
what are the four types of cross validation
leave one out
bootstrap
n-fold
split test
quickly describe the cross validation methods
split test: half the set is used for training, the rest for testing
bootstrap: random data points are sampled with replacement to form the training set; the left-out points are used for testing
n-fold: the data is split into n disjoint folds; each fold is used in turn for testing against the rest
leave one out: a single data point is used for testing, repeated for every point
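A rough sketch of the four schemes with scikit-learn (the k-NN classifier and the iris data are placeholder choices, not from the cards):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.utils import resample

X, y = load_iris(return_X_y=True)
clf = KNeighborsClassifier()

# split test: half the set for training, the rest for testing
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5)
print("split test:", clf.fit(X_tr, y_tr).score(X_te, y_te))

# n-fold: n disjoint folds, each used once for testing against the rest
print("5-fold    :", cross_val_score(clf, X, y, cv=KFold(n_splits=5, shuffle=True)).mean())

# leave one out: a single point is the test set, repeated for every point
print("LOO       :", cross_val_score(clf, X, y, cv=LeaveOneOut()).mean())

# bootstrap: sample indices with replacement for training, test on the left-out points
idx = resample(np.arange(len(X)))              # sampled with replacement
oob = np.setdiff1d(np.arange(len(X)), idx)     # out-of-bag points used for testing
print("bootstrap :", clf.fit(X[idx], y[idx]).score(X[oob], y[oob]))
```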
In a confusion matrix, which axis must have a sum of 100%
vertical
Formula for the true positive rate
TPR = TP/P = 1-FNR
True negative rate formula
TNR = TN/N
The positive predictive value (PPV)
PPV = TP/PP = 1- FDR
FDR - False discovery rate
F1 score formula
F1 = 2*PPV*TPR / (PPV + TPR)
Accuracy
Acc = (TP + TN) / (P + N)
What is PP ?
The total number of samples labeled (predicted) positive: PP = TP + FP
What is P?
Total number of actual positives
TP + FN
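A tiny sketch putting the formulas above together (the counts are made up just to exercise them):

```python
# Made-up confusion-matrix counts, only for illustration.
TP, FP, TN, FN = 40, 10, 45, 5

P  = TP + FN            # actual positives
N  = TN + FP            # actual negatives
PP = TP + FP            # predicted positives

TPR = TP / P            # true positive rate (= 1 - FNR)
TNR = TN / N            # true negative rate
PPV = TP / PP           # positive predictive value (= 1 - FDR)
F1  = 2 * PPV * TPR / (PPV + TPR)
ACC = (TP + TN) / (P + N)

print(TPR, TNR, PPV, F1, ACC)
```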
What is the complete machine learning system
sample → feature extraction → classifier → evaluation → decision
(with feedback loops between the stages: 1 = learning, 2 = reporting)
There are two types of models in machine learning. What are they? Describe them.
Discriminative and generative models
Discriminative: they focus on the distinction among classes, learning decision boundaries (e.g. k-NN, SVM, regression)
Generative: they model how the data is distributed throughout the space, focusing on its characteristics and an assumed (known) model
What is gradient descent and what is its goal?
Gradient descent is an iterative algorithm whose goal is to find the minimum of a loss function (e.g. the MSE) by repeatedly stepping against its derivative (gradient).
Label and explain the different gradient descent variants.
Stochastic (SGD): a single sample is chosen at random at each step. It is fast and good for redundant data.
Batch (GD): all samples are used per iteration.
Mini-batch (GD): a subset of the samples is used per iteration.
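A minimal sketch of the three variants for a 1-D linear fit with MSE loss (the synthetic data and step size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 200)
y = 3 * x + 1 + rng.normal(0, 0.1, 200)        # ground truth: slope 3, intercept 1

def gd(batch_size, lr=0.1, epochs=200):
    w = b = 0.0
    for _ in range(epochs):
        idx = rng.permutation(len(x))
        for start in range(0, len(x), batch_size):
            i = idx[start:start + batch_size]   # batch: all samples, mini-batch: subset, SGD: 1
            err = w * x[i] + b - y[i]
            w -= lr * 2 * np.mean(err * x[i])   # gradient of the MSE w.r.t. w
            b -= lr * 2 * np.mean(err)          # gradient of the MSE w.r.t. b
    return w, b

print("batch     :", gd(batch_size=len(x)))
print("mini-batch:", gd(batch_size=32))
print("stochastic:", gd(batch_size=1))
```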
How does the step size affect the linear regression
if the step size (learning rate) is too small, convergence is slow; if it is too large, the updates overshoot and can oscillate or diverge (see images on step size)
Regularized linear models name them:
Ridge regression
Lasso
How does ridge regression work
In ridge regression, a penalty (a little bias) is added to the loss function used on the training data.
This bias, when the model is applied to the testing data, will in the long term provide better fits to the data.
This is a way to avoid overfitting.
How does ridge regression work
a linear regression model is entirely dependent on the training data.
However, the training dataset is not perfectly representative of the set/samples, so a small amount of bias is introduced into the error function, causing the function not to align with the training set perfectly (the model is made deliberately worse during training).
As the iterations of fitting go on, the final fit should be better than the plain one.
This fights overfitting. We do not want the training MSE to be zero all the time.
In ridge regression, what is the purpose of λ? What does it do, and how does increasing it affect the slope?
it keeps the MSE term and the slope penalty on a comparable scale;
it controls the severity of the penalty added to the MSE;
increasing it decreases the slope (shrinks it toward zero).
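A small sketch of the ridge idea, loss = MSE + λ·slope², showing that increasing λ shrinks the fitted slope (synthetic 1-D data, closed-form minimizer, all numbers arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 20)
y = 2 * x + rng.normal(0, 0.3, 20)
xc, yc = x - x.mean(), y - y.mean()                # center so only the slope matters

for lam in [0.0, 0.05, 0.5]:
    # minimizer of (1/n)*sum((y - w*x)^2) + lam*w^2
    slope = (xc @ yc) / (xc @ xc + lam * len(x))
    print(f"lambda={lam:<5}: slope={slope:.3f}")   # larger lambda -> smaller slope
```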
Describe lasso.
Lasso: least absolute shrinkage and selection operator regression
The penalty is not squared but an absolute value; it is very similar to ridge, but it can shrink a slope all the way to 0.
In lasso, why is a zero slope useful?
to erase the contribution of useless parameters from the determination of y (feature selection)
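A minimal sketch contrasting ridge and lasso on data where only one feature matters; lasso drives the useless coefficients exactly to zero (synthetic data, arbitrary alphas):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))                   # 5 features, only feature 0 is useful
y = 3 * X[:, 0] + rng.normal(0, 0.1, 100)

print("ridge:", Ridge(alpha=1.0).fit(X, y).coef_.round(3))   # useless coefs shrunk, not zero
print("lasso:", Lasso(alpha=0.1).fit(X, y).coef_.round(3))   # useless coefs -> 0.0
```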
What is early stopping
it is an ML technique used to avoid overfitting and underfitting
it balances overfitting and underfitting by determining the number of training steps to take.
A plot of loss vs. iteration is usually created.
The loss on the training set always decreases with the number of iterations.
At each step the model should be tested on the testing data; if the loss on the testing dataset starts to increase, then training should stop.
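A rough early-stopping loop, assuming an SGDRegressor trained one epoch at a time and a held-out validation/testing set (all choices here are illustrative):

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 10))
y = X @ rng.normal(size=10) + rng.normal(0, 0.5, 300)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3)

model = SGDRegressor(learning_rate="constant", eta0=0.005)
best_loss, best_iter, patience = np.inf, 0, 20
for it in range(500):
    model.partial_fit(X_tr, y_tr)                          # one more training epoch
    val_loss = np.mean((model.predict(X_val) - y_val) ** 2)
    if val_loss < best_loss:                               # validation loss still decreasing
        best_loss, best_iter = val_loss, it
    elif it - best_iter > patience:                        # it has been increasing for a while
        print(f"stopping at iteration {it}; best was iteration {best_iter}")
        break
```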
Logistic regression
predicts whether something is true or false (0 or 1)
the cost function takes both options into account (it penalizes wrong predictions for both classes)
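A tiny sketch of the log-loss cost, which accounts for both outcomes (predicting a high probability when y = 0 and a low probability when y = 1):

```python
import numpy as np

def log_loss(y, p, eps=1e-12):
    p = np.clip(p, eps, 1 - eps)                    # avoid log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

y_true = np.array([1, 0, 1, 0])
print(log_loss(y_true, np.array([0.9, 0.1, 0.8, 0.2])))   # good predictions -> low cost
print(log_loss(y_true, np.array([0.2, 0.8, 0.3, 0.7])))   # bad predictions  -> high cost
```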
Is classification the same as regression?
No, it is not: classification predicts a discrete class, regression predicts a continuous value.
in a decision tree [CART], what are the splitting (impurity) criteria
Gini and entropy
Gini formula
Gini = 1 - sum(p_i^2), i.e. one minus the sum of the squared class probabilities
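A two-line check of the formula:

```python
import numpy as np

def gini(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()          # class probabilities in the node
    return 1 - np.sum(p ** 2)

print(gini([0, 0, 0, 0]))              # pure node -> 0.0
print(gini([0, 0, 1, 1]))              # 50/50 split -> 0.5
```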
Loss in decision tree formula
the loss of a split is the weighted average of the child-node impurities (e.g. Gini), weighted by the fraction of samples in each child (go check the exact formula)
In classification you have
Tree
Knn
Voronoi
SVM
What is the kernel trick in SVM?
it is a trick used when the data cannot be divided with a line (hyperplane) in its original space. The mapping is changed (the data is implicitly projected into a higher-dimensional space), the data is divided there, and the boundary is then mapped back
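A minimal illustration with scikit-learn: a linear SVM cannot separate two concentric circles, but an RBF-kernel SVM (implicit mapping to a higher-dimensional space) can. The dataset is a placeholder choice:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
print("linear kernel:", SVC(kernel="linear").fit(X, y).score(X, y))   # poor fit
print("RBF kernel   :", SVC(kernel="rbf").fit(X, y).score(X, y))      # ~1.0
```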
What does parametric mean?
the number of parameters is fixed in order to predict the class
Parametric, compared to non-parametric, is
simple, fast, needs less data, but is constrained and can have poor fits
What does generative mean in the parametric setting?
the a priori classification probabilities are modeled as Gaussians, and the number of parameters is fixed
What are Parzen windows? And what do they use?
a generative ML method in which the distribution of the data is estimated with Gaussians
It uses Silverman's rule to determine the width (bandwidth) of the Gaussians
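A small kernel-density sketch; scipy's gaussian_kde can pick the Gaussian bandwidth with Silverman's rule (the two-Gaussian sample is a placeholder):

```python
import numpy as np
from scipy.stats import gaussian_kde

samples = np.concatenate([np.random.normal(-2, 0.5, 200),
                          np.random.normal(2, 1.0, 200)])
kde = gaussian_kde(samples, bw_method="silverman")   # bandwidth via Silverman's rule
print(kde.factor)                                    # the chosen bandwidth factor
print(kde([-2, 0, 2]))                               # estimated density at a few points
```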
Describe the Bayes classifier
A classification process based on maximizing the number of correct classifications.
Based on the theorem:
p(y|x) p(x) = p(x|y) p(y)
What is the curse of dimensionality
as the number of dimensions increases, the number of data points needed increases exponentially
Naive Bayes, explain it
it is like the Bayes classifier, but the features are taken as independent, i.e. the likelihood of a set of features is given by the product of the individual feature likelihoods:
p(x|class) = p(x1|class) p(x2|class) ...
In Bayes and naive Bayes the goal is to:
maximize p(y|x)
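A minimal naive Bayes sketch with scikit-learn's GaussianNB (the iris data is a placeholder); predict_proba shows the p(y|x) being maximized:

```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
nb = GaussianNB().fit(X, y)            # per-feature Gaussians, features treated as independent
print(nb.predict(X[:3]))               # class maximizing p(y|x) for the first samples
print(nb.predict_proba(X[:3]))         # the p(y|x) values themselves
```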
Combining classifiers: what are the types
combining the predicted labels ŷ
combining the probabilities p(y|x)
ensembles
in ensembles, what types exist?
bagging(pasting)
random forest
boosting
stacking
Explain bagging(pasting)
the set is divided into subsets (sampled with replacement for bagging, without replacement for pasting) and a classifier is trained on each subset; their predictions are combined
explain boosting
the samples that were misclassified (didn't work) are selected, and the second round of classifiers is boosted (given more weight) on those
stacking
train a classifier on the predictions of several other classifiers
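A rough sketch of the ensemble types with scikit-learn (the base learners and the iris data are placeholder choices):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

models = {
    "bagging":  BaggingClassifier(n_estimators=50),       # bootstrap=False would be pasting
    "forest":   RandomForestClassifier(n_estimators=50),
    "boosting": AdaBoostClassifier(n_estimators=50),       # reweights misclassified samples
    "stacking": StackingClassifier(
        estimators=[("knn", KNeighborsClassifier()), ("nb", GaussianNB())],
        final_estimator=LogisticRegression()),             # trained on the others' predictions
}
for name, m in models.items():
    print(name, cross_val_score(m, X, y, cv=5).mean().round(3))
```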
Unsupervised learning is:
no known classifications (labels) are available; the data is the only information
Unsupervised learning often has to reduce the number of features; this is done via:
filter methods, wrapper methods, and embedded methods
in unsupervised learning, describe filter methods:
they filter out the features that are not distinctive (if two features are highly correlated, then one of them is excluded)
in unsupervised learning, describe the wrapper method
it checks the data itself (not the correlation between two features, as in the filter method). If the data on a feature is very widespread, then that feature is not good!
Features are ranked
Embedded methods are:
there are two modes; consider the forward mode:
each feature is tested with the classifier.
the best feature is then added to a list.
next, the classifier is run again on the remaining features together with the best feature from before, and so on, until the error is no longer decreasing (see the sketch below).
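A small sketch of that forward loop: greedily add the feature that most improves the cross-validated score and stop when nothing helps (k-NN and iris are placeholder choices):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
clf = KNeighborsClassifier()

selected, best_score = [], 0.0
remaining = list(range(X.shape[1]))
while remaining:
    # score every candidate feature added on top of the already-selected ones
    scores = {f: cross_val_score(clf, X[:, selected + [f]], y, cv=5).mean()
              for f in remaining}
    f_best = max(scores, key=scores.get)
    if scores[f_best] <= best_score:          # error no longer decreasing -> stop
        break
    best_score = scores[f_best]
    selected.append(f_best)
    remaining.remove(f_best)

print("selected features:", selected, "score:", round(best_score, 3))
```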
Clustering: what is it and how is it done
formation of groups through the adjustment of the groups' centroids over many iterations.
done with k-means
Describe k-means
minimization of the distance of the points to their centroid, which changes over the iterations
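A minimal k-means sketch with scikit-learn (the blobs data and k = 3 are placeholder choices):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
km = KMeans(n_clusters=3, n_init=10).fit(X)   # n_init restarts reduce dependence on initialization
print(km.cluster_centers_)                    # final centroid positions
print(km.labels_[:10])                        # cluster assignment of the first points
```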
disadvantages of k-means
the positions of the centroids can depend on the randomized initial positions
and
the number of clusters selected initially can turn out to be a bad choice
Since the centroid values can vary with the initialization, what solution exists?
hierarchical clustering
describe hierarchical clustering
clusters are built as nested groups ("bubbles"): each point starts as its own cluster and the closest clusters are merged step by step, producing a dendrogram; no random centroid initialization is needed
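A minimal agglomerative (hierarchical) clustering sketch with scikit-learn; note there is no random centroid initialization:

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=100, centers=3, random_state=0)
agg = AgglomerativeClustering(n_clusters=3).fit(X)   # merges the closest clusters step by step
print(agg.labels_[:10])                              # cluster assignments, deterministic
```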
What are Gaussian mixture models
A Gaussian mixture model (GMM) is a probabilistic model that assumes that the instances were generated from a mixture of several Gaussian distributions whose parameters are unknown.
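A minimal GMM sketch with scikit-learn (two components and the blobs data are placeholder choices):

```python
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=2, random_state=0)
gmm = GaussianMixture(n_components=2).fit(X)   # fit 2 Gaussians with unknown parameters
print(gmm.means_)                              # estimated Gaussian means
print(gmm.predict(X[:5]))                      # most likely component per sample
```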
What is stratified sampling
it forces the sampled datasets (e.g. train/test splits) to be as imbalanced as the data itself, i.e. to preserve the class proportions
XAI (explainable AI)
the computer explains the reasoning behind its decisions
active learning
computer selects data and the user classifies it
reinforcement learning
the ML model is continually retrained with new data obtained from its own actions (feedback/rewards)