AI 2 Flashcards
What is a maximum likelihood estimate?
The parameter value that makes the observed data most probable, i.e. the value of θ that maximises the likelihood function L(θ|x). For example, if we observe a number of successes out of a number of tries and plot the likelihood of that result against each candidate success probability, the curve is roughly bell-shaped, and the probability at its peak is the MLE.
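As a rough illustration of the successes/tries example, the sketch below evaluates a binomial likelihood on a grid of candidate probabilities and picks the peak. The observation (7 successes in 10 tries) and the name `binomial_likelihood` are made up for this example.

```python
from math import comb

def binomial_likelihood(p, successes, tries):
    """Likelihood of success probability p given the observed counts."""
    return comb(tries, successes) * p**successes * (1 - p) ** (tries - successes)

# Hypothetical observation: 7 successes out of 10 tries.
successes, tries = 7, 10

# Candidate values of p on a grid from 0.00 to 1.00.
grid = [i / 100 for i in range(101)]

# The MLE is the candidate with the highest likelihood (the peak of the curve).
p_mle = max(grid, key=lambda p: binomial_likelihood(p, successes, tries))
print(p_mle)
```

For a binomial experiment the peak lands at successes/tries, which is why the grid search here settles on 0.7.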
What is a cost function, and what is the usual cost function for a given likelihood function L(θ|x)?
A cost function measures the discrepancy between an estimate of a function and the actual function. The cost function is usually the negative log-likelihood, -log(L(θ|x)), so minimising the cost maximises the likelihood.
What is supervised learning?
Taking a dataset of input/output pairs and fitting a function that describes them with as little discrepancy as possible. This is used for classifying data or predicting values.
What is unsupervised learning?
Finding structure in multi-dimensional data that has no specified output. Unsupervised objectives include clustering, dimensionality reduction, and anomaly detection.
How do we define a stationary point (w*) on a multidimensional function?
∂/∂w₁ g(w*) = 0
∂/∂w₂ g(w*) = 0
…
∂/∂wₙ g(w*) = 0
∴ a point w* is stationary if the partial derivative of g with respect to every variable is 0 there, i.e. the gradient ∇g(w*) = 0
What value does gradient descent use for the direction of updating the parameters?
The negative of the gradient of the cost function
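A minimal sketch of the update rule, assuming an illustrative cost g(w) = (w₁ - 1)² + (w₂ + 2)² whose only stationary point is w* = (1, -2). The cost, the learning rate and the step count are choices made for this example, not part of the cards.

```python
def grad_g(w):
    """Gradient of the example cost g(w) = (w1 - 1)^2 + (w2 + 2)^2."""
    return [2 * (w[0] - 1), 2 * (w[1] + 2)]

def gradient_descent(w, learning_rate=0.1, steps=200):
    """Repeatedly move the parameters against the gradient."""
    for _ in range(steps):
        grad = grad_g(w)
        # Step in the direction of the negative gradient.
        w = [wi - learning_rate * gi for wi, gi in zip(w, grad)]
    return w

w_star = gradient_descent([0.0, 0.0])
print(w_star)          # close to [1, -2]
print(grad_g(w_star))  # every partial derivative is close to 0
```

The second print checks the stationary-point condition: at the point the updates settle on, each partial derivative is approximately zero.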
What equation expresses the conditional probability P(yᵢ=1|xᵢ) of the binary dependent variable yᵢ being 1, given the parameters θ, in logistic regression?
What are the odds?
P(yᵢ=1|xᵢ) = σ(θ⋅xᵢ) = 1 / (1 + exp(-θ⋅xᵢ))
where θ is the vector of coefficients applied to the independent-variable vector xᵢ
odds = P(yᵢ=1|xᵢ) / P(yᵢ=0|xᵢ) = exp(θ⋅xᵢ)
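The two formulas can be checked against each other numerically. This is only a sketch: the coefficient and input values are hypothetical, and `dot`, `probability` and `odds` are names chosen for the example.

```python
from math import exp

def dot(theta, x):
    """Dot product θ⋅x of two equal-length lists."""
    return sum(t * xi for t, xi in zip(theta, x))

def probability(theta, x):
    """P(y=1 | x) = 1 / (1 + exp(-θ⋅x))."""
    return 1 / (1 + exp(-dot(theta, x)))

def odds(theta, x):
    """Odds of y=1, computed directly as exp(θ⋅x)."""
    return exp(dot(theta, x))

theta = [0.5, -1.0]  # hypothetical coefficients
x = [2.0, 0.5]       # hypothetical input vector, so θ⋅x = 0.5

p = probability(theta, x)
print(p)               # probability that y = 1
print(p / (1 - p))     # odds computed from the probability
print(odds(theta, x))  # odds computed from exp(θ⋅x); matches the line above
```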
How do we write the function that gives the probability of y=1 for the points in a dataset, given the input vectors and coefficients?
hθ(X) = P(Y=1|X;θ) = 1 / (1 + exp(-θ⋅X))
where capital X and Y represent the inputs and outputs of all the points in the training set
the ";" in X;θ means that θ is a parameter of the probability rather than a random variable being conditioned on
What is the likelihood function for logistic regression?
L(θ|y;x) = P(Y|X;θ) = Π(i=1>N) P(yᵢ|xᵢ;θ)
where P(yᵢ|xᵢ;θ) is hθ(xᵢ) if yᵢ=1 and 1-hθ(xᵢ) if yᵢ=0
How do we calculate the cost function for logistic regression?
−log(L(θ|y;x)) = −Σ(i=1>N) [ yᵢ log(hθ(xᵢ)) + (1−yᵢ) log(1−hθ(xᵢ)) ]
(the yᵢ and 1−yᵢ coefficients ensure that only the relevant term is used and the other is 0)
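A minimal sketch of this cost as code. Because the cost is the negative log-likelihood, the summed log-probabilities are negated before being returned. The four data points and both θ values are invented for illustration, and the leading 1.0 in each input is an assumed intercept term.

```python
from math import exp, log

def h(theta, x):
    """Predicted probability that y = 1 for input x."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1 / (1 + exp(-z))

def negative_log_likelihood(theta, X, Y):
    """Logistic regression cost over all (x, y) pairs."""
    total = 0.0
    for x, y in zip(X, Y):
        p = h(theta, x)
        # Only one of the two terms is non-zero, depending on whether y is 1 or 0.
        total += y * log(p) + (1 - y) * log(1 - p)
    return -total

# Hypothetical training set; the first entry of each x is a constant 1 (intercept).
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 3.0], [1.0, 4.0]]
Y = [0, 0, 1, 1]

cost_zero = negative_log_likelihood([0.0, 0.0], X, Y)   # every prediction is 0.5
cost_fit = negative_log_likelihood([-4.5, 2.0], X, Y)   # θ that separates the classes
print(cost_zero, cost_fit)
```

With θ = 0 every prediction is 0.5, so the cost is 4·log(2). The second θ assigns high probability to the correct label of each point and gives a lower cost.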
What do we need in the dataset for logistic regression to work?
- Binary output
- Independent observations (the data points do not depend on each other)
- Variables have low multicollinearity (the input variables are not strongly correlated with each other)
- There is a linear relationship between the log-odds and the variables, i.e. θ is made up of constant coefficients: log(p/(1-p)) = θ₀ + θ₁x₁ + …
- Large sample size. A common rule of thumb: the sample size should be at least 10 * (number of independent variables) / (probability of the less frequent y outcome), for example a 0.1 chance of y=0
What are the three axioms of measuring the information of a given event?
- An event with probability of 100% yields no information
- The less probable an event is, the more information it yields
- If two independent events are measured separately, the total information gathered is the sum of the information from each
How is information from an event measured?
Iₓ(x) = logₐ [ 1/(Pₓ(x)) ]
where Pₓ(x) is the probability that X takes the value x
Information calculated with a=2 is measured in bits, with a=e in natural units (nats), and with a=10 in dits, bans or hartleys
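A short sketch of the formula, with the three axioms checked on coin-flip probabilities. The function name `information` and the example probabilities are assumptions made for illustration.

```python
from math import e, log

def information(p, base=2):
    """Self-information log_a(1/p) of an event with probability p."""
    return log(1 / p, base)

certain = information(1.0)      # certain event: no information
one_flip = information(0.5)     # one fair coin flip, in bits
two_flips = information(0.25)   # two independent fair flips, in bits
in_nats = information(0.5, e)   # same single flip, measured in nats

print(certain, one_flip, two_flips, in_nats)
```

The value for two independent flips equals twice the value for one flip, which is the additivity axiom, and the less probable event carries more information.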
How do you calculate logit(x) (log-odds)?
log(p/(1-p)) = log(p) - log(1-p) = log(P(x)) - log(P(¬x)), where p = P(x) is the probability of the event x
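A small sketch of the calculation, which also shows that the logistic (sigmoid) function maps the log-odds back to the original probability. The value p = 0.8 is an arbitrary example.

```python
from math import exp, log

def logit(p):
    """Log-odds of a probability p, for 0 < p < 1."""
    return log(p) - log(1 - p)

def sigmoid(z):
    """Logistic function, the inverse of logit."""
    return 1 / (1 + exp(-z))

p = 0.8
z = logit(p)       # log(0.8 / 0.2) = log(4)
recovered = sigmoid(z)
print(z, recovered)
```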
What is entropy and how is it calculated (discrete data)?
Entropy is the average uncertainty in a random variable X, i.e. the expected information of its outcomes.
H(X) = E[Iₓ(X)] = -Σ(i=1>n) P(X=xᵢ) * logₐ(P(X=xᵢ))
where the xᵢ are all the values that X can take; for example, for a die they are 1 to 6
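As a sketch of the die example, the code below computes the entropy of a fair die and of a loaded one. The loaded probabilities are invented for illustration, and outcomes with zero probability are skipped because they contribute nothing to the sum.

```python
from math import log

def entropy(probabilities, base=2):
    """Entropy -Σ p * log_a(p) of a discrete distribution."""
    return -sum(p * log(p, base) for p in probabilities if p > 0)

fair_die = [1 / 6] * 6
loaded_die = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]  # hypothetical loaded die

h_fair = entropy(fair_die)      # equals log2(6), about 2.585 bits
h_loaded = entropy(loaded_die)  # lower, since the outcome is more predictable
print(h_fair, h_loaded)
```

The fair die has the highest entropy of any six-outcome distribution, because every face is equally uncertain.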