Feature Crosses Flashcards
What is a feature cross?
A feature cross is a synthetic feature that encodes nonlinearity in the feature space by multiplying two or more input features together. (The term cross comes from cross product.)
What do you call a synthetic feature that combines other features?
Feature cross (feature cross product).
Explain three possible examples of a feature cross.
We can create many different kinds of feature crosses. For example:
Why is it efficient to use feature crossing?
Thanks to stochastic gradient descent, linear models can be trained efficiently. Consequently, supplementing scaled linear models with feature crosses has traditionally been an efficient way to train on massive-scale data sets.
Explain example 1 of a feature cross of two one-hot encoded vectors (features).
Suppose you bin latitude and longitude, producing separate five-element one-hot feature vectors. For instance, a given latitude and longitude could be represented as follows:
binned_latitude = [0, 0, 0, 1, 0]
binned_longitude = [0, 1, 0, 0, 0]
Suppose you create a feature cross of these two feature vectors:
binned_latitude X binned_longitude
This feature cross is a 25-element one-hot vector (24 zeroes and 1 one). The single 1 in the cross identifies a particular conjunction of latitude and longitude. Your model can then learn particular associations about that conjunction.
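The cross of the two vectors above can be sketched in Python. This is a minimal illustration that assumes NumPy (the card does not name a library): the outer product pairs every latitude bin with every longitude bin, and flattening the 5 x 5 grid gives the 25-element one-hot vector.

```python
import numpy as np

binned_latitude = np.array([0, 0, 0, 1, 0])
binned_longitude = np.array([0, 1, 0, 0, 0])

# The outer product pairs every latitude bin with every longitude bin (5 x 5 grid);
# ravel() flattens the grid row by row into one 25-element vector.
cross = np.outer(binned_latitude, binned_longitude).ravel()

# Position of the single 1, in row-major order: lat_bin * 5 + lon_bin.
hot_index = int(np.argmax(cross))
```

Here the latitude falls in bin 3 and the longitude in bin 1, so the single 1 lands at index 3 * 5 + 1 = 16.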
Explain example 2 of a feature cross.
Suppose our model needs to predict how satisfied dog owners will be with dogs based on two features:
Behavior type (barking, crying, snuggling, etc.)
Time of day
If we build a feature cross from both these features:
[behavior type X time of day]
then we’ll end up with vastly more predictive ability than either feature on its own. For example, a dog crying (happily) at 5:00 pm when the owner returns from work will likely be a great positive predictor of owner satisfaction. Crying (miserably, perhaps) at 3:00 am when the owner was sleeping soundly will likely be a strong negative predictor of owner satisfaction.
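A rough Python sketch of crossing two categorical features follows. The category lists, the two coarse time buckets, and the helper name cross_one_hot are hypothetical and chosen only for illustration. The point is that each (behavior, time) conjunction gets its own slot, so a model can give "crying at 5pm" a different weight than "crying at 3am".

```python
from itertools import product

behaviors = ["barking", "crying", "snuggling"]
times_of_day = ["3am", "5pm"]  # hypothetical coarse time buckets

# Every (behavior, time) pair gets its own slot in the crossed vocabulary.
vocabulary = [f"{b}_x_{t}" for b, t in product(behaviors, times_of_day)]

def cross_one_hot(behavior, time_of_day):
    """Return a one-hot list marking the (behavior, time_of_day) conjunction."""
    key = f"{behavior}_x_{time_of_day}"
    return [1 if v == key else 0 for v in vocabulary]

happy = cross_one_hot("crying", "5pm")
miserable = cross_one_hot("crying", "3am")
```

Neither feature alone can separate these two cases: both share the behavior "crying", and a time-only feature would treat every 5pm event alike.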
Different cities in California have markedly different housing prices. Suppose you must create a model to predict housing prices. Which sets of features or feature crosses could learn city-specific relationships between roomsPerPerson and housing price?
One feature cross: [binned latitude X binned longitude X binned roomsPerPerson]
Crossing binned latitude with binned longitude enables the model to learn city-specific effects of roomsPerPerson.
Binning prevents a change in latitude from producing the same result as a change in longitude. Depending on the granularity of the bins, this feature cross could learn city-specific or neighborhood-specific or even block-specific effects.
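The three-way cross can be sketched as follows, assuming NumPy. The bin edges, the sample coordinates, and the helper names are hypothetical and not tuned to real data. Each combination of latitude bin, longitude bin, and roomsPerPerson bin maps to its own slot, so the same roomsPerPerson value can carry a different weight in different locations.

```python
import numpy as np

# Hypothetical interior bin edges (illustrative only).
lat_edges = [34.0, 36.0, 38.0]        # 4 latitude bins
lon_edges = [-122.0, -120.0, -118.0]  # 4 longitude bins
rpp_edges = [1.0, 2.0]                # 3 roomsPerPerson bins

def bin_index(value, edges):
    """Map a value to the index of the bin it falls into."""
    return int(np.digitize(value, edges))

def crossed_index(lat, lon, rooms_per_person):
    """Index of the single 1 in the flattened three-way cross."""
    i = bin_index(lat, lat_edges)
    j = bin_index(lon, lon_edges)
    k = bin_index(rooms_per_person, rpp_edges)
    n_lon = len(lon_edges) + 1
    n_rpp = len(rpp_edges) + 1
    return (i * n_lon + j) * n_rpp + k

n_slots = (len(lat_edges) + 1) * (len(lon_edges) + 1) * (len(rpp_edges) + 1)

# Same roomsPerPerson, two different (approximate) locations.
slot_a = crossed_index(37.77, -122.42, 1.5)
slot_b = crossed_index(34.05, -118.24, 1.5)
```

The two locations land in different slots even though roomsPerPerson is identical, which is what lets a linear model learn a separate weight per location.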
What does FTRL stand for?
Follow The Regularised Leader. It is an online optimization algorithm, commonly used to train logistic regression models on large, sparse data sets.
Here is an implementation:
https://www.kaggle.com/jiweiliu/ftrl-starter-code
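For orientation, here is a minimal sketch of the per-coordinate FTRL-Proximal update as it is commonly described for logistic regression. It is not the code at the link above. The class name, the dense-list input format, and the hyperparameter defaults (alpha, beta, l1, l2) are assumptions made for illustration.

```python
import math

class FTRLProximal:
    """Per-coordinate FTRL-Proximal for logistic regression (dense inputs)."""

    def __init__(self, dim, alpha=0.1, beta=1.0, l1=0.0, l2=0.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = [0.0] * dim  # accumulated adjusted gradients
        self.n = [0.0] * dim  # accumulated squared gradients

    def _weight(self, i):
        """Weights are derived lazily from z and n rather than stored."""
        z = self.z[i]
        if abs(z) <= self.l1:
            return 0.0  # the L1 threshold keeps this weight exactly zero
        sign = 1.0 if z > 0 else -1.0
        return -(z - sign * self.l1) / (
            (self.beta + math.sqrt(self.n[i])) / self.alpha + self.l2
        )

    def predict(self, x):
        wx = sum(self._weight(i) * xi for i, xi in enumerate(x))
        wx = max(min(wx, 35.0), -35.0)  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-wx))

    def update(self, x, y):
        p = self.predict(x)
        for i, xi in enumerate(x):
            g = (p - y) * xi  # gradient of the log loss for coordinate i
            sigma = (math.sqrt(self.n[i] + g * g) - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * self._weight(i)
            self.n[i] += g * g

# Toy data: a constant bias input plus one feature whose sign decides the label.
model = FTRLProximal(dim=2)
data = [([1.0, 1.0], 1), ([1.0, -1.0], 0)]
for _ in range(50):
    for x, y in data:
        model.update(x, y)
```

Setting l1 above zero drives rarely useful weights to exactly zero, which is one reason this family of updates is popular for sparse, feature-crossed inputs.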
Explain a logistic function.
A logistic function or logistic curve is a common “S” shape (sigmoid curve), with equation:
f(x) = L / (1 + e^(-k(x - x0)))
where
e = the natural logarithm base (also known as Euler’s number),
x0 = the x-value of the sigmoid’s midpoint,
L = the curve’s maximum value, and
k = the steepness of the curve.[1]
For values of x in the domain of real numbers from −∞ to +∞ (and k > 0), an S-curve is obtained, with the graph of f approaching L as x approaches +∞ and approaching zero as x approaches −∞.
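The definition can be written directly in Python. This is a minimal sketch using only the standard library; the default values L = 1, k = 1, x0 = 0 are an assumption that yields the standard sigmoid used in logistic regression.

```python
import math

def logistic(x, L=1.0, k=1.0, x0=0.0):
    """General logistic function f(x) = L / (1 + e^(-k (x - x0)))."""
    return L / (1.0 + math.exp(-k * (x - x0)))

midpoint = logistic(0.0)                      # at x = x0 the value is L / 2
scaled = logistic(2.0, L=10.0, k=3.0, x0=2.0)  # again L / 2, here with L = 10
```

At x = x0 the exponent is zero, so the function returns exactly half of its maximum L. Note that math.exp overflows for very large negative x, so a production version would need to guard against that.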
The function was named in 1844 (published 1845)[a] by Pierre François Verhulst, who studied it in relation to population growth.[2] The initial stage of growth is approximately exponential (geometric); then, as saturation begins, the growth slows to linear (arithmetic), and at maturity, growth stops. Verhulst did not explain the choice of the term “logistic” (French: logistique), but it follows his discussion of arithmetic growth and geometric growth (whose curve he calls a logarithmic curve, instead of the modern term exponential curve), and thus “logistic growth” is presumably named by analogy with arithmetic and geometric, logistic being from Ancient Greek: λογῐστῐκός, translit. logistikós, a traditional division of Greek mathematics,[b] and in contrast to the logarithmic curve.[c] The term is unrelated to the military and management term logistics, which is instead from French: logis “lodgings”, though some believe the Greek term also influenced logistics.