Intro Flashcards
Overfitting
the production of an analysis that corresponds too closely or exactly to a particular set of data, and may therefore fail to fit to additional data or predict future observations reliably
Underfitting
occurs when a model is unable to capture the relationship between the input and output variables accurately, producing a high error rate on both the training set and unseen data
Neural network
a biologically inspired mathematical function made up of artificial “neurons” that interact with one another. Neurons are typically drawn as circles, and the arrows between them represent weights that describe the relationships between the neurons
Sigmoid function
An activation function, sigmoid(z) = 1 / (1 + e^(-z)). In a neuron, the weighted sum of the inputs is passed through the activation function, and the result serves as an input to the next layer. When the activation function is a sigmoid, the output of the unit is guaranteed to lie between 0 and 1
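A minimal sketch of a single sigmoid neuron, using only the standard library. The names `sigmoid` and `neuron_output` and the sample weights are illustrative assumptions, not part of any particular framework.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-z)), split by sign to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through the sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

# Hypothetical inputs and weights, chosen only for illustration.
activation = neuron_output([1.0, 2.0], [0.5, -0.25], 0.1)
```

Here the weighted sum is 0.1, so the activation sits just above 0.5, the value the sigmoid takes at zero.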
Recurrent neural networks (RNN)
a neural network made up of identical “units” of neurons that feed into one another in series. One of the most common types of network used with time series or other sequential data
Long Short-Term Memory (LSTM) networks
a variant of RNNs. Its gating mechanism sets it apart; this feature addresses the short-term memory problem of RNNs
Regularization
technique used to reduce overfitting (high variance), typically by penalizing model complexity. Too much regularization can push the model toward underfitting (high bias)
Tensor
mathematical objects that can be used to describe physical properties, just like scalars and vectors. In fact tensors are merely a generalization of scalars and vectors; a scalar is a zero rank tensor, and a vector is a first rank tensor.
Scalar
has only magnitude, no direction
Vector
has both magnitude and direction
Backpropagation
short for “backward propagation of errors,” is an algorithm for supervised learning of artificial neural networks using gradient descent. Given an artificial neural network and an error function, the method calculates the gradient of the error function with respect to the neural network’s weights.
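As a rough illustration, the chain rule behind backpropagation can be written out by hand for a single sigmoid neuron with a squared-error loss. The function names and the one-neuron setup are assumptions made for this sketch; real networks repeat the same step layer by layer.

```python
import math

def forward(w, b, x):
    """Forward pass: sigmoid of the weighted input."""
    z = w * x + b
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, b, x, y):
    """Squared error between the prediction and the target y."""
    return (forward(w, b, x) - y) ** 2

def backward(w, b, x, y):
    """Gradient of the loss with respect to w and b via the chain rule."""
    a = forward(w, b, x)
    dloss_da = 2.0 * (a - y)      # derivative of (a - y)^2 with respect to a
    da_dz = a * (1.0 - a)         # derivative of the sigmoid with respect to z
    dloss_dz = dloss_da * da_dz
    return dloss_dz * x, dloss_dz  # dz/dw = x, dz/db = 1

grad_w, grad_b = backward(0.5, -0.2, 1.5, 1.0)
```

The two returned values are what a gradient descent step would use to update `w` and `b`.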
Gradient descent
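an iterative first-order optimization algorithm used to find a local minimum of a differentiable function by repeatedly stepping against the gradient (stepping along the gradient instead, gradient ascent, finds a local maximum). This method is commonly used in machine learning (ML) and deep learning (DL) to minimize a cost/loss function (e.g. in a linear regression)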
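A minimal sketch of the update rule, x ← x − learning_rate · f′(x), applied to the toy function f(x) = (x − 3)². The function, starting point, learning rate, and step count are all assumptions chosen for illustration.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x

def grad_f(x):
    """Derivative of the toy function f(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)

x_min = gradient_descent(grad_f, x0=0.0)
```

With these settings each step shrinks the distance to the minimum at x = 3 by a factor of 0.8, so `x_min` ends up very close to 3.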
Gradient
calculated by taking the partial derivative of a function with respect to each variable. Result is expressed as a vector
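One way to see the definition in action is to approximate each partial derivative numerically. The sketch below uses central differences; `numerical_gradient` and the example function are hypothetical names, and the step size `h` is an assumption.

```python
def numerical_gradient(f, point, h=1e-6):
    """Approximate the gradient of f at point, one variable at a time."""
    grad = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += h
        backward[i] -= h
        # Central difference estimate of the partial derivative in variable i.
        grad.append((f(forward) - f(backward)) / (2.0 * h))
    return grad

def f(p):
    """Example function f(x, y) = x^2 + 3y, whose gradient is (2x, 3)."""
    x, y = p
    return x ** 2 + 3.0 * y

g = numerical_gradient(f, [2.0, 5.0])
```

At the point (2, 5) the analytic gradient is (4, 3), and the numerical estimate should agree to within rounding error.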
Jacobian matrix
the Jacobian matrix of a vector-valued function of several variables is the matrix of all its first-order partial derivatives.
Normalization
The goal of normalization is to transform features to be on a similar scale. This improves the performance and training stability of the model
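Two common ways to put features on a similar scale are min-max scaling and z-score standardization. The sketch below is one possible stdlib-only version; the function names are assumptions, and constant features are mapped to zeros by choice.

```python
from statistics import mean, pstdev

def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Shift to zero mean and divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

scaled = min_max_scale([10, 20, 30])
standardized = z_score([2, 4, 4, 4, 5, 5, 7, 9])
```

In practice the scaling statistics are computed on the training set and then reused for validation and test data.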
Data lake
datastore where data is stored from multiple data sources. Typically dirty, with missing fields and inconsistent data types. Can have a schema and data permissioning applied
Data pipeline
Retrieve data from datasource (ingestion) -> apply data lake schema/permissioning parameters -> store in data lake (aggregation of data stores) -> pull data from data lake and clean/structure it (feature engineering)
Imputation
filling in missing data values
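Mean imputation is one simple strategy among many (median, mode, and model-based imputation are others). A minimal sketch, assuming missing values are represented as `None`; `impute_mean` is a hypothetical helper name.

```python
from statistics import mean

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]

filled = impute_mean([1.0, None, 3.0])
```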
Data warehouse
different from a data lake. Takes a slice of historical data, stores it in column format as opposed to row format, and applies additional transformations. Acts as a proxy between the data lake and the analyst. Aggregations are faster on column-structured data
Confusion matrix
table that visualizes true positives, false positives, true negatives, and false negatives. Used to evaluate your ML algorithm, not to train it
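For a binary classifier, the four cells can be counted directly from the true and predicted labels. A minimal sketch; the function name, the dictionary keys, and the sample labels are assumptions for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            key = "tp" if actual == positive else "fp"
        else:
            key = "fn" if actual == positive else "tn"
        counts[key] += 1
    return counts

# Hypothetical labels, chosen only to exercise every cell.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
counts = confusion_counts(y_true, y_pred)
# counts == {"tp": 2, "fp": 1, "fn": 1, "tn": 2}
```

Metrics such as precision, tp / (tp + fp), and recall, tp / (tp + fn), are derived from these counts.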
Classifier
In data science, a classifier is a type of machine learning algorithm used to assign a class label to a data input. An example is an image recognition classifier to label an image (e.g., “car,” “truck,” or “person”).
Hyperparameter
In machine learning, a hyperparameter is a parameter whose value is used to control the learning process. By contrast, the values of other parameters are derived via training.
Matrix factorization
Matrix factorization is a class of collaborative filtering algorithms used in recommender systems. Matrix factorization algorithms work by decomposing the user-item interaction matrix into the product of two lower dimensionality rectangular matrices.
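The sketch below shows only the reconstruction step: multiplying a user-factor matrix by an item-factor matrix to get predicted interactions. The factor values are made up for illustration; in a real recommender they would be learned from observed ratings.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

# Hypothetical factors: 3 users x 2 latent features, 2 latent features x 3 items.
user_factors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
item_factors = [[5.0, 1.0, 4.0], [1.0, 5.0, 2.0]]

# Predicted user-item interaction matrix (3 users x 3 items).
predicted = matmul(user_factors, item_factors)
```

Each predicted entry is the dot product of one user's latent vector with one item's latent vector.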
Convolutional layer
A convolutional layer is the main building block of a CNN. It contains a set of filters (or kernels) whose parameters are learned during training. The filters are usually smaller than the input image
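To illustrate how one filter slides over an image, here is a minimal sketch of a 2D "valid" convolution with no padding and stride 1. Like most deep learning libraries, it computes cross-correlation (the kernel is not flipped). The image and kernel values are assumptions; in a real layer the kernel would be learned.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image and sum the elementwise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output

image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, -1]]  # hypothetical 2x2 filter
feature_map = conv2d_valid(image, kernel)
```

A 3x3 image and a 2x2 kernel give a 2x2 feature map, since the kernel fits in two positions along each axis.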
Examples of machine learning use cases
Recommender systems
Sentiment analysis
Classification/Clustering
Spam detection
Anomaly detection
Support Vector Machines
Supervised learning algorithms that analyze data for classification and regression analysis.
A simple example may be drawing a line on a graph where the points above the line are one category and those below the line are another category.
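The line example can be sketched as a decision rule: a point is labeled by the sign of w · x + b. This shows only how a fitted linear boundary classifies points, not how an SVM finds the boundary; the weights below are an assumption corresponding to the line y = x.

```python
def classify(point, weights, bias):
    """Return +1 on one side of the line w . x + b = 0 and -1 on the other."""
    score = sum(w * x for w, x in zip(weights, point)) + bias
    return 1 if score >= 0 else -1

# Hypothetical boundary y = x, written as -1*x + 1*y + 0 = 0.
weights, bias = (-1.0, 1.0), 0.0

above = classify((1.0, 3.0), weights, bias)  # point above the line
below = classify((3.0, 1.0), weights, bias)  # point below the line
```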
Hyperplane
subspace whose dimension is one less than that of its ambient space. For example, if a space is 3-dimensional then its hyperplanes are the 2-dimensional planes, while if the space is 2-dimensional, its hyperplanes are the 1-dimensional lines.
Linear regression
Simple regression machine learning algorithm that predicts your response variable y based on one or more independent variables X, fitting a linear relationship between the two.
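For a single independent variable, the least-squares line has a closed-form solution. A minimal sketch, assuming ordinary least squares and made-up data points; `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data lying exactly on y = 2x + 1.
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```

Because the sample points lie exactly on a line, the fit recovers a slope of 2 and an intercept of 1.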
Cost function
Function used to provide a quantitative measure of a machine learning algorithm's prediction inaccuracy. Mean squared error is a commonly used example
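Mean squared error averages the squared differences between targets and predictions. A minimal sketch with made-up values:

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Squared errors are 1, 0, and 4, so the mean is 5/3.
mse = mean_squared_error([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])
```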
Learning curves
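Plots of training and validation error against training time or training set size. A good tool for diagnosing overfitting: the training error keeps falling while the validation error stalls or rises.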