2. Training Machine Learning Algorithms for Classification Flashcards
Describe a nerve cell as a simple logic gate with binary outputs.
Multiple signals arrive at the dendrites and are integrated in the cell body; if the accumulated signal exceeds a certain threshold, an output signal is generated and passed on by the axon.
Describe a simple perceptron learning algorithm.
Initialize the weights to zero or small random numbers. Then, for each training sample x(i): compute the predicted class label y-hat by applying the unit step function to the net input z = w·x, and update every weight as w_j := w_j + eta * (y(i) - y-hat(i)) * x_j(i), where eta is the learning rate. Repeat passes over the training set until the predictions stop changing or a maximum number of epochs is reached. The update is zero whenever the prediction is correct, so the weights only move on misclassified samples.
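The learning rule above can be sketched in a few lines of NumPy. This is a minimal illustration, not the book's implementation; the function name and the separate bias term are my own choices.

```python
import numpy as np

def train_perceptron(X, y, eta=0.1, n_epochs=10):
    """Rosenblatt perceptron sketch: unit-step activation, per-sample updates.
    X: (n_samples, n_features) array, y: labels in {-1, 1}."""
    w = np.zeros(X.shape[1])   # weight vector
    b = 0.0                    # bias (plays the role of the threshold)
    for _ in range(n_epochs):
        for xi, target in zip(X, y):
            pred = 1 if (xi @ w + b) >= 0.0 else -1   # unit step function
            update = eta * (target - pred)            # zero if prediction correct
            w += update * xi
            b += update
    return w, b
```

On linearly separable data (the only case where convergence is guaranteed), this learns a separating boundary, e.g. for a logical AND of two binary inputs.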
What does vectorization mean, and what are the benefits of using NumPy's vectorized operations?
Vectorization means that an elemental arithmetic operation is automatically applied to all elements in an array. By formulating our arithmetic operations as a sequence of instructions on an array rather than performing a set of operations for each element one at a time, we can make better use of our modern CPU architectures with Single Instruction, Multiple Data (SIMD) support. Furthermore, NumPy uses highly optimized linear algebra libraries, such as Basic Linear Algebra Subprograms (BLAS) and Linear Algebra Package (LAPACK) that have been written in C or Fortran. Lastly, NumPy also allows us to write our code in a more compact and intuitive way using the basics of linear algebra, such as vector and matrix dot products.
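A small sketch of the contrast: the same elementwise sum written as a Python loop versus a single vectorized NumPy expression, plus a BLAS-backed dot product.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# Loop version: one scalar operation per element, interpreted in Python
loop_sum = np.empty_like(a)
for i in range(len(a)):
    loop_sum[i] = a[i] + b[i]

# Vectorized version: one array expression; NumPy dispatches the
# elementwise work to optimized C loops (SIMD-capable)
vec_sum = a + b

# Dot product delegated to BLAS instead of a Python accumulation loop
dot = a @ b   # 1*4 + 2*5 + 3*6 = 32
```

Both versions produce identical results; the vectorized form is both faster on large arrays and closer to linear-algebra notation.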
What is One-vs.-All (OvA), or sometimes also called One-vs.-Rest (OvR)?
OvA is a technique that allows us to extend a binary classifier to multi-class problems. Using OvA, we train one classifier per class, where the particular class is treated as the positive class and the samples from all other classes are considered the negative class. To classify a new data sample, we would use our n classifiers, where n is the number of class labels, and assign the class label with the highest confidence to the particular sample.
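The OvA scheme can be sketched generically, independent of which binary learner is plugged in. The function names and the `fit_binary` callback are hypothetical; `fit_binary` is assumed to return a scoring function whose larger values mean higher confidence in the positive class.

```python
import numpy as np

def ova_fit(X, y, fit_binary):
    """Train one binary classifier per class label (One-vs.-All).
    fit_binary(X, y_bin) must return a function mapping samples to
    confidence scores for the positive class."""
    classes = np.unique(y)
    scorers = {}
    for c in classes:
        y_bin = np.where(y == c, 1, -1)   # class c positive, all others negative
        scorers[c] = fit_binary(X, y_bin)
    return classes, scorers

def ova_predict(X, classes, scorers):
    # Ask every per-class classifier for its confidence, pick the winner
    scores = np.column_stack([scorers[c](X) for c in classes])
    return classes[np.argmax(scores, axis=1)]
```

Any binary learner with a real-valued confidence output (a perceptron's net input, for instance) can serve as `fit_binary`.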
What is an objective function and give a common example?
One of the key ingredients of supervised machine learning algorithms is to define an objective function that is to be optimized during the learning process. This objective function is often a cost function that we want to minimize. A common example is the Sum of Squared Errors (SSE) between the computed outcomes and the true class labels, which is the cost function minimized by Adaline.
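As a concrete illustration, the SSE cost can be written in one line; the function name here is my own.

```python
import numpy as np

def sse_cost(y_true, y_pred):
    """Sum of Squared Errors: J = 0.5 * sum((y - y_hat)^2).
    The 1/2 factor is a convention that simplifies the gradient."""
    errors = y_true - y_pred
    return 0.5 * np.sum(errors ** 2)
```

Gradient descent then updates the weights in the direction that decreases this value.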
What is the feature scaling method called standardization?
Standardization gives our data the property of a standard normal distribution: the mean of each feature column is centered at 0 and the column has a standard deviation of 1.
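A minimal NumPy sketch of standardization, applied per feature column:

```python
import numpy as np

def standardize(X):
    """Center each feature column at mean 0 and scale it to unit
    standard deviation: x' = (x - mean) / std, column-wise."""
    return (X - X.mean(axis=0)) / X.std(axis=0)
```

In practice the mean and standard deviation should be estimated on the training set only and reused to transform the test set.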
What is online learning?
In online learning, our model is trained on-the-fly as new training data arrives. This is especially useful if we are accumulating large amounts of data, and it is a natural advantage of stochastic gradient descent, which updates the weights one training sample at a time.
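The core of it is a single-sample update that can be applied as each new sample streams in. This sketch assumes an Adaline-style linear activation; the function name is my own.

```python
import numpy as np

def online_update(w, b, xi, target, eta=0.1):
    """One stochastic gradient step on a single incoming sample:
    w := w + eta * (y - (w.x + b)) * x, applied as data streams in."""
    error = target - (xi @ w + b)
    w = w + eta * error * xi
    b = b + eta * error
    return w, b
```

Calling this repeatedly on a stream of samples drives the weights toward the minimum of the cost, without ever needing the full dataset in memory.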
What is mini-batch learning?
Mini-batch learning can be understood as applying batch gradient descent to smaller subsets of the training data—for example, 50 samples at a time. The advantage over batch gradient descent is that convergence is reached faster via mini-batches because of the more frequent weight updates. Furthermore, mini-batch learning allows us to replace the for-loop over the training samples in Stochastic Gradient Descent (SGD) by vectorized operations, which can further improve the computational efficiency of our learning algorithm.
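A sketch of the idea, assuming an Adaline-style linear model; note the inner update is a single vectorized expression over the whole batch rather than a per-sample loop. Names and defaults are my own.

```python
import numpy as np

def minibatch_sgd(X, y, eta=0.1, n_epochs=500, batch_size=2, seed=0):
    """Adaline-style weight updates computed on shuffled mini-batches:
    a middle ground between per-sample SGD and full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_epochs):
        idx = rng.permutation(len(X))              # reshuffle every epoch
        for start in range(0, len(X), batch_size):
            batch = idx[start:start + batch_size]
            errors = y[batch] - (X[batch] @ w + b)  # vectorized over the batch
            w += eta * (X[batch].T @ errors) / len(batch)
            b += eta * errors.mean()
    return w, b
```

The batch size trades off update frequency (small batches, more updates) against the efficiency of the vectorized gradient computation (large batches).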
What is Adaline?
ADAptive LInear NEuron (Adaline). The Adaline algorithm is particularly interesting because it illustrates the key concept of defining and minimizing cost functions. The key difference between the Adaline rule and Rosenblatt's perceptron is that the weights are updated based on a linear activation function rather than a unit step function as in the perceptron. In Adaline, this linear activation function phi(z) is simply the identity function of the net input, so that phi(w·x) = w·x.
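Adaline with full-batch gradient descent can be sketched as follows: the identity activation makes the SSE cost differentiable, so each epoch takes one gradient step computed over all samples. This is an illustrative sketch; names and defaults are my own.

```python
import numpy as np

def train_adaline(X, y, eta=0.05, n_epochs=500):
    """Adaline with batch gradient descent: linear (identity) activation,
    weights updated from the gradient of the sum-of-squared-errors cost."""
    w = np.zeros(X.shape[1])
    b = 0.0
    costs = []
    for _ in range(n_epochs):
        output = X @ w + b            # phi(z) = z: the identity activation
        errors = y - output           # continuous errors, not 0/1 misclassifications
        w += eta * X.T @ errors       # one gradient step over all samples
        b += eta * errors.sum()
        costs.append(0.5 * (errors ** 2).sum())
    return w, b, costs
```

Because the updates use the continuous-valued output rather than thresholded class labels, the cost decreases smoothly across epochs, which is exactly what makes convergence easy to monitor.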