Quant 2.6 Flashcards
What is Machine Learning?
- Machine learning is a set of computer-driven approaches aimed at generating structure or predictions from data by finding a pattern and then applying that pattern without human intervention.
What is the objective of ML and how does it work?
- The objective is to extract meaning from large amounts of data.
The way it works: a large amount of data, usually consisting of known examples, is given to the computer, which searches it for patterns or relationships. The algorithm runs repeatedly to find a pattern, establish some meaning for it, and then apply the pattern again as required, all without human intervention!
What are the advantages associated to ML? What are the classes of ML techniques?
- Unlike regression, ML is not based on restrictive assumptions. ML also works easily with data that have a high degree of non-linearity in their relationships, and it can deal with a very large number of variables (high dimensionality).
The three classes are:
Supervised learning
Unsupervised learning
Deep learning
What is Supervised ML?
- Under supervised ML, the objective is to let the machine develop a prediction rule by studying labeled data that we provide, i.e., inputs together with their known outputs. The algorithm analyses the labeled inputs (CC example: date, time of payment, amount, all X variables), compares them to the output Y, and forms a pattern or establishes a relationship.
Once the training data set is exhausted, we can feed in a similar new data set, on which the machine runs the learned prediction rule (created from the training data set), and we can compare how well it performs on the actual data. Basically, it predicts outputs from new inputs (Y variable: fraudulent or not).
Here, the X variables are called features (the independent variables of multiple linear regression) and the dependent variable is called the target (Y).
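A toy sketch of the features/target idea, using a made-up credit-card data set (all values are hypothetical, and the "learned rule" is the simplest one imaginable, not a real ML algorithm):

```python
# Toy supervised-learning sketch (illustrative only; a made-up credit-card
# data set, not a real ML library). Each row of X holds the features
# (payment amount, hour of day); y holds the target (1 = fraud, 0 = not).
X_train = [(5200, 3), (40, 14), (7800, 2), (25, 11)]
y_train = [1, 0, 1, 0]

# "Learn" the simplest possible rule from the labeled data:
# the average payment amount of each class.
fraud_amts = [amt for (amt, hr), y in zip(X_train, y_train) if y == 1]
legit_amts = [amt for (amt, hr), y in zip(X_train, y_train) if y == 0]
fraud_avg = sum(fraud_amts) / len(fraud_amts)   # 6500.0
legit_avg = sum(legit_amts) / len(legit_amts)   # 32.5

def predict(amount):
    # Classify a new payment by whichever class average it is closer to.
    return 1 if abs(amount - fraud_avg) < abs(amount - legit_avg) else 0

print(predict(6000))  # 1 -> flagged as fraudulent
print(predict(30))    # 0 -> looks legitimate
```

The point is only the workflow: features X and target Y go in, a rule comes out, and the rule is then applied to new inputs.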
What are the categories of data sets that can be used in Supervised learning?
- Two broad categories of data:
Regression - the target variable is continuous, so it is modelled as some function of the features.
Classification - observations are sorted into classes based on the features; the target can be binary (yes/no, as in our CC example) or have multiple classes.
What is unsupervised learning?
- It is the process in which the machine does not use labeled data (a key distinction from supervised learning).
Several features are still used, but no target is provided. The algorithm tries to discover structure, i.e., make sense of the data all by itself. It can be used for large, complex data sets that are hard to visualize.
What type of problems are well suited for the unsupervised ML?
- Two types:
Dimension reduction - problems where we need to reduce the number of features, and
Clustering - where we need to sort the observations into groups.
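Clustering can be sketched in a few lines of plain Python. This is a minimal two-group version of the classic k-means idea on one made-up feature (the data, initial centres, and iteration count are all assumptions for illustration):

```python
# Minimal unsupervised clustering sketch: no labels are given, yet the
# algorithm discovers the two groups on its own (toy 1-D k-means, k = 2).
points = [1.0, 1.2, 0.8, 9.0, 9.5, 8.7]
c1, c2 = points[0], points[3]        # arbitrary initial centre guesses

for _ in range(10):                  # repeat assignment + update steps
    g1 = [p for p in points if abs(p - c1) <= abs(p - c2)]
    g2 = [p for p in points if abs(p - c1) > abs(p - c2)]
    c1, c2 = sum(g1) / len(g1), sum(g2) / len(g2)   # move each centre

print(sorted(g1))  # the small observations cluster together
print(sorted(g2))  # the large observations cluster together
```

No target Y appears anywhere, which is exactly the supervised/unsupervised distinction.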
What is Deep learning?
- refers to highly sophisticated algorithms used for complex tasks like image classification, face recognition, speech recognition and natural language processing.
What is reinforcement learning?
- a situation/process where the computer learns from interacting with itself (or with its environment), improving through trial and error.
What are deep learning and reinforcement learning based upon?
- Neural networks
These algorithms work well when we have non-linearities in our data and when our features interact among themselves. They can be supervised or unsupervised.
When creating a model, how do you divide the data into samples?
- It’s typically divided into 3 non-overlapping samples.
A. Training sample - the one used in supervised learning to let the algorithm study.
B. Validation sample - the sample on which the algorithm can run and tune the prediction rule it created from the training sample.
C. Test sample - the final, held-out sample on which we want the machine to predict outcomes.
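The three-way split can be sketched as follows. The 60/20/20 ratios are an assumption for illustration; the flashcard only requires the three samples to be non-overlapping:

```python
import random

# Sketch of a typical three-way split into non-overlapping samples.
data = list(range(100))          # stand-in for 100 labeled observations
random.seed(42)                  # fixed seed so the split is repeatable
random.shuffle(data)

train      = data[:60]           # A. used to let the algorithm learn the rule
validation = data[60:80]         # B. used to tune the learned rule
test       = data[80:]           # C. held out for the final evaluation

print(len(train), len(validation), len(test))  # 60 20 20
```

Shuffling before slicing prevents any ordering in the raw data (e.g., by date) from leaking into one sample only.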
What is generalization?
- The degree to which our prediction rule/model retains its explanatory power when predicting on out-of-sample data.
What is Overfitting?
- A situation where our model performs well on the training sample but doesn't generalise well to other samples/data.
How do you explain the type of fit of the Model?
- Use the suit example.
If you go to a tailor for a suit and they have one that fits only one person perfectly and no one else, it's called overfit.
If the suit is so baggy that it can't fit anyone properly, it's called underfit.
If the suit fits anyone of around 5 ft 10 in height, it's called a good fit.
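The suit analogy in code: a hypothetical model that simply memorizes the training sample (the one-person suit) next to one that captures the underlying rule. Both models and the data are made up for illustration:

```python
# An overfit "model": it reproduces the training sample perfectly but has
# learned no pattern, so it fails on any observation it has never seen.
train_sample = {1: 10, 2: 20, 3: 30}   # toy data generated by y = 10x

def overfit_model(x):
    return train_sample.get(x)          # pure memorization (lookup table)

def good_fit_model(x):
    return 10 * x                       # captures the underlying rule

print(overfit_model(2))    # 20   -> perfect in-sample
print(overfit_model(4))    # None -> useless out of sample
print(good_fit_model(4))   # 40   -> generalises to new data
```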
What is the complexity of the model based upon?
- No. of features, terms or branches in the model & whether the model is linear or non-linear
The higher the complexity of the model, the higher is the risk of it being Overfit.
How are out-of-sample errors categorised?
- There are three types into which out-of-sample errors can be categorised.
A. Base error - present in the test sample due to the randomness of the data; it is irreducible.
B. Bias error - the degree to which a model fits the training data; high bias means the model fits the training data poorly (underfitting).
C. Variance error - how much the model's results change in response to new data from the validation and test samples; high variance means the model is tracking noise in the training data (overfitting).
What are learning curves and what is a robust model?
- Learning curves plot the accuracy rate against the training sample size. (It's a graph or plot which shows the type of error present in our model.)
The desired level of accuracy is 1 - base error (the base error is due to the randomness of the data, and there's nothing we can do about it).
A robust model is one whose out-of-sample accuracy increases towards the desired level of accuracy as the training sample size increases.
What are some methods to reduce overfitting of the data in Supervised Machine learning?
- By overfitting, we mean the model doesn’t perform well out of sample.
There are two methods:
A. Reduce complexity - as discussed earlier, the more complex a model, the more it overfits.
Thus reducing complexity directly reduces the overfitting problem (the simplest solution tends to be the correct one).
B. Cross-validation - Based on principle of avoiding sample bias.
K-fold cross validation technique ->
The data set is broken down into two sections: the training-plus-validation (t+v) section and the out-of-sample (test) section.
The t+v data is randomly shuffled and divided into k equal sub-samples; k-1 of them are used for training and the kth is used for validation.
The process is repeated k times, with each sub-sample serving once as the validation sample; this lets the algorithm learn all the variations and reduces sampling bias.
Which in turn increases t+v accuracy rates, and thus makes the model a better fit for OOS data.
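The k-fold procedure above can be sketched in a few lines. Here k = 5 and a 20-observation t+v data set are assumptions for illustration (a real run would shuffle the data first and fit a model in each round):

```python
# Minimal k-fold cross-validation split (k assumed to be 5).
# Each observation serves in the validation sample exactly once
# and in the training sample in the other k-1 rounds.
k = 5
data = list(range(20))                    # stand-in for the t+v data set

folds = [data[i::k] for i in range(k)]    # k equal sub-samples

for i in range(k):
    validation = folds[i]                 # the kth sub-sample this round
    training = [x for j, fold in enumerate(folds) if j != i for x in fold]
    # fit the model on `training` and score it on `validation` here
    assert len(training) == 16 and len(validation) == 4
```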
What are the different types of algorithms under Supervised Learning?
- There are 5:
A. Penalised Regression
B. Support Vector Machine
C. K-Nearest Neighbor
D. Classification & Regression Tree
E. Ensemble learning and Random Forest
Explain Penalised Regression.
- It is a computationally efficient technique used in prediction problems where the target variable is continuous.
Regression coefficients are chosen to minimise sum of squared residuals plus a penalty term that increases with the number of included variables.
Classic example - LASSO, the least absolute shrinkage and selection operator (just remember that LASSO is a penalised regression algorithm; its penalty is lambda times the sum of the absolute values of the regression coefficients).
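The penalised objective can be written as a toy function. Everything here (the `lasso_loss` helper, the three data points, the penalty weight `lam`) is hypothetical, just to show how the penalty term is added to the sum of squared residuals:

```python
# Sketch of a LASSO-style objective for a line y_hat = b0 + b1 * x:
# loss = sum of squared residuals + lam * |b1|
# where lam is the penalty weight chosen by the analyst.
def lasso_loss(b0, b1, xs, ys, lam):
    ssr = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    penalty = lam * abs(b1)        # each included coefficient adds to this
    return ssr + penalty

xs, ys = [1, 2, 3], [2.1, 3.9, 6.2]

# The same fitted slope scores worse once the penalty is switched on,
# which is what pushes unimportant coefficients toward zero.
print(lasso_loss(0.0, 2.0, xs, ys, lam=0.0))    # plain SSR
print(lasso_loss(0.0, 2.0, xs, ys, lam=10.0))   # SSR + penalty
```

With many features, coefficients whose contribution to fit is smaller than their penalty get shrunk to exactly zero, so LASSO also acts as a variable-selection tool.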