Quantitative Methods Flashcards
Primary areas of fintech
- Increasing functionality to handle large sets of data that may come from many sources and exist in a variety of forms.
- Tools and techniques, such as artificial intelligence, for analyzing very large data sets.
- Automation of financial functions such as executing trades and providing investment advice.
- Emerging technologies for financial recordkeeping that may reduce the need for intermediaries.
Big Data
Big Data refers to all the potentially useful information that is generated in the economy, including data from traditional sources and alternative data.
Corporate exhaust
Potentially useful information generated by businesses, such as bank records and retail scanner data.
Internet of Things
The network of sensors, such as radio frequency identification chips, embedded in numerous devices such as smartphones and smart buildings.
Characteristics of Big Data
Volume, velocity, and variety
Data processing methods include
- Capture—collecting data and transforming it into usable forms.
- Curation—assuring data quality by adjusting for bad or missing data.
- Storage—archiving and accessing data.
- Search—examining stored data to find needed information.
- Transfer—moving data from their source or a storage medium to where they are needed.
Neural Networks
Neural networks are an example of artificial intelligence in that they are programmed to process information in a way similar to the human brain.
Machine Learning
This refers to programming that gives a computer system the ability to improve its performance of a task over time. The machine learning process typically requires vast amounts of data.
- In supervised learning, the input and output data are labeled, the machine learns to model the outputs from the inputs, and then the machine is given new data on which to use the model.
- In unsupervised learning, the input data are not labeled and the machine learns to describe the structure of the data.
- Deep learning is a technique that uses layers of neural networks to identify patterns, beginning with simple patterns and advancing to more complex ones. Deep learning may use supervised or unsupervised learning.
ML can produce models that overfit or underfit the data.
- Overfitting occurs when the machine learns the input and output data too exactly, treats noise as true parameters, and identifies spurious patterns and relationships.
- Underfitting occurs when the machine fails to identify actual patterns and relationships, treating true parameters as noise.
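The contrast between underfitting and overfitting can be illustrated with a small sketch. Everything here is an assumption made for illustration: the toy data (y = 2x with alternating +1/-1 noise), the held-out points, and the three hypothetical models `fit_mean` (underfits by ignoring x), `fit_line` (a least-squares line), and `fit_memorizer` (overfits by returning the nearest training value, noise included).

```python
# Toy data (made up for illustration): y = 2x plus alternating +1/-1 noise.
train_x = [0, 1, 2, 3, 4, 5]
train_y = [1, 1, 5, 5, 9, 9]
# Held-out points taken from the noise-free relationship y = 2x.
test_x = [0.4, 1.4, 2.4, 3.4, 4.4]
test_y = [2 * x for x in test_x]


def mse(model, xs, ys):
    """Mean squared prediction error of `model` on (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_mean(xs, ys):
    """Underfit: ignore x and always predict the average y."""
    avg = sum(ys) / len(ys)
    return lambda x: avg


def fit_line(xs, ys):
    """Least-squares straight line through the training data."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return lambda x: my + slope * (x - mx)


def fit_memorizer(xs, ys):
    """Overfit: return the y of the nearest training x, noise included."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


results = {}
for name, fit in [("mean", fit_mean), ("line", fit_line), ("memorizer", fit_memorizer)]:
    model = fit(train_x, train_y)
    # Store (in-sample error, out-of-sample error) for each model.
    results[name] = (mse(model, train_x, train_y), mse(model, test_x, test_y))
    print(name, results[name])
```

On this toy data the memorizer has zero in-sample error but a larger out-of-sample error than the line, while the mean-only model is worse than the line on both sets.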
Fintech Applications
- Text analytics refers to the analysis of unstructured data in text or voice forms.
- Natural language processing refers to the use of computers and artificial intelligence to interpret human language.
- Algorithmic trading refers to computerized securities trading based on a predetermined set of rules.
- High-frequency trading, a form of algorithmic trading, identifies and takes advantage of intraday securities mispricings.
- Robo-advisors are online platforms that provide automated investment advice based on a customer’s answers to survey questions.
- The primary advantage of robo-advisors is their low cost to customers.
- A disadvantage of robo-advisors is that the reasoning behind their recommendations might not be apparent.
Distributed ledger
A distributed ledger is a database that is shared on a network so that each participant has an identical copy. A distributed ledger must have a consensus mechanism to validate new entries into the ledger. Distributed ledger technology uses cryptography to ensure only authorized network participants can use the data.
Distributed ledgers can take the form of permissionless or permissioned networks.
- In permissionless networks, all network participants can view all transactions. These networks have no central authority, which gives them the advantage of having no single point of failure. The ledger becomes a permanent record visible to all, and its history cannot be altered (short of the manipulation described previously). This removes the need for trust between the parties to a transaction.
- In permissioned networks, users have different levels of access. For example, a permissioned network might allow network participants to enter transactions while giving government regulators permission to view the transaction history. A distributed ledger that allowed regulators to view records that firms are required to make available would increase transparency and decrease compliance costs.
Blockchain
A blockchain is a distributed ledger that records transactions sequentially in blocks and links these blocks in a chain. Each block has a cryptographically secured “hash” that links it to the previous block. The consensus mechanism in a blockchain requires some of the computers on the network to solve a cryptographic problem. These computers are referred to as miners.
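The hash linking described above can be sketched as follows. This is a minimal illustration, not how any particular blockchain is implemented: the block layout and the helper names `block_hash`, `add_block`, and `is_valid` are assumptions, and the mining/consensus step is deliberately left out. SHA-256 from Python's `hashlib` stands in for the cryptographic hash.

```python
import hashlib
import json


def block_hash(index, transactions, previous_hash):
    """SHA-256 digest of a block's contents, including the previous block's hash."""
    payload = json.dumps(
        {"index": index, "transactions": transactions, "previous_hash": previous_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def add_block(chain, transactions):
    """Append a block that points at the hash of the current last block."""
    previous_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    chain.append({
        "index": index,
        "transactions": transactions,
        "previous_hash": previous_hash,
        "hash": block_hash(index, transactions, previous_hash),
    })


def is_valid(chain):
    """Recompute every hash and check each link to the previous block."""
    for i, block in enumerate(chain):
        expected = block_hash(block["index"], block["transactions"], block["previous_hash"])
        if block["hash"] != expected:
            return False
        if i > 0 and block["previous_hash"] != chain[i - 1]["hash"]:
            return False
    return True


chain = []
add_block(chain, ["A pays B 5"])
add_block(chain, ["B pays C 2"])
add_block(chain, ["C pays A 1"])
print(is_valid(chain))  # True

chain[1]["transactions"] = ["B pays C 200"]  # tamper with an earlier block
print(is_valid(chain))  # False
```

Because each hash covers the previous block's hash, changing an earlier block invalidates its stored hash, which is why altering history is detectable.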
Financial applications of distributed ledger technology
- Cryptocurrencies are a current example of distributed ledger technology in finance. They allow participants to engage in real-time transactions without a financial intermediary and typically reside on permissionless networks.
- Initial coin offerings sell cryptocurrency for money or another cryptocurrency. This reduces the cost and time frame compared to carrying out a regulated IPO, and initial coin offerings typically do not come with voting rights. Fraud has occurred with initial coin offerings and they may become subject to securities regulations.
- Smart contracts are electronic contracts that could be programmed to self-execute based on terms agreed to by the counterparties.
- Tokenization refers to electronic proof of ownership of physical assets, which could be maintained on a distributed ledger.
Post-trade clearing and settlement is an area of finance to which distributed ledger technology might be productively applied. Distributed ledgers could automate many of the processes currently carried out by custodians and other third parties. On the other hand, the inability to alter past transactions on a distributed ledger is problematic when canceling a trade is required.
Covariance
- A statistical measure of the degree to which the two variables move together.
- Captures the linear relationship between two variables.
- A positive covariance indicates that the variables tend to move together; a negative covariance indicates that they tend to move in opposite directions.
- May range from negative to positive infinity, and it is presented in terms of squared units.
Correlation coefficient (r)
- A measure of the strength of the linear relationship (correlation) between two variables. It has no units.
- –1 ≤ r ≤ +1
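As a rough sketch of how covariance and correlation relate, the snippet below computes both on a made-up sample and then rescales X. The data and the helper names `sample_covariance` and `correlation` are illustrative assumptions; the formulas are the usual sample versions (divisor n - 1).

```python
import math

# Made-up sample, used only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def sample_covariance(xs, ys):
    """Sample covariance: sum of cross-deviations divided by n - 1."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    var_x = sample_covariance(xs, xs)
    var_y = sample_covariance(ys, ys)
    return sample_covariance(xs, ys) / math.sqrt(var_x * var_y)


cov_xy = sample_covariance(x, y)
r = correlation(x, y)

# Rescaling X (for example, a change of units) changes the covariance
# but leaves the correlation untouched.
x_scaled = [100 * a for a in x]
cov_scaled = sample_covariance(x_scaled, y)
r_scaled = correlation(x_scaled, y)

print(cov_xy, cov_scaled)
print(r, r_scaled)
```

The covariance grows by the scaling factor while r is unchanged, which is why r is the easier of the two to interpret.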
Spurious Correlation
Spurious correlation refers to the appearance of a causal linear relationship when, in fact, there is no relation.
Simple Linear Regression
The purpose of simple linear regression is to explain the variation in a dependent variable in terms of the variation in a single independent variable.
Dependent vs. independent variable
- The dependent variable is the variable whose variation is explained by the independent variable. Also referred to as the explained variable, the endogenous variable, or the predicted variable.
- The independent variable is the variable used to explain the variation of the dependent variable. Also referred to as the explanatory variable, the exogenous variable, or the predicting variable.
Linear Regression Assumptions
- Linear relationship between Y and X
- No exact linear relationship among the X’s. (A violation is called multicollinearity.)
- The expected value of the residual term is zero [E(ε) = 0].
- The variance of the residual term is constant for all observations [E(ε_i²) = σ_ε²]. (A violation is called heteroskedasticity.)
- The residual term is independently distributed; that is, the residual for one observation is not correlated with that of another observation [E(ε_i ε_j) = 0, j ≠ i].
- The residual term is normally distributed.
Slope Coefficient
The slope coefficient for the regression line describes the change in Y for a one-unit change in X.
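A minimal sketch of how the slope and intercept are estimated in a simple linear regression, assuming the standard least-squares formulas (slope = Cov(X, Y) / Var(X), intercept = mean of Y minus slope times mean of X). The data are made up for illustration.

```python
# Made-up sample, used only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Slope = Cov(X, Y) / Var(X); the (n - 1) divisors cancel, so raw sums suffice.
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
slope = sxy / sxx

# The fitted line passes through the point of means (mean_x, mean_y).
intercept = mean_y - slope * mean_x

print(slope, intercept)
```

Here the slope is about 0.6, so a one-unit increase in X is associated with an estimated 0.6-unit increase in Y.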
Sum of squared errors (SSE)
The sum of the squared vertical distances between the estimated and actual Y-values is referred to as the sum of squared errors (SSE).
Simple linear regression is frequently referred to as ordinary least squares (OLS) regression, and the values produced by the estimated regression equation, Ŷ_i, are called least squares estimates.
Standard error of estimate (SEE)
- Measures the degree of variability of the actual Y-values relative to the estimated Y-values from a regression equation.
- The smaller the standard error, the better the fit.
- The SEE is the standard deviation of the error terms in the regression, also referred to as the standard error of the residual or the standard error of the regression.
- Equal to the square root of the mean squared error (MSE).
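The SEE calculation can be sketched as below, assuming a simple regression with one independent variable, so that the MSE uses n - 2 degrees of freedom. The sample is made up for illustration.

```python
import math

# Made-up sample, used only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Least-squares line for the sample.
slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
         / sum((a - mean_x) ** 2 for a in x))
intercept = mean_y - slope * mean_x

# Residual = actual Y minus estimated Y.
residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
sse = sum(e ** 2 for e in residuals)

# One independent variable, so the error degrees of freedom are n - 2.
mse = sse / (n - 2)
see = math.sqrt(mse)

print(sse, mse, see)
```

For this sample the SSE is about 2.4, giving an SEE of roughly 0.89 in the units of Y.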
Coefficient of determination (R²)
- The coefficient of determination (R²) is defined as the percentage of the total variation in the dependent variable explained by the independent variable. R² = r² for a regression with one independent variable. This shortcut of squaring the correlation coefficient is not appropriate when more than one independent variable is used in the regression.
- R² by itself may not be a reliable measure of the explanatory power of the multiple regression model, because R² almost always increases as variables are added to the model, even if the marginal contribution of the new variables is not statistically significant. A relatively high R² may reflect the impact of a large set of independent variables rather than how well the set explains the dependent variable. This problem is often referred to as overestimating the regression.
- Adjusted R² is always less than or equal to R².
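The card does not state the adjustment formula; the sketch below assumes the standard one, adjusted R² = 1 - [(n - 1) / (n - k - 1)] × (1 - R²), where n is the number of observations and k the number of independent variables. The function name `adjusted_r2` and the input values are illustrative assumptions.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k independent variables."""
    return 1 - ((n - 1) / (n - k - 1)) * (1 - r2)


# Hypothetical case: adding a second variable nudges R-squared up slightly.
n = 30
one_var = adjusted_r2(0.600, n, k=1)
two_var = adjusted_r2(0.605, n, k=2)

print(one_var, two_var)
```

In this hypothetical case R² rises from 0.600 to 0.605 when the second variable is added, yet adjusted R² falls, signaling that the new variable adds little explanatory power.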
Analysis of variance (ANOVA)
A statistical procedure for analyzing the total variability of the dependent variable.
Total sum of squares (SST)
Regression sum of squares (RSS)
Sum of squared errors (SSE)
- Total sum of squares (SST) measures the total variation in the dependent variable.
- Regression sum of squares (RSS) measures the variation in the dependent variable that is explained by the independent variable.
- Sum of squared errors (SSE) measures the unexplained variation in the dependent variable.
- Total variation = explained variation + unexplained variation
- SST = RSS + SSE
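The decomposition SST = RSS + SSE can be checked numerically. The sketch below uses a made-up sample and a least-squares line with one independent variable; the variable names are illustrative.

```python
# Made-up sample, used only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Least-squares line for the sample.
slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
         / sum((a - mean_x) ** 2 for a in x))
intercept = mean_y - slope * mean_x
fitted = [intercept + slope * a for a in x]

# Total variation: actual Y around its mean.
sst = sum((b - mean_y) ** 2 for b in y)
# Explained variation: fitted Y around the mean of Y.
rss = sum((f - mean_y) ** 2 for f in fitted)
# Unexplained variation: actual Y around fitted Y.
sse = sum((b - f) ** 2 for b, f in zip(y, fitted))

r_squared = rss / sst

print(sst, rss, sse, r_squared)
```

For this sample SST is 6, split into about 3.6 explained and 2.4 unexplained, so R² is about 0.6.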
Mean regression sum of squares (MSR)
Mean squared error (MSE)
The mean regression sum of squares (MSR) and mean squared error (MSE) are simply calculated as the appropriate sum of squares divided by its degrees of freedom.
R²
The R² is the percentage of the total variation in the dependent variable explained by the independent variable.
Partial Slope Coefficients
The slope coefficients in a multiple regression.
Each slope coefficient is the estimated change in the dependent variable for a one-unit change in that independent variable, holding the other independent variables constant.
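A rough sketch of estimating partial slope coefficients with two independent variables. It assumes the coefficients are obtained from the least-squares normal equations, solved here by Gaussian elimination to keep the example dependency-free; the helpers `solve` and `ols` are hypothetical names, and the data are generated without noise from y = 1 + 2·x1 + 3·x2 so the recovered slopes are easy to check.

```python
def solve(a, b):
    """Solve the linear system a * coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - tail) / m[r][r]
    return coef


def ols(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] from the normal equations."""
    design = [[1.0] + list(row) for row in rows]  # leading 1.0 for the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Made-up data generated exactly from y = 1 + 2*x1 + 3*x2 (no noise).
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 7), (6, 5)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]

intercept, b1, b2 = ols(rows, y)
print(intercept, b1, b2)
```

The estimate `b1` (about 2) is the change in y for a one-unit change in x1 with x2 held constant, and `b2` (about 3) is the corresponding effect of x2 with x1 held constant.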
F-statistic
- An F-test assesses how well a set of independent variables, as a group, explains the variation in the dependent variable. In multiple regression, the F-statistic is used to test whether at least one independent variable in a set of independent variables explains a significant portion of the variation of the dependent variable.
- F = MSR / MSE
- Always a one-tailed test.
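The F-statistic calculation can be sketched as below, assuming the usual degrees of freedom: k for the regression sum of squares and n - k - 1 for the sum of squared errors. The function name `f_statistic` and the input values are illustrative assumptions.

```python
def f_statistic(rss, sse, n, k):
    """F = MSR / MSE for n observations and k independent variables."""
    msr = rss / k            # mean regression sum of squares
    mse = sse / (n - k - 1)  # mean squared error
    return msr / mse


# Illustrative inputs: RSS = 3.6 and SSE = 2.4 from a 5-observation,
# one-variable regression.
f_value = f_statistic(rss=3.6, sse=2.4, n=5, k=1)
print(f_value)
```

The result, about 4.5, would then be compared with the critical F-value for k and n - k - 1 degrees of freedom, rejecting the null hypothesis only if the statistic exceeds the critical value.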