Math Flashcards
Alternative hypothesis
In statistical hypothesis testing, the null hypothesis and alternative hypothesis are two mutually exclusive statements.
The alternative hypothesis (often denoted as Ha or H1) is a statement that contradicts the null hypothesis and usually asserts that the hypothesised effect exists. It represents the researcher’s hypothesis or the claim to be tested. The alternative hypothesis suggests that there is a significant effect, relationship, or difference between variables in the population, while the null hypothesis usually states that there is no effect.
Arg Max function
Arg Max (arg max): A mathematical operator that returns the input value (or set of input values) at which a given function achieves its maximum value. In other words, it finds the input that makes the function’s output the highest.
arg maxₓ f(x) = {x | f(x) = max_{x’} f(x’)}
(where x’ ranges over all possible inputs)
Common Uses:
* Optimization problems
* Machine learning algorithms
* Decision-making (finding the best solution)
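A minimal Python sketch of the definition above, restricted to a finite list of candidate inputs. The function name `arg_max` and the sample functions are illustrative assumptions.

```python
def arg_max(f, candidates):
    """Return every candidate x at which f(x) equals the maximum of f over the candidates."""
    candidates = list(candidates)
    best = max(f(x) for x in candidates)
    return [x for x in candidates if f(x) == best]

# f(x) = -(x - 2)^2 peaks at x = 2
f = lambda x: -(x - 2) ** 2
print(arg_max(f, range(-5, 6)))         # [2]

# Ties give several maximizers, which is why arg max is a set in general
print(arg_max(abs, [-3, -1, 0, 1, 3]))  # [-3, 3]
```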
Average (mean)
Average (Mean): A measure of central tendency representing the typical or central value in a dataset.
Calculation: Sum of all values divided by the total number of values.
Mathematical Formula:
x̄ = (1/n) Σ_{i=1}^n x_i
(Where x̄ is the average, n is the number of data points, and x_i represents each value)
Uses:
* Descriptive statistics
* Data analysis
* Comparing datasets or groups
Base rate
Refers to the underlying probability of an event occurring in a population, regardless of other factors. It serves as a benchmark for assessing the likelihood of an event. Understanding the base rate is crucial for making accurate predictions and evaluating the performance of predictive models. For example, in medical diagnosis, the base rate might represent the prevalence of a disease within a certain population, providing valuable context for interpreting diagnostic test results.
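To see why the base rate matters in the medical example, here is a small sketch using Bayes' rule. The numbers (1% prevalence, 90% sensitivity, 5% false positive rate) are made up for illustration, and the function name is an assumption.

```python
def posterior_given_positive(base_rate, sensitivity, false_positive_rate):
    """P(disease | positive test) via Bayes' rule."""
    true_pos = sensitivity * base_rate                  # P(positive and disease)
    false_pos = false_positive_rate * (1 - base_rate)   # P(positive and no disease)
    return true_pos / (true_pos + false_pos)

# Hypothetical test: the disease affects 1% of the population
p = posterior_given_positive(base_rate=0.01, sensitivity=0.90, false_positive_rate=0.05)
print(round(p, 3))  # 0.154
```

Even with a fairly accurate test, a positive result here implies only about a 15% chance of disease, because the low base rate means most positives come from the much larger healthy group.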
Basis
In linear algebra, a basis is a set of linearly independent vectors that span a vector space, meaning any vector in the space can be expressed as a unique linear combination of basis vectors. Basis vectors form the building blocks for representing and understanding vector spaces, facilitating operations such as vector addition, scalar multiplication, and linear transformations. For example, in Euclidean space, the standard basis consists of orthogonal unit vectors aligned with the coordinate axes (e.g., {(1, 0, 0), (0, 1, 0), (0, 0, 1)} for 3-dimensional space), enabling the representation of any point in the space using coordinates along these axes.
Bellman Equations
Bellman Equations are a set of recursive equations used in dynamic programming and reinforcement learning to express the value of a decision problem in terms of the values of its subproblems. They provide a way to decompose a complex decision-making process into smaller, more manageable steps.
Application: Bellman Equations are fundamental in reinforcement learning algorithms such as value iteration and Q-learning, where they are used to compute the optimal value function or policy for a given environment.
Example: In a grid world environment where an agent must navigate to a goal while avoiding obstacles, Bellman Equations express the value of each state as the immediate reward plus the discounted value of the subsequent state reached by taking an optimal action.
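The grid world idea can be sketched with value iteration on a tiny one-dimensional corridor. Everything here is an assumption chosen for illustration: five states, a goal at the right end, deterministic left/right moves, a reward of 1 for entering the goal, and a discount factor of 0.9.

```python
GAMMA = 0.9      # discount factor (assumed)
N_STATES = 5     # states 0..4 in a corridor (assumed)
GOAL = 4         # terminal goal state (assumed)

def step(state, action):
    """Deterministic move; action is -1 (left) or +1 (right). Walls clamp the position."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward

def value_iteration(sweeps=50):
    """Repeatedly apply the Bellman optimality update V(s) = max_a [r + gamma * V(s')]."""
    values = [0.0] * N_STATES
    for _ in range(sweeps):
        new_values = values[:]
        for s in range(N_STATES):
            if s == GOAL:
                continue  # terminal state keeps value 0
            new_values[s] = max(
                reward + GAMMA * values[nxt]
                for nxt, reward in (step(s, a) for a in (-1, +1))
            )
        values = new_values
    return values

values = value_iteration()
print([round(v, 3) for v in values])  # [0.729, 0.81, 0.9, 1.0, 0.0]
```

Each state's value is the immediate reward plus the discounted value of the best next state, so values shrink by a factor of 0.9 for every extra step away from the goal.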
Bernoulli Distribution
A discrete probability distribution that models the outcomes of a binary random experiment. It is characterized by a single parameter, p, representing the probability of success (usually denoted by 1) in a single trial and the probability of failure (denoted by 0) as 1 - p. The distribution is commonly used to model simple events with two possible outcomes, such as success or failure, heads or tails, and yes or no.
Binomial coefficient formula
The formula calculates the number of ways you can choose a smaller group (k) out of a larger group (n) when the order you pick them in doesn’t matter.
Formula: (n k) = n! / (k! (n-k)!)
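A short sketch that applies the formula directly and compares it with Python's built-in `math.comb`. The helper name `binomial_coefficient` is an illustrative assumption.

```python
from math import comb, factorial

def binomial_coefficient(n, k):
    """n! / (k! (n-k)!), computed with exact integer arithmetic."""
    return factorial(n) // (factorial(k) * factorial(n - k))

# Ways to choose 2 items out of 5 when order does not matter
print(binomial_coefficient(5, 2))  # 10
print(comb(5, 2))                  # 10
```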
Binomial distribution
Binomial Distribution: A discrete probability distribution describing the number of successes in a fixed number of independent trials, each with the same success probability (p). It is used to model binary outcomes (success/failure) in various fields.
Common Notation: B(n, p)
* n: Number of trials
* p: Probability of success on each trial
The probability mass function (PMF) of the binomial distribution gives the probability of observing exactly k successes in n trials:
P(X = k) = (n k) pᵏ (1 - p)ⁿ⁻ᵏ
(where ‘k’ is the number of successes and (n k) is the binomial coefficient)
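The PMF above translates almost literally into code. This is a minimal sketch; the function name `binomial_pmf` and the example parameters are assumptions.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probability of exactly 2 heads in 4 fair coin flips
print(binomial_pmf(2, 4, 0.5))  # 0.375

# The probabilities over k = 0..n should add up to 1 (up to rounding)
total = sum(binomial_pmf(k, 10, 0.3) for k in range(11))
print(abs(total - 1.0) < 1e-12)  # True
```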
Block Matrices
Matrices composed of smaller submatrices arranged in a rectangular array.
Capital Sigma Notation
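The summation over a collection X = {x1, x2, . . . , xn-1, xn} or over the attributes of a vector x = [x(1), x(2), . . . , x(m-1), x(m)] is written with a capital sigma: Σ_{i=1}^n x_i = x1 + x2 + . . . + xn-1 + xn, and Σ_{j=1}^m x(j) = x(1) + x(2) + . . . + x(m-1) + x(m).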
Cartesian coordinate system
The Cartesian coordinate system, named after the French mathematician René Descartes, provides a geometric framework for specifying the positions of points in a plane or space using ordered pairs or triplets of numbers, respectively. In a two-dimensional Cartesian coordinate system, points are located with reference to two perpendicular axes, usually labeled x and y, intersecting at a point called the origin. The coordinates of a point represent its distances from the axes along the respective directions. The Cartesian coordinate system serves as the foundation for analytic geometry, facilitating the study of geometric shapes, equations, and transformations in mathematical analysis and physics.
Cauchy Distribution
A probability distribution that arises frequently in various areas of mathematics and physics. It is characterized by its symmetric bell-shaped curve and heavy tails, which indicate that extreme values are more likely compared to other symmetric distributions like the normal distribution. The Cauchy distribution has no defined mean or variance due to its heavy tails, making it challenging to work with in statistical analysis. However, it has applications in fields such as physics, finance, and signal processing.
Central Limit Theorem (CLT)
A key concept in statistics that states that the distribution of sample means from any population with finite variance approaches a normal distribution as the sample size increases, regardless of the shape of the population distribution. This theorem is crucial in inferential statistics as it allows for the estimation of population parameters and the construction of confidence intervals and hypothesis tests, even when the population distribution is unknown or non-normal. The Central Limit Theorem is widely applied in various fields, including finance, biology, and engineering, where statistical inference is essential.
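A small simulation can make the theorem concrete. This sketch draws repeated samples from a strongly skewed population (an exponential distribution with mean 1 and standard deviation 1, an assumed choice) and looks at how the sample means behave. The sample size, repetition count, and seed are arbitrary.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n draws from an exponential distribution with rate 1."""
    return statistics.fmean([random.expovariate(1.0) for _ in range(n)])

n = 50
means = [sample_mean(n) for _ in range(2000)]

# The sample means cluster around the population mean (1.0) ...
print(statistics.fmean(means))
# ... with spread close to sigma / sqrt(n) = 1 / sqrt(50), about 0.141
print(statistics.stdev(means))
```

A histogram of `means` would look roughly bell-shaped even though the individual draws are far from normal.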
Central Tendencies
Central tendencies, also known as measures of central tendency, are summary statistics that describe the central location or typical value of a dataset. They provide insights into the distribution of data and help summarize the main features of the dataset. The three main measures of central tendency are the:
- mean: the arithmetic average of all values in the dataset; sensitive to outliers
- median: the middle value of the dataset when the values are arranged in ascending or descending order; robust to outliers
- mode: the most frequently occurring value(s) in the dataset; applicable to both numerical and categorical data
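The three measures can be compared on a small made-up dataset using Python's standard `statistics` module. The data values are assumptions chosen so that one outlier (100) is present.

```python
import statistics

data = [2, 3, 3, 5, 7, 100]   # hypothetical data with one outlier

print(statistics.mean(data))    # 20   (pulled upward by the outlier)
print(statistics.median(data))  # 4.0  (barely affected by the outlier)
print(statistics.mode(data))    # 3    (most frequent value)

# The mode also works for categorical data; multimode returns all ties
print(statistics.multimode(["cat", "dog", "cat", "dog", "bird"]))  # ['cat', 'dog']
```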
Chain rule
Chain Rule (Calculus): A fundamental rule for finding the derivative of composite functions (functions made up of other functions).
States: The derivative of the composite function f(g(x)) is equal to the derivative of the outer function f evaluated at the inner function g(x), multiplied by the derivative of the inner function g. In mathematical notation, this can be expressed as:
d/dx [f(g(x))] = f’(g(x)) * g’(x)
where f’(g(x)) represents the derivative of the outer function f evaluated at g(x), and g’(x) represents the derivative of the inner function g.
Example (a single neuron with two inputs), using the fact that the derivative of a sum is the sum of the derivatives:
u = w1x1 + w2x2 + b
y = f(u)   (f being the activation function)
dy/dx1 = (dy/du) * (du/dx1) = f’(u) * w1
dy/dw1 = (dy/du) * (du/dw1) = f’(u) * x1
dy/db = (dy/du) * (du/db) = f’(u) * 1
Each coefficient (weight) or bias has its own “chain” within the overall calculation. The derivative of the activation function, f’(u), is a common factor in every chain: it determines how strongly a change in the intermediate value u affects the output y. This is why the contribution of each individual weight and bias to the error can be calculated during backpropagation.
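The chain-rule gradients for the neuron can be checked numerically. This sketch assumes a sigmoid activation for f (the card does not name one) and arbitrary example values for the weights, bias, and inputs.

```python
import math

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def sigmoid_prime(u):
    s = sigmoid(u)
    return s * (1.0 - s)

def neuron(w1, w2, b, x1, x2):
    """y = f(u) with u = w1*x1 + w2*x2 + b and f = sigmoid (assumed)."""
    return sigmoid(w1 * x1 + w2 * x2 + b)

# Arbitrary example values
w1, w2, b, x1, x2 = 0.5, -0.3, 0.1, 2.0, 1.0
u = w1 * x1 + w2 * x2 + b

# Analytic gradients from the chain rule: f'(u) times the inner derivative
dy_dw1 = sigmoid_prime(u) * x1
dy_db = sigmoid_prime(u) * 1.0

# Numerical check with central differences
h = 1e-6
num_dw1 = (neuron(w1 + h, w2, b, x1, x2) - neuron(w1 - h, w2, b, x1, x2)) / (2 * h)
num_db = (neuron(w1, w2, b + h, x1, x2) - neuron(w1, w2, b - h, x1, x2)) / (2 * h)

print(abs(dy_dw1 - num_dw1) < 1e-8, abs(dy_db - num_db) < 1e-8)
```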
Codomain
In mathematics, the codomain of a function is the set of all possible output values or elements that the function can produce. It is the set of values to which the function maps its domain elements. The codomain represents the entire range of possible outputs of the function, regardless of whether all elements in the codomain are actually attained by the function.
The codomain is distinct from the range, which refers to the set of actual output values produced by the function when evaluated on its domain. In function notation, the codomain is typically denoted as the set Y in the function f: X → Y, where X is the domain of the function and Y is the codomain.
The codomain provides information about the possible outputs of a function and helps define the scope and range of the function’s behavior.
Combinatorics
A branch of mathematics concerned with counting, arranging, and analyzing the combinations and permutations of finite sets of objects. In machine learning and artificial intelligence, combinatorics plays a crucial role in feature engineering, model parameterization, and optimization algorithms.
Concave function
The opposite of a convex function: f is concave if -f is convex. Whenever you connect two points on the graph of a concave function with a line segment, the segment lies on or below the graph. Concave functions matter in convex optimization because maximizing a concave function is equivalent to minimizing a convex one, so they can serve as objectives to maximize (such as the log-likelihood of logistic regression) or appear in constraints. In machine learning and artificial intelligence, they arise when training models by maximizing a likelihood or, after a sign flip, by minimizing a loss function with methods such as gradient descent. Understanding concave functions is useful for designing efficient optimization algorithms and analyzing the convergence properties of machine learning models.
Conditional distribution
The probability distribution of a random variable given the value or values of another variable. It describes the likelihood of observing certain outcomes of one variable given specific conditions on another variable. Conditional distributions are fundamental for modeling dependencies and relationships between variables in probabilistic models, Bayesian inference, and predictive modeling tasks.
Confidence Intervals
Statistical intervals used to estimate the range of plausible values for a population parameter, such as the mean or proportion, based on sample data. They provide a measure of uncertainty around the point estimate and quantify the precision of estimation. Confidence intervals are essential for hypothesis testing, parameter estimation, and assessing the reliability of statistical inference in machine learning and data analysis.
Continuous
Continuous variables are those that can take any real value within a certain range or interval. They are characterized by an infinite number of possible values and are typically represented by real numbers. Continuous variables are prevalent in data analysis, modeling, and predictive tasks, such as regression analysis, time series forecasting, and density estimation.
Continuous random variable
In contrast to discrete random variables, continuous random variables can take on an infinite number of possible values within a specified range. These values are typically associated with measurements or quantities that can take any value within a certain interval. Continuous random variables are described by probability density functions (PDFs), which indicate the likelihood of observing a value within a given range. Examples of continuous random variables include height, weight, temperature, and time.
Continuous variable
A type of quantitative variable that can take on an infinite number of values within a specified range or interval. Continuous variables are characterized by having an uncountable and infinite number of possible values, including both whole numbers and fractional values. They can take on any value within their range, and the concept of “gaps” between values is not meaningful. Continuous variables are typically represented by real numbers and are subject to arithmetic operations such as addition, subtraction, multiplication, and division.
Examples of continuous variables include measurements such as height, weight, temperature, time, and distance.
Convex
In mathematics, a set is said to be convex if every line segment connecting two points within the set lies entirely within the set itself: for any two points x and y in the set, the line segment connecting x and y is also contained in the set. Similarly, a function is convex if its epigraph (the region lying on or above the graph of the function) is a convex set. Equivalently, f is convex if f(t x + (1 - t) y) ≤ t f(x) + (1 - t) f(y) for all x and y in its domain and all t in [0, 1].
Convexity is a fundamental concept in optimization, geometry, and mathematical analysis, with many important properties and applications. Convex sets and functions have desirable properties such as uniqueness of solutions, global optimality, and efficient optimization algorithms. Convexity plays a crucial role in convex optimization problems, machine learning algorithms, economics, game theory, and signal processing, among other fields.
In machine learning, convexity simplifies optimization by ensuring well-behaved objective functions, allowing efficient algorithms like gradient descent (with a suitable step size) to find global minima. Convexity guarantees that any local minimum is also a global minimum, providing confidence in the optimality of solutions. Convex problems are also far less sensitive to initialization, since there are no suboptimal local minima to get stuck in. Additionally, common regularization penalties such as L1 and L2 are themselves convex, so adding them to a convex loss keeps the problem convex.
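For functions of one variable, convexity is equivalent to the inequality f(t x + (1 - t) y) ≤ t f(x) + (1 - t) f(y) for all t in [0, 1]. The sketch below tests that inequality on a handful of sample points. It can disprove convexity but never prove it, and the function name and tolerance are assumptions.

```python
def is_convex_on_samples(f, points, steps=11):
    """Check the convexity inequality on all pairs of sample points.

    Returns False as soon as a violation is found; True only means
    that no violation was found on the sampled points.
    """
    for x in points:
        for y in points:
            for i in range(steps):
                t = i / (steps - 1)
                lhs = f(t * x + (1 - t) * y)
                rhs = t * f(x) + (1 - t) * f(y)
                if lhs > rhs + 1e-12:  # small tolerance for rounding
                    return False
    return True

points = [-3, -2, -1, 0, 1, 2, 3]
print(is_convex_on_samples(lambda x: x ** 2, points))     # True
print(is_convex_on_samples(lambda x: -(x ** 2), points))  # False (this one is concave)
```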
Correlation matrix
A square matrix that summarizes the correlation coefficients between pairs of variables in a dataset. Each entry in the matrix represents the correlation between two variables, indicating the strength and direction of their linear relationship. Correlation matrices are commonly used in exploratory data analysis and feature selection to identify patterns, dependencies, and multicollinearity among variables in machine learning and statistical modeling.
Covariance
A statistical measure that quantifies the degree of joint variability between two random variables. It indicates the tendency of the variables to vary together, either positively or negatively, from their respective means. Positive covariance indicates that the variables tend to increase or decrease together, while negative covariance indicates that one variable tends to increase as the other decreases. Covariance is a fundamental concept in statistics, machine learning, and finance, where it serves as a measure of linear relationship between variables.
Covariance matrix
A square matrix that summarizes the covariances between pairs of variables in a dataset. It is a symmetric matrix where each entry represents the covariance between two variables. Covariance matrices are essential in multivariate statistics and machine learning, where they characterize the relationships and variability among multiple variables simultaneously. In machine learning, covariance matrices are used in techniques such as Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and Gaussian distribution modeling for dimensionality reduction, feature selection, and statistical inference.
Covariance of data
- Measures: The direction and degree to which two random variables change together.
- Range: Can range from negative infinity to positive infinity.
- Units: The units reflect the product of the units of the two variables being measured. This makes it harder to interpret directly.
- Impact of scaling: If you change the scale of one or both variables (e.g., switch from inches to centimeters), the covariance value will also change.
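The scaling point in the list above can be checked directly. This sketch uses the sample covariance (dividing by n - 1, an assumed convention) and made-up height and weight data.

```python
def covariance(xs, ys):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical measurements
heights_in = [60.0, 62.0, 65.0, 70.0]       # inches
weights_lb = [110.0, 120.0, 140.0, 170.0]   # pounds

cov_in = covariance(heights_in, weights_lb)
print(cov_in)  # 115.0 (units: inch * pound)

# Switching heights to centimeters rescales the covariance by the same factor
heights_cm = [h * 2.54 for h in heights_in]
cov_cm = covariance(heights_cm, weights_lb)
print(abs(cov_cm - 2.54 * cov_in) < 1e-9)  # True
```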
Critical value
A threshold or reference point used in statistical hypothesis testing to determine the significance of test results. It represents the boundary beyond which the null hypothesis is rejected or the test statistic is considered extreme enough to warrant further investigation. Critical values are derived from probability distributions, such as the standard normal distribution or t-distribution, and correspond to specific levels of significance or confidence levels. Critical values play a crucial role in hypothesis testing, confidence intervals, and decision-making in statistical analysis.
Cumulative distribution function (CDF)
A probability distribution function that represents the probability that a random variable takes on a value less than or equal to a given point. In other words, it provides the cumulative probability distribution of a random variable. The CDF is often denoted by F(x) and is used to analyze and understand the probability distribution of continuous and discrete random variables. It is a fundamental concept in statistics and probability theory, commonly used in hypothesis testing, estimation, and modeling.
Density estimation
A statistical technique used to estimate the probability density function (PDF) of a random variable based on observed data. It involves estimating the underlying distribution of the data points in a continuous domain. Density estimation methods include parametric approaches, such as fitting a normal distribution or a mixture of normal distributions, and non-parametric approaches, such as histograms, kernel density estimation, and nearest neighbor methods. Density estimation is commonly used in exploratory data analysis, modeling univariate and multivariate distributions, and generating synthetic data for simulation and modeling.
It helps in understanding probability distributions by visualizing the overall shape and spread of a distribution from a data sample, identifying modes (peaks) in the data that suggest possible clusters, and detecting outliers (unusual points residing in very low probability areas).
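As one concrete non-parametric method, here is a minimal sketch of kernel density estimation with a Gaussian kernel. The sample values and the bandwidth of 0.3 are assumptions; in practice the bandwidth has to be chosen with care.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a density function that averages Gaussian bumps centered on the samples."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density

# Hypothetical data with two clusters, near 1 and near 5
samples = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9]
density = gaussian_kde(samples, bandwidth=0.3)

print(density(1.05) > density(3.0))  # True: a mode near 1, a low-density gap near 3
print(density(5.0) > density(3.0))   # True: a second mode near 5
```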
Dependent (variable)
The relationship between two or more random variables where the value of one variable influences or is influenced by the value of another variable. Dependent variables are interconnected and exhibit some form of correlation, association, or causality. Understanding dependent relationships is crucial for modeling and analyzing complex systems, conducting hypothesis testing, and making predictions in various fields such as finance, economics, and social sciences.
Derivative
The slope, often referred to as the derivative in calculus, is a fundamental concept that measures how a function changes as its input changes. Geometrically, the slope represents the steepness of the tangent line to the function’s graph at a given point. A positive slope indicates that the function is increasing, while a negative slope indicates that the function is decreasing. A slope of zero indicates that the function is neither increasing nor decreasing at that point.
The derivative represents the rate of change or the slope of the function at a particular point. It measures how the function value changes with respect to a small change in the independent variable. The derivative is a fundamental concept in calculus and mathematical analysis, used to analyze the behavior of functions, optimize functions, and solve differential equations. In machine learning and optimization, derivatives are essential for gradient-based optimization algorithms such as gradient descent.
The derivative of a function f(x) with respect to its input variable x is denoted f’(x) or df/dx. It is defined as the limit of the difference quotient as the change in x approaches zero:
f’(x) = lim_{h→0} (f(x + h) - f(x)) / h
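The limit can be watched numerically by shrinking h. This sketch uses f(x) = x^2 at x = 3 as an assumed example, where the exact derivative is 6.

```python
def difference_quotient(f, x, h):
    """(f(x + h) - f(x)) / h, the slope of the secant line through x and x + h."""
    return (f(x + h) - f(x)) / h

f = lambda x: x ** 2

# As h shrinks, the quotient approaches the derivative f'(3) = 6
for h in (1.0, 0.1, 0.01, 0.001):
    print(h, difference_quotient(f, 3.0, h))
```

For this function the quotient equals 6 + h exactly (up to rounding), so the printed values step down toward 6.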
Derivative of power functions: Constant
f(x) = c
f’(x) = 0
Derivative of power functions: Cubic
f(x) = x^3
f’(x) = 3x^2
Derivative of power functions: Exponential
f(x) = e^x
f’(x) = e^x
Derivative of power functions: General formula (power rule)
f(x) = x^n
f’(x) = n x^(n-1)
Derivative of power functions: 1/x
f(x) = 1/x = x^-1
f’(x) = -x^-2
Derivative of power functions: Inverse function
If g is the inverse of f and y = f(x), then:
g’(y) = 1 / f’(x)   (provided f’(x) ≠ 0)
Derivative of power functions: Line
f(x) = ax + b
f’(x) = a
Derivative of power functions: Quadratic
f(x) = x^2
f’(x) = 2x
Derivative of power functions: Trigonometric function
- Sine: The derivative of the function sin(x) is cos(x).
- Cosine: The derivative of the function cos(x) is -sin(x).
- Tangent: The derivative of the function tan(x) is sec^2(x), or 1/cos^2(x).
Descriptive Statistics
Statistical techniques employed to summarize and describe the main features of a dataset. They encompass measures such as the mean, median, mode, standard deviation, range, skewness, and kurtosis. Descriptive statistics offer a comprehensive overview of dataset characteristics, aiding in interpretation, comparison, and decision-making across various fields such as economics, finance, and social sciences. They provide valuable insights into the distribution, variability, and shape of data, facilitating data-driven decision-making and hypothesis testing.
Determinant
In mathematics, the determinant is a scalar value that is a function of the entries of a square matrix. The determinant of a matrix A is commonly denoted det(A), det A, or |A|. Its value characterizes some properties of the matrix and the linear map represented by the matrix. In particular, the determinant is nonzero if and only if the matrix is invertible and the linear map represented by the matrix is an isomorphism. The determinant of a product of matrices is the product of their determinants. The determinant is used in various mathematical operations and theorems, including solving systems of linear equations, computing eigenvalues and eigenvectors, and determining the orientation and volume of geometric shapes.
Diagonal Matrix
A square matrix in which every entry off the main diagonal is zero; only the main diagonal may contain non-zero elements.
Differentiation
A fundamental operation in calculus that involves calculating the rate of change or slope of a function at a given point. It is the process of finding the derivative of a function with respect to one or more variables. The derivative represents how the function’s output changes as its input varies and provides valuable insights into the behavior of functions, including identifying critical points, extrema, and inflection points.
Discrete Random variable
Random variable that can take on a countable number of distinct values. These values are typically integers and are often the result of counting or enumerating outcomes in a sample space. Discrete random variables are characterized by a probability mass function (PMF), which assigns probabilities to each possible value the variable can take. Examples of discrete random variables include the number of heads obtained in a series of coin flips or the number of defects in a batch of products.
Discrete variable
A type of variable that can only take on distinct, separate values from a finite or countable set. It is characterized by having gaps or jumps between consecutive values, with no intermediate values allowed. Discrete variables are often categorical or qualitative in nature, representing distinct categories, classes, or labels. Examples of discrete variables include the number of students in a class, the outcomes of a dice roll, the types of animals in a zoo, and the categories of products in a store. They are used to represent countable phenomena and make categorical distinctions.
Disjoint (mutually exclusive)
Two events or sets are said to be disjoint or mutually exclusive if they have no elements in common, i.e., they cannot occur simultaneously. If events A and B are disjoint, then P(A ∩ B) = 0 and P(A ∪ B) = P(A) + P(B). Disjoint events are not independent (unless one of them has probability zero): if one event occurs, the other cannot, so the occurrence of one event changes the probability of the other to zero.
Divide by coefficient (matrix)
Dividing each term of an expression or equation by a constant factor or coefficient. It is a common operation used to simplify algebraic expressions, solve equations, or manipulate mathematical formulas. Dividing by a coefficient scales or rescales the expression by the reciprocal of the coefficient, effectively adjusting the magnitude or scale of the terms. Dividing by a coefficient is a fundamental operation in algebra, calculus, and linear algebra, used in various mathematical and scientific contexts.
Domain
The set of all possible input values or independent variables for which the function is defined. It represents the permissible values that the input variable can take while ensuring that the function produces meaningful output. The domain specifies the range of valid inputs that the function can process and is essential for determining the function’s behavior, range, and properties. The domain of a function is typically described using interval notation, set notation, or inequalities, depending on the nature of the function and its constraints. Understanding the domain of a function is crucial for analyzing its behavior, solving equations, and evaluating its applicability to real-world problems.
Dot product
The dot product, also known as the scalar product or inner product, is an algebraic operation that takes two equal-length sequences of numbers (usually vectors) and returns a single number. It is calculated by multiplying corresponding components of the vectors and then summing the products. The dot product is used to measure the similarity or alignment between vectors, compute projections, and calculate the work done by a force acting along a direction. In machine learning and linear algebra, the dot product plays a crucial role in vector spaces, optimization algorithms, and neural network operations.
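A minimal sketch of the calculation, plus cosine similarity as one common way to turn the dot product into an alignment score. The function names and vectors are illustrative assumptions.

```python
import math

def dot(u, v):
    """Multiply corresponding components and sum the products."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))

def cosine_similarity(u, v):
    """Dot product divided by the product of the vector lengths."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

print(dot([1, 2, 3], [4, 5, 6]))          # 32  (1*4 + 2*5 + 3*6)
print(dot([1, 0], [0, 1]))                # 0   (perpendicular vectors)
print(cosine_similarity([1, 0], [1, 0]))  # 1.0 (same direction)
```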
Eigenbases
A basis is a minimal set of vectors that spans a space (the number of vectors equals the dimensionality of the space). An eigenbasis is a basis for a vector space consisting entirely of eigenvectors of a linear operator or matrix. Each eigenvector in the eigenbasis is associated with an eigenvalue (several eigenvectors may share the same eigenvalue), and together they form a complete set of linearly independent vectors that diagonalizes the matrix. Not every matrix has an eigenbasis; those that do are called diagonalizable. Eigenbases play a fundamental role in diagonalization, spectral decomposition, and solving systems of linear equations, providing a convenient representation for analyzing and understanding linear transformations.
In plain language: imagine stretching or squashing a shape on a grid (a linear transformation). An eigenbasis is a special set of arrows (vectors) pointing in different directions on the grid. These arrows have a special property: when the transformation happens, they stay on the same line and only get longer or shorter (or flip to point the opposite way). An eigenbasis helps us understand how the transformation affects the grid by showing us which directions stay the same and how much they stretch or shrink.
Eigenvectors
Special vectors associated with linear transformations or matrices that retain their direction when the transformation is applied. In linear algebra, an eigenvector of a square matrix A is a nonzero vector v such that Av = λv, where λ is a scalar known as the eigenvalue corresponding to v. Eigenvectors represent the directions along which linear transformations stretch or compress space, and eigenvalues represent the scale factors by which these transformations occur. Eigenvectors are used in various applications such as principal component analysis (PCA), spectral analysis, and solving systems of differential equations, providing insights into the behavior and properties of linear systems.
In plain language: picture eigenvectors as arrows in the space. When we apply a transformation to the space, these arrows might change in length, but they stay on the same line. They’re like the backbone of the transformation, showing us the main directions that don’t get twisted or turned. Each arrow has a special number associated with it called an eigenvalue, which tells us how much the arrow stretches or shrinks when the transformation happens.
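The defining equation Av = λv can be checked by hand on a small matrix. The 2x2 matrix below is an assumed example whose eigenvectors (1, 1) and (1, -1) have eigenvalues 3 and 1.

```python
def mat_vec(A, v):
    """Multiply matrix A (a list of rows) by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A = [[2, 1],
     [1, 2]]

# For an eigenvector, A v is just v scaled by its eigenvalue
for v, lam in (([1, 1], 3), ([1, -1], 1)):
    print(mat_vec(A, v), [lam * x for x in v])
# [3, 3] [3, 3]
# [1, -1] [1, -1]

# A vector that is not an eigenvector changes direction
print(mat_vec(A, [1, 0]))  # [2, 1], not a multiple of [1, 0]
```

Because (1, 1) and (1, -1) are linearly independent, together they form an eigenbasis of the plane for this matrix.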
Elimination method
Systematic approach used to solve systems of linear equations by eliminating variables one by one until a solution is found. It involves manipulating equations to cancel out variables or reduce the system to simpler equations with fewer variables. The elimination method is commonly used in algebra and linear algebra to solve systems of equations with multiple unknowns, providing a step-by-step procedure to determine the values of the variables that satisfy all the equations simultaneously.
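One common form of the elimination method is Gaussian elimination followed by back substitution. The sketch below is a small illustrative version for square systems, with row swaps (partial pivoting) added for numerical safety. The function name and the example system are assumptions.

```python
def solve_by_elimination(A, b):
    """Solve A x = b for a square system by eliminating variables column by column."""
    n = len(A)
    # Work on an augmented copy [A | b] so the inputs are left untouched
    M = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(A, b)]

    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot row
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("system has no unique solution")
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate this variable from every row below the pivot row
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]

    # Back substitution: solve for the last variable first
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - known) / M[i][i]
    return x

# 2x + y = 5 and x - y = 1 have the solution x = 2, y = 1
print(solve_by_elimination([[2, 1], [1, -1]], [5, 1]))  # [2.0, 1.0]
```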
Euclidean distance
Measures the straight-line distance between two points in Euclidean space (follows from the Pythagorean theorem).
Formula (2D space):
d = sqrt((x2 - x1)^2 + (y2 - y1)^2)
- (x1, y1) and (x2, y2) are the coordinates of the two points
- d is the Euclidean distance
Generalizes to higher-dimensional spaces, where it measures the straight-line distance between points in n-dimensional space.
Applications: Pattern recognition, Clustering, Regression analysis, Nearest neighbor algorithms
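The 2D formula and its n-dimensional generalization in a short sketch. `euclidean_distance` is an illustrative helper; `math.dist` is the standard-library equivalent (Python 3.8+).

```python
import math

def euclidean_distance(p, q):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# 2D: the classic 3-4-5 right triangle
print(euclidean_distance((0, 0), (3, 4)))  # 5.0

# The same idea in 3D, using the standard library
print(math.dist((1, 2, 3), (4, 6, 3)))     # 5.0
```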
Euler’s number
Euler’s number, denoted by the letter ‘e’, is a mathematical constant approximately equal to 2.71828… It’s an irrational number, meaning it has an infinite, non-repeating decimal expansion. It’s the base of the natural logarithm (ln). Euler’s number is deeply connected to processes that exhibit exponential change, such as compound interest or radioactive decay. It elegantly represents proportional growth and change. The value of Euler’s number arises naturally in various mathematical contexts, particularly in calculus, number theory, and complex analysis.
The function f(x) = e^x is very special. Up to multiplication by a constant, it’s the only function whose derivative (rate of change) is equal to itself: the rate of change at any point on the curve is exactly equal to the value of the function at that point. This property makes it incredibly useful for modelling growth and decay. (For example, in an idealized bacteria population that grows at a rate equal to its current size per hour: if the population is currently 100, it’s growing at a rate of 100 bacteria per hour. If it’s 500, it’s growing at a rate of 500 bacteria per hour.)
Overall, Euler’s number is a fundamental constant in mathematics with wide-ranging applications across different fields. Its importance lies in its connection to exponential growth, calculus, complex analysis, and other areas of mathematics, making it a cornerstone of mathematical theory and practice. ‘e’ plays a fundamental role in calculus, particularly in solving differential equations and finding integrals. Many phenomena in the world, from population growth to radioactive decay, can be modeled or approximated using functions involving ‘e’. ‘e’ is essential in compound interest calculations used in financial models.
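The compound interest connection can be made concrete: one unit invested at 100% annual interest, compounded n times a year, is worth (1 + 1/n)^n after one year, and this approaches e as n grows. The function name `compound` is an illustrative assumption.

```python
import math

def compound(n):
    """Value of 1 unit after one period at 100% interest, compounded n times."""
    return (1 + 1 / n) ** n

# More frequent compounding pushes the result toward e
for n in (1, 10, 1000, 1_000_000):
    print(n, compound(n))

print(math.e)  # the limiting value, approximately 2.71828
```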
Event
A set of one or more possible outcomes of a random experiment. It represents a specific situation or result that may happen, such as rolling a particular number on a die, rolling an even number, or drawing a specific card from a deck. Events are fundamental concepts in probability theory and are used to define probability distributions, calculate probabilities, and analyze uncertainty in various domains.
Expectation (Mean)
The expectation of a random variable, often referred to as its mean or expected value, is a measure of the central tendency of its distribution. It represents the average value that the variable would take over a large number of independent repetitions of the random experiment. The expectation is calculated as the weighted sum of all possible values of the random variable, where each value is weighted by its corresponding probability of occurrence. The expectation is a fundamental concept in probability theory and is used to characterize the properties of random variables, estimate population parameters, and make predictions.
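For a discrete random variable, the weighted sum is easy to compute exactly. This sketch uses a fair six-sided die as an assumed example and `fractions.Fraction` to avoid rounding.

```python
from fractions import Fraction

def expectation(distribution):
    """Weighted sum of values, each weighted by its probability.

    `distribution` maps each possible value to its probability.
    """
    return sum(value * prob for value, prob in distribution.items())

# Fair six-sided die: each face has probability 1/6
die = {face: Fraction(1, 6) for face in range(1, 7)}
print(expectation(die))  # 7/2
```

The result, 3.5, is not a value the die can actually show; the expectation describes the long-run average, not a typical single outcome.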
Exponential
The essence of an exponential relationship is that a quantity grows or shrinks by being multiplied by the same constant factor repeatedly.
An exponential function has the general form f(x) = a^x, where:
- ‘a’ is the base (the constant factor being multiplied, with a > 0 and a ≠ 1)
- ‘x’ is the exponent (for whole-number x, the number of times the base is multiplied together)
An exponential function is a mathematical function characterized by a constant base raised to the power of a variable exponent. The function f(x) = e^x, where e is Euler’s number (approximately 2.71828), is the most common example. Exponential functions exhibit rapid growth when the base is greater than 1 and decay when the base is between 0 and 1 (equivalently, e^(kx) grows when k is positive and decays when k is negative). The exponential distribution is a related but distinct concept: it describes the waiting time between independent events that occur at a constant average rate, such as the time until a radioactive atom decays. Exponential functions and distributions are widely used in mathematics, statistics, and science to model various natural phenomena and processes.
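To see growth and decay side by side, the sketch below tabulates a^x for a base above 1 and a base below 1. The bases 2 and 0.5 are arbitrary choices for illustration.

```python
def exponential(a, x):
    """Evaluate f(x) = a^x for a positive base a."""
    return a ** x

growth = [exponential(2, x) for x in range(5)]
decay = [exponential(0.5, x) for x in range(5)]

print(growth)  # [1, 2, 4, 8, 16]
print(decay)   # [1.0, 0.5, 0.25, 0.125, 0.0625]

# Each step multiplies the previous value by the same factor (the base).
ratios = [growth[i + 1] / growth[i] for i in range(4)]
print(ratios)  # [2.0, 2.0, 2.0, 2.0]
```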
Functions
A function is a relation that associates each element x of a set X, the domain of the function, to a single element y of another set Y, the codomain of the function. A function usually has a name. If the function is called f, this relation is denoted y = f(x) (read f of x), the element x is the argument or input of the function, and y is the value of the function or the output. The symbol that is used for representing the input is the variable of the function (we often say that f is a function of the variable x).
Geometric (your first success will be on the n-th try)
Refers to a probability distribution over the number of the trial on which the first success occurs, where the probabilities follow a geometric progression. The probability that the first success comes on the first try is p, on the second try is p(1-p), on the third try is p(1-p)^2, and in general on the n-th try is p(1-p)^(n-1), so each later trial is less likely to be the one with the first success. The geometric distribution is commonly used to model the number of trials needed to achieve the first success in a sequence of independent Bernoulli trials, where each trial has a constant probability of success p.
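The pattern p, p(1-p), p(1-p)^2, ... can be written as a single formula, p(1-p)^(k-1) for the k-th try. The sketch below evaluates it for an assumed success probability of 0.5 (a fair coin); the function name geometric_pmf is illustrative.

```python
def geometric_pmf(k, p):
    """Probability that the first success occurs on trial k (k = 1, 2, 3, ...)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return (1 - p) ** (k - 1) * p

p = 0.5  # assumed success probability per trial
probs = [geometric_pmf(k, p) for k in range(1, 6)]
print(probs)  # [0.5, 0.25, 0.125, 0.0625, 0.03125]

# The probabilities over all k sum to 1; the first 50 terms are already very close.
print(sum(geometric_pmf(k, p) for k in range(1, 51)))
```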
Geometric dot product
Also known as the scalar product or inner product, the dot product is a mathematical operation that takes two vectors and returns a scalar quantity. It is calculated by multiplying corresponding components of the vectors and summing the results. In geometric terms, the dot product equals the product of the two vectors’ magnitudes and the cosine of the angle between them: a · b = |a| |b| cos θ. Equivalently, it is the length of the projection of one vector onto the other, multiplied by the length of that other vector. The dot product is used to measure the similarity or alignment between vectors, calculate projections, and determine angles between vectors. In machine learning and data analysis, the dot product appears in similarity measures, optimization algorithms, and neural network operations.
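Both views of the dot product (component-wise sum and angle) can be sketched in a few lines. The helper names dot and angle_between are assumptions for illustration, and vectors are represented as plain tuples.

```python
import math

def dot(u, v):
    """Sum of products of corresponding components."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))

def angle_between(u, v):
    """Angle in radians, from cos(theta) = (u . v) / (|u| |v|)."""
    cos_theta = dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))
    # Clamp to [-1, 1] to guard against floating-point rounding.
    return math.acos(max(-1.0, min(1.0, cos_theta)))

u, v = (1, 0), (1, 1)
print(dot(u, v))                          # 1
print(math.degrees(angle_between(u, v)))  # about 45
print(dot((1, 0), (0, 1)))                # 0: perpendicular vectors
```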
Global minimum
In optimization, the global minimum refers to the lowest possible value of the objective function over the entire feasible domain. It represents the optimal solution that minimizes the objective function and satisfies all constraints, providing the best achievable outcome for the optimization problem. The global minimum is distinguished from local minima, which are the lowest values of the objective function within some neighbourhood of the feasible domain but may not be the lowest overall. Finding the global minimum is a key objective in optimization problems, as it ensures the best performance or utility of the system under consideration. Various optimization algorithms, such as gradient descent and simulated annealing, are employed to search for the global minimum in the complex, high-dimensional optimization landscapes encountered in machine learning, engineering, economics, and other fields. When the objective is not convex, these searches are not guaranteed to succeed: gradient descent, for example, can settle in a local minimum depending on where it starts.
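The difference between a local and the global minimum can be illustrated with a one-dimensional function that has two valleys. Everything in this sketch is an assumption chosen for illustration: the function f(x) = x^4 - 3x^2 + x, the learning rate, the step count, and the two starting points.

```python
def f(x):
    return x**4 - 3 * x**2 + x

def f_prime(x):
    return 4 * x**3 - 6 * x + 1

def descend(x, lr=0.01, steps=2000):
    """Plain gradient descent: repeatedly step against the slope."""
    for _ in range(steps):
        x -= lr * f_prime(x)
    return x

x_left = descend(-2.0)   # ends near x = -1.30, the global minimum
x_right = descend(2.0)   # ends near x = 1.13, only a local minimum

print(x_left, f(x_left))
print(x_right, f(x_right))
print(f(x_left) < f(x_right))  # True: the left valley is deeper
```

Starting on the right-hand side, the search stops in the shallower valley and never sees the deeper one.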
Gradient
A vector-valued function that represents the direction and magnitude of the steepest ascent of a scalar-valued function at a given point. It is a generalization of the derivative to functions of multiple variables, and it is calculated by taking the partial derivatives of the function with respect to each of its variables and arranging them into a vector. Geometrically, the gradient points in the direction of the greatest increase of the function and has a magnitude equal to the rate of change in that direction.
In machine learning and optimization, the gradient plays a crucial role in gradient-based algorithms such as gradient descent, where it is used to update the parameters of a model iteratively. By following the negative gradient direction, these algorithms reduce a loss function step by step and move towards a minimum of the objective function.
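The "one partial derivative per variable" recipe can be approximated numerically. This is a rough sketch, not a production method: the helper numerical_gradient, the step size h, and the example function f(x, y) = x^2 + 3y are all assumptions for illustration.

```python
def numerical_gradient(f, point, h=1e-6):
    """Estimate the gradient of f at `point` with central differences.

    Each partial derivative is estimated by nudging one coordinate
    while holding the others fixed.
    """
    grad = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += h
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad

def f(p):
    x, y = p
    return x**2 + 3 * y  # exact gradient is (2x, 3)

print(numerical_gradient(f, [2.0, 1.0]))  # close to [4.0, 3.0]
```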
Hyperplane
In geometry and linear algebra, a hyperplane is a flat affine subspace of dimension n−1 embedded in an n-dimensional space. It is defined as the set of points that satisfy a linear equation of the form w⋅x+b=0, where w is a normal vector perpendicular to the hyperplane, x is a point in the space, and b is a scalar bias term. Geometrically, a hyperplane divides the space into two half-spaces and serves as a boundary or separation surface between them. In machine learning, hyperplanes are fundamental concepts in classification and regression tasks, where they are used to define decision boundaries between different classes or regions of the input space. Hyperplanes are also used in clustering, dimensionality reduction, and pattern recognition algorithms for partitioning and organizing data in high-dimensional spaces.
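The sign of w⋅x+b tells which half-space a point falls in, which is how a linear decision boundary classifies points. In the sketch below the normal vector w, the bias b, and the test points are made-up values; in two dimensions this hyperplane is simply the line x1 = x2.

```python
def signed_score(w, x, b):
    """Evaluate w . x + b for a point x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def side(w, x, b):
    """Return 1 or -1 for the two half-spaces, 0 for points on the hyperplane."""
    score = signed_score(w, x, b)
    if score > 0:
        return 1
    if score < 0:
        return -1
    return 0

w, b = (1.0, -1.0), 0.0  # hypothetical hyperplane: x1 - x2 = 0

print(side(w, (2.0, 1.0), b))  # 1
print(side(w, (1.0, 2.0), b))  # -1
print(side(w, (3.0, 3.0), b))  # 0
```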
Hypothesis testing
A statistical hypothesis test is a method of statistical inference used to decide whether the data sufficiently support a particular hypothesis. It helps determine whether an observed difference or effect in your data is likely due to a real phenomenon in the larger population, or whether it could be explained simply by random chance, so that you can make informed, data-driven decisions about whether changes, treatments, or relationships are truly significant. Hypothesis testing is all about asking questions like: Is there a true difference between group A and group B? Does this newly developed drug actually work better than the old one? Is there a relationship between a customer’s age and their likelihood to buy a product? Many tests make assumptions about your data that need to be checked before the result can be trusted.
Key Steps:
1. Formulate Hypotheses:
Null Hypothesis (H0): The default statement, usually one of “no effect” or “no difference”.
Alternative Hypothesis (Ha): The statement you want to find evidence to support.
2. Choose a Test Statistic and Significance Level:
Test Statistic: Calculates a value summarizing how different your sample is from what the null hypothesis expects (e.g., t-statistic, z-statistic).
Significance Level (alpha): Your risk tolerance for rejecting the null even if it’s true (common value: 0.05).
3. Calculate the p-value:
The probability of getting a test statistic as extreme or more extreme than what you observed if the null hypothesis were true.
4. Make Decision:
p-value < alpha: Reject the null hypothesis. You have evidence to support the alternative hypothesis.
p-value >= alpha: Fail to reject the null hypothesis. You don’t have enough evidence to claim the effect or difference exists in the larger population.
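Hypothesis testing does not provide definitive proof about your population parameter, only evidence.
Two kinds of error are possible: a Type I error (false positive: rejecting a null hypothesis that is actually true) and a Type II error (false negative: failing to reject a null hypothesis that is actually false).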
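The key steps can be sketched with a two-sided one-sample z-test, one of the simplest cases. This is an illustration under stated assumptions: the population standard deviation sigma is taken as known, the data are invented, and the function name z_test is hypothetical. NormalDist comes from Python's standard statistics module.

```python
from math import sqrt
from statistics import NormalDist, mean

def z_test(sample, mu0, sigma, alpha=0.05):
    """Two-sided one-sample z-test of H0: population mean == mu0.

    Assumes the population standard deviation `sigma` is known.
    Returns (z statistic, p-value, whether H0 is rejected).
    """
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / sqrt(n))        # test statistic
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))        # two-sided p-value
    return z, p_value, p_value < alpha                  # decision

# Invented measurements; H0 says the true mean is 5.0.
sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.5, 5.3, 5.7, 5.9, 5.2]
z, p_value, reject = z_test(sample, mu0=5.0, sigma=0.5)
print(f"z={z:.3f}, p={p_value:.4f}, reject H0: {reject}")
```

With these numbers the sample mean is 5.5, the p-value falls well below 0.05, and the null hypothesis is rejected. When sigma is not known, a t-test is the usual choice instead.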
Identity Matrix
A special diagonal matrix with ones on the main diagonal and zeros elsewhere.
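The reason the identity matrix matters is that multiplying by it leaves a matrix unchanged, much like multiplying a number by 1. The sketch below shows this with plain nested lists; the helpers identity and matmul and the matrix A are illustrative assumptions.

```python
def identity(n):
    """n x n matrix with ones on the main diagonal and zeros elsewhere."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

A = [[2, 3], [4, 5]]
print(identity(3))             # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(matmul(identity(2), A))  # [[2, 3], [4, 5]]
print(matmul(A, identity(2)))  # [[2, 3], [4, 5]]
```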
Independent (statistics)
The core idea of independence is that two events or variables are independent if knowing the outcome of one tells you nothing about the outcome of the other.
Example (Coin Tosses): If you flip two fair coins, the outcome of the first flip doesn’t influence the outcome of the second. These events are independent.
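For events, independence has a precise test: A and B are independent exactly when P(A and B) = P(A) × P(B). The sketch below checks this for the two-coin example by listing all equally likely outcomes; the names prob, first_heads, and second_heads are illustrative.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of flipping two fair coins.
outcomes = list(product("HT", repeat=2))

def prob(event):
    """Fraction of outcomes for which the predicate `event` is true."""
    favourable = [o for o in outcomes if event(o)]
    return Fraction(len(favourable), len(outcomes))

def first_heads(o):
    return o[0] == "H"

def second_heads(o):
    return o[1] == "H"

p_a = prob(first_heads)
p_b = prob(second_heads)
p_ab = prob(lambda o: first_heads(o) and second_heads(o))

print(p_a, p_b, p_ab)     # 1/2 1/2 1/4
print(p_ab == p_a * p_b)  # True: the two flips are independent
```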
Feature Independence:
Ideally, features are informative on their own: each feature in your dataset should provide unique information about the target variable you’re trying to predict.
Redundant features: highly correlated features carry overlapping information and can hinder some models, so feature selection processes often aim to identify and potentially remove them.
Independent sample
A set of data points drawn from a population where each observation is unrelated to and not influenced by the others. Independence of samples is a core assumption of many statistical methods: when it holds, knowing one observation tells you nothing extra about another, which supports robust statistical inference, hypothesis testing, and generalizability of findings across different contexts or populations. Independence does not by itself remove confounding variables or other biases, which must be addressed through study design. Independent samples provide a reliable basis for making inferences about population parameters and assessing the effectiveness of interventions or treatments in research studies.
Inferential Statistics
A branch of statistics concerned with making predictions, inferences, or generalizations about a population based on data collected from a sample. It involves using probability theory to draw conclusions about a population parameter, such as a mean or proportion, from sample data. Inferential statistics allows researchers to make informed decisions and predictions based on limited information.
Integration
An integral is the continuous analog of a sum and is used to calculate areas, volumes, and their generalizations. Integration, the process of computing an integral, is one of the two fundamental operations of calculus, the other being differentiation. The integral can be seen as the opposite of a derivative: if a function represents the rate of change of something, its integral gives the total amount of change accumulated over an interval.
Consider a function f(x) and its graph. The definite integral of f(x) between two points ‘a’ and ‘b’ calculates the signed area enclosed by the function’s curve, the x-axis, and the vertical lines at x=a and x=b.
Integrals help locate the center of mass of objects, especially those with irregular shapes or varying density. The integral of a probability density function (PDF) represents probabilities. The area under the PDF curve within a specific range calculates the probability of a random variable falling within that range.
What Integrals Tell Us:
Geometrically: Integrals reveal the area under a curve.
Physically: Integrals translate rates of change into total quantities accumulated (distance, work, volume, etc.).
Probabilistically: Integrals are key for working with continuous distributions and finding probabilities.
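Both the geometric and the probabilistic readings can be tried numerically. The sketch below uses the midpoint rule, which is one simple approximation method among several; the function name integrate, the number of slices n, and the two example integrands are assumptions for illustration.

```python
import math

def integrate(f, a, b, n=10_000):
    """Midpoint-rule approximation of the definite integral of f from a to b."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Geometric view: area under y = x^2 between 0 and 3 (exact value is 9).
area = integrate(lambda x: x**2, 0, 3)
print(area)

# Probabilistic view: area under the standard normal PDF between -1 and 1,
# i.e. the probability of landing within one standard deviation of the mean.
def normal_pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

within_one_sd = integrate(normal_pdf, -1, 1)
print(within_one_sd)  # about 0.6827
```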
Interval
A set of values between two endpoints, typically expressed in terms of the lower and upper bounds. In mathematics, intervals can be open, closed, or half-open (also called half-closed), depending on whether the endpoints are included in or excluded from the set of values. Intervals are commonly used to represent sets of real numbers or continuous ranges of variables in various mathematical contexts, such as calculus, geometry, and statistics.
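Whether an endpoint belongs to the interval only matters for values exactly on the boundary. The sketch below makes that explicit; the function in_interval and its flag names are hypothetical, and the bracket notation in the comments follows the usual convention of [ ] for included and ( ) for excluded endpoints.

```python
def in_interval(x, low, high, closed_low=True, closed_high=True):
    """Check whether x lies in the interval from low to high.

    closed_low / closed_high say whether each endpoint is included.
    """
    above = x >= low if closed_low else x > low
    below = x <= high if closed_high else x < high
    return above and below

print(in_interval(1, 1, 5))                                      # [1, 5]: True
print(in_interval(1, 1, 5, closed_low=False))                    # (1, 5]: False
print(in_interval(5, 1, 5, closed_high=False))                   # [1, 5): False
print(in_interval(3, 1, 5, closed_low=False, closed_high=False)) # (1, 5): True
```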