numerical methods 1 Flashcards by thomas enos

How is the spacing between floating point numbers defined, and what controls it?

Representation of floating points is discrete (as opposed to being continuous), because the amount of bits used to store them is finite.

Spacing is related to the size of the fraction/mantissa, f, and
directly influences precision.

The floating point representation defines a set of binary intervals, one for each value of e (exponent).

Within each interval (2^e<= x <= 2^e+1), spacing is fixed at 2^e-t
. Where t is the number of bits used to store f. Therefore, the spacing is controlled by the amount of bits stored in f.

How well did you know this?

Not at all

Perfectly

How does numerical error relate to the spacing between floating point numbers?

Binary intervals defined by spacing are smaller close to zero, and larger away from zero.

When a number is defined within a binary interval, the ‘closest’ fp number within the binary interval is chosen.

If a binary interval is small (because it is close to zero), it is
densely spaced, and the ‘distance between the original number and the represented number’ (the numerical error) is minimized.

If a binary interval is large (because it is far
from zero), it has larger spacing, and the ‘distance between the original number and the represented number’ (the numerical error) is **larger. **

How well did you know this?

Not at all

Perfectly

How does the spacing vary as a function of number magnitude?

how would your find the spacing?

what is epsilon (eps)?

Spacing is denser for numbers close to zero, it is larger for numbers away from zero. Spacing is defined as 2^e-t
, for the binary interval 2^e<= x <= 2^e+1. For double precision, the parameter e is defined as: -1022 <= e <= 1023.

For an interval close to zero:
2^-10 <= x <= 2^-9→ 0.0009765625<= x <=0.001953125
→ there are 2⁵² numbers in this ‘binary interval’
→ spacing in this interval is 2^-10-52=2.27373675443232e-19

eps is the spacing between zero and the next biggest number.

the maximum and minimum number would be2^1023-52 and 2^-1022-52

How well did you know this?

Not at all

Perfectly

What is the relation between the condition number of a matrix and the accuracy of its solution?

‘The condition number of a function with respect to an argument measures how much the output value of the function can change for a small change in the input argument’. it is a measure of its singularity

if you vary slightly your input data you will obtain a large variation in your solution. Thus, your solution is ‘unreliable’ for large condition numbers, while for low condition numbers your solution is ‘reliable’. This is related to the amount of ‘accurate’
decimal places of your solution.

For a condition number where kappa = 10^k, the condition
number is of order k. This estimates that a given arithmetic method will result in the loss of k digits of accuracy.

From the initial 16 digits of accuracy, if the condition number is
of order 4 (for example, 10000 or 61000) the accuracy is only 11 digits, and the last 5 digits are rubbish!

How well did you know this?

Not at all

Perfectly

Name and explain three methods of Interpolation. What are their advantages and disadvantages?

Interpolation is the process of deﬁning a function that takes on speciﬁed values at speciﬁed points.

Polynomial - is generation via vandermonde matrix

Smooth curve

But some things behave linear.

Expensive

Can have extreme values that don’t fit common sense.

Piecewise - uses piecewise cubic graphs with derivative input

Smooth curve
Least extreme values out of poly and spline

Spline - uses cubics but with derivatives as an output

Smooth curve

Less extreme than poly

Smaller error than linear interpolation

How well did you know this?

Not at all

Perfectly

Consider an artificial dataset of 10 data points. What is the difference between generating a polynomial interpolating function and fitting a polynomial function using the least squares method?

Using the polynomial interpolating function it must equal to the number of data points. Using least squares we can fit polynomial function of lower degree that captures the trend of data

How well did you know this?

Not at all

Perfectly

Name and describe a technique to compute the minimum of a function.

The Newton method could be used in reverse to find a point where the gradient of the line was equal to zero and hence was a local minima or maxima.

How well did you know this?

Not at all

Perfectly

What is absolute and relative error?

Absolute error is the actual difference between the real solution and the numerical solution, whereas relative error is Normalised to the scaled of the solution.

How well did you know this?

Not at all

Perfectly

Consider the floating point representation. How does varying the ‘fraction’ and ‘exponent’ affect the set of representable numbers?

The finiteness of f is a limitation on precision. The finiteness of e is a limitation on range. Any numbers that don’t meet these limitations must be approximated by ones that do.

How well did you know this?

Not at all

Perfectly

Define ‘convergence’ and ‘divergence’ of an algorithm. Give an example.

function divergence would be an exponential function or polynomial function which extends to greater magnitudes.

a convergent function would be something like newton/ secant method which closes to a point.

How well did you know this?

Not at all

Perfectly

Name three interpolation methods, list their advantages and disadvantages.

**Bisection **
Slow but a definite method for finding the answer. However, it always converges.
**Newton **
Requires a starting value near to the true value, it is quick however it can get stuck in an infinite (incorrect) loop.
Advantange: Simple formulation; rapidly converges in most cases, easy to understand. only requires one starting point
Disadvantage: It may not converge, likely to have trouble when f’(x)=0. Another disadvantage is that it requires the evaluation of f’(x) (the derivative of the function)

**Secant **
Advantage: it does not require f’(x). Newton and Secant have both superlinear (faster than linear) convergence.
Disadvantages: it requires more iterates than Newton (but it might be faster to compute), it may not converge! , likely to have trouble when f’(x)=0.

**Inverse Quadratic Interpolation **
Approximates very quickly when applied near to the finishing point; however its overall behaviour is very erratic. May not converge.

How well did you know this?

Not at all

Perfectly

Consider the methods Newton, Secant and Bisection. What are these used for? When can they be applied? When can they not be applied? Please provide examples.

they are basic methods for computing zeros of functions

Bisection-

this is a kind of trial and improvement technique where one attempts to get closer and closer to the correct answer every time
Newton-

this method involves the drawing of a tangent to the line at any point and seeing where that tangent then intersects the x-axis. However in order to work accurately the starting value must be close to the final value.
Secant-

this method is similar to Newton’s however it does not have as many draw backs. Instead of drawing a tangent to the line, this is replaced with a secant (line that locally intersects two points on a curve), where the next iteration is intersection between
the secant and the x-axis.

inverse quadratic interpolation (IQI)

the secant method uses two previous points to get the next one. IQI uses three points by creating a parabola between them. the next itterate is where the parabola intersects the x axis. we interpolate using a sideways parabola function of y, which always intersects the x axis.

How well did you know this?

Not at all

Perfectly

Consider an experimental dataset which correlates permeability and porosity:

Setup the system of equations to perform polynomial interpolation. Describe this interpolation.
Setup the system of equations to perform the fit of a quadratic polynomial through the data.

Describe the previous system of linear equations. Name at least two methods (each) that can be used to solve these systems?

Explain how interpolation is different to least squares function fitting, and which is best suited for prediction.

How well did you know this?

Not at all

Perfectly

Explain how the LU decomposition compares to direct matrix inversion

explain how pivoting makes it more exact?

works by turning the matrix into an upper and lower matrix. where the upper matrix is the multipliers used to solve the equation and the lower triangle contains the coefficients. the LU system can then be solved by using numerically stable back substitution methods.

it is cheaper

more exact

selective invertive (pivoting)

computing LU decomposition takes a 1/3 less flops than finding the inverse

direct inversion is less accurate beacuse the fp spacing is larger

if you divide the mumber by the large numbers (partial pivoting) then you get small numbers thus a small incriment (eps) therefore less likely to have catastrophic cancellation.

How well did you know this?

Not at all

Perfectly

Give two examples of a band matrix of bandwidth 2, 3 and 4.

How well did you know this?

Not at all

Perfectly

Why does the Vandermonde matrix always have a large condition number, and what implications does this have?

It has a high condition number because its rows are derived from the same equation so they are not very independent. Singularity is a measure of how independent rows are. The more independent, the lower the kappa.

How well did you know this?

Not at all

Perfectly

What is the Design matrix and how does it relate to the QR method?

The design matrix X is a rectangular matrix of order m by n

The design matrix usually has more rows than columns.

Written in the equation Xβ = Y where X is the design matrix.

Thus there are more equations than unknowns

The design matrix contains data on the independent variables (also called explanatory variables) in statistical models which attempt to explain observed data on a response variable (often called a dependent variable) in terms of the explanatory variables.

X and β are not the same dimensions and the fact that the design matrix isn’t usually square means that we cant use the inverse to calculate β.

QR decomposition (also called a QR factorization) of a matrix is a decomposition of a matrix A into a product A = QR of an orthogonal matrix Q and an upper triangular matrix R. QR decomposition is often used to solve the linear least squares problem

How well did you know this?

Not at all

Perfectly

Explain the steps of the QR method and how it compares to the pseudo-inverse method.

Study These Flashcards

QR factorisation method is used to solve over-determined systems e.g. the design matrix. It solves the over-determined system in a stable and accurate manner without computing the pseudo-inverse

_How it compares to the pseudo-inverse _
The pseudo-inverse method approximates the inverse of X as X^†
The pseudo-inverse method is simple and cheap, but less accurate. In terms of condition numbers: kappa(X^†)= kappa(X)²

The steps are:
For an over-determined system X B = y
1) Compute a Householder reflection H for each column of X (H₁, H₂, H_n)
2) Operate each Householder reflection onto X and y
3) After operating on X and y, the system becomes: R B = z
where (H_n… H₂ H₁₎y=z and Q=(H_n… H₂ H₁)^T
4) Compute B by back substitution.

B is the solution to the over-determined system.

Name and explain the four Quadrature methods.

Study These Flashcards

The mid-point rule (M)
An approximation of the integral by integrating the rectangle of the height taken as the value of the evaluated function at the mid-point of the domain.The order of this quadrature rule is two because it is the lowest degree polynomial that it can integrate
exactly.
The Trapezoid rule (T)
An approximation of the integral with a Trapezium of which the heights are the values of the functions at the extreme of the integration domain. The order of this quadrature rule
is two because it is the lowest degree polynomial that it can integrate exactly. For integrating smooth functions, M is roughly twice as accurate as T.
Simpson’s rule
An approximation of the integral of a smooth function which has an accuracy value much higher than both the mid-point and the trapezoid rule. It is a linear combination of the mid-point and trapezoid rules and it has an order of four.
Composite Simpson’s rule (aka Weddle’s rule)
An approximation of the integral of a smooth function where Simpson’s rule is applied to two sub-intervals of the integration region. This rule effectively refines the integrated domain with respect to the previous. Its order is six.

What are the properties of an orthogonal matrix?

Study These Flashcards

Orthogonal matrices have the following properties:

They are squared
Their columns form an orthonormal basis of the Euclidean space
Its determinant is 1 or -1
They have eigenvalues of magnitude 1
Their condition number is 1, therefore, errors are not magnified when multiplying with an orthogonal matrix
Householder reflections are orthogonal matrices

What are Piecewise Interpolation techniques?

Study These Flashcards

For large sets of data points (think temperature measurements in a river made every 15 minutes over 10 years!) applying polynomial interpolation becomes unfeasible. For example, for 341,769 data points we would need to fit a polynomial with 341,768 degrees (!). Piecewise interpolation subdivides the original range into subsets and fits polynomials or other functions (such as splines) in a regional manner. For example, linear piecewise interpolation will fit a line for every two segments.

How does pivoting reduce numerical error?

Study These Flashcards

by choosing a large number as the pivot. if you divide by the large number then you get a small number where the incriment between numbers is small so we are less likely to get catastrophic cancellation

What is the norm of a vector? What can it be used for? Provide a descriptive example.

Study These Flashcards

norm is a definition of a measurement in space and their measurement must conform to properties.

measurements can be maximum, minimum, or length.

a vector norm is thus a measure of the magnitude of a vector

it can be used to calculate the condition number

Name and explain three methods to solve an overdetermined system

Study These Flashcards

QR decomposition

pseudo inverse

least squares

How does the norm of a vector relate to the condition number of a matrix?

There is a condition number for each norm – The condition number is a relative measure of the behaviour of a matrix The numerical value of the condition number of an n x n matrix depends on the particular norm used but because of the equivalence of the underlying vector norms, these values can differ by at most a ﬁxed constant (which depends on n), and hence they are equally useful as quantitative measure of conditioning

Give an example of a ‘nan’ and ‘inf’ number.

nan --\> not a number --\> 2gt Inf --\> infinity

Why is the exponent of the floating point representation biased?

the largest possible representative value of e is 11111111111 which equals 2047 but a compter needs to account for powers less than 2^0 in floating point representation. hence it splits in half, 1023 for powers above 2^0 and 1022 for powers below 2^0.

Consider a linear system of equations. How can it be verified if the system has one, infinite or no solutions?

a determinent of 0 represents a singularity which means there are infinite number of solutions. a solution will have one solution if there is one defining equation or row for that system. you need to have at least the same number of equations as solutions. a system has no solutions if it contradicts a mathamatical law or each other. using the discriminant b²-4ac b2-4ac = 0 --\> one solution b2-4ac \>0 --\> two solutions b2-4ac \<0 --\> infinite solutions

What is a ‘flop’?

In computing, FLOPS (for FLoating-point Operations Per Second) is a measure of computer performance, useful in fields of scientific calculations that make heavy use of floating-point calculations. FLOP (or flop) is used as an abbreviation for "FLoating-pointOPeration", and a flop count is a count of these operations

# Define floating point representation ‘epsilon’. Consider 32bit vs. 64bit representation. How does the epsilon vary?

double precision (64 bits) compared to single precision (32 bits) more bits larger fraction(f) - thus higher memory precision larger range (e) - donates range of values higher precision more memory as 32 bits uses less bits to store f it has a smaller amount of incriments so larger values have even larger incriments for eps.

Why is there more than one condition number related to a matrix?

there is more than one condition number as there is usually more than one norm to describe a matrix.

Describe two methods to estimate the condition number of a matrix.

_The perturbation method_ Using this method, we literally make a small change in the system and look at the change in the solution. Consider a system Ax=b, for which we know the values for A, x, and b. If we perturb b, for example, such that b’=b – db. The solution to the system is now: Ax’=b’. We can define dx=x-x’ and db = b - b’, as a result of this. Given that, by definition, the condition number is bound by: ||dx||/||x|| \<= kappa(A) ||db||/||b|| Then, kappa(A) \>= (||dx||/||x||) / (||db||/||b||). The exact value of the condition number will depend on the chosen norm (||\*||). _The matrix norm _ The condition number of a matrix, A, can be approximated by the norm of the matrix A and its inverse, as follows: kappa(A) =||A|| ||A^-1|| The numerical values of the norm and the condition number will depend on the chosen norm.

How does interpolation compare to function fitting (Least squares method)?

interpolation requires a particular function to fit every piece of data while the least squares uses a simple function to fit the general trend. Least squares is faster but its accuracy is dependent on the system is superior in an overdetermined system

What is a Householder reflection and what is it used for?

A householder reflection is a matrix transformation that forms the basis for many different numerical algorithms due to its flexibility and power. They can be used for solving eigenvalues and eigenvectors, Least Squares as well as a magnitude of other mathematical algorithms.

# Define Weddle’s Quadrature rule.

the composite simpsons rule describes weddles rule. weddles rule is so exact that it can calculate the error in the simpsons rule.

Explain the three different types of pivoting. What does pivoting seek to achieve?

Pivoting seeks to eliminate any round off errors that may occur during an LU decomposition. _Partial pivoting_- this is the process of keeping the multipliers less than modulus one _Diagonal pivoting_-the diagonals are selected as the pivots for each column _Complete pivoting_-the largest (modulus) element is used in each column

Explain the steps involved in the Least Squares method. What is the aim of this method?

The least squares method aims to find a solution to an over-determined problem. Graphically, it fits a function through a set of points. Instead of attempting to find a function that fits exactly through each point, it fits a given function and shapes it so that its general trend best approximates the trend of the data. 1) choosing a set of basis functions for fitting, 2) evaluating the function in the data points and building the design matrix, 3) Inverting the over determined design matrix using a numerical method – such as the QR decomposition, used to approximate the coefficients of the basis function, 4) make sure that the distance between the fitted curve and the original data is minimized.

What is LU decomposition? What is L and U? What kind of systems can it be used to solve?

LU decomposition is a numerical method used to solve linear systems that generally derive from linear systems of equations (solved simultaneously). Instead of inverting a matrix, the LU decomposition solves a system by decomposing the matrix into a simpler system (LU) formed by triangular matrices, which can be directly solved using numerically stable back substitution methods. L holds the multipliers used for the elimination, whereas U is the final matrix which contains the coefficients. LU can be used to solve well-determined systems of linear equations.

what is meant by catastrophic cancellation?

this is where the incriment is so large that the addition of a number will not make a difference to the output. e.g. eps(1e32) = 1.8e16 thus if we added a million to 1e32 it would not make a difference as a million is much smaller than the incriment.

what is the problem of inverting a matrix of realy small numbers?

if we invert really small numbers in a matrix we get large numbers and thus catastrophic cancellation.

what is meant by singularity of a matrix. what impact does this have on LU decomposition?

a matrix is singular if it does not have an inverse (and if and only its determinent is zero) . Singularity is a measure of how independent rows are. The more independent, the lower the kappa. if it is singular you cannot use the LU method

how do we get the order of a function?

it describes how many times the function crosses the x axis. a fifith order would then cross the x axis either 0 1 2 3 4 or 5 times polynomial will intersect the x axis either 0 1 or 2 times.

what is meant by an outlier?

an outlier is an observation that is numerically distant from the rest of the data are often indicative of measurement error

what is meant by an overdetermined system?

more equations than there are unknowns.

numerical methods 1 Flashcards

(44 cards)