numerical methods 1 Flashcards
How is the spacing between floating point numbers defined, and what controls it?
Representation of floating points is discrete (as opposed to being continuous), because the amount of bits used to store them is finite.
Spacing is related to the size of the fraction/mantissa, f, and
directly influences precision.
The floating point representation defines a set of binary intervals, one for each value of e (exponent).
Within each interval (2e<= x <= 2e+1), spacing is fixed at 2e-t
. Where t is the number of bits used to store f. Therefore, the spacing is controlled by the amount of bits stored in f.
How does numerical error relate to the spacing between floating point numbers?
Binary intervals defined by spacing are smaller close to zero, and larger away from zero.
When a number is defined within a binary interval, the ‘closest’ fp number within the binary interval is chosen.
If a binary interval is small (because it is close to zero), it is
densely spaced, and the ‘distance between the original number and the represented number’ (the numerical error) is minimized.
If a binary interval is large (because it is far
from zero), it has larger spacing, and the ‘distance between the original number and the represented number’ (the numerical error) is **larger. **
How does the spacing vary as a function of number magnitude?
how would your find the spacing?
what is epsilon (eps)?
Spacing is denser for numbers close to zero, it is larger for numbers away from zero. Spacing is defined as 2e-t
, for the binary interval 2e<= x <= 2e+1. For double precision, the parameter e is defined as: -1022 <= e <= 1023.
For an interval close to zero:
2-10 <= x <= 2-9→ 0.0009765625<= x <=0.001953125
→ there are 252 numbers in this ‘binary interval’
→ spacing in this interval is 2-10-52=2.27373675443232e-19
eps is the spacing between zero and the next biggest number.
the maximum and minimum number would be21023-52 and 2-1022-52
What is the relation between the condition number of a matrix and the accuracy of its solution?
‘The condition number of a function with respect to an argument measures how much the output value of the function can change for a small change in the input argument’. it is a measure of its singularity
if you vary slightly your input data you will obtain a large variation in your solution. Thus, your solution is ‘unreliable’ for large condition numbers, while for low condition numbers your solution is ‘reliable’. This is related to the amount of ‘accurate’
decimal places of your solution.
For a condition number where kappa = 10k, the condition
number is of order k. This estimates that a given arithmetic method will result in the loss of k digits of accuracy.
From the initial 16 digits of accuracy, if the condition number is
of order 4 (for example, 10000 or 61000) the accuracy is only 11 digits, and the last 5 digits are rubbish!
Name and explain three methods of Interpolation. What are their advantages and disadvantages?
Interpolation is the process of defining a function that takes on specified values at specified points.
Polynomial - is generation via vandermonde matrix
Smooth curve
But some things behave linear.
Expensive
Can have extreme values that don’t fit common sense.
Piecewise - uses piecewise cubic graphs with derivative input
Smooth curve
Least extreme values out of poly and spline
Spline - uses cubics but with derivatives as an output
Smooth curve
Less extreme than poly
Smaller error than linear interpolation
Consider an artificial dataset of 10 data points. What is the difference between generating a polynomial interpolating function and fitting a polynomial function using the least squares method?
Using the polynomial interpolating function it must equal to the number of data points. Using least squares we can fit polynomial function of lower degree that captures the trend of data
Name and describe a technique to compute the minimum of a function.
The Newton method could be used in reverse to find a point where the gradient of the line was equal to zero and hence was a local minima or maxima.
What is absolute and relative error?
Absolute error is the actual difference between the real solution and the numerical solution, whereas relative error is Normalised to the scaled of the solution.
Consider the floating point representation. How does varying the ‘fraction’ and ‘exponent’ affect the set of representable numbers?
The finiteness of f is a limitation on precision. The finiteness of e is a limitation on range. Any numbers that don’t meet these limitations must be approximated by ones that do.
Define ‘convergence’ and ‘divergence’ of an algorithm. Give an example.
function divergence would be an exponential function or polynomial function which extends to greater magnitudes.
a convergent function would be something like newton/ secant method which closes to a point.
Name three interpolation methods, list their advantages and disadvantages.
**Bisection **
Slow but a definite method for finding the answer. However, it always converges.
**Newton **
Requires a starting value near to the true value, it is quick however it can get stuck in an infinite (incorrect) loop.
Advantange: Simple formulation; rapidly converges in most cases, easy to understand. only requires one starting point
Disadvantage: It may not converge, likely to have trouble when f’(x)=0. Another disadvantage is that it requires the evaluation of f’(x) (the derivative of the function)
**Secant **
Advantage: it does not require f’(x). Newton and Secant have both superlinear (faster than linear) convergence.
Disadvantages: it requires more iterates than Newton (but it might be faster to compute), it may not converge! , likely to have trouble when f’(x)=0.
**Inverse Quadratic Interpolation **
Approximates very quickly when applied near to the finishing point; however its overall behaviour is very erratic. May not converge.
.
Consider the methods Newton, Secant and Bisection. What are these used for? When can they be applied? When can they not be applied? Please provide examples.
they are basic methods for computing zeros of functions
Bisection-
this is a kind of trial and improvement technique where one attempts to get closer and closer to the correct answer every time
Newton-
this method involves the drawing of a tangent to the line at any point and seeing where that tangent then intersects the x-axis. However in order to work accurately the starting value must be close to the final value.
Secant-
this method is similar to Newton’s however it does not have as many draw backs. Instead of drawing a tangent to the line, this is replaced with a secant (line that locally intersects two points on a curve), where the next iteration is intersection between
the secant and the x-axis.
inverse quadratic interpolation (IQI)
the secant method uses two previous points to get the next one. IQI uses three points by creating a parabola between them. the next itterate is where the parabola intersects the x axis. we interpolate using a sideways parabola function of y, which always intersects the x axis.
Consider an experimental dataset which correlates permeability and porosity:
- Setup the system of equations to perform polynomial interpolation. Describe this interpolation.
- Setup the system of equations to perform the fit of a quadratic polynomial through the data.
Describe the previous system of linear equations. Name at least two methods (each) that can be used to solve these systems?
Explain how interpolation is different to least squares function fitting, and which is best suited for prediction.
Explain how the LU decomposition compares to direct matrix inversion
explain how pivoting makes it more exact?
works by turning the matrix into an upper and lower matrix. where the upper matrix is the multipliers used to solve the equation and the lower triangle contains the coefficients. the LU system can then be solved by using numerically stable back substitution methods.
it is cheaper
more exact
selective invertive (pivoting)
computing LU decomposition takes a 1/3 less flops than finding the inverse
direct inversion is less accurate beacuse the fp spacing is larger
if you divide the mumber by the large numbers (partial pivoting) then you get small numbers thus a small incriment (eps) therefore less likely to have catastrophic cancellation.
Give two examples of a band matrix of bandwidth 2, 3 and 4.
Why does the Vandermonde matrix always have a large condition number, and what implications does this have?
It has a high condition number because its rows are derived from the same equation so they are not very independent. Singularity is a measure of how independent rows are. The more independent, the lower the kappa.
What is the Design matrix and how does it relate to the QR method?
The design matrix X is a rectangular matrix of order m by n
The design matrix usually has more rows than columns.
Written in the equation Xβ = Y where X is the design matrix.
Thus there are more equations than unknowns
The design matrix contains data on the independent variables (also called explanatory variables) in statistical models which attempt to explain observed data on a response variable (often called a dependent variable) in terms of the explanatory variables.
X and β are not the same dimensions and the fact that the design matrix isn’t usually square means that we cant use the inverse to calculate β.
QR decomposition (also called a QR factorization) of a matrix is a decomposition of a matrix A into a product A = QR of an orthogonal matrix Q and an upper triangular matrix R. QR decomposition is often used to solve the linear least squares problem