L2 - Transformation of Data Flashcards
Why do we transform data?
- It is often the case that we do not analyse the ‘raw’ data on a particular variable or set of variables, but mathematically manipulate the numbers (in general, transform the data) into a form that we consider to be more amenable to analysis.
- Why should this be? One reason is that, occasionally, economic theory suggests the mathematical form that the variables should take.
- For example, the Cobb-Douglas production function links output, Y, to capital and labour inputs, K and L, respectively, by the relationship
- Y=AK^αL^β
- Multiplicative relationships like this can be tricky to handle, so we linearise the function by taking logarithms to yield
- Ln(Y)=Ln(A)+αLn(K)+βLn(L)
- The production function is now linear in the transformed variables Ln(Y), Ln(K) and Ln(L), and is much easier to handle both mathematically and statistically.
- As we shall see, other reasons for transforming data are essentially statistical and are often suggested during the exploratory stage of data analysis.
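The Cobb-Douglas linearisation can be checked numerically. The sketch below is only an illustration: the values chosen for A, α, β, K and L are hypothetical and are not taken from any data set.

```python
import math

# Hypothetical parameter and input values, chosen only for illustration.
A, alpha, beta = 2.0, 0.3, 0.7
K, L = 50.0, 120.0

# Output from the multiplicative Cobb-Douglas form Y = A K^alpha L^beta.
Y = A * K**alpha * L**beta

# The same relationship after taking logarithms: linear in Ln(K) and Ln(L).
log_output = math.log(Y)
linear_form = math.log(A) + alpha * math.log(K) + beta * math.log(L)

print(log_output, linear_form)  # the two values agree up to floating-point error
```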
Why do we round?
- Rounding improves readability
- Too much detail can confuse the message, so rounding the answer makes it more memorable
What is the problem with rounding?
- Rounding is a ‘trap door’ function: you cannot obtain the original value from the transformed (rounded) value
- Therefore, if you are going to need the original value in further calculations you should not round your answer
- Furthermore, small rounding errors can accumulate, leading to a large error in the final answer
- Therefore you should never round an intermediate answer, only the final one
- Even if you only round the intermediate answers by a small amount, the final answer could be grossly inaccurate
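A deliberately extreme sketch of how rounding errors accumulate. The list of 1000 values of 0.004 is made up for illustration. Rounding each value to two decimal places before summing wipes out the total, whereas rounding only the final sum preserves it.

```python
# Hypothetical data: 1000 small values of 0.004 each (true total is 4).
small_values = [0.004] * 1000

# Rounding every intermediate value first: each 0.004 becomes 0.0.
rounded_early = sum(round(v, 2) for v in small_values)

# Rounding only the final answer.
rounded_late = round(sum(small_values), 2)

print(rounded_early, rounded_late)
```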
Why do we group data?
- When there is too much data to present easily, grouping solves the problem, although at the cost of hiding some of the information
- Presenting the raw data would give far too much detail, so grouping is a first stage in data analysis
- Grouping is another trap door transformation: once it is done you cannot recover the original information
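The trap-door property of grouping can be illustrated with a short sketch. The two samples and the class width of 10 are hypothetical. Both samples produce the same grouped table even though the raw values, and hence their totals, differ, so the grouped table alone cannot tell you which sample it came from.

```python
from collections import Counter

def group(values, width=10):
    """Count how many values fall in each class, labelled by its lower bound."""
    return Counter((v // width) * width for v in values)

# Two different hypothetical samples.
sample_a = [12, 15, 21, 28, 33]
sample_b = [10, 19, 25, 25, 39]

grouped_a = group(sample_a)
grouped_b = group(sample_b)

print(grouped_a == grouped_b)        # identical grouped tables
print(sum(sample_a), sum(sample_b))  # but the raw totals differ
```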
Why do we divide/multiply a set of data by a constant?
- This transformation is carried out to make numbers more readable or to make calculation simpler by removing trailing zeros
- Some summary statistics, such as the mean, are affected by the transformation; others, such as the coefficient of variation, are not
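A small sketch of the effect of dividing by a constant, using made-up income figures and the population standard deviation from Python's `statistics` module. The mean is rescaled by the constant, while the coefficient of variation (standard deviation divided by the mean) is unchanged.

```python
import statistics

def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return statistics.pstdev(values) / statistics.mean(values)

# Hypothetical incomes, then the same data expressed in thousands.
incomes = [12000, 15000, 18000, 27000]
incomes_thousands = [v / 1000 for v in incomes]

mean_raw = statistics.mean(incomes)
mean_scaled = statistics.mean(incomes_thousands)
cv_raw = coefficient_of_variation(incomes)
cv_scaled = coefficient_of_variation(incomes_thousands)

print(mean_raw, mean_scaled)  # the mean is divided by 1000
print(cv_raw, cv_scaled)      # the coefficient of variation is the same
```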
Why do we use differencing on a set of data?
- In time-series data there may be a trend, and it is better to describe the features of the data relative to the trend
- The result may also be more economically meaningful: for example, governments are often more concerned about the growth of output than about its level
- Differencing is one way of eliminating the trend
- One of the implications of differencing is that information about the level of the variable is lost
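A minimal sketch of first differencing, using two hypothetical output series that share the same linear trend but sit at different levels. The differenced series are identical, which shows both that the trend is removed and that the level information is lost.

```python
def first_difference(series):
    """Return the period-to-period changes Y{t} - Y{t-1}."""
    return [curr - prev for prev, curr in zip(series, series[1:])]

# Hypothetical series: same trend of +5 per period, different levels.
output_low = [100, 105, 110, 115]
output_high = [500, 505, 510, 515]

diff_low = first_difference(output_low)
diff_high = first_difference(output_high)

print(diff_low, diff_high)
```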
Why do we take the reciprocal of data?
- The reciprocal of a variable might have a useful interpretation and provide a more intuitive explanation of a phenomenon
- The reciprocal transformation will also turn a linear series into a non-linear one
- Turnover in the labour market is the number leaving unemployment divided by the number unemployed; its reciprocal gives an idea of the duration of unemployment
- If a graph of turnover shows a linear decline over time, then the average duration of unemployment will be rising, at a faster and faster rate
- Repeating the reciprocal transformation recovers the original data, so unlike rounding or grouping it is not a trap door transformation
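The turnover example can be sketched with exact fractions. The turnover figures are hypothetical and decline linearly by 0.1 each period. The implied duration rises by larger and larger increments, and taking the reciprocal a second time returns the original series.

```python
from fractions import Fraction

# Hypothetical turnover rates declining linearly: 0.5, 0.4, 0.3, 0.2, 0.1.
turnover = [Fraction(n, 10) for n in (5, 4, 3, 2, 1)]

# Reciprocal of turnover: an indication of the duration of unemployment.
duration = [1 / r for r in turnover]

# Period-to-period increases in duration.
increments = [b - a for a, b in zip(duration, duration[1:])]

# Applying the reciprocal again recovers the original data.
recovered = [1 / d for d in duration]

print([float(d) for d in duration])
print([float(i) for i in increments])
```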
Why do we take logs of data?
- One important use of the logarithmic transformation is to linearise time series data that are growing at a constant rate
- The mathematical formula for a series growing at a rate of, say, 10% per period is
- Y{t} = 1.1Y{t-1}
- Further values follow as with compound interest, e.g. Y{3} = 1.1^2 x Y{1}; in general, Y{n} = 1.1^(n-1) x Y{1}
- Note that although the growth rate is constant, the slope of the function is increasing over time; it is thus incorrect to interpret the slope of a plot of the levels of Y{t} against time as a growth rate
- Taking logs gives Ln(Y{n}) = Ln(Y{1}) + (n-1)Ln(1.1), a straight line in n, so the growth rate can be inferred from the slope of the logged series
- Taking logarithms also compresses the scale of the data
- Thus taking logarithms not only linearises (straightens out) growing time series, but the successive changes in the logarithms can be used as estimates of the growth rate of the series over time.
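A short sketch of the linearising effect, assuming a hypothetical series that starts at 100 and grows at 10% per period. The changes in the levels keep increasing, while the changes in the logarithms are constant and equal to Ln(1.1).

```python
import math

# Hypothetical series growing at a constant 10% per period.
y = [100 * 1.1**t for t in range(6)]

# Changes in the levels get larger each period.
level_changes = [b - a for a, b in zip(y, y[1:])]

# Changes in the logarithms are the same each period.
log_changes = [math.log(b) - math.log(a) for a, b in zip(y, y[1:])]

print(level_changes)
print(log_changes)
```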
What is the standard formula to calculate growth rate?
g{t}= 100 x ((Y{t}-Y{t-1})/(Y{t-1}))
which is the same as
g{t} = 100 x ((Y{t})/(Y{t-1}) -1)
What happens to the growth rate of a time series that changes by a constant amount each period?
A time series that changes by a constant amount each period exhibits declining growth rates over time: for a rising series, the same increase Y{t}-Y{t-1} is divided by an ever larger Y{t-1}.
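The standard growth rate formula can be applied to a hypothetical series that rises by 10 units each period, as a rough check of the declining growth rates described here.

```python
def growth_rates(series):
    """Percentage growth rates g{t} = 100 x (Y{t} - Y{t-1}) / Y{t-1}."""
    return [100 * (curr - prev) / prev for prev, curr in zip(series, series[1:])]

# Hypothetical series increasing by a constant amount of 10 each period.
y = [100, 110, 120, 130]
rates = growth_rates(y)

print(rates)  # each growth rate is smaller than the one before
```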
What is the growth rate formula once you have taken logs?
- LnY{t}-LnY{t-1} = Ln(Y{t}/Y{t-1}), the logarithm of the growth factor
- LnY{t}-LnY{t-1} = Ln(1 + g/100) = Ln(1+x)
where x=g/100 must be a positive but small number (if g=10, x=0.1). The logarithmic series expansion under these conditions is:
- Ln(1+x) = x - (x^2/2) + (x^3/3) - (x^4/4) + …
- Now, for small x, the terms in the expansion containing x^2, x^3, etc. will all be much smaller than x, so that the expansion can be approximated as:
- Ln(1+x)≈x
Thus, since x is the growth rate measured in decimals:
- LnY{t}-LnY{t-1}≈x
or
g=100x≈100(LnY{t}-LnY{t-1})
i.e., the change in the logarithms (multiplied by 100) is an estimate of the percentage growth rate at time t. Such a change is often denoted by an upper case delta, Δ, e.g. ΔLnY{t} = LnY{t}-LnY{t-1}.
How can the whole growth rate formula be summarised?
We thus have the approximate equivalence:
g{t} = 100 x ((Y{t}-Y{t-1})/(Y{t-1})) = 100 x ((Y{t})/(Y{t-1}) -1) ≈ 100(LnY{t}-LnY{t-1}) = 100Ln((Y{t})/(Y{t-1}))
- as long as ((Y{t})/(Y{t-1}) -1) is small. For a growth rate of 10%, using the change in the logarithms to approximate this rate gives 100Ln(1.1) = 9.53%. For a smaller growth rate, say 2%, 100Ln(1.02) = 1.98% gives a much more accurate approximation, as predicted.
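The quality of the logarithmic approximation can be checked with a small sketch. The function name `exact_and_log_growth` and the starting level of 100 are assumptions made for illustration only.

```python
import math

def exact_and_log_growth(prev, curr):
    """Return the exact percentage growth rate and its log approximation."""
    exact = 100 * (curr / prev - 1)
    approx = 100 * math.log(curr / prev)
    return exact, approx

# Growth of 10%: the approximation is noticeably below the exact rate.
exact_10, approx_10 = exact_and_log_growth(100, 110)

# Growth of 2%: the approximation is much closer.
exact_2, approx_2 = exact_and_log_growth(100, 102)

print(round(approx_10, 2), round(approx_2, 2))
```

The gap between the exact and approximate rates shrinks as the growth rate gets smaller, in line with dropping the higher-order terms of the series expansion.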
How can you calculate the monthly rate of inflation?
π{t}^m= 100 x ((P{t}-P{t-1})/(P{t-1}))≈ 100(LnP{t}-LnP{t-1})
- where P{t} is the current price level and the logarithm of the price level is LnP{t}
- The notation π{t}^m is used to signify that what we are calculating is the monthly rate of inflation.
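A minimal sketch of the monthly inflation calculation, using a short made-up price index. It computes both the exact percentage change and the log-difference approximation for each month.

```python
import math

# Hypothetical monthly price index values.
prices = [100.0, 100.5, 101.2, 101.0]

# Exact monthly inflation: 100 x (P{t} - P{t-1}) / P{t-1}.
monthly_exact = [100 * (curr - prev) / prev for prev, curr in zip(prices, prices[1:])]

# Approximation: 100 x (LnP{t} - LnP{t-1}).
monthly_log = [100 * (math.log(curr) - math.log(prev))
               for prev, curr in zip(prices, prices[1:])]

for exact, approx in zip(monthly_exact, monthly_log):
    print(round(exact, 3), round(approx, 3))
```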
Can you get the annual rate of inflation from the monthly rate of inflation?
- One approach is to scale the monthly rate of inflation up by a factor of 12. Plotted on a graph, this is an extremely volatile series which, if these changes had really occurred in the rate of inflation, would have resulted in a boom and bust business cycle.
- We have used the standard formula to calculate the growth rate, but we have made a mistake in the implementation
- The problem is the scaling factor of 12, which we use to get the annualised rate of inflation. We could ignore the scaling factor and get monthly rates of inflation, but this is not what analysts want to focus on
What is the annual rate of inflation formula?
- π{t}^a = 100 x ((P{t}-P{t-12})/(P{t-12})) ≈ 100(LnP{t}-LnP{t-12})
- where the rate of inflation is calculated by comparing the price level at time t with the level that occurred one year (12 months) previously. The time series for π{t}^a is shown in Figure 2.5 and is much smoother and less volatile than that for π{t}^m . The two rates are on the same scale, though, so that comparisons are valid. This is why π{t}^m needed to be scaled by a factor of 12.
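The contrast between the two measures can be sketched with an artificial price index. The series below is an assumption built for illustration: a steady 0.2% monthly trend with an alternating ±0.5% monthly wobble. It is not the data behind Figure 2.5.

```python
# Artificial price index: steady trend times an alternating monthly wobble.
prices = [100 * 1.002**t * (1 + 0.005 * (-1)**t) for t in range(25)]

# Monthly inflation scaled up by 12 (annualised monthly rate).
scaled_monthly = [12 * 100 * (prices[t] - prices[t - 1]) / prices[t - 1]
                  for t in range(12, 25)]

# Annual inflation: compare each month with the same month a year earlier.
annual = [100 * (prices[t] - prices[t - 12]) / prices[t - 12]
          for t in range(12, 25)]

spread_scaled_monthly = max(scaled_monthly) - min(scaled_monthly)
spread_annual = max(annual) - min(annual)

print(round(spread_scaled_monthly, 2), round(spread_annual, 6))
```

In this constructed case the month-to-month wobble cancels out of the 12-month comparison, so the annual rate is essentially constant while the scaled monthly rate swings widely.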