L2 - Transformation of Data Flashcards

1
Q

Why do we transform data?

A
  • It is often the case that we do not analyse the ‘raw’ data on a particular variable or set of variables, but mathematically manipulate the numbers (in general, transform the data) into a form that we consider to be more amenable to analysis.
  • Why should this be? One reason is that, occasionally, economic theory suggests the mathematical form that the variables should take.
  • For example, the Cobb-Douglas production function links output, Y, to capital and labour inputs, K and L, respectively, by the relationship
  • Y=AK^αL^β
    Multiplicative relationships like this can be tricky to handle, so we linearise the function by taking logarithms to yield
  • Ln(Y)=Ln(A)+αLn(K)+βLn(L)
  • The production function is now linear in the transformed variables Ln(Y), Ln(K) and Ln(L), and is much easier to handle both mathematically and statistically.
  • As we shall see, other reasons for transforming data are essentially statistical and are often suggested during the exploratory stage of data analysis.
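A minimal Python sketch of the log-linearisation, using hypothetical values for A, α, β, K and L (they are not from the source). Computing Ln(Y) from the multiplicative form and from the linear form should give the same number.

```python
import math

# Hypothetical parameter and input values, chosen only for illustration
A, alpha, beta = 2.0, 0.3, 0.7
K, L = 100.0, 50.0

# Multiplicative Cobb-Douglas form: Y = A * K^alpha * L^beta
Y = A * K ** alpha * L ** beta

# Log-linear form: Ln(Y) = Ln(A) + alpha*Ln(K) + beta*Ln(L)
ln_Y_linear = math.log(A) + alpha * math.log(K) + beta * math.log(L)

# The two agree up to floating-point rounding
print(math.log(Y), ln_Y_linear)
```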
2
Q

Why do we round?

A
  • Rounding improves readability
  • Too much detail can confuse the message, so rounding the answer makes it more memorable

3
Q

What is the problem with rounding?

A
  • Rounding is a ‘trap door’ function: you cannot recover the original value from the transformed (rounded) value
  • Therefore, if you are going to need the original value in further calculations, you should not round it
  • Furthermore, small rounding errors can accumulate, leading to a large error in the final answer
  • Therefore you should never round an intermediate answer, only the final one
  • Even if you round an intermediate answer only slightly, the final answer could be grossly inaccurate
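A small illustration of how rounding errors accumulate, using hypothetical values. Adding 1,000 amounts of 0.004 and rounding only the final answer gives 4.0. Rounding each intermediate value to two decimal places first gives 0.0.

```python
# Hypothetical data: 1,000 small amounts of 0.004 each
values = [0.004] * 1000

# Correct: keep full precision throughout and round only the final answer
exact_total = round(sum(values), 2)

# Incorrect: round every intermediate value to 2 decimal places first
rounded_total = round(sum(round(v, 2) for v in values), 2)

print(exact_total, rounded_total)  # 4.0 0.0
```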
4
Q

Why do we group data?

A
  • When there is too much data to present easily, grouping solves the problem, although at the cost of hiding some of the information
  • Presenting all of the raw data would have given us far too much information, so grouping is a first stage in data analysis
  • Grouping is another trap door transformation: once it is done you cannot recover the original information
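A minimal sketch of grouping into class intervals of width 10. The data, the interval width and the helper name `class_lower_bound` are hypothetical. It also shows the ‘trap door’: two different raw data sets produce exactly the same grouped frequency table.

```python
from collections import Counter

def class_lower_bound(value, width=10):
    """Lower bound of the class interval [k*width, (k+1)*width) containing value."""
    return (value // width) * width

# Two hypothetical raw data sets
raw_a = [3, 7, 12, 15, 18, 21, 22, 29, 35, 38]
raw_b = [1, 9, 10, 11, 19, 20, 25, 28, 30, 39]

# Frequency tables: class lower bound -> number of observations
freq_a = Counter(class_lower_bound(x) for x in raw_a)
freq_b = Counter(class_lower_bound(x) for x in raw_b)

print(sorted(freq_a.items()))  # [(0, 2), (10, 3), (20, 3), (30, 2)]
print(freq_a == freq_b)        # True: the raw values cannot be recovered from the table
```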
5
Q

Why do we divide/multiply a set of data by a constant?

A
  • This transformation is carried out to make numbers more readable, or to make calculations simpler, e.g. by removing trailing zeros
  • Some summary statistics, such as the mean, will be affected by the transformation, but not all: the coefficient of variation, for example, is unchanged
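A short check of the last point, using hypothetical incomes. Dividing the data by 1,000 divides the mean by 1,000, but leaves the coefficient of variation (standard deviation divided by the mean) unchanged. The helper name `coefficient_of_variation` is illustrative.

```python
import statistics

# Hypothetical incomes with trailing zeros, and the same data in thousands
incomes = [20000, 25000, 30000, 45000]
in_thousands = [x / 1000 for x in incomes]

def coefficient_of_variation(data):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(data) / statistics.mean(data)

# The mean is divided by 1,000 along with the data...
print(statistics.mean(incomes), statistics.mean(in_thousands))  # 30000 30.0
# ...but the coefficient of variation is unchanged (up to floating-point rounding)
print(coefficient_of_variation(incomes), coefficient_of_variation(in_thousands))
```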
6
Q

Why do we use differencing on a set of data?

A
  • In time-series data there may be a trend, and it is often better to describe the features of the data relative to the trend
    -The result may also be more economically meaningful, for example governments are often more concerned about the growth of output than about its level
  • Differencing is one way of eliminating the trend
  • One of the implications of differencing is that information about the level of the variable is lost
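A minimal sketch of first differencing on a hypothetical trending series. The trend of 5 per period disappears, and shifting the whole series by a constant leaves the differences unchanged, which is the lost level information. The helper name `first_difference` is illustrative.

```python
# Hypothetical series: a linear trend of 5 per period plus small fluctuations
noise = [0.5, -0.3, 0.2, -0.4, 0.1, 0.0]
x = [100 + 5 * t + e for t, e in enumerate(noise)]

def first_difference(series):
    """Delta x_t = x_t - x_{t-1}; one observation is lost at the start."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]

diff = first_difference(x)
print([round(d, 2) for d in diff])  # [4.2, 5.5, 4.4, 5.5, 4.9]: fluctuates around 5, no trend

# The level is lost: shifting the whole series by 1,000 leaves the differences unchanged
diff_shifted = first_difference([v + 1000 for v in x])
```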
7
Q

Why do we take the reciprocal of data?

A
  • The reciprocal of a variable might have a useful interpretation and provide a more intuitive explanation of a phenomenon
  • The reciprocal transformation will also turn a linear series into a non-linear one
  • The reciprocal of turnover in the labour market (i.e. the number leaving unemployment divided by the number unemployed) gives an idea of the duration of unemployment
  • If a graph of turnover shows a linear decline over time, then the average duration of unemployment will be rising, at a faster and faster rate
  • Repeating the reciprocal transformation recovers the original data
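A minimal sketch of the turnover example, using hypothetical turnover rates that decline linearly by 0.05 per period. The reciprocal, read as an indicator of the average duration of unemployment, rises by a larger amount each period. Taking the reciprocal again recovers the original rates.

```python
# Hypothetical turnover rates (share of the unemployed leaving unemployment each period),
# declining linearly by 0.05 per period
turnover = [0.50, 0.45, 0.40, 0.35, 0.30, 0.25]

# Reciprocal: an indicator of the average duration of unemployment (in periods)
duration = [1 / r for r in turnover]
increases = [duration[t] - duration[t - 1] for t in range(1, len(duration))]

print([round(d, 2) for d in duration])   # [2.0, 2.22, 2.5, 2.86, 3.33, 4.0]
print([round(i, 2) for i in increases])  # each increase is larger than the one before

# Taking the reciprocal again recovers the original data
recovered = [1 / d for d in duration]
```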
8
Q

Why do we take logs of data?

A
  • One important use of the logarithmic transformation is to linearise time series data that are growing at a constant rate
  • The mathematical formula for a series growing at a rate of, say, 10% is
  • Y{t} = 1.1Y{t-1}
  • Further values are obtained as with compound interest, e.g. Y{3} = 1.1^2 x Y{1}; the general formula is Y{n} = 1.1^(n-1) x Y{1}, i.e. the growth factor (1 + growth rate) raised to the power n-1, times Y{1}
  • Note that although the growth rate is constant, the slope of the function is increasing over time; it is thus incorrect to interpret the slope of a plot of the levels of Y{t} against time as a growth rate
  • After taking logs, a plot of the series against time becomes a straight line, so the growth rate can be inferred from its slope
  • Taking logarithms also compresses the scale of the data
  • Thus taking logarithms not only linearises (straightens out) growing time series, but the successive changes in the logarithms can be used as estimates of the growth rate of the series over time.
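A minimal sketch using a hypothetical series that grows at a constant 10% per period. The period-to-period change in the levels keeps rising. The change in the logs is constant at Ln(1.1) ≈ 0.0953, which is close to the 10% growth rate expressed in decimals.

```python
import math

# Hypothetical series growing at a constant 10% per period: Y_n = 1.1^(n-1) * Y_1
y1 = 100.0
y = [y1 * 1.1 ** (n - 1) for n in range(1, 11)]

# The slope of the levels (period-to-period change) keeps rising...
level_changes = [y[t] - y[t - 1] for t in range(1, len(y))]

# ...but the slope of the logs is constant, equal to Ln(1.1)
log_changes = [math.log(y[t]) - math.log(y[t - 1]) for t in range(1, len(y))]
print(round(log_changes[0], 4), round(log_changes[-1], 4))  # 0.0953 0.0953
```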
9
Q

What is the standard formula to calculate growth rate?

A

g{t}= 100 x ((Y{t}-Y{t-1})/(Y{t-1}))
which is the same as
g{t} = 100 x ((Y{t})/(Y{t-1}) -1)
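Both forms of the formula written as small Python functions. The function names and the example values 110 and 100 are hypothetical. The two forms agree up to floating-point rounding.

```python
def growth_rate(y_now, y_prev):
    """Percentage growth rate: g_t = 100 * (Y_t - Y_{t-1}) / Y_{t-1}."""
    return 100 * (y_now - y_prev) / y_prev

def growth_rate_ratio(y_now, y_prev):
    """Equivalent form: g_t = 100 * (Y_t / Y_{t-1} - 1)."""
    return 100 * (y_now / y_prev - 1)

# Hypothetical values: Y rises from 100 to 110, i.e. 10% growth
print(growth_rate(110, 100), growth_rate_ratio(110, 100))
```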

10
Q

What happens to the growth rate of a time series that changes by a constant amount each period?

A

Because the same absolute change is divided by an ever larger base Y{t-1}, a time series that changes by a constant amount each period exhibits declining growth rates over time.
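A quick illustration with a hypothetical series that rises by exactly 10 units each period. The absolute change is constant, but the percentage growth rate falls every period.

```python
# Hypothetical series rising by a constant 10 units each period: 100, 110, ..., 150
y = [100 + 10 * t for t in range(6)]

# The same absolute change is divided by an ever larger previous value
g = [100 * (y[t] - y[t - 1]) / y[t - 1] for t in range(1, len(y))]
print([round(v, 2) for v in g])  # [10.0, 9.09, 8.33, 7.69, 7.14]
```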

11
Q

What is the growth rate formula once you have taken logs?

A
  • LnY{t}-LnY{t-1} = Ln(Y{t}/Y{t-1}), the logarithm of the growth factor
  • LnY{t}-LnY{t-1} = Ln (1 + g/100) = Ln(1+x)
    where x=g/100 must be a positive but small number (if g=10, x=0.1). The logarithmic series expansion under these conditions is:
  • Ln(1+x) = x - (x^2/2) + (x^3/3) - (x^4/4) + …
  • Now, for small x, the terms in the expansion containing x^2, x^3, etc. will all be much smaller than x, so that the expansion can be approximated as:
  • Ln(1+x)≈x
    Thus, since x is the growth rate measured in decimals:
  • LnY{t}-LnY{t-1}≈x
    or
    g=100x≈100(LnY{t}-LnY{t-1})
    i.e., the change in the logarithms (multiplied by 100) is an estimate of the percentage growth rate at time t. Such a change is often denoted by an upper case delta, Δ, e.g. ΔLnY{t} = LnY{t}-LnY{t-1}.
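A minimal sketch of the series expansion for x = 0.1 (a 10% growth rate). The helper name `log1p_series` is hypothetical. Each extra term brings the partial sum closer to the exact Ln(1+x), and the one-term approximation Ln(1+x) ≈ x is much closer for a smaller x.

```python
import math

def log1p_series(x, terms):
    """Partial sum of Ln(1+x) = x - x^2/2 + x^3/3 - x^4/4 + ... (valid for |x| < 1)."""
    return sum((-1) ** (k + 1) * x ** k / k for k in range(1, terms + 1))

x = 0.1  # a 10% growth rate measured in decimals
for terms in (1, 2, 3, 4):
    print(terms, round(log1p_series(x, terms), 6))
print("exact", round(math.log(1 + x), 6))
```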
12
Q

How can the whole growth rate formula be summarised?

A

We thus have the approximate equivalence:
g{t}=100 x ((Y{t}-Y{t-1})/(Y{t-1})) = 100 x ((Y{t})/(Y{t-1}) -1) ≈ 100(LnY{t}-LnY{t-1}) = 100Ln((Y{t})/(Y{t-1}))

-as long as ((Y{t})/(Y{t-1}) -1) is small. For a growth rate of 10%, using the change in the logarithms to approximate this rate gives 100 x Ln(1.1) ≈ 9.53%, an error of almost half a percentage point. For a smaller growth rate, say 2%, 100 x Ln(1.02) ≈ 1.98% gives a much more accurate approximation, as predicted.
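A short sketch reproducing the comparison for growth rates of 10% and 2%. The function name `log_growth` is hypothetical. The approximation error shrinks sharply as the growth rate gets smaller.

```python
import math

def log_growth(g):
    """Log approximation 100 * Ln(Y_t / Y_{t-1}) for an exact growth rate of g percent."""
    return 100 * math.log(1 + g / 100)

for g in (10, 2):
    approx = log_growth(g)
    print(g, round(approx, 3), round(g - approx, 3))  # exact rate, approximation, error
```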

13
Q

How can you calculate the monthly rate of inflation?

A

π{t}^m= 100 x ((P{t}-P{t-1})/(P{t-1}))≈ 100(LnP{t}-LnP{t-1})

  • where P{t} is the current price level and LnP{t} is the logarithm of the price level
  • The notation π{t}^m is used to signify that what we are calculating is the monthly rate of inflation.
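A minimal sketch of the monthly inflation calculation on a hypothetical monthly price index. For monthly rates of around half a percent, the exact rate and the log approximation differ only in the third decimal place.

```python
import math

# Hypothetical monthly price index values P_t
prices = [100.0, 100.4, 100.9, 101.1, 101.6]

# Exact monthly inflation: 100 * (P_t - P_{t-1}) / P_{t-1}
monthly_exact = [100 * (prices[t] - prices[t - 1]) / prices[t - 1]
                 for t in range(1, len(prices))]

# Log approximation: 100 * (LnP_t - LnP_{t-1})
monthly_log = [100 * (math.log(prices[t]) - math.log(prices[t - 1]))
               for t in range(1, len(prices))]

for exact, approx in zip(monthly_exact, monthly_log):
    print(round(exact, 4), round(approx, 4))
```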
14
Q

Can you get the annual rate of inflation from the monthly rate of inflation?

A
  • If plotted on a graph, the monthly rate of inflation scaled up by a factor of 12 (to put it on an annual basis) is an extremely volatile series
  • If changes like these had occurred in the annual rate of inflation, they would have resulted in a boom and bust business cycle
    -We have used the standard formula to calculate the growth rate, but we have made a mistake in the implementation
  • The problem is the scaling factor of 12 that we use to get the annualised rate of inflation; we could ignore the scaling factor and work with monthly rates of inflation, but these are not what analysts usually want to focus on
15
Q

What is the annual rate of inflation formula?

A
  • π{t}^a= 100 x ((P{t}-P{t-12})/(P{t-12}))≈ 100(LnP{t}-LnP{t-12})
  • where the rate of inflation is calculated by comparing the price level at time t with the level that occurred one year (12 months) previously. The time series for π{t}^a is shown in Figure 2.5 and is much smoother and less volatile than that for π{t}^m . The two rates are on the same scale, though, so that comparisons are valid. This is why π{t}^m needed to be scaled by a factor of 12.
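A minimal sketch contrasting the two measures on a hypothetical price index that rises 0.2% every month except for a one-off 1% jump in month 20. The monthly rate scaled by 12 leaps from 2.4 to 12 and back. The annual rate π{t}^a moves only from about 2.43 to about 3.24. The helper names are illustrative.

```python
# Hypothetical price index: 0.2% growth every month, except a one-off 1% jump in month 20
prices = [100.0]
for t in range(1, 25):
    prices.append(prices[-1] * (1.01 if t == 20 else 1.002))

def monthly_annualised(p, t):
    """12 x pi_t^m: the monthly rate scaled up to an annual basis."""
    return 12 * 100 * (p[t] - p[t - 1]) / p[t - 1]

def annual(p, t):
    """pi_t^a: compare P_t with the price level 12 months earlier."""
    return 100 * (p[t] - p[t - 12]) / p[t - 12]

for t in (19, 20, 21):
    print(t, round(monthly_annualised(prices, t), 2), round(annual(prices, t), 2))
```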
16
Q

What is The Phillips Curve?

A
  • One of the most famous examples of applied economics was published in Economica by A.W. (Bill) Phillips in 1958. This investigated the relationship between the rate of inflation (π) and the unemployment rate (U) in the U.K. from 1861 to 1956 and found that the data could be described by a curve of the form
  • π=α + β(1/U)
  • This is an example of the first case in Figure 2.6 and implies that there is a trade-off between inflation and unemployment: to get inflation low, unemployment has to be high and vice versa
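A minimal sketch of why the inverse transformation helps. The curve is linear in Z = 1/U, so a straight-line fit of π on Z estimates α and β. The fitting method here, ordinary least squares, is an assumption for illustration. The data are hypothetical and generated exactly from π = -1 + 10(1/U), so the fit should recover α = -1 and β = 10.

```python
# Hypothetical data generated exactly from pi = -1 + 10 * (1/U), for illustration only
U = [2.0, 2.5, 4.0, 5.0, 8.0, 10.0]
pi = [-1 + 10 / u for u in U]

# Transform the regressor: the curve is linear in Z = 1/U
Z = [1 / u for u in U]

# Ordinary least squares for pi = alpha + beta * Z
n = len(Z)
z_bar = sum(Z) / n
pi_bar = sum(pi) / n
beta = (sum((z - z_bar) * (p - pi_bar) for z, p in zip(Z, pi))
        / sum((z - z_bar) ** 2 for z in Z))
alpha = pi_bar - beta * z_bar

print(round(alpha, 6), round(beta, 6))  # -1.0 10.0
```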
17
Q

What are some issues with the Phillips Curve?

A
  • However, we should treat this fitted curve with some scepticism. The fit is quite loose, with many observations lying some distance from the line: we shall propose a measure of ‘goodness of fit’ in due course.
  • Furthermore, for some historical periods the Phillips Curve does not seem to hold at all. The observations shown with green dots are those between 1968 and 1981: a curve fitted to just these observations has a positive slope, implying that high rates of unemployment are accompanied by high inflation.
  • This is a period that is often referred to as an era of stagflation, a term defined as the concatenation of stagnation (i.e., high unemployment) and inflation.
18
Q

What are the most common functional forms of equations used in economics?

A
  • Three functional forms are commonly found in economics. The first two use an inverse transformation on the independent variable X (i.e., 1/X), while the third just transforms Y logarithmically (the semi-log functional form). Various other functional forms based on transformation of variables may be conceived of.
  • Y=α + β(1/X)
  • Y= α - β(1/X)
  • Ln(Y) = α + βX
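A minimal sketch evaluating the three forms at a few X values, assuming hypothetical parameters α = 5 and β = 0.5 (β > 0). The first form falls towards α as X grows, and the second rises towards α. In the semi-log form, each one-unit increase in X multiplies Y by the same factor e^β.

```python
import math

# Hypothetical parameter values (beta > 0) for illustration
alpha, beta = 5.0, 0.5
xs = [1.0, 2.0, 3.0, 4.0]

inverse_down = [alpha + beta / x for x in xs]         # Y = alpha + beta(1/X): falls towards alpha
inverse_up = [alpha - beta / x for x in xs]           # Y = alpha - beta(1/X): rises towards alpha
semi_log = [math.exp(alpha + beta * x) for x in xs]   # Ln(Y) = alpha + beta*X

# Ratio of successive semi-log values: constant, equal to e^beta
ratios = [semi_log[i + 1] / semi_log[i] for i in range(len(xs) - 1)]
print(ratios, math.exp(beta))
```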
19
Q

How do you calculate moving averages?

A
  • With time series data, we often want to focus on long term (permanent) movements without the eye being distracted by short run (transitory) fluctuations.
  • We thus often want to smooth the data and, while there are many very sophisticated ways of doing this, the simplest is to calculate a moving average. Perhaps the simplest of these is the three period, equal weighted, centred moving average, defined for a time series as:
  • MA{t}(3) = (X{t-1}+X{t}+X{t+1})/3
  • The more past and future values included in the moving average calculation, the smoother the moving average will be
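A minimal Python sketch of the three-period, equal-weighted, centred moving average, applied to a hypothetical series. The function name `centred_ma3` is illustrative. No value can be computed for the first or last observation.

```python
def centred_ma3(x):
    """MA_t(3) = (x_{t-1} + x_t + x_{t+1}) / 3, defined for t = 1, ..., len(x) - 2."""
    return [(x[t - 1] + x[t] + x[t + 1]) / 3 for t in range(1, len(x) - 1)]

# Hypothetical series with a rising trend and transitory ups and downs
series = [10, 14, 11, 15, 13, 17, 15]
print([round(v, 2) for v in centred_ma3(series)])  # [11.67, 13.33, 13.0, 15.0, 15.0]
```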
20
Q

How can an odd-order moving average be written in general form?

A

The general formula for the (2n+1)-period moving average is as follows, where it is important to note that 2n+1 is odd:

  • MA{t}(2n+1) = (x{t-n}+…+x{t-1}+x{t}+x{t+1}+…+x{t+n})/(2n+1)
  • Equivalently, MA{t}(2n+1) = (1/(2n+1)) Σ_{i=-n}^{n} x{t-i}
  • Note that n observations are lost at both the start and the end of the series, and each value is weighted by 1/(2n+1)
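A sketch of the general (2n+1)-period equal-weighted centred moving average, assuming the series is stored as a plain Python list. The data and the function name `centred_ma` are hypothetical. Comparing n=1 and n=2 shows the n observations lost at each end and the extra smoothing from a longer window.

```python
def centred_ma(x, n):
    """Equal-weighted (2n+1)-period centred moving average.

    MA_t(2n+1) = (1/(2n+1)) * sum_{i=-n}^{n} x_{t-i}; the first and last n
    observations get no value because the window would run off the series.
    """
    k = 2 * n + 1
    return [sum(x[t - n:t + n + 1]) / k for t in range(n, len(x) - n)]

# Hypothetical series
series = [10, 14, 11, 15, 13, 17, 15, 19, 16]
ma3 = centred_ma(series, 1)  # 3-period: loses 1 observation at each end
ma5 = centred_ma(series, 2)  # 5-period: loses 2 observations at each end
print(ma3)
print(ma5)
```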
21
Q

How can an even-order moving average be calculated?

A
  • If an analyst wants to use an even number of observations, an adjustment must be made to the moving average
  • An equal-weight 4-period moving average is centred between its two middle values, i.e. at t+1/2 rather than at an observation date, e.g.
  • MA{t+1/2}(4) = (x{t-1}+x{t}+x{t+1}+x{t+2})/4
  • Averaging MA{t-1/2}(4) and MA{t+1/2}(4) centres the result on t, giving a weighted moving average over 5 observations:
  • WMA{t}(5) = 1/8(x{t-2}) + 1/4(x{t-1}) + 1/4(x{t}) + 1/4(x{t+1}) + 1/8(x{t+2})
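A quick numerical check, on a hypothetical series, that averaging two adjacent 4-period moving averages gives the same value as the 5-term weights 1/8, 1/4, 1/4, 1/4, 1/8. The helper `ma4` is a hypothetical name introduced for illustration.

```python
# Hypothetical series
x = [10.0, 14.0, 11.0, 15.0, 13.0, 17.0, 15.0]

def ma4(x, start):
    """Equal-weight average of x[start], ..., x[start+3], centred between its two middle values."""
    return sum(x[start:start + 4]) / 4

t = 3  # the observation we want to centre on
# MA_{t-1/2}(4) uses x_{t-2}..x_{t+1}; MA_{t+1/2}(4) uses x_{t-1}..x_{t+2}
centred = (ma4(x, t - 2) + ma4(x, t - 1)) / 2

# The same value from the 5-term weights
weights = [1 / 8, 1 / 4, 1 / 4, 1 / 4, 1 / 8]
wma5 = sum(w * v for w, v in zip(weights, x[t - 2:t + 3]))
print(centred, wma5)  # 13.625 13.625
```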
22
Q

How do you calculate a weighted moving average over an odd number of observations?

A
  • The general formula for a weighted moving average is:
  • WMA{t}(2n+1) = Σ_{i=-n}^{n} w{i}x{t-i}
    where Σ_{i=-n}^{n} w{i} = 1
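A sketch of the general weighted moving average. The function name `weighted_ma` and the convention that `weights` is listed from w{-n} to w{n} are assumptions made for illustration. The function rejects an even number of weights, or weights that do not sum to 1.

```python
def weighted_ma(x, weights):
    """WMA_t(2n+1) = sum_{i=-n}^{n} w_i * x_{t-i}, with the weights summing to 1.

    weights is listed in the order w_{-n}, ..., w_{n}.
    """
    k = len(weights)
    if k % 2 == 0:
        raise ValueError("need an odd number (2n+1) of weights")
    if abs(sum(weights) - 1) > 1e-9:
        raise ValueError("weights must sum to 1")
    n = k // 2
    return [sum(w * x[t - i] for i, w in zip(range(-n, n + 1), weights))
            for t in range(n, len(x) - n)]

# Usage with hypothetical data and weights 1/4, 1/2, 1/4
print(weighted_ma([1, 2, 3, 4, 5], [0.25, 0.5, 0.25]))  # [2.0, 3.0, 4.0]
```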
23
Q

How can you decompose time series data?

A
  • The moving averages fitted in the above examples have been interpreted as trends, the long run, smoothly evolving component of a time series. In general, an observed time series may be decomposed into several components. We will consider a three component decomposition in which the observed series X{t} is decomposed into trend,T{t} , seasonal, S{t}, and irregular, I{t} , components.
    The decomposition can either be additive –> X{t}=T{t}+S{t}+I{t}
    Or multiplicative–>X{t}=T{t}xS{t}xI{t}
    although this distinction is in a sense artificial, as taking logarithms of the multiplicative equation produces an additive decomposition for LnX{t}
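A one-observation check, with hypothetical component values, that taking logs of the multiplicative decomposition gives an additive one: LnX{t} = LnT{t} + LnS{t} + LnI{t}.

```python
import math

# Hypothetical components for one observation of a multiplicative decomposition
T, S, I = 200.0, 1.05, 0.98  # trend, seasonal factor, irregular factor
X = T * S * I                # X_t = T_t x S_t x I_t

# Taking logs turns the product into a sum
lhs = math.log(X)
rhs = math.log(T) + math.log(S) + math.log(I)
print(lhs, rhs)
```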
24
Q

How is a seasonally adjusted time series obtained from the decomposition?

A
  • The seasonal component is a regular, short term, annual cycle, so that it can only appear in series observed at higher than an annual frequency, typically monthly or quarterly. Since it is a regular cycle it should be relatively easy to isolate. The irregular component is what is left over after the trend and seasonal components have been ‘taken out’. It therefore should be random and hence unpredictable.
    -The seasonally adjusted series is then defined as either:
  • X{t}-S{t}=T{t}+I{t} (additive decomposition)
    OR
  • X{t}/S{t}=T{t}xI{t} (multiplicative decomposition)
    -Other components are sometimes considered. With macroeconomic series such as GDP, a business cycle component is often considered. We assume here that the cycle is part of the trend component. With sales data, there can also be a trading day component, where the irregular needs adjusting for, say, the number of trading days, weekends or bank holidays in a month.
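A minimal sketch of seasonal adjustment on two years of hypothetical quarterly data. Unlike in practice, the seasonal component here is treated as known rather than estimated. Removing it leaves T{t}+I{t} in the additive case and T{t}xI{t} in the multiplicative case.

```python
# Hypothetical quarterly components (two years of data)
trend = [100.0, 102.0, 104.0, 106.0, 108.0, 110.0, 112.0, 114.0]

# Additive: X_t = T_t + S_t + I_t, so X_t - S_t = T_t + I_t
seasonal_add = [5.0, -3.0, -4.0, 2.0] * 2  # the same pattern repeats each year
irregular_add = [0.5, -0.2, 0.1, 0.0, -0.4, 0.3, 0.2, -0.1]
X_add = [t + s + i for t, s, i in zip(trend, seasonal_add, irregular_add)]
sa_add = [x - s for x, s in zip(X_add, seasonal_add)]

# Multiplicative: X_t = T_t x S_t x I_t, so X_t / S_t = T_t x I_t
seasonal_mult = [1.05, 0.97, 0.96, 1.02] * 2
irregular_mult = [1.005, 0.998, 1.001, 1.0, 0.996, 1.003, 1.002, 0.999]
X_mult = [t * s * i for t, s, i in zip(trend, seasonal_mult, irregular_mult)]
sa_mult = [x / s for x, s in zip(X_mult, seasonal_mult)]
```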