Pandas Time Series Flashcards by James Monahan

from datetime import datetime

datetime(year=2015, month=7, day=4)

manually build a date using the datetime type

datetime.datetime(2015, 7, 4, 0, 0)

from dateutil import parser
date = parser.parse('4th of July, 2015')
date

using the dateutil module, you can parse dates from a variety of string formats

date.strftime('%A')

Once you have a datetime object, you can do things like printing the day of the week:
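
As a small sketch of what strftime() can produce, the following formats one date with a few standard format codes. The variable names are illustrative only, and the weekday and month names assume an English locale.

```python
from datetime import datetime

# July 4, 2015, built with the standard-library datetime type
date = datetime(year=2015, month=7, day=4)

weekday = date.strftime('%A')     # full weekday name
month = date.strftime('%B')       # full month name
iso = date.strftime('%Y-%m-%d')   # year-month-day

print(weekday, month, iso)
```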

datetime(year=1976, month=9, day=13).strftime('%A' + ' %B')

'Monday September'

import numpy as np
date = np.array('2015-07-04', dtype=np.datetime64)
date

The NumPy team added a set of native time series data types to NumPy, built around the datetime64 dtype

numpy datetime

date + np.arange(12)

Once we have this date formatted, however, we can quickly do vectorized operations on it
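
A minimal sketch of that vectorized behavior, reusing the date from the card above. The name `dates` is introduced here for illustration.

```python
import numpy as np

# A single date stored as a day-resolution datetime64 value
date = np.array('2015-07-04', dtype=np.datetime64)

# Adding an integer array broadcasts: each integer counts as a number of days
dates = date + np.arange(12)

print(dates[0], dates[-1])
```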

np.datetime64('2015-07-04 12:00')

Here is a minute-based datetime

NumPy will infer the desired unit from the input
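
One way to see the inferred unit is to inspect the dtype of the resulting value. This is a sketch with illustrative variable names.

```python
import numpy as np

day = np.datetime64('2015-07-04')            # date only
minute = np.datetime64('2015-07-04 12:00')   # date plus hours and minutes

# The unit appears in the dtype
print(day.dtype, minute.dtype)

# The unit can also be forced with a second argument
nano = np.datetime64('2015-07-04 12:59:59.50', 'ns')
print(nano.dtype)
```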

np.datetime64('2015-07-04 12:59:59.50', 'ns')

Code	Meaning	Time span (relative)	Time span (absolute)
Y	Year	± 9.2e18 years	[9.2e18 BC, 9.2e18 AD]
M	Month	± 7.6e17 years	[7.6e17 BC, 7.6e17 AD]
W	Week	± 1.7e17 years	[1.7e17 BC, 1.7e17 AD]
D	Day	± 2.5e16 years	[2.5e16 BC, 2.5e16 AD]
h	Hour	± 1.0e15 years	[1.0e15 BC, 1.0e15 AD]
m	Minute	± 1.7e13 years	[1.7e13 BC, 1.7e13 AD]
s	Second	± 2.9e11 years	[2.9e11 BC, 2.9e11 AD]
ms	Millisecond	± 2.9e8 years	[2.9e8 BC, 2.9e8 AD]

The following table, drawn from the NumPy datetime64 documentation, lists the available format codes along with the relative and absolute timespans that they can encode
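
The spans in the table follow from the 64-bit integer storage: the finer the unit, the shorter the range. The sketch below reproduces a few of them with plain arithmetic. The year length used is an assumption of this example, not something the table specifies.

```python
import numpy as np

# datetime64 values are 64-bit integers counting units from the epoch
max_count = np.iinfo(np.int64).max          # about 9.2e18

# Mean Gregorian year in seconds (an assumption of this sketch)
SECONDS_PER_YEAR = 365.2425 * 24 * 3600

span_s = max_count / SECONDS_PER_YEAR            # years at second resolution
span_ms = max_count / 1e3 / SECONDS_PER_YEAR     # years at millisecond resolution
span_ns = max_count / 1e9 / SECONDS_PER_YEAR     # years at nanosecond resolution

print(f"s: {span_s:.1e}  ms: {span_ms:.1e}  ns: {span_ns:.0f}")
```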

Pandas TIMESTAMP
import pandas as pd
date = pd.to_datetime('4th of July, 2015')
date

Timestamp('2015-07-04 00:00:00')

numpy style operations on pandas object

date + pd.to_timedelta(np.arange(12), 'D')

DatetimeIndex(['2015-07-04', '2015-07-05', '2015-07-06', '2015-07-07',
               '2015-07-08', '2015-07-09', '2015-07-10', '2015-07-11',
               '2015-07-12', '2015-07-13', '2015-07-14', '2015-07-15'],
              dtype='datetime64[ns]', freq=None)

index = pd.DatetimeIndex(['2014-07-04', '2014-08-04',
                          '2015-07-04', '2015-08-04'])
data = pd.Series([0, 1, 2, 3], index=index)
data

Where the Pandas time series tools really become useful is when you begin to index data by timestamps

data['2014-07-04':'2015-07-04']

data['2015']

make use of any of the Series indexing patterns we discussed in previous sections, passing values that can be coerced into dates:

passing a year to obtain a slice of all data from that year:
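
A self-contained sketch of both indexing patterns. It uses `.loc` explicitly, which is a choice made for this example rather than something the cards require.

```python
import pandas as pd

index = pd.DatetimeIndex(['2014-07-04', '2014-08-04',
                          '2015-07-04', '2015-08-04'])
data = pd.Series([0, 1, 2, 3], index=index)

# Slicing with date strings includes both endpoints
between = data.loc['2014-07-04':'2015-07-04']

# A bare year selects every entry that falls in that year
in_2015 = data.loc['2015']

print(between.tolist(), in_2015.tolist())
```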

dates = pd.to_datetime([datetime(2015, 7, 3), '4th of July, 2015',
                        '2015-Jul-6', '07-07-2015', '20150708'])
dates

passing a series of dates by default yields a DatetimeIndex

dates.to_period('D')

Any DatetimeIndex can be converted to a PeriodIndex with the to_period() function with the addition of a frequency code; here we’ll use ‘D’ to indicate daily frequency:
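
As an illustration, the same timestamps can be mapped to periods of different lengths. The dates below are made up for this sketch.

```python
import pandas as pd

dates = pd.DatetimeIndex(['2015-07-03', '2015-07-04', '2015-08-06'])

daily = dates.to_period('D')     # one period per calendar day
monthly = dates.to_period('M')   # each timestamp mapped to its containing month

print(daily)
print(monthly)
```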

dates - dates[0]

A TimedeltaIndex is created, for example, when a date is subtracted from another:
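
A short sketch of that subtraction with three illustrative dates:

```python
import pandas as pd

dates = pd.to_datetime(['2015-07-03', '2015-07-04', '2015-07-06'])

# Subtracting the first timestamp from every element gives elapsed durations
elapsed = dates - dates[0]

print(elapsed)
```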

pd.date_range('2015-07-03', '2015-07-10')
pd.date_range('2015-07-03', periods=8)
pd.date_range('2015-07-03', periods=8, freq='H')

pd.date_range() accepts a start date, an end date, and an optional frequency code to create a regular sequence of dates. By default, the frequency is one day:

Alternatively, the date range can be specified not with a start and endpoint, but with a startpoint and a number of periods:

The spacing can be modified by altering the freq argument, which defaults to D. For example, here we will construct a range of hourly timestamps:
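
The three calling styles can be compared in a short sketch. The hourly range here passes an offset object rather than a string code, because the string alias for hours is spelled differently across pandas versions ('H' in older releases, 'h' in newer ones). That substitution is a choice of this example.

```python
import pandas as pd

by_end = pd.date_range('2015-07-03', '2015-07-10')    # start and end, daily by default
by_count = pd.date_range('2015-07-03', periods=8)     # start and number of periods

# Hourly spacing, given as an offset object instead of a string code
hourly = pd.date_range('2015-07-03', periods=8, freq=pd.offsets.Hour())

print(by_end.equals(by_count))
print(hourly[-1])
```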

pd.period_range('2015-07', periods=8, freq='M')

pd.timedelta_range(0, periods=10, freq='H')

To create regular sequences of Period or Timedelta values, the very similar pd.period_range() and pd.timedelta_range() functions are useful.

Code	Description	Code	Description
D	Calendar day	B	Business day
W	Weekly		
M	Month end	BM	Business month end
Q	Quarter end	BQ	Business quarter end
A	Year end	BA	Business year end
H	Hours	BH	Business hours
T	Minutes		
S	Seconds		
L	Milliseconds		
U	Microseconds		
N	Nanoseconds

Fundamental to these Pandas time series tools is the concept of a frequency or date offset. Just as we saw the D (day) and H (hour) codes above, we can use such codes to specify any desired frequency spacing. The following table summarizes the main codes available:

Code	Description	Code	Description
MS	Month start	BMS	Business month start
QS	Quarter start	BQS	Business quarter start
AS	Year start	BAS	Business year start

The monthly, quarterly, and annual frequencies are all marked at the end of the specified period. By adding an S suffix to any of these, they instead will be marked at the beginning:

Additionally, you can change the month used to mark any quarterly or annual code by adding a three-letter month code as a suffix:

Q-JAN, BQ-FEB, QS-MAR, BQS-APR, etc.
A-JAN, BA-FEB, AS-MAR, BAS-APR, etc.

In the same way, the split-point of the weekly frequency can be modified by adding a three-letter weekday code:

W-SUN, W-MON, W-TUE, W-WED, etc.
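
A brief sketch of two of these codes in use: a weekly frequency anchored on Monday and a month-start frequency. The start date is arbitrary.

```python
import pandas as pd

# Weekly frequency anchored on Monday rather than the default Sunday
mondays = pd.date_range('2015-07-01', periods=3, freq='W-MON')

# Month-start frequency
month_starts = pd.date_range('2015-07-01', periods=3, freq='MS')

print(mondays)
print(month_starts)
```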

3 letter codes

pd.timedelta_range(0, periods=9, freq="2H30T")

On top of this, codes can be combined with numbers to specify other frequencies. For example, for a frequency of 2 hours 30 minutes, we can combine the hour (H) and minute (T) codes as follows:
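
The same 2 hour 30 minute spacing can also be written without string codes. The sketch below passes a Timedelta as the frequency, which is an alternative chosen for this example because the single-letter hour and minute aliases differ between pandas versions.

```python
import pandas as pd

# 2 hours 30 minutes expressed as a Timedelta instead of a combined string code
step = pd.Timedelta(hours=2, minutes=30)

steps = pd.timedelta_range(start='0 days', periods=9, freq=step)
print(steps)
```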

from pandas.tseries.offsets import BDay

pd.date_range('2015-07-01', periods=5, freq=BDay())

All of these short codes refer to specific instances of Pandas time series offsets, which can be found in the pd.tseries.offsets module. For example, we can create a business day offset directly as follows:
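
Offset objects can also be added directly to a timestamp. A minimal sketch, using a Friday as an illustrative starting point:

```python
import pandas as pd
from pandas.tseries.offsets import BDay

friday = pd.Timestamp('2015-07-03')

next_business_day = friday + BDay()    # skips the weekend
three_later = friday + BDay(3)         # three business days ahead

print(next_business_day, three_later)
```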

The accompanying pandas-datareader package (installable via conda install pandas-datareader) knows how to import financial data from a number of available sources, including Yahoo Finance, Google Finance, and others. Here we will load Google's closing price history:

from pandas_datareader import data

goog = data.DataReader('GOOG', start='2004', end='2016',
                       data_source='yahoo')
goog.head()

datareader

import matplotlib.pyplot as plt

goog.plot(alpha=0.5, style='-')
goog.resample('BA').mean().plot(style=':')
goog.asfreq('BA').plot(style='--')
plt.legend(['input', 'resample', 'asfreq'], loc='upper left');

at each point, resample reports the average of the previous year, while asfreq reports the value at the end of the year.
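
The difference is easier to see on a tiny synthetic series. This sketch swaps in a weekly frequency and made-up values in place of the yearly stock data, purely to keep the numbers small.

```python
import pandas as pd

# Two full weeks of daily values, Monday 2015-07-06 through Sunday 2015-07-19
s = pd.Series(range(14), index=pd.date_range('2015-07-06', periods=14))

weekly_mean = s.resample('W').mean()   # aggregates all values within each week
weekly_last = s.asfreq('W')            # selects the value at each week's end

print(weekly_mean.tolist())
print(weekly_last.tolist())
```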

fig, ax = plt.subplots(2, sharex=True)
data = goog.iloc[:10]

data.asfreq('D').plot(ax=ax[0], marker='o')

data.asfreq('D', method='bfill').plot(ax=ax[1], style='-o')
data.asfreq('D', method='ffill').plot(ax=ax[1], style='--o')
ax[1].legend(["back-fill", "forward-fill"]);

For up-sampling, resample() and asfreq() are largely equivalent, though resample has many more options available. In this case, the default for both methods is to leave the up-sampled points empty, that is, filled with NA values. Just as with the fillna() method discussed previously, asfreq() accepts a method argument to specify how values are imputed. Here, we will resample the business day data at a daily frequency (i.e., including weekends):
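
The fill behavior can be checked on a two-point series that spans one weekend. The values are invented for this sketch.

```python
import pandas as pd

# Values on a Friday and the following Monday only (business days)
s = pd.Series([1.0, 2.0], index=pd.to_datetime(['2015-07-03', '2015-07-06']))

empty = s.asfreq('D')                      # weekend rows appear as NaN
forward = s.asfreq('D', method='ffill')    # weekend takes Friday's value
backward = s.asfreq('D', method='bfill')   # weekend takes Monday's value

print(empty.tolist())
print(forward.tolist())
print(backward.tolist())
```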

fig, ax = plt.subplots(3)  # three axes, since ax[0] through ax[2] are used below

# apply a frequency to the data
goog = goog.asfreq('D', method='pad')

goog.plot(ax=ax[0])
goog.shift(900).plot(ax=ax[1])
goog.tshift(900).plot(ax=ax[2])

Another common time series-specific operation is shifting of data in time. Pandas has two closely related methods for computing this: shift() and tshift(). In short, the difference between them is that shift() shifts the data, while tshift() shifts the index. In both cases, the shift is specified in multiples of the frequency. Here we will both shift() and tshift() by 900 days. Shifted values are useful, for example, for computing the one-year return on investment for Google stock over the course of the dataset.
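
A small sketch of the two kinds of shift on synthetic data. Note that tshift() was deprecated and then removed in pandas 2.0; passing freq to shift() moves the index in the same way, so this example uses that form. The return calculation uses a one-step shift on made-up values, whereas a one-year return would use a 365-day shift.

```python
import pandas as pd

s = pd.Series([10.0, 20.0, 30.0],
              index=pd.date_range('2015-07-01', periods=3, freq='D'))

moved_data = s.shift(1)              # values move down one row; index unchanged
moved_index = s.shift(1, freq='D')   # index moves one day later; values unchanged

# Percentage return relative to the previous observation
pct_return = 100 * (s / s.shift(1) - 1)

print(moved_data.tolist())
print(moved_index.index[0])
print(pct_return.tolist())
```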

rolling = goog.rolling(365, center=True)

data = pd.DataFrame({'input': goog,
                     'one-year rolling_mean': rolling.mean(),
                     'one-year rolling_std': rolling.std()})
ax = data.plot(style=['-', '--', ':'])
ax.lines[0].set_alpha(0.3)

here is the one-year centered rolling mean and standard deviation of the Google stock prices
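
The effect of center=True is easiest to see with a very short window. This sketch uses a window of 3 on invented values instead of 365 days of prices.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Centered window of 3: each result is labelled at the middle of its window,
# so the first and last positions have incomplete windows and come out as NaN
rolling = s.rolling(3, center=True)
mean = rolling.mean()
std = rolling.std()

print(mean.tolist())
print(std.tolist())
```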

data = pd.read_csv('FremontBridge.csv', index_col='Date', parse_dates=True)
data.head()

We will specify that we want the Date as an index, and we want these dates to be automatically parsed:

weekly = data.resample('W').sum()

We can gain more insight by resampling the data to a coarser grid. Let's resample by week:

daily.rolling(50, center=True,
              win_type='gaussian').sum(std=10).plot(style=[':', '--', '-']);

The jaggedness of the result is due to the hard cutoff of the window. We can get a smoother version of a rolling mean using a window function, for example a Gaussian window. The following code specifies both the width of the window (we chose 50 days) and the width of the Gaussian within the window (we chose 10 days):

weekend = np.where(data.index.weekday < 5, 'Weekday', 'Weekend')

numpy.where(condition[, x, y]) Return elements, either from x or y, depending on condition.
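
A minimal sketch of the weekday/weekend labelling on a four-day index. The dates are chosen here to span one weekend.

```python
import numpy as np
import pandas as pd

# Friday 2015-07-03 through Monday 2015-07-06
index = pd.date_range('2015-07-03', periods=4)

# weekday is 0 for Monday through 6 for Sunday, so values below 5 are weekdays
labels = np.where(index.weekday < 5, 'Weekday', 'Weekend')

print(labels.tolist())
```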

Pandas Time Series Flashcards

(32 cards)