pandas Flashcards
info about the df
df.info()
dimension
df.shape
entry at row position 3, column position 1 of df (positions are 0-based)
df.iloc[3,1]
3rd entry of column called A
df["A"].iloc[2] (df.A[2] also works when the index is the default 0, 1, 2, ...)
1st-3rd rows, 1st-3rd columns
df.iloc[0:3] for the rows, df.iloc[:, 0:3] for the columns; the leading colon is needed to select columns
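A small runnable sketch of the row and column slices above. The 4x4 frame and the names first_rows, first_cols and block are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical 4x4 frame, used only to make the slices visible.
df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=list("ABCD"))

first_rows = df.iloc[0:3]       # rows at positions 0, 1, 2; all columns
first_cols = df.iloc[:, 0:3]    # all rows; columns at positions 0, 1, 2
block = df.iloc[0:3, 0:3]       # both slices at once

print(first_rows.shape)  # (3, 4)
print(first_cols.shape)  # (4, 3)
print(block.shape)       # (3, 3)
```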
replace something in list of strings
temp_names = [word.replace(".", "_") for word in list(df)]
add to front of list
a.insert(0,x)
sort list
sorted(mylist) returns a new sorted list; mylist.sort() sorts mylist in place
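A quick sketch of the difference, using a made-up three-item list. Note that mylist.sort() returns None, so its result should not be assigned.

```python
mylist = [3, 1, 2]

new_list = sorted(mylist)   # builds a new list; mylist is untouched
print(new_list)             # [1, 2, 3]
print(mylist)               # [3, 1, 2]

result = mylist.sort()      # sorts in place and returns None
print(mylist)               # [1, 2, 3]
print(result)               # None
```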
drop element of list by index
a.pop(5)
list of lists
a[2][3]
selection from lists
[x for x in nums if x>=0]
insert items into list
r=[1,2,3,4]
r[1:1] =[9,8]
r
[1, 9, 8, 2, 3, 4]
sample dict
ratings = {'4+': 4433, '9+': 987}
looping over dict key-value pairs
for rating, count in ratings.items():
looping over keys
for rating in ratings.keys():
looping over values
for count in ratings.values():
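The three loop forms, written out with bodies so they run as-is. This sketch reuses the ratings dictionary from the card; the loop variable names and the total at the end are illustrative additions.

```python
ratings = {"4+": 4433, "9+": 987}

# Key-value pairs
for rating, count in ratings.items():
    print(rating, count)

# Keys only (iterating the dict directly does the same thing)
for rating in ratings.keys():
    print(rating)

# Values only
for count in ratings.values():
    print(count)

total = sum(ratings.values())
print(total)  # 5420
```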
repeat list
a = [2,0]*4
append vs extend
x = [1,2]
x.append([3,4]) gives [1,2,[3,4]]
x.extend([3,4]) gives [1,2,3,4]
zip and lists
x=[1,2,3]
y=[4,5,6]
list(zip(x,y))
[(1, 4), (2, 5), (3, 6)]
convert string to list
list('hello')
get multiple values from list
lst=[1,5,8,9]
indices=[1,3]
[value for (i, value) in enumerate(lst) if i in set(indices) ]
Out[35]: [5, 9]
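One thing worth noticing about the card above: the enumerate version returns values in list order, whatever the order of indices. Indexing directly follows the order of indices instead. A sketch with the same lst and a reversed indices list (the reversed order is an assumption made to show the difference):

```python
lst = [1, 5, 8, 9]
indices = [3, 1]

wanted = set(indices)  # build the set once rather than on every iteration
in_list_order = [value for i, value in enumerate(lst) if i in wanted]
in_index_order = [lst[i] for i in indices]

print(in_list_order)   # [5, 9]
print(in_index_order)  # [9, 5]
```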
count number of occurrences in list
y = [1,2,3,1,4]
y.count(1)
2
get index of item in list
first index:
["foo", "bar", "baz"].index("bar")
all indices
indexes = [i for i, x in enumerate(xs) if x == 'foo']
unique elements of list
mynewlist = list(set(mylist))
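Converting through a set does not keep the original order. If order matters, one common alternative is dict.fromkeys, which keeps the first occurrence of each item (dicts preserve insertion order in Python 3.7+). A sketch with a made-up list:

```python
mylist = ["b", "a", "b", "c", "a"]

unordered = list(set(mylist))          # unique items, order not guaranteed
ordered = list(dict.fromkeys(mylist))  # unique items, first-seen order

print(sorted(unordered))  # ['a', 'b', 'c']
print(ordered)            # ['b', 'a', 'c']
```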
Example creating df
In [9]: df2 = pd.DataFrame(
   ...:     {
   ...:         "A": 1.0,
   ...:         "B": pd.Timestamp("20130102"),
   ...:         "C": pd.Series(1, index=list(range(4)), dtype="float32"),
   ...:         "D": np.array([3] * 4, dtype="int32"),
   ...:         "E": pd.Categorical(["test", "train", "test", "train"]),
   ...:         "F": "foo",
   ...:     }
   ...: )
   ...:
In [10]: df2
Out[10]:
     A          B    C  D      E    F
0  1.0 2013-01-02  1.0  3   test  foo
1  1.0 2013-01-02  1.0  3  train  foo
2  1.0 2013-01-02  1.0  3   test  foo
3  1.0 2013-01-02  1.0  3  train  foo
Types of columns
df.dtypes
Summary each column
df.describe()
Transpose
df.T
Sort by an index or by a column
In [22]: df.sort_index(axis=1, ascending=False)
Out[22]:
                   D         C         B         A
2013-01-01 -1.135632 -1.509059 -0.282863  0.469112
2013-01-02 -1.044236  0.119209 -0.173215  1.212112
….
In [23]: df.sort_values(by="B")
Out[23]:
                   A         B         C         D
2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
2013-01-04  0.721555 -0.706771 -1.039575  0.271860
2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
….
Select a row by its index label
In [27]: df.loc[dates[0]]
Out[27]:
A    0.469112
B   -0.282863
C   -1.509059
D   -1.135632
Name: 2013-01-01 00:00:00, dtype: float64
Select by multiple indices
NOTE: For label slicing, both endpoints are included:
In [29]: df.loc["20130102":"20130104", ["A", "B"]]
Out[29]:
                   A         B
2013-01-02  1.212112 -0.173215
2013-01-03 -0.861849 -2.104569
2013-01-04  0.721555 -0.706771
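To make the endpoint rule concrete: a label slice with .loc includes both ends, while a positional slice with .iloc excludes the stop position. The sketch below uses illustrative random data rather than the numbers shown in the cards; the date range and column names are assumed to match the transcripts.

```python
import numpy as np
import pandas as pd

# Illustrative frame: six consecutive dates, random values.
dates = pd.date_range("20130101", periods=6)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((6, 4)), index=dates, columns=list("ABCD"))

by_label = df.loc["20130102":"20130104", ["A", "B"]]  # both endpoints included
by_position = df.iloc[1:4, 0:2]                        # stop position excluded

print(by_label.shape)                # (3, 2)
print(by_label.equals(by_position))  # True
```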
For getting fast access to a scalar
In [31]: df.at[dates[0], "A"]
Out[31]: 0.4691122999071863
Same result as .loc below, but .at is faster for a single value
In [30]: df.loc[dates[0], "A"]
Out[30]: 0.4691122999071863
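A short check that the scalar accessors agree. The frame here is made up for illustration; .iat, the positional counterpart of .at, does not appear in the cards above and is included only for comparison.

```python
import numpy as np
import pandas as pd

# Illustrative frame with predictable values.
dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.arange(24.0).reshape(6, 4), index=dates, columns=list("ABCD"))

via_at = df.at[dates[0], "A"]    # scalar access by label
via_loc = df.loc[dates[0], "A"]  # same value through the general label indexer
via_iat = df.iat[0, 0]           # scalar access by position

print(via_at, via_loc, via_iat)  # 0.0 0.0 0.0
```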
Select multiple rows and columns by position
In [33]: df.iloc[3:5, 0:2]
Out[33]:
                   A         B
2013-01-04  0.721555 -0.706771
2013-01-05 -0.424972  0.567020