Pandas Flashcards
Create a new DataFrame
df = pd.DataFrame() (Atenção com as maiúsculas!)
parâmetro axis=0
eixo x
parâmetro axis=1
eixo y
selecionar colunas W e Z do dataframe
df[[‘W’,’Z’]]
cria coluna nova no dataframe
df[‘new’] = df[‘W’] + df[‘Y’]
Algo to cut the lines of the DataFrame up to a value
Find the line index using line= df.loc[df[‘COL’] == ‘Limite’].index.min()
df = df[df.index < line]
Cut the columns on a DataFrame
col_names = [‘Data’, ‘A’]
df = df[col_names]
Caps Lock no nome das colunas
df.columns = df.columns.str.upper()
Substituir nan em uma coluna por um valor
df[‘A’] = df[‘A’].fillna(‘0,0’)
aplicar uma funcao em uma serie
df[‘A’] = df[‘A’].apply(function)
change a DataFrame column from string to date
import datetime
df[‘A’] = pd.to_datetime(df[‘A’], format=’%d/%m/%y’)
truncate (cut) values on a column
df[‘A’] = df[‘A’].map(str).str.slice(0,10)
consolida dataframes em um dataframe final
df_final.append(df_bradesco)
df_final = pd.concat(df_final, axis=0)
transforma uma coluna para o tipo string
df[‘A’].astype(str)
remover duplicatas de um dataframe
df.drop_duplicates(inplace=True)
left join de dois dataframes
df_final = pd.merge(df_final,df_teste,how=’left’,left_on=’KEY’,right_on=’KEY’)
para cada linha da serie, recebe um valor de outra serie
df.at[chave_base, ‘A’] = df_teste.at[chave_teste, ‘A’]
Saving DataFrames to an Excel Workbook
from pandas import ExcelWriter
writer = ExcelWriter(‘filename.xlsx’)
df1. to_excel(writer, ‘Sheet1’)
writer. save()
Save DataFrame as a dictionary
d = df.to_dict()
Save DataFrame as a string
str = df.to_string()
Save DataFrame as a numpy matrix
m = df.to_matrix()
Transpose rows and columns in a DataFrame
df = df.T
Iterate between columns
df.iteritems()
Iterate between rows
df.iterrows()
Filter a DataFrame with like
df = df.filter(like=’x’)
Get first column label in a DataFrame
label = df.columns[0]
Get list of column labels in a DataFrame
lis = df.columns.tolist()
Get an array of column labels in a DataFrame
a = df.columns.values
Select column to series
s= df[‘colName’]
Select column to DataFrame
df = df[[‘colName] ]
Select all but last column to a DataFrame
df = df[df.columns[:-1] ]
Swap columns content in a DataFrame
df[[‘B’, ‘A’] ] = df[[‘A’, ‘B’] ]
Dropping (deleting) columns
df.drop(‘col1’, axis=1, inplace=True)
Apply log to a column
df[‘log_data’] = np.log( df[‘col1’] )
Set column values based on criteria
df]’d’] = df[‘a].where(df.b != 0, other=df.c)
Find index label for min/max values in column
label = df[‘col1’].idxmin()
label = df[‘col1’].idxmax()
Module of a column in a DataFrame
df[‘col’] = df[‘col’].abs()
Convert column to date
s = df[‘col’],to_datetime()
Create a column with a rolling pct change
s = df[‘col’].pct_change(periods=4)
Create a column with a rolling calculation
s = df[‘col’].rolling(window=4, min_periods=4, center=False).sum()
Append a column of row sums in a DataFrame
df[‘Total’] = df.sum(axis=1)
Get the integer position of a column index label
i = df.columns.get_loc(‘col_name’)
Adding rows to a DataFrame
df = original_df.append(more_rows_in_df)
Dropping rows (by name)
df = df.drop(‘row_label’)
Boolean row selection by values greater than
df = df[df[‘col2’] >= 0.0]
Boolean row selection by values with OR condition
df = df[(df[‘col3’] >= 0.0) | (df[‘col1’] < 0.0)
need parenthesis around comparisons
Boolean row selection by values in list
df = df[df[‘col’].isin( [1, 2, 5, 7, 11] ) ]
Boolean row selection by values NOT in list
df = df[~ df[‘col’].isin( [1, 2, 5, 7, 11] ) ]
Boolean row selection by values containing
df = df[df[‘col’].str.contains(‘hello’) ]
Get integer position of rows that meet condition
a = np.where(df[‘col’] >= 2)
produces a numpy array
Find row index duplicates
if df.items.has_duplicates:
prin(df.index.duplicated() )
Select a cell by row and column labels
value = df.at[‘row’, ‘col’]
.at[] is the fastest label based scalar lookup
Grouping with an aggregating function
s = df.groupby(‘cat’)[‘col1’].sum()
Change a string to lower case
s = df[‘col’].str.lower()
Get the length of the strings in a column
s =df[‘col’].str.len()
Append values to a string
df[‘col’] += ‘suffix’
Filter a DataFrame with a list of items
df = df.filter(items[‘a’ , ‘b’], axis=0)
Remove lines with NaN
df.dropna(inplace=True)
Definition of the Strip method for strings
Returns a copy of the string with both leading and trailing characters removed (based on the string argument passed).