Pandas 2 Flashcards
Row slicing
dict(df.iloc[-2])  # second-to-last row as a {column: value} dict
df.iloc[-n, :]  # nth row from the end, all columns
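A small runnable sketch of the row-slicing card, using a hypothetical three-row DataFrame (the data is made up for illustration):

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"name": ["a", "b", "c"], "score": [10, 20, 30]})

n = 2
second_last = dict(df.iloc[-2])  # row 'b' as a {column: value} dict
nth_from_end = df.iloc[-n, :]    # the same row, as a Series indexed by column
print(second_last)
print(nth_from_end.tolist())
```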
Column slicing
df.iloc[:, -n]  # nth column from the end, all rows
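A quick sketch of column slicing by position, on a hypothetical three-column DataFrame. A single negative integer returns one column as a Series; a slice such as `-2:` returns a DataFrame.

```python
import pandas as pd

# Hypothetical example data with three columns
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

n = 1
last_col = df.iloc[:, -n]    # Series holding column 'c'
last_two = df.iloc[:, -2:]   # DataFrame holding columns 'b' and 'c'
print(last_col.name, last_col.tolist())
print(list(last_two.columns))
```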
Check empty df
if df.empty:
    print("Empty")
or check len(df) == 0 (counts rows only; df.empty is also True when there are no columns)
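The two checks disagree for a DataFrame that has rows but no columns. This sketch builds both edge cases with hypothetical frames:

```python
import pandas as pd

no_rows = pd.DataFrame({"a": []})          # one column, zero rows
no_cols = pd.DataFrame(index=[0, 1, 2])    # three rows, zero columns

for name, frame in [("no_rows", no_rows), ("no_cols", no_cols)]:
    # df.empty is True if either axis has length 0; len() only counts rows
    print(name, frame.empty, len(frame) == 0)
```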
Set column to index
df = df.set_index('column name')  # returns a new DataFrame: assign it (or pass inplace=True)
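A sketch showing why the result of `set_index` has to be kept, using a hypothetical `id` column:

```python
import pandas as pd

df = pd.DataFrame({"id": [101, 102], "city": ["Delhi", "Pune"]})

df.set_index("id")           # returns a new DataFrame; df itself is unchanged
print(df.index.tolist())     # still the default 0..n-1 index

df = df.set_index("id")      # reassign to keep the result
print(df.index.tolist())
print(df.loc[102, "city"])   # rows can now be looked up by id
```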
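Select values in one column based on a condition on another
df.loc[df['column name'] == 'condition']  # full matching rows
df['other column name'].loc[df['column name'] == 'condition']
df.loc[df['column name'] == 'condition', 'other column name']  # same values in a single .loc call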
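A runnable version of the selection card with hypothetical data: the boolean mask picks the rows, and the second `.loc` argument picks the column.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Delhi", "Pune", "Delhi"],
    "name": ["Saurabh", "Raju", "Ujjawal"],
})

mask = df["city"] == "Delhi"     # boolean Series, one entry per row
full_rows = df.loc[mask]         # all columns of the matching rows
names = df.loc[mask, "name"]     # only the 'name' column of those rows
print(names.tolist())
```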
Pandas list append (DataFrame.append was removed in pandas 2.0; use pd.concat)
df = pd.concat([df, pd.DataFrame([the_list], columns=df.columns)], ignore_index=True)
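`DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0, so appending a list now goes through `pd.concat`. A minimal sketch, where `the_list` is a hypothetical row with one value per column:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Raju"], "age": [21]})
the_list = ["Saurabh", 23]   # hypothetical row, one value per column

# Wrap the list in a one-row DataFrame with matching columns, then concatenate
new_row = pd.DataFrame([the_list], columns=df.columns)
df = pd.concat([df, new_row], ignore_index=True)
print(df)
```

Each `pd.concat` call copies the data, so when adding many rows it is cheaper to collect them in a list and build or concatenate once.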
Pandas datatype change
df = df.astype({"col name": "datatype"})  # e.g. "int64", "float64", "category"
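A sketch of per-column conversion with a dict, on hypothetical data where `age` arrives as strings. Columns not named in the dict keep their current dtype.

```python
import pandas as pd

df = pd.DataFrame({"age": ["21", "23"], "city": ["Delhi", "Pune"]})

# Map column names to target dtypes
df = df.astype({"age": "int64", "city": "category"})
print(df.dtypes)
```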
Delete pandas rows on condition
df = df.drop(
    df[(df.score < 50) & (df.score > 20)].index
)  # drops the rows where 20 < score < 50
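A runnable sketch of the drop card with a hypothetical `score` column, plus the equivalent boolean-mask form. `drop` works on index labels, so with a non-unique index it can also remove other rows that share a label; the mask form avoids that.

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "c", "d"], "score": [10, 30, 45, 60]})

# Index labels of rows with 20 < score < 50
to_drop = df[(df.score < 50) & (df.score > 20)].index
df_dropped = df.drop(to_drop)

# Equivalent: keep the rows where the condition is False
df_kept = df[~((df.score < 50) & (df.score > 20))]
print(df_dropped)
```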
Append rows to dataframe using series
# A Series object whose index matches the DataFrame's columns
series_obj = pd.Series(['Raju', 21, 'Bangalore', 'India'], index=df.columns)
# Add the Series as a row (DataFrame.append was removed in pandas 2.0)
mod_df = pd.concat([df, series_obj.to_frame().T], ignore_index=True)
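A mixed-type Series has object dtype, so the one-row frame built from it does too, and concatenating can turn numeric columns into object. This sketch, with hypothetical column names, calls `infer_objects()` on the row first so `age` stays integer:

```python
import pandas as pd

df = pd.DataFrame(
    [["Saurabh", 23, "Delhi", "india"]],
    columns=["name", "age", "city", "country"],
)
series_obj = pd.Series(['Raju', 21, 'Bangalore', 'India'], index=df.columns)

# Series -> one-column frame -> transpose to one row; infer_objects() restores int64 for 'age'
row = series_obj.to_frame().T.infer_objects()
mod_df = pd.concat([df, row], ignore_index=True)
print(mod_df.dtypes)
```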
Append rows to dataframe using loc
# New list to append to df
new_row = ["Saurabh", 23, "Delhi", "india"]
# Using loc: assigning to a new label adds a row (assumes the default 0..n-1 index)
df.loc[len(df)] = new_row
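A sketch of the loc card on a hypothetical one-row DataFrame. `len(df)` is a fresh label only while the index is 0..n-1; with any other index it may match an existing label and overwrite that row instead of appending.

```python
import pandas as pd

df = pd.DataFrame(
    [["Raju", 21, "Bangalore", "India"]],
    columns=["name", "age", "city", "country"],
)

new_row = ["Saurabh", 23, "Delhi", "india"]
# Setting a missing label with .loc enlarges the DataFrame by one row
df.loc[len(df)] = new_row
print(df)
```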
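Overwrite a row using iloc (iloc cannot append)
# New list to write into df
new_row = ['Ujjawal', 22, 'Fathua', 'India']
# Using iloc: replaces the existing row at position 2; an out-of-range position raises IndexError
df.iloc[2] = new_row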
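A sketch showing that `iloc` assignment replaces a row in place and refuses to enlarge the frame. The third row is a hypothetical placeholder.

```python
import pandas as pd

df = pd.DataFrame(
    [["Raju", 21, "Bangalore", "India"],
     ["Saurabh", 23, "Delhi", "india"],
     ["placeholder", 0, "-", "-"]],  # hypothetical row to be overwritten
    columns=["name", "age", "city", "country"],
)

new_row = ['Ujjawal', 22, 'Fathua', 'India']
df.iloc[2] = new_row            # overwrites position 2; still 3 rows

enlarge_failed = False
try:
    df.iloc[len(df)] = new_row  # position 3 does not exist
except IndexError as err:
    enlarge_failed = True
    print("iloc cannot append:", err)
```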
How to check empty df
if df.empty:
    print('DataFrame is empty!')
tqdm
What is tqdm? tqdm is a Python library that prints a smart progress bar when you wrap it around any iterable. The bar shows how much time has elapsed and, when the iterable's length is known, the estimated time remaining.
import pandas as pd
import numpy as np
from tqdm import tqdm
# Generate a DataFrame of random integers with shape 1,000 x 1,000
df = pd.DataFrame(np.random.randint(0, 100, (1000, 1000)))
# Register `pandas.progress_apply` with `tqdm`
tqdm.pandas(desc='Processing Dataframe')
# Add 3 to each value, then cube it, column by column across the whole DataFrame
df.progress_apply(lambda x: (x + 3) ** 3)
List of uneven lists to pandas -> one row per inner list
import itertools
import numpy as np
import pandas as pd
ff = [[1, 2, 3, 4], [1, 2], [], [2, 2, 2]]
df = pd.DataFrame(list(itertools.zip_longest(*ff, fillvalue=np.nan))).T  # pad short lists with NaN
# 1,   2,   3,   4
# 1,   2,   nan, nan
# nan, nan, nan, nan
# 2,   2,   2,   nan
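A step-by-step sketch of the zip_longest trick. `fillvalue=np.nan` is an assumption added here; without it the padding is `None`. `zip_longest(*ff)` walks the inner lists in parallel, so each tuple holds one position from every list, and the final `.T` turns each inner list back into a row.

```python
import itertools
import numpy as np
import pandas as pd

ff = [[1, 2, 3, 4], [1, 2], [], [2, 2, 2]]

# One tuple per position, shorter lists padded with NaN
columns_first = list(itertools.zip_longest(*ff, fillvalue=np.nan))
print(columns_first[0])   # first element of every inner list

# Each tuple becomes a row; transposing makes each inner list a row again
df = pd.DataFrame(columns_first).T
print(df)
```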
List of uneven lists to pandas -> one column per inner list
# values of res[213]: [[1, 2, 3, 4], [1, 2], [], [2, 2, 2]]
df = pd.DataFrame(list(res[213].values())).T
# 1, 1,   nan, 2
# 2, 2,   nan, 2
# 3, nan, nan, 2
# 4, nan, nan, nan
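A sketch of the column-per-list card with a hypothetical dict `d` standing in for `res[213]`. It also shows an alternative that wraps each list in a `pd.Series`: pandas aligns the Series on their index and pads the short ones with NaN, and the dict keys become column names (the transpose approach labels columns 0..3 instead).

```python
import pandas as pd

# Hypothetical dict of uneven lists, standing in for res[213]
d = {"a": [1, 2, 3, 4], "b": [1, 2], "c": [], "d": [2, 2, 2]}

# Card approach: one row per list, then transpose
df_t = pd.DataFrame(list(d.values())).T

# Alternative: one Series per key, aligned on index and padded with NaN
df_s = pd.DataFrame({k: pd.Series(v, dtype="float64") for k, v in d.items()})
print(df_s)
```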