Subset Observations (Rows) and Subset Variables (Columns) Flashcards
Extract rows that meet logical criteria.
df[df.Length > 7]
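A minimal sketch of boolean filtering, assuming a small hypothetical DataFrame whose `Length` column mirrors the card. The comparison produces a boolean Series, and indexing with it keeps only the rows marked True.

```python
import pandas as pd

# Hypothetical toy data; "Length" mirrors the column name used on the card.
df = pd.DataFrame({"Length": [5, 8, 12, 3], "species": ["a", "b", "c", "d"]})

# The comparison returns a boolean Series aligned on the index.
mask = df.Length > 7
long_rows = df[mask]  # keeps rows where mask is True
print(long_rows)

# Bracket access is equivalent and also works for names that are not valid identifiers.
same_rows = df[df["Length"] > 7]
```

Combine several conditions with `&` and `|`, wrapping each comparison in parentheses.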
Remove duplicate rows (only column values are compared; the index is ignored).
df.drop_duplicates()
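A minimal sketch, assuming hypothetical data in which two rows share the same column values but carry different index labels. Because only column values are compared, those rows count as duplicates; `keep` and `subset` are documented parameters of `drop_duplicates`.

```python
import pandas as pd

# Hypothetical data: labels 10 and 30 hold identical column values.
df = pd.DataFrame({"a": [1, 2, 1], "b": ["x", "y", "x"]}, index=[10, 20, 30])

# Default keep="first": the first occurrence (label 10) survives.
deduped = df.drop_duplicates()

# keep="last" keeps the final occurrence instead (label 30).
deduped_last = df.drop_duplicates(keep="last")

# subset= restricts the comparison to the listed columns.
by_a = df.drop_duplicates(subset=["a"])
print(deduped)
```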
Select first n rows.
df.head(n)
Select last n rows.
df.tail(n)
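A minimal sketch of `head` and `tail` on a hypothetical ten-row DataFrame. Both take the number of rows as `n` and default to 5 when it is omitted.

```python
import pandas as pd

# Hypothetical data: one column holding 0..9.
df = pd.DataFrame({"x": range(10)})

first_three = df.head(3)  # rows at positions 0, 1, 2
last_two = df.tail(2)     # rows at positions 8, 9
default_head = df.head()  # n defaults to 5
print(first_three, last_two, sep="\n")
```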
Randomly select a fraction of rows.
df.sample(frac=0.5)
Randomly select n rows.
df.sample(n=10)
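A minimal sketch of both sampling forms, assuming a hypothetical 20-row DataFrame. Passing `random_state` makes the draw reproducible, and rows are sampled without replacement by default. `frac` and `n` are alternatives; supplying both raises a `ValueError`.

```python
import pandas as pd

# Hypothetical data: 20 rows.
df = pd.DataFrame({"x": range(20)})

# frac=0.5 of 20 rows gives 10 rows; random_state fixes the draw.
half = df.sample(frac=0.5, random_state=0)

# n=10 asks for an exact row count instead.
ten = df.sample(n=10, random_state=0)

# The same seed reproduces the same sample.
ten_again = df.sample(n=10, random_state=0)
```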
Select rows by position.
df.iloc[10:20]
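A minimal sketch, assuming a hypothetical DataFrame with string row labels so it is clear that `iloc` ignores labels and uses integer positions. As with Python slices, the end position is excluded, so `10:20` returns positions 10 through 19.

```python
import pandas as pd

# Hypothetical data with string labels "r0".."r29".
df = pd.DataFrame({"x": range(30)}, index=[f"r{i}" for i in range(30)])

# Positions 10..19; the stop position 20 is excluded.
block = df.iloc[10:20]
print(block.index[0], block.index[-1])
```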
Get the n rows with the largest values in the given columns, sorted in descending order.
DataFrame.nlargest(n, columns, keep='first')
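A minimal sketch, assuming hypothetical scores that include a tie. With `keep='first'` ties are resolved by order of appearance; `keep='all'` keeps every tied row even if that returns more than n rows.

```python
import pandas as pd

# Hypothetical data: "b" and "c" tie on the top score.
df = pd.DataFrame({"name": ["a", "b", "c", "d"], "score": [7, 9, 9, 3]})

# Two largest scores; rows come back in descending order.
top2 = df.nlargest(2, "score")

# n=1, but keep="all" returns both tied rows.
top1_all = df.nlargest(1, "score", keep="all")
print(top2)
```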
Get the n rows with the smallest values in the given columns, sorted in ascending order.
DataFrame.nsmallest(n, columns, keep='first')
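A minimal sketch, assuming hypothetical scores without ties. For this data, `nsmallest` returns the same rows as sorting ascending and taking the head, though `nsmallest` is typically faster on large frames.

```python
import pandas as pd

# Hypothetical data with distinct scores.
df = pd.DataFrame({"name": ["a", "b", "c", "d"], "score": [7, 9, 4, 3]})

# Two smallest scores, in ascending order.
bottom2 = df.nsmallest(2, "score")

# Equivalent (here) to a full sort followed by head().
via_sort = df.sort_values("score").head(2)
print(bottom2)
```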
Select multiple columns with specific names.
df[['width', 'length', 'species']]
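A minimal sketch, assuming a hypothetical DataFrame with an extra column to drop. The inner brackets form a Python list, so the result is a DataFrame whose columns follow the list order.

```python
import pandas as pd

# Hypothetical data; "extra" is left out of the selection.
df = pd.DataFrame({
    "species": ["a", "b"],
    "width": [1.0, 2.0],
    "length": [3.0, 4.0],
    "extra": [0, 0],
})

# A list of names returns a DataFrame in the requested column order.
subset = df[["width", "length", "species"]]
print(subset)
```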
Select a single column with a specific name.
df['width'] or df.width
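A minimal sketch, assuming a hypothetical DataFrame that also contains a column named `count`. A single name returns a Series. Attribute access only works when the name is a valid identifier that does not clash with a DataFrame method, so `df.count` is the method, not the column.

```python
import pandas as pd

# Hypothetical data; "count" collides with the DataFrame.count method.
df = pd.DataFrame({"width": [1.0, 2.0], "count": [5, 6]})

w_bracket = df["width"]  # a Series
w_attr = df.width        # same Series via attribute access

# df.count is a bound method, so bracket access is required here.
counts = df["count"]
print(w_bracket)
```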
Select columns whose names match the regular expression regex.
df.filter(regex='regex')
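A minimal sketch, assuming hypothetical column names. `filter(regex=...)` keeps labels for which `re.search` finds a match, so an unanchored pattern matches anywhere in the name; use `^` and `$` to anchor it.

```python
import pandas as pd

# Hypothetical data with four columns.
df = pd.DataFrame([[1, 2, 3, 4]], columns=["x1", "x2", "y1", "len_x"])

anywhere = df.filter(regex="x")        # "x" anywhere in the name
x_digit = df.filter(regex=r"^x\d$")    # exactly "x" followed by one digit
suffix = df.filter(regex=r"_x$")       # names ending in "_x"
print(list(anywhere.columns))
```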
Select all columns between x2 and x4 (inclusive).
df.loc[:, 'x2':'x4']
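A minimal sketch, assuming hypothetical columns `x1` to `x5`. Label slices with `loc` include both endpoints, while position slices with `iloc` exclude the stop position.

```python
import pandas as pd

# Hypothetical data with five columns.
df = pd.DataFrame([[1, 2, 3, 4, 5]], columns=["x1", "x2", "x3", "x4", "x5"])

# Label slice: both "x2" and "x4" are included.
between = df.loc[:, "x2":"x4"]

# Position slice for comparison: stop position 3 ("x4") is excluded.
by_position = df.iloc[:, 1:3]
print(list(between.columns), list(by_position.columns))
```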
Select columns in positions 1, 2 and 5 (first column is 0).
df.iloc[:,[1,2,5]]
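A minimal sketch, assuming hypothetical columns named `a` to `f`. A list of integers passed to `iloc` picks those positions regardless of column names, counting from 0.

```python
import pandas as pd

# Hypothetical data: columns "a".."f" at positions 0..5.
df = pd.DataFrame([list(range(6))], columns=list("abcdef"))

# All rows, columns at positions 1, 2 and 5.
picked = df.iloc[:, [1, 2, 5]]
print(list(picked.columns))
```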
Select rows meeting a logical condition, and only the specified columns.
df.loc[df['a'] > 10, ['a', 'c']]
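A minimal sketch, assuming a hypothetical DataFrame with columns `a`, `b` and `c`. The first argument to `loc` filters rows with a boolean mask and the second selects columns. The same form also works for assignment, which updates the original frame in one step rather than through chained indexing.

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({"a": [5, 15, 25], "b": [1, 2, 3], "c": ["p", "q", "r"]})

# Rows where a > 10, restricted to columns a and c.
result = df.loc[df["a"] > 10, ["a", "c"]]
print(result)

# Assigning through a single loc call modifies df directly.
df.loc[df["a"] > 10, "b"] = 0
```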