Pandas Flashcards

Question 1

Q

Read a CSV file into a DataFrame named data.

Answer

A

import pandas as pd
data = pd.read_csv('weights_heights.csv', index_col='Index')

Question 2

Q

Plot a histogram of the Height column.

Answer

A

data.plot(y='Height', kind='hist',
          color='red', title='Height')

Question 3

Q

Look at the first five records in the DataFrame.

Answer

A

data.head(5)

Question 4

Q

Use a lambda function to create a new DataFrame column whose values are the result of applying the function below to two other columns.

def make_bmi(height_inch, weight_pound):
    METER_TO_INCH, KILO_TO_POUND = 39.37, 2.20462
    return (weight_pound / KILO_TO_POUND) / \
           (height_inch / METER_TO_INCH) ** 2

Answer

A

data['BMI'] = data.apply(lambda row: make_bmi(row['Height'],
                                              row['Weight']), axis=1)
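
Since make_bmi uses only arithmetic, it can probably also be called on whole columns at once, which avoids the row-by-row apply. A minimal sketch, assuming a small made-up DataFrame in place of weights_heights.csv (the sample values and the BMI_vectorized column name are hypothetical):

```python
import pandas as pd

# Hypothetical sample data standing in for weights_heights.csv.
data = pd.DataFrame({
    "Height": [65.0, 70.0, 72.0],     # inches
    "Weight": [120.0, 150.0, 180.0],  # pounds
})


def make_bmi(height_inch, weight_pound):
    METER_TO_INCH, KILO_TO_POUND = 39.37, 2.20462
    return (weight_pound / KILO_TO_POUND) / \
           (height_inch / METER_TO_INCH) ** 2


# Row-wise version from the card: the lambda receives one row at a time.
data["BMI"] = data.apply(
    lambda row: make_bmi(row["Height"], row["Weight"]), axis=1
)

# Vectorized version: the same function applied to whole columns (Series).
data["BMI_vectorized"] = make_bmi(data["Height"], data["Weight"])
```

Both columns should hold the same values; the vectorized call is usually faster on large frames because it avoids a Python-level call per row.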

Question 5

Q

Create a new column that assigns a category to each value of another column, using the function below.

def weight_category(weight):
    if weight < 120:
        return 1
    elif weight > 150:
        return 3
    else:
        return 2

Answer

A

data['weight_category'] = data['Weight'].apply(weight_category)
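
The same thresholds can also be expressed without a Python-level function call per value. A minimal sketch using numpy.select, assuming made-up weights chosen to hit the boundary values 120 and 150 (the sample data and the weight_category_np column name are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical weights, including the boundary values 120 and 150.
data = pd.DataFrame({"Weight": [100.0, 120.0, 135.0, 150.0, 151.0]})


def weight_category(weight):
    if weight < 120:
        return 1
    elif weight > 150:
        return 3
    else:
        return 2


# Element-wise version from the card.
data["weight_category"] = data["Weight"].apply(weight_category)

# Vectorized alternative: the first matching condition wins, else default.
data["weight_category_np"] = np.select(
    [data["Weight"] < 120, data["Weight"] > 150],
    [1, 3],
    default=2,
)
```

numpy.select is used here rather than pandas.cut because the function treats both 120 and 150 as category 2, which a single set of half-open bins does not reproduce directly.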

Question 6

Q

Plot a scatterplot of Height against Weight.

Answer

A

data.plot(x='Weight', y='Height', kind='scatter', title='Height/Weight')

Question 7

Q

View summary statistics for the features in the DataFrame.

Answer

A

data.describe()

Question 8

Q

Create a new DataFrame X_sub from the first 3 columns of the DataFrame data.

Answer

A

X_sub = data.iloc[:, [0, 1, 2]]
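
Selecting by position depends on column order, so selecting by name may be easier to read when the names are known. A minimal sketch, assuming a made-up DataFrame whose first three columns are Height, Weight and BMI (the values and the variable names X_by_position and X_by_name are hypothetical):

```python
import pandas as pd

# Hypothetical frame with the columns built up in the earlier cards.
data = pd.DataFrame({
    "Height": [65.0, 70.0],
    "Weight": [120.0, 150.0],
    "BMI": [20.0, 21.5],
    "weight_category": [2, 2],
})

# By position, as on the card: all rows, columns 0, 1 and 2.
X_by_position = data.iloc[:, [0, 1, 2]]

# By name: a list of column labels inside the indexing brackets.
X_by_name = data[["Height", "Weight", "BMI"]]
```

With this column order the two selections are identical; if the columns were reordered, only the name-based one would keep returning the same data.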

Question 9

Q

Create a NumPy array X_np from the pandas DataFrame data.

Answer

A

X_np = data.to_numpy()  # data.values also works

Question 10

Q

Load the large data file 'checkins.dat' into Python.

Answer

A

checkins = pd.read_csv('checkins.dat', header=0, skipinitialspace=True,
                       names=['lat', 'lng'], usecols=[3, 4],
                       engine='python', sep='|', skipfooter=1)

Question 11

Q

Drop the rows of a pandas DataFrame whose value in a certain column is NaN.
(In short: keep only the rows where that column is finite.)

Answer

A

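1) df = df[np.isfinite(df['all_integer'])]  # requires: import numpy as np
2) df.dropna()  # returns a new DataFrame without rows that contain any NaN value
3) df.dropna(how='all')  # returns a new DataFrame without rows where ALL columns are NaN
4) df[df.all_integer.notnull()]  # keeps only rows where all_integer is not NaN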
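
The difference between these variants is easiest to see on a tiny frame. A minimal sketch, assuming made-up data with a second column named other (hypothetical); it also shows dropna with the subset parameter, which restricts the NaN check to the listed columns:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 1 is NaN in every column.
df = pd.DataFrame({
    "all_integer": [1.0, np.nan, 3.0, np.nan],
    "other": [np.nan, np.nan, 6.0, 7.0],
})

# Check only the all_integer column for NaN.
by_column = df.dropna(subset=["all_integer"])

# Drop rows with a NaN in any column.
any_nan = df.dropna()

# Drop rows only when every column is NaN.
all_nan = df.dropna(how="all")

# Boolean mask, as in variant 4 on the card.
masked = df[df["all_integer"].notnull()]
```

Each call returns a new DataFrame and leaves df unchanged, so the result has to be assigned to a name to be kept.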

(11 cards)