Pandas Flashcards

1
Q

Read a CSV file into a DataFrame named data

A

data = pd.read_csv('weights_heights.csv', index_col='Index')
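
A minimal sketch of the same call, assuming a small in-memory CSV in place of weights_heights.csv (the sample rows are made up for illustration):

```python
import io
import pandas as pd

# Hypothetical sample standing in for weights_heights.csv
csv_text = "Index,Height,Weight\n1,65.8,113.0\n2,71.5,136.5\n3,69.4,153.0\n"

# index_col='Index' turns the Index column into the row labels
# instead of leaving it as an ordinary data column
data = pd.read_csv(io.StringIO(csv_text), index_col='Index')
print(data)
```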

2
Q

Plot a histogram of a column

A

data.plot(y='Height', kind='hist',
          color='red', title='Height')

3
Q

View the first five records of a DataFrame

A

data.head(5)

4
Q

Use a lambda function to create a new DataFrame column that is the result of a function applied to two other columns

def make_bmi(height_inch, weight_pound):
    METER_TO_INCH, KILO_TO_POUND = 39.37, 2.20462
    return (weight_pound / KILO_TO_POUND) / \
           (height_inch / METER_TO_INCH) ** 2
A

data['BMI'] = data.apply(lambda row: make_bmi(row['Height'],
                                              row['Weight']), axis=1)
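
Because make_bmi uses only arithmetic, it can also be called on whole columns at once. A minimal sketch comparing the row-wise apply with that vectorized call, assuming a tiny made-up DataFrame:

```python
import numpy as np
import pandas as pd

METER_TO_INCH, KILO_TO_POUND = 39.37, 2.20462

def make_bmi(height_inch, weight_pound):
    # BMI = weight in kg divided by squared height in metres
    return (weight_pound / KILO_TO_POUND) / (height_inch / METER_TO_INCH) ** 2

# Made-up rows for illustration only
data = pd.DataFrame({'Height': [65.8, 71.5], 'Weight': [113.0, 136.5]})

# Row-wise version from the card: the lambda is called once per row
bmi_apply = data.apply(lambda row: make_bmi(row['Height'], row['Weight']), axis=1)

# Vectorized version: the same function applied to entire columns
bmi_vector = make_bmi(data['Height'], data['Weight'])

print(np.allclose(bmi_apply, bmi_vector))
```

The vectorized form is usually faster on large frames, while apply with axis=1 is the general fallback when the function cannot operate on whole columns.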

5
Q

Create a new column that assigns a category to each value of another column

def weight_category(weight):
    if weight < 120:
        return 1
    elif weight > 150:
        return 3
    else:
        return 2
A

data['weight_category'] = data['Weight'].apply(weight_category)
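
As an illustration, the same three categories can be produced without a Python-level function call per row. This sketch swaps in np.select (not part of the original card) and uses made-up weights that include both boundary values:

```python
import numpy as np
import pandas as pd

def weight_category(weight):
    if weight < 120:
        return 1
    elif weight > 150:
        return 3
    else:
        return 2

# Made-up weights, including the boundaries 120 and 150
data = pd.DataFrame({'Weight': [110.0, 120.0, 135.0, 150.0, 160.0]})

# Version from the card: one function call per value
by_apply = data['Weight'].apply(weight_category)

# Alternative: np.select picks the choice of the first condition that holds,
# and falls back to the default (2) when neither does
w = data['Weight']
by_select = np.select([w < 120, w > 150], [1, 3], default=2)

print(list(by_apply), list(by_select))
```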

6
Q

Plot a scatter plot of two columns

A

data.plot('Weight', 'Height', kind='scatter', title='Height/Weight')

7
Q

View summary statistics for the features in a DataFrame

A

data.describe()
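
A short sketch of what describe() returns for numeric columns, assuming a made-up three-row DataFrame:

```python
import pandas as pd

# Made-up values for illustration only
data = pd.DataFrame({'Height': [60.0, 70.0, 80.0],
                     'Weight': [100.0, 150.0, 200.0]})

# One row per statistic, one column per numeric feature
stats = data.describe()
print(stats.index.tolist())
print(stats.loc['mean', 'Height'])
```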

8
Q

Create a new DataFrame X_sub from the first three columns of the DataFrame data

A

X_sub = data.iloc[:,[0, 1, 2]]
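
iloc selects by position, so the result depends on column order. A minimal sketch, assuming made-up columns, showing that selecting the same columns by name gives an identical frame:

```python
import pandas as pd

# Made-up frame with four columns; only the first three are wanted
data = pd.DataFrame({'Height': [65.8, 71.5],
                     'Weight': [113.0, 136.5],
                     'BMI': [18.4, 18.8],
                     'weight_category': [1, 2]})

# Positional selection, as on the card
X_sub = data.iloc[:, [0, 1, 2]]

# Label-based selection of the same columns
X_by_name = data[['Height', 'Weight', 'BMI']]

print(X_sub.equals(X_by_name))
```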

9
Q

Create a NumPy array X_np from the pandas DataFrame data

A

X_np = data.values
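
Recent pandas versions (0.24 and later) also provide DataFrame.to_numpy(), which the pandas documentation recommends over .values. A minimal sketch, assuming a made-up DataFrame:

```python
import numpy as np
import pandas as pd

# Made-up values for illustration only
data = pd.DataFrame({'Height': [65.8, 71.5], 'Weight': [113.0, 136.5]})

# Both calls return a 2-D NumPy array with one row per record
X_np = data.to_numpy()
X_values = data.values

print(type(X_np), X_np.shape)
```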

10
Q

Load the large data file 'checkins.dat' into Python

A

checkins = pd.read_csv('checkins.dat', header=0, skipinitialspace=True,
                       names=['lat', 'lng'], usecols=[3, 4],
                       engine='python', sep='|', skipfooter=1)
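
A simplified sketch of the key options (sep, usecols, skipfooter with the Python engine). The file layout below is an assumption made for illustration: a pipe-separated table with a header row and a trailing summary line. Here the columns are renamed after loading rather than through names=.

```python
import io
import pandas as pd

# Hypothetical stand-in for checkins.dat: header, two records, summary footer
raw = "\n".join([
    "id|user_id|venue_id|latitude|longitude|created_at",
    "1|10|100|40.7|-74.0|2012-04-21",
    "2|11|101|34.0|-118.2|2012-04-22",
    "(2 rows)",
])

# usecols=[3, 4] keeps only the 4th and 5th fields;
# skipfooter=1 drops the summary line and requires engine='python'
checkins = pd.read_csv(io.StringIO(raw), sep='|', header=0,
                       usecols=[3, 4], engine='python', skipfooter=1)
checkins.columns = ['lat', 'lng']

print(checkins)
```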

11
Q

Drop the rows of a pandas DataFrame whose value in a certain column is NaN
(In short: keep only the rows where that column, e.g. EPS, is finite)

A

1) df = df[np.isfinite(df['all_integer'])]
2) df.dropna()  # drop all rows that have any NaN values
3) df.dropna(how='all')  # drop only if ALL columns are NaN
4) df[df.all_integer.notnull()]
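
The options above differ in which rows they remove. A minimal sketch on a made-up frame, which also shows dropna(subset=...), a pandas option not listed on the card that restricts the NaN check to the named columns:

```python
import numpy as np
import pandas as pd

# Made-up frame: NaN in 'all_integer' on row 1, NaN in 'other' on row 0
df = pd.DataFrame({'all_integer': [1.0, np.nan, 3.0],
                   'other': [np.nan, 2.0, 3.0]})

# Keep rows where 'all_integer' is not NaN (options 1 and 4 on the card)
by_mask = df[df['all_integer'].notnull()]

# Same result, restricting dropna to the one column of interest
by_subset = df.dropna(subset=['all_integer'])

# Option 2 is stricter: a NaN in any column drops the row
any_nan_dropped = df.dropna()

print(by_subset.index.tolist(), any_nan_dropped.index.tolist())
```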
