Pandas Pt 1 (Wk 4 UCSD) Flashcards by Douglas pereira

What is pandas?

A Python library built on NumPy that provides flexible data structures such as Series and DataFrame.

How do you import the pandas library?

import pandas as pd

What is a Series in pandas?

A one-dimensional, dict-like structure (index, values) that can hold different data types and works with most NumPy functions.

How do you declare a Series?

ser = pd.Series(data=[values], index=[indices]) (the data= and index= keywords are optional)
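
A minimal sketch of the declaration above. The labels and values are hypothetical, picked only for illustration:

```python
import pandas as pd

# Hypothetical labels and values, used only for illustration.
ser = pd.Series(data=[100, 200, 300], index=["tom", "bob", "nancy"])

# The keywords are optional: positional arguments build the same Series.
same = pd.Series([100, 200, 300], ["tom", "bob", "nancy"])

print(ser.index.tolist())  # ['tom', 'bob', 'nancy']
print(ser.equals(same))    # True
```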

Print the indices of a Series

print(ser.index)

Retrieve data from a Series at a given index label 'Bob'

ser['Bob'] or ser.loc['Bob']

Retrieve multiple data points from a Series by index label

ser[['bob', 'nancy']]
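
As a rough illustration of label-based lookup, the sketch below uses a hypothetical Series. Note that labels are case-sensitive, so 'Bob' and 'bob' would be different keys:

```python
import pandas as pd

# Hypothetical Series; the labels are assumptions for this example.
ser = pd.Series([100, 200, 300], index=["tom", "bob", "nancy"])

one = ser["bob"]                 # a single label returns a scalar
also_one = ser.loc["bob"]        # .loc makes the label lookup explicit
several = ser[["bob", "nancy"]]  # a list of labels returns a Series

print(one, also_one)  # 200 200
```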

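Retrieve data from a Series by position

ser.iloc[[1, 2, 3]] (plain ser[[1, 2, 3]] relies on a positional fallback that recent pandas versions deprecate when the index is not integer-based)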
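
A small sketch of positional lookup with .iloc, which works regardless of what the labels are. The Series itself is hypothetical:

```python
import pandas as pd

# Hypothetical Series with string labels.
ser = pd.Series([100, 200, 300, 400], index=["tom", "bob", "nancy", "dan"])

# Positions are zero-based, so this picks the 2nd, 3rd and 4th entries.
by_position = ser.iloc[[1, 2, 3]]

print(by_position.index.tolist())  # ['bob', 'nancy', 'dan']
```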

Test whether a given index label is present in a Series

'bob' in ser (returns a boolean)
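
One detail worth seeing in code: `in` checks the index labels, not the stored values. A minimal sketch with a hypothetical Series:

```python
import pandas as pd

# Hypothetical Series for illustration.
ser = pd.Series([100, 200, 300], index=["tom", "bob", "nancy"])

has_label = "bob" in ser          # True: 'bob' is an index label
value_as_label = 200 in ser       # False: 200 is a value, not a label
has_value = 200 in ser.values     # True: check the values explicitly

print(has_label, value_as_label, has_value)  # True False True
```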

Can you perform operations on a Series, like you can with arrays?

Yes. ser * 2 multiplies every value in the Series by 2.
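
A brief sketch of element-wise operations, assuming a hypothetical numeric Series. NumPy functions applied to a Series keep its index:

```python
import numpy as np
import pandas as pd

# Hypothetical numeric Series.
ser = pd.Series([1, 4, 9], index=["a", "b", "c"])

doubled = ser * 2      # every value multiplied by 2
squared = ser ** 2     # every value squared
roots = np.sqrt(ser)   # NumPy function applied element-wise

print(doubled.tolist())  # [2, 8, 18]
print(roots.tolist())    # [1.0, 2.0, 3.0]
```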

What is a DataFrame?

It's like a 2D Series: the indices become the row names, and the name of each Series becomes a column name.

How do you create a dictionary of several Series, which you could then pass to a DataFrame?

d = {'one': pd.Series([values], index=[indices]),
     'two': pd.Series([values], index=[indices])}

Create a DataFrame from a dictionary of Series

pd_dataframe = pd.DataFrame(dict_of_series)
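
A minimal sketch of building a DataFrame from a dictionary of Series. The labels and numbers are hypothetical. When the Series do not share all labels, pandas aligns them on the union of the indices and fills the gaps with NaN:

```python
import pandas as pd

# Hypothetical data: the two Series only partly share their labels.
d = {
    "one": pd.Series([100.0, 200.0, 300.0], index=["apple", "ball", "clock"]),
    "two": pd.Series([111.0, 222.0, 333.0], index=["apple", "ball", "dancy"]),
}

df = pd.DataFrame(d)

print(df.columns.tolist())  # ['one', 'two']
print(df)                   # rows cover every label seen in either Series
```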

retrieve the row names from a dataframe

df.index

retrieve the column names from a dataframe

df.columns

print a dataframe in a nice tabular format

df_var (in a notebook, evaluating the variable on its own as the last line of a cell renders it as a table; print(df_var) gives a plain-text version)

create a dataframe with a subset of the rows (indices)

pd.DataFrame(df_var, index=[indices])

create a dataframe with a subset of rows and columns

pd.DataFrame(df_var, index=[indices], columns=[column_values])
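
A sketch of the subsetting described in the last two cards, assuming the data starts out as a hypothetical dictionary of Series. Selecting with .loc on an existing DataFrame is shown alongside as a common alternative; it is not part of the original cards:

```python
import pandas as pd

# Hypothetical dictionary of Series.
d = {
    "one": pd.Series([100.0, 200.0, 300.0], index=["apple", "ball", "clock"]),
    "two": pd.Series([111.0, 222.0, 333.0], index=["apple", "ball", "clock"]),
}

# Keep only the listed rows (in the order given).
rows_only = pd.DataFrame(d, index=["clock", "ball"])

# Keep only the listed rows and columns.
rows_and_cols = pd.DataFrame(d, index=["clock", "ball"], columns=["two"])

# Alternative on an existing DataFrame: label-based selection with .loc.
df = pd.DataFrame(d)
via_loc = df.loc[["clock", "ball"], ["two"]]

print(rows_and_cols["two"].tolist())  # [333.0, 222.0]
```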

what happens if you create a data frame from a list of dictionaries?

The keys of each dictionary become the column names, and each dictionary becomes one row (the opposite of building from a dict of Series, where each Series becomes a column).
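
To make the row/column orientation concrete, here is a small sketch with two hypothetical dictionaries. Keys missing from a dictionary show up as NaN in that row:

```python
import pandas as pd

# Hypothetical list of dictionaries; each one becomes a row.
records = [
    {"alex": 1, "joe": 2},
    {"ema": 5, "dora": 10, "alice": 20},
]

df = pd.DataFrame(records)

print(df.shape)           # (2, 5)
print(sorted(df.columns)) # ['alex', 'alice', 'dora', 'ema', 'joe']
```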

retrieve the values of a column from a dataframe, w/ a column name of ‘two’

df['two']

create a third col in a df that equals the product of df columns ‘one’ and ‘two’

df['three'] = df['one'] * df['two']

create a bool col ‘flag’ based on the values of col ‘two’ that are greater than 100

df['flag'] = df['two'] > 100
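
A short sketch of the two derived columns from the preceding cards, using a hypothetical DataFrame with columns 'one' and 'two':

```python
import pandas as pd

# Hypothetical starting data.
df = pd.DataFrame({"one": [1.0, 2.0, 3.0], "two": [50.0, 150.0, 250.0]})

# Element-wise product of two columns, stored as a new column.
df["three"] = df["one"] * df["two"]

# Boolean column: True where 'two' is greater than 100.
df["flag"] = df["two"] > 100

print(df["three"].tolist())  # [50.0, 300.0, 750.0]
print(df["flag"].tolist())   # [False, True, True]
```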

retrieve the value and delete a column from a df

three = df.pop('three')

how to delete col ‘two’ from a dataframe

del df['two']
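
The difference between the two removal styles can be sketched as follows, on a hypothetical DataFrame: pop hands the removed column back as a Series, while del simply discards it.

```python
import pandas as pd

# Hypothetical starting data.
df = pd.DataFrame({"one": [1, 2], "two": [3, 4], "three": [5, 6]})

three = df.pop("three")  # removes the column and returns it as a Series
del df["two"]            # removes the column without returning anything

print(three.tolist())       # [5, 6]
print(df.columns.tolist())  # ['one']
```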

append a new column onto the end of a df

df.insert(2, 'copy_of_one', df['one']) (where 2 is the zero-based position of the new column; for a df with two columns this places it at the end)
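
A minimal sketch of insert on a hypothetical two-column DataFrame. Using len(df.columns) as the position is one way to append without hard-coding the number; the 'first' column is an illustrative assumption:

```python
import pandas as pd

# Hypothetical starting data with two columns.
df = pd.DataFrame({"one": [1, 2], "two": [3, 4]})

# Position len(df.columns) is just past the last column, i.e. the end.
df.insert(len(df.columns), "copy_of_one", df["one"])

# Position 0 places a column at the front instead.
df.insert(0, "first", [9, 9])

print(df.columns.tolist())  # ['first', 'one', 'two', 'copy_of_one']
```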

functions to read CSV, JSON, HTML, and SQL data into pandas

pd.read_csv, pd.read_json, pd.read_html, pd.read_sql_query, pd.read_sql_table

function to read json into pandas

pd.read_json

retrieve the values in a dataframe from rows 1 and 2

df_var[1:3] or df_var.iloc[[1, 2]] ## without iloc you have to slice, and the slice end is exclusive (df_var[1] would look up a column named 1)
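
A small sketch of the row selections above on a hypothetical DataFrame. The slice end is exclusive, so 1:3 covers positions 1 and 2:

```python
import pandas as pd

# Hypothetical data with the default 0..3 row index.
df_var = pd.DataFrame({"x": [10, 20, 30, 40]})

by_slice = df_var[1:3]            # positional slice, end excluded
by_iloc_list = df_var.iloc[[1, 2]]  # explicit positions
by_iloc_slice = df_var.iloc[1:3]    # same rows via an iloc slice

print(by_slice["x"].tolist())  # [20, 30]
```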

ingest csv data into a data frame

df_var_csv = pd.read_csv('filename', sep=',') ## comma-separated (the default)
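
To try read_csv without a file on disk, the sketch below reads from an in-memory string. The CSV content and column names are hypothetical stand-ins for a real file:

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a file on disk.
csv_text = "movieId,title,rating\n1,Toy Story,4.0\n2,Jumanji,3.5\n"

# read_csv accepts a path or any file-like object; sep=',' is the default.
df_var_csv = pd.read_csv(io.StringIO(csv_text), sep=",")

print(df_var_csv.columns.tolist())  # ['movieId', 'title', 'rating']
print(df_var_csv.shape)             # (2, 3)
```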

slice out column with name 'ratings' from a dataframe

df[ 'ratings' ]

specify a column as the index, and then retrieve values on that index

movies = movies.set_index("movieId"), then movies.loc[1] ## movieId is now the index
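
A minimal sketch of the two steps, with a hypothetical movies table. After set_index, .loc[1] looks up the row whose movieId is 1, not the row at position 1:

```python
import pandas as pd

# Hypothetical movies table; ids deliberately do not start at 0.
movies = pd.DataFrame(
    {"movieId": [3, 1, 2], "title": ["Heat", "Toy Story", "Jumanji"]}
)

# set_index returns a new DataFrame, so the result is assigned back.
movies = movies.set_index("movieId")

row = movies.loc[1]  # label lookup on the movieId index

print(row["title"])  # Toy Story
```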

(31 cards)