Pandas Pt 1 (Wk 4 UCSD) Flashcards

1
Q

What is pandas?

A

A Python library built on NumPy that provides flexible data structures

2
Q

How do you import the pandas library?

A

import pandas as pd

3
Q

What is a Series in pandas?

A

A one-dimensional, dict-like structure (index, values) that can hold different data types and works with most NumPy functions

4
Q

How do you declare a Series?

A

ser = pd.Series(data=[values], index=[indices]) (the data= and index= keywords are optional)
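A minimal sketch of the declaration above. The names and scores are made up for illustration; only `pd.Series` and its `data` and `index` parameters come from the card.

```python
import pandas as pd

# Hypothetical names and scores, chosen only for illustration.
ser = pd.Series(data=[100, 200, 300], index=["tom", "bob", "nancy"])

# The same Series built with positional arguments (data first, then index).
same = pd.Series([100, 200, 300], ["tom", "bob", "nancy"])

print(ser.equals(same))  # True
```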

5
Q

Print the indices of a series

A

print(ser.index)

6
Q

retrieve data from a series at a given index ‘Bob’

A

ser['Bob'] or ser.loc['Bob']

7
Q

retrieve multiple data points in a series with index values

A

ser[['bob', 'nancy']]
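A small sketch of label-based lookups, assuming a hypothetical Series indexed by lowercase names. A single label returns one value; a list of labels returns a smaller Series.

```python
import pandas as pd

# Hypothetical data for illustration.
ser = pd.Series([100, 200, 300], index=["tom", "bob", "nancy"])

one = ser["bob"]                 # single label: returns the value
also_one = ser.loc["bob"]        # explicit label-based accessor, same result
several = ser[["bob", "nancy"]]  # list of labels: returns a Series
```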

8
Q

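Retrieve data from a series by indexing on position

A

ser.iloc[[1, 2, 3]] (plain ser[[1, 2, 3]] relies on a positional fallback that newer pandas versions deprecate when the index holds labels)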
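A sketch of position-based retrieval using `.iloc`, which selects by integer position regardless of what the index labels are. The data is hypothetical.

```python
import pandas as pd

# Hypothetical data for illustration.
ser = pd.Series([100, 200, 300, 400], index=["tom", "bob", "nancy", "dan"])

# Positions are zero-based, so this picks the 2nd, 3rd, and 4th entries.
picked = ser.iloc[[1, 2, 3]]
```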

9
Q

test if a given index is present in a series

A

'bob' in ser (returns a boolean)
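One detail worth a quick sketch: `in` tests the index labels, not the stored values. The Series below is hypothetical.

```python
import pandas as pd

# Hypothetical data for illustration.
ser = pd.Series([100, 200, 300], index=["tom", "bob", "nancy"])

has_bob = "bob" in ser             # checks the index labels
has_200 = 200 in ser               # 200 is a value, not a label
value_present = 200 in ser.values  # check the values explicitly instead
```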

10
Q

Can you perform operations on a series, like you can with arrays?

A

Yes. ser * 2 multiplies every value in the series by 2
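A short sketch of element-wise operations, including a NumPy function applied directly to a Series. The values are hypothetical; in both cases the index is carried over to the result.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration.
ser = pd.Series([1, 4, 9], index=["tom", "bob", "nancy"])

doubled = ser * 2     # every value multiplied by 2
roots = np.sqrt(ser)  # NumPy ufuncs return a Series with the same index
```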

11
Q

What is a dataframe

A

It's like a 2D series: the indices become the row names, and the name of each series becomes a column name

12
Q

How do you create a dictionary with multiple sets of series, which you could then assign to a dataframe?

A

d = {'one': pd.Series([values], index=[indices]),

'two': pd.Series([values], index=[indices])}

13
Q

create a dataframe using a dictionary of series

A

pd_dataframe = pd.DataFrame(dict_of_series)
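A sketch of building a DataFrame from a dict of Series. The labels and values are hypothetical, and the two Series deliberately have different indices to show how the rows are aligned.

```python
import pandas as pd

# Hypothetical Series with partially overlapping indices.
d = {
    "one": pd.Series([100.0, 200.0, 300.0], index=["apple", "ball", "clock"]),
    "two": pd.Series([111.0, 222.0, 333.0, 4444.0],
                     index=["apple", "ball", "cerill", "dancy"]),
}

# Dict keys become column names; row labels are the union of both indices.
# Labels missing from one Series show up as NaN in that column.
df = pd.DataFrame(d)
print(df)
```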

14
Q

retrieve the row names from a dataframe

A

df.index

15
Q

retrieve the column names from a dataframe

A

df.columns

16
Q

print a dataframe in a nice tabular format

A

df_var (in a notebook or interactive session, entering just the variable name displays its value as a table)

17
Q

create a dataframe with a subset of the rows (indices)

A

pd.DataFrame(df_var, index=[indices])

18
Q

create a dataframe with a subset of rows and columns

A

pd.DataFrame(df_var, index=[indices], columns=[column_values])
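A sketch of the two subsetting cards above, using a hypothetical three-row frame. Passing an existing DataFrame to the constructor with `index` and `columns` keeps only the requested labels.

```python
import pandas as pd

# Hypothetical frame for illustration.
df = pd.DataFrame(
    {"one": [1.0, 2.0, 3.0], "two": [10.0, 20.0, 30.0]},
    index=["a", "b", "c"],
)

rows_only = pd.DataFrame(df, index=["a", "c"])                       # subset of rows
rows_and_cols = pd.DataFrame(df, index=["a", "c"], columns=["two"])  # rows and columns
```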

19
Q

what happens if you create a data frame from a list of dictionaries?

A

The keys of each dictionary become the column names, and each dictionary becomes one row (the opposite of building from a dict of Series)
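A sketch with two hypothetical dictionaries that share no keys, which makes the row/column orientation easy to see.

```python
import pandas as pd

# Hypothetical records; each dict becomes one row.
records = [
    {"alex": 1, "joe": 2},
    {"ema": 5, "dora": 10, "alice": 20},
]

df = pd.DataFrame(records)
# Keys become columns; a key absent from a record is NaN in that row.
print(df)
```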

20
Q

retrieve the values of a column from a dataframe, w/ a column name of ‘two’

A

df['two']

21
Q

create a third col in a df that equals the product of df columns ‘one’ and ‘two’

A

df['three'] = df['one'] * df['two']

22
Q

create a bool col ‘flag’ based on the values of col ‘two’ that are greater than 100

A

df['flag'] = df['two'] > 100
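A sketch of the two derived-column cards above on a hypothetical frame. Both assignments work element-wise, row by row.

```python
import pandas as pd

# Hypothetical frame for illustration.
df = pd.DataFrame({"one": [1, 2, 3], "two": [50, 150, 250]})

df["three"] = df["one"] * df["two"]  # element-wise product of two columns
df["flag"] = df["two"] > 100         # boolean column from a comparison
```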

23
Q

retrieve the value and delete a column from a df

A

three = df.pop('three')

24
Q

how to delete col ‘two’ from a dataframe

A

del df['two']
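A sketch contrasting the two removal styles on a hypothetical frame: `pop` hands the column back as a Series, while `del` simply drops it. Both modify the frame in place.

```python
import pandas as pd

# Hypothetical frame for illustration.
df = pd.DataFrame({"one": [1, 2, 3], "two": [50, 150, 250], "three": [50, 300, 750]})

three = df.pop("three")  # removes the column and returns it as a Series
del df["two"]            # removes the column, returns nothing
```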

25
Q

append a new column onto the end of a df

A

df.insert(2, 'copy_of_one', df['one']) (where 2 is the zero-based position for the new column; use len(df.columns) to place it at the end)
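A sketch of `insert` on a hypothetical two-column frame, where position `len(df.columns)` puts the new column last. Plain assignment is shown as an alternative that also appends at the end.

```python
import pandas as pd

# Hypothetical frame for illustration.
df = pd.DataFrame({"one": [1, 2, 3], "flag": [False, True, True]})

# insert(position, column_name, values); position is zero-based.
df.insert(len(df.columns), "copy_of_one", df["one"])

# Assigning to a new column name also appends it as the last column.
df["another_copy"] = df["one"]
```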

26
Q

Functions to read CSV, JSON, and HTML data into pandas

A

pd.read_csv, pd.read_json, pd.read_html (for SQL sources: pd.read_sql_query, pd.read_sql_table)

27
Q

function to read json into pandas

A

read_json

28
Q

retrieve the values in a dataframe from row 1 and 2

A

df_var[1:3] or df_var.iloc[[1, 2]] ## the slice end is exclusive, so 1:3 covers rows 1 and 2; without iloc you have to slice, because df_var[1] would look for a column labeled 1
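A sketch of both row-selection forms on a hypothetical four-row frame, showing that the slice `1:3` and the position list `[1, 2]` pick the same rows.

```python
import pandas as pd

# Hypothetical frame for illustration.
df = pd.DataFrame({"one": [10, 20, 30, 40]}, index=["a", "b", "c", "d"])

by_slice = df[1:3]        # positional row slice; the end position is exclusive
by_iloc = df.iloc[[1, 2]]  # explicit list of row positions
```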

29
Q

ingest csv data into a data frame

A

df_var_csv = pd.read_csv('filename', sep=',') ## comma-separated (the default)
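A self-contained sketch of `read_csv`. To avoid depending on a file on disk, it reads from an in-memory string via `io.StringIO`; the CSV contents are made-up sample rows, not data from the course.

```python
import io

import pandas as pd

# Made-up CSV text standing in for a real file.
csv_text = "movieId,title\n1,First Movie\n2,Second Movie\n"

# read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text), sep=",")
```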

30
Q

slice out column with name ‘ratings’ from a dataframe

A

df['ratings']

31
Q

specify a column as the index, and then retrieve values on that index

A

movies = movies.set_index("movieId")

movies.loc[1] ## movieId is now the index
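A sketch of the same two steps on a small hypothetical `movies` frame. Only the `movieId` column name comes from the card; the titles are placeholders.

```python
import pandas as pd

# Hypothetical stand-in for the movies table.
movies = pd.DataFrame({"movieId": [1, 2, 3], "title": ["A", "B", "C"]})

movies = movies.set_index("movieId")  # movieId values become the row labels
row = movies.loc[1]                   # label lookup: the row whose movieId is 1
```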