Python Pandas - Udemy Flashcards

Question

What does series.apply() do?

Answer 1

series.apply() accepts a function as a parameter, applies that function to every value in the Series, and returns the results as a new Series, e.g. series.apply(lambda stockprice: stockprice + 1)
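
A minimal sketch of the call above, using a small made-up Series (the prices are illustrative and not from the course data):

```python
import pandas as pd

# Hypothetical stock prices, invented for this example.
prices = pd.Series([10.0, 20.5, 31.0], name="stockprice")

# apply() calls the function once per value and returns a new Series.
bumped = prices.apply(lambda stockprice: stockprice + 1)

print(bumped.tolist())  # [11.0, 21.5, 32.0]
print(prices.tolist())  # [10.0, 20.5, 31.0]; the original is unchanged
```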

Answer 2

Performs a VLOOKUP-type lookup across 2 separate Series. I need to explore this further.

Answer 3

A DataFrame is a 2-dimensional array. Two dimensions means you need 2 pieces of information to access a particular value, i.e. the row and the column.

Answer 4

If some of the values in a column are NaN, pandas converts the entire column to floats. The reason is that NaN is itself a floating-point value, and the default integer dtype has no way to represent a missing value.
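
The conversion can be observed directly. This is a small sketch with made-up values. It also checks that NaN is itself a float, which is the usual explanation for the upcast:

```python
import numpy as np
import pandas as pd

# Made-up values for illustration.
whole = pd.Series([1, 2, 3])
with_gap = pd.Series([1, 2, np.nan])

print(whole.dtype)     # an integer dtype
print(with_gap.dtype)  # float64

# NaN is a float, so a column holding it cannot stay integer-typed.
print(type(np.nan))    # <class 'float'>
```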

Answer 5

Basic info about the dataframe as well as the number of non null values in each column.

Answer 6

Returns the combined result of df.index and df.columns (both are attributes, so neither takes parentheses).

Answer 7

what does df.sum(axis=1) return?

Answer 8

df["abc"] returns a Series.

Answer 9

df[["abc", "def"]], or select = ["abc", "def"]; df[select]. Both return the same DataFrame.

Answer 10

**df["Sport"] = "Basket Ball"** inserts the column Sport at the end of the DataFrame and populates all rows with the value 'Basket Ball'. **df.insert(5, column="Sport", value="Basket Ball")** inserts the 'Sport' column at integer position 5 (positions are counted from 0) with the value 'Basket Ball' in all rows.
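
A short sketch contrasting the two approaches on a tiny made-up DataFrame. The "League" column and its value are hypothetical and exist only to show the positional insert:

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({
    "Name": ["Ann", "Bob"],
    "Team": ["Sales", "Legal"],
    "Salary": [50000, 60000],
})

# Plain assignment appends the new column at the end.
df["Sport"] = "Basket Ball"

# insert() places the new column at a zero-based position (here: second).
df.insert(1, column="League", value="NBA")

print(df.columns.tolist())
# ['Name', 'League', 'Team', 'Salary', 'Sport']
```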

Answer 11

df["Salary"].add(20) or df["Salary"] + 20. These are broadcasting operations: the single value is applied to every value in the Series. The other arithmetic methods and operators, such as .sub(), .mul() and .div(), work the same way.

Answer 12

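df["abc"].value_counts().head(). Important: value_counts() is normally called on a Series. Since pandas 1.1 a DataFrame also has value_counts(), but it counts unique rows rather than the values of one column.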

Answer 13

df.dropna(): by default (how="any") this method removes every row that contains at least one NaN value in any column.

Answer 14

df.dropna(subset=["column name", "column name 2"])

Answer 15

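df["abc"] = df["abc"].fillna("Hello") fills all NaN values in the 'abc' column with the string "Hello". Assigning the result back is more reliable than df["abc"].fillna("Hello", inplace=True), which acts on a column selected from the DataFrame and may not update the DataFrame itself in recent pandas versions.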

Answer 16

df["Salary"].astype("int")
* All NaNs must be removed or replaced before this method will work.
* There is no inplace parameter, so you must assign the result to a variable or column for the change to be permanent.
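
Both points can be seen in a small sketch. The data is made up, and filling the gap with 0 is an arbitrary choice for illustration, not a recommendation:

```python
import numpy as np
import pandas as pd

# Made-up salaries with one missing value.
df = pd.DataFrame({"Salary": [50000.0, np.nan, 72000.0]})

# With a NaN present the conversion raises an error.
try:
    df["Salary"].astype("int")
    failed = False
except (ValueError, TypeError):
    failed = True
print(failed)  # True

# Replace the NaN first, then convert and assign the result back.
df["Salary"] = df["Salary"].fillna(0).astype("int")
print(df["Salary"].tolist())  # [50000, 0, 72000]
```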

Answer 17

An entire DataFrame can be sorted by the values of a particular column: df.sort_values("Salary"). If the column has NaN values, those rows are placed at the end of the DataFrame by default.

Answer 18

df.sort_values(["col 1", "col 2"], ascending=[True, False]) sorts the DataFrame first by col 1 and then by col 2: col 1 in ascending order and col 2 in descending order.
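
A small sketch of a two-column sort. The column names "Team" and "Salary" and all values are made up for illustration:

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({
    "Team": ["Sales", "Legal", "Sales", "Legal"],
    "Salary": [50, 70, 60, 40],
})

# Team ascending first; within each team, Salary descending.
ordered = df.sort_values(["Team", "Salary"], ascending=[True, False])

print(ordered["Team"].tolist())    # ['Legal', 'Legal', 'Sales', 'Sales']
print(ordered["Salary"].tolist())  # [70, 40, 60, 50]
```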

Answer 19

df["String_Date"] = pd.to_datetime(df["String_Date"])

Answer 20

df["Management"] = df["Management"].astype("category") (the dtype name is lowercase; "Category" raises an error)

Answer 21

df[df["Gender"] == "Male"], or filter = df["Gender"] == "Male"; df[filter]. Note the double == for the comparison; a single = would be an assignment.

Answer 22

filter1 = df["Gender"] == "Male"; filter2 = df["Team"] == "Marketing"; df[filter1 & filter2]
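
A sketch of combining two conditions, on made-up data. The variable names are hypothetical. The one-line form needs parentheses around each comparison because & binds more tightly than ==:

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cid", "Dan"],
    "Gender": ["Female", "Male", "Male", "Male"],
    "Team": ["Marketing", "Marketing", "Legal", "Marketing"],
})

# Each comparison yields a Boolean Series aligned with the rows.
is_male = df["Gender"] == "Male"
in_marketing = df["Team"] == "Marketing"

# & keeps only the rows where both conditions are True.
both = df[is_male & in_marketing]
print(both["Name"].tolist())  # ['Bob', 'Dan']

# Equivalent one-liner; the parentheses are required.
same = df[(df["Gender"] == "Male") & (df["Team"] == "Marketing")]
```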

Answer 23

filter = df["Team"].isin(["Legal", "Sales", "Marketing"]); df[filter]. You can also pass a Series into the isin() method, e.g. df["Team"].isin(df2["Team"])

Answer 24

isnull() returns True for each value that is NaN, else False. notnull() returns True for each value that is not NaN, else False.

Answer 25

df[df["Salary"].between(60000, 70000)], or x = df["Salary"] >= 60000; y = df["Salary"] <= 70000; df[x & y]. between() includes both endpoints by default.
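
A quick check, on made-up salaries, that the two forms select the same rows:

```python
import pandas as pd

# Made-up salaries for illustration.
df = pd.DataFrame({"Salary": [55000, 60000, 65000, 70000, 75000]})

# between() is inclusive on both ends by default.
in_range = df["Salary"].between(60000, 70000)

# The same condition built from two comparisons.
x = df["Salary"] >= 60000
y = df["Salary"] <= 70000

print(in_range.equals(x & y))          # True
print(df[in_range]["Salary"].tolist()) # [60000, 65000, 70000]
```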

Answer 26

It returns the reverse of a Boolean value, i.e. True becomes False and False becomes True.

Answer 27

Returns the Boolean value True for all duplicate values of the Name column except for the first occurrence, which returns False. If there are 4 Toms, it will return 1 False and 3 Trues.
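
The "4 Toms" case can be reproduced with a small made-up column, assuming the card refers to duplicated() with its default settings:

```python
import pandas as pd

# Made-up names: four Toms and one Ann.
df = pd.DataFrame({"Name": ["Tom", "Ann", "Tom", "Tom", "Tom"]})

# The first occurrence of each name is False, later repeats are True.
flags = df["Name"].duplicated()

print(flags.tolist())  # [False, False, True, True, True]
print(flags.sum())     # 3
```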

Answer 28

df.drop_duplicates(subset=["Name", "Team"], keep=False, inplace=True)
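
A sketch of what keep=False does, on made-up data. Every row whose Name and Team combination occurs more than once is dropped, including the first occurrence. The result is assigned to a new variable here instead of using inplace=True, purely so the original stays available for comparison:

```python
import pandas as pd

# Made-up data: (Tom, Sales) appears twice.
df = pd.DataFrame({
    "Name": ["Tom", "Tom", "Ann", "Tom"],
    "Team": ["Sales", "Sales", "Legal", "Legal"],
})

# keep=False removes all members of a duplicated group.
deduped = df.drop_duplicates(subset=["Name", "Team"], keep=False)

print(deduped["Name"].tolist())  # ['Ann', 'Tom']
print(deduped["Team"].tolist())  # ['Legal', 'Legal']
```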

Answer 29

unique() returns an array of the unique values, and NaN counts as one of them. nunique() returns an integer count of the unique values. It does not count NaN, because the parameter dropna=True is set by default.
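
A small sketch of the difference, using a made-up Series with one missing value:

```python
import numpy as np
import pandas as pd

# Made-up team names with one NaN.
teams = pd.Series(["Sales", "Legal", np.nan, "Sales"])

# unique() keeps NaN as one of the distinct values.
print(len(teams.unique()))          # 3

# nunique() skips NaN unless dropna=False is passed.
print(teams.nunique())              # 2
print(teams.nunique(dropna=False))  # 3
```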

Answer 30

df.set_index("Col_name"). To reverse the change: df.reset_index()

Answer 31

extract rows using index labels

Answer 32

df.iloc[18:36]. Note: the row at position 36 is not returned by iloc[].
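
The end point is treated differently by the two indexers: an iloc[] slice stops before its end position, while a loc[] label slice includes its end label. A sketch on a made-up DataFrame:

```python
import pandas as pd

# Made-up data with names as index labels.
df = pd.DataFrame(
    {"Salary": [10, 20, 30, 40]},
    index=["Ann", "Bob", "Cid", "Dan"],
)

# Positions 1 and 2 only; position 3 is excluded.
by_position = df.iloc[1:3]
print(by_position.index.tolist())  # ['Bob', 'Cid']

# Label slices include the end label.
by_label = df.loc["Bob":"Dan"]
print(by_label.index.tolist())     # ['Bob', 'Cid', 'Dan']
```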

Answer 33

It is a combination of the iloc[] and the loc[] methods. It accepts both string labels and integer positions as arguments. Note: when using labels in ix[] and you specify a range or a list in which one of the labels does not exist in the DataFrame, pandas returns a NaN value for the missing label. BUT when using integer positions in ix[] and you specify a range or a list in which one of the positions does not exist in the DataFrame, pandas raises an error for the entire query. ix[] was deprecated in pandas 0.20 and removed in pandas 1.0, so current code should use loc[] or iloc[] instead.

Answer 34

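df.ix["James", "Salary"] = 80000 changes the value in the James row and Salary column to 80000. In pandas 1.0 and later, where ix[] no longer exists, df.loc["James", "Salary"] = 80000 does the same.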

Answer 35

Finds all instances where Team = Marketing and then replaces 'Marketing' with 'Online Marketing' in the DataFrame.

Answer 36

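df.rename(columns={"Team": "Dept", "Salary": "Compensation"}, inplace=True). rename() accepts a dictionary that maps old names to new names. Pass it as columns= to rename columns, because a dictionary passed as the first positional argument is applied to the row index labels instead.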
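
A sketch of renaming columns on a made-up DataFrame. It also shows that passing the dictionary positionally leaves the column names untouched:

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({"Team": ["Sales"], "Salary": [50000]})

# columns= targets the column labels.
renamed = df.rename(columns={"Team": "Dept", "Salary": "Compensation"})
print(renamed.columns.tolist())  # ['Dept', 'Compensation']

# A positional dictionary is matched against the row index instead,
# so the columns stay as they were.
unchanged = df.rename({"Team": "Dept", "Salary": "Compensation"})
print(unchanged.columns.tolist())  # ['Team', 'Salary']
```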

Answer 37

df.drop("Team", axis=1, inplace=True) (axis=1, or columns="Team", is needed because drop() looks at row labels by default), or df.pop("Team"), which removes "Team" from the DataFrame and returns the column as a Series, or del df["Team"].
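
The three ways of removing a column, sketched on a made-up DataFrame:

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({
    "Name": ["Ann", "Bob"],
    "Team": ["Sales", "Legal"],
    "Salary": [50000, 60000],
})

# drop() returns a new DataFrame unless inplace=True is passed.
without_team = df.drop(columns="Team")
print(without_team.columns.tolist())  # ['Name', 'Salary']

# pop() removes the column from df and hands it back as a Series.
team = df.pop("Team")
print(team.tolist())                  # ['Sales', 'Legal']

# del removes a column in place and returns nothing.
del df["Salary"]
print(df.columns.tolist())            # ['Name']
```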

Answer 38

df.sample(n=5) returns 5 randomly chosen rows, and df.sample(frac=.25) returns a random 25% of the rows.

Answer 39

df.nlargest(5, "Revenue") or df["Revenue"].nlargest(5). The same syntax can be used for nsmallest() as well.

Answer 40

All string methods must be prefixed with the .str accessor, e.g. df["Name"].str.len(), df["Name"].str.upper(), df["Name"].str.lower(), df["Name"].str.title()

Answer 41

df["Team"] = df["Team"].str.replace("Mkt", "Marketing")

Answer 42

df["Name"].str.lower().str.contains("john") is True for every row where Name contains 'john', irrespective of position. df["Name"].str.lower().str.startswith("john") is True for every row where Name begins with 'john'. df["Name"].str.lower().str.endswith("john") is True for every row where Name ends with 'john'. Each call returns a Boolean Series that can be used to filter the DataFrame.
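
A sketch of the three matching methods on a few made-up names:

```python
import pandas as pd

# Made-up names for illustration.
df = pd.DataFrame(
    {"Name": ["John Smith", "Sarah Johnson", "Elton John", "Mary Ann"]}
)

# Lower-casing first makes the match case-insensitive.
lowered = df["Name"].str.lower()

contains = lowered.str.contains("john")
starts = lowered.str.startswith("john")
ends = lowered.str.endswith("john")

print(contains.tolist())  # [True, True, True, False]
print(starts.tolist())    # [True, False, False, False]
print(ends.tolist())      # [False, False, True, False]

# A Boolean Series can be used directly as a row filter.
print(df[starts]["Name"].tolist())  # ['John Smith']
```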

Answer 43

Removes whitespace from both the left and right, only the left, or only the right of a string (str.strip(), str.lstrip() and str.rstrip() respectively).

Answer 44

String methods are called in the same way on the index and columns as well. eg. df.index.str.upper() and df.columns.str.upper()

Answer 45

df["Name"].str.split(", ").str.get(0).value_counts().head()

Answer 46

df["Name"].str.split(",").str.get(1).str.strip().str.split(" ").str.get(0).value_counts().head(10)

Answer 47

str.split(expand=True, n=2): when the expand parameter is set to True, the method returns a DataFrame with one column per piece instead of a Series of lists. n sets the maximum number of splits.
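
A sketch of both parameters on made-up strings. The "Last, First" name format is an assumption for illustration:

```python
import pandas as pd

# Made-up names in "Last, First" form.
names = pd.Series(["Smith, John A", "Doe, Jane"])

# expand=True spreads the pieces across DataFrame columns.
parts = names.str.split(", ", expand=True)
print(parts.shape)        # (2, 2)
print(parts[0].tolist())  # ['Smith', 'Doe']

# n=2 stops after two splits, so the remainder stays in one piece.
limited = pd.Series(["a b c d"]).str.split(" ", n=2, expand=True)
print(limited.iloc[0].tolist())  # ['a', 'b', 'c d']
```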

Answer 48

x = df["NM"].tolist(); y = df["NM"].to_frame()

Answer 49

df.to_csv("Tial and Error", index=False, columns=["BRTH_YR", "NM"]). index=False leaves the index out of the file. columns=[...] (the parameter name is lowercase) writes only the listed columns, if you so desire.

Answer 50

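df = pd.read_excel('C:/Users/SHAWN/Desktop/Python Pandas/Data - Multiple Worksheets.xlsx', sheet_name=None). The resulting output will be a dictionary of DataFrames keyed by worksheet name. (The parameter was spelled sheetname in old pandas releases; current versions only accept sheet_name.)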

Answer 51

df.set_index(["Date", "Country"], inplace=True), OR do it directly while importing the csv file: df = pd.read_csv('C:/Users/SHAWN/Desktop/bigmac.csv', index_col=["Date", "Country"])

Answer 52

df.index.get_level_values(0) or df.index.get_level_values("Date")

Answer 53

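df.index.set_names(["Day", "Location"]) returns the index with its levels renamed (an index has no set_index() method). Assign it back with df.index = df.index.set_names(["Day", "Location"]). Tip: if you want the first level to keep its name and only the second to change, just pass the first level's existing name again in the list.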

Answer 54

df.loc[("2016-10-10", "China")]. For a MultiIndex, .loc[] accepts a tuple as an argument.
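
A sketch of building a two-level index and looking up one row with a tuple. The rows below are made up and only borrow the Date and Country column names from the cards. The Price column is hypothetical, and the dates are kept as plain strings:

```python
import pandas as pd

# Made-up rows for illustration.
df = pd.DataFrame({
    "Date": ["2016-10-10", "2016-10-10", "2016-10-11"],
    "Country": ["China", "India", "China"],
    "Price": [2.8, 2.4, 2.9],
})

# Two columns become the two levels of the row index.
df = df.set_index(["Date", "Country"])

# Pull out the values of one level by name (or by position 0).
dates = df.index.get_level_values("Date")
print(dates.tolist())  # ['2016-10-10', '2016-10-10', '2016-10-11']

# A tuple supplies one label per level.
row = df.loc[("2016-10-10", "China")]
print(row["Price"])    # 2.8

# Adding a column label returns a single value.
price = df.loc[("2016-10-10", "China"), "Price"]
print(price)           # 2.8
```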

Answer 55

df.transpose()

Answer 56

df.swaplevel()

Answer 57

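stack() takes the columns and stacks them as rows: the column labels become the innermost level of the row index. unstack() does the reverse: it takes the innermost level of the row index and turns it back into columns.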
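
A round trip on a tiny made-up DataFrame shows the two methods undoing each other. The column and index labels are hypothetical:

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame(
    {"Price": [1.0, 2.0], "Volume": [10.0, 20.0]},
    index=["China", "India"],
)

# stack(): column labels move into the row index, giving a Series.
stacked = df.stack()
print(len(stacked))                      # 4
print(stacked.loc[("China", "Volume")])  # 10.0

# unstack(): the innermost row level moves back out to columns.
restored = stacked.unstack()
print(restored.columns.tolist())         # ['Price', 'Volume']
```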

Answer 58

Group = df.groupby("Dept."). groupby() creates a separate GroupBy object. A GroupBy object by itself is meaningless until you call methods on it.

Answer 59

It returns the index value of all of the rows that fall within each group

Answer 60

G1 = df.groupby("Dept"); G1.get_group("Marketing")
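
A sketch of inspecting a GroupBy object on made-up data. It assumes the "index value of all of the rows" card refers to the .groups attribute, which the cards do not name explicitly:

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({
    "Dept": ["Marketing", "Sales", "Marketing"],
    "Name": ["Ann", "Bob", "Cid"],
})

G1 = df.groupby("Dept")

# One group per distinct Dept value.
print(len(G1))                        # 2

# .groups maps each group key to the index labels of its rows.
print(list(G1.groups["Marketing"]))   # [0, 2]

# get_group() returns the rows of one group as a DataFrame.
marketing = G1.get_group("Marketing")
print(marketing["Name"].tolist())     # ['Ann', 'Cid']
```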

Answer 61

* Returns the sum of the 'Revenue' column for each of the N groups present in sectors
* Returns the max of the 'Profits' column for each of the N groups present in sectors
* Returns the min of the 'Profits' column for each of the N groups present in sectors
* Returns the average of the 'Employees' column for each of the N groups present in sectors
* This is how you choose more than one column and return their sums.
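
The calls that these answers describe are not shown on the card. The sketch below is an assumption about what they look like, run on made-up data, with sectors taken to be a GroupBy object on a "Sector" column:

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({
    "Sector": ["Tech", "Tech", "Retail"],
    "Revenue": [100, 200, 50],
    "Profits": [10, 30, 5],
    "Employees": [1000, 3000, 500],
})
sectors = df.groupby("Sector")

# One result per group for a single column.
revenue_sum = sectors["Revenue"].sum()
profit_max = sectors["Profits"].max()
profit_min = sectors["Profits"].min()
employees_mean = sectors["Employees"].mean()

# A list of column names selects several columns at once.
two_sums = sectors[["Revenue", "Profits"]].sum()

print(revenue_sum["Tech"])               # 300
print(profit_max["Tech"])                # 30
print(two_sums.loc["Retail", "Profits"]) # 5
```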

Answer 62

sectors = df.groupby(["Sector", "Industry"])

Answer 63

There are 2 ways to use .agg():
* Passing a dictionary as a parameter: sectors.agg({"Revenue": ["sum", "mean"], "Profits": "sum", "Employees": "mean"})
* Passing a list as a parameter: sectors.agg(["size", "sum", "mean"])
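
A sketch of both forms on made-up data. The list form here uses only "sum" and "mean" to keep the output small. With either form the result has two-level column labels of the shape (column, function):

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({
    "Sector": ["Tech", "Tech", "Retail"],
    "Revenue": [100, 200, 50],
    "Profits": [10, 30, 5],
    "Employees": [1000, 3000, 500],
})
sectors = df.groupby("Sector")

# Dictionary form: choose the functions column by column.
by_dict = sectors.agg(
    {"Revenue": ["sum", "mean"], "Profits": "sum", "Employees": "mean"}
)
print(by_dict.loc["Tech", ("Revenue", "sum")])   # 300
print(by_dict.loc["Tech", ("Revenue", "mean")])  # 150.0

# List form: apply every function to every remaining column.
by_list = sectors.agg(["sum", "mean"])
print(by_list.loc["Retail", ("Profits", "sum")])  # 5
```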

(91 cards)