Python Pandas - Udemy Flashcards

1
Q

What are Variables?

A

Named placeholders that refer to values

2
Q

Are the terms list and array the same in python?

A

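Not exactly. A list is Python's built-in ordered, mutable sequence and can hold values of mixed types. "Array" usually means a NumPy array (or the standard library array module), which holds elements of a single type. Introductory material often uses the two terms loosely for the same idea.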

3
Q

What does len(df) return?

A

The number of rows in the DataFrame. (For a list or array, len() returns the number of elements.)

4
Q

What is a Dictionary?

A

A data type that stores keys and corresponding values.

A dictionary is represented by { }

5
Q

What is a Series?

A

A series is a one dimensional labeled array

6
Q

How do you convert a list to a series object?

A

pd.Series(list)

7
Q

What is the difference between a list and series?

A

A list can only be indexed by integer positions, whereas the index of a series can hold any labels you like (for example strings or dates).
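
A minimal sketch of the difference, using made-up fruit prices (the data and variable names are illustrative assumptions, not from the course):

```python
import pandas as pd

# Hypothetical data: three prices.
prices = [1.20, 0.50, 2.75]

# A list is addressed by integer position only.
first_price = prices[0]

# A Series can carry arbitrary labels as its index.
s = pd.Series(prices, index=["apple", "banana", "cherry"])
banana_price = s["banana"]

# Without an explicit index, pandas falls back to 0, 1, 2, ...
default = pd.Series(prices)
print(list(default.index))
```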

8
Q

what does series.values give us?

A

All the values in the series as an array

9
Q

What does series.index give us?

A

The index of the series

10
Q

What do

series.sum()
series.product()
series.mean()

return?

A

The sum, the product, and the mean of the values in the series, respectively.

11
Q

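What does pd.read_csv(path, usecols=["abc"], squeeze=True) do?

A

It reads only the column ‘abc’ from the CSV file and returns it as a series instead of a one-column dataframe. Note that squeeze was deprecated in pandas 1.4 and removed in 2.0; in current versions, call .squeeze("columns") on the result of read_csv() instead.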

12
Q

What does x = df.head() or x = df.tail() do?

A

head() and tail() return a new object holding the first or last 5 rows by default. Called on a dataframe they return a dataframe; called on a series they return a series. The variable x will contain that new object.

13
Q

what does dir(s) do? ( where ‘s’ is a series)

A

gives you a list of attributes and methods available with that series.

14
Q

What does sorted(series) do?

A

Returns a new Python list with all the values of the series sorted in ascending order. The series itself is left unchanged.

15
Q

what does

list(series)

dict(series)

do?

A

list(series) turns the series into a list

dict(series) turns the series into a dictionary, with the index labels as keys and the values as values

16
Q

what does

series.is_unique

do?

A

returns True or False to show if all values in the series are unique

17
Q

What does

series.sort_values() do?

A

Sorts the series in ascending order and returns a brand new series. You can also chain its own methods on the newly returned series.

e.g. series.sort_values().head() will return the first 5 values of the sorted series, i.e. the 5 smallest.

18
Q

What does the inplace=True parameter do?

A

Makes the method modify the existing series or dataframe in place instead of returning a new object (the method then returns None)

19
Q

What does the statement:

‘abc’ in series do?

A

returns a boolean value by checking for ‘abc’ in the index of the series. If you want to check for ‘abc’ in the values of the series you must use:

‘abc’ in series.values

20
Q

what does

series[-30:-10]

return?

A

Returns the values from the 30th-from-last position up to, but not including, the 10th-from-last position.

21
Q

What is the difference between

len(series) and series.count() ?

A

len(series) returns the length of the series, including the rows that hold NaN values.

series.count() returns only the count of rows that have values and excludes rows that hold NaN.
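
A small sketch with a made-up series containing two missing values (the data is an assumption for illustration):

```python
import numpy as np
import pandas as pd

s = pd.Series([10.0, np.nan, 30.0, np.nan])

total_rows = len(s)       # every row, NaN included
non_missing = s.count()   # only the rows that hold a value

print(total_rows, non_missing)
```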

22
Q

Good to remember:

What are some of the mathematical functions available with series?

A

series.sum()
series.mean()
series.std()
series.median()
series.describe()

23
Q

What does

series.idxmax()
series.idxmin()

return?

A

They return the index label of the row that holds the maximum and the minimum value in the series, respectively.

Nice way of using this is:

series[series.idxmax()]

will return the same value as

series.max()
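
A quick check of that equivalence on a made-up series (labels and values are illustrative assumptions):

```python
import pandas as pd

s = pd.Series([3, 9, 4], index=["x", "y", "z"])

label_of_max = s.idxmax()   # label, not the value
label_of_min = s.idxmin()

# Looking the label up again gives the extreme value itself.
same_as_max = s[label_of_max] == s.max()
print(label_of_max, label_of_min, same_as_max)
```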

24
Q

What does the

series.value_counts()

do?

A

Returns the number of times each unique value occurs.

series.value_counts().sum()

will return the number of non-null values, the same as series.count(). It equals len(series) only when the series has no NaN values, because value_counts() drops NaN by default.

Good to remember that value_counts() has the ascending=True/False parameter
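
A minimal sketch on a made-up series with one missing entry; dropna=False is a real value_counts() parameter, the data is an assumption:

```python
import pandas as pd

s = pd.Series(["a", "b", "a", None, "a"])

counts = s.value_counts()                  # missing value is dropped
counts_all = s.value_counts(dropna=False)  # missing value gets its own row

print(counts["a"], counts["b"])
print(counts.sum(), s.count(), len(s))
```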

25
Q

What does series.apply()

do?

A

series.apply() accepts a function as a parameter and then applies that function to all the values in the series.

eg:

series.apply(lambda stockprice: stockprice + 1)

26
Q

What does the

series.map()

do?

A

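Performs a VLOOKUP-style lookup across 2 separate series.

Each value in the calling series is looked up in the index of the second series (or in the keys of a dictionary) and replaced by the matching value. Values with no match become NaN.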
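
A small sketch of the lookup, assuming a hypothetical employee-to-team table (all names here are made up):

```python
import pandas as pd

# Hypothetical lookup table: the index holds the keys to match on.
teams = pd.Series({"Ann": "Sales", "Bob": "Legal"})

names = pd.Series(["Bob", "Ann", "Cid"])

# Each name is looked up in the index of `teams`.
mapped = names.map(teams)

# A plain dictionary works the same way.
mapped_from_dict = names.map({"Bob": "Legal", "Ann": "Sales"})
print(mapped)
```

"Cid" has no entry in the lookup table, so the last value comes back as NaN.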

27
Q

True or False:

The index labels in a pandas Series must be unique

A

False. Duplicate index labels are allowed.

28
Q

What are pandas DataFrames?

A

A DataFrame is a 2-dimensional labeled data structure. Two dimensions means you need 2 pieces of information to access a particular value: its row and its column.

29
Q

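A csv file contains integer values, but when you read it into a dataframe a column shows up as float. When does this happen?

A

When some of the values in the column are missing (NaN), pandas converts the entire column to float. The reason is that NaN is itself a floating-point value, and the default integer dtype has no way to represent a missing value.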
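
A minimal sketch that reproduces this with an in-memory CSV (the column names and values are made up). The nullable "Int64" dtype shown at the end is one way to keep whole numbers alongside missing values:

```python
import io
import pandas as pd

# Hypothetical CSV text: the second row has no qty value.
csv_text = "id,qty\n1,5\n2,\n3,7\n"
df = pd.read_csv(io.StringIO(csv_text))

id_dtype = str(df["id"].dtype)    # complete column stays integer
qty_dtype = str(df["qty"].dtype)  # column with a gap becomes float

# Nullable integer dtype: missing entries are stored as <NA>.
nullable = df["qty"].astype("Int64")
print(id_dtype, qty_dtype, nullable.dtype)
```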

30
Q

What does

df.info()

return?

A

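It prints basic info about the dataframe (index range, column names and dtypes, memory usage) as well as the number of non-null values in each column. The method prints this summary rather than returning it.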

31
Q

What does

df.axes

return?

A

A list holding the combined result of

df.index and df.columns

(axes, index and columns are attributes, so they are written without parentheses)

32
Q

What does

df.sum(axis=1)

return?

A

df.sum(axis=1)

or

df.sum(axis="columns")

returns the horizontal, left-to-right total of each row of a dataframe

33
Q

How to extract a single column ‘abc’ from a Dataframe df?

A

df["abc"]

This returns a series

34
Q

How do you extract multiple columns from a DataFrame?

A

df[["abc", "def"]]

or

select = ["abc", "def"]

df[select]

Both of the above return the same resulting DataFrame

35
Q

How do you insert a new column ‘Sport’ in a Dataframe?

A

df["Sport"] = "Basket Ball"

inserts the column Sport at the end of the DataFrame and populates all rows with the value ‘Basket Ball’

df.insert(5, column="Sport", value="Basket Ball")

this inserts the ‘Sport’ column at column position 5 (positions start at 0, so it becomes the sixth column) with the value ‘Basket Ball’ in all rows
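
A short sketch of both approaches on a made-up two-column dataframe (the "Team" column and the data are assumptions added for illustration):

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Ann", "Bob"], "Salary": [50000, 60000]})

# Plain assignment appends the new column at the end.
df["Sport"] = "Basket Ball"

# insert() places the column at a 0-based position and works in place.
df.insert(1, column="Team", value="Sales")

print(list(df.columns))
```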

36
Q

How do you add 20 to every value in the column ‘Salary’ of a dataframe?

A

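df["Salary"].add(20)

or

df["Salary"] + 20

These are called broadcasting operations: the single value is applied to every row. The same works with the other arithmetic operations as well. Both forms return a new series, so assign the result back to df["Salary"] to keep it.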

37
Q

How do you use value_counts() with a DataFrame?

A

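df["abc"].value_counts().head()

Imp: value_counts() is normally called on a series (a single column). Older pandas versions support it only on series objects; since pandas 1.1, DataFrame.value_counts() also exists and counts unique rows.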

38
Q

How do you remove rows with null values in a DataFrame?

A

df.dropna()

By default this method removes every row that has even a single NaN value in any column

39
Q

How do you remove a row from a DataFrame only when a particular column has a Nan value?

A

df.dropna(subset=["column name", "column name 2"])

40
Q

How do you replace a nan value with a particular value in a column of a DataFrame?

A

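df["abc"] = df["abc"].fillna("Hello")

This will fill all NaN values in the ‘abc’ column with the string "Hello". Assigning the result back is more reliable than calling fillna(inplace=True) on a selected column, which may not update the original dataframe in recent pandas versions.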

41
Q

How do you convert the ‘Salary’ column from float to integer in a DataFrame?

A

df["Salary"].astype("int")

  • Remember that all NaNs must be removed or replaced first for this method to work
  • There is no inplace parameter, so you must assign the result back (for example to df["Salary"]) for the change to be permanent
42
Q

How do you sort a dataframe?

A

An entire DataFrame is sorted by the values of a particular column (or of several columns).

df.sort_values("Salary")

If the column has NaN values, those rows are placed at the end of the dataframe by default.

43
Q

How do you do a sort on multiple columns in a dataframe?

A

df.sort_values(["col 1", "col 2"], ascending=[True, False])

This sorts the dataframe first by col 1 and then by col 2: col 1 in ascending order and col 2 in descending order

44
Q

How do you convert a string to a Date type?

A

df["String_Date"] = pd.to_datetime(df["String_Date"])

45
Q

How do you convert a string type to a category type?

A

df["Management"] = df["Management"].astype("category")

46
Q

How do you filter a dataframe so that only the rows where Gender = ‘Male’ are returned as a dataframe?

A

df[df["Gender"] == "Male"]

or

mask = df["Gender"] == "Male"

df[mask]

47
Q

How to filter a dataframe using more than one condition eg

Gender = Male

Team = Marketing

?

A

filter1 = df["Gender"] == "Male"

filter2 = df["Team"] == "Marketing"

df[filter1 & filter2]
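
A runnable sketch on a made-up staff table (names and values are assumptions). It also shows the | operator for "or" and the parentheses needed when the conditions are written inline:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cid", "Dee"],
    "Gender": ["Female", "Male", "Male", "Male"],
    "Team": ["Marketing", "Marketing", "Legal", "Sales"],
})

is_male = df["Gender"] == "Male"
in_marketing = df["Team"] == "Marketing"

both = df[is_male & in_marketing]     # rows meeting both conditions
either = df[is_male | in_marketing]   # rows meeting at least one

# Inline form: each comparison needs its own parentheses.
inline = df[(df["Gender"] == "Male") & (df["Team"] == "Marketing")]
print(both)
```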

48
Q

Write code to filter a dataframe where Team = ‘Legal’, ‘Marketing’ or ‘Sales’

A

mask = df["Team"].isin(["Legal", "Sales", "Marketing"])

df[mask]

You can also pass a series into the isin() method

e.g. df["Team"].isin(df2["Team"])

49
Q

What do the isnull() and notnull() methods do?

A

isnull() returns True for each value in a column that is NaN, else False.

notnull() returns True for each value in a column that is not NaN, else False.

50
Q

Write code to filter Salary >= 60,000 and <= 70,000?

A

df[df["Salary"].between(60000, 70000)]

or

x = df["Salary"] >= 60000

y = df["Salary"] <= 70000

df[x & y]

between() includes both endpoints by default.
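
A quick check that the two forms select the same rows, using made-up salaries:

```python
import pandas as pd

df = pd.DataFrame({"Salary": [55000, 60000, 65000, 70000, 75000]})

# between() is inclusive on both ends by default.
in_range = df[df["Salary"].between(60000, 70000)]

# Equivalent manual version with two comparisons.
manual = df[(df["Salary"] >= 60000) & (df["Salary"] <= 70000)]

print(in_range["Salary"].tolist())
```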

51
Q

What does the ~ symbol do?

A

It inverts every Boolean value in a series, which is useful for negating a filter.

i.e. True becomes False

False becomes True

52
Q

What does df["Name"].duplicated() return?

A

Returns the Boolean value True for every duplicate value in the Name column except the first occurrence, which returns False.

If there are 4 Toms it will return 1 False and 3 Trues

53
Q

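Remove duplicate values from a dataframe where both Name and Team are duplicates

A

df.drop_duplicates(subset=["Name", "Team"], keep=False, inplace=True)

keep=False removes every row of a duplicated group; leave it at the default keep="first" to retain the first occurrence.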
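
A small sketch comparing the keep options on made-up data (two identical Tom/Sales rows):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Tom", "Tom", "Tom", "Ann"],
    "Team": ["Sales", "Sales", "Legal", "Sales"],
})

# True only for the second Tom/Sales row.
flags = df.duplicated(subset=["Name", "Team"])

# Default keep="first": one copy of each combination survives.
keep_first = df.drop_duplicates(subset=["Name", "Team"])

# keep=False: every row of a duplicated combination is dropped.
keep_none = df.drop_duplicates(subset=["Name", "Team"], keep=False)

print(flags.tolist(), len(keep_first), len(keep_none))
```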

54
Q

What do unique() and nunique() do?

A

unique() returns an array of the unique values, and it includes NaN as one of them.

nunique() returns an integer count of the unique values. This does not count NaN, because the parameter dropna=True is set by default

55
Q

How do you set a particular column as the index of a dataframe?

A

df.set_index("Col_name")

To reverse the change:

df.reset_index()

56
Q

What does df.loc[] do?

A

Extracts rows using index labels. A label slice in loc[] includes both endpoints.

57
Q

Extract rows from a dataframe between index positions 18 and 35?

A

df.iloc[18:36]

Note: position 36 will not be returned by iloc[]
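
A minimal sketch of the endpoint difference between label and position slicing, on a made-up dataframe with letter labels:

```python
import pandas as pd

df = pd.DataFrame({"Salary": [10, 20, 30, 40]}, index=["a", "b", "c", "d"])

# Label slice: both "b" and "d" are included.
by_label = df.loc["b":"d"]

# Position slice: the stop position 3 is excluded.
by_pos = df.iloc[1:3]

print(list(by_label.index), list(by_pos.index))
```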

58
Q

What is df.ix[]?

A

It is a combination of iloc[] and loc[]. It accepts both string labels and integer positions as arguments.

Note:

When using labels in ix[] and you specify a range or a list and one of the labels does not exist in the dataframe, pandas returns a NaN value for the missing label.

BUT

When using integer positions in ix[] and you specify a range or a list and one of the positions does not exist in the dataframe, pandas raises an error for the entire query.

Important: ix[] was deprecated in pandas 0.20 and removed in pandas 1.0. In current versions use loc[] for labels and iloc[] for integer positions.

59
Q

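How do you write a value to a given row and column in a dataframe using ix[]?

A

df.ix["James", "Salary"] = 80000

This changes the value in the James row and Salary column to 80000. Since ix[] no longer exists in pandas 1.0 and later, the equivalent there is df.loc["James", "Salary"] = 80000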

60
Q

mask = df["Team"] == "Marketing"

df.ix[mask, "Team"] = "Online Marketing"

What does this piece of code do?

A

Finds all rows where Team = Marketing and then replaces ‘Marketing’ with ‘Online Marketing’ in the dataframe. In pandas 1.0 and later, write df.loc[mask, "Team"] instead, because ix[] has been removed.

61
Q

How do you change the name of columns in a dataframe?

A

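df.rename(columns={"Team": "Dept", "Salary": "Compensation"}, inplace=True)

rename() accepts a dictionary of old name to new name. Pass it through the columns parameter; a dictionary passed on its own is applied to the row index labels instead.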

62
Q

What are the 3 methods to delete columns from a dataframe?

A

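df.drop("Team", axis=1, inplace=True)

Without axis=1 (or columns="Team"), drop() looks for a row labeled "Team" instead of a column.

or

df.pop("Team")

This method removes "Team" from the dataframe and returns the column as a series.

or

del df["Team"]

The attribute form del df.Team does not work; use the bracket form.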
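
A short sketch that removes one column with each of the three approaches; it uses the bracket form del df[...] and drop(columns=...). The dataframe and its column names are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Ann"],
    "Team": ["Sales"],
    "Salary": [50000],
    "Sport": ["Golf"],
})

# 1. drop() returns a new dataframe unless inplace=True is passed.
df = df.drop(columns="Team")

# 2. pop() removes the column in place and hands it back as a Series.
salary = df.pop("Salary")

# 3. del with bracket syntax removes the column in place.
del df["Sport"]

print(list(df.columns), salary.tolist())
```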

63
Q

How do you extract 5 random rows from your dataset? Also how do you extract 25% of your data set randomly?

A

df.sample( n=5)

and

df.sample(frac =.25)

64
Q

How to find the 5 highest values in the ‘Revenue’ column without using sort method?

A

df.nlargest(5, "Revenue")

or

df["Revenue"].nlargest(5)

The same syntax can be used for nsmallest() as well

65
Q

How do you use the string methods on a column of a dataframe?

A

All string methods must be called through the .str accessor

e.g.

df["Name"].str.len()

df["Name"].str.upper()

df["Name"].str.lower()

df["Name"].str.title()

66
Q

Write code to replace ‘Mkt’ with ‘Marketing’ in the ‘Team’ Column?

A

df["Team"] = df["Team"].str.replace("Mkt", "Marketing")

67
Q

What do the following methods do?

  1. str.contains()
  2. str.startswith()
  3. str.endswith()

A

Each returns a Boolean series that can be used as a filter.

df["Name"].str.lower().str.contains("john")

is True for all rows where Name contains ‘john’, irrespective of the position

df["Name"].str.lower().str.startswith("john")

is True for all rows where Name begins with ‘john’

df["Name"].str.lower().str.endswith("john")

is True for all rows where Name ends with ‘john’

68
Q

What does

.str.strip()

.str.lstrip()

.str.rstrip()

do?

A

strip() removes whitespace from both the left and right ends of a string, lstrip() only from the left end, and rstrip() only from the right end

69
Q

Give an example each of using the string methods on the index and columns of a dataframe?

A

String methods are called in the same way on the index and columns as well.

eg.

df.index.str.upper()

and

df.columns.str.upper()

70
Q

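How do you extract the last name from the ‘Name’ column, which holds the last name first and is separated from the first name by a comma and a space?

A

df["Name"].str.split(", ").str.get(0).value_counts().head()

The split and get(0) extract the last name; value_counts().head() then lists the 5 most common last names.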

71
Q

Write code to extract the first name from the ‘Name’ column of a dataframe?

A

df["Name"].str.split(",").str.get(1).str.strip().str.split(" ").str.get(0).value_counts().head(10)

72
Q

Good to remember about str.split()

A

str.split(expand=True, n=2)

The parameter expand, when set to True, returns a dataframe with one column per piece instead of a series of lists

n sets the maximum number of splits
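
A small sketch, assuming names stored as "Last, First Middle" (the sample names are made up):

```python
import pandas as pd

names = pd.Series(["Smith, John A", "Doe, Jane"])

# n=1 splits only at the first ", "; expand=True gives a DataFrame.
parts = names.str.split(", ", n=1, expand=True)

last = parts[0]
# Take the first word of what follows the comma.
first = parts[1].str.split(" ").str.get(0)

print(parts.shape, last.tolist(), first.tolist())
```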

73
Q

How do you convert a series into a list and a dataframe?

A

x = df["NM"].tolist()
y = df["NM"].to_frame()

74
Q

How do you export a dataframe to a csv file?

A

df.to_csv("Trial and Error", index=False, columns=["BRTH_YR", "NM"])

index=False does not copy the index

columns=[...] allows you to copy only certain columns if you so desire (the parameter name is lowercase)

75
Q

How do you read an excel file with multiple worksheets?

A

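df = pd.read_excel('C:/Users/SHAWN/Desktop/Python Pandas/Data - Multiple Worksheets.xlsx', sheet_name=None)

The resulting output will be a dictionary that maps each worksheet name to a dataframe. (Older pandas versions spelled this parameter sheetname.)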

76
Q

How do you set multiple indexes to a dataframe?

A

df.set_index(["Date", "Country"], inplace=True)

OR

You can do it directly while importing the csv file like this

df = pd.read_csv('C:/Users/SHAWN/Desktop/bigmac.csv', index_col=["Date", "Country"])

77
Q

How do you access the values in a multi index dataframe?

A

df.index.get_level_values(0)

or

df.index.get_level_values("Date")

78
Q

How do you change the name of an index level in a multi-level dataframe?

A

df.index.set_names(["Day", "Location"], inplace=True)

Tip:

Assume you want the first level to keep its name but change the second level; then just pass the existing name for the first level in the list.

79
Q

How to extract a row from a multi index dataframe?

A

df.loc[("2016-10-10", "China")]

For a multi index, .loc[] accepts a tuple as an argument
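
A minimal sketch on a made-up price table (dates, countries and prices are assumptions, not values from the course data):

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2016-01-01", "2016-01-01", "2016-07-01"],
    "Country": ["China", "Japan", "China"],
    "Price": [2.68, 3.12, 2.79],
})
df = df.set_index(["Date", "Country"]).sort_index()

# One label per index level, passed together as a tuple.
row = df.loc[("2016-01-01", "China")]

# Adding a column label returns the single cell.
price = df.loc[("2016-01-01", "China"), "Price"]

print(row["Price"], price)
```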

80
Q

How do you interchange the rows and columns in a dataframe?

A

df.transpose()

81
Q

How do you swap the index levels in a multi index dataframe?

A

df.swaplevel()

82
Q

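What do the stack() and unstack() methods do?

A

stack() takes the column labels and moves them into the row index, so the columns are stacked as rows.

unstack()

does the reverse: it takes the innermost level of the row index and turns it back into columns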

83
Q

How do you use the groupby () on a dataframe and group by department?

A

Group = df.groupby("Dept.")

groupby() creates a separate groupby object. A groupby object by itself is meaningless until you call methods on it.

84
Q

How do you find out the number of groups in a groupby object called G1?

A

len(G1)

85
Q

How do you find the number of rows within each group in G1?

A

G1.size()

86
Q

What does G1.groups return, where G1 is a groupby object?

A

It returns a dictionary that maps each group name to the index labels of all the rows that fall within that group

87
Q

How do you extract all the rows from the ‘Marketing’ department?

A

G1 = df.groupby("Dept")

G1.get_group("Marketing")
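
A compact sketch tying the groupby cards together on a made-up three-row dataframe (names and departments are assumptions):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cid"],
    "Dept": ["Marketing", "Legal", "Marketing"],
})

G1 = df.groupby("Dept")

n_groups = len(G1)                   # number of groups
sizes = G1.size()                    # rows per group
marketing_rows = list(G1.groups["Marketing"])  # index labels in the group
marketing = G1.get_group("Marketing")          # the rows themselves

print(n_groups, sizes.to_dict(), marketing_rows)
```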

88
Q

What do the following methods do where ‘sectors’ is the group object?

sectors["Revenue"].sum()

sectors["Profits"].max()

sectors["Profits"].min()

sectors["Employees"].mean()

sectors[["Revenue", "Profits"]].sum()

A
  • Returns the sum of the ‘Revenue’ column for each group in sectors
  • Returns the max of the ‘Profits’ column for each group in sectors
  • Returns the min of the ‘Profits’ column for each group in sectors
  • Returns the average of the ‘Employees’ column for each group in sectors
  • This is how you choose more than one column and return their sums for each group
89
Q

How do you group by multiple columns?

A

sectors = df.groupby(["Sector", "Industry"])

90
Q

What are the 2 ways to use the .agg() with the groupby object?

sectors.agg(["size", "sum", "mean"])

A

There are 2 ways to use .agg():

  • Passing a dictionary as a parameter
  • Passing a list as a parameter

sectors.agg({"Revenue": ["sum", "mean"],

"Profits": "sum",

"Employees": "mean"})

and

sectors.agg(["size", "sum", "mean"])
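
A runnable sketch of both call styles on a made-up three-company table (all figures are assumptions). The list example uses only "sum" and "mean" to keep the output small:

```python
import pandas as pd

df = pd.DataFrame({
    "Sector": ["Tech", "Tech", "Retail"],
    "Revenue": [100, 50, 80],
    "Profits": [10, 5, 8],
    "Employees": [1000, 500, 800],
})
sectors = df.groupby("Sector")

# Dictionary: choose the aggregations per column.
by_dict = sectors.agg({
    "Revenue": ["sum", "mean"],
    "Profits": "sum",
    "Employees": "mean",
})

# List: apply every aggregation to every column.
by_list = sectors.agg(["sum", "mean"])

print(by_dict)
```

In both results the columns form two levels (column name, aggregation), so a single cell is addressed with a tuple such as ("Revenue", "sum").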

91
Q

import pandas as pd

fortune = pd.read_csv("fortune1000.csv", index_col="Rank")

sectors = fortune.groupby("Sector")

fortune.head(3)

rows = []

for sector, data in sectors:

    highest_revenue_company_in_group = data.nlargest(1, "Revenue")

    rows.append(highest_revenue_company_in_group)

df = pd.concat(rows)

What does this code accomplish?

A