Python Pandas - Udemy Flashcards

1
Q

What are Variables?

A

Names that act as placeholders for values stored in memory.

2
Q

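Are the terms list and array the same in Python?

A

Not strictly. A list is Python's built-in sequence type, while an array usually means a NumPy array (or an array from the array module), which holds elements of a single type. In casual use the two terms are often treated as interchangeable.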

3
Q

What does len(df) return?

A

The number of rows in the DataFrame. For a list, array, or Series, len() returns the number of elements.

4
Q

What is a Dictionary?

A

A data type that stores keys and corresponding values.

A dictionary is represented by { }

5
Q

What is a Series?

A

A Series is a one-dimensional labeled array.

6
Q

How do you convert a list to a series object?

A

pd.Series(my_list), where my_list is the list you want to convert

7
Q

What is the difference between a list and series?

A

A list can only be indexed by integer positions, while the index of a Series can hold any labels you like (strings, dates, and so on).
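
A minimal sketch of the difference, assuming made-up fruit prices and labels (none of these names come from the course):

```python
import pandas as pd

prices = [10.5, 20.0, 7.25]  # a plain list: only positions 0, 1, 2 exist

# The same values as a Series with custom (hypothetical) string labels
labeled = pd.Series(prices, index=["apple", "melon", "kiwi"])

first_from_list = prices[0]     # lists are accessed by integer position
melon_price = labeled["melon"]  # a Series can also be accessed by label
```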

8
Q

What does series.values give us?

A

All the values in the series as a NumPy array

9
Q

What does series.index give us?

A

The index of the series

10
Q

What do

series.sum()
series.product()
series.mean()

return?

A

The sum, product, and mean of the values in the series, respectively.

11
Q

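What does pd.read_csv(usecols=["abc"], squeeze=True) do?

A

It reads only the column 'abc' from the CSV file and returns it as a Series instead of a one-column DataFrame. Note that the squeeze parameter was deprecated in pandas 1.4 and removed in 2.0; in current versions, call .squeeze("columns") on the result of read_csv() instead.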

12
Q

What does x = df.head() or x = df.tail() do?

A

head() and tail() return a new DataFrame holding the first or last five rows of the original DataFrame (or a new Series when called on a Series), so the variable x will contain that new object.

13
Q

What does dir(s) do (where s is a Series)?

A

Gives you a list of the attributes and methods available on that Series.

14
Q

What does sorted(series) do?

A

Returns a new Python list with all the values of the series in ascending order. (Python has no built-in sort() function; the built-in is sorted().)

15
Q

What do

list(series)

dict(series)

return?

A

list(series) turns the values of the series into a list

dict(series) turns the series into a dictionary, with the index labels as keys and the values as values

16
Q

What does

series.is_unique

return?

A

True if all values in the series are unique, otherwise False. It is an attribute, so it is used without parentheses.

17
Q

What does

series.sort_values() do?

A

Sorts the series in ascending order and returns a brand new series. You can also chain further methods on the newly returned series.

e.g. series.sort_values().head() will return the first 5 rows of the newly created series, i.e. the 5 smallest values.
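
A small sketch with invented numbers, showing that sort_values() leaves the original series untouched and that its result can be chained:

```python
import pandas as pd

s = pd.Series([3, 1, 2])

ascending = s.sort_values()                   # new Series, smallest first
descending = s.sort_values(ascending=False)   # new Series, largest first
two_smallest = s.sort_values().head(2)        # chaining on the returned Series

original_values = s.tolist()                  # the original order is unchanged
```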

18
Q

What does the inplace=True parameter do?

A

Makes the change to the existing object in place instead of returning a new one; the method then returns None.

19
Q

What does the statement:

'abc' in series do?

A

Returns a boolean value by checking for 'abc' in the index of the series. If you want to check for 'abc' in the values of the series you must use:

'abc' in series.values
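
To make the index-versus-values distinction concrete, here is a short sketch; the series contents and labels are invented for illustration:

```python
import pandas as pd

s = pd.Series(["abc", "xyz"], index=["first", "second"])

in_index = "abc" in s            # False: "abc" is a value, not an index label
label_in_index = "first" in s    # True: "first" is an index label
in_values = "abc" in s.values    # True: membership test against the values
```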

20
Q

What does

series[-30:-10]

return?

A

The values from the 30th-from-last position up to, but not including, the 10th-from-last position (20 values, if the series has at least 30 elements).

21
Q

What is the difference between

len(series) and series.count() ?

A

len(series) returns the length of the series, including the rows that hold NaN values.

series.count() returns only the number of rows that have values and excludes rows that hold NaNs.
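
A quick sketch of the difference, using an invented series with two missing values:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan])

total_rows = len(s)     # counts every row, NaN or not
non_null = s.count()    # counts only the rows that hold a value
```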

22
Q

Good to remember:

What are some of the mathematical functions available with series?

A

series.sum()
series.mean()
series.std()
series.median()
series.describe()

23
Q

What do

series.idxmax()
series.idxmin()

return?

A

The index labels of the rows that hold the maximum and minimum values in the series, respectively.

A nice way of using this is:

series[series.idxmax()]

will return the same value as

series.max()
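
The equivalence can be checked on a small, made-up series with string labels:

```python
import pandas as pd

s = pd.Series([5, 9, 2], index=["a", "b", "c"])

label_of_max = s.idxmax()           # index label of the largest value
label_of_min = s.idxmin()           # index label of the smallest value
value_via_label = s[s.idxmax()]     # look the maximum up through its label
```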

24
Q

What does

series.value_counts()

do?

A

Returns the number of times each unique value occurs, most frequent first by default.

series.value_counts().sum()

will return the length of the series, the same as len(series), as long as the series has no NaN values (value_counts() leaves NaNs out by default).

Good to remember: value_counts() has the ascending=True/False parameter
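
A sketch of how a missing value affects the total, assuming an invented series with one None entry:

```python
import pandas as pd

s = pd.Series(["a", "b", "a", None, "a"])

counts = s.value_counts()                      # missing values are left out
counted_total = counts.sum()                   # 4, one less than len(s)
total_with_nan = s.value_counts(dropna=False).sum()  # 5, matches len(s)
```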

25
What does series.apply() do?
series.apply() accepts a function as a parameter, applies that function to every value in the series, and returns a new series. e.g. series.apply(lambda stockprice: stockprice + 1)
26
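What does series.map() do?
Performs a VLOOKUP-style lookup between two separate series: each value is replaced by whatever it maps to in a dictionary, in another series (matched on that series' index), or through a function. Values with no match become NaN.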
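A minimal sketch of the lookup behaviour, assuming a hypothetical mapping from team name to office floor:

```python
import pandas as pd

teams = pd.Series(["Legal", "Sales", "Legal", "HR"])

# Hypothetical lookup table; "HR" is deliberately missing from it
floor_by_team = {"Legal": 3, "Sales": 1}

floors = teams.map(floor_by_team)   # unmatched values ("HR") become NaN
```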
27
True or False: The index labels in a pandas Series must be unique
False
28
What are pandas DataFrames?
DataFrames are two-dimensional labeled data structures. Two dimensions means you need two pieces of information to access a particular value: its row and its column.
29
A CSV file contains integer values, but when you read it into a DataFrame they show up as floats. When does this happen?
If some of the values in a column are NaN, pandas converts the entire column to floats. The reason is that NaN is itself a floating-point value, and the default integer dtype has no way to represent a missing value.
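A sketch that reproduces this with two tiny in-memory CSV files (the column names are invented). The last line shows the nullable "Int64" dtype, one way to keep whole numbers alongside missing values:

```python
import io
import pandas as pd

# Two hypothetical CSV files; the second one has a blank score
clean = pd.read_csv(io.StringIO("id,score\n1,10\n2,20\n"))
missing = pd.read_csv(io.StringIO("id,score\n1,10\n2,\n"))

clean_dtype = clean["score"].dtype       # integer column
missing_dtype = missing["score"].dtype   # float column, because of the NaN

# Nullable integer dtype: keeps integers and marks the gap as <NA>
nullable = missing["score"].astype("Int64")
```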
30
What does df.info() return?
Basic info about the dataframe as well as the number of non null values in each column.
31
What does df.axes return?
A list holding the combined result of df.index and df.columns. All three are attributes, so they are used without parentheses.
32
What does df.sum(axis=1) return?
df.sum(axis=1) or df.sum(axis="columns") returns the horizontal, left-to-right total of each row of a DataFrame.
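A small sketch with invented quarterly figures, comparing the two axis values:

```python
import pandas as pd

df = pd.DataFrame({"q1": [1, 2], "q2": [10, 20]}, index=["north", "south"])

row_totals = df.sum(axis=1)   # one total per row, summing across the columns
col_totals = df.sum(axis=0)   # one total per column (the default)
```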
33
How do you extract a single column 'abc' from a DataFrame df?
df["abc"] This command returns a Series.
34
How do you extract multiple columns from a DataFrame?
df[["abc", "def"]] or select = ["abc", "def"] followed by df[select]. Both return the same resulting DataFrame.
35
How do you insert a new column 'Sport' in a DataFrame?
**df["Sport"] = "Basket Ball"** inserts the column Sport at the end of the DataFrame and populates all rows with the value 'Basket Ball'. **df.insert(5, column="Sport", value="Basket Ball")** inserts the 'Sport' column at position 5 (positions are counted from 0, so it becomes the sixth column) with the value 'Basket Ball' in all rows.
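A sketch of both approaches on a tiny invented DataFrame; the "League" column and its value are hypothetical, added only to show the positional insert:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Ann", "Bob"], "Team": ["A", "B"]})

df["Sport"] = "Basket Ball"                  # appended as the last column
df.insert(1, column="League", value="NBA")   # placed at position 1 (zero-based)

column_order = list(df.columns)
```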
36
How do you add 20 to every value in the column 'Salary' of a DataFrame?
df["Salary"].add(20) or df["Salary"] + 20. These are called broadcast operations, and the same approach works with the other arithmetic operations as well. Both return a new Series, so assign the result back to df["Salary"] to keep the change.
37
How do you use value_counts() with a DataFrame?
df["abc"].value_counts().head() Important: value_counts() is normally called on a Series. A DataFrame.value_counts() method exists only in pandas 1.1 and later, where it counts unique rows.
38
How do you remove rows with null values in a DataFrame?
df.dropna() By default this method removes every row that has at least one NaN value in any column.
39
How do you remove a row from a DataFrame only when particular columns have a NaN value?
df.dropna(subset=["column name", "column name 2"])
40
How do you replace NaN values with a particular value in a column of a DataFrame?
df["abc"] = df["abc"].fillna("Hello") This fills all NaN values in the 'abc' column with the string "Hello". Assigning the result back is more reliable than calling fillna(inplace=True) on a selected column.
41
How do you convert the 'Salary' column from float to integer in a DataFrame?
df["Salary"].astype("int") * Remember that all NaNs must be removed or replaced for this method to work * There is no inplace parameter, so you must assign the result back (df["Salary"] = df["Salary"].astype("int")) for the change to be permanent
42
How do you sort a DataFrame?
A DataFrame is sorted by the values of one or more of its columns. df.sort_values("Salary") If the column has NaN values, those rows are placed at the end of the DataFrame by default.
43
How do you sort on multiple columns in a DataFrame?
df.sort_values(["col 1", "col 2"], ascending=[True, False]) This sorts the DataFrame first by col 1 and then by col 2: col 1 in ascending order and col 2 in descending order.
44
How do you convert a string to a Date type?
df["String_Date"] = pd.to_datetime(df["String_Date"])
45
How do you convert a string type to a category type?
df["Management"] = df["Management"].astype("category")
46
How do you filter a DataFrame so that only the rows where Gender = 'Male' are returned as a DataFrame?
df[df["Gender"] == "Male"] or filter = df["Gender"] == "Male" followed by df[filter]
47
How do you filter a DataFrame using more than one condition, e.g. Gender = Male and Team = Marketing?
filter1 = df["Gender"] == "Male" then filter2 = df["Team"] == "Marketing" then df[filter1 & filter2]
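A runnable sketch of combining conditions, on invented data. When the conditions are written inline, each comparison needs its own parentheses because & binds more tightly than ==:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Tom", "Ann", "Raj"],
    "Gender": ["Male", "Female", "Male"],
    "Team": ["Marketing", "Marketing", "Legal"],
})

is_male = df["Gender"] == "Male"
in_marketing = df["Team"] == "Marketing"

both = df[is_male & in_marketing]     # rows matching both conditions
either = df[is_male | in_marketing]   # rows matching at least one condition

# Same filter written inline; the parentheses are required
inline = df[(df["Gender"] == "Male") & (df["Team"] == "Marketing")]
```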
48
Write code to filter a DataFrame where Team = 'Legal', 'Marketing' or 'Sales'
filter = df["Team"].isin(["Legal", "Sales", "Marketing"]) then df[filter] You can also pass a Series into the isin() method, e.g. df["Team"].isin(df2["Team"])
49
What do the isnull() and notnull() methods do?
isnull() returns True for each value that is NaN, else False. notnull() returns True for each value that is not NaN, else False.
50
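Write code to filter Salary >= 60,000 and <= 70,000?
df[df["Salary"].between(60000, 70000)] or x = df["Salary"] >= 60000 and y = df["Salary"] <= 70000 followed by df[x & y]. between() includes both endpoints by default and returns a boolean Series, which is then used to filter df.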
51
What does the ~ symbol do?
It inverts every value in a boolean Series, i.e. True becomes False and False becomes True.
52
What does df["Name"].duplicated() return?
Returns the boolean value True for all duplicate values of the Name column except for the first occurrence, which returns False. If there are 4 Toms it will return 1 False and 3 Trues.
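The four-Toms case can be sketched directly; the keep parameter controls which occurrence is treated as the original:

```python
import pandas as pd

names = pd.Series(["Tom", "Ann", "Tom", "Tom", "Tom"])

default_flags = names.duplicated()            # first occurrence is not flagged
last_flags = names.duplicated(keep="last")    # last occurrence is not flagged
all_flags = names.duplicated(keep=False)      # every repeated value is flagged
```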
53
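Remove duplicate rows from a DataFrame where Name and Team are duplicates
df.drop_duplicates(subset=["Name", "Team"], keep=False, inplace=True) Note that keep=False removes every row of a duplicated Name/Team pair; use keep="first" (the default) to keep one copy.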
54
What do unique() and nunique() do?
unique() returns an array of the unique values, and NaN counts as one of them. nunique() returns an integer count of the unique values. It does not count NaN, because the parameter dropna=True is set by default.
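A short sketch of how NaN is treated by each method, using an invented series:

```python
import numpy as np
import pandas as pd

s = pd.Series(["a", "b", "a", np.nan])

unique_values = s.unique()                 # array that includes NaN
n_without_nan = s.nunique()                # NaN is skipped (dropna=True)
n_with_nan = s.nunique(dropna=False)       # NaN is counted as a value
```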
55
How do you set a particular column as the index of a DataFrame?
df.set_index("Col_name") To reverse the change: df.reset_index()
56
What does the df.loc[] indexer do?
Extracts rows (and optionally columns) using index labels.
57
Extract rows from a DataFrame between positions 18 and 35?
df.iloc[18:36] Note that position 36 will not be returned by iloc[].
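A sketch contrasting the two indexers on an invented DataFrame: a label slice with loc[] includes its end label, while a position slice with iloc[] excludes its end position:

```python
import pandas as pd

df = pd.DataFrame({"Salary": [10, 20, 30, 40]}, index=["a", "b", "c", "d"])

by_label = df.loc["b":"c"]    # label slice: "c" is included
by_position = df.iloc[1:3]    # position slice: position 3 is excluded
```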
58
What is the df.ix[] indexer?
It was a combination of the iloc[] and loc[] indexers that accepted both string labels and integer positions as arguments. Note: when using labels in ix[] with a range or a list, and one of the labels did not exist in the DataFrame, pandas returned a NaN value for the missing label. BUT when using integer positions in ix[] with a range or a list, and one of the positions did not exist in the DataFrame, pandas raised an error for the entire query. ix[] was deprecated in pandas 0.20 and removed in pandas 1.0; use loc[] for labels and iloc[] for positions instead.
59
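How do you write a value to a given row and column in a DataFrame using ix[]?
df.ix["James", "Salary"] = 80000 This changes the James row and Salary column to 80000. In pandas 1.0 and later, where ix[] no longer exists, write df.loc["James", "Salary"] = 80000 instead.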
60
filter = df["Team"] == "Marketing" followed by df.ix[filter, "Team"] = "Online Marketing" What does this piece of code do?
Finds all rows where Team = Marketing and then replaces 'Marketing' with 'Online Marketing' in the DataFrame. In pandas 1.0 and later, use df.loc[filter, "Team"] in place of df.ix[filter, "Team"].
61
How do you change the names of columns in a DataFrame?
df.rename(columns={"Team": "Dept", "Salary": "Compensation"}, inplace=True) rename() accepts a dictionary of old and new names. Pass it as columns=; a dictionary passed positionally is applied to the index labels instead.
62
What are the 3 methods to delete columns from a DataFrame?
df.drop("Team", axis=1, inplace=True) or df.pop("Team"), which removes "Team" from the DataFrame and returns that column as a Series, or del df["Team"]
63
How do you extract 5 random rows from your dataset? Also how do you extract 25% of your data set randomly?
df.sample( n=5) and df.sample(frac =.25)
64
How do you find the 5 highest values in the 'Revenue' column without using a sort method?
df.nlargest(5, "Revenue") or df["Revenue"].nlargest(5) The same syntax can be used for nsmallest() as well.
65
How do you use the string methods on a column of a DataFrame?
All methods must be prefixed with the .str accessor, e.g. df["Name"].str.len(), df["Name"].str.upper(), df["Name"].str.lower(), df["Name"].str.title()
66
Write code to replace 'Mkt' with 'Marketing' in the 'Team' column?
df["Team"] = df["Team"].str.replace("Mkt", "Marketing")
67
What do the following methods do? 1. str.contains() 2. str.startswith() 3. str.endswith()
df["Name"].str.lower().str.contains("john") is True for all rows where Name contains 'john', irrespective of position. df["Name"].str.lower().str.startswith("john") is True for all rows where Name begins with 'john'. df["Name"].str.lower().str.endswith("john") is True for all rows where Name ends with 'john'. Each returns a boolean Series; pass it to df[...] to get the matching rows.
68
What do .str.strip(), .str.lstrip() and .str.rstrip() do?
strip() removes whitespace from both the left and right ends of each string, lstrip() from the left end only, and rstrip() from the right end only.
69
Give an example each of using the string methods on the index and columns of a dataframe?
String methods are called in the same way on the index and columns as well. eg. df.index.str.upper() and df.columns.str.upper()
70
How do you extract the last name from the 'Name' column when it holds the last name followed by the first name, separated by a comma and a space?
df["Name"].str.split(", ").str.get(0).value_counts().head()
71
Write code to extract the first name from the 'Name' column of a DataFrame?
df["Name"].str.split(",").str.get(1).str.strip().str.split(" ").str.get(0).value_counts().head(10)
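The chain is easier to follow step by step. This sketch assumes names stored as "Last, First Middle", which is what the split on "," suggests; the sample names are invented:

```python
import pandas as pd

names = pd.Series(["Smith, John A", "Doe, Jane"])

# Everything before the comma is the last name
last_names = names.str.split(",").str.get(0)

# Everything after the comma, stripped of spaces, then cut at the first space
after_comma = names.str.split(",").str.get(1).str.strip()
first_names = after_comma.str.split(" ").str.get(0)
```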
72
Good to remember about str.split()
In str.split(expand=True, n=2), setting the expand parameter to True returns a DataFrame with one column per piece, and n sets the maximum number of splits.
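A sketch of both parameters together on an invented string; with n=2 only the first two separators are used, so the remainder stays in the last column:

```python
import pandas as pd

s = pd.Series(["a b c d"])

parts = s.str.split(pat=" ", n=2, expand=True)   # DataFrame with 3 columns
pieces = parts.iloc[0].tolist()
```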
73
How do you convert a Series into a list and into a DataFrame?
x = df["NM"].tolist() and y = df["NM"].to_frame()
74
How do you export a DataFrame to a CSV file?
df.to_csv("Tial and Error", index=False, columns=["BRTH_YR", "NM"]) index=False leaves the index out of the file. columns=[...] (lowercase) lets you export only certain columns if you so desire.
75
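How do you read an Excel file with multiple worksheets?
df = pd.read_excel('C:/Users/SHAWN/Desktop/Python Pandas/Data - Multiple Worksheets.xlsx', sheet_name=None) The resulting output will be a dictionary of DataFrames keyed by worksheet name. (The parameter is spelled sheet_name; the older spelling sheetname is no longer accepted.)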
76
How do you set multiple indexes on a DataFrame?
df.set_index(["Date", "Country"], inplace=True) OR you can do it directly while importing the CSV file, like this: df = pd.read_csv('C:/Users/SHAWN/Desktop/bigmac.csv', index_col=["Date", "Country"])
77
How do you access the values of an index level in a multi-index DataFrame?
df.index.get_level_values(0) or df.index.get_level_values("Date")
78
How do you change the names of the index levels in a multi-index DataFrame?
df.index = df.index.set_names(["Day", "Location"]) Tip: if you want the first level to keep its name and only change the second, just pass the first level's existing name in the list.
79
How do you extract a row from a multi-index DataFrame?
df.loc[("2016-10-10", "China")] For a multi-index, .loc[] accepts a tuple as an argument.
80
How do you interchange the rows and columns in a dataframe?
df.transpose()
81
How do you swap the index levels in a multi index dataframe?
df.swaplevel()
82
What do the stack() and unstack() methods do?
stack() moves the column labels into the row index, so the columns are stacked as rows. unstack() does the reverse: it moves the innermost level of the row index into the columns.
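A round-trip sketch on an invented two-by-two DataFrame; the country labels and column names are assumptions for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    {"price": [1.0, 2.0], "qty": [3.0, 4.0]},
    index=["UK", "US"],
)

stacked = df.stack()          # Series indexed by (country, column name)
restored = stacked.unstack()  # column names move back out of the index

us_qty = stacked.loc[("US", "qty")]
```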
83
How do you use groupby() on a DataFrame to group by department?
Group = df.groupby("Dept.") groupby() creates a separate GroupBy object. The GroupBy object is not useful by itself until you call methods on it.
84
How do you find out the number of groups in a GroupBy object called G1?
len(G1)
85
How do you find the number of rows within each group in G1?
G1.size()
86
What does G1.groups return, where G1 is a GroupBy object?
A dictionary that maps each group name to the index labels of the rows that fall within that group.
87
How do you extract all the rows from the 'Marketing' department?
G1 = df.groupby("Dept") followed by G1.get_group("Marketing")
88
What do the following methods do, where 'sectors' is the GroupBy object?
sectors["Revenue"].sum()
sectors["Profits"].max()
sectors["Profits"].min()
sectors["Employees"].mean()
sectors[["Revenue", "Profits"]].sum()
* Returns the sum of the 'Revenue' column for each of the N groups present in sectors
* Returns the max of the 'Profits' column for each of the N groups present in sectors
* Returns the min of the 'Profits' column for each of the N groups present in sectors
* Returns the average of the 'Employees' column for each of the N groups present in sectors
* This is how you choose more than one column and return their sums.
89
How do you group by multiple columns?
sectors = df.groupby(["Sector", "Industry"])
90
What are the 2 ways to use .agg() with the GroupBy object?
There are 2 ways to use .agg():
* Passing a dictionary as a parameter: sectors.agg({"Revenue": ["sum", "mean"], "Profits": "sum", "Employees": "mean"})
* Passing a list as a parameter: sectors.agg(["size", "sum", "mean"])
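A sketch of both forms on a small invented dataset (the sector names and figures are made up, and the list form here uses only "sum" and "mean"). The dictionary form picks aggregations per column; the list form applies every aggregation to every column:

```python
import pandas as pd

df = pd.DataFrame({
    "Sector": ["Tech", "Tech", "Retail"],
    "Revenue": [100, 200, 50],
    "Profits": [10, 30, 5],
})
sectors = df.groupby("Sector")

n_groups = len(sectors)          # number of groups
rows_per_group = sectors.size()  # number of rows in each group

# Dictionary: choose the aggregations column by column
by_dict = sectors.agg({"Revenue": ["sum", "mean"], "Profits": "sum"})

# List: apply each aggregation to every remaining column
by_list = sectors.agg(["sum", "mean"])
```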
91
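import pandas as pd
fortune = pd.read_csv("fortune1000.csv", index_col="Rank")
sectors = fortune.groupby("Sector")
fortune.head(3)
df = pd.DataFrame(columns=fortune.columns)  # empty frame to collect results
for sector, data in sectors:
    highest_revenue_company_in_group = data.nlargest(1, "Revenue")
    # DataFrame.append() was removed in pandas 2.0, so pd.concat() is used here
    df = pd.concat([df, highest_revenue_company_in_group])
**What does this code accomplish?**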