Python Pandas - Udemy Flashcards
What are Variables?
Named placeholders that refer to values stored in memory.
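Are the terms list and array the same in Python?
Not strictly. A list is Python's built-in sequence type and can hold mixed types. An array usually means a NumPy array, which holds elements of a single type and supports vectorized math. pandas accepts both as input, so the terms are often used loosely.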
What does len(df) return?
For a list or array, the number of elements. For a DataFrame df, the number of rows.
What is a Dictionary?
A data type that stores keys and corresponding values.
A dictionary is represented by { }
What is a Series?
A series is a one-dimensional labeled array
How do you convert a list to a series object?
pd.Series(list)
What is the difference between a list and series?
The index of a list can only be numeric positions (0, 1, 2, ...), while the index of a series can be anything you like, such as strings or dates.
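A small sketch of the difference, using made-up fruit data (the names `fruits`, `s_default` and `s_labeled` are illustrative only):

```python
import pandas as pd

fruits = ["apple", "banana", "cherry"]  # a plain list: positions 0, 1, 2 only

# A Series built from the list gets a default 0..n-1 index...
s_default = pd.Series(fruits)

# ...but the index can also be custom labels, such as strings.
s_labeled = pd.Series(fruits, index=["a", "b", "c"])

print(fruits[1])       # list access by position
print(s_labeled["b"])  # Series access by label
```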
what does series.values give us?
All the values in the series as an array
What does series.index give us?
The index of the series
What do
series.sum()
series.product()
series.mean()
return?
the
sum
product
mean
of the series
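A quick sketch of these three methods on a tiny made-up series (the values 2, 3, 4 are chosen only so the arithmetic is easy to check by hand):

```python
import pandas as pd

s = pd.Series([2, 3, 4], index=["x", "y", "z"])

total = s.sum()        # 2 + 3 + 4
product = s.product()  # 2 * 3 * 4
average = s.mean()     # 9 / 3

print(total, product, average)
```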
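What does pd.read_csv(path, usecols=['abc'], squeeze=True) do?
It reads only the column 'abc' from the CSV file at path and returns it as a series instead of a one-column dataframe. Note that usecols expects a list, and that the squeeze parameter was deprecated in pandas 1.4 and removed in 2.0; on current versions use pd.read_csv(path, usecols=['abc']).squeeze('columns').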
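A minimal sketch of the same idea using DataFrame.squeeze("columns") after reading, which does not rely on the squeeze keyword of read_csv. The CSV text here is made up and read from memory instead of a real file:

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = "abc,other\n1,x\n2,y\n3,z\n"

# Read only column 'abc'; the result is still a one-column DataFrame.
df_one_col = pd.read_csv(io.StringIO(csv_text), usecols=["abc"])

# squeeze("columns") turns a one-column DataFrame into a Series.
s = df_one_col.squeeze("columns")

print(type(df_one_col).__name__, type(s).__name__)
```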
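What does x = df.head() or x = df.tail() do?
head() and tail() return a new object holding the first or last 5 rows by default (pass a number to change that). Called on a dataframe they return a dataframe; called on a series they return a series. The variable x holds that new object.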
what does dir(s) do? ( where ‘s’ is a series)
gives you a list of attributes and methods available with that series.
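what does sorted(series) do?
Python's built-in sorted() returns a new list with all the values of the series in ascending order. The series itself is unchanged and the index labels are not kept.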
what does
list(series)
dict(series)
do?
list(series) turns the series into a list
dict(series) turns the series into a dictionary
what does
series.is_unique
do?
returns True or False to show if all values in the series are unique
What does
series.sort_values() do?
sorts the series in ascending order (pass ascending=False for descending) and returns a brand new series; the original is unchanged. You can also chain other methods on the newly returned series
eg. series.sort_values().head() will return the first 5 rows of the sorted series, i.e. the 5 smallest values.
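A short sketch with made-up numbers, showing that sort_values() returns a new series and can be chained with head():

```python
import pandas as pd

s = pd.Series([30, 10, 20], index=["a", "b", "c"])

sorted_s = s.sort_values()                            # new Series, ascending
smallest_two = s.sort_values().head(2)                # first rows of the sorted result
largest_two = s.sort_values(ascending=False).head(2)  # descending order instead

print(sorted_s)
print(s)  # the original order is untouched
```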
What does the inplace=True parameter do?
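Modifies the existing series (or dataframe) in place instead of returning a new object. The method then returns None, so do not assign its result back to a variable.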
What does the statement:
'abc' in series do?
returns a boolean value by checking for 'abc' in the index of the series, not in its values. If you want to check for 'abc' in the values of the series you must use:
'abc' in series.values
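A small sketch of the index-versus-values check. The labels "k1" and "k2" are made up for illustration:

```python
import pandas as pd

s = pd.Series(["abc", "xyz"], index=["k1", "k2"])

in_index = "abc" in s                  # checks the index labels
label_in_index = "k1" in s             # "k1" is an index label
in_values = bool("abc" in s.values)    # checks the stored values

print(in_index, label_in_index, in_values)
```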
what does
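series[-30:-10]
return?
returns the values from position -30 up to, but not including, position -10 (counted from the end), which is 20 values if the series has at least 30 rows. series.iloc[-30:-10] is the explicit positional form.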
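A sketch of negative slicing on a made-up series of 100 numbers. It uses .iloc, the explicitly positional indexer, so the result does not depend on what the index labels are:

```python
import pandas as pd

s = pd.Series(range(100))  # values 0..99

# Positions -30 up to (not including) -10, counted from the end.
chunk = s.iloc[-30:-10]

print(len(chunk), chunk.iloc[0], chunk.iloc[-1])
```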
What is the difference between
len(series) and series.count() ?
len(series) returns the length of the series including the rows that hold NaN values.
series.count() only returns a count of the rows that have values and excludes rows that hold NaN
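A tiny sketch of the difference, using a made-up series with two missing values:

```python
import pandas as pd

s = pd.Series([1.0, float("nan"), 3.0, float("nan")])

total_rows = len(s)         # counts every row, NaN included
non_null_rows = s.count()   # counts only rows that hold a value

print(total_rows, non_null_rows)
```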
Good to remember:
What are some of the mathematical functions available with series?
series.sum()
series.mean()
series.std()
series.median()
series.describe()
What do
series.idxmax()
series.idxmin()
return?
idxmax() returns the index label of the maximum value and idxmin() returns the index label of the minimum value in the series.
Nice way of using this is:
series[series.idxmax()]
will return the same value as
series.max()
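A quick check of this on made-up data with string labels:

```python
import pandas as pd

s = pd.Series([5, 9, 2], index=["a", "b", "c"])

label_of_max = s.idxmax()  # label where the largest value sits
label_of_min = s.idxmin()  # label where the smallest value sits

# Looking up the label gives back the value itself.
print(s[label_of_max], s.max())
```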
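What does
series.value_counts()
do?
returns the number of times each unique value occurs, sorted from most to least frequent. NaN values are excluded by default.
series.value_counts().sum()
will return the number of non-NaN values, the same as series.count(). It equals len(series) only when the series has no NaN values.
Good to remember that value_counts() has the ascending=True/False parameter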
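A sketch on made-up data with one missing value, to show how value_counts() relates to count() and len():

```python
import pandas as pd

s = pd.Series(["x", "y", "x", None, "x"])

counts = s.value_counts()                    # most frequent first, missing value dropped
counts_asc = s.value_counts(ascending=True)  # least frequent first
with_missing = s.value_counts(dropna=False)  # keep the missing value as its own row

print(counts)
print(counts.sum(), s.count(), len(s))
```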
What does series.apply()
do?
series.apply() accepts a function as a parameter and then applies that function to all the values in the series.
eg:
series.apply(lambda stockprice: stockprice + 1)
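What does
series.map()
do?
performs a VLOOKUP-type operation: each value in the series is replaced by the matching entry in a mapping. The mapping can be a dictionary, a function, or a second series whose index holds the lookup keys.
Values that have no match in the mapping become NaN.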
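A minimal sketch of a lookup with map(), assuming a made-up list of employee names and a second series that acts as the lookup table (its index holds the keys). All names and departments are hypothetical:

```python
import pandas as pd

employees = pd.Series(["Ann", "Bob", "Cid"])

# Lookup table: index = name, value = department.
departments = pd.Series({"Ann": "Sales", "Bob": "IT"})

# Each name is replaced by its department; "Cid" has no entry.
looked_up = employees.map(departments)

# A plain dictionary works the same way.
looked_up_dict = employees.map({"Ann": "Sales", "Bob": "IT"})

print(looked_up)
```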
True or False:
The index labels in a pandas Series must be unique
False. Duplicate index labels are allowed; selecting a duplicated label returns all the matching rows.
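A short sketch with a made-up series where the label "a" appears twice:

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["a", "a", "b"])

labels_unique = s.index.is_unique  # are all index labels distinct?
rows_for_a = s["a"]                # a Series holding both rows labeled "a"
row_for_b = s["b"]                 # a single value

print(labels_unique)
print(rows_for_a)
```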
What are pandas DataFrames?
A DataFrame is a 2-dimensional labeled data structure made of rows and columns. What does 2 dimensions mean? It means you need 2 pieces of information to access a particular value, i.e. its row and its column.
A CSV file contains integer values but when you read it into a dataframe a column shows up as float. When?
When some of the values in the column are missing. pandas stores them as NaN, which is a floating-point value, and the default integer type has no way to represent a missing value, so the entire column is converted to float.
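A sketch that reproduces this with made-up CSV text read from memory. The last step uses the nullable "Int64" dtype, which is one way to keep integers alongside missing values; treat it as an optional extra, not something the card itself covers:

```python
import io
import pandas as pd

# Hypothetical CSV: the 'score' column has one empty cell.
csv_text = "id,score\n1,10\n2,\n3,30\n"
df = pd.read_csv(io.StringIO(csv_text))

print(df.dtypes)  # 'id' stays integer, 'score' becomes float

# Optional: the nullable integer dtype keeps whole numbers and marks the gap as <NA>.
nullable_score = df["score"].astype("Int64")
print(nullable_score)
```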
What does
df.info()
return?
Basic info about the dataframe as well as the number of non null values in each column.
What does
df.axes
return?
returns a list holding the combined result of
df.index and df.columns (axes, index and columns are attributes, so they take no parentheses)
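A quick sketch on a made-up two-column dataframe, showing that the axes attribute holds the row index first and the column index second:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

axes = df.axes  # attribute access, no parentheses

print(axes[0])  # same object type and content as df.index
print(axes[1])  # same object type and content as df.columns
```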
What does
df.sum(axis=1)
return?
df.sum(axis=1)
or
df.sum(axis="columns")
returns the horizontal, left-to-right total of each row of a dataframe. The default, axis=0 (or axis="index"), returns the vertical total of each column.
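A small sketch of both directions on made-up quarterly numbers (the column names q1 and q2 are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"q1": [1, 2], "q2": [10, 20]})

row_totals = df.sum(axis=1)               # across the columns, one total per row
row_totals_named = df.sum(axis="columns") # same thing, spelled out
col_totals = df.sum(axis=0)               # down the rows, one total per column

print(row_totals)
print(col_totals)
```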
How to extract a single column ‘abc’ from a Dataframe df?
df["abc"]
this command returns a series
How do you extract multiple columns from a DataFrame?
df[["abc", "def"]]
or
select = ["abc", "def"]
df[select]
both of the above return the same resulting DataFrame
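A sketch of single and multiple column selection on a made-up dataframe, showing the type each form returns:

```python
import pandas as pd

df = pd.DataFrame({"abc": [1, 2], "def": [3, 4], "ghi": [5, 6]})

one_col = df["abc"]             # single brackets + one name -> Series
two_cols = df[["abc", "def"]]   # a list of names -> DataFrame

select = ["abc", "def"]
two_cols_again = df[select]     # same result via a variable

print(type(one_col).__name__, type(two_cols).__name__)
```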
How do you insert a new column 'Sport' in a DataFrame?
df["Sport"] = "Basket Ball"
inserts the column Sport at the end of the DataFrame and populates all rows with the value 'Basket Ball'
df.insert(5, column="Sport", value="Basket Ball")
this inserts the 'Sport' column at position 5 (positions are counted from 0, so it becomes the sixth column) with the value 'Basket Ball' in all rows. insert() changes the DataFrame in place.
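A sketch of both approaches on a small made-up dataframe. Because it only has two columns, the example inserts at position 1 instead of 5; the copies are there only so each approach starts from the same data:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Ann", "Bob"], "Salary": [100, 200]})

# Approach 1: assignment appends the new column at the end.
appended = df.copy()
appended["Sport"] = "Basket Ball"

# Approach 2: insert() places the column at a zero-based position, in place.
inserted = df.copy()
inserted.insert(1, column="Sport", value="Basket Ball")

print(list(appended.columns))
print(list(inserted.columns))
```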
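How do you add 20 to every value in the column 'Salary' of a dataframe?
df["Salary"].add(20)
or
df["Salary"] + 20
Both return a new series and leave df unchanged; assign the result back to df["Salary"] to keep it. These are called broadcasting operations, and the same pattern works with the other arithmetic methods and operators as well (sub, mul, div, ...).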
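A sketch with made-up salaries, showing that the result is a new series until it is assigned back to the column:

```python
import pandas as pd

df = pd.DataFrame({"Salary": [100, 200]})

raised_method = df["Salary"].add(20)  # method form
raised_operator = df["Salary"] + 20   # operator form, same result

original_after = df["Salary"].tolist()  # df has not changed yet

# Assign back to store the new values in the dataframe.
df["Salary"] = df["Salary"] + 20

print(df)
```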