3.1 DataFrame Basics Flashcards
What is “pandas”?
Pandas is a third‑party Python library that gives you types and
functions for working with tabular data.
What is a “DataFrame”?
A DataFrame is a container type, like a list or dict, that holds a single data table.
One column of a DataFrame is its own type, called a Series, which you’ll also sometimes use.
Why does “path” have to be imported?
Though path is part of the standard library (it lives in the os module, so no third-party installation is necessary), we still have to import it in order to use it, typically with "from os import path".
How do you import “pandas”?
“import pandas as pd”
When importing Pandas, the convention is to import it under the name pd.
This lets us use any Pandas function by typing pd. (pd followed by a period) and then the name of the function, e.g. pd.read_csv.
What is the function used to read a CSV file?
“pd.read_csv”
What path method can be used as a clean way of reading files?
First grab the path to the folder that holds the files you want to load, and put it in a variable (e.g. DATA_DIR).
Then use path.join(), which joins the DATA_DIR path and a single string (e.g. 'shots.csv'), the name of the particular file you want to open, into one full path.
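A minimal sketch of the two steps above. So that it runs anywhere, it writes a tiny made-up shots.csv into a temporary folder first; the two sample rows are invented for illustration, and in practice DATA_DIR would point at your own data folder.

```python
import tempfile
from os import path

import pandas as pd

# Assumption: a temporary folder stands in for your real data folder.
DATA_DIR = tempfile.mkdtemp()

# Write a tiny made-up CSV so the example is self-contained.
with open(path.join(DATA_DIR, 'shots.csv'), 'w') as f:
    f.write('name,foot,goal,period\n')
    f.write('Player A,right,False,1H\n')
    f.write('Player B,left,True,2H\n')

# Join the folder and the file name, then read the file into a DataFrame.
shots = pd.read_csv(path.join(DATA_DIR, 'shots.csv'))
print(type(shots))
```

Using path.join() rather than gluing strings together means you do not have to worry about the path separator yourself.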
How do you check the “type”?
print(type(shots))
(where “shots” is the var whose type you want to check)
What method allows you to print the first rows of data (default = 5)?
shots.head() = first 5 rows
shots.head(x) = first x rows
(where “shots” is the var where data is loaded)
What attribute allows you to output all the column names?
print(shots.columns)
(where “shots” is the var where data is loaded)
What attribute allows you to output the number of rows and columns?
print(shots.shape)
(where “shots” is the var where the data is loaded)
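The three inspection cards above can be tried together on a small in-memory table. This is only a sketch: the rows below are made up and stand in for the real shots data.

```python
import pandas as pd

# Made-up sample rows standing in for the real shots data.
shots = pd.DataFrame({
    'name': ['Player A', 'Player B', 'Player C', 'Player D', 'Player E', 'Player F'],
    'foot': ['right', 'left', 'right', 'head', 'left', 'right'],
    'goal': [False, True, False, False, True, False],
    'period': ['1H', '1H', '2H', '2H', '2H', '1H'],
})

print(shots.head())     # first 5 rows
print(shots.head(2))    # first 2 rows
print(shots.columns)    # column labels
print(shots.shape)      # (number of rows, number of columns)
```

Note that head is called with parentheses, while columns and shape are written without them.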
How do you refer to a single column in a DataFrame?
print(shots['name'].head())
where 'name' is a column, and “shots” is the var where the data is loaded
Referring to a single column in a DataFrame is similar to looking up a value in a dictionary: you put
the name of the column (usually a string) in brackets.
What is the type() when you retrieve a single column from a DataFrame?
A single column is a Series, not a DataFrame (a technical distinction).
You can check by using type(shots['name'])
where 'name' is a single column in the DataFrame “shots”
How can a Series be turned into a one-column DataFrame?
Calling the to_frame method will turn any Series into a one-column DataFrame.
In:
type(shots['name'].to_frame().head())
Out:
pandas.core.frame.DataFrame
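A small sketch of the Series to DataFrame round trip, using a made-up two-row table in place of the real shots data:

```python
import pandas as pd

# Made-up sample rows standing in for the real shots data.
shots = pd.DataFrame({'name': ['Player A', 'Player B'], 'goal': [False, True]})

names = shots['name']          # a single column comes back as a Series
names_df = names.to_frame()    # to_frame wraps it in a one-column DataFrame

print(type(names))
print(type(names_df))
print(names_df.shape)          # still 2 rows, now with exactly 1 column
```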
How do you refer to multiple columns in a DataFrame?
To refer to multiple columns in a DataFrame, you pass it a list. The result, unlike the single column
case, is another DataFrame.
shots[['name', 'foot', 'goal', 'period']].head()
where 'name', 'foot', 'goal' and 'period' are columns in the DataFrame “shots”
What is important to remember when calling a list of columns from a DataFrame?
One column:
shots['name']
Multiple columns:
shots[['name', 'foot', 'goal', 'period']]
Selecting multiple columns takes double brackets because you’re putting a list of your column names inside another pair of brackets.
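To see the difference in practice, here is a sketch on a made-up table (the rows are invented for illustration). It also shows that a list containing just one column name still gives back a DataFrame.

```python
import pandas as pd

# Made-up sample rows standing in for the real shots data.
shots = pd.DataFrame({
    'name': ['Player A', 'Player B'],
    'foot': ['right', 'left'],
    'goal': [False, True],
})

one = shots['name']               # single brackets, a string: Series
many = shots[['name', 'foot']]    # double brackets, a list: DataFrame
one_in_list = shots[['name']]     # a one-item list is still a list: DataFrame

print(type(one))
print(type(many))
print(type(one_in_list))
```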