Pandas Flashcards
pandas
The most popular Python library for data analysis
What are the two core objects in pandas?
DataFrame
Series
Line of code to import pandas
import pandas as pd
DataFrame
A DataFrame is a table. It contains an array of individual entries, each of which has a certain value and corresponds to a row and a column
It can be thought of as a bunch of Series joined together
pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})
   Yes   No
0   50  131
1   21    2
pd.DataFrame()
This is a standard method of producing dataframes
Within the brackets you place a dictionary whose keys are the column names and whose values are lists of entries
The list of row labels is known as the index. Row names can be assigned by passing a list to the index parameter of the constructor, after the dictionary
pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'],
'Sue': ['Pretty good.', 'Bland.']},
index=['Product A', 'Product B'])
Bob Sue
Product A I liked it. Pretty good.
Product B It was awful. Bland.
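As a rough illustration of the idea that a DataFrame is a set of Series sharing one index, the sketch below rebuilds the table above and inspects its parts. The variable name reviews is only an illustrative choice.

```python
import pandas as pd

# Dictionary keys become column names; the index parameter supplies row labels.
reviews = pd.DataFrame(
    {'Bob': ['I liked it.', 'It was awful.'],
     'Sue': ['Pretty good.', 'Bland.']},
    index=['Product A', 'Product B'],
)

print(reviews.index.tolist())    # row labels
print(reviews.columns.tolist())  # column names

# Each column of the table is itself a Series that shares the same index.
bob_column = reviews['Bob']
print(type(bob_column).__name__)
```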
Series
A sequence of data values
If a DataFrame is a table, a Series is a list. It is in essence a single column of a DataFrame
You can assign row labels with the index parameter, as with a DataFrame, and the single column name can be assigned with the name parameter
e.g. pd.Series([1, 2, 3], index=['Mon', 'Tue', 'Wed'], name='Date')
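A minimal sketch of a named Series, assuming the name is meant to be the string 'Date'. The variable names days and as_table are illustrative only.

```python
import pandas as pd

# index gives the row labels; name gives the single column name.
days = pd.Series([1, 2, 3], index=['Mon', 'Tue', 'Wed'], name='Date')

print(days.name)    # the column name
print(days['Tue'])  # look up a value by its row label

# Converting to a DataFrame shows the Series as one named column.
as_table = days.to_frame()
print(as_table.columns.tolist())
```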
Reading a dataset into a DataFrame and checking the shape of the data
We use pd.read_csv(file_name), which returns a DataFrame
We then use the .shape attribute to show the shape of the data, e.g. DataFrameName.shape
This returns a tuple in the form (number of rows, number of columns)
DataFrameName.head()
Shows the first 5 rows of the DataFrame by default. Pass a number, e.g. DataFrameName.head(10), to change how many rows are shown
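The reading, shape, and head cards can be tried together without a file on disk. This sketch assumes a small made-up dataset held in memory; the names csv_text and cities and the column names are hypothetical.

```python
import io
import pandas as pd

# Stand-in for a CSV file: read_csv also accepts file-like objects.
rows = "\n".join(f"city_{i},{i * 1000}" for i in range(8))
csv_text = "city,population\n" + rows

cities = pd.read_csv(io.StringIO(csv_text))

print(cities.shape)    # (number of rows, number of columns)
print(cities.head())   # first 5 rows by default
print(cities.head(3))  # head also accepts a row count
```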
Saving a DataFrame as a csv
DataFrameName.to_csv('file_name.csv')
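A short sketch of saving, using made-up data. To avoid touching the disk it calls to_csv without a path, in which case pandas returns the CSV text as a string. The names scores, csv_text, and restored are illustrative.

```python
import io
import pandas as pd

scores = pd.DataFrame({'name': ['Ann', 'Ben'], 'score': [90, 85]})

# By default to_csv also writes the index as a first column;
# index=False leaves it out.
csv_text = scores.to_csv(index=False)
print(csv_text)

# Reading the text back gives the same columns and values.
restored = pd.read_csv(io.StringIO(csv_text))
print(restored)
```

With a path, e.g. scores.to_csv('file_name.csv', index=False), the same text is written to that file instead.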
How to access the data held in a column of a dataframe? / How to access a series of a dataframe?
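We can access it the same way we would access a value in a dictionary, e.g. DataFrameName['column_name']
Or we can use dot notation, which is like accessing an attribute/property of an object. This only works when the column name is a valid Python identifier (no spaces) and does not clash with an existing DataFrame attribute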
DataFrameName.column_name
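Both access styles can be compared side by side. This is a small sketch on made-up data; the column name 'Overall rating' is a hypothetical example of a name that dot notation cannot reach.

```python
import pandas as pd

reviews = pd.DataFrame(
    {'Bob': ['I liked it.', 'It was awful.'],
     'Sue': ['Pretty good.', 'Bland.']},
    index=['Product A', 'Product B'],
)

by_key = reviews['Bob']  # dictionary-style access
by_attr = reviews.Bob    # dot notation
print(by_key.equals(by_attr))

# A column name containing a space can only be reached with brackets.
reviews['Overall rating'] = [4, 1]
ratings = reviews['Overall rating']
print(ratings)
```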
Accessing an individual data point from a DataFrame?
We use chaining of indices
e.g. DataFrameName['column_name'][0] returns the entry of that column in the row labelled 0
We are following the order column first, row second
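A minimal sketch of chained indexing on the Yes/No table, assuming the default integer row labels. The variable names are illustrative.

```python
import pandas as pd

votes = pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})

# Column first: votes['No'] is a Series. Row second: [0] picks the row labelled 0.
first_no = votes['No'][0]
print(first_no)

# The same cell in one step, for comparison (row first, column second).
same_cell = votes.loc[0, 'No']
print(same_cell)
```

Chaining is fine for reading a value. For assigning a value, pandas recommends a single loc or iloc call instead.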
Using iloc
iloc is one of pandas' indexers for retrieving rows and columns from a DataFrame. It works in the opposite order: row first, column second
It selects by integer position, using indices the same way Python normally does (zero-based, with the end of a slice excluded)
e.g. DataFrameName.iloc[row index, column index]
We can still use slicing, and we can also pass a list of indices
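The forms of position-based selection mentioned above might look like this. The data and the row labels 'a', 'b', 'c' are made up, chosen to show that iloc ignores labels and counts positions.

```python
import pandas as pd

votes = pd.DataFrame(
    {'Yes': [50, 21, 7], 'No': [131, 2, 40]},
    index=['a', 'b', 'c'],
)

single = votes.iloc[0, 1]          # row position 0, column position 1
first_two_yes = votes.iloc[:2, 0]  # slice end is excluded, as in plain Python
picked = votes.iloc[[0, 2]]        # a list of row positions
last_row = votes.iloc[-1]          # negative positions count from the end

print(single)
print(first_two_yes.tolist())
print(picked)
print(last_row)
```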
Using loc
Similar to iloc, and again row first, column second, but it selects by label rather than by position
e.g. DataFrameName.loc[row label, 'column_name']
loc treats slices inclusively, e.g. [0:10] returns all the rows labelled 0 to 10, including both 0 and 10
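The inclusive slicing of loc is easiest to see next to iloc. A small sketch on made-up data with default integer row labels; the names numbers, by_label, and by_position are illustrative.

```python
import pandas as pd

# 12 rows, labelled 0 to 11 by default.
numbers = pd.DataFrame({'value': range(100, 112)})

by_label = numbers.loc[0:10]      # labels 0 through 10, both ends included
by_position = numbers.iloc[0:10]  # positions 0 through 9, end excluded

print(len(by_label), len(by_position))

# A single cell: row label first, column name second.
cell = numbers.loc[3, 'value']
print(cell)
```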
Manipulating the index by setting the values of a column as the index
We can use the set_index() method
e.g. DataFrameName.set_index('column_name', inplace=True)
Setting inplace=True makes the method modify the DataFrame itself instead of returning a new one
The values of the selected column become the new row labels
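A sketch of both ways of calling pandas' DataFrame.set_index method, on made-up data. The names products and indexed and the column names are illustrative.

```python
import pandas as pd

products = pd.DataFrame({'product': ['A', 'B'], 'price': [3.5, 4.0]})

# Without inplace, set_index returns a new DataFrame and leaves the original alone.
indexed = products.set_index('product')
print(products.columns.tolist())  # the original still has both columns

# With inplace=True, the original is modified and the call returns None.
products.set_index('product', inplace=True)
print(products.index.tolist())

# The former column values now work as row labels.
price_of_b = products.loc['B', 'price']
print(price_of_b)
```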
Using pipe | and ampersand & for conditional selection
& is used when we want to select data that has this property AND that property
| is used when we want to select data that has this property OR that property
Each condition must be wrapped in parentheses, e.g. DataFrameName[(condition 1) & (condition 2)]
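A minimal sketch of conditional selection with & and |, on made-up data. The name wines and its columns are hypothetical. Each condition sits in its own parentheses because & and | bind more tightly than comparisons such as == and >=.

```python
import pandas as pd

wines = pd.DataFrame({
    'country': ['Italy', 'France', 'Italy', 'Spain'],
    'points': [92, 88, 85, 95],
})

is_italian = wines['country'] == 'Italy'
is_top_rated = wines['points'] >= 90

# AND: rows that satisfy both conditions.
both = wines[(wines['country'] == 'Italy') & (wines['points'] >= 90)]

# OR: rows that satisfy at least one condition.
either = wines[is_italian | is_top_rated]

print(both)
print(either)
```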