Pandas Flashcards

1
Q

pandas

A

A widely used open-source Python library for data manipulation and analysis

2
Q

What are the two core objects in pandas?

A

DataFrame

Series

3
Q

Line of code to import pandas

A

import pandas as pd

4
Q

DataFrame

A

A DataFrame is a table. It contains an array of individual entries, each of which has a certain value and corresponds to a row and a column

It can be thought of as a bunch of Series joined together

pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})

   Yes   No
0   50  131
1   21    2
5
Q

pd.DataFrame()

A

This is the standard way of creating a DataFrame

Inside the parentheses you pass a dictionary whose keys are the column names and whose values are lists of the entries

The list of row labels is known as the index. It is passed as a separate index argument (not inside the dictionary) to name the rows

pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'],
              'Sue': ['Pretty good.', 'Bland.']},
             index=['Product A', 'Product B'])

                     Bob           Sue
Product A    I liked it.  Pretty good.
Product B  It was awful.        Bland.

6
Q

Series

A

A sequence of data values

If a DataFrame is a table, a Series is a list. It is in essence a single column of a DataFrame

You can assign row labels with the index argument, just as with a DataFrame, and the single column name can be assigned with the name argument

e.g. pd.Series([1, 2, 3], index=['Mon', 'Tue', 'Wed'], name='Date')

7
Q

Reading a dataset into a DataFrame and checking the shape of the data

A

We use pd.read_csv(file_name) to read the file into a DataFrame

We then use the .shape attribute (no parentheses) to check the size of the data, e.g. dataframe_name.shape

This returns a tuple in the form (number of rows, number of columns)
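A minimal sketch of reading a CSV and checking its shape. To keep it self-contained it reads from an in-memory string via io.StringIO instead of a file on disk; the column names and values are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV contents; in practice you would pass a file path instead
csv_text = """country,points,price
Italy,87,15.0
Portugal,87,
US,87,14.0
"""

# read_csv accepts a path or any file-like object; the empty price becomes NaN
reviews = pd.read_csv(io.StringIO(csv_text))

# .shape is an attribute: (number of rows, number of columns)
print(reviews.shape)  # (3, 3)
```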

8
Q

DataFrameName.head()

A

Shows the first 5 rows of the DataFrame by default; pass a number, e.g. DataFrameName.head(10), to show a different number of rows

9
Q

Saving a DataFrame as a csv

A

DataFrameName.to_csv('file_name.csv')
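A small sketch of what to_csv writes. When no path is given, to_csv returns the CSV text as a string, which makes it easy to inspect without creating a file; the tiny DataFrame here is made up. The index=False option leaves out the row labels.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2]})

# With no path, to_csv returns the CSV as a string instead of writing a file
with_index = df.to_csv()

# index=False omits the row labels column
without_index = df.to_csv(index=False)

print(with_index.splitlines())     # [',a', '0,1', '1,2']
print(without_index.splitlines())  # ['a', '1', '2']
```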

10
Q

How to access the data held in a column of a dataframe? / How to access a series of a dataframe?

A

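We can access it the same way we would access a value in a dictionary, e.g. DataFrameName['column_name']

Or we can use dot notation, which is like accessing an attribute/property of a class

DataFrameName.column_name

Dot notation only works when the column name is a valid Python identifier (no spaces, for example) and does not clash with an existing DataFrame attribute or method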
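A quick sketch showing that both access styles return the same Series. The column names and data are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"country": ["Italy", "Portugal"], "points": [87, 85]})

# Bracket (dictionary-style) access works for any column name
by_key = df["country"]

# Dot (attribute-style) access only works for identifier-like names
by_attr = df.country

print(by_key.equals(by_attr))  # True
```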

11
Q

Accessing an individual data point from a DataFrame?

A

We use chained indexing

e.g. DataFrameName['column_name'][index_label]

The order is column first, row second (with the default index the row labels are simply 0, 1, 2, ...)
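A minimal sketch of chained indexing on made-up data. This is fine for reading a value; for assigning a value, a single .loc call is the safer choice, because chained assignment may act on a temporary copy.

```python
import pandas as pd

df = pd.DataFrame({"country": ["Italy", "Portugal", "US"]})

# Column first, then the row label (0, 1, 2 with the default index)
value = df["country"][1]
print(value)  # Portugal
```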

12
Q

Using iloc

A

iloc is one of pandas' indexers for retrieving rows, columns or individual values from a DataFrame. Unlike chained indexing it works the opposite way: row first, column second

It selects by integer position, the same way Python normally indexes lists: positions start at 0 and slices exclude the end

e.g. DataFrameName.iloc[row_position, column_position]

We can still use slicing and negative positions, and we can also pass a list of positions
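A short sketch of iloc on made-up data. The index labels are deliberately letters, to show that iloc ignores them and works purely by position.

```python
import pandas as pd

df = pd.DataFrame(
    {"country": ["Italy", "Portugal", "US", "Spain"], "points": [87, 85, 90, 88]},
    index=["a", "b", "c", "d"],
)

# Row position first, column position second
first_cell = df.iloc[0, 0]        # 'Italy'

# Slices exclude the end position: rows 0, 1 and 2 of column 0
first_three = df.iloc[:3, 0]

# Lists of positions select several rows/columns at once
picked = df.iloc[[0, 2], [1]]     # rows 'a' and 'c', column 'points'

# Negative positions count from the end
last_row = df.iloc[-1]
```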

13
Q

Using loc

A

Similar to iloc and again a way of accessing data, but it selects by label rather than by position, which is often more convenient because you can use the column names directly

DataFrameName.loc[row_label, 'column_name']

loc treats slices differently: they are inclusive, e.g. [0:10] returns all the rows labelled 0 to 10, including both 0 and 10
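A sketch contrasting loc and iloc slicing on the same made-up DataFrame with the default 0, 1, 2, ... index.

```python
import pandas as pd

df = pd.DataFrame({"country": ["Italy", "Portugal", "US", "Spain"], "points": [87, 85, 90, 88]})

# loc uses labels: row label first, column name second
cell = df.loc[0, "country"]              # 'Italy'

# Label slices in loc include both endpoints: labels 0, 1 and 2
loc_slice = df.loc[0:2, "country"]

# Position slices in iloc exclude the end: positions 0 and 1
iloc_slice = df.iloc[0:2, 0]

print(len(loc_slice), len(iloc_slice))   # 3 2
```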

14
Q

Manipulating the index by making a column's values the row labels

A

We can use the set_index() method

e.g. DataFrameName.set_index('column_name', inplace=True)

Setting inplace=True makes the method modify the DataFrame itself; without it, set_index() returns a new DataFrame and leaves the original unchanged

The values of the selected column become the new row labels (the index)
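A sketch of set_index with and without inplace, on made-up data. Note that with inplace=True the method returns None, so its result should not be assigned.

```python
import pandas as pd

df = pd.DataFrame({"title": ["Wine A", "Wine B"], "points": [87, 90]})

# Without inplace, a new DataFrame is returned and df is unchanged
indexed = df.set_index("title")

# With inplace=True, df itself is modified (the call returns None)
df.set_index("title", inplace=True)

print(df.index.tolist())  # ['Wine A', 'Wine B']
```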

15
Q

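Using pipe | and ampersand & for conditional selection

A

& is used when we want to select data that has this property AND that property

| is used when we want to select data that has this property OR that property

Each condition must be wrapped in parentheses, e.g. DataFrameName.loc[(DataFrameName.column_a == 'x') & (DataFrameName.column_b >= 90)]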
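A sketch of combining conditions with & and | inside loc, on made-up data. The parentheses are required because & and | bind more tightly than comparison operators such as == and >=.

```python
import pandas as pd

df = pd.DataFrame({"country": ["Italy", "Italy", "US", "France"], "points": [95, 80, 92, 85]})

# AND: rows from Italy that also scored at least 90
italy_and_high = df.loc[(df.country == "Italy") & (df.points >= 90)]

# OR: rows from Italy, or rows that scored at least 90
italy_or_high = df.loc[(df.country == "Italy") | (df.points >= 90)]

print(italy_and_high.index.tolist())  # [0]
print(italy_or_high.index.tolist())   # [0, 1, 2]
```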

16
Q

isin() method

A

This lets you select data whose value is in a given list of values

We could combine this with loc as follows

DataFrameName.loc[DataFrameName.column_name.isin([list of values we want])]
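A minimal sketch of isin() inside loc, using made-up country data.

```python
import pandas as pd

df = pd.DataFrame({"country": ["Italy", "US", "France", "Spain"]})

# isin() returns a boolean Series: True where the value is in the list
selected = df.loc[df.country.isin(["Italy", "France"])]

print(selected.country.tolist())  # ['Italy', 'France']
```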

17
Q

isnull() method and notnull() method

A

isnull() identifies the rows that have no data (NaN or None) in a specific column; notnull() identifies those that do have data

e.g. DataFrameName.loc[DataFrameName.column_name.isnull()]
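A sketch of isnull() and notnull() as row filters, on made-up data with one missing price.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"country": ["Italy", "Portugal", "US"], "price": [15.0, np.nan, 14.0]})

# Rows where price is missing
missing_price = df.loc[df.price.isnull()]

# Rows where price is present
has_price = df.loc[df.price.notnull()]

print(missing_price.country.tolist())  # ['Portugal']
```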

18
Q

describe() method

A

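Provides a high-level summary of a column (or of a whole DataFrame). For numeric data this includes the count, mean, standard deviation, min, quartiles and max; for text data it shows the count, the number of unique values, the most common value (top) and its frequency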

e.g. DataFrameName.column_name.describe()
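A sketch showing how the output of describe() depends on the column's type, using made-up data.

```python
import pandas as pd

df = pd.DataFrame({"points": [80, 85, 90, 95], "country": ["Italy", "US", "Italy", "France"]})

# Numeric column: count, mean, std, min, 25%, 50%, 75%, max
numeric_summary = df.points.describe()

# Text column: count, unique, top, freq
text_summary = df.country.describe()

print(numeric_summary["mean"], text_summary["top"])  # 87.5 Italy
```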

19
Q

unique() method

A

Returns an array (not a Python list) of the unique values in a column, in order of first appearance

e.g. DataFrameName.column_name.unique()
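A minimal sketch of unique() on a made-up numeric column. The result is an array, which can be converted with list() if a plain Python list is needed.

```python
import pandas as pd

df = pd.DataFrame({"points": [87, 90, 87, 85]})

# Unique values, in the order they first appear
distinct = df.points.unique()

print(list(distinct))  # [87, 90, 85]
```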

20
Q

value_counts() method

A

Returns a Series of the unique values in a column and how often each occurs, sorted from most to least frequent by default

e.g. df_name.column_name.value_counts()
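A sketch of value_counts() on made-up data. The unique values become the index of the result and the counts become its values.

```python
import pandas as pd

df = pd.DataFrame({"country": ["Italy", "US", "Italy", "France", "Italy", "US"]})

counts = df.country.value_counts()

# Most frequent value first
print(counts.to_dict())  # {'Italy': 3, 'US': 2, 'France': 1}
```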

21
Q

map() method

A

Used to transform each value of a Series according to another input, which can be a function, a dictionary or another Series

The function you pass to map() should expect a single value from the Series, and return a transformed version of that value. map() returns a new Series where all the values have been transformed by your function.

e.g. df_name.column_name.map(function)
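A sketch of map() with a function and with a dictionary, on made-up data. The mean is computed once outside the lambda so it is not recalculated for every value. With a dictionary, values that have no matching key become NaN.

```python
import pandas as pd

df = pd.DataFrame({"points": [80, 85, 90]})

# Function: called once per value, returns the transformed value
points_mean = df.points.mean()
centered = df.points.map(lambda p: p - points_mean)

# Dictionary: each value is looked up as a key (85 has no key, so NaN)
labels = df.points.map({80: "ok", 90: "great"})

print(centered.tolist())  # [-5.0, 0.0, 5.0]
```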

22
Q

apply()

A

Similar to map() except it is used to transform a whole DataFrame using a custom function

It takes the custom function as its first argument, and the axis argument (0 or 'index', 1 or 'columns') controls what the function receives

When axis = 0 (the default) it applies the function to each column; when it is 1 it applies the function to each row

Likewise, axis = 'columns' applies the function to each row and axis = 'index' applies it to each column

e.g. dataframe.apply(custom_function, axis=1)

It can also be used on an individual column: dataframe.column_name.apply(function)
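A sketch of how axis changes what apply() passes to the function, using a small made-up DataFrame.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis=0 (or 'index'): the function receives each column as a Series
col_sums = df.apply(lambda col: col.sum(), axis=0)

# axis=1 (or 'columns'): the function receives each row as a Series
row_sums = df.apply(lambda row: row["a"] + row["b"], axis=1)

# Series.apply calls the function on each value of a single column
doubled = df.a.apply(lambda x: x * 2)

print(col_sums.to_dict(), row_sums.tolist())  # {'a': 6, 'b': 60} [11, 22, 33]
```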

23
Q

idxmax() method

A

Used to get the row label (index) of the maximum value in a Series

Series.idxmax(axis=0, skipna=True)

axis only matters when applying idxmax to a whole DataFrame; for a Series it is accepted only for compatibility

If skipna = True (the default), NA values are excluded
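A sketch of idxmax() on a Series containing a NaN and on a DataFrame with both axis values. All labels and values are made up.

```python
import numpy as np
import pandas as pd

s = pd.Series([3.0, np.nan, 7.0, 5.0], index=["a", "b", "c", "d"])

# NaN is skipped by default; the label of the largest value is returned
best = s.idxmax()  # 'c'

df = pd.DataFrame({"x": [1, 9, 4], "y": [8, 2, 6]}, index=["r1", "r2", "r3"])

# axis=0: for each column, the row label of its maximum
per_column = df.idxmax(axis=0)

# axis=1: for each row, the column label of its maximum
per_row = df.idxmax(axis=1)
```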

24
Q

groupby() method

A

Used to split the data into groups based on the criteria passed in the parentheses, such as one or more column names, so that a function can then be applied to each group

Grouping by more than one column produces a MultiIndex in the result
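A sketch of groupby() with one key and with two keys, on made-up wine data. With two keys the result is indexed by a MultiIndex of (country, variety) pairs.

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["Italy", "Italy", "US", "US", "US"],
    "variety": ["Red", "White", "Red", "Red", "White"],
    "points": [87, 90, 85, 91, 88],
})

# One key: the result is indexed by country
best_by_country = df.groupby("country").points.max()

# Two keys: the result is indexed by a MultiIndex
mean_by_pair = df.groupby(["country", "variety"]).points.mean()

print(mean_by_pair.loc[("US", "Red")])  # 88.0
```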

25
Q

agg() method

A

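Lets you run multiple functions on your dataset at once, often on each group after a groupby, e.g. df_name.groupby('column_a').column_b.agg(['count', 'min', 'max'])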
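A sketch of agg() after a groupby, on made-up data. Each function name in the list becomes a column of the result.

```python
import pandas as pd

df = pd.DataFrame({"country": ["Italy", "Italy", "US"], "points": [87, 90, 85]})

# Several summary functions applied to each group in one call
summary = df.groupby("country").points.agg(["count", "min", "max"])

print(summary)
```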

26
Q

reset_index() method

A

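Used to convert a MultiIndex back to a regular index: the index levels are moved into ordinary columns and replaced by the default 0, 1, 2, ... index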
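A sketch of reset_index() flattening the MultiIndex produced by a two-key groupby, using made-up data. The Series name (points) becomes the value column.

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["Italy", "Italy", "US"],
    "variety": ["Red", "White", "Red"],
    "points": [87, 90, 85],
})

# Grouping by two columns gives a MultiIndex
grouped = df.groupby(["country", "variety"]).points.mean()

# reset_index moves both index levels back into columns
flat = grouped.reset_index()

print(list(flat.columns))  # ['country', 'variety', 'points']
```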

27
Q

sort_values() method

A

Used to sort a Series or DataFrame by the values it contains; by default this is in ascending order

dataframe.sort_values(by='len', ascending=True)
We use by to choose the column(s) to sort on, and if ascending is set to False the order is descending
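A sketch of sort_values() in both directions on made-up data, reusing the 'len' column name from the card. The original row labels travel with their rows.

```python
import pandas as pd

df = pd.DataFrame({"country": ["Italy", "US", "France"], "len": [3, 5, 1]})

# Ascending by default
asc = df.sort_values(by="len")

# Descending
desc = df.sort_values(by="len", ascending=False)

print(asc.index.tolist())  # [2, 0, 1]
```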

28
Q

sort_index() method

A

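Used to sort a Series or DataFrame by its index labels instead of its values

e.g. dataframe.sort_index(axis=0, ascending=True)

axis=0 (or 'index') sorts the rows by their labels; axis=1 (or 'columns') sorts the columns by their names. Set ascending=False for descending order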
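A sketch of sort_index() along both axes, on a made-up DataFrame whose row labels and column names start out unsorted.

```python
import pandas as pd

df = pd.DataFrame({"b": [1, 2, 3], "a": [4, 5, 6]}, index=[2, 0, 1])

# axis=0 (default): rows ordered by their index labels
by_rows = df.sort_index()

# Descending row labels
by_rows_desc = df.sort_index(ascending=False)

# axis=1: columns ordered by their names
by_cols = df.sort_index(axis=1)

print(by_rows.index.tolist(), list(by_cols.columns))  # [0, 1, 2] ['a', 'b']
```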