M04 - Pandas Flashcards

Question

Method to fill in rows w/ NaNs + syntax

Answer 1

fillna() Syntax: var_df.fillna(value) (put the value in quotes if it is a string: var_df.fillna('value'))
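
A minimal sketch of fillna(), assuming a small made-up DataFrame (the column names and fill values are illustrative only). It shows one fill value for everything and, alternatively, a dictionary with one fill value per column.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame with missing values in both columns
var_df = pd.DataFrame({"score": [90.0, np.nan, 75.0],
                       "absences": [1.0, np.nan, np.nan]})

# Replace every NaN with the same value
filled_df = var_df.fillna(0)

# Or give each column its own fill value with a dictionary
filled_by_col_df = var_df.fillna({"score": 0, "absences": 99})

print(filled_df)
print(filled_by_col_df)
```

fillna() returns a new DataFrame here; var_df itself still contains the NaNs.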

Answer 2

1. Boolean 2. Integer (32bit) 3. Integer (64bit) 4. Float 5. Object 6. Datetime

Answer 3

Name: bool Ex: True and False

Answer 4

Name: int32 or int64 Ex. int32: -2,147,483,648 to 2,147,483,647 Ex. int64: -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
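
The ranges above follow from the bit width: a signed n-bit integer spans -(2**(n-1)) through 2**(n-1) - 1. A short check in plain Python (no pandas needed):

```python
# Signed n-bit integers span -(2**(n-1)) through 2**(n-1) - 1
for bits in (32, 64):
    low = -(2 ** (bits - 1))
    high = 2 ** (bits - 1) - 1
    # The ',' format option adds thousands separators
    print(f"int{bits}: {low:,} to {high:,}")
```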

Answer 5

Name: float64 Ex: Floating-point (decimal) numbers

Answer 6

Name: O or object Ex: Typically strings; often used as a catchall for columns w/ different data types or other Python objects like tuples, lists, and dictionaries

Answer 7

Name: datetime64 Ex: A specific moment in time w/ nanosecond precision, e.g. 2019-06-03 16:04:00.465107

Answer 8

Lets you check the data type of each column in a DataFrame. var_df.dtypes Returns each column header w/ the name of its Pandas data type

Answer 9

If column has NO SPACES in name: var_df.column.dtypes If column has SPACES: var_df['column name'].dtypes
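
A small sketch of both dtype checks, assuming a made-up DataFrame whose column names are hypothetical. One column name contains a space, so it needs bracket notation.

```python
import pandas as pd

# Hypothetical DataFrame covering several of the common dtypes
var_df = pd.DataFrame({
    "passed": [True, False],
    "student id": [1, 2],
    "score": [88.5, 92.0],
    "when": pd.to_datetime(["2019-06-03 16:04:00", "2019-06-04 09:30:00"]),
})

# Data type of every column at once
print(var_df.dtypes)

# One column, no spaces in the name: dot notation works
score_dtype = var_df.score.dtypes

# One column with a space in the name: bracket notation is required
id_dtype = var_df["student id"].dtypes

print(score_dtype, id_dtype)
```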

Answer 10

Create a copy of the source code you are working on, or a new, separate file, and do your cleaning/testing there.

Answer 11

- Adds all data in a specified column to a list tolist_var = var_df["Column Name"].tolist()

Answer 12

- Splits a Python string object on whitespace (the gaps where there is no text) and returns the pieces as a list var.split()

Answer 13

len(var.split( ) )
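
A quick illustration of split() and len() together, using a made-up string. With no argument, split() treats any run of whitespace as a single separator.

```python
var = "Dr. Jane   Q. Smith"

# Split on whitespace; repeated spaces do not create empty pieces
parts = var.split()
print(parts)        # ['Dr.', 'Jane', 'Q.', 'Smith']

# Count the pieces
word_count = len(var.split())
print(word_count)   # 4
```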

Answer 14

- Returns all unique items/values in a LIST (as an unordered set) when the list is passed inside the parentheses set(list_var)
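
A short sketch with a made-up list. Because a set has no guaranteed order, sorted() is used here when a predictable order matters.

```python
list_var = ["Math", "Reading", "Math", "Science", "Reading"]

# Duplicates collapse to a single entry
unique_items = set(list_var)
unique_count = len(unique_items)

# Sets are unordered, so sort when a stable order is needed
sorted_unique = sorted(unique_items)
print(unique_count, sorted_unique)
```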

Answer 15

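- Removes leading and trailing characters that appear inside the parentheses; the argument is treated as a set of characters, not as a whole word, and nothing is removed from the middle of the string. With empty parentheses it strips whitespace var.strip("value")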
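
A small example (made-up strings) of how strip() treats its argument as a set of characters. The second case shows the common surprise: letters of the name itself are removed when they belong to that set.

```python
var = "Dr. Anna Dr."

# Any of 'D', 'r', '.', ' ' is removed from both ends until another character is hit
cleaned = var.strip("Dr. ")
print(cleaned)     # Anna

# Surprise: the leading 'D' of the name is also in the set, so it goes too
surprise = "Dr. Derrick".strip("Dr. ")
print(surprise)    # errick

# With no argument, only surrounding whitespace is removed
trimmed = "  Anna  ".strip()
```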

Answer 16

- Replaces an 'old' phrase/string with a new one var.replace('Old', 'New')
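
A brief sketch with a made-up string. replace() returns a new string and, by default, changes every occurrence; the optional third argument limits how many are replaced.

```python
var = "Old school, Old rules"

# Every occurrence is replaced; the original string is left unchanged
new_var = var.replace("Old", "New")
print(new_var)       # New school, New rules

# Optional count: replace only the first occurrence
first_only = var.replace("Old", "New", 1)
print(first_only)    # New school, Old rules
```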

Answer 17

Merges two DataFrames on a common column (think Join). Every column header listed in 'on' must exist in both DataFrames merged_var_df = pandas.merge(var1_df, var2_df, on=['shared_columnheader'])

Answer 18

-Rename the columns to match, this helps avoid duplicate columns or merging issues
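
A minimal sketch of renaming and then merging, assuming two made-up DataFrames whose key columns start out with different headers. All names and values are hypothetical.

```python
import pandas as pd

students_df = pd.DataFrame({"school_name": ["North", "South", "North"],
                            "student": ["Ana", "Ben", "Cy"]})
schools_df = pd.DataFrame({"School Name": ["North", "South"],
                           "budget": [100, 200]})

# Rename so the key column has the same header in both DataFrames
schools_df = schools_df.rename(columns={"School Name": "school_name"})

# Merge on the shared column (an inner join by default)
merged_var_df = pd.merge(students_df, schools_df, on=["school_name"])
print(merged_var_df)
```

Each student row picks up the budget of its school, and the key column appears only once in the result.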

Answer 19

- Returns an ARRAY of all unique values in a given column of a DATAFRAME, in order of first appearance varX_df = var_df['column_name'].unique()

Answer 20

len(var_df['column_name'].unique( ) )
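
A short sketch of unique() and the count of unique values, assuming a made-up column of grade levels. nunique() is shown as an equivalent shortcut for the len(...unique()) pattern.

```python
import pandas as pd

# Hypothetical column with repeated values
var_df = pd.DataFrame({"grade": [9, 10, 9, 12]})

# Unique values, in order of first appearance
unique_values = var_df["grade"].unique()

# Two ways to count them
unique_count = len(var_df["grade"].unique())
same_count = var_df["grade"].nunique()

print(unique_values, unique_count, same_count)
```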

Answer 21

mean( ) var_df['column_name'].mean( )

Answer 22

- Used for substituting each value in a Series with another value, where the new value is generated from a function, a dictionary, or a Series - Note: if a current value appears multiple times, you only need to map it once; every instance is changed to the new value. With a dictionary, values that are not listed become NaN series_var.map( { 'current value1' : 'new value1' , 'current value2' : 'new value2' , ... } )
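
A small sketch of map() with a dictionary and with a function, using a made-up Series. The value '11th' is deliberately left out of the dictionary to show what happens to unmapped values.

```python
import pandas as pd

series_var = pd.Series(["9th", "10th", "9th", "11th"])

# Dictionary: each listed value is mapped once and applied to every instance
mapped = series_var.map({"9th": 9, "10th": 10})
print(mapped)      # '11th' is not in the dictionary, so it becomes NaN

# Function: called on each value in turn
lengths = series_var.map(len)
print(lengths)
```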

Answer 23

- Smaller, more manageable piece of code - Good for repetitive tasks 1. The name, which is what we call the function 2. The parameters, which are values we send to the function 3. The code block, which are the statements under the function that perform the task 4. The return value, which is what the function gives back, or 'returns' to use when the task is complete

Answer 24

```
def fxn_name():
    # Indented (tab or four spaces) instructions go here
    pass
```
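
A minimal sketch that labels the four parts of a function (name, parameters, code block, return value). The function passing_rate and its inputs are made up for illustration.

```python
def passing_rate(pass_count, student_count):   # name + parameters
    # Code block: the statements that perform the task
    rate = pass_count / student_count * 100
    # Return value: what the function gives back to the caller
    return rate


# Calling the function with arguments for its parameters
result = passing_rate(45, 60)
print(result)   # 75.0
```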

Answer 25

- Used to format a value to a specific format - I.e. decimal places, adding separators, etc. "{:format specification}".format(value) Ex. format 92.34 held as my_var print("{:.0f}".format(my_var)) Output: 92
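
A few format specifications in action, using made-up numbers. The part after the colon controls decimal places and separators.

```python
my_var = 92.34

no_decimals = "{:.0f}".format(my_var)            # '92'
one_decimal = "{:.1f}".format(my_var)            # '92.3'
thousands = "{:,}".format(1234567)               # '1,234,567'
dollars = "${:,.2f}".format(1234567.891)         # '$1,234,567.89'

print(no_decimals, one_decimal, thousands, dollars)
```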

Answer 26

```
# Set a variable w/ the column order you want
new_column_order = ['col2', 'col4', 'col1', 'col3']
```

```
# Assign the new column order to a new or the same DataFrame
var_df = var_df[new_column_order]
```

Answer 27

- Returns a Series with the index set to a specified column var_name = var_df.set_index(['column_name1'])['column_name2'] column_name1 will be the index, column_name2 will supply the values
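
A small sketch of the set_index-then-select pattern, assuming a made-up DataFrame with hypothetical column names.

```python
import pandas as pd

var_df = pd.DataFrame({"school": ["North", "South"],
                       "budget": [100, 200],
                       "size": [10, 20]})

# 'school' becomes the index; selecting 'budget' leaves a Series of its values
var_name = var_df.set_index(["school"])["budget"]
print(var_name)

# Values can now be looked up by school name
south_budget = var_name.loc["South"]
```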

Answer 28

- Returns a Series that counts how many times each unique entry appears in a column, sorted from most to least frequent var_Series = var_df['column_to_count'].value_counts()
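
A brief sketch of value_counts() on a made-up column. The unique entries become the index and the counts become the values.

```python
import pandas as pd

var_df = pd.DataFrame({"school": ["North", "South", "North",
                                  "East", "North", "South"]})

# Count how often each unique entry appears (most frequent first)
var_series = var_df["school"].value_counts()
print(var_series)
```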

Answer 29

- Splits an object (like a DataFrame) into groups, applies a mathematical operation to each group, and combines the results var = var_df.groupby(['column_name']).mean()
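
A minimal split-apply-combine sketch with made-up data. The example keeps only a numeric column besides the grouping key, because recent pandas versions may raise an error when mean() meets non-numeric columns.

```python
import pandas as pd

var_df = pd.DataFrame({"school": ["North", "South", "North", "South"],
                       "score": [80, 70, 90, 60]})

# Split by school, average each group, combine into one result
var = var_df.groupby(["school"]).mean()
print(var)

# Selecting one column first returns a Series of group means
score_means = var_df.groupby(["school"])["score"].mean()
```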

Answer 30

- Sorts the values in a DataFrame or Series by the column(s) passed within the parentheses - Can add a parameter named 'ascending' (type: bool); the default is ascending=True var = var_df.sort_values(['Column_name'], ascending=False)
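
A short sketch of a descending sort on a made-up DataFrame. Note that each row keeps its original index label after sorting.

```python
import pandas as pd

var_df = pd.DataFrame({"school": ["North", "South", "East"],
                       "budget": [200, 300, 100]})

# Largest budget first
var = var_df.sort_values(["budget"], ascending=False)
print(var)
```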

Answer 31

- Runs on a DataFrame or Series - Returns: + Number of non-null rows as count + Average of the rows as mean + Standard deviation of the rows as std + Minimum value of the rows as min + 25th percentile as 25% + 50th percentile as 50% + 75th percentile as 75% + Maximum value of the rows as max

Answer 32

var_df.describe( )
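
A small sketch of describe() on a made-up numeric column, showing how to read individual statistics out of the result by row label.

```python
import pandas as pd

var_df = pd.DataFrame({"score": [60, 70, 80, 90]})

# Summary statistics for every numeric column
summary = var_df.describe()
print(summary)

# Individual statistics are looked up by row label and column name
mean_score = summary.loc["mean", "score"]
median_score = summary.loc["50%", "score"]
```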

Answer 33

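- Segments and sorts data values into bins - When making a variable for ranges, must include a value lower than the lowest value (i.e. 0 in the case of school district analysis), because by default each bin excludes its lower edge - Takes a single column (Series), not a whole DataFrame pandas.cut(var_df['column_name'], var_ranges)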
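
A minimal sketch of binning with pandas.cut, assuming made-up school sizes, bin edges, and labels. The last two lines show why the lowest edge must sit below the lowest value: by default a bin does not include its lower edge.

```python
import pandas as pd

# Hypothetical values, bin edges, and labels
sizes = pd.Series([500, 1000, 1500, 4000])
var_ranges = [0, 999, 1999, 5000]
labels = ["Small", "Medium", "Large"]

# Bins are (0, 999], (999, 1999], (1999, 5000]
binned = pd.cut(sizes, var_ranges, labels=labels)
print(binned)

# A value equal to the lowest edge falls outside every bin and becomes NaN
edge_case = pd.cut(pd.Series([0, 5]), [0, 10])
print(edge_case)
```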

Answer 34

var_df = pandas.DataFrame({'Col_Name1': Col_Values1, 'Col_Name2': Col_Values2, ...})

(58 cards)