Merging Dataframes / Datacamp(1,2,3 not project) Flashcards by Juan Torregrosa Pisonero

Use a loop to store these files in a list as data frames and unpacked the list into variables called gold,silver, bronze

Gold.csv
Silver.csv
Bronze.csv

import pandas as pd
filenames = [‘Gold.csv’, ‘Silver.csv’, ‘Bronze.csv’]
dataframes = []
for filename in filenames:
dataframes.append(pd.read_csv(filename))

gold,silver,bronze=dataframes

How well did you know this?

Not at all

Perfectly

Make a copy of the df gold and call it medals

medals = gold.copy()

How well did you know this?

Not at all

Perfectly

Rename columns of df as Code, Country, Total

A B C
0 USA United States 2088.0
1 URS Soviet Union 838.0

new_labels = ['NOC', 'Country', 'Gold']
medals.columns = new_labels

How well did you know this?

Not at all

Perfectly

Add columns ‘Total’ from data frames Silver and Bronze into dataframe gold

medals['Silver'] = silver['Total']
medals['Bronze'] = bronze['Total']

How well did you know this?

Not at all

Perfectly

Change order of the dataframe

0 1
2013-12-31 -0.31 0.83
2014-12-31 -0.63 -0.19
2015-12-31 -0.53 -0.85

0 1
2015-12-31 -0.53 -0.85
2014-12-31 -0.63 -0.19
2013-12-31 -0.31 0.83

df.sort_index(ascending=False,inplace=True)

How well did you know this?

Not at all

Perfectly

Change order of the dataframe

0 1
2013-12-31 -0.31 0.83
2014-12-31 -0.63 -0.19
2015-12-31 -0.53 -0.85

#Outcome:
 0 1
2014-12-31 -0.63 -0.19
2015-12-31 -0.53 -0.85
2013-12-31 -0.31 0.83

df.sort_values(0, ascending=True, inplace=True)

How well did you know this?

Not at all

Perfectly

Fill forward NaN

 A B C D
0 5.0 NaN 4 5.0
1 3.0 2.0 3 4.0
2 NaN 4.0 8 2.0
3 4.0 3.0 5 NaN

 A B C D
0 5.0 NaN 4 5.0
1 3.0 2.0 3 4.0
2 3.0 4.0 8 2.0
3 4.0 3.0 5 2.0

df.ffill(axis = 0)

How well did you know this?

Not at all

Perfectly

Fill forward NaN

 A B C D
0 5.0 NaN 4 5.0
1 3.0 2.0 3 4.0
2 NaN 4.0 8 2.0
3 4.0 3.0 5 NaN

A B C D
0 5.0 5.0 4.0 5.0
1 3.0 2.0 3.0 4.0
2 NaN 4.0 8.0 2.0
3 4.0 3.0 5.0 5.0

df.ffill(axis = 1)

How well did you know this?

Not at all

Perfectly

Outcome after reindexing df

a b c
A 0.338244 0.548040 0.898135
B 0.709872 0.965031 0.521506
C 0.605053 0.340734 0.139700

df1.reindex([‘Ro’, ‘A’, ‘B’, ‘Ri’,])

a b c
Ro NaN NaN NaN
A 0.187632 0.587335 0.141530
B 0.900119 0.984410 0.648169
Ri NaN NaN NaN

How well did you know this?

Not at all

Perfectly

Transform the first dataframe into the second one using a list called new_indexes

a b c
A 0.800152 0.485063 0.607966
B 0.509770 0.856334 0.820924
C 0.249211 0.675843 0.096703

 a b c
Ro missing missing missing
A 0.800152 0.485063 0.607966
B 0.50977 0.856334 0.820924
Ri missing missing missing

new_index=[‘Ro’, ‘A’, ‘B’, ‘Ri’,]

print(df1.reindex(new_index, axis =’rows’, fill_value =’missing’))

How well did you know this?

Not at all

Perfectly

Add these two dataframes and guess the output

a b c
A 0.8 0.4 0.3
B 0.3 0.9 0.2
C 0.9 0.9 0.4

a b c
Z 0.4 0.5 0.3
B 0.1 0.4 0.4
C 0.4 0.7 0.7

df0+df1

print(df0+df1)

aN NaN NaN
B 0.4 1.3 0.6
C 1.3 1.6 1.1
Z NaN NaN NaN

How well did you know this?

Not at all

Perfectly

What is the output of multiplying 10 with this dataframe?

ax bx cx
Z 0.1 0.7 0.9
B 0.8 0.6 0.1
C 0.1 0.5 0.1

ax bx cx
Z 1.0 7.0 9.0
B 8.0 6.0 1.0
C 1.0 5.0 1.0

How well did you know this?

Not at all

Perfectly

Resample into year and get the mean

 0 1
2013-01-31 -2.10 -1.73
2013-02-28 -1.15 -0.74
2013-03-31 -0.51 1.65
2013-04-30 -1.10 -2.13

yearly = df.resample(‘A’).mean()
yearly

0 1
2013-12-31 -1.215 -0.7375

How well did you know this?

Not at all

Perfectly

Percentage growth of ‘c’

a b c
A 0.5 1.0 0.4
B 0.2 0.8 0.5
C 0.8 0.5 0.6

df_1=df[‘growth_c%’] = df0[‘c’].pct_change()*100

How well did you know this?

Not at all

Perfectly

Make sure the dataframe gets the dates

sp500 = pd.read_csv(‘sp500.csv’,index_col=’Date’,…)

sp500 = pd.read_csv(‘sp500.csv’,index_col=’Date’, parse_dates=True)

How well did you know this?

Not at all

Perfectly

Convert the sp500 into GBP

#df_sp500 in USD
 Open Close
Date 
2015-01-02 2058.899902 2058.199951
2015-01-05 2054.439941 2020.579956
2015-01-06 2022.150024 2002.609985

#df_exchange GBP/USD
Date 
2015-01-02 0.65101
2015-01-05 0.65644
2015-01-06 0.65896

pounds = df_sp500.multiply(df_exchange[‘GBP/USD’], axis=’rows’)

How well did you know this?

Not at all

Perfectly

Append df_silver and df_bronze into a df called combined and print United States result

#df_silver
Country
United States 1052.0
Soviet Union 584.0
United Kingdom 505.0

#df_bronze
Country
United States 1195.0
Soviet Union 627.0
United Kingdom 591.0

combined=bronze.append(silver)

combined.loc[‘United States’]

How well did you know this?

Not at all

Perfectly

Append the Series Units for both df into a series called q1 and add all the units sold

#df_jan
Date 
2015-01-21 Streeplex Hardware 11
2015-01-09 Streeplex Service 8
2015-01-06 Initech Hardware 17

#df_feb
 Company Product Units
Date 
2015-02-26 Streeplex Service 4
2015-02-16 Hooli Software 10
2015-02-03 Initech Software 13

jan_units = jan[‘Units’]
feb_units = feb[‘Units’]
q1=jan_units.append(feb_units)
q1.sum()

How well did you know this?

Not at all

Perfectly

Use a loop to concatenate data frames (jan,feb, mar) by its Units into a dataframe called quarter1

 Company Product Units
Date 
2015-02-26 Streeplex Service 4
2015-02-16 Hooli Software 10
2015-02-03 Initech Software 13

Study These Flashcards

units = []
for month in [jan, feb, mar]:
units.append(month[‘Units’])
quarter1 = pd.concat(units, axis=’rows’)

Concatenate both data frames

df1
A B C D
0 A0 B0 C0 D0
1 A1 B1 C1 D1
2 A2 B2 C2 D2

df2
A B C D
4 A4 B4 C4 D4
5 A5 B5 C5 D5
6 A6 B6 C6 D6

Study These Flashcards

Import pandas as pd

pd.concat([df1, df2])

difference between append and concat method in pandas

Study These Flashcards

Append combines the rows of one dataframe to another dataframe. To be honest, I don’t use this method ever.
Concat can take a group of 2+ dataframes and combines the dataframes via the rows or columns.

Outcome of:

df1.append(df2)

Study These Flashcards

Append df1 and df2 and make the index to be a sequence of integers starting from 0

Study These Flashcards

df1.append(df2, ignore_index = True)

Concatenate horizontally df1 and df2

Study These Flashcards

pd.concat([df1,df2],axis=1)

#Concatenate vertically df1 and df2, and make the index to be a sequence of integers starting from 0

pd.concat([df1,df2],axis=0,ignore_index=True)

Use a loop to read csv files named [gold_top5.csv, silver.., bronze] . Create a df with 4 columns (Country, medals) and make the Country the index. Use append method ``` #Outcome: bronze silver gold France 475.0 461.0 NaN Germany 454.0 NaN 407.0 Italy NaN 394.0 460.0 ```

medals_type=['bronze', 'silver', 'gold'] medals =[] ``` for medal in medal_types: file_name = "%s_top5.csv" % medal columns = ['Country', medal] medal_df = pd.read_csv(file_name, index_col='Country', names=columns) medals.append(medal_df) ``` medals_df = pd.concat(medals, axis='columns') print(medals_df.head(3)

``` #Concatenating vertically to get MultiIndexed rows #Concatenate a list of dataframes ```

To do medals = pd.concat(medals,keys=['bronze', 'silver', 'gold'])

``` #Code to get the given outcome: df=medals ``` ``` Total Country bronze United States 1052.0 Soviet Union 584.0 United Kingdom 505.0 … gold United Kingdom 498.0 Italy 460.0 Germany 407.0 ``` ``` #Outcome: Total Country bronze France 475.0 silver France 461.0 bronze Germany 454.0 gold Germany 407.0 ```

medals.sort_index(level=1)

#Slice index to get medals from UK ``` Country bronze United States 1052.0 Soviet Union 584.0 United Kingdom 505.0 ... ```

idx = pd.IndexSlice medals_sorted.loc[idx[:,'United Kingdom'],:] ``` #Outcome: Total Country bronze United Kingdom 505.0 gold United Kingdom 498.0 silver United Kingdom 591.0 ```

#Get silver medals Country bronze United States 1052.0 Soviet Union 584.0 United Kingdom 505.0

#medals_sorted.loc['silver'] ``` Outcome: Total Country France 461.0 Italy 394.0 Soviet Union 627.0 United Kingdom 591.0 United States 1195.0 ```

#You have 3 dataframes in a dataframes list. Concatenate horizontally using their product a hierarchical column that contains the rest of the information ``` #df1 Company Product Units Date 2015-02-04 Acme Coporation Hardware 14 2015-02-07 Acme Coporation Hardware 1 2015-02-19 Mediacore Hardware 16 ``` #df2 and df3 have products 'Software','Service' ``` #Outcome: Hardware Software Service Company Product Units ……. Date 2015-02-02 Hooli Software 3 ….. ```

pd.concat(dataframes, keys=['Hardware','Software','Service'], axis=1)

#In a multi-indexed column dataframe with Hardware','Software','Service as top columns and Company Product Units second level coluumns, get Company data for 2015-2-2 to 2015-2-8 ``` Hardware Software Service Company Company Company Date 2015-02-02 08:33:01 NaN Hooli NaN 2015-02-02 20:54:49 Mediacore NaN NaN ```

idx = pd.IndexSlice | february.loc['2015-2-2':'2015-2-8', idx[:,'Company']]

#Use the list month_list to populate a dict the value as df. Grouped by Company. Sum its values. Then concatenate the dictionary into a dataframe called sales month_list = [('january', jan), ('february', feb), ('march', mar)]

month_dict = {} for month_name, month_data in month_list: month_dict[month_name] = month_data.groupby('Company').sum() sales = pd.concat(month_dict) #Outcome: ``` Units Company february Acme Coporation 34 Hooli 30 Initech 30 Mediacore 45 ```

#Inner join of three similar dataframes (gold, silver, bronze) and visualize more or less the final dataframe ``` #bronze Total Country United States 1052.0 Soviet Union 584.0 United Kingdom 505.0 ```

medals=pd.concat(medal_list,keys=['bronze', 'silver', 'gold'],axis=1,join='inner') ``` #Outcome bronze silver gold Total Total Total Country United States 1052.0 1195.0 2088.0 Soviet Union 584.0 627.0 838.0 ```

#What does inner join do?

keep only rows that share common index labels.

Change Date to annually using the last value of each year, chain a percentage growth and delete null values ``` China Year 1961-01-01 49.557050 1962-01-01 46.685179 1963-01-01 50.097303 ```

china_annual = china.resample('A').last().pct_change(10).dropna() #Outcome: China Year 1971-12-31 0.988860 1972-12-31 1.402472

#Use inner join with two dataframes called china, us with similar structure ``` #Outcome China US Year 1971-12-31 0.988860 1.052270 1972-12-31 1.402472 1.172566 ```

pd.concat([china_annual, us_annual], axis=1, join='inner')

#Merge both df. How many rows will have the new df? ``` #df_1 city revenue 0 Austin 100 1 Denver 83 2 Springfield 4 ``` ``` #df_2 city manager 0 Austin Charlers 1 Denver Joel 2 Mendocino Brett ```

combined = pd.merge(revenue, managers, on='city') Two lines

#Merge both using cities. What will be the structure of the new dataframe? #df1 city branch_id state revenue 0 Austin 10 TX 100 1 Denver 20 CO 83 #df2 branch branch_id state manager 0 Austin 10 TX Charlers 1 Denver 20 CO Joel

combined = pd.merge(revenue,managers,left_on='city',right_on='branch') Columns of the new df=city branch_id_x state_x revenue branch branch_id_y state_y manager

#Outcome of: pd.merge(revenue, managers, on='city') #revenue city branch_id state revenue 0 Austin 10 TX 100 1 Denver 20 CO 83 #managers branch branch_id state manager 0 Austin 10 TX Charlers 1 Denver 20 CO Joel

``` Traceback (most recent call last): ... ... pd.merge(revenue, managers, on='city') ... ... KeyError: 'city' ```

#Join both dataframes df1,df2 and visualize Outcome: A B K0 A0 B0 K1 A1 B1 K2 A2 B2 C D K0 C0 D0 K1 C1 D1 K2 C2 D2

left.join(right) A B C D K0 A0 B0 C0 D0 K1 A1 B1 C1 D1 K2 A2 B2 C2 D2

#Join using suffix and prefix. Visualize outcome ``` #revenue city state revenue branch_id 10 Austin TX 100 20 Denver CO 83 30 Springfield IL 4 ``` ``` #managers branch state manager branch_id 10 Austin TX Charlers 20 Denver CO Joel 47 Mendocino CA Brett ```

revenue.join(managers,lsuffix='_rev',rsuffix='_mng',how='outer') ``` city state_rev revenue branch state_mng manager branch_id 10 Austin TX 100.0 Austin TX Charlers 20 Denver CO 83.0 Denver CO Joel 30 Springfield IL 4.0 NaN NaN NaN 31 NaN NaN NaN Springfield MO Sally ```

What is the difference between merge and join in pandas?

join() methods as a convenient way to access the capabilities of pandas. ... join(df2) always joins via the index of df2 , but df1. merge(df2) can join to one or more columns of df2 (default) or to the index of df2 (with right_index=True ). lookup on left table: by default, df1

#What does left join do?

Left outer join produces a complete set of records from Table A, with the matching records (where available) in Table B. If there is no match, the right side will contain null. pd.merge(df_a, df_b, on='subject_id', how='left') #Can add suffixes pd.merge(df_a, df_b, on='subject_id', how='left', suffixes=('_left', '_right')

#What does merge_ordered do?

Perform merge with optional filling/interpolation. | Designed for ordered data like time series data

Merging Dataframes / Datacamp(1,2,3 not project) Flashcards

(45 cards)