Merging Dataframes / Datacamp(1,2,3 not project) Flashcards

1
Q

Use a loop to store these files in a list as data frames and unpacked the list into variables called gold,silver, bronze

Gold.csv
Silver.csv
Bronze.csv

A

import pandas as pd
filenames = [‘Gold.csv’, ‘Silver.csv’, ‘Bronze.csv’]
dataframes = []
for filename in filenames:
dataframes.append(pd.read_csv(filename))

gold,silver,bronze=dataframes

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Make a copy of the df gold and call it medals

A

medals = gold.copy()

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Rename columns of df as Code, Country, Total

A B C
0 USA United States 2088.0
1 URS Soviet Union 838.0

A
new_labels = ['NOC', 'Country', 'Gold']
medals.columns = new_labels
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Add columns ‘Total’ from data frames Silver and Bronze into dataframe gold

A
medals['Silver'] = silver['Total']
medals['Bronze'] = bronze['Total']
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Change order of the dataframe

0 1
2013-12-31 -0.31 0.83
2014-12-31 -0.63 -0.19
2015-12-31 -0.53 -0.85

0 1
2015-12-31 -0.53 -0.85
2014-12-31 -0.63 -0.19
2013-12-31 -0.31 0.83

A

df.sort_index(ascending=False,inplace=True)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Change order of the dataframe

0 1
2013-12-31 -0.31 0.83
2014-12-31 -0.63 -0.19
2015-12-31 -0.53 -0.85

#Outcome:
 0 1
2014-12-31 -0.63 -0.19
2015-12-31 -0.53 -0.85
2013-12-31 -0.31 0.83
A

df.sort_values(0, ascending=True, inplace=True)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Fill forward NaN

 A B C D
0 5.0 NaN 4 5.0
1 3.0 2.0 3 4.0
2 NaN 4.0 8 2.0
3 4.0 3.0 5 NaN
 A B C D
0 5.0 NaN 4 5.0
1 3.0 2.0 3 4.0
2 3.0 4.0 8 2.0
3 4.0 3.0 5 2.0
A

df.ffill(axis = 0)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Fill forward NaN

 A B C D
0 5.0 NaN 4 5.0
1 3.0 2.0 3 4.0
2 NaN 4.0 8 2.0
3 4.0 3.0 5 NaN
A B C D
0 5.0 5.0 4.0 5.0
1 3.0 2.0 3.0 4.0
2 NaN 4.0 8.0 2.0
3 4.0 3.0 5.0 5.0
A

df.ffill(axis = 1)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Outcome after reindexing df

a b c
A 0.338244 0.548040 0.898135
B 0.709872 0.965031 0.521506
C 0.605053 0.340734 0.139700

df1.reindex([‘Ro’, ‘A’, ‘B’, ‘Ri’,])

A
a b c
Ro NaN NaN NaN
A 0.187632 0.587335 0.141530
B 0.900119 0.984410 0.648169
Ri NaN NaN NaN
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Transform the first dataframe into the second one using a list called new_indexes

a b c
A 0.800152 0.485063 0.607966
B 0.509770 0.856334 0.820924
C 0.249211 0.675843 0.096703

 a b c
Ro missing missing missing
A 0.800152 0.485063 0.607966
B 0.50977 0.856334 0.820924
Ri missing missing missing
A

new_index=[‘Ro’, ‘A’, ‘B’, ‘Ri’,]

print(df1.reindex(new_index, axis =’rows’, fill_value =’missing’))

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Add these two dataframes and guess the output

a b c
A 0.8 0.4 0.3
B 0.3 0.9 0.2
C 0.9 0.9 0.4

a b c
Z 0.4 0.5 0.3
B 0.1 0.4 0.4
C 0.4 0.7 0.7

A

df0+df1

print(df0+df1)

aN NaN NaN
B 0.4 1.3 0.6
C 1.3 1.6 1.1
Z NaN NaN NaN

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What is the output of multiplying 10 with this dataframe?

ax bx cx
Z 0.1 0.7 0.9
B 0.8 0.6 0.1
C 0.1 0.5 0.1

A

ax bx cx
Z 1.0 7.0 9.0
B 8.0 6.0 1.0
C 1.0 5.0 1.0

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Resample into year and get the mean

 0 1
2013-01-31 -2.10 -1.73
2013-02-28 -1.15 -0.74
2013-03-31 -0.51 1.65
2013-04-30 -1.10 -2.13
A

yearly = df.resample(‘A’).mean()
yearly

0 1
2013-12-31 -1.215 -0.7375

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Percentage growth of ‘c’

a b c
A 0.5 1.0 0.4
B 0.2 0.8 0.5
C 0.8 0.5 0.6

A

df_1=df[‘growth_c%’] = df0[‘c’].pct_change()*100

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Make sure the dataframe gets the dates

sp500 = pd.read_csv(‘sp500.csv’,index_col=’Date’,…)

A

sp500 = pd.read_csv(‘sp500.csv’,index_col=’Date’, parse_dates=True)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Convert the sp500 into GBP

#df_sp500 in USD
 Open Close
Date 
2015-01-02 2058.899902 2058.199951
2015-01-05 2054.439941 2020.579956
2015-01-06 2022.150024 2002.609985
#df_exchange GBP/USD
Date 
2015-01-02 0.65101
2015-01-05 0.65644
2015-01-06 0.65896
A

pounds = df_sp500.multiply(df_exchange[‘GBP/USD’], axis=’rows’)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

Append df_silver and df_bronze into a df called combined and print United States result

#df_silver
Country
United States 1052.0
Soviet Union 584.0
United Kingdom 505.0
#df_bronze
Country
United States 1195.0
Soviet Union 627.0
United Kingdom 591.0
A

combined=bronze.append(silver)

combined.loc[‘United States’]

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

Append the Series Units for both df into a series called q1 and add all the units sold

#df_jan
Date 
2015-01-21 Streeplex Hardware 11
2015-01-09 Streeplex Service 8
2015-01-06 Initech Hardware 17
#df_feb
 Company Product Units
Date 
2015-02-26 Streeplex Service 4
2015-02-16 Hooli Software 10
2015-02-03 Initech Software 13
A

jan_units = jan[‘Units’]
feb_units = feb[‘Units’]
q1=jan_units.append(feb_units)
q1.sum()

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

Use a loop to concatenate data frames (jan,feb, mar) by its Units into a dataframe called quarter1

 Company Product Units
Date 
2015-02-26 Streeplex Service 4
2015-02-16 Hooli Software 10
2015-02-03 Initech Software 13
A

units = []
for month in [jan, feb, mar]:
units.append(month[‘Units’])
quarter1 = pd.concat(units, axis=’rows’)

20
Q

Concatenate both data frames

df1
A B C D
0 A0 B0 C0 D0
1 A1 B1 C1 D1
2 A2 B2 C2 D2
df2
A B C D
4 A4 B4 C4 D4
5 A5 B5 C5 D5
6 A6 B6 C6 D6
A

Import pandas as pd

pd.concat([df1, df2])

21
Q

difference between append and concat method in pandas

A

Append combines the rows of one dataframe to another dataframe. To be honest, I don’t use this method ever.
Concat can take a group of 2+ dataframes and combines the dataframes via the rows or columns.

22
Q

Outcome of:

df1.append(df2)

#df1
a b
0 1 5
1 2 6
2 3 7
#df2
a b
0 9 4
1 6 2
2 3 10
A
a b
0 1 5
1 2 6
2 3 7
3 4 8
0 9 4
1 6 2
2 3 10
23
Q

Append df1 and df2 and make the index to be a sequence of integers starting from 0

A

df1.append(df2, ignore_index = True)

24
Q

Concatenate horizontally df1 and df2

A

pd.concat([df1,df2],axis=1)

25
#Concatenate vertically df1 and df2, and make the index to be a sequence of integers starting from 0
pd.concat([df1,df2],axis=0,ignore_index=True)
26
Use a loop to read csv files named [gold_top5.csv, silver.., bronze] . Create a df with 4 columns (Country, medals) and make the Country the index. Use append method ``` #Outcome: bronze silver gold France 475.0 461.0 NaN Germany 454.0 NaN 407.0 Italy NaN 394.0 460.0 ```
medals_type=['bronze', 'silver', 'gold'] medals =[] ``` for medal in medal_types: file_name = "%s_top5.csv" % medal columns = ['Country', medal] medal_df = pd.read_csv(file_name, index_col='Country', names=columns) medals.append(medal_df) ``` medals_df = pd.concat(medals, axis='columns') print(medals_df.head(3)
27
``` #Concatenating vertically to get MultiIndexed rows #Concatenate a list of dataframes ```
To do medals = pd.concat(medals,keys=['bronze', 'silver', 'gold'])
28
``` #Code to get the given outcome: df=medals ``` ``` Total Country bronze United States 1052.0 Soviet Union 584.0 United Kingdom 505.0 … gold United Kingdom 498.0 Italy 460.0 Germany 407.0 ``` ``` #Outcome: Total Country bronze France 475.0 silver France 461.0 bronze Germany 454.0 gold Germany 407.0 ```
medals.sort_index(level=1)
29
#Slice index to get medals from UK ``` Country bronze United States 1052.0 Soviet Union 584.0 United Kingdom 505.0 ... ```
idx = pd.IndexSlice medals_sorted.loc[idx[:,'United Kingdom'],:] ``` #Outcome: Total Country bronze United Kingdom 505.0 gold United Kingdom 498.0 silver United Kingdom 591.0 ```
30
#Get silver medals Country bronze United States 1052.0 Soviet Union 584.0 United Kingdom 505.0
#medals_sorted.loc['silver'] ``` Outcome: Total Country France 461.0 Italy 394.0 Soviet Union 627.0 United Kingdom 591.0 United States 1195.0 ```
31
#You have 3 dataframes in a dataframes list. Concatenate horizontally using their product a hierarchical column that contains the rest of the information ``` #df1 Company Product Units Date 2015-02-04 Acme Coporation Hardware 14 2015-02-07 Acme Coporation Hardware 1 2015-02-19 Mediacore Hardware 16 ``` #df2 and df3 have products 'Software','Service' ``` #Outcome: Hardware Software Service Company Product Units ……. Date 2015-02-02 Hooli Software 3 ….. ```
pd.concat(dataframes, keys=['Hardware','Software','Service'], axis=1)
32
#In a multi-indexed column dataframe with Hardware','Software','Service as top columns and Company Product Units second level coluumns, get Company data for 2015-2-2 to 2015-2-8 ``` Hardware Software Service Company Company Company Date 2015-02-02 08:33:01 NaN Hooli NaN 2015-02-02 20:54:49 Mediacore NaN NaN ```
idx = pd.IndexSlice | february.loc['2015-2-2':'2015-2-8', idx[:,'Company']]
33
#Use the list month_list to populate a dict the value as df. Grouped by Company. Sum its values. Then concatenate the dictionary into a dataframe called sales month_list = [('january', jan), ('february', feb), ('march', mar)]
month_dict = {} for month_name, month_data in month_list: month_dict[month_name] = month_data.groupby('Company').sum() sales = pd.concat(month_dict) #Outcome: ``` Units Company february Acme Coporation 34 Hooli 30 Initech 30 Mediacore 45 ```
34
#Inner join of three similar dataframes (gold, silver, bronze) and visualize more or less the final dataframe ``` #bronze Total Country United States 1052.0 Soviet Union 584.0 United Kingdom 505.0 ```
medals=pd.concat(medal_list,keys=['bronze', 'silver', 'gold'],axis=1,join='inner') ``` #Outcome bronze silver gold Total Total Total Country United States 1052.0 1195.0 2088.0 Soviet Union 584.0 627.0 838.0 ```
35
#What does inner join do?
keep only rows that share common index labels.
36
Change Date to annually using the last value of each year, chain a percentage growth and delete null values ``` China Year 1961-01-01 49.557050 1962-01-01 46.685179 1963-01-01 50.097303 ```
china_annual = china.resample('A').last().pct_change(10).dropna() #Outcome: China Year 1971-12-31 0.988860 1972-12-31 1.402472
37
#Use inner join with two dataframes called china, us with similar structure ``` #Outcome China US Year 1971-12-31 0.988860 1.052270 1972-12-31 1.402472 1.172566 ```
pd.concat([china_annual, us_annual], axis=1, join='inner')
38
#Merge both df. How many rows will have the new df? ``` #df_1 city revenue 0 Austin 100 1 Denver 83 2 Springfield 4 ``` ``` #df_2 city manager 0 Austin Charlers 1 Denver Joel 2 Mendocino Brett ```
combined = pd.merge(revenue, managers, on='city') Two lines
39
#Merge both using cities. What will be the structure of the new dataframe? #df1 city branch_id state revenue 0 Austin 10 TX 100 1 Denver 20 CO 83 #df2 branch branch_id state manager 0 Austin 10 TX Charlers 1 Denver 20 CO Joel
combined = pd.merge(revenue,managers,left_on='city',right_on='branch') Columns of the new df=city branch_id_x state_x revenue branch branch_id_y state_y manager
40
#Outcome of: pd.merge(revenue, managers, on='city') #revenue city branch_id state revenue 0 Austin 10 TX 100 1 Denver 20 CO 83 #managers branch branch_id state manager 0 Austin 10 TX Charlers 1 Denver 20 CO Joel
``` Traceback (most recent call last): ... ... pd.merge(revenue, managers, on='city') ... ... KeyError: 'city' ```
41
#Join both dataframes df1,df2 and visualize Outcome: A B K0 A0 B0 K1 A1 B1 K2 A2 B2 C D K0 C0 D0 K1 C1 D1 K2 C2 D2
left.join(right) A B C D K0 A0 B0 C0 D0 K1 A1 B1 C1 D1 K2 A2 B2 C2 D2
42
#Join using suffix and prefix. Visualize outcome ``` #revenue city state revenue branch_id 10 Austin TX 100 20 Denver CO 83 30 Springfield IL 4 ``` ``` #managers branch state manager branch_id 10 Austin TX Charlers 20 Denver CO Joel 47 Mendocino CA Brett ```
revenue.join(managers,lsuffix='_rev',rsuffix='_mng',how='outer') ``` city state_rev revenue branch state_mng manager branch_id 10 Austin TX 100.0 Austin TX Charlers 20 Denver CO 83.0 Denver CO Joel 30 Springfield IL 4.0 NaN NaN NaN 31 NaN NaN NaN Springfield MO Sally ```
43
What is the difference between merge and join in pandas?
join() methods as a convenient way to access the capabilities of pandas. ... join(df2) always joins via the index of df2 , but df1. merge(df2) can join to one or more columns of df2 (default) or to the index of df2 (with right_index=True ). lookup on left table: by default, df1
44
#What does left join do?
Left outer join produces a complete set of records from Table A, with the matching records (where available) in Table B. If there is no match, the right side will contain null. pd.merge(df_a, df_b, on='subject_id', how='left') #Can add suffixes pd.merge(df_a, df_b, on='subject_id', how='left', suffixes=('_left', '_right')
45
#What does merge_ordered do?
Perform merge with optional filling/interpolation. | Designed for ordered data like time series data