Merging Dataframes / Datacamp(1,2,3 not project) Flashcards
Use a loop to store these files in a list as data frames and unpacked the list into variables called gold,silver, bronze
Gold.csv
Silver.csv
Bronze.csv
import pandas as pd
filenames = [‘Gold.csv’, ‘Silver.csv’, ‘Bronze.csv’]
dataframes = []
for filename in filenames:
dataframes.append(pd.read_csv(filename))
gold,silver,bronze=dataframes
Make a copy of the df gold and call it medals
medals = gold.copy()
Rename columns of df as Code, Country, Total
A B C
0 USA United States 2088.0
1 URS Soviet Union 838.0
new_labels = ['NOC', 'Country', 'Gold'] medals.columns = new_labels
Add columns ‘Total’ from data frames Silver and Bronze into dataframe gold
medals['Silver'] = silver['Total'] medals['Bronze'] = bronze['Total']
Change order of the dataframe
0 1
2013-12-31 -0.31 0.83
2014-12-31 -0.63 -0.19
2015-12-31 -0.53 -0.85
0 1
2015-12-31 -0.53 -0.85
2014-12-31 -0.63 -0.19
2013-12-31 -0.31 0.83
df.sort_index(ascending=False,inplace=True)
Change order of the dataframe
0 1
2013-12-31 -0.31 0.83
2014-12-31 -0.63 -0.19
2015-12-31 -0.53 -0.85
#Outcome: 0 1 2014-12-31 -0.63 -0.19 2015-12-31 -0.53 -0.85 2013-12-31 -0.31 0.83
df.sort_values(0, ascending=True, inplace=True)
Fill forward NaN
A B C D 0 5.0 NaN 4 5.0 1 3.0 2.0 3 4.0 2 NaN 4.0 8 2.0 3 4.0 3.0 5 NaN
A B C D 0 5.0 NaN 4 5.0 1 3.0 2.0 3 4.0 2 3.0 4.0 8 2.0 3 4.0 3.0 5 2.0
df.ffill(axis = 0)
Fill forward NaN
A B C D 0 5.0 NaN 4 5.0 1 3.0 2.0 3 4.0 2 NaN 4.0 8 2.0 3 4.0 3.0 5 NaN
A B C D 0 5.0 5.0 4.0 5.0 1 3.0 2.0 3.0 4.0 2 NaN 4.0 8.0 2.0 3 4.0 3.0 5.0 5.0
df.ffill(axis = 1)
Outcome after reindexing df
a b c
A 0.338244 0.548040 0.898135
B 0.709872 0.965031 0.521506
C 0.605053 0.340734 0.139700
df1.reindex([‘Ro’, ‘A’, ‘B’, ‘Ri’,])
a b c Ro NaN NaN NaN A 0.187632 0.587335 0.141530 B 0.900119 0.984410 0.648169 Ri NaN NaN NaN
Transform the first dataframe into the second one using a list called new_indexes
a b c
A 0.800152 0.485063 0.607966
B 0.509770 0.856334 0.820924
C 0.249211 0.675843 0.096703
a b c Ro missing missing missing A 0.800152 0.485063 0.607966 B 0.50977 0.856334 0.820924 Ri missing missing missing
new_index=[‘Ro’, ‘A’, ‘B’, ‘Ri’,]
print(df1.reindex(new_index, axis =’rows’, fill_value =’missing’))
Add these two dataframes and guess the output
a b c
A 0.8 0.4 0.3
B 0.3 0.9 0.2
C 0.9 0.9 0.4
a b c
Z 0.4 0.5 0.3
B 0.1 0.4 0.4
C 0.4 0.7 0.7
df0+df1
print(df0+df1)
aN NaN NaN
B 0.4 1.3 0.6
C 1.3 1.6 1.1
Z NaN NaN NaN
What is the output of multiplying 10 with this dataframe?
ax bx cx
Z 0.1 0.7 0.9
B 0.8 0.6 0.1
C 0.1 0.5 0.1
ax bx cx
Z 1.0 7.0 9.0
B 8.0 6.0 1.0
C 1.0 5.0 1.0
Resample into year and get the mean
0 1 2013-01-31 -2.10 -1.73 2013-02-28 -1.15 -0.74 2013-03-31 -0.51 1.65 2013-04-30 -1.10 -2.13
yearly = df.resample(‘A’).mean()
yearly
0 1
2013-12-31 -1.215 -0.7375
Percentage growth of ‘c’
a b c
A 0.5 1.0 0.4
B 0.2 0.8 0.5
C 0.8 0.5 0.6
df_1=df[‘growth_c%’] = df0[‘c’].pct_change()*100
Make sure the dataframe gets the dates
sp500 = pd.read_csv(‘sp500.csv’,index_col=’Date’,…)
sp500 = pd.read_csv(‘sp500.csv’,index_col=’Date’, parse_dates=True)
Convert the sp500 into GBP
#df_sp500 in USD Open Close Date 2015-01-02 2058.899902 2058.199951 2015-01-05 2054.439941 2020.579956 2015-01-06 2022.150024 2002.609985
#df_exchange GBP/USD Date 2015-01-02 0.65101 2015-01-05 0.65644 2015-01-06 0.65896
pounds = df_sp500.multiply(df_exchange[‘GBP/USD’], axis=’rows’)
Append df_silver and df_bronze into a df called combined and print United States result
#df_silver Country United States 1052.0 Soviet Union 584.0 United Kingdom 505.0
#df_bronze Country United States 1195.0 Soviet Union 627.0 United Kingdom 591.0
combined=bronze.append(silver)
combined.loc[‘United States’]
Append the Series Units for both df into a series called q1 and add all the units sold
#df_jan Date 2015-01-21 Streeplex Hardware 11 2015-01-09 Streeplex Service 8 2015-01-06 Initech Hardware 17
#df_feb Company Product Units Date 2015-02-26 Streeplex Service 4 2015-02-16 Hooli Software 10 2015-02-03 Initech Software 13
jan_units = jan[‘Units’]
feb_units = feb[‘Units’]
q1=jan_units.append(feb_units)
q1.sum()