Week 2 Flashcards
sports = {'Archery': 'Bhutan', 'Golf': 'Scotland', 'Sumo': 'Japan', 'Taekwondo': 'South Korea'}
s = pd.Series(sports)
s
Archery Bhutan
Golf Scotland
Sumo Japan
Taekwondo South Korea
dtype: object
numbers = [1, 2, 3]
pd.Series(numbers)
0 1
1 2
2 3
dtype: int64
numbers = [1, 2, None]
pd.Series(numbers)
0 1.0
1 2.0
2 NaN
dtype: float64
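A small sketch of why the dtype changes in the card above: None in a list of numbers is stored as NaN, and NaN is a float, so the whole series becomes float64. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# None in a list of numbers is stored as NaN, which forces a float dtype.
numbers = pd.Series([1, 2, None])
print(numbers.dtype)  # float64

# NaN never compares equal to itself, so test for it with pd.isna instead of ==.
print(np.nan == np.nan)           # False
print(pd.isna(numbers.iloc[2]))   # True
```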
sports = {'Archery': 'Bhutan', 'Golf': 'Scotland', 'Sumo': 'Japan', 'Taekwondo': 'South Korea'}
s = pd.Series(sports)
s.index
Index(['Archery', 'Golf', 'Sumo', 'Taekwondo'], dtype='object')
s = pd.Series(['Tiger', 'Bear', 'Moose'], index=['India', 'America', 'Canada'])
s
India Tiger
America Bear
Canada Moose
dtype: object
sports = {'Archery': 'Bhutan', 'Golf': 'Scotland', 'Sumo': 'Japan', 'Taekwondo': 'South Korea'}
s = pd.Series(sports, index=['Golf', 'Sumo', 'Hockey'])
s
Golf Scotland
Sumo Japan
Hockey NaN
dtype: object
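When an explicit index is passed with a dictionary, only the listed labels are kept, and labels with no matching key get NaN. A minimal sketch of checking which labels came back empty (the `missing` name is just for illustration):

```python
import pandas as pd

sports = {'Archery': 'Bhutan', 'Golf': 'Scotland',
          'Sumo': 'Japan', 'Taekwondo': 'South Korea'}
s = pd.Series(sports, index=['Golf', 'Sumo', 'Hockey'])

# Labels requested in index= that had no key in the dictionary.
missing = s[s.isna()].index.tolist()
print(missing)                # ['Hockey']

# Dictionary keys left out of index= are dropped from the series.
print('Archery' in s.index)   # False
```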
sports = {'Archery': 'Bhutan', 'Golf': 'Scotland', 'Sumo': 'Japan', 'Taekwondo': 'South Korea'}
s = pd.Series(sports)
s.iloc[3]
Queries by integer position. Position 3 is the fourth entry, because positions start at 0.
'South Korea'
sports = {'Archery': 'Bhutan', 'Golf': 'Scotland', 'Sumo': 'Japan', 'Taekwondo': 'South Korea'}
s = pd.Series(sports)
s.loc['Golf']
Queries by index label (key). iloc and loc are not methods. They're attributes, so they are indexed with [] rather than called with ().
'Scotland'
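The difference between the two is easiest to see when the labels are themselves integers. A minimal sketch, using a made-up series whose labels (10, 20, 30) deliberately disagree with the positions (0, 1, 2):

```python
import pandas as pd

# Hypothetical data: integer labels that do not match the positions.
s = pd.Series(['a', 'b', 'c'], index=[10, 20, 30])

by_position = s.iloc[0]   # first entry
by_label = s.loc[20]      # entry whose label is 20
print(by_position)        # a
print(by_label)           # b

# There is no label 0, so a label lookup for 0 fails.
try:
    s.loc[0]
    label_zero_found = True
except KeyError:
    label_zero_found = False
print(label_zero_found)   # False
```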
import numpy as np
s = pd.Series([100.00, 120.00, 101.00, 3.00])
total = np.sum(s)
print(total)
Returns the sum of the values in the series.
324.0
s = pd.Series(np.random.randint(0,1000,10000))
s.head()
this creates a big series of random numbers indexed by ordered integers
0 96
1 643
2 202
3 393
4 250
dtype: int64
s = pd.Series(np.random.randint(0,1000,10000))
len(s)
10000, the third parameter (size) passed to np.random.randint.
s = pd.Series(np.random.randint(0,1000,10000))
s.head() returns (before the addition):
0 96
1 643
2 202
3 393
4 250
dtype: int64
s += 2
s.head()
Adds two to each item in s using broadcasting.
0 98
1 645
2 204
3 395
4 252
dtype: int64
for label, value in s.items():
    s.at[label] = value + 2
s.head()
What's an easier way to do this? Broadcasting: s += 2.
If you find yourself iterating through a series, question whether you are doing it the best possible way.
0 100
1 647
2 206
3 397
4 254
dtype: int64
Adds two to each value in the series.
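To see why the loop is worth questioning, the two approaches can be timed side by side. This is only a rough sketch: the helper names `add_two_loop` and `add_two_vectorized` are made up for illustration, and the timings will vary by machine, so no numbers are quoted.

```python
import timeit

import numpy as np
import pandas as pd

s = pd.Series(np.random.randint(0, 1000, 10000))

def add_two_loop(series):
    """Add 2 to every value by visiting each label in turn (slow)."""
    out = series.copy()
    for label, value in series.items():
        out.at[label] = value + 2
    return out

def add_two_vectorized(series):
    """Add 2 to every value in one broadcast operation (fast)."""
    return series + 2

# Run each version a few times and report the total elapsed time.
loop_time = timeit.timeit(lambda: add_two_loop(s), number=3)
vec_time = timeit.timeit(lambda: add_two_vectorized(s), number=3)
print(f'loop: {loop_time:.4f}s  vectorized: {vec_time:.4f}s')
```

Both functions give the same result. Only the time taken differs.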
s = pd.Series([1, 2, 3])
s.loc['Animal'] = 'Bears'
s
.loc doesn't only let you access data. It also lets you add new data.
Indices can have mixed types. pandas automatically changes the underlying types as appropriate.
0 1
1 2
2 3
Animal Bears
dtype: object
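A short sketch of the type change described above: assigning to a label that does not exist yet grows the series, and mixing a string into integer data changes the dtype to object.

```python
import pandas as pd

s = pd.Series([1, 2, 3])
print(s.dtype)   # int64

# 'Animal' is not an existing label, so this appends a new entry.
s.loc['Animal'] = 'Bears'

print(len(s))    # 4
print(s.dtype)   # object
```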
original_sports = pd.Series({'Archery': 'Bhutan',
                             'Golf': 'Scotland',
                             'Sumo': 'Japan',
                             'Taekwondo': 'South Korea'})
cricket_loving_countries = pd.Series(['Australia',
                                      'Barbados',
                                      'Pakistan',
                                      'England'],
                                     index=['Cricket',
                                            'Cricket',
                                            'Cricket',
                                            'Cricket'])
all_countries = pd.concat([original_sports, cricket_loving_countries])
cricket_loving_countries
Works a lot like a SQL table query: index labels do not have to be unique, so one label can match several rows.
Cricket Australia
Cricket Barbados
Cricket Pakistan
Cricket England
dtype: object
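A sketch of what querying the combined series looks like. It uses pd.concat to join the two series, since Series.append was removed in pandas 2.0. The originals are left unchanged, and a label that occurs several times returns a series instead of a single value.

```python
import pandas as pd

original_sports = pd.Series({'Archery': 'Bhutan', 'Golf': 'Scotland',
                             'Sumo': 'Japan', 'Taekwondo': 'South Korea'})
cricket_loving_countries = pd.Series(
    ['Australia', 'Barbados', 'Pakistan', 'England'],
    index=['Cricket', 'Cricket', 'Cricket', 'Cricket'])

# Joining returns a new series; neither input is modified.
all_countries = pd.concat([original_sports, cricket_loving_countries])
print(len(original_sports))     # 4
print(len(all_countries))       # 8

# A unique label gives back a single value.
print(all_countries.loc['Golf'])   # Scotland

# A repeated label gives back a series with every match.
cricket = all_countries.loc['Cricket']
print(type(cricket).__name__)   # Series
print(len(cricket))             # 4
```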