1. Series Flashcards
what is a series
one-dimensional labeled array
one-dimensional means a single key (index label) retrieves each value
creating a series
lottery = [4,8,5,6,42,23]
pd.Series(lottery)
if you pass a dictionary, its keys become the Series index
index labels don't need to be unique
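A minimal sketch of the three creation cases above: a list (default index), a dictionary (keys become the index), and repeated labels. The `prices` and `dupes` data are hypothetical.

```python
import pandas as pd

# From a list: pandas assigns a default RangeIndex 0, 1, 2, ...
lottery = pd.Series([4, 8, 5, 6, 42, 23])

# From a dictionary: the keys become the index labels (hypothetical prices)
prices = pd.Series({"apple": 1.2, "plum": 0.8})

# Duplicate labels are allowed; selecting one returns every matching row
dupes = pd.Series([10, 20, 30], index=["a", "a", "b"])
both_a = dupes["a"]
```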
assign an index
fruits = ["Apple", "Orange", "Plum"]
weekdays = ["Monday", "Tuesday", "Wednesday"]
pd.Series(fruits, index=weekdays)
attribute vs method
an attribute just returns information about the object; no () at the end
a method performs an action or calculation on it; call it with ()
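basic Series attributes
s.values => returns the values as a NumPy array
s.index => returns the index (a RangeIndex object by default)
s.dtype => data type of the values
s.shape => tuple, e.g. (6,)
s.size => number of elements (null values are counted)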
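A small sketch of the attributes above next to one method, using a hypothetical Series with a single missing value so that `size` and `count()` differ.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with one missing value (NaN)
s = pd.Series([4, 8, np.nan, 6])

# Attributes: no parentheses; they describe the Series
vals = s.values   # NumPy array of the data
idx = s.index     # default RangeIndex
kind = s.dtype    # float64, because NaN forces a float dtype
dims = s.shape    # (4,)
n = s.size        # 4: the NaN is still counted

# Method: parentheses; it computes something
non_null = s.count()  # 3: the NaN is excluded
```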
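basic calculation methods
s.sum()
s.product()
s.mean()
s.count() => excludes null values, while len(s) includes them
s.describe() => summary statistics (count, mean, std, min, quartiles, max)
s.max() returns the largest value, s.idxmax() returns its index label
s.min() returns the smallest value, s.idxmin() returns its index label
s.head(n) => first n rows (default 5)
s.tail(n) => last n rows (default 5)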
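A quick sketch of the calculation methods on a hypothetical three-item Series, mainly to show the difference between a value (`max`) and its label (`idxmax`).

```python
import pandas as pd

# Hypothetical values indexed by weekday
s = pd.Series([3, 9, 2], index=["Mon", "Tue", "Wed"])

total = s.sum()            # 3 + 9 + 2
prod = s.product()         # 3 * 9 * 2
avg = s.mean()
biggest = s.max()          # the value itself: 9
biggest_at = s.idxmax()    # the label of the max: "Tue"
smallest_at = s.idxmin()   # the label of the min: "Wed"
summary = s.describe()     # a Series of summary statistics
first_two = s.head(2)
last_one = s.tail(1)
```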
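read_csv method
pd.read_csv("pokemon.csv") => DataFrame
pd.read_csv("pokemon.csv", usecols=["Pokemon"]) => still a one-column DataFrame
pd.read_csv("pokemon.csv", usecols=["Pokemon"], squeeze=True) => Series
squeeze was removed from read_csv in pandas 2.0; the modern equivalent is
pd.read_csv("pokemon.csv", usecols=["Pokemon"]).squeeze("columns")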
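A sketch of the three read_csv results, assuming a tiny hypothetical pokemon.csv held in memory with `io.StringIO` instead of a real file. It uses `.squeeze("columns")` because the `squeeze=` argument no longer exists in pandas 2.0+.

```python
from io import StringIO
import pandas as pd

# Stand-in for pokemon.csv: hypothetical rows kept in memory
csv_text = "Pokemon,Type\nBulbasaur,Grass\nCharmander,Fire\nSquirtle,Water\n"

df = pd.read_csv(StringIO(csv_text))                            # 2-column DataFrame
one_col = pd.read_csv(StringIO(csv_text), usecols=["Pokemon"])  # 1-column DataFrame

# Squeezing the single column yields a Series named "Pokemon"
pokemon = pd.read_csv(StringIO(csv_text), usecols=["Pokemon"]).squeeze("columns")
```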
sort
s.sort_values() => sorts by value
s.sort_index() => sorts by index label
inplace
google = google.sort_values()
or, equivalently (the method then returns None)
google.sort_values(inplace=True)
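A sketch of sorting with hypothetical prices: the labels travel with their values, and reassigning gives the same result as `inplace=True`.

```python
import pandas as pd

# Hypothetical closing prices; the index is the default 0, 1, 2
google = pd.Series([50.1, 48.7, 52.3])

by_value = google.sort_values()   # new Series; each label stays with its value
by_label = by_value.sort_index()  # restores label order 0, 1, 2

# Option 1: keep the returned Series
sorted_copy = google.sort_values()

# Option 2: sort in place; the call itself returns None
in_place = google.copy()
returned = in_place.sort_values(inplace=True)
```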
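Note:
sort_values() keeps each value's original index label, so after sorting the labels are out of order
pokemon[0] doesn't mean the first item in the Series but the item labeled 0: with an integer index, [] looks up the label, not the position
if the index is not numeric, pokemon[0] falls back to position and returns the first item (this fallback is deprecated since pandas 2.1; use pokemon.iloc[0] for position and .loc for labels)
however, integer slicing such as pokemon[0:50] goes by position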
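A sketch of label vs position after sorting, using a hypothetical three-name Series. `.loc` and `.iloc` make the intent explicit instead of relying on `[]`.

```python
import pandas as pd

# Hypothetical names; after sorting, the index becomes [1, 0, 2]
pokemon = pd.Series(["Charmander", "Bulbasaur", "Squirtle"]).sort_values()

by_label = pokemon[0]           # integer index: [] uses the LABEL 0
explicit_label = pokemon.loc[0] # same lookup, unambiguous
by_position = pokemon.iloc[0]   # first row by position
first_two = pokemon[0:2]        # an integer slice is positional
```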
check if something is in a Series
by default, in looks at the index labels, not the values
"Bulbasaur" in pokemon.values => checks the values instead
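A short sketch with a hypothetical two-item Series showing that `in` checks labels unless you point it at `.values`.

```python
import pandas as pd

pokemon = pd.Series(["Bulbasaur", "Ivysaur"])  # index labels 0 and 1

label_check = 0 in pokemon                      # looks at the index
name_as_label = "Bulbasaur" in pokemon          # not an index label
name_as_value = "Bulbasaur" in pokemon.values   # searches the values
```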
extracting multiple members
pokemon[[459, 62]]
pokemon.reindex(index=[459, 62])
both return a new Series; with a label that does not exist, [[...]] raises a KeyError while reindex fills in NaN
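A sketch comparing the two selection styles on a hypothetical Series, including what happens when one label (here `999`) is missing.

```python
import pandas as pd

# Hypothetical Series with a numeric index
pokemon = pd.Series(["Bulbasaur", "Ivysaur", "Venusaur"], index=[459, 62, 7])

picked = pokemon[[459, 62]]
via_reindex = pokemon.reindex(index=[459, 62])

# With a missing label the two approaches differ
filled = pokemon.reindex(index=[459, 999])  # 999 -> NaN
try:
    pokemon[[459, 999]]
    raised = False
except KeyError:
    raised = True                           # [[...]] rejects unknown labels
```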
.get() method on Series
s.get([list of keys], default="abc")
if any of the keys does not exist, it returns only the default (no partial result)
use reindex instead to keep the keys that exist and get NaN for the missing ones
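A sketch of `get` with a default on a hypothetical Series: one missing key in the list makes the whole call return the default, while `reindex` keeps the partial result.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["a", "b", "c"])

one = s.get("a")                                # single key
some = s.get(["a", "b"], default="abc")         # all keys exist -> Series
fallback = s.get(["a", "zzz"], default="abc")   # one key missing -> "abc"
partial = s.reindex(["a", "zzz"])               # keeps "a", NaN for "zzz"
```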
count how many times each unique value appears
s.value_counts()
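A sketch of `value_counts()` on hypothetical type labels; the result is indexed by the unique values and sorted from most to least frequent.

```python
import pandas as pd

types = pd.Series(["Grass", "Fire", "Grass", "Water", "Grass", "Fire"])
counts = types.value_counts()   # most frequent value first
```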
apply method
s.apply(myfunc)
s.apply(lambda price: price ** 2)
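A sketch of `apply` with both a named function and a lambda. `price_band` is a hypothetical helper standing in for `myfunc`.

```python
import pandas as pd

prices = pd.Series([2.0, 5.0, 10.0])

def price_band(price):
    """Hypothetical helper: label each price as cheap or pricey."""
    return "cheap" if price < 5 else "pricey"

bands = prices.apply(price_band)                  # named function
squared = prices.apply(lambda price: price ** 2)  # inline lambda
```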
map method
similar to Excel's VLOOKUP: each value of the calling Series is looked up in the index of the Series passed in, and the matching value from that second Series is returned
pokemon_names.map(pokemon_types)
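A sketch of `map` as a lookup, with hypothetical `pokemon_names` and `pokemon_types` Series. A name missing from the lookup Series's index becomes NaN.

```python
import pandas as pd

# Hypothetical data: values to look up, and a lookup table keyed by name
pokemon_names = pd.Series(["Bulbasaur", "Charmander", "Mew"])
pokemon_types = pd.Series({"Bulbasaur": "Grass", "Charmander": "Fire"})

# Each name is matched against pokemon_types.index; no match -> NaN
result = pokemon_names.map(pokemon_types)
```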
squeeze (pandas < 2.0 syntax)
nba = pd.read_csv("nba.csv", usecols=["Team", "Name"], index_col="Name", squeeze=True)
turns the one remaining column (Team) into a Series indexed by Name
nba = pd.read_csv("nba.csv", usecols=["Team"], index_col="Name", squeeze=True)
this won't work: "Name" is not in usecols, so the index column is never read
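A sketch of both nba cases, assuming a small hypothetical nba.csv held in memory. It uses `.squeeze("columns")` in place of the removed `squeeze=True`; the exception type for the failing call is caught broadly because it is an assumption here.

```python
from io import StringIO
import pandas as pd

# Stand-in for nba.csv with hypothetical rows
csv_text = "Name,Team,Salary\nAvery,Hawks,100\nBlake,Nets,200\n"

# Works: "Name" is read via usecols, then used as the index
nba = pd.read_csv(StringIO(csv_text), usecols=["Team", "Name"],
                  index_col="Name").squeeze("columns")

# Fails: index_col names a column that usecols did not read
try:
    pd.read_csv(StringIO(csv_text), usecols=["Team"], index_col="Name")
    failed = False
except (ValueError, KeyError):
    failed = True
```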