Pandas Flashcards
pandas data structure: Series. What is it?
A Series is a one-dimensional array-like object containing a sequence of values (similar to a NumPy array) and an associated array of labels, called its index.
obj=pd.Series([4,5,-3,2])
obj
0    4
1    5
2   -3
3    2
dtype: int64
output the array values
obj.values
array([ 4, 5, -3, 2], dtype=int64)
output the index labels of a pandas Series
obj.index
RangeIndex(start=0, stop=4, step=1)
how to assign a different index in a pandas Series
#Specify a different index
obj2=pd.Series([4,5,-3,2],index=['d','c','a','b'])
obj2
get a value by its index label in a pandas Series
pandas has more flexibility to use an index than NumPy.
obj2['c']
5
get the same value by its integer position
Use .iloc for position-based access. Plain obj2[1] relies on a positional fallback that recent pandas versions deprecate for label-indexed Series.
obj2.iloc[1]
5
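A minimal sketch contrasting the two lookup styles, reusing the obj2 values from the cards above. The variable names by_label and by_position are illustrative only.

```python
import pandas as pd

obj2 = pd.Series([4, 5, -3, 2], index=['d', 'c', 'a', 'b'])

by_label = obj2.loc['c']      # explicit label-based lookup
by_position = obj2.iloc[1]    # explicit position-based lookup (second element)

print(by_label, by_position)  # both refer to the same element
```

Spelling out .loc and .iloc keeps the intent clear when the labels themselves happen to be integers.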
get 2 values using the assigned letters in a pandas Series
obj2[['a','d']]
#['a','d'] can be seen as a list of index labels. It returns a subset of the original Series, which is also a Series.
a   -3
d    4
you can do NumPy-like operations on the Series.
obj2[obj2>0]
d    4
c    5
b    2
dtype: int64
find data type
type(new)
find missing data in pandas
pd.isnull(obj4)
obj3.isnull()
boolean mask marking the values that are not missing
pd.notnull(obj4)
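The cards above do not show how obj3 and obj4 were built, so here is a small self-contained sketch with made-up data (the prices dict and the 'bread' label are assumptions for illustration). A label that is requested in the index but absent from the dict becomes NaN, which is what isnull and notnull detect.

```python
import pandas as pd

# Hypothetical data: 'bread' has no entry in the dict, so it becomes NaN.
prices = {'eggs': 10, 'ham': 20}
obj = pd.Series(prices, index=['eggs', 'ham', 'bread'])

missing = pd.isnull(obj)    # True where a value is missing
present = pd.notnull(obj)   # True where a value is present

print(missing)
print(obj[present])         # keep only the non-missing entries
```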
assign value 300 to bread
obj4['bread']=300
DataFrame
There are many possible data inputs to DataFrame, such as a NumPy array, a dict of lists or tuples, a dict of Series, a dict of dicts, and so on.
We only introduce how to construct a DataFrame from a dict of lists.
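For comparison, a brief sketch of two of the other inputs mentioned above, a dict of Series and a dict of dicts. The numbers and the names from_series and from_dicts are invented for illustration.

```python
import pandas as pd

# Dict of Series: each Series becomes a column; rows are aligned on the index.
pop = pd.Series([1.5, 1.7], index=[2000, 2001])
debt = pd.Series([0.3], index=[2001])
from_series = pd.DataFrame({'pop': pop, 'debt': debt})

# Dict of dicts: outer keys become columns, inner keys become row labels.
from_dicts = pd.DataFrame({'Ohio': {2000: 1.5, 2001: 1.7},
                           'Nevada': {2001: 2.9}})

print(from_series)
print(from_dicts)
```

In both cases, a row label that is missing from one column is filled with NaN.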
create a dataframe
create a DataFrame through a dict of equal length lists or NumPy arrays:
data={'state':['Ohio','Ohio','Ohio','Nevada','Nevada','Nevada'],
'year':[2000,2001,2002,2000,2001,2002],
'pop':[1.5,1.7,3.6,2.4,2.9,3.2]}
frame=pd.DataFrame(data)
frame
    state  year  pop
0    Ohio  2000  1.5
1    Ohio  2001  1.7
2    Ohio  2002  3.6
3  Nevada  2000  2.4
4  Nevada  2001  2.9
5  Nevada  2002  3.2
create another dataframe from dictionary
election = {'state':['New Jersey','Ohio','West Virginia'], 'Winner':['Hillary','Trump','Trump'], 'Margin':[5,7,15]}
election
type(election)
electionresult = pd.DataFrame(election)
#electionresult
electionresult.head()
show the first 2 rows of a dataframe by index label
electionresult2=pd.DataFrame(electionresult,index=[0,1])
electionresult2
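Passing an existing DataFrame back into pd.DataFrame with index=[0,1] selects rows by label. As a hedged alternative sketch, the same two rows can also be taken with head, .iloc, or .loc; the variable names below are illustrative only.

```python
import pandas as pd

election = {'state': ['New Jersey', 'Ohio', 'West Virginia'],
            'Winner': ['Hillary', 'Trump', 'Trump'],
            'Margin': [5, 7, 15]}
electionresult = pd.DataFrame(election)

first_two = electionresult.head(2)        # first two rows by position
also_first_two = electionresult.iloc[:2]  # same rows via positional slicing
by_label = electionresult.loc[[0, 1]]     # rows whose index labels are 0 and 1

print(first_two)
```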
are lists mutable?
You have to understand that Python represents all its data as objects. … Some of these objects, like lists and dictionaries, are mutable, meaning you can change their content without changing their identity. Other objects, like integers, floats, strings and tuples, cannot be changed.
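A short sketch of what "without changing their identity" means, using the built-in id function. The variable names are illustrative only.

```python
items = [1, 2, 3]
before = id(items)
items.append(4)                     # changes the list's content in place
same_object = id(items) == before   # the list is still the same object

point = (1, 2)
try:
    point[0] = 10                   # tuples do not support item assignment
    tuple_changed = True
except TypeError:
    tuple_changed = False

print(items, same_object, tuple_changed)
```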
create a Series
a=pd.Series([1,2,3,4],['a','b','c','d'])
show the data in a series
a.values
array([1, 2, 3, 4])
show the index with pandas
a.index
set and get a value by using labels
#pandas has more flexibility to use an index than NumPy.
a['c']=5
a['c']
5
numpy like operations
obj2[obj2>0]
np.exp(obj2)
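A minimal sketch showing that these NumPy-like operations keep the index labels attached to the values. It reuses the obj2 values from the earlier cards; the result names are illustrative.

```python
import numpy as np
import pandas as pd

obj2 = pd.Series([4, 5, -3, 2], index=['d', 'c', 'a', 'b'])

positive = obj2[obj2 > 0]      # boolean filtering keeps matching labels
doubled = obj2 * 2             # element-wise arithmetic
exponentials = np.exp(obj2)    # NumPy ufuncs return a Series, not a bare array

print(positive)
print(doubled)
print(exponentials)
```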
create a series from a dictionary
dict1={'eggs':10,'ham':20}
series1=pd.Series(dict1)
series1
eggs    10
ham     20
dtype: int64
how to find whether values are missing in a series
pd.isnull(obj4)
bread False
ham False
dtype: bool
give value to a key in Series
obj4['bread']=300
obj4['bread']
300
add the values of 2 series
print(obj3)
print(obj4)
obj4+obj3
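The contents of obj3 and obj4 are not shown on the cards, so the sketch below uses invented values to illustrate how addition lines up entries by label rather than by position. Labels present in only one Series give NaN unless a fill value is supplied.

```python
import pandas as pd

# Hypothetical contents; the real obj3 and obj4 are not shown in the cards.
obj3 = pd.Series({'eggs': 10, 'ham': 20})
obj4 = pd.Series({'ham': 5, 'bread': 300})

total = obj3 + obj4                     # NaN where a label is missing on one side
filled = obj3.add(obj4, fill_value=0)   # treat missing entries as 0 instead

print(total)
print(filled)
```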
select first 5 rows of a dataframe
frame.head() #by default, this method returns only the first 5 rows.
first 10 rows of a dataframe
frame[:10]
change the index on a dataframe
frame=pd.DataFrame(data,index=[1,2,3,4,5,6])
frame
    state  year  pop
1    Ohio  2000  1.5
2    Ohio  2001  1.7
3    Ohio  2002  3.6
4  Nevada  2000  2.4
5  Nevada  2001  2.9
6  Nevada  2002  3.2
you can also do
frame=pd.DataFrame(data,index=['a','b',3,4,5,6])
create dataframe with some index numbers of your choosing and column headers
frame2=pd.DataFrame(data,index=[0,2,3,4,6,4],columns=['year','state','pop','debt']) #columns are arranged in the order given
frame2
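A hedged sketch of what happens to a requested column, such as 'debt', that is not a key in the data dict: pandas creates it and fills it with NaN. The index 1 to 6 is used here for simplicity.

```python
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2000, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]}

# 'debt' is not in data, so the whole column starts out as NaN.
frame2 = pd.DataFrame(data, index=[1, 2, 3, 4, 5, 6],
                      columns=['year', 'state', 'pop', 'debt'])

print(frame2)
```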
show a specific column of a dataframe
either works:
#frame2.year #notice the index has been overridden.
frame2['year']
replace value on every row of a specific column
frame2['debt']=16.5
frame2
   year   state  pop  debt
1  2000    Ohio  1.5  16.5
2  2001    Ohio  1.7  16.5
3  2002    Ohio  3.6  16.5
4  2000  Nevada  2.4  16.5
5  2001  Nevada  2.9  16.5
6  2002  Nevada  3.2  16.5
assign a sequence of values to a specific column of a dataframe
#how we assign the float numbers 1.0-6.0 to the debt column.
frame2['debt']=np.arange(1.0,7.0,1.0)
frame2
   year   state  pop  debt
1  2000    Ohio  1.5   1.0
2  2001    Ohio  1.7   2.0
3  2002    Ohio  3.6   3.0
4  2000  Nevada  2.4   4.0
5  2001  Nevada  2.9   5.0
6  2002  Nevada  3.2   6.0
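An array assigned to a column must match the number of rows. A Series, by contrast, is aligned on the index, so it may cover only some rows. The sketch below uses invented debt values and the hypothetical name val to show this.

```python
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2000, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]}
frame2 = pd.DataFrame(data, index=[1, 2, 3, 4, 5, 6])

# Hypothetical values for rows 2, 4 and 5 only.
val = pd.Series([-1.2, -1.5, -1.7], index=[2, 4, 5])
frame2['debt'] = val   # matched by index label; other rows become NaN

print(frame2)
```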