notebooks Flashcards
class
can be thought of as a type. it determines what you can do with a variable and which operators you can use with it. the class defines what each operator means when applied to its objects.
object
an instance of a class. every variable in your program is in fact an object.
type() function
tells us the class of an object. it also works for functions and methods.
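a minimal sketch of type() on a few kinds of objects. the names numbers and greet are made up for illustration.

```python
numbers = [1, 2, 3]


def greet():
    return "hello"


print(type(numbers))  # <class 'list'>
print(type("abc"))    # <class 'str'>
print(type(greet))    # <class 'function'>
print(type(len))      # <class 'builtin_function_or_method'>
```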
encapsulation
groups variables and functions that are highly related.
abstraction
makes the interface of an object simpler and reduces the impact of change.
inheritance
a mechanism to reduce redundant code. functions or state shared among objects can be reused by inheriting from a class.
polymorphism
something occurs in several different forms. eg. the same call payout_salary() behaves differently depending on the class of the object it is called on.
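the four ideas above can be sketched in one small example. only payout_salary() comes from the card; the class names Employee and Contractor, their attributes, and the numbers are hypothetical.

```python
class Employee:
    """Base class: groups the state and behaviour shared by all employees."""

    def __init__(self, name, monthly_salary):
        # related variables are stored together on the object (encapsulation)
        self.name = name
        self.monthly_salary = monthly_salary

    def payout_salary(self):
        # callers only need this method, not how the amount is computed (abstraction)
        return self.monthly_salary


class Contractor(Employee):
    """Subclass: reuses Employee and changes only what differs (inheritance)."""

    def __init__(self, name, hourly_rate, hours):
        super().__init__(name, monthly_salary=0)
        self.hourly_rate = hourly_rate
        self.hours = hours

    def payout_salary(self):
        # same method name, different response (polymorphism)
        return self.hourly_rate * self.hours


staff = [Employee("Ann", 3000), Contractor("Bob", 50, 20)]
payouts = [person.payout_salary() for person in staff]
print(payouts)  # [3000, 1000]
```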
python list
an object of class list.
pandas dataframes
like Excel sheets.
constructor
a function that helps you create new objects of this class. each class usually has a constructor.
instance method
a function that applies to an object. you call it like this: variable.method().
txt.lower()
a function that puts a string variable in lowercase.
txt.upper()
a function that puts a string variable in uppercase. the equivalent of having a function: def upper(string) and calling upper(txt).
txt.isalpha()
returns True if all characters in the string are letters and the string is not empty.
txt.strip()
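returns a trimmed version of the string. with no argument it removes leading and trailing whitespace. with an argument it removes those characters from both ends only, eg. txt.strip('I') removes the leading 'I' from 'I am sterdam'.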
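a short sketch of the four string methods above, using the 'I am sterdam' text from the card. the variable names are chosen for illustration.

```python
txt = "I am sterdam"

lowered = txt.lower()          # 'i am sterdam'
uppered = txt.upper()          # 'I AM STERDAM'
all_letters = txt.isalpha()    # False, because spaces are not letters
stripped = txt.strip("I")      # ' am sterdam' (only the ends are trimmed)
trimmed = "  hello  ".strip()  # 'hello' (whitespace removed by default)

print(lowered, uppered, all_letters, repr(stripped), repr(trimmed))
```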
class method
it is called by using classname.method(). eg. for datetime.today(), datetime is the name of the class and today is the name of the method.
pandas
a very popular python library, specialized in handling spreadsheet data. like a true panda, it handles data by chewing it relentlessly and effortlessly. the name comes from panel data.
read_csv()
a function from the module pandas, written as pandas.read_csv(). it reads data from a text file where each line is a row of our spreadsheet. columns are separated by a character, eg. '\t' (tab). in that case, write sep='\t'.
df
an object of class DataFrame. it has hundreds of methods you can call (about 360; the exact number depends on the pandas version).
head()
a method that displays the first rows of the spreadsheet. df.head() shows the first 5 rows; df.head(N) shows the first N.
df.dtypes
an attribute of the object df. it is a value, not a method (no parentheses), and indicates the type of data in each column.
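a minimal sketch of reading tab-separated data and inspecting it. the data is made up, and io.StringIO stands in for a file on disk so the example runs on its own; with a real file you would pass its path instead.

```python
import io

import pandas as pd

# hypothetical tab-separated content, one spreadsheet row per line
raw = "order_id\titem_name\tquantity\n1\tIzze\t2\n2\tChips\t1\n3\tIzze\t1\n"

df = pd.read_csv(io.StringIO(raw), sep="\t")

first_two = df.head(2)  # df.head() alone would show the first 5 rows
print(first_two)
print(df.dtypes)        # attribute, so no parentheses: one data type per column
```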
df[condition]
only shows the rows where the condition is True. a condition can be eg. df['item_name'] == 'Izze', used as df[df['item_name'] == 'Izze']. it is True for rows where the value in column item_name is equal to the string 'Izze'.
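a small sketch of filtering rows with a condition. the dataframe contents are hypothetical; the column name item_name and the value 'Izze' come from the card.

```python
import pandas as pd

df = pd.DataFrame({"item_name": ["Izze", "Chips", "Izze"], "quantity": [2, 1, 1]})

condition = df["item_name"] == "Izze"  # a Series of True/False, one per row
izze_rows = df[condition]              # keeps only the rows marked True

print(izze_rows)
```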
apply method
applies a function to each element of a column and returns the results as a new column. the current data is only overwritten if you assign the result back, eg. df['col'] = df['col'].apply(func).
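a sketch of apply on one column, assuming a hypothetical item_price column stored as text. note that apply returns a new Series; the assignment on the last line is what replaces the column.

```python
import pandas as pd

df = pd.DataFrame({"item_price": ["$2.39", "$3.39"]})


def to_float(price):
    """Turn a string such as '$2.39' into the number 2.39."""
    return float(price.replace("$", ""))


converted = df["item_price"].apply(to_float)  # new Series, df is not changed yet
still_text = df["item_price"].tolist()        # ['$2.39', '$3.39']

df["item_price"] = converted                  # assign back to overwrite the column
print(df)
```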
groupby
a method of df. it groups all rows with the same value for a given variable, eg. item_name. df.groupby('item_name') is an object of class DataFrameGroupBy. it has its own set of methods (about 134, depending on the pandas version).
sum
a method of df.groupby('item_name'). it returns a new spreadsheet with the same columns and one row per item_name value. for each column it aggregates the values of each group using sum() (sums all values).
sort_values
a method used to sort the rows of a spreadsheet according to the values in a column.
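groupby, sum and sort_values are often chained. a minimal sketch with made-up order data:

```python
import pandas as pd

df = pd.DataFrame({
    "item_name": ["Izze", "Chips", "Izze", "Chips", "Salad"],
    "quantity": [2, 1, 1, 3, 1],
})

totals = df.groupby("item_name").sum()                    # one row per item_name
ranked = totals.sort_values("quantity", ascending=False)  # largest total first

print(ranked)
```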
describe
a method of a dataframe, eg. the object by_order. it gives you basic statistics (count, mean, std, min, quartiles, max) per numeric column.
encapsulation
refers to data security: an object's data is kept together with its methods and shielded from direct outside access.
abstraction
hidden complexity.
polymorphism
code reusability through the same method call but with a different response.
inheritance
code reusability: subclasses inherit the characteristics of their parent classes.
'self'
used to represent the instance of the class itself. it is the first argument of every instance method. by convention it is named self, although you can name it differently if you wish. the purpose is to allow methods to access and modify the attributes and methods of the instance to which they belong. inside the constructor (__init__), self refers to the object being initialised.
'-' for two objects from class datetime
works as ‘how long between those two dates and times?’
'+' for a datetime object and a timedelta object
works as ‘what time would it be if we add … to a date?’
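a small sketch of both operators with the standard datetime module. the dates are arbitrary examples.

```python
from datetime import datetime, timedelta

start = datetime(2024, 1, 1, 9, 0)
end = datetime(2024, 1, 3, 12, 30)

gap = end - start                   # datetime - datetime gives a timedelta
later = start + timedelta(days=7)   # datetime + timedelta gives a new datetime

print(gap)    # 2 days, 3:30:00
print(later)  # 2024-01-08 09:00:00
```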
data granularity
a measure of the level of detail in a data structure.
pd.cut
a function from the module pandas (imported as pd). it takes a list of values and the limits (bins) of our categories, and returns the category each value falls into.
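a minimal sketch of pd.cut. the ages, limits and labels are hypothetical. by default each category includes its upper limit but not its lower one.

```python
import pandas as pd

ages = [5, 17, 30, 70]
limits = [0, 18, 65, 120]               # 3 categories need 4 limits
labels = ["child", "adult", "senior"]

groups = pd.cut(ages, bins=limits, labels=labels)

print(list(groups))  # ['child', 'child', 'adult', 'senior']
```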
sample method
returns a random sample from the dataframe.
count()
counts the non-missing values per column. after a groupby, it counts them in each category.
.plot.pie method
used to plot a pie chart.
concat method
pd.concat() is a function used to join dataframes together, by stacking them below or next to each other.
axis
the direction in which to concatenate. with axis=0 (the default) we add rows and match on column names; with axis=1 we add columns and match on the row index.
inner join
keeps only the subset of both dataframes that have matching indexes (or matching key values).
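a sketch of concat with the axis and join options. the three small dataframes and their row labels are made up for illustration.

```python
import pandas as pd

a = pd.DataFrame({"x": [1, 2]}, index=["r1", "r2"])
b = pd.DataFrame({"x": [3]}, index=["r3"])
c = pd.DataFrame({"y": [10, 20]}, index=["r2", "r3"])

stacked = pd.concat([a, b])                      # axis=0: rows added, columns matched
side_by_side = pd.concat([a, c], axis=1)         # columns added, NaN where no match
inner = pd.concat([a, c], axis=1, join="inner")  # only indexes present in both

print(stacked)
print(side_by_side)
print(inner)
```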
pivoting
the reorganization of a dataframe: the values of selected columns become the rows and columns of a new dataframe, and the remaining values are aggregated.
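one way to pivot in pandas is pivot_table. a minimal sketch with hypothetical sales data, where days become rows, items become columns and quantities are summed:

```python
import pandas as pd

df = pd.DataFrame({
    "day": ["Mon", "Mon", "Mon", "Tue"],
    "item": ["Izze", "Izze", "Chips", "Izze"],
    "qty": [1, 1, 1, 3],
})

wide = df.pivot_table(index="day", columns="item", values="qty", aggfunc="sum")

print(wide)  # combinations that never occur (Tue, Chips) show up as NaN
```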
NaN
stands for Not a Number, but means a missing value. different forms exist based on the data type, eg. NaT (Not a Time) for dates and times.
None
a value meaning that there is no object. it is usually what you get from a function that has no return value. None is not the same as 0, False, or an empty object.
fillna()
a function that can be used to replace missing values.
left_on
a parameter used when merging or joining two DataFrames. it specifies the column or columns from the left DataFrame (the one the merge is called on, or the first one passed) to match on. it is usually paired with right_on. whether all rows of the left DataFrame are kept is decided separately, by how='left'.
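a sketch of merging on columns with different names. the dataframes orders and prices and their columns are hypothetical.

```python
import pandas as pd

orders = pd.DataFrame({"order_id": [1, 2, 3], "item": ["Izze", "Chips", "Soup"]})
prices = pd.DataFrame({"item_name": ["Izze", "Chips"], "price": [3.39, 2.39]})

# default how="inner": only rows with a match in both dataframes
matched = orders.merge(prices, left_on="item", right_on="item_name")

# how="left": keep every row of orders, NaN where no price was found
all_orders = orders.merge(prices, left_on="item", right_on="item_name", how="left")

print(matched)
print(all_orders)
```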
isna() function
tells you whether values are missing (NaN) and returns True or False for each value.
notna()
a method used to identify non-missing values within a Series or DataFrame. It returns True or False indicating whether each element in the Series or DataFrame is not missing.
fillna()
a method used to fill missing (NaN) values in a DataFrame or Series with a specified value or using a specified method.
np.nan
represents a missing or undefined value in numpy.
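a minimal sketch of the missing-value tools above on a small made-up Series:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

missing = s.isna()    # True where the value is missing
present = s.notna()   # the opposite of isna()
filled = s.fillna(0)  # new Series with missing values replaced by 0

print(missing.tolist())  # [False, True, False]
print(filled.tolist())   # [1.0, 0.0, 3.0]
```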
drop() function
a method used to remove rows or columns from a DataFrame.
loc function
used for selecting data from a DataFrame and is label-based. you use the index and column names to select data, with square brackets instead of parentheses: df.loc[row_label, column_name].
iloc function
used for selecting data from a DataFrame and is integer-based. you use the integer positions to select data. in df.iloc[0, 1], 0 indicates the row and 1 indicates the column.
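a short sketch contrasting loc and iloc on the same hypothetical dataframe:

```python
import pandas as pd

df = pd.DataFrame(
    {"item": ["Izze", "Chips"], "price": [3.39, 2.39]},
    index=["a", "b"],
)

by_label = df.loc["b", "price"]  # row labelled 'b', column named 'price'
by_position = df.iloc[0, 1]      # first row, second column

print(by_label, by_position)     # 2.39 3.39
```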
duplicated()
a method used to identify duplicate rows in a DataFrame. it returns True or False for each row. by default the first occurrence is not marked as a duplicate.
is_unique
an attribute that shows whether the values of an index are unique, eg. df.index.is_unique.
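a sketch of both checks on a made-up dataframe that has a repeated row and a repeated index value:

```python
import pandas as pd

df = pd.DataFrame(
    {"item": ["Izze", "Chips", "Izze"], "qty": [1, 2, 1]},
    index=[0, 1, 1],
)

dupes = df.duplicated()           # True for rows that repeat an earlier row
index_ok = df.index.is_unique     # False, because index value 1 appears twice
deduped = df[~dupes]              # keep only the rows that are not duplicates

print(dupes.tolist())  # [False, False, True]
print(index_ok)        # False
```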
matplotlib
a library used to build figures in Python on a canvas. the canvas (figure) has subplots (axes) that can contain x- and y-axes.
set_facecolor('color')
sets the background colour of a figure (the canvas) or of an axes to the specified colour.
np.linspace function
creates an array of evenly spaced numbers over a specified range. eg. np.linspace(0, 10, 50) gives 50 values from 0 to 10, which become the points plotted on the figure.
plt.plot
a function used to create a basic line plot. eg. plt.plot(x3, np.sin(x3), color='red') draws a red sine curve which uses x3 as the x-values.
plt.axes
a function used to create or modify an axes object within a figure. an axes object represents an individual plot within a figure.
plt.xlim/plt.ylim
functions used to set the limits for the x- or y-axis of a plot, ie. the minimum and maximum values of the axis range. eg. plt.xlim(0, 10) sets the minimum limit of the x-axis to 0 and the maximum limit to 10.
set_xlabel/set_ylabel
changes the label of the axis. eg., ax4.set_xlabel(‘x label’)
set_title
changes the title of an axes (subplot). eg. ax4.set_title('graph')
legend()
adds a legend to the axes, using the label given to each plotted line. eg. ax4.legend()
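the plotting cards above can be combined into one sketch. it uses the axes methods (ax.set_xlim instead of plt.xlim); the variable names fig, ax and x, the colours and the limits are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 50)       # 50 evenly spaced values from 0 to 10

fig, ax = plt.subplots()         # one figure (canvas) holding one axes (subplot)
fig.set_facecolor("lightgrey")   # background colour of the canvas

ax.plot(x, np.sin(x), color="red", label="sin(x)")  # red sine curve
ax.set_xlim(0, 10)
ax.set_ylim(-1.5, 1.5)
ax.set_xlabel("x label")
ax.set_ylabel("y label")
ax.set_title("graph")
ax.legend()                      # uses the label passed to ax.plot
```

in a notebook the figure is displayed automatically at the end of the cell; in a script you would call plt.show().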