Pandas Flashcards
Pandas
Pandas, and in particular its Series and DataFrame objects, builds on the NumPy array structure and provides efficient access to the sorts of “data munging” tasks that occupy much of a data scientist’s time.
Pandas Series
- A dictionary is a structure that maps arbitrary keys to a set of arbitrary values, and a Series is a structure that maps typed keys to a set of typed values.
- A Series is like a single column in a table: a one-dimensional labeled array that can hold data of any type.
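A minimal sketch of the dictionary comparison above. The fruit names and prices are made-up illustration data, not from any real dataset.

```python
import pandas as pd

# Build a Series from a dict: the keys become the index labels.
# (Hypothetical example values.)
prices = pd.Series({"apple": 1.5, "pear": 2.0, "kiwi": 0.75})

# Label-based lookup works much like a dictionary.
print(prices["pear"])

# Unlike a dict, all values share one dtype.
print(prices.dtype)
```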
DataFrame
• This is a two-dimensional data structure, like a two-dimensional array or a table with rows and columns.
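As a rough illustration, a DataFrame can be built from a dict of columns. The names and ages below are invented for the example.

```python
import pandas as pd

# Each dict key becomes a column; each list holds that column's values.
# (Hypothetical example data.)
people = pd.DataFrame({"name": ["Ann", "Bob", "Cy"], "age": [31, 25, 40]})

# shape is (number of rows, number of columns).
print(people.shape)

# Selecting one column gives back a Series.
ages = people["age"]
print(type(ages))
```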
CSV
- A simple way to store big data sets is to use CSV files.
- CSV files contain plain text in a well-known format that can be read by almost any tool, including pandas.
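A small sketch of reading and writing CSV with pandas. To keep it self-contained, the CSV text is an invented in-memory string rather than a file on disk; with a real file you would pass its path to `read_csv` instead.

```python
import io
import pandas as pd

# Made-up CSV text; io.StringIO lets read_csv treat the string like a file.
csv_text = "name,score\nAnn,90\nBob,85\n"
scores = pd.read_csv(io.StringIO(csv_text))
print(scores)

# Writing back out: with no path given, to_csv returns the CSV text as a string.
round_trip = scores.to_csv(index=False)
print(round_trip)
```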
Concat
• Concat combines several DataFrames into one. By default, rows are stacked to create a longer table.
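A minimal sketch of row-wise concatenation, using two invented tables with the same columns.

```python
import pandas as pd

# Two small tables with identical columns (hypothetical data).
first = pd.DataFrame({"item": ["a", "b"], "sales": [10, 20]})
second = pd.DataFrame({"item": ["c"], "sales": [30]})

# Stack the rows; ignore_index=True renumbers the rows 0..n-1
# instead of keeping each table's original index.
combined = pd.concat([first, second], ignore_index=True)
print(combined)
```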
Merge
• Merge is a way to combine DataFrames column-wise; the result is wider, with more columns. This is similar to a join in SQL.
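A short sketch of a merge on a shared key column. The `id` column and the values are assumptions made up for illustration.

```python
import pandas as pd

# Two tables that share an "id" key column (hypothetical data).
names = pd.DataFrame({"id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})
cities = pd.DataFrame({"id": [1, 2, 3], "city": ["Oslo", "Rome", "Lima"]})

# Match rows on "id"; the result carries the columns of both tables.
merged = pd.merge(names, cities, on="id")
print(merged)
```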
Inner join
• Keeps only the rows whose key values match in both DataFrames. Non-matching rows are dropped.
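A minimal sketch of an inner join via `merge(..., how="inner")`, with invented tables whose keys only partly overlap.

```python
import pandas as pd

# Keys 2 and 3 appear in both tables; 1 and 4 appear in only one (hypothetical data).
left = pd.DataFrame({"id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})
right = pd.DataFrame({"id": [2, 3, 4], "city": ["Rome", "Lima", "Oslo"]})

# Only ids present in both tables survive.
inner = left.merge(right, on="id", how="inner")
print(inner)
```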
Outer Join
• This combines everything from both DataFrames. The result contains both matching and non-matching rows, with missing values (NaN) where one side has no match.
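A minimal sketch of an outer join via `merge(..., how="outer")`, again with invented tables whose keys only partly overlap.

```python
import pandas as pd

# Keys 2 and 3 appear in both tables; 1 and 4 appear in only one (hypothetical data).
left = pd.DataFrame({"id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})
right = pd.DataFrame({"id": [2, 3, 4], "city": ["Rome", "Lima", "Oslo"]})

# Every id from either table is kept; gaps are filled with NaN.
outer = left.merge(right, on="id", how="outer")
print(outer)
```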
Left/Right Join
• This includes every row from the left/right DataFrame and pulls in matching items from the other DataFrame.
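A minimal sketch of left and right joins via `merge(..., how="left")` and `how="right"`, using invented tables whose keys only partly overlap.

```python
import pandas as pd

# Keys 2 and 3 appear in both tables; 1 and 4 appear in only one (hypothetical data).
left = pd.DataFrame({"id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})
right = pd.DataFrame({"id": [2, 3, 4], "city": ["Rome", "Lima", "Oslo"]})

# Left join: keep every row of `left`; city is NaN where there is no match.
left_join = left.merge(right, on="id", how="left")
print(left_join)

# Right join: keep every row of `right`; name is NaN where there is no match.
right_join = left.merge(right, on="id", how="right")
print(right_join)
```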
Groupby
• Groupby splits rows into groups based on the values in one or more columns, so that a computation (such as a sum or mean) can be run on each group separately.
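A short sketch of a grouped computation. The regions and amounts are made-up values for illustration.

```python
import pandas as pd

# Hypothetical sales records, two per region.
sales = pd.DataFrame({
    "region": ["east", "west", "east", "west"],
    "amount": [10, 20, 30, 40],
})

# Group rows by region, then sum the amounts within each group.
totals = sales.groupby("region")["amount"].sum()
print(totals)
```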
Pivot
• Pivot reshapes a table from long to wide: data spread across multiple rows is collapsed into a single row, with the values now spread across columns.
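A minimal sketch of a pivot, assuming a long table with one row per city and year. The column names and values are invented for illustration.

```python
import pandas as pd

# Long format: one row per (city, year) pair (hypothetical data).
long_table = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Rome", "Rome"],
    "year": [2020, 2021, 2020, 2021],
    "value": [1, 2, 3, 4],
})

# Wide format: one row per city, one column per year.
wide_table = long_table.pivot(index="city", columns="year", values="value")
print(wide_table)
```

Note that `pivot` raises an error if an index/column pair appears more than once; `pivot_table` with an aggregation function is the usual alternative in that case.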