Pandas Flashcards
https://jakevdp.github.io/PythonDataScienceHandbook/03.00-introduction-to-pandas.html
What is Pandas?
It is a newer package built on top of NumPy, providing an efficient implementation of a DataFrame. These are multi-dimensional arrays with attached row and column labels, often allowing heterogeneous types.
What are advantages of Pandas over NumPy?
Pandas provides efficient access to these sorts of “data munging” tasks that occupy much of a data scientist’s time and for which NumPy is not sufficient.
NumPy limitations become clear when we need more flexibility (e.g., attaching labels to data, working with missing data, etc.) and when attempting operations that do not map well to element-wise broadcasting (e.g., groupings, pivots, etc.), each of which is an important piece of analyzing the less structured data available in many forms in the world around us
What are the three fundamental Pandas data structures?
Series, DataFrame and Index.
What is a Pandas Series?
A Pandas Series is a one-dimensional array of indexed data values. The Series wraps a sequence of values and indices.
How do you create a Pandas Series?
It can be created from a list or array as follows:
From list:
data = pd.Series([0.25, 0.5, 0.75, 1.0])
From array
x = np.arange(4,10)
data = pd.Series(x)
What does the values attribute of a Series return?
A NumPy array of the data values
What does the index attribute of a Series return?
An array-like object of type pd.Index containing the index of the Series.
How can you access elements of a Series?
Use the associated index e.g. for a Series x
x[0] # returns first element
x[0:3] # returns first 3 elements along with their index
What is the essential difference between a Series and a NumPy array?
The presence of the index: while the Numpy Array has an implicitly defined integer index used to access the values, the Pandas Series has an explicitly defined index associated with the values.
Does a Series index need to be an integer?
No, it can be values of any type e.g.
data = pd.Series([0.25, 0.5, 0.75, 1.0],
index=[‘a’, ‘b’, ‘c’, ‘d’])
Can a Series be thought of as a specialization of a Python dictionary?
Yes. A Series is a structure that maps typed keys to a set of typed values. This makes it more efficient than Dictionaries for certain operations.
What is a Pandas DataFrame?
A Pandas DataFrame is a two-dimensional array with flexible row indices and flexible column names.
Can think of DF as a sequence of aligned Series objects, where they share the same index.
What does the index attribute of a DataFrame return?
An Index object containing the index of the DF.
What does the columns attribute of a DataFrame return?
An Index object containing the column labels of the DF.
How can a Pandas DataFrame be created from a Series?
pd.DataFrame(population, columns=[‘population’])
Where population is a Series