Python Pandas Flashcards

Question

What is Time Series in Pandas?

Answer 1

Time series data is a sequence of observations recorded over time. It is an essential source of information for business strategy in many fields, from the conventional finance industry to the education industry. Time series forecasting is the machine learning task of modeling time series data in order to predict future values.

Answer 2

An offset specifies a set of dates that conform to a DateOffset rule (for example, business days). We can create DateOffset objects to move dates forward to the next valid date.
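
A minimal sketch of how offsets move dates, assuming a recent pandas release. The dates and variable names are illustrative and not part of the original card.

```python
import pandas as pd

ts = pd.Timestamp("2017-07-14")  # a Friday

# Shift by one calendar month
next_month = ts + pd.DateOffset(months=1)

# Shift by one business day: Friday moves to the following Monday
next_bday = ts + pd.offsets.BDay(1)

# Roll a weekend date forward to the next valid business day
saturday = pd.Timestamp("2017-07-15")
rolled = pd.offsets.BDay().rollforward(saturday)

print(next_month, next_bday, rolled)
```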

Answer 3

Time periods represent a span of time, e.g., a day, a month, a quarter, or a year. Pandas defines the Period class for them, which lets us convert a frequency to periods.
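
As a rough illustration, the sketch below builds a monthly Period and inspects the span it covers. The chosen month and the variable names are assumptions made for the example.

```python
import pandas as pd

# A period covering the whole of July 2017
p = pd.Period("2017-07", freq="M")

print(p.start_time)  # first instant of the month
print(p.end_time)    # last instant of the month

# Period arithmetic moves in units of the period's frequency
next_p = p + 1

# Convert the monthly period to the quarter that contains it
quarter = p.asfreq("Q")
print(next_p, quarter)
```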

Answer 4

The code below shows how to convert strings to dates:

```
from datetime import datetime

# Define dates as strings
dmy_str1 = 'Saturday, July 14, 2018'
dmy_str2 = '14/7/17'
dmy_str3 = '14-07-2017'

# Convert the strings to datetime objects (the last two are day-first)
dmy_dt1 = datetime.strptime(dmy_str1, '%A, %B %d, %Y')
dmy_dt2 = datetime.strptime(dmy_str2, '%d/%m/%y')
dmy_dt3 = datetime.strptime(dmy_str3, '%d-%m-%Y')

# Print the converted dates
print(dmy_dt1)
print(dmy_dt2)
print(dmy_dt3)
```

Output:

2018-07-14 00:00:00
2017-07-14 00:00:00
2017-07-14 00:00:00

Answer 5

The main task of data aggregation is to apply an aggregation function to one or more columns. Commonly used functions are:
sum: returns the sum of the values along the requested axis.
min: returns the minimum of the values along the requested axis.
max: returns the maximum of the values along the requested axis.
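
The aggregations above can be sketched as follows. The small DataFrame is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Apply several aggregations to every column at once
summary = df.agg(["sum", "min", "max"])

# Aggregate a single column
col_sum = df["a"].sum()

# axis=1 aggregates across the columns of each row instead
row_max = df.max(axis=1)

print(summary)
```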

Answer 6

Indexing in Pandas is a vital tool that selects particular rows and columns of data from a DataFrame. The index organizes the data and provides fast access to it. Indexing is also called subset selection.

Answer 7

Multiple indexing (hierarchical indexing) is essential for data analysis and manipulation, especially when working with higher-dimensional data. It enables us to store and manipulate data with an arbitrary number of dimensions in lower-dimensional data structures such as Series and DataFrame.
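
A small sketch of the idea: a one-dimensional Series carries two-dimensional data (year by quarter) through a two-level index. The level names and values are invented for the example.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("2017", "Q1"), ("2017", "Q2"), ("2018", "Q1"), ("2018", "Q2")],
    names=["year", "quarter"],
)
sales = pd.Series([100, 120, 130, 150], index=index)

# Select everything under one label of the outer level
sales_2017 = sales.loc["2017"]

# Select a single value using both levels
q2_2018 = sales.loc[("2018", "Q2")]

# Move the inner level into columns to get a 2-D DataFrame
table = sales.unstack("quarter")
print(table)
```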

Answer 8

Reindexing changes the row and column labels of a DataFrame. We can reindex one or more rows by using the reindex() method. Labels in the new index that are not present in the DataFrame are assigned NaN by default.
DataFrame.reindex(labels=None, index=None, columns=None, axis=None, method=None, copy=True, level=None, fill_value=nan, limit=None, tolerance=None)
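
A minimal sketch of reindex(), using a made-up three-row DataFrame. The label "d" does not exist in the original index, so it shows the NaN default and the fill_value alternative.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3]}, index=["a", "b", "c"])

# Reorder rows and request a label that does not exist yet
reindexed = df.reindex(["c", "a", "d"])

# Same request, but fill the missing row with 0 instead of NaN
filled = df.reindex(["c", "a", "d"], fill_value=0)

print(reindexed)
print(filled)
```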

Answer 9

We can set the index column while making a data frame. But sometimes, a data frame is made from two or more data frames, and then the index can be changed using this method.

Answer 10

The reset_index() method resets the index of a DataFrame to the default integer index. If the DataFrame has a MultiIndex, this method can remove one or more levels.
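
The behavior can be sketched as below, assuming small illustrative frames. The names "Name", "k1" and "k2" are chosen only for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {"Age": [10, 35]},
    index=pd.Index(["Jane", "Nik"], name="Name"),
)

# The old index becomes a regular column and a 0..n-1 index is created
as_column = df.reset_index()

# drop=True discards the old index instead of keeping it as a column
dropped = df.reset_index(drop=True)

# With a MultiIndex, only the requested level can be removed
multi = pd.DataFrame(
    {"v": [1, 2]},
    index=pd.MultiIndex.from_tuples([("a", 1), ("b", 2)], names=["k1", "k2"]),
)
one_level = multi.reset_index(level="k2")
print(one_level)
```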

Answer 11

In Pandas, several useful data operations are available for a DataFrame:

Row and column selection: We can select any row or column of the DataFrame by passing its name. A single selected row or column is one-dimensional and is returned as a Series.

Filter data: We can filter the data by supplying a boolean expression to the DataFrame.

Null values: A null value occurs when no data is provided for an item. Columns may contain missing values, which are usually represented as NaN.
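
The three operations can be sketched on a toy DataFrame. The column names and row labels here are assumptions for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Jane", "Nik", "Kate"], "Age": [10, 35, np.nan]},
    index=["r1", "r2", "r3"],
)

# Selecting one column or one row yields a Series
ages = df["Age"]
row = df.loc["r1"]

# Filtering with a boolean expression keeps the rows where it is True
adults = df[df["Age"] > 30]

# Missing values show up as NaN and can be detected with isna()
missing = df["Age"].isna()
print(adults)
```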

Answer 12

In Pandas, the groupby() function splits the data into groups based on some criteria, so that an operation such as an aggregation can be applied to each group. It is widely used on real-world data sets. The objects can be split along any of their axes.
Signature (as listed for older pandas releases; the squeeze parameter was removed in pandas 2.0):
DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False, **kwargs)
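
A short sketch of the split-then-aggregate pattern, with invented "team" and "score" columns.

```python
import pandas as pd

df = pd.DataFrame(
    {"team": ["a", "b", "a", "b"], "score": [1, 2, 3, 4]}
)

# Split the rows by team, then sum the scores within each group
totals = df.groupby("team")["score"].sum()

# Several aggregations per group at once
stats = df.groupby("team")["score"].agg(["min", "max"])

print(totals)
print(stats)
```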

Answer 13

Questions were sourced from: https://www.javatpoint.com/python-pandas-interview-questions

Answer 14

>>> df.loc['c': , :'z']  # rows 'c' and onwards AND columns up to 'z'
    x   y   z
c  10  11  12
d  15  16  17
e  20  21  22
>>> df.iloc[:, 3]  # all rows, but only the column at index location 3
a     3
b     8
c    13
d    18
e    23

Answer 15

Use reset_index(drop=True) to drop the index column. This resets the index to the default integer index and discards the existing index instead of inserting it as a column.

Answer 16

df = pd.DataFrame.from_dict({
    'Name': ['Jane', 'Nik', 'Kate', 'Melissa'],
    'Age': [10, 35, 34, 23]
}).set_index('Name')

Answer 17

df = df.drop(columns=['column_name'])  # drop a column by name
df = df.drop(df.columns[1], axis=1)    # drop a column by position (here, the second column)

Answer 18

df.drop_duplicates(subset=['Name']) (for only the duplicates in the Name column)
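
A minimal sketch of the call above on made-up data. By default the first occurrence of each duplicated name is kept, and keep="last" keeps the last one instead.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Jane", "Nik", "Jane"], "Age": [10, 35, 11]}
)

# Keep the first row for each distinct Name
unique_names = df.drop_duplicates(subset=["Name"])

# Keep the last row for each distinct Name
keep_last = df.drop_duplicates(subset=["Name"], keep="last")

print(unique_names)
print(keep_last)
```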

Answer 19

.loc[] (Label-Based Indexing): .loc[] allows you to access rows and columns using labels (such as column names or index labels).
Syntax: df.loc[row_label, column_label]
Examples:
To select a specific row by label: df.loc[2] (selects the row labeled 2, which is the third row under the default integer index)
To filter rows based on a condition: df.loc[df['Age'] > 30]
To update a specific cell value: df.loc[1, 'Name'] = 'Kate'
Note: .ix[] is deprecated and was removed in pandas 1.0; use .loc[] or .iloc[] instead.

Answer 20

.iloc[] (Position-Based Indexing): .iloc[] is used for integer-based indexing. It allows you to access rows and columns by their position (integer index).
Syntax: df.iloc[row_index, column_index]
Examples:
To select the second row: df.iloc[1]
To slice rows and columns: df.iloc[1:4, 0:2]
To update a specific cell value: df.iloc[0, 2] = 42
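
To make the label versus position distinction concrete, the sketch below uses a DataFrame whose index labels (10, 20, 30) deliberately differ from the row positions (0, 1, 2). The data is illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Jane", "Nik", "Kate"], "Age": [10, 35, 34]},
    index=[10, 20, 30],
)

by_label = df.loc[20]   # the row whose label is 20
by_pos = df.iloc[1]     # the second row, whatever its label

# Label slices include the end label; position slices exclude the end
label_slice = df.loc[10:20]
pos_slice = df.iloc[0:1]

# Updating single cells by label and by position
df.loc[30, "Age"] = 40
df.iloc[0, 1] = 11
print(df)
```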

Answer 21

numpy.intersect1d(ar1, ar2, assume_unique=False, return_indices=False)

Answer 22

numpy.union1d(ar1, ar2)
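
A quick sketch of numpy.intersect1d and numpy.union1d on two small, made-up arrays. Both return sorted arrays of unique values.

```python
import numpy as np

a = np.array([1, 2, 3, 2])
b = np.array([3, 2, 5])

# Values present in both arrays
common = np.intersect1d(a, b)

# Values present in either array
either = np.union1d(a, b)

print(common, either)
```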

Answer 23

numpy.percentile(a, q, axis=None, out=None, overwrite_input=False, method='linear', keepdims=False, *, interpolation=None)
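
A brief sketch of numpy.percentile with the default linear method, on a small example array chosen for illustration.

```python
import numpy as np

a = np.array([[10, 7, 4], [3, 2, 1]])

# 50th percentile (median) of the flattened array
median_all = np.percentile(a, 50)

# 50th percentile of each column
per_column = np.percentile(a, 50, axis=0)

print(median_all, per_column)
```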

Answer 24

The Generator provides access to a wide range of distributions and serves as a replacement for RandomState. The main difference between the two is that Generator relies on an additional BitGenerator to manage state and generate the random bits, which are then transformed into random values from useful distributions.
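
A minimal sketch of typical Generator usage, assuming NumPy 1.17 or later. The seed value 42 and the sizes are arbitrary choices for the example.

```python
import numpy as np

# default_rng builds a Generator on top of a BitGenerator
rng = np.random.default_rng(seed=42)

uniform = rng.random(3)                               # floats in [0, 1)
normal = rng.normal(loc=0.0, scale=1.0, size=(2, 2))  # Gaussian samples
ints = rng.integers(low=0, high=10, size=5)           # integers in [0, 10)

# The same seed reproduces the same stream
same = np.random.default_rng(seed=42).random(3)

print(type(rng.bit_generator).__name__)
```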

Answer 25

(1) np.array (2) np.zeros((x, y)) / np.ones((x, y)) (3) np.arange(x) / np.arange(x).reshape(y, z)

Answer 26

np.set_printoptions(threshold=sys.maxsize) # sys module should be imported

Answer 27

When operating with arrays of different types, the type of the resulting array corresponds to the more general or precise one (a behavior known as upcasting).
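
This can be checked directly by inspecting dtypes. The arrays below are illustrative.

```python
import numpy as np

a = np.ones(3, dtype=np.int32)   # integer array
b = np.linspace(0, np.pi, 3)     # float64 array

# int32 combined with float64 gives float64
c = a + b

# float64 combined with a complex scalar gives complex128
d = np.exp(c * 1j)

print(c.dtype, d.dtype)
```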

Answer 28

as methods, e.g. a.sum() (a is a np array)

Answer 29

"Universal functions" (ufuncs).
```
B = np.arange(3)
np.exp(B)   # --> array([1.        , 2.71828183, 7.3890561 ])
np.sqrt(B)  # --> array([0.        , 1.        , 1.41421356])
```

Answer 30

Yes, much like lists and other Python sequences.

Answer 31

dots (...) --> represent as many colons as needed to produce a complete indexing tuple. For example, if x is an array with 5 axes, then x[1, 2, ...] is equivalent to x[1, 2, :, :, :].
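
The equivalence can be verified on any 5-axis array. The shape (2, 3, 4, 5, 6) below is an arbitrary choice.

```python
import numpy as np

x = np.arange(2 * 3 * 4 * 5 * 6).reshape(2, 3, 4, 5, 6)

a = x[1, 2, ...]        # dots stand in for the three remaining axes
b = x[1, 2, :, :, :]    # the same selection written out in full

# The dots can also come first: index only the last axis
c = x[..., 3]

print(a.shape, c.shape)
```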

Answer 32

for element in b.flat: ...

Answer 33

a.shape a.ravel() a.T

Answer 34

The reshape function returns its argument with a modified shape, whereas the ndarray.resize method modifies the array itself.
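
A small sketch of the difference, using throwaway arrays. Note that reshape leaves the original array's shape untouched, while resize changes it in place and returns None.

```python
import numpy as np

a = np.arange(6)
b = a.reshape(2, 3)    # returns a reshaped array; a keeps its shape

c = np.arange(6)
result = c.resize((3, 2))   # modifies c in place and returns None

print(a.shape, b.shape, c.shape)
```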

Answer 35

hstack and vstack

Answer 36

np.hsplit(a, (3, 4)) -->
[array([[6., 7., 6.],
        [8., 5., 5.]]),
 array([[9.],
        [7.]]),
 array([[0., 5., 4., 0., 6., 8., 5., 2.],
        [1., 8., 6., 7., 1., 8., 1., 0.]])]

Answer 37

a[:, np.newaxis] --> array([[4.], [2.]])   # views the 1-D array a as a 2-D column vector

Answer 38

c = a.view()
c is a --> False
c.base is a --> True
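
A runnable sketch of the same checks, assuming a is an array that owns its data (here created with np.zeros). It also shows that a view shares data with the original, whereas a copy does not.

```python
import numpy as np

a = np.zeros((3, 4))   # a owns its data

c = a.view()           # new array object looking at the same data
c[0, 0] = 99           # the change is visible through a

d = a.copy()           # independent data
d[0, 1] = -1           # a is unaffected

print(c is a, c.base is a, a[0, 0], a[0, 1])
```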

Answer 39

np.concatenate((a, b), axis=0) --> for 2-D arrays, axis=0 joins vertically (row-wise) and axis=1 joins horizontally (column-wise); 1-D arrays (like a Series) have only axis 0
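
The axis behavior can be sketched with two small 2-D arrays (values chosen for illustration). For 2-D inputs the results match vstack and hstack.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

rows = np.concatenate((a, b), axis=0)   # stack b below a
cols = np.concatenate((a, b), axis=1)   # place b to the right of a

# 1-D arrays have a single axis, so only axis=0 applies
flat = np.concatenate((np.array([1, 2]), np.array([3])))

print(rows.shape, cols.shape, flat)
```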

Answer 40

* Singular Value Decomposition (SVD) is a fundamental matrix factorization technique used in linear algebra and numerical computing.
* It decomposes a given matrix A into three matrices: U, Σ, and V^H (the conjugate transpose of V).
* For a 2D matrix A, the SVD factorization is expressed as A = U Σ V^H, where:
  * U is a unitary matrix (with orthonormal columns).
  * Σ (Sigma) is a diagonal matrix containing the singular values of A.
  * V^H is the conjugate transpose of another unitary matrix V.
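
A minimal sketch using numpy.linalg.svd on an arbitrary 3x2 real matrix. NumPy returns the singular values as a 1-D array s rather than the diagonal matrix Σ, so the example rebuilds Σ with np.diag before checking the factorization.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Reduced SVD: U is 3x2, s holds 2 singular values, Vh is 2x2
U, s, Vh = np.linalg.svd(A, full_matrices=False)

# Rebuild A from its factors: A = U @ Sigma @ V^H
reconstructed = U @ np.diag(s) @ Vh

print(s)
print(np.allclose(reconstructed, A))
```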

(69 cards)