# Chapter 1: Pandas Foundations Flashcards

Question

int

Answer 1

The NumPy integer type, which does not support missing values

Answer 2

pandas nullable integer type

Answer 3

The NumPy type for storing strings (and mixed types)

Answer 4

pandas categorical type, which does support missing values

Answer 5

The NumPy Boolean type, which does not support missing values (None becomes False, np.nan becomes True)

Answer 6

pandas nullable Boolean type

Answer 7

The NumPy date type, which does support missing values (NaT)
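
The dtype cards above can be checked with a small sketch. The values are made up for illustration, and the exact datetime resolution shown may differ between pandas versions.

```python
import numpy as np
import pandas as pd

# NumPy-backed integers: fine as long as nothing is missing.
ints = pd.Series([1, 2, 3])

# A missing value forces a NumPy-backed Series up to float64.
floats = pd.Series([1, 2, np.nan])

# The pandas nullable integer type keeps integers and stores the gap as pd.NA.
nullable = pd.Series([1, 2, None], dtype="Int64")

# The datetime type represents a missing date as NaT.
dates = pd.Series(pd.to_datetime(["2020-01-01", None]))

print(ints.dtype, floats.dtype, nullable.dtype, dates.dtype)
```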

Answer 8

This returns a Series with the data type of each column.

Answer 9

Return the dtype object of the underlying data.

Answer 10

Returns the number of columns of each data type.

Answer 11

This method prints information about a DataFrame including the index dtype and columns, non-null values and memory usage.
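
A short sketch of the four inspection tools described in Answers 8 to 11, using a made-up DataFrame. Counting dtypes with `.dtypes.value_counts()` is one way to do it in current pandas; the card may have referred to a different call.

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({"title": ["A", "B"], "year": [1999, 2005], "score": [7.1, 8.3]})

col_types = df.dtypes                    # Series: one dtype per column
year_type = df["year"].dtype             # dtype object of a single Series
type_counts = df.dtypes.value_counts()   # number of columns per dtype

df.info()  # prints index dtype, columns, non-null counts and memory usage
```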

Answer 12

In pandas, the value_counts() method counts the frequency of unique values in a Series, sorted in descending order of frequency. You can also use value_counts() with a DataFrame to count the frequency of unique values in a specific column or across multiple columns.

Answer 13

To use value_counts() with a Pandas DataFrame, you first need to select the column or columns you want to count the frequency of unique values for. You can do this using bracket notation or dot notation, depending on the column name.
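
As a rough illustration, with made-up column names and values, both selection styles give the same counts. Calling `value_counts()` on a multi-column selection assumes pandas 1.1 or later.

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({
    "color": ["Color", "Color", "BW"],
    "rating": ["PG", "R", "PG"],
})

by_bracket = df["color"].value_counts()   # bracket notation
by_dot = df.color.value_counts()          # dot notation

# Counting unique combinations across several columns.
pairs = df[["color", "rating"]].value_counts()

print(by_bracket)
print(pairs)
```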

Answer 14

Returns a Series (that has the same index as the DataFrame).

Answer 15

DataFrame.column_name vs DataFrame['column_name']

Answer 16

- .loc is used for label-based indexing, which means that you use column and row labels to select data. For example, if you have a DataFrame with a column labeled 'Name' and a row labeled 'A', you can select the data at that intersection using .loc['A', 'Name'].
- .iloc is used for integer-based indexing, which means that you use integer positions to select data. For example, you can select the first row and second column of a DataFrame using .iloc[0, 1].
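
The two indexers can be compared on a tiny DataFrame. The row labels and values here are made up to match the example in the card.

```python
import pandas as pd

# Rows are labeled "A" and "B"; columns are "Name" and "Age".
df = pd.DataFrame({"Name": ["Ann", "Bob"], "Age": [31, 45]}, index=["A", "B"])

by_label = df.loc["A", "Name"]    # label-based: row "A", column "Name"
by_position = df.iloc[0, 1]       # position-based: first row, second column

print(by_label, by_position)
```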

Answer 17

Return the number of elements in the underlying data.

Answer 18

How many Series attributes and methods are there?

Answer 19

Which attributes and methods do Series and DataFrames have in common?

Answer 20

- Get 5 random items from the director Series.
- The random_state parameter is set to 42, which means that the same 5 elements will be selected from the Series each time the code is run.
- random_state is a seed value for the random number generator used to sample the items (default is None).
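
A minimal sketch of reproducible sampling. The `director` Series below holds placeholder names, not real data.

```python
import pandas as pd

# Placeholder values standing in for the director column.
director = pd.Series([f"director_{i}" for i in range(10)])

# The same seed yields the same five rows on every run.
first = director.sample(n=5, random_state=42)
second = director.sample(n=5, random_state=42)

print(first.equals(second))
```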

Answer 21

which of the methods will be the most useful.

Answer 22

- size
- shape
- len(series)

Answer 23

Return unique values of Series object.

Answer 24

Return number of non-NA/null observations in the Series.
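
A quick sketch, on made-up values, of how the length-style attributes differ from `.count()` when a value is missing:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

n_size = s.size        # number of elements, missing values included
n_shape = s.shape      # one-element tuple for a Series
n_len = len(s)         # same as size for a Series
n_count = s.count()    # only non-missing observations
uniques = s.unique()   # NumPy array of distinct values (NaN is kept)

print(n_size, n_shape, n_len, n_count, uniques)
```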

Answer 25

.min, .max, .mean, .median, and .std

Answer 26

- Descriptive statistics include those that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding NaN values.
- Analyzes both numeric and object series, as well as DataFrame column sets of mixed data types.

Answer 27

- Return value at the given quantile.
- If you pass in a scalar, you will get scalar output, but if you pass in a list, the output is a pandas Series.
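
The scalar-versus-list behavior can be sketched on a small made-up Series, using the default linear interpolation:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

median = s.quantile(0.5)                  # scalar in, scalar out
several = s.quantile([0.25, 0.5, 0.75])   # list in, Series out (indexed by quantile)

print(median)
print(several)
```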

Answer 28

Return a boolean same-sized object indicating if the values are NA. NA values, such as None or numpy.nan, get mapped to True values. Everything else gets mapped to False values. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True).

Answer 29

Fill NA/NaN values using the specified method.

Answer 30

Return a new Series with missing values removed.
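
The three missing-value methods in Answers 28 to 30 can be compared side by side. This is a sketch on made-up values:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, None])

mask = s.isna()        # True where the value is missing
filled = s.fillna(0)   # replace missing values with 0
dropped = s.dropna()   # new Series without the missing rows

# An empty string is a real value, not a missing one.
text_mask = pd.Series(["", None]).isna()

print(mask.tolist(), filled.tolist(), dropped.tolist(), text_mask.tolist())
```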

Answer 31

Passing normalize=True as an argument to value_counts() returns the relative frequencies or proportions of each unique value instead of their counts. The resulting Series contains the proportion of occurrences of each unique value as a fraction between 0 and 1; multiply by 100 to get percentages.

Answer 32

Relative frequencies, also known as proportions or percentages, are a way of expressing the frequency of an event or value in relation to the total number of events or values in a sample.

Answer 33

In statistics, a relative frequency is calculated as the number of times an event or value occurs divided by the total number of events or values in the sample. The resulting proportion represents the fraction or percentage of the sample that the event or value represents. For example, if we have a sample of 100 people and 20 of them are male, the relative frequency of males in the sample would be 20/100 = 0.2, or 20%. This means that males make up 20% of the sample.
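
The 20-out-of-100 example can be reproduced with `value_counts(normalize=True)`. The label for the other 80 people is an assumption made only to fill out the sample.

```python
import pandas as pd

# 100 people, 20 of them male; "other" is a placeholder label for the rest.
sample = pd.Series(["male"] * 20 + ["other"] * 80)

counts = sample.value_counts()                   # raw counts
relative = sample.value_counts(normalize=True)   # counts divided by the total

print(counts["male"], relative["male"])
```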

Answer 34

Relative frequencies are useful for comparing the occurrence of different events or values in a sample or population, and for identifying patterns and trends in data. They can also be used to make predictions about future occurrences based on past data.

Answer 35

Series.hasnans

Answer 36

Return a boolean same-sized object indicating if the values are not NA. Non-missing values get mapped to True. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True). NA values, such as None or numpy.nan, get mapped to False values.
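
A small sketch, on made-up values, showing that `.hasnans` is a single flag for the whole Series while `.notna()` gives one Boolean per element:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.inf])

any_missing = s.hasnans   # single Boolean for the whole Series
present = s.notna()       # element-wise; inf counts as a present value

print(any_missing, present.tolist())
```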

Answer 37

Adds 1 to each element of the series.

Answer 38

Multiplies each element of the Series by 2.5.

Answer 39

The // operator performs integer (floor) division, which returns the largest integer that is less than or equal to the result of the division. For positive results this is the same as dropping the decimal portion; for negative results it rounds down, away from zero (for example, -7 // 2 is -4).

Answer 40

The percent sign (%) is the modulus operator, which returns the remainder after a division
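
Floor division and modulus can be checked together, including a negative value where flooring and truncating differ. The values are made up for illustration.

```python
import pandas as pd

s = pd.Series([7, -7])

quotient = s // 2    # floors toward negative infinity
remainder = s % 2    # remainder takes the sign of the divisor

# The two results always recombine into the original values.
rebuilt = quotient * 2 + remainder

print(quotient.tolist(), remainder.tolist())
```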

Answer 41

Each comparison operator turns each value in the Series to True or False based on the outcome of the condition. The result is a Boolean Series with the same index.

Answer 42

pandas addition method. The same as imdb_score + 1

Answer 43

pandas > method. The same as imdb_score > 7

Answer 44

- Using the method rather than the operator can be useful when we chain methods together.
- Methods can take parameters that alter their default behavior, which operators cannot.

Answer 45

The .sub method allows you to specify a fill_value parameter to use in place of missing values.
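
A sketch of the difference `fill_value` makes, using two made-up Series. The `-` operator propagates the missing value, while `.sub` substitutes the fill value before subtracting.

```python
import numpy as np
import pandas as pd

a = pd.Series([5.0, np.nan, 3.0])
b = pd.Series([1.0, 1.0, 1.0])

with_operator = a - b                  # NaN stays NaN
with_method = a.sub(b, fill_value=0)   # NaN in `a` is treated as 0 first

print(with_operator.tolist(), with_method.tolist())
```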

Answer 46

+,-,*,/,//,%,**

Answer 47

.add, .sub, .mul, .div, .floordiv, .mod, .pow

Answer 48

<,>,<=,>=,==,!=

Answer 49

.lt, .gt, .le, .ge, .eq, .ne
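
The operator and method forms listed in Answers 46 to 49 can be checked for equivalence. The `imdb_score` values below are made up.

```python
import pandas as pd

imdb_score = pd.Series([6.0, 7.5, 8.1])

# Arithmetic: operator and method give the same Series.
same_add = (imdb_score + 1).equals(imdb_score.add(1))
same_floordiv = (imdb_score // 2).equals(imdb_score.floordiv(2))

# Comparison: operator and method give the same Boolean Series.
same_gt = (imdb_score > 7).equals(imdb_score.gt(7))
same_ne = (imdb_score != 7.5).equals(imdb_score.ne(7.5))

print(same_add, same_floordiv, same_gt, same_ne)
```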

Answer 50

whenever the multiplication operator is used.

Answer 51

dunder methods

Answer 52

No. The operator is just syntactic sugar for the special method.

Answer 53

Yes, the mul() method has additional parameters.
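
To illustrate the dunder-method cards, the sketch below calls `__mul__` directly. This is shown only for demonstration; in normal code the operator or `.mul` is used instead.

```python
import pandas as pd

s = pd.Series([1, 2, 3])

via_operator = s * 2          # Python translates this into s.__mul__(2)
via_dunder = s.__mul__(2)     # calling the special method by hand
via_method = s.mul(2)         # the pandas method form

print(via_operator.equals(via_dunder), via_operator.equals(via_method))
```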

Answer 54

It is sequential invocation of methods using attribute access.

Answer 55

Because in Python, every variable points to an object, and many attributes and methods return new objects.

Answer 56

Because int does not support missing values.

Answer 57

- We fill the missing values with zeros.
- We use astype('Int64') for the conversion.

Answer 58

You use parentheses, like:

    (
        fb_likes.fillna(0)
        .astype('Int64')
        .head()
    )
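
A runnable sketch of why the fill step comes first, assuming a small made-up `fb_likes` Series that contains one missing value:

```python
import numpy as np
import pandas as pd

# Made-up like counts; the NaN marks a missing value.
fb_likes = pd.Series([1000.0, np.nan, 250.0])

# A direct cast to the NumPy integer type fails because NaN has no int form.
try:
    fb_likes.astype("int64")
    cast_failed = False
except ValueError:
    cast_failed = True

# Filling first, then casting to the nullable type, works as one chain.
cleaned = (
    fb_likes.fillna(0)
    .astype("Int64")
    .head()
)

print(cast_failed, cleaned.tolist())
```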

Answer 59

The .pipe method on a Series needs to be passed a function that accepts a Series as input and can return anything
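
A minimal sketch of `.pipe`, using a hypothetical helper named `value_range` that is not part of pandas:

```python
import pandas as pd

def value_range(ser):
    """Hypothetical helper: takes a Series and returns a scalar."""
    return ser.max() - ser.min()

s = pd.Series([3, 10, 7])

# .pipe passes the Series as the first argument of the function.
result = s.pipe(value_range)

print(result)
```

Because the function receives the whole Series, `.pipe` lets a custom step sit inside a method chain without breaking it.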

Answer 60

Using dictionaries:

    col_map = {
        "director_name": "director",
        "num_critic_for_reviews": "critic_reviews",
    }
    movies.rename(columns=col_map)

(86 cards)