Data Input and Validation Flashcards

Question

What does sort_index() do?

Answer 1

sort_index() sorts the items of a DataFrame or Series by index label. The advantage is that, with a particularly large data set, a sorted index reduces the time needed to access a subset of the data. You can sort by label along either axis:
DataFrame.sort_index(axis=0, level=None, ascending=True, inplace=False, ..)
Older references also list by=None in this signature; current pandas does not accept it, and sorting by column values is done with sort_values() instead.
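As a rough illustration of sorting by index label, here is a minimal sketch. The DataFrame, its labels, and its values are made up for this example and are not part of the original cards.

```python
import pandas as pd

# Hypothetical toy data: three rows whose labels are deliberately out of order.
df = pd.DataFrame({"score": [3, 1, 2]}, index=["c", "a", "b"])

# sort_index() returns a new DataFrame ordered by row label (axis=0 by default).
sorted_df = df.sort_index()
print(sorted_df.index.tolist())   # ['a', 'b', 'c']

# ascending=False reverses the order. The original df is left unchanged
# because inplace defaults to False.
desc_df = df.sort_index(ascending=False)
print(desc_df.index.tolist())     # ['c', 'b', 'a']
```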

Answer 2

loc[] is a label-based indexer, which means you select rows and columns by their labels. Note that loc[] takes square brackets, not parentheses. loc[] raises a KeyError when a requested label is not found.
DataFrame.loc[]
DataFrame['Series'].loc[]

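A minimal sketch of label-based selection with loc[], assuming a small made-up DataFrame (the labels "a", "b", "c" and the column name "value" are hypothetical):

```python
import pandas as pd

# Hypothetical toy data with string row labels.
df = pd.DataFrame({"value": [10, 20, 30]}, index=["a", "b", "c"])

row = df.loc["b"]                  # one row, returned as a Series
subset = df.loc["a":"b"]           # label slices include BOTH endpoints
col_value = df["value"].loc["c"]   # loc[] also works on a single Series

# A label that does not exist raises KeyError.
try:
    df.loc["zzz"]
    missing_raised = False
except KeyError:
    missing_raised = True

print(row["value"], len(subset), col_value, missing_raised)
```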
Answer 3

With iloc, we select by integer position. iloc is primarily integer-position based. One of its advantages is that it supports traditional Python slicing, where the stop position is excluded.

Answer 4

One way to select with iloc is to pass a list of integer positions. For example, I might want the rows at integer positions 1542, 2390, 6000, and 15000. This returns the rows at those positions.
df.iloc[[1, 4, 5, 10]]

Answer 5

df.iloc[1:4]
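To make the two iloc forms concrete (a slice of positions and a list of positions), here is a small sketch. The DataFrame is hypothetical; letter labels are used so that positions and labels are easy to tell apart.

```python
import pandas as pd

# Hypothetical data: 12 rows labelled 'a'..'l', so position 0 is 'a', 1 is 'b', ...
df = pd.DataFrame({"value": range(12)}, index=list("abcdefghijkl"))

# Slice by position: start is included, stop is excluded (positions 1, 2, 3).
sliced = df.iloc[1:4]
print(sliced.index.tolist())   # ['b', 'c', 'd']

# A list of positions returns exactly those rows, in the order given.
picked = df.iloc[[1, 4, 5, 10]]
print(picked.index.tolist())   # ['b', 'e', 'f', 'k']
```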

Answer 6

Groupby does three things: it splits a DataFrame into groups based on some criteria, it applies a function to each group independently, and it combines the results into a DataFrame. The GroupBy object is not itself a DataFrame; it is a dict-like collection of DataFrames, one per group.
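The split, apply, and combine steps can be sketched as follows. The small `oo` DataFrame here is a made-up stand-in; only the column name 'Edition' comes from the cards, and the rows are invented.

```python
import pandas as pd

# Hypothetical stand-in for the oo DataFrame used elsewhere in these cards.
oo = pd.DataFrame({
    "Edition": [1896, 1896, 1900, 1900, 1900],
    "Medal": ["Gold", "Silver", "Gold", "Gold", "Bronze"],
})

# Split: groupby() returns a GroupBy object, not a DataFrame.
grouped = oo.groupby("Edition")
print(type(grouped).__name__)   # DataFrameGroupBy

# Each group is itself a DataFrame holding the rows for one key.
for edition, group in grouped:
    print(edition, len(group))

# Apply + combine: count() runs per group and the results are
# gathered into a single Series indexed by the group key.
counts = grouped["Medal"].count()
print(counts)
```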

Answer 7

pandas.core.groupby.generic.DataFrameGroupBy

Answer 8

for group_key, group_value in oo.groupby('Edition'):
    print(group_key)
    print(group_value)

Answer 9

```
GroupBy.size()
GroupBy.count()
GroupBy.first() / GroupBy.last()
GroupBy.head() / GroupBy.tail()
GroupBy.mean()
GroupBy.max() / GroupBy.min()
```

Answer 10

Instructions for aggregation are provided as a Python dictionary or a list. The dictionary keys specify which Series (columns) of the DataFrame to operate on, and the dictionary values specify the function or functions to run. You can also include custom functions in the list of aggregations; each one is passed the values of the column for each group. Groupby is a very useful pandas feature, and it is worth making sure you understand how to use it.
DataFrame.groupby(..).agg({..: [...]})
DataFrame.groupby(..).agg([...])
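A minimal sketch of both forms of agg(), assuming a small invented DataFrame. The column names 'NOC' and 'Edition' come from the cards; the rows and the custom function `span` are hypothetical.

```python
import pandas as pd

# Hypothetical data; only the column names come from the cards.
oo = pd.DataFrame({
    "NOC": ["USA", "USA", "GBR", "GBR", "GBR"],
    "Edition": [1984, 1988, 1896, 1900, 1908],
})

# Dictionary form: keys pick the column, values list the functions to run.
# The result has two-level columns such as ('Edition', 'min').
by_dict = oo.groupby("NOC").agg({"Edition": ["min", "max", "count"]})
print(by_dict)

# List form with a custom function: it receives one group's values at a time.
def span(values):
    """Hypothetical helper: range between the latest and earliest value."""
    return values.max() - values.min()

by_list = oo.groupby("NOC")["Edition"].agg(["min", "max", span])
print(by_list)
```

In the list form, the result column for a custom function is named after the function, so here the third column is called `span`.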

Answer 11

oo.loc[oo['Athlete'] == 'LEWIS, Carl'].groupby('Athlete').agg({'Edition': ['min', 'max', 'count']})
oo.groupby(['NOC']).agg({'Edition': ['min', 'max', 'size']})

Answer 12

The stack and unstack functions are very helpful, especially in conjunction with groupby. stack moves the innermost column level into the row index, and unstack does the reverse, moving the innermost row-index level into the columns. Both help you reshape the DataFrame.

Answer 13

stack returns a DataFrame or a Series. The result has a new innermost level of row labels, and the new inner levels are sorted.
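As a rough sketch of the reshaping, assuming a tiny made-up DataFrame (the labels "x", "y" and the columns "gold", "silver" are hypothetical):

```python
import pandas as pd

# Hypothetical 2x2 table: rows 'x' and 'y', columns 'gold' and 'silver'.
df = pd.DataFrame({"gold": [1, 2], "silver": [3, 4]}, index=["x", "y"])

# stack() moves the column labels into a new innermost row-index level.
# With single-level columns the result is a Series with a two-level index.
stacked = df.stack()
print(stacked)

# unstack() reverses this: the innermost row level becomes columns again.
restored = stacked.unstack()
print(restored)
```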

Answer 14

```
# General slice syntax: x[start:stop:step]
x = np.arange(10)

# First five elements
print(x[:5])
# Elements from index 5 onward
print(x[5:])
# Middle subarray
print(x[4:7])

# Multidimensional slices work in the same way, with multiple slices
# separated by commas. For example (x2 is a two-dimensional array):
x2[:, 0]     # first column of x2
x2[:3, ::2]  # all rows, every other column
```

Answer 15

```
print(x2)
[[3 5 2 4]
 [7 6 8 8]
 [1 6 7 7]]

x2_sub = x2[:, 0]   # a view of the first column, not a copy
x2_sub
array([3, 7, 1])

x2_sub[:] = 0       # writing through the view also changes x2
print(x2)
[[0 5 2 4]
 [0 6 8 8]
 [0 6 7 7]]
```

Answer 16

```
x2_sub_copy = x2[:2, :2].copy()
print(x2_sub_copy)
[[99  5]
 [ 7  6]]

x2_sub_copy[0, 0] = 42   # changes only the copy, not x2
print(x2_sub_copy)
[[42  5]
 [ 7  6]]
```

Answer 17

```
x = np.arange(10)
y = np.arange(10)
np.concatenate([x, y])
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

grid = np.arange(10).reshape(2, 5)
np.concatenate([grid, grid])  # concatenate along the first axis
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9],
       [0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
```

Answer 18

grid = [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
np.concatenate([grid, grid], axis=1)
array([[0, 1, 2, 3, 4, 0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9, 5, 6, 7, 8, 9]])

Answer 19

Concatenation, or joining of two arrays in NumPy, is primarily accomplished through the routines np.concatenate, np.vstack, and np.hstack. np.concatenate takes a tuple or list of arrays as its first argument, as we can see here:
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
np.concatenate([x, y])
array([1, 2, 3, 3, 2, 1])

Answer 20

z = [99, 99, 99]
print(np.concatenate([x, y, z]))
[ 1  2  3  3  2  1 99 99 99]

Answer 21

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

Answer 22

```
array([[1, 2, 3],
       [4, 5, 6],
       [1, 2, 3],
       [4, 5, 6]])

# concatenate along the second axis (zero-indexed)
np.concatenate([grid, grid], axis=1)
array([[1, 2, 3, 1, 2, 3],
       [4, 5, 6, 4, 5, 6]])
```

Answer 23

```
x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])

# vertically stack the arrays
np.vstack([x, grid])
array([[1, 2, 3],
       [9, 8, 7],
       [6, 5, 4]])

# horizontally stack the arrays
y = np.array([[99],
              [99]])
np.hstack([grid, y])
array([[ 9,  8,  7, 99],
       [ 6,  5,  4, 99]])
```

Answer 24

The opposite of concatenation is splitting, which is implemented by the functions np.split, np.hsplit, and np.vsplit. For each of these, we can pass a list of indices giving the split points:
x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(x, [3, 5])
print(x1, x2, x3)
[1 2 3] [99 99] [3 2 1]
Notice that N split points lead to N + 1 subarrays. The related functions np.hsplit and np.vsplit work similarly.

Answer 25

```
grid = np.arange(16).reshape((4, 4))
grid
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

upper, lower = np.vsplit(grid, [2])
print(upper)
print(lower)
[[0 1 2 3]
 [4 5 6 7]]
[[ 8  9 10 11]
 [12 13 14 15]]
```
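For comparison with the vertical split, here is a short sketch of np.hsplit on a 4x4 grid built with np.arange(16). The variable names `left` and `right` are chosen for this example only.

```python
import numpy as np

# A 4x4 grid holding the integers 0..15.
grid = np.arange(16).reshape((4, 4))

# hsplit cuts along columns: one split point at column 2 gives two
# subarrays, each with all four rows and two columns.
left, right = np.hsplit(grid, [2])
print(left)
print(right)
```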

Answer 26

```
Operator  Equivalent ufunc  Description
+         np.add            Addition (e.g., 1 + 1 = 2)
-         np.subtract       Subtraction (e.g., 3 - 2 = 1)
-         np.negative       Unary negation (e.g., -2)
*         np.multiply       Multiplication (e.g., 2 * 3 = 6)
/         np.divide         Division (e.g., 3 / 2 = 1.5)
//        np.floor_divide   Floor division (e.g., 3 // 2 = 1)
**        np.power          Exponentiation (e.g., 2 ** 3 = 8)
%         np.mod            Modulus/remainder (e.g., 9 % 4 = 1)
```
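A quick way to check the operator and ufunc pairings listed above is to compare both forms on a small array. This is only an illustrative sketch; the dictionary `pairs` is a hypothetical helper for this example.

```python
import numpy as np

x = np.arange(4)   # array([0, 1, 2, 3])

# Each entry holds (operator result, ufunc result) for one pairing.
pairs = {
    "add": (x + 5, np.add(x, 5)),
    "subtract": (x - 5, np.subtract(x, 5)),
    "negative": (-x, np.negative(x)),
    "multiply": (x * 2, np.multiply(x, 2)),
    "divide": (x / 2, np.divide(x, 2)),
    "floor_divide": (x // 2, np.floor_divide(x, 2)),
    "power": (x ** 2, np.power(x, 2)),
    "mod": (x % 2, np.mod(x, 2)),
}

# Every operator produces the same array as its ufunc.
all_match = all(np.array_equal(a, b) for a, b in pairs.values())
print(all_match)   # True
```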

Answer 27

x = np.array([-2, -1, 0, 1, 2])
np.absolute(x)
array([2, 1, 0, 1, 2])

Answer 28

```
theta = np.linspace(0, np.pi, 3)
print("theta = ", theta)
print("sin(theta) = ", np.sin(theta))
print("cos(theta) = ", np.cos(theta))
print("tan(theta) = ", np.tan(theta))
```
```
theta = [ 0. 1.57079633 3.14159265]
sin(theta) = [ 0.00000000e+00 1.00000000e+00 1.22464680e-16]
cos(theta) = [ 1.00000000e+00 6.12323400e-17 -1.00000000e+00]
tan(theta) = [ 0.00000000e+00 1.63312394e+16 -1.22464680e-16]
```

Answer 29

```
from scipy import special

# Gamma functions (generalized factorials) and related functions
x = [1, 5, 10]
print("gamma(x) =", special.gamma(x))
print("ln|gamma(x)| =", special.gammaln(x))
print("beta(x, 2) =", special.beta(x, 2))
```

Answer 30

x = np.arange(5)
y = np.empty(5)
np.multiply(x, 10, out=y)
print(y)
