Lesson 7: NumPy and Pandas Analysis Flashcards

Question 1

Q

Create an array of 10 zeros and ensure they are integers.

Answer

A

np.zeros(10, dtype='int')

Question 2

Q

Create a matrix with a predefined value of 5.45 with 3 rows and 5 cols.

Answer

A

np.full((3,5),5.45)

Question 3

Q

Create an array of 5 evenly spaced numbers between 0 and 2.

Answer

A

np.linspace(0, 2, 5)
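
As a quick check of the answer above, the sketch below prints the five values and contrasts them with np.arange, which takes a step size rather than a count. The variable names are illustrative only.

```python
import numpy as np

# Five evenly spaced values from 0 to 2; both endpoints are included by default.
evenly_spaced = np.linspace(0, 2, 5)
print(evenly_spaced)  # values: 0, 0.5, 1, 1.5, 2

# np.arange takes a step instead of a count and excludes the stop value.
stepped = np.arange(0, 2, 0.5)
print(stepped)  # values: 0, 0.5, 1, 1.5
```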

Question 4

Q

Create a 3x3 array of random numbers drawn from a normal distribution with mean 0 and standard deviation 1.

Answer

A

np.random.normal(0, 1, (3,3))

Question 5

Q

Combine the following arrays:

x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
z = [21, 21, 21]

Answer

A

x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
z = [21,21,21]
np.concatenate([x, y,z])

Question 6

Q

Concatenate the grid array with itself, where grid = np.array([[1,2,3],[4,5,6]]).

Answer

A

grid = np.array([[1,2,3],[4,5,6]])
np.concatenate([grid,grid])
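
For a 2D array the result depends on the axis. A minimal sketch comparing the default (axis=0, stacking rows) with axis=1 (stacking columns); the names stacked_rows and stacked_cols are illustrative.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6]])

# axis=0 is the default: the second copy is appended as extra rows.
stacked_rows = np.concatenate([grid, grid])
# axis=1 appends the second copy as extra columns instead.
stacked_cols = np.concatenate([grid, grid], axis=1)

print(stacked_rows.shape)  # (4, 3)
print(stacked_cols.shape)  # (2, 6)
```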

Question 7

Q

Create a dataframe from a dictionary with the columns Fruit and Items. The Fruit values are ['Peach', 'Apple', 'Pear', 'Plum', 'Kiwi'] and the Items values are [121, 40, 100, 130, 11].

Answer

A

data = pd.DataFrame({'Fruit': ['Peach', 'Apple', 'Pear', 'Plum', 'Kiwi'],
                     'Items': [121, 40, 100, 130, 11]})

Question 8

Q

How do you get complete information on the dataset?

Answer

A

data.info()

Question 9

Q

Make a dataframe with the columns group and kg. group values: 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'; kg values: 4, 3, 12, 6, 7.5, 8, 3, 5, 6

Answer

A

data = pd.DataFrame({'group': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'], 'kg': [4, 3, 12, 6, 7.5, 8, 3, 5, 6]})

Question 10

Q

Sort the values in the data df by kg in ascending order, changing the original df.

data = pd.DataFrame({'group': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'], 'kg': [4, 3, 12, 6, 7.5, 8, 3, 5, 6]})

Answer

A

data.sort_values(by=['kg'], ascending=True, inplace=True)

Question 11

Q

Sort data by multiple columns: group in ascending order and kg in descending order. Make sure you don't modify the original dataset.

Answer

A

data.sort_values(by=['group', 'kg'], ascending=[True, False], inplace=False)
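
To see what the multi-column sort does, here is a small sketch using the group/kg dataframe from Question 9. The name sorted_data is illustrative; inplace=False is the default, so it is omitted.

```python
import pandas as pd

data = pd.DataFrame({'group': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
                     'kg': [4, 3, 12, 6, 7.5, 8, 3, 5, 6]})

# group ascending, then kg descending within each group.
# sort_values returns a new dataframe, so data itself is left untouched.
sorted_data = data.sort_values(by=['group', 'kg'], ascending=[True, False])

print(sorted_data)
print(data['kg'].tolist())  # still in the original order
```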

Question 12

Q

data = pd.DataFrame({'names': ['Mila']*3 + ['Igor']*4, 'Age': [3, 2, 1, 3, 3, 4, 4]})

Remove the duplicate rows.

Answer

A

data.drop_duplicates()

Question 13

Q

Remove duplicate values from the names column.

data = pd.DataFrame({'names': ['Mila']*3 + ['Igor']*4, 'Age': [3, 2, 1, 3, 3, 4, 4]})

Answer

A

data.drop_duplicates(subset='names')
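
The difference between dropping full-row duplicates and dropping duplicates in one column can be sketched as follows, using the dataframe from the question. The names no_dupes and by_name are illustrative.

```python
import pandas as pd

data = pd.DataFrame({'names': ['Mila'] * 3 + ['Igor'] * 4,
                     'Age': [3, 2, 1, 3, 3, 4, 4]})

# Without subset, a row is a duplicate only if every column matches an earlier row.
no_dupes = data.drop_duplicates()
# With subset='names', only the first row for each name is kept.
by_name = data.drop_duplicates(subset='names')

print(len(no_dupes))            # 5
print(by_name.index.tolist())   # [0, 3]
```

Neither call changes data; both return a new dataframe.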

Question 14

Q

For the farm shop df (data), create a new column animal2 that maps each value in the food column to its animal using meat_to_animal. Ensure the food values are all lowercase before mapping.

Answer

A

data['animal2'] = data['food'].map(str.lower).map(meat_to_animal)
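
The card does not show the farm shop dataframe or the meat_to_animal lookup, so the sketch below invents both purely for illustration. It assumes meat_to_animal is a dictionary with lowercase keys.

```python
import pandas as pd

# Hypothetical farm shop data; the real columns and values are not shown in the card.
data = pd.DataFrame({'food': ['Bacon', 'pulled pork', 'BEEF', 'Lamb chop'],
                     'kg': [4, 3, 12, 6]})

# Hypothetical lookup table with lowercase keys.
meat_to_animal = {'bacon': 'pig', 'pulled pork': 'pig',
                  'beef': 'cow', 'lamb chop': 'sheep'}

# Lowercase first so 'Bacon' and 'BEEF' match the lowercase keys,
# then translate each food to its animal.
data['animal2'] = data['food'].map(str.lower).map(meat_to_animal)
print(data)
```

Any food missing from the dictionary would come back as NaN in the new column.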

Question 15

Q

Remove the animal2 column (a single series) from the dataset.

Answer

A

data.drop('animal2', axis='columns', inplace=True)

Question 16

Q

Make a new series (column) using assign.

Answer

A

data.assign(new_variable=data['kg']*10)
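
One detail worth checking: assign returns a new dataframe rather than changing data. A minimal sketch, assuming the group/kg dataframe from Question 9; the name with_new is illustrative.

```python
import pandas as pd

data = pd.DataFrame({'group': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
                     'kg': [4, 3, 12, 6, 7.5, 8, 3, 5, 6]})

# assign returns a copy with the extra column; data itself is unchanged.
with_new = data.assign(new_variable=data['kg'] * 10)

print(with_new.columns.tolist())  # ['group', 'kg', 'new_variable']
print(data.columns.tolist())      # ['group', 'kg']
```

To keep the new column, assign the result back to a name, for example data = data.assign(...).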

Question 17

Q

Make a dataframe that has the values 0-11 in a matrix of 3 rows and 4 columns. Use the index and column names:

index=['London', 'Manchester', 'Brighton']
columns=['one', 'two', 'three', 'four']

Answer

A

data = pd.DataFrame(np.arange(12).reshape((3, 4)),
                    index=['London', 'Manchester', 'Brighton'],
                    columns=['one', 'two', 'three', 'four'])

Question 18

Q

In the dataframe data, rename the index label Manchester to Cardiff and the columns one to one_p and two to two_p. Make sure to change the original df.

Answer

A

data.rename(index={'Manchester': 'Cardiff'}, columns={'one': 'one_p', 'two': 'two_p'}, inplace=True)

Question 19

Q

Convert the index labels to upper case and the column names to title case.

Answer

A

data.rename(index = str.upper, columns=str.title,inplace=True)
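
rename accepts functions as well as dictionaries. The sketch below applies the same call to a fresh copy of the Question 17 dataframe (before the Question 18 renames) and returns a new dataframe instead of using inplace=True; the name renamed is illustrative.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(12).reshape((3, 4)),
                    index=['London', 'Manchester', 'Brighton'],
                    columns=['one', 'two', 'three', 'four'])

# A function passed to index or columns is applied to every label.
renamed = data.rename(index=str.upper, columns=str.title)

print(renamed.index.tolist())    # ['LONDON', 'MANCHESTER', 'BRIGHTON']
print(renamed.columns.tolist())  # ['One', 'Two', 'Three', 'Four']
```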

Question 20

Q

Create categories for the variable ages = [20, 22, 25, 27, 21, 23, 37, 31, 61, 45, 41, 32] using bins = [18, 25, 35, 60, 100].

Answer

A

categories = pd.cut(ages, bins)

Question 21

Q

Make each bin include its left edge (and exclude its right edge).

Answer

A

pd.cut(ages,bins,right=False)
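
The effect of right=False is easiest to see on a value that sits exactly on a bin edge. A small sketch using the ages and bins from Question 20; the names right_closed and left_closed are illustrative.

```python
import pandas as pd

ages = [20, 22, 25, 27, 21, 23, 37, 31, 61, 45, 41, 32]
bins = [18, 25, 35, 60, 100]

right_closed = pd.cut(ages, bins)              # default: intervals such as (18, 25]
left_closed = pd.cut(ages, bins, right=False)  # intervals such as [18, 25)

# The age 25 (position 2 in the list) lies exactly on a bin edge.
print(right_closed[2])  # the interval from 18 to 25, closed on the right
print(left_closed[2])   # the interval from 25 to 35, closed on the left
```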

Question 22

Q

See how many observations (the frequency or count of observations that belong to each bin) fall under each bin. Do this for the categories variable.

Answer

A

categories.value_counts()

Question 23

Q

Add a unique name to each category, then check how many observations fall under each bin. bin_names = ['Youth', 'Early 20s', 'Middle Age', 'Senior']

Answer

A

bin_names = ['Youth', 'Early 20s', 'Middle Age', 'Senior']
new_cats = pd.cut(ages, bins, labels=bin_names)

new_cats.value_counts()
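
Putting the labelling and counting together, a self-contained sketch with the ages, bins, and bin names from the cards. Wrapping the result in pd.Series is one way to count and is an assumption about style, not the only option.

```python
import pandas as pd

ages = [20, 22, 25, 27, 21, 23, 37, 31, 61, 45, 41, 32]
bins = [18, 25, 35, 60, 100]
bin_names = ['Youth', 'Early 20s', 'Middle Age', 'Senior']

# labels replaces the interval notation with one name per bin.
new_cats = pd.cut(ages, bins, labels=bin_names)

# Count how many ages landed in each named bin.
counts = pd.Series(new_cats).value_counts()
print(counts)
```

With these inputs, five ages fall in Youth, three in Early 20s, three in Middle Age, and one in Senior.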

Question 24

Q

Create a date range named dates starting from 20210701 with a length of 7 periods. Then create a pandas DataFrame with 7 rows and 4 columns of random values drawn from a standard normal distribution, with the row index set to the dates variable and the columns labeled 'A', 'B', 'C', and 'D'.

Answer

A

dates = pd.date_range('20210701', periods=7)
df = pd.DataFrame(np.random.randn(7, 4), index=dates, columns=list('ABCD'))
df

Question 25

Q

Get the first 3 rows from the df

Answer

A

df.head(3)

Question 26

Q

Slice df based on date range 20210703 to 20210705

Answer

A

df['20210703':'20210705']

Question 27

Q

Slice df on the column names A and B

Answer

A

df.loc[:, ['A', 'B']]

Question 28

Q

Slice df based on the dates 20210703 to 20210705 and the column names A and B.

Answer

A

df.loc['20210703':'20210705', ['A', 'B']]
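
Label-based slices with loc include both endpoints, so the three dates 3, 4, and 5 July are all returned. A runnable sketch; the seeded generator is an assumption added only to make the output repeatable.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20210701', periods=7)
rng = np.random.default_rng(0)  # seeded generator, an assumption for repeatability
df = pd.DataFrame(rng.standard_normal((7, 4)), index=dates, columns=list('ABCD'))

# Both the start and end labels are included in a loc slice.
subset = df.loc['20210703':'20210705', ['A', 'B']]
print(subset.shape)  # (3, 2)
```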

Question 29

Q

Select the row at integer position 2 (the third row, since positions start at 0).

Answer

A

df.iloc[2]

Question 30

Q

Return a specific range of rows based on integer position. Return the rows at positions 2 and 3 (the slice 2:4 excludes position 4) for the first two columns.

Answer

A

df.iloc[2:4, 0:2]

Question 31

Q

Return specific rows (second and sixth row) and columns (first and third) using lists containing columns or row indexes.

Answer

A

df.iloc[[1,5],[0,2]]
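
Unlike loc, iloc works on integer positions and its slices exclude the end position. The sketch below swaps the random values for np.arange(28) (an assumption, so the selected cells are easy to verify by eye); the names block and picked are illustrative.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20210701', periods=7)
# Row i holds the values 4*i, 4*i + 1, 4*i + 2, 4*i + 3.
df = pd.DataFrame(np.arange(28).reshape(7, 4), index=dates, columns=list('ABCD'))

# Slice: rows at positions 2 and 3, columns at positions 0 and 1.
block = df.iloc[2:4, 0:2]
# Lists: second and sixth rows, first and third columns.
picked = df.iloc[[1, 5], [0, 2]]

print(block.values.tolist())   # [[8, 9], [12, 13]]
print(picked.values.tolist())  # [[4, 6], [20, 22]]
```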

Question 32

Q

Copy the dataframe df and add a new column E. Name it df2.

Answer

A

df2 = df.copy()
df2['E'] = ['one', 'one', 'two', 'three', 'four', 'three', 'two']

Question 33

Q

Select rows based on column values: using df2, select the rows where column E contains two or four.

Answer

A

df2[df2['E'].isin(['two', 'four'])]

Question 34

Q

Using df2, select all rows except those where column E contains two or four.

Answer

A

df2[~df2['E'].isin(['two', 'four'])]
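
isin builds a boolean mask and ~ inverts it. A minimal sketch with a simplified stand-in for df2: the column A of plain integers is an assumption replacing the random columns, while column E matches Question 32.

```python
import pandas as pd

# Simplified stand-in for df2: one numeric column plus the E column from Question 32.
df2 = pd.DataFrame({'A': range(7),
                    'E': ['one', 'one', 'two', 'three', 'four', 'three', 'two']})

mask = df2['E'].isin(['two', 'four'])  # True where E is 'two' or 'four'
matching = df2[mask]                   # rows that match
others = df2[~mask]                    # ~ flips the mask: every other row

print(matching.index.tolist())  # [2, 4, 6]
print(others.index.tolist())    # [0, 1, 3, 5]
```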

Question 35

Q

Make a series of 40 random integers drawn from the range 1-10 (the upper bound is exclusive, so the values run from 1 to 9). Then make a dataframe from this series and reshape it to 8 rows and 5 columns.

Answer

A

ser = pd.Series(np.random.randint(1, 10, 40))
df = pd.DataFrame(ser.values.reshape(8,5))
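
Two properties of this answer can be checked directly: np.random.randint excludes its upper bound, and reshape fills the new rows in order. The seed below is an assumption added only for repeatable output.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumption: seeded only so the run is repeatable

ser = pd.Series(np.random.randint(1, 10, 40))  # integers from 1 to 9 inclusive
df = pd.DataFrame(ser.values.reshape(8, 5))    # 40 values -> 8 rows x 5 columns

print(df.shape)              # (8, 5)
print(ser.min(), ser.max())  # both lie between 1 and 9
# The first row of df is the first five values of the series.
print(df.iloc[0].tolist() == ser.iloc[:5].tolist())  # True
```

To include 10 among the possible values, the upper bound would need to be 11.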

Question 36

Q

Create a dataframe with two columns called Name and Age, where the values for the names and ages are:

names = ['Alice', 'Bob', 'Charlie']
ages = [25, 30, 35]

Answer

A

names = ['Alice', 'Bob', 'Charlie']
ages = [25, 30, 35]

# Create DataFrame
df = pd.DataFrame({'Name': names, 'Age': ages})