Lesson7 Numpy_Pandas analysis Flashcards

1
Q

Create an array of 10 zeros and ensure they are integers.

A

np.zeros(10, dtype='int')

2
Q

Create a matrix with 3 rows and 5 columns in which every element is 5.45.

A

np.full((3,5),5.45)

3
Q

Create an array of 5 evenly spaced numbers between 0 and 2.

A

np.linspace(0, 2, 5)
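
A minimal sketch of what this call returns, assuming NumPy is imported as np. The variable names values and step are illustrative only.

```python
import numpy as np

# Five evenly spaced values from 0 to 2; both endpoints are included by default.
values = np.linspace(0, 2, 5)
print(values.tolist())  # [0.0, 0.5, 1.0, 1.5, 2.0]

# The spacing is (stop - start) / (num - 1) = 2 / 4 = 0.5.
step = values[1] - values[0]
print(step)  # 0.5
```

Unlike np.arange, which takes a step size, np.linspace takes the number of points and works out the step itself.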

4
Q

Create a 3x3 array of random numbers drawn from a normal distribution with mean 0 and standard deviation 1.

A

np.random.normal(0, 1, (3,3))
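
A short sketch of how the arguments map to the result, assuming NumPy is imported as np. The seed and the larger sample are additions made only so the example is repeatable and easy to check.

```python
import numpy as np

np.random.seed(0)  # assumption: seeded only to make this sketch repeatable

# Arguments are (mean, standard deviation, shape).
sample = np.random.normal(0, 1, (3, 3))
print(sample.shape)  # (3, 3)

# Normal values are unbounded: they cluster around the mean 0
# and are not restricted to the interval from 0 to 1.
large = np.random.normal(0, 1, 100_000)
print(round(large.mean(), 1), round(large.std(), 1))
```

For uniform values between 0 and 1, np.random.random((3, 3)) would be the call to reach for instead.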

5
Q

Combine the following into a single array (x and y are arrays, z is a plain list):

x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
z = [21,21,21]

A

x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
z = [21,21,21]
np.concatenate([x, y, z])

6
Q

Concatenate the array grid = np.array([[1,2,3],[4,5,6]]) with itself, stacking the copies row-wise.

A

grid = np.array([[1,2,3],[4,5,6]])
np.concatenate([grid,grid])
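
A small sketch of how the axis argument changes the result, assuming NumPy is imported as np. The names stacked_rows and stacked_cols are illustrative.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6]])

# Default axis=0 appends rows: shape goes from (2, 3) to (4, 3).
stacked_rows = np.concatenate([grid, grid])
print(stacked_rows.shape)  # (4, 3)

# axis=1 appends columns instead: shape goes from (2, 3) to (2, 6).
stacked_cols = np.concatenate([grid, grid], axis=1)
print(stacked_cols.shape)  # (2, 6)
```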

7
Q

Create a DataFrame from a dictionary with the columns Fruit and Items. The Fruit values are ['Peach','Apple','Pear','Plum','Kiwi'] and the Items values are [121,40,100,130,11].

A

data = pd.DataFrame({'Fruit': ['Peach','Apple','Pear','Plum','Kiwi'],
                     'Items': [121,40,100,130,11]})

8
Q

How do you get a summary of the dataset (columns, non-null counts, dtypes and memory usage)?

A

data.info()

9
Q

Make a DataFrame with the columns group and kg. group values: 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'; kg values: 4, 3, 12, 6, 7.5, 8, 3, 5, 6

A

data = pd.DataFrame({'group': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'], 'kg': [4, 3, 12, 6, 7.5, 8, 3, 5, 6]})

10
Q

Sort the data df by kg in ascending order and change the original df.

data = pd.DataFrame({'group': ['a','a','a','b','b','b','c','c','c'], 'kg': [4, 3, 12, 6, 7.5, 8, 3, 5, 6]})

A

data.sort_values(by=['kg'], ascending=True, inplace=True)

11
Q

Sort data by multiple columns: group in ascending order and kg in descending order. Make sure you don't modify the original dataset.

A

data.sort_values(by=['group','kg'], ascending=[True,False], inplace=False)
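
A worked sketch of the multi-column sort, assuming pandas is imported as pd and data is the group/kg frame from the earlier card. The name result is illustrative.

```python
import pandas as pd

data = pd.DataFrame({'group': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
                     'kg': [4, 3, 12, 6, 7.5, 8, 3, 5, 6]})

# One ascending flag per column in `by`: group A-Z, then kg largest first.
result = data.sort_values(by=['group', 'kg'], ascending=[True, False])
print(result['kg'].tolist())  # [12.0, 4.0, 3.0, 8.0, 7.5, 6.0, 6.0, 5.0, 3.0]

# inplace=False is the default, so the original row order is untouched.
print(data['kg'].tolist())  # [4.0, 3.0, 12.0, 6.0, 7.5, 8.0, 3.0, 5.0, 6.0]
```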

12
Q

data = pd.DataFrame({'names': ['Mila']*3 + ['Igor']*4, 'Age': [3,2,1,3,3,4,4]})

Remove duplicate rows.

A

data.drop_duplicates()

13
Q

Remove rows with duplicate values in the names column.

data = pd.DataFrame({'names': ['Mila']*3 + ['Igor']*4, 'Age': [3,2,1,3,3,4,4]})

A

data.drop_duplicates(subset='names')
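
A sketch comparing whole-row and single-column de-duplication on the frame from these two cards, assuming pandas is imported as pd. The keep='last' variant is an extra shown for contrast, and the result names are illustrative.

```python
import pandas as pd

data = pd.DataFrame({'names': ['Mila'] * 3 + ['Igor'] * 4,
                     'Age': [3, 2, 1, 3, 3, 4, 4]})

# Whole-row comparison: only rows 4 and 6 repeat an earlier row exactly.
full = data.drop_duplicates()
print(full.index.tolist())  # [0, 1, 2, 3, 5]

# Comparing on 'names' only keeps the first row seen for each name.
by_name = data.drop_duplicates(subset='names')
print(by_name.index.tolist())  # [0, 3]

# keep='last' keeps the final row for each name instead.
last_by_name = data.drop_duplicates(subset='names', keep='last')
print(last_by_name.index.tolist())  # [2, 6]
```

None of these calls change data itself; each returns a new DataFrame.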

14
Q

For the farm shop df (data), create a new column animal2 by mapping each value in the food column through meat_to_animal. Convert the food values to lowercase first so they match the mapping.

A

data['animal2'] = data['food'].map(str.lower).map(meat_to_animal)
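
A runnable sketch of the two-step map. The card does not show the farm shop data or meat_to_animal, so both are hypothetical stand-ins here, with meat_to_animal assumed to be a dictionary keyed by lowercase food names.

```python
import pandas as pd

# Hypothetical stand-ins: the original farm shop data is not shown on the card.
data = pd.DataFrame({'food': ['Bacon', 'pulled pork', 'Pastrami', 'honey ham'],
                     'kg': [4, 3, 12, 6]})
meat_to_animal = {'bacon': 'pig', 'pulled pork': 'pig',
                  'pastrami': 'cow', 'honey ham': 'pig'}

# First map lowercases each food name, second map looks it up in the dict.
data['animal2'] = data['food'].map(str.lower).map(meat_to_animal)
print(data['animal2'].tolist())  # ['pig', 'pig', 'cow', 'pig']
```

Without the lowercase step, 'Bacon' and 'Pastrami' would not match any key and would come back as NaN.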

15
Q

Remove the animal2 column from the dataset.

A

data.drop('animal2', axis='columns', inplace=True)

16
Q

Make a new column using assign.

A

data.assign(new_variable=data['kg']*10)
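
A brief sketch of what assign returns, assuming pandas is imported as pd. The three-row frame and the name with_new are illustrative stand-ins for the group/kg data.

```python
import pandas as pd

# Shortened, illustrative version of the group/kg frame.
data = pd.DataFrame({'group': ['a', 'b', 'c'], 'kg': [4, 7.5, 3]})

# assign returns a new DataFrame with the extra column.
with_new = data.assign(new_variable=data['kg'] * 10)
print(with_new['new_variable'].tolist())  # [40.0, 75.0, 30.0]

# The original frame is left unchanged unless the result is assigned back.
print('new_variable' in data.columns)  # False
```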

17
Q

Make a DataFrame that holds the values 0-11 in a matrix of 3 rows and 4 columns. Use these index and column names:

index=['London', 'Manchester', 'Brighton']
columns=['one', 'two', 'three', 'four']

A

data = pd.DataFrame(np.arange(12).reshape((3, 4)),
                    index=['London', 'Manchester', 'Brighton'],
                    columns=['one', 'two', 'three', 'four'])

18
Q

In the DataFrame data, rename the index label Manchester to Cardiff and the columns one to one_p and two to two_p. Make sure to change the original df.

A

data.rename(index={'Manchester':'Cardiff'}, columns={'one':'one_p','two':'two_p'}, inplace=True)

19
Q

Convert the index labels to upper case and the column names to title case.

A

data.rename(index = str.upper, columns=str.title,inplace=True)
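
A sketch that runs both rename cards in sequence on the London/Manchester/Brighton frame, assuming NumPy and pandas are imported as np and pd.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(12).reshape((3, 4)),
                    index=['London', 'Manchester', 'Brighton'],
                    columns=['one', 'two', 'three', 'four'])

# Dictionaries rename only the labels they mention.
data.rename(index={'Manchester': 'Cardiff'},
            columns={'one': 'one_p', 'two': 'two_p'}, inplace=True)

# Functions are applied to every label.
data.rename(index=str.upper, columns=str.title, inplace=True)
print(data.index.tolist())    # ['LONDON', 'CARDIFF', 'BRIGHTON']
print(data.columns.tolist())  # ['One_P', 'Two_P', 'Three', 'Four']
```

Note that str.title capitalises the letter after the underscore as well, which is why one_p becomes One_P.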

20
Q

Sort the values of ages = [20, 22, 25, 27, 21, 23, 37, 31, 61, 45, 41, 32] into categories using bins = [18, 25, 35, 60, 100].

A

categories = pd.cut(ages, bins)

21
Q

Make each bin include its left edge instead of its right edge.

A

pd.cut(ages,bins,right=False)
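
A sketch of how right=False moves a value that sits exactly on a bin edge, assuming pandas is imported as pd. The names right_closed and left_closed are illustrative, and .codes gives the bin number of each value.

```python
import pandas as pd

ages = [20, 22, 25, 27, 21, 23, 37, 31, 61, 45, 41, 32]
bins = [18, 25, 35, 60, 100]

right_closed = pd.cut(ages, bins)              # bins like (18, 25]
left_closed = pd.cut(ages, bins, right=False)  # bins like [18, 25)

# Age 25 (position 2) sits on a bin edge, so it changes bins.
print(right_closed.codes[2])  # 0, the first bin, (18, 25]
print(left_closed.codes[2])   # 1, the second bin, [25, 35)
```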

22
Q

Count how many observations fall into each bin of the categories variable.

A

pd.Series(categories).value_counts()

23
Q

Give each category a unique name, then check how many observations fall into each bin. bin_names = ['Youth', 'Early 20s', 'Middle Age', 'Senior']

A

bin_names = ['Youth', 'Early 20s', 'Middle Age', 'Senior']
new_cats = pd.cut(ages, bins, labels=bin_names)

pd.Series(new_cats).value_counts()
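
A sketch of the labelled counts for these ages and bins, assuming pandas is imported as pd. It uses the Series.value_counts method, since the top-level pd.value_counts function is deprecated in recent pandas releases. The name counts is illustrative.

```python
import pandas as pd

ages = [20, 22, 25, 27, 21, 23, 37, 31, 61, 45, 41, 32]
bins = [18, 25, 35, 60, 100]
bin_names = ['Youth', 'Early 20s', 'Middle Age', 'Senior']

# labels must have one entry per bin, i.e. len(bins) - 1 names.
new_cats = pd.cut(ages, bins, labels=bin_names)

counts = pd.Series(new_cats).value_counts()
print(counts['Youth'])   # 5
print(counts['Senior'])  # 1
```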

24
Q

Create a date range named dates starting from 20210701 with a length of 7 periods. Then create a pandas DataFrame with 7 rows and 4 columns of random values drawn from a standard normal distribution. Set the row index to dates and label the columns 'A', 'B', 'C' and 'D'.

A

dates = pd.date_range('20210701', periods=7)
df = pd.DataFrame(np.random.randn(7,4), index=dates, columns=list('ABCD'))
df

25
Q

Get the first 3 rows from the df

A

df[:3]

26
Q

Slice df based on date range 20210703 to 20210705

A

df['20210703':'20210705']

27
Q

Slice df on the column names A and B

A

df.loc[:,['A','B']]

28
Q

Slice df based on the dates 20210703 to 20210705 and the column names A and B.

A

df.loc['20210703':'20210705',['A','B']]
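
A sketch showing that label-based slices with loc include both end points, assuming NumPy and pandas are imported as np and pd. The name subset is illustrative.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20210701', periods=7)
df = pd.DataFrame(np.random.randn(7, 4), index=dates, columns=list('ABCD'))

# Label slices are inclusive: 3, 4 and 5 July are all returned.
subset = df.loc['20210703':'20210705', ['A', 'B']]
print(subset.shape)  # (3, 2)
print(subset.index[0].date(), subset.index[-1].date())  # 2021-07-03 2021-07-05
```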

29
Q

Select the row of df at integer position 2 (the third row, since positions start at 0).

A

df.iloc[2]

30
Q

Return a specific range of rows based on integer position. Return the rows at positions 2 and 3 (the slice 2:4 excludes position 4) for the first two columns.

A

df.iloc[2:4, 0:2]
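
A sketch of position-based selection with iloc on the same dates frame, assuming NumPy and pandas are imported as np and pd. The names row and block are illustrative.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20210701', periods=7)
df = pd.DataFrame(np.random.randn(7, 4), index=dates, columns=list('ABCD'))

# A single position returns that row as a Series; position 2 is 3 July.
row = df.iloc[2]
print(row.name.date())  # 2021-07-03

# Position slices exclude the stop value: 2:4 gives positions 2 and 3.
block = df.iloc[2:4, 0:2]
print(block.shape)  # (2, 2)
```

This is the opposite of loc, where label slices include the end label.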

31
Q

Return specific rows (second and sixth row) and columns (first and third) using lists containing columns or row indexes.

A

df.iloc[[1,5],[0,2]]

32
Q

Copy the DataFrame df to df2, then add a new column E to df2.

A

df2 = df.copy()
df2['E'] = ['one', 'one', 'two', 'three', 'four', 'three', 'two']

33
Q

Select rows based on column values: from df2, select the rows where column E is 'two' or 'four'.

A

df2[df2['E'].isin(['two','four'])]

34
Q

Select all rows of df2 except those where column E is 'two' or 'four'.

A

df2[~df2['E'].isin(['two','four'])]
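
A sketch of the boolean mask behind both isin cards, assuming NumPy and pandas are imported as np and pd. The names mask, selected and rest are illustrative.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20210701', periods=7)
df = pd.DataFrame(np.random.randn(7, 4), index=dates, columns=list('ABCD'))
df2 = df.copy()
df2['E'] = ['one', 'one', 'two', 'three', 'four', 'three', 'two']

# isin builds a True/False Series; ~ flips every value in it.
mask = df2['E'].isin(['two', 'four'])
selected = df2[mask]
rest = df2[~mask]

print(selected['E'].tolist())  # ['two', 'four', 'two']
print(rest['E'].tolist())      # ['one', 'one', 'three', 'three']
```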

35
Q

Make a Series of 40 random integers from 1 to 9 (np.random.randint excludes the upper bound, so the range is passed as 1, 10). Then make a DataFrame from this Series and reshape it to 8 rows and 5 columns.

A

ser = pd.Series(np.random.randint(1, 10, 40))
df = pd.DataFrame(ser.values.reshape(8,5))
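
A sketch checking the shape and value range of the result, assuming NumPy and pandas are imported as np and pd. The seed is an addition made only so the example is repeatable.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumption: seeded only to make this sketch repeatable

# randint(low, high, size) draws from low up to, but not including, high.
ser = pd.Series(np.random.randint(1, 10, 40))
df = pd.DataFrame(ser.values.reshape(8, 5))

print(df.shape)  # (8, 5)

# reshape fills row by row, so the first row holds the first five values.
print(df.iloc[0].tolist() == ser[:5].tolist())  # True
```

To include 10 in the possible values, the upper bound would need to be 11.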

36
Q

Create a DataFrame with two columns called Name and Age, where the values for the names and ages are:

names = ['Alice', 'Bob', 'Charlie']
ages = [25, 30, 35]

A

names = ['Alice', 'Bob', 'Charlie']
ages = [25, 30, 35]

# Create DataFrame
df = pd.DataFrame({'Name': names, 'Age': ages})