W0.5 Flashcards

1
Q

How do you import NumPy under its conventional alias?

A

import numpy as np

2
Q

How do you construct a 1-dimensional ndarray?

A

data_ndarray = np.array([5, 10, 15, 20])

3
Q

Which is faster, NumPy or standard Python?

A

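NumPy is faster for arithmetic on arrays. In the example this card refers to, the vectorized NumPy version needs only two processor cycles, which is four times faster than the equivalent standard Python loop.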

This technique of replacing for loops with simultaneous operations on multiple data points is called vectorization, made possible by ndarrays.
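
A minimal sketch of the idea, assuming two hypothetical columns of sample values (col_a and col_b are not from the course data). The loop adds one pair per iteration, while the NumPy expression adds the whole arrays at once:

```python
import numpy as np

# Hypothetical sample data: two columns of eight values each
col_a = [1, 2, 3, 4, 5, 6, 7, 8]
col_b = [10, 20, 30, 40, 50, 60, 70, 80]

# Standard Python: one addition per loop iteration
loop_sums = []
for a, b in zip(col_a, col_b):
    loop_sums.append(a + b)

# NumPy: a single vectorized expression over the whole arrays
vector_sums = np.array(col_a) + np.array(col_b)

# Both approaches produce the same values
print(loop_sums)
print(vector_sums.tolist())
```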

4
Q

What is a CSV file?

A

CSV stands for comma-separated values. A CSV file is a plain-text file in which each line is a row and the values within a row are separated by commas.

EXAMPLE:
pickup_year,pickup_month,pickup_day,pickup_dayofweek,pickup_time,pickup_location_code,dropoff_location_code,trip_distance
2016,1,1,5,0,2,4,21
2016,1,1,5,0,2,1,16.29
2016,1,1,5,0,2,6,12.7
2016,1,1,5,0,2,6,8.7
2016,1,1,5,0,2,6,5.56

5
Q

How can you convert a CSV file into a ndarray with NumPy?

A

numpy.array()

STEPS:
  1. Import the csv and numpy libraries
  2. Use the csv module to read in our nyc_taxis.csv file as a list of lists
  3. Convert the list of lists to an ndarray
    - We’ll use the numpy.array() constructor in order to create a 2D ndarray

EXAMPLE:

import csv
import numpy as np

# read nyc_taxis.csv in as a list of lists
with open("nyc_taxis.csv", "r") as f:
    taxi_list = list(csv.reader(f))

# remove the header row
taxi_list = taxi_list[1:]

# convert all values to floats
converted_taxi_list = []
for row in taxi_list:
    converted_row = []
    for element in row:
        converted_row.append(float(element))
    converted_taxi_list.append(converted_row)

# convert the converted_taxi_list variable to a NumPy ndarray
taxi = np.array(converted_taxi_list)

6
Q

How can we determine the shape/dimension of the ndarray?

A

Use the ndarray.shape attribute

EXAMPLE:
data_ndarray = np.array([[5, 10, 15],
                         [20, 25, 30]])
print(data_ndarray.shape)

OUT: (2, 3)

NOTE: This output, which is a tuple, gives us two pieces of information:
- The first number tells us that there are two rows in data_ndarray.
- The second number tells us that there are three columns in data_ndarray.
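
A short sketch comparing the shape of a 1D and a 2D ndarray. The names one_d and two_d are illustrative only, and ndarray.ndim (the number of dimensions) is shown as a related attribute:

```python
import numpy as np

one_d = np.array([5, 10, 15, 20])
two_d = np.array([[5, 10, 15],
                  [20, 25, 30]])

print(one_d.shape)  # (4,)   -> one dimension with four elements
print(two_d.shape)  # (2, 3) -> two rows, three columns
print(two_d.ndim)   # 2      -> number of dimensions
```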

7
Q

How do you select rows, or particular columns within those rows?

A

# select all columns for a given set of rows
ndarray[row_index]

# select particular columns for a given set of rows
ndarray[row_index, column_index]
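
A small sketch of both forms on a hypothetical 3x4 array (arr is illustrative, not the taxi data):

```python
import numpy as np

# Hypothetical array with rows 0-2 and columns 0-3, holding the values 0..11
arr = np.arange(12).reshape(3, 4)

row_1 = arr[1]            # every column of the row at index 1
first_two = arr[0:2]      # every column of the rows at indices 0 and 1
element = arr[1, 2]       # row index 1, column index 2
column_2 = arr[:, 2]      # every row of the column at index 2

print(row_1.tolist())     # [4, 5, 6, 7]
print(element)            # 6
print(column_2.tolist())  # [2, 6, 10]
```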

8
Q

EXAMPLE:
From the provided taxi array:
  1. Select the row at index 0, and assign it to row_0.
  2. Select every column for the rows at indices 391 to 500 inclusive, and assign the result to rows_391_to_500.
  3. Select the element at row index 21 and column index 5, and assign it to row_21_column_5.
A

1.
row_0 = taxi[0]

2.
rows_391_to_500 = taxi[391:501]

3.
row_21_column_5 = taxi[21, 5]

9
Q

EXAMPLE:
From the provided taxi ndarray:

  1. Select every row for the columns at indices 1, 4, and 7. Assign the result to columns_1_4_7.
  2. Select the columns at indices 5 to 8 inclusive for the row at index 99. Assign the result to row_99_columns_5_to_8.
  3. Select the rows at indices 100 to 200 inclusive for the column at index 14. Assign the result to rows_100_to_200_column_14
A

1.
columns_1_4_7 = taxi[:,[1, 4, 7]]

2.
row_99_columns_5_to_8 = taxi[99, 5:9]

3.
rows_100_to_200_column_14 = taxi[100:201, 14]

10
Q

EXAMPLE:
From the provided taxi ndarray:

  1. Slice the taxi array to extract all rows and the 10th column only. Assign the result to a new variable called fare_amount.
  2. Slice the taxi array to extract all rows and the 11th column only. Assign the result to a new variable called fees_amount.
  3. Add the fare_amount and fees_amount arrays element-wise. Assign the result to a new variable called fare_and_fees.
A

1.
fare_amount = taxi[:,9]

2.
fees_amount = taxi[:,10]

3.
fare_and_fees = fare_amount + fees_amount

11
Q

How do you find the minimum value of a 1-dimensional array?

A

To find the minimum value of a 1D ndarray, we can use the vectorized ndarray.min() method, like this:

mph_min = trip_mph.min()

12
Q

Which methods calculate summary statistics for an ndarray?

A
  • To calculate the minimum value: ndarray.min()
  • To calculate the maximum value: ndarray.max()
  • To calculate the mean or average value: ndarray.mean()
  • To calculate the sum of the values: ndarray.sum()
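
A quick sketch of the four methods on a hypothetical 1D array (values is illustrative only):

```python
import numpy as np

# Hypothetical sample values
values = np.array([3.0, 7.0, 1.0, 9.0])

print(values.min())   # 1.0
print(values.max())   # 9.0
print(values.mean())  # 5.0
print(values.sum())   # 20.0
```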
13
Q

What is the difference between function calls and method calls?

A
  • Function calls usually start with the library name or its alias (e.g., np.mean()).
  • Method calls begin with an object or variable name from a particular class (e.g., trip_mph.mean()).
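
A brief sketch showing that both call styles give the same result. The values in trip_mph here are hypothetical, not the real taxi data:

```python
import numpy as np

# Hypothetical speeds in miles per hour
trip_mph = np.array([12.0, 30.0, 18.0])

function_result = np.mean(trip_mph)  # function call: starts with the library alias
method_result = trip_mph.mean()      # method call: starts with the ndarray variable

print(function_result, method_result)  # 20.0 20.0
```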
14
Q

What is a delimiter?

A

A delimiter is the string used to separate each value in a text file. When loading a file with numpy.genfromtxt(), it is passed as the named argument delimiter. For CSV files, the delimiter is a comma, written as a string: ','.
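
A minimal sketch of the delimiter argument, assuming two hypothetical in-memory text files (comma_text and semicolon_text) wrapped in io.StringIO so that no file on disk is needed:

```python
import io
import numpy as np

# Hypothetical file contents: the same values with different delimiters
comma_text = "1,2,3\n4,5,6\n"
semicolon_text = "1;2;3\n4;5;6\n"

from_commas = np.genfromtxt(io.StringIO(comma_text), delimiter=",")
from_semicolons = np.genfromtxt(io.StringIO(semicolon_text), delimiter=";")

# Both calls produce the same 2x3 array of floats
print(from_commas.tolist())
print(from_semicolons.tolist())
```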

15
Q

What is NaN, and why may it occur?

A

NaN stands for Not a Number and indicates that the underlying value cannot be represented as a number.

  • It’s similar to Python’s None constant and it is often used to represent missing values in datasets.
  • The NaN values often appear because the first row of CSV files contains column names, which NumPy can’t convert to float64 values.
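
A small sketch of how a header row turns into NaN values, assuming a hypothetical two-column CSV held in memory (csv_text is illustrative, not the real nyc_taxis.csv file):

```python
import io
import numpy as np

# Hypothetical CSV content: a header row followed by two data rows
csv_text = "pickup_year,trip_distance\n2016,21\n2016,16.29\n"

taxi = np.genfromtxt(io.StringIO(csv_text), delimiter=",")

# The column names cannot be converted to floats, so the first row is all NaN
print(taxi[0])
print(np.isnan(taxi[0]).all())  # True
```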
16
Q

How can you remove the header row from an ndarray?

A

To remove the header row from our ndarray, we can use a slice, just like with a list of lists:

taxi = taxi[1:]

17
Q

How can you skip the header when loading the data set?

A

We can avoid getting NaN values by skipping the header row(s) when loading the data.

  • We do this by passing an additional argument, skip_header=1, to our call to the numpy.genfromtxt() function.
  • The skip_header argument accepts an integer — the number of rows from the start of the file to skip. Remember that this integer measures the total number of rows to skip and doesn’t use index values. To skip the first row, use a value of 1, not 0.
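
A minimal sketch of skip_header=1, assuming a hypothetical two-column CSV held in memory (csv_text is illustrative; with a real file, the filename would be passed instead of the io.StringIO object):

```python
import io
import numpy as np

# Hypothetical CSV content: a header row followed by two data rows
csv_text = "pickup_year,trip_distance\n2016,21\n2016,16.29\n"

# skip_header=1 skips one row (the header), so no NaN row is produced
taxi = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)

print(taxi.shape)  # (2, 2)
```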
18
Q

EXAMPLE:
  1. Select the values of the tip_amount column (index 12) and store them in an array called tip_amount.
  2. Create a Boolean array, tip_bool, that determines which rows have values for the tip_amount column that are greater than 20.
  3. Use the tip_bool array to select all rows from taxi with values for tip_amount greater than 20, and select the columns from column index 5 to 13 inclusive. Assign the resulting array to top_tips.
A

tip_amount = taxi[:,12]
tip_bool = tip_amount > 20
top_tips = taxi[tip_bool,5:14]
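
The same pattern on a tiny hypothetical table, to make the intermediate Boolean array visible (data and its two columns are illustrative, not the taxi layout):

```python
import numpy as np

# Hypothetical table: column 0 is an id, column 1 is a tip amount
data = np.array([[1, 5.0],
                 [2, 25.0],
                 [3, 12.0],
                 [4, 30.0]])

tips = data[:, 1]
big_tip = tips > 20          # one True/False value per row
big_rows = data[big_tip]     # keeps only the rows where big_tip is True
big_ids = data[big_tip, 0]   # same rows, column 0 only

print(big_tip.tolist())  # [False, True, False, True]
print(big_ids.tolist())  # [2.0, 4.0]
```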

19
Q

How can you change the value at a specific index location, or at multiple locations?

A

For a SPECIFIC INDEX LOCATION:
a = np.array(['red', 'blue', 'black', 'blue', 'purple'])
a[0] = 'orange'
print(a)

OUT:
['orange' 'blue' 'black' 'blue' 'purple']

For MULTIPLE LOCATIONS:
a[3:] = 'pink'
print(a)

OUT:
['orange' 'blue' 'black' 'pink' 'pink']

20
Q

How do you create a copy of the data set?

A

Use the ndarray.copy() method:

taxi_copy = taxi.copy()

NOTE: To safeguard the original data for the exercise below, a copy of the taxi data is created with ndarray.copy() and stored in taxi_copy.
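
A short sketch of why the copy matters, using hypothetical names (original, alias, independent). Plain assignment only adds a second name for the same array, whereas copy() creates separate data:

```python
import numpy as np

original = np.array([1, 2, 3])

alias = original               # second name for the same array, not a copy
independent = original.copy()  # separate array with its own data

alias[0] = 99         # also changes original
independent[1] = -1   # leaves original untouched

print(original.tolist())     # [99, 2, 3]
print(independent.tolist())  # [1, -1, 3]
```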

21
Q

EXAMPLE:
  1. Select the trip_length column (index 8) in taxi_copy and store it in the variable trip_length.
  2. Use a Boolean array to select all rows of trip_length that are less than 60 and use assignment to update these values to 0.
A

taxi_copy = taxi.copy()
trip_length = taxi_copy[:, 8]
# update only the trip_length column (index 8), not the entire row
taxi_copy[trip_length < 60, 8] = 0

22
Q

EXAMPLE:
  1. Use trip_mph to create a new ndarray, cleaned_taxi, containing only rows for which the values of trip_mph are less than 100.
  2. Calculate the mean of the trip_distance column of cleaned_taxi. Assign the result to mean_distance.
  3. Calculate the mean of the trip_length column of cleaned_taxi. Assign the result to mean_length.
  4. Calculate the mean of the total_amount column of cleaned_taxi. Assign the result to mean_total_amount.
A

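# trip speed in miles per hour
# (dividing by 3600 assumes trip_length is stored in seconds)
trip_distance = taxi[:, 7]
trip_length = taxi[:, 8] / 3600
trip_mph = trip_distance / trip_length

# keep only the rows with a speed below 100 mph
cleaned_taxi = taxi[trip_mph < 100]
mean_distance = cleaned_taxi[:, 7].mean()
mean_length = cleaned_taxi[:, 8].mean()
# total_amount is the second-to-last column
mean_total_amount = cleaned_taxi[:, -2].mean()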