Data Science Intro Flashcards
cylinders = set(d['cyl'] for d in mpg)
Use set to return the unique values for the number of cylinders the cars in our dataset have.
sum(float(d['hwy']) for d in mpg) / len(mpg)
This is how to find the average hwy fuel economy across all cars.
len(mpg) - mpg is the name of a list of dictionaries, one per row of the csv file.
csv.DictReader has read in each row of our csv file as a dictionary. len shows that our list is made up of 234 dictionaries.
import csv
%precision 2
with open('mpg.csv') as csvfile:
    mpg = list(csv.DictReader(csvfile))
mpg[:3]  # The first three dictionaries in our list.
Reads the csv file and makes a list of dictionaries named mpg.
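A minimal sketch of what csv.DictReader produces, assuming a tiny in-memory sample in place of mpg.csv. The two rows and the name sample are made up for illustration.

```python
import csv
import io

# Hypothetical two-row sample standing in for mpg.csv (values are made up).
sample = io.StringIO(
    "manufacturer,model,cyl,cty,hwy\n"
    "audi,a4,4,18,29\n"
    "ford,f150,8,13,17\n"
)

rows = list(csv.DictReader(sample))  # the header row supplies the keys

print(len(rows))         # number of data rows (the header is not counted)
print(rows[0]["model"])  # look up a field by its column name
print(rows[1]["hwy"])    # values are strings; convert with int() or float()
```

Every value comes back as a string, which is why the other cards wrap fields in float() before doing arithmetic.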
What will the output be?
sales_record = {
    'price': 3.24,
    'num_items': 4,
    'person': 'Chris'}
sales_statement = '{} bought {} item(s) at a price of {} each for a total of {}'
print(sales_statement.format(sales_record['person'],
                             sales_record['num_items'],
                             sales_record['price'],
                             sales_record['num_items'] * sales_record['price']))
Chris bought 4 item(s) at a price of 3.24 each for a total of 12.96
x = ('Christopher', 'Brooks', 'brooksch@umich.edu')
fname, lname, email = x
print(fname)
Christopher
Tuple format?
my_tuple = ("Hi", "Dave", 4)  # parentheses; immutable
List format?
my_list = ["hi", 4, 2, "Dave"]  # square brackets; mutable
x = {'Christopher Brooks': 'brooksch@umich.edu', 'Bill Gates': 'billg@microsoft.com'}
x['Christopher Brooks']
Retrieve a value by using the indexing operator
'brooksch@umich.edu'
CtyMpgByCyl = []
for c in cylinders:  # iterate over all the cylinder levels
    summpg = 0
    cyltypecount = 0
    for d in mpg:  # iterate over all dictionaries
        if d['cyl'] == c:  # if the cylinder level type matches,
            summpg += float(d['cty'])  # add the cty mpg
            cyltypecount += 1  # increment the count
    CtyMpgByCyl.append((c, summpg / cyltypecount))  # append the tuple ('cylinder', 'avg mpg')
CtyMpgByCyl.sort(key=lambda x: x[0])
CtyMpgByCyl
Computes the average city mpg for each cylinder count.
The lambda sorts CtyMpgByCyl by the first element of each tuple (the cylinder count).
[('4', 21.01), ('5', 20.50), ('6', 16.22), ('8', 12.57)]
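The same grouping can be done in a single pass over the data. This is a sketch of an alternative, not the approach on the card: it assumes rows shaped like the DictReader output, and the three sample rows are made up.

```python
from collections import defaultdict

# Hypothetical rows in the same shape csv.DictReader produces (values made up).
rows = [
    {"cyl": "4", "cty": "20"},
    {"cyl": "4", "cty": "22"},
    {"cyl": "6", "cty": "16"},
]

totals = defaultdict(float)  # cyl -> running sum of cty
counts = defaultdict(int)    # cyl -> number of cars seen
for d in rows:
    totals[d["cyl"]] += float(d["cty"])
    counts[d["cyl"]] += 1

# Build (cylinder, average) tuples, sorted by cylinder label.
averages = sorted((c, totals[c] / counts[c]) for c in totals)
print(averages)  # [('4', 21.0), ('6', 16.0)]
```

The nested loop on the card scans the whole list once per cylinder level; this version scans it once in total.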
vehicleclass = set(d['class'] for d in mpg)
vehicleclass
What are the class types? Only show me one of each
{'2seater', 'compact', 'midsize', 'minivan', 'pickup', 'subcompact', 'suv'}
HwyMpgByClass = []
for t in vehicleclass:  # iterate over all the vehicle classes
    summpg = 0
    vclasscount = 0
    for d in mpg:  # iterate over all dictionaries
        if d['class'] == t:  # if the vehicle class matches,
            summpg += float(d['hwy'])  # add the hwy mpg
            vclasscount += 1  # increment the count
    HwyMpgByClass.append((t, summpg / vclasscount))  # append the tuple ('class', 'avg mpg')
HwyMpgByClass.sort(key=lambda x: x[1])
HwyMpgByClass
Example of how to find the average hwy mpg for each class of vehicle in our dataset, sorted from lowest to highest average.
[('pickup', 16.88), ('suv', 18.13), ('minivan', 22.36), ('2seater', 24.80), ('midsize', 27.29), ('subcompact', 28.14), ('compact', 28.30)]
import datetime as dt
import time as tm
tm.time()
time returns the current time in seconds since the Epoch (January 1, 1970).
import datetime as dt
import time as tm
dtnow = dt.datetime.fromtimestamp(tm.time())
dtnow
Convert the timestamp to datetime.
dtnow.year, dtnow.month, dtnow.day, dtnow.hour, dtnow.minute, dtnow.second
get year, month, day, etc. from a datetime
delta = dt.timedelta(days = 100) # create a timedelta of 100 days
delta
timedelta is a duration expressing the difference between two dates.
delta = dt.timedelta(days = 100)
today = dt.date.today()
today - delta
Returns the date 100 days before today (the exact value depends on when it is run).
datetime.date(2016, 8, 13)
today > today-delta
compare dates
returns True
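A small sketch of date arithmetic with a fixed date, so the result does not depend on when it runs. The start date is an arbitrary choice for the example.

```python
import datetime as dt

start = dt.date(2016, 11, 21)    # arbitrary fixed date chosen for the example
delta = dt.timedelta(days=100)

earlier = start - delta          # date - timedelta gives a date
print(earlier)                   # 2016-08-13
print(start - earlier == delta)  # date - date gives a timedelta
print(start > earlier)           # later dates compare as greater
```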
store1 = [10.00, 11.00, 12.34, 2.34]
store2 = [9.00, 11.10, 12.34, 2.01]
cheapest = map(min, store1, store2)
cheapest
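map applies min to each pair of prices. In Python 3, cheapest is a lazy map object (an iterator), not a list; wrap it in list() or loop over it to see the lowest values.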
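A short sketch of how the map object behaves in Python 3. The names cheapest_list and leftover are introduced here for illustration.

```python
store1 = [10.00, 11.00, 12.34, 2.34]
store2 = [9.00, 11.10, 12.34, 2.01]

cheapest = map(min, store1, store2)  # lazy: nothing is computed yet
cheapest_list = list(cheapest)       # consume the iterator into a list
print(cheapest_list)                 # [9.0, 11.0, 12.34, 2.01]

leftover = list(cheapest)            # the iterator is already exhausted
print(leftover)                      # []
```

A map object can only be consumed once, so store the list if the values are needed again.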
my_function = lambda a, b, c : a + b
my_function(1, 2, 3)
Here’s an example of lambda that takes in three parameters and adds the first two.
my_list = []
for number in range(0, 1000):
    if number % 2 == 0:
        my_list.append(number)
my_list
appends even numbers in range
my_list = [number for number in range(0,1000) if number % 2 == 0]
my_list
shorthand version of:
my_list = []
for number in range(0, 1000):
    if number % 2 == 0:
        my_list.append(number)
my_list
import numpy as np
m = np.array([[7, 8, 9], [10, 11, 12]])  # create array w/ numpy
m.shape
Use the shape attribute to find the dimensions of the array. (rows, columns)
(2, 3)
n = np.arange(0, 30, 2)
n
arange returns evenly spaced values within a given interval.
start at 0 count up by 2, stop before 30
array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28])
n = np.array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28])
n = n.reshape(3, 5)
reshape returns an array with the same data with a new shape.
reshape array to be 3x5
array([[ 0, 2, 4, 6, 8],
[10, 12, 14, 16, 18],
[20, 22, 24, 26, 28]])
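A sketch showing that reshape leaves the original array's shape alone, and that one dimension can be given as -1 so NumPy infers it. The names a and b are introduced for illustration.

```python
import numpy as np

n = np.arange(0, 30, 2)  # 15 elements
a = n.reshape(3, 5)      # explicit shape
b = n.reshape(3, -1)     # -1 asks NumPy to infer the remaining dimension

print(a.shape, b.shape)  # (3, 5) (3, 5)
print(n.shape)           # (15,) because n itself keeps its shape
```

The total number of elements must stay the same, so reshape(4, -1) would raise an error here.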
o = np.linspace(0, 4, 9)
o
linspace returns evenly spaced numbers over a specified interval.
return 9 evenly spaced values from 0 to 4
array([ 0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. ])
o = np.array([ 0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. ])
o.resize(3, 3)
o
resize changes the shape and size of array in-place.
array([[ 0. , 0.5, 1. ],
[ 1.5, 2. , 2.5],
[ 3. , 3.5, 4. ]])
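A sketch contrasting reshape with the resize method. It uses a second array, o2, introduced here so that no other array refers to it, and passes refcheck=False to skip the reference check that can otherwise raise in interactive sessions.

```python
import numpy as np

o = np.linspace(0, 4, 9)
reshaped = o.reshape(3, 3)       # returns the reshaped result
shape_after_reshape = o.shape    # (9,): o keeps its original shape

o2 = np.linspace(0, 4, 9)
result = o2.resize((3, 3), refcheck=False)  # changes o2 in place

print(shape_after_reshape)  # (9,)
print(result)               # None: the method returns nothing
print(o2.shape)             # (3, 3)
```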
np.ones((3, 2))
ones returns a new array of given shape and type, filled with ones.
array([[ 1., 1.],
[ 1., 1.],
[ 1., 1.]])
np.zeros((2, 3))
zeros returns a new array of given shape and type, filled with zeros.
array([[ 0., 0., 0.],
[ 0., 0., 0.]])
np.eye(3)
eye returns a 2-D array with ones on the diagonal and zeros elsewhere.
array([[ 1., 0., 0.],
[ 0., 1., 0.],
[ 0., 0., 1.]])
y = np.array([4, 5, 6])
np.diag(y)
diag extracts a diagonal or constructs a diagonal array.
array([[4, 0, 0],
[0, 5, 0],
[0, 0, 6]])
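A sketch of both directions of np.diag: building a diagonal array from a 1-D input, and extracting the diagonal from a 2-D input. The names d and back are introduced for illustration.

```python
import numpy as np

y = np.array([4, 5, 6])
d = np.diag(y)     # 1-D input: 2-D array with y on the diagonal
back = np.diag(d)  # 2-D input: 1-D array of the diagonal entries

print(d.shape)  # (3, 3)
print(back)     # [4 5 6]
```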
np.array([1, 2, 3] * 3)
Create an array using repeating list (or see np.tile)
array([1, 2, 3, 1, 2, 3, 1, 2, 3])
np.repeat([1, 2, 3], 3)
Repeat elements of an array using repeat.
array([1, 1, 1, 2, 2, 2, 3, 3, 3])
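A side-by-side sketch of np.tile and np.repeat, which are easy to mix up.

```python
import numpy as np

tiled = np.tile([1, 2, 3], 3)       # repeats the whole sequence
repeated = np.repeat([1, 2, 3], 3)  # repeats each element in turn

print(tiled)     # [1 2 3 1 2 3 1 2 3]
print(repeated)  # [1 1 1 2 2 2 3 3 3]
```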
p = np.array([[1, 1, 1],
              [1, 1, 1]])
np.vstack([p, 2*p])
Use vstack to stack arrays in sequence vertically (row wise).
array([[1, 1, 1],
[1, 1, 1],
[2, 2, 2],
[2, 2, 2]])
p = np.array([[1, 1, 1],
              [1, 1, 1]])
np.hstack([p, 2*p])
Use hstack to stack arrays in sequence horizontally (column wise).
array([[1, 1, 1, 2, 2, 2],
[1, 1, 1, 2, 2, 2]])
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
print(x + y)
print(x - y)
[5 7 9]
[-3 -3 -3]
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
print(x * y)
print(x / y)
[ 4 10 18]
[ 0.25 0.4 0.5 ]
x = np.array([1, 2, 3])
print(x**2)
raises all elements to power of 2
[1 4 9]
x.dot(y)  # dot product 1*4 + 2*5 + 3*6
The dot product is 1*4 + 2*5 + 3*6:
x[0]*y[0] + x[1]*y[1] + x[2]*y[2] = 32
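A sketch checking the dot product against the elementwise calculation. The name by_hand is introduced for illustration.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

by_hand = (x * y).sum()          # elementwise products 4, 10, 18, then summed
print(by_hand, x.dot(y), x @ y)  # 32 32 32
```

The @ operator is another way to write the same product.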
z = np.array([y, y**2])
print(len(z))
prints number of rows in 2d array
z = np.array([[ 4,  5,  6],
              [16, 25, 36]])
z.shape is (2, 3)
z.T
Transposing swaps the rows and columns, so the shape changes from (2, 3) to (3, 2).
array([[ 4, 16],
[ 5, 25],
[ 6, 36]])
z.dtype
Use .dtype to see the data type of the elements in the array.
dtype('int64')
z starts off as int
z = z.astype('f')
z.dtype
Use .astype to cast to a specific type.
dtype('float32')
a.argmax()
a.argmin()
argmax and argmin return the index of the maximum and minimum values in the array.
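A sketch with a concrete array, since the card does not define a. The values below are hypothetical.

```python
import numpy as np

a = np.array([-4, -2, 1, 3, 5])  # hypothetical values; the card does not define a

print(a.max(), a.argmax())  # 5 4   (largest value, then its index)
print(a.min(), a.argmin())  # -4 0  (smallest value, then its index)
```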
s = np.arange(13)**2
s
Fills the array with the squares of the integers 0 through 12.
array([ 0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144])
s = np.array([ 0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144])
s[0], s[4], s[-1]
Use bracket notation to get the value at a specific index. Remember that indexing starts at 0.
(0, 16, 144)
s = np.array([ 0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144])
s[1:5]
Use : to indicate a range. array[start:stop]
Leaving start or stop empty will default to the beginning/end of the array.
array([ 1, 4, 9, 16])
s = np.array([ 0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144])
s[-4:]
Use negatives to count from the back. Here we take the last four elements.
array([ 81, 100, 121, 144])
s = np.array([ 0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144])
s[-5::-2]
A second : can be used to indicate step-size. array[start:stop:stepsize]
Here we are starting 5th element from the end, and counting backwards by 2 until the beginning of the array is reached.
array([64, 36, 16, 4, 0])
r = np.array([[ 0,  1,  2,  3,  4,  5],
              [ 6,  7,  8,  9, 10, 11],
              [12, 13, 14, 15, 16, 17],
              [18, 19, 20, 21, 22, 23],
              [24, 25, 26, 27, 28, 29],
              [30, 31, 32, 33, 34, 35]])
r[3, 3:6]
Use : to select a range of rows or columns. Here we take row 3, columns 3 through 5.
array([21, 22, 23])
r = np.array([[ 0,  1,  2,  3,  4,  5],
              [ 6,  7,  8,  9, 10, 11],
              [12, 13, 14, 15, 16, 17],
              [18, 19, 20, 21, 22, 23],
              [24, 25, 26, 27, 28, 29],
              [30, 31, 32, 33, 34, 35]])
r[:2, :-1]
Here we are selecting all the rows up to (and not including) row 2, and all the columns up to (and not including) the last column.
array([[ 0, 1, 2, 3, 4],
[ 6, 7, 8, 9, 10]])
r = np.array([[ 0,  1,  2,  3,  4,  5],
              [ 6,  7,  8,  9, 10, 11],
              [12, 13, 14, 15, 16, 17],
              [18, 19, 20, 21, 22, 23],
              [24, 25, 26, 27, 28, 29],
              [30, 31, 32, 33, 34, 35]])
r[-1, ::2]
This is a slice of the last row, and only every other element.
array([30, 32, 34])
r = np.array([[ 0,  1,  2,  3,  4,  5],
              [ 6,  7,  8,  9, 10, 11],
              [12, 13, 14, 15, 16, 17],
              [18, 19, 20, 21, 22, 23],
              [24, 25, 26, 27, 28, 29],
              [30, 31, 32, 33, 34, 35]])
r[r > 30]
We can also perform conditional indexing. Here we are selecting values from the array that are greater than 30. (Also see np.where)
array([31, 32, 33, 34, 35])
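A sketch of np.where next to boolean indexing. It rebuilds r with np.arange for brevity (same values as on the card); mask, selected, capped, rows and cols are names introduced for illustration.

```python
import numpy as np

r = np.arange(36).reshape(6, 6)  # same values as the r on the card

mask = r > 30                   # boolean array with the same shape as r
selected = r[mask]              # 1-D array of the matching values
capped = np.where(mask, 30, r)  # new array: 30 where mask is True, else r
rows, cols = np.where(mask)     # one argument: indices of the True entries

print(selected)      # [31 32 33 34 35]
print(capped.max())  # 30
print(r.max())       # 35, since np.where did not modify r
```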
r = np.array([[ 0,  1,  2,  3,  4,  5],
              [ 6,  7,  8,  9, 10, 11],
              [12, 13, 14, 15, 16, 17],
              [18, 19, 20, 21, 22, 23],
              [24, 25, 26, 27, 28, 29],
              [30, 31, 32, 33, 34, 35]])
r[r > 30] = 30
r
Here we are assigning all values in the array that are greater than 30 to the value of 30.
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 30, 30, 30, 30, 30]])
r2[:] = 0
r2
Sets every value in r2 to zero ([:] selects the entire array).
If r2 was created by slicing r, it is a view of r's data, so r also changes in the sliced positions. Use r.copy() first if r must stay unchanged.
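A sketch of the view-versus-copy behaviour. The slice r[:3, :3] is a hypothetical choice, since the card does not show how r2 was made, and -1 is used instead of 0 so the change is visible at r[0, 0].

```python
import numpy as np

r = np.arange(36).reshape(6, 6)

r_copy = r.copy()    # independent copy of the data
r_copy[:] = 10       # does not touch r
unchanged = r[0, 0]  # still 0

r2 = r[:3, :3]       # hypothetical slice; basic slicing returns a view
r2[:] = -1           # writes through to r

print(unchanged)            # 0
print(r[0, 0], r[2, 2])     # -1 -1 (inside the sliced block)
print(r[3, 3])              # 21 (outside the sliced block)
```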
test = np.random.randint(0, 10, (4,3))
test
Create a new 4 by 3 array of random numbers 0-9.
array([[0, 8, 0],
[0, 5, 7],
[3, 7, 4],
[3, 4, 9]])
test = np.array([[0, 8, 0],
                 [0, 5, 7],
                 [3, 7, 4],
                 [3, 4, 9]])
for row in test:
    print(row)
Iterate by row:
Iterating over a 2-D array yields one row at a time, because iteration runs over the first axis.
[0 8 0]
[0 5 7]
[3 7 4]
[3 4 9]
test = np.array([[0, 8, 0],
                 [0, 5, 7],
                 [3, 7, 4],
                 [3, 4, 9]])
for i in range(len(test)):
    print(test[i])
Iterate by index:
[0 8 0]
[0 5 7]
[3 7 4]
[3 4 9]
test = np.array([[0, 8, 0],
                 [0, 5, 7],
                 [3, 7, 4],
                 [3, 4, 9]])
for i, row in enumerate(test):
    print('row', i, 'is', row)
Iterate by row and index:
row 0 is [0 8 0]
row 1 is [0 5 7]
row 2 is [3 7 4]
row 3 is [3 4 9]
test = np.array([[0, 8, 0],
                 [0, 5, 7],
                 [3, 7, 4],
                 [3, 4, 9]])
test2 = test**2
test2 prints as:
array([[ 0, 64,  0],
       [ 0, 25, 49],
       [ 9, 49, 16],
       [ 9, 16, 81]])
for i, j in zip(test, test2):
    print(i, '+', j, '=', i + j)
Use zip to iterate over multiple iterables.
[0 8 0] + [ 0 64  0] = [ 0 72  0]
[0 5 7] + [ 0 25 49] = [ 0 30 56]
[3 7 4] + [ 9 49 16] = [12 56 20]
[3 4 9] + [ 9 16 81] = [12 20 90]