Data Science Intro Flashcards
cylinders = set(d['cyl'] for d in mpg)
Use set to return the unique values for the number of cylinders the cars in our dataset have.
sum(float(d['hwy']) for d in mpg) / len(mpg)
This is how to find the average hwy fuel economy across all cars.
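A minimal sketch of the two cards above, assuming a tiny hand-made sample in place of mpg.csv. The `sample` rows are made up for illustration and are not taken from the dataset; the values are strings because that is what csv.DictReader returns.

```python
# Hypothetical stand-in for the mpg list; csv.DictReader yields string values.
sample = [
    {'cyl': '4', 'hwy': '29'},
    {'cyl': '4', 'hwy': '31'},
    {'cyl': '6', 'hwy': '26'},
]

# A set keeps only one copy of each value.
cylinders = set(d['cyl'] for d in sample)

# Convert each string to float before summing, then divide by the row count.
avg_hwy = sum(float(d['hwy']) for d in sample) / len(sample)

print(sorted(cylinders))  # ['4', '6']
print(avg_hwy)
```

The float() call matters: summing the raw strings would raise a TypeError.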
len(mpg) - mpg is the name of a list of dictionaries, one per row of the csv file.
csv.DictReader has read in each row of our csv file as a dictionary. len shows that our list contains 234 dictionaries.
import csv
%precision 2
with open('mpg.csv') as csvfile: mpg = list(csv.DictReader(csvfile))
mpg[:3] # The first three dictionaries in our list.
Reads the csv file and makes a list of dictionaries named mpg.
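A short sketch of what csv.DictReader produces, assuming a two-row CSV held in memory rather than the real mpg.csv. The `csv_text` contents and the `rows` name are invented for this example.

```python
import csv
import io

# Hypothetical two-row CSV kept in memory instead of reading mpg.csv.
csv_text = "manufacturer,cyl,hwy\naudi,4,29\nford,8,17\n"

with io.StringIO(csv_text) as csvfile:
    # The header row becomes the dictionary keys for every following row.
    rows = list(csv.DictReader(csvfile))

print(len(rows))             # 2
print(list(rows[0].keys()))  # ['manufacturer', 'cyl', 'hwy']
print(rows[0]['hwy'])        # 29 (a string, not a number)
```

Every value comes back as a string, which is why the other cards wrap values in float() before doing arithmetic.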
What will the output be?
sales_record = {
    'price': 3.24,
    'num_items': 4,
    'person': 'Chris'}
sales_statement = '{} bought {} item(s) at a price of {} each for a total of {}'
print(sales_statement.format(sales_record['person'],
                             sales_record['num_items'],
                             sales_record['price'],
                             sales_record['num_items']*sales_record['price']))
Chris bought 4 item(s) at a price of 3.24 each for a total of 12.96
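As a variation on the card above, str.format also accepts named placeholders and format specifiers. This is a sketch of an alternative, not the card's own approach; the `total`, `statement`, and `line` names are introduced here for illustration.

```python
sales_record = {'price': 3.24, 'num_items': 4, 'person': 'Chris'}
total = sales_record['num_items'] * sales_record['price']

# Named fields are filled from keyword arguments; :.2f fixes two decimal places.
statement = ('{person} bought {num_items} item(s) at a price of '
             '{price:.2f} each for a total of {total:.2f}')
line = statement.format(total=total, **sales_record)

print(line)
# Chris bought 4 item(s) at a price of 3.24 each for a total of 12.96
```

The :.2f specifier keeps currency-style output stable even when float arithmetic produces long decimals.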
x = ('Christopher', 'Brooks', 'brooksch@umich.edu')
fname, lname, email = x
print(fname)
Christopher
Tuple format?
my_tuple = ("Hi", "Dave", 4)
List format?
my_list = ["hi", 4, 2, "Dave"]
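One practical difference between the two formats is mutability. A small sketch, assuming the same example values as the cards; the `changed` flag is only there to record what happened.

```python
my_tuple = ("Hi", "Dave", 4)
my_list = ["hi", 4, 2, "Dave"]

# Lists can be changed in place.
my_list.append("new")

# Tuples cannot: item assignment raises TypeError.
try:
    my_tuple[0] = "Bye"
    changed = True
except TypeError:
    changed = False

print(len(my_list))  # 5
print(changed)       # False
```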
x = {'Christopher Brooks': 'brooksch@umich.edu', 'Bill Gates': 'billg@microsoft.com'}
x['Christopher Brooks']
Retrieve a value by using the indexing operator
'brooksch@umich.edu'
CtyMpgByCyl = []
for c in cylinders: # iterate over all the cylinder levels
    summpg = 0
    cyltypecount = 0
    for d in mpg: # iterate over all dictionaries
        if d['cyl'] == c: # if the cylinder level type matches,
            summpg += float(d['cty']) # add the cty mpg
            cyltypecount += 1 # increment the count
    CtyMpgByCyl.append((c, summpg / cyltypecount)) # append the tuple ('cylinder', 'avg mpg')
CtyMpgByCyl.sort(key=lambda x: x[0])
CtyMpgByCyl
Computes the average city mpg for each cylinder count.
The lambda key sorts CtyMpgByCyl by the first element of each tuple (the cylinder count).
[('4', 21.01), ('5', 20.50), ('6', 16.22), ('8', 12.57)]
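The nested loop above rescans the whole list once per cylinder level. A possible single-pass alternative, sketched with collections.defaultdict; this swaps in a different technique from the card's, and the `sample`, `totals`, `counts`, and `cty_mpg_by_cyl` names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for the mpg list.
sample = [
    {'cyl': '4', 'cty': '20'},
    {'cyl': '4', 'cty': '22'},
    {'cyl': '6', 'cty': '16'},
    {'cyl': '8', 'cty': '12'},
    {'cyl': '8', 'cty': '14'},
]

totals = defaultdict(float)  # cylinder level -> running sum of cty mpg
counts = defaultdict(int)    # cylinder level -> number of cars seen

# One pass over the rows, updating both tallies per row.
for d in sample:
    totals[d['cyl']] += float(d['cty'])
    counts[d['cyl']] += 1

# Build (cylinder, average) tuples, sorted by cylinder level.
cty_mpg_by_cyl = sorted((c, totals[c] / counts[c]) for c in totals)
print(cty_mpg_by_cyl)  # [('4', 21.0), ('6', 16.0), ('8', 13.0)]
```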
vehicleclass = set(d['class'] for d in mpg)
vehicleclass
What are the class types? Only show me one of each
{'2seater', 'compact', 'midsize', 'minivan', 'pickup', 'subcompact', 'suv'}
HwyMpgByClass = []
for t in vehicleclass: # iterate over all the vehicle classes
    summpg = 0
    vclasscount = 0
    for d in mpg: # iterate over all dictionaries
        if d['class'] == t: # if the vehicle class matches,
            summpg += float(d['hwy']) # add the hwy mpg
            vclasscount += 1 # increment the count
    HwyMpgByClass.append((t, summpg / vclasscount)) # append the tuple ('class', 'avg mpg')
HwyMpgByClass.sort(key=lambda x: x[1])
HwyMpgByClass
Example of how to find the average hwy mpg for each class of vehicle in our dataset, sorted from lowest to highest average.
[('pickup', 16.88), ('suv', 18.13), ('minivan', 22.36), ('2seater', 24.80), ('midsize', 27.29), ('subcompact', 28.14), ('compact', 28.30)]
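The key argument decides which tuple element drives the ordering. A brief sketch, assuming three made-up (class, mpg) pairs; `pairs`, `by_name`, and `by_mpg` are illustrative names.

```python
# Hypothetical (class, average mpg) pairs.
pairs = [('compact', 28.3), ('pickup', 16.9), ('suv', 18.1)]

# x[0] sorts alphabetically by class name.
by_name = sorted(pairs, key=lambda x: x[0])

# x[1] sorts numerically by the average, lowest first.
by_mpg = sorted(pairs, key=lambda x: x[1])

print(by_name)  # [('compact', 28.3), ('pickup', 16.9), ('suv', 18.1)]
print(by_mpg)   # [('pickup', 16.9), ('suv', 18.1), ('compact', 28.3)]
```

sorted returns a new list, while list.sort (used on the cards) reorders the list in place.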
import datetime as dt
import time as tm
tm.time()
time returns the current time in seconds since the epoch (January 1, 1970, UTC).
import datetime as dt
import time as tm
dtnow = dt.datetime.fromtimestamp(tm.time())
dtnow
Convert the timestamp to datetime.
dtnow.year, dtnow.month, dtnow.day, dtnow.hour, dtnow.minute, dtnow.second
Get the year, month, day, etc. from a datetime.
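Because tm.time() changes on every run, a fixed timestamp makes the conversion easier to check. A sketch under two assumptions not on the cards: the timestamp 1,000,000,000 is chosen arbitrarily, and tz=dt.timezone.utc is passed so the result does not depend on the local time zone.

```python
import datetime as dt

ts = 1_000_000_000  # fixed timestamp (seconds since the epoch) for a repeatable result

# Passing tz makes the result timezone-aware and independent of the local zone.
moment = dt.datetime.fromtimestamp(ts, tz=dt.timezone.utc)

print(moment.year, moment.month, moment.day)      # 2001 9 9
print(moment.hour, moment.minute, moment.second)  # 1 46 40
```

Without the tz argument, fromtimestamp returns local time, so the hour can differ between machines.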
delta = dt.timedelta(days = 100) # create a timedelta of 100 days
delta
timedelta is a duration expressing the difference between two dates.
delta = dt.timedelta(days = 100)
today = dt.date.today()
today - delta
Returns the date 100 days before today. The output depends on the day the code is run, for example:
datetime.date(2016, 8, 13)
today > today-delta
Compare dates with the usual comparison operators.
Returns True
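A reproducible sketch of the date arithmetic, assuming a fixed date in place of dt.date.today(). The date 2016-11-21 is picked here because subtracting 100 days from it gives the 2016-08-13 shown on the card; `gap` is an illustrative name.

```python
import datetime as dt

today = dt.date(2016, 11, 21)    # fixed stand-in for dt.date.today()
delta = dt.timedelta(days=100)

earlier = today - delta          # date minus timedelta gives a date
gap = today - earlier            # date minus date gives a timedelta

print(earlier)          # 2016-08-13
print(gap.days)         # 100
print(today > earlier)  # True
```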
store1 = [10.00, 11.00, 12.34, 2.34]
store2 = [9.00, 11.10, 12.34, 2.01]
cheapest = map(min, store1, store2)
cheapest
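Pairs up the items of store1 and store2 and takes the lower value of each pair. In Python 3, map returns a lazy iterator rather than a list, so use list(cheapest) to see the values.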
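A short sketch of how the map object behaves in Python 3, using the same two lists; `cheapest_list` and `leftover` are names added for this example.

```python
store1 = [10.00, 11.00, 12.34, 2.34]
store2 = [9.00, 11.10, 12.34, 2.01]

# map calls min(a, b) on each pair, but only when the result is consumed.
cheapest = map(min, store1, store2)

cheapest_list = list(cheapest)
print(cheapest_list)  # [9.0, 11.0, 12.34, 2.01]

# The iterator is now exhausted, so a second pass yields nothing.
leftover = list(cheapest)
print(leftover)  # []
```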
my_function = lambda a, b, c : a + b
my_function(1, 2, 3)
Here’s an example of lambda that takes in three parameters and adds the first two.
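For comparison, the same behaviour written as a regular function. This is only a sketch; `add_first_two` is a hypothetical name chosen here.

```python
my_function = lambda a, b, c: a + b

def add_first_two(a, b, c):
    """Same behaviour as my_function: c is accepted but ignored."""
    return a + b

print(my_function(1, 2, 3))    # 3
print(add_first_two(1, 2, 3))  # 3
```

A lambda is limited to a single expression, so anything needing statements or a docstring is usually written with def.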
my_list = []
for number in range(0, 1000):
    if number % 2 == 0:
        my_list.append(number)
my_list
appends even numbers in range
my_list = [number for number in range(0,1000) if number % 2 == 0]
my_list
Shorthand (list comprehension) version of:
my_list = []
for number in range(0, 1000):
    if number % 2 == 0:
        my_list.append(number)
my_list
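A list comprehension can also transform each item while filtering. A small sketch on a shorter range so the result is easy to read; `even_squares` and `looped` are illustrative names.

```python
# Keep the even numbers below 10 and square each one.
even_squares = [number ** 2 for number in range(0, 10) if number % 2 == 0]

# The equivalent loop, for comparison.
looped = []
for number in range(0, 10):
    if number % 2 == 0:
        looped.append(number ** 2)

print(even_squares)            # [0, 4, 16, 36, 64]
print(even_squares == looped)  # True
```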
import numpy as np
m = np.array([[7, 8, 9], [10, 11, 12]]) # create a 2-D array with numpy
m.shape
Use the shape attribute to find the dimensions of the array. (rows, columns)
(2, 3)
n = np.arange(0, 30, 2)
n
arange returns evenly spaced values within a given interval.
Start at 0, count up by 2, stop before 30.
array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28])
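A brief sketch tying arange and shape together, assuming NumPy is installed. The reshape step and the `grid` name are additions for illustration and do not appear on the cards.

```python
import numpy as np

n = np.arange(0, 30, 2)   # 15 values: 0, 2, ..., 28
print(n.shape)            # (15,)  one dimension of length 15

# Rearrange the same 15 values into 3 rows and 5 columns.
grid = n.reshape(3, 5)
print(grid.shape)         # (3, 5)
print(grid[1, 0])         # 10  first value of the second row
```

reshape only works when the new dimensions multiply to the original number of elements, here 3 * 5 = 15.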