Data Science Flashcards
dictionaries
A built-in data type, like list. Uses { } to define it.
Python's built-in dictionary type maps keys to values. Each key is unique and maps to exactly one value (several keys may share the same value). Dictionaries store key-value pairs and are indexed by key, not by position.
countries = {"afghanistan": 30.55, "albania": 2.77, "algeria": 40.33}
dict_name[key]
result: value
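A minimal sketch of lookups on the countries dictionary from this card. The "angola" entry, its value, and the "atlantis" key are made up for illustration.

```python
# Values keyed by country name, as on the card.
countries = {"afghanistan": 30.55, "albania": 2.77, "algeria": 40.33}

# Look up a value by its key.
albania_value = countries["albania"]

# Adding or updating an entry uses the same bracket syntax.
# "angola" and 25.0 are hypothetical, not from the card.
countries["angola"] = 25.0

# get() returns a default instead of raising KeyError for a missing key.
missing = countries.get("atlantis", None)

print(albania_value)
print("angola" in countries)
print(missing)
```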
pandas
A library that imports data (for example from a CSV file) into Python in tabular form, as a DataFrame.
loc in pandas
uses row/column labels for slicing/filtering
print(cars.loc[['AUS', 'EG']])
iloc in pandas
uses integer positions (0-based) for slicing/filtering
print(cars.iloc[2])
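A small sketch contrasting the two accessors. The cars DataFrame built here is a hypothetical stand-in, since the real data is not part of these cards.

```python
import pandas as pd

# Hypothetical stand-in for the cars DataFrame used on these cards.
cars = pd.DataFrame(
    {"country": ["United States", "Australia", "Egypt"],
     "cars_per_cap": [809, 731, 45]},
    index=["US", "AUS", "EG"],
)

# loc selects rows by label.
by_label = cars.loc[["AUS", "EG"]]

# iloc selects by integer position; position 2 is the third row.
by_position = cars.iloc[2]

print(by_label)
print(by_position)
```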
if statement
if condition:
    expression
boolean ops for NumPy arrays
use np.logical_and, np.logical_or, np.logical_not
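A short sketch of the element-wise boolean functions. The heights array and the thresholds are sample values chosen for illustration.

```python
import numpy as np

# Sample data for illustration.
heights = np.array([1.73, 1.68, 1.71, 1.89, 1.79])

# Plain `and` / `or` raise an error on arrays; np.logical_* work element-wise.
tall_or_short = np.logical_or(heights > 1.8, heights < 1.7)
mid_range = np.logical_and(heights >= 1.7, heights <= 1.8)
not_mid = np.logical_not(mid_range)

print(tall_or_short)
print(heights[mid_range])
```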
while loop
The while loop is like a repeated if statement: the code runs over and over for as long as the condition is True.
while condition:
    expression
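As a sketch, the loop below keeps halving a value until the condition becomes False. The starting value of 50.0 is arbitrary.

```python
# Halve an arbitrary starting value until it drops to 1 or below.
value = 50.0
steps = 0

while value > 1:
    value = value / 2
    steps += 1

print(steps, value)
```

If the body never changed value, the condition would stay True and the loop would never end.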
What is the difference between list and tuples?
Lists are mutable, i.e. they can be edited. Syntax: list_1 = [10, 'Chelsea', 20]
Tuples are immutable (a tuple is like a list that can't be edited). Syntax: tup_1 = (10, 'Chelsea', 20)
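A quick sketch of the difference, reusing list_1 and tup_1 from the card. The replacement value 99 is arbitrary.

```python
list_1 = [10, "Chelsea", 20]
tup_1 = (10, "Chelsea", 20)

# Item assignment works on a list.
list_1[0] = 99

# The same assignment on a tuple raises TypeError.
try:
    tup_1[0] = 99
    tuple_changed = True
except TypeError:
    tuple_changed = False

print(list_1)
print(tuple_changed)
```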
Explain Inheritance in Python with an example
Inheritance allows one class to gain all the members (attributes and methods) of another class. It promotes code reuse and makes an application easier to build and maintain. The class being inherited from is called the super-class (or parent class), and the class that inherits from it is called the derived or child class.
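The card asks for an example, so here is a minimal sketch. Animal, Dog, and their methods are hypothetical names chosen only to illustrate the idea.

```python
# Animal and Dog are hypothetical names chosen for illustration.
class Animal:
    def __init__(self, name):
        self.name = name

    def describe(self):
        return f"{self.name} is an animal"


class Dog(Animal):  # Dog is the child class, Animal the super-class
    def speak(self):
        return f"{self.name} says woof"


rex = Dog("Rex")
print(rex.describe())  # inherited from Animal
print(rex.speak())     # defined on Dog
```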
help() function
The help() function displays the documentation string of an object and also lets you view help for modules, keywords, attributes, etc.
dir() function
The dir() function lists the defined symbols: with no argument, the names in the current scope; with an argument, the attributes of that object.
What does this mean: *args, **kwargs?
We use *args when we aren't sure how many positional arguments will be passed to a function, or when we want to pass a stored list or tuple of arguments to a function. **kwargs is used when we don't know how many keyword arguments will be passed to a function, or to pass the items of a dictionary as keyword arguments.
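A brief sketch of both uses: collecting arguments in a definition and unpacking stored ones in a call. The function summarize and its arguments are hypothetical.

```python
# summarize is a hypothetical function used only to show the syntax.
def summarize(*args, **kwargs):
    # args is a tuple of positional arguments, kwargs a dict of keyword arguments.
    return len(args), sorted(kwargs)

direct = summarize(1, 2, 3, unit="cm", label="height")

# Unpack a stored list and dictionary into the call.
numbers = [1, 2, 3]
options = {"unit": "cm", "label": "height"}
unpacked = summarize(*numbers, **options)

print(direct)
print(unpacked)
```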
Write a one-liner that will count the number of capital letters in a file. Your code should work even if the file is too big to fit in memory.
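Start with a multi-line version that reads line by line, so the whole file never has to be in memory (calling fh.read() would load it all at once):
with open(SOME_LARGE_FILE) as fh:
    count = 0
    for line in fh:
        for character in line:
            if character.isupper():
                count += 1
We will now transform the loops into a single line (still inside the with block):
    count = sum(1 for line in fh for character in line if character.isupper())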
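A runnable sketch of the line-by-line count. The temporary file and its contents are assumptions made so the example is self-contained; a real file could be of any size.

```python
import os
import tempfile

# Write a small hypothetical file so the sketch is runnable end to end.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Hello World\nPython IS Fun\n")
    path = tmp.name

# Iterating over the file object yields one line at a time,
# so only one line is held in memory at any moment.
with open(path) as fh:
    count = sum(1 for line in fh for character in line if character.isupper())

os.remove(path)
print(count)
```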
How can you randomize the items of a list in place in Python?
from random import shuffle
x = ['Keep', 'The', 'Blue', 'Flag', 'Flying', 'High']
shuffle(x)
print(x)
# example output (the order varies from run to run): ['Flying', 'Keep', 'Blue', 'High', 'The', 'Flag']
Write a sorting algorithm for a numerical dataset in Python.
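nums = ["1", "4", "0", "6", "9"]
nums = [int(i) for i in nums]  # convert the strings to integers
nums.sort()
print(nums)
NOTE: if the list already held integers, you could just call nums.sort(),
but here we first had to convert the strings to integers. Avoid naming a variable list, since that shadows the built-in type.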
How can you generate random numbers in Python?
import random
random.random(): returns a random float in the range [0.0, 1.0).
random.randrange(a, b): returns a randomly selected integer from the range [a, b), i.e. a is included and b is excluded. It does not build a range object.
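A small sketch of both calls. The seed and the bounds 1 and 7 are arbitrary choices for illustration.

```python
import random

random.seed(0)  # seeding is optional; it only makes the run repeatable

x = random.random()          # float in [0.0, 1.0)
n = random.randrange(1, 7)   # integer in [1, 7): 1 through 6, like a die roll

print(x, n)
```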
What is pickling and unpickling?
The pickle module accepts most Python objects and converts them into a byte stream, which the dump function writes to a file; this process is called pickling. Retrieving the original Python object from the stored byte stream is called unpickling.
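A minimal round-trip sketch using the in-memory variants dumps and loads. The record dictionary is a made-up object.

```python
import pickle

# A hypothetical object to serialize.
record = {"name": "albania", "value": 2.77, "tags": ["a", "b"]}

# dumps() pickles to bytes in memory; dump() would write to an open binary file.
blob = pickle.dumps(record)

# loads() rebuilds an equal (but separate) object from the bytes.
restored = pickle.loads(blob)

print(type(blob).__name__)
print(restored == record)
```

Only unpickle data from a source you trust, since loading a pickle can execute arbitrary code.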
How do you calculate percentiles with Python/ NumPy?
import numpy as np
a = np.array([1,2,3,4,5])
p = np.percentile(a, 50)  # returns the 50th percentile, i.e. the median
print(p)
Suppose list1 is [2, 33, 222, 14, 25]. What is list1[-1]?
a) Error
b) None
c) 25
d) 2
Answer: c) 25 (index -1 refers to the last item)
how to parse strings into lists
l1 = s1.split(" ")
print(l1)
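A short sketch of split with different separators. The input strings are made up for illustration.

```python
# s1 is a hypothetical input string.
s1 = "the quick brown fox"

# Split on single spaces.
l1 = s1.split(" ")

# With no argument, split() splits on any run of whitespace.
l2 = "the   quick\tbrown fox".split()

# Other separators work the same way.
fields = "US,AUS,JAP".split(",")

print(l1)
print(l2)
print(fields)
```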
define functions and reuse them
def times(x, y):
    z = x * y
    return z
print(times(5,10))
iterating through dictionaries
for k in my_dict1.keys():
    print(k)
for k, v in my_dict1.items():
    print(k, v)
word = word + 1
shorthand: word += 1 (not =+, which would assign +1)
Saving the tweets to the csv file
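import csv
with open('recent_tweets_crest 100.csv', 'w', newline='', encoding='utf-8') as csvfile:
    wf = csv.writer(csvfile, delimiter=',', quotechar='"')
    for tweet in tweets:  # tweets: a list of dicts with a 'text' key, fetched earlier
        wf.writerow(['crest', tweet['text']])  # the file's encoding handles UTF-8, so no .encode() is needed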
numpy — how to use/code it
import numpy as np
# height and weight are lists defined beforehand
In [7]: np_height = np.array(height)
In [8]: np_height
Out[8]: array([ 1.73, 1.68, 1.71, 1.89, 1.79])
In [9]: np_weight = np.array(weight)
In [10]: np_weight
Out[10]: array([ 65.4, 59.2, 63.6, 88.4, 68.7])
In [11]: bmi = np_weight / np_height ** 2
In [12]: bmi
Out[12]: array([ 21.852, 20.975, 21.75 , 24.747, 21.441])
np1 = np.array([1, 'is', True])
NumPy arrays contain only one type, so mixed values are coerced to a common type (here, strings):
array(['1', 'is', 'True'], dtype=...)
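A sketch that checks the coercion directly. It inspects dtype.kind rather than the full dtype string, since the exact string width can differ between platforms.

```python
import numpy as np

# Mixing an int, a string, and a bool forces one common element type.
np1 = np.array([1, "is", True])

print(np1)
print(np1.dtype.kind)  # 'U' means a Unicode string dtype
```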
Adding two Python lists with + only concatenates them; adding two NumPy arrays adds them element by element.
python_list = [1, 2, 3]
numpy_array = np.array([1, 2, 3])
In [22]: python_list + python_list
Out[22]: [1, 2, 3, 1, 2, 3]
In [23]: numpy_array + numpy_array
Out[23]: array([2, 4, 6])
subsetting numpy. np2 = np.array([1, 2, 3, 4, 5]). find values > 3
np2 > 3
array([False, False, False, True, True], dtype=bool)
np2[np2>3]
array([4, 5])
subsetting 2d numpy
np2d[row][column] or, equivalently, np2d[row, column]
for numpy, [:] means?
everything in row/column
for numpy, [1:3] means?
includes only rows/columns 1 and 2 (the end index 3 is excluded)
numpy 2d structure
np2d = np.array([list1, list2])  # a list of equal-length lists, one per row
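A sketch of building and subsetting a 2D array, using the height and weight values shown earlier in these cards. The variable names on the left are illustrative.

```python
import numpy as np

# Row 0 holds heights, row 1 holds weights.
np2d = np.array([[1.73, 1.68, 1.71, 1.89, 1.79],
                 [65.4, 59.2, 63.6, 88.4, 68.7]])

print(np2d.shape)           # (rows, columns)

first_row = np2d[0]         # whole row 0
one_value = np2d[0, 2]      # row 0, column 2 (same as np2d[0][2])
cols_1_2 = np2d[:, 1:3]     # every row, columns 1 and 2
weights = np2d[1, :]        # row 1, every column

print(cols_1_2)
```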
arrays and lists count indexes as
0, 1, 2, 3, 4, etc. (zero-based)
how to change values in a list
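my_list = [1, 2, 3, 4, 5]
my_list[2] = '33'      # replace one item
my_list[:2] = [0, 0]   # replace a slice
how to add to a list
my_list.append(x) adds one item in place; my_list + [1, 2, 3] returns a new list with the extra items
how to delete a list or values in a list?
assign an empty list to a slice, or use del
my_list[:] = []    # removes every item
my_list[:2] = []   # removes the first two items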
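The list operations from the last three cards, run in sequence as a sketch. The list items and the values used are arbitrary.

```python
# Arbitrary list used to walk through each operation.
items = [1, 2, 3, 4, 5]

items[2] = 33            # replace one item        -> [1, 2, 33, 4, 5]
items[:2] = [0, 0]       # replace a slice         -> [0, 0, 33, 4, 5]
items.append(6)          # add one item in place   -> [0, 0, 33, 4, 5, 6]
longer = items + [7, 8]  # new list; items is unchanged

del items[0]             # delete by position      -> [0, 33, 4, 5, 6]
items[:2] = []           # delete a slice          -> [4, 5, 6]

print(items)
print(longer)
```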
opening and writing a file
f = open('file path', 'w')  # for example: open('c:/test/a.txt', 'w')
f.write("hey what's up")
f.close()  # always close files
reading file
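f = open('file path', 'r')  # for example: open('c:/test/a.txt', 'r')
f.read(n)  # n = how many characters to read (bytes in binary mode); omit n to read the whole file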
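A sketch of writing and then reading a file with the with statement, which closes the file automatically. The temporary path is an assumption so the example can run anywhere.

```python
import os
import tempfile

# Hypothetical file path inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "a.txt")

# The with statement closes the file automatically, even if an error occurs.
with open(path, "w") as f:
    f.write("hey what's up")

with open(path, "r") as f:
    first_three = f.read(3)   # read the first 3 characters
    rest = f.read()           # read everything that is left

print(first_three)
print(rest)
```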
import pandas as pd
pd.DataFrame
cars = pd.DataFrame(my_dict)
how to put row labels on pandas data frame
cars = pd.DataFrame(my_dict)
# Definition of row_labels
row_labels = ['US', 'AUS', 'JAP', 'IN', 'RU', 'MOR', 'EG']
# Specify row labels of cars
cars.index = row_labels
how to import csv into pandas
cars = pd.read_csv("cars.csv")
cars = pd.read_csv('cars.csv', index_col=0)  # use the first column as the row labels
slicing pandas
# Print out country column as Pandas Series
print(cars['country'])
# Print out country column as Pandas DataFrame
print(cars[['country']])
# Print out DataFrame with country and drives_right columns
print(cars[['country', 'drives_right']])
how to filter pandas
# Import cars data
import numpy as np
import pandas as pd
cars = pd.read_csv('cars.csv', index_col=0)
# Create car_maniac: observations that have a cars_per_cap over 500
cpc = cars["cars_per_cap"]
many_cars = cpc > 500
car_maniac = cars[many_cars]
# One-liner:
car_maniac = cars[cars["cars_per_cap"] > 500]
ANOTHER EXAMPLE
# Create medium: observations with cars_per_cap between 100 and 500
cpc = cars['cars_per_cap']
between = np.logical_and(cpc > 100, cpc < 500)
medium = cars[between]
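A self-contained sketch of both filters. The DataFrame below is a hypothetical stand-in for cars.csv, so the numbers are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for cars.csv.
cars = pd.DataFrame(
    {"cars_per_cap": [809, 731, 588, 18, 200, 70, 45],
     "country": ["United States", "Australia", "Japan", "India",
                 "Russia", "Morocco", "Egypt"]},
    index=["US", "AUS", "JAP", "IN", "RU", "MOR", "EG"],
)

# A comparison on a column yields a boolean Series used as a row mask.
car_maniac = cars[cars["cars_per_cap"] > 500]

# Combine two conditions element-wise.
cpc = cars["cars_per_cap"]
medium = cars[np.logical_and(cpc > 100, cpc < 500)]

print(car_maniac.index.tolist())
print(medium.index.tolist())
```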
for loop with 2 variables (index and value), use…
enumerate()
# areas list
areas = [11.25, 18.0, 20.0, 10.75, 9.50]
# Change for loop to use enumerate()
for x, y in enumerate(areas):
    print("room " + str(x) + ": " + str(y))
how to iterate through numpy arrays
np.nditer()
for val in np.nditer(my_array):
    print(val)
how to iterate through pandas dataframes
for lab, row in cars.iterrows():
    print(lab)
for lab, row in cars.iterrows():
    print(lab + ": " + str(row['cars_per_cap']))
how to add a column in dataframe
# Code for loop that adds COUNTRY column
for lab, row in cars.iterrows():
    cars.loc[lab, "COUNTRY"] = row["country"].upper()
apply function on dataframes
cars["COUNTRY"] = cars["country"].apply(str.upper)
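A sketch comparing the row-by-row loop with apply on a tiny, made-up DataFrame. The column name COUNTRY_LOOP is hypothetical and exists only to keep the two results side by side.

```python
import pandas as pd

# Hypothetical stand-in for the cars DataFrame.
cars = pd.DataFrame(
    {"country": ["United States", "Egypt"], "cars_per_cap": [809, 45]},
    index=["US", "EG"],
)

# Row-by-row version: iterrows() yields (label, row Series) pairs.
for lab, row in cars.iterrows():
    cars.loc[lab, "COUNTRY_LOOP"] = row["country"].upper()

# apply() version: calls str.upper on each value of the column.
cars["COUNTRY"] = cars["country"].apply(str.upper)

print(cars)
```

Both columns end up identical; apply is usually the shorter and faster way to write it.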
to get summary statistics (count, mean, std, min, quartiles including the median, max) of a DataFrame
df.describe()