Random Flashcards
Python: When scraping a list from a site,
remember to loop over the list items and append each item's text to a new Python list, then build the DataFrame from that list rather than passing the soup_page object to pandas.
Pandas: To reference a df column by its position rather than its name, type
df[df.columns[0]]
or
df.iloc[:, 0]
(df.columns[0] on its own returns only the column's name.)
Pandas: To filter a column by partial string, type
mask = df["Column name"].str.contains("string")
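A small sketch of the partial-string filter in use. The DataFrame contents and the column name "Campaign" are made up for illustration, and na=False is an optional extra that treats missing values as non-matches.

```python
import pandas as pd

# Hypothetical data, only for illustration.
df = pd.DataFrame({"Campaign": ["Brand search", "Generic search", "Display", None]})

# Boolean mask: True where the cell contains the substring.
# na=False makes missing values count as "no match" instead of NaN.
mask = df["Campaign"].str.contains("search", na=False)

# Keep only the rows where the mask is True.
filtered = df[mask]
print(filtered)
```

The mask can be inverted with ~mask to keep the rows that do not match.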
Selenium: To scroll to the bottom of a page, type
my_browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
To import SelectPercentile, type
from sklearn.feature_selection import SelectPercentile
To set the SelectPercentile percentile, type
SelectPercentile(percentile=20)
sklearn: To create a transformer that converts a column from integer to float, type
from sklearn.base import TransformerMixin
class MyTransformer(TransformerMixin):
    def fit(self, X, y=None, **fit_params):
        # Nothing to learn, so just return self
        return self
    def transform(self, X, **transform_params):
        # Cast the whole column at once; inside apply() the values are plain ints with no .astype
        X["Numeric"] = X["Numeric"].astype(float)
        return X
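A hedged sketch of how such a transformer might be dropped into a Pipeline. The class name IntToFloat, its column argument, and the sample data are assumptions for illustration; copying X before changing it is a design choice, not a requirement.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class IntToFloat(TransformerMixin, BaseEstimator):
    """Cast one DataFrame column to float (hypothetical helper)."""

    def __init__(self, column="Numeric"):
        self.column = column

    def fit(self, X, y=None):
        # Nothing to learn; return self so the step can be chained.
        return self

    def transform(self, X):
        X = X.copy()  # avoid mutating the caller's DataFrame
        X[self.column] = X[self.column].astype(float)
        return X


df = pd.DataFrame({"Numeric": [1, 2, 3]})
pipeline = Pipeline([("to_float", IntToFloat(column="Numeric"))])
result = pipeline.fit_transform(df)
print(result.dtypes)
```

Inheriting from BaseEstimator as well gives the step get_params and set_params, which grid search relies on.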
sklearn: To import gradient boosting, type
from sklearn.ensemble import GradientBoostingClassifier
Pandas: To set the columns on read_csv and on a new DataFrame use
read_csv: names=[]
DataFrame: columns=[]
pyautogui: To click somewhere based on a screenshot's center, type
pixel_x, pixel_y = pyautogui.locateCenterOnScreen("screenshot.png")
pyautogui.click(pixel_x, pixel_y)
pyautogui: To have a dialogue box pop up and confirm that you want to continue, type
pyautogui.confirm("Proceed?")
pyautogui: To find the pixel coordinates of the current mouse position, and then click them, type
current_x, current_y = pyautogui.position()
pyautogui.click(current_x, current_y)
pyautogui: To move the mouse, type
pyautogui.moveTo(100, 150)
pyautogui: To type characters, type
pyautogui.typewrite("My String", interval=0.25)
pyautogui: To take a screenshot and then save it, type
screenshot = pyautogui.screenshot()
screenshot.save("path/screenshot.png")
smtplib: To send a gmail email with an image attachment, type
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.image import MIMEImage
from email.mime.text import MIMEText
my_msg = MIMEMultipart()
my_msg["Subject"] = "My subject"
my_msg.attach(MIMEText("My body message text", "plain"))
with open("file/path.png", "rb") as fp:
    file = MIMEImage(fp.read())
my_msg.attach(file)
server = smtplib.SMTP("smtp.gmail.com:587")
server.ehlo()
server.starttls()
server.login("me@gmail.com", "password")
server.sendmail("from@gmail.com", ["to@gmail.com"], my_msg.as_string())
server.quit()
smtplib: To send a Gmail email with a CSV attachment, type
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.base import MIMEBase
from email.mime.text import MIMEText
from email import encoders
my_msg = MIMEMultipart()
my_msg["Subject"] = "My subject"
my_msg.attach(MIMEText("My body message text", "plain"))
with open("/path/filename.csv", "rb") as fp:
    file = MIMEBase("application", "octet-stream")
    file.set_payload(fp.read())
encoders.encode_base64(file)
# "attachment" is the standard disposition type for a file meant to be downloaded
file.add_header("Content-Disposition", "attachment", filename="filename.csv")
my_msg.attach(file)
server = smtplib.SMTP("smtp.gmail.com:587")
server.starttls()
server.login("me@gmail.com", "password")
server.sendmail("from@gmail.com", ["to@gmail.com"], my_msg.as_string())
server.quit()
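To check an attachment before sending anything, the message can be built and inspected locally. This is only a sketch: csv_bytes stands in for the contents of a real file, and no SMTP connection is made.

```python
from email import encoders
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Stand-in for the bytes that would be read from a real CSV file.
csv_bytes = b"name,clicks\nad1,10\n"

msg = MIMEMultipart()
msg["Subject"] = "My subject"
msg.attach(MIMEText("My body message text", "plain"))

part = MIMEBase("application", "octet-stream")
part.set_payload(csv_bytes)
encoders.encode_base64(part)  # base64-encode the payload for transport
part.add_header("Content-Disposition", "attachment", filename="filename.csv")
msg.attach(part)

# Walk the message tree and show each part's type and filename.
for p in msg.walk():
    print(p.get_content_type(), p.get_filename())
```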
Math: ROI is
net profit divided by cost: (revenue - cost) / cost. Revenue divided by cost on its own is ROAS (return on ad spend).
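A quick worked example, assuming the common definition of ROI as net profit relative to cost. Plain revenue divided by cost is usually reported as ROAS, so both are shown; the function names and figures are made up for illustration.

```python
def roi(revenue, cost):
    """Return on investment: net profit relative to cost."""
    return (revenue - cost) / cost


def roas(revenue, cost):
    """Return on ad spend: revenue relative to cost."""
    return revenue / cost


# Hypothetical figures: 150 earned on 100 spent.
print(roi(150.0, 100.0))   # 0.5, i.e. a 50% return
print(roas(150.0, 100.0))  # 1.5
```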
Python: To create a special method that returns a string when print is called on a class instance, type
class Myclass(Parentclass):
    def __str__(self):
        return "This string is returned when print(my_instance) is called"
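A self-contained version that can be run as is. The Campaign class and its name attribute are hypothetical.

```python
class Campaign:
    def __init__(self, name):
        self.name = name

    def __str__(self):
        # print() and str() both call this method.
        return f"Campaign: {self.name}"


my_instance = Campaign("Brand")
print(my_instance)  # Campaign: Brand
```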
Python: To create an __init__ method that prompts for an input upon instantiation, type
class Myclass:
    def __init__(self, **args):
        self.my_attribute = input("Prompt string")
Python: To create an __init__ method that calls a method which then prompts for the input, type
class Myclass:
    def __init__(self, **args):
        self.my_attribute = self.input_method()
    def input_method(self):
        my_attribute = input("Prompt string")
        return my_attribute
Python: To iterate over two columns together so that both values of a row are available in every iteration of a for loop, type
for item1, item2 in zip(df["column1"], df["column2"]):
    print(item1, item2)
Python: A generator expression is
like a list comprehension, but it produces its items lazily, one at a time, instead of building the whole list in memory. It can be passed straight into a function such as sum().
Python: To write a generator expression, type
(item for item in my_list if item > 5)
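A short sketch of a generator expression in use, with a made-up my_list. It also shows that a generator can only be consumed once.

```python
my_list = [2, 7, 4, 9, 5]

# Passed directly to a function: no intermediate list is built.
total = sum(item for item in my_list if item > 5)
print(total)  # 16

# A generator hands out items one at a time and is then used up.
gen = (item for item in my_list if item > 5)
first = next(gen)   # 7
rest = list(gen)    # [9]
empty = list(gen)   # [] because the generator is exhausted
print(first, rest, empty)
```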
Python: To iteratively replace a list of characters with spaces, type
for item in [".", "?", "!"]:
    text = text.replace(item, " ")
Python: To remove most of the HTML, styles and scripts from a page's source, type
soup_page = BeautifulSoup(page, "html.parser")
for script in soup_page.find_all(["script", "style"]):
    script.extract()
text = soup_page.get_text()
for item in [".", "?", "!", ",", " "]:
    text = text.replace(item, " ")
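Pandas: To append a row of data to an existing, empty df while also setting its column labels, type
df = pd.concat([df, pd.DataFrame([{"column1": "value", "column2": "value", "column3": "value"}])], ignore_index=True)
(DataFrame.append was removed in pandas 2.0; on older versions, df.append({...}, ignore_index=True) does the same.)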
sklearn: A confusion matrix is an
n x n matrix (2x2 for binary classification) whose rows are the actual classes and whose columns are the predicted classes. Each cell counts how many samples of an actual class were predicted as a given class, so it shows how many false positives and false negatives there are.
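A minimal sketch using sklearn.metrics.confusion_matrix on made-up labels. Passing labels fixes the row and column order; rows are the actual class and columns the predicted class.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels, only for illustration.
y_true = ["spam", "spam", "ham", "ham", "ham", "spam"]
y_pred = ["spam", "ham", "ham", "ham", "spam", "spam"]

# Row i = actual class labels[i], column j = predicted class labels[j].
matrix = confusion_matrix(y_true, y_pred, labels=["ham", "spam"])
print(matrix)
# [[2 1]   2 ham predicted ham, 1 ham predicted spam
#  [1 2]]  1 spam predicted ham, 2 spam predicted spam
```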
sklearn: For text data, it generally seems best to
not use SelectPercentile
use a stemmer
remove irrelevant symbols and characters
use TfidfVectorizer instead of CountVectorizer
sklearn: When choosing the training data it is pivotal to
not have mislabeled data
sklearn: When choosing the features, use
all the features you think have high information gain, and do whatever is necessary to get them into the dataset.
sklearn: It is very unlikely for a nearest-neighbors model (e.g. KNeighborsClassifier)
to outperform other models; if it does, it may be because the data contains duplicates.
Python: To open a file on Windows in its default application, type
os.system("start /file/path.csv")
or, without going through the shell,
os.startfile("/file/path.csv")
sklearn: A good starting configuration for GridSearchCV, if using DataFrameMapper, is
sklearn_pandas.GridSearchCV(pipeline, param_grid=param_grid, verbose=3, scoring="accuracy", cv=10)
sklearn: When data is missing, it can be useful to
impute the missing values using hints in the other columns, e.g. the title "Mr." is associated with older age.
sklearn: To GridSearchCV the parameters of a model nested in a pipeline, type
import sklearn_pandas
param_grid = {"setname__parameter": [10, 20, 30]}
grid_model = sklearn_pandas.GridSearchCV(pipeline, param_grid=param_grid, verbose=3, scoring="accuracy", cv=10)
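A runnable sketch of the stepname__parameter convention. It uses sklearn.model_selection.GridSearchCV as a stand-in for sklearn_pandas.GridSearchCV, and the iris dataset, the step names "scale" and "forest", and cv=5 (chosen to keep it quick) are all assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Each step gets a name; that name is the prefix used in param_grid.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("forest", RandomForestClassifier(random_state=0)),
])

# "<step name>__<parameter name>": list of values to try.
param_grid = {"forest__n_estimators": [10, 20, 30]}

grid_model = GridSearchCV(pipeline, param_grid=param_grid, scoring="accuracy", cv=5)
grid_model.fit(X, y)

print(grid_model.best_params_)
print(grid_model.best_score_)
```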
sklearn: To use sklearn_pandas.GridSearchCV, you cannot
have any custom transformer (I think)
adwords: In upgraded URLs, curly brackets around a name with a leading underscore in the tracking template mean
that it is a custom variable whose value must be assigned in one of the custom parameters.
adwords: Always export reports as
a normal CSV, not the Excel type
Pandas: To make the last column (here "Conversions") the first, type
df = df[["Conversions"] + [item for item in df.columns if item != "Conversions"]]
(reindex_axis was removed in pandas 1.0.)
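A sketch of the same reordering that does not hard-code the column name, on a made-up DataFrame. It selects the columns in the new order with plain indexing.

```python
import pandas as pd

# Hypothetical report data, only for illustration.
df = pd.DataFrame({"Clicks": [10, 20], "Cost": [1.5, 2.5], "Conversions": [1, 3]})

# Take the last column's name, then list it first followed by the others.
last = df.columns[-1]
df = df[[last] + [col for col in df.columns if col != last]]

print(df.columns.tolist())  # ['Conversions', 'Clicks', 'Cost']
```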
Python: To inherit from two parent classes, type
class Myclass(Parentclass1, Parentclass2):
GridSearchCV: To print all of the grid scores as GridSearchCV is working, type
verbose=3
sklearn_pandas.GridSearchCV(pipe, param_grid, verbose=3)
GridSearchCV: To choose the number of folds GridSearchCV creates, type
cv=10
sklearn_pandas.GridSearchCV(pipe, param_grid, verbose=3, scoring="accuracy", cv=10)
Python: To make a class that prompts for an input upon init and, if the input is not part of a list, asks for it again, type
class Myclass:
    def __init__(self, **args):
        self.my_attribute = self.get_attribute()
    def get_attribute(self):
        my_attribute = input("Attribute query?")
        # Ask again until the answer is one of the allowed values
        if my_attribute not in ["chosen attribute"]:
            return self.get_attribute()
        return my_attribute
Python: To perform addition on just one index of a list, type
my_list[2] += 1
or
my_list[2] = my_list[2] + 1
Python: DRY (Don't Repeat Yourself) means
grouping common operations into functions and common functionality into classes, so that nothing is written twice.
Python: To borrow classes from another python file in the same directory, type
from otherclassfile import Myotherclass
class Myclass(Myotherclass):
Python: To extend two classes, type
class Myclass(Otherclass, Otherclass2):
Python: To override a method that is inherited from a parent class, just
create a new method in the current class with the same name
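A minimal sketch of overriding, with hypothetical class and method names. super() is optional and is shown here only as a way to reuse the parent's version.

```python
class Parentclass:
    def greet(self):
        return "Hello from the parent"


class Myclass(Parentclass):
    # Same name as in the parent, so this version is used for Myclass instances.
    def greet(self):
        base = super().greet()  # still reachable through super()
        return base + ", extended by the child"


print(Parentclass().greet())
print(Myclass().greet())
```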
Python: Before inheriting a class make sure to
import it.
Python: Instance attributes of a class are normally set
in def __init__(self, **args): by assigning to self.attribute_name
Python: All classes implicitly extend from
the object class
Python: The format for a set is
{1,2,3}
Python: A set automatically
removes any duplicate values. It is unordered, so use sorted(my_set) if ascending order is needed.
Python: Can a set use a comprehension?
yes
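A short sketch of set behaviour on a made-up list: duplicates are dropped, a set comprehension uses curly brackets, and sorted() is the way to get a guaranteed order.

```python
values = [3, 1, 3, 2, 1]

# Duplicates are removed when a set is built.
unique = set(values)

# Set comprehension: like a list comprehension, but with curly brackets.
squares = {v * v for v in values}

# Sets have no guaranteed order; sorted() returns an ordered list.
ordered = sorted(unique)

print(unique == {1, 2, 3}, squares == {1, 4, 9}, ordered)
```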
Python: When a tuple is passed as a parameter into a class, it requires
its own parentheses, e.g. Myclass((1, 2))
Pandas: To set the maximum rows and columns to display, type
pandas.set_option("display.max_rows", 1000)
pandas.set_option("display.max_columns", 1000)
Python: A dependency is
an external module or package that the file you are running imports and needs in order to work.
Python: To sum a list, type
sum(my_list)
sklearn: For RandomForestClassifier, the parameters you should GridSearchCV are
n_estimators, max_features, max_depth, min_samples_leaf
sklearn: If the desired output consists of one or more continuous variables
the task is called regression.
sklearn: In a confusion matrix it is ideal for the
main diagonal (top left to bottom right) to have the highest numbers, because those cells count correct classifications.
sklearn: Recall is
the rate at which samples that are in fact a certain class get classified as that class. "This class only gets classified correctly x percent of the time."
Keyword: "in fact".
It falls as the false negatives for the class rise.
true positives / (false negatives + true positives)
sklearn: Precision is
the rate at which a classification, once made, is correct. "When a sample is classified as this class, we are x percent sure that it was classified correctly."
Keyword: "when a classification is made".
It falls as the false positives for the class rise, i.e. when samples of other classes get mistaken for this class.
true positives / (false positives + true positives)
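The two formulas can be checked by hand on a small made-up example. This sketch counts true positives, false positives and false negatives for class 1 in plain Python; the labels are hypothetical.

```python
# Hypothetical labels: 1 is the class of interest.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly found
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms

recall = tp / (tp + fn)     # of the samples that are in fact class 1
precision = tp / (tp + fp)  # of the samples classified as class 1

print(tp, fn, fp)   # 3 1 2
print(recall)       # 0.75
print(precision)    # 0.6
```

sklearn.metrics.recall_score and precision_score compute the same quantities from y_true and y_pred.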