3 Flashcards
VC: The most senior person in a VC firm is called a
managing director or general partner.
VC: The managing director or general partner
makes the final decision on which companies to invest in and sits on the board of directors of the companies they invest in.
HostGator: To allow remote SQL connections
Click “Remote MySQL” and enter the IP address of the computer you want to allow.
HostGator: To create a database, a new user, and give permissions
click “MySQL Database Wizard” and go through it.
HostGator: To upload a csv
Click on the desired db, then click on the Import tab. Select the csv. Choose the CSV file format, then, if necessary, tick the checkbox for “first line contains table columns”
DB: A simple DB is called
sqlite
Windows: To create a new file, type
echo.>file_name.py
Windows: To see current directory, type
echo %cd%
Windows: To change directory, type
cd c:/users/alen.solomon/desktop
Windows: To list all the files in current directory, type
dir /b
Pandas: Non-printable characters can cause a "ValueError: No columns to parse from file" and can be fixed by
adding the parameter encoding="utf-16" to read_csv
Adwords: When exporting a report from adwords, make sure not to use
Excel.csv format
Pandas: To choose a delimiter for read_csv, type
sep="\t"
Pandas: To turn an excel file to a df, type
xl = pandas.ExcelFile("path/to.xlsx")
df = xl.parse("Sheet name")
Pandas: To change column values with a dollar sign to a float, type
df["Cost"] = df["Cost"].apply(lambda x: str(x).strip("$")).astype(float)
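Note: a minimal sketch of the dollar-sign card above, using made-up sample data. The extra step that strips thousands separators is an assumption about how the values might look, not something the original card covers.

```python
import pandas

# Hypothetical sample data: "Cost" holds strings such as "$12.50".
df = pandas.DataFrame({"Cost": ["$12.50", "$3", "$1,200.00"]})

# Remove the dollar sign and (assumed) thousands separators, then convert.
df["Cost"] = (
    df["Cost"]
    .str.replace("$", "", regex=False)
    .str.replace(",", "", regex=False)
    .astype(float)
)
print(df["Cost"].tolist())
```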
Pandas: When you use df["Column name"].apply(my_function)
the function name does not need to end with ()
Pandas: To return only certain columns of a df in a set order, type
df = df.loc[:, ["Abbrev", "Jan", "Feb", "Mar", "Total"]]
Pandas: To replace all nans with something, type
df["Column name"] = df["Column name"].fillna("Replacement")
Pandas: To write a couple dfs to two sheets in excel, type
writer = pandas.ExcelWriter("/users/student/desktop/demo.xlsx", engine="xlsxwriter")
df.to_excel(writer, index=False, sheet_name="Sheet1")
df2.to_excel(writer, index=False, sheet_name="Sheet2")
writer.close()
Windows: To clear the cmd screen, type
CLS
Numpy: To turn a numpy array into a matrix, type
my_numpy_array.reshape(5,5)
Note: 5,5 are the dimensions (rows, columns), so the array must hold exactly 25 values
Numpy: To create a boolean index matrix from two numpy matrices, type
my_numpy_matrix
Numpy: To turn a numpy matrix back into an array, type
my_numpy_matrix.ravel()
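Note: a small sketch of the reshape and ravel cards above. The 25-element array is made up for illustration; reshape(5, 5) only works when the element count matches rows times columns.

```python
import numpy

# Hypothetical one-dimensional array with 25 values (0..24).
my_numpy_array = numpy.arange(25)

# Turn it into a 5 x 5 two-dimensional array.
my_numpy_matrix = my_numpy_array.reshape(5, 5)

# Flatten it back into one dimension.
flat_again = my_numpy_matrix.ravel()

print(my_numpy_matrix.shape, flat_again.shape)
```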
Selenium: To import selenium, type
from selenium import webdriver
from selenium.webdriver.common.by import By
Selenium: To send selenium to a site, type
my_browser.get("http://google.com")
Selenium: To instantiate Chrome, type
my_browser = webdriver.Chrome()
Selenium: To return the title tag, type
my_browser.title
Selenium: To find an element on a page by its id, type
my_element = my_browser.find_element(By.ID, "lst-ib")
Selenium: To type keys into a page element, type
my_browser.find_element(By.NAME, "aw").send_keys("Send these keys")
Selenium: To close the browser you instantiated, type
my_browser.quit()
Selenium: To submit in an element, type
my_element.submit()
Selenium: To simulate an arrow key press, type
from selenium.webdriver.common.keys import Keys
my_element.send_keys("my text", Keys.ARROW_DOWN)
Selenium: To empty a text field, type
my_browser.find_element(By.ID, "field").clear()
Selenium: To click a submit button, type
my_browser.find_element(By.ID, "submit").click()
Selenium: To select an element by its class name, type
my_browser.find_element(By.CLASS_NAME, "date-box")
HTML: The <select> tag is used for
Drop-down menus
Selenium: To find the first form on a page through its xpath, type
my_browser.find_element(By.XPATH, "//form[1]")
Note: the leading // searches the whole document at any depth, so the path does not need to spell out the html and body tags.
Selenium: An easier way to submit a form than finding the submit button and pressing it is
my_browser.find_element(By.ID, "pswrd").submit()
Selenium: To click a checkbox by its xpath, type
my_browser.find_element(By.XPATH, ".//*[@id='facebook']/body/div[2]/label[2]").click()
Note: Find xpath using firepath.
Selenium: The best way to find a checkbox is
xpath
Selenium: Sometimes to make an xpath work you need to
remove the [@id="js_5x"] from //*[@id="js_5x"]/div/ul/li[6]/a/span/span
Selenium: To get the xpath in chrome
Inspect element twice and then right click and click “copy xpath”
Selenium: When looking for the path for checkboxes or buttons, often
the label, or the outer ridge of button outside of the button text shows the correct path in inspect element rather than the button itself.
Selenium: When uploading a file do not
click on the upload button.
Selenium: To upload a file, type
my_browser.find_element(By.XPATH, "//*/body/div[6]/input").send_keys("/users/student/desktop/report.csv")
Selenium: To select a list item in a drop-down menu, type
from selenium.webdriver.support.select import Select
select = Select(my_browser.find_element(By.ID, "dropdownid"))
select.select_by_visible_text("Jan")
Selenium: To make the browser continue looking for a page element if it is not there for a set amount of time, type
my_browser.implicitly_wait(50)
Selenium: implicitly_wait(30) only has to be
set once and all tests for this browser will have the wait.
BS4: To parse a soup_page for an html tag with a certain parameter (e.g. css class name), type
soup_page.find_all("div", {"class": "name"})
BS4: To return a list of all the top level html tags contents separated, type
soup_page.contents
BS4: To return the first item in a list of all the top level html tags contents separated, type
soup_page.contents[0]
BS4: To return the first item in a list of all the top level html tags contents separated and then the first tags contents from within that list item, type
soup_page.contents[0].contents[0]
Python: To use a list comprehension to return a boolean index, type
my_list = [1,2,3,4,5,6,7,8]
[item>3 for item in my_list]
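Note: a short sketch of how the boolean list from the card above might be used to pick items out of the original list. The names mask and selected are illustrative only.

```python
my_list = [1, 2, 3, 4, 5, 6, 7, 8]

# One True/False per item: is the item greater than 3?
mask = [item > 3 for item in my_list]

# Pair each item with its flag and keep the items flagged True.
selected = [item for item, keep in zip(my_list, mask) if keep]

print(mask)
print(selected)
```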
IPYNB: To make a graph show up, type
%matplotlib inline
import matplotlib.pyplot as plt
df.plot(x="Column", y="Column2", kind="scatter")
Pandas: A series is like a
One dimensional array with an index.
Pandas: To set your own index to a Series or df, type
my_index_list = list(range(20))
my_series.index = my_index_list
Pandas: To slice a section of a Series by its index, type
my_series.loc[3:6]
Note: .loc slices by index label and includes the end label.
Pandas: In order to set a new index for a df or Series, the index list must
Be the same length as the df or Series.
BS4: To put the page source of a selenium page into a soup_page, type
from bs4 import BeautifulSoup
page = my_browser.page_source
soup_page = BeautifulSoup(page, "html.parser")
Pandas: To return the length and width of a df in a tuple, type
df.shape
Pandas: To see how many unique values are in a column, type
df["column name"].nunique()
Pandas: Even after you change the index to something other than 0 onward, it will
still allow you to slice by the index number
Pandas: To return specific indexes from a Series, type
my_series[[6, 9, 2]]
or
my_series[["A", "C", "G"]]
Pandas: If you change the index of a Series to new numbers, you cannot use the original zero-based positions unless you use
my_series.iloc[0: 5]
or
my_series.iloc[[4, 6, 9]]
Pandas: The difference between .iloc and .loc when the index is numerical but doesn't match the zero index is
.iloc uses the zero-based position while .loc uses the current index labels
Pandas: To check if any value in a Series passes a value test, type
(my_series > 50).any()
Pandas: To check if all values in a Series pass a value test, type
(my_series > 50).all()
Pandas: To sum how many values in a Series pass a value test, type
sum(my_series > 5)
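Note: a minimal sketch of the three value-test cards above, with made-up numbers. The comparison runs first and produces a boolean Series; any(), all() and sum() are then applied to that boolean Series rather than to the filtered values.

```python
import pandas

# Hypothetical sample values.
my_series = pandas.Series([10, 60, 80])

# Boolean Series: one True/False per value.
over_50 = my_series > 50

any_over = bool(over_50.any())    # at least one value passes
all_over = bool(over_50.all())    # every value passes
count_over = int(over_50.sum())   # how many values pass (True counts as 1)

print(any_over, all_over, count_over)
```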
Pandas: Can you do a step when slicing a Series
Yes
Pandas: df[“Campaign name”] is the data type
Series
Pandas: To create a copy of a df, type
df_copied = df.copy()
Pandas: To add the rows of one df to another that has the same columns, type
df_concated = pandas.concat([df, df2])
Pandas: With regard to nan values, pandas will
ignore them in most calculations (for example sum and mean skip them by default)
Pandas: When you concatenate two dfs the index will
be maintained, so if the dfs were both zero indexed, there will be duplicate index values.
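Note: a small sketch of the duplicate-index behaviour described above, using two made-up one-column dfs. Passing ignore_index=True is one way to get a fresh zero-based index on the result.

```python
import pandas

# Two hypothetical dfs, both zero indexed.
df = pandas.DataFrame({"Clicks": [1, 2]})
df2 = pandas.DataFrame({"Clicks": [3, 4]})

# Default: the original index values are kept, so 0 and 1 appear twice.
kept = pandas.concat([df, df2])

# ignore_index=True renumbers the rows from zero.
renumbered = pandas.concat([df, df2], ignore_index=True)

print(kept.index.tolist())
print(renumbered.index.tolist())
```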
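Pandas: To return the length of a df column, type
len(df["Column name"])
Note: df["Column name"].count() returns the number of non-nan values, which is smaller when the column contains nans.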
Pandas: Any action you can perform on a Series can also be performed on a
df["Column name"]
Pandas: To forward fill the na values in a Series, type
concated_df.ffill()
Pandas: To backwards fill the na values in a Series, type
concated_df.bfill()
Pandas: To replace the index of a concated df with one that is ordered, type
concated_df.index = range(len(concated_df))
Pandas: df["column 1"] + df["column 2"] is an
index-based arithmetic operation. Values are matched by index label, not by position, so if the indexes are not aligned the rows you expect will not be added together.
Pandas: When multiplying two Series, if there are many indices with the same value, pandas will
multiply every combination of the values and add new rows for each product.
Pandas: If you decide to create a DataFrame with lists rather than dicts, the column labels will
Just be set to zero index numbers.
Pandas: To name unnamed columns, type
df.columns = ["Column 1", "Column 2"]
Pandas: To filter for indexes in a pivot_table, type
pvt.query('Type == ["Banner", "Text"]')
Pandas: To drill down to a certain value in a certain level of a pivot_table, type
pvt.xs("Value name", level=0)
SQL: To select everything from a table, type
SELECT * FROM table
SQL: A database's structure is called its
schema
SQL: The three main types of data you can set a column to hold are
String, Numeric, Date and Time
SQL: The two types of string data are
Text and Varchar
SQL: Varchar string type is ideal for
Short strings, like names
SQL: Text string type is ideal for
Long strings, like descriptions
SQL: The numeric data types are
Integers, Fixed Point Decimal, Float
SQL: The fixed point data type
Sets a strict number of decimal places and is ideal for dollars
SQL: The float point data type
does not set a strict number of decimal places
SQL: The best data type to store a date and time together is
datetime
SQL: To create a table with one column that stores 50 characters, type
CREATE TABLE tablename (columnname VARCHAR(50));
SQL: To create a table with two columns, one that stores 50 characters, and another that store integers, type
CREATE TABLE tablename (columnname VARCHAR(50), columnname INTEGER);
SQL: To insert data into a row, type
INSERT INTO table VALUES ('String', 1000);
SQL: When inserting values into the DB, the values must
be in the same order you defined in the table.
SQL: Strings you insert must have
quotes
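Note: a minimal sketch that runs the CREATE TABLE, INSERT and SELECT cards above against an in-memory SQLite database through Python's built-in sqlite3 module. The table and column names are made up for illustration.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table: one 50-character string column, one integer column.
conn.execute("CREATE TABLE campaigns (name VARCHAR(50), clicks INTEGER)")

# Values go in the same order as the columns; the string is quoted.
conn.execute("INSERT INTO campaigns VALUES ('Spring sale', 1000)")

rows = conn.execute("SELECT * FROM campaigns").fetchall()
conn.close()

print(rows)
```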
Facebook: To create free banners, go to
www.picmonkey.com
Pep 8: Before and after a top level function put
two blank lines, not including #comments
Pep 8: After every comma and operator, put a
space
Pep 8: Import libraries
on separate lines and at the top with no lines in between
Pep 8: Class names should start with a
capital letter
Pep 8: If putting a comment on the same line as some code
precede it by two spaces
Pep 8: At the end of a file put
an empty line
PDB: To use python debugger, type
import pdb; pdb.set_trace() above the area you want to debug
PDB: To quit the python debugger, type
q
PDB: To see the return of every variable one by one in order to debug, use
python debugger
PDB: When finished debugging, make sure to
remove pdb
PDB: To run the next line of code and return the variable, type
next or n
Pandas: To turn a pivot_table or groupby back into a filterable table, type
pvt.reset_index()
Python: Every file that you create is a
module
Python: To import one specific class, type
from module_name import Class_name
Python: To create an instance of a class type
my_class_instance = Class_name()
Python: Functions that belong to classes are called
methods
Pandas: When importing a csv, to get pandas to recognize a date column, type
parse_dates=["Column name"]
Pandas: To merge two dfs but only keep the keys on the left df, type
df1.merge(df2, on="Column", how="left")
Pandas: To refer to a column by its index instead of its name, type
df.columns[1]
Pandas: When you receive a key error, double check for
Extra spaces
Pandas: To return a list of all the column labels, type
list(df.columns.values)
Python: To turn two lists into a dictionary, type
dict(zip(list1, list2))
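Note: a short sketch of the dict(zip(...)) card above with made-up lists. It also shows that zip stops at the shorter list, so unmatched keys are silently dropped.

```python
# Hypothetical sample lists.
list1 = ["a", "b", "c"]
list2 = [1, 2, 3]

# Pair the items up position by position and build a dictionary.
combined = dict(zip(list1, list2))

# When the lists differ in length, the extra items are ignored.
shorter = dict(zip(["a", "b", "c"], [1, 2]))

print(combined)
print(shorter)
```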
Python: To create an invisible command line input for a password, type
import getpass
pswd = getpass.getpass("Password:")
Selenium: To find an element by link text and click it, type
my_browser.find_element(By.LINK_TEXT, "Link text").click()
Selenium: To take a screenshot of the entire page you must
use firefox driver. chrome driver currently has a bug.
Selenium: The find elements By. types I should remember are
By.PARTIAL_LINK_TEXT By.XPATH By.NAME By.CLASS_NAME By.TAG_NAME By.ID
Selenium: Screenshots must be saved as a
png
Selenium: To refresh the current page, type
my_browser.refresh()
Selenium: To create an explicit wait condition, type
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
wait = WebDriverWait(my_browser, 20)
wait.until(EC.element_to_be_clickable((By.PARTIAL_LINK_TEXT, "Reports")))
Selenium: Regarding elements hidden by jquery, Selenium must
do what the user must do in order to make it visible.
Selenium: To find an element based on part of its anchor text, type
my_browser.find_element(By.PARTIAL_LINK_TEXT, "Link text")
Pandas: To convert a Series or column to floats, type
df["Column name"].astype(float)
Pandas: To create a new column with values that are conditional on other columns' values in the same row, type
df["New"] = numpy.where((df["Column"] > 35) | (df["Column 2"] == "Banner"), df["Column2"] * 0.8, df["Column4"])
Pandas: When using both the “or” and “and” conditions,
the conditions on both sides of “and” get evaluated together first, before the “or”
Pandas: To turn a column into a list, type
column_list = df["Column name"].tolist()
Pandas: To combine two columns into key value pairs in a dictionary, type
column_list = df["Column name"].tolist()
column_list2 = df["Column name2"].tolist()
combined_dict = dict(zip(column_list, column_list2))
Pandas: The purpose of inplace=True is to
change the df itself without having to reassign the return of the function to the same df variable name.
Pandas: To filter a df with a date index for just a month and year, type
df.loc["2014-11"]
Pandas: To place a df with inconsistent dates onto an index with all dates, type
df = pandas.DataFrame(index=pandas.date_range("2014-08-02", "2014-09-06", freq="D"))
df = df.join(inconsistent_df, how="outer")
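Note: a minimal sketch of placing a df with gaps in its dates onto a complete daily index, as the card above describes. The sample dates and the Clicks column are made up; days with no data come out as nan.

```python
import pandas

# Hypothetical df that only has data for two of four days.
inconsistent_df = pandas.DataFrame(
    {"Clicks": [5, 7]},
    index=pandas.to_datetime(["2014-08-02", "2014-08-04"]),
)

# An empty df whose index holds every day in the range.
all_dates = pandas.DataFrame(
    index=pandas.date_range("2014-08-02", "2014-08-05", freq="D")
)

# join matches on the indexes; the result must be assigned to keep it.
joined = all_dates.join(inconsistent_df, how="outer")

print(joined)
```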
Pandas: Join, by default, merges on the
indexes
Javascript: To refer to the current page, type
document
Javascript: To create an alert, type
alert("Hello!");
Javascript: To write an h1 into the current page, type
document.write("<h1>Hello!</h1>");
Javascript: Javascript files end with the file extension
.js
Javascript: To pull javascript code into a webpage from an external file, type
<script src="javascript.js"></script>
Javascript: To print to the console, type
console.log("My log");
Javascript: To create a variable with no value, type
var my_var;
Javascript: To create a variable with a value, type
var my_var = 25;
Javascript: Variable name cannot start with
a number
Javascript: To make a quote be just a quote and not end a string you can use an
escape character \ right before it
Javascript: To save a user input from a dialog box into a variable, type
var dialog_input = prompt("Text to display");
Javascript: To add a string and a variable using +, type
var concated_string = "Hello " + visitor;
Javascript: To update a variable that is referencing itself, type
var message = "Hello "; message = message + "Dave";
Javascript: To update a variable that is referencing itself at the beginning in a shorter way, type
var message = "Hello "; message += "Dave";
Javascript: To return the length of a string, type
my_string.length;
Javascript: To return the lower case of a string, type
my_string.toLowerCase();
Javascript: To return the upper case of a string, type
my_string.toUpperCase();
Pandas: To parse dates based on column location instead of column name, type
parse_dates=[0]
Pandas: To rename a column label by its index, type
df.columns = ["New label"] + list(df.columns[1:])
or
df = df.rename(columns={df.columns[0]: "New name"})
Pandas: To do a value test for nan, type
df["Column"].isnull()
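Note: a small sketch of testing a column for nan, with made-up data. nan never compares equal to anything, including itself, so an == comparison returns False on every row; isnull() is the check that actually finds the nan values.

```python
import numpy
import pandas

# Hypothetical column with one missing value.
df = pandas.DataFrame({"Column": [1.0, numpy.nan, 3.0]})

# Comparing with == never matches nan.
equals_nan = (df["Column"] == numpy.nan).tolist()

# isnull() flags the nan row.
is_nan = df["Column"].isnull().tolist()

print(equals_nan)
print(is_nan)
```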
Pandas: Pandas deals with nan values by
ignoring them in most calculations (for example sum and mean skip them by default)
Pandas: To turn all of the values in a df to integers, type
df = df.astype(int)
Pandas: To see a table of correlations, type
df.corr()
Pandas: To save an image of a matplotlib plot, type
import matplotlib.pyplot as plt
my_plot = df.plot(x="Column", y="Column2", kind="scatter")
my_plot.get_figure().savefig("/users/student/desktop/pii.png", bbox_inches="tight")
Pandas: The type of graph df.plot(x="A", y="B") returns is
line
Math: The X axis is
the bottom going horizontal
Matplotlib: The kinds of plots I should remember are
bar, scatter, pie, line
Python: When creating a big project, remember to
Use pep8, document everything starting with "This", create each part of the project in separate files for easy testing
Python: To slice the last 8 characters from a string, type
my_string[-8:]
Javascript: To turn a string to an integer, type
parseInt(var_string)
Javascript: To turn a string to a float, type
parseFloat(var_string)
Python: To round a number to the nearest whole number, type
round(1.3)
Python: To round a number to the nearest first decimal, type
round(1.326, 1)
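Note: a short sketch of the two round() cards above. It also shows one behaviour worth remembering: Python 3 rounds exact halves to the nearest even number, so round(2.5) gives 2 rather than 3.

```python
# Nearest whole number.
nearest_whole = round(1.3)

# Nearest first decimal.
one_decimal = round(1.326, 1)

# Exact halves go to the nearest even whole number.
half_even = [round(0.5), round(1.5), round(2.5)]

print(nearest_whole, one_decimal, half_even)
```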
Python: To trigger a python script from another python script, type
import os
os.system("python /users/student/desktop/script.py")
Python: To open a file in its default program, type
import subprocess
subprocess.Popen(["open", "/Applications/Calculator.app/"])
Pandas: To delete a column from a df, type
df.drop("column_name", axis=1, inplace=True)
Python: To open any file on a mac, type
import subprocess
subprocess.call(["open", "/Users/student/Desktop/file.app"])
Python: To open any file on Windows, type
import os
os.startfile("c:/filename/path")
Python: To delete a file from computer, type
import os
os.remove("/users/student/desktop/file.png")
Pandas: To read_csv for specific columns, type
usecols=range(1,7)
Pandas: The usecols argument for read_csv must be
explicit by either index or label, but not a slice
Pandas: To find the highest value in a series, type
my_series.max()
Pandas: The .apply method does not have
access to the index. Must use df.index.map() in the case where you must create a column based on the index.
Pandas: To reference a date_range index to create a new column with the day of the week, type
df["Day"] = df.index.map(lambda x: x.strftime("%A"))
Pandas: To add a row of two new values to the bottom of a df, type
df = pandas.concat([df, pandas.DataFrame([["value1", "value2"]], columns=df.columns)], ignore_index=True)
Pandas: To get the number of rows in each of a groupby’s keys, type
grouped.size()
ML: Supervised learning is when
The machine learns by labeled examples. The data sample you teach it with must be labeled and then it will start to predict.
ML: A common ratio of a training data set to a testing dataset is
67% train to 33% test.
ML: Naive bayes is useful for datasets with
less than 100k samples and text data
ML: For supervised learning, data sets must be
Labeled
Scraping: Remember before beginning the scraping loop to
create the variable that will hold the results and import BeautifulSoup
ML: Learning new classifiers as you receive data without doing a batch update is called
online learning
ML: When some features are missing it is known as
"missing features"
ML: In non-online cases, when new features become available, you would need to
re-fit the model based on a new batch.
ML: After you have a batch of samples you need to
Fit a model to it.
Pandas: When parsing the XL file sheet, the sheet name is
Case sensitive
Pandas: the to_csv command must be used on a
df, e.g. df.to_csv("path/to/file.csv")
Pandas: When using pandas.concat([df, new_rows], ignore_index=True), remember to
reassign the df to the result, e.g. df = pandas.concat([df, new_rows], ignore_index=True)
ML: A dimension is a
column
ML: In ML, every pixel of an image is its own
Dimension/column
sklearn: Before you can train a model, you need to
instantiate it.
sklearn: To instantiate a regression model, type
from sklearn.linear_model import LinearRegression
model = LinearRegression()
sklearn: To pass an estimator your data you must call the method
model.fit(x, y)
sklearn: The x and y parameters in the model.fit(x, y) method take in
x = the samples-by-features data, y = the labels
sklearn: Given a trained model, to predict the label of a new data sample by highest probability, type
model.predict([[feature_value, feature_value, feature_value]])
sklearn: To return the probability that a new data sample has each label, type
model.predict_proba([[feature_value, feature_value, feature_value]])
ML: To use categorical data, you must
Binarize it into separate columns.
This is because models assume that if it is in the same column that it is a range, like length or height.
sklearn: The LinearRegression model
fits a line through the data and, given x, returns the predicted y.
sklearn: The DecisionTreeRegressor
Learns a tree of if/else splits on the feature values and predicts the average target value of the training samples that fall in the same leaf.
sklearn: A classification task is when
You give a model samples with features and have it predict the label.
sklearn: Parameters defined by training have a
trailing underscore
sklearn: The basic format for a ML prediction generator is
from sklearn.chosenlibrary import ChosenClass
model = ChosenClass()
x = samplesbyfeaturesdata
y = targetlabels
model.fit(x, y)
model.predict([[feature_val, feature_val2, feature_val3]])
sklearn: clf stands for
classifier
sklearn: To import the cross validation module (model_selection), type
from sklearn import model_selection
sklearn: To fit a model and then validate it on a held-out test split, type
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.chosenlibrary import ChosenClass
x = samplebyfeaturedata
y = targetlabels
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)
clf = ChosenClass()
clf.fit(X_train, y_train)
pred = clf.predict(X_test)
accuracy_score(y_test, pred)
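Note: a runnable sketch of the split, fit, predict and score steps above. The synthetic data from make_classification and the choice of GaussianNB are assumptions made purely so the example has something to run on; any classifier with fit and predict would slot in the same way. The model is fitted on the training split only, so the test split stays unseen until scoring.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in data: 200 samples, 4 features, 2 classes.
x, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Hold back 25% of the samples for testing.
X_train, X_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=0
)

clf = GaussianNB()
clf.fit(X_train, y_train)      # train on the training split only

pred = clf.predict(X_test)     # predict labels for the unseen samples
accuracy = accuracy_score(y_test, pred)

print(accuracy)
```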
SQL: Database normalization usually involves
dividing large tables into smaller (and less redundant) tables and defining relationships between them.
Numpy: To see the number of rows and columns in a numpy array
my_numpy_array.shape
Numpy: To convert a pandas DataFrame to a numpy array, type
my_numpy_array = numpy.array(df)
ML: Natural language processing is
Training a computer to understand the meaning of words.
ML: A decision surface is
a boundary on a scatter plot that divides two different types of data
sklearn: To return the accuracy of a prediction against the test labels, type
from sklearn.metrics import accuracy_score
pred = clf.predict(features_test)
accuracy_score(labels_test, pred)
or
clf.score(features_test, labels_test)
sklearn: To import naive bayes, type
from sklearn.naive_bayes import GaussianNB
Pandas: To return just the fourth column of a df, type
df.iloc[:, 3:4]
sklearn: Does not accept values of the type
string
sklearn: To import and instantiate a decision tree classifier, type
from sklearn import tree
clf = tree.DecisionTreeClassifier()
DecisionTreeClassifier: The DecisionTreeClassifier segments the data in a
blocky and slicy manner.
DecisionTreeClassifier: The min_samples_split parameter controls
The minimum number of data samples a node must contain before the tree is allowed to split it further.
DecisionTreeClassifier: Entropy is
a measure of the impurity of the data samples in a node. Splits are chosen to reduce impurity as much as possible.
DecisionTreeClassifier: The default value of the criterion parameter, which selects the impurity measure, is
"gini"
DecisionTreeClassifier: DecisionTreeClassifier is prone to
overfitting
KNeighborsClassifier: The KNeighborsClassifier predicts by
a simple majority vote of the nearest neighbors of each test point
NaiveBayes: NaiveBayes classifies by using
the probability of each feature value given each class, treating every feature independently. It does not group features.
Numpy: To convert a two dimensional numpy array to a one dimensional numpy array, type
my_numpy_array.ravel()
Pandas: To parse dates where the day, month and year are in separate columns, type
parse_dates={"Dates": ["Day", "Month", "Year"]}
Pandas: To bin data from a column in a new column, type
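ranges = [0, 6, 12, 18, 24]
labels = ["AM", "AM", "PM", "PM"]
df["New column"] = pandas.cut(df["column name"], ranges, labels=labels, ordered=False)
Note: there must be one label per bin (one fewer than the number of range edges), and repeated labels require ordered=False.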
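Note: a minimal sketch of binning with pandas.cut, using a made-up Hour column. Five range edges define four bins, so four labels are supplied, and ordered=False is passed because the labels repeat. By default each bin includes its right edge, so 12 falls in the 6 to 12 bin.

```python
import pandas

# Hypothetical hours of the day.
df = pandas.DataFrame({"Hour": [3, 12, 13, 23]})

ranges = [0, 6, 12, 18, 24]          # edges of four bins
labels = ["AM", "AM", "PM", "PM"]    # one label per bin

# ordered=False allows the same label to be used for more than one bin.
df["Half"] = pandas.cut(df["Hour"], ranges, labels=labels, ordered=False)

print(df)
```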
Pandas: To save a csv without the index, type
df.to_csv("path.csv", index=False)
Pandas: To make a specific cell a nan value, type
df.iloc[1,1] = numpy.nan
Pandas: To remove rows with nan values, type
df = df.dropna()
Pandas: To convert an excel file to a df, do not try to
save the excel file as a csv and then use read_csv; it can raise a unicode error
Pandas: You cannot df.drop() the
header
Pandas: To change the header to another row in the df, type
df.columns = df.iloc[1]
Python: To return the last index where the substring is found, type
my_string.rfind("substring")
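Note: a short sketch of rfind next to find, using a made-up file name. find returns the first position of the substring, rfind the last, and both return -1 when the substring is absent.

```python
# Hypothetical string that contains "2014" twice.
my_string = "report_2014_final_2014.csv"

first = my_string.find("2014")      # position of the first match
last = my_string.rfind("2014")      # position of the last match
missing = my_string.rfind("2015")   # -1 when there is no match

print(first, last, missing)
```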
Pandas: When getting a "NoneType is not …" error using .apply(lambda x: x), use
str(x) or int(x) inside the lambda to convert the values first
Pandas: iterrows() returns
a tuple of the index and the row data
When creating automation scripts always add some
logging
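Note: a minimal sketch of adding logging to an automation script with Python's built-in logging module. The logger name, the message format and the run_step helper are all hypothetical and only stand in for whatever the real script does.

```python
import logging

# Timestamped messages at INFO level and above.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
logger = logging.getLogger("automation_script")  # hypothetical name


def run_step(name):
    """Hypothetical step: log when it starts and when it finishes."""
    logger.info("Starting step: %s", name)
    # ... the real work of the step would go here ...
    logger.info("Finished step: %s", name)
    return True


run_step("download report")
```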
note: Not the same character every time
Python: To make a one line if else statement, type
"true value" if 10 == 10 else "false_value"
Python: Think of a virtual env as
an environment you are running your script with, not a place you are saving your scripts.