random Flashcards

1
Q

Python: When scraping a list from a site

A

Remember to loop over each list item and collect its text in a new list; do not pass the soup_page object itself to pandas.

2
Q

Pandas: To reference a df column by its index rather than its name, type

A

df[df.columns[0]] or df.iloc[:, 0] (df.columns[0] on its own returns only the column's name)

3
Q

Pandas: To filter a column by partial string, type

A

mask = df["Column name"].str.contains("string")

4
Q

Selenium: To scroll to the bottom of a page, type

A

my_browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")

5
Q

To import SelectPercentile, type

A

from sklearn.feature_selection import SelectPercentile

6
Q

To set the SelectPercentile percentile, type

A

SelectPercentile(percentile=20)

7
Q

sklearn: To create a transformer that turns a column from an integer to a float

A

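from sklearn.base import TransformerMixin

class MyTransformer(TransformerMixin):

    def fit(self, X, y=None, **fit_params):
        # Nothing to learn; return self so the transformer can be chained.
        return self

    def transform(self, X, **transform_params):
        # Work on a copy so the caller's DataFrame is not modified.
        X = X.copy()
        # astype converts the whole column at once.
        X["Numeric"] = X["Numeric"].astype(float)
        return X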
8
Q

sklearn: To import gradient boosting, type

A

from sklearn.ensemble import GradientBoostingClassifier

9
Q

Pandas: To set the columns on read_csv and on a new DataFrame use

A

read_csv: names=[]
DataFrame: columns=[]

10
Q

pyautogui: To click somewhere based on a screenshot's center, type

A

pixel_x, pixel_y = pyautogui.locateCenterOnScreen("screenshot.png")
pyautogui.click(pixel_x, pixel_y)

11
Q

pyautogui: To have a dialogue box pop up and confirm that you want to continue, type

A

pyautogui.confirm("Proceed?")

12
Q

pyautogui: To find the pixel coordinates of the current mouse position, and then click them, type

A

current_x, current_y = pyautogui.position()

pyautogui.click(current_x, current_y)

13
Q

pyautogui: To move the mouse, type

A

pyautogui.moveTo(100, 150)

14
Q

pyautogui: To type characters, type

A

pyautogui.typewrite("My String", interval=0.25)

15
Q

pyautogui: To take a screenshot and then save it, type

A

screenshot = pyautogui.screenshot()

screenshot.save("path/screenshot.png")

16
Q

smtplib: To send a gmail email with an image attachment, type

A

import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.image import MIMEImage
from email.mime.text import MIMEText

my_msg = MIMEMultipart()
my_msg["Subject"] = "My subject"
my_msg.attach(MIMEText("My body message text", "plain"))

# Read the image in binary mode; the with block closes the file.
with open("file/path.png", "rb") as fp:
    file = MIMEImage(fp.read())
my_msg.attach(file)

server = smtplib.SMTP("smtp.gmail.com", 587)

server.ehlo()
server.starttls()
server.login("me@gmail.com", "password")
server.sendmail("from@gmail.com", ["to@gmail.com"], my_msg.as_string())
server.quit()

17
Q

smtplib: To send a gmail email with a csv attachment, type

A
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.base import MIMEBase
from email.mime.text import MIMEText
from email import encoders

my_msg = MIMEMultipart()
my_msg["Subject"] = "My subject"
my_msg.attach(MIMEText("My body message text", "plain"))

# Read the csv in binary mode; the with block closes the file.
with open("/path/filename.csv", "rb") as fp:
    file = MIMEBase("application", "octet-stream")
    file.set_payload(fp.read())
encoders.encode_base64(file)
# "attachment" tells the mail client to offer this part as a download.
file.add_header("Content-Disposition", "attachment", filename="filename.csv")
my_msg.attach(file)

server = smtplib.SMTP("smtp.gmail.com", 587)

server.starttls()
server.login("me@gmail.com", "password")
server.sendmail("from@gmail.com", ["to@gmail.com"], my_msg.as_string())
server.quit()

18
Q

Math: ROI is

A

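net profit divided by cost: (revenue - cost) / cost. Revenue divided by cost on its own is a revenue-to-cost ratio (ROAS in advertising), not ROI.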
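
A quick worked example may help. ROI is usually defined as net profit divided by cost, (revenue - cost) / cost. The function name `roi` and the figures below are illustrative assumptions, not part of the original card.

```python
def roi(revenue, cost):
    """Return ROI as a fraction: net profit divided by cost."""
    if cost == 0:
        raise ValueError("cost must be non-zero")
    return (revenue - cost) / cost


# Spending 100 to earn 150 leaves a net profit of 50.
example_roi = roi(150, 100)
print(example_roi)  # 0.5, i.e. a 50% return
```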

19
Q

Python: To create a special method that returns a string when print is called on a class instance, type

A
class Myclass(Parentclass):
    def __str__(self):
        return "This string is returned when print(my_instance) is called"
20
Q

Python: To create an __init__ method that prompts an input upon instantiation, type

A
class Myclass:
    def __init__(self, **args):
        self.my_attribute = input("Prompt string")
21
Q

Python: To create an __init__ method that calls a method that then prompts for input, type

A
class Myclass:
    def __init__(self, **args):
        self.my_attribute = self.input_method()
    def input_method(self):
        my_attribute = input("Prompt string")
        return my_attribute
22
Q

Python: To combine two columns together so their rows are both available in every iteration of a for loop, type

A

for item1, item2 in zip(df["column1"], df["column2"]):
    print(item1, item2)

23
Q

Python: A generator expression is

A

like a list comprehension, but it yields its items lazily, one at a time, instead of building the whole list in memory, so it can be passed straight into a function such as sum().

24
Q

Python: To write a generator expression, type

A

(item for item in my_list if item >5)
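
To make the difference from a list comprehension concrete, here is a minimal sketch. The sample values in `my_list` are made up for illustration.

```python
my_list = [3, 8, 1, 12, 5, 7]

# A list comprehension builds the whole list in memory first.
as_list = [item for item in my_list if item > 5]

# A generator expression yields matching items one at a time.
as_generator = (item for item in my_list if item > 5)

# When it is the only argument, the extra parentheses can be dropped.
total = sum(item for item in my_list if item > 5)

print(as_list)             # [8, 12, 7]
print(total)               # 27
print(list(as_generator))  # [8, 12, 7]
print(list(as_generator))  # [] because the generator is already exhausted
```

The last line is the main catch: a generator can only be consumed once.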

25
Python: To iteratively replace a list of characters with spaces, type
for item in [".", "?", "!"]:
    text = text.replace(item, " ")
26
Python: To remove most of the html, style and scripts from a page's source, type
soup_page = BeautifulSoup(page, "html.parser")
for script in soup_page.find_all(["script", "style"]):
    script.extract()
text = soup_page.get_text()
for item in [".", "?", "!", ",", " "]:
    text = text.replace(item, " ")
27
Pandas: To append a row of data to an existing df that is empty while also setting its column labels, type
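df = pandas.concat([df, pandas.DataFrame([{"column1":"value", "column2":"value", "column3":"value"}])], ignore_index=True)
(DataFrame.append was removed in pandas 2.0; on older versions df = df.append({...}, ignore_index=True) does the same thing.)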
28
sklearn: A confusion matrix is a
square matrix (2x2 for binary classification, n x n for n classes) whose rows are the actual class and whose columns are the predicted class. Each cell counts how many samples of an actual class received a given prediction, so it shows how many false positives and false negatives there are.
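
The layout can be sketched without sklearn. The labels and predictions below are made-up data; `sklearn.metrics.confusion_matrix` uses the same rows-are-actual, columns-are-predicted convention.

```python
from collections import Counter

actual = ["spam", "spam", "ham", "ham", "ham", "spam"]
predicted = ["spam", "ham", "ham", "ham", "spam", "spam"]

labels = ["ham", "spam"]

# Count each (actual, predicted) pair once.
counts = Counter(zip(actual, predicted))

# Rows are the actual class, columns are the predicted class.
matrix = [[counts[(a, p)] for p in labels] for a in labels]

for label, row in zip(labels, matrix):
    print(label, row)
# ham [2, 1]
# spam [1, 2]
```

The diagonal (2 and 2) holds the correct predictions; the off-diagonal cells are the mistakes.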
29
sklearn: Generally it seems for text it is best to
not use SelectPercentile; use a stemmer; remove irrelevant symbols and characters; use TfidfVectorizer instead of CountVectorizer.
30
sklearn: When choosing the training data it is pivotal to
not have mislabeled data
31
sklearn: When choosing the features, use
all the features you think have high information gain, and do whatever is necessary to get them into the dataset.
32
sklearn: It is very unlikely for NearestNeighbors
to outperform other models, and if it does it may be because the data has duplicates
33
Python: To open a file on windows in its default application, type
os.startfile(r"C:\file\path.csv")
34
sklearn: The default settings for GridSearchCV should be
*if using DataFrameMapper*
sklearn_pandas.GridSearchCV(pipeline, param_grid=param_grid, verbose=3, scoring="accuracy", cv=10)
35
sklearn: When data is missing, it can be useful to
impute the data based on hints in the other columns, e.g. the title "Mr." is associated with older age.
36
sklearn: To GridSearchCV the parameters of a model nested in a pipeline, type
import sklearn_pandas
param_grid = {"setname__parameter": [10, 20, 30]}
grid_model = sklearn_pandas.GridSearchCV(pipeline, param_grid=param_grid, verbose=3, scoring="accuracy", cv=10)
37
sklearn: To use sklearn_pandas.GridSearchCV, you cannot
have any custom transformer (I think)
38
adwords: In upgraded URLs, curly brackets with an underscore in the tracking template mean
that it is a variable name that must be assigned in one of the custom parameters.
39
adwords: Always export reports as
a normal CSV, not the Excel type
40
Pandas: To make the last column the first, type
df = df.reindex(columns=["Conversions"] + [item for item in df.columns if item != "Conversions"])
(reindex_axis was removed in pandas 1.0; reindex(columns=...) replaces it.)
41
Python: To inherit from two parent classes, type
class Myclass(Parentclass1, Parentclass2):
42
GridSearchCV: To print all of the grid scores as GridSearchCV is working, type
verbose=3
sklearn_pandas.GridSearchCV(pipe, param_grid, verbose=3)
43
GridSearchCV: To choose the number of folds GridSearchCV creates, type
cv=10
sklearn_pandas.GridSearchCV(pipe, param_grid, verbose=3, scoring="accuracy", cv=10)
44
Python: To make a class that prompts for an input upon init and if it is not part of a list, it asks for the input again, type
class Myclass:
    def __init__(self, **args):
        self.my_attribute = self.get_attribute()

    def get_attribute(self):
        # Keep asking until the answer is one of the allowed values.
        my_attribute = input("Attribute query?")
        while my_attribute not in ["chosen attribute"]:
            my_attribute = input("Attribute query?")
        return my_attribute
45
Python: To perform addition on just one index of a list, type
my_list[2] += 1 or my_list[2] = my_list[2] +1
46
Python: DRY means
Don't Repeat Yourself: group common operations into functions and common functionality into classes.
47
Python: To borrow classes from another python file in the same directory, type
from otherclassfile import Myotherclass
class Myclass(Myotherclass):
48
Python: To extend two classes, type
class Myclass(Otherclass, Otherclass2):
49
Python: To override a function that is inherited from a class just
create a new function in the current class with the same name
50
Python: Before inheriting a class make sure to
import it.
51
Python: Instance attributes of a class are normally set
in def __init__(self, **args):
52
Python: All classes implicitly extend from
the object class
53
Python: The format for a set is
{1,2,3}
54
Python: A set automatically
removes any duplicate values. A set is unordered, so do not rely on the elements coming back sorted; use sorted(my_set) when order matters.
55
Python: Can a set use a comprehension?
yes
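
A minimal sketch of both set behaviours, deduplication and comprehensions. The `words` list is made-up sample data.

```python
words = ["pear", "apple", "pear", "fig", "apple"]

# Building a set drops the duplicates.
unique_words = set(words)

# A set comprehension uses curly braces around a single expression.
lengths = {len(word) for word in words}

# Sets have no guaranteed order, so sort explicitly when order matters.
print(sorted(unique_words))  # ['apple', 'fig', 'pear']
print(sorted(lengths))       # [3, 4, 5]
```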
56
Python: When a tuple is a parameter into a class it requires
its own parentheses
57
Pandas: To set the maximum rows and columns to display, type
pandas.set_option("display.max_rows", 1000)
pandas.set_option("display.max_columns", 1000)
58
Python: A dependency is
an external module or package that must be imported into the file you are running.
59
Python: To sum a list, type
sum(my_list)
60
sklearn: for RandomForestClassifier, the parameters you should GridSearchCV are
n_estimators, max_features, max_depth, min_samples_leaf
61
sklearn: If the desired output consists of one or more continuous variables
the task is called regression.
62
sklearn: In a confusion matrix it is ideal for the
main diagonal (top left to bottom right) to have the highest numbers, because those cells count the correct classifications.
63
sklearn: Recall is
When a sample is in fact a certain class, how often it is classified correctly.
"This class only gets classified correctly x percent of the time."
Low recall means many false negatives for the class.
true positives / (true positives + false negatives)
64
sklearn: Precision is
When a classification is made for a class, how often it is correct.
"When a classification is made for this sample, we are x percent sure that it was made correctly."
Low precision means samples of other classes often get mistaken for this class (many false positives).
true positives / (true positives + false positives)
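
The two formulas can be checked with a few lines of plain Python. The counts below are hypothetical numbers for a single class.

```python
def precision(tp, fp):
    """Of everything predicted as this class, the share that was right."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Of everything that really is this class, the share that was found."""
    return tp / (tp + fn)


# Hypothetical counts: 6 true positives, 2 false positives, 6 false negatives.
tp, fp, fn = 6, 2, 6

print(precision(tp, fp))  # 0.75
print(recall(tp, fn))     # 0.5
```

Here the classifier is usually right when it picks the class (precision 0.75) but misses half of the real members (recall 0.5).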
65
sklearn: When using an unbalanced dataset with many samples of one class and few of another, it is better to use the evaluation metrics of
precision, recall, or f1 score (which is both) on a class by class basis.
66
sklearn: f1 score is a combination of
precision score and recall score (their harmonic mean)
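
As a sketch, F1 is the harmonic mean of precision and recall: 2 * precision * recall / (precision + recall). The helper name `f1` and the inputs are illustrative.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(f1(0.75, 0.5))  # 0.6
print(f1(1.0, 0.0))   # 0.0
```

The harmonic mean is pulled towards the lower of the two values, so a model cannot score well by being strong on only one of them.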
67
GridSearchCV: To use GridSearchCV with the goal metric as f1 type
scoring="f1"
68
udacity: To predict the time of arrival for at&t techs, udacity
binned the times into sections of the day and used a NearestNeighbors classifier to predict based on locality.
69
ML: A spurious attribute is an attribute that
should not have any bearing on the label, so any feature with an information gain lower than the spurious attribute's can be ignored.
70
ML: A plot where the prediction line is very wiggly suggests
overfitting
71
ML: The definition of p-value is
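the probability, assuming the null hypothesis is true, of seeing a result at least as extreme as the one actually observed.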
72
Stats: Stratified sampling is when
The population is grouped by a characteristic, and then a number of samples is pulled from each group to represent it.
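
A minimal sketch of stratified sampling with the standard library. The function name `stratified_sample`, its parameters, and the sample data are all hypothetical, and it assumes every group has at least `per_group` members.

```python
import random
from collections import defaultdict


def stratified_sample(population, key, per_group, seed=0):
    """Group the population by key(member), then draw per_group from each group."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for member in population:
        groups[key(member)].append(member)

    sample = []
    for name in sorted(groups):
        # Sample without replacement inside each group.
        sample.extend(rng.sample(groups[name], per_group))
    return sample


people = [("Ann", "north"), ("Bob", "north"), ("Cat", "north"),
          ("Dan", "south"), ("Eve", "south"), ("Fay", "south")]

picked = stratified_sample(people, key=lambda person: person[1], per_group=2)
print(picked)
```

Every region is represented by exactly two people, whichever individuals happen to be drawn.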
73
Stats: Cluster sampling is when
the population is divided into groups (clusters), a random selection of whole clusters is chosen, and samples are pulled only from the chosen clusters.
74
Stats: Simple random sampling is when
all of the samples are pooled together and chosen at random, each with an equal chance of being picked (with or without being returned to the pool after each draw).
75
Console: Vim is a... and the command is
A text editor and the command is: vi my_file.py
76
Python: To sort a list in place, type
my_list.sort()
77
Python: To reverse the elements of a list in place, type
my_list.reverse()
78
Python: To count the occurrences of a value in a list, type
my_list.count("value")
79
Python: To apply a lambda function to all of a list's items instead of using a list comprehension, type
list(map(lambda x: x*2, my_list))
80
Python: The map function returns
a lazy map object (an iterator), not a list; wrap it in list() to get a list.
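
A short sketch of what that means in practice, using a made-up `my_list`:

```python
my_list = [1, 2, 3]

# map() does no work yet; it returns an iterator.
doubled = map(lambda x: x * 2, my_list)

print(type(doubled).__name__)    # map
print(list(doubled))             # [2, 4, 6]
print(list(doubled))             # [] because the iterator is used up

# The equivalent list comprehension builds the list directly.
print([x * 2 for x in my_list])  # [2, 4, 6]
```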
81
Python: To return a list of all of the entries (files and folders) in a directory, type
os.listdir("/users/student/desktop")
82
Python: When I see myself looping and appending, I should question whether
a list comprehension assigned to a variable would do.
83
sklearn: In order to optimize GridSearchCV towards f1, recall or precision, you must
either make the labels binary (1 and 0) or, for more than two classes, pick an averaged scorer such as scoring="f1_macro".
84
sklearn: To make GridSearchCV run faster, add
n_jobs=-1 to the parameters
85
sklearn: To save the best GridSearchCV params to a variable, type
best_parameters = grid_search.best_params_
86
Python: If you edit a class, make sure to
re-instantiate the instance afterwards so it can take on the new attributes.
87
Pandas: df["column"].str.contains("string") is
case sensitive by default; pass case=False to ignore case.
88
sklearn: To reattach the predictions to the samples, type
df["Prediction"] = pandas.Series(model_grid.predict(my_transformer(df_features)))
89
Selenium: to return the current url, type
my_browser.current_url
90
sklearn: for small datasets a classifier that can work well is
LinearSVC
91
sklearn: To return the best params and best score from a grid search, type
model_grid.best_params_
model_grid.best_score_
92
sklearn: To return the best params from a grid search, type
model_grid.best_params_
93
numpy: To transpose a numpy array, type
my_array = np.array([[1, 2], [3, 4]])
my_array.T
array([[1, 3],
       [2, 4]])
94
numpy: To merge two sets of columns, type
numpy.concatenate((a, b), axis=1)
95
marketing: A burst is
usually incentivized traffic for a short time.
96
marketing: Incentivized traffic is usually
a sign up or download in exchange for a bribe, like in-game currency.
97
HTTP stands for
HyperText Transfer Protocol
98
HyperText is
text with links in it
99
Transfer protocol is
rules for getting data from one place to another
100
REST API stands for
Representational State Transfer
101
A stateless API means that
all information necessary to respond to a request is available in each individual request; no data, or state, is held by the server from request to request
102
axis=1 means
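the column axis: the operation runs across the columns, i.e. once per row (axis=0 runs down the rows, once per column).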
103
numpy: to check the data type of an array's elements, type
my_numpy_array.dtype
104
mysql: To delete a table, type
DROP TABLE tablename;
105
mysql: To delete multiple tables , type
DROP TABLE tablename1, tablename2;
106
mysql: To insert multiple rows into a table together, type
INSERT INTO tablename VALUES ("String 1", "String 2"), ("String 1", "String 2");
107
mysql: Strings that you are inserting into a table must be
surrounded by quotes
108
re: To test if a regex matches a string in the python interpreter, type
import re
re.match(r'^org/(?P<name>\w+)/$', 'org/companyA')
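
A runnable sketch of the same kind of check. The group name `company` and both test strings are assumptions chosen for illustration; `re.match` returns a match object on success and `None` otherwise.

```python
import re

# "company" is a hypothetical group name used only for this example.
pattern = re.compile(r"^org/(?P<company>\w+)/$")

hit = pattern.match("org/companyA/")
miss = pattern.match("org/companyA")

print(hit.group("company"))  # companyA
print(miss)                  # None, because the trailing slash is missing
```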
109
re: To create the variable that will be parseable by re from a txt file, type
import re
with open("my_file.txt", encoding="utf-8") as file:
    data = file.read()
110
python: To chain multiple ands and ors into an if statement, type
if (True and True) and (False or True) or (False and False):
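
When chaining, it helps to remember that `and` binds more tightly than `or`. A small sketch (the variable names are illustrative):

```python
# Without parentheses this reads as: True or (False and False).
without_parens = True or False and False

# Parentheses change the grouping and the result.
with_parens = (True or False) and False

# The expression from the card.
card = (True and True) and (False or True) or (False and False)

print(without_parens)  # True
print(with_parens)     # False
print(card)            # True
```

Adding parentheses even where they are not strictly needed makes the intended grouping obvious.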
111
python: What does this print? if True and False: print("Hi")
nothing
112
python: What does this print? if True or False: print("Hi")
Hi
113
python: To turn ["A"] into ["A", "A", "A", "A"], type
["A"] * 4
114
pandas: For plots to display in the notebook, type
%matplotlib inline
115
Pandas: To change a column's name, use the command
rename, not replace, e.g. df.rename(columns={"old name": "new name"})
116
Excel: When doing vlookup, set approx match to
0
117
Pandas: to remove the index when sending df to string, type
df.to_string(index=False)
118
flask: In order to use a file from the templates directory in a view function, type
from flask import render_template
```
@app.route("/")
def my_view_function():
    return render_template("file.html")
```
119
flask: To open spots in the html template that are variable from the view, you need to
put {{ var_name }} in the template, and pass the variable into render_template, like: return render_template("file.html", var_name=var_name)
120
flask: when creating views remember to
set defaults for the variables that are supposed to be passed in.
121
flask: {{ var_name }} is used in
templates to pull a variable into it from the view.
122
flask: all html files must go in
the templates directory
123
flask: the symbols for variable and blocks are
variable: {{ var_name }}
block: {% block my_block %}{% endblock %}
124
flask: the html file you are extending must
have quotes around it
125
flask: After extending from layout.html and removing the repeated html, pages look like
{% extends "layout.html" %}
{% block title %}{{ super() }} My Title Tag{% endblock %}
{% block body_content %}

This is the content of my body

{% endblock %}
126
flask: To have a views route redirect you to another view function, type
from flask import redirect
from flask import url_for
```
@app.route("/save")
def save():
    return redirect(url_for("view_function"))
```
127
flask: to make a view function only allow post methods to access it, type
@app.route("/save", methods=["POST"])
128
flask: to access the form data POSTed into a view, type
request.form
129
flask: To set a cookie by instantiating a make_response object, type
import json
from flask import make_response, redirect, request, url_for

@app.route("/save", methods=["POST"])
def save_view():
    # url_for takes the name of a view function, not a template file.
    response = make_response(redirect(url_for("index")))
    response.set_cookie("cookie_name", json.dumps(dict(request.form.items())))
    return response
130
flask: To create a form whose action is to send a POST request to a view (which is later made to set a cookie), type
{% block my_body %}
<form action="{{ url_for("save") }}" method="POST">
  <label>Form title</label>
  <input type="text" name="name" value="" autofocus>
  <input type="submit" value="default!">
</form>
{% endblock %}
131
flask: To be able to accept a form POST request, you must first
import request in the app file: from flask import request
132
flask: In flask the cookie is set upon
the response to the browser
133
flask: When setting a cookie with a POST request from a form, the value of the cookie becomes
a dict with the key as the name from the name parameter in the form, and the value as the value inputted into the form.
134
flask: Cookies set on a browser have both
a name (which you give) and a value which is a dict with the name and value from the form field.
135
flask: To create a view meant for setting a cookie, type
```
def get_cookies():
    try:
        cookie = json.loads(request.cookies.get("cookie1"))
    except (TypeError, ValueError):
        # No cookie yet, or its value is not valid JSON.
        cookie = {}
    return cookie
```
@app.route("/save", methods=["POST"])
def save():
    response = make_response(redirect(url_for("index")))
    cookie = get_cookies()
    cookie.update(dict(request.form.items()))
    response.set_cookie("cookie1", json.dumps(cookie))
    return response
136
flask: To make the default value of a form the value from a cookie, type
Create the function that returns the cookie in dict format.
```
def get_saved_data():
    try:
        data = json.loads(request.cookies.get("cookie name"))
    except (TypeError, ValueError):
        data = {}
    return data
```
Pass the cookie dict into the template.
```
@app.route("/")
def index():
    data = get_saved_data()
    return render_template("index.html", data=data)
```
Set the value in the form.
137
flask: In the view that just received a POST request, to get all the form keys and values, type
request.form.items()
138
flask: To use a for loop in a flask template, type
{% for item in my_list %}
<li><h2>{{ item }}</h2></li>
{% endfor %}
139
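python: When you are inheriting from two parent classes, the order matters because
Python looks up methods from left to right, so the first parent listed takes precedence. List mixins first and the main base class last.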
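
The effect of parent order can be seen directly from the method resolution order. The class names `Loud` and `Quiet` are made up for this sketch.

```python
class Loud:
    def speak(self):
        return "LOUD"


class Quiet:
    def speak(self):
        return "quiet"


# Loud is listed first, so its speak() is found first.
class Myclass(Loud, Quiet):
    pass


print(Myclass().speak())                          # LOUD
print([cls.__name__ for cls in Myclass.__mro__])  # ['Myclass', 'Loud', 'Quiet', 'object']
```

Swapping the parents to `Myclass(Quiet, Loud)` would make `speak()` return "quiet" instead.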
140
flask: The three main files that go in the directory are
app.py, templates/, static/
141
flask: In a flask app the css file lives in static/ and is linked from the template with
<link rel="stylesheet" href="../static/styles.css">
142
flask: To create a form meant for file uploads, type
<form action="" method="POST" enctype="multipart/form-data">
  <input type="file" value="value" name="name">
</form>
143
flask: When I reference images from within an html view, it assumes
I am already in my static directory, so I can reference the files directly without changing levels.
144
flask: I must keep all static files without exception in
static/
145
flask: To upload a file, type
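```
import os
from flask import Flask, request, redirect, url_for
from werkzeug.utils import secure_filename

ALLOWED_EXTENSIONS = set(['txt', 'pdf', 'png', 'jpg', 'jpeg', 'gif'])

app = Flask(__name__)
app.config['UPLOAD_FOLDER'] = '/home/alpalalpal/mysite/static'

def allowed_file(filename):
    # Assumed helper (not shown on the original card): accept only listed extensions.
    return '.' in filename and filename.rsplit('.', 1)[1].lower() in ALLOWED_EXTENSIONS

@app.route('/4', methods=['GET', 'POST'])
def upload_file():
    if request.method == 'POST':
        file = request.files['file']
        if file and allowed_file(file.filename):
            filename = secure_filename(file.filename)
            file.save(os.path.join(app.config['UPLOAD_FOLDER'], filename))
            return redirect(url_for('uploaded_file', filename=filename))
    return '''
    <title>Upload new File</title>
    <h1>Upload new File</h1>
    '''
```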
146
flask: Loops inside flask blocks do not require
a colon
147
pyautogui: To scroll down, type
pyautogui.scroll(-10)
148
python: list(range(1,2)) has
1 item
149
Outbrain: The number of characters allowed in ads is
150
150
Hadoop is
an open-source software framework written in Java for distributed storage and distributed processing of huge data sets.
151
Statically typed programming languages
do type checking, which is verifying and enforcing the constraints of types at compile-time as opposed to run-time.
152
MapReduce is
a programming model that allows you to process data in parallel on a distributed cluster of computers.
153
Big Data refers to at least
a terabyte of data
154
The four V's of IBM's definition of big data are
volume, variety, veracity and velocity
155
Apache Mahout is
a library of scalable machine-learning algorithms, implemented on Apache Hadoop
156
Hadoop: HDFS stands for
Hadoop Distributed File System
157
Hadoop: A cluster usually has
one heavy-duty computer and then 10-15 commodity computers
158
A node is a
single point in a network or single computer in a cluster.
159
os: To change your current working directory from within python, type
import os
os.chdir("C:\\folder")
160
ipython: Sometimes when there is an unusual error it can be ameliorated by
restarting the kernel
161
cookies: For security purposes, cookies can only be accessed by
the site that placed them.
162
python: To create a decorator with no arguments, type
```
def log(func):
    def inner():
        print("string")
        return func()
    return inner
```
```
@log
def say_hello():
    return "Hello there!"
```
say_hello()
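
A slightly more general sketch of the same pattern. Using `functools.wraps` and forwarding `*args, **kwargs` are additions beyond the card; they keep the wrapped function's name and let the decorator work on functions that take arguments.

```python
import functools


def log(func):
    @functools.wraps(func)  # keeps say_hello.__name__ intact
    def inner(*args, **kwargs):
        print("calling", func.__name__)
        return func(*args, **kwargs)
    return inner


@log
def say_hello(name="there"):
    return "Hello " + name + "!"


print(say_hello())         # prints "calling say_hello", then "Hello there!"
print(say_hello.__name__)  # say_hello
```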