misc learnings Flashcards

1
Q

Python (interactive interpreter or Jupyter): use “_” to get the result of the last evaluated expression

A
2
Q

Iterators: don’t wrap list() around them in real code; only do it while debugging to check that the contents look correct (and remember that list() consumes the iterator, so re-create it afterward). Why?

A

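Iterators take up almost no memory; they’re one-use, disposable instructions for getting “the next item” via next(it) (Python 3; it.next() was Python 2). As soon as the iterator moves on to the next item, the previous one can be freed from memory, whereas list() loads every item at once.

iter() # create an iterator out of any iterable (e.g., out of a list)

“yield” is a generator function’s equivalent of “return”: it hands back one value and pauses until the next one is requested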
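
A minimal sketch of the points above; the function name squares is made up for illustration. It shows next() pulling one item at a time, list() using up whatever is left, and the size difference between a generator and the equivalent list.

```python
import sys

def squares(n):
    """Generator function: yields one square at a time instead of building a list."""
    for i in range(n):
        yield i * i

gen = squares(5)
first = next(gen)      # pulls only the first item
second = next(gen)     # pulls only the second item
rest = list(gen)       # consumes everything that is left
leftover = list(gen)   # the generator is now exhausted, so this is empty

# An iterator made from an existing list behaves the same way
it = iter([10, 20, 30])
head = next(it)

# Rough memory comparison: the list holds every item, the generator holds none
list_size = sys.getsizeof([i * i for i in range(100_000)])
gen_size = sys.getsizeof(squares(100_000))
```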

3
Q

What’s a function decorator and its syntax?

A

Meaning: wraps a function to modify its behavior in some way. E.g.:
from ediblepickle import checkpoint
@checkpoint # Modifies an API-reading function to instead read results from disk if that exact query was already run and saved to disk

Syntax: “@…” sits literally on top of a function definition (the line right above the “def” line)
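
To see what a decorator does under the hood, here is a small homemade one. The names remember and slow_lookup are hypothetical, and it caches in memory rather than on disk, so it is only a rough analogue of the checkpoint idea, not how ediblepickle works.

```python
import functools

def remember(func):
    """Hypothetical decorator: caches results in a dict keyed by the arguments."""
    cache = {}

    @functools.wraps(func)  # keeps the wrapped function's name and docstring
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)  # call the real function only on a cache miss
        return cache[args]

    return wrapper

calls = []  # records every time the real function actually runs

@remember
def slow_lookup(key):
    calls.append(key)
    return key.upper()

a = slow_lookup("abc")  # runs the real function
b = slow_lookup("abc")  # served from the cache; the real function is not called
```

Writing “@remember” above the def is shorthand for slow_lookup = remember(slow_lookup).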

4
Q

Syntax to read lines from stdin (standard input) / “inputs with a handle” (like in many HackerRank problems)?

A

import sys

_ = sys.stdin.readline() # reads the first line (including its trailing newline); the handle works like an iterator, so the next identical call reads the 2nd line, etc.

sys.stdin.readlines() # reads all remaining lines into a list
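
A sketch of the typical pattern (first line is a count, the rest are values). The function read_numbers and the sample input are made up; an io.StringIO object stands in for sys.stdin so the snippet runs without piped input.

```python
import io

def read_numbers(handle):
    """Read a count from the first line, then one integer from each remaining line."""
    count = int(handle.readline())  # first call returns the first line
    rest = [int(line) for line in handle.readlines() if line.strip()]  # everything after it
    return count, rest

fake_stdin = io.StringIO("3\n10\n20\n30\n")  # assumption: stands in for real stdin
count, numbers = read_numbers(fake_stdin)
```

In an actual submission the call would be read_numbers(sys.stdin) after “import sys”.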

5
Q

What’s a dictionary called in other languages?

A

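A “hash map” (or “hash table”). The key is run through a hash function to compute where in the underlying array the value is stored, so lookup jumps almost straight to it (O(1) on average) instead of searching through every entry.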

FLESH THIS OUT - IT’S IMPORTANT TO BE ABLE TO ANSWER HOW A HASH TABLE WORKS IN AN INTERVIEW. SEE TDI COURSE NOTES WEEK 3.1 + THE VIDEO OF DON BASICALLY CREATING A HASH PYTHON CLASS FROM SCRATCH.
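
As a starting point for fleshing this out, here is a toy hash map written from scratch. It is not the class from the course video and not how Python’s dict is implemented internally; the class name and the choice of “separate chaining” (each bucket is a small list) are assumptions made for illustration.

```python
class SimpleHashMap:
    """Toy hash map: an array of buckets, each bucket a list of (key, value) pairs."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        # hash(key) gives an integer; the modulo maps it to a bucket index
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # key already present: overwrite
                return
        bucket.append((key, value))       # new key (or a collision): append to the chain

    def get(self, key):
        # Only the one bucket is scanned, not the whole table
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)

ages = SimpleHashMap()
ages.put("alice", 30)
ages.put("bob", 25)
ages.put("alice", 31)  # overwrites the earlier value
```

Interview talking points this illustrates: hashing the key picks the bucket, collisions are handled inside the bucket, and lookups stay fast as long as buckets stay short (real implementations resize the table to keep it that way).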

6
Q

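The BeautifulSoup package is simply an HTML parser; it doesn’t fetch anything. The Requests package is the HTTP client that actually downloads the page. Scraping typically uses both: Requests to get the HTML, BeautifulSoup to pull data out of it.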

A
7
Q

Various Python syntax conventions:

A

“_” is a var that has to be assigned a value but isn’t used in subsequent code. E.g.:
for _, x in enumerate(my_list) if the index part won’t be used for anything (although at that point a plain “for x in my_list” does the same job).

Related: In sklearn, a trailing underscore, as in .coef_, indicates that the attribute was LEARNED during fitting and wasn’t known beforehand.

Use ALL CAPS for vars that are constants (e.g., a fixed URL, a fixed sleep time between get requests, etc)

8
Q

pandas.read_html() !
Built-in HTML table reader (it needs a parser library such as lxml installed); returns a list of dfs, one for each table it finds in the HTML page. Works well with realgm, actually.

A
9
Q

What do df.filter(), df.query(), and df.assign() do?

A

df.filter() # selects rows or columns by their labels (index or column names), including by regex; it does not look at the values in the cells
df.filter(items=['one', 'three'], axis=0)

df.query() # SQL-like way to filter rows by values, with the expression completely in quotes
df.query("CONTROL == 1 and MAIN == 1")

df.assign() # another way to create a new column; returns a new df, so it can be chained with other commands
df = df.assign(new_col=lambda df: df.zip_code.str.extract(r"(some_regex)", expand=False)) # str.extract needs a capture group

10
Q

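print("A" "B" "C") # WITHOUT commas: adjacent string literals are joined into one string, so a long string can be split across lines (inside the parentheses) without running off the screen and without triple quotes or line continuation chars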

A
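
A short illustration of adjacent string literals being joined; the variable names are made up. The second part shows the flip side of the same rule, which is an easy bug to write.

```python
# Adjacent literals inside parentheses are joined at compile time
message = (
    "This is a long sentence that would run off the screen "
    "if it were written on one line, "
    "so it is split into pieces."
)

# Pitfall: a forgotten comma in a list silently merges two items
teams = ["red", "blue" "green"]  # two items, not three
```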
11
Q

NetworkX standard commands

A

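import networkx as nx

G = nx.Graph() # create empty (undirected) graph

G.add_edges_from(edge_weight_tuples) # Sample tuple: ('Alicia', 'Jerry Rey', {'weight': 1})
# Watch out for adding five “A-B”s and 3 “B-A”s; in an undirected graph they are the same edge, so each later add overwrites the edge’s attributes, NOT adds to them.

G.nodes
G.edges
G['Andrew Galperin'] # dict-like syntax: that node’s neighbors and the edge attributes
G.edges['Alicia', 'Jerry Rey'] # attributes of one specific edge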

12
Q

“Heap” concept

A

Basically a binary tree branching downward in which every parent is smaller than or equal to its children (a “min-heap”), so the smallest element is always at the top. When a new element is pushed, or the smallest one is popped, it takes relatively little (~log n) computing power to restore that ordering.

Python’s “heapq” module is for this. It “heapify”s lists, which remain list objects but become specially re-ordered.
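
A quick sketch of heapq in action, using a made-up list of numbers. It shows that the heapified object is still a plain list, that index 0 always holds the smallest item, and that pops come out in ascending order.

```python
import heapq

nums = [9, 4, 7, 1, 8]
heapq.heapify(nums)        # re-orders the list in place; it is still a list
smallest = nums[0]         # the smallest item always sits at index 0

heapq.heappush(nums, 0)    # ~log n work to slot the new item in
popped = [heapq.heappop(nums) for _ in range(3)]  # three smallest, in ascending order
```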

13
Q

What does “greedy” mean?

A

In regex context: an expression like “\d+” will match the “123” part of “abc123xyz”, not just the “1” part. It keeps matching as many characters as it can, UNLESS you add the “lazy” modifier “?” (as in “\d+?”).

In other contexts: an algorithm that’s short-sighted, making the locally best choice at each step without considering the big picture.
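
The regex sense of greedy vs. lazy, as a runnable comparison. The second pair (with the made-up HTML snippet) is where the difference usually bites in practice.

```python
import re

text = "abc123xyz"
greedy = re.search(r"\d+", text).group()   # takes as many digits as it can
lazy = re.search(r"\d+?", text).group()    # stops as soon as it has a match

html = "<b>one</b><b>two</b>"
greedy_tag = re.search(r"<b>.*</b>", html).group()   # runs to the LAST </b>
lazy_tag = re.search(r"<b>.*?</b>", html).group()    # stops at the FIRST </b>
```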

14
Q

Can use triple-quoted “long strings” (“”” “””) anywhere, even inside print() !

They’re also useful with re.compile()’s “verbose” regex mode (re.VERBOSE), where you can have new line separators and comments without them counting as part of the regex.

A
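
A small verbose-regex sketch; the phone-number pattern and the sample sentence are invented for illustration. Inside the triple-quoted string, the line breaks, indentation, and “#” comments are all ignored by the regex engine because of re.VERBOSE.

```python
import re

phone = re.compile(r"""
    (\d{3})   # area code
    [-.\s]?   # optional separator
    (\d{3})   # exchange
    [-.\s]?   # optional separator
    (\d{4})   # line number
""", re.VERBOSE)

match = phone.search("call 555-867-5309 today")
parts = match.groups()
```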
15
Q

What’s the StringIO package for?

A

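Stands for String Input/Output (io.StringIO in Python 3). It’s for creating an in-memory, file-like object out of a (potentially super long) string, so the string can be passed to anything that expects an open file. E.g., feed CSV text to a CSV reader and extract the 2nd “column” (similar to pandas’ .read_csv())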
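
A sketch of the CSV use case with the standard library’s csv module; the sample data is made up. The string is wrapped in io.StringIO so csv.reader can treat it like an open file.

```python
import csv
import io

raw = "name,team,points\nAlicia,Red,12\nJerry,Blue,7\n"

handle = io.StringIO(raw)       # file-like object backed by the string
reader = csv.reader(handle)
header = next(reader)           # first row holds the column names
second_column = [row[1] for row in reader]  # 2nd "column" of the remaining rows
```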

16
Q

In ML, what is a “stateless” transformer?

A

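One that transforms every input in the same predetermined way (e.g., take the log of it), without needing to “fit” the data first to learn anything from it. (Z-scoring doesn’t qualify: it has to learn the mean and standard deviation during fitting.) These are easy to create custom because they can be created from a function, e.g. with sklearn’s FunctionTransformer. (In contrast, a non-stateless transformer has to be written directly as a Class).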

17
Q

What does “non-parametric” mean?

A

Simply that the # of parameters isn’t fixed in advance; it depends on the training data. E.g., a Decision Tree regressor is non-parametric because the # of nodes isn’t known until the model is trained (fitted).

18
Q

What are the 3 variations on hyperparameter tuning?

A
  1. Grid search: brute force - tests every single combo of hyperparameter values. Can quickly get out of hand
  2. Randomized search: tests only a fixed number of randomly sampled combos, so the cost stays under control. Usually a little better.
  3. Bayesian optimization: usually best! Uses the outcome of each hyperparameter combo test to help choose the next hyperparameter combo. Essentially learns from its experience.
19
Q

Why is the sigmoid function (paired with log loss) used for logistic regression? (at a high level)

A

The sigmoid squashes the model’s raw output into a value between 0 and 1, and the log loss applied to that value penalizes “confident wrongness” a lot more than, say, SSE. That molds the model so that it’s less likely to predict the wrong thing out of 0 and 1 with high confidence.
(remember that a logit model’s output is actually a predicted probability between 0 and 1 that gets thresholded, at 0.5 by default, to give the class. E.g., .02 means it’s very confident that it’s a 0)
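
To put numbers on “confident wrongness”, this sketch compares log loss with squared error for a single example whose true label is 1. The function names and the two probabilities are chosen for illustration only.

```python
import math

def sigmoid(z):
    """Squashes any real number into the open interval (0, 1)."""
    return 1 / (1 + math.exp(-z))

def log_loss(y_true, p):
    """Log loss (cross-entropy) for one example with predicted probability p."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

def squared_error(y_true, p):
    return (y_true - p) ** 2

# True label is 1 in both cases
unsure_log = log_loss(1, 0.5)              # about 0.69
unsure_sq = squared_error(1, 0.5)          # 0.25
confident_wrong_log = log_loss(1, 0.01)    # about 4.6, and it keeps growing as p -> 0
confident_wrong_sq = squared_error(1, 0.01)  # about 0.98, can never exceed 1
```

Squared error tops out at 1 no matter how wrong the prediction is, while log loss grows without bound, which is what pushes the model away from confident mistakes.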

20
Q

Python packages involved in creating an interactive web app from beginning (database) to end

A

Python-SQL interface: sqlalchemy and psycopg2 (in Jupyter magic: “%load_ext sql”, followed by “%%sql” at top of every cell that’ll be SQL code)

“.env” file: for storing & aliasing “secret” info while still sharing code, like the parts of DB or API connection URL that are usernames/pwords

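“SQL injection” attacks: if a query is built by pasting user input directly into the SQL string, a malicious input can change what the query does. The defense is parameterized queries, where values are passed separately from the SQL text.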
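
A small demonstration of the SQL injection idea. It uses the standard library’s sqlite3 with an in-memory database as a stand-in for the Postgres setup above, and the table, rows, and malicious input are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("ann", "a1"), ("bob", "b2")])

user_input = "nobody' OR '1'='1"  # crafted input that closes the quote and adds a condition

# UNSAFE: the input is pasted into the SQL text, so it rewrites the WHERE clause
unsafe_sql = f"SELECT secret FROM users WHERE name = '{user_input}'"
leaked = conn.execute(unsafe_sql).fetchall()   # every row comes back

# SAFE: the "?" placeholder passes the input purely as a value, never as SQL
safe = conn.execute(
    "SELECT secret FROM users WHERE name = ?", (user_input,)
).fetchall()                                   # no user has that literal name

conn.close()
```

The placeholder syntax differs by driver (sqlite3 uses “?”), but the principle of keeping values out of the SQL string is the same.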

Virtual environments: a collection of packages, isolated from the rest of the machine, that are just for this project. “requirements.txt” stores the list of packages.
pip install -r requirements.txt

venv module: the standard-library tool that creates the virtual environment
python -m venv .venv

21
Q

Altair is essentially the best simulation of D3 in Python!

Very easy to have interactive charts controlling/filtering other charts.

Ingests and also generates JSONs for standalone HTML pages or embedding the viz HTML within a larger web page!

COULD BE SUPER USEFUL FOR US IN CONFLUENCE OR EVEN POTENTIALLY FOR REPLACING OAC ALTOGETHER.

A
22
Q

What’s the idea behind Quicksort algorithm? (one of 3 well-known sorting algorithms)

A

Recursive algorithm

The “pivot” starts in the rightmost position. Keep swapping items until the pivot ends up at its final sorted position, w/everything to the left smaller than it and everything to the right larger than it (but not necessarily sorted within themselves). Repeat on each chunk left & right of the pivot until the chunks are 1 element or empty.
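
A minimal in-place sketch of the idea described above, using the rightmost element as the pivot. The function and variable names are made up, and this is one common partitioning scheme, not the only one.

```python
def quicksort(items, lo=0, hi=None):
    """Sort items in place between indexes lo and hi (inclusive) and return the list."""
    if hi is None:
        hi = len(items) - 1
    if lo >= hi:                 # chunk of 0 or 1 elements: nothing to do
        return items

    pivot = items[hi]            # rightmost element is the pivot
    boundary = lo                # everything left of boundary is smaller than the pivot
    for i in range(lo, hi):
        if items[i] < pivot:
            items[i], items[boundary] = items[boundary], items[i]
            boundary += 1
    # Drop the pivot into its final position between the two chunks
    items[boundary], items[hi] = items[hi], items[boundary]

    quicksort(items, lo, boundary - 1)   # recurse on the left chunk
    quicksort(items, boundary + 1, hi)   # recurse on the right chunk
    return items

result = quicksort([5, 2, 9, 1, 5, 6])
```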

23
Q

Pretty-printing multiple objects (dfs) from one block of code using display(HTML())
(can’t do it with regular print() because it starts looking janky)

A

from IPython.display import display, HTML

display(HTML(df.to_html()))

24
Q

How to print all defined variables?

A

%whos # IPython/Jupyter magic; plain “whos” also works while automagic is on (the default)

25
Q

2_000 is understood the same as 2000! (way to separate digits visually; Python 3.6+)

A
26
Q

DON’T DO: if len(my_list) == 0:
DO: if not my_list: # an empty list is falsy

A
27
Q

Can you use PCA on sparse matrices?

A

Not a good idea - PCA involves centering, and centering a bunch of 0s and a few non-0s will result in all the 0s becoming non-0s, thereby losing the sparsity of the matrix.

28
Q

Python magics to time a command or cell?

A

%time some_statement # times the single statement on that same line

%%time # as the first line of a cell: times the entire cell

29
Q

How to make a folder into a Python “package”? As in, .py files (modules) from it can be imported elsewhere?

A

Create an empty file “__init__.py” and put it in the folder

30
Q

How to merge together the 2nd- and 3rd- most recent git commits into one commit?

A

git rebase -i HEAD~3 # Starts an interactive session with the last 3 commits.
This opens an interactive editor (your configured git editor, often vim) listing the 3 commits, oldest first.
Press “i” to actually edit.
On the 2nd most recent commit’s line, change “pick” to “squash” to merge it into the 3rd most recent (the line above it).
Esc, then “:wq” to actually save the changes
This opens a 2nd interactive vim editor, confirming the combined commit message for the new combined commit.
Esc, then “:wq” to save the changes AGAIN.

31
Q

How to view all variables in memory and their types (Jupyter or Spyder)?

A

%whos

32
Q

Numpy function for breaking up a continuous var into categories?

A

np.digitize(x=dfp.investor_real_irr, bins=[-1, 0, .1, np.inf]) # returns each value’s bin number: 1 for [-1, 0), 2 for [0, .1), 3 for [.1, inf)