misc learnings Flashcards
Python: in the interactive interpreter (REPL, IPython, Jupyter), use “_” to get the result of the last evaluated expression
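Iterators: don’t wrap list() around them while actually iterating, only beforehand to check that the contents look correct. Why?
list() consumes the whole iterator and loads every item into memory at once, which defeats the purpose. Iterators take up almost no memory; they’re just one-use disposable instructions to get “the next item” using next(it) (Python 2 spelled it it.next()), and as soon as it moves on to the next item, the previous item can be dropped from memory. After a list() check, re-create the iterator, because it’s now used up.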
iter() # create iterator out of any iterable (e.g., out of a list)
“yield” is the generator function’s equivalent of “return”: it hands back one value, then pauses until the next item is requested
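A minimal sketch of the one-use behavior described above. The countdown() generator is a made-up example, not from any library:

```python
def countdown(n):
    """Generator: produces n, n-1, ..., 1 one item at a time."""
    while n > 0:
        yield n  # hand back one value, then pause here until next() is called
        n -= 1


it = countdown(3)
first = next(it)     # pulls exactly one item
rest = list(it)      # consumes everything that is left
leftover = list(it)  # the iterator is now exhausted, so this is empty

list_iter = iter([10, 20])  # iter() turns any iterable into an iterator
ten = next(list_iter)
```

Here first is 3, rest is [2, 1], and leftover is [], which is why a list() sanity check should be done on a throwaway copy of the iterator.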
What’s a function decorator and its syntax?
Meaning: a function that takes another function and returns a modified version of it, changing its behavior in some way. E.g.:
from ediblepickle import checkpoint
@checkpoint # Modifies an API-reading function to read results from disk instead, if that exact query was already made and saved to disk
Syntax: “@…” sits literally on top of a function definition (the line right above the “def” line)
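A rough sketch of how a decorator can be written by hand. The memoize() decorator below is a hypothetical in-memory stand-in for the same idea (reuse a saved result instead of recomputing); it is not how ediblepickle is implemented:

```python
import functools


def memoize(func):
    """Hypothetical decorator: cache results in a dict keyed by the arguments."""
    cache = {}

    @functools.wraps(func)  # keep the original function's name and docstring
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)  # only call the real function on a cache miss
        return cache[args]

    return wrapper


calls = []  # records how many times the real function actually ran


@memoize  # same as writing: slow_square = memoize(slow_square)
def slow_square(x):
    calls.append(x)
    return x * x


a = slow_square(4)
b = slow_square(4)  # served from the cache; slow_square's body does not run again
```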
Syntax to read lines from standard input (stdin) / “inputs with a handle” (like in many HackerRank problems)?
import sys
_ = sys.stdin.readline() # reads the first line; this works like an iterator, so the next identical call will actually read the 2nd line, etc.
sys.stdin.readlines() # reads all remaining lines into a list; each line keeps its trailing "\n"
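A small sketch of the typical pattern, assuming a made-up input format where the first line is a count and the rest are data lines. io.StringIO stands in for real stdin here so the snippet runs anywhere; read_problem() is a hypothetical helper name:

```python
import io
import sys


def read_problem(stream=sys.stdin):
    """Read a count from the first line, then all remaining lines."""
    n = int(stream.readline())  # first call reads line 1
    rows = [line.rstrip("\n") for line in stream.readlines()]  # everything after it
    return n, rows


fake_stdin = io.StringIO("2\nfoo\nbar\n")  # pretend this was typed or piped in
n, rows = read_problem(fake_stdin)
```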
What’s a dictionary called in other languages?
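A “hash map” (also “hash table” or “associative array”). The key is run through a hash function, and the result says which slot (bucket) in memory holds the value, so lookup goes almost straight there instead of searching through every entry. Lookup, insert, and delete are O(1) on average.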
FLESH THIS OUT - IT’S IMPORTANT TO BE ABLE TO ANSWER HOW A HASH TABLE WORKS IN AN INTERVIEW. SEE TDI COURSE NOTES WEEK 3.1 + THE VIDEO OF DON BASICALLY CREATING A HASH PYTHON CLASS FROM SCRATCH.
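A from-scratch sketch of the idea, assuming the simplest collision strategy (each bucket is a small list, a.k.a. separate chaining) and no resizing. This is an illustration only, not how Python’s dict is actually implemented, and the HashTable class and its method names are made up:

```python
class HashTable:
    """Toy hash table: a fixed number of buckets, each a list of (key, value)."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        # hash() turns the key into an integer; modulo picks the bucket
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # key already present: overwrite
                return
        bucket.append((key, value))  # new key (or a collision): append

    def get(self, key):
        # only this one bucket is scanned, not the whole table
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        raise KeyError(key)


table = HashTable()
table.put("apple", 1)
table.put("apple", 2)  # overwrites the first value
table.put("pear", 3)
```

The interview talking points: hashing picks the bucket, collisions need a strategy, and a real implementation grows the table as it fills up so buckets stay short.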
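BeautifulSoup package is simply an HTML parser, not a web scraper. The Requests package does the fetching: it’s an HTTP client that downloads the page, which BeautifulSoup then parses. Scraping is the two steps together.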
Various Python syntax conventions:
“_” is a throwaway var name: it has to be assigned a value but isn’t used in subsequent code. E.g.:
for _, x in enumerate(my_list) if the number part of it won’t be used for anything (though in that case a plain “for x in my_list” does the same job).
Related: In sklearn, a trailing underscore like .coef_ indicates that these are things the model LEARNED and weren’t known before fitting it.
Use ALL CAPS for vars that are constants (e.g., a fixed URL, a fixed sleep time between GET requests, etc)
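A quick illustration of both conventions. The names and values are made up:

```python
MAX_RETRIES = 3  # ALL CAPS: a constant, set once and never reassigned

pairs = [("Alicia", 31), ("Jerry", 45)]

# "_" marks the half of each tuple that is deliberately ignored
names = [name for name, _ in pairs]

attempts = 0
for _ in range(MAX_RETRIES):  # the loop counter itself is never used
    attempts += 1
```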
pandas.read_html() !
Built-in HTML table parser; returns a list of dfs, one per <table> element it finds in the HTML page. Needs a parser library such as lxml installed. Works well with realgm, actually.
What do df.filter(), df.query(), and df.assign() do?
df.filter() # selects rows or columns by LABEL (column names or index labels), not by cell values; supports items=, like=, regex=
df.filter(items=['one', 'three'], axis=0)
df.query() # SQL-like way to filter rows by values, with the expression completely in quotes
df.query("CONTROL == 1 and MAIN == 1")
df.assign() # another way to create a new column; returns a new df, so it can be chained with other commands
df = df.assign(new_col=lambda d: d.zip_code.str.extract(r"(some_regex)", expand=False)) # str.extract needs a capture group; expand=False returns a Series
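A toy example of all three, assuming pandas is installed. The data, the row labels, and the 5-digit zip regex are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "CONTROL": [1, 1, 2],
        "MAIN": [1, 0, 1],
        "zip_code": ["10001-1234", "94105", "60601-0001"],
    },
    index=["one", "two", "three"],
)

by_label = df.filter(items=["one", "three"], axis=0)  # rows picked by index label
by_regex = df.filter(regex="^zip", axis=1)            # columns whose name matches
by_value = df.query("CONTROL == 1 and MAIN == 1")     # rows picked by cell values

# assign() returns a new df; the lambda receives the df being built
with_zip5 = df.assign(
    zip5=lambda d: d.zip_code.str.extract(r"^(\d{5})", expand=False)
)
```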
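print("A" "B" "C") # adjacent string literals WITHOUT commas are joined into one string ("ABC"); put them on separate lines inside the parentheses to write a long string without going beyond the screen or using triple quotes or line continuation chars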
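For instance, a long message split over several lines (the text is made up):

```python
message = (
    "This sentence is too long for one line, "  # note: no comma after each piece
    "so it is split into adjacent literals "
    "that Python joins into one string."
)
print(message)

joined = "A" "B" "C"
```

Watch the spaces at the end of each piece: nothing is inserted between the literals.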
NetworkX standard commands
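import networkx as nx
G = nx.Graph() # create empty (undirected) graph
G.add_edges_from(edge_weight_tuples) # Sample tuple: ('Alicia', 'Jerry Rey', {'weight': 1})
# Watch out for adding five “A-B”s and three “B-A”s; the later ones overwrite the earlier edge’s attributes, they do NOT add to its weight. Tally the weights yourself before adding.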
G.nodes
G.edges
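G.edges['Alicia', 'Jerry Rey'] # dict-like syntax; takes BOTH endpoints and returns that edge's attribute dict
G['Andrew Galperin'] # dict-like too; neighbors of one node with their edge attributes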
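One way to dodge the overwrite problem is to tally the pairs before building the graph. A sketch using only the standard library; raw_pairs is made-up data, and the resulting edge_weight_tuples has the shape that G.add_edges_from() expects:

```python
from collections import Counter

# five "A-B" observations and three "B-A" observations
raw_pairs = [("A", "B")] * 5 + [("B", "A")] * 3

# sort each pair so A-B and B-A count as the same undirected edge
counts = Counter(tuple(sorted(pair)) for pair in raw_pairs)

edge_weight_tuples = [(u, v, {"weight": w}) for (u, v), w in counts.items()]
```

This gives a single A-B edge with weight 8 instead of a weight that reflects only the last batch added.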
“Heap” concept
Basically a binary tree branching downward where every parent is smaller than or equal to its children (a “min-heap”), so the smallest item is always at the top. When a new element is introduced, it takes relatively little (~log n) computing power to slot it in the correct place, and the same to remove the smallest item.
Python’s “heapq” module (standard library) is for this. It “heapify”s lists in place; they remain list objects but become specially re-ordered, with the smallest item at index 0.
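A short sketch of the basic heapq calls on a made-up list:

```python
import heapq

nums = [5, 1, 8, 3]
heapq.heapify(nums)             # reorders the list in place; still a plain list
smallest_after_heapify = nums[0]

heapq.heappush(nums, 0)         # ~log n work to slot the new item in
smallest = heapq.heappop(nums)  # removes and returns the smallest item

# popping repeatedly yields the remaining items in ascending order
in_order = [heapq.heappop(nums) for _ in range(len(nums))]
```

Note that the heapified list is not fully sorted; only the position of the smallest item is guaranteed.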
What does “greedy” mean?
In regex context: an expression like “\d+” will match the whole “123” part of “abc123xyz”, not just the “1” part. It keeps going until it can’t no’ mo’, UNLESS you use a “lazy” modifier (a “?” right after the quantifier, e.g. “\d+?”).
In other contexts: an algorithm that’s short-sighted and optimizes in a local way without considering the big picture.
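The regex case in code, with a made-up HTML snippet to show where greediness usually bites:

```python
import re

greedy = re.search(r"\d+", "abc123xyz").group()  # grabs as many digits as it can
lazy = re.search(r"\d+?", "abc123xyz").group()   # stops as soon as it has a match

html = "<b>one</b><b>two</b>"
greedy_tags = re.findall(r"<b>.*</b>", html)   # .* runs to the LAST </b>
lazy_tags = re.findall(r"<b>.*?</b>", html)    # .*? stops at the FIRST </b>
```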
Can use “long strings” (triple-quoted: """ """) anywhere, even inside print() !
They’re also useful with re.compile()’s “verbose” mode (the re.VERBOSE flag), where you can have new line separators and spaces without them counting as part of the regex, plus # comments.
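A sketch of a verbose regex in a triple-quoted string. The phone-number pattern and the PHONE name are made up for illustration:

```python
import re

PHONE = re.compile(
    r"""
    (\d{3})   # area code
    [-.\s]?   # optional separator
    (\d{3})   # prefix
    [-.\s]?   # optional separator
    (\d{4})   # line number
    """,
    re.VERBOSE,  # whitespace and # comments in the pattern are ignored
)

match = PHONE.search("call 555-123-4567 now")
parts = match.groups()
```

To match a literal space in verbose mode, escape it or use \s, since bare spaces are dropped.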
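What’s StringIO for?
Stands for String Input/Output; in Python 3 it lives in the io module (io.StringIO). It’s for creating an in-memory file-like object out of a (potentially super long) string, so anything that expects a file handle can read it. E.g., feed CSV text to csv.reader or pandas’ .read_csv() and extract the 2nd “column”.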
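A small sketch using only the standard library; the CSV text is made up:

```python
import csv
import io

raw = "name,team\nAlicia,Red\nJerry,Blue\n"

handle = io.StringIO(raw)   # behaves like an open text file
header = handle.readline()  # file-style methods work as usual

# csv.reader continues from where readline() left off
second_col = [row[1] for row in csv.reader(handle)]
```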