Python Code Knowledge Flashcards
What is getopt?
What is its syntax?
It is a "command line option parser".
It parses an argument sequence such as sys.argv and returns a list of (option, value) pairs and a list of non-option arguments.
Syntax: options, arguments = getopt.getopt(['-a', '-bval', '-c', 'val'], 'ab:c')
As you can see it outputs two lists, which is why you unpack the result into a pair of variables.
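A minimal runnable sketch of the call above (in the option string 'ab:c', the colon after b means -b expects a value, while -a and -c are plain flags):

```python
import getopt

# -a and -c are flags; -b takes a value because of the 'b:' in the option string
opts, args = getopt.getopt(['-a', '-bval', '-c', 'val'], 'ab:c')
print(opts)  # [('-a', ''), ('-b', 'val'), ('-c', '')]
print(args)  # ['val']
```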
What is numpy?
Numpy is the core library for scientific computing.
Numpy provides a high-performance multidimensional array object. Alongside this, it gives tools to work with these arrays.
It allows you to get the same sort of functionality as Matlab.
What is pip?
It is the preferred installer program for modules in python.
Since python 3.4 pip has been included by default with python.
Almost all packages that you hear of will be available with pip install
“PIP is a package manager for Python packages, or modules if you like.”
“A package contains all the files you need for a module.”
“Modules are Python code libraries you can include in your project.”
How do you create your own modules in python?
Modules are simply python scripts that are imported into another script.
First, write up your functions and save the script; then import it by its file name (without the .py) from another script in the same directory.
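A self-contained sketch: the module file name mymath.py is hypothetical, and it is written by the script itself so the import step can run immediately.

```python
import importlib
import sys

# write a tiny module file (hypothetical name), then import it like any module
with open('mymath.py', 'w') as f:
    f.write('def add(a, b):\n    return a + b\n')

sys.path.insert(0, '.')        # make sure the current directory is importable
importlib.invalidate_caches()  # the file was created after interpreter startup
import mymath
print(mymath.add(2, 3))  # 5
```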
How do you write to a file?
First of all you need 'w' as the mode in your open() call.
Then use .write() to add your text. *Note: writing to a file with 'w' clears the file of anything beforehand; to add to a file you need to use append mode, i.e. 'a' in the open call.*
Within the brackets of .write() you place the text you want to write.
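A short sketch of write vs append, using a hypothetical file name:

```python
# 'w' truncates the file first; 'a' appends to the end
with open('example.txt', 'w') as f:
    f.write('first line\n')
with open('example.txt', 'a') as f:
    f.write('second line\n')
with open('example.txt') as f:
    print(f.read())  # first line / second line
```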
What does CSV stand for?
Comma-separated values.
The delimiter determines what separates the values. It doesn't have to be a comma; it can be almost anything.
What is pandas?
“Pandas is a python software library for data manipulation and analysis. “
“Pandas is a python package providing fast, flexible, and expressive data structures designed to make working with ‘relational’ and ‘labeled’ data both easy and intuitive.”
What is matplotlib?
It is a plotting library/package.
What is BeautifulSoup?
It is a python library for pulling data out of HTML and XML files.
How do you permanently set the index for a data frame?
To set the index you use .set_index(“Desired Index”)
To make it permanent you need to add another parameter in the brackets, inplace, and it needs to be set to True.
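A minimal sketch with made-up data showing the inplace parameter:

```python
import pandas as pd

df = pd.DataFrame({'Date': ['2021-01-01', '2021-01-02'], 'Close': [100, 101]})
df.set_index('Date', inplace=True)  # without inplace=True a modified copy is returned
print(df.index.name)  # Date
```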
How would you access a single column from a dataframe?
In the same way you would get the values in a dictionary.
dataframe_name[‘Desired_Column_Name’]
Or
dataframe_name.Desired_Column_Name
(the dot form has no brackets, and it only works when the column name is a valid python identifier with no spaces).
How do you convert a dataframe column into a list?
What do you need to remember?
dataframe_name.column_name.tolist()
Only works on one column at a time.
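A quick sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({'Close': [100, 101, 102]})
prices = df.Close.tolist()  # one column at a time
print(prices)  # [100, 101, 102]
```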
If you want to print two columns of a dataframe what should you do?
You can extract the columns by referencing both columns with respect to the dataframe, within double square brackets.
Alternatively, you could convert the dataframe into an array using np.array(dataframe[['Column1', 'Column2']]).
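Both approaches, sketched with made-up column names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Open': [1, 2], 'High': [3, 4], 'Low': [0, 1]})
pair = df[['Open', 'High']]           # double brackets -> a two-column DataFrame
arr = np.array(df[['Open', 'High']])  # or as a plain numpy array
print(arr.shape)  # (2, 2)
```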
When reading in a csv file through .read_csv() how can we define the index column?
Within the brackets you place index_col and set it equal to 0 (or whichever column number you want as the index).
What does .to_csv() do ?
It converts a dataframe in python to a csv file.
Before the full stop you place the dataframe you want to convert into a csv file.
dataframe_name.to_csv()
How do you block comment on a mac?
cmd + 1
cmd + 4 comments the block but also puts two lines of hyphens above and below the block.
How do you get data from Quandl?
using quandl.get()
Within .get() you place the 'TickerID', authtoken (i.e. your API key) and optionally a start_date and/or end_date.
What does pandas_datareader do ?
Up to date remote data access for pandas, works for multiple versions of pandas.
How do you read a CSV file and convert it to a dataframe?
What do you need to add to convert a date column to the index?
pd.read_csv(“file_name.csv”)
Plug in the parameters (parse_dates = True, index_col = 0) into the brackets.
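A self-contained sketch: a tiny csv (hypothetical file name) is written first so the read step has something to load.

```python
import pandas as pd

# write a small csv so read_csv has a file to parse (hypothetical name)
with open('prices.csv', 'w') as f:
    f.write('Date,Close\n2021-01-01,100\n2021-01-02,101\n')

df = pd.read_csv('prices.csv', parse_dates=True, index_col=0)
print(df.index.dtype)  # datetime64[ns]
```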
If you make any changes to the IPython console in preferences what else do you need to remember to do?
Reset ipython console kernel.
(Small cog wheel in the top right)
If the index isn’t sorted in the right direction how can you sort it correctly?
.sort_index(axis=0, inplace=True, ascending=True)
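A quick sketch with a deliberately shuffled index:

```python
import pandas as pd

df = pd.DataFrame({'Close': [102, 100, 101]}, index=[3, 1, 2])
df.sort_index(axis=0, inplace=True, ascending=True)
print(df.index.tolist())  # [1, 2, 3]
```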
When parsing out paragraphs from a webpage using BeautifulSoup, what do you need to remove the tags so that you’re only left with strings?
What is the most simple way to extract the strings?
You can use .string or .text. The difference between the two is that .string won’t show any string with ‘child’ tags.
In most cases you probably want .text.
That being said, the simplest way is to take your BeautifulSoup object, say soup, and apply the .get_text() method, i.e. soup.get_text(). Note that the two forms are not exactly the same.
What is pickling?
Why would you use it?
How would you use it?
Pickling is the serializing and de-serializing of python objects to a byte stream. Unpickling is the opposite.
Pickling is used to store python objects. This means things like lists, dictionaries, class objects, and more.
Pickling will be most useful for data analysis, when you are performing routine tasks on data, such as pre-processing. Also used when working with python specific data types, such as dictionaries.
If you have a large dataset and you’re loading a massive dataset into memory every time you run the program, it makes sense to just pickle the data and load that. It may be 50-100x faster.
How to pickle?
After having imported pickle you need to open the pickled file.
pickle_in = open("dict_pickle", "rb")  # rb stands for read bytes
example_dict = pickle.load(pickle_in)
What module is the building block for scraping web pages with BeautifulSoup?
urllib (or requests): BeautifulSoup only parses content, it does not fetch it.
After obtaining the desired content by applying .find() on soup (the webpage parsed by bs4), it is still wrapped in HTML code. To extract the elements you want, use .find_all(); within the brackets you state which tag you are after.
What is a quick way to extract table data from webpages?
use
pd.read_html(“url”)
How do you add pickle data to a pickle file?
pickle.dump(x,y)
x = what you want to add
y = the pickle file you want to ‘dump’ it in.
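A full round trip (dump then load), using a hypothetical file name:

```python
import pickle

example_dict = {'AAPL': 150, 'MSFT': 300}

# dump: x = the object to store, y = a file opened for writing bytes ('wb')
with open('dict.pickle', 'wb') as pickle_out:
    pickle.dump(example_dict, pickle_out)

# load it back from the file opened for reading bytes ('rb')
with open('dict.pickle', 'rb') as pickle_in:
    restored = pickle.load(pickle_in)

print(restored)  # {'AAPL': 150, 'MSFT': 300}
```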
After having extracted table data from a webpage through beautiful soup, you want to iterate through table elements one by one, how would you do that?
1st line: iterate through each table row, except for the top row as these are the column labels.
for row in table.findAll('tr')[1:]:
    ticker = row.findAll('td')[0].text
    tickers.append(ticker)
2nd line: essentially this says find all the table data cells for this row (hence td), take the first cell, and convert its contents to text. You could index a different position in the findAll('td') list if you want content from other columns.
If you want to open an already existing pickle file, what do you need?
pickle.load(x)
x = pickle file you want to access
This can be assigned to a variable to save it.
How would you make a new directory?
os.makedirs('x')
x = define directory name
If you have a module applied to library that you consistently use or are going to use, what would you do to make your code writing more efficient?
Import the library with the module attached and give it a shortened name value using ‘as’.
For example, since the pyplot module is heavily used in matplotlib, it is common to find the module with the library imported and defined as plt.
i.e. import matplotlib.pyplot as plt
When using matplotlib, what do you need to do to make plt.legend() work?
You need to label your plots: after adding the x and y variables, add a third parameter, label, e.g. plt.plot(x, y, label='price').
How do you read a csv file in python?
You read a csv using csv.reader(csv_file_object, delimiter=','), where the first argument is an open file object, not a filename.
The delimiter is what the values are separated by. In the case above they are separated by a comma.
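A self-contained sketch: a tiny csv (hypothetical file name) is written first so the reader has a file object to work on.

```python
import csv

# write a small csv so csv.reader has something to parse
with open('data.csv', 'w', newline='') as f:
    f.write('name,price\nAAPL,150\n')

with open('data.csv', newline='') as csv_file:
    rows = list(csv.reader(csv_file, delimiter=','))
print(rows)  # [['name', 'price'], ['AAPL', '150']]
```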
How can you use numpy to load data from files ?
import numpy as np
np.loadtxt("File_name.type", delimiter=',', unpack=True)
*Note* The file does not have to be a .txt; it can be a .csv, or any file with text in it.
It's also important to remember to add unpack=True if you have two columns to unpack into separate variables.
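A sketch using an in-memory file-like object (loadtxt also accepts those) so the example is self-contained:

```python
import io
import numpy as np

# two comma-separated columns; unpack=True returns them as separate arrays
data = io.StringIO('1,10\n2,20\n3,30')
x, y = np.loadtxt(data, delimiter=',', unpack=True)
print(x)  # [1. 2. 3.]
print(y)  # [10. 20. 30.]
```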
What does .split() do ?
When applied to a string it returns a LIST of all the words in the string (split on whitespace by default).
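A one-line sketch:

```python
text = 'the quick brown fox'
words = text.split()  # splits on whitespace by default
print(words)  # ['the', 'quick', 'brown', 'fox']
```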
How do you open URLs using the urllib library ?
urllib.request.urlopen()
Inside the brackets you place the URL as a string (within quotes).
Using the os library how do you return the current working directory from a python script?
os.getcwd()
It's easier to remember the function name if you note that cwd is an abbreviation for current working directory.
What is sys.argv ?
sys.argv allows you to pass a list of command line arguments from the terminal.
It is a list in python which contains the command-line arguments passed to the script.
What library would you use to search in a body of text?
How would you find all the numbers in a text?
You should use Regular Expressions, written as re in python.
re.findall(r'\d', x) finds every individual digit; use r'\d+' to match whole numbers (runs of digits).
x = text variable
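A sketch showing the difference between the two patterns:

```python
import re

text = 'Order 66 was executed in year 19'
digits = re.findall(r'\d', text)    # single digits
numbers = re.findall(r'\d+', text)  # whole runs of digits
print(digits)   # ['6', '6', '1', '9']
print(numbers)  # ['66', '19']
```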
If you want to POST to an URL what are the necessary steps that you need to take?
What changes when you want to do GET request?
- You first need to define the variables that you intend to post in a dictionary, referred to here as values.
- For the URL to understand the values they need to be encoded using data = urllib.parse.urlencode(values). There's another encoding step after that, encoding to utf-8 bytes, i.e. data = data.encode('utf-8').
- Once the data is encoded the next step is code a request to the URL to post your values. req = urllib.request.Request(url, data)
- The following step is to open the URL with request added on, urllib.request.urlopen(req). Opening the URL with the request will return a response, this will be assigned to the variable resp, i.e. resp = urllib.request.urlopen(req).
- Finally to see the response .read() needs to be applied to resp.
A GET request is pretty similar to step 4 above. It uses the same base code, urllib.request.urlopen(), but now we need to decode the response bytes. The code should look like this: urllib.request.urlopen(website_url).read().decode()
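The encoding steps above can be sketched offline. The URL and form values here are hypothetical, and the final urlopen(req) call is omitted so the sketch runs without network access:

```python
import urllib.parse
import urllib.request

url = 'https://example.com/search'  # hypothetical endpoint
values = {'q': 'python'}

data = urllib.parse.urlencode(values)  # -> 'q=python'
data = data.encode('utf-8')            # -> b'q=python'
req = urllib.request.Request(url, data)
print(req.get_method())  # POST (a Request with data defaults to POST)
```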
If you want to combine multiple plots on the same grid, what module in plt do you need to use?
If you want to graph two plots on the same grid of 6 row pieces and 1 column piece, with ax1 taking up 5 rows across the 1 column and ax2 taking the rest, what would the code be?
plt.subplot2grid((x), (y))
x is a tuple stating the number of rows and columns, y is a tuple specifying the origin (row, column) of the plot.
ax1 = plt.subplot2grid((6,1), (0,0), rowspan=5, colspan=1)
ax2 = plt.subplot2grid((6,1), (5,0), rowspan=1, colspan=1)
*note* you need to remember to adjust the start point: ax2 begins at row 5, directly below ax1's five rows, but stays in column 0.
If you have defined a subplot called ax1, how do you access the labels to change them (not to change the name, to rotate, etc.)?
ax1.xaxis.get_ticklabels()
If you want to access the y axis just change xaxis to yaxis.
If you want to plot OHLC candles in python what do you need to import?
You need matplotlib.finance to import candlestick_ohlc.
This is written as:
from matplotlib.finance import candlestick_ohlc
(Note: in recent matplotlib versions this module has been removed; the function now lives in the separate mpl_finance package.)
How do you add text to a graph ax1 based on matplotlib?
Two options:
ax1.annotate()
ax1.text()
For ax1.annotate, the first parameter is what you want to annotate; it needs to be a string, so ints and floats need to be converted to strings. The second parameter is where you want to annotate; if you're using candlesticks you can specify a specific candle and choose where on the candle (ohlc) to place the annotation.
With deep learning, how should you approach testing?
The price data is split up into the training set and a test set.
The model is built on the training set and then applied to the unseen test set to see if similar results are obtained.
How is .loc used?
It is a label-based indexer applied to a dataframe, say df, to access a group of rows and columns by their labels, using square brackets, e.g. df.loc[label].
Note that placing one label in loc returns the values in that row (or column) as a Series.
If there is more than one label, then a DataFrame is returned.
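A sketch of the Series vs DataFrame distinction, with made-up labels:

```python
import pandas as pd

df = pd.DataFrame({'Close': [100, 101], 'Volume': [10, 20]},
                  index=['2021-01-01', '2021-01-02'])

row = df.loc['2021-01-01']                   # one label -> Series
sub = df.loc[['2021-01-01', '2021-01-02']]   # list of labels -> DataFrame
print(type(row).__name__)  # Series
print(type(sub).__name__)  # DataFrame
```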
How do you parse webpage content using Beautiful Soup?
How is this applied to tables?
With import bs4 as bs.
To parse the content we need to first convert the URL data into a Beautiful Soup object. The Beautiful Soup object is obtained by calling bs.BeautifulSoup() from the bs4 library. It is by convention that the object is assigned to the variable soup, i.e. soup = bs.BeautifulSoup(text, 'lxml').
With the content now as a Beautiful Soup object, other modules in the library can be applied to parse it.
One of the most common methods is find_all(); it is used on 'soup' and allows you to filter specific content based on HTML tags.
For example, if you want to extract all the URLs in the webpage you can write soup.find_all(‘a’).
To find whole tables you apply soup.find('table', {'class': 'wikitable sortable'}). From there you can use find_all() to filter through the table rows ('tr') and within table rows you can access the table data ('td').
*Note: Beautiful Soup does not acquire web page content, this needs to be done using urllib or requests.*
What are the arguments for using requests over urllib?
The requests package allows you to do what urllib does but shorter and more succinctly.
It only takes one line to get content from a URL:
resp = requests.get('url')
Similarly, posting information to a URL is a lot shorter. To post, the requests.post() function simply takes a dictionary as the data argument.
search_data = {"search": "Hello World"}
resp = requests.post('url', data=search_data)
How does .join() work for pandas?
It joins columns with other data frames either on the index or on a key column.
There are optional parameters to customize the joining; one important parameter you need to consider is 'how' it is going to join. The default of 'how' is 'left', which means that the calling frame's index is used; 'right' is the opposite; 'outer' forms the union of the calling frame's index with the other and sorts it lexicographically; lastly, 'inner' is the opposite of outer, it forms an intersection.
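A sketch of 'left' vs 'outer' joining, with made-up index labels:

```python
import pandas as pd

left = pd.DataFrame({'Close': [100, 101]}, index=['A', 'B'])
right = pd.DataFrame({'Volume': [10, 30]}, index=['A', 'C'])

joined = left.join(right, how='left')   # default: keep the calling frame's index
print(joined.index.tolist())  # ['A', 'B']

outer = left.join(right, how='outer')   # union of both indexes, sorted
print(outer.index.tolist())   # ['A', 'B', 'C']
```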
What happens when you apply .values to a panda dataframe?
A numpy representation of the dataframe is returned.
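A quick check with made-up data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
arr = df.values  # the underlying data as a numpy array
print(type(arr))  # <class 'numpy.ndarray'>
print(arr.shape)  # (2, 2)
```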
If you apply .shape to a numpy array, what is returned?
A tuple with the numpy array dimensions. (Note that .shape is an attribute, not a method, so there are no brackets.)
What does numpy.arange(x) do ?
Return evenly spaced values within a given interval.
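Two quick examples:

```python
import numpy as np

print(np.arange(5))         # [0 1 2 3 4]
print(np.arange(2, 10, 2))  # [2 4 6 8] (start, stop, step)
```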
What does ax.xaxis.tick_top() do?
Move ticks and ticklabel (if present) to the top of the axes.
What does pandas.DataFrame.columns do ?
Returns the column labels of the dataframe.
When you need to remove a column from a dataframe using .drop() what does the axis parameter need to be set to?
To remove a column you need to set axis equal to 1.
When using .drop() on a dataframe and you keep the inplace parameter on the default False, what will happen?
Leaving inplace to false does not permanently change the dataframe. To change the underlying data of the dataframe you need to set inplace to True.
One way to view it is that you want your changes to stay in place, which is why you set it to True.
The default value of False for inplace is useful as it allows you to test the changes before making permanent changes.
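A sketch showing both the axis and inplace points above, with made-up column names:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

dropped = df.drop('b', axis=1)      # inplace=False (default): returns a copy
print(df.columns.tolist())       # ['a', 'b']  (original untouched)
print(dropped.columns.tolist())  # ['a']

df.drop('b', axis=1, inplace=True)  # now the dataframe itself is changed
print(df.columns.tolist())       # ['a']
```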
What are two functions you can use to create heatmaps with matplotlib?
imshow()
&
pcolormesh()
*args and **kwargs
What are they used for?
They are mostly used in function definitions.
*args and **kwargs allow you to pass a variable number of arguments to a function. In other words, the number of arguments is dependent on the user.
*args is typically seen as a list (note that it actually arrives as a tuple, not a list).
**kwargs is seen as a dictionary as you need to pass keyworded arguments, i.e. name =”potato” where name is the keyword and potato the value.
A good way to remember what **kwargs does is to remember that 'kw' stands for keyword, so essentially it's **keywordargs.
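A sketch (the function name describe is made up) showing what actually arrives inside the function:

```python
def describe(*args, **kwargs):
    # args arrives as a tuple, kwargs as a dict
    return type(args).__name__, type(kwargs).__name__, args, kwargs

result = describe(1, 2, name='potato')
print(result)  # ('tuple', 'dict', (1, 2), {'name': 'potato'})
```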
How do you add columns to a pandas dataframe?
It is essentially the same as with dictionaries: you apply index brackets to the dataframe with the new column name, and assign it the values you want in the column.
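A one-line sketch with made-up column names:

```python
import pandas as pd

df = pd.DataFrame({'Close': [100, 101]})
df['Volume'] = [10, 20]  # dictionary-style assignment creates the column
print(df.columns.tolist())  # ['Close', 'Volume']
```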