Python Code Knowledge Flashcards

1
Q

What is getopt?

What is its syntax?

A

It is a command-line option parser.

It parses an argument sequence such as sys.argv[1:] and returns a list of (option, value) pairs and a list of the remaining non-option arguments.

Syntax: opts, args = getopt.getopt(['-a', '-bval', '-c', 'val'], 'ab:c')

getopt.getopt() returns two values, so the result is unpacked into two names: opts receives the (option, value) pairs and args receives the leftover arguments.
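
A minimal sketch of the call above, using a hard-coded argument list in place of sys.argv[1:] (the variable names opts and args are just a common convention):

```python
import getopt

# Example argument list, as it might appear in sys.argv[1:].
argv = ['-a', '-bval', '-c', 'val']

# 'ab:c' means: -a and -c take no value, -b requires one (the colon).
opts, args = getopt.getopt(argv, 'ab:c')

print(opts)  # list of (option, value) pairs
print(args)  # leftover non-option arguments
```

Options that take no value are paired with an empty string.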

2
Q

What is numpy?

A

NumPy is the core library for scientific computing in Python.

NumPy provides a high-performance multidimensional array object, along with tools for working with these arrays.

It gives you much of the array functionality found in Matlab.

3
Q

What is pip?

A

It is the preferred installer program for Python packages.

pip has been included by default with Python since version 3.4.

Almost all packages that you hear of can be installed with pip install.

“PIP is a package manager for Python packages, or modules if you like.”

“A package contains all the files you need for a module.”

“Modules are Python code libraries you can include in your project.”

4
Q

How do you create your own modules in python?

A

Modules are simply Python files that are imported into another script.

First, write your functions in a .py file and save it. Then import it from another script using its file name without the .py extension.
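
As a rough illustration, the sketch below writes a tiny module to a temporary folder and imports it. The module name my_helpers and the function double are made up for this example; normally you would just save the file next to your script and write import my_helpers.

```python
import importlib
import os
import sys
import tempfile

# Create a throwaway folder and save a module file in it (hypothetical name).
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, 'my_helpers.py'), 'w') as f:
    f.write('def double(x):\n    return 2 * x\n')

# Make the folder importable, then import the module by its file name.
sys.path.insert(0, module_dir)
importlib.invalidate_caches()
my_helpers = importlib.import_module('my_helpers')

result = my_helpers.double(21)
```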

5
Q

How do you write to a file?

A

First of all, you need to pass 'w' as the mode when you open the file.

Then use .write() to add your text, placing the text inside the brackets of .write().

Note: opening a file with 'w' clears anything it already contained. To add to an existing file, open it in append mode instead, i.e. 'a'.
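
A short sketch of the difference between 'w' and 'a', assuming a throwaway file in a temporary folder (the file name notes.txt is only for illustration):

```python
import os
import tempfile

# Hypothetical file path in a temporary directory, used only for this demo.
path = os.path.join(tempfile.mkdtemp(), 'notes.txt')

with open(path, 'w') as f:   # 'w' creates the file or clears it
    f.write('first line\n')

with open(path, 'a') as f:   # 'a' appends to the existing contents
    f.write('second line\n')

with open(path) as f:
    contents = f.read()
```

Using with closes the file automatically once the block ends.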

6
Q

What does CSV stand for?

A

Comma-separated values.

The delimiter determines what separates the values. It doesn't have to be a comma; it can be almost any character.

7
Q

What is pandas?

A

“Pandas is a Python software library for data manipulation and analysis.”

“Pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with ‘relational’ and ‘labeled’ data both easy and intuitive.”

8
Q

What is matplotlib?

A

It is a Python plotting library.

9
Q

What is BeautifulSoup?

A

It is a Python library for pulling data out of HTML and XML files.

10
Q

How do you permanently set the index for a data frame?

A

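To set the index, use .set_index('Desired Index'), where the argument is the name of the column to use as the index.

By default this returns a new dataframe and leaves the original unchanged. To make the change permanent, either pass inplace=True inside the brackets or assign the result back to the variable.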
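
A small sketch with a made-up two-row dataframe (the column names Date and Close are assumptions for illustration), showing that the index only sticks when inplace=True is used or the result is assigned:

```python
import pandas as pd

df = pd.DataFrame({'Date': ['2020-01-01', '2020-01-02'],
                   'Close': [10.0, 11.0]})

# Without inplace, a new dataframe is returned and df is left untouched.
indexed = df.set_index('Date')
still_has_date_column = 'Date' in df.columns

# With inplace=True, df itself is modified.
df.set_index('Date', inplace=True)
```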

11
Q

How would you access a single column from a dataframe?

A

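In the same way you would get a value from a dictionary:

dataframe_name['Desired_Column_Name']

Or as an attribute, without brackets (this only works when the column name is a valid Python identifier and does not clash with an existing dataframe attribute):

dataframe_name.Desired_Column_Name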

12
Q

How do you convert a dataframe column into a list?

What do you need to remember?

A

dataframe_name.column_name.tolist()

Only works on one column at a time.

13
Q

If you want to print two columns of a dataframe, what should you do?

A

You can extract the columns by passing a list of both column names inside the indexing brackets, which gives double square brackets, e.g. dataframe[['Column1', 'Column2']].

Alternatively, you could convert those columns into an array using np.array(dataframe[['Column1', 'Column2']]).

14
Q

When reading in a csv file through .read_csv() how can we define the index column?

A

Within the brackets, pass index_col=0 (or whichever column position or name you want as the index).

15
Q

What does .to_csv() do ?

A

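It writes a pandas dataframe out in CSV format.

Before the full stop you place the dataframe you want to convert, and inside the brackets you pass the path of the file to write:

dataframe_name.to_csv('file_name.csv')

Called with no path, .to_csv() returns the CSV text as a string instead of writing a file.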

16
Q

How do you block comment in mac ?

A

cmd + 1

cmd + 4 comments the block but also puts two lines of hyphens above and below the block.

17
Q

How do you get data from Quandl?

A

Using quandl.get().

Within .get() you pass the dataset code (e.g. the ticker ID), your API key as authtoken, and optionally a start_date and/or end_date.

18
Q

What does pandas_datareader do ?

A

It provides up-to-date remote data access for pandas and works with multiple versions of pandas.

19
Q

How do you read a CSV file and convert it to a dataframe?

What do you need to add to convert a date column to the index?

A

pd.read_csv('file_name.csv')

To use a date column as the index, add the parameters parse_dates=True and index_col=0 inside the brackets (assuming the dates are in the first column).

20
Q

If you make any changes to the IPython console in preferences, what else do you need to remember to do?

A

Restart the IPython console kernel.

(Small cog wheel in the top right)

21
Q

If the index isn’t sorted in the right direction how can you sort it correctly?

A

.sort_index(axis = 0, inplace=True, ascending = True)

22
Q

When parsing out paragraphs from a webpage using BeautifulSoup, what do you need to use to remove the tags so that you're only left with strings?

What is the simplest way to extract the strings?

A

You can use .string or .text. The difference between the two is that .string only returns a value when the tag contains a single string; if the tag has several children (for example nested tags), .string is None.

In most cases you probably want .text.

That being said, the simplest way is to take your BeautifulSoup object, say soup, and call the method .get_text() on it, i.e. soup.get_text(). Note that .text is simply shorthand for .get_text() with default arguments, whereas .string behaves differently as described above.

23
Q

What is pickling?

Why would you use it?

How would you use it?

A

Pickling is the serializing of a Python object into a byte stream. Unpickling is the opposite: rebuilding the object from the byte stream.

Pickling is used to store Python objects. This means things like lists, dictionaries, class instances, and more.

Pickling is most useful for data analysis, when you are performing routine tasks on data, such as pre-processing. It is also used when working with Python-specific data types, such as dictionaries.

If you are loading a massive dataset into memory every time you run the program, it can make sense to pickle the data once and load that instead. It may be 50-100x faster, although the actual speed-up depends on the data and on how costly the original loading step is.

How to load a pickle?

After importing pickle, you need to open the pickled file in binary read mode.

pickle_in = open('dict_pickle', 'rb') # rb stands for read binary

example_dict = pickle.load(pickle_in)
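
A minimal round trip, assuming a small dictionary and a file in a temporary folder (the file name dict_pickle follows the card; the data is invented):

```python
import os
import pickle
import tempfile

example_dict = {'a': 1, 'b': [2, 3]}
path = os.path.join(tempfile.mkdtemp(), 'dict_pickle')

# Write: the file must be opened in binary write mode.
with open(path, 'wb') as pickle_out:
    pickle.dump(example_dict, pickle_out)

# Read: the file must be opened in binary read mode.
with open(path, 'rb') as pickle_in:
    restored = pickle.load(pickle_in)
```

Only unpickle files you trust, since loading a pickle can run arbitrary code.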

24
Q

What methods are the building blocks for scraping web pages with BeautifulSoup?

A

The building blocks are .find() and .find_all(), applied to soup (the webpage parsed by bs4). .find() returns the first tag that matches and .find_all() returns every matching tag; within the brackets you state the name of the tag you are looking for. The results are still surrounded by HTML markup, so use .text on each result to get only the string content.

25
Q

What is a quick way to extract table data from webpages?

A

Use:

pd.read_html('url')

It returns a list of dataframes, one for each table found on the page.

26
Q

How do you add pickle data to a pickle file?

A

pickle.dump(x, y)

x = the object you want to store

y = the file object you want to 'dump' it into, opened in binary write mode ('wb')

27
Q

After having extracted table data from a webpage through Beautiful Soup, how would you iterate through the table elements one by one?

A

tickers = []
for row in table.findAll('tr')[1:]:
    ticker = row.findAll('td')[0].text
    tickers.append(ticker)

1st line of the loop: iterate through each table row, except for the top row, as it holds the column labels.

2nd line: find all table data cells for this row (hence td), take the first cell, and convert it to text. findAll() returns a list, so you index or slice it to pick the columns you want before calling .text on a cell.

28
Q

If you want to open an already existing pickle file, what do you need?

A

pickle.load(x)

x = the pickle file you want to access, as a file object opened in binary read mode ('rb')

The result can be assigned to a variable to keep it.

29
Q

How would you make a new directory?

A

os.makedirs('x')

x = the path of the directory to create; any missing intermediate directories are created as well.

30
Q

If you consistently use a particular module from a library, what would you do to make writing your code more efficient?

A

Import the module and give it a shortened name using 'as'.

For example, since the pyplot module of matplotlib is heavily used, it is common to import it under the name plt.

i.e. import matplotlib.pyplot as plt

31
Q

When using matplotlib, what do you need to do to make plt.legend() work?

A

You need to label your plots: after the x and y data, pass the keyword argument label to each plot call, e.g. plt.plot(x, y, label='Price').

32
Q

How do you read a csv file in python?

A

You read a CSV file by opening it and passing the open file object to csv.reader(csv_file, delimiter=',').

The delimiter is the character that separates the values. In the case above they are separated by a comma.
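
A brief sketch of csv.reader, using an in-memory text buffer in place of a real file so that it runs on its own (the sample rows are invented; with a real file you would pass the object returned by open()):

```python
import csv
import io

# Stand-in for an opened CSV file.
csv_file = io.StringIO('name,price\napple,1.20\npear,0.80\n')

# Each row comes back as a list of strings.
rows = list(csv.reader(csv_file, delimiter=','))

header, data = rows[0], rows[1:]
```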

33
Q

How can you use numpy to load data from files ?

A

import numpy as np

np.loadtxt('File_name.type', delimiter=',', unpack=True)

Note: the file does not have to be a .txt; it can be a .csv or any other text file containing numeric data.

unpack=True transposes the result so that each column can be assigned to its own variable, e.g. x, y = np.loadtxt(...) for a two-column file.
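
A small sketch of unpack=True, assuming a two-column numeric file; an in-memory buffer stands in for the file here and the numbers are invented:

```python
import io

import numpy as np

# Stand-in for a two-column text file.
text_file = io.StringIO('1,10\n2,20\n3,30\n')

# unpack=True returns the columns separately instead of one 2-D array.
x, y = np.loadtxt(text_file, delimiter=',', unpack=True)
```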

34
Q

What does .split() do ?

A

When applied to a string, it returns a list of the words in the string, splitting on whitespace by default. You can pass a different separator inside the brackets.

35
Q

How do you open URLs using the urllib library ?

A

urllib.request.urlopen()

Inside the brackets you pass the URL as a string, i.e. within quotes.

36
Q

Using the os library how do you return the current working directory from a python script?

A

os.getcwd()

It's easier to remember the function if you look at what cwd abbreviates: current working directory.

37
Q

What is sys.argv ?

A

sys.argv lets a script receive the arguments passed to it on the command line from the terminal.

It is a list in Python that contains the command-line arguments passed to the script, with the script name as the first element.

38
Q

What library would you use to search in a body of text?

How would you find all the numbers in a text?

A

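Use regular expressions, available in Python as the re module.

re.findall(r'\d+', x)

x = text variable

\d matches a single digit, so r'\d' returns every digit separately; r'\d+' returns each run of digits as one number (as a string).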
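
To make the difference between the two patterns concrete, here is a short sketch on an invented sentence:

```python
import re

text = 'Order 66 shipped 3 items on day 12'

# \d matches one digit at a time.
digits = re.findall(r'\d', text)

# \d+ matches each run of consecutive digits as a whole.
numbers = re.findall(r'\d+', text)

# The matches are strings; convert them if you need integers.
as_ints = [int(n) for n in numbers]
```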

39
Q

If you want to POST to an URL what are the necessary steps that you need to take?

What changes when you want to do GET request?

A
  1. You first need to define the variables that you intend to post in a dictionary, referred to here as values.
  2. For the server to understand the values, they need to be URL-encoded using data = urllib.parse.urlencode(values). There's another encoding step after that, encoding to UTF-8 bytes, i.e. data = data.encode('utf-8').
  3. Once the data is encoded, the next step is to build a request to the URL carrying your values: req = urllib.request.Request(url, data).
  4. The following step is to open the URL with the request, which returns a response. This is assigned to the variable resp, i.e. resp = urllib.request.urlopen(req).
  5. Finally, to see the response, apply .read() to resp.

A GET request is pretty similar to step 4 above. It uses the same base call, urllib.request.urlopen(), with no data attached, and the bytes that come back need to be decoded. The code should look like this: urllib.request.urlopen(website_url).read().decode()
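
The POST steps can be sketched as follows. The URL and the form fields are placeholders, and the network call itself is left commented out so the snippet runs offline:

```python
import urllib.parse
import urllib.request

url = 'https://example.com/search'     # placeholder URL
values = {'q': 'python', 'page': '1'}  # placeholder form fields

# Steps 1-2: URL-encode the dictionary, then encode to bytes.
data = urllib.parse.urlencode(values).encode('utf-8')

# Step 3: attaching data makes this a POST request.
req = urllib.request.Request(url, data)
method = req.get_method()

# Steps 4-5 (need network access):
# resp = urllib.request.urlopen(req)
# body = resp.read()
```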

40
Q

If you want to combine multiple plots on the same grid, what function in plt do you need to use?

If you want to graph two plots on the same grid of 6 row pieces and 1 column piece, with ax1 taking up 5 rows across the 1 column and ax2 taking the rest, what would the code be?

A

plt.subplot2grid(x, y)

x is a tuple stating the number of rows and columns in the grid; y is a tuple specifying the (row, column) origin of the plot.

ax1 = plt.subplot2grid((6, 1), (0, 0), rowspan=5, colspan=1)

ax2 = plt.subplot2grid((6, 1), (5, 0), rowspan=1, colspan=1)

Note: you need to remember to adjust the start point. Rows and columns are counted from 0, so ax2 starts at row 5 of the only column, column 0.

41
Q

If you have defined a subplot called ax1, how do you access the labels to change them (not to change the name, to rotate, etc.)?

A

ax1.xaxis.get_ticklabels()

If you want to access the y axis, just change xaxis to yaxis.

42
Q

If you want to plot OHLC candles in python what do you need to import?

A

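You need to import candlestick_ohlc. In older versions of matplotlib this lived in matplotlib.finance:

from matplotlib.finance import candlestick_ohlc

The matplotlib.finance module has since been removed from matplotlib; the same function is available from the separate mpl_finance package (from mpl_finance import candlestick_ohlc).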

43
Q

How do you add text to a graph ax1 based on matplotlib?

A

Two options:

ax1.annotate()
ax1.text()

For ax1.annotate(), the first parameter is what you want to annotate with. It needs to be a string, so ints and floats need to be converted to strings. The second parameter is where you want to annotate, given as an (x, y) point. If you're using candlesticks you can pick a specific candle and choose where on it to annotate (open, high, low or close).

44
Q

With deep learning, how should you approach testing?

A

The price data is split up into the training set and a test set.

The model is built on the training set and then applied to the unseen test set to see if similar results are obtained.

45
Q

How is .loc[] used?

A

It is an indexer applied to a dataframe, say df, to access a group of rows or columns by their labels. It is used with square brackets rather than called like a function, e.g. df.loc['label'].

Note that placing one label in .loc[] returns the values in that row (or column) as a series.

If a list of labels is passed, then a dataframe is returned.
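
A minimal sketch of label-based selection, assuming a small made-up dataframe with string row labels:

```python
import pandas as pd

df = pd.DataFrame({'Open': [1, 2, 3], 'Close': [4, 5, 6]},
                  index=['a', 'b', 'c'])

row = df.loc['a']            # one label: a series
rows = df.loc[['a', 'c']]    # list of labels: a dataframe
column = df.loc[:, 'Close']  # all rows, one column: a series
```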

46
Q

How do you parse webpage content using Beautiful Soup?

How is this applied to tables?

A

With import bs4 as bs.

To parse the content we first need to convert the page source into a Beautiful Soup object. The object is created by calling the BeautifulSoup class from the bs4 library. By convention the object is assigned to the variable soup, i.e. soup = bs.BeautifulSoup(text, 'lxml').

With the content now held in a Beautiful Soup object, the object's methods can be used to search it.

One of the most common methods is find_all(). It is called on soup and lets you filter specific content based on HTML tags.

For example, if you want to extract all the links in the webpage you can write soup.find_all('a').

To find a whole table you can call soup.find('table', {'class': 'wikitable sortable'}), which here looks for a table with the class used on Wikipedia pages. From there you can use find_all() to go through the table rows ('tr'), and within each row you can access the table data ('td').

*Note: Beautiful Soup does not acquire web page content; this needs to be done using urllib or requests.*

47
Q

What are the arguments for using requests over urllib?

A

The requests package allows you to do what urllib does, but more succinctly.

It only takes one line to get content from a URL:

resp = requests.get('url')

Similarly, posting information to a URL is a lot shorter. To post, requests.post() simply takes a dictionary as the data argument.

search_data = {'search': 'Hello World'}

resp = requests.post('url', data=search_data)

48
Q

How does .join() work for pandas?

A

It joins the columns of another dataframe onto the calling dataframe, either on the index or on a key column.

There are optional parameters to customize the join. One important parameter to consider is 'how', which controls how the join is made. The default is left, which means that the calling frame's index is used. Right is the opposite and uses the other frame's index. Outer forms the union of the calling frame's index with the other's and sorts it lexicographically. Lastly, inner is the opposite of outer: it forms the intersection of the two indexes.

49
Q

What happens when you apply .values to a pandas dataframe?

A

A numpy representation of the dataframe is returned.

50
Q

If you access .shape on a numpy array, what is returned?

A

A tuple with the numpy array dimensions. Note that .shape is an attribute, so it is written without brackets.

51
Q

What does numpy.arange(x) do ?

A

It returns evenly spaced values within a given interval. With a single argument x, the values are the integers from 0 up to, but not including, x.

52
Q

What does ax.xaxis.tick_top() do?

A

It moves the ticks and tick labels (if present) to the top of the axes.

53
Q

What does pandas.DataFrame.columns do ?

A

Returns the column labels of the dataframe.

54
Q

When you need to remove a column from a dataframe using .drop() what does the axis parameter need to be set to?

A

To remove a column you need to set axis equal to 1.

55
Q

When using .drop() on a dataframe and you keep the inplace parameter on the default False, what will happen?

A

Leaving inplace as False does not change the dataframe itself; .drop() returns a new dataframe with the column or row removed. To change the underlying dataframe you need to set inplace to True, or assign the result back to the variable.

One way to view it is that you want your changes to stay in place, which is why you set it to True.

The default value of False for inplace is useful as it allows you to test the changes before making permanent changes.

56
Q

What are two functions you can use to create heatmaps with matplotlib?

A

imshow()

&

pcolormesh()

57
Q

*args and **kwargs

What are they used for?

A

They are mostly used in function definitions.

*args and **kwargs allow you to pass a variable number of arguments to a function. In other words, the number of arguments is decided by the caller.

*args collects the extra positional arguments into a tuple (similar to a list, but immutable).

**kwargs collects the extra keyword arguments into a dictionary, i.e. name="potato", where name is the keyword and potato the value.

A good way to remember what **kwargs does is to remember that 'kw' stands for keyword, so essentially it's **keywordargs.
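
A tiny sketch showing what a function actually receives; the function name describe is made up for the example:

```python
def describe(*args, **kwargs):
    # args is a tuple of positional arguments,
    # kwargs is a dict of keyword arguments.
    return args, kwargs

positional, keyword = describe(1, 2, name='potato')
```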

58
Q

How do you add columns to a pandas dataframe?

A

It works essentially the same way as with dictionaries: you put the new column name in index brackets on the dataframe and assign to it the values you want in the column.

59
Q

For machine learning, explain features and labels.

A

Simply put, a feature is an input; the label is an output.

A feature is a single column of data in your input set. For example, if you're trying to predict what sort of degree someone might choose, your input features might be gender, region, family income, etc. The label is the final choice.

After the model has been trained, give it a new set of inputs for the features and it should return a predicted label.

60
Q

What does Counter do?

A

collections.Counter counts the occurrences of each item in a list (or any other iterable of hashable items) and returns a dictionary-like object that maps each item to its number of occurrences.
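
A quick sketch with an invented list of strings:

```python
from collections import Counter

words = ['buy', 'sell', 'buy', 'hold', 'buy']

counts = Counter(words)

# Most frequent item first; missing keys count as 0.
top = counts.most_common(1)
missing = counts['short']
```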

61
Q

Quantopian: What does the initialize function do?

A

The initialize function runs once when the script starts.

It takes in one parameter which is context.

Context is a Python dictionary that stores a bunch of data on your strategy (your portfolio, your performance, leverage, other info about you, etc.).

When using Quantopian the initialize function needs to be defined, but it does not need to be called in the script.

62
Q

Quantopian: what is the history() method and what are its input parameters?

A

The history() method returns the price (or volume, etc.) for the specified asset over the past x bars, depending on the bar_count and frequency chosen.

Note: the result is returned as a pandas object, such as a dataframe.

Input parameters: asset (e.g. the stock), field (type of data, price or volume?), bar_count (how many bars do you want), frequency (time period).

63
Q

Quantopian: How can you pull price data?

A

You can get price data using data.history().

64
Q

Finance: What does alpha represent?

A

Alpha represents the performance of a portfolio relative to a benchmark.

In other words, alpha is a measure of the return on investment that is not a result of general movement in the market.

65
Q

Finance: What is beta?

A

Beta is a measurement of the volatility of an asset's returns relative to the market.

It is used as a measurement of risk.

A higher beta means greater risk, but also greater expected returns.

β = 1, exactly as volatile as the market.

β > 1, more volatile than the market.

0 < β < 1, less volatile than the market.

β = 0, uncorrelated to the market.

β < 0, negatively correlated to the market.

66
Q

Quantopian: How do you run your own function in quantopian?

A

Using schedule_function(), called inside the initialize function.

You need to place your function as a parameter within schedule_function().

You can also define how often it runs, hourly, weekly, monthly, etc.

Also, when it runs relative to the market open. For example, you can make it offset so that it only runs 1 hour after the market opens.

67
Q

What does the blaze ecosystem allow for?

A

It provides python users high-level access to efficient computation on inconveniently large data.

68
Q

Quantopian: what does blaze.compute() do?

A

It evaluates a Blaze expression and returns the result, which in this context is a pandas dataframe.

69
Q

What is rolling().apply() and how is it used?

A

rolling().apply() lets you apply a function to each rolling window of values in a data set, rather than to individual values. It is typically used to compute a custom statistic over a moving window of a dataframe column.

rolling().apply() applied to a dataframe:

pandas_dataframe_name.rolling(window).apply(function)
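
A minimal sketch of a rolling apply, assuming a window of 3 rows and a made-up function that returns the high-low range of each window (the column names are also invented):

```python
import pandas as pd

df = pd.DataFrame({'Close': [1.0, 3.0, 2.0, 5.0]})

def window_range(window):
    # Called once per window with that window's values.
    return window.max() - window.min()

# The first two rows have no full window yet, so they are NaN.
df['Range3'] = df['Close'].rolling(3).apply(window_range)
```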

70
Q

With sys.argv how do you make sure you only access the command line arguments?

A

You need to slice it to remove the program name from the beginning.

arguments = sys.argv[1:]

71
Q

What does sys.exit() do?

How is it different to break?

A

sys.exit() can be used anywhere and causes the entire program to end (it does this by raising the SystemExit exception).

break is only used in loops. It causes the loop to end, but if there is code after the loop the program continues with it.
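
A short sketch of the difference. To keep the example running past sys.exit(), the SystemExit exception is caught here, which you would not normally do:

```python
import sys

found = None
for n in [1, 2, 3, 4]:
    if n == 3:
        found = n
        break  # leaves the loop only

after_loop = 'still running'  # this line is reached after break

try:
    sys.exit(2)  # would normally end the whole program
except SystemExit as exc:
    exit_code = exc.code
```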

72
Q

How do you convert unix time to readable datetime?

A

df['Date'] = pd.to_datetime(df['Date'], unit='s')

73
Q

If you get the error ‘int’ object has no attribute ‘__toordinal__‘ when using df[‘Date’].apply(mdates.date2num), what has happened?

A

The dataframe's 'Date' column is not of the datetime type 'datetime64'; it still holds integers (Unix timestamps). It needs to be converted first using: pd.to_datetime(df['Date'], unit='s').

74
Q

If you have a dataframe with dates of type string how do you convert it into a datetime object?

A

Using pd.to_datetime(x)

x = the date column

There are also a lot of optional parameters that can be added.

75
Q

What do you need to remember about using pd.to_datetime()?

A

to_datetime() is capable of converting string dates to datetime objects as long as the strings follow a consistent format, in other words a consistent pattern.

If there are anomalies, they throw the function off.

76
Q

If you need to convert a string date to a mdate how would you do it ?

A

df['Column_name'].apply(mdates.datestr2num)

This converts a string date to a numeric date (which is what mdates uses).

77
Q

What is the easiest function to use to plot candlesticks in Python?

What do you need to remember about it?

A

candlestick2_ohlc

It does not take dates.

78
Q

When using matplotlib if the dates are overlapping in a plot how do you fix it?

A

Rotate the dates using:

plt.xticks(rotation=x).

45 or 60 degrees would work well.

79
Q

If the x-axis is showing mdates, how do you convert it to a regular datetime representation?

A

ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))

Where ax refers to the plot you want this to apply to.

80
Q

When using pd.to_datetime on a dataframe date column and you get inconsistent date values returned (i.e. random dates that are a lot larger or smaller than the dates preceding and following it), how do you solve this issue?

A

What is happening is that to_datetime is incorrectly identifying the date format, so you need to define the format.

One of the optional parameters of to_datetime is ‘format’.

Example: format='%d/%m/%y'.

Note: if the date column has years that are truncated (i.e. 15 for 2015), then you need to use a lowercase %y for the year; an uppercase %Y expects the full four-digit year.
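
A small sketch of passing an explicit format, assuming day/month/two-digit-year strings (the sample dates are invented):

```python
import pandas as pd

df = pd.DataFrame({'Date': ['01/02/15', '03/02/15']})

# %d = day, %m = month, %y = two-digit year.
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%y')

first = df['Date'].iloc[0]
```

Without the format, '01/02/15' could just as easily be read as 2 January.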

81
Q

If you want to pull a certain column (or columns) to create a new dataframe, how would you do that?

A

Index the dataframe you want to pull the column from with two sets of brackets, i.e. df[['Close']].

If you only use one set, you get a pandas series.
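
The difference can be checked directly; the dataframe below is a made-up example:

```python
import pandas as pd

df = pd.DataFrame({'Open': [1.0, 2.0], 'Close': [1.5, 2.5]})

close_frame = df[['Close']]  # two sets of brackets: dataframe
close_series = df['Close']   # one set of brackets: series
```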

82
Q

What is the easiest way to add new columns to a dataframe?

A

On the LHS use index brackets on the dataframe with the name of the column inside, and assign to it the values you want in the column.

83
Q

How do you add NaN values to a list?

A

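Use:

None

Strictly speaking None is Python's null value rather than NaN, but pandas treats it as missing data and converts it to NaN in numeric columns. For an actual NaN value use float('nan') or np.nan.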

84
Q

If you’re going to use a list in multiple variables, what do you need to do and why?

A

You're going to need to copy the list.

If you don't, the variables will all be linked to the same list, so a change made through one variable shows up in all the others.
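
A short sketch of the aliasing problem and the fix (variable names are illustrative):

```python
original = [1, 2, 3]

alias = original               # same list object, two names
independent = original.copy()  # a separate list

alias.append(4)                # also changes original
```

For lists that contain other lists, copy.deepcopy() is needed to copy the inner lists as well.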

85
Q

What does zip() allow you to do?

A

It allows you to iterate through more than one tuple, list or dictionary at once.

For example, this allows you to pull values from two different lists at once and use them for calculations or aggregate them.

Bear in mind that zip() returns an iterator of tuples, where each tuple holds one element from each of the iterables.
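
For instance, with two invented lists of equal length:

```python
prices = [10.0, 20.0, 30.0]
quantities = [1, 2, 3]

# Walk both lists in step.
totals = [p * q for p, q in zip(prices, quantities)]

# zip() is lazy; wrap it in list() to see the tuples.
pairs = list(zip(prices, quantities))
```

If the inputs differ in length, zip() stops at the shortest one.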

86
Q

Remind yourself, what are the slicing rules again?

A

The start value is inclusive, the end value is exclusive.

So remember, that start value is included, end isn’t.
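
A one-line check of the rule on a made-up list:

```python
letters = ['a', 'b', 'c', 'd', 'e']

# Index 1 is included, index 4 is not.
middle = letters[1:4]
```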

87
Q

What is the relation between a figure and subplots?

A

A figure can have multiple subplots.

88
Q

Say you download a csv file, the dates are strings and they are formatted weirdly. What do you need to do to convert them into datetime objects?

A

First of all, remove any characters that interfere with the identification of time values, for example AMs and PMs. Do this with strip().

Next, the date column needs to be passed through pd.to_datetime(). The first parameter is the column and the next parameter is the format of the date. Make sure the format matches the string dates exactly, including any spaces, hyphens or slashes between the values.

89
Q

What do you need to remember about ylabels, xlabels, xticks, etc, when you have multiple subplot2grid calls in one figure?

A

Any plt settings specific to one plot, like ylabels, xlabels, xticks, etc., need to be defined before the next subplot2grid call, because plt functions act on the most recently created (current) axes.

For example, say you’re creating a plot with volume, rsi and price data. If you want the date for the Volume plot to be rotated you need to define plt.xticks(rotation=45) right after you write ax = plt.subplot2grid().

90
Q

If datetime ticks of one plot are unnecessary, how do you remove it?

A

plt.setp(ax2.get_xticklabels(), visible = False)

ax2 just refers to the plot that this applies to.

91
Q

How to create an empty dataframe?

A

pd.DataFrame()

92
Q

How do you sort the index of a dataframe from smallest to largest?

A

use df.sort_index()

the parameters you need to add inside the brackets are ascending = True and inplace = True.

Ascending set to true specifies that you want the index to be sorted from smallest to largest.

Inplace set to true means that you want this change to be permanent.

93
Q

When applying methods that you have never applied before, what do you always need to check to save time?

A

Check if the method has an inplace parameter.

If it does, then you need to set inplace to True (or assign the result back to the variable) if you want your changes to last.

94
Q

How can you create arrows without using the arrow() function?

A

Using annotate()

plt.annotate('text', xy=(x, y), xytext=(x, y), arrowprops=dict(arrowstyle='-|>', color='r', lw=1.5))

xy is the point the arrow points to and xytext is where the text is placed.

95
Q

How do you reference a pandas dataframe column by index?

A

df.iloc[:, n]

n is the position of the column that you want to use, counting from 0.

The first entry inside the brackets selects the row or rows that you want from the column; : selects every row, i.e. the whole column.

96
Q

How do you use an if/else statement in a list comprehension?

A

It needs to be put before the for clause.

Example: Negative_Returns = [i if i < 0.0 else 0.0 for i in Return].

The way it reads is: for each i in Return, if i is less than 0 then use i, else use 0.0.
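
A runnable version of the example, with invented return values. It also shows the other position for if, after the for clause, which filters items out instead of replacing them:

```python
returns = [0.02, -0.01, 0.0, -0.03]

# if/else before the for: every item is kept or replaced.
negative_returns = [r if r < 0.0 else 0.0 for r in returns]

# if after the for: items that fail the test are dropped.
only_negative = [r for r in returns if r < 0.0]
```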

97
Q

If you are getting an ‘HTTPError: Not Found’ trying to request data from a site using urllib or requests what do you need to do?

A

This is often a simple bot prevention mechanism.

To get past the block, the request needs to look like it comes from a web browser.

This can be done by manually setting the headers of the request.

For example, headers = {'User-Agent': 'Mozilla/5.0 (iPad; U; CPU OS 3_2_1 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Mobile/7B405'}

If you are using requests.get(), this dictionary is passed to the 'headers' parameter.

98
Q

What sort of iterating function can you never use if you want to modify a list (say remove certain elements)?

A

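You should not use a for loop over the list you are modifying: removing elements shifts the remaining ones, so the loop skips some of them.

You can instead use a while loop, or build a new list that keeps only the elements you want.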
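
A small demonstration of the problem on an invented list, followed by one safe alternative (building a new list):

```python
values = [1, 2, 2, 3]

# Removing while looping: the second 2 is skipped.
for v in values:
    if v == 2:
        values.remove(v)

# Safe alternative: keep only the wanted elements.
kept = [v for v in [1, 2, 2, 3] if v != 2]
```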

99
Q

After having requested the source code of a site how do you go about finding a table and iterating through the rows?

A

First, you need to find the table tag; you can do that with .table, which returns the first table in the page.

Apply .table to your Beautiful Soup object, i.e. table = soup.table.

The next step is to find all the table rows. This can be done by calling find_all('tr') on the table variable assigned above.

The last step is to iterate through the table data 'td' in each of the rows. Again, this can be done using the find_all('td') method.

This data can then be appended to a pandas dataframe or

100
Q

How do you remove empty list entries?

A

.remove('') needs to be used. The issue with this method is that it only removes the first match, so to remove all the empty entries you need to use a while loop.

The while loop tests whether there are any empty entries left in the list, i.e. while '' in NVT_Ratio:
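
The loop can be sketched as follows, with an invented list; a list comprehension is shown as an alternative that builds a new list instead of editing in place:

```python
nvt_ratio = ['1.5', '', '2.0', '', '']

# Keep removing the first empty string until none remain.
while '' in nvt_ratio:
    nvt_ratio.remove('')

# Alternative: filter into a new list.
cleaned = [x for x in ['1.5', '', '2.0'] if x != '']
```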

101
Q

How do you plot two different datasets with the same x axis but different magnitudes for the y-axis on the same plot?

A

First create a figure and axes with plt.subplots().

Plot the first dataset to the axes, e.g. ax1.plot(data)

Then you need to state that you want a second y-axis scale on the plot using twinx(). This method is called on the first axes, ax1, and its result is assigned to a new axes, ax2, which shares the x axis.

Lastly, plot the second dataset on the second axes.
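A minimal sketch of these steps, assuming matplotlib is installed. The data and axis labels are made up, and the Agg backend is chosen only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt

# Made-up data sharing one x axis but with very different magnitudes.
x = [1, 2, 3, 4, 5]
small = [0.1, 0.4, 0.3, 0.8, 0.6]
large = [1000, 2500, 1800, 4000, 3200]

fig, ax1 = plt.subplots()
ax1.plot(x, small, color="tab:blue")
ax1.set_ylabel("small values")

# twinx() creates a second axes that shares the x axis with ax1.
ax2 = ax1.twinx()
ax2.plot(x, large, color="tab:red")
ax2.set_ylabel("large values")

# In an interactive session you would finish with plt.show().
```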

102
Q

What does plt.subplots() return?

A

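It returns a tuple of a figure and an axes, usually unpacked as fig, ax = plt.subplots(). If you ask for more than one subplot, the second item is an array of axes.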

103
Q

If you only want to remove trailing zeros what do you need to use?

A

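rstrip('0') instead of strip('0').

rstrip only strips from the right-hand end of the string, whereas strip strips from both ends.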

104
Q

What is pd.Timestamp and what does it do?

A

It is the pandas equivalent of datetime in Python, and it is interchangeable with it in most cases.

So pd.Timestamp can be used to convert values into timestamps (datetime).

The key thing to note is that they are essentially equivalent.

105
Q

If you’re scraping large amounts of data from the internet, what things should you do to increase the efficiency of the code?

A

First and foremost, make sure you’re working with pandas. It is designed for data analysis, so it is usually a quicker and more effective way of handling tabular data than plain lists.

Secondly, save the dataframe in a csv and give the user the option of using the data in the csv currently or refreshing the data and giving a new csv file. This prevents constant scraping of the website when it isn’t necessary.

Thirdly, try to use pickling to save your data.

106
Q

How do you add a dataframes column to another dataframe?

A

You can write this:

df['Column_Name'] = df_2['Column_Name']

but pandas aligns the assigned series on the index, so any row of df whose index label is missing from df_2 ends up as NaN.

If the two dataframes have the same number of rows in the same order but different indexes, add .values at the end of the RHS.

Why? Without .values, indexing the column gives a pandas series, which carries its own index and is matched label by label. With .values the series is converted into a numpy array, which is assigned by position.
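A small sketch of the difference, assuming pandas is installed. The column names and values are made up. Assigning a series lines rows up by index label, while .values assigns by position.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})                        # index 0, 1, 2
df_2 = pd.DataFrame({"b": [10, 20, 30]}, index=[5, 6, 7])  # different index

# Series assignment aligns on index labels: none match here, so all NaN.
df["aligned"] = df_2["b"]

# .values drops the index, so the data is assigned row by row in order.
df["by_position"] = df_2["b"].values

print(df)
```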

107
Q

How do you create an empty data frame?

A

You can create an empty dataframe by simply writing

pd.DataFrame(columns=['Column_One'])

108
Q

How do you select rows out of a dataframe that are equal to some value, values or combination of the two?

A

To select rows that are equal to a specific value:

df.loc[df['Column_Name'] == value]

To select rows whose column value is in an iterable (multiple values to test for):

df.loc[df['Column_Name'].isin(some_values)]

If you want to combine multiple conditions:

df.loc[(df['Column_Name'] == value) & (df['Column_Name'].isin(some_values))]

*You need to put an & sign between the conditions and wrap each condition in its own parentheses*
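A short sketch of the three selections on a made-up dataframe, assuming pandas is installed. The column names and values are hypothetical.

```python
import pandas as pd

# Invented example data.
df = pd.DataFrame({
    "city": ["Oslo", "Rome", "Oslo", "Lima"],
    "year": [2019, 2020, 2021, 2021],
})

# Rows equal to one value.
oslo = df.loc[df["city"] == "Oslo"]

# Rows whose value is in a list of candidates.
recent = df.loc[df["year"].isin([2020, 2021])]

# Both conditions at once: each condition in parentheses, joined with &.
both = df.loc[(df["city"] == "Oslo") & (df["year"].isin([2020, 2021]))]

print(both)
```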

109
Q

If you just used loc on a dataframe to create a dataframe with values that satisfy the condition passed through what do you need to remember about indexing?

A

The index values are carried over from the previous dataframe, so the first row will not necessarily have index 0; it keeps the index value it had in the unfiltered dataframe.

If you want the index label of the first row, use .index and apply the positional index value to that, i.e.

filtered_df['Column'].index[0]

To get the first row itself by position use filtered_df.iloc[0], or call filtered_df.reset_index(drop=True) to renumber the rows from 0.

110
Q

If you are getting an invalid syntax error but the code seems fine, what should you check?

A

Check whether you got any brackets wrong in the lines before the error.

For example, a missing closing bracket.

111
Q

How can you compare the speed of functions in python?

A

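Put %timeit before the function call.

%timeit is an IPython/Jupyter magic command; in a plain Python script use the timeit module from the standard library instead.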
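Outside IPython, a rough comparison can be sketched with the standard-library timeit module. The two functions below are made-up examples, and the timings will vary from machine to machine.

```python
import timeit

def square_with_pow(values):
    return [v ** 2 for v in values]

def square_with_mul(values):
    return [v * v for v in values]

data = list(range(1000))

# Run each function 200 times and report the total time in seconds.
t_pow = timeit.timeit(lambda: square_with_pow(data), number=200)
t_mul = timeit.timeit(lambda: square_with_mul(data), number=200)

print(f"v ** 2 : {t_pow:.4f} s")
print(f"v * v  : {t_mul:.4f} s")
```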

112
Q

What did you learn about applying %timeit to df**2, np.square(df), np.power(df, 2)?

A

Specialised functions for a specific task tend to outperform general functions (it’s a good habit to check with %timeit anyway).

In that test np.square outperformed the rest for squaring and np.power was the slowest (likely because ** is better optimised for a power of 2).

That being said, for float powers np.power is the better choice. This is because np.square only finds the square, and when non-integer exponents were used with ** it was slower than np.power.

113
Q

What is the way of appending values from a loop to a dataframe that are not dataframe or series objects?

A

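for i in data:

    df = df.append({'Column_Name': i}, ignore_index=True)

Pass a {dictionary} that maps the column you want to append to onto the value you want to append.

When appending a dictionary you also have to set ignore_index to True.

Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0. On current versions, collect the values in a list of dictionaries and use pd.DataFrame or pd.concat instead.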
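DataFrame.append is no longer available from pandas 2.0 onwards, so here is a sketch of the same idea with pd.concat, assuming pandas is installed. The starting dataframe and the loop values are made up.

```python
import pandas as pd

# Invented starting data and values to add.
df = pd.DataFrame({"Column_Name": [0]})
data = [3, 1, 4]

# Collect one dictionary per new row, then add them all in a single step.
rows = [{"Column_Name": i} for i in data]
new_rows = pd.DataFrame(rows)

# ignore_index=True renumbers the combined rows 0..n-1.
df = pd.concat([df, new_rows], ignore_index=True)

print(df)
```

Building the rows first and concatenating once is also faster than growing a dataframe inside the loop, because each concatenation copies the data.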

114
Q

What is the shortcut to keyboard interrupt in python?

A

ctrl+c

Make sure you don’t have anything selected in the script.

Click on the IPython console to make sure ctrl+c works.

115
Q

If you want to make a dataframe column that consists of two other existing columns (e.g. adding or multiplying them together), what is the easiest way?

A

The best way is to simply apply the operation between the two dataframe columns and then assign it to the column name in the dataframe.

Pandas understands that if you apply an operation between two columns, you want the operation applied row by row.

116
Q

What does pd.Series.dt.year do?

A

It gives you only the year of each datetime value in the series.

117
Q

How do you get user input in python?

A

You can use the input() function.

Inside the brackets, you can add text that prompts for a certain input response. Note that input() always returns a string.

118
Q

How can you calculate the standard deviation of data in python without having to implement it yourself?

A

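You can use the statistics module to calculate the standard deviation:

x = statistics.stdev(data)

Note that stdev gives the sample standard deviation; statistics.pstdev gives the population standard deviation.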
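A short sketch with made-up data, showing both the sample and the population version from the statistics module.

```python
import statistics

# Invented data set with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

sample_sd = statistics.stdev(data)       # divides by n - 1
population_sd = statistics.pstdev(data)  # divides by n

print(sample_sd)
print(population_sd)
```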

119
Q

How do you import everything from a library without having to write the library name to call the functions?

A

from library_name import *

* means you want to import everything.

120
Q

What is an effective way of getting terminal access to a specific directory or folder?

A

On a Mac, right click on a folder in Finder, go to Services and choose ‘New Terminal at Folder’.

121
Q

What do you need to do to start your nodejs local server?

A

In the terminal associated with the appropriate folder, you need to write, node index.js

Note it does not necessarily need to be called index.js.

122
Q

How do you break a terminal command?

A

ctrl + c

123
Q

Node JS: how do you copy one file into another using terminal?

A

using cp

example, cp config.example.js config.js

124
Q

What is the easiest way to do multi-line prints?

A

Put triple quotes (''' ''') inside the brackets.

e.g.

print('''
This is a
multi-line print
''')

125
Q

What does sys.argv allow you to do?

A

It allows you to pass values into a Python script from outside, say from the terminal. sys.argv is a list of strings: sys.argv[0] is the script name and the remaining items are the arguments you typed after it.

You need to remember to have the terminal open in the folder/directory of the file you want to work with (or give the full path to the script).

This allows you to pass values from anywhere to python.
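A minimal sketch of reading command-line values. The script name greet.py, the function parse_args and its defaults are all hypothetical; an explicit list stands in for sys.argv so the example behaves the same however it is run.

```python
import sys

def parse_args(argv):
    """Return (name, count) from an argv-style list of strings."""
    # argv[0] is the script name; the real arguments start at index 1.
    name = argv[1] if len(argv) > 1 else "world"
    # Arguments always arrive as strings, so convert numbers explicitly.
    count = int(argv[2]) if len(argv) > 2 else 1
    return name, count

# In a real script you would call parse_args(sys.argv).
# Here we simulate running:  python greet.py Ada 3
name, count = parse_args(["greet.py", "Ada", "3"])

print("Script name seen by Python:", sys.argv[0])
print(f"Hello {name}! " * count)
```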

126
Q

How do you assign a value to a specific cell in a dataframe?

A

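Refer to the row and the column in a single indexer and assign to it, e.g. df.loc[row_label, 'Column_label'] = value, or by position with df.iloc[-1, df.columns.get_loc('Column_label')] = value. Avoid assigning through the chained form df['Column_label'].iloc[-1], which may write to a temporary copy and leave df unchanged.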

127
Q

What is the easiest way to create a timer in python?

i.e. compare a start time with a current time

A

This can be done with time.time().

First assign time.time() to a start variable outside of the loop.

Then, inside the loop assign a new time.time() variable. This variable in the loop will always give the current time.

Next, compare the two time variable by finding the difference between the two, then apply this to a conditional operator to
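A sketch of such a timer. The card’s last sentence is cut off, so what the condition triggers is an assumption: here the loop simply stops once a made-up time limit has passed.

```python
import time

TIMEOUT_SECONDS = 0.05  # hypothetical limit chosen for the example

start = time.time()  # taken once, outside the loop
iterations = 0

while True:
    now = time.time()      # current time on every pass
    elapsed = now - start  # difference between the two
    if elapsed >= TIMEOUT_SECONDS:
        break              # assumed action: stop when the limit is reached
    iterations += 1
    time.sleep(0.01)       # stand-in for real work

print(f"Stopped after {elapsed:.3f} s and {iterations} iterations")
```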

128
Q

What do you need to remember about function parameters and global variables?

A

It is best not to name your function parameters after any global variables you have defined, as the two are easy to confuse when using the function.

For example, take a function with a parameter called order_id and a global variable also named order_id. Inside the function, order_id always refers to the parameter, which hides the global variable. The global value only reaches the function if you pass it in when you call it. If the parameter has a default such as None and you leave the argument out, the function quietly works with ‘None’ instead of the global value.

129
Q

What do you need to remember about writing default arguments for functions?

A

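Make sure they are immutable. For example, instead of using a list define a tuple, or use None and create the list inside the function. Default values are created once, when the function is defined, so a mutable default that gets changed keeps the changed value on later calls, which can quietly cause a bug.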
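A small sketch of the bug and the usual None-based fix. The function names are made up for illustration.

```python
def add_item_buggy(item, items=[]):
    # The same list object is reused on every call that omits items.
    items.append(item)
    return items

def add_item(item, items=None):
    # None is immutable; a fresh list is created on each call.
    if items is None:
        items = []
    items.append(item)
    return items

first = add_item_buggy("a")
second = add_item_buggy("b")  # "a" is still in the shared default list

print(second)
print(add_item("a"), add_item("b"))
```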

130
Q

What is the naming convention of functions and variables in python?

A

It is snake_case, so lowercase with an underscore to denote spaces.

131
Q

What is the naming convention of classes in python?

A

The naming convention is CamelCase (PEP 8 calls it CapWords).

132
Q

What is the naming convention for type variable names?

A

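It is CapWords, preferring short names such as T or AnyStr.

Anything written afterwards is a lowercase suffix joined with an underscore: _co or _contra for type variables that declare covariant or contravariant behaviour, e.g. T_co.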

133
Q

What is the naming convention for constants in python?

A

It’s uppercase with underscores for the spaces.

e.g. MAX_OVERFLOW.

Remember constants are essentially variables that are not meant to be changed in the script.

134
Q

What should you think of doing when a function returns a tuple of values?

What is the most efficient way of dealing with the values contained in the tuple?

A

You can unpack the values returned by the function into several variables in a single assignment line.

On the LHS of the equals sign write down a tuple of variables. Each variable receives the returned value in the same index place.

Doing this will save a lot of space and unnecessary effort.
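A short sketch of tuple unpacking. The function min_max_mean is a made-up example.

```python
def min_max_mean(values):
    # Returning several values separated by commas returns one tuple.
    return min(values), max(values), sum(values) / len(values)

# One variable per returned value, matched by position.
lowest, highest, mean = min_max_mean([3, 7, 5])

print(lowest, highest, mean)
```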

135
Q

In the jupyter environment, how do you find out where a function is from?

A

Write the function in the cell and run it and it will show where it is from.

136
Q

In the jupyter environment, how do you find out what a function does?

A

Put a question mark before the function name and run the cell.

137
Q

In the jupyter environment, how do you find out the source code of a function quickly?

A

Two question marks before function name and run in the cell

138
Q

What does parse_dates do in pd.read_csv()?

A

parse_dates tells the read_csv function which columns contain dates, so that they are parsed into datetime values instead of being left as strings.

139
Q

What’s a shorter way of writing the transpose method?

A

.T

140
Q

Jupyter notebook: What happens if you just write a variable into a cell and run it? Say a pandas data frame?

A

The jupyter notebook will try to show

141
Q

For Random forests, what do you need to remember about DateTime objects?

A

You need to feature extract the datetime object.

Without doing this you can’t capture any trend/cyclical behaviour as a function of time at

142
Q

What are the naming styles for functions, variables, classes, methods, constant, modules and packages in Python according to PEP 8?

A

Functions: use lowercase word or words. Words separated by underscores.

Variable: use lowercase letter, word or words. Separate words with underscores.

Class: start each word with a capital letter. Do not separate words with underscores. This style is called CapWords (also known as CamelCase).

Method: use lowercase word or words. Separate words with underscores.

Constant: use uppercase. Separate words with underscores.

Module: use short, lowercase word or words. Separate using underscores.

Package: use short lowercase word or words. Do not separate using underscores.
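A compact sketch of these naming styles in one place. Every name below is made up for illustration.

```python
MAX_RETRIES = 3  # constant: uppercase with underscores


class OrderBook:  # class: capitalised words, no underscores
    def total_value(self, prices):  # method: lowercase with underscores
        return sum(prices)


def format_price(amount):  # function: lowercase with underscores
    unit_label = "USD"  # variable: lowercase with underscores
    return f"{amount:.2f} {unit_label}"


print(OrderBook().total_value([1, 2]), format_price(2), MAX_RETRIES)
```

The two blank lines around the class and the function follow the PEP 8 spacing for top-level definitions.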

143
Q

How much whitespace should you leave between top-level functions and classes?

A

Two blank lines.

144
Q

Boolean Values: Why is this not recommended?

A

The use of the equality operator, ==, is unnecessary here. A bool can only take the values True or False. It is enough to use the bool itself as the condition of the if statement.
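The snippet this card refers to is not shown, so the following is only a guess at the pattern it describes. The function and variable names are hypothetical.

```python
def describe(value, threshold):
    is_bigger = value > threshold

    # Not recommended: comparing a bool to True adds nothing.
    if is_bigger == True:
        verbose = "bigger"
    else:
        verbose = "not bigger"

    # Preferred: the bool is already a valid condition.
    if is_bigger:
        concise = "bigger"
    else:
        concise = "not bigger"

    return verbose, concise

print(describe(6, 5))
```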

145
Q

Empty Sequences: Why is this not recommended?

A

If the list is empty, its length is 0, which is equivalent to False when used in an if statement. Thus, it is redundant to check the length first; the variable my_list can simply be placed in the if statement.
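The snippet this card refers to is not shown, so this is a sketch of the likely pattern. my_list is the name from the card; the function summarise is made up.

```python
def summarise(my_list):
    # Not recommended: if not len(my_list):
    # Preferred: an empty list is already falsy.
    if not my_list:
        return "empty"
    return f"{len(my_list)} item(s)"

print(summarise([]))
print(summarise([1, 2]))
```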

146
Q

Comment out blocks of code in visual studio code

A

cmd + / on a Mac (ctrl + / on Windows and Linux)

147
Q

What is a headless browser?

A

A headless browser is a web browser without a graphical user interface. Headless browsers provide automated control of a web page in an environment similar to popular web browsers, but are executed via a command-line interface or using network communication.