Python Code Knowledge Flashcards
What is getopt?
What is its syntax?
It is a "command line option parser".
It parses an argument sequence such as sys.argv and returns a list of (option, value) pairs and a list of non-option arguments.
Syntax: options, arguments = getopt.getopt(['-a', '-bval', '-c', 'val'], 'ab:c')
As you can see it outputs two lists, which is why you unpack the result into a pair of variables.
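A minimal runnable sketch of the call above (in the option string 'ab:c', the colon after b means -b expects a value, while -a and -c are plain flags):

```python
import getopt

# -a and -c are flags; -b takes a value because of the 'b:' in the option string
opts, args = getopt.getopt(['-a', '-bval', '-c', 'val'], 'ab:c')
print(opts)  # [('-a', ''), ('-b', 'val'), ('-c', '')]
print(args)  # ['val']
```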
What is numpy?
Numpy is the core library for scientific computing.
Numpy provides a high-performance multidimensional array object. Alongside this, it gives tools to work with these arrays.
It allows you to get the same sort of functionality as Matlab.
What is pip?
It is the preferred installer program for modules in python.
Since python 3.4 pip has been included by default with python.
Almost all packages that you hear of will be available with pip install
“PIP is a package manager for Python packages, or modules if you like.”
“A package contains all the files you need for a module.”
“Modules are Python code libraries you can include in your project.”
How do you create your own modules in python?
Modules are simply python scripts that are imported into another script.
First, write up your functions and save the script; then import it by its file name (without the .py) from another script in the same directory.
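A self-contained sketch: the module file name mymath.py is hypothetical, and it is written by the script itself so the import step can run immediately.

```python
import importlib
import sys

# write a tiny module file (hypothetical name), then import it like any module
with open('mymath.py', 'w') as f:
    f.write('def add(a, b):\n    return a + b\n')

sys.path.insert(0, '.')        # make sure the current directory is importable
importlib.invalidate_caches()  # the file was created after interpreter startup
import mymath
print(mymath.add(2, 3))  # 5
```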
How do you write to a file?
First of all you need 'w' as the mode in your open() call.
Then use .write() to add your text. *Note: writing to a file with 'w' clears the file of anything beforehand; to add to a file you need to use append mode, i.e. 'a' in the open call.*
Within the brackets of .write() you place the text you want to write.
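A short sketch of write vs append, using a hypothetical file name:

```python
# 'w' truncates the file first; 'a' appends to the end
with open('example.txt', 'w') as f:
    f.write('first line\n')
with open('example.txt', 'a') as f:
    f.write('second line\n')
with open('example.txt') as f:
    print(f.read())  # first line / second line
```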
What does CSV stand for?
Comma-separated values.
The delimiter determines what separates the values. It doesn't have to be a comma; it can be almost anything.
What is pandas?
“Pandas is a python software library for data manipulation and analysis. “
“Pandas is a python package providing fast, flexible, and expressive data structures designed to make working with ‘relational’ and ‘labeled’ data both easy and intuitive.”
What is matplotlib?
It is a plotting library/package.
What is BeautifulSoup?
It is a python library for pulling data out of HTML and XML files.
How do you permanently set the index for a data frame?
To set the index you use .set_index(“Desired Index”)
To make it permanent you need to add another parameter in the brackets, inplace, and it needs to be set to True.
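A minimal sketch with made-up data showing the inplace parameter:

```python
import pandas as pd

df = pd.DataFrame({'Date': ['2021-01-01', '2021-01-02'], 'Close': [100, 101]})
df.set_index('Date', inplace=True)  # without inplace=True a modified copy is returned
print(df.index.name)  # Date
```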
How would you access a single column from a dataframe?
In the same way you would get the values in a dictionary.
dataframe_name[‘Desired_Column_Name’]
Or
dataframe_name.Desired_Column_Name
(the dot form has no brackets, and it only works when the column name is a valid python identifier with no spaces).
How do you convert a dataframe column into a list?
What do you need to remember?
dataframe_name.column_name.tolist()
Only works on one column at a time.
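A quick sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({'Close': [100, 101, 102]})
prices = df.Close.tolist()  # one column at a time
print(prices)  # [100, 101, 102]
```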
If you want to print two columns of a dataframe what should you do?
You can extract the columns by referencing both columns with respect to the dataframe, within double square brackets.
Alternatively, you could convert the dataframe into an array using np.array(dataframe[['Column1', 'Column2']]).
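Both approaches, sketched with made-up column names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Open': [1, 2], 'High': [3, 4], 'Low': [0, 1]})
pair = df[['Open', 'High']]           # double brackets -> a two-column DataFrame
arr = np.array(df[['Open', 'High']])  # or as a plain numpy array
print(arr.shape)  # (2, 2)
```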
When reading in a csv file through .read_csv() how can we define the index column?
Within the brackets you place index_col and set it equal to 0 (or whichever column number you want as the index).
What does .to_csv() do ?
It converts a dataframe in python to a csv file.
Before the full stop you place the dataframe you want to convert into a csv file.
dataframe_name.to_csv()
How do you block comment on a mac?
cmd + 1
cmd + 4 comments the block but also puts two lines of hyphens above and below the block.
How do you get data from Quandl?
using quandl.get()
Within .get() you place the 'TickerID', authtoken (i.e. your API key) and optionally a start_date and/or end_date.
What does pandas_datareader do ?
Up to date remote data access for pandas, works for multiple versions of pandas.
How do you read a CSV file and convert it to a dataframe?
What do you need to add to convert a date column to the index?
pd.read_csv(“file_name.csv”)
Plug in the parameters (parse_dates = True, index_col = 0) into the brackets.
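A self-contained sketch: a tiny csv (hypothetical file name) is written first so the read step has something to load.

```python
import pandas as pd

# write a small csv so read_csv has a file to parse (hypothetical name)
with open('prices.csv', 'w') as f:
    f.write('Date,Close\n2021-01-01,100\n2021-01-02,101\n')

df = pd.read_csv('prices.csv', parse_dates=True, index_col=0)
print(df.index.dtype)  # datetime64[ns]
```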
If you make any changes to the IPython console in preferences what else do you need to remember to do?
Reset ipython console kernel.
(Small cog wheel in the top right)
If the index isn’t sorted in the right direction how can you sort it correctly?
.sort_index(axis=0, inplace=True, ascending=True)
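A quick sketch with a deliberately shuffled index:

```python
import pandas as pd

df = pd.DataFrame({'Close': [102, 100, 101]}, index=[3, 1, 2])
df.sort_index(axis=0, inplace=True, ascending=True)
print(df.index.tolist())  # [1, 2, 3]
```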
When parsing out paragraphs from a webpage using BeautifulSoup, what do you need to remove the tags so that you’re only left with strings?
What is the most simple way to extract the strings?
You can use .string or .text. The difference between the two is that .string won’t show any string with ‘child’ tags.
In most cases you probably want .text.
That being said, the simplest way is to take your BeautifulSoup object, say soup, and apply the .get_text() method, i.e. soup.get_text(). Note that the two forms are not exactly the same.
What is pickling?
Why would you use it?
How would you use it?
Pickling is the serializing and de-serializing of python objects to a byte stream. Unpickling is the opposite.
Pickling is used to store python objects. This means things like lists, dictionaries, class objects, and more.
Pickling will be most useful for data analysis, when you are performing routine tasks on data, such as pre-processing. Also used when working with python specific data types, such as dictionaries.
If you have a large dataset and you’re loading a massive dataset into memory every time you run the program, it makes sense to just pickle the data and load that. It may be 50-100x faster.
How to pickle?
After having imported pickle you need to open the pickled file.
pickle_in = open("dict_pickle", "rb")  # rb stands for read bytes
example_dict = pickle.load(pickle_in)
What module is the building block for scraping web pages with BeautifulSoup?
urllib (or requests): BeautifulSoup only parses content, it does not fetch it.
After obtaining the desired content by applying .find() on soup (the webpage parsed by bs4), it is still wrapped in HTML code. To extract the elements you want, use .find_all(); within the brackets you state which tag you are after.
What is a quick way to extract table data from webpages?
use
pd.read_html(“url”)
How do you add pickle data to a pickle file?
pickle.dump(x,y)
x = what you want to add
y = the pickle file you want to ‘dump’ it in.
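A full round trip (dump then load), using a hypothetical file name:

```python
import pickle

example_dict = {'AAPL': 150, 'MSFT': 300}

# dump: x = the object to store, y = a file opened for writing bytes ('wb')
with open('dict.pickle', 'wb') as pickle_out:
    pickle.dump(example_dict, pickle_out)

# load it back from the file opened for reading bytes ('rb')
with open('dict.pickle', 'rb') as pickle_in:
    restored = pickle.load(pickle_in)

print(restored)  # {'AAPL': 150, 'MSFT': 300}
```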
After having extracted table data from a webpage through beautiful soup, you want to iterate through table elements one by one, how would you do that?
1st line: iterate through each table row, except for the top row as these are the column labels.
for row in table.findAll('tr')[1:]:
    ticker = row.findAll('td')[0].text
    tickers.append(ticker)
2nd line: essentially this says find all the table data cells for this row (hence td), take the first cell, and convert its contents to text. You could index a different position in the findAll('td') list if you want content from other columns.
If you want to open an already existing pickle file, what do you need?
pickle.load(x)
x = pickle file you want to access
This can be assigned to a variable to save it.
How would you make a new directory?
os.makedirs('x')
x = define directory name
If you have a module applied to library that you consistently use or are going to use, what would you do to make your code writing more efficient?
Import the library with the module attached and give it a shortened name value using ‘as’.
For example, since the pyplot module is heavily used in matplotlib, it is common to find the module with the library imported and defined as plt.
i.e. import matplotlib.pyplot as plt
When using matplotlib, what do you need to do to make plt.legend() work?
You need to label your plots: after adding the x and y variables, add a third parameter, label, e.g. plt.plot(x, y, label='price').
How do you read a csv file in python?
You read a csv using csv.reader(csv_file_object, delimiter=','), where the first argument is an open file object, not a filename.
The delimiter is what the values are separated by. In the case above they are separated by a comma.
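A self-contained sketch: a tiny csv (hypothetical file name) is written first so the reader has a file object to work on.

```python
import csv

# write a small csv so csv.reader has something to parse
with open('data.csv', 'w', newline='') as f:
    f.write('name,price\nAAPL,150\n')

with open('data.csv', newline='') as csv_file:
    rows = list(csv.reader(csv_file, delimiter=','))
print(rows)  # [['name', 'price'], ['AAPL', '150']]
```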
How can you use numpy to load data from files ?
import numpy as np
np.loadtxt("File_name.type", delimiter=',', unpack=True)
*Note* The file does not have to be a .txt; it can be a .csv, or any file with text in it.
It's also important to remember to add unpack=True if you have two columns to unpack into separate variables.
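A sketch using an in-memory file-like object (loadtxt also accepts those) so the example is self-contained:

```python
import io
import numpy as np

# two comma-separated columns; unpack=True returns them as separate arrays
data = io.StringIO('1,10\n2,20\n3,30')
x, y = np.loadtxt(data, delimiter=',', unpack=True)
print(x)  # [1. 2. 3.]
print(y)  # [10. 20. 30.]
```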
What does .split() do ?
When applied to a string it returns a LIST of all the words in the string (split on whitespace by default).
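A one-line sketch:

```python
text = 'the quick brown fox'
words = text.split()  # splits on whitespace by default
print(words)  # ['the', 'quick', 'brown', 'fox']
```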
How do you open URLs using the urllib library ?
urllib.request.urlopen()
Inside the brackets you place the URL as a string (within quotes).
Using the os library how do you return the current working directory from a python script?
os.getcwd()
It's easier to remember the function name if you note that cwd is an abbreviation for current working directory.
What is sys.argv ?
sys.argv allows you to pass a list of command line arguments from the terminal.
It is a list in python which contains the command-line arguments passed to the script.
What library would you use to search in a body of text?
How would you find all the numbers in a text?
You should use Regular Expressions, written as re in python.
re.findall(r'\d', x) finds every individual digit; use r'\d+' to match whole numbers (runs of digits).
x = text variable
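A sketch showing the difference between the two patterns:

```python
import re

text = 'Order 66 was executed in year 19'
digits = re.findall(r'\d', text)    # single digits
numbers = re.findall(r'\d+', text)  # whole runs of digits
print(digits)   # ['6', '6', '1', '9']
print(numbers)  # ['66', '19']
```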
If you want to POST to an URL what are the necessary steps that you need to take?
What changes when you want to do GET request?
- You first need to define the variables that you intend to post in a dictionary, referred to here as values.
- For the URL to understand the values they need to be encoded using data = urllib.parse.urlencode(values). There's another encoding step after that, encoding to utf-8 bytes, i.e. data = data.encode('utf-8').
- Once the data is encoded the next step is code a request to the URL to post your values. req = urllib.request.Request(url, data)
- The following step is to open the URL with request added on, urllib.request.urlopen(req). Opening the URL with the request will return a response, this will be assigned to the variable resp, i.e. resp = urllib.request.urlopen(req).
- Finally to see the response .read() needs to be applied to resp.
A GET request is pretty similar to step 4 above. It uses the same base code, urllib.request.urlopen(), but now we need to decode the response bytes. The code should look like this: urllib.request.urlopen(website_url).read().decode()
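The encoding steps above can be sketched offline. The URL and form values here are hypothetical, and the final urlopen(req) call is omitted so the sketch runs without network access:

```python
import urllib.parse
import urllib.request

url = 'https://example.com/search'  # hypothetical endpoint
values = {'q': 'python'}

data = urllib.parse.urlencode(values)  # -> 'q=python'
data = data.encode('utf-8')            # -> b'q=python'
req = urllib.request.Request(url, data)
print(req.get_method())  # POST (a Request with data defaults to POST)
```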
If you want to combine multiple plots on the same grid, what module in plt do you need to use?
If you want to graph two plots on the same grid of 6 row pieces and 1 column piece, with ax1 taking up 5 rows across the 1 column and ax2 taking the rest, what would the code be?
plt.subplot2grid((x), (y))
x is a tuple stating the number of rows and columns, y is a tuple specifying the origin (row, column) of the plot.
ax1 = plt.subplot2grid((6,1), (0,0), rowspan=5, colspan=1)
ax2 = plt.subplot2grid((6,1), (5,0), rowspan=1, colspan=1)
*note* you need to remember to adjust the start point: ax2 begins at row 5, directly below ax1's five rows, but stays in column 0.
If you have defined a subplot called ax1, how do you access the labels to change them (not to change the name, to rotate, etc.)?
ax1.xaxis.get_ticklabels()
If you want to access the y axis just change xaxis to yaxis.
If you want to plot OHLC candles in python what do you need to import?
You need matplotlib.finance to import candlestick_ohlc.
This is written as:
from matplotlib.finance import candlestick_ohlc
(Note: in recent matplotlib versions this module has been removed; the function now lives in the separate mpl_finance package.)
How do you add text to a graph ax1 based on matplotlib?
Two options:
ax1.annotate()
ax1.text()
For ax1.annotate, the first parameter is what you want to annotate; it needs to be a string, so ints and floats need to be converted to strings. The second parameter is where you want to annotate; if you're using candlesticks you can specify a specific candle and choose where on the candle (ohlc) to place the annotation.
With deep learning, how should you approach testing?
The price data is split up into the training set and a test set.
The model is built on the training set and then applied to the unseen test set to see if similar results are obtained.
How is .loc used?
It is a label-based indexer applied to a dataframe, say df, to access a group of rows and columns by their labels, using square brackets, e.g. df.loc[label].
Note that placing one label in loc returns the values in that row (or column) as a Series.
If there is more than one label, then a DataFrame is returned.
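A sketch of the Series vs DataFrame distinction, with made-up labels:

```python
import pandas as pd

df = pd.DataFrame({'Close': [100, 101], 'Volume': [10, 20]},
                  index=['2021-01-01', '2021-01-02'])

row = df.loc['2021-01-01']                   # one label -> Series
sub = df.loc[['2021-01-01', '2021-01-02']]   # list of labels -> DataFrame
print(type(row).__name__)  # Series
print(type(sub).__name__)  # DataFrame
```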
How do you parse webpage content using Beautiful Soup?
How is this applied to tables?
With import bs4 as bs.
To parse the content we need to first convert the URL data into a Beautiful Soup object. The Beautiful Soup object is obtained by calling bs.BeautifulSoup() from the bs4 library. It is by convention that the object is assigned to the variable soup, i.e. soup = bs.BeautifulSoup(text, 'lxml').
With the content now as a Beautiful Soup object, other modules in the library can be applied to parse it.
One of the most common methods is find_all(); it is used on 'soup' and allows you to filter specific content based on HTML tags.
For example, if you want to extract all the URLs in the webpage you can write soup.find_all(‘a’).
To find whole tables you apply soup.find('table', {'class': 'wikitable sortable'}). From there you can use find_all() to filter through the table rows ('tr') and within table rows you can access the table data ('td').
*Note: Beautiful Soup does not acquire web page content, this needs to be done using urllib or requests.*
What are the arguments for using requests over urllib?
The requests package allows you to do what urllib does but shorter and more succinctly.
It only takes one line to get content from a URL:
resp = requests.get('url')
Similarly, posting information to a URL is a lot shorter. To post, the requests.post() function simply takes a dictionary as the data argument.
search_data = {"search": "Hello World"}
resp = requests.post('url', data=search_data)
How does .join() work for pandas?
It joins columns with other data frames either on the index or on a key column.
There are optional parameters to customize the joining; one important parameter you need to consider is 'how' it is going to join. The default of 'how' is 'left', which means that the calling frame's index is used; 'right' is the opposite; 'outer' forms the union of the calling frame's index with the other and sorts it lexicographically; lastly, 'inner' is the opposite of outer, it forms an intersection.
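A sketch of 'left' vs 'outer' joining, with made-up index labels:

```python
import pandas as pd

left = pd.DataFrame({'Close': [100, 101]}, index=['A', 'B'])
right = pd.DataFrame({'Volume': [10, 30]}, index=['A', 'C'])

joined = left.join(right, how='left')   # default: keep the calling frame's index
print(joined.index.tolist())  # ['A', 'B']

outer = left.join(right, how='outer')   # union of both indexes, sorted
print(outer.index.tolist())   # ['A', 'B', 'C']
```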
What happens when you apply .values to a panda dataframe?
A numpy representation of the dataframe is returned.
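A quick check with made-up data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
arr = df.values  # the underlying data as a numpy array
print(type(arr))  # <class 'numpy.ndarray'>
print(arr.shape)  # (2, 2)
```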
If you apply .shape to a numpy array, what is returned?
A tuple with the numpy array dimensions. (Note that .shape is an attribute, not a method, so there are no brackets.)
What does numpy.arange(x) do ?
Return evenly spaced values within a given interval.
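Two quick examples:

```python
import numpy as np

print(np.arange(5))         # [0 1 2 3 4]
print(np.arange(2, 10, 2))  # [2 4 6 8] (start, stop, step)
```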
What does ax.xaxis.tick_top() do?
Move ticks and ticklabel (if present) to the top of the axes.
What does pandas.DataFrame.columns do ?
Returns the column labels of the dataframe.
When you need to remove a column from a dataframe using .drop() what does the axis parameter need to be set to?
To remove a column you need to set axis equal to 1.
When using .drop() on a dataframe and you keep the inplace parameter on the default False, what will happen?
Leaving inplace to false does not permanently change the dataframe. To change the underlying data of the dataframe you need to set inplace to True.
One way to view it is that you want your changes to stay in place, which is why you set it to True.
The default value of False for inplace is useful as it allows you to test the changes before making permanent changes.
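A sketch showing both the axis and inplace points above, with made-up column names:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

dropped = df.drop('b', axis=1)      # inplace=False (default): returns a copy
print(df.columns.tolist())       # ['a', 'b']  (original untouched)
print(dropped.columns.tolist())  # ['a']

df.drop('b', axis=1, inplace=True)  # now the dataframe itself is changed
print(df.columns.tolist())       # ['a']
```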
What are two functions you can use to create heatmaps with matplotlib?
imshow()
&
pcolormesh()
*args and **kwargs
What are they used for?
They are mostly used in function definitions.
*args and **kwargs allow you to pass a variable number of arguments to a function. In other words, the number of arguments is dependent on the user.
*args is typically seen as a list (note that it actually arrives as a tuple, not a list).
**kwargs is seen as a dictionary as you need to pass keyworded arguments, i.e. name =”potato” where name is the keyword and potato the value.
A good way to remember what **kwargs does is to remember that 'kw' stands for keyword, so essentially it's **keywordargs.
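A sketch (the function name describe is made up) showing what actually arrives inside the function:

```python
def describe(*args, **kwargs):
    # args arrives as a tuple, kwargs as a dict
    return type(args).__name__, type(kwargs).__name__, args, kwargs

result = describe(1, 2, name='potato')
print(result)  # ('tuple', 'dict', (1, 2), {'name': 'potato'})
```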
How do you add columns to a pandas dataframe?
It is essentially the same as with dictionaries: you apply index brackets to the dataframe with the new column name, and assign it the values you want in the column.
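A one-line sketch with made-up column names:

```python
import pandas as pd

df = pd.DataFrame({'Close': [100, 101]})
df['Volume'] = [10, 20]  # dictionary-style assignment creates the column
print(df.columns.tolist())  # ['Close', 'Volume']
```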