exam 3 lec notes Flashcards

1
Q

assert statements enable you to convert _____ errors into _______ errors

A

semantic, runtime

2
Q

what is the syntax for assert?

A

assert BOOLEAN_EXPRESSION
(with an optional message: assert BOOLEAN_EXPRESSION, "message")

3
Q

what happens when assert evaluates to true? what about false?

A

true: nothing happens, the program moves on
false: raises an AssertionError
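A minimal sketch of both outcomes, using a hypothetical average() helper and assuming that an empty list counts as a caller mistake (a semantic error the assert turns into a runtime error):

```python
def average(nums):
    # Assumption for this sketch: an empty list is a caller mistake.
    # The assert turns that silent semantic error into a runtime AssertionError.
    assert len(nums) > 0, "nums must not be empty"
    return sum(nums) / len(nums)

print(average([2, 4, 6]))  # condition is True: nothing happens, prints 4.0

caught = None
try:
    average([])            # condition is False: AssertionError is raised
except AssertionError as e:
    caught = str(e)
print(caught)              # nums must not be empty
```

Asserts are skipped when Python runs with the -O flag, so they suit sanity checks rather than validating user input.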

4
Q

what type of errors are called ‘exceptions’?

A

runtime errors

5
Q

what is ‘catching’ an exception in terms of try and except blocks?

A

when an exception occurs inside the try block, the rest of the try block is skipped and the except block runs instead

6
Q

what is ‘e’ in except Exception as e?

A

the exception object instance; you write as e after the exception type to capture the object so you can inspect the error

7
Q

what does str(e) do?

A

gives you the error message, i.e. the reason for the exception

8
Q

what does type(e) do?

A

gives you the type (class) of the exception, e.g. ZeroDivisionError
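A small sketch of cards 5 to 8 together: the hypothetical divide() helper catches the exception as e, then uses str(e) for the reason and type(e) for its class.

```python
def divide(a, b):
    try:
        return a / b              # if this raises, the rest of try is skipped
    except Exception as e:        # e is the exception object instance
        reason = str(e)           # the reason for the exception (its message)
        kind = type(e)            # the exception's class
        print(kind.__name__, "->", reason)
        return kind, reason

kind, reason = divide(1, 0)
print(kind is ZeroDivisionError)  # True
```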

9
Q

what does raising an error do in Python?

A

explicitly signals that something is wrong and explains the error, e.g. raise ValueError("message")
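A sketch of raising an error yourself; check_age() and its "no negative ages" rule are hypothetical, chosen only to show raise with an explanatory message.

```python
def check_age(age):
    # Hypothetical validation rule: negative ages make no sense.
    if age < 0:
        raise ValueError(f"age must be non-negative, got {age}")
    return age

try:
    check_age(-3)
except ValueError as e:
    message = str(e)
print(message)  # age must be non-negative, got -3
```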

10
Q

what does .read() convert the contents in a file to?

A

a long str

11
Q

what needs to be explicitly written when writing to a file?

A

the newline character \n; write() does not add it for you
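A sketch showing why \n must be written explicitly, assuming a throwaway file named notes.txt (hypothetical) in a temporary directory:

```python
import os
import tempfile

# Hypothetical file in a throwaway directory so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "notes.txt")

with open(path, "w") as f:       # "w" creates the file, or overwrites it
    f.write("line one\n")        # \n ends the line explicitly
    f.write("line two")          # no \n here...
    f.write("still line two\n")  # ...so this text lands on the same line

with open(path) as f:
    contents = f.read()

print(contents.splitlines())     # ['line one', 'line twostill line two']
```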

12
Q

what does os.listdir do?

A

returns a list of the names of the files and directories inside a directory

13
Q

what does os.mkdir do?

A

creates a new directory

14
Q

what does os.path.isfile do?

A

returns True if a path points to a file, otherwise returns False

15
Q

what does os.path.isdir do?

A

returns True if a path points to a directory, otherwise returns False

16
Q

what does os.path.join do?

A

joins one or more path components into a single path
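A sketch of the os functions from cards 12 to 16 working together inside a temporary directory; the names data and a.txt are hypothetical.

```python
import os
import tempfile

base = tempfile.mkdtemp()                      # scratch directory for the demo
data_dir = os.path.join(base, "data")          # join uses the right separator for this OS
os.mkdir(data_dir)                             # create a new directory

file_path = os.path.join(data_dir, "a.txt")
with open(file_path, "w") as f:
    f.write("hello\n")

entries = os.listdir(base)                     # names only, not full paths
print(entries)                                 # ['data']
print(os.path.isdir(data_dir))                 # True
print(os.path.isfile(data_dir))                # False
print(os.path.isfile(file_path))               # True
```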

17
Q

what is a pandas series?

A

a combination of dict and list

18
Q

what can pandas series be made from?

A

list or dict

19
Q

if catching Exception is too broad in a try and except block, what could you do to catch a specific error?

A

specify the type of error you want to catch after the except keyword, e.g. except ValueError:
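A sketch of catching only one error type; parse_int() is a hypothetical helper.

```python
def parse_int(text):
    try:
        return int(text)
    except ValueError:    # only bad number formats like "abc" are caught here
        return None       # other errors (e.g. TypeError from int(None)) still propagate

print(parse_int("42"))    # 42
print(parse_int("abc"))   # None
```

Catching a narrow type keeps unrelated bugs visible instead of silently swallowing them.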

20
Q

what is the difference between ‘+’ and ‘append()’ method on lists?

A

+ operator creates a brand new object instance
append() modifies the existing list object instance
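A quick check of the difference using object identity (is):

```python
a = [1, 2]
b = a + [3]        # + builds a brand-new list object
print(a, b)        # [1, 2] [1, 2, 3]
print(a is b)      # False

c = [1, 2]
alias = c          # a second name for the same list object
c.append(3)        # append() mutates the existing list in place
print(alias)       # [1, 2, 3]: alias sees the change
print(c is alias)  # True
```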

21
Q

what is a path in Python?

A

a string that represents the location of a file or directory on a computer’s file system

22
Q

what do you pass as an argument to open(…)?

A

the path to the file (e.g. a relative path); you should not hardcode the ‘/’ or ‘\’ separators, use os.path.join instead

23
Q

what does the read() function return?

A

the file contents as one big string

24
Q

what does calling list() on a file do?

A

returns the file's contents as a list of strings, one per line (each line keeps its trailing \n)
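A sketch contrasting read() and list() on the same file, assuming a hypothetical lines.txt in a temporary directory:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "lines.txt")  # hypothetical demo file
with open(path, "w") as f:
    f.write("alpha\nbeta\n")

with open(path) as f:
    text = f.read()      # one big string
with open(path) as f:
    lines = list(f)      # one string per line, each keeping its trailing "\n"

print(repr(text))        # 'alpha\nbeta\n'
print(lines)             # ['alpha\n', 'beta\n']
```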

25
Q

what needs to be explicitly written for the ‘w’ mode in open(…) function?

A

\n

26
Q

what is os in python?

A

os stands for operating system; it is a built-in module that lets you interact with the operating system and access platform-specific functions

27
Q

what does os.path.join do?

A

joins path components using the correct separator for the current OS, so code written on one OS platform also runs on another

28
Q

what is pandas?

A

a third-party package installed on top of Python that provides tools for doing data science (data analysis)

29
Q

what is a pandas series? what can it be created from?

A

a pandas series is a combination of dict and list, it can be created from either a python list or dict

30
Q

what is an index in a pandas Series? what is an integer position?

A

index is equivalent to a key in python dict
integer position is equivalent to index in python list

31
Q

how would you create a Series from d = {"one": 7, "two": 3}? (syntax)

A

s = pd.Series(d)
or pass the dictionary directly as the argument: s = pd.Series({"one": 7, "two": 3})
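A sketch of cards 29 to 31: a Series built from the dict on the card, looked up both by index label (like a dict key) and by integer position (like a list index).

```python
import pandas as pd

d = {"one": 7, "two": 3}
s = pd.Series(d)                      # dict keys become the index labels
print(s["one"])                       # 7: look up by index label, like a dict key
print(s.iloc[1])                      # 3: look up by integer position, like a list index

from_list = pd.Series([10, 20, 30])   # a list gets a default index 0, 1, 2
print(from_list.index.tolist())       # [0, 1, 2]
```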

32
Q

what is the difference between iloc and loc for Pandas series?

A

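iloc stands for integer location and selects by integer position (0, 1, 2, ...), like list indexing; it can be used to slice the Series by position (for a DataFrame, by row and column number)

loc stands for location and selects by index label (for a DataFrame, by row and column labels)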
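A sketch on a small Series with hypothetical labels a to d. One detail worth knowing: label slices with loc include the end label, while position slices with iloc exclude the end, like list slices.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

print(s.loc["b"])                  # 20: by label
print(s.iloc[1])                   # 20: by integer position
print(s.loc["a":"c"].tolist())     # [10, 20, 30]: label slice INCLUDES "c"
print(s.iloc[0:2].tolist())        # [10, 20]: position slice EXCLUDES position 2
```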

33
Q

what does .quantile() do? What is its argument, and what is its default?

A

it calculates quantiles (percentiles) of the values
it takes a float argument between 0 and 1
it defaults to 0.5, the 50th percentile (the median)

34
Q

what does .value_counts() do?

A

returns a new Series where the index is each unique value in the data and the value is how many times it appears

by default the result is in descending order of counts
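A sketch of .quantile() and .value_counts() on made-up numbers:

```python
import pandas as pd

s = pd.Series([1, 2, 2, 3, 3, 3, 10])

print(s.quantile())         # 3.0: default q=0.5 is the median
print(s.quantile(0.25))     # 2.0: the 25th percentile

counts = s.value_counts()   # index = each unique value, values = how often it appears
print(counts.tolist())      # [3, 2, 1, 1]: highest count first
print(counts.index[0])      # 3: the most common value
```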

35
Q

what does .sort_index() do?

A

sorts the Series by its index labels (not its values) and returns a new Series

36
Q

what argument do you need to pass to .sort_index() to get a Series sorted in reverse order?

A

ascending=False
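A sketch contrasting sort_index(), sort_index(ascending=False), and sort_values(); note that the original Series is left unchanged.

```python
import pandas as pd

s = pd.Series({"c": 1, "a": 3, "b": 2})

print(s.sort_index().index.tolist())                 # ['a', 'b', 'c']
print(s.sort_index(ascending=False).index.tolist())  # ['c', 'b', 'a']
print(s.sort_values().tolist())                      # [1, 2, 3]: by values instead
print(s.index.tolist())                              # ['c', 'a', 'b']: original unchanged
```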

37
Q

what are the expressions for booleans for series?

A

& means ‘and’
| means ‘or’
~ means ‘not’
wrap each comparison in () in compound boolean expressions, since &, | and ~ bind more tightly than comparison operators

38
Q

how must you format boolean expressions to select from a Series?

A

put them inside a pair of [], e.g. s[s > 5]
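A sketch of cards 37 and 38: each comparison is parenthesized, combined with &, |, or ~, and the resulting boolean Series goes inside [].

```python
import pandas as pd

s = pd.Series([5, 12, 18, 25, 30])

mask = (s > 10) & (s < 26)                # parentheses needed: & binds tighter than > and <
print(s[mask].tolist())                   # [12, 18, 25]
print(s[(s < 10) | (s > 28)].tolist())    # [5, 30]
print(s[~(s > 10)].tolist())              # [5]
```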

39
Q

what are some structures that a DataFrame can be created from?

A
  • dictionary of series
  • dictionary of lists
  • dictionary of dictionaries
  • list of dictionaries
  • list of lists
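A sketch building the same 2×2 table from each form in the list; the column names x and y are hypothetical.

```python
import pandas as pd

frames = [
    pd.DataFrame({"x": pd.Series([1, 2]), "y": pd.Series([3, 4])}),  # dictionary of series
    pd.DataFrame({"x": [1, 2], "y": [3, 4]}),                        # dictionary of lists
    pd.DataFrame({"x": {0: 1, 1: 2}, "y": {0: 3, 1: 4}}),            # dictionary of dictionaries (inner keys = row index)
    pd.DataFrame([{"x": 1, "y": 3}, {"x": 2, "y": 4}]),              # list of dictionaries (one dict per row)
    pd.DataFrame([[1, 3], [2, 4]], columns=["x", "y"]),              # list of lists (one list per row)
]

for df in frames:
    print(df.columns.tolist(), df.values.tolist())  # ['x', 'y'] [[1, 3], [2, 4]] every time
```

Dictionaries are column-oriented (keys become column names), while lists of dicts or lists are row-oriented.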
40
Q

to make a DataFrame from a dictionary of series, what do you need to write for the columns?

A

the column names, as the dictionary keys

41
Q

what is d.loc[r] for pandas dataframe?

A

look up a row by its row index label

42
Q

d.iloc[r] for pandas dataframe?

A

look up a row by its row integer position

43
Q

d[c] for pandas dataframe?

A

look up a column by its column index (name)

44
Q

d.loc[r, c] for pandas dataframe?

A

look up a single value by row index label and column index (name)

45
Q

d.iloc[r, c] for pandas dataframe?

A

look up a single value by row integer position and column integer position

46
Q

what are two ways to set values for a specific entry in pandas dataframe (using loc and iloc)?

A

d.loc[r, c] = new_val
d.iloc[r, c] = new_val

47
Q

what does d[bool series] make?

A

a new DataFrame containing only the rows where the lined-up boolean Series is True
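A sketch of cards 41 to 47 on one small DataFrame; the row labels r1 to r3 and the column names are hypothetical.

```python
import pandas as pd

d = pd.DataFrame(
    {"name": ["Ann", "Bo", "Cy"], "score": [88, 72, 95]},
    index=["r1", "r2", "r3"],          # hypothetical row labels
)

row = d.loc["r2"]                 # row by index label
print(row["name"])                # Bo
print(d.iloc[0]["name"])          # Ann: row by integer position
print(d["score"].tolist())        # [88, 72, 95]: column by name
print(d.loc["r3", "score"])       # 95: row label + column name
print(d.iloc[2, 1])               # 95: row position + column position

d.loc["r1", "score"] = 90         # set a value by labels
d.iloc[1, 1] = 75                 # set a value by positions
print(d["score"].tolist())        # [90, 75, 95]

high = d[d["score"] > 80]         # keep only rows where the boolean Series is True
print(high.index.tolist())        # ['r1', 'r3']
```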

48
Q

how do you create a dataframe from csv?

A

pd.read_csv("dataframe.csv")

49
Q

what does .head(n) do for dataframe? What is the default?

A

it gets the first n rows; 5 is the default

50
Q

what does .tail(n) do for dataframe? what is the default?

A

it gets the last n rows; 5 is the default

51
Q

what is a problem that occurs when you write a DataFrame from pandas to a CSV file?

A

by default the DataFrame's index is also written as a column, so reading the file back adds a new index on top of it and you end up with an extra index column

52
Q

how do you convert a df to a csv file?

A

df.to_csv("hjhsjdh.csv", index=False)

53
Q

how do you fix a double index column after reading a CSV back into a DataFrame?

A

df.iloc[:, 1:]
- all rows
- columns starting from position 1 (drops the extra index column at position 0)
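A sketch of cards 48 to 53. To stay self-contained it writes the CSV into a string and reads it back through io.StringIO instead of a real file (an assumption for the demo; with a filename the behavior is the same).

```python
import io
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Default to_csv(): the index is written as an extra, unnamed first column.
with_index = pd.read_csv(io.StringIO(df.to_csv()))
print(with_index.columns.tolist())    # ['Unnamed: 0', 'a', 'b']

fixed = with_index.iloc[:, 1:]        # all rows, columns from position 1 on
print(fixed.columns.tolist())         # ['a', 'b']

# Avoid the problem in the first place with index=False.
clean = pd.read_csv(io.StringIO(df.to_csv(index=False)))
print(clean.columns.tolist())         # ['a', 'b']
print(clean.head(1).values.tolist())  # [[1, 3]]: first n=1 rows
print(clean.tail(1).values.tolist())  # [[2, 4]]: last n=1 rows
```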

54
Q

what does HTTP status 200 mean? what about 404?

A

200: success
404: not found

55
Q

what does .status_code tell you about an HTTP response?

A

it holds the HTTP status code, telling you whether the request was successful (200) or not (e.g. 404)

56
Q

what is json.load() used for?

A

json.load is used to read JSON data from a file object

57
Q

what does json.loads() do?

A

read JSON data from a string
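A sketch contrasting the two; io.StringIO stands in for a file opened with open(...), and the JSON text is made up.

```python
import io
import json

text = '{"name": "Ann", "scores": [88, 95]}'

data = json.loads(text)       # loads: parse JSON from a str
print(data["scores"])          # [88, 95]

f = io.StringIO(text)          # stands in for a file object from open(...)
same = json.load(f)            # load: parse JSON from a file object
print(same == data)            # True
```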

58
Q

what are hyperlinks?

A

they are links in an HTML page

59
Q

what do hyperlinks include?

A

an anchor tag (<a>) and a hyperlink reference attribute (href=) holding the target URL

60
Q

how do you make an unordered list and ordered list in HTML?

A

ul for unordered list
ol for ordered list

61
Q

how do you add items to an HTML list?

A

with li (list item) tags

62
Q

what is table (HTML table tags)?

A

start and end of a table

63
Q

what is tr (HTML table tags)?

A

start and end of a new row

64
Q

how can you convert a dataframe to html using a built in function?

A

.to_html()

65
Q

what does prettify() do?

A

it returns a formatted representation of the raw HTML

66
Q

what does find("") do?

A

returns the first element matching the tag string, or None otherwise

67
Q

what does find_all(“”) do?

A

returns an iterable of all matching elements (HTML ‘tags’), empty iterable otherwise

68
Q

what does get_text() do?

A

returns all the text inside the element (including nested tags) as a string; an empty string if there is none

69
Q

what does .children do? What can it be converted into?

A

gets children of this element, can be converted into a list

70
Q

what is .attrs? what does it return?

A

the attributes associated with the tag; returns a dict containing all attributes of the tag (attribute name → value)

71
Q

what is the type of find’s return value?

A

bs4.element.Tag (or None if nothing matches)

72
Q

what is the return type of find_all?

A

bs4.element.ResultSet

73
Q

why can’t you use .children on find_all but you can use it on find?

A

find_all returns a ResultSet (a list of Tag objects), and a list has no .children; find returns a single Tag object, whose .children iterates over that tag's direct children. To get children from find_all results, index into the ResultSet or loop over it first
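A sketch of cards 65 to 73 on a tiny inline HTML string (the example.com links are placeholders), assuming the bs4 package is installed and using the built-in "html.parser".

```python
from bs4 import BeautifulSoup

html = """
<ul id="links">
  <li><a href="https://example.com/a">A</a></li>
  <li><a href="https://example.com/b">B</a></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

first = soup.find("a")                 # first matching Tag (None if no match)
print(type(first).__name__)            # Tag
print(first.get_text())                # A
print(first.attrs)                     # {'href': 'https://example.com/a'}

all_links = soup.find_all("a")         # ResultSet: a list of Tags
print(type(all_links).__name__)        # ResultSet
hrefs = [a["href"] for a in all_links]

ul = soup.find("ul")
li_children = [c for c in ul.children if c.name == "li"]  # .children also yields whitespace strings

try:
    all_links.children                 # a ResultSet has no .children
except AttributeError:
    print("index into the ResultSet or loop over it first")
link_text = list(all_links[0].children)  # works on a single Tag
```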

74
Q

What is BeautifulSoup?

A

a Python library used for web scraping purposes to extract the data from HTML and XML documents. It provides a simple way to navigate, search, and modify the parse tree generated from a web page

75
Q

what does it mean to parse HTML?

A

to analyze its structure and convert it into a hierarchy of nodes (a parse tree) so the information can be easily accessed and manipulated by a program

76
Q

what are the 5 steps to parsing HTML?

A
  1. read contents from the file and make sure to close the file afterwards
  2. create a BeautifulSoup object instance
  3. parse the table (if there is more than one table, use find_all and extract the appropriate table through indexing)
  4. parse the header
  5. parse the data rows and store the data in a list of dicts
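A sketch of the five steps, assuming bs4 is installed. Step 1 would normally read a file (e.g. with open("page.html") as f: html = f.read(), where page.html is hypothetical; the with block closes the file for you); an inline string keeps the sketch self-contained.

```python
from bs4 import BeautifulSoup

# Step 1 (stand-in): the HTML that would normally be read from a file.
html = """
<table>
  <tr><th>name</th><th>score</th></tr>
  <tr><td>Ann</td><td>88</td></tr>
  <tr><td>Bo</td><td>72</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")                  # step 2
table = soup.find_all("table")[0]                          # step 3: index if there are several tables
header = [th.get_text() for th in table.find_all("th")]    # step 4

rows = []                                                  # step 5: one dict per data row
for tr in table.find_all("tr")[1:]:                        # skip the header row
    cells = [td.get_text() for td in tr.find_all("td")]
    rows.append(dict(zip(header, cells)))

print(rows)  # [{'name': 'Ann', 'score': '88'}, {'name': 'Bo', 'score': '72'}]
```

All cell values come back as strings; convert them (e.g. int(...)) if you need numbers.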
77
Q

Are databases more structured than CSV and JSON files?

A

yes

78
Q

what are the 3 requirements for databases?

A
  1. all data contained inside one or more tables
  2. all tables must be named, all columns must be named
  3. all values in a column must be the same type
79
Q

What is data transformation?

A

The process of changing the format, structure, or values of the data

80
Q

What does df.set_index("column") do? What does it return?

A

makes that column the new index; returns a new DataFrame object (the original is unchanged)
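A sketch of set_index with hypothetical column names:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ann", "Bo"], "score": [88, 72]})
by_name = df.set_index("name")       # "name" becomes the index of a new DataFrame

print(by_name.loc["Bo", "score"])    # 72: rows are now looked up by name
print(df.columns.tolist())           # ['name', 'score']: original unchanged
```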

81
Q

what is np.nan?

A

represents ‘Not a Number’; pandas uses it to mark missing information

82
Q

how do you replace specific values with NaN in a DataFrame?

A

df.replace(target, np.nan)

83
Q

what types of targets can be passed to df.replace(target, np.nan)?

A

str, int, float, None

84
Q

What is .isna()? what does it return?

A

it checks for missing data and returns a boolean Series (index on the left, True where the value is missing on the right)

85
Q

what does .value_counts do?

A

counts how many times each unique value appears, ordered in descending order of count

86
Q

what does df.fillna(value) do? what does it return?

A

replaces missing (NaN) values with the given value; returns a new DataFrame

87
Q

what does dropna() do? what does it return?

A

drops all the rows that contain NaN, returns new dataframe
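A sketch of cards 81 to 87, assuming (for illustration only) that -999.0 is a placeholder the data uses to mean "missing":

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"city": ["A", "B", "C"], "temp": [20.5, -999.0, 18.0]})

df = df.replace(-999.0, np.nan)        # turn the placeholder into NaN
print(df["temp"].isna().tolist())      # [False, True, False]

filled = df.fillna(0)                  # new DataFrame with NaN -> 0
print(filled["temp"].tolist())         # [20.5, 0.0, 18.0]

dropped = df.dropna()                  # new DataFrame without rows containing NaN
print(dropped["city"].tolist())        # ['A', 'C']
```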

88
Q

What does .apply(function object reference) do? What does it return?

A

applies the input function to every element of the Series; returns a new Series object
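A sketch with a hypothetical shout() helper; note that the function object itself is passed, without calling it.

```python
import pandas as pd

s = pd.Series(["ann", "bo", "cy"])

def shout(name):                 # hypothetical one-argument helper
    return name.upper() + "!"

out = s.apply(shout)             # pass the function object (no parentheses)
print(out.tolist())              # ['ANN!', 'BO!', 'CY!']
print(s.tolist())                # ['ann', 'bo', 'cy']: original unchanged
```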

89
Q

what does .groupby() do? (for pandas)

A

works like SQL GROUP BY: groups rows by the values in a column; you then need an aggregation method like sum() or mean() to get a result

90
Q

how do you attach the aggregation to groupby()?

A

chain the aggregation method after it: .<aggregation>(), e.g. .sum() or .mean()
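A sketch of groupby followed by a chained aggregation, on made-up department/salary data:

```python
import pandas as pd

df = pd.DataFrame({
    "dept": ["math", "cs", "math", "cs", "cs"],
    "salary": [50, 70, 60, 80, 90],
})

# Like SQL: SELECT dept, SUM(salary) FROM df GROUP BY dept
totals = df.groupby("dept")["salary"].sum()
means = df.groupby("dept")["salary"].mean()

print(totals)   # cs -> 240, math -> 110
print(means)    # cs -> 80.0, math -> 55.0
```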

91
Q

what is the syntax for setting x-axis, y-axis, and graph titles in bar plots for matplotlib?

A

ax.set_xlabel()
ax.set_ylabel()
ax.set_title()

92
Q

what is ax in matplotlib?

A

the Axes object: one plot area that holds the x-axis, y-axis, labels, and title

93
Q

how do you make an “other” column so you don’t have too many tick marks on a bar plot (hint: has to do with slicing the dataframe)?

A
  1. use iloc to get the relevant (top) rows first
  2. create a new variable and set it to the sum of the remaining rows that come after the ones in step 1
  3. create a new “other” column (bar) for that new variable
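A sketch of the three steps on a hypothetical, already-sorted Series of counts, plus the labeling calls from card 91. It assumes matplotlib is installed and uses the non-interactive Agg backend so no window is needed.

```python
import matplotlib
matplotlib.use("Agg")                  # render off-screen
import pandas as pd

counts = pd.Series({"a": 40, "b": 30, "c": 5, "d": 3, "e": 2})  # hypothetical counts

plot_data = counts.iloc[:2].copy()              # step 1: keep the biggest categories
plot_data["other"] = counts.iloc[2:].sum()      # steps 2 and 3: sum the rest into one "other" bar

ax = plot_data.plot.bar()
ax.set_xlabel("category")
ax.set_ylabel("count")
ax.set_title("Top categories plus other")

print(plot_data.tolist())                       # [40, 30, 10]
```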
94
Q

what are AxesSubplots?

A

the individual plot areas (Axes objects) inside a figure; pandas plotting methods return one

95
Q

if ax = None when trying to make an AxesSubplot, what is the outcome?

A

a new AxesSubplot is created

96
Q

what are xlim and ylim and what do you input in them?

A

they are limits for the x and y axis (what the smallest and largest values are for the axis)

they take either just a lower bound or (lower bound, upper bound) as a tuple
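A sketch of passing an existing ax versus leaving ax=None, plus both forms of axis limits. It uses a DataFrame plot with made-up columns, assumes matplotlib is installed, and uses the Agg backend.

```python
import matplotlib
matplotlib.use("Agg")                     # render off-screen
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"a": [3, 1, 4], "b": [1, 5, 9]})

fig, ax = plt.subplots()                  # one Axes created explicitly
df.plot.line(ax=ax, ylim=(0, 10))         # ax given: draw on it; ylim as (lower, upper)
ax.set_xlim(0)                            # lower bound only; the upper bound is kept

other_ax = df.plot.line()                 # ax=None (default): a new Axes is created
print(other_ax is ax)                     # False
```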

97
Q

what is c in matplotlib?

A

a way to color markers/lines

98
Q

what are the x- and y-axis for Series.plot.line()?

A

x: index
y: values

99
Q

what is each line in DataFrame.plot.line()?

A

each column becomes a line

100
Q

how do you fix a crooked line for series line plot?

A

sort on increasing indices,
either through sort_index() or sort_values()

101
Q

how do you fix a crooked line for dataframe line plots?

A

sort in ascending order of indices
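A sketch of the fix for both cases: hypothetical data whose index arrived out of order is sorted with sort_index() before plotting, so points connect left to right. It assumes matplotlib is installed and uses the Agg backend.

```python
import matplotlib
matplotlib.use("Agg")                        # render off-screen
import pandas as pd

s = pd.Series([9, 1, 4], index=[3, 1, 2])    # hypothetical data, index out of order
fixed = s.sort_index()                       # x values now increase: 1, 2, 3
ax1 = fixed.plot.line()

df = pd.DataFrame({"y1": [9, 1, 4], "y2": [2, 7, 5]}, index=[3, 1, 2])
ax2 = df.sort_index().plot.line()            # each column becomes one line
print(len(ax2.get_lines()))                  # 2
```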