Webscraping Flashcards

1
Q

What are three common HTTP request methods? What do they mean?

A

GET : retrieve data from the server
POST : send data to the server
PUT : create or replace the resource at a given URL; idempotent (repeating the same PUT has the same effect as sending it once)
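
A minimal sketch of the POST vs. PUT difference, using a hypothetical in-memory ToyServer class rather than a real HTTP server. The class and its method names are assumptions made for illustration only.

```python
class ToyServer:
    """Hypothetical in-memory stand-in for a server (illustration only)."""

    def __init__(self):
        self.resources = {}
        self._next_id = 1

    def get(self, resource_id):
        # GET: read a resource without changing server state.
        return self.resources.get(resource_id)

    def post(self, data):
        # POST: create a new resource on every call.
        resource_id = self._next_id
        self._next_id += 1
        self.resources[resource_id] = data
        return resource_id

    def put(self, resource_id, data):
        # PUT: create or replace the resource at a known id.
        self.resources[resource_id] = data
        return resource_id


server = ToyServer()
server.post({"name": "a"})
server.post({"name": "a"})     # creates a second, separate resource
server.put(10, {"name": "b"})
server.put(10, {"name": "b"})  # same state as after the first PUT
print(len(server.resources))   # 3: two from POST, one from PUT
```

Repeating the POST grows the store, while repeating the PUT leaves it unchanged. That is the sense in which PUT is idempotent.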

2
Q

What are three common classes of HTTP status codes? What do they mean?

A

2xx : success (e.g. 200 OK)
3xx : redirection
4xx : client error (server errors are 5xx)
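
A small sketch of how a status code maps to its class by its first digit. The function name status_class is an assumption for illustration; HTTPStatus comes from Python's standard library.

```python
from http import HTTPStatus

# Map the hundreds digit of a status code to its class.
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code):
    """Return the class name for an HTTP status code (hypothetical helper)."""
    return STATUS_CLASSES.get(int(code) // 100, "unknown")


print(status_class(200))                   # success
print(status_class(301))                   # redirection
print(status_class(HTTPStatus.NOT_FOUND))  # client error (404)
```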

3
Q

How can you convert a JSON string to a dictionary?

A

import json
json_string = '….'
json_dict = json.loads(json_string)
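
A short worked sketch of json.loads with a made-up JSON string. The keys and values are assumptions for illustration. It shows how JSON types map to Python types.

```python
import json

# Illustrative JSON text; the content is invented for this example.
json_string = '{"name": "widget", "tags": ["a", "b"], "in_stock": true, "notes": null}'

data = json.loads(json_string)

print(type(data).__name__)  # dict
print(data["tags"][0])      # a   (JSON array -> list)
print(data["in_stock"])     # True (JSON true -> True)
print(data["notes"])        # None (JSON null -> None)
```

The reverse direction, dictionary to JSON string, is json.dumps.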

4
Q

Why is Beautiful Soup useful? How can you create a soup object from HTML?

A

It parses HTML into a tree that you can search by tag and attribute.

from bs4 import BeautifulSoup
soup = BeautifulSoup(html_string, 'html.parser')
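
A minimal sketch of building a soup object and reading values from it, assuming the bs4 package is installed. The HTML snippet is invented for illustration.

```python
from bs4 import BeautifulSoup

# Illustrative HTML; the content is made up for this example.
html_string = """
<html>
  <head><title>Example page</title></head>
  <body><a href="/about" class="nav">About</a></body>
</html>
"""

soup = BeautifulSoup(html_string, "html.parser")

print(soup.title.string)      # Example page (text of the <title> tag)
link = soup.find("a")         # first <a> element
print(link["href"])           # /about (attribute lookup by key)
print(link.get_text())        # About
```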

5
Q

How can you use Beautiful Soup to find the HTML elements with a given tag and attributes?

A

soup.find_all(tag, attributes)  # tag: str, attributes: dict

Returns a list-like ResultSet of all matching elements.
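
A short sketch of find_all filtering by tag name and an attribute dict, assuming bs4 is installed. The HTML and the class names are invented for illustration.

```python
from bs4 import BeautifulSoup

# Illustrative HTML; tags and class names are made up for this example.
html_string = """
<ul>
  <li class="item">First</li>
  <li class="item">Second</li>
  <li class="other">Third</li>
</ul>
"""

soup = BeautifulSoup(html_string, "html.parser")

# All <li> elements whose class attribute is "item".
items = soup.find_all("li", {"class": "item"})
texts = [li.get_text() for li in items]
print(texts)  # ['First', 'Second']
```

soup.find takes the same arguments and returns only the first match, or None if nothing matches.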

6
Q

How can you customize your request headers?

A

Pass a headers dict, for example to set the User-Agent to match a browser's:

requests.get(url, headers={'User-Agent': user_agent})
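
A sketch of checking which headers a request would carry, assuming the requests package is installed. It prepares the request without sending it, so no network access is needed. The URL and the User-Agent string are placeholder assumptions.

```python
import requests

# Placeholder values for illustration; substitute your own.
url = "https://example.com/"
user_agent = "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0"

# Build the request object and prepare it, but do not send it.
request = requests.Request("GET", url, headers={"User-Agent": user_agent})
prepared = request.prepare()

print(prepared.method)                 # GET
print(prepared.headers["User-Agent"])  # the custom value set above
```

To send the request, requests.get(url, headers={...}) accepts the same headers dict.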

7
Q

When would you want to use Selenium?

A

When the page's HTML is generated dynamically or requires user interaction (e.g. logging in, accepting cookies, waiting for text to be displayed).
