Web Scraping Flashcards
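What are 3 common types of HTTP requests? What do they mean?
GET : retrieve data from the server
POST : send data to the server
PUT : send data to a specific URL, creating or replacing the resource there; it is idempotent (repeating the same PUT has the same effect as sending it once)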
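A minimal sketch of how the three methods look in code, using only the standard library's urllib.request. The URL and payload are placeholders chosen for illustration, and the requests are only constructed here, not sent.

```python
import json
from urllib.request import Request

# Placeholder endpoint and payload (assumptions for illustration only).
url = "https://example.com/api/items/1"
payload = json.dumps({"name": "widget"}).encode("utf-8")

# GET: ask the server for data; no body is attached.
get_req = Request(url, method="GET")

# POST: send data to the server; repeating it may create duplicates.
post_req = Request(url, data=payload, method="POST")

# PUT: send data to a specific URL; repeating it leaves the same state.
put_req = Request(url, data=payload, method="PUT")

for req in (get_req, post_req, put_req):
    print(req.get_method(), req.full_url, "body" if req.data else "no body")
```

Passing any of these objects to urllib.request.urlopen would actually send it.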
What are 3 common classes of HTTP status codes? What do they mean?
2xx : success
3xx : redirection
4xx : client error (the request was at fault; 5xx means the server was at fault)
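The class of a status code is given by its first digit, which can be sketched as below. The helper name status_class is hypothetical; HTTPStatus comes from the standard library. The 1xx and 5xx classes are included for completeness.

```python
from http import HTTPStatus


def status_class(code: int) -> str:
    """Return the broad class of an HTTP status code from its first digit."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return classes.get(code // 100, "unknown")


for status in (HTTPStatus.OK, HTTPStatus.MOVED_PERMANENTLY, HTTPStatus.NOT_FOUND):
    print(status.value, status.phrase, "->", status_class(status.value))
```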
How can you convert a JSON string to a dictionary?
import json
json_string = '….'
json_dict = json.loads(json_string)
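A small worked example with a made-up JSON string (the keys and values are assumptions for illustration). It also shows json.dumps, which converts in the opposite direction.

```python
import json

# Sample JSON string, made up for illustration.
json_string = '{"title": "Example", "tags": ["a", "b"], "views": 10}'

# str -> dict
json_dict = json.loads(json_string)
print(type(json_dict).__name__)
print(json_dict["tags"][0])

# dict -> str
round_trip = json.dumps(json_dict)
```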
Why is Beautiful Soup useful? How can you create a soup object from HTML?
It lets you parse HTML and search it by tag and attributes.
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_string, 'html.parser')
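How can you use Beautiful Soup to find the HTML elements with given tags and attributes?
soup.find_all(name, attrs), where name is the tag name (str) and attrs is a dict mapping attribute names to the values to match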
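A minimal sketch of creating a soup object and filtering by tag and attribute. The HTML snippet and class names are made up for illustration, and the example assumes the bs4 package is installed.

```python
from bs4 import BeautifulSoup

# Small made-up HTML document for illustration.
html_string = """
<html><body>
  <a class="external" href="https://example.com">Example</a>
  <a class="internal" href="/about">About</a>
  <p class="external">Not a link</p>
</body></html>
"""

soup = BeautifulSoup(html_string, "html.parser")

# All <a> tags, regardless of attributes.
all_links = soup.find_all("a")

# Only <a> tags whose class attribute is "external".
external_links = soup.find_all("a", {"class": "external"})

# Attributes of a matched tag are read like dictionary keys.
hrefs = [link["href"] for link in external_links]
print(len(all_links), hrefs)
```

find_all returns a list-like collection of matches; soup.find takes the same arguments and returns only the first match, or None if there is none.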
How can you customize your request headers?
Pass a headers dict to the request, e.g. to set the User-Agent to match a browser's:
requests.get(url, headers={'User-Agent': user_agent})
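A sketch of how the custom header ends up on the outgoing request. The URL and User-Agent string are placeholders, not real browser values. The request is built with requests.Request(...).prepare() so the headers can be inspected without any network traffic.

```python
import requests

# Placeholder values (assumptions): substitute your browser's real User-Agent.
url = "https://example.com"
user_agent = "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0"

# Build the request without sending it, so the headers can be inspected.
prepared = requests.Request("GET", url, headers={"User-Agent": user_agent}).prepare()
print(prepared.headers["User-Agent"])

# To actually send it:
# response = requests.get(url, headers={"User-Agent": user_agent})
```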
When would you want to use Selenium?
When the page's HTML is generated dynamically (e.g. by JavaScript) or requires user interaction (e.g. logging in, accepting cookies, waiting for text to be displayed).