AtBS 11 Web Scraping Flashcards
How to open up Google from the shell?
import webbrowser
webbrowser.open('https://www.google.com')
What module lets you easily download files from the Web?
requests module
You need to run pip install requests the first time to get it.
Syntax to download a webpage?
res = requests.get('URL')
pg 237
How to check if a requests download worked?
res.status_code == requests.codes.ok
This should evaluate to True.
How to find the length of a requests download?
len(res.text)
When saving a web page, what is important about how you save it, and why?
Open the file in write binary mode ('wb').
Writing the raw bytes preserves the Unicode encoding of the text.
pg 239
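A minimal sketch of why binary mode matters, using only the standard library. The sample text, the file name page.txt, and the variable names are made up for illustration; the encoded bytes stand in for what a real download would hold.

```python
import os
import tempfile

text = 'café naïve ☕'             # sample text with non-ASCII characters
data = text.encode('utf-8')       # raw bytes, standing in for downloaded content

# Hypothetical save location inside a fresh temporary folder.
path = os.path.join(tempfile.mkdtemp(), 'page.txt')

with open(path, 'wb') as f:       # 'wb' = write binary: bytes go to disk unchanged
    f.write(data)

with open(path, 'rb') as f:       # read the same bytes back and decode them
    round_trip = f.read().decode('utf-8')
```

Because the bytes are written exactly as they are, decoding the saved file gives back the original characters.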
What does res stand for?
Response
It is the Response object returned by a requests.get('URL') call.
What does res.iter_content(100000) do?
It returns the download in chunks of up to 100000 bytes each, so the content can be written to disk piece by piece rather than all at once.
Steps to download and save a webpage to the hard drive?
import requests
res = requests.get('URL')
FileName = open('SaveFileName', 'wb')
for chunk in res.iter_content(100000):
    FileName.write(chunk)
FileName.close()
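The steps above could be wrapped in a reusable helper, roughly like this. The function names save_response and download are hypothetical, and the with statement is swapped in for the explicit close() call so the file is closed even if a write fails.

```python
def save_response(res, filename, chunk_size=100000):
    """Write a Response-like object to disk in binary mode, chunk by chunk.

    Returns the total number of bytes written.
    """
    bytes_written = 0
    with open(filename, 'wb') as f:            # closed automatically on exit
        for chunk in res.iter_content(chunk_size):
            bytes_written += f.write(chunk)    # write() returns the byte count
    return bytes_written


def download(url, filename):
    """Fetch url with requests and save it to filename (needs network access)."""
    import requests  # third-party; imported here so save_response works without it
    res = requests.get(url)
    res.raise_for_status()                     # stop early on a failed download
    return save_response(res, filename)
```

A call such as download('https://example.com', 'example.html') would then save the page under that name; both arguments here are placeholders.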
What does res.raise_for_status() do?
It raises an exception if an error occurred while downloading the webpage, and does nothing if the download succeeded.
pg 238
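One way to use raise_for_status() without crashing the program is to wrap it in try/except. This is a sketch, and describe_download is a hypothetical helper name. It catches Exception broadly for brevity; requests itself raises requests.exceptions.HTTPError in this situation.

```python
def describe_download(res):
    """Return 'ok' if the download succeeded, otherwise a short error message."""
    try:
        res.raise_for_status()      # raises if the status code signals an error
    except Exception as exc:        # broad catch for brevity (an assumption)
        return 'There was a problem: %s' % exc
    return 'ok'
```

It would be called on the result of a download, for example describe_download(requests.get('URL')).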
How to import the Beautiful Soup module?
import bs4
Steps to create a Beautiful Soup object from a webpage?
import requests, bs4
res = requests.get('URL')
res.raise_for_status()
SoupVarName = bs4.BeautifulSoup(res.text, 'html.parser')
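To try this without a network connection, the same constructor can be given an HTML string directly. A small sketch, assuming bs4 is installed; the HTML snippet and variable names are made up, and 'html.parser' selects the parser built into Python.

```python
import bs4

# Made-up HTML standing in for res.text from a real download.
html = '<html><head><title>Example Page</title></head><body><p>Hello</p></body></html>'

soup = bs4.BeautifulSoup(html, 'html.parser')   # parse the string into a soup object
title_text = soup.title.string                  # text inside the <title> tag
```

Naming the parser explicitly avoids the warning Beautiful Soup prints when it has to pick one itself.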