AtBS 11 Web Scraping Flashcards

Question 1

Q

How to open up Google from the shell?

Answer

A

import webbrowser

webbrowser.open('https://www.google.com')

Question 2

Q

What module lets you easily download files from the Web?

Answer

A

The requests module.

It is a third-party module, so run pip install requests the first time to get it.

Question 3

Q

Syntax to download a webpage?

Answer

A

res = requests.get('URL')

pg 237

Question 4

Q

How to check if a request download worked?

Answer

A

res.status_code == requests.codes.ok

should equal True
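
A minimal sketch of this check. To keep it runnable without a network connection, it builds two Response objects by hand and sets their status codes. That setup is an assumption for illustration only; in real use, res comes from requests.get('URL').

```python
import requests

# Hand-built Response objects stand in for real downloads (illustration only).
ok_res = requests.Response()
ok_res.status_code = 200

missing_res = requests.Response()
missing_res.status_code = 404

# The same comparison as on the card.
download_ok = ok_res.status_code == requests.codes.ok
download_missing = missing_res.status_code == requests.codes.ok

print(download_ok)       # True
print(download_missing)  # False
```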

Question 5

Q

How to find the length of a requests download?

Answer

A

len(res.text)

Question 6

Q

When saving a web page to a file, what is important and why?

Answer

A

Open the file in write binary mode by passing 'wb' as the second argument to open().

Writing bytes rather than text maintains the Unicode encoding of the page.

pg 239
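
A small sketch of why binary mode is safe for Unicode text. The sample string, the temporary directory, and the file name page.txt are all made up for illustration; the encoded bytes stand in for what a real download would give you.

```python
import os
import tempfile

# Sample text with non-ASCII characters (hypothetical page content).
text = 'Ünïcödé café'
encoded = text.encode('utf-8')  # bytes, standing in for downloaded data

# Write the bytes exactly as they are, using write binary mode.
path = os.path.join(tempfile.mkdtemp(), 'page.txt')
with open(path, 'wb') as f:
    f.write(encoded)

# Read the bytes back: nothing was re-encoded or lost along the way.
with open(path, 'rb') as f:
    round_trip = f.read()

print(round_trip == encoded)
print(round_trip.decode('utf-8') == text)
```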

Question 7

Q

What does res stand for?

Answer

A

Response

It is the Response object returned by a requests.get('URL') call.

Question 8

Q

What does res.iter_content(100000) do?

Answer

A

It returns the download in chunks (here, up to 100,000 bytes each), so the file can be written piece by piece instead of pulling everything at once.
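
A rough sketch of chunk-by-chunk writing. The helper write_in_chunks and the fake chunk list are hypothetical stand-ins so the example runs offline; with a real download you would pass res.iter_content(100000) as the chunks argument.

```python
import os
import tempfile

def write_in_chunks(chunks, path):
    """Write an iterable of bytes chunks to path; return total bytes written."""
    total = 0
    with open(path, 'wb') as f:
        for chunk in chunks:
            total += f.write(chunk)  # write() returns the number of bytes written
    return total

# Fake chunks standing in for res.iter_content(100000): two full, one partial.
fake_chunks = [b'a' * 100000, b'b' * 100000, b'c' * 50]

path = os.path.join(tempfile.mkdtemp(), 'download.bin')
written = write_in_chunks(fake_chunks, path)
print(written)  # 200050
```

Only one chunk is held in memory at a time, which is the point of downloading in pieces.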

Question 9

Q

Steps to download and save a webpage to the hard drive?

Answer

A

import requests

res = requests.get('URL')

FileName = open('SaveFileName', 'wb')

for chunk in res.iter_content(100000):
    FileName.write(chunk)

FileName.close()

Question 10

Q

What does res.raise_for_status() do?

Answer

A

It raises an exception if there was an error downloading the webpage, and does nothing if the download succeeded.

pg 238
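
A minimal sketch of wrapping raise_for_status() in try/except so a failed download does not crash the program. The describe helper and the hand-built Response objects are assumptions made for illustration, so the example runs without a network; normally res comes from requests.get('URL').

```python
import requests

def describe(res):
    """Return a short message saying whether the download worked."""
    try:
        res.raise_for_status()
    except requests.exceptions.HTTPError as exc:
        return 'There was a problem: %s' % exc
    return 'Download succeeded'

# Hand-built Response objects stand in for real downloads (illustration only).
failed = requests.Response()
failed.status_code = 404

worked = requests.Response()
worked.status_code = 200

print(describe(worked))
print(describe(failed))
```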

Question 11

Q

How to import the Beautiful Soup module?

Answer

A

import bs4

Question 12

Q

Steps to create a Beautiful Soup object from a webpage?

Answer

A

import requests, bs4

res = requests.get('URL')

res.raise_for_status()

soup = bs4.BeautifulSoup(res.text, 'html.parser')
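
A short offline sketch of the same idea. It parses a made-up HTML string instead of res.text, so the markup, the id "author", and the variable names are all assumptions for illustration. It names Python's built-in 'html.parser' explicitly, which is one reasonable parser choice.

```python
import bs4

# Made-up HTML standing in for res.text from a real download.
html = ('<html><head><title>Example Page</title></head>'
        '<body><p id="author">Al</p></body></html>')

soup = bs4.BeautifulSoup(html, 'html.parser')

page_title = soup.title.string                # text inside the <title> tag
author = soup.select('#author')[0].getText()  # text of the element with id="author"

print(type(soup))
print(page_title)
print(author)
```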

(12 cards)