Web Scraping/Crawling Flashcards

1
Q

What type of request is used to fetch the content of a web page from a web server?

A

GET
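Here is a minimal standard-library sketch with a placeholder URL. `urllib.request` sends a GET whenever a request carries no body. The helper names `build_get_request` and `fetch_page` are hypothetical.

```python
import urllib.request


def build_get_request(url):
    """Build a request for a web page; with no body (data=None), urllib uses GET."""
    return urllib.request.Request(url)


def fetch_page(url, timeout=10.0):
    """Send a GET request and return the response body as text (needs network access)."""
    req = build_get_request(url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Assumption: fall back to UTF-8 when the server declares no charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


if __name__ == "__main__":
    req = build_get_request("https://example.com/")
    print(req.get_method())  # → GET
```

Adding a body (`data=b"..."`) to the same `Request` would switch the method to POST, which is used for submitting data rather than fetching a page.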

2
Q

What standard Python library contains functions for requesting data across the web, handling cookies, and even changing metadata?

A

urllib
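The following sketch shows the cookie and metadata features the card mentions. Cookie storage comes from the companion module `http.cookiejar`, which plugs into `urllib.request` through `HTTPCookieProcessor`. Request headers carry the metadata. The User-Agent value `my-scraper/0.1` and both helper names are hypothetical placeholders.

```python
import http.cookiejar
import urllib.request


def make_opener():
    """Build an opener that stores cookies from responses and resends them."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def make_request(url, user_agent="my-scraper/0.1"):
    """Build a request whose headers (metadata) identify the client.

    "my-scraper/0.1" is a hypothetical placeholder User-Agent string.
    """
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    opener, jar = make_opener()
    req = make_request("https://example.com/")
    # urllib normalizes header names with str.capitalize(), so the key is "User-agent".
    print(req.get_header("User-agent"))  # → my-scraper/0.1
    # Sending it requires network access:
    # with opener.open(req, timeout=10) as resp:
    #     html = resp.read().decode("utf-8", errors="replace")
    #     print(len(jar), "cookie(s) stored")
```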

3
Q

What import statement was used in our Lab and Group Project to import BeautifulSoup?

A

from bs4 import BeautifulSoup
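After that import, parsing a page usually takes one call. The sketch below assumes the third-party `beautifulsoup4` package is installed. It uses a small hypothetical HTML string in place of a downloaded page and passes Python's built-in `"html.parser"`, so no extra parser package is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a page fetched with urllib.
html = """
<html><head><title>Demo page</title></head>
<body>
  <h1>Clubs</h1>
  <ul>
    <li class="club">Club A</li>
    <li class="club">Club B</li>
  </ul>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

title = soup.title.string  # text inside <title>
# find_all returns every matching tag; class_ avoids clashing with Python's "class".
clubs = [li.get_text(strip=True) for li in soup.find_all("li", class_="club")]

print(title)  # → Demo page
print(clubs)  # → ['Club A', 'Club B']
```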

4
Q

What Wikipedia page did you scrape in Lab 8?

A

1999-2000 FA Premier League - Wikipedia

5
Q

What two fields (columns) required you to parse the text within the <a>…</a> tags?

A

Manager & Captain
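The next sketch shows how link text can be pulled out of such cells. It assumes `beautifulsoup4` is installed. The table is a simplified, hypothetical stand-in (team and person names are placeholders, and the real Wikipedia table has more columns and markup), so the column positions are assumptions.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified table; not the actual Wikipedia markup.
html = """
<table class="wikitable">
  <tr><th>Team</th><th>Manager</th><th>Captain</th></tr>
  <tr>
    <td>Example United</td>
    <td><a href="/wiki/Manager_One">Manager One</a></td>
    <td><a href="/wiki/Captain_One">Captain One</a></td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = []
for tr in soup.find("table", class_="wikitable").find_all("tr")[1:]:  # skip header row
    cells = tr.find_all("td")
    team = cells[0].get_text(strip=True)  # plain text cell
    # Manager and Captain names sit inside <a>…</a>, so read the link's text.
    manager = cells[1].a.get_text(strip=True)
    captain = cells[2].a.get_text(strip=True)
    rows.append((team, manager, captain))

print(rows)  # → [('Example United', 'Manager One', 'Captain One')]
```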

6
Q

When dealing with HTML elements mapped out as a tree, which elements are exactly one tag below a parent tag?

A

children
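The contrast between children and all descendants can be seen in BeautifulSoup with a hypothetical menu snippet (this assumes `beautifulsoup4` is installed). Text nodes are filtered out by keeping only `Tag` objects.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

html = "<ul id='menu'><li>Home<b>!</b></li><li>About</li></ul>"
soup = BeautifulSoup(html, "html.parser")
ul = soup.find("ul", id="menu")

# .children: only nodes exactly one level below <ul>.
children = [c.name for c in ul.children if isinstance(c, Tag)]
# .descendants: every node at any depth below <ul> (here, tags only).
descendant_tags = [d.name for d in ul.descendants if isinstance(d, Tag)]

print(children)         # → ['li', 'li']
print(descendant_tags)  # → ['li', 'b', 'li']
```

The `<b>` tag is a descendant of `<ul>` but a child of the first `<li>`, not of `<ul>`.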

7
Q

What store was the subject of the web scraping program reviewed during this lesson?

A

Family Dollar

8
Q

What is the latest version of HTML?

A

HTML5 (the last numbered version; HTML is now maintained by the WHATWG as the HTML Living Standard)

9
Q

What is it called when code accesses a URL, examines that page for another URL, retrieves that page, and repeats the process recursively?

A

web crawling
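The loop on this card can be sketched with only the standard library. This version uses a queue (breadth-first) instead of literal recursion, which does the same "fetch, find links, follow them" work without hitting Python's recursion limit. The function names, the `max_pages` cap, and the UTF-8 decoding are assumptions. `fetch` can be swapped for a fake so the example runs offline.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
import urllib.request


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_html(url):
    """Download a page (needs network access); assumes UTF-8 text."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=fetch_html, max_pages=20):
    """Visit pages breadth-first, following links until max_pages are fetched."""
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except (OSError, ValueError):
            continue  # skip pages that fail to load
        visited.append(url)
        parser = LinkParser()
        parser.feed(html)
        parser.close()
        for href in parser.links:
            # Resolve relative links and drop #fragments so a page is queued once.
            link, _ = urldefrag(urljoin(url, href))
            if link.startswith(("http://", "https://")) and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


if __name__ == "__main__":
    # A fake three-page site so the crawl runs without network access.
    site = {
        "https://example.com/": '<a href="/a">A</a> <a href="/b#top">B</a>',
        "https://example.com/a": '<a href="/b">B</a>',
        "https://example.com/b": '<a href="https://example.com/">Home</a>',
    }
    print(crawl("https://example.com/", fetch=lambda u: site.get(u, "")))
    # → ['https://example.com/', 'https://example.com/a', 'https://example.com/b']
```

The `seen` set is what stops the crawler from looping forever when pages link back to each other, as `/b` does here.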

10
Q

What is web scraping?

A

An automated process of gathering large amounts of data from the Internet

11
Q

Other names for web scraping:

A
  • screen scraping
  • data mining
  • web harvesting
12
Q

Why use web scraping?

A
  • useful data is available on the web, but isn’t available via downloads or APIs
  • price comparison info
  • social media scraping (what’s trending?)
  • research (stats, weather data, etc.)
13
Q

Difference between web scraping and web crawling:

A

Web scraping: generally inspects just one or two web pages to extract the specific data needed.

Web crawling: “crawls across the web”, following links from web page to web page recursively.

14
Q

Applications for web crawling:

A
  • generating a site map
  • gathering data about a specific topic from a large number of websites
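One common machine-readable form of a site map is the XML format from the sitemaps.org protocol. The sketch below assumes the page URLs have already been collected by a same-site crawl. It only turns them into that format, deduplicated and sorted. The function name `build_sitemap` is hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemaps.org-style XML listing each URL once, in sorted order."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in sorted(set(urls)):
        url_el = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as & in the text automatically.
        ET.SubElement(url_el, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    pages = ["https://example.com/b", "https://example.com/a", "https://example.com/a"]
    print(build_sitemap(pages))
```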
