Web Scraping Flashcards
1
Q
HTTP
A
• HTTP (Hypertext Transfer Protocol) follows a request/response model: the client sends a request and the server sends back a response. The client is usually a web browser and the server is usually a remote computer, but both can also run on the same machine.
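A minimal sketch of one request/response round trip, with client and server running on the same computer. It uses only the Python standard library; the names `HelloHandler` and `fetch_once` and the response text are made up for illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Server side: answers every GET request with a short text body."""

    def do_GET(self):
        body = b"hello from the server"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress the default per-request log line


def fetch_once():
    """Start a local server, send one request as the client, return (status, body)."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0 = pick any free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # Client side: one GET request, then read the server's response.
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", "/")
        response = conn.getresponse()
        result = (response.status, response.read().decode())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch_once())
```

The status code and the body are both part of the response; the request here carries only a method (`GET`) and a path (`/`).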
2
Q
HTML
A
- HTML (HyperText Markup Language): a markup language for describing the structure and formatting of a document as it is encoded and sent over the Internet
- Inspired by GML (Generalized Markup Language, IBM, 1969) and SGML (Standard Generalized Markup Language, International Organization for Standardization (ISO), 1986). “As a document markup language, SGML was originally designed to enable the sharing of machine-readable large-project documents in government, law, and industry.”
- HTML was developed from SGML at CERN (European Organization for Nuclear Research) between 1989 and 1993, and later became an official worldwide standard.
3
Q
API
A
- API (Application Programming Interface): a defined way for software to talk to other software. It specifies what inputs and outputs should look like and what happens when an operation fails.
- Example: A shipping company could run an API where users send weight and location information to its server and the server returns a price estimate.
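The shipping example could be sketched roughly as follows. This is an illustration only: the function name `handle_estimate`, the JSON field names, and the rates are all hypothetical, and location is simplified to a `destination` zone. The sketch shows the three things an API pins down: the input shape, the output shape, and what comes back on failure.

```python
import json

# Hypothetical per-kilogram rates; the numbers are invented for illustration.
RATES_PER_KG = {"domestic": 2.0, "international": 5.0}


def handle_estimate(request_json):
    """Take a JSON request string and return a JSON response string."""
    try:
        request = json.loads(request_json)
        weight = float(request["weight_kg"])
        rate = RATES_PER_KG[request["destination"]]
    except (ValueError, KeyError, TypeError) as exc:
        # Failure case: malformed JSON, missing field, or unknown destination.
        return json.dumps({"ok": False, "error": f"bad request: {exc!r}"})
    if weight <= 0:
        return json.dumps({"ok": False, "error": "weight_kg must be positive"})
    # Success case: a price estimate in the agreed output shape.
    return json.dumps({"ok": True, "price": round(weight * rate, 2)})


if __name__ == "__main__":
    print(handle_estimate('{"weight_kg": 3, "destination": "domestic"}'))
    print(handle_estimate('{"weight_kg": 3}'))
```

In a real service this function would sit behind an HTTP endpoint, but the contract (inputs, outputs, error format) is the part that makes it an API.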
4
Q
BeautifulSoup
A
• Initially published by Leonard Richardson in 2004, BeautifulSoup is a Python library for parsing HTML, i.e. organizing and searching through its contents. Its primary feature is the ability to search a document by tag.
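A short sketch of searching by tag, assuming the `beautifulsoup4` package is installed. The HTML snippet is invented for illustration, and `"html.parser"` selects Python's built-in parser so no extra parser library is needed.

```python
from bs4 import BeautifulSoup

# Made-up page content for the demo.
html = """
<html><body>
  <h1>Prices</h1>
  <a href="/apples">Apples</a>
  <a href="/pears">Pears</a>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag; find_all() returns every match.
heading = soup.find("h1").get_text()
links = [(a.get_text(), a["href"]) for a in soup.find_all("a")]

if __name__ == "__main__":
    print(heading)
    print(links)
```

Tag attributes such as `href` are read with dictionary-style access on the tag object.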