Chapter 12: Networked Programs Flashcards
A Python library for parsing HTML documents and extracting data from them. It compensates for most of the imperfections in HTML that browsers generally ignore. You can download the code from www.crummy.com.
BeautifulSoup
A number that generally indicates which application you are contacting when you make a socket connection to a server. As an example, web traffic usually uses ____ 80 while email traffic uses ____ 25.
port
When a program pretends to be a web browser, retrieves a web page, and then looks at the page's content. Often the program follows the links in one page to find the next page so it can traverse a network of pages or a social network.
scrape
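A minimal sketch of the "look at the page content" step, assuming the page has already been retrieved as bytes. The HTML snippet and the example.com URLs are made up for illustration, and the regular expression only catches double-quoted absolute links.

```python
import re

# Stand-in for a page that was already retrieved (hypothetical content).
html = (b'<p><a href="http://example.com/page2.htm">Second Page</a> '
        b'<a href="https://example.com/about">About</a> '
        b'<a href="/relative">ignored</a></p>')

# Pull out every double-quoted http:// or https:// link target.
links = re.findall(rb'href="(http[s]?://.*?)"', html)

for link in links:
    print(link.decode())
```

A spider would repeat this for each link it finds, retrieving that page and extracting its links in turn.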
A network connection between two applications where the applications can send and receive data in either direction.
module must be imported
socket
import socket
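A small sketch of the "either direction" idea, assuming a locally connected pair of sockets from socket.socketpair() instead of a real server, so it runs without network access. The names left and right are arbitrary.

```python
import socket

# socketpair() returns two sockets that are already connected to each
# other, standing in for the two applications on a connection.
left, right = socket.socketpair()

left.sendall(b'ping')            # data flows left -> right
from_left = right.recv(512)

right.sendall(b'pong')           # and right -> left over the same connection
from_right = left.recv(512)

left.close()
right.close()

print(from_left, from_right)
```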
The act of a web search engine retrieving a page, then all the pages linked from that page, and so on until it has nearly all of the pages on the Internet, which it uses to build its search index.
spider
A set of precise, predetermined rules, implemented in hardware and software, that determines how data is transmitted between devices on a network. It breaks large-scale processes down into smaller functions so that devices can communicate.
protocol
Protocol that defines how data is transmitted over the web and determines how web servers and browsers should respond to commands.
in Python 3, data sent over a socket needs to be a bytes object, not a string
HTTP(S)
hypertext transfer protocol (secure)
syntax to signify EOL (end of line)
syntax to create blank line
\r\n (EOL)
\r\n\r\n (blank line)
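A sketch of how these line endings fit into a request, assuming a simple HTTP/1.0 GET. The helper name build_get_request and the host example.com are hypothetical, chosen only for illustration.

```python
def build_get_request(host, path='/'):
    """Return a simple HTTP/1.0 GET request as bytes (hypothetical helper)."""
    lines = [
        f'GET {path} HTTP/1.0',
        f'Host: {host}',
    ]
    # \r\n ends each line; the final \r\n\r\n ends the last line and adds
    # the blank line that tells the server the headers are finished.
    return ('\r\n'.join(lines) + '\r\n\r\n').encode()

request = build_get_request('example.com', '/index.html')
print(request)
```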
technique to receive data from a socket in chunks of up to 512 bytes and print the data until there is no more to read (i.e., recv() returns an empty bytes object)
while True:
    data = mysock.recv(512)
    if len(data) < 1:
        break
    print(data.decode(), end='')
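The loop above can be tried without a server. This sketch assumes a local socketpair() as a stand-in for mysock and collects the chunks instead of printing them; read_all is a hypothetical helper name.

```python
import socket

def read_all(sock, chunk_size=512):
    """Read from sock until recv() returns an empty bytes object."""
    chunks = []
    while True:
        data = sock.recv(chunk_size)   # at most chunk_size bytes per call
        if len(data) < 1:              # b'' means the other side closed
            break
        chunks.append(data)
    return b''.join(chunks).decode()

# Local stand-in for a server: send 1400 bytes, then close the sending end.
sender, receiver = socket.socketpair()
message = 'line of text\r\n' * 100
sender.sendall(message.encode())
sender.close()

text = read_all(receiver)
receiver.close()
print(len(text))
```

Joining the chunks before decoding avoids splitting a multi-byte character across two recv() calls.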
string method to convert strings into bytes objects
.encode()
bytes method to convert bytes objects to strings
.decode()
notation to write a string literal as a bytes object
b' '
e.g. b'Hello World'
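A short sketch tying the last three cards together, assuming the default UTF-8 encoding. The variable names are arbitrary.

```python
text = 'Hello World'

encoded = text.encode()       # str -> bytes (UTF-8 by default)
literal = b'Hello World'      # bytes literal, no conversion call needed
decoded = encoded.decode()    # bytes -> str

print(encoded == literal)
print(decoded == text)
print(type(encoded).__name__, type(decoded).__name__)
```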
function that pauses the program for a set amount of time before it asks for more data, in order to let the server catch up
time.sleep(0.25)
= wait 0.25 seconds between calls
import time
The pausing of either the sending application or the receiving application so that neither side gets ahead of the other, for example when a buffer is full or empty
flow control
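A sketch of pacing a receive loop with time.sleep(), assuming a local socketpair() in place of a real server and a shorter delay than 0.25 seconds so it finishes quickly. paced_read is a hypothetical helper name.

```python
import socket
import time

def paced_read(sock, delay=0.25, chunk_size=512):
    """Read until the sender closes, pausing after every chunk."""
    total = 0
    while True:
        data = sock.recv(chunk_size)
        if len(data) < 1:
            break
        total += len(data)
        time.sleep(delay)          # give the sender time to catch up
    return total

sender, receiver = socket.socketpair()
sender.sendall(b'x' * 1024)
sender.close()

start = time.monotonic()
received = paced_read(receiver, delay=0.05)
elapsed = time.monotonic() - start
receiver.close()

print(received, round(elapsed, 2))
```

The sleep here is deliberate pacing by the program itself. Flow control is what the network layers do on their own when a buffer fills up or runs empty.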
Python library that lets you read a web page much like a file
urllib
import urllib.request
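A sketch of the file-like behaviour, assuming a data: URL as an offline stand-in for a real web address (urlopen() handles both). read_lines is a hypothetical helper name.

```python
import urllib.request

def read_lines(url):
    """Return the decoded lines of the resource at url, without line endings."""
    with urllib.request.urlopen(url) as handle:
        # The handle can be looped over line by line, like an open file,
        # but each line arrives as bytes and needs decoding.
        return [line.decode().rstrip() for line in handle]

# Offline stand-in; an http:// or https:// URL would be used the same way.
sample_url = 'data:text/plain;charset=utf-8,first%20line%0Asecond%20line'
lines = read_lines(sample_url)
print(lines)
```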