Importing Data 2 Flashcards
How do you import BeautifulSoup?
from bs4 import BeautifulSoup
What does a full sample script using BeautifulSoup look like?
- # Import packages
- import requests
- from bs4 import BeautifulSoup
- # Specify url: url
- url = 'https://www.python.org/~guido/'
- # Package the request, send the request and catch the response: r
- r = requests.get(url)
- # Extract the response as html: html_doc
- html_doc = r.text
- # Create a BeautifulSoup object from the HTML: soup
- soup = BeautifulSoup(html_doc, 'html.parser')
- # Get the title of Guido’s webpage: guido_title
- guido_title = soup.title
- # Print the title of Guido’s webpage to the shell
- print(guido_title)
- # Get Guido’s text: guido_text
- guido_text = soup.get_text()
- # Print Guido’s text to the shell
- print(guido_text)
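The difference between soup.title, soup.title.string, and soup.get_text() is easy to miss in the script above. The following is a minimal sketch that parses a small inline HTML string instead of a live page; the HTML content and variable names are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical inline HTML standing in for a downloaded page
html_doc = (
    "<html><head><title>Sample Page</title></head>"
    "<body><p>Hello, <b>world</b>!</p></body></html>"
)

# Naming the parser explicitly avoids the "no parser specified" warning
soup = BeautifulSoup(html_doc, "html.parser")

title_tag = soup.title           # Tag object, markup included
title_text = soup.title.string   # only the text inside <title>
page_text = soup.get_text()      # all text in the document, tags stripped

print(title_tag)
print(title_text)
print(page_text)
```

Printing soup.title shows the whole tag, so use .string when only the title text is wanted.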
What does the BeautifulSoup method find_all('a') do?
It returns a ResultSet (a list-like object) of every <a> tag on the page; each tag's href attribute holds the hyperlink's URL. Sample code to extract all hyperlinks:
- # Find all 'a' tags (which define hyperlinks): a_tags
- a_tags = soup.find_all('a')
- # Print the URLs to the shell
- for link in a_tags:
-     print(link.get('href'))
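Two details are worth knowing when extracting links: link.get('href') returns None for an <a> tag with no href attribute, and many href values are relative paths. The sketch below shows one possible way to handle both, using urljoin from the standard library; the base URL and the inline HTML are hypothetical.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical base URL and page content, for illustration only
base_url = "https://example.com/docs/"
html_doc = """
<a href="intro.html">Intro</a>
<a href="/about">About</a>
<a href="https://www.python.org/">Python</a>
<a name="top">Anchor without href</a>
"""

soup = BeautifulSoup(html_doc, "html.parser")

links = []
for link in soup.find_all("a"):
    href = link.get("href")  # None when the tag has no href attribute
    if href is not None:
        # Resolve relative paths against the page's base URL
        links.append(urljoin(base_url, href))

print(links)
```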
How do you load a JSON file?
- import json
- with open("a_movie.json") as json_file:
-     json_data = json.load(json_file)
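To try json.load without having a_movie.json on disk, the sketch below first writes a small file into a temporary directory and then reads it back. The sample movie data is made up for illustration.

```python
import json
import os
import tempfile

# Hypothetical sample data standing in for the contents of a_movie.json
movie = {"Title": "The Social Network", "Year": "2010"}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "a_movie.json")

    # Write the sample data so there is a file to load
    with open(path, "w", encoding="utf-8") as json_file:
        json.dump(movie, json_file)

    # json.load reads from an open file object and returns Python objects
    with open(path, encoding="utf-8") as json_file:
        json_data = json.load(json_file)

print(type(json_data))
print(json_data["Title"])
```

Note that json.load takes a file object, while json.loads takes a string.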
How do you request JSON from an API, decode it into a dictionary, and print it?
- # Import package
- import requests
- # Assign URL to variable: url
- url = 'http://www.omdbapi.com/?apikey=ff21610b&t=social+network'
- # Package the request, send the request and catch the response: r
- r = requests.get(url)
- # Decode the JSON data into a dictionary: json_data
- json_data = r.json()
- # Print each key-value pair in json_data
- for k in json_data.keys():
-     print(k + ': ', json_data[k])
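The decoding step can be tried offline as well. The sketch below parses a JSON string with json.loads, which is roughly what r.json() does with the response body, and loops over the key-value pairs with items(). The response text here is a made-up sample, not real API output.

```python
import json

# Hypothetical response body; a real API response would have more fields
response_text = '{"Title": "The Social Network", "Year": "2010", "Response": "True"}'

# Decode the JSON string into a dictionary
json_data = json.loads(response_text)

# items() yields each key together with its value, so no second lookup is needed
lines = [f"{key}: {value}" for key, value in json_data.items()]

for line in lines:
    print(line)
```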