Web Scraper Project Flashcards

1
Q

BeautifulSoup constructor from bs4

syntax and function

A

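BeautifulSoup(markup, parser)

Parses the HTML string (markup) with the named parser, e.g. "html.parser", and returns a BeautifulSoup object that can be searched.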
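
A minimal sketch of the constructor call, assuming the bs4 package is installed. The HTML string is made up for illustration, and "html.parser" is chosen only because it ships with Python.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet, used only for illustration.
html = "<html><head><title>Demo</title></head><body><p>Hello</p></body></html>"

# First argument: the markup string. Second argument: the parser name.
soup = BeautifulSoup(html, "html.parser")

print(type(soup).__name__)  # BeautifulSoup
print(soup.p.get_text())    # Hello
```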

2
Q

.li

A

returns the FIRST OCCURRENCE of a list item tag (<li>) in the parsed HTML, or None if there is none
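
A short sketch of tag-name attribute access, assuming bs4 is installed. The list markup is invented for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical list markup, used only for illustration.
html = "<ul><li>First</li><li>Second</li></ul>"
soup = BeautifulSoup(html, "html.parser")

# Attribute access by tag name behaves like soup.find("li"):
# it returns only the first matching tag.
first_item = soup.li
print(first_item)             # <li>First</li>
print(first_item.get_text())  # First

# A tag name that does not occur in the document gives None.
print(soup.table)             # None
```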

3
Q

.head attribute from bs4

A

accessed on a variable containing a BeautifulSoup object (e.g. soup.head)

returns the <head> element of the HTML document (its metadata section), not the heading tags
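
A small sketch showing what .head gives back, assuming bs4 is installed. The page content is hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical page, used only for illustration.
html = (
    "<html><head><title>Demo page</title></head>"
    "<body><h1>Heading</h1></body></html>"
)
soup = BeautifulSoup(html, "html.parser")

head = soup.head
print(head.name)          # head
print(head.title.string)  # Demo page

# Heading tags such as <h1> live in <body>, so they are not found in <head>.
print(head.find("h1"))    # None
```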

4
Q

what are html tags (2)

A

the building blocks of HTML documents, defining the structure and content of the webpage.

Tags are enclosed in angle brackets (< >) and usually come in pairs: an opening tag and a closing tag.

5
Q

what is the basic structure of an html tag? (3)

A

opening tag, content, and closing tag, in that order.

6
Q

what about self-closing tags?

A

self-closing (void) tags have no content and no separate closing tag. They can be written with a forward slash before the closing angle bracket.

Example: <img src="image.jpg" alt="Image description" />
7
Q

what is a div tag?

A

<div>: Defines a division or section in an HTML document.

8
Q

what is a span tag?

A

<span>: Defines an inline section of a document, typically for styling purposes.

9
Q

what is the body tag?

A

<body>: Contains the content of the HTML document that is visible to users.

10
Q

what is the title tag?

A

<title>: Sets the title of the webpage (displayed in the browser tab).

11
Q

what is the head tag?

A

<head>: Contains meta-information about the HTML document (e.g., title, meta tags, links to stylesheets).

12
Q

what are h1 and h2, etc tags?

A

<h1> to <h6>: Define headings, with <h1> being the highest level and <h6> the lowest.

13
Q

what is a p tag?

A

<p>: Defines a paragraph.

14
Q

what is an a tag?

A

<a>: Defines a hyperlink.

15
Q

what is an img tag?

A

<img>: Embeds an image.

16
Q

what is an html class? (4)

A

1 a class is an attribute used to define a group of elements with similar properties. Classes are primarily used for styling and scripting purposes.

2 Reusability: Classes allow you to apply the same styles or behaviors to multiple elements.

3 Multiple Classes: An element can have multiple classes, enabling the combination of different styles and behaviors.

4 CSS and JavaScript Integration: Classes are extensively used in CSS for styling and in JavaScript for dynamic behavior.

17
Q

what do find and find_all do?

A

allow you to locate elements based on tag names, attributes, and more.

18
Q

find method

A

searches for the first occurrence of a specified tag or element that matches the given criteria. returns None if nothing matches.

Common Parameters:

name: The name of the tag to search for (e.g., 'div', 'p', 'a').
attrs: A dictionary of attributes to match (e.g., {'class': 'example'}).
recursive: If True (default), it searches within all descendants. If False, it only searches within direct children.
string: A string or regular expression to match against an element's text content.
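
The parameters above can be sketched as follows, assuming bs4 is installed. The markup, ids, and class names are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only for illustration.
html = """
<div id="outer">
  <section><p class="intro">Nested paragraph</p></section>
  <p class="example">Direct child paragraph</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
outer = soup.find("div", attrs={"id": "outer"})

# name only: first <p> among all descendants (recursive=True is the default).
first_p = outer.find("p")

# recursive=False: only direct children of <div id="outer"> are considered.
direct_p = outer.find("p", recursive=False)

# attrs: match on an attribute value.
by_class = soup.find("p", attrs={"class": "example"})

# string: match on the tag's text content.
by_text = soup.find("p", string="Nested paragraph")

# No match gives None rather than raising an error.
missing = soup.find("table")

print(first_p.get_text())   # Nested paragraph
print(direct_p.get_text())  # Direct child paragraph
print(missing)              # None
```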
19
Q

find_all method (bs4)

A

searches for all occurrences of a specified tag or element that match the given criteria and returns them as a list (empty if nothing matches).
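
A minimal sketch of find_all returning every match in document order, assuming bs4 is installed. The markup is invented for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only for illustration.
html = "<p>One</p><div><p>Two</p></div><p>Three</p>"
soup = BeautifulSoup(html, "html.parser")

# Every <p> tag, including the one nested inside <div>.
paragraphs = soup.find_all("p")
texts = [p.get_text() for p in paragraphs]

print(len(paragraphs))  # 3
print(texts)            # ['One', 'Two', 'Three']

# No match gives an empty list, so the result is always safe to loop over.
tables = soup.find_all("table")
print(len(tables))      # 0
```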

20
Q

what is the <ol> tag?

A

ordered list. often used in conjunction with the <li> (list item) tag to define each item within the list. also nestable.

syntax:

<ol>
<li>First item</li>
<li>Second item</li>
<li>Third item</li>
</ol>

21
Q

find_all method syntax (bs4)

A

soup.find_all(name, attrs, recursive, string, limit, **kwargs)

Common Parameters:

name: The name of the tag to search for.
attrs: A dictionary of attributes to match.
recursive: If True (default), it searches within all descendants. If False, it only searches within direct children.
string: A string or regular expression to match against an element's text content.
limit: Limits the number of results returned.
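
A sketch of how limit, recursive, and string change the result, assuming bs4 is installed. The menu markup and its id are hypothetical.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical nested list, used only for illustration.
html = """
<ul id="menu">
  <li>Home</li>
  <li>About us</li>
  <li>Contact us</li>
  <li><ul><li>Nested item</li></ul></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")
menu = soup.find("ul", attrs={"id": "menu"})

# Default: every <li> descendant, including the nested one.
all_items = menu.find_all("li")

# recursive=False: only <li> tags that are direct children of the menu.
top_level = menu.find_all("li", recursive=False)

# limit=2: stop after the first two matches.
first_two = menu.find_all("li", limit=2)

# string with a regular expression: <li> tags whose text ends in "us".
ending_us = menu.find_all("li", string=re.compile(r"us$"))

print(len(all_items))                       # 5
print(len(top_level))                       # 4
print([li.get_text() for li in first_two])  # ['Home', 'About us']
print([li.get_text() for li in ending_us])  # ['About us', 'Contact us']
```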
22
Q

html attribute

A

a modifier of an HTML element that provides additional information about the element. Attributes are used to configure elements and can affect their behavior or appearance. They are written inside the opening tag as name="value" pairs, with the value normally enclosed in quotes.

23
Q

common html attributes

A

id: A unique identifier for the element within the HTML document.

class: Specifies one or more class names for the element, which can be used by CSS and JavaScript.

src: Specifies the source URL of an embedded content like an image or a script.

href: Specifies the URL of a link.

alt: Provides alternative text for an image, which is displayed if the image cannot be loaded.

title: Provides additional information about the element, often displayed as a tooltip when the mouse hovers over the element.

style: Specifies inline CSS styles for an element.

type: Specifies the type of an input element in forms.

24
Q

what is the generalised format for selecting information using attributes?

A

soup.find_all(attrs={"attribute_name": "value of attribute"})
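
A short sketch of attribute-based selection, assuming bs4 is installed. The links and the data-role attribute are made up for the example. The attrs dictionary is handy for attribute names such as data-role, which cannot be written as Python keyword arguments.

```python
from bs4 import BeautifulSoup

# Hypothetical links, used only for illustration.
html = """
<a href="/home" data-role="nav">Home</a>
<a href="/about" data-role="nav">About</a>
<a href="/login" data-role="action">Log in</a>
"""
soup = BeautifulSoup(html, "html.parser")

# No tag name given: any tag whose data-role attribute equals "nav" matches.
nav_links = soup.find_all(attrs={"data-role": "nav"})

# Attribute values are read with dictionary-style access on each tag.
hrefs = [a["href"] for a in nav_links]
print(hrefs)  # ['/home', '/about']
```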

25
Q

soup.select method

A

allows you to use CSS selectors to locate elements within the parsed HTML document, which can be more flexible and powerful than find or find_all. returns a list of all matching elements.
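
A minimal sketch of select with a few common selector forms, assuming a bs4 version with CSS selector support installed. The markup, id, and class names are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only for illustration.
html = """
<div id="content">
  <p class="intro">Welcome</p>
  <p class="note">First note</p>
  <span class="note">Second note</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Class selector: every element with class "note", whatever its tag.
notes = soup.select(".note")

# Combined selector: <p class="note"> that is a direct child of #content.
p_notes = soup.select("#content > p.note")

# select_one returns only the first match (or None).
intro = soup.select_one("p.intro")

print([n.get_text() for n in notes])    # ['First note', 'Second note']
print([n.get_text() for n in p_notes])  # ['First note']
print(intro.get_text())                 # Welcome
```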

26
Q

what is a css selector?

A

a pattern used to select and style elements within an HTML document. CSS selectors define which HTML elements a set of CSS rules apply to.

27
Q

what is the .head method?

A