Mod 10 - Search Engines Flashcards
What is an undirected (regular) graph (social media)?
It uses nodes (ex. people) connected by edges that have no direction, so every connection is mutual.
ex. Friends on Facebook (mutual)
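A minimal sketch of how the mutual friendships in the card above could be stored, assuming an adjacency-list representation (a dict of sets). The names `friends` and `add_friendship` are hypothetical. Because an undirected edge has no direction, adding one records it on both nodes.

```python
from collections import defaultdict

def add_friendship(graph, a, b):
    """Add an undirected edge: friendship is mutual, so record it on both nodes."""
    graph[a].add(b)
    graph[b].add(a)

# Each node (person) maps to the set of nodes it shares an edge with.
friends = defaultdict(set)
add_friendship(friends, "Ana", "Ben")
add_friendship(friends, "Ben", "Cal")

print(sorted(friends["Ben"]))  # ['Ana', 'Cal']
```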
What is a directed graph (social media)?
It also uses nodes and edges, but each edge is an arrow pointing one way, so it can show one-way relationships.
ex. you follow someone on Twitter but they don’t follow you back, or vice versa
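A small sketch of the "follows" relationship as a directed graph, assuming a dict of sets where an edge `a -> b` means "a follows b" (the names `follow` and `follows_back` are hypothetical). The edge is stored only on the follower, so a mutual relationship needs two separate edges.

```python
from collections import defaultdict

# follows[a] is the set of accounts that a follows (edges a -> b).
follows = defaultdict(set)

def follow(graph, follower, followee):
    """Add a one-way edge follower -> followee."""
    graph[follower].add(followee)

def follows_back(graph, a, b):
    """True only if both arrows exist (a -> b and b -> a)."""
    return b in graph[a] and a in graph[b]

follow(follows, "Ana", "Ben")                 # Ana follows Ben...
print(follows_back(follows, "Ana", "Ben"))    # False: Ben has not followed back
follow(follows, "Ben", "Ana")
print(follows_back(follows, "Ana", "Ben"))    # True
```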
How do links work?
Web pages contain links to other pages, which let you “travel” around the web. The pages and the links between them form a web, hence the name World Wide Web.
What are spiders?
Programs that start at one web page and follow its links to other pages, gathering information about each page they visit
(ex. how Google gathers info on pages)
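A minimal spider sketch, assuming a hypothetical in-memory link table (`LINKS`) in place of real HTTP downloads so it runs offline. It crawls breadth-first from a start page and tracks pages it has already seen, because links often form cycles.

```python
from collections import deque

# Hypothetical pages and their outgoing links (stands in for fetching + parsing HTML).
LINKS = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["c.html"],
    "c.html": ["a.html", "d.html"],
    "d.html": [],
}

def crawl(start):
    """Visit every page reachable from start once, in breadth-first order."""
    visited = []
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        visited.append(page)              # a real spider would gather info here
        for link in LINKS.get(page, []):
            if link not in seen:          # skip pages already queued or visited
                seen.add(link)
                queue.append(link)
    return visited

print(crawl("a.html"))  # ['a.html', 'b.html', 'c.html', 'd.html']
```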
What is a focused spider?
A spider that targets a specific topic: it only visits pages related to that topic and gathers information from them.
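A sketch of a focused spider, assuming hypothetical pages with text and links, and a very simple relevance test (the topic word appears in the page text). Off-topic pages are not expanded, so the crawl stays near the topic. The seed page is always expanded so the crawl can get started.

```python
from collections import deque

# Hypothetical pages: name -> (page text, outgoing links).
PAGES = {
    "home":    ("welcome to our site", ["cooking", "sports"]),
    "cooking": ("pasta recipes and cooking tips", ["baking", "sports"]),
    "baking":  ("bread recipes", []),
    "sports":  ("football scores", ["cooking"]),
}

def focused_crawl(start, topic):
    """Return pages mentioning topic; only follow links from relevant pages (and the seed)."""
    relevant = []
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        text, links = PAGES[page]
        if topic in text:
            relevant.append(page)
        elif page != start:
            continue                      # off-topic: don't follow its links
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return relevant

print(focused_crawl("home", "recipes"))  # ['cooking', 'baking']
```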
What is a polite spider?
It cooperates with the websites it visits: it follows each site's instructions for spiders (ex. the robots.txt file) and avoids overloading the site with requests.
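One common way a polite spider follows site instructions is by checking the site's robots.txt file. A sketch using Python's standard `urllib.robotparser`, assuming a made-up robots.txt and a hypothetical spider name `MySpider`; `parse()` takes the file's lines directly, so nothing is downloaded here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents (a real spider would fetch https://<site>/robots.txt).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
""".splitlines()

rp = RobotFileParser()
rp.parse(ROBOTS_TXT)

for url in ["https://example.com/index.html", "https://example.com/private/data.html"]:
    print(url, rp.can_fetch("MySpider", url))   # True, then False

print(rp.crawl_delay("MySpider"))  # 2 (seconds to wait between requests)
```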
What is spider revisit frequency?
How often a spider returns to a page. It is based on how often the page changes: pages that change often are revisited more often.
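A sketch of one possible revisit policy. The halve/double rule and the 1 to 64 day limits are assumptions for illustration, not any particular search engine's method: if a page changed since the last visit, come back sooner; if not, wait longer.

```python
# Hypothetical limits on the revisit interval, in days.
MIN_DAYS, MAX_DAYS = 1, 64

def next_interval(current_days, page_changed):
    """Halve the interval if the page changed since the last visit, otherwise double it."""
    new = current_days // 2 if page_changed else current_days * 2
    return max(MIN_DAYS, min(MAX_DAYS, new))

interval = 8
for changed in [True, True, True, False]:
    interval = next_interval(interval, changed)
    print(interval)   # 4, 2, 1, 2
```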
What is the issue with paywalls for spiders?
A subscription is required to view the content, so how can spiders access it to index it (so it shows up in search results)? Websites open backdoors for the spiders to enter through.
What is the issue with dynamic content & query strings for spiders?
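- Dynamic content varies depending on the user viewing it, which complicates what spiders see.
- Query strings (additional information added to the end of a URL after a “?”) raise the question of whether spiders should care about them, since URLs that differ only in their query string may show the same content.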
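A sketch of how a spider might decide that two URLs are the same page, using Python's `urllib.parse`. The set of ignored parameters (`IGNORED_PARAMS`) is a hypothetical assumption: which query parameters actually change the content depends on the site.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameters assumed NOT to change page content (tracking/session tags).
IGNORED_PARAMS = {"utm_source", "utm_medium", "sessionid"}

def normalize_url(url):
    """Drop ignored query parameters and sort the rest so equivalent URLs compare equal."""
    parts = urlsplit(url)
    kept = sorted((k, v) for k, v in parse_qsl(parts.query) if k not in IGNORED_PARAMS)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

a = normalize_url("https://shop.example/item?id=5&utm_source=mail")
b = normalize_url("https://shop.example/item?sessionid=abc&id=5")
print(a)        # https://shop.example/item?id=5
print(a == b)   # True: the spider can treat these as one page
```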
What do web search engines need to handle when indexing?
- list occurrences: where & how many times a word appears on each page (makes a list)
- punctuation: email vs. e-mail (treated as the same word)
- accents: Beyonce vs. Beyoncé (treated as the same word)
- “stop” words: the, it, is (not indexed)
- word variants: sell, sells, selling, etc. (indexed together)
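A sketch of an index that applies each rule in the list above. The stop-word list and the suffix-stripping "stemmer" are deliberately tiny assumptions (real engines use proper stemmers such as Porter's). Each term maps to the pages and word positions where it occurs, which is the occurrence list.

```python
import re
import unicodedata
from collections import defaultdict

STOP_WORDS = {"the", "it", "is", "a", "and"}   # small illustrative list

def normalize(word):
    """Lowercase, strip accents (Beyoncé -> beyonce), and drop hyphens (e-mail -> email)."""
    word = unicodedata.normalize("NFKD", word.lower())
    word = "".join(ch for ch in word if not unicodedata.combining(ch))
    return word.replace("-", "")

def stem(word):
    """Crude suffix stripping (an assumption, not a real stemmer): sells/selling -> sell."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def build_index(pages):
    """Map each term to {page: [positions]}: where and how many times it appears."""
    index = defaultdict(lambda: defaultdict(list))
    for page, text in pages.items():
        for pos, raw in enumerate(re.findall(r"[\w-]+", text)):
            word = normalize(raw)
            if word in STOP_WORDS:
                continue                  # stop words are not indexed
            index[stem(word)][page].append(pos)
    return index

index = build_index({"p1": "Beyoncé sells the e-mail list", "p2": "Selling email is easy"})
print(dict(index["sell"]))   # {'p1': [1], 'p2': [0]}
```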
What are examples of advanced indexing?
- synonymy: synonyms are treated as the same term (big & large)
- polysemy: create separate index entries for words spelled the same with different meanings (ex. river bank vs. money bank)
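A toy sketch of both ideas, under loud assumptions: the synonym table and the "sense cue" words are hand-written and hypothetical (working out a word's meaning is the hard part in practice). Synonyms merge into one index key, while a polysemous word is split into sense-specific keys based on nearby words.

```python
# Hypothetical synonym table: synonyms share one canonical index key.
SYNONYMS = {"large": "big", "huge": "big"}

# Hypothetical context cues for splitting a polysemous word into senses.
SENSE_CUES = {"bank": {"river": "bank#river", "water": "bank#river",
                       "money": "bank#finance", "loan": "bank#finance"}}

def index_key(word, context):
    """Return the index key for word, given the other words around it."""
    word = SYNONYMS.get(word, word)               # synonymy: merge entries
    for cue, sense in SENSE_CUES.get(word, {}).items():
        if cue in context:                        # polysemy: split entries
            return sense
    return word

def terms(sentence):
    words = sentence.lower().split()
    return [index_key(w, words) for w in words]

print(terms("a large bank by the river"))
# ['a', 'big', 'bank#river', 'by', 'the', 'river']
```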
What can evil spiders do?
- They can steal content and claim it as their own
- They can harvest email addresses to send spam emails
How do search engines search for phrases with more than 1 word?
The search engine finds the webpages containing each word separately, then keeps only the pages where the words appear together as the phrase, and returns those pages.
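A sketch of that two-step process, assuming a positional index (each word maps to the pages and positions where it occurs; the page names are hypothetical). Step 1 intersects the page sets for each word; step 2 checks that the words sit next to each other in the right order.

```python
from collections import defaultdict

def build_positional_index(pages):
    """word -> {page: set of positions where the word occurs}."""
    index = defaultdict(lambda: defaultdict(set))
    for page, text in pages.items():
        for pos, word in enumerate(text.lower().split()):
            index[word][page].add(pos)
    return index

def phrase_search(index, phrase):
    words = phrase.lower().split()
    if not words:
        return []
    # Step 1: pages containing every word (intersect each word's page set).
    candidates = set(index[words[0]])
    for w in words[1:]:
        candidates &= set(index[w])
    # Step 2: keep pages where the words appear consecutively, in order.
    results = []
    for page in sorted(candidates):
        for start in index[words[0]][page]:
            if all(start + i in index[w][page] for i, w in enumerate(words)):
                results.append(page)
                break
    return results

idx = build_positional_index({
    "p1": "new york city guide",
    "p2": "york has a new museum",
    "p3": "the new york times",
})
print(phrase_search(idx, "new york"))  # ['p1', 'p3']
```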
How does page ranking work?
Pages that have lots of other pages linking to them are often more authoritative sources and thus more important, so they receive a higher ranking (meaning they appear higher in search results). The search engine also looks at HTML elements and attributes that relate to the search (ex. href, title, h1).
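The "more incoming links means more important" idea is the intuition behind Google's PageRank. Below is a simplified textbook version computed by repeated updates (power iteration). The damping factor 0.85 is the commonly cited value and an assumption here, and the link graph is made up. Real search engines combine link scores with many other signals.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank. links: page -> list of pages it links to (all must be keys)."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1 / n for p in pages}              # start with equal rank
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in pages}
        for p, outs in links.items():
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:                    # a page passes rank to pages it links to
                    new[q] += share
            else:
                for q in pages:                   # no outgoing links: spread rank evenly
                    new[q] += damping * rank[p] / n
        rank = new
    return rank

# Hypothetical web: A, B and D all link to C; C links back to A.
ranks = pagerank({"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]})
print(max(ranks, key=ranks.get))  # C
```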
How does penalization/rewarding work in page ranking?
Pages are penalized for having excessive ads or for being aggregators (pages that mostly collect content from other sites).
Pages are rewarded for content quality, reputability, and authoritative sources.