Exam review Module 10 Flashcards
Facebook graph
• A series of connections between you and your friends, their friends, and so on. Together these cover everyone on Facebook and how they are interconnected.
• A graph is a set of nodes (you, your friends) that are connected by edges.
• You can use graphs to represent both physical and virtual systems.
• ex: cities, airports, computers.
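A minimal Python sketch of the nodes-and-edges idea, assuming the graph is stored as a dictionary that maps each node to the set of nodes it shares an edge with. The names (you, sally, raj, mei) and the helpers add_edge and friends_of_friends are made up for illustration.

```python
# Undirected graph: every edge is recorded in both directions.
graph = {}

def add_edge(graph, a, b):
    """Connect nodes a and b with an undirected edge."""
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

add_edge(graph, "you", "sally")
add_edge(graph, "you", "raj")
add_edge(graph, "sally", "mei")

def friends_of_friends(graph, person):
    """Nodes two edges away from person, excluding person and direct friends."""
    direct = graph.get(person, set())
    result = set()
    for friend in direct:
        result |= graph.get(friend, set())
    return result - direct - {person}

print(sorted(graph["you"]))              # your direct friends
print(friends_of_friends(graph, "you"))  # friends of your friends
```

The same structure works for cities joined by roads or computers joined by cables; only the node names change.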
Example 2: directed graph
• Sally may follow you, and you may follow Sally.
• However, you and Sally both follow Justin Bieber, but he does not follow you.
• (Arrows show the direction of following, unlike a normal graph, where edges have no direction.)
• A directed graph can represent more complex, one-way relationships.
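The following example can be sketched in Python by storing, for each account, only the accounts it follows. The dictionary name follows and the two helper functions are assumptions made for this sketch.

```python
# Directed graph: follows[a] is the set of accounts that a follows.
follows = {
    "you": {"sally", "justin_bieber"},
    "sally": {"you", "justin_bieber"},
    "justin_bieber": set(),  # follows neither of you
}

def is_following(follows, a, b):
    """True if there is an arrow from a to b."""
    return b in follows.get(a, set())

def is_mutual(follows, a, b):
    """True if the arrows go both ways."""
    return is_following(follows, a, b) and is_following(follows, b, a)

print(is_mutual(follows, "you", "sally"))          # arrows in both directions
print(is_mutual(follows, "you", "justin_bieber"))  # arrow in one direction only
```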
Example 3: World Wide Web
• It is called the World Wide Web because the graph that represents it looks like a spider’s web.
• Almost every webpage is connected to other pages through hyperlinks.
• It is represented by a directed graph: a link points from one page to another, but not necessarily back.
Spiders/Crawlers/Robots
• A spider is a computer program that starts at one website and follows its links to other websites. It moves from site to site until it can’t go any further.
• Spiders constantly search the internet and collect information into an index so you can search it later.
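A rough sketch of how a spider explores links, assuming a tiny made-up "web" held in memory instead of real network requests. The page names and the crawl function are hypothetical, and breadth-first order is just one possible visiting order.

```python
from collections import deque

# Hypothetical miniature web: page -> pages it links to.
WEB = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example", "d.example"],
    "d.example": [],
    "e.example": ["a.example"],  # nothing links here, so the spider never finds it
}

def crawl(web, start):
    """Visit every page reachable from start by following links (breadth-first)."""
    visited = []
    seen = {start}            # pages already queued, so none is visited twice
    queue = deque([start])
    while queue:              # stops when there is nowhere left to go
        page = queue.popleft()
        visited.append(page)
        for link in web.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited

print(crawl(WEB, "a.example"))
```

The seen set matters because the web graph contains cycles (c.example links back to a.example); without it the spider would loop forever.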
indexing the web
in malicious ways
ex: DDoS (distributed denial-of-service attack): throwing millions of spiders at a webpage or network to overwhelm it.
focus-spider
A focused spider sticks to one subject (e.g. potatoes) and keeps crawling until it can’t find any more information on it.
politeness
A polite spider is cooperative: it follows a website’s instructions and doesn’t flood it with requests.
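One common way a website gives a spider instructions is a robots.txt file; the card does not name it, so treat that as an assumption here. The sketch below uses Python's standard urllib.robotparser on made-up robots.txt contents. The spider name ExampleSpider and the /private/ path are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents for the example site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse text directly, no network request

AGENT = "ExampleSpider"  # made-up spider name

# May this spider fetch these pages?
allowed = parser.can_fetch(AGENT, "http://learn.com/class.html")
blocked = parser.can_fetch(AGENT, "http://learn.com/private/grades.html")

# Seconds the site asks spiders to wait between requests (None if not stated).
delay = parser.crawl_delay(AGENT)

print(allowed, blocked, delay)
```

A polite spider would skip the blocked page and pause for the requested delay between requests to the same site.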
revisit frequency
• Some webpages change a lot, some never change. Spiders need to revisit pages to check whether anything is new.
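One possible way to check whether anything is new, sketched under the assumption that the spider keeps a fingerprint (here a SHA-256 hash) of each page from its previous visit. The card does not say how spiders actually do this; the names fingerprint, last_seen and has_changed are made up.

```python
import hashlib

def fingerprint(content):
    """Fixed-size summary of a page's content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

last_seen = {}  # page -> fingerprint from the previous visit

def has_changed(page, content):
    """Store the page's current fingerprint and report whether it differs from last time."""
    new = fingerprint(content)
    changed = last_seen.get(page) != new
    last_seen[page] = new
    return changed

first = has_changed("news.example", "Monday headlines")    # first visit counts as new
second = has_changed("news.example", "Monday headlines")   # same content as before
third = has_changed("news.example", "Tuesday headlines")   # content was updated
print(first, second, third)
```

A spider could then revisit pages that keep changing more often than pages that never do.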
Paywalls
Subscriptions to websites; also the ability to get into these websites through a back door.
Dynamic Content
Some webpages adapt to whoever is viewing them (a spider might see something different from what you see).
Query Strings
• Does the spider stop at the base page, or does it also look at the additional pages reached through query strings?
ex: http://learn.com/class.html?course=cs100&page=3
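To see which part of that URL is the base page and which part is the query string, here is a short sketch using Python's standard urllib.parse. Whether a spider should crawl each parameter combination is a design decision; this only shows how the pieces separate.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qs

url = "http://learn.com/class.html?course=cs100&page=3"

parts = urlsplit(url)
params = parse_qs(parts.query)  # query string as a dict of name -> list of values

# Base page: the same URL with the query string (and fragment) dropped.
base = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

print(base)
print(params)
```

Every different value of course or page is technically a different URL, so a spider that follows query strings can end up with many pages behind one base page.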
stop words
The, it, is
These words are so common that it is neither practical nor useful to index them.
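A small sketch of dropping stop words before indexing. The stop-word set holds only the three examples from this card (real lists are much longer), and the function name index_terms and the sample sentence are made up.

```python
import re

STOP_WORDS = {"the", "it", "is"}  # only the examples from the card

def index_terms(text):
    """Lowercase the text, split it into words, and drop stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in STOP_WORDS]

print(index_terms("The potato is a vegetable, and it is tasty."))
```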
Word variants
sell, sells, selling, sold, resell, resold, and unsold
When building your index, you may want to treat these words the same way, because if you’re searching for one of them, you’re probably looking for the others as well.
Spelling variants
color vs. colour
You want to index these under the same entry, since they mean the same thing.
Synonymy:
big & large
You want these under the same index entry as well, since they have similar meanings.
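Word variants, spelling variants and synonyms can all be handled the same way: map each word to one shared key before putting it in the index. The sketch below assumes a small hand-built lookup table; real search engines use much larger dictionaries and stemming rules. The names CANONICAL, normalize, build_index and search, and the two sample pages, are made up.

```python
from collections import defaultdict

# Hand-built lookup table (an assumption for this sketch): word -> index key.
CANONICAL = {
    # word variants
    "sells": "sell", "selling": "sell", "sold": "sell",
    "resell": "sell", "resold": "sell", "unsold": "sell",
    # spelling variants
    "colour": "color",
    # synonyms
    "large": "big",
}

def normalize(word):
    """Map a word to its shared index key (or leave it unchanged)."""
    word = word.lower()
    return CANONICAL.get(word, word)

def build_index(pages):
    """Inverted index: index key -> set of pages containing a matching word."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in text.split():
            index[normalize(word)].add(name)
    return index

def search(index, word):
    """Pages that contain the word or anything treated as equivalent to it."""
    return sorted(index.get(normalize(word), set()))

PAGES = {
    "page1": "large colour swatches sold here",
    "page2": "big color charts",
}
index = build_index(PAGES)

print(search(index, "big"))      # also finds the page that says "large"
print(search(index, "colour"))   # also finds the page that says "color"
print(search(index, "selling"))  # finds the page that says "sold"
```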