Exam - Google search Flashcards

1
Q

Analogy - Internet as a directed graph + practical importance for modelling the web

A

sites hyperlink to other sites –> sometimes 2-way, sometimes one-way

serves as the basis for web discoverability + page ranking for search results
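A minimal sketch of the analogy, assuming four made-up sites (all names here are hypothetical): each site is a node and each hyperlink is a directed edge, so counting inbound edges gives a first rough signal for ranking.

```python
# Hypothetical sites; each key links out to the sites in its set.
links = {
    "a.example": {"b.example", "c.example"},
    "b.example": {"a.example"},   # a <-> b is a two-way link
    "c.example": {"d.example"},   # c -> d is a one-way link
    "d.example": set(),
}

def inbound_counts(graph):
    """Count how many sites link to each site (its in-degree)."""
    counts = {site: 0 for site in graph}
    for targets in graph.values():
        for target in targets:
            counts[target] += 1
    return counts

def is_two_way(graph, a, b):
    """True when a links to b and b links back to a."""
    return b in graph[a] and a in graph[b]

counts = inbound_counts(links)
```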

2
Q

web crawlers

A

aka bots/spiders

  • automated entities that visit websites and collect information on what the content is
  • may use provided sitemaps to collect content information that would otherwise be missed
  • travel to websites using hyperlinks –> notes which websites link to which + how many hyperlinks
  • websites get updated –> need to be re-crawled
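The points above can be sketched as a breadth-first crawl. This is only an illustration: `WEB` is a hypothetical in-memory stand-in for real pages, and the `sitemap` argument stands in for a site-provided map of pages that no hyperlink reaches.

```python
from collections import deque

# Hypothetical in-memory "web": page -> list of outgoing hyperlinks.
WEB = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["post1", "home"],
    "post1": [],
    "orphan": ["home"],  # nothing links here, so link-following alone misses it
}

def crawl(start, sitemap=()):
    """Breadth-first crawl; returns the visit order and a who-links-to-whom record."""
    queue = deque([start, *sitemap])
    seen = set()
    order = []
    link_record = {}
    while queue:
        page = queue.popleft()
        if page in seen or page not in WEB:
            continue
        seen.add(page)
        order.append(page)
        link_record[page] = list(WEB[page])  # which pages it links to (and how many)
        queue.extend(WEB[page])              # travel onward via hyperlinks
    return order, link_record

order_links_only, record = crawl("home")
order_with_sitemap, _ = crawl("home", sitemap=["orphan"])
```

Re-crawling after an update would amount to calling `crawl` again and comparing the new `link_record` with the stored one.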
3
Q

good webcrawler

A
  • used to build search results –> collects info on web content to determine which websites are most relevant to a given search query
  • does not recrawl too frequently –> prevent stressing the web server with excess traffic
  • obeys paywalls –> does not break content barriers unless directly permitted by site map code
  • does not scrape personal info
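A sketch of the "polite" behaviours above, assuming the site states its rules in a robots.txt file (an assumption; the card only says "site map code"). The robots.txt text, the `examplebot` agent name, and the helper functions are hypothetical; `urllib.robotparser` is part of the Python standard library.

```python
import urllib.robotparser

# Hypothetical robots.txt content; a real crawler would download it from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /members/
Crawl-delay: 5
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def may_fetch(url, agent="examplebot"):
    """Respect content barriers: only fetch what the site allows."""
    return parser.can_fetch(agent, url)

def should_recrawl(last_crawl, now, agent="examplebot", default_delay=1.0):
    """Avoid stressing the server: wait at least the requested delay (seconds)."""
    delay = parser.crawl_delay(agent) or default_delay
    return now - last_crawl >= delay
```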
4
Q

bad webcrawler

A
  • scrapes web content to duplicate it –> content theft
  • gather personal data to generate spam/phishing (may involve exploiting vulnerabilities)
  • generates spam comments in forums/chat
  • ad hosting costs $$$$ –> bot clicks the ad to intentionally waste advertiser money
  • excess web crawling –> DDOS
5
Q

Web indexing explained + factors influencing (TRUQD)

A

Analysis of web content to classify a website –> data is used to shape search results

factors:
- website trustworthiness
- content readability
- content uniqueness
- content quality
- duplication of existing content

TRUQD

6
Q

Website is crawled, analyzed and indexed –> how is the index info stored?

A

search engine stores keywords + sequence of appearance + frequency of each –> used to gauge relevance to different topics
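One common way to hold this information is an inverted index. The sketch below is an assumption about the layout, not a description of any particular search engine: each keyword maps to the pages it appears on, with word positions (sequence of appearance) whose count is the frequency. The sample pages are made up.

```python
import re
from collections import defaultdict

def build_index(pages):
    """Map each keyword to {page: [positions]}; len(positions) is the frequency."""
    index = defaultdict(dict)
    for page, text in pages.items():
        words = re.findall(r"[a-z0-9]+", text.lower())
        for position, word in enumerate(words):
            index[word].setdefault(page, []).append(position)
    return index

# Hypothetical page contents.
pages = {
    "p1": "Graph search ranks pages; search engines index pages.",
    "p2": "Crawlers visit pages.",
}
index = build_index(pages)
search_frequency_p1 = len(index["search"]["p1"])
```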

7
Q

assessing content quality (SHERMIUQ)

A
  • Relevance to search query
  • quality of writing
  • Importance to the problem
  • last updated (recent = better)
  • mobile friendly (friendly = better)
  • HTML structure (organized tags = better)
  • Social media presence (more shares = better)
  • Engagement (longer visits + more views = better)

SHERMIUQ
(social, HTML, Engag, Rel, Mobile, Import, Upda, Quality)

8
Q

page ranking - Hyperlinking + factors explained

A
  • more inbound links to the page = good
  • linked from trustworthy sites = good
  • linked from popular sites = good
  • page is bookmarked more = good
  • more web engagement = good

# of inbound links + trustworthiness/popularity of linking sites + web engagement
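The link-based part of these factors can be illustrated with a simplified PageRank-style iteration. The card does not name this algorithm, so treat it as a stand-in: the damping value, the iteration count, and the tiny graph are all assumptions, and bookmarks and engagement are not modelled.

```python
def rank_pages(graph, damping=0.85, iterations=50):
    """Repeatedly pass each page's score along its outgoing links."""
    pages = list(graph)
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page, targets in graph.items():
            if targets:
                share = damping * score[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # Page with no outgoing links: spread its score evenly.
                for p in pages:
                    new[p] += damping * score[page] / n
        score = new
    return score

# Hypothetical graph: page -> pages it links to.
graph = {
    "hub": ["a", "b"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],   # nothing links to c
}
scores = rank_pages(graph)
```

Here `hub` ends up with the highest score because three pages link to it, and `a` outranks `c` because its one inbound link comes from the high-scoring `hub`.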

9
Q

SEO - premise

A

Search engine optimization
- 3rd party company hired to increase web traffic to a website
- SEO reverse engineers the search algo –> determines what factors improve page rank

10
Q

blackhat/evil SEO

A
  • keyword stuffing + content cloaking - embed keywords into website (boost rank) + embed hidden keywords (shows up in irrelevant search queries)
  • embed hidden hyperlinks –> search engine crawlers combat this by analyzing if the links are even seen by the user (unseen = irrelevant)
  • paying other websites to link to customer’s site
  • spamming comments/chats/forums with hyperlinks –> higher # of inbound links (if posted on popular sites, even better)
  • content theft - steal higher quality content to improve site quality ranking
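The hidden-hyperlink countermeasure above can be sketched with a crude visibility check. This is only a heuristic under stated assumptions: it looks at inline `style` and the `hidden` attribute and ignores external CSS, scripts, and off-screen positioning. The class name and sample HTML are hypothetical; `html.parser` is from the Python standard library.

```python
from html.parser import HTMLParser

class LinkVisibilityParser(HTMLParser):
    """Sort <a href> links into ones a user could see and ones hidden inline."""

    def __init__(self):
        super().__init__()
        self.visible = []
        self.hidden = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        style = (attributes.get("style") or "").replace(" ", "").lower()
        is_hidden = (
            "hidden" in attributes
            or "display:none" in style
            or "visibility:hidden" in style
        )
        (self.hidden if is_hidden else self.visible).append(href)

# Hypothetical page fragment.
html = """
<a href="https://shop.example/">Shop</a>
<a href="https://spam.example/" style="display: none">x</a>
"""
link_parser = LinkVisibilityParser()
link_parser.feed(html)
```

A ranking step could then ignore everything in `link_parser.hidden` when counting links.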
11
Q

black hat SEO sabotage

A

send bad traffic to competitors

e.g. inbound links from shady/sketchy sites –> degrades page rank
