Search engines Flashcards
Describe the process of being listed by a search engine
Preferably: the website owner submits the link to a pay-per-click program, or the crawler discovers new links by crawling the web and adds these links to the search index
- The crawler retrieves a page, extracts the links and indexes the page content
- The crawler selects the next link to visit
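The crawl loop described above can be sketched roughly as follows. This is only a toy illustration: `TOY_WEB`, the `example.test` URLs and the `crawl` function are made up for this card, and the "web" is an in-memory dictionary rather than real HTTP requests.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
TOY_WEB = {
    "http://example.test/": ("welcome page",
                             ["http://example.test/a", "http://example.test/b"]),
    "http://example.test/a": ("page a", ["http://example.test/b"]),
    "http://example.test/b": ("page b",
                             ["http://example.test/", "http://example.test/missing"]),
}

def crawl(seed, web):
    """Breadth-first crawl: retrieve, index, extract links, pick the next link."""
    index = {}                # URL -> indexed page content
    frontier = deque([seed])  # links waiting to be visited
    seen = {seed}             # avoids queueing the same URL twice
    while frontier:
        url = frontier.popleft()   # select the next link to visit
        page = web.get(url)        # retrieve the page
        if page is None:           # broken link: no document behind it
            continue
        text, links = page
        index[url] = text          # index the page content
        for link in links:         # extract the links
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return index
```

Calling `crawl("http://example.test/", TOY_WEB)` indexes the three existing pages and silently skips the broken link. A real crawler would also need politeness delays, robots.txt checks and URL normalization.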
Important aspects/obstacles for crawlers
Dynamic links - Crawlers avoid dynamic links because they fear spider traps (dynamic links that point to dynamic pages that point to more dynamic links, without end) and because some dynamic links carry parameters such as a session ID or user ID, which creates different URLs for the same content and wastes energy and crawling resources
Sitemaps - not for humans; rather, they help crawlers find relevant pages to index
Robots exclusion protocol - keeps polite crawlers away from non-searchable areas
Broken links - no document behind the link
What is a dynamic link?
A link to a page that is not saved as a document in a file but is created on the fly. Different URLs can lead to the same content
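To illustrate the "different URL, same content" problem, here is a minimal sketch of how a crawler might normalize dynamic URLs before comparing them. The parameter names in `SESSION_PARAMS` and the function `canonical_url` are assumptions for this example; real sites use many different names.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed names of parameters that identify a visitor, not the content.
SESSION_PARAMS = {"sessionid", "sid", "userid"}

def canonical_url(url):
    """Drop session/user parameters and sort the rest, so that URLs
    pointing to the same content compare as equal."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key.lower() not in SESSION_PARAMS]
    kept.sort()
    # The fragment is dropped because it never reaches the server.
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), ""))
```

With this, `http://example.test/item?id=7&sessionid=abc` and `http://example.test/item?sessionid=xyz&id=7` both reduce to `http://example.test/item?id=7`, so the page would only be crawled once.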
Difference between an HTML and an XML sitemap
HTML sitemap - Used by humans but also by crawlers, containing links to all relevant pages
XML sitemap - People can't see it; it can hold a lot of information that tells the crawler how to search the website. Mainly useful for large websites.
You yourself need to tell the search engine you have a sitemap by adding it to your webmaster tool account
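As a rough idea of what an XML sitemap contains, the sketch below builds one with Python's standard library. The element names and namespace follow the sitemaps.org protocol; the function `build_sitemap`, the URLs and the dates are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(entries):
    """Build a sitemap string from (url, last-modified date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc          # address of the page
        ET.SubElement(url, "lastmod").text = lastmod  # hint for re-crawling
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical pages of a small site.
print(build_sitemap([
    ("http://example.test/", "2024-01-01"),
    ("http://example.test/products", "2024-02-15"),
]))
```

The resulting file is what you would then register in the search engine's webmaster tool.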
Explain the robots exclusion protocol: why and how?
Purpose - Instructs the crawler which pages on the website should not be included in the index. It lists folders and individual files that should not be crawled and indexed. The information is given in the robots.txt file or in the robots meta tag of HTML documents.
You can address crawlers from particular search engines
Why - To save crawler resources. By combining sitemaps and the robots exclusion protocol you can facilitate the best crawling experience
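A small sketch of how a polite crawler could interpret a robots.txt file, using Python's standard `urllib.robotparser` module. The rules, the crawler name `ExampleBot` and the helper `may_crawl` are made up for this example; the file excludes one folder and one separate file for everyone and addresses one particular crawler by name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for http://example.test/
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
Disallow: /tmp/draft.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse the rules without any network access

def may_crawl(agent, url):
    """Return True if the crawler named `agent` is allowed to fetch `url`."""
    return parser.can_fetch(agent, url)

print(may_crawl("OtherBot", "http://example.test/index.html"))
print(may_crawl("OtherBot", "http://example.test/private/a.html"))
print(may_crawl("ExampleBot", "http://example.test/index.html"))
```

Here `OtherBot` falls under the `*` rules and may fetch the index page but nothing under `/private/`, while `ExampleBot` is kept out of the whole site. Note that the protocol only works for crawlers that choose to obey it.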
What is rel="nofollow"?
It tells the crawler to ignore the link and not crawl it. These links are also ignored by link analysis algorithms (PageRank).
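As an illustration, a crawler's link extractor could separate nofollow links from ordinary ones roughly like this. The class name `LinkExtractor` and the sample HTML are invented for this sketch; it uses only the standard `html.parser` module.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects link targets, keeping rel="nofollow" links apart."""

    def __init__(self):
        super().__init__()
        self.follow = []    # links the crawler may visit and count
        self.nofollow = []  # links to ignore for crawling and link analysis

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if href is None:
            return
        # rel may hold several space-separated values, e.g. "external nofollow".
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollow.append(href)
        else:
            self.follow.append(href)

extractor = LinkExtractor()
extractor.feed('<a href="/a">A</a> <a rel="nofollow" href="/b">B</a>')
print(extractor.follow, extractor.nofollow)
```

Only the links in `follow` would be added to the crawl queue or passed on to link analysis.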
What are frames?
Frames are parts of a web page or browser window that display content independently of their container and can load independently. Frames interfere with link analysis: one web document has several URLs, so which one gets the PageRank?
Importance of keyword matching
Determines whether the web page is relevant to the query
Keywords in the title - the title tells what the page is all about; the keywords must match the expected search query
Keywords in the domain name and URL - file path and file name are attributes of the page
Keywords in the beginning of the body text are regarded as more important than the frequency of the keywords
Keywords in the meta tag are unimportant; the meta tag is used more as a descriptive summary when your result is shown by the search engine. Search engines can check the meta tag for spamming
Image ALT tag - appears whenever the image is not loaded (some older browsers also show it when the mouse passes over the image). Keywords in the ALT tag are attributes of the page, i.e. they are considered
Keywords in the link text
The link/anchor text that points to the web page has the most important keywords that describe the page. To avoid spam, the search engine checks that the topic of the linking page is the same as the topic of the link target page
What does link reputation show?
It shows what other pages "say" when they link to a page and how qualified these other pages are to have an "opinion" about the link's landing page.
Is keyword density an important feature?
No, not really. Keyword density does not consider where the keywords are placed in the text, and search engines do pay attention to where relevant keywords are located
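A tiny sketch of why density alone says little: two texts can have exactly the same keyword density although the keyword sits in very different places. The helper functions and sample sentences are made up, and density is taken here as keyword occurrences divided by total words, which is one common definition.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def keyword_density(text, keyword):
    """Share of all words in the text that equal the keyword."""
    words = tokenize(text)
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)

def first_position(text, keyword):
    """Index of the first occurrence of the keyword, or None if absent."""
    words = tokenize(text)
    keyword = keyword.lower()
    return words.index(keyword) if keyword in words else None

early = "crawler basics for beginners and hobby programmers"
late = "basics for beginners and hobby programmers crawler"

print(keyword_density(early, "crawler") == keyword_density(late, "crawler"))
print(first_position(early, "crawler"), first_position(late, "crawler"))
```

Both sentences have the same density for "crawler", yet only the first one has the keyword at the very beginning, which is the kind of placement signal that density misses.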
How do images appear in search results?
The search engine relies on words in the:
- file name, ALT-tags
- Descriptive text around the image
- Link/anchor text pointing to webpage
- image dimension, bigger is better
Summary: where do search engines look for keywords?
- anchor text of incoming links, also text around the link and the title of the page containing the link
- title of the page
- domain name
- Headings
- file name
- body text
- Image ALT-tag, file name, file path
What is considered important for achieving high authority in link analysis?
Link diversity - variety of domains - Different authors that think our website is good.
Deep linking -> Especially if the link deep in our
It is important that the links are topically relevant; links from highly trusted websites transfer a part of their trust
- domain age (older is better)
- physical postal address
- opinion in social media about the brand/company
- task completion rate on the website: products purchased, subscriptions made etc.
Part of user behavior/relevance feedback: what are click-through rate and time on site?
Click-through rate - number of clicks a link gets per 1000 times the link is shown. Time on site - amount of time spent on the site. More is better, of course
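The click-through rate from this card can be written as a one-line calculation. The function name and the numbers are only an example; the default scale of 1000 follows the definition above, although the rate is also often quoted as a percentage (per 100).

```python
def click_through_rate(clicks, impressions, per=1000):
    """Clicks received per `per` times the link was shown."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return clicks * per / impressions

# Hypothetical numbers: a link shown 2000 times and clicked 30 times.
print(click_through_rate(30, 2000))           # clicks per 1000 impressions
print(click_through_rate(30, 2000, per=100))  # the same rate as a percentage
```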