06: Search Engines Flashcards

Question

**Write an http-equiv meta tag indicating the page should not be cached.**

Answer 1

* Search engines **must by definition download** and **save URLs** since they identify the link to the resource

Answer 2

* Bad SEO URLs work fine for programs **but cannot be read by humans**
* Can be improved by adding:
  * **Descriptive path** components
  * **Descriptive file names**

Answer 3

* Sites that **rely heavily on JavaScript or Flash** for their content and navigation **will suffer from poor indexing**

Answer 4

Nest the navigation inside **"nav"** tags to show semantically that these links exist to navigate your site

Answer 5

* **Formal framework that captures website structure**
* Using XML, defines a URL set as the root item, then as many URL items as desired for the site
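
The structure described above (one URL set at the root, then one URL item per page) can be sketched in a few lines of Python. This is a minimal illustration, not a full sitemap generator: the `build_sitemap` helper and the `example.com` addresses are assumptions made for the example, and only the `loc` child of each item is emitted.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org sitemap protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string with one <url> item per address."""
    # Root item: the URL set.
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for address in urls:
        # One URL item per page, holding its location.
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = address
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages, for illustration only.
sitemap_xml = build_sitemap([
    "https://example.com/",
    "https://example.com/services",
])
print(sitemap_xml)
```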

Answer 6

* Anchor text of links is indexed along with backlinks
* Use of generic anchor text (e.g. "Click here") is **not encouraged** as it **says little about what will be at that URL**
  * e.g. a link to a page of **services** should read **Services and Rates**

Answer 7

* The file name is the first element that can be optimized since it can be parsed for words
  * e.g. instead of **1.png**, it should be **rose.png**
* Use the **alt attribute** to give a textual description of the image that can help site ranking
* Utilize **anchor text** if there is a link to the image
  * e.g. instead of **"Full size"**, it should be **"Full size image of a red rose"**

Answer 8

* Search engines tend to prefer pages that are updated regularly over those that are static
* If your website can permit users to comment or write content on your site, you should consider enabling it
* The idea of having users generate content is now extremely important

Answer 9

Google and other search engines may **punish or ban your site from their results**

Answer 10

Any technique that **uses the content of a website** to try and manipulate search engine results

Answer 11

* **Keyword stuffing**
* **Hidden content**
* **Paid links**
* **Doorway pages**

Answer 12

* A technique whereby **you purposely add keywords into the site** in a most unnatural way with the intention of increasing the affiliation between certain key terms and your URL
* As keywords are added throughout a web page, the content becomes **diluted** with them
* **Meaningful sentences are replaced with content written primarily for robots**, not humans
* Any technique where you find yourself writing for robots before humans, as a rule of thumb, is **discouraged**
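
One rough way to see the dilution described above is to measure how much of a text a single keyword takes up. The sketch below is only an illustration: the `keyword_density` helper and both sample sentences are made up for the example, and it does not claim to reflect how any search engine actually scores pages.

```python
import re
from collections import Counter


def keyword_density(text, keyword):
    """Return the fraction of words in text that equal keyword."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return Counter(words)[keyword.lower()] / len(words)


# Hypothetical copy: one written for humans, one stuffed for robots.
natural = "We grow red roses and deliver fresh bouquets across the city."
stuffed = "Roses roses cheap roses buy roses best roses roses online roses."

print(round(keyword_density(natural, "roses"), 2))
print(round(keyword_density(stuffed, "roses"), 2))
```

The stuffed sentence devotes most of its words to the keyword, while the natural one mentions it once in passing.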

Answer 13

**Making irrelevant words the same color as the background to hide them**

Answer 14

* Frowned upon by many search engines since **their intent is to discover good content by relying on referrals** (i.e. backlinks)
* **Purchased advertisements on a site are not considered paid links** so long as they are well identified as such, and are not hidden in the body of a page
* Many link affiliate programs (like Google’s own AdWords) **do not impact PageRank** because the advertisements are shown using JavaScript

Answer 15

* Pages **written to be indexed by search engines and included in search results**
* **Normally crammed full of keywords**, and effectively useless to real users of your site
* These doorway pages **then link to your home page**, which you are trying to boost in the search results

Answer 16

* **Hidden links**
* **Comment spam**
* **Link farms**
* **Link pyramids**

Answer 17

* **Same as hidden content**
* With **hidden links, websites set the color of the link to match the background**, hoping that
  * **Real users will not see the links**
  * **Search engines will follow the links**, thus manipulating the search engine without impacting the human reader
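
To make the color-matching trick concrete, the following sketch flags links whose inline text color equals a known background color. It is a deliberately narrow illustration: the `HiddenLinkFinder` class and the sample markup are assumptions for the example, and it only inspects inline `style` attributes, not stylesheets or computed styles.

```python
from html.parser import HTMLParser


class HiddenLinkFinder(HTMLParser):
    """Collect hrefs of <a> tags whose inline color matches the background."""

    def __init__(self, background):
        super().__init__()
        self.background = background.lower()
        self.hidden = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        # Normalize the inline style into a property -> value mapping.
        style = (attr.get("style") or "").lower().replace(" ", "")
        declarations = dict(
            part.split(":", 1) for part in style.split(";") if ":" in part
        )
        if declarations.get("color") == self.background:
            self.hidden.append(attr.get("href"))


# Hypothetical page with a white background and one white-on-white link.
page = (
    '<body style="background:#fff">'
    '<a href="/services">Services and Rates</a>'
    '<a href="http://spam.example/" style="color: #FFF">cheap roses</a>'
    "</body>"
)

finder = HiddenLinkFinder("#fff")
finder.feed(page)
print(finder.hidden)
```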

Answer 18

**Automated process utilizing bots that scour the web for comment sections and leave poorly auto-written spam with backlinks to their sites**

## Footnote

(\* Be sure to secure a comment section on your site or you will be flagged as a source of comment spam)

Answer 19

**Set of websites that all interlink with each other with the intent of sharing any incoming PageRank to any one site with all the sites that are members of the link farm**

Answer 20

* Similar to link farms in that there is a great deal of interlinking
* Unlike a link farm, **a pyramid has the intention of promoting one or two sites**

Answer 21

* **Google Bowling**
* **Cloaking**
* **Duplicate content**

Answer 22

* **Requires masquerading as the site you want to weaken/remove**
* Black-hat techniques are applied as though you were working on their behalf. This might include subscribing to link farms, keyword stuffing, commenting on blogs, and more
* Then report the competitor’s website to Google for all the black-hat techniques they employed

Answer 23

* Process of **identifying crawler requests** and **serving them content different from regular users**
* A simple script can **redirect requests whose user-agent is *googlebot* to a page** normally stuffed with keywords
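
The mechanism above can be illustrated from the checking side: request the same page as a browser and as a crawler, then compare what comes back. The sketch below simulates this without any network access. The two site functions, the page text, and the `looks_cloaked` helper are hypothetical, and exact string comparison is a simplifying assumption (real pages often vary between requests for legitimate reasons).

```python
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) Firefox/126.0"
CRAWLER_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def cloaking_site(user_agent):
    """Hypothetical cloaked site: crawlers get a keyword-stuffed page."""
    if "googlebot" in user_agent.lower():
        return "roses roses cheap roses buy roses"
    return "Welcome to our flower shop."


def honest_site(user_agent):
    """Hypothetical honest site: everyone gets the same page."""
    return "Welcome to our flower shop."


def looks_cloaked(fetch):
    """Return True if the content differs between browser and crawler."""
    return fetch(BROWSER_UA) != fetch(CRAWLER_UA)


print(looks_cloaked(cloaking_site))
print(looks_cloaked(honest_site))
```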

Answer 24

* **Stealing content to build a fake site**
* To attribute content to yourself use the **rel=author** attribute
* Sometimes you have several versions of a page, for example, **a display and print version**
* **To prevent being penalized, you can use the canonical tag in the head section of duplicate pages** to affiliate them with a single canonical version to be indexed

## Footnote

(\* Google has also introduced a concept called Google authorship through their Google+ network to attribute content to the originator.)
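
As a small illustration of the canonical tag mentioned above, the sketch below reads the canonical address out of a duplicate page's head section. The `CanonicalFinder` class, the sample markup, and the `example.com` address are assumptions for the example, not a description of how any search engine parses pages.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Record the href of a <link rel="canonical"> tag, if present."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "link" and (attr.get("rel") or "").lower() == "canonical":
            self.canonical = attr.get("href")


# Hypothetical print version pointing at the display version.
print_version = """<html><head>
<link rel="canonical" href="https://example.com/article">
</head><body>Printer-friendly copy</body></html>"""

finder = CanonicalFinder()
finder.feed(print_version)
print(finder.canonical)
```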

Answer 25

* **Triplet processing**
* **Lack of Sensitivity to Vocabulary**
* **Extracting Information from Several Resources**
* **Searching RDF (OWL) ontologies**
* **Merging ontologies**
* **Integrating knowledge from different sources**
* **Having Web Inference Capability**
* **Efficiency in crawling, page ranking, and indexing**
* **Handling trust**
* **Valuing Security**

Answer 26

The main difference is that they need to deal with **_federative datasets_** rather than simple **_HTML_** files. **_URIs_** play a key role in such possible future semantic web engines.