How Do Search Engines Work
Search engines do not really search the World Wide Web directly. Each one searches a database of web pages that it has harvested and cached. When you use a search engine, you are always searching a somewhat stale copy of the real web page. When you click on links provided in a search engine's search results, you retrieve the current version of the page.

Search engine databases are selected and built by computer robot programs called spiders. These "crawl" the web, finding pages for potential inclusion by following the links in the pages they already have in their database. They cannot use imagination or enter terms in search boxes that they find on the web. If a web page is never linked from any other page, search engine spiders cannot find it. The only way a brand new page can get into a search engine is for other pages to link to it, or for a human to submit its URL for inclusion. All major search engines offer ways to do this.

After spiders find pages, they pass them on to another computer program for "indexing." This program identifies the text, links, and other content in the page and stores it in the search engine database's files so that the database can be searched by keyword and by whatever more advanced approaches are offered. The page will then be found if your search matches its content.

Google is currently the most used search engine. It has one of the largest databases of web pages, including many other types of web documents (blog posts, wiki pages, group discussion threads) and document formats (e.g., PDFs, Word or Excel documents, PowerPoints). Despite the presence of all these formats, Google's popularity ranking often places worthwhile pages near the top of search results. Google alone is not always sufficient, however. Not everything on the Web is fully searchable in Google. Overlap studies show that more than 80% of the pages in a major search engine's database exist only in that database.
For this reason, getting a "second opinion" can be worth your time. For this purpose, we recommend Yahoo! Search or Exalead. We do not recommend using meta-search engines as your primary search tool.

Table of features

Some common techniques will work in any search engine. However, in this very competitive industry, search engines also strive to offer unique features. When in doubt, look for "help", "FAQ", or "about" links.
Size
  Google: IMMENSE. Size not disclosed in any way that allows comparison. Probably the biggest.
  Yahoo! Search: HUGE. Claims over 20 billion total "web objects."
  Exalead: LARGE. Claims to have over 8 billion searchable pages.

Noteworthy features
  Google: PageRank system includes hundreds of factors, emphasizing pages most heavily linked from other pages. Many additional databases including Book Search, Scholar (journal articles), Blog Search, Patents, Images, etc.
  Yahoo! Search: Shortcuts give quick access to dictionary, synonyms, patents, traffic, stocks, encyclopedia, and more.
  Exalead: Truncation lets you search by the first few letters of a word. Proximity search lets you find terms NEAR each other or NEXT to each other. Thumbnail page previews. Extensive options for refining and limiting your search.

Phrase searching
  Google: Enclose phrase in "double quotes".
  Yahoo! Search: Enclose phrase in "double quotes".
  Exalead: Enclose phrase in "double quotes".

Boolean logic
  Google: Partial. AND assumed between words. Capitalize OR. ( ) accepted but not required. In Advanced Search, partial Boolean available in boxes.
  Yahoo! Search: Accepts AND, OR, NOT or AND NOT. Must be capitalized. ( ) accepted but not required.
  Exalead: Partial. AND assumed between words. Capitalize OR. ( ) accepted. See Web Search Syntax for more options.

+Requires / -Excludes
  Google: - excludes. + retrieves "stop words" (e.g., +in).
  Yahoo! Search: - excludes. + will allow you to search common words: "+in truth".
  Exalead: - excludes. + retrieves "stop words" (e.g., +in).

Sub-searching
  Google: The search box at the top of the results page shows your current search. Modify this (e.g., add more terms at the end).
  Yahoo! Search: The search box at the top of the results page shows your current search. Modify this (e.g., add more terms at the end).
  Exalead: The search box at the top of the results page shows your current search. Modify this (e.g., add more terms at the end).

Ranking
  Google: Based on page popularity measured in links to it from other pages: high rank if a lot of other pages link to it. Fuzzy AND also invoked. Matching and ranking based on "cached" version of pages that may not be the most recent version.
  Yahoo! Search: Automatic Fuzzy AND.
  Exalead: Popularity ranking emphasizes pages most heavily linked from other pages.

Field
  Google: link: site: intitle: inurl: Offers U.S. Gov't Search and other special searches. Patent search.
  Yahoo! Search: link:
  Exalead: intitle: inurl: site: after:[time period] before:[time period] (For details, click on "Advanced search".)

Truncation and stemming
  Google: No truncation. Stems some words. Search variant endings and synonyms separately, separating with OR (capitalized): airline OR airlines
  Yahoo! Search: Neither. Search with OR as in Google.
  Exalead: Use *, for example: messag*

Language
  Google: Yes. Major Romanized and non-Romanized languages in Advanced Search.
  Yahoo! Search: Yes. Major Romanized and non-Romanized languages.
  Exalead: Extensive language and geographic options. Use "Advanced Search".

Translation
  Google: Yes, in "Translate this page" link following some pages. To and sometimes from English and major European languages and Chinese, Japanese, Korean. Uses its own translation software with user feedback.
  Yahoo! Search: Available as a separate service.
  Exalead: Yes, in "Translate this page" link following some pages.
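
To make the ideas in this guide concrete, here is a minimal Python sketch of a toy search engine. It is an illustration only and not how any real search engine is implemented. The three-page "web" in PAGES, the example URLs, and the function names crawl, build_index, lookup, and search are all made up for this example. The sketch follows links from a starting page the way a spider does, builds a keyword index, and then answers queries using a few of the conventions from the table: AND assumed between words, capitalized OR, - to exclude a word, and a trailing * for truncation. It combines features that the table attributes to different search engines, and it leaves out phrase searching and ranking.

```python
import re
from collections import defaultdict, deque

# Hypothetical miniature "web": URL -> (page text, outgoing links).
PAGES = {
    "a.example/home": ("airline tickets and travel messages", ["a.example/news"]),
    "a.example/news": ("airlines post a message about delays", ["a.example/home"]),
    "a.example/orphan": ("airline page that no other page links to", []),
}

def crawl(seed_urls):
    """Follow links breadth-first from the seed pages; return the pages found."""
    seen, queue = set(), deque(seed_urls)
    while queue:
        url = queue.popleft()
        if url in seen or url not in PAGES:
            continue
        seen.add(url)
        queue.extend(PAGES[url][1])  # links are the only way to discover pages
    return seen

def build_index(urls):
    """Inverted index: each lowercase word maps to the set of URLs containing it."""
    index = defaultdict(set)
    for url in urls:
        for word in re.findall(r"[a-z0-9]+", PAGES[url][0].lower()):
            index[word].add(url)
    return index

def lookup(index, term):
    """Match one term; a trailing * matches every word starting with the prefix."""
    term = term.lower()
    if term.endswith("*"):
        prefix = term[:-1]
        hits = set()
        for word, urls in index.items():
            if word.startswith(prefix):
                hits |= urls
        return hits
    return set(index.get(term, set()))

def search(index, query):
    """AND assumed between words; capitalized OR joins terms; -word excludes."""
    groups, excluded, join_next = [], set(), False
    for token in query.split():
        if token == "OR":
            join_next = True
        elif token.startswith("-") and len(token) > 1:
            excluded |= lookup(index, token[1:])
        else:
            hits = lookup(index, token)
            if join_next and groups:
                groups[-1] |= hits  # widen the previous term with an alternative
            else:
                groups.append(hits)
            join_next = False
    if not groups:
        return set()
    return set.intersection(*groups) - excluded

index = build_index(crawl(["a.example/home"]))
print(sorted(search(index, "airline OR airlines")))
print(sorted(search(index, "messag* -delays")))
```

Because no page links to a.example/orphan, the crawl never reaches it, so it cannot appear in any result even though its text contains "airline". A lowercase "or" is treated as an ordinary search word here, which is why the table says to capitalize OR.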