Search Engine Optimization
DEFINITION - Search Engine Optimization (SEO) is the process of improving your site's ranking in search engines so that your website appears on the first page of search engine results for specific, heavily searched keywords.
Bing advantage: social integrations are stronger. Bing's results win in terms of the smoothness of its social integrations. The company's contracts with both Facebook and Twitter give it access to more social data than Google, and the way it integrates social recommendations into its SERPs is also much less cluttered than Google's results.

Google advantage: instant search saves time. Although Google and Bing both offer instant search features, which display potential search queries as you type into the search box, Google's instant search tends to provide more relevant results, more quickly. Case study: to measure this, I began entering the sample query "latest Packers score" into both engines. Google offered this as a possible query after only "latest Pack", while Bing required the full "latest Packers s" to turn up the same result.

Bing advantage: results pages are more attractive. Bing's search results pages have a certain visual appeal, and plenty of people appear to agree.
SEO Techniques

Before describing the techniques, we should briefly discuss keywords. Keywords are really important for SEO, and you should always research the important or related keywords to attract more traffic to your site. Wordtracker is a free tool which suggests keywords based on your search term; it also filters out offensive keywords to give you the most suitable results. Before writing any post or article, it is always better to scan as many keywords as possible and target them through the post.
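As a quick illustration of checking which keywords a draft actually targets, here is a minimal Python sketch (the draft text and candidate keywords are made-up examples, and it is not tied to Wordtracker):

    from collections import Counter
    import re

    def keyword_counts(text, candidates):
        # Break the draft into lowercase word tokens.
        words = re.findall(r"[a-z0-9']+", text.lower())
        counts = Counter(words)
        # Report how often each candidate keyword appears in the draft.
        return {kw: counts[kw.lower()] for kw in candidates}

    draft = "SEO basics: search engine optimization helps a site rank. SEO needs keywords."
    print(keyword_counts(draft, ["SEO", "keywords", "ranking"]))
    # {'SEO': 2, 'keywords': 1, 'ranking': 0}

A count of zero for a term you meant to target is a sign that the post needs rewording.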
White Hat SEO Techniques

To improve a Web page's position in a SERP, you have to know how search engines work. Search engines categorize Web pages based on keywords: important terms that are relevant to the content of the page. Most search engines use computer programs called spiders or crawlers to search the Web and analyze individual pages. These programs read Web pages and index them according to the terms that show up often and in important sections of the page.

Most SEO experts recommend using important keywords throughout the Web page, particularly at the top, but it is possible to overuse them. If you use a keyword too many times, some search engine spiders will flag your page as spam.

Keywords aren't the only factor search engines take into account when generating SERPs. Just because a site uses keywords well doesn't mean it's one of the best resources on the Web. To determine the quality of a Web page, most automated search engines use link analysis: the search engine looks at how many other Web pages link to the page in question, and at the quality of those pages. If the pages linking to your site are themselves ranked high in Google's system, they boost your page's rank more than lesser-ranked pages do. Another way to gather links is to offer link exchanges with other sites that cover material related to your content, but with too many irrelevant links the search engine will think you are trying to cheat the system.

META TAGS

Meta tags provide information about Web pages to computer programs but aren't visible to humans visiting the page. You can create a meta tag that lists keywords for your site, but many search engines skip meta tags entirely because some people use them to exploit search engines (a small sketch of how a program reads these tags appears at the end of this section).

Black Hat SEO Techniques

Some people seem to believe that on the Web, the ends justify the means. There are lots of ways webmasters can try to trick search engines into listing their Web pages high in SERPs, though such a victory doesn't usually last very long.

One of these methods is keyword stuffing, which skews search engine results by overusing keywords on the page. Usually webmasters put the repeated keywords towards the bottom of the page where most visitors won't see them. They can also use invisible text: text with a color matching the page's background. Since search engine spiders read content through the page's HTML code, they detect text even if people can't see it, although some spiders can identify and ignore text that matches the page's background color.

Webmasters might also include irrelevant keywords to trick search engines. They look to see which search terms are the most popular and then use those words on their Web pages. While search engines might index the page under more keywords, people who follow the SERP links often leave the site once they realize it has little or nothing to do with their search terms. In some cases the page also includes a program that redirects visitors to a different page that often has nothing to do with the original search term. With several such pages, each focused on a current hot topic, the webmaster can drive a lot of traffic to a particular Web site.

Page stuffing also cheats people out of a fair search engine experience. Webmasters first create a Web page that appears high up on a SERP, then duplicate the page in the hope that both copies will make the top results. The webmaster does this repeatedly, with the intent to push other results off the top of the SERP and eliminate the competition. Most search engine spiders, however, are able to compare pages against each other and determine whether two different pages have the same content.

Selling and farming links are also popular black hat SEO techniques. Because many search engines look at links to determine a Web page's relevancy, some webmasters buy links from other sites to boost a page's rank. A link farm is a collection of Web pages that all interlink with one another in order to increase each page's rank. Small link farms seem pretty harmless, but some include hundreds of Web sites, each with a Web page dedicated just to listing links to every other site in the farm. When search engines detect a link-selling scheme or a link farm, they flag every site involved: sometimes the search engine will simply demote every page's rank, and in other cases it might ban all the sites from its indexes.

SEO Obstacles

The biggest challenge in SEO is finding a content balance that satisfies both the visitors to the Web page and search engine spiders. A site that's entertaining to users might not merit a blip on a search engine's radar, while a site that's optimized for search engines may come across as dry and uninteresting to users. It's usually a good idea to first create an engaging experience for visitors, then tweak the page's design so that search engines can find it easily.

One potential problem with the way search engine spiders crawl through sites concerns media files. Most people browsing Web pages don't want to look at page after page of text; they want pages that include photos, video or other forms of media to enhance the browsing experience. Unfortunately, most search engines skip over image and video content when indexing a site. For sites that use a lot of media files to convey information, this is a big problem, and some interactive Web pages don't have a lot of text, which gives search engine spiders very little to go on when building an index. The best approach for these webmasters is to use keywords in important places like the title of the page and to get links from other pages that focus on relevant content.
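As noted under META TAGS above, meta tags are read by programs rather than by the people visiting a page. A minimal Python sketch (standard library only, with a made-up HTML snippet) of how such a program might read the keywords and description tags:

    from html.parser import HTMLParser

    class MetaTagReader(HTMLParser):
        """Collects the content of <meta name="keywords"> and <meta name="description">."""
        def __init__(self):
            super().__init__()
            self.meta = {}

        def handle_starttag(self, tag, attrs):
            if tag == "meta":
                attrs = dict(attrs)
                name = (attrs.get("name") or "").lower()
                if name in ("keywords", "description"):
                    self.meta[name] = attrs.get("content", "")

    page = """<html><head>
    <meta name="keywords" content="seo, search engine optimization, ranking">
    <meta name="description" content="A short introduction to SEO techniques.">
    </head><body>...</body></html>"""

    reader = MetaTagReader()
    reader.feed(page)
    print(reader.meta)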
The following factors are a concise representation of the basis Google uses to order the search results displayed on a SERP (a small PageRank-style sketch follows the list):
1. PageRank
PageRank is the relevancy or authority score that Google assigns to a website depending on various factors in its own trademarked algorithm. PageRank works like a voting system: when you get a link from some other website, Google treats it as a vote for you, so the higher the number of unique and relevant votes, the higher the PageRank you can expect. But remember that not every vote is equal; the quantity of links is counted, but the weight of each link's source is also counted in Google's PageRank algorithm.

2. In Links
In links are the links you get from the owners of other web sites. You can either contact other site owners and request a link, or create content so attractive, essential or useful that other websites are drawn to link back to you. The number of in links is the number of votes your website has received. But do not build links so rapidly that Google takes it as spamming; keep the process steady, as SEO yields the most when done slowly and consistently.

3. Frequency of Keywords
Frequency of keywords means how many times the keywords are relevantly repeated on your page.

4. Location of Keywords
Google also gives importance to the text adjacent to the keywords, because it is pointless to repeat the keywords a large number of times without making textual sense.

5. Trust Rank
Millions of websites are created and abandoned every week. Google gives more value to established web pages and domains, i.e. those that have existed for a fairly long time, as they have a good Trust Rank in Google.
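The sketch promised above: a toy link-analysis calculation in Python using generic power iteration. This is not Google's actual, trademarked algorithm, and the pages and links are hypothetical; it only illustrates why links from higher-ranked pages are worth more than links from lesser-ranked ones.

    def pagerank(links, damping=0.85, iterations=50):
        # links: {page: [pages it links to]}
        pages = list(links)
        rank = {p: 1.0 / len(pages) for p in pages}
        for _ in range(iterations):
            new_rank = {p: (1 - damping) / len(pages) for p in pages}
            for page, outlinks in links.items():
                if not outlinks:            # dangling page: spread its rank everywhere
                    share = damping * rank[page] / len(pages)
                    for p in pages:
                        new_rank[p] += share
                else:                       # each outlink passes an equal share of the vote
                    share = damping * rank[page] / len(outlinks)
                    for target in outlinks:
                        new_rank[target] += share
            rank = new_rank
        return rank

    toy_web = {"home": ["about", "blog"], "about": ["home"], "blog": ["home", "about"]}
    print(pagerank(toy_web))  # links from higher-ranked pages are worth more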
Methods
Getting indexed

The leading search engines, such as Google, Bing and Yahoo!, use crawlers to find pages for their algorithmic search results. Pages that are linked from other pages already in a search engine's index do not need to be submitted, because they are found automatically. Some search engines operate a paid submission service that guarantees crawling for either a set fee or a cost per click. Such programs usually guarantee inclusion in the database, but do not guarantee a specific ranking within the search results. Search engine crawlers may look at a number of different factors when crawling a site, and not every page is indexed. The distance of a page from the root directory of a site may also be a factor in whether or not it gets crawled.
Preventing crawling

To avoid undesirable content in the search indexes, webmasters can instruct spiders not to crawl certain files or directories through the standard robots.txt file in the root directory of the domain. Additionally, a page can be explicitly excluded from a search engine's database by using a meta tag specific to robots. When a search engine visits a site, the robots.txt located in the root directory is the first file crawled. As a search engine crawler may keep a cached copy of this file, it may on occasion crawl pages a webmaster does not wish crawled. Pages typically prevented from being crawled include login-specific pages such as shopping carts and user-specific content such as results from internal searches.

A Web crawler is an Internet bot that systematically browses the World Wide Web, typically for the purpose of Web indexing. Web search engines and some other sites use Web crawling or spidering software to update their own web content or their indexes of other sites' web content. Web crawlers can copy all the pages they visit for later processing by a search engine that indexes the downloaded pages so that users can search them much more quickly. Crawlers can validate hyperlinks and HTML code, and they can also be used for web scraping.

A Web crawler starts with a list of URLs to visit, called the seeds. As the crawler visits these URLs, it identifies all the hyperlinks in each page and adds them to the list of URLs to visit, called the crawl frontier. URLs from the frontier are recursively visited according to a set of policies. The large volume of the Web implies that the crawler can only download a limited number of pages within a given time, so it needs to prioritize its downloads. The high rate of change implies that by the time the crawler arrives, pages might already have been updated or even deleted. The number of possible crawlable URLs generated by server-side software has also made it difficult for web crawlers to avoid retrieving duplicate content: the crawler must sort through endless combinations of relatively minor scripted changes in order to retrieve unique content.
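A minimal Python sketch (standard library only, hypothetical seed URL) that ties these two ideas together: robots.txt is consulted before every fetch, and newly discovered links are added to the crawl frontier:

    from collections import deque
    from urllib import robotparser
    from urllib.parse import urljoin, urlparse
    from urllib.request import urlopen
    from html.parser import HTMLParser

    class LinkExtractor(HTMLParser):
        def __init__(self):
            super().__init__()
            self.links = []
        def handle_starttag(self, tag, attrs):
            if tag == "a":
                href = dict(attrs).get("href")
                if href:
                    self.links.append(href)

    def allowed(url, agent="ExampleBot"):
        # Read robots.txt from the site root and ask whether this URL may be fetched.
        root = "{0.scheme}://{0.netloc}/robots.txt".format(urlparse(url))
        rp = robotparser.RobotFileParser()
        rp.set_url(root)
        rp.read()
        return rp.can_fetch(agent, url)

    def crawl(seeds, max_pages=10):
        frontier = deque(seeds)          # URLs still to visit (the crawl frontier)
        seen = set(seeds)
        while frontier and max_pages > 0:
            url = frontier.popleft()
            if not allowed(url):
                continue                 # respect robots.txt
            html = urlopen(url).read().decode("utf-8", errors="replace")
            max_pages -= 1
            extractor = LinkExtractor()
            extractor.feed(html)
            for href in extractor.links:
                absolute = urljoin(url, href)
                if absolute not in seen:
                    seen.add(absolute)
                    frontier.append(absolute)   # grow the crawl frontier
        return seen

    # crawl(["https://example.com/"])    # hypothetical seed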
Selection policy

A selection policy requires a metric of importance for prioritizing Web pages. The importance of a page is a function of its intrinsic quality, its popularity in terms of links or visits, and even of its URL. Designing a good selection policy has an added difficulty: it must work with partial information, as the complete set of Web pages is not known during crawling. The importance of a page for a crawler can also be expressed as a function of the similarity of the page to a given query; Web crawlers that attempt to download pages that are similar to each other are called focused crawlers or topical crawlers.
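One way to sketch a selection policy in Python is a priority-queue frontier in which the priority is a crude, focused-crawler-style similarity score between a link (its URL and anchor text) and the query terms. The scoring function and URLs below are illustrative assumptions, not a standard metric:

    import heapq

    QUERY_TERMS = {"seo", "search", "optimization"}

    def importance(url, anchor_text):
        # Crude importance estimate: overlap between the query terms and
        # the words appearing in the URL and the anchor text of the link.
        words = set((url + " " + anchor_text).lower().replace("/", " ").split())
        return len(QUERY_TERMS & words)

    class PriorityFrontier:
        """Frontier that always hands back the most promising URL first."""
        def __init__(self):
            self._heap = []
        def add(self, url, anchor_text=""):
            # heapq is a min-heap, so store the negative score.
            heapq.heappush(self._heap, (-importance(url, anchor_text), url))
        def next(self):
            return heapq.heappop(self._heap)[1]

    frontier = PriorityFrontier()
    frontier.add("https://example.com/seo-guide", "search engine optimization guide")
    frontier.add("https://example.com/contact", "contact us")
    print(frontier.next())   # the SEO-related URL comes out first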
Restricting followed links

A crawler may only want to seek out HTML pages and avoid all other MIME types. In order to request only HTML resources, a crawler may make an HTTP HEAD request to determine a Web resource's MIME type before requesting the entire resource with a GET request. To avoid making numerous HEAD requests, a crawler may examine the URL and only request a resource if the URL ends with certain characters such as .html, .htm, .asp, .aspx, .php, .jsp, .jspx or a slash. This strategy may cause numerous HTML Web resources to be unintentionally skipped. Some crawlers also avoid requesting any resource that has a "?" in its URL (i.e. is dynamically produced) in order to avoid spider traps that may cause the crawler to download an infinite number of URLs from a Web site. This strategy is unreliable if the site uses a rewrite engine to simplify its URLs.
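A short Python sketch of the HEAD-then-GET check described above; the body is only requested when the Content-Type reported by the server looks like HTML (the URL in the comment is hypothetical):

    from urllib.request import Request, urlopen

    def fetch_if_html(url):
        # Ask for the headers only; the body is not transferred for a HEAD request.
        head = Request(url, method="HEAD")
        with urlopen(head) as response:
            content_type = response.headers.get("Content-Type", "")
        if "text/html" not in content_type:
            return None                      # skip images, PDFs, archives, ...
        # Only now pay for the full GET request.
        with urlopen(url) as response:
            return response.read()

    # page = fetch_if_html("https://example.com/index.html")   # hypothetical URL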
Re-visit policy

The Web has a very dynamic nature, and crawling a fraction of the Web can take weeks or months. By the time a Web crawler has finished its crawl, many events could have happened, including creations, updates and deletions. From the search engine's point of view, there is a cost associated with not detecting an event and thus having an outdated copy of a resource. The most-used cost functions are freshness and age.

Freshness: a binary measure that indicates whether the local copy is accurate (up to date) or not.

Age: a measure that indicates how outdated the local copy is.

Uniform policy: re-visiting all pages with the same frequency, regardless of their rates of change.

Proportional policy: re-visiting more often the pages that change more frequently, so that the visiting frequency is directly proportional to the (estimated) change frequency.

In both cases, the repeated crawling of pages can be done either in a random or a fixed order. The freshness of rapidly changing pages lasts for a shorter period than that of less frequently changing pages; in other words, a proportional policy allocates more resources to crawling frequently updating pages but obtains less overall freshness time from them. To improve freshness, the crawler should instead penalize the elements that change too often. The optimal re-visiting policy is neither the uniform policy nor the proportional policy: the optimal method for keeping average freshness high includes ignoring the pages that change too often, and the optimal method for keeping average age low is to use access frequencies that monotonically increase with the rate of change of each page. In both cases, the optimum is closer to the uniform policy than to the proportional policy.
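A small Python sketch of the freshness and age measures and of the two scheduling extremes. The change-rate estimates and the visit budget are made-up numbers chosen only to show the contrast between a uniform and a proportional schedule:

    def freshness(local_copy_time, last_change_time):
        # Binary: 1 if nothing has changed on the server since we copied the page.
        return 1 if local_copy_time >= last_change_time else 0

    def age(now, local_copy_time, last_change_time):
        # How long the local copy has been outdated (0 if it is still fresh).
        return 0 if local_copy_time >= last_change_time else now - last_change_time

    def revisit_interval(changes_per_day, policy="uniform", budget_visits_per_day=4):
        if policy == "uniform":
            # Every page is re-visited equally often, ignoring its change rate.
            return 24 / budget_visits_per_day
        # Proportional: visit frequency proportional to the estimated change frequency.
        return 24 / max(changes_per_day, 0.1)

    for page, rate in [("news front page", 24), ("archived article", 0.05)]:
        print(page,
              "uniform:", revisit_interval(rate, "uniform"), "h",
              "proportional:", revisit_interval(rate, "proportional"), "h")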
Politeness policy

Crawlers can retrieve data much more quickly and in greater depth than human searchers, so they can have a crippling impact on the performance of a site. Needless to say, if a single crawler performs multiple requests per second and/or downloads large files, a server will have a hard time keeping up with requests from multiple crawlers.

Parallelisation policy

A parallel crawler is a crawler that runs multiple processes in parallel. The goal is to maximize the download rate while minimizing the overhead from parallelization and avoiding repeated downloads of the same page. To avoid downloading the same page more than once, the crawling system requires a policy for assigning the new URLs discovered during the crawling process, as the same URL can be found by two different crawling processes.
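A Python sketch of both policies: a per-host delay so the crawler does not hammer a single server, and a hash-based assignment of newly discovered URLs to parallel crawler processes so that a given host is always handled by the same process. The delay value and process count are arbitrary assumptions:

    import time
    import hashlib
    from urllib.parse import urlparse

    POLITENESS_DELAY = 2.0          # seconds between requests to the same host (assumed value)
    last_hit = {}                   # host -> time of our last request

    def polite_wait(url):
        # Sleep if we contacted this host less than POLITENESS_DELAY seconds ago.
        host = urlparse(url).netloc
        elapsed = time.time() - last_hit.get(host, 0.0)
        if elapsed < POLITENESS_DELAY:
            time.sleep(POLITENESS_DELAY - elapsed)
        last_hit[host] = time.time()

    def assign_to_process(url, num_processes=4):
        # Hash the host so every URL of a given site always goes to the same
        # crawler process, which prevents two processes fetching the same page.
        host = urlparse(url).netloc
        digest = hashlib.md5(host.encode("utf-8")).hexdigest()
        return int(digest, 16) % num_processes

    print(assign_to_process("https://example.com/a"), assign_to_process("https://example.com/b"))
    # both URLs map to the same process because they share a host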
Crawler identification

Web crawlers typically identify themselves to a Web server by using the User-agent field of an HTTP request. Web site administrators typically examine their Web servers' logs and use the user agent field to determine which crawlers have visited the web server and how often. The user agent field may include a URL where the Web site administrator can find out more information about the crawler. Examining Web server logs is a tedious task, therefore some administrators use tools such as CrawlTrack or SEO Crawlytics to identify, track and verify Web crawlers. Spambots and other malicious Web crawlers are unlikely to place identifying information in the user agent field, or they may mask their identity as a browser or other well-known crawler. It is important for Web crawlers to identify themselves so that Web site administrators can contact the owner if needed. In some cases, crawlers may be accidentally trapped in a crawler trap, or they may be overloading a Web server with requests, and the owner needs to stop the crawler. Identification is also useful for administrators who are interested in knowing when they may expect their Web pages to be indexed by a particular search engine.
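A minimal Python sketch of polite crawler identification: every request carries a descriptive User-agent string that names the bot and points to a (hypothetical) page with contact details, so it shows up clearly in the server's access logs:

    from urllib.request import Request, urlopen

    USER_AGENT = "ExampleBot/1.0 (+https://example.com/bot-info)"   # hypothetical bot info page

    def identified_fetch(url):
        # Send the bot's name and an info URL in the User-agent header of every request.
        request = Request(url, headers={"User-Agent": USER_AGENT})
        with urlopen(request) as response:
            return response.read()

    # html = identified_fetch("https://example.com/")   # hypothetical target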
Web indexing

Web indexing (or Internet indexing) refers to various methods for indexing the contents of a website or of the Internet as a whole. Individual websites or intranets may use a back-of-the-book index, while search engines usually use keywords and metadata to provide a more useful vocabulary for Internet or on-site searching. With the increase in the number of periodicals that have articles online, web indexing is also becoming important for periodical websites. Web indexes may be called "web site A-Z indexes". The implication of "A-Z" is that there is an alphabetical browse view or interface. This interface differs from browsing through layers of hierarchical categories, which are not necessarily alphabetical but are also found on some web sites. Although an A-Z index could be used to index multiple sites, rather than the multiple pages of a single site, this is unusual.
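The keyword-based indexing that search engines rely on can be illustrated with a tiny inverted index in Python, mapping each word to the set of pages it appears on (the page contents are made-up examples):

    import re
    from collections import defaultdict

    def build_index(pages):
        # pages: {url: text}; result: {word: set of urls containing that word}
        index = defaultdict(set)
        for url, text in pages.items():
            for word in re.findall(r"[a-z0-9]+", text.lower()):
                index[word].add(url)
        return index

    pages = {
        "/seo-basics": "search engine optimization improves ranking",
        "/crawlers": "a web crawler indexes pages for the search engine",
    }
    index = build_index(pages)
    print(sorted(index["search"]))        # ['/crawlers', '/seo-basics']
    print(sorted(index["crawler"]))       # ['/crawlers']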
Web scraping
Web scraping (web harvesting or web data extraction) is a computer software technique of extracting information from websites. Usually, such software programs simulate human exploration of the World Wide Web by either
implementing low-level Hypertext Transfer Protocol (HTTP), or embedding a fully-fledged web browser, such as Internet Explorer or Mozilla Firefox. Web scraping is closely related to web indexing, which indexes information on the web using a bot or web crawler and is a universal technique adopted by most search engines. In contrast, web scraping focuses more on the transformation of unstructured data on the web, typically in HTML format, into structured data that can be stored and analyzed in a central local database or spreadsheet. Web scraping is also related to web automation, which simulates human browsing using computer software. Uses of web scraping include online price comparison, weather data monitoring, website change detection, research, web mashup and web data integration.
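A short Python sketch of the step that defines web scraping, turning unstructured HTML into structured rows ready for a spreadsheet or database. The HTML snippet and the product/price class names are invented for illustration; real pages need site-specific parsing rules:

    from html.parser import HTMLParser
    import csv, io

    class PriceScraper(HTMLParser):
        """Turns <span class="product">...</span><span class="price">...</span> pairs into rows."""
        def __init__(self):
            super().__init__()
            self.rows, self._field = [], None
        def handle_starttag(self, tag, attrs):
            cls = dict(attrs).get("class")
            if tag == "span" and cls in ("product", "price"):
                self._field = cls
        def handle_data(self, data):
            if self._field == "product":
                self.rows.append({"product": data.strip(), "price": None})
            elif self._field == "price" and self.rows:
                self.rows[-1]["price"] = data.strip()
            self._field = None

    html = '<span class="product">Blue widget</span><span class="price">9.99</span>' \
           '<span class="product">Red widget</span><span class="price">12.50</span>'

    scraper = PriceScraper()
    scraper.feed(html)

    # Store the structured result as CSV, ready for a spreadsheet or database.
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["product", "price"])
    writer.writeheader()
    writer.writerows(scraper.rows)
    print(out.getvalue())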
CONCLUSION
----xxx----