SEO (search engine optimization) is the active practice of optimizing a web site by improving internal and external aspects in order to increase the traffic the site receives from search engines. Firms that practice SEO vary; some have a highly specialized focus, while others take a broader, more general approach. Optimizing a web site for search engines can require looking at so many unique elements that many practitioners of SEO (SEOs) consider themselves to be in the broad field of website optimization (since so many of those elements intertwine). This guide is designed to describe all areas of SEO - from discovering the terms and phrases that will generate traffic, to making a site search engine friendly, to building links and marketing the unique value of the site's or organization's offerings.
Search engines have a short list of critical operations that allow them to provide relevant web results when searchers use their systems to find information.
1. Crawling the Web - Search engines run automated programs, called "bots" or "spiders", that use the hyperlink structure of the web to "crawl" the pages and documents that make up the World Wide Web. Estimates are that of the approximately 20 billion existing pages, search engines have crawled between 8 and 10 billion.
2. Indexing Documents - Once a page has been crawled, its contents can be "indexed" - stored in a giant database of documents that makes up a search engine's "index". This index needs to be tightly managed, so that requests which must search and sort billions of documents can be completed in fractions of a second.
3. Processing Queries - When a request for information comes into the search engine (hundreds of millions do each day), the engine retrieves from its index all the documents that match the query. A match is determined if the terms or phrase are found on the page in the manner specified by the user. For example, a search for car and driver magazine (without quotes) at Google returns 8.25 million results, but a search for the same phrase in quotes ("car and driver magazine") returns only 166 thousand results. In the first case, commonly called "Findall" mode, Google returned all documents that contained the terms "car", "driver" and "magazine" (it ignores the term "and" because it is not useful for narrowing the results), while in the second search, only those pages with the exact phrase "car and driver magazine" were returned. Other advanced operators (Google has a list of 11) can change which results a search engine will consider a match for a given query.
4. Ranking Results - Once the search engine has determined which results are a match for the query, the engine's algorithm (a mathematical equation commonly used for sorting) runs calculations on each of the results to determine which is most relevant to the given query. The engine sorts these on the results pages in order from most relevant to least, so that users can make a choice about which to select.
Although the list of a search engine's operations is not particularly long, systems like Google, Yahoo!, AskJeeves and MSN are among the most complex, processing-intensive computers in the world, managing millions of calculations each second and funneling demands for information to an enormous group of users.
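The indexing and query-processing steps above can be illustrated with a small sketch. This is not how any real engine is implemented; it is a toy positional index over three made-up documents, and the names `STOP_WORDS`, `build_index`, `match_all` and `match_phrase` are hypothetical. It shows why the unquoted query matches more documents than the quoted one.

```python
from collections import defaultdict

# Hypothetical stop-word list; real engines maintain their own.
STOP_WORDS = {"and", "the", "of", "a"}

def tokenize(text):
    """Lowercase, split on whitespace and strip simple punctuation."""
    words = (w.strip('.,!?"').lower() for w in text.split())
    return [w for w in words if w]

def build_index(docs):
    """Map each term to {doc_id: [positions]} (a positional inverted index)."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term][doc_id].append(pos)
    return index

def match_all(index, query):
    """'Find all' mode: documents containing every non-stop-word term."""
    terms = [t for t in tokenize(query) if t not in STOP_WORDS]
    if not terms:
        return set()
    result = set(index.get(terms[0], {}))
    for term in terms[1:]:
        result &= set(index.get(term, {}))
    return result

def match_phrase(index, query):
    """Exact-phrase mode: all terms must appear consecutively, in order."""
    terms = tokenize(query)
    if not terms:
        return set()
    hits = set()
    for doc_id, positions in index.get(terms[0], {}).items():
        for start in positions:
            if all(start + i in index.get(t, {}).get(doc_id, [])
                   for i, t in enumerate(terms)):
                hits.add(doc_id)
                break
    return hits

# Made-up documents for illustration only.
docs = {
    "d1": "Car and Driver magazine reviews new cars",
    "d2": "A magazine for every car driver",
    "d3": "Driver updates for your printer",
}
index = build_index(docs)
print(sorted(match_all(index, "car and driver magazine")))     # ['d1', 'd2']
print(sorted(match_phrase(index, "car and driver magazine")))  # ['d1']
```

Storing word positions alongside document IDs is what makes the phrase check possible; an index that only recorded which documents contain a term could answer the first query but not the second.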
Modern commercial search engines rely on the science of information retrieval (IR). That science has existed since the middle of the 20th century, when retrieval systems powered computers in libraries, research facilities and government labs. Early in the development of search systems, IR scientists realized that two critical components made up the majority of search functionality:
Relevance - the degree to which the content of the documents returned in a search matches the intent and terms of the user's query. The relevance of a document increases if the terms or phrase queried by the user occur multiple times and show up in the title of the work or in important headlines or subheaders.
Popularity - the relative importance, measured via citation (the act of one work referencing another, as often occurs in academic and business documents), of a given document that matches the user's query. The popularity of a given document increases with every other document that references it.
These two items were translated to web search 40 years later and manifest themselves in the form of document analysis and link analysis.
In document analysis, search engines look at whether the search terms are found in important areas of the document - the title, the metadata, the heading tags and the body of text content. They also attempt to automatically measure the quality of the document (through complex systems beyond the scope of this guide).
In link analysis, search engines measure not only who is linking to a site or page, but also what they are saying about that page or site. They also have a good grasp of who is affiliated with whom (through historical link data, the site's registration records and other sources), who is worthy of being trusted (links from .edu and .gov pages are generally more valuable for this reason) and contextual data about the site the page is hosted on (who links to that site, what they say about the site, etc.).
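As a rough illustration of the two components, the sketch below scores relevance by counting query terms with extra weight for the title and headings, and scores popularity by counting inbound citations. The field weights and the function names `relevance` and `popularity` are assumptions chosen for the example; real engines use far more elaborate and undisclosed measures.

```python
from collections import Counter

# Hypothetical weights: a match in the title counts more than one in the body.
FIELD_WEIGHTS = {"title": 3.0, "headings": 2.0, "body": 1.0}

def relevance(doc, query_terms):
    """Weighted count of query-term occurrences across a document's fields."""
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        counts = Counter(doc.get(field, "").lower().split())
        score += weight * sum(counts[t] for t in query_terms)
    return score

def popularity(links):
    """Count inbound citations from a list of (source, target) pairs."""
    inbound = Counter()
    for source, target in set(links):  # ignore duplicate links
        if source != target:           # ignore self-citations
            inbound[target] += 1
    return inbound

# Made-up document and link graph for illustration only.
doc = {
    "title": "Home Loan Guide",
    "headings": "Compare loan rates",
    "body": "A loan is money borrowed from a lender",
}
print(relevance(doc, ["loan"]))  # 6.0

links = [("a", "c"), ("b", "c"), ("c", "a"), ("a", "c"), ("c", "c")]
print(popularity(links).most_common())  # [('c', 2), ('a', 1)]
```

A plain citation count treats every link as equal. The trust signals described above (for example, giving more value to links from .edu and .gov pages) would correspond to weighting each inbound link by its source rather than adding 1 each time.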
Link analysis and document analysis together draw on hundreds of overlapping factors, each of which can be individually measured and filtered through the search engine's algorithm (the set of instructions that tells the engine what importance to assign to each factor). The algorithm then determines a score for each document and (ideally) lists the results in decreasing order of importance (the rankings).
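One simple way to picture this combination is a weighted sum of factor values followed by a sort. The sketch below assumes three factors already normalized to the range 0 to 1; the factor names, the weights in `WEIGHTS` and the page data are all invented for illustration, and actual engines neither publish their factors nor necessarily combine them linearly.

```python
# Hypothetical factor weights (the "importance" assigned to each factor).
WEIGHTS = {"relevance": 0.6, "popularity": 0.3, "trust": 0.1}

def score(factors):
    """Weighted sum of a document's factor values; missing factors count as 0."""
    return sum(w * factors.get(name, 0.0) for name, w in WEIGHTS.items())

def rank(candidates):
    """Return document IDs in decreasing order of score."""
    return sorted(candidates, key=lambda d: score(candidates[d]), reverse=True)

# Made-up factor values for three matching documents.
candidates = {
    "page-a": {"relevance": 0.9, "popularity": 0.2, "trust": 0.5},
    "page-b": {"relevance": 0.6, "popularity": 0.9, "trust": 0.9},
    "page-c": {"relevance": 0.3, "popularity": 0.1, "trust": 0.2},
}
print(rank(candidates))  # ['page-b', 'page-a', 'page-c']
```

Note that page-a has the highest relevance but is outranked by page-b, whose stronger popularity and trust values outweigh the difference under these assumed weights.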
Search engines rely on the terms queried by users to determine which results to put through their algorithms, order and return to the user. But, rather than simply recognizing and retrieving exact matches for query terms, search engines use their knowledge of semantics (the study of meaning in language) to construct intelligent matching for queries. An example might be a search for "loan providers" that also returned results that did not contain that specific phrase, but instead had the term "lenders".
The engines collect data based on the frequency of use of terms and the co-occurrence of words and phrases throughout the web. If certain terms or phrases are often found together on pages or sites, search engines can construct intelligent theories about their relationships. Mining semantic data through the incredible corpus that is the Internet has given search engines some of the most accurate data about word ontologies and the connections between words ever assembled artificially. This immense knowledge of language and its usage gives them the ability to determine which pages in a site are topically related, what the topic of a page or site is, how the link structure of the web divides into topical communities and much, much more.
Search engines' growing artificial intelligence on the subject of language means that queries will increasingly return more intelligent, evolved results. The engines' heavy investment in the field of natural language processing (NLP) will help them achieve a greater understanding of the meaning and intent behind their users' queries. Over the long term, users can expect the results of this work to produce increased relevancy in the SERPs (Search Engine Results Pages) and more accurate guesses from the engines as to the intent of a user's queries.
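The idea of learning relationships from co-occurrence can be sketched in a few lines. The example below counts how often two terms appear in the same document and reports the terms that co-occur with a given word at least `min_count` times. It is a crude stand-in for the much richer statistical methods engines are assumed to use; the function names, the threshold and the four sample documents are all hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(docs):
    """Count, for each pair of distinct terms, the documents containing both."""
    pair_counts = Counter()
    for text in docs:
        terms = sorted(set(text.lower().split()))
        pair_counts.update(combinations(terms, 2))
    return pair_counts

def related_terms(pair_counts, term, min_count=2):
    """Terms that share at least min_count documents with the given term."""
    related = {}
    for (a, b), n in pair_counts.items():
        if n >= min_count and term in (a, b):
            related[b if a == term else a] = n
    return related

# Made-up corpus for illustration only.
docs = [
    "loan providers offer mortgages",
    "lenders approve each loan quickly",
    "compare lenders before taking a loan",
    "weather forecast for today",
]
counts = cooccurrence(docs)
print(related_terms(counts, "loan"))  # {'lenders': 2}
```

With only four documents the threshold of 2 is enough to separate "lenders" from incidental neighbors such as "quickly"; on a web-scale corpus, raw counts would need to be normalized by how common each term is overall, otherwise very frequent words would appear related to everything.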