Tuesday, February 20, 2007

The Importance of Backlinks

If you've read anything about or studied Search Engine Optimization, you've come across the term "backlink" at least once. For those of you new to SEO, you may be wondering what a backlink is, and why backlinks are important. Backlinks have become so important to Search Engine Optimization that they are now some of the main building blocks of good SEO. In this article, we will explain what a backlink is, why backlinks are important, and what you can do to gain them while avoiding trouble with the Search Engines.

What are "backlinks"? Backlinks are links that are directed towards your website. They are also known as inbound links (IBLs). The number of backlinks is an indication of the popularity or importance of that website. Backlinks are important for SEO because some search engines, especially Google, will give more credit to websites that have a good number of quality backlinks, and consider those websites more relevant than others in their results pages for a search query.

When search engines calculate the relevance of a site to a keyword, they consider the number of QUALITY inbound links to that site. So we should not be satisfied with merely getting inbound links; it is the quality of the inbound link that matters.
A search engine considers the content of the sites to determine the QUALITY of a link. When inbound links to your site come from other sites, and those sites have content related to your site, these inbound links are considered more relevant to your site. If inbound links are found on sites with unrelated content, they are considered less relevant. The higher the relevance of inbound links, the greater their quality.

For example, if a webmaster has a website about how to rescue orphaned kittens, and received a backlink from another website about kittens, then that would be more relevant in a search engine's assessment than say a link from a site about car racing. The more relevant the site is that is linking back to your website, the better the quality of the backlink.

Search engines want websites to have a level playing field, and look for natural links built slowly over time. While it is fairly easy to manipulate links on a web page to try to achieve a higher ranking, it is a lot harder to influence a search engine with external backlinks from other websites. This is also a reason why backlinks factor so highly into a search engine's algorithm. Lately, however, search engines' criteria for quality inbound links have gotten even tougher, thanks to unscrupulous webmasters trying to achieve these inbound links by deceptive or sneaky techniques, such as hidden links, or automatically generated pages whose sole purpose is to provide inbound links to websites. These pages are called link farms, and they are not only disregarded by search engines, but linking to a link farm could get your site banned entirely.

Another reason to achieve quality backlinks is to entice visitors to come to your website. You can't build a website, and then expect that people will find your website without pointing the way. You will probably have to get the word out there about your site. One way webmasters got the word out used to be through reciprocal linking. Let's talk about reciprocal linking for a moment.

There has been much discussion in the last few months about reciprocal linking. In the last Google update, reciprocal links were one of the targets of the search engine's latest filter. Many webmasters had agreed upon reciprocal link exchanges, in order to boost their site's rankings with the sheer number of inbound links. In a link exchange, one webmaster places a link on his website that points to another webmaster's website, and vice versa. Many of these links were simply not relevant, and were just discounted. So while the irrelevant inbound link was ignored, the outbound links still got counted, diluting the relevancy score of many websites. This caused a great many websites to drop off the Google map.

We must be careful with our reciprocal links. There is a Google patent in the works that will deal with not only the popularity of the sites being linked to, but also how trustworthy a site is that you link to from your own website. This will mean that you could get into trouble with the search engine just for linking to a bad apple. We could begin preparing for this future change in the search engine algorithm by being choosier right now about whom we exchange links with. By choosing only relevant sites to link with, and avoiding sites that have tons of outbound links on a page or that practice black-hat SEO techniques, we will have a better chance that our reciprocal links won't be discounted.

Many webmasters have more than one website. Sometimes these websites are related, sometimes they are not. You also have to be careful about interlinking multiple websites on the same IP. If you own seven related websites, then a link to each of those websites on a page could hurt you, as it may look to a search engine as though you are trying to do something fishy. Many webmasters have tried to manipulate backlinks in this way, and having too many links to sites with the same IP address is referred to as backlink bombing.

One thing is certain: interlinking sites doesn't help you from a search engine standpoint. The only reason you may want to interlink your sites in the first place might be to provide your visitors with extra resources to visit. In this case, it would probably be okay to provide visitors with a link to another of your websites, but try to keep many instances of linking to the same IP address to a bare minimum. One or two links on a page here and there probably won't hurt you.
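
To see which of your own sites share an IP address before you interlink them, a small script can group host names by the address they resolve to. This is only a sketch: `group_by_ip` is a hypothetical helper, the host names and addresses in the demonstration are made up, and it says nothing about how a search engine actually evaluates such links.

```python
import socket
from collections import defaultdict

def group_by_ip(hosts, resolve=socket.gethostbyname):
    """Map each IP address to the list of host names that resolve to it."""
    groups = defaultdict(list)
    for host in hosts:
        groups[resolve(host)].append(host)
    return dict(groups)

# Offline demonstration: a made-up lookup table stands in for real DNS.
fake_dns = {
    "kittens.example": "192.0.2.10",
    "cats.example": "192.0.2.10",
    "cars.example": "192.0.2.77",
}
groups = group_by_ip(fake_dns, resolve=fake_dns.__getitem__)
shared = {ip: names for ip, names in groups.items() if len(names) > 1}
print(shared)
```

Called without the `resolve` argument, the function performs real DNS lookups through `socket.gethostbyname`.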

There are a few things to consider when beginning your backlink building campaign. It is helpful to keep track of your backlinks, to know which sites are linking back to you, and how the anchor text of the backlink incorporates keywords relating to your site. A tool to help you keep track of your backlinks is the Domain Stats Tool. This tool displays the backlinks of a domain in Google, Yahoo, and MSN. It will also tell you a few other details about your website, like your listings in the Open Directory, or DMOZ, whose backlinks Google regards as highly important; your Alexa traffic rank; and how many pages from your site have been indexed, to name just a few.

Another tool to help you with your link building campaign is the Backlink Builder Tool. It is not enough just to have a large number of inbound links pointing to your site. Rather, you need to have a large number of QUALITY inbound links. This tool searches for websites that have a related theme to your website which are likely to add your link to their website. You specify a particular keyword or keyword phrase, and then the tool seeks out related sites for you. This helps to simplify your backlink building efforts by helping you create quality, relevant backlinks to your site, and making the job easier in the process.

There is another way to gain quality backlinks to your site, in addition to related site themes: anchor text. When a link incorporates a keyword into the text of the hyperlink, we call this quality anchor text. A link's anchor text may be one of the under-estimated resources a webmaster has. Instead of using words like "click here" which probably won't relate in any way to your website, using the words "Please visit our tips page for how to nurse an orphaned kitten" is a far better way to utilize a hyperlink. A good tool for helping you find your backlinks and what text is being used to link to your site is the Backlink Anchor Text Analysis Tool. If you find that your site is being linked to from another website, but the anchor text is not being utilized properly, you should request that the website change the anchor text to something incorporating relevant keywords. This will also help boost your quality backlinks score.
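
If you have the HTML of a page that links to you, the anchor text check described above can be approximated with Python's standard library. The sketch below is not how the Backlink Anchor Text Analysis Tool works; `AnchorCollector` and the `www.example.com` host are illustrative assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class AnchorCollector(HTMLParser):
    """Collect (href, anchor text) pairs for links pointing at one host."""

    def __init__(self, target_host):
        super().__init__()
        self.target_host = target_host.lower()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the matching <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if urlparse(href).hostname == self.target_host:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace inside the anchor text.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None

page = """<p><a href="http://www.example.com/tips">click here</a> and
<a href="http://www.example.com/tips">how to nurse an orphaned kitten</a>,
plus <a href="http://other.example.org/">another site</a>.</p>"""

collector = AnchorCollector("www.example.com")
collector.feed(page)
for href, text in collector.links:
    print(text, "->", href)
```

Links whose anchor text turns out to be "click here" are the ones worth asking the other webmaster to change.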

Building quality backlinks is extremely important to Search Engine Optimization, and because of their importance, it should be very high on your priority list in your SEO efforts. We hope you have a better understanding of why you need good quality inbound links to your site, and have a handle on a few helpful tools to gain those links.

Do not ever do “black hat” SEO

The fight to top search engines' results knows no limits – neither ethical, nor technical. There are often reports of sites that have been temporarily or permanently excluded from Google and the other search engines because of malpractice and the use of “black hat” SEO techniques. The reaction of search engines is easy to understand – with so many tricks and cheats that SEO experts include in their arsenal, the relevancy of returned results is seriously compromised to the point where search engines start to deliver completely irrelevant and manipulated search results. And even if search engines do not discover your scams right away, your competitors might report you.

Keyword Density or Keyword Stuffing?
Sometimes SEO experts go too far in their desire to push their clients' sites to top positions and resort to questionable practices, like keyword stuffing. Keyword stuffing is considered an unethical practice because what you actually do is use the keyword in question throughout the text suspiciously often. Bearing in mind that the recommended keyword density is from 3 to 7%, anything above this, say 10% density, starts to look very much like keyword stuffing and it is likely that it will not go unnoticed by search engines. A text with 10% keyword density can hardly make sense if read by a human. Some time ago Google implemented the so-called “Florida Update” and essentially imposed a penalty for pages that are keyword-stuffed and over-optimized in general.
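
The density figure can be estimated with a few lines of Python. The article does not give a formula, so this sketch assumes that density means occurrences of the keyword divided by the total number of words, times 100; `keyword_density` is a hypothetical helper, and search engines do not publish how they measure this.

```python
import re

def keyword_density(text, keyword):
    """Return the share of words in `text` equal to `keyword`, as a percentage."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word == keyword.lower())
    return 100.0 * hits / len(words)

sample = "Kitten care tips: feed the kitten, warm the kitten, love the kitten."
density = keyword_density(sample, "kitten")
print(f"{density:.1f}%")  # 4 of 12 words, far above the 3 to 7% range
```

The sample sentence is deliberately stuffed; it also shows how unnatural a text with a high density reads.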

Generally, keyword density in the title, the headings, and the first paragraphs matters more. Needless to say, you should be especially careful not to stuff these areas. Try the Keyword Density Cloud tool to check if your keyword density is within the acceptable limits, especially in the above-mentioned places. If you have a high density percentage for a frequently used keyword, then consider replacing some of the occurrences of the keyword with synonyms. Also, words that are in bold and/or italic are generally considered important by search engines, but if every occurrence of the target keywords is in bold and italic, this also looks unnatural and in the best case it will not push your page up.

Doorway Pages and Hidden Text
Another common keyword scam is doorway pages. Before Google introduced the PageRank algorithm, doorways were a common practice and there were times when they were not considered an illegal optimization. A doorway page is a page that is made especially for the search engines and that has no meaning for humans but is used to get high positions in search engines and to trick users into coming to the site. Although keywords are still very important, today keywords alone have less effect in determining the position of a site in search results, so doorway pages do not bring much traffic to a site, but if you use them, don't ask why Google punished you.

Very similar to doorway pages is a scam called hidden text. This is text which is invisible to humans (e.g. the text color is the same as the page background) but is included in the HTML source of the page, trying to fool search engines into believing that the particular page is keyword-rich. Needless to say, both doorway pages and hidden text can hardly be qualified as optimization techniques; they are more manipulation than anything else.

Duplicate Content
It is a basic SEO rule that content is king. But not duplicate content. In terms of Google, duplicate content means text that is the same as the text on a different page on the SAME site (or on a sister-site, or on a site that is heavily linked to the site in question and it can be presumed that the two sites are related) – i.e. when you copy and paste the same paragraphs from one page on your site to another, then you might expect to see your site's rank drop. Most SEO experts believe that syndicated content is not treated as duplicate content and there are many examples of this. If syndicated content were duplicate content, then the sites of news agencies would have been the first to drop out of search results. Still, it does not hurt to check from time to time if your site has duplicate content with another, if only because somebody might be illegally copying your content without your knowledge. The Similar Page Checker tool will help you see if you have grounds to worry about duplicate content.
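
For a rough check of your own, two page texts can be compared by the word sequences they share. The sketch below uses Jaccard similarity over three-word shingles, which is one common text-comparison technique; it is an assumption for illustration, not the method used by Google or by the Similar Page Checker, and the sample texts are made up.

```python
import re

def shingles(text, size=3):
    """Return the set of overlapping word sequences of length `size`."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def similarity(a, b, size=3):
    """Jaccard similarity of the two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

page_a = "Keep the orphaned kitten warm and feed it every two hours."
page_b = "Keep the orphaned kitten warm and feed it every three hours."
page_c = "Car racing tyres need to be warmed before the start."

print(similarity(page_a, page_b))  # near-duplicate pages score high
print(similarity(page_a, page_c))  # unrelated pages score low
```

Where to draw the line between "similar" and "duplicate" is a judgment call; the score only tells you which pairs of pages deserve a closer look.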

Links Spam
Links are another major SEO tool and, like the other SEO tools, they can be used or misused. While backlinks are certainly important (for Yahoo the quantity of backlinks matters, while for Google it is more important what sites the backlinks come from), getting tons of backlinks from a link farm or a blacklisted site is begging to be penalized. Also, if outbound links (links from your site to other sites) considerably outnumber your inbound links (links from other sites to your site), then you have put too much effort into creating useless links because this will not improve your ranking. You can use the Domain Stats Tool to see the number of backlinks (inbound links) to your site and the Site Link Analyzer to see how many outbound links you have.
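
Counting the outbound links on one of your own pages is simple enough to script. The following sketch treats a link as outbound when its host differs from the host of the page itself; that rule, the `LinkCounter` class and the example URLs are assumptions for illustration, not a description of the Site Link Analyzer.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkCounter(HTMLParser):
    """Count internal and outbound <a href> links on a single page."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.site_host = urlparse(page_url).hostname
        self.internal = 0
        self.outbound = 0

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page's own URL.
        target = urlparse(urljoin(self.page_url, href))
        if target.scheme not in ("http", "https"):
            return  # skip mailto:, javascript: and similar
        if target.hostname == self.site_host:
            self.internal += 1
        else:
            self.outbound += 1

html = """<a href="/tips.html">Tips</a>
<a href="http://www.example.com/about">About</a>
<a href="http://partner.example.org/">Partner</a>
<a href="mailto:me@example.com">Mail</a>"""

counter = LinkCounter("http://www.example.com/index.html")
counter.feed(html)
print("internal:", counter.internal, "outbound:", counter.outbound)
```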

Using keywords in links (the anchor text), domain names, folder and file names does boost your search engine rankings but again, the right measure is the boundary between topping the search results and being kicked out of them. For instance, suppose you are optimizing for the keyword “cat”, which is a frequently chosen keyword and, as with all popular keywords and phrases, competition is fierce. You might see no alternative for reaching the top other than getting a domain name like http://www.cat-cats-kittens-kitty.com. Such a name is no doubt packed with keywords to the maximum, but first, it is difficult to remember, and second, if the content does not correspond to the plenitude of cats in the domain name, you will never top the search results.

Although file and folder names are less important than domain names, now and then (but definitely not all the time) you can include “cat” (and synonyms) in them and in the anchor text of the links. This counts well, provided that anchors are not artificially stuffed (for instance, if you use “cat_cats_kitten” as the anchor for internal site links, this anchor certainly is stuffed). While you have no control over third parties that link to you and use anchors that you don't like, it is up to you to perform periodic checks of what anchors other sites use to link to you. A handy tool for this task is the Backlink Anchor Text Analysis, where you enter the URL and get a listing of the sites that link to you and the anchor text they use.

Finally, to Google and the other search engines it makes no difference if a site is intentionally over-optimized to cheat them or over-optimization is the result of good intentions, so no matter what your motives are, always try to keep to reasonable practices and remember not to overstep the line.


What is Robots.txt

It is great when search engines frequently visit your site and index your content but often there are cases when indexing parts of your online content is not what you want. For instance, if you have two versions of a page (one for viewing in the browser and one for printing), you'd rather have the printing version excluded from crawling, otherwise you risk incurring a duplicate content penalty. Also, if you happen to have sensitive data on your site that you do not want the world to see, you will also prefer that search engines do not index these pages (although in this case the only sure way to keep sensitive data from being indexed is to keep it offline on a separate machine). Additionally, if you want to save some bandwidth by excluding images, stylesheets and JavaScript from indexing, you also need a way to tell spiders to keep away from these items.

One way to tell search engines which files and folders on your Web site to avoid is with the use of the Robots metatag. But since not all search engines read metatags, the Robots metatag can simply go unnoticed. A better way to inform search engines of your wishes is to use a robots.txt file.

What Is Robots.txt?
Robots.txt is a text (not html) file you put on your site to tell search robots which pages you would like them not to visit. Robots.txt is by no means mandatory for search engines but generally search engines obey what they are asked not to do. It is important to clarify that robots.txt is not a way of preventing search engines from crawling your site (i.e. it is not a firewall, or a kind of password protection) and the fact that you put up a robots.txt file is something like putting a note “Please, do not enter” on an unlocked door – i.e. you cannot prevent thieves from coming in but the good guys will not open the door and enter. That is why we say that if you have really sensitive data, it is too naïve to rely on robots.txt to protect it from being indexed and displayed in search results.

The location of robots.txt is very important. It must be in the main directory because otherwise user agents (search engines) will not be able to find it – they do not search the whole site for a file named robots.txt. Instead, they look first in the main directory (i.e. http://mydomain.com/robots.txt) and if they don't find it there, they simply assume that this site does not have a robots.txt file and therefore they index everything they find along the way. So, if you don't put robots.txt in the right place, do not be surprised that search engines index your whole site.
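
Because the location is fixed, the robots.txt address for any page can be derived from the page's URL alone. A minimal sketch, where `robots_url` is a hypothetical helper name and the page address is made up:

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url):
    """Return the robots.txt URL for the site that serves `page_url`."""
    parts = urlsplit(page_url)
    # Keep scheme and host; replace path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("http://mydomain.com/articles/2007/backlinks.html?page=2"))
# -> http://mydomain.com/robots.txt
```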

The concept and structure of robots.txt were developed more than a decade ago and if you are interested in learning more about it, visit http://www.robotstxt.org/ or you can go straight to the Standard for Robot Exclusion because in this article we will deal only with the most important aspects of a robots.txt file. Next we will continue with the structure of a robots.txt file.

Structure of a Robots.txt File
The structure of a robots.txt is pretty simple (and barely flexible) – it is a list, as long as you need, of user agents and disallowed files and directories. Basically, the syntax is as follows:

User-agent:
Disallow:

“User-agent:” names the search engine crawlers the record applies to, and “Disallow:” lists the files and directories to be excluded from indexing. In addition to “User-agent:” and “Disallow:” entries, you can include comment lines – just put the # sign at the beginning of the line:

# All user agents are disallowed to see the /temp directory.
User-agent: *
Disallow: /temp/
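
To check what a record like this permits, you can feed it to the robots.txt parser in Python's standard library. This is a sketch: `AnyBot` and the `mydomain.com` URLs are placeholders, and `urllib.robotparser` is only one implementation of the rules, so a particular search engine may behave differently. Note that the lines of one record are written without blank lines between them, because a blank line ends a record.

```python
from urllib.robotparser import RobotFileParser

rules = """\
# All user agents are disallowed to see the /temp directory.
User-agent: *
Disallow: /temp/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# can_fetch(user agent, URL) answers: may this crawler request this URL?
print(parser.can_fetch("AnyBot", "http://mydomain.com/temp/draft.html"))  # False
print(parser.can_fetch("AnyBot", "http://mydomain.com/index.html"))       # True
```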

The Traps of a Robots.txt File
When you start making complicated files – i.e. you decide to allow different user agents access to different directories – problems can start, if you do not pay special attention to the traps of a robots.txt file. Common mistakes include typos and contradicting directives. Typos are misspelled user-agents, directories, missing colons after User-agent and Disallow, etc. Typos can be tricky to find but in some cases validation tools help.

The more serious problem is with logical errors. For instance:

User-agent: *
Disallow: /temp/

User-agent: Googlebot
Disallow: /images/
Disallow: /temp/
Disallow: /cgi-bin/

The above example is from a robots.txt that allows all agents to access everything on the site except the /temp directory. Up to here it is fine but later on there is another record that specifies more restrictive terms for Googlebot. The trap is that the two records are not combined. Under the Standard for Robot Exclusion, the * record is only the default for robots that no other record names, so a crawler that follows the standard and finds a record with its own name applies that record alone; that is why /temp/ has to be repeated in the Googlebot record. A carelessly written crawler, however, may stop at the first record that matches it, in which case it would read only the * record and index everything except /temp/ - including /images/ and /cgi-bin/, which you think you have told it not to touch. You see, the structure of a robots.txt file is simple but still serious mistakes can be made easily.
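
One way to find out how a given parser resolves two records like these is simply to test it. The sketch below uses Python's `urllib.robotparser`, which is one implementation and not Googlebot itself; it picks the record that names the crawler and falls back to the * record only when no other record matches. `OtherBot` is a made-up agent name.

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /temp/

User-agent: Googlebot
Disallow: /images/
Disallow: /temp/
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Print, for each agent, which of the paths it may fetch.
for agent in ("Googlebot", "OtherBot"):
    for path in ("/temp/", "/images/", "/cgi-bin/", "/index.html"):
        print(agent, path, parser.can_fetch(agent, path))
```

With this parser, Googlebot is kept out of all three directories while OtherBot is kept out of /temp/ only. If a crawler you care about behaves differently, its own documentation is the authority.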

Tools to Generate and Validate a Robots.txt File
Given the simple syntax of a robots.txt file, you can always read it to see if everything is OK but it is much easier to use a validator, like this one: http://tool.motoricerca.info/robots-checker.phtml. These tools report common mistakes like missing slashes or colons, which if not detected compromise your efforts. For instance, if you have typed:

User agent: *
Disallow: /temp/

this is wrong because there is no hyphen between “User” and “agent” and the syntax is incorrect.
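
A very small checker for this kind of typo can be sketched in a few lines. It is not a replacement for a real validator: `lint_robots` is a hypothetical helper that recognizes only the two fields covered in this article, so it would also flag other fields that some search engines accept.

```python
KNOWN_FIELDS = {"user-agent", "disallow"}

def lint_robots(text):
    """Return (line number, message) pairs for lines that look malformed."""
    problems = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            problems.append((number, "missing colon"))
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        if field.lower() not in KNOWN_FIELDS:
            problems.append((number, f"unknown field {field!r}"))
        elif field.lower() == "disallow" and value and not value.startswith("/"):
            problems.append((number, "path should start with a slash"))
    return problems

broken = "User agent: *\nDisallow: /temp/\n"
for number, message in lint_robots(broken):
    print(f"line {number}: {message}")
```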

When you have a complex robots.txt file – i.e. you give different instructions to different user agents or you have a long list of directories and subdirectories to exclude – writing the file manually can be a real pain. But do not worry – there are tools that will generate the file for you. What is more, there are visual tools that allow you to point and select which files and folders are to be excluded. But even if you do not feel like buying a graphical tool for robots.txt generation, there are online tools to assist you. For instance, the Server-Side Robots Generator offers a dropdown list of user agents and a text box for you to list the files you don't want indexed. Honestly, it is not much of a help unless you want to set specific rules for different search engines, because in any case it is up to you to type the list of directories, but it is better than nothing.
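
Generating the file yourself is also not hard. The sketch below renders a robots.txt from a dictionary that maps each user agent to its disallowed paths; `build_robots` is a hypothetical helper and has nothing to do with the Server-Side Robots Generator.

```python
def build_robots(rules):
    """Render {user agent: [disallowed paths]} as robots.txt text."""
    records = []
    for agent, paths in rules.items():
        lines = [f"User-agent: {agent}"]
        # An empty Disallow value means nothing is off limits for this agent.
        lines += [f"Disallow: {path}" for path in paths] or ["Disallow:"]
        records.append("\n".join(lines))
    # Records are separated by one blank line.
    return "\n\n".join(records) + "\n"

text = build_robots({
    "Googlebot": ["/images/", "/temp/", "/cgi-bin/"],
    "*": ["/temp/"],
})
print(text)
```

Keeping the rules in one data structure makes it harder to mistype a field name, although you still have to list the directories yourself.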

