What is Duplicate Content?

To be a valuable search engine you need to have a fresh index with lots of new information in it – all the time. To get this information a search engine needs to crawl the web and then filter this information into usable chunks of data that relate to search queries.

One of the things search engines do to ensure quality search results is to remove duplicate content from their index. Duplicate content fills up search engines with superfluous content and creates a bad user experience for searchers.  

Could you imagine getting the same article from different websites on all ten results from a search engine page?

Searching for information, products, services or answers through a search engine is about getting a range of different, relevant results to make an informed choice about something, someone or somewhere. Search engines know this – that’s why they spend so much time making sure the content that is served up for a search query is relevant, non-spammy, not duplicated across other pages and will create a great user experience.

So we’ve established that search engines don’t like duplicate content, but what exactly is duplicate content? Like the name implies, it’s content on your website that is identical or incredibly similar to other content. As mentioned before, this is bad for search engines so making sure you have unique content on your website is a must if you want any chance of ranking well.

What types of duplicate content are there?

There are many causes of duplicate content on your website. Here are just a few:-

Print pages

If you have a normal web page and then an additional ‘print friendly’ page, you now have two copies of that page on your site – or in other words, a duplicate copy. Search engines are getting better at filtering out duplicate content, but don’t leave it up to the engine's algorithm to get these out of their index. Fix this issue if you can. 

To find out if a search engine has two copies of your page in the index, use the site: operator (detailed below) or set your website up in Google Webmaster Tools and go to Search Appearance / HTML Improvements.

FIX: You could put all your print friendly pages into a directory on your server and then block search engine crawlers from that directory using a robots.txt file.
If you’re running a Joomla website there’s a good chance you will be on an Apache server, so you can also add this to your .htaccess file (it needs Apache’s mod_headers module to be enabled). It asks search engines not to index any .doc or .pdf files:
     <FilesMatch "\.(doc|pdf)$">
     Header set X-Robots-Tag "noindex, noarchive, nosnippet"
     </FilesMatch>
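
If you go the robots.txt route, you can sanity-check your rules before uploading them. Here’s a minimal sketch in Python using the standard library’s robots.txt parser. The /print/ directory name and the example URLs are assumptions for illustration only, so swap in your own.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: /print/ is an assumed directory name
# for the print friendly copies of your pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
"""


def is_crawlable(robots_txt, url, user_agent="*"):
    """Return True if the given rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.yourwebsite.com/article.html",
        "http://www.yourwebsite.com/print/article.html",
    ):
        state = "allowed" if is_crawlable(ROBOTS_TXT, url) else "blocked"
        print(url, "->", state)
```

With these rules the normal page stays crawlable and the copy under /print/ is blocked. Keep in mind this only tells you what a well-behaved crawler should do with your file, not what is actually in a search engine’s index.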

Canonicalization issues

Don’t worry… I thought “what the hell does that word mean?” when I first saw it too. Essentially your website homepage could have multiple URLs pointing to it. For example:-   

  • http://yourwebsite.com   
  • http://www.yourwebsite.com   
  • http://yourwebsite.com/index.htm   
  • http://www.yourwebsite.com/index.htm   
  • https://yourwebsite.com   
  • https://www.yourwebsite.com   
  • https://yourwebsite.com/index.htm   
  • https://www.yourwebsite.com/index.htm

These could all point to your homepage (please note this is an extreme example but still possible all the same). If a search engine crawls and indexes all these versions of URLs there could be multiple versions of your homepage – or worse, your entire website in the index. This would be very bad news indeed.
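
To make the idea concrete, here’s a rough Python sketch of what “one main URL” means: every variant in the list above collapses to a single address. The choice of https with www as the preferred version, and index.htm as the only default document, are assumptions for this example and not a recommendation.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions for this sketch: the preferred version of the site is
# https + www, and index.htm is the only default document name in use.
CANONICAL_SCHEME = "https"
CANONICAL_HOST = "www.yourwebsite.com"
BARE_HOST = "yourwebsite.com"
DEFAULT_DOCUMENTS = ("index.htm",)


def canonical_url(url):
    """Map any variant of one of our own URLs to a single canonical form."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in (CANONICAL_HOST, BARE_HOST):
        return url  # not our site, leave it untouched
    path = parts.path or "/"
    # Strip a trailing default document so /index.htm becomes /.
    for name in DEFAULT_DOCUMENTS:
        if path.endswith("/" + name):
            path = path[: -len(name)]
    return urlunsplit((CANONICAL_SCHEME, CANONICAL_HOST, path, parts.query, ""))


if __name__ == "__main__":
    print(canonical_url("http://yourwebsite.com/index.htm"))
```

All eight URLs in the list come out as https://www.yourwebsite.com/, which is the one version you want a search engine to keep.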

One thing to keep in mind: if your competitors are smart and notice that you haven’t re-directed your URLs correctly, they could point links from other websites or directories at your different URLs. That would cause a search engine to crawl them (basically forcing a crawl of all your different URLs), which could lead to a drop in rankings.

FIX: First you will need to find out if there are any additional versions of your website or homepage in the index.
You can do this by using the site: operator. Put site: before each of your URLs in a search engine’s search box to check whether it’s in the index (e.g. site:http://www.yourwebsite.com).
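
If you’d rather not type every combination by hand, a few lines of Python can list the searches for you. This is only a sketch: it assumes the variants are http or https, with or without www, and with or without an index.htm default document, as in the list earlier. Your site may have others.

```python
from itertools import product


def site_queries(domain, default_document="index.htm"):
    """Build one site: query for each scheme/host/path variant of the homepage."""
    schemes = ("http", "https")
    hosts = (domain, "www." + domain)
    paths = ("", "/" + default_document)
    return [
        "site:{}://{}{}".format(scheme, host, path)
        for scheme, host, path in product(schemes, hosts, paths)
    ]


if __name__ == "__main__":
    for query in site_queries("yourwebsite.com"):
        print(query)
```

Paste each line into the search box in turn and note which versions come back with results.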

If you have multiple versions of your site in a search engine’s index you will need to ‘301 re-direct’ the unwanted URLs to your main URL as a fix.

Manufacturers’ product descriptions

If you’re selling a product online through a distributor or manufacturer, chances are the products they provide come with a standard piece of text or a product description that many people use on their sites. If there are a hundred other websites out there with the same product – or worse still, an entire range of products that you sell that are all using the same manufacturer's product description – you will have duplicate content issues.

FIX: Really the only way to get around this is to modify your content so that it’s unique.
Try writing your own product descriptions so your content is original.

Product pages

If you have a shopping cart, product pages are a hotbed for duplicate content. Most people add multiple products to their site using the same product description, changing only the colour, size or another minor element to differentiate them. As most of the content is the same, you could have hundreds of pages with duplicate content on them.

FIX: You could re-write all your shopping cart pages; however, if you have a few thousand products this could be a very large job indeed.
Another option is to analyse your website, find out which product generates the most revenue for you and filter out the others using a robots.txt file. (This isn’t the best solution; however, you may find that the lift in rankings from fewer duplicate content penalties increases your revenue.)
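
Before you start re-writing, it helps to know which product pages are actually near-copies of each other. Here’s a small Python sketch that scores two descriptions by the share of three-word sequences (“shingles”) they have in common. This is just one common way to measure text overlap, not how any particular search engine does it, and the sample descriptions are made up.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word sequences of the given size."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a, b, size=3):
    """Jaccard similarity of two texts: 0.0 (nothing shared) to 1.0 (identical)."""
    sa, sb = shingles(a, size), shingles(b, size)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


if __name__ == "__main__":
    # Made-up product descriptions that differ only in colour.
    red = "Soft cotton t-shirt with a crew neck and short sleeves. Colour: red."
    blue = "Soft cotton t-shirt with a crew neck and short sleeves. Colour: blue."
    print(round(similarity(red, blue), 2))
```

The two sample descriptions score above 0.8, so they’re close to identical. Where you draw the line for “too similar” is your call; the score is only there to help you find the worst offenders first.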

Stolen content

If others have stolen your content, this could lead to a search engine indexing the wrong version (theirs!!). To see if anyone has stolen your content try using Copyscape.

Multiple domains

If you have multiple domains you will want to ‘301 re-direct’ these to your main domain name. Don’t set up multiple websites using the same content on different domains as this will cause issues with duplicate content filters.
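
On an Apache server you would normally set this re-direct up in your server configuration, but the logic is the same wherever it lives. Here’s a minimal Python sketch written as WSGI middleware. The main host name and the use of https are assumptions for the example.

```python
MAIN_HOST = "www.yourwebsite.com"  # assumed main domain for this sketch


def redirect_to_main_domain(app):
    """Wrap a WSGI app so requests for any other host get a 301 to MAIN_HOST."""
    def middleware(environ, start_response):
        # Drop any port number and compare host names case-insensitively.
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        if host and host != MAIN_HOST:
            location = "https://" + MAIN_HOST + (environ.get("PATH_INFO") or "/")
            query = environ.get("QUERY_STRING", "")
            if query:
                location += "?" + query
            headers = [("Location", location), ("Content-Length", "0")]
            start_response("301 Moved Permanently", headers)
            return [b""]
        # Already on the main domain: serve the page as usual.
        return app(environ, start_response)
    return middleware
```

A request for /about on one of your other domains is answered with a 301 pointing at https://www.yourwebsite.com/about, so visitors and crawlers both end up on the one version of the page.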

The final word

As you can see, there are a few ways your site can produce duplicate content. If you are aware of these issues and take appropriate measures to ensure your site doesn’t suffer from them, you shouldn’t have too many problems.