
Thursday, July 14, 2011

DupeFree Pro - Are You Duplicating Your Own Content?

It sounds crazy, but you might be creating duplicates of your own
content without even knowing it.

If the Search Engines see duplicates of your pages, they
automatically choose which page to rank and shove the rest into
their supplemental index (the black hole of Search Engine traffic).

It's important that you understand how to avoid duplicating your own
content so that you can stay in control of which of your pages rank
in the Search Engines.

The two main possible causes for this are:

1) Duplicate Domain URLs
2) Internal Duplicates


--------------------------------------------------
Duplicate Domain URLs
--------------------------------------------------

The Search Engines view all the following URLs as *separate* pages
even though they all actually point to the same page...

http://yourdomain.com
http://yourdomain.com/
http://yourdomain.com/index.html
http://www.yourdomain.com
http://www.yourdomain.com/
http://www.yourdomain.com/index.html

If you (or others) are linking to your site using a variety of
these different URLs, you'll not only be diluting PR (PageRank)
across your site but you'll also stand the chance of having your
content labelled as duplicate.

At the time of writing, Google is known to be aware of this issue
and is working to solve it. However, I urge you not to leave it to
fate. Take control of the situation as soon as you can.

Fortunately, the workaround is very simple and only involves a
small piece of code being placed into the .htaccess file on your
web server (this works on Apache servers only).

Jason Katzenback from PortalFeeder has created a YouTube video
tutorial showing the code you need and how to use it.

Check it out here: http://www.youtube.com/watch?v=76CltyxnFVw
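
For quick reference, a typical snippet looks roughly like the
sketch below. This is only an illustration; it assumes an Apache
server with mod_rewrite enabled and that you want the 'www' version
of your domain (with 'yourdomain.com' standing in for your real
domain) to be the one that ranks, and it is not necessarily the
exact code shown in the video:

RewriteEngine On

# Permanently (301) redirect the non-www domain to the www version
RewriteCond %{HTTP_HOST} ^yourdomain\.com$ [NC]
RewriteRule ^(.*)$ http://www.yourdomain.com/$1 [R=301,L]

# Permanently redirect direct requests for index.html to the root URL
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.html
RewriteRule ^index\.html$ http://www.yourdomain.com/ [R=301,L]

Once it's in place, you can test it by requesting one of the
alternative URLs (e.g. 'curl -I http://yourdomain.com/index.html')
and checking that you get a 301 response whose Location header
points at the single URL you want the Search Engines to see.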


--------------------------------------------------
Internal Duplicates
--------------------------------------------------

Are you 100% certain you do not have duplicates of your own content
within your own sites?

If you are using one of the popular free content management systems
(e.g. WordPress), your site might already be suffering from this.

For example, WordPress, the popular Blog management system,
automatically creates archive and category pages on your Blog. With
its default settings, these archive and category pages contain
duplicates of the exact same posts that appear elsewhere on your
Blog.

When Google finds all the multiple versions of your post, its bot
tries to determine which page to rank and places all the rest into
the supplemental index.

This might not sound like a major problem because, one way or
another, your content is still getting ranked, but if the choice is
left up to the Search Engine bots you may not get the page *you*
want to rank.

Some content management systems create other kinds of internal
duplicates, such as different formats of the same page (e.g. PDF,
text, Word doc).

You used to be able to perform a special Google search to see how
many pages of your website are in the supplemental index:

[UPDATE] the search syntax for viewing pages in the
supplemental index at Google no longer works. However, I've
found a website that shows how to continue doing this:

http://www.ksl-consulting.co.uk/google_supplemental_result.html

Pages in the supplemental index are known to get hardly any
traffic, if any at all, until they move out of the supplemental
index, and many report that this is hard to achieve.

The workaround for this issue is to tell the Search Engine spiders
to ignore specific locations on your website. This will enable you
to control which of your pages get indexed and ranked.

You can do this by adding the following code to a 'robots.txt' file
at the root of your website:

User-agent: *
Disallow: /example/directory/
Disallow: /another/example/directory/
Disallow: /one/more/example/directory/

The first line, 'User-agent: *', causes the statements that follow
to apply to all search engine bots that read the robots.txt file.

The 'Disallow: /.../' lines are where you list each directory
location on your webserver that you want the Search Engine bots to
ignore (i.e. NOT index).

So in our above example we are telling all search engine bots to
*not* index any webpage or indexable file located in the three
stated directory locations on our website.
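
For a default WordPress Blog, for example, the Disallow lines might
look something like the sketch below. The directory names here are
only assumptions based on common WordPress permalink settings, so
check the actual URLs of your own category, tag and author archive
pages before you copy anything:

User-agent: *
# Keep the archive-style duplicates out of the index
# so only the posts themselves get ranked
Disallow: /category/
Disallow: /tag/
Disallow: /author/

Googlebot and most other major bots also understand simple
wildcards in robots.txt, so a line such as 'Disallow: /*.pdf$' can
keep alternative file formats (like PDF copies of your pages) out
of the index. Wildcards aren't part of the original robots.txt
standard though, so not every bot will honour them.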

Doing this correctly can really help you control which pages are
chosen by the Search Engines to rank in their results.

If you're not sure how all this works, please do take the time to
understand robots.txt properly before you implement it. Search
Google for info on robots.txt and also check out the Wikipedia page
below:

http://en.wikipedia.org/wiki/Robots.txt


If you are putting in all the effort required to make sure your
content is unique, you really don't want to fall at this last
hurdle.

If you weren't aware of these potential pitfalls before, I hope
you'll take the simple action necessary to ensure you don't fall
victim to self-imposed duplicate content.

Talk soon,

Michael & Steven Grzywacz
DupeFree Pro
