SMX Advanced 2012: Pagination & Canonicalization For The Pros

SMX Advanced 2012 (Seattle)

Adam Audette, President RKG (@audette)

Noindex Pagination Requirements

– pages 2-N annotated with noindex
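
A hedged sketch (hypothetical URLs and markup, not from the session): pages 2 through N of a series would carry a robots meta tag along these lines, usually paired with “follow” so the links on those pages still get crawled.

    <!-- on /widgets?page=2 through /widgets?page=N (hypothetical URLs) -->
    <meta name="robots" content="noindex, follow">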

rel-prev / next

Check out Zales

Check out betterjobs.com

Check your analytics (look for entrances on paginated pages)

The rel-canonical tag can interfere with rel-prev/next tags
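
A hedged sketch of how they can coexist (hypothetical URLs): on page 2 of a series, the prev/next annotations sit alongside an optional self-referential canonical; pointing that canonical at page 1 instead is the kind of conflict being warned about here.

    <!-- head of http://www.example.com/widgets?page=2 (hypothetical) -->
    <link rel="prev" href="http://www.example.com/widgets?page=1">
    <link rel="next" href="http://www.example.com/widgets?page=3">
    <link rel="canonical" href="http://www.example.com/widgets?page=2">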

Keep in mind

– pages with rel next/prev can still be shown in results
— but this is an extreme “edge case”
— can optionally use noindex
– use of rel next/prev consolidates signals

Cool technique: Target uses a hash on its paginated URLs (Quora and Twitter do this, too)
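
A quick hypothetical of what that looks like in markup: the fragment stays in the browser, so the paginated states don’t become extra URLs for crawlers to fetch.

    <!-- hash-based pagination link (hypothetical) -->
    <a href="/womens-clothing#page=2">2</a>
    <!-- versus a crawlable parameter-based link -->
    <a href="/womens-clothing?page=2">2</a>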

Downsides:

– does nothing to reduce crawl overhead
– labor-intensive and error-prone

Canonical Action

Use rel-canonical to signal the preferred URL, not as a shortcut

Internal link signals should be consistent
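
A hedged example (made-up URLs): the canonical tag on a parameterized or duplicate variant declares the preferred URL, and internal links should point at that same preferred URL rather than the variants.

    <!-- head of http://www.example.com/widgets?sessionid=49A8 (hypothetical) -->
    <link rel="canonical" href="http://www.example.com/widgets">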

Next Up: Jeff Carpenter from Petco

They had a messed-up situation before the cleanup.

Results:

– 13% increase in conversion from natural search traffic
– reduced the number of pages indexed in the SERPs

Maile Ohye from Google

2009: worked through issues of PageRank sculpting

2010: Zappos and faceted navigation issues, an exponential number of URLs to crawl

2011: launched the improved URL Parameters feature in Webmaster Tools

2011: REI was using rel-canonical on pages that weren’t true duplicates
— Google launched rel-prev/next 5 months later
— the markup helped Google identify more sequences than it detects on its own (2012 statistic)

URL Parameters in Webmaster Tools

Helps Google understand your parameters so it can crawl the site more efficiently
– it is not a URL removal tool for taking documents out of the index

URL Parameters is a hint (not a directive like robots.txt)

Advanced Feature
– some sites already have high crawl coverage as determined by Google
– improper actions can result in pages not appearing in Search

Issue: inefficient crawling
Step 1: specify the parameters that do not change page content (session id, affiliate id, tracking id)
– you’ll likely mark these as “does not change content”
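
A hypothetical illustration (made-up URLs and parameter names): these two URLs return identical content, so sessionid and affid are the kind of parameters you’d mark as “does not change content.”

    http://www.example.com/widgets?sessionid=49A8&affid=partner7
    http://www.example.com/widgets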

Step 2a: specify parameters that change content

Step 2b: specify Googlebot’s preferred behavior

This puts a lot of control in your hands.

Sort parameter: changes the order in which the content is presented

Question 1: is the sort parameter optional throughout the entire site?

Question 2: can Googlebot discover everything useful when the sort parameter isn’t present?

If yes to both, go with “crawl no URLs”

Sort parameter: same sort values used sitewide

Question 1: are the same sort values used consistently for every category?

Question 2: when a user changes the sort value, is the total number of items unchanged?

If yes to both, choose “only URLs with value x”
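
A hedged worked example (hypothetical URLs and values): if /widgets is complete and discoverable without any sort parameter, “crawl no URLs” skips both variants below; if the same sort values are used sitewide and the item count never changes, “only URLs with value=price_asc” keeps one representative ordering.

    http://www.example.com/widgets?sort=price_asc
    http://www.example.com/widgets?sort=price_desc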

Other parameter behaviors in the tool

Narrows: filters content on the page by showing a subset of the total items

Specifies: determines the content displayed on the page

Translates: displays a translated version of the content; almost always crawl every URL

Paginates: displays a component page of a multipage sequence

Multiple parameters in one URL
– imagine all URLs begin as eligible for crawling, then apply each setting as a process of elimination, not inclusion
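
A hedged example of the elimination idea (hypothetical URL and settings): if sessionid is marked “doesn’t change content” and sort is set to “crawl no URLs,” a URL like the one below drops out of crawling even though the page parameter on its own might be crawlable; the most restrictive setting wins.

    http://www.example.com/widgets?sessionid=49A8&sort=price_asc&page=2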

Q&A

When you have a low-quality set of content (like content that got hit by Panda), don’t redirect it or rel-canonical it. Just 404 it. Maile agrees, because those 301s are often hard to maintain, and a redirect isn’t going to help you beat the system: if it’s already low-quality content, it’s not going to pass any value.

Rel-canonical can be used as a good discovery signal. Now that’s interesting.