Adam Audette, President RKG (@audette)
Noindex Pagination Requirements
– pages 2-N annotated with noindex
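A minimal sketch of the markup, assuming a hypothetical /category URL ("follow" keeps Googlebot following links to the items):
    <!-- on /category?page=2 through /category?page=N, but not page 1 -->
    <meta name="robots" content="noindex, follow">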
rel-prev / next
Check out Zales
Check out betterjobs.com
Check your analytics (look for entries for paginated pages)
rel-canonical tag can interfere with rel prev/next tags
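For instance, the head of page 2 in a hypothetical /category series; the canonical should reference page 2 itself, because pointing it at page 1 overrides the prev/next sequence:
    <!-- head of http://www.example.com/category?page=2 -->
    <link rel="canonical" href="http://www.example.com/category?page=2">
    <link rel="prev" href="http://www.example.com/category?page=1">
    <link rel="next" href="http://www.example.com/category?page=3">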
Keep in mind
– pages with rel next/prev can still be shown in results
— but this is an extreme “edge case”
— can optionally use noindex
– use of rel next/prev consolidates indexing signals across the sequence
Cool techniques: Target uses a hash on their paginated URLs (Quora and Twitter do this, too)
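Roughly, with hypothetical markup: the page state lives in the URL fragment, which crawlers ignore, so every "page" resolves to one URL:
    <a href="/rings#page=2">Next</a>
    <!-- /rings#page=2 and /rings#page=3 are both just /rings to a crawler -->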
Downsides:
– does nothing to reduce crawl overhead
– labor-intensive and error-prone
Canonical Action
Use rel-canonical to signal the preferred URL among duplicates, not as a shortcut for other problems
Internal link signals should be consistent
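A sketch with a hypothetical URL: declare one preferred version and make internal links agree with it:
    <link rel="canonical" href="http://www.example.com/widgets">
    <!-- internal links use the canonical form, not parameterized variants -->
    <a href="http://www.example.com/widgets">Widgets</a>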
Next Up: Jeff Carpenter from Petco
They had a messed-up situation.
Results:
– 13% increase in conversion from natural search traffic
– reduced the number of pages indexed in SERPs
Maile Ohye from Google
2009: worked through issues of PageRank sculpting
2010: Zappos and faceted navigation issues, an exponential number of URLs to crawl
2011: launched improved URL parameters in Webmaster Tools
2011: REI using rel-canonical for non-duplication issues
— Google launched rel-prev/next 5 months later
— helped Google identify more sequences than it detects on its own (2012 statistic)
URL Parameters in Webmaster Tools
Helps Google understand URL parameters so it can crawl the site more efficiently
– not for URL removals; use the URL removal tool to remove certain documents
URL Parameters is a hint (not a directive like robots.txt)
Advanced Feature
– some sites already have high crawl coverage as determined by Google
– improper actions can result in pages not appearing in Search
Issue: inefficient crawling
Step 1: specify the parameters that do not change page content (session id, affiliate id, tracking id)
– likely mark as “does not change content”
Step 2a: specify parameters that change content
Step 2b: specify Googlebot’s preferred behavior
This puts a lot of control in your hands.
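A hypothetical walk-through of the two steps:
    /shoes?sessionid=abc123&sort=price
    sessionid: mark as "does not change content"
    sort: mark as changing content (behavior: sorts), then set a preferred crawl behavior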
Sort parameter: changes the order in which the content is presented
Question 1: is the sort parameter optional throughout the entire site?
Question 2: can Googlebot discover everything useful when the sort parameter isn't displayed?
If yes to both, go with "Crawl no URLs"
Sort parameter: same sort values used sitewide
Question 1: are the same sort values used consistently for every category?
Question 2: when the user changes the sort value, is the total number of items unchanged?
If yes to both, choose "Only URLs with value x"
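For example, with a hypothetical sort parameter:
    ?sort=price optional everywhere and the default view exposes every item → "Crawl no URLs"
    same values (price, name) in every category and item counts unchanged → "Only URLs with value x" (x = the default value)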
Other parameter settings
Narrows: filters the content on the page by showing a subset of the total items
Specifies: determines the set of content displayed on the page
Translates: displays a translated version of the content; almost always crawl every URL
Paginates: displays a component page of a multipage sequence
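Hypothetical example parameters for each behavior:
    sorts ?sort=price; narrows ?color=blue; specifies ?itemid=123; translates ?lang=fr; paginates ?page=3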
Multiple parameters in one URL
– imagine all URLs begin as eligible for crawling, then apply each setting as a process of elimination, not inclusion
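A hypothetical example of the elimination:
    /shoes?color=blue&sort=price&sessionid=abc
    if sessionid is set to "Crawl no URLs", this URL is out no matter what the color and sort settings say; only URLs that survive every parameter's setting stay eligible for crawling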
Q&A
When there's a low-quality set of content (like content that got hit by Panda), don't redirect it or rel-canonical it; just 404 it. Maile agrees, because those 301s are often hard to maintain, and a redirect isn't going to help you beat the system: if it's already low-quality content, it's not going to pass any value.
Rel-canonical can be used as a good discovery signal. Now that’s interesting.