Webdevelopment KnowHow

TIPS, TRICKS and much more

Archive for the ‘Webmaster Resources’ Category

Introducing new and improved sitelinks

Webmaster level: All

This week we launched an update to sitelinks to improve the organization and quality of our search results. Sitelinks are the two columns of links that appear under some search results and ads that help users easily navigate deeper into the site. Sitelinks haven’t changed fundamentally: they’re still generated and ranked algorithmically based on the link structure of your site, and they’ll only appear if useful for a particular query.

Sitelinks before today’s changes

Here’s how we’ve improved sitelinks with today’s launch:
  • Visibility. The links have been boosted to full-sized text, and augmented with a green URL and one line of text snippet, much like regular search results. This increases the prominence of both the individual sitelinks and the top site overall, making them easier to find.

  • Flexibility. Until now, each site had a fixed list of sitelinks that would either all appear or not appear; there was no query-specific ranking of the links. With today’s launch, sitelink selection and ranking can change from query to query, allowing more optimized results. In addition, the maximum number of sitelinks that can appear for a site has been raised from eight to 12, and the number shown also varies by query.

  • Clarity. Previously, pages from your site could either appear in the sitelinks, in the regular results, or both. Now we’re making the separation between the top domain and other domains a bit clearer. If sitelinks appear for the top result, then the rest of the results below them will be from other domains. One exception to this is if the top result for a query is a subpart of a domain. For instance, the query [the met exhibitions] has www.metmuseum.org/special/ as the top result, and its sitelinks are all from within the www.metmuseum.org/special section of the site. However, the rest of the results may be from other parts of the metmuseum.org domain, like store.metmuseum.org or blog.metmuseum.org/alexandermcqueen/about.

  • Quality. These user-visible changes are accompanied by quality improvements behind the scenes. The core improvement is that we’ve combined the signals we use for sitelinks generation and ranking -- like the link structure of your site -- with our more traditional ranking system, creating a better, unified algorithm. From a ranking perspective, there’s really no separation between “regular” results and sitelinks anymore.

Sitelinks after today’s changes

These changes are also reflected in Webmaster Tools, where you can manage the sitelinks that appear for your site. You can now suggest a demotion to a sitelink if it’s inappropriate or incorrect, and the algorithms will take these demotions into account when showing and ranking the links (although removal is not guaranteed). Since sitelinks can vary over time and by query, it no longer makes sense to select from a set list of links -- now, you can suggest a demotion of any URL for any parent page. Up to 100 demotions will be allowed per site. Finally, all current sitelink blocks in Webmaster Tools will automatically be converted to the demotions system. More information can be found in our Webmaster Tools Help Center.

It’s also worth mentioning a few things that haven’t changed. One-line sitelinks, where sitelinks can appear as a row of links on multiple results, and sitelinks on ads aren’t affected. Existing best practices for the link structure of your site are still relevant today, both for generating good quality sitelinks and for making your site easier for visitors to navigate. And, as always, you can raise any questions or comments in our Webmaster Help Forum.

Written by Harvey Jones, Software Engineer, & Raj Krishnan, Product Manager, Sitelinks team


(Cross-posted on the Inside Search blog)

Webmaster level: All

For many months, we’ve been focused on trying to return high-quality sites to users. Earlier this year, we rolled out our “Panda” change for searches in English around the world. Today we’re continuing that effort by rolling out our algorithmic search improvements in different languages. Our scientific evaluation data show that this change improves our search quality across the board and the response to Panda from users has been very positive.

For most languages, this change typically impacts 6-9% of queries to a degree that a user might notice. This is distinctly lower than the initial launch of Panda, which affected almost 12% of English queries to a noticeable degree. We are launching this change for all languages except Chinese, Japanese, and Korean, where we continue to test improvements.

For sites that are affected by this algorithmic change, we have a post providing guidance on how Google searches for high-quality sites. We also have webmaster forums in many languages for publishers who wish to give additional feedback and get advice. We’ll continue working to do the right thing for our users and serve them the best results we can.

Posted by Amit Singhal, Google Fellow


New webmaster tutorial videos

Webmaster level: All

Over the past couple of years, we’ve released over 375 videos on our YouTube channel, with the majority of them answering direct questions from webmasters. Today, we’re starting to release a freshly baked batch of videos, and you might notice that some of these are a little different. Don’t worry, they still have Matt Cutts in a variety of colored shirts. Instead of only focusing on quick answers to specific questions, we’ve created some longer videos which cover important webmaster-related topics. For example, if you were wondering what the limits are for 301 redirects at Google, we now have a single video for that.

Thanks to everyone who submitted questions for this round. You can be the first to hear about the new videos as they’re released by subscribing to our channel or following us on Twitter.

Posted by Michael Wyszomierski, Search Quality Team


A new, improved form for reporting webspam

Webmaster level: All

Everyone on the web knows how frustrating it is to perform a search and find websites gaming the search results. These websites can be considered webspam - sites that violate Google’s Webmaster Guidelines and try to trick Google into ranking them highly. Here at Google, we work hard to keep these sites out of your search results, but if you still see them, you can notify us by using our webspam report form. We’ve just rolled out a new, improved webspam report form, so it’s now easier than ever to help us maintain the quality of our search results. Let’s take a look at some of our new form’s features:

Option to report various search issues
There are many search results, such as sites with malware and phishing, that are not necessarily webspam but still degrade the search experience. We’ve noticed that our users sometimes report these other issues using our webspam report form, causing a delay between when a user reports the issue and when the appropriate team at Google handles it. The new form’s interstitial page allows you to report these other search issues directly to the correct teams so that they can address your concerns in a timely manner.

Simplified form with informative links
To improve the readability of the form, we’ve made the text more concise, and we’ve integrated helpful links into the form’s instructions. Now, the ability to look up our Webmaster Guidelines, get advice on writing actionable form comments, and block sites from your personalized search results is just one click away.

Thank you page with personalization options
Some of our most valuable information comes from our users, and we appreciate the webspam reports you submit to us. The thank you page explains what happens once we’ve received your webspam report. If you want to report more webspam, there’s a link back to the form page and instructions on how to report webspam more efficiently with the Chrome Webspam Report Extension. We also provide information on how you can immediately block the site you’ve reported from your personalized search results, for example, by managing blocked sites in your Google Account.

At Google, we strive to provide the highest quality, most relevant search results, so we take your webspam reports very seriously. We hope our new form makes the experience of reporting webspam as painless as possible (and if it doesn’t, feel free to let us know in the comments).

Posted by Jen Lee and Alissa Roberts, Search Quality Team


Webmaster level: All

Google’s Webmaster Team is responsible for most of Google’s informational websites like Google’s Jobs site or Privacy Centers. Maintaining tens of thousands of pages and constantly releasing new Google sites requires more than just passion for the job: it requires quality management.

In this post we won’t talk about all the different tests that can be run to analyze a website; instead we’ll just talk about HTML and CSS validation, and tracking quality over time.

Why does validation matter? There are different perspectives on validation—at Google there are different approaches and priorities too—but the Webmaster Team considers validation a baseline quality attribute. It doesn’t guarantee accessibility, performance, or maintainability, but it reduces the number of possible issues that could arise and in many cases indicates appropriate use of technology.

While paying a lot of attention to validation, we’ve developed a system to use it as a quality metric to measure how we’re doing on our own pages. Here’s what we do: we give each of our pages a score from 0-10 points, where 0 is worst (pages with 10 or more HTML and CSS validation errors) and 10 is best (0 validation errors). We started doing this more than two years ago, first by taking samples, now monitoring all our pages.
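
As a rough illustration of the scoring described above, here is a minimal Python sketch. The post only states the two endpoints (0 errors scores 10, 10 or more errors scores 0), so the one-point-per-error scale in between is an assumption, and the function names are hypothetical rather than anything the Webmaster Team has published.

```python
def validation_score(error_count: int) -> int:
    """Map a page's combined HTML + CSS validation error count to a 0-10 score."""
    if error_count < 0:
        raise ValueError("error_count must be non-negative")
    # Assumed linear scale: each error costs one point, floored at 0,
    # so 10 or more errors always score 0.
    return max(0, 10 - error_count)


def average_score(error_counts: list[int]) -> float:
    """Average score across a set of pages, e.g. to track a trend per quarter."""
    if not error_counts:
        raise ValueError("need at least one page")
    return sum(validation_score(n) for n in error_counts) / len(error_counts)
```

Under this assumed scale, a page with three validation errors would score 7, and averaging the per-page scores for each quarter gives a single number to plot over time.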

Since the beginning we’ve been documenting the validation scores we were calculating so that we could actually see how we’re doing on average and where we’re headed: is our output improving, or is it getting worse?

Here’s what our data say:

Validation score development 2009-2011.

On average there are about three validation issues per page produced by the Webmaster Team (as we combine HTML and CSS validation in the scoring process, information about the origin gets lost), down from about four issues per page two years ago.

This information is valuable for us as it tells us how close we are to our goal of always shipping perfectly valid code, and it also tells us whether we’re on track or not. As you can see, with the exception of the 2nd quarter of 2009 and the 1st quarter of 2010, we are generally observing a positive trend.

What has to be kept in mind are issues with the integrity of the data, i.e. the sample size as well as “false positives” in the validators. We’re working with the W3C in several ways, including reporting and helping to fix issues in the validators; however, as software can never be perfect, sometimes pages get dinged for non-issues: see for example the border-radius issue that has recently been fixed. We know that this is negatively affecting the validation scores we’re determining, but we have no data yet to indicate how much.

Although we track more than just validation for quality control purposes, validation plays an important role in measuring the health of Google’s informational websites.

How do you use validation in your development process?

Posted by Jens O. Meiert, Google Webmaster Team


Webmaster level: Advanced

You may have noticed that the Parameter Handling feature disappeared from the Site configuration > Settings section of Webmaster Tools. Fear not; you can now find it under its new name, URL Parameters! Along with renaming it, we refreshed and improved the feature. We hope you’ll find it even more useful. Configuration of URL parameters made in the old version of the feature will be automatically visible in the new version. Before we reveal all the cool things you can do with URL parameters now, let us remind you (or introduce, if you are new to this feature) of the purpose of this feature and when it may come in handy.

When to use
URL Parameters helps you control which URLs on your site should be crawled by Googlebot, depending on the parameters that appear in these URLs. This functionality provides a simple way to prevent crawling duplicate content on your site. Now, your site can be crawled more effectively, reducing your bandwidth usage and likely allowing more unique content from your site to be indexed. If you suspect that Googlebot's crawl coverage of the content on your site could be improved, using this feature can be a good idea. But with great power comes great responsibility! You should only use this feature if you're sure about the behavior of URL parameters on your site. Otherwise you might mistakenly prevent some URLs from being crawled, making their content no longer accessible to Googlebot.

A lot more to do
Okay, let’s talk about what’s new and improved. To begin with, in addition to assigning a crawl action to an individual parameter, you can now also describe the behavior of the parameter. You start by telling us whether or not the parameter changes the content of the page. If the parameter doesn’t affect the page’s content then your work is done; Googlebot will choose URLs with a representative value of this parameter and will crawl the URLs with this value. Since the parameter doesn’t change the content, any value chosen is equally good. However, if the parameter does change the content of a page, you can now assign one of four possible ways for Google to crawl URLs with this parameter:

  • Let Googlebot decide
  • Every URL
  • Only URLs with value=x
  • No URLs
We also added the ability to provide your own specific value to be used, with the “Only URLs with value=x” option; you’re no longer restricted to the list of values that we provide. Optionally, you can also tell us exactly what the parameter does--whether it sorts, paginates, determines content, etc. One last improvement is that for every parameter, we’ll try to show you a sample of example URLs from your site that Googlebot crawled which contain that particular parameter.

Of the four crawl options listed above, “No URLs” is new and deserves special attention. This option is the most restrictive and, for any given URL, takes precedence over settings of other parameters in that URL. This means that if the URL contains a parameter that is set to the “No URLs” option, this URL will never be crawled, even if other parameters in the URL are set to “Every URL.” You should be careful when using this option. The second most restrictive setting is “Only URLs with value=x.”
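
To make the precedence rule concrete, here is a small Python sketch of how such a decision could be modeled. It is not Googlebot’s actual logic: the option constants, the `should_crawl` function and the shop.example.com URLs are hypothetical, and treating unlisted parameters as crawlable is an assumption made for the sake of the example.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlsplit

# Hypothetical names mirroring three of the options described above.
EVERY_URL = "every_url"
ONLY_VALUE = "only_value"
NO_URLS = "no_urls"

Settings = dict[str, tuple[str, Optional[str]]]


def should_crawl(url: str, settings: Settings) -> bool:
    """Decide whether a URL may be crawled under per-parameter settings.

    `settings` maps a parameter name to (option, value); the value is only
    used with ONLY_VALUE. Parameters missing from `settings` are assumed
    to be crawlable.
    """
    params = dict(parse_qsl(urlsplit(url).query))
    default = (EVERY_URL, None)
    # "No URLs" is the most restrictive option and wins over everything else.
    if any(settings.get(name, default)[0] == NO_URLS for name in params):
        return False
    # Next most restrictive: a parameter pinned to one value must match it.
    for name, value in params.items():
        option, wanted = settings.get(name, default)
        if option == ONLY_VALUE and value != wanted:
            return False
    return True


settings = {
    "color": (NO_URLS, None),
    "page": (EVERY_URL, None),
    "sort": (ONLY_VALUE, "price"),
}
# "color" is set to "No URLs", so this is skipped even though "page" is "Every URL".
print(should_crawl("https://shop.example.com/list?page=2&color=blue", settings))
print(should_crawl("https://shop.example.com/list?page=2&sort=price", settings))
```

In this sketch the first URL is rejected and the second is accepted, which is the behavior the paragraph above describes for a URL that mixes a “No URLs” parameter with an “Every URL” one.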

Feature in use
Now let’s do something fun and exercise our brains on an example.
- - -
Once upon a time there was an online store, fairyclothes.example.com. The store’s website used parameters in its URLs, and the same content could be reached through multiple URLs. One day the store owner noticed that too many redundant URLs could be preventing Googlebot from crawling the site thoroughly. So he sent his assistant CuriousQuestionAsker to the Great WebWizard to get advice on using the URL parameters feature to reduce the duplicate content crawled by Googlebot. The Great WebWizard was famous for his wisdom. He looked at the URL parameters and proposed the following configuration:

Parameter name  | Effect on content? | What should Googlebot crawl?
trackingId      | None               | One representative URL
sortOrder       | Sorts              | Only URLs with value = ‘lowToHigh’
sortBy          | Sorts              | Only URLs with value = ‘price’
filterByColor   | Narrows            | No URLs
itemId          | Specifies          | Every URL
page            | Paginates          | Every URL

The CuriousQuestionAsker couldn’t avoid his nature and started asking questions:

CuriousQuestionAsker: You’ve instructed Googlebot to choose a representative URL for trackingId (value to be chosen by Googlebot). Why not select the Only URLs with value=x option and choose the value myself?
Great WebWizard: While crawling the web Googlebot encountered the following URLs that link to your site:
  1. fairyclothes.example.com/skirts/?trackingId=aaa123
  2. fairyclothes.example.com/skirts/?trackingId=aaa124
  3. fairyclothes.example.com/trousers/?trackingId=aaa125
Imagine that you were to tell Googlebot to only crawl URLs where “trackingId=aaa125”. In that case Googlebot would not crawl URLs 1 and 2, as neither of them has the value aaa125 for trackingId. Their content would neither be crawled nor indexed, and none of your inventory of fine skirts would show up in Google’s search results. No, for this case choosing a representative URL is the way to go. Why? Because that tells Googlebot that when it encounters two URLs on the web that differ only in this parameter (as URLs 1 and 2 above do), it only needs to crawl one of them (either will do) and it will still get all the content. In the example above two URLs will be crawled: either 1 & 3, or 2 & 3. Not a single skirt or pair of trousers will be lost.
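
The Great WebWizard’s point can be sketched in a few lines of Python. This is only an illustration of the idea of a representative URL, not how Googlebot is implemented: the helper names are hypothetical, an http:// scheme is added to the example URLs so they parse cleanly, and picking the first URL seen is an arbitrary choice (the wizard says either will do).

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_param(url: str, name: str) -> str:
    """Return the URL with every occurrence of one query parameter removed."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k != name]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def pick_representatives(urls: list[str], name: str) -> list[str]:
    """Keep the first URL seen for each page once parameter `name` is ignored."""
    chosen: dict[str, str] = {}
    for url in urls:
        # URLs that differ only in `name` collapse to the same key.
        chosen.setdefault(strip_param(url, name), url)
    return list(chosen.values())


urls = [
    "http://fairyclothes.example.com/skirts/?trackingId=aaa123",
    "http://fairyclothes.example.com/skirts/?trackingId=aaa124",
    "http://fairyclothes.example.com/trousers/?trackingId=aaa125",
]
for url in pick_representatives(urls, "trackingId"):
    print(url)
```

With these three inputs the sketch keeps URLs 1 and 3: one representative for the skirts page and one for the trousers page.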

CuriousQuestionAsker: What about the sortOrder parameter? I don’t care if the items are listed in ascending or descending order. Why not let Google select a representative value?
Great WebWizard: As Googlebot continues to crawl it may find the following URLs:
  1. fairyclothes.example.com/skirts/?page=1&sortBy=price&sortOrder=’lowToHigh’
  2. fairyclothes.example.com/skirts/?page=1&sortBy=price&sortOrder=’highToLow’
  3. fairyclothes.example.com/skirts/?page=2&sortBy=price&sortOrder=’lowToHigh’
  4. fairyclothes.example.com/skirts/?page=2&sortBy=price&sortOrder=’highToLow’
Notice how the first pair of URLs (1 & 2) differs only in the value of the sortOrder parameter, as do the URLs in the second pair (3 & 4). However, URLs 1 and 2 will produce different content: the first shows the least expensive of your skirts while the second shows the priciest. That should be your first hint that using a single representative value is not a good choice for this situation. Moreover, if you let Googlebot choose a single representative from among a set of URLs that differ only in their sortOrder parameter, it might choose a different value each time. In the example above, from the first pair of URLs, URL 1 might be chosen (sortOrder=’lowToHigh’), whereas from the second pair URL 4 might be picked (sortOrder=’highToLow’). If that were to happen Googlebot would crawl only the least expensive skirts (twice):
  • fairyclothes.example.com/skirts/?page=1&sortBy=price&sortOrder=’lowToHigh’
  • fairyclothes.example.com/skirts/?page=2&sortBy=price&sortOrder=’highToLow’
Your most expensive skirts would not be crawled at all! When dealing with sorting parameters consistency is key. Always sort the same way.
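
A tiny simulation shows why mixing sort orders loses items. The prices, page size and the `page_items` helper below are all made up for illustration; the store in the story is assumed to have two pages of skirts.

```python
def page_items(prices: list[int], page: int, sort_order: str, page_size: int) -> list[int]:
    """Return the prices shown on one page of a listing sorted by price."""
    ordered = sorted(prices, reverse=(sort_order == "highToLow"))
    start = (page - 1) * page_size
    return ordered[start:start + page_size]


prices = [10, 20, 30, 40]  # hypothetical skirt prices, two per page

# Inconsistent choice: page 1 ascending, page 2 descending.
mixed = set(page_items(prices, 1, "lowToHigh", 2)) | set(
    page_items(prices, 2, "highToLow", 2)
)

# Consistent choice: both pages ascending.
consistent = set(page_items(prices, 1, "lowToHigh", 2)) | set(
    page_items(prices, 2, "lowToHigh", 2)
)

print(sorted(mixed))
print(sorted(consistent))
```

In this toy inventory the mixed crawl only ever sees the two cheapest skirts, while the consistent crawl sees all four.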

CuriousQuestionAsker: How about the sortBy value?
Great WebWizard: This is very similar to the sortOrder parameter. You want the crawled URLs of your listing to be sorted consistently throughout all the pages, otherwise some of the items may not be visible to Googlebot. However, you should be careful which value you choose. If you sell books as well as shoes in your store, it would be better not to select the value ‘title’, since URLs pointing to shoes never contain ‘sortBy=title’, so they will not be crawled. Likewise, setting ‘sortBy=size’ works well for crawling shoes, but not for crawling books. Keep in mind that a parameter’s configuration applies across the whole site.

CuriousQuestionAsker: Why not crawl URLs with parameter filterByColor?
Great WebWizard: Imagine that you have a three-page list of skirts. Some of the skirts are blue, some of them are red and others are green.
  • fairyclothes.example.com/skirts/?page=1
  • fairyclothes.example.com/skirts/?page=2
  • fairyclothes.example.com/skirts/?page=3
This list is filterable. When a user selects a color, she gets two pages of blue skirts:
  • fairyclothes.example.com/skirts/?page=1&filterByColor=blue
  • fairyclothes.example.com/skirts/?page=2&filterByColor=blue
They seem like new pages (the set of items is different from all other pages), but there is actually no new content on them, since all the blue skirts were already included in the original three pages. There’s no need to crawl URLs that narrow the content by color, since the content served on those URLs was already crawled. There is one important thing to notice here: before you disallow some URLs from being crawled by selecting the “No URLs” option, make sure that Googlebot can access the content in another way. Considering our example, Googlebot needs to be able to find the first three links on your site, and there should be no settings that prevent crawling them.
- - -
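
The wizard’s last answer, that color-filtered pages add nothing new, can be checked with a short Python sketch. The inventory below is invented for illustration (nine skirts, three per page, four of them blue), and `paginate` is a hypothetical helper.

```python
def paginate(items: list[tuple[str, str]], page_size: int) -> list[list[tuple[str, str]]]:
    """Split a listing into consecutive pages of at most `page_size` items."""
    return [items[i:i + page_size] for i in range(0, len(items), page_size)]


# Hypothetical inventory: (item id, color).
skirts = [
    ("skirt-1", "blue"), ("skirt-2", "red"), ("skirt-3", "green"),
    ("skirt-4", "blue"), ("skirt-5", "red"), ("skirt-6", "blue"),
    ("skirt-7", "green"), ("skirt-8", "blue"), ("skirt-9", "red"),
]

all_pages = paginate(skirts, 3)
blue_pages = paginate([s for s in skirts if s[1] == "blue"], 3)

seen_unfiltered = {item for page in all_pages for item in page}
seen_filtered = {item for page in blue_pages for item in page}

# Items that appear only on filtered pages; empty means nothing new to crawl.
new_items = seen_filtered - seen_unfiltered
print(len(all_pages), len(blue_pages), new_items)
```

The set of items found only on the filtered pages is empty, so crawling the three unfiltered pages already covers every skirt. The same check also shows the caveat: if the unfiltered pages were not crawlable, the filtered pages would be the only way to reach those items.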

If your site has URL parameters that are potentially creating duplicate content issues, then you should check out the new URL Parameters feature in Webmaster Tools. Let us know what you think, or if you have any questions, post them to the Webmaster Help Forum.

Written by Kamila Primke, Software Engineer, Webmaster Tools Team

