Google applied for a patent on their ranking algorithm fifteen months ago, on December 31, 2003, and that application was published on March 31st by the US Patent Office. It got the discussion forums buzzing this weekend. Even though I had substantial work to do and was behind on a project, I couldn't resist the temptation to read the very long 14,000-word, 45-page application and see what it could mean to the volatile world of search.
So I tripped on over to the US Patent & Trademark Office (USPTO) and started reading the document. United States Patent Application 0050071741 appears to be Google applying for a patent on their search algorithm. There is no explicit reference to PageRank here, but it reads like PageRank redefined, with a few variations to limit link spamming and reduce stale results, along with multiple innovative elements not previously considered.
They discuss link spamming limitations extensively, which would be a welcome relief, as Linking Psychosis is rampant and I'd like to see an end to it. Much of the historical data related to pages seems a bit onerous, because it would appear to limit the perceived value of a page unless it becomes wildly popular over time. Bigger is better seems to be an enduring theme of this algorithm as described generically in the text of their application.
An odd addition to the historical ranking discussion is, amazingly, the "Advertising Traffic" for a particular document! They will rank a site based on the advertisers choosing to advertise on it. If Amazon wants to advertise on your site, then Google will rank you higher! That's good, I guess, if you have a site that attracts highly rated advertising and don't rely on cross-promotion of your own products, or those of suppliers, in your site advertising. Example: if I have a discussion forum on coffee, don't I want to advertise my coffee products? Why would I serve ads from a highly rated advertiser like Starbucks just to rank higher at Google? What if I sell thousands of products and simply cross-promote and upsell my own products sitewide? Odd stuff, ranking based on advertisers.
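Just to think it through, here's a rough sketch of how such an advertising signal might be blended into a page score. To be clear, the advertiser "quality" weights and the blend factor are entirely my own invention; the application describes no formula:

```python
# Hypothetical sketch of an "advertising traffic" ranking signal.
# The advertiser quality weights and the blend factor are invented
# for illustration; the patent application spells out no math.

ADVERTISER_QUALITY = {
    "starbucks.com": 0.9,    # big, highly rated advertiser (my guess)
    "doubleclick.net": 0.7,  # large ad-serving network (my guess)
    "self-promotion": 0.2,   # cross-promoting your own products
}

def advertising_boost(ads_on_page, weights=ADVERTISER_QUALITY):
    """Average the (guessed) quality of advertisers appearing on a page."""
    if not ads_on_page:
        return 0.0
    return sum(weights.get(a, 0.1) for a in ads_on_page) / len(ads_on_page)

def blended_score(base_relevance, ads_on_page, blend=0.1):
    """Blend the normal relevance score with the advertising signal."""
    return (1 - blend) * base_relevance + blend * advertising_boost(ads_on_page)

print(blended_score(0.8, ["starbucks.com"]))   # boosted a little
print(blended_score(0.8, ["self-promotion"]))  # barely helped
```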
How does affiliate advertising factor into that advertising element of the algorithm? Do they know you are advertising a book from Amazon through your direct Amazon affiliate program links? Do they recognize tracking links from affiliate management companies differently than the tracking URLs of ad-serving monsters like DoubleClick, and confer higher ranking on the big boys of advertising than on affiliate tracking firms?
It also seems to call into question their own AdSense ads and how those factor into this algorithm! Do the AdSense ads along my blog border gain more ranking score because they come from a monster advertising company, Google, or are they downgraded because I'm not a "Premium" advertiser serving over 20 million content page views? Again, it seems that the reward for being large outweighs relevance in this formula. Or does it? How do they value Overture advertising in the formula? AdBrite? Smaller ad networks versus large advertising aggregators?
They extensively discuss historical data related to rankings over time, looking at seasonality, popularity during spikes in traffic due to news coverage of particular topics, and changes in ranking related to those items. The historical data related to ranking over time are interesting, since they refer to link spamming, relevance, and topicality when they say:
"As a further measure to differentiate a document related to a
topical phenomenon from a spam document, search engine may
consider mentions of the document in news articles, discussion
groups, etc. on the theory that spam documents will not be
mentioned, for example, in the news. Any or a combination of
these techniques may be used to curtail spamming attempts."
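Purely as a thought experiment, here's how that "don't trust a spike unless the news and forums are talking about it" test might look. The thresholds are my own invention, not anything taken from the filing:

```python
# My toy reading of the "topical spike vs. spam" idea quoted above:
# a sudden burst of new links is trusted only if independent mentions
# (news stories, discussion threads) appeared too. Thresholds are guessed.

def looks_like_legit_spike(link_growth, news_mentions, forum_mentions):
    """Return True if a burst of new links is backed by outside chatter."""
    if link_growth < 5.0:          # no unusual burst, nothing to decide
        return True
    independent_buzz = news_mentions + forum_mentions
    return independent_buzz >= 3   # spam bursts tend to have zero buzz

print(looks_like_legit_spike(link_growth=8.0, news_mentions=4, forum_mentions=10))  # True
print(looks_like_legit_spike(link_growth=8.0, news_mentions=0, forum_mentions=0))   # False
```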
They've added another interesting element to the algorithm: determining the value of pages based on "user maintained/generated data" (patent item 113). Read that as the "bookmarks" and "favorites" lists built into your browser. Is this one of the reasons that Google recently hired Ben Goodger, the lead developer of Firefox? Snooping into the favorites and cookies on my machine seems like a bit more than I want Google doing on MY machine. It strains the limits of privacy as well. We can stop sites from serving us cookies, but we can't stop who reads them? Ouch!
Further, they reference users' browser cache files as a method of determining the value of a site. "For example, the "temp" or
cache files associated with users could be monitored by search
engine to identify whether there is an increase or decrease in
a document being added over time. Similarly, cookies associated
with a particular document might be monitored by search engine
to determine whether there is an upward or downward trend in
interest in the document." Apparently they can see this info,
but I'd like them to stay out of my cache and cookies too!
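Out of curiosity, here's how that "upward or downward trend in interest" idea might look in code. This is a toy of my own, fitting a simple slope to made-up weekly counts; the filing describes no math at all:

```python
# A toy version of trend detection over cache/cookie sightings:
# fit a simple least-squares slope to weekly counts of how often a
# document shows up in user caches (the counts here are made up).

def trend(counts):
    """Positive slope = growing interest, negative = fading interest."""
    n = len(counts)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(counts) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, counts))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var if var else 0.0

weekly_cache_hits = [120, 135, 150, 170, 190]   # invented numbers
print(trend(weekly_cache_hits))                  # > 0, interest rising
print(trend([190, 170, 150, 135, 120]))          # < 0, interest fading
```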
It appears to apply further penalties to new sites by keeping them poorly ranked for even longer periods, and it adds an element apparently new to ranking algorithms, not seen (or at least not publicly discussed) before: long-term purchase of domain names and historical data related to IP addresses and hosting companies! Here's the snip about the relevance of domain registration longevity to ranking:
"[0099] Certain signals may be used to distinguish between
illegitimate and legitimate domains. For example, domains can
be renewed up to a period of 10 years. Valuable (legitimate)
domains are often paid for several years in advance, while
doorway (illegitimate) domains rarely are used for more than a
year. Therefore, the date when a domain expires in the future
can be used as a factor in predicting the legitimacy of a
domain and, thus, the documents associated therewith."
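Taken literally, that signal is simple enough to sketch. This is just my own toy reading of paragraph [0099]; the 10-year cap comes from the quote, everything else is guessed:

```python
# A literal-minded sketch of paragraph [0099]: treat the number of
# years left on a domain registration as one small legitimacy hint.
# The 10-year cap comes from the quote; the scaling is my own guess.

from datetime import date

def registration_signal(expiry: date, today: date = date(2005, 4, 1)) -> float:
    """Scale years-until-expiry into a 0..1 'legitimacy' hint."""
    years_left = (expiry - today).days / 365.25
    return max(0.0, min(years_left / 10.0, 1.0))   # 10-year maximum per the filing

print(registration_signal(date(2006, 1, 1)))   # expires soon: weak signal
print(registration_signal(date(2015, 4, 1)))   # paid 10 years ahead: strong signal
```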
I'll be extending the term of my domain registrations ASAP! What a boon to registrars if that element of ranking becomes as valued as linking has been! Everyone will get 10-year registrations if they want to rank well. The domain name aftermarket will also change dramatically if this element becomes as important to ranking as it appears here. People will buy and sell domains when disposing of them rather than simply letting them expire at the end of the registration period, as most do now.
It appears they will be penalizing domains "associated" with "illegitimate" domains. Hopefully they have a method of determining that it isn't a competitor linking to your domain from their "illegitimate" domain! That suggests they will be able to eliminate "domain scrapers," which have been known to scrape search engine results for high-ranking domains and post them on "illegitimate" domains, in effect dragging down the ranking of those previously highly ranked domains. How odd the search world is sometimes!
Altogether, it seems that older content will suffer overall because it hasn't changed, because nobody new is linking to it, and because it will lose links over time. What if you are posting a historical document that you can't change, or an authored piece that is copyrighted? Does that decrease the value of the information? Hmmmm. I guess links would continue to increase if the information remains valuable, so there is some protection in that. But older site content may be unchanged because it is popular, not because it is stale - that's an odd Catch-22.
The anchor text issue discussed in this patent application suggests that "[0118] Unique Words, Bigrams, Phrases in Anchor Text" are significant in determining rank, because if natural links develop, they vary as webmasters link to a document differently: some would use the URL itself and embed the link in that, others would use the text requested by the webmaster if it were a link request that successfully garnered a link, and still others might simply use Google's own Blogger "Blog This" link, which simply takes the page title. (I routinely change the link text generated by "Blog This" in my blog posts to emphasize the topic discussed and eliminate the business/publication names usually added ahead of the topic of the page.)
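For what it's worth, here's roughly how I imagine anchor text variety might be measured across a document's inbound links. The word, bigram, and phrase counts follow the section title; the diversity ratio at the end is my own invention:

```python
# How I imagine "[0118] Unique Words, Bigrams, Phrases in Anchor Text"
# might be counted for one document's inbound links. The diversity
# ratio (distinct phrases / total links) is my own made-up measure.

def anchor_text_stats(anchor_texts):
    """Count unique words, bigrams, and whole phrases across all anchors."""
    words, bigrams, phrases = set(), set(), set()
    for text in anchor_texts:
        tokens = text.lower().split()
        words.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
        phrases.add(" ".join(tokens))
    diversity = len(phrases) / len(anchor_texts) if anchor_texts else 0.0
    return len(words), len(bigrams), len(phrases), diversity

# Natural-looking links vary; a link-spam campaign repeats one phrase.
natural = ["coffee roasting guide", "great guide to roasting coffee",
           "http://example.com/roasting", "roasting tips"]
spammy = ["cheap coffee beans"] * 4

print(anchor_text_stats(natural))   # many unique words/phrases, diversity 1.0
print(anchor_text_stats(spammy))    # one repeated phrase, diversity 0.25
```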
The US Patent Office has a link to images, including illustrations and figures, attached to the filing, but they are absurdly large and don't fit in the viewable framed window. This is silliness. Do they mean to hide them by making them unviewable? I'll attempt to post smaller versions of the images on my blog.
The final notable item, it seems to me, is the clickthrough data that Google sees for sites from their own search results. They will rank sites higher that get significant clickthrough rates from the Google SERPs.
"Google may monitor the number of times that a document is
selected from a set of search results and/or the amount of
time one or more users spend accessing the document. Search
engine may then score the document based, at least in part, on
this information."
How will they know how long I spend accessing the document unless they can monitor my actions AFTER I've left the Google SERPs to visit the linked site? I wonder what's at work there. Do they have some way of tracking our actions after we leave their site? I wonder if this has anything to do with Google's acquisition of the Urchin traffic statistics company last week.
Well, it's back to work for now, but it will be interesting to
see where this patent application is discussed in forums and
SEO blogs over the coming week.
Mike Banks Valentine is a Search Engine Optimization Specialist
and blogs about the search world at: http://RealitySEO.com
while operating a small business ecommerce tutorial at:
http://website101.com