The War on Spam: Google Fights Back


Google is engaged in a war. It is a war on spam. With new strategies and filters ready to put into place, the search engine is adding new firepower to its arsenal almost daily. Webmasters and SEO consultants alike are terrified, fearing what the future holds for them. But for those of us who believe in the cause, the future isn't scary. In fact, the future looks very bright.

My ten-year-old son is fascinated with war. He has a dozen buckets full of army men and turns everything into a battlefield: the kitchen, my bedroom, even the bathroom. He has a new army-green bicycle helmet. For Halloween, when other kids were Spider-Man and Batman, he was a soldier. He constantly plays computer games like Soldiers of WWII and Battlefield 1942, and he even turns brooms and mops into weapons to combat the invisible enemy. War is all he talks about. He loves movies like Saving Private Ryan, Pearl Harbor, and Platoon. He knows more about both World Wars and Vietnam than I'll ever hope, or care, to know. His obsession with war got me thinking about how it applies to what I do every day. What do SEO and war have in common? More to the point, what strategies is Google using in its war on spam?

SEO is a constant struggle to get our clients' websites to the top. We combat lousy SEO companies that give us a bad name, flagrant ads claiming they can do what we do for only $29 by submitting your site to a thousand search engines, and other little annoyances that pop up every day. Even so, my small battles are nothing compared to the war that Google is waging. Google's number one goal is to give searchers the most relevant results possible. This means filtering and sorting through all the junk out there so that you, the visitor, don't have to.

"It's an arms race," Steve Linford, director of the London-based SpamHaus Project, said. "The more we lock (spammers) down, the more techniques they try to get around us." The SpamHaus Project is a nonprofit organization that posts information about the groups behind the majority of unsolicited email and maintains a "black hole" list of domains from which spammers operate. Spam accounted for at least one in four email messages a business received in 2002. The U.S. Attorney General's website has an entire page on the subject: "Almost 45 percent of all email is now spam and that number is growing each year. Nearly three trillion spam messages are sent each year - 13 times the total snail mail delivered by the U.S. Postal Service. The average wired American is hit with nearly 2,200 spam messages annually - this after most ISPs have filtered 80-90 percent of the junk messages. Some reports indicate that these numbers could increase by five times in the near future."

Market research firm Gartner Inc. estimates that a company of more than 10,000 employees loses more than $13 million in productivity because of internally generated spam. And that is just email spam. Add the spam on the Internet, and the productivity drain is huge. Spam also costs companies money because they have to buy more high-tech software, such as spam blockers and spyware removers, and it strains servers and bandwidth.

Google defines Internet spam as any unwanted information or propaganda that may have been received through deceptive measures on the part of the sender. To a search engine, spam consists of hyperlinked pages intended to mislead the search engine. It is estimated that 80% of the search results for any given keyword phrase are spam.

During World War II, the term propaganda acquired its negative connotation because of the deliberate deceptions Nazi Germany used to dispirit those on the front lines. Soldiers and citizens were constantly bombarded with this new psychological weapon. Most propaganda in Germany was produced by the Ministry for Public Enlightenment and Propaganda, or PROMI. Joseph Goebbels was placed in charge of this ministry shortly after Adolf Hitler took power in 1933. Hitler had been impressed by the power of Allied propaganda during World War I and believed it had been a primary cause of the collapse of morale and of the revolts on the German home front and in the Navy in 1918. The Nazis had no moral qualms about spreading propaganda they themselves knew to be false; indeed, spreading deliberately false information was part of a doctrine known as the "Big Lie", a theory Hitler wrote about in his book, Mein Kampf. In Mein Kampf, Hitler wrote that people came to believe Germany had been defeated in the field in the First World War because of a propaganda technique used by Jews who were influential in the German press.

"British and Allied fliers were depicted as cowardly murderers and Americans in particular as gangsters in the style of Al Capone. At the same time, German propaganda sought to alienate Americans and British from each other, and both these Western belligerents from the Soviets." --World War 2 Propaganda (www.world-war-2.info) The propaganda was effective to a degree; however, it was countered by the Allied Powers' own positive and truthful doctrine.

Now, the term propaganda has come to mean, "information that is spread for the purpose of promoting some cause, such as a doctrine in a war." It's ironic that Google used this word when it defined Internet Spam.

Google trademarked the term "TrustRank" and is working on a new spam-removal model that is explained in what forum posters are calling the Stanford White Paper: "Web spam pages use various techniques to achieve higher-than-deserved rankings in a search engine's results. While human experts can identify spam, it is too expensive to manually evaluate a large number of pages. Instead, we propose techniques to semi-automatically separate reputable, good pages from spam. We first select a small set of seed pages to be evaluated by an expert. Once we manually identify the reputable seed pages, we use the link structure of the web to discover other pages that are likely to be good. In this paper we discuss possible ways to implement the seed selection and the discovery of good pages. We present results of experiments run on the World Wide Web indexed by AltaVista and evaluate the performance of our techniques. Our results show that we can effectively filter out spam from a significant fraction of the web, based on a good seed set of less than 200 sites." This comes from a 12-page paper called "Combating Web Spam with TrustRank", posted on Stanford University's website, that outlines the methodology of TrustRank.

In summary, TrustRank is a way to cut down on spam and filter out content that is not relevant to searchers, so that they get the results they really want. It works by branding good sites with a high trust rating and by stamping spam sites as untrustworthy, including any site that links to those flagged sites. Google's abstract says, "Human editors help search engines combat search engine spam, but reviewing all content is impractical. TrustRank places a core vote of trust on a seed set of reviewed sites to help search engines identify pages that would be considered useful from pages that would be considered spam. This trust is attenuated to other sites through links from the seed sites." Google's famous PageRank seems to have lost meaning, because sites can easily produce or purchase back links, which defeats the purpose of PageRank. In my opinion, TrustRank makes more sense. It makes webmasters more careful about whom they link to in the first place, which makes back links harder to get but well worth the reward once they are earned.
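The seed-and-propagate idea in the quoted abstracts can be illustrated with a short sketch. It follows the general shape of the published TrustRank approach, a PageRank-style computation biased toward hand-picked seed pages, in which each page passes a damped share of its trust to the pages it links to. The damping value (0.85), the iteration count, the function name `trust_rank`, and the example sites are all assumptions for illustration. This is not Google's implementation, and it models only trust flowing outward from good seeds, not the flagging of sites that link to spam.

```python
# Hypothetical sketch of TrustRank-style trust propagation (biased PageRank).
# Function name, parameters, graph, and seed set are illustrative assumptions.

def trust_rank(links, good_seeds, alpha=0.85, iterations=20):
    """Propagate trust from expert-reviewed seed pages along outgoing links.

    links: dict mapping each page to the list of pages it links to.
    good_seeds: set of pages an expert has judged reputable.
    alpha: fraction of a page's trust passed on through its links each step;
           the remaining (1 - alpha) is re-injected only at the seeds.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    # Static vector: the seeds share a total trust of 1, every other page gets 0.
    seed_share = 1.0 / len(good_seeds)
    static = {p: (seed_share if p in good_seeds else 0.0) for p in pages}

    trust = dict(static)
    for _ in range(iterations):
        new_trust = {p: (1 - alpha) * static[p] for p in pages}
        for page, targets in links.items():
            if not targets:
                continue  # trust held by pages with no outlinks is simply dropped
            # Each page splits its damped trust evenly among the pages it links to,
            # so trust is attenuated at every hop away from the seeds.
            share = alpha * trust[page] / len(targets)
            for target in targets:
                new_trust[target] += share
        trust = new_trust
    return trust


# Usage: one reviewed seed site, two sites it vouches for, and a spam ring
# that links to itself and to a good site but receives no links from good sites.
web = {
    "seed.example": ["news.example", "shop.example"],
    "news.example": ["shop.example"],
    "shop.example": [],
    "spam.example": ["spam2.example"],
    "spam2.example": ["spam.example", "shop.example"],
}
scores = trust_rank(web, good_seeds={"seed.example"})
for site, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{site:15} {score:.4f}")
```

Because no trusted page links into the spam ring, both spam sites end with zero trust, however many links they exchange with each other. Linking out to a good site does nothing to raise their own score.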

Another way Google is fighting Internet spam is called the "Sandbox Effect". The Sandbox Effect is essentially a delay of a few months after a site is first spidered before it can rank well. Sometimes a new site initially receives a high ranking in the search engines and then drops into search engine obscurity. Such sites may receive no PageRank and can be virtually invisible in the search engines for up to 120 days. While this may seem like a penalty to new website owners, especially those who don't know about the new filters, how they work, or why they exist, it is Google's way of fighting spam. The reasoning is that in the "sandbox" (named for the image of a bunch of new kids playing together in a sandbox, away from the grownups), spammers won't see the results of their efforts in the search engine and may be fooled into thinking they've either been caught or that their efforts have been futile. Google hopes the spammers will then simply give up and go away. In war, we call this technique flanking: coming around behind the enemy's line to catch them off guard, causing them to panic or withdraw. The desired result of the Sandbox Effect is that spammers will do both, panic and withdraw, or better yet, surrender. Flanking is one of the most effective plans of attack, and one of the most difficult to achieve, as it requires finesse, secrecy, and the ability to anticipate your enemy's moves before they make them.

As in any war, the fight can be long and bloody, and both sides can sustain heavy casualties. While spammers are filtered out, some legitimate websites can be annihilated as well, due to inadequate SEO, mistakes in their pages (like broken links), or simple ignorance of the way search engines work. It is the responsibility of your five-star general to guide you and develop your strategy. Your SEO consultant can lead you through the minefield of search engine optimization techniques without triggering any mines, keeping you safe. If you inadvertently set off a mine, you lose your hard-earned ranking, the traffic that goes with it, and the sales that traffic would have produced. You then join the multitudes of spam casualties, possibly earning a permanent Google ban. Will the casual observer see these casualties? No. On the surface, everything feels peaceful. In fact, the war only helps average citizens get relevant search results, and in the end it brings a better search environment for all. This is, after all, what Google really wants. Peace.

Jennifer E. Sullivan is an Internet Business Consultant who specializes in search engine optimization and web marketing. Her emphasis is on small to medium business marketing. She has written several web marketing articles, including "Hiring an SEO Consultant: 10 Reasons Why You Should", "Let's Not Forget About the Little Guy", and "PageRank for Websites: Is There More To the Web?". You can find more information on her services at First Class SEO, http://www.firstclass-seo.com.

© 2005