In the Throne Room of the Neural Spider King

After reading only the most basic description of a neural network, it's obvious to me that the Google Spider is an "intelligence" that is now operating on this basis. Trustrank (the marriage of mathematics and subjectivity) has replaced prior, purely mathematical, methods for determining the "merit" of pages and sites. This subjectivity was imparted to the Spider's functioning by an initial human "seeding" of data, during which the Spider was basically told by its "teachers" -- "This is what good looks like". This is exactly the same as the initial "training" required of neural networks before they can be left to their own devices, to proceed down a pathway of self-education and ever-increasing effectiveness in the performance of their pre-defined objectives.

The process by which that initial "training" is driven through the neural network is called back propagation: the network's errors on known examples are passed backward through its layers, and the connections between them are adjusted accordingly. In the example of a basic neural network illustrated below, whose job it is to identify vegetables based on physical attributes, we know that some initial set of associations must have been provided, to prime the network. If no one had told the neural network that potatoes are brown and smaller than pumpkins, for example (and referenced a few real-world examples), it could never have undertaken its task.

As in human intelligence and human education, how quickly back propagation does its work is a function of the complexity of the required task and the time/effort given to the training of the neural network for that task. If all goes as planned, once the training of the network is completed, it gets smarter with the passage of time, and it gets better in the performance of its task, requiring no additional human intervention. Once a neural network is up and running, human contact happens only at the "input layer" (where data enters the system for processing) and at the "output layer" (where results data, or "answers", are collected). What happens in between takes place in the "hidden layer" and, from the outside, can be neither measured nor verified.
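To make the training and the three layers a little more concrete, here is a minimal sketch of a toy network that learns to tell potatoes from pumpkins. Everything in it is an assumption made for illustration: the six example vegetables, the encoding of color as an "orangeness" number between 0 and 1, the three hidden units, the learning rate and the names TinyNet, train and mean_loss. It says nothing about how the Google Spider is actually built.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical training examples: (size, orangeness) -> 1.0 pumpkin, 0.0 potato.
EXAMPLES = [
    ((0.10, 0.20), 0.0),
    ((0.15, 0.25), 0.0),
    ((0.12, 0.15), 0.0),
    ((0.80, 0.90), 1.0),
    ((0.90, 0.85), 1.0),
    ((0.75, 0.95), 1.0),
]


class TinyNet:
    """Input layer (2 values) -> hidden layer -> output layer (1 value)."""

    def __init__(self, n_hidden=3, seed=0):
        rng = random.Random(seed)
        self.w_hidden = [[rng.uniform(-1, 1) for _ in range(2)]
                         for _ in range(n_hidden)]
        self.b_hidden = [0.0] * n_hidden
        self.w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b_out = 0.0

    def forward(self, x):
        hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
                  for ws, b in zip(self.w_hidden, self.b_hidden)]
        out = sigmoid(sum(w * h for w, h in zip(self.w_out, hidden)) + self.b_out)
        return hidden, out

    def train_step(self, x, target, lr=0.5):
        hidden, out = self.forward(x)
        # Error at the output layer (gradient of cross-entropy loss).
        delta_out = out - target
        # Back propagation: pass the error backward to each hidden unit.
        delta_hidden = [delta_out * w * h * (1.0 - h)
                        for w, h in zip(self.w_out, hidden)]
        # Adjust output-layer weights, then hidden-layer weights.
        for j, h in enumerate(hidden):
            self.w_out[j] -= lr * delta_out * h
        self.b_out -= lr * delta_out
        for j, d in enumerate(delta_hidden):
            for i, xi in enumerate(x):
                self.w_hidden[j][i] -= lr * d * xi
            self.b_hidden[j] -= lr * d


def mean_loss(net, examples):
    total = 0.0
    for x, target in examples:
        _, out = net.forward(x)
        out = min(max(out, 1e-12), 1.0 - 1e-12)  # keep log() finite
        total -= target * math.log(out) + (1.0 - target) * math.log(1.0 - out)
    return total / len(examples)


def train(net, examples, epochs=2000, lr=0.5):
    for _ in range(epochs):
        for x, target in examples:
            net.train_step(x, target, lr)


demo_net = TinyNet()
loss_before = mean_loss(demo_net, EXAMPLES)
train(demo_net, EXAMPLES)
loss_after = mean_loss(demo_net, EXAMPLES)
print("loss before:", loss_before, "loss after:", loss_after)
```

The "teachers" appear only in EXAMPLES. After training, what the network "knows" is stored as a handful of numbers in w_hidden and w_out, none of which reads as a rule about potatoes or pumpkins.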

The ways in which neural networks operate internally are fascinating, but beyond the intended scope of this paper.

What will concern us in this paper is what it means to Joe Webmaster that the Google Spider is a neural network, assuming Joe is concerned with creating pages and sites the Google Spider will like.

Many webmasters are frustrated right now, because they don't understand how the Spider is working. The problem is that most people trying to curry the Spider's favor are focused on what is happening inside the "hidden layer" of its neural network. When the functioning of the Google Search Engine relied entirely on mathematical algorithms and conventional (sequential) computing, "figuring the Spider out" was just a matter of math and computer science.

Previously, the Spider could not escape the fact it was essentially functioning as a calculator, performing arithmetical functions and yielding results which provided the formulaic basis of Pagerank. For this reason, it was possible to "fool" the Spider. To obtain a PR5, for example, you only had to reproduce the arithmetical results the Spider had been told should yield a PR5. There were other ways SE spammers obtained undeserved treatment for their pages, sites and images. All operated on the Spider's arithmetical vulnerability. That has all changed now, and where once there was decipherable mathematics, replicable precision and, therefore, the ability to reverse engineer for favorable Google treatment, now there's just a lot of "fuzz".
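For a sense of what that "calculator" looked like, here is a rough sketch of the simplified, published form of the Pagerank calculation, in which each page passes a share of its score along its outgoing links. The four-page web, the spam pages, the function name pagerank and the iteration count are all made up for illustration, and the damping value of 0.85 is simply the commonly cited one. This is not Google's production code.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Simplified Pagerank by repeated passes over the link graph.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each pass with the same small base score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped score evenly among its links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: everyone links to "c", and "c" links to "a".
honest_web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
honest = pagerank(honest_web)

# The same web after the owner of "b" adds three throwaway pages linking to it.
spammed_web = dict(honest_web, s1=["b"], s2=["b"], s3=["b"])
spammed = pagerank(spammed_web)

print("b before:", honest["b"], "b after:", spammed["b"])
```

Because the arithmetic is fixed and known, anyone who can add links can push a page's score in a predictable direction, which is the vulnerability described above.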

Let's go back to the simple "veggie sorting" neural network example. Whoever "trained" this neural network had to establish a number of "exemplary associations", linking a finite set of input values (in this case, physical attributes along two dimensions -- size and color) with a finite set of possible vegetable "conclusions". If the variety of vegetables passing through the system were too great in number, the network's ability to properly identify them would decrease. Likewise, it would be insane to expect the neural network, once trained, to begin "sniffing" veggies with any usefulness.

Now imagine that you are trying to fool this system, as many people try to fool the Google Spider. Say, for example, you are trying to produce "pumpkin" as a result. This gets a little tough. Even though you know what the physical attributes of a pumpkin are (including its color and average size), you don't know ANYTHING about the training the neural network received, which allows it to "decide" when a vegetable is a pumpkin or is not. Remember that direct perception of fact -- as in, "That's a pumpkin because I know what pumpkins are..." -- is NOT possible. The neural network in this case is deciding when it has a pumpkin based on size and color comparisons made against a set of other vegetables, and a unique history of failures and successes -- neither of which you would know if you were trying to pull a fast one. Even if you knew that the neural network's "experience" consisted only of pumpkins, zucchinis, yams, carrots and potatoes, it would take you some time to achieve your desired result, the mathematical expression of which might contain such values as 45% carrot and 225% potato.
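The outsider's position can be sketched roughly as follows. The "black box" here is a deliberately simple stand-in (it scores an input by its closeness to five hidden reference vegetables and normalizes the scores into shares), not a trained neural network. The reference values, the sharpness setting and the names classify and probe_until are assumptions made up for the example. The point is only that the prober sees answers, never the numbers behind them.

```python
import math
import random

# Hypothetical "experience" of the box: (size, orangeness) per vegetable.
# The person probing the box is assumed NOT to see these values.
_HIDDEN_PROTOTYPES = {
    "pumpkin": (0.90, 0.85),
    "zucchini": (0.45, 0.10),
    "yam": (0.35, 0.60),
    "carrot": (0.30, 0.95),
    "potato": (0.20, 0.30),
}


def classify(size, color, sharpness=20.0):
    """Return each vegetable's share of the decision; shares sum to 1."""
    scores = {
        name: math.exp(-sharpness * ((size - s) ** 2 + (color - c) ** 2))
        for name, (s, c) in _HIDDEN_PROTOTYPES.items()
    }
    total = sum(scores.values())
    return {name: value / total for name, value in scores.items()}


def probe_until(target, rng, max_queries=10_000):
    """Guess inputs at random until the box's top answer is `target`.

    Returns the number of queries used, the winning input and its shares.
    """
    for queries in range(1, max_queries + 1):
        size, color = rng.random(), rng.random()
        shares = classify(size, color)
        if max(shares, key=shares.get) == target:
            return queries, (size, color), shares
    raise RuntimeError("target was never produced")


queries, guess, shares = probe_until("pumpkin", random.Random(1))
print("queries needed:", queries)
print("winning guess:", guess)
print("shares:", shares)
```

Even a winning guess usually comes back as a blend of several vegetables rather than a clean "pumpkin". With only two inputs and five outputs, blind guessing succeeds fairly quickly; the search gets far harder as the number of inputs and possible answers grows.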

You can see, also, that a finite range of input values and a finite range of output expectations are necessary for the system to work. There are about a hundred different types of vegetable in the average supermarket, a number (and diversity of colors and sizes) that would overwhelm our example neural network. If taxed with the identification of so great a diversity of vegetables, a neural network as in our example would take a very long time to train and would experience a high rate of error.
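A small simulation shows why crowding more vegetables into the same two measurements drives the error rate up. It is a sketch under stated assumptions: a nearest-reference-point classifier stands in for the neural network, the vegetables' typical (size, color) values are laid out on an evenly spaced grid, and each measurement is blurred by a made-up noise level of 0.08. The function names are hypothetical.

```python
import math
import random


def grid_prototypes(n_classes):
    """Place n_classes reference points on an even grid in the unit square."""
    side = math.ceil(math.sqrt(n_classes))
    step = 1.0 / (side - 1) if side > 1 else 0.0
    points = [(i * step, j * step) for i in range(side) for j in range(side)]
    return points[:n_classes]


def nearest(point, prototypes):
    """Index of the reference point closest to `point`."""
    return min(
        range(len(prototypes)),
        key=lambda k: (point[0] - prototypes[k][0]) ** 2
        + (point[1] - prototypes[k][1]) ** 2,
    )


def error_rate(n_classes, noise=0.08, trials=2000, seed=0):
    """Fraction of noisy samples assigned to the wrong vegetable."""
    rng = random.Random(seed)
    prototypes = grid_prototypes(n_classes)
    wrong = 0
    for _ in range(trials):
        true_class = rng.randrange(n_classes)
        px, py = prototypes[true_class]
        # A real vegetable never matches its textbook size and color exactly.
        sample = (px + rng.gauss(0, noise), py + rng.gauss(0, noise))
        if nearest(sample, prototypes) != true_class:
            wrong += 1
    return wrong / trials


for n in (5, 25, 100):
    print(n, "vegetables -> error rate", error_rate(n))
```

With five vegetables the reference points sit far apart and ordinary variation rarely causes a mix-up. With a hundred, neighboring vegetables overlap in size and color and mistakes become common.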

Consider the task faced by the Google Spider, now that you understand that its operating behavior strongly resembles a neural network. We have already established and explained why the most efficient and best performing neural networks are those where both the set of possible input values and the set of expected (desired) conclusions are finite and pre-defined. Unfortunately for the Spider, though its input value set is based on a finite number of variables (such as anchor text, tags, link context, etc.), it may encounter these variables in a virtually infinite number of possible combinations.

If it were the function of the Spider only to identify and categorize pages, sites and their constituent elements, its job would be much easier than it is. This was its original function, and this basic task remains an important part of what the Spider does. Indeed, the vast majority of "SEO advice" given to Webmasters relates to making one's pages and sites readily classifiable. And likewise, the bulk of SEO spam activity relates to trying to obtain false classifications.

Following the traditional rules still yields rewards, such as basic inclusion in the index, but does little to bring your sites and pages any "special" consideration from the Spider. That your visibility within the index should be a function of who links to you and how much they link to you is absolutely correct. However, it is lamentable that SEO Science should give up the ghost to pure Content Merit. What I mean is, once you have properly constructed your pages, dotted all your i's and crossed your t's, there's little left in your hands (short of making the most link-worthy pages on the subject -- lol) that you as a Webmaster can do to Deservingly Improve your treatment by the Spider.

HOW IN THE WORLD COULD YOU DESERVINGLY IMPROVE YOUR TREATMENT BY THE SPIDER?

The answer to that question relates to the Spider's function beyond mere identification and classification. The question might also be re-phrased:

HOW CAN I APPEAL TO THE SPIDER, MAKE GOOD PAGES FOR USERS AND REDUCE MY RELIANCE ON OTHER WEBMASTERS LINKING TO ME?

AN ALLEGORICAL TANGENT

Unlike the simple veggie-sorting example above, which we know must reject some number (in this case the vast majority) of vegetables in order to make accurate decisions about just a few, the Google Spider Neural Network has no such luxury. The possible input set approaches infinity. So does the output set. But, like a blog whose 6 or 7 "templates" can be used to create an enormous number of page instantiations when the server is called, the Google Search Engine is now "making decisions" about pages and sites when asked.

Links, tags, anchor text, etc. comprise a language of nearly infinite scope. In the narrow-minded pursuit of their own narrow objectives, so-called legit SEOs and spammers are crowded into a narrow band of the Spider's language capability, all trying to convince the Spider of this thing or that thing, by telling it what they think it wants to hear. LOL!

GO SPATZIEREN


A 2HP Site