
After reading only the most basic description of a neural
network, it's obvious to me that the Google Spider is an "intelligence"
that is now operating on this basis. Trustrank (the marriage of mathematics
and subjectivity) has replaced prior, purely mathematical, methods for
determining the "merit" of pages and sites. This subjectivity
was imparted to the Spider's functioning by an initial human "seeding"
of data, during which the Spider was basically told by its "teachers"
-- "This is what good looks like". This is exactly the same
as the initial "training" required of neural networks before
they can be left to their own devices, to proceed down a pathway of self-education
and ever-increasing effectiveness in the performance of their pre-defined
objectives.
The process by which this initial "training" is worked back through
the neural network, adjusting its internal connections after each mistake, is called back
propagation. In the example of a basic neural network illustrated
below, whose job it is to identify vegetables based on physical attributes,
we know that some initial set of associations must have been provided,
to prime the network. If no one had told the
neural network that potatoes are brown and smaller than pumpkins, for example
(and referenced a few real-world examples), it could never have undertaken
its task.
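
The priming described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the training examples, the 0-to-1 scales for size and color, the three hidden units and the learning rate are all invented here, and the network only tells potatoes from pumpkins. It is not the network in the illustration, and it says nothing about how any search engine is actually built.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical "This is what good looks like" examples: (size, color) -> label.
# size: 0.0 = tiny, 1.0 = huge; color: 0.0 = brown, 1.0 = orange.
# label: 0 = potato, 1 = pumpkin. All values are invented for illustration.
EXAMPLES = [
    ((0.10, 0.10), 0), ((0.15, 0.20), 0), ((0.20, 0.05), 0), ((0.25, 0.15), 0),
    ((0.80, 0.90), 1), ((0.90, 0.85), 1), ((0.85, 0.95), 1), ((0.95, 0.80), 1),
]

def forward(net, size, color):
    """Input layer -> hidden layer -> output layer."""
    w_hidden, w_out = net
    h = [sigmoid(w[0] * size + w[1] * color + w[2]) for w in w_hidden]
    out = sigmoid(sum(wo * hj for wo, hj in zip(w_out, h)) + w_out[-1])
    return h, out

def train(examples, hidden=3, epochs=3000, rate=0.5, seed=0):
    rng = random.Random(seed)
    # Each hidden unit holds [weight for size, weight for color, bias].
    w_hidden = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(hidden)]
    # One output weight per hidden unit, plus a bias in the last slot.
    w_out = [rng.uniform(-1, 1) for _ in range(hidden + 1)]
    net = (w_hidden, w_out)
    for _ in range(epochs):
        for (size, color), label in examples:
            h, out = forward(net, size, color)
            # Error at the output unit (cross-entropy loss, sigmoid output).
            err_out = out - label
            # Back propagation: push that error back to each hidden unit.
            err_hidden = [err_out * w_out[j] * h[j] * (1.0 - h[j])
                          for j in range(hidden)]
            for j in range(hidden):
                w_out[j] -= rate * err_out * h[j]
                w_hidden[j][0] -= rate * err_hidden[j] * size
                w_hidden[j][1] -= rate * err_hidden[j] * color
                w_hidden[j][2] -= rate * err_hidden[j]
            w_out[-1] -= rate * err_out
    return net

def pumpkin_score(net, size, color):
    """Score between 0 and 1; above 0.5 means the network says pumpkin."""
    return forward(net, size, color)[1]

net = train(EXAMPLES)
print(pumpkin_score(net, 0.12, 0.10))  # small and brown
print(pumpkin_score(net, 0.90, 0.90))  # large and orange
```

With these made-up examples the first score should land below 0.5 and the second above it. Note that nothing in the trained weights reads as "potatoes are brown". The rule exists only as a pile of numbers.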

As in human intelligence and human education, the rate
of back propagation is a function of the complexity of the required task
and the time/effort given to the training of the neural network for that
task. If all goes as planned, once the training of the network is completed,
it gets smarter with the passage of time, and it gets better in the performance
of its task, requiring no additional human intervention. Once a neural
network is up and running, human contact happens only at the "input
layer" (where data enters the system for processing) and at the "output
layer" (where results data, or "answers" are collected).
What happens in between is called the "hidden layer" and cannot
readily be measured or verified from the outside.
The ways in which neural networks operate internally
are fascinating, but beyond the intended scope of this paper.
What will concern us in this paper is what it means to
Joe Webmaster that the Google Spider is a neural network,
assuming Joe is concerned with creating pages and sites the Google Spider
will like.
Many webmasters are frustrated right now, because they
don't understand how the Spider is working. The problem is that most people
trying to curry the Spider's favor are focused on what is happening inside
the "hidden layer" of its neural network. When the functioning
of the Google Search Engine relied entirely on mathematical algorithms
and conventional (sequential) computing, "figuring the Spider out"
was just a matter of math and computer science.
Previously, the Spider could not escape the fact it was
essentially functioning as a calculator, performing arithmetical functions
and yielding results which provided the formulaic basis of Pagerank. For
this reason, it was possible to "fool" the Spider. To obtain a PR5, for
example, you only had to reproduce the arithmetical results the Spider had been
told should yield a PR5. There were other ways SE spammers
obtained undeserved treatment for their pages, sites and images. All operated
on the Spider's arithmetical vulnerability. That has all changed now,
and where once there was decipherable mathematics, replicable precision
and, therefore, the ability to reverse engineer for favorable Google treatment,
now there's just a lot of "fuzz".
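
To see why the old arithmetic could be reverse engineered, here is a minimal sketch of a Pagerank-style calculation. It follows the commonly published form of the formula (a damping factor of 0.85, with each page sharing its rank equally among its outbound links). The four-page link graph and the function name are invented for illustration, and nothing here is claimed to match what Google actually ran.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified Pagerank by repeated arithmetic over a link graph.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small baseline.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank over everyone.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical four-page web: a, b and d all link to c; c links back to a.
LINKS = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}

for page, score in sorted(pagerank(LINKS).items()):
    print(page, round(score, 4))
```

Because the same links always yield the same numbers, anyone who knew the formula could work backwards from a desired score to the links needed to produce it. Page c comes out on top here simply because three pages point at it.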
Let's go back to the simple "veggie sorting"
neural network example. Whoever "trained" this neural network
had to establish a number of "exemplary associations", linking
a finite set of input values (in this case, physical attributes along
two dimensions -- size and color) with a finite set of possible vegetable
"conclusions". If the variety of vegetables passing through
the system were too great in number, the network's ability to properly
identify them would decrease. Likewise, it would be insane to expect the
neural network, once trained on size and color alone, to begin "sniffing" veggies with
any usefulness.
Now imagine that you are trying to fool
this system, as many people try to fool the Google Spider. Say, for example,
you are trying to produce "pumpkin" as a result. This gets a
little tough. Even though you know what the physical attributes of a pumpkin
are (including its color and average size), you don't know ANYTHING about
the training the neural network received, which allows it to "decide"
when a vegetable is a pumpkin or is not. Remember that direct
perception of fact -- as in, "That's a pumpkin because I
know what pumpkins are..." -- is NOT possible.
The neural network in this case is deciding when it has a pumpkin based
on size and color comparisons made against a set of other vegetables,
and a unique history of failures and successes -- neither of which you
would know if you were trying to pull a fast one. Even if you knew that
the neural network's "experience" consisted only of pumpkins,
zucchinis, yams, carrots and potatoes, it would take you some time to
achieve your desired result, the mathematical expression of which might
contain such values as 45% carrot and 225% potato.
You can see, also, that a finite range of input values
and a finite range of output expectations are necessary for the system
to work. There are about a hundred different types of vegetable in the
average supermarket, a number (and diversity of colors and sizes) that
would overwhelm our example neural network. If taxed with the identification
of so great a diversity of vegetables, a neural network as in our example
would take a very long time to train and would experience a high rate
of error.
Consider the task faced by the Google Spider, now that
you understand that its operating behavior strongly resembles a neural
network. We have already established and explained why the most efficient
and best performing neural networks are those where both
the set of possible input values and the set of expected (desired) conclusions
are finite and pre-defined. Unfortunately
for the Spider, though its input value set is based on a finite number
of variables (such as anchor text, tags, link context,
etc.), it may encounter these variables in a virtually infinite
number of possible combinations.
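
A bit of arithmetic shows how quickly a finite set of variables turns into an unmanageable number of combinations. The counts below are purely hypothetical placeholders (no one outside Google knows how many distinct values each signal can take). The point is only that the totals multiply.

```python
import math

# Hypothetical number of distinct values each signal might take.
VALUES_PER_SIGNAL = {"anchor text": 1000, "tags": 500, "link context": 200}

def combination_count(values_per_signal):
    """Number of distinct input combinations if every signal varies independently."""
    return math.prod(values_per_signal.values())

print(combination_count(VALUES_PER_SIGNAL))  # 1000 * 500 * 200 = 100000000

# Adding just one more hypothetical signal multiplies the total again.
print(combination_count({**VALUES_PER_SIGNAL, "page title": 50}))  # 5000000000
```

Three modest signals already give a hundred million combinations, and a fourth takes it to five billion.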
If it were the function of the Spider only to
identify and categorize pages, sites and their constituent elements,
its job would be much easier than it is. This was its original function,
and this basic task remains an important part of what the Spider does.
Indeed, the vast majority of "SEO advice" given to Webmasters
relates to making one's pages and sites readily classifiable.
And likewise, the bulk of SEO spam activity relates to trying to obtain
false classifications.
Following the traditional rules still yields rewards,
however, such as basic inclusion in the index -- but does little to bring
your sites and pages any "special" consideration from the Spider.
That your visibility within the index should be a function of who links
to you and how much they link to you is absolutely correct. However, it
is lamentable that SEO Science should give up the ghost to pure Content
Merit. What I mean is, once you have properly constructed your
pages and dotted all your i's and crossed your t's, there's little left
in your hands (short of making the most link-worthy pages on the subject
-- lol) that you as a Webmaster can do, to Deservingly Improve
your treatment by the Spider.
HOW IN THE WORLD COULD YOU DESERVINGLY
IMPROVE YOUR TREATMENT BY THE SPIDER?
The answer to that question relates to the Spider's function
beyond mere identification and classification. The question might also
be re-phrased:
HOW CAN I APPEAL TO THE SPIDER, MAKE GOOD PAGES FOR USERS
AND REDUCE MY RELIANCE ON OTHER WEBMASTERS LINKING TO ME?
Unlike the simple veggie-sorting example above, which
we know must reject some number (in this case the vast
majority) of vegetables, in order to make accurate decisions about
just a few, the Google Spider Neural Network has no such luxury.
The possible input set approaches infinity. So does the output set. But,
like a blog whose 6 or 7 "templates" can be
used to create an enormous number of page instantiations when
the server is called, the Google Search Engine is now
"making decisions" about pages and sites when asked.
Links, tags, anchor text, etc. comprise a language
of nearly infinite scope. In the narrow-minded pursuit of their own limited
objectives, so-called legit SEOs and spammers are crowded into a narrow
band of the Spider's language capability, all trying to convince the Spider
of this thing or that thing, by telling it what they think it wants to
hear. LOL!