------------------------------------------------------------------------
BIOWULF'S MATHEMATICAL TOOLS:
-----------------------------
MACHINE LEARNING AND THE "SVM" METHOD
-------------------------------------
(PART 2 OF 3)
-------------
screenplay by
Alan B. Scrivener
26 May 2001
first draft
------------------------------------------------------------------------
34. CARD YELLOW LETTERS ON BLUE BACKGROUND:
combinatorial explosion
NARRATOR (CONTINUING)
To understand the combinatorial explosion, let's take a
momentary diversion into the world of subsets.
35. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND
NARRATOR (CONTINUING)
First lets consider some small sets and their subsets. If you
have a set with only one thing in it, there are two possible
subsets: the original set with one thing in it and the
empty set, which you will recall is the name for a set with
nothing in it.
36. ANIMATION A-2
NARRATOR (CONTINUING)
This animation is cycling through these two options.
One thing, then nothing. Or if you prefer, nothing,
then one thing. The sphere is the one object, and
we use red to show not being in the set and green to
show being in the set. Just like red means stop and
green means go at a traffic light, here red means "no"
and green means "yes."
So far with this example it is no problem to just breeze
through all the combinations.
37. ANIMATION A-3
NARRATOR (CONTINUING)
Next consider a set of two objects. There are four subsets:
no objects, one object, the other object, or both objects.
We highlight the connection between them to emphasize that
they are in a subset together.
38. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND
NARRATOR (CONTINUING)
So one object had 2 subsets, while 2 objects had 4 subsets.
What do you think, will 3 objects have 6 subsets?
39. ANIMATION A-4
NARRATOR (CONTINUING)
As you can see in this animation, there is one way to take
no things, three ways to take one thing, three ways to
take two things, and one way to take three things.
That adds up to eight subsets.
40. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND
NARRATOR (CONTINUING)
Eight? The sequence goes 2, 4, 8 so far. Is it doubling
every time? Let's see.
41. ANIMATION A-5
NARRATOR (CONTINUING)
Sure enough. When there are four objects, there is
one way to have no thing, four ways to have one thing,
six ways to have two things -- four around the square
and the two diagonals -- four ways to have three things
and one way to have four things.
42. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND
NARRATOR (CONTINUING)
This is getting boring. You know why? Because it's
taking so long! It is doubling every time. That was
sixteen subsets, and the next one, seven objects, would
be 32 subsets. So we're not going to do that one. At the
2 frames per second we've been using, it would take over
a minute. We promised this would be short.
So let's jump ahead. What if we have 30 objects?
That's not a very big number. A classroom of 30
students is a common thing. How many possible
clubs could you form out of a classroom of thirty
students? If we try to count them up I don't think
we'll make it.
43. ANIMATION A-6
| * * * |
NARRATOR (CONTINUING)
It starts out easy -- there's one way to have a club
with no students. Thirty ways to have a club with one student.
335 ways to have a club with 2 students. This is going
to take a while. Almost 3 minutes just for the clubs of two
students. While we're watching the animation run, let me talk
about how we figure this out. There's 30 ways to pick the
first student, and because they can't be in a club twice you
exclude them from the second choice, and have 29 ways to pick
the second student. But that means we've counted each club twice,
so the formula is 30 times 29 divided by 2, which is how we get 335.
As you can imagine, this is quite a pain to do for all the sizes of
clubs.
Luckily there's another way. If it doubles each time, the
general rule is that the number of subsets is just 2 to the
power of the number of objects. Call this N for number of
objects. The number of subsets is 2 raised to the N power.
2 raised to the 30th power is over a billion (that's American
billion, 1 followed by 9 zeroes). At this frame rate it will take us
17 years to watch this animation. So get comfy!
Just kidding.
[ANIMATION IS INTERRUPTED]
44. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND
NARRATOR (CONTINUING)
Now you can see why we call it a combinatorial explosion.
Remember, we're dealing with data with up to 22,000 things to
correlate in subsets. Long before we get up that high
we'll be dealing with more susbsets than there are particles
in the known universe. People like to use that phrase a lot:
"more than the number of particles in the known universe."
But that's just peanuts to the combinatorial explosion. These
numbers are staggeringly bigger than that!
Allow me to quote from Douglas Adams, in The Hitchhiker's
Guide to the Galaxy.:
[HOLDS UP BOOK]
"Space is big. Really big. You just won't believe how
vastly hugely mind-bogglingly big it is. I mean, you may
think it's a long way down the road to the chemist, but
that's just peanuts to space."
Well, as you know, space is mostly devoid of matter -- the
known universe is mostly empty space. But imagine if the same
sized known universe was full of particles, as many as it could
hold. The number of subsets of 22,000 objects is mind-numbingly
bigger than the number of particles the universe could hold.
So my point, and I do have one, is that because of the
combinatorial explosion there is no way we are going
to try all possible subsets of what we are calling the
columns or the dimensions of data. So, we have to do
something else. Something smarter.
Now that we've returned from our diversion through the world
of subsets, our next concept is...
45. CARD YELLOW LETTERS ON BLUE BACKGROUND:
correlation
NARRATOR (CONTINUING)
...correlation. One of the most important questions
people can ask is, "why?" We want to establish causality.
The first step in that process is to establish a correlation.
46. CARD: FIGURE F-7

NARRATOR (CONTINUING)
In this state space plot all of the data points occur
pretty much on a line. This shows a strong correlation.
Low values of one variable always seem to be associated
with low values of the other, and high values of one
variable always seem to be associated with high values
of the other.
What we are hoping to do is to find a small number of
columns among the 22,000 that are highly correlated
with the decision function, so we can throw away the rest.
47. CARD YELLOW LETTERS ON BLUE BACKGROUND:
orthogonal
NARRATOR (CONTINUING)
This is related to the idea of being orthogonal.
Now don't get confused. The word orthogonal is also
used in geometry to mean at right angles. A flagpole
is orthogonal to the ground. But here it has a different,
more subtle meaning. It means roughly the opposite of
correlated. Two variables are orthogonal if they are
completely uncorrelated, if they don't affect each other
at all. This means there's no causality between them.
48. CARD: FIGURE F-8

NARRATOR (CONTINUING)
If you make a state space plot -- also called a phase
plot -- of two orthogonal variables, it will look like a
random scatter. These variables were actually plotted
by rolling dice, and you can see how uncorrelated they are.
49. CARD YELLOW LETTERS ON BLUE BACKGROUND:
linear
NARRATOR (CONTINUING)
Our next concept is linear.
50. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND
NARRATOR (CONTINUING)
This is a very important concept but it comes with a
lot of baggage. The term "non-linear" comes around
every 20 years or so and makes a big splash. It got a
lot of attention in the late 70s and early 80s with
the beginnings of chaos theory: a whole zoo of self-similar
fractals, strange attractors and chaotic bifurcations.
Unfortunately, an awful lot of people used the terms
linear and non-linear without really understanding what
they meant.
Consider this simple linear equation: y = ax - b
51. CARD YELLOW LETTERS ON BLUE BACKGROUND:
y = ax - b
NARRATOR (CONTINUING)
Remember that x is the variable and a and b are constants.
This is a linear equation because the most complicated term
is 'a' times 'x,' a linear relationship.
52. ANIMATION A-7
[Note: In this animation the narrator's finger should be
represented on a dial or knob, or else editing should be
used to incorporate live action of the narrator moving
a mouse.]

NARRATOR (CONTINUING)
The graph of this equation looks like this. You can see
that it is a straight line no matter what values 'a' and
'b' have. You may recall the idea of slop and intercept
from algebra. 'a' controls the slope of the line, and
'b' controls the Y intercept, the place where the line
intersects the vertical axis here; we show a blue sphere
at that point.
53. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND
NARRATOR (CONTINUING)
This shows how a linear equation of one variable looks.
(By the way, in higher mathematics 'x' could also be a
vector or a function.)
54. CARD: FIGURE F-9
[THE SUM OF TWO LINEAR FUNCTIONS]

NARRATOR (CONTINUING)
An almost magical property of linear equations is
that if you add two or more of them in any linear
combination you always get another linear equation.
Here we show visually how two linear equations are
added to produce a third.
55. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND
NARRATOR (CONTINUING)
A lot of work has been done over a long period of time
coming up with techniques and tricks for solving linear
equations. At this point they represent a sort of
double-edged sword. Whenever a problem can be reduced
to a linear form it becomes much easier to analyze and
solve. But most real-world systems cannot be described
by linear equations. So we much chose between ease of
solution and accuracy.
In fact, saying linear equations are easier to solve is
a vast understatement. Consider this chart from the book
General System Theory by Ludwig von Bertalanffy, 1968.
The chapter is called "The Difficulty of Solving Equations."
56. TABLE T-5
|
Table 1.1
|
| . | . | . | Linear | . | . | . | . | Nonlinear | . |
|---|---|---|---|---|---|---|---|---|---|
| Equation Type: | One Equation |
. | Several Equations |
. | Many Equations |
. | One Equation |
Several Equations |
Many Equations |
| Algebraic | Trivial | . | Easy | . | Essentially Impossible | . | Very Difficult | Very Difficult | Impossible |
| Ordinary Differential | Easy | . | Difficult | . | Essentially Impossible | . | Essentially Impossible | Impossible | Impossible |
| . | . | . | x | . | . | . | . | . | . |
| Partial Differential | Difficult | . | Essentially Impossible | . | Essentially Impossible | . | Impossible | Impossible | Impossible |
* Courtesy of Electronic Associates, Inc.
NARRATOR (CONTINUING)
Take a moment to look at this. Down the left column are
listed types of equations. Of the other two columns to
the right, one is for linear and the other for non-linear
equations. [PAUSE]
Notice that all the non-linear equations are very
difficult or worse. Even the linear equations are pretty
bad. Everything to the right of the thick black line is
pretty useless.
57. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND
Here's what the author had to say in his own dryly
humorous style: "[these equations are], if linear,
tiresome to solve even in the case of a few variables;
if nonlinear, they are unsolvable except in special cases."
So, to paraphrase an old song, we will use linear
equations when we can and non-linear equations when we must.
58. CARD YELLOW LETTERS ON BLUE BACKGROUND:
weights
NARRATOR (CONTINUING)
Building on the concept of linearity, our next
concepts is weights.
59. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND
NARRATOR (CONTINUING)
Our best hope is that we can produce a linear decision
function, with weights for each data dimension. We
use the term weights to mean a set of constants, like
'a' in the last example, only a vector of them. We call
this vector 'w' for weights, and it consists of a set of
constants named w-sub-1 through w-sub-22,000 or whatever.
So our decision function d becomes this.
60. CARD YELLOW LETTERS ON BLUE BACKGROUND:
vector form
D = w.x
expanded
D = w1 x1 + w2 x2 + w3 x3 +
NARRATOR (CONTINUING)
We use bold in the vector form to show that w and x are vectors.
The w's and x's in the expanded form are the components of the
vectors.
61. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND
NARRATOR (CONTINUING)
We can visualize the x vectors as a bunch of strips of cloth
tied onto a string to become the tail of a kite. The land on
top of the weight vectors and as you see in this animation...
62. ANIMATION A-8
[Note: Again, in this animation the narrator's finger should be
represented on a dial or knob, or else editing should be
used to incorporate live action of the narrator moving
a mouse.]

NARRATOR (CONTINUING)
... each weight -- the red bars -- multiplies by its
corresponding x -- the blue bars -- to make a weighted result
-- the purple bars -- which are then all added together to make
the decision function. The only challenge is to choose the
values of the w's correctly.
63. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND
NARRATOR (CONTINUING)
Of course, that's the whole game. Assuming you can live
with a linear result, and we know how much grief that will
save us, the decision function we just gave is the holy
grail of machine learning. How do you set the weights?
From this point we'll be looking at ways that Biowulf's
mathematicians solve this problem. The first important tool is...
64. CARD YELLOW LETTERS ON BLUE BACKGROUND:
recursion
NARRATOR (CONTINUING)
... recursion.
65. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND
NARRATOR (CONTINUING)
This is actually an idea that is actually easier
to understand than it is to explain. Perhaps this old
Mad Magazine cover will help.
66. CARD: FIGURE F-10

NARRATOR (CONTINUING)
As you can see, recursion is a sort of magic trick
that seems to create something out of nothing. But
it actually just involves use of the magic words
"and so on."
In this case, Alfred E. Neuman is holding a hat that
contains a rabbit, and the rabbit is holding a hat that
contains a smaller Alfred, and so on. In this case the
"and so on" only goes two levels down, but it's easy to
see how it could go any number of levels.
67. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND
NARRATOR (CONTINUING)
So, armed with that concept we move on to...
68. CARD YELLOW LETTERS ON BLUE BACKGROUND:
recursive feature elimination
NARRATOR (CONTINUING)
... recursive feature elimination.
69. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND
NARRATOR (CONTINUING)
This is one of the methods that Biowulf's mathematicians have
refined. Perhaps the best way to explain it is to compare it
to that recent hit TV show in the United States -- Survivor.
Rather than trying to evaluate each contestant's survival
skills alone, they tested them as a team. This may not be
the way to identify the best survivor, but it makes it pretty
easy to identify the worst.
Similarly, it is easier to analyze a set of columns of
data in our chart to see which is least useful for deciding
the decision function. Then that column, or feature as it
is called here, is eliminated and the same procedure is
done on the remaining group, and so on. There's
those magic words, which make it recursive feature
elimination.
previous page
next page
Last update 26-May-2001 10:41 PM by ABS.