------------------------------------------------------------------------
BIOWULF'S MATHEMATICAL TOOLS:
-----------------------------
MACHINE LEARNING AND THE "SVM" METHOD
-------------------------------------
(PART 3 OF 3)
-------------
screenplay by
Alan B. Scrivener
26 May 2001
first draft
------------------------------------------------------------------------
70. CARD YELLOW LETTERS ON BLUE BACKGROUND:
support vectors
NARRATOR (CONTINUING)
Now we get to the fundamental tool used by BIOwulf's
mathematicians: support vectors.
71. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND
NARRATOR (CONTINUING)
And it's only fair to warn you, this is the weird part.
What's weird about it is that it involves higher dimensions.
Higher even than the 22,000 we mentioned before.
But first, let me clear up some possible confusion about
terminology. We've been throwing a lot of specialized
mathematical terms around here, sometimes with subtle
distinctions. Just like Eskimos have many words for snow,
mathematicians have many words for dimensions.
72. CARD: TABLE T-6
| rat # | Carb. | Fat | Cholest. | Sodium | Protein | Vitamin A | Vitamin C | Calcium | Iron | +/- |
|---|---|---|---|---|---|---|---|---|---|---|
| rat #1 | 0 g | 0.5 g | 30 mg | 200 mg | 13 g | 0% | 0% | 0% | 2% | - |
| rat #2 | 20 g | 0 g | 0 mg | 10 mg | 1 g | 0% | 20% | 0% | 2% | - |
| rat #3 | 25 g | 2 g | 0 mg | 45 mg | 0 g | 0% | 0% | 0% | 2% | - |
| rat #4 | 4 g | 0 g | 0 mg | 0 mg | 1 g | 4% | 8% | 2% | 4% | + |
NARRATOR (CONTINUING)
Recall our table of which nutrients the rats were fed. We
called the rows in this table data points, points in the
state space, vectors, and sometimes test data or training
data. We called the columns in this tables variables,
dimensions and features. They're dimensions when we want
to represent them in a state space, and features when we
want to eliminate them with recursive feature elimination.
Now we're going to call them dimensions again as we do some
math tricks with them.
73. ANIMATION A-9
[NOTE: THIS FOOTAGE HAS ALREADY BEEN PRODUCED BY BEOWULF]

NARRATOR (CONTINUING)
Here is a 2D state space plot of some data. We'd like to
be able to find a linear function decision function that
cleanly separates the +'s from the -'s. That would
mean finding a line that separates them in this plot, but
there isn't one.
Now you can see as the animation rotates to show a third
dimension, that in a 3D state space plot there is a linear
decision function that cleanly separates the space, which
means in 3D that a plane can separate them, and you see
that one can.
74. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND
NARRATOR (CONTINUING)
But what if 3 dimensions doesn't cut it? Well, we
can just add more. The problem is, as we've mentioned,
we can't you what that looks like. The best we can do
is use some toothpicks and some miniature marshmallows
to evoke the idea.
Here I have one marshmallow. It stands for a one
dimensional point. It has -- humor me on this -- no
height, width or depth.
I move the point in some direction and trace out a line
segment. We can build that with a toothpick and another
marshmallow.
[CUT]
75. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND
NARRATOR IS HOLDING MODEL OF LINE SEGMENT MADE OF
TWO MARSHMALLOWS AND A TOOTHPICK
NARRATOR (CONTINUING)
So this represents a 1-dimensional line segment which
has length, but no width or height.
If I move it at right angles the same distance, it
traces out a square. I can build that with two more
marshmallows and three more toothpicks.
[CUT]
76. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND
NARRATOR IS HOLDING MODEL OF SQUARE MADE OF
FOUR MARSHMALLOWS AND FOUR TOOTHPICKS
NARRATOR (CONTINUING)
This represents a 2-dimensional square which has
length and width but no height.
If I once again move it at right angles the same
distance it traces out a cube. I can build a
cube by adding four marshmallows and eight toothpicks.
[CUT]
77. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND
NARRATOR IS HOLDING MODEL OF CUBE MADE OF
EIGHT MARSHMALLOWS AND TWELVE TOOTHPICKS
NARRATOR (CONTINUING)
This represents a three-dimensional cube which has
length, width and height.
Now we'd like to do this trick again but we can't --
we've run out of dimensions. I can even imagine what
the next step would be. There are people who claim to
be able to visualize higher dimensions, but other people
wonder if they really can.
Still, there is something we can do. We can study each
transition from one dimension to the next, and then say
what would happen by analogy.
78. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND
NARRATOR (CONTINUING)
So without a visual model to guide us, we can say: "Imagine
moving the cube at right angles again, the same distance,
and tracing out a hypercube. We 4 dimensions we could
build one of those with eight more marshmallows and
20 more toothpicks."
Now, I have no idea what that means, and we don't have that
extra fourth dimension to try it in. But the point is, a
lot can be figured out by analogy. We know how the math
works at any dimension, up to infinite dimensions, believe
it or not. And luckily, we don;t need to visualize the
problem to solve it -- only to explain it! That's why we're
better at solving it than explaining it.
Okay, I've been jollying you along here with this higher dimension stuff to prepare you for this next statement. It probably won't make perfect sense, but it's the best we can do.
The Support Vector Machine -- SVM -- method, works this way.
It keeps adding dimensions until it can find a hyperplane to
split the state space, which means there is a linear decision
function. Then it finds all the data vectors that are closest
to this hyper-plane. These are called the support vectors.
Then the support vectors are used to do recursive feature
elimination and to create the decision function.
Let me run through that again.
The SVM method keeps adding dimensions until it can
find a hyper-plane -- a higher-dimensional plane -- to split
the state space, which means there is a linear decision function.
Then it finds all the data vectors that are closest to this
hyper-plane. These are called the support vectors. Then the
support vectors are used to do recursive feature elimination
and to create the decision function.
Now one thing that's interesting about this is that we
started out wanting to throw away columns, and we did,
but we also threw away rows! Any vectors which are not
support vectors are thrown away and not used for the
calculating the decision function.
Okay, I know your mind is swirling. Maybe some pictures
will help.
79. CARD: FIGURE F-11

NARRATOR (CONTINUING)
The following images are taken from an interactive Java
Applet written by Hong Zhang of BIOwulf.
Now once again we have the advantages and disadvantages
of being back in 2 dimensions. It's good because we can
see the whole state space at once, but it's bad because
it can't show any of the higher dimensional data.
The way this Applet works is the user places data points in a
white rectangle, black dots for the pluses and open dots for
the minuses. Then the Applet chooses the support vectors
and computes the decision function using the SVM method.
Next the decision function is displayed as color, red
being the low negative values and blue the high positive
values. The data points chosen to be support vectors are
circled.
80. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND
NARRATOR (CONTINUING)
This was for a linear decision function. Here's how it
looks if we allow a non-linear decision function.
81. CARD: FIGURE F-12

NARRATOR (CONTINUING)
You can see that the data won't permit a line to divide
the state space, so a curve is needed. In this cave the
curve is defined by a third order polynomial. This is an
equation with no more complex terms than x cubed.
82. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND
NARRATOR (CONTINUING)
Let's take a look at an animated sequence of these.
83. CARD: FIGURE F-13 (f01.gif)

NARRATOR (CONTINUING)
We started out with a grid of nine data points. The three
upper rightmost were minuses while the rest were pluses.
Three points near the dividing line were chosen by the
Applet to be support vectors, and the decision function
follows a slight curve down the middle, though it bends
more elsewhere.
Next we added another "plus" data point, near the center,
just up and to the right of the center point. We let the
Applet compute again, and...
84. CARD: FIGURE F-14 (f02.gif)

NARRATOR (CONTINUING)
... now you can see the Applet has chosen the new point to
replace the center point as a support vector, and moved the
decision function a little, but there are still only three
support vectors.
85. CARD: FIGURE F-15 (f03.gif)

NARRATOR (CONTINUING)
Now we add a few more data points, this time minuses,
farther to the upper right of the center. We let the
Applet compute once again ...
86. CARD: FIGURE F-16 (f04.gif)

NARRATOR (CONTINUING)
... and now you can see the Applet has added one of the new
points as a support vector, and dropped the one in the upper
left, but still has only three of them. Of course it has
moved the decision function again.
87. CARD: FIGURE F-17 (f05.gif)

NARRATOR (CONTINUING)
Here's another positive data point added...
88. CARD: FIGURE F-18 (f06.gif)

NARRATOR (CONTINUING)
... and it gets chosen to be a support vector. The decision
function nows jumps more dramatically.
89. CARD: FIGURE F-19 (f07.gif)

NARRATOR (CONTINUING)
Here's another positive data point added...
90. CARD: FIGURE F-20 (f08.gif)

NARRATOR (CONTINUING)
... and it gets chosen to be a support vector. The decision
function is tightening up as we give it finer and finer distinctions.
91. CARD: FIGURE F-21 (f09.gif)

NARRATOR (CONTINUING)
Now we've hopped down to region lower and right of center,
and put another negative data point near the dividing line.
92. CARD: FIGURE F-22 (f10.gif)

NARRATOR (CONTINUING)
... and it gets chosen to be a support vector. But note that
now there are four circled points -- four support vectors.
The support vector machine needs more data points to define
the decision function.
93. CARD: FIGURE F-23 (f11.gif)

NARRATOR (CONTINUING)
Here's another negative data point added, advancing into
what was clearly positive territory.
94. CARD: FIGURE F-24 (f12.gif)

NARRATOR (CONTINUING)
... and it gets chosen to be a support vector, replacing the
previous point. But notice that the positive data point in
the lower right has also bbeen made a support vector, totaling
five. The decision function is really tightening up now.
95. CARD: FIGURE F-25 (f13.gif)

NARRATOR (CONTINUING)
Once again we add another negative data point in what was
previously positive territory.
96. CARD: FIGURE F-26 (f14.gif)

NARRATOR (CONTINUING)
Woah! The decision function just got wider and more curved!
And the number of support vectors has increased again --
now there are six. Each I time I add a point in the wrong
territory I'm violating its assumptions, so it has to make
a subtler decision function.
97. CARD: FIGURE F-27 (f15.gif)

NARRATOR (CONTINUING)
Okay, we add one last negative data point the same way.
98. CARD: FIGURE F-28 (f16.gif)

NARRATOR (CONTINUING)
... and it gets chosen to be a support vector, along with
another point -- now there are eight!. The decision
function has twisted and widened to accommodate the more
stringent data.
99. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND
NARRATOR (CONTINUING)
Well, we could go on like this all day. It is fascinating
to watch, and even more fascinating to interact with the
Applet yourself.
But now lets look at something you don't see when you
play with the Applet. The following animation shows
those frames we just went through speeded up, with the
compute time removed.
100. ANIMATION A-10

NARRATOR (CONTINUING)
It's a thing of beauty. Sometimes the eye can pick out
patterns in motion that we don't see any other way.
Well, enough of that.
101. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND
NARRATOR (CONTINUING)
I hope this is giving you a sense of what the Support
Vector Machine method does. Unfortunately, we can show
you 2 dimensions but all the real benefits occur with
much higher dimensions, which we can't show you.
So we've decided to do what the folks at Hollywood
special effects houses do -- make something up. The
following animation isn't designed to be an exact
mathematical representation, but instead is an "artist's
conception."
102. ANIMATION A-11
![]()
[ARTISTS CONCEPTION OF SVM PROCESS]
NARRATOR (CONTINUING)
The sort of flowering of the data represents going
into higher dimensions... and then you see it compacting
down again when the support vectors are found.
103. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND
NARRATOR (CONTINUING)
Well, I'm sure some of this made perfect sense, and
some of it seemed like mumbo jumbo. It's that way to
most of us, too. But the proof, they say, is in the pudding.
Now that we've done a survey of the mathematical tools
used by BIOwulf, let's ask the most important question:
104. CARD YELLOW LETTERS ON BLUE BACKGROUND:
Does it work?
NARRATOR (CONTINUING)
Well, the short answer to this is: it sure does!
In the paper "Gene Selection for Colon Classification
using Support Vector Machines by Isabelle Guyon, Jason
Weston, Stephen Barnhill, and Vladimir Vapnik, the story
is told of these 62 biopsies samples we mentioned earlier.
They report that SVM is better than the next best known
technique with 95.8% confidence. Furthermore, the SVM
used 8 instead of 32 genes to make the classification,
after recursive feature elimination, and still had a
lower error rate.
Remember that this was from frozen tissue samples that
were combined with 30 years of follow-up patient data.
This stuff can save lives!
It was also interesting to find that other methods
falsely associated the presence of skin cell genes
with cancerous cells, just because cancerous tissue
samples tended to have more skin cells while healthy
samples had more muscle cells. The SVM method did not
make this false correlation. Also, the SVM method
did correlate lack cancer with the presence
of genes from a tiny parasite, a protozoa named
Trypanosoma, indigenous to Africa and South America.
To quote the paper: "We thought this may be an anomaly
of our gene selection method or an error in the data,
until we discovered that clinical studies report that
patients infected by Trypanosoma -- a colon parasite
-- develop resistance against colon cancer."
Clearly we need to put this new tool to work on medical diagnosis
right away!
Thank you for your time in viewing this explanatory video.
For more information about BIOwulf, ______.
previous page
Last update 26-May-2001 09:41 PM by ABS.