------------------------------------------------------------------------

                      BIOWULF'S MATHEMATICAL TOOLS:
                      -----------------------------

                  MACHINE LEARNING AND THE "SVM" METHOD
                  -------------------------------------

                              (PART 2 OF 3)
                              -------------

                              screenplay by

                            Alan B. Scrivener

                                                             26 May 2001
                                                             first draft

------------------------------------------------------------------------

34. CARD YELLOW LETTERS ON BLUE BACKGROUND:

   combinatorial explosion

         NARRATOR (CONTINUING)
      To understand the combinatorial explosion, let's take a
      momentary diversion into the world of subsets.

35. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND

         NARRATOR (CONTINUING)

      First lets consider some small sets and their subsets.  If you
      have a set with only one thing in it, there are two possible
      subsets: the original set with one thing in it and the
      empty set, which you will recall is the name for a set with
      nothing in it.

36. ANIMATION A-2



         NARRATOR (CONTINUING)
      This animation is cycling through these two options. 
      One thing, then nothing.  Or if you prefer, nothing,
      then one thing.  The sphere is the one object, and
      we use red to show not being in the set and green to
      show being in the set.  Just like red means stop and
      green means go at a traffic light, here red means "no"
      and green means "yes."

      So far with this example it is no problem to just breeze
      through all the combinations.

37. ANIMATION A-3



         NARRATOR (CONTINUING)
      Next consider a set of two objects.  There are four subsets:
      no objects, one object, the other object, or both objects.
      We highlight the connection between them to emphasize that
      they are in a subset together.

38. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND

         NARRATOR (CONTINUING)
      So one object had 2 subsets, while 2 objects had 4 subsets.
      What do you think, will 3 objects have 6 subsets?

39. ANIMATION A-4



         NARRATOR (CONTINUING)
      As you can see in this animation, there is one way to take
      no things, three ways to take one thing, three ways to
      take two things, and one way to take three things.
      That adds up to eight subsets.

40. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND

         NARRATOR (CONTINUING)
      Eight?  The sequence goes 2, 4, 8 so far.  Is it doubling
      every time? Let's see.

41. ANIMATION A-5



         NARRATOR (CONTINUING)
      Sure enough.  When there are four objects, there is
      one way to have no thing, four ways to have one thing,
      six ways to have two things -- four around the square
      and the two diagonals -- four ways to have three things
      and one way to have four things.

42. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND

         NARRATOR (CONTINUING)
      This is getting boring.  You know why?  Because it's
      taking so long!  It is doubling every time.  That was
      sixteen subsets, and the next one, seven objects, would
      be 32 subsets.  So we're not going to do that one.  At the
      2 frames per second we've been using, it would take over
      a minute.  We promised this would be short.

      So let's jump ahead.  What if we have 30 objects?
      That's not a very big number.  A classroom of 30
      students is a common thing.  How many possible
      clubs could you form out of a classroom of thirty
      students?  If we try to count them up I don't think
      we'll make it.

43. ANIMATION A-6

* * *



         NARRATOR (CONTINUING)
      It starts out easy -- there's one way to have a club
      with no students.  Thirty ways to have a club with one student.
      335 ways to have a club with 2 students.  This is going
      to take a while.  Almost 3 minutes just for the clubs of two
      students.  While we're watching the animation run, let me talk
      about how we figure this out.    There's 30 ways to pick the
      first student, and because they can't be in a club twice you
      exclude them from the second choice, and have 29 ways to pick
      the second student.  But that means  we've counted each club twice,
      so the formula is 30 times 29 divided by 2, which is how we get 335.
      As you can imagine, this is quite a pain to do for all the sizes of
      clubs.

      Luckily there's another way.  If it doubles each time, the
      general rule is that the number of subsets is just 2 to the
      power of the number of objects.  Call this N for number of
      objects.  The number of subsets is 2 raised to the N power.
      2 raised to the 30th power is over a billion (that's American
      billion, 1 followed by 9 zeroes).  At this frame rate it will take us
      17 years to watch this animation.  So get comfy!

      Just kidding.

   [ANIMATION IS INTERRUPTED]

44. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND

         NARRATOR (CONTINUING)
      Now you can see why we call it a combinatorial explosion. 
      Remember, we're dealing with data with up to 22,000 things to
      correlate in subsets.  Long before we get up that high
      we'll be dealing with more susbsets than there are particles
      in the known universe.  People like to use that phrase a lot:
      "more than the number of particles in the known universe." 
      But that's just peanuts to the combinatorial explosion.  These
      numbers are staggeringly bigger than that!

      Allow me to quote from Douglas Adams, in The Hitchhiker's
      Guide to the Galaxy.:

   [HOLDS UP BOOK]

      "Space is big.  Really big.  You just won't believe how
      vastly hugely mind-bogglingly big it is.  I mean, you may
      think it's a long way down the road to the chemist, but
      that's just peanuts to space."

      Well, as you know, space is mostly devoid of matter -- the
      known universe is mostly empty space.  But imagine if the same
      sized known universe was full of particles, as many as it could
      hold.  The number of subsets of 22,000 objects is mind-numbingly
      bigger than the number of particles the universe could hold.

      So my point, and I do have one, is that because of the
      combinatorial explosion there is no way we are going
      to try all possible subsets of what we are calling the
      columns or the dimensions of data.  So, we have to do
      something else.  Something smarter.

      Now that we've returned from our diversion through the world
      of subsets, our next concept is...

45. CARD YELLOW LETTERS ON BLUE BACKGROUND:

   correlation

         NARRATOR (CONTINUING)
      ...correlation.  One of the most important questions
      people can ask is, "why?"  We want to establish causality. 
      The first step in that process is to establish a correlation. 

46. CARD: FIGURE F-7

NARRATOR (CONTINUING) In this state space plot all of the data points occur pretty much on a line. This shows a strong correlation. Low values of one variable always seem to be associated with low values of the other, and high values of one variable always seem to be associated with high values of the other. What we are hoping to do is to find a small number of columns among the 22,000 that are highly correlated with the decision function, so we can throw away the rest. 47. CARD YELLOW LETTERS ON BLUE BACKGROUND: orthogonal NARRATOR (CONTINUING) This is related to the idea of being orthogonal. Now don't get confused. The word orthogonal is also used in geometry to mean at right angles. A flagpole is orthogonal to the ground. But here it has a different, more subtle meaning. It means roughly the opposite of correlated. Two variables are orthogonal if they are completely uncorrelated, if they don't affect each other at all. This means there's no causality between them. 48. CARD: FIGURE F-8

NARRATOR (CONTINUING) If you make a state space plot -- also called a phase plot -- of two orthogonal variables, it will look like a random scatter. These variables were actually plotted by rolling dice, and you can see how uncorrelated they are. 49. CARD YELLOW LETTERS ON BLUE BACKGROUND: linear NARRATOR (CONTINUING) Our next concept is linear. 50. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND NARRATOR (CONTINUING) This is a very important concept but it comes with a lot of baggage. The term "non-linear" comes around every 20 years or so and makes a big splash. It got a lot of attention in the late 70s and early 80s with the beginnings of chaos theory: a whole zoo of self-similar fractals, strange attractors and chaotic bifurcations. Unfortunately, an awful lot of people used the terms linear and non-linear without really understanding what they meant. Consider this simple linear equation: y = ax - b 51. CARD YELLOW LETTERS ON BLUE BACKGROUND: y = ax - b NARRATOR (CONTINUING) Remember that x is the variable and a and b are constants. This is a linear equation because the most complicated term is 'a' times 'x,' a linear relationship. 52. ANIMATION A-7 [Note: In this animation the narrator's finger should be represented on a dial or knob, or else editing should be used to incorporate live action of the narrator moving a mouse.]

NARRATOR (CONTINUING) The graph of this equation looks like this. You can see that it is a straight line no matter what values 'a' and 'b' have. You may recall the idea of slop and intercept from algebra. 'a' controls the slope of the line, and 'b' controls the Y intercept, the place where the line intersects the vertical axis here; we show a blue sphere at that point. 53. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND NARRATOR (CONTINUING) This shows how a linear equation of one variable looks. (By the way, in higher mathematics 'x' could also be a vector or a function.) 54. CARD: FIGURE F-9 [THE SUM OF TWO LINEAR FUNCTIONS]

NARRATOR (CONTINUING) An almost magical property of linear equations is that if you add two or more of them in any linear combination you always get another linear equation. Here we show visually how two linear equations are added to produce a third. 55. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND NARRATOR (CONTINUING) A lot of work has been done over a long period of time coming up with techniques and tricks for solving linear equations. At this point they represent a sort of double-edged sword. Whenever a problem can be reduced to a linear form it becomes much easier to analyze and solve. But most real-world systems cannot be described by linear equations. So we much chose between ease of solution and accuracy. In fact, saying linear equations are easier to solve is a vast understatement. Consider this chart from the book General System Theory by Ludwig von Bertalanffy, 1968. The chapter is called "The Difficulty of Solving Equations." 56. TABLE T-5

         NARRATOR (CONTINUING)
      Take a moment to look at this.  Down the left column are
      listed types of equations.  Of the other two columns to
      the right, one is for linear and the other for non-linear
      equations.  [PAUSE]
      
      Notice that all the non-linear equations are very
      difficult or worse.  Even the linear equations are pretty
      bad.  Everything to the right of the thick black line is
      pretty useless.

57. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND

      Here's what the author had to say in his own dryly
      humorous style: "[these equations are], if linear,
      tiresome to solve even in the case of a few variables;
      if nonlinear, they are unsolvable except in special cases."

      So, to paraphrase an old song, we will use linear
      equations when we can and non-linear equations when we must.

58. CARD YELLOW LETTERS ON BLUE BACKGROUND:

   weights

         NARRATOR (CONTINUING)
      Building on the concept of linearity, our next
      concepts is weights.

59. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND

         NARRATOR (CONTINUING)
      Our best hope is that we can produce a linear decision
      function, with weights for each data dimension.  We
      use the term weights to mean a set of constants, like
      'a' in the last example, only a vector of them.  We call
      this vector 'w' for weights, and it consists of a set of
      constants named w-sub-1 through w-sub-22,000 or whatever.
      So our decision function d becomes this.

60. CARD YELLOW LETTERS ON BLUE BACKGROUND:

   vector form

   D = w.x

   expanded

   D = w1 x1 + w2 x2 + w3 x3 +     

         NARRATOR (CONTINUING)
      We use bold in the vector form to show that w and x are vectors.
      The w's and x's in the expanded form are the components of the
      vectors.

61. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND

         NARRATOR (CONTINUING)
      We can visualize the x vectors as a bunch of strips of cloth
      tied onto a string to become the tail of a kite.  The land on
      top of the weight vectors and as you see in this animation...

62. ANIMATION A-8

   [Note: Again, in this animation the narrator's finger should be
   represented on a dial or knob, or else editing should be
   used to incorporate live action of the narrator moving
   a mouse.]


NARRATOR (CONTINUING) ... each weight -- the red bars -- multiplies by its corresponding x -- the blue bars -- to make a weighted result -- the purple bars -- which are then all added together to make the decision function. The only challenge is to choose the values of the w's correctly. 63. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND NARRATOR (CONTINUING) Of course, that's the whole game. Assuming you can live with a linear result, and we know how much grief that will save us, the decision function we just gave is the holy grail of machine learning. How do you set the weights? From this point we'll be looking at ways that Biowulf's mathematicians solve this problem. The first important tool is... 64. CARD YELLOW LETTERS ON BLUE BACKGROUND: recursion NARRATOR (CONTINUING) ... recursion. 65. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND NARRATOR (CONTINUING) This is actually an idea that is actually easier to understand than it is to explain. Perhaps this old Mad Magazine cover will help. 66. CARD: FIGURE F-10

NARRATOR (CONTINUING) As you can see, recursion is a sort of magic trick that seems to create something out of nothing. But it actually just involves use of the magic words "and so on." In this case, Alfred E. Neuman is holding a hat that contains a rabbit, and the rabbit is holding a hat that contains a smaller Alfred, and so on. In this case the "and so on" only goes two levels down, but it's easy to see how it could go any number of levels. 67. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND NARRATOR (CONTINUING) So, armed with that concept we move on to... 68. CARD YELLOW LETTERS ON BLUE BACKGROUND: recursive feature elimination NARRATOR (CONTINUING) ... recursive feature elimination. 69. MEDIUM SHOT: NARRATOR IN FRONT OF NEUTRAL BACKGROUND NARRATOR (CONTINUING) This is one of the methods that Biowulf's mathematicians have refined. Perhaps the best way to explain it is to compare it to that recent hit TV show in the United States -- Survivor. Rather than trying to evaluate each contestant's survival skills alone, they tested them as a team. This may not be the way to identify the best survivor, but it makes it pretty easy to identify the worst. Similarly, it is easier to analyze a set of columns of data in our chart to see which is least useful for deciding the decision function. Then that column, or feature as it is called here, is eliminated and the same procedure is done on the remaining group, and so on. There's those magic words, which make it recursive feature elimination.
previous page next page

Last update 26-May-2001 10:41 PM by ABS.