## Wednesday, October 26, 2011

### Machine Learning 1-9

ML is a reasonable place to go from probability, but I'm still a bit weirded out by having it so soon.

Someone else complained to me that Sebastian bobs his head too much when he talks.  I hadn't noticed it, but now that he mentioned it, I can't not.  Now I have passed on this little gift to you.

What is: Yes, it is building models from data, but not just diagnostic models. We can also learn models that tell us the consequences of actions, which we can then use in search.

Let me guess, this question will be "all of the above."  Yep.

Stanley: This is actually a good example of something that bothered me about robots in the mid-90s to early 2000s: a lot of the time it seemed like what we were doing was coming up with algorithms whose main objective was to overcome crappy sensors. Look at the image from Stanley's camera: I can barely tell where the road is, so of course the robot is going to have a hard time. The solution of overlaying with the laser data is great, until the cameras improve and we can find the road based just on that. That is why I've always preferred working in simulation.

Taxonomy: That is a really good list, and a good point at the end: the list encompasses a great deal, and it would take years to fully understand everything written on that page.

Supervised learning:  Interesting, he gave us a warning this time that we haven't seen the answer to the question, but he wants to check our intuition.  I wonder if he received a lot of negative feedback about that.

SPAM: Something that should have been made clear from the outset: you pick your dictionary first, and process all messages with that same dictionary. This leaves you with very long input vectors that are mostly 0s. Notice the caveat at the bottom where he says "sport" and "sports". That's very common. It's called stemming: reducing all variations of a word to a single root form. The idea is that since sport, sports, sporting, and sporty all have similar meanings, it is useful to count them as the same entry in the input vector.
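Here is a minimal sketch of the fixed-dictionary idea with stemming, in Python. None of this is from the lecture: the dictionary words, the `crude_stem` helper, and its suffix list are all made up for illustration, and a real system would use a proper stemmer rather than this toy one.

```python
import re

# Hypothetical dictionary, chosen once before any message is processed.
DICTIONARY = ["click", "offer", "secret", "sport"]


def crude_stem(word):
    """Toy stemmer (an assumption, not a real algorithm): strip a few suffixes."""
    for suffix in ("ing", "s", "y"):
        # Only strip when at least three letters remain, so "is" stays "is".
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def to_vector(message):
    """Count how often each dictionary word appears, after stemming the message."""
    tokens = [crude_stem(t) for t in re.findall(r"[a-z]+", message.lower())]
    return [tokens.count(word) for word in DICTIONARY]


print(to_vector("Sporty sports fans love sporting offers"))
```

Every message gets a vector of the same length as the dictionary, and sporty, sports, and sporting all land in the single "sport" slot. With a realistic dictionary of thousands of words, almost every entry would be 0.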

Maximum Likelihood:  I know what maximum likelihood is, and I have no idea what he's talking about, entirely because he keeps changing his notation and his meanings of symbols.  Let's see if we can figure it out:
$$P(S) = \pi$$ means $$\forall_i P(y_i = S) = \pi$$.
The next bit means:
$$\forall_i P(y_i=S) = \pi \wedge P(y_i=H) = 1- \pi$$
I have never seen his notation to say this before.

The next line is a funny way to restate the previous one. How can we use the value of $$y_i$$ to compute $$P(y_i)$$? If we know the value of $$y_i$$, then the distribution is 0s and 1s for a particular i. What he's trying to do is use $$y_i$$ as a switch, so he doesn't have to say =S or =H. That is:

If $$y_i=S$$, then $$P(y_i) = \pi$$, and if $$y_i=H$$, then $$P(y_i) = 1-\pi$$. This notation is drifting to perversion.

Now, the next line can be derived from the previous line, but it can also be derived from basic probability. The probability of a sequence of independent trials is the product of the probabilities of the individual outcomes.
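To spell that out, here is the product in my own notation, which is an assumption and not from the lecture: $$n$$ is the total number of messages and $$k$$ is the number of them that are spam.

$$P(y_1, \dots, y_n) = \prod_{i=1}^{n} P(y_i) = \pi^{k} (1-\pi)^{n-k}$$

Each spam message contributes a factor of $$\pi$$ and each ham message a factor of $$1-\pi$$, so only the counts matter, not the order the messages came in.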

"we can also maximize the logarithm of this expression." True statement, not actually obvious how that works.  It is true because log is always increasing, so the in the existing function f(), if $$f(a)>f(b)$$ then $$\log f(a) > \log f(b)$$.  This is handy because dealing with logs is easier to take derivative.

I think I'll publish this post now to get it out.

#### 1 comment:

1. I find your posts on the course really helpful. Thank you!