## Tuesday, October 18, 2011

### Probability in AI

This is interesting, probability in the second week.  Usually that comes much later, and we roll into games now, followed by planning.  Very curious to see how this will work.

Oh, not just probabilities, but opening with Bayes networks.  Bold move.  One issue I have with this is that there is no attempt to connect this material with last week's material.  A little bit of topic whiplash is inflicted on the student.  Some of this is a reflection of how many AI people see the AI world: a collection of loosely related topics and techniques for solving problems that require "smart".  Personally I think that is a mistake.  I'll put up another posting on the topic.

He let the 2^16 bit slide by.  I think that deserves more attention, as it is the primary motivation for these things.  In classical probability, you can ask questions about conclusions given all the data, like: what is the probability that the alternator is broken, given that there is gas, and no oil, and no lights, etc.?  But to do that, you have to look at all possible combinations of the data: gas=true, oil=true, light=true; and gas=true, oil=true, light=false; and gas=true, oil=false, light=true; etc.  That's where the 2^n comes from, where n is the number of variables.  The key thing about a Bayes net is that it encodes information about the problem in the form of irrelevance.  Whether or not there is oil in the car affects whether or not it will start, but not necessarily whether or not there is gas.  This information allows us to consider smaller tables.
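To make the savings concrete, here is a rough sketch that counts the numbers you would have to store each way.  The little car network below is my own made-up example (the variable names and edges are assumptions, not the one from the lecture), and every variable is taken to be binary.  A full joint table over n binary variables needs 2^n - 1 independent entries, since the last one is fixed by everything summing to 1; a Bayes net needs one number per combination of parent values at each node.

```python
# Hypothetical toy network: names and edges are illustrative assumptions only.
# Each key is a binary variable; the list holds its parents in the network.
parents = {
    "gas": [],
    "oil": [],
    "battery": [],
    "lights": ["battery"],
    "starts": ["gas", "oil", "battery"],
}


def full_joint_parameters(n_variables):
    """Independent entries in a full joint table over n binary variables."""
    return 2 ** n_variables - 1


def bayes_net_parameters(parents):
    """One number per combination of parent values, for each binary node."""
    return sum(2 ** len(node_parents) for node_parents in parents.values())


joint_count = full_joint_parameters(len(parents))
net_count = bayes_net_parameters(parents)
print("full joint:", joint_count, "Bayes net:", net_count)
```

Even with five variables the gap is visible, and it widens quickly as more variables are added, as long as each node has only a few parents.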

Probability/Coin Flip.  Starting really basic here.  Sebastian really likes this "quiz before he tells you how to do something" approach.  Am I the only person driven nuts by this technique?  It always strikes me that the teacher is trying to show off how much more he knows.

Wait, we went from kindergarten probability to dropping words like "joint" and "marginals" around?  Define your terms!  Joint probability: a table describing the probability of outcomes of one event AND another event: P(X,Y) = {P(X=true and Y=true), P(X=true and Y=false), P(X=false and Y=true), P(X=false and Y=false)}.  A marginal is what you get by summing the joint over the variable you don't care about: P(X=true) = P(X=true and Y=true) + P(X=true and Y=false).
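For anyone who wants to see the terms in action, here is a minimal sketch with a made-up joint table (the four numbers are mine, chosen only so they sum to 1).  The marginal sums the joint over the other variable, and the conditional divides a joint entry by a marginal.

```python
# Made-up joint table over two binary variables X and Y, keyed by (x, y).
# The values are illustrative assumptions; they only need to sum to 1.
joint = {
    (True, True): 0.3,
    (True, False): 0.2,
    (False, True): 0.1,
    (False, False): 0.4,
}


def marginal_x(joint, x):
    """P(X=x): sum the joint over every value of Y."""
    return sum(p for (x_value, _), p in joint.items() if x_value == x)


def conditional_y_given_x(joint, y, x):
    """P(Y=y | X=x) = P(X=x, Y=y) / P(X=x)."""
    return joint[(x, y)] / marginal_x(joint, x)


p_x = marginal_x(joint, True)
p_y_given_x = conditional_y_given_x(joint, True, True)
print(p_x, p_y_given_x)
```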

He's going really fast.  Was a probs and stats course required for this class?  A friend in the math department once told me that it took them weeks to just get across what a random variable is, and we've just jumped to week 6.  Ah, I see, yes, all this is a prerequisite and we're just doing "review" right now.  Sorry, back to the show.

Actually, I think this is going rather well considering we're already supposed to know this.

Bayes Rule: big point right there that I think should be emphasized: the relationship between causal and diagnostic inference.  If A "causes" B (within some probability), then P(B|A) is the causal conditional probability, and P(A|B) is the diagnostic (because we're trying to determine what caused B to take the value it did: what is making the car not start).  The diagnostic is often the question we're after, but the causal is the value we know, because we can run experiments on it.  Take people both with and without cancer and give them the test many times; now we know P(+|C) and P(+|~C).  One reason Bayes rule is helpful is that it turns a question we don't know the number for (diagnostic) into one that we do (causal).
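Here is a small sketch of that causal-to-diagnostic flip.  The prior and the two test rates below are illustrative numbers I am assuming for the example, not measured values; the function name is also just mine.

```python
def diagnostic(prior, p_pos_given_c, p_pos_given_not_c):
    """Bayes rule: P(C | +) computed from the causal numbers P(+ | C) and P(+ | ~C)."""
    numerator = p_pos_given_c * prior
    # Total probability of a positive test, with or without cancer.
    evidence = numerator + p_pos_given_not_c * (1 - prior)
    return numerator / evidence


# Assumed illustrative inputs: 1% prior, 90% hit rate, 20% false-positive rate.
p_cancer_given_pos = diagnostic(prior=0.01, p_pos_given_c=0.9, p_pos_given_not_c=0.2)
print(round(p_cancer_given_pos, 4))
```

With these inputs the diagnostic probability comes out a bit above 4%, far lower than the 90% hit rate might suggest, because the prior is so small.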

This number of parameters question has two answers.  Does he want the number of parameters in the system or the minimal number we need?  For instance we don't need to store P(~B|A)  if we store P(B|A).  The answer is either 3 or 6, let's see what he means... ah 3.

ew, what just happened to his hand? Is that a camera effect?

Ack, I hate that normalizer.  It's never made anything easier for me and it always confuses students.  At least he's explaining it; the usual approach is to just throw it out there with an "obviously we can omit the denominator and just normalize" comment.
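For the record, this is all the normalizer amounts to, written out for a binary A with evidence B (the symbols A, B, and \eta here are generic, not tied to a particular quiz).  The denominator P(B) is the same for A and ~A, so you can compute the two numerators, add them, and divide:

$$
\begin{aligned}
P(A \mid B) &= \eta \, P(B \mid A)\, P(A) \\
P(\lnot A \mid B) &= \eta \, P(B \mid \lnot A)\, P(\lnot A) \\
\eta &= \frac{1}{P(B \mid A)\, P(A) + P(B \mid \lnot A)\, P(\lnot A)} = \frac{1}{P(B)}
\end{aligned}
$$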

The 2-test questions require that you understand conditional independence, which I'm pretty sure wouldn't be in a basic probability class.  Now he's explaining it.  I'd be pretty ticked off if I spent 10 minutes trying the quiz, only to be told afterwards what I need to figure it out.

I just realized that since I'm getting the questions right, I'm missing some of the material in terms of the answer explanations.  Maybe I should start getting them wrong.

Ha! They got me on the P(R|S) question.  I spent 5 minutes trying to derive the formula before I remembered they're independent! lol.

Interestingly, I did P(R|H,S) a different way.  I computed the joint probability, which can be read off the network: P(R|H,S) = (eta)P(R,H,S).  The joint is just P(R,H,S) = P(R)P(S)P(H|R,S).

(yeah, so eta is useful when I can't type fractions. sue me)

P(R|H) is a difficult question.  I used the same trick of computing the joint:
P(R|H) = (eta)P(R,H).  P(R,H) = P(R,H,S)+P(R,H,~S) = P(R)P(S)P(H|R,S) + P(R)P(~S)P(H|R,~S)
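Here is that joint trick as a rough sketch.  The network shape (R and S independent, both parents of H) is from the lecture, but the specific probabilities below are illustrative values I am assuming for the example, and the helper names are mine.

```python
# Assumed illustrative numbers; R and S are independent parents of H.
P_R = 0.01  # P(R = true)
P_S = 0.7   # P(S = true)
P_H = {     # P(H = true | R, S), keyed by (r, s)
    (True, True): 1.0,
    (True, False): 0.9,
    (False, True): 0.7,
    (False, False): 0.1,
}


def joint(r, s, h=True):
    """P(R=r, S=s, H=h) = P(R) P(S) P(H | R, S), read off the network."""
    p_r = P_R if r else 1 - P_R
    p_s = P_S if s else 1 - P_S
    p_h = P_H[(r, s)] if h else 1 - P_H[(r, s)]
    return p_r * p_s * p_h


def normalize(weights):
    """Apply eta: scale the unnormalized weights so they sum to 1."""
    eta = 1 / sum(weights.values())
    return {key: eta * weight for key, weight in weights.items()}


# P(R | H, S): hold H and S at true, compare R = true against R = false.
p_r_given_hs = normalize({r: joint(r, True) for r in (True, False)})[True]

# P(R | H): sum S out of the joint first, then normalize over R.
p_r_given_h = normalize(
    {r: sum(joint(r, s) for s in (True, False)) for r in (True, False)}
)[True]

print(round(p_r_given_hs, 4), round(p_r_given_h, 4))
```

With these assumed numbers, P(R|H,S) comes out smaller than P(R|H): once S is known to be true, it explains H on its own, so R becomes less likely.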

Apparently I have deeply impressed Sebastian but perhaps that's not really fair :)

General Bayes Net: now we see the trick I used above.  Another important point is that the compactness of the network not only saves space, but also means we need to gather less data to figure out what the numbers are.  That's actually more important than space.

So why do I get all the hard questions right but am unable to add up numbers to 13?

D separation is actually a really useful idea in daily life.  There are many times in meetings where we all agree on some fact B, but get lost discussing how A affects C.  In certain crowds you can just say, "we've established B, A and C are D-separated, let's move on."

In summary, this unit was way harder than the first two.  I wonder if this is supposed to be the weeder section to flick off some of the less dedicated students.  I wouldn't blame them if it was.  On the other hand, I felt that the presentation quality was much higher than before.  It will be interesting to see if this is a function of Sebastian caring more about this topic, or just them getting their acts together.  I still think a little more professionalism in the presentation, with animations etc., would really improve things.

#### 1 comment:

1. Thanks again! I hope you continue with your analysis of the classes, your clarifications and explanations they are really useful.