## Wednesday, October 19, 2011

### Probabilistic Inference

Peter, that network is only familiar if you've read your book or Judea Pearl's.  The presentation doesn't really explain what's going on.  What do you mean by John or Mary "calls"?  For the record, the idea is that these are neighbors who call you at work when your home alarm goes off.  This alarm could be caused either by a burglary or by an earthquake.

I don't understand this first question.  You just told us which were which, and you've even labeled them in the picture!  Obviously, you've just changed your definition of what these things mean and haven't told us.

Oh great, he's using different notation from Sebastian.  It would be nice to be consistent.

Wait!  That's an outright error!

He says $$P(+b,+j,+m) = \sum_e \sum_a P(+b,+j,+m)$$  What he means is
$$P(+b,+j,+m) = \sum_e \sum_a P(+b,+j,+m,e,a)$$
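
For what it's worth, here is a minimal sketch of the corrected sum in Python. The CPT numbers are my assumption (the usual textbook values for this alarm network; nothing above states them), and the names `prob`, `joint`, and `p_bjm` are just illustrative.

```python
from itertools import product

# Assumed CPTs (usual textbook values for the alarm network, not quoted from the lecture).
P_B = 0.001                      # P(+b)
P_E = 0.002                      # P(+e)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(+a | B, E)
P_J = {True: 0.90, False: 0.05}  # P(+j | A)
P_M = {True: 0.70, False: 0.01}  # P(+m | A)


def prob(p_true, value):
    """Return P(X = value) for a boolean X, given P(X = True)."""
    return p_true if value else 1.0 - p_true


def joint(b, e, a, j, m):
    """Full joint P(b, e, a, j, m) as the product of the network's CPT entries."""
    return (prob(P_B, b) * prob(P_E, e) * prob(P_A[(b, e)], a)
            * prob(P_J[a], j) * prob(P_M[a], m))


def p_bjm(b=True, j=True, m=True):
    """Marginal P(b, j, m): sum the full joint over the hidden variables E and A."""
    return sum(joint(b, e, a, j, m)
               for e, a in product([True, False], repeat=2))


print(p_bjm())  # roughly 0.00059 with the assumed numbers
```

The point is just that the summand has to mention `e` and `a`; otherwise the double sum is adding up the same number four times.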

With the last module being so much better, I'm really disappointed that this module is just as sloppy as the earlier ones.

Ha ha!  Moving around slips of paper?  Couldn't we use technology there?

What do you mean by "put the nodes together in a different order"?  Nothing so far has referred to the order in which the network is constructed.  Sebastian just assumed it was causal.  The idea of iterative construction of the network is rather non-intuitive.  We naturally want to build the network causally, so all this shows is that building it in an order we never wanted in the first place results in a bad network structure.  And notice that through all that we didn't speed up our example in the slightest, because it already had the best structure.
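
To put a number on "bad network structure", here is a small sketch that counts CPT entries for boolean nodes. The causal parent sets are the ones in the picture; the non-causal parent sets are my assumption of what you end up with when adding nodes in the order J, M, A, B, E, and `num_parameters` is a made-up helper.

```python
def num_parameters(parents):
    """Count independent CPT entries: a boolean node with k parents needs 2**k numbers."""
    return sum(2 ** len(ps) for ps in parents.values())


# Causal structure: burglary and earthquake cause the alarm, the alarm causes the calls.
causal = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}

# Assumed structure when nodes are added in the order J, M, A, B, E.
noncausal = {"J": [], "M": ["J"], "A": ["J", "M"], "B": ["A"], "E": ["A", "B"]}

print(num_parameters(causal), num_parameters(noncausal))  # 10 13
```

Under those assumptions the reordered network needs more numbers to say the same thing, which is all the exercise demonstrates.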

Quiz 9 is broken.  1 - 0.134 = 0.866, not 0.886.

So sampling will give us an estimate of a joint, but what does that have to do with a Bayes net?  It's presented as if sampling is a way to perform inference on a Bayes net fast, but what he showed was using sampling instead of a Bayes net to answer the same question.
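
One way to see where the Bayes net could come in: the sampler draws each variable from the network's own conditional tables, parents first. A rough sketch follows, using the cloudy/sprinkler/rain/wet-grass network. The CPT values are my assumption, not numbers stated above, and `prior_sample` and `estimate_p_wet` are illustrative names.

```python
import random

# Assumed CPTs for the cloudy / sprinkler / rain / wet-grass network.
P_C = 0.5                                   # P(+c)
P_S = {True: 0.1, False: 0.5}               # P(+s | C)
P_R = {True: 0.8, False: 0.2}               # P(+r | C)
P_W = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.01}   # P(+w | S, R)


def prior_sample(rng):
    """Draw one full assignment, sampling each node given its already-sampled parents."""
    c = rng.random() < P_C
    s = rng.random() < P_S[c]
    r = rng.random() < P_R[c]
    w = rng.random() < P_W[(s, r)]
    return c, s, r, w


def estimate_p_wet(n, seed=0):
    """Estimate P(+w) as the fraction of samples in which the grass is wet."""
    rng = random.Random(seed)
    return sum(prior_sample(rng)[3] for _ in range(n)) / n


print(estimate_p_wet(50_000))
```

With these assumed tables the exact value of P(+w) works out to 0.65, so the estimate should land close to that.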

"Sampling has an advantage over inference in that we know a procedure for coming up with at least an approximate value for the joint probability distribution, as opposed to exact inference where the computation may be very complex."  What?  Isn't a complex computation still a known procedure?  He means that sampling can get us an approximate answer in a reasonable amount of time, whereas exact inference may not return in our lifetime.

In rejection sampling, why don't we just not generate the samples we don't want, rather than generate them and reject them?  That would actually work in the P(W|C) example he has there because C has no parents.  If C had parents, then its value would depend on them, so we can't just generate samples with -w.  See likelihood weighting coming up.
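
Here is a sketch of the difference when the evidence does have parents. It estimates P(+c | +s, +r) two ways: rejection sampling throws away every sample that disagrees with the evidence, while likelihood weighting clamps the evidence and weights each sample by how likely that evidence was given its parents. The CPT values and all names (`NODES`, `rejection_sampling`, `likelihood_weighting`) are my assumptions for illustration.

```python
import random

# Assumed network in topological order: (name, function giving P(node = True | parents)).
NODES = [
    ("C", lambda v: 0.5),
    ("S", lambda v: 0.1 if v["C"] else 0.5),
    ("R", lambda v: 0.8 if v["C"] else 0.2),
    ("W", lambda v: {(True, True): 0.99, (True, False): 0.90,
                     (False, True): 0.90, (False, False): 0.01}[(v["S"], v["R"])]),
]


def rejection_sampling(query, evidence, n, rng):
    """Sample every variable, then keep only samples that match the evidence."""
    kept = hits = 0
    for _ in range(n):
        v = {}
        for name, p_true in NODES:
            v[name] = rng.random() < p_true(v)
        if all(v[k] == val for k, val in evidence.items()):
            kept += 1
            hits += v[query]
    estimate = hits / kept if kept else float("nan")
    return estimate, kept


def likelihood_weighting(query, evidence, n, rng):
    """Clamp evidence variables and weight each sample by P(evidence | its parents)."""
    total = hits = 0.0
    for _ in range(n):
        v, weight = {}, 1.0
        for name, p_true in NODES:
            p = p_true(v)
            if name in evidence:
                v[name] = evidence[name]
                weight *= p if v[name] else 1.0 - p
            else:
                v[name] = rng.random() < p
        total += weight
        hits += weight * v[query]
    return hits / total


rng = random.Random(1)
evidence = {"S": True, "R": True}
print(rejection_sampling("C", evidence, 50_000, rng))
print(likelihood_weighting("C", evidence, 50_000, rng))
```

Without the weights, the clamped sampler would just report the prior on C; with them, no sample is wasted and both estimates should agree.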

That Gibbs sampling discussion was really cursory.  He describes the process correctly, but doesn't tell us how to use it to answer these conditional questions we're talking about.  The key is if we want to know P(C|+r,+s), S and R are the "evidence" variables, so we don't vary them, and vary only C and W over time.  But wait, don't we have the same problem as with likelihood weighting, that sampling C only reveals the prior?  No, that's what he left out.  We compute a new value for C based on its Markov blanket, in this case P(C|+r,+s).  But wait, that's what we're trying to compute!  Yep, Gibbs sampling is no help here.  Not really much of an example, is it?
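
To make the Markov blanket step concrete: C's blanket here is just its children S and R, so the conditional the Gibbs step needs factors as

$$P(C \mid +s,+r) = \frac{P(C)\,P(+s \mid C)\,P(+r \mid C)}{\sum_{c} P(c)\,P(+s \mid c)\,P(+r \mid c)}$$

Plugging in assumed CPT values (P(+c) = 0.5, P(+s|+c) = 0.1, P(+s|-c) = 0.5, P(+r|+c) = 0.8, P(+r|-c) = 0.2; these are my assumption, not numbers quoted from the lecture):

$$P(+c \mid +s,+r) = \frac{0.5 \cdot 0.1 \cdot 0.8}{0.5 \cdot 0.1 \cdot 0.8 + 0.5 \cdot 0.5 \cdot 0.2} = \frac{0.04}{0.09} \approx 0.444$$

That is the whole answer in one step, with no sampling at all.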

I'm always a little bothered with the Monty Hall problem.  Not because I don't understand it, but because it just isn't related to anything else in the section.  We always present it because it's neat, not because it gives us any insight into AI.  I question our motives.  At least Peter tells us this up front.

As I said before, I'm a bit disappointed with this unit: it was too sloppy, and some important things were glossed over.