Tuesday, December 31, 2013

Calibrated Bayes

Can we, under certain conditions, interpret posterior probabilities in frequentist terms? That is, such that, under controlled simulation settings, 95% of our 95% credible intervals across our probabilistic evaluations turn out to contain the true value, or that 95% of the clades that are inferred to be monophyletic with a posterior probability of 0.95 turn out to be indeed monophyletic?

If our posterior probabilities have this property, then they are said to be calibrated.

Of course, if we conduct simulations under parameter values drawn from a given prior and reanalyze the dataset thus simulated by Bayesian inference under the same prior, then calibration will obtain, but this is trivial.

A more interesting type of calibration is defined by the idea of conducting Bayesian inference on datasets simulated under arbitrary values of the parameters (but under the true model). Calibration then simply means that credible intervals are true frequentist confidence intervals, and that posterior probabilities are equivalent to local true discovery rates.
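As a minimal sketch of such a calibration check, consider a toy normal model with known variance: under a flat prior, the 95% credible interval coincides with the classical confidence interval, so its frequentist coverage under an arbitrary fixed true value should be close to 95% (the model and the numbers are invented for illustration):

```python
import random

random.seed(1)

def coverage(mu_true=2.0, sigma=1.0, n=50, reps=2000):
    """Fraction of 95% credible intervals that contain the true mean.

    Data: y_1..y_n ~ Normal(mu_true, sigma^2) with sigma known.
    Under a flat prior on mu, the posterior is Normal(ybar, sigma^2/n),
    so the 95% credible interval is ybar +/- 1.96 * sigma / sqrt(n)."""
    hits = 0
    for _ in range(reps):
        ybar = sum(random.gauss(mu_true, sigma) for _ in range(n)) / n
        half = 1.96 * sigma / n ** 0.5
        if ybar - half <= mu_true <= ybar + half:
            hits += 1
    return hits / reps

print(coverage())  # close to 0.95: here, the credible interval is calibrated
```

In richer models the credible and confidence intervals no longer coincide, and this kind of simulation is precisely how one would measure the discrepancy.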

Note that there are often several ways to define what we mean by calibration. This is particularly the case for complex hierarchical models, where the boundary between the likelihood part of the model, which should be matched between simulation and re-analysis, and the prior part of the model, which should not be simulated from because that would be too easy, can be ambiguous. But then this makes the question even more interesting.

The attitude of many Bayesians is to dismiss the concept of calibration as irrelevant -- probabilities are just not meant for that. Yet, for me, calibrated posterior probabilities represent a worthy prospect. Calibrated Bayes would greatly facilitate the interpretation of probabilistic judgements reported in the scientific literature. It would make it possible to compare the uncertainties reported by Bayesian and non-Bayesian statistical approaches on an equal footing. It would make Bayesian inference more broadly convincing to many scientists. So, in my opinion, we should definitely care about calibration.

In general, posterior credible intervals, and more generally posterior probabilities, are not calibrated. Nevertheless, they can be approximately calibrated under certain conditions. In particular, posterior credible intervals are asymptotically calibrated under very general conditions. [P.S. I realize that results on asymptotic calibration are valid only for credible intervals on continuous parameters; I think this is not true in other cases.]

My current feeling is that, for many practical situations that we currently face in evolutionary biology, and under reasonable priors, posterior probabilities are reasonably well-calibrated. I think that this point tends to be under-estimated, or at least under-explored.

The more general question of the frequentist properties of Bayesian inference is the subject of an immense literature (a good entry point is Bayarri and Berger, 2004). However, most often, the emphasis is on asymptotic consistency, admissibility, minimaxity, minimization of some frequentist risk, etc., which are mostly properties of point estimates, or at least properties related to how good the estimation is. On the other hand, the calibration of posterior probabilities is a slightly different question, concerned with the fairness of the uncertainty attached to the estimation. This more specific point appears to be a bit less explored or, at least, not so often discussed and reviewed in non-technical papers.

One possible reason for this lack of emphasis on calibration is that, traditionally, Bayesians don't really want to care about it because it is not supposed to be the way you should interpret posterior probabilities. And classical frequentists, if they care about the question, tend to emphasize where posterior probabilities fail to be well-calibrated. This is of course an important thing to be aware of, but still, one would also like to know about the positive aspects.

Another problem is that theoretical papers tend to be fairly strict in what they deem to be good calibration properties. Yet, from a more practical standpoint, I think that one would be content with calibration properties that are perhaps not very good, but at least reasonably good.

For instance, I said above that posterior credible intervals are asymptotically calibrated. The asymptotics is in fact relatively weak, as $1/\sqrt{n}$, where $n$ is the size of the dataset, but can be improved by using so-called probability-matching priors (Kass and Wasserman, 1996). In one-dimensional cases, the matching prior turns out to be Jeffreys' prior, but in higher-dimensional settings, things get considerably more complicated, and a lot of work has been devoted to the question (Datta and Sweeting, 2005). However, all this work sounds disproportionately complicated compared to what we might need in practical situations. In a post-genomic era, one can easily expect to have tens of thousands of sites, thousands of genes, or hundreds of species or individuals to analyze. In this context, using standard diffuse priors on the global parameters of the model (we will probably never implement complicated probability-matching priors anyway) may be sufficient for us to reach effective asymptotic calibration.

After all, in applied frequentist statistics, people often use quick-and-dirty methods for computing confidence intervals or p-values, and I think that everyone is content with that. Most such methods are valid only asymptotically, and even then, they may not be exact. In itself, claiming the right to make the same dirty deals as our frequentist neighbors is not necessarily the best argument. But I guess the most important point here is that approximate confidence measures are good enough for most practical purposes, as long as their meaning is the same across both Bayesian and non-Bayesian statistical activity and can be assessed by objective methods.

Of course, there are certainly also many practical situations where posterior probabilities are not reasonably, not even qualitatively, well-calibrated. But then, we had better know when this is the case, why it is the case, and possibly, what could be done to obtain posterior probability evaluations that are more acceptable by frequentist standards in such situations.


Bayarri, M. J. & Berger, J.O. (2004). The interplay between Bayesian and frequentist analysis. Statistical Science, 19:58-80.

Datta, G. S., & Sweeting, T. J. (2005). Probability matching priors. Handbook of Statistics.

Kass, R. E., & Wasserman, L. (1996). The selection of prior distributions by formal rules. Journal of the American Statistical Association, 91:1343-1370.

Monday, December 30, 2013

How to interpret posterior probabilities?

What does it mean, for instance, to report a posterior probability (pp) of 0.87 for the monophyly of a given group of species (say Archaea)? Intuitively, it is stronger support than 0.82 and weaker support than 0.95. But what does this "0.87" mean in absolute terms?

How should we interpret a 95% credible interval of, say, 90 to 110 Myr for the age of the last common ancestor of placental mammals?

The subjective Bayes answer is that subjective probabilities make sense only if associated with utilities. In the rather abstract situation of phylogenetic inference, utilities are difficult to define. Strictly speaking, a pp of 0.87 for the monophyly of Archaea would mean, for instance, that you are ready to bet up to 87 euros, given that you will earn 100 euros if Archaea turn out to be monophyletic and 0 otherwise. But this interpretation sounds a bit silly.

Another possible way to grasp the meaning of subjective probabilities is to imagine other, less abstract situations, get a feeling for what they mean in these new contexts, and then transfer your perception of the implied weight of evidence back to the initial context. A pp of 0.87 should be interpreted as a similar strength of support in any situation: a pp of 0.87 for the monophyly of Archaea means the same strength of support as a pp of 0.87 in favor of the monophyly of a completely different group in a completely unrelated study (say, mammals), or even as a pp of 0.87 in favor of the presence of positive selection in a given gene, or as a pp of 0.87 that a given gene is associated with hypertension in a genome-wide association study.

So, then, let us suppose that you make a Bayesian statistical analysis to assess whether a given gene is associated with hypertension. This is a preliminary analysis, upon which you could decide to further investigate the case, now using experimental methods, but this requires you to invest time and money. If you estimate that your loss if you investigate the gene but it turns out not to be associated with hypertension is about 10 times greater than your loss if you don't investigate but then miss an important gene that turns out to be associated with hypertension, then, rationally, you should further investigate only if the posterior odds are at least 10:1 (or equivalently, pp = 0.91). A posterior probability of 0.99 would mean that you should basically consider a false positive at least 99 times more expensive than a false negative, etc.
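The arithmetic behind this decision rule is elementary; a minimal sketch (the 10:1 loss ratio is the one assumed above):

```python
def decision_threshold(loss_false_positive, loss_false_negative):
    """Posterior probability above which acting (investigating) is rational.

    Act iff pp * loss_false_negative > (1 - pp) * loss_false_positive,
    i.e. iff the posterior odds exceed the loss ratio, i.e. iff
    pp > loss_fp / (loss_fp + loss_fn)."""
    return loss_false_positive / (loss_false_positive + loss_false_negative)

print(round(decision_threshold(10, 1), 3))  # 0.909: the pp = 0.91 of the text
```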

By considering such hypothetical cases (and, I guess, ideally, by practising Bayesian decision making in real life), you progressively calibrate your mind to Bayesian probabilities (and utilities).

All this sounds a bit abstract and introspective, but after all, the semantics of p-values is also fairly hypothetical. I guess that, apart from their nominal frequentist calibration, the essential property of p-values is that they have the same meaning in all situations, and thus we can progressively calibrate our perception of the strength of evidence associated with p-values through our practical experience with using them.

Similarly, Bayesian probabilities are supposed to have a homogeneous meaning, and thus, as you are getting more experienced with the Bayesian paradigm, you will progressively get a good feeling of what strength of evidence a given pp is supposed to imply.

There is a ghost, however, looming over all this discussion: can we, at least under certain conditions, interpret posterior probabilities in frequentist terms?

Saturday, December 28, 2013

Maximum marginal likelihood

In my last post, I mentioned the possibility of comparing alternative diversification models, conceived of as alternative priors on divergence times in a molecular dating context, using Bayes factors.

As an alternative to Bayes factors, one could integrate the likelihood over rates and times and maximize the resulting integrated (marginal) likelihood as a function of the diversification parameters (e.g. speciation and extinction rates, let us call them $\theta$). We can then use standard likelihood ratio tests to compare diversification models. Numerically maximizing and calculating the likelihood is a bit tricky, but this is just a technical problem, not a conceptual one.

Finally, for a given diversification model, and once an estimate $\hat \theta$ of its parameters has been obtained by maximum likelihood, divergence times can be inferred based on the plug-in posterior distribution, i.e. the posterior distribution on divergence times obtained by fixing the diversification parameters at $\hat \theta$.
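The whole pipeline (maximize the marginal likelihood in $\theta$, then condition on $\hat \theta$ for the plug-in posterior) can be sketched in a toy conjugate setting; the normal-normal hierarchy below is an invented stand-in for the rates-and-times model, chosen because its marginal likelihood maximizer is available in closed form:

```python
import random

random.seed(0)

def ml2_plugin(x):
    """ML-II / maximum marginal likelihood in a toy hierarchy.

    Model: x_i ~ Normal(theta_i, 1), theta_i ~ Normal(0, tau^2).
    Marginally, x_i ~ Normal(0, 1 + tau^2), so the marginal likelihood
    is maximized at tau2_hat = max(0, mean(x_i^2) - 1).
    The plug-in posterior for theta_i is then normal with mean
    x_i * tau2_hat / (1 + tau2_hat): shrinkage toward zero."""
    tau2_hat = max(0.0, sum(xi * xi for xi in x) / len(x) - 1.0)
    shrink = tau2_hat / (1.0 + tau2_hat)
    return tau2_hat, [shrink * xi for xi in x]

# simulate under a true tau of 2 (tau^2 = 4)
theta = [random.gauss(0, 2) for _ in range(5000)]
x = [random.gauss(t, 1) for t in theta]
tau2_hat, post_means = ml2_plugin(x)
print(round(tau2_hat, 1))  # close to 4
```

In the molecular dating problem, neither the maximization nor the plug-in posterior has such a closed form, which is exactly why the numerics are tricky there.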

This type of approach is sometimes called maximum marginal likelihood, or ML-II in the literature (Berger, 1985). It may also be called empirical Bayes, thus emphasizing the fact that the estimation of divergence times is still Bayesian in spirit, although now based on an empirically determined plug-in prior. However, for me, "empirical Bayes" has a more general meaning -- after all, the prior on divergence times is also empirically determined in the fully Bayesian approach. For that reason, I prefer the "maximum marginal likelihood" terminology.

Maximum marginal likelihood or full Bayes: which is best? I am not really sure.

Conceptually speaking, the maximum marginal likelihood approach does not invoke any second-stage prior on $\theta$, which can be seen as an advantage if one is not too sure about either the choice of the prior or its philosophical meaning (subjective? uninformative? etc.). Maximum marginal likelihood may also be better for comparing models, as Bayes factors tend to be sensitive to the prior.

Computationally speaking, on the other hand, I find it easier to put a prior on $\theta$, because this makes the MCMC easier to implement.

Finally, there is a classical argument saying that the fully Bayesian approach has the advantage of integrating uncertainty about $\theta$. However, this again depends on the second-stage prior, a point that certainly requires further discussion, but that's for another day.


We can draw an interesting parallel here. Indeed, all this looks very much like another likelihood approach used in population genetics for estimating the scaled mutation rate (let us again call it $\theta = 4 N_e u$) using intra-specific sequence data (Kuhner et al, 1995). In that case, the sequence data are assumed to be generated by a mutation process along a genealogy, which is itself described by Kingman's coalescent. The probability of the sequence data given the genealogy is integrated over the coalescent and then maximized as a function of the scaled mutation rate $\theta$ (all this by Monte Carlo).

Here also, once $\theta$ has been estimated, one can infer the age of the ancestor of the sample by calculating the plug-in posterior distribution over this age (Thomson et al, 2000). Here also, instead of maximizing with respect to $\theta$, one can decide to put a second-stage prior on $\theta$ and then sample from the joint posterior over $\theta$ and the coalescence times (this is what is done in Beast, for instance, Drummond and Rambaut, 2007).

Altogether, there is a very close parallel between the two situations:
species / individuals
substitutions / mutations
phylogeny / genealogy
prior on divergence times / coalescent
diversification parameters / scaled mutation rate
divergence time of the last common ancestor / age of the ancestor of the sample.

The hierarchical structure of the two models is virtually the same. In both cases, we can either maximize the marginal likelihood with respect to $\theta$, or put a second-stage prior on $\theta$ and sample from the joint posterior over $\theta$ and divergence or coalescence times.

I find this interesting because the two situations are so similar, and yet we tend to consider them as essentially different. In particular, we call the birth-death process a prior on divergence times, and we consider divergence times as parameters of the model. On the other hand, coalescence times are not usually considered as parameters, and I doubt that we would call the coalescent a prior.

I do not think that the status of the intermediate level of the model (the distribution over divergence or coalescence times) should depend on whether we use a maximum likelihood or a Bayesian method for estimating the upper level of the model (the parameter $\theta$). Instead, I believe that these distinctions are merely cultural. They are just a historical accident: one of the two methods is originally a Bayesian approach, which could be modified and turned into a maximum marginal likelihood procedure. The other is a frequentist maximum likelihood method that has only secondarily been endowed with a second-stage prior and implemented in a Bayesian software program.

In any case, all this shows that, in everyday life, there is much more overlap between Bayesian inference and maximum likelihood than the traditional vision of a fundamental opposition between frequentists and subjectivists would suggest. In the end, the differences are rather subtle.

Finally, this little comparison allows me to anticipate another problem that will be the subject of a future post: how do you determine the number of parameters of a model?

Berger, J. O. (1985). Statistical Decision Theory and Bayesian Analysis (1985 ed.). New-York: Springer-Verlag.

Drummond, A. J., & Rambaut, A. (2007). BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evolutionary Biology, 7, 214.

Kuhner, M. K., Yamato, J., & Felsenstein, J. (1995). Estimating effective population size and mutation rate from sequence data using Metropolis-Hastings sampling. Genetics, 140(4), 1421–1430.

Thomson, R., Pritchard, J. K., Shen, P., Oefner, P. J., & Feldman, M. W. (2000). Recent common ancestry of human Y chromosomes: evidence from DNA sequence data. Proceedings of the National Academy of Sciences of the United States of America, 97(13), 7360–7365.

Friday, December 27, 2013

Two sides of the same coin

One of the things that people want to do with phylogenies is to estimate parameters or test hypotheses about species diversification processes (Nee et al, 1992). The idea has been revisited recently, for testing for the presence of diversity dependence (Etienne et al 2012), or for estimating variation in speciation or extinction rates over time (Morlon et al, 2011, Stadler 2011).

In order to study diversification processes, however, one should first estimate a time-calibrated phylogeny.

Time-calibrated phylogenies are usually estimated using Bayesian methods.

However, a Bayesian method requires a prior on divergence times. The choice of the prior on divergence times has a strong impact on divergence times estimation. It will therefore potentially also have a strong impact on the outcome of the test of alternative diversification processes.

More fundamentally, some priors on divergence times have themselves an interpretation in terms of an underlying diversification process: the birth-death prior (Rannala and Yang, 1996), for instance, amounts to assuming constant speciation and extinction rates. Thus, testing diversification models based on a time-calibrated phylogeny that has itself been estimated assuming a given diversification model is either circular (if the two diversification models are identical) or contradictory (if the models are different).

So, what should we do?

What we could do here is use the alternative diversification models that we want to test as alternative priors on divergence times. We can then compare the resulting alternative models, for instance using Bayes factors. By doing so, we will simultaneously (1) integrate the uncertainty about divergence times in our comparison of alternative diversification models (while avoiding the circularity issues mentioned above) and (2) infer divergence times under several diversification models, typically deciding to keep those obtained under the best-fitting model.
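A toy sketch of such a comparison: two competing priors on a binomial parameter, with marginal likelihoods available in closed form through conjugacy (the hierarchical phylogenetic case is of course far harder and requires numerical methods; the priors and data here are invented for illustration):

```python
from math import lgamma, exp

def log_beta(a, b):
    # log of the Beta function, B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_marginal(y, n, a, b):
    """Log marginal likelihood of y successes out of n under p ~ Beta(a, b);
    the binomial coefficient is dropped, since it cancels in the Bayes factor."""
    return log_beta(a + y, b + n - y) - log_beta(a, b)

y, n = 7, 10
# Bayes factor: uniform Beta(1, 1) prior versus Jeffreys Beta(1/2, 1/2) prior
bf = exp(log_marginal(y, n, 1, 1) - log_marginal(y, n, 0.5, 0.5))
print(round(bf, 2))  # about 1.39: these data barely discriminate the two priors
```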

Priors which can be interpreted in terms of macro-evolutionary processes are what I would call mechanistic priors. They are not meant to be uninformative priors, like the uniform prior on divergence times (although I am not totally sure that the uniform prior on divergence times is really uninformative), nor subjective priors.

Trying to derive mechanistic priors is, I think, an interesting and constructive answer to prior sensitivity issues. It is a risky business, because mechanistic priors tend to make strong assumptions and therefore potentially lack robustness. On the other hand, doing this is potentially more insightful in the long term.

Also, it naturally leads to an integration of different levels of macro-evolutionary studies. In the present case, what we obtain is an elegant statistical formalization of the idea that molecular dating and diversification studies are in fact two sides of the same coin.


Etienne, R. S., Haegeman, B., Stadler, T., Aze, T., Pearson, P. N., Purvis, A., & Phillimore, A. B. (2012). Diversity-dependence brings molecular phylogenies closer to agreement with the fossil record. Proceedings of the Royal Society B: Biological Sciences, 279:1300–1309.

Morlon, H., Parsons, T. L., & Plotkin, J. B. (2011). Reconciling molecular phylogenies with the fossil record. Proceedings of the National Academy of Sciences, 108:16327–16332.

Nee, S., Mooers, A. O., & Harvey, P. H. (1992). Tempo and mode of evolution revealed from molecular phylogenies. Proceedings of the National Academy of Sciences of the United States of America, 89:8322–8326.

Rannala, B., & Yang, Z. (1996). Probability distribution of molecular evolutionary trees: a new method of phylogenetic inference. Journal of Molecular Evolution, 43:304–311.

Stadler, T. (2011). Mammalian phylogeny reveals recent diversification rate shifts. Proceedings of the National Academy of Sciences, 108:6187–6192.

Thursday, December 26, 2013

The Bayesian agent as a heuristic model

The commonly accepted justification of Bayesian inference, in terms of subjective personal probabilities, coherentist arguments, dutch books and all that, is mostly due to de Finetti, Savage and Lindley (see e.g. Goldstein, 2006 for a review).

The subjective Bayesian synthesis represents an impressive theoretical achievement. Yet I think it is dangerous to see it as providing foundations for Bayesian applied statistics. Such a foundational perspective, which amounts to embracing the subjective Bayesian paradigm a bit too wholeheartedly, corresponds to the kind of ideological attitude I was alluding to in the previous post (in fact, perhaps any foundational argument is ideological).

Rather, I prefer to see the subjective Bayesian theoretical synthesis as a model. By this I mean a prescriptive logical model of how a rational agent can conduct inference and make reasonable decisions in the presence of uncertainty. By identification with the rational agent, we can then use this model as a heuristic tool for us to figure out how, as a gambler, as an economic agent or even as a scientist doing fundamental research, we could, well, conduct inference and make reasonable decisions in the presence of uncertainty.

However, in spite of all its theoretical and axiomatic justifications, this is just a model, with all the idealization and the pragmatic considerations that come with it. In fact, the idealization becomes more than apparent when the time comes to define our priors and utilities: there, we immediately realize how much using the Bayesian paradigm in practical situations requires a fair dose of pragmatism.

A heuristic model is like a tool in the hands of a craftsman: something that you recruit temporarily for a specific task and that you use concomitantly with other tools. Something that requires some external know-how, some sense of the context.

The word 'heuristic' is also important here. The heuristic power of the subjective Bayesian agent model is great, making it possible to derive fairly sophisticated probabilistic models formalizing and addressing delicate scientific questions. However, again, it's only a heuristic tool. In a second step, one should perhaps complement this tool with additional arguments to back up what we have done.

Complementing the heuristics with additional arguments is even more important in the context of scientific research, for the following reason. Subjective Bayesian inference is supposed to be a model of a rational agent conducting inference and making private decisions in the presence of uncertainty. Using it to formalize, for instance, one's private gambling activity (a bit like Nate Silver, 2012) is relatively unproblematic: as long as it's all about my money, I don't need to give anyone any justification of why I used this or that particular prior or utility.

When we write a scientific article, however, we enter a new logical context: that of public scientific reporting. In this context, personal degrees of belief have no legitimate existence. I am not supposed to report on how much I am personally willing to bet on the monophyly of a clade (it would be idle anyway: we won't be able to determine whether or not I won my bet). Instead, I am supposed to give some meaningful and objective measure of the statistical support in favor of the monophyly of the group of interest.

This last point gives us a hint at what kind of argument we should develop in order to back up our Bayesian heuristics. Fundamentally, the only way to attach a clear, objective meaning to a statistical procedure is to refer to its operational properties (i.e. to how it behaves in practice under controlled conditions). And I have the impression that, in a statistical context, operational somehow means: frequentist.

All this is not new. For instance Rubin (1984) emphasizes the need to complement the classical Bayesian procedure with other non-strictly Bayesian tools (like posterior predictive checks). See also Cox (2006), suggesting that Bayesian procedures, "if formulated carefully, [...] may provide a convenient algorithm for producing procedures that may have very good frequentist properties".

Yet, from various recent discussions and readings, I have got the impression that we are still too often stuck in an "either frequentist or subjectivist" dichotomy, which suggests that these questions are worth revisiting.


Cox D., 2006. Principles of Statistical Inference.

Goldstein M., 2006, Subjective Bayesian data analysis: principles and practice. Bayesian analysis, 3:403-420

Silver N., 2012. The signal and the noise: why so many predictions fail -- but some don't. Penguin books. USA.

Rubin D.B., 1984. Bayesianly justifiable and relevant frequency calculations for the applied statistician. The Annals of Statistics, 12:1151-1172.

Wednesday, December 25, 2013

Breathing some fresh air outside of the Bayesian church

In a recent post, Larry Wasserman asks whether Bayesian inference is a religion. He goes on to suggest that, in itself, Bayesian inference is not a religion. However, he says, there is a minority of Bayesians who tend to behave as if they belonged to a sect. Apart from the fact that they tend to be very cliquish and aggressive, they obviously consider their statistical paradigm as absolute truth and are unwilling to entertain the idea that Bayesian inference might have flaws.

I have good reasons to agree with Larry Wasserman on this: simply because I used to be one of those "thin-skinned, die-hard" Bayesians not so long ago. Since then, I have lost my faith, in part because of all the dirty things I have seen (and done!) over several years of applied Bayesian data analysis, but also because, on a more philosophical front, I have become more pragmatic and more relativistic, even for logical matters. To be clear: I still find Bayesian data analysis useful in practice, and I will not stop using it anytime soon. It's just that I no longer believe in Bayesian inference as a system.

In any case, because I used to be one of those believers in the Bayesian Truth, I feel entitled to add a few words here. Unlike Larry Wasserman, who seems to believe that the partisan attitude of a minority of Bayesians has nothing to do with the content of the theory, I personally think that there is no smoke without fire. If some Bayesians tend to behave as if Bayesian inference were a religion (whereas non-Bayesian statisticians rarely do that with their own paradigm), perhaps this is because the philosophical underpinnings of Bayesian inference somehow give them a predisposition to behave that way.

And indeed, Bayesian inference often claims to be a coherent and complete theory of plausible reasoning. As such, if taken literally, it works like a closed system of thought.

Just look at how it is supposed to work. First, you have a prior, which you do not freely invent, but which you find out by introspection. Second, your mind, inasmuch as it is rational, is compelled to follow the laws of probability as its only guide for rational thinking in the presence of uncertainty. Any new empirical observation automatically triggers an update of your probabilities according to Bayes' rule. Third, given your posterior probabilities and your utilities (which you also found out by introspection), you choose the option that will maximize your posterior expected utility.
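For a toy discrete case, these three steps can be sketched in a few lines (the hypotheses, likelihoods and utilities are all invented for illustration):

```python
def bayes_update(prior, likelihood):
    """Posterior over hypotheses by Bayes' rule: p(h|d) proportional to p(d|h) p(h)."""
    unnorm = {h: prior[h] * likelihood[h] for h in prior}
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}

def best_action(posterior, utility):
    """Choose the action maximizing posterior expected utility."""
    return max(utility, key=lambda a: sum(posterior[h] * utility[a][h]
                                          for h in posterior))

prior = {"H": 0.5, "notH": 0.5}          # step 1: the (introspected) prior
lik = {"H": 0.8, "notH": 0.2}            # p(observed data | hypothesis)
post = bayes_update(prior, lik)          # step 2: Bayes' rule -> {"H": 0.8, ...}
util = {"act":  {"H": 100, "notH": -50}, # step 3: (introspected) utilities
        "wait": {"H": 0,   "notH": 0}}
print(best_action(post, util))  # "act": 0.8 * 100 + 0.2 * (-50) = 70 > 0
```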

Altogether, a Bayesian does not seem to have much opportunity to think outside of the Bayesian box. Instead, at all steps, he is supposed to stay within the logical boundaries defined by his paradigm. As van Fraassen (1989) puts it, by doing so, he will "live a happy and useful life by conscientiously updating the opinions gained at his mother's knees, in response to his own experience thereafter".

In this sense, Bayesian inference is, perhaps not exactly a religion, but at least something like an ideology: something that is meant to take control of your rational mind, without leaving any room for you to think outside of the paradigm.

The frequentist school has a very different, and much less invasive, philosophical perspective on the question of the relation between the formalism and the state of mind of the statistician. This is very clear, for instance, in Neyman's writings, when he develops his idea of behavioral induction, as opposed to inductive reasoning. In his own words, "to accept a hypothesis H means only to decide to take action A rather than action B. This does not mean that we necessarily believe that the hypothesis H is true". In other words, the aim is to develop statistical decision procedures that have well-defined operational characteristics, not to tell you what you are supposed to believe.

I think I prefer this more agnostic philosophical stance. We need some space between our logical formalisms and our personal thoughts, some space for us to breathe. The current orthodox interpretation of Bayesian inference does not really allow for that.

Of course, all this is true only if you take the standard view literally. In practice, there are much more open and pragmatic stances with respect to the Bayesian statistical formalism (see in particular the view often expressed by Andrew Gelman, e.g. Gelman and Shalizi, 2012). In real life, most applied Bayesian statisticians do take some freedom and regularly allow themselves some fresh air and some free thinking outside of the system. That's also what I have done over the years, and it is probably the only way for us to arrive at sensible results anyway. But then, our philosophical stance should reflect the possibility of doing so. Otherwise, we maintain ourselves in an uncomfortable state of cognitive dissonance, between what we do in practice and what we say we do.

More fundamentally, by maintaining some distance between the formalism and our thoughts and, more generally, by questioning the commonly accepted view(s) of Bayesian inference, I am sure that we will gain some more interesting insights about its practical meaning.


Gelman and Shalizi, 2012, Philosophy and the practice of Bayesian statistics, British Journal of Mathematical and Statistical Psychology, 66:8-38.

Neyman J., 1950. First course in Probability and Statistics, New York, Holt.

van Fraassen B., 1989. Laws and Symmetry, p. 178.

Tuesday, December 24, 2013

Let's give it a second thought

This is my first attempt at blogging.

After ten years of applied Bayesian work in phylogenetics and in evolutionary genetics, I feel the need to step back and re-think the whole thing.

Undoubtedly, since its introduction in phylogenetics in the late 90's, Bayesian inference has become an essential part of current applied statistical work in evolutionary sciences. The success of Bayesian inference has several causes: computational (the possibilities offered by Monte Carlo), but also conceptual: personally, I would tend to emphasize the flexibility and the modularity offered by the Bayesian paradigm for designing more complex models.

On the other hand, there are still many problems, computational, theoretical and even foundational. There is of course the usual question of the choice of the prior, but there are many other open ends. Just think about how to conduct model comparison, selection and testing: to me, this represents a particularly problematic aspect of current applied Bayesian work.

On a more foundational note, although the question has been discussed at length in the phylogenetic literature of the early 2000's, it is still not quite clear how we are supposed to interpret posterior probabilities in practice. It is not even clear in what respect much of what we are currently doing in evolutionary sciences really conforms to the 'official' Bayesian subjectivistic philosophy.

Finally, we tend to oppose frequentist and Bayesian inference and to regard them as two mutually incompatible paradigms. However, in practice, there appears to be quite some overlap between the two approaches. Some keywords here: empirical Bayes, statistical decision theory, calibration. The empirical Bayes paradigm, in particular, offers an interesting synthesis, which I would like to explore in depth in future posts.

Thus, my dear Fellows, Bayesian cooks or distinguished Frequentists alike, please keep an eye on this blog and do not hesitate to join the discussion.