April 2015


Reading more of Edwards’s book on Likelihood. He’s a very clear writer, so it’s super easy to follow along. He was a geneticist so it’s less heavy on the underlying foundational theory and bigger on the examples and practicality. It’s actually quite insightful, even compared to Royall’s book on the same topic. That may be because Royall is much more polemical than Edwards, but either way I feel like I’m learning a lot. Also reading Jackman’s Bayesian analysis book. His writing style feels familiar to me: very conversational, which makes his arguments easier to follow.

Also- have had 2 reviews on the Winnower piece, so I should get started on updating it and incorporating their feedback.


Thinking about what to write for my next blog post. I think it’ll be about looking at likelihoods. They are so important to understand if you want to understand bayesian inference that it just makes sense to write a post explaining how they work. I wrote a little R function that graphs likelihoods for the binomial function and spits out likelihood ratios. I want to add a bit that highlights the relevant points on the likelihood function but that might be too hard for now. We’ll see. Here is the code so far:

## Returns the likelihood ratio for p1 over p2
## s is the number of successes in n trials.
## p1 is parameter value 1 and p2 is parameter value 2 (for specific comparisons)
LR <- function(s, n, p1 = .5, p2 = .25) {
  Ratio <- dbinom(s, n, p1) / dbinom(s, n, p2)
  ## Likelihood curve, scaled so that its highest plotted point equals 1
  curve((dbinom(s, n, x) / max(dbinom(s, n, x))), xlim = c(0, 1),
        ylab = "Likelihood", xlab = "Probability of success", las = 1)
  Ratio  ## return the likelihood ratio, as the header comment says
}


Worked a little on the structure of the likelihood post. It’s still changing more and more as I think about what to write, so maybe I should just get started writing and restructure later 😛


Got started on the likelihood post. It’s coming along pretty well. I have written a function that now plots the likelihood curve, places dots on the hypotheses of interest, and draws lines from point to point. I stayed up way way way too late working on this R code, but man is it cool as hell. I took some pages from the clean graphs compendium, written by EJ Wagenmakers and Quentin Gronau. Adding the dots was surprisingly easy, but adding the extra lines gave me some hiccups. Eventually I got it though!
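For anyone curious, here is a minimal sketch of that kind of function. It is not the exact code from the post: the name `plot_LR`, its arguments, and the styling choices are assumptions I made for illustration, and it only uses base R graphics.

```r
## Hypothetical sketch: plot a binomial likelihood curve, mark two hypotheses,
## and return the likelihood ratio for p1 over p2.
## s = successes, n = trials, p1 and p2 = the two parameter values to compare.
plot_LR <- function(s, n, p1 = .5, p2 = .25) {
  peak <- dbinom(s, n, s / n)             # likelihood at its maximum (p = s/n)
  L1 <- dbinom(s, n, p1) / peak           # scaled likelihood at p1
  L2 <- dbinom(s, n, p2) / peak           # scaled likelihood at p2
  curve(dbinom(s, n, x) / peak, from = 0, to = 1,
        ylab = "Likelihood", xlab = "Probability of success", las = 1)
  ## Dots on the two hypotheses of interest
  points(c(p1, p2), c(L1, L2), pch = 21, bg = "grey", cex = 1.5)
  ## Connect the dots: horizontal from p1 to p2 at height L1,
  ## then vertical at p2 between heights L1 and L2
  lines(c(p1, p2), c(L1, L1), lty = 2)
  lines(c(p2, p2), c(L1, L2), lty = 2)
  L1 / L2                                 # scaling cancels, so this is the plain LR
}

plot_LR(7, 10)   # 7 successes in 10 trials, comparing p = .5 against p = .25
```

The vertical dashed segment is the visual version of the likelihood ratio: the bigger the gap between the two dots, the stronger the evidence for one value over the other.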


More work on the likelihood graph, and read a little more from books by Royall and Edwards. I don’t think I will include information in the post about likelihoods’ error rates or the universal bound, since that will just add more complexity than it is worth. It’s just so hard to explain without adding in a lot of math symbols that I don’t yet know how to insert into my blog. Also, I don’t know how to write a function for calculating the error rates so it really doesn’t help anyone do anything. Once I can figure out the function I may post and share it, but with so few people actually using pure likelihoods it might not be worth the trouble. Maybe once my job ends next month and I have loads of extra time I can jump in and figure it out. For now I’ll keep the post focused on what likelihoods are, why they are inherently relative in nature, and what they look like. My first real blog post in quite a while, it feels good to put something together like this again. It’s a lot of hard work but it is really fun. I’m challenging myself with this R code and writing on a subject that is a little esoteric.


Reading more of Edwards. He is not a fan of Bayesian inference. He does not think it is kosher to place probabilities on hypotheses unless they are based on a physical chance setup. Like mice genotype pairings or some such thing. It’s an interesting thing to read someone who is diametrically opposed to what you believe, but I’m more about bayes factors which are intimately tied to likelihoods.

Also- might have broken my foot today. Not exactly stats diary related but it hurts like hell.


Foot update: Doc said it might be broken. Got some xrays done but didn’t hear results yet. It still hurts like hell, unsurprisingly.

Coursera courses started again this week but I have been a bit busy. I really should get started tomorrow though if I want to have any chance of finishing the assignments on time this week. I’m taking “getting and cleaning data” and “exploratory data analysis”. This might be a rough month since I’ll be finishing up my job at the same time but I think I can get this stuff done. I just need to work on time management and doing small bits every day. Starting tomorrow…. or Thursday…. Okay okay I’ll start tomorrow.


Coursera update: So the exploratory data analysis class starts week 1 off with a peer assignment. Quick start! I’ll likely end up doing most of the work for both classes this week on Saturday because this has been a hectic week so far. Can’t be that hard, the plots look easy enough. It’s all about making time to do the work!


Lots and lots of stuff happening today. Uri Simonsohn posted a blog critiquing default bayes factors. Jeff Rouder and Joe Hilgard wrote a reply. Lots of discussion about it on Twitter. I am not sure I understand Uri’s critique. Or, no, I understand it but I don’t actually get it. He says default bayes factors are prejudiced against small effects. Bayes factors aren’t prejudiced for or against anything. They take the models that have been specified and they see which is better supported. If you feed in a model that doesn’t make sense then the BF isn’t prejudiced, you just need to specify a model that makes sense. If a Cauchy prior with .7 scale parameter (the default) isn’t relevant for your purposes then don’t test it. If it is, then do. I hardly think the bayes factor can be criticized for failing at something that it doesn’t claim to do.

If your model says small effects are not very likely, and you observe a small effect, you get penalized. As you should! You said it has small probability and yet here it is in all of its small effect glory. If small effects are relevant to you then assign them higher probability! God I feel like this point is being missed so hard and so often.
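The same logic shows up in a toy binomial example. This is my own illustration, not the Cauchy default from the discussion: a point null p = .5 is compared against two alternatives, one that spreads its prior over all of (0, 1) and one that concentrates it near .5 (so it expects small effects). The function name `bf10_beta` and the Beta priors are assumptions made just for this sketch.

```r
## Hypothetical sketch: Bayes factor for H1: p ~ Beta(a, b) against H0: p = p0,
## given s successes in n trials. Under H1 the marginal likelihood of the data
## is beta-binomial; under H0 it is the ordinary binomial probability.
bf10_beta <- function(s, n, a, b, p0 = .5) {
  log_m1 <- lchoose(n, s) + lbeta(s + a, n - s + b) - lbeta(a, b)
  exp(log_m1 - dbinom(s, n, p0, log = TRUE))
}

## A small observed effect: 55 successes in 100 trials
bf_diffuse <- bf10_beta(55, 100, a = 1,  b = 1)    # little prior weight on small effects
bf_narrow  <- bf10_beta(55, 100, a = 50, b = 50)   # prior mass concentrated near .5
c(diffuse = bf_diffuse, narrow = bf_narrow)
```

With the same data, the diffuse alternative fares worse against the null than the one that actually predicted small effects. Nothing is prejudiced; each model is just being scored on what it said would happen.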

Uri also says, “I am not interested in how well data support one elegant distribution.” First of all, you are sort of committing the same fallacy that you criticize in the next line. Data cannot support “one elegant distribution”, in the same way they cannot “support the null” in isolation. They can only support one distribution vis-a-vis another distribution. This is the key! Anyone who says they have supported the null in isolation or the alternative in isolation should stop what they are doing, take a month off to read Royall’s book, and then start over. Although I fear the point of model comparison is so far over their head they could climb the Empire State Building and it would still be out of reach.

Also- More talk on reproducibility and replication on blogs as well. Simine Vazire replied to one of Sam Schwarzkopf’s recent posts. Brent Donnellan wrote a reply too. Then Sam wrote another reply. Lots and lots of discussion. Daniel Lakens is happy and thankful for his blogging tweeps.


Richard Morey wrote a great post explaining why bayes factors for small effects will look like bayes factors for null effects. It’s extremely clear and I think everyone should read it.

I saw a talk today by Konrad Talmont-Kaminski, and he was asked how philosophers can contribute to scientific dialogue (science vs religion, in this case). His main point was that philosophers can emphasize two main things: Learn to analyze arguments, and one must start by understanding the claims made by the other side. Otherwise you risk arguing against strawman arguments or trying to introduce evidence that the other side does not value or consider valid. Very interesting talk, and I think it applied to the current replication dialogue and the current bayes vs. traditional inference dialogue. I think Sam Schwarzkopf would be especially grateful if people tried to understand his claims rather than taking his critiques at surface level.

Similarly, we have people writing blogs and papers in which they criticize bayesian inference for things neither its proponents nor the theory claim to do. It leads to long-winded posts/papers critiquing strawmen “bayesian” arguments. Uri’s post was an example of this. People who have a seemingly surface-level understanding of a particular issue, but are otherwise incredibly intelligent, can make themselves look silly by arguing against points nobody is actually making. And then other people who trust their judgment, because they are actually brilliant in other contexts, take their conclusions at face value and assume they know what they are talking about.


Thinking of taking this month off of Coursera. I need to wrap up a lot of things at my job and I just don’t think I’ll have the time. But once I have days and days available to just study and work on this stuff I can pick it up again.


Daniel Lakens wrote a popular post today about how to efficiently collect data using sequential designs (frequentist and bayesian). I liked the post. I do wish people would stop simulating from “true” values and calling misleading BFs errors, but a small price to pay if people are at least talking about them. Also, a minor qualm I had is that I wish instead of plotting type 1 & 2 error rates he plotted the full curves. I think that plotting full curves really shows how important minimizing probabilities of obtaining weak evidence really is. If you plot the first look, d = 0 curve, then you’ll see a huge mound of probability in the anecdotal range. You don’t really see it that clearly from binning BF < .3 and BF > 3.
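To show what I mean by a full curve, here is a rough sketch using a binomial stand-in rather than the d = 0 t-test setting from Daniel’s post. Everything specific here is an assumption for illustration: a first look at n = 20 trials, a null of p = .5 that happens to be true, a uniform prior on p under the alternative, and bin edges at 1/3 and 3.

```r
## Hypothetical sketch: exact distribution of a Bayes factor at a first look of
## n = 20 binomial trials when the null (p = .5) is true. H1 puts a uniform
## prior on p, so its marginal likelihood is 1 / (n + 1) for every outcome.
n <- 20
s <- 0:n
prob <- dbinom(s, n, .5)          # chance of each outcome under the null
bf10 <- (1 / (n + 1)) / prob      # Bayes factor for H1 over H0 at each outcome

## Outcomes s and n - s give the same Bayes factor, so pool their probability
mass <- tapply(prob, signif(bf10, 10), sum)

## Full curve: probability mass at each attainable log Bayes factor
plot(log(as.numeric(names(mass))), as.numeric(mass), type = "h", las = 1,
     xlab = "log(BF10)", ylab = "Probability under the null")
abline(v = log(c(1/3, 3)), lty = 2)   # the usual bin edges

## Mass strictly between the bin edges, which binned error rates do not show
anecdotal <- sum(prob[bf10 > 1/3 & bf10 < 3])
anecdotal
```

In this toy setup a large share of the probability sits between the dashed lines, which is exactly the part you lose when you only report the two tails.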


Talking about making errors again with likelihood-based measures of evidence (BFs included). We are measuring evidence. We are not making choices of which one is true or false. We are collecting data, looking at the data, and seeing which hypothesis is better supported. If we get misleading evidence here we still have not made an error so long as we are correctly appraising the strength of the evidence. If we get weak/inconclusive evidence we are not making an error. If H1 predicts observation x with probability .5, and H2 predicts observation x with probability .5, and x obtains, then x is not evidence for one over the other. If we think this is an error, then we should reevaluate the predictions made by H1 and H2, otherwise our method is behaving just as we should expect it to.


Posted a new blog today. “Understanding Bayes: A Look at the Likelihood”. I’m really proud of this one because I wrote my first real (sharable) R function and had to learn a bit about LaTeX to write the equations. I think it turned out really solid.


Lots of nice comments saying they liked the post from yesterday, I hope it can be of use to someone. Dr R seems to take this post as a declaration that p-values stink and we should use bayesian inference. And while I do think they stink and I do think we should switch to Bayesian inference, that was not what the post was about at all. This post had a simple goal: Introduce the concept of likelihoods to people who might not have ever learned about it. In order to do bayesian posterior estimation or testing hypotheses with bayes factors one must incorporate likelihoods. This post also gives an R function that will allow people to play around with likelihoods and experience how the curve shape and values of the likelihood ratios change with different data and hypotheses.

He seems to hate alternative hypotheses. He also uses a strange definition of power in his arguments. I don’t quite get it. Power is a pre-experimental concept, useful for planning experiments. If the goal of the experiment is to accept one hypothesis or another with minimum risk of making an error, then one should maximize power for fixed alpha and proceed with Neyman-Pearson testing. But in this case one must specify an alternative hypothesis! Hopefully he can point me toward a reference that can clarify what he means, because as of now it makes no sense to me and we are just talking past one another.


Discussion continues on the likelihood post. I don’t like that it devolved into Dr. R’s confused rants about how inappropriate bayesian inference is. But I like to have open discussion on this blog as long as it is civil, and it definitely still fits that mold. It’s hard to engage in that conversation though because I feel like Uli is arguing against straw man arguments. For example,

“equate p with BF and you have a continuous measure of evidence, so there is just no difference between significance testing and Bayesian hypothesis testing”.

What???? The point everyone is making is that you can’t equate them because they are fundamentally different. And then,

The key difference is how BHT [bayesian hypothesis tests] and SST [statistical significance tests] deal with evidence where the null-hypothesis may be true. There is a temptation to use BHT to prove the null-hypothesis to be true, but I think this is impossible.

That was my entire point! Everything is relative! I only said it, oh I don’t know, at least 6 times? “Likelihoods are meaningless in isolation”, “Only by comparing likelihoods do they become interpretable”, “everything we need to know about the support the data lends to one hypothesis vis-a-vis the other”, “Again I want to reiterate that the value of a single likelihood is meaningless in isolation; only in comparing likelihoods do we find meaning.”, “We need to be careful not to make blanket statements about absolute support, such as claiming that the maximum is “strongly supported by the data”.”, “It cannot be said enough that evidence for a hypothesis must be evaluated in consideration with a specific alternative.” When someone doesn’t even read what you write it isn’t worth engaging in discussion.


Lots of good posts coming out. Rasmus Bååth wrote a very cool post on the Bayesian bootstrap. Richard Morey gives guidance for reporting and interpreting confidence intervals. Felix Schönbrodt talks about grades of evidence for likelihood ratios (and bayes factors).


Thinking about universal bounds with regard to likelihoods. They are very interesting. Bayes factors do not satisfy the universal bound when considering composite hypotheses according to Royall (2000) and Sanborn & Hill (2014). However, the simulations in Sanborn & Hill are not exactly valid, in my opinion. They go like this: 1) define the two hypotheses under consideration. One is a point null and the other is a composite hypothesis (distributional). 2) Select a true effect size δ that generates the data. 3) Run simulations and tally the number of times that the experimenter concludes in favor of one hypothesis or the other. Rouder points out that this is not a fair test of bayesian hypotheses and I agree. Here is why: For bayesians, we only have probability distributions that say what we believe about parameters. This means that there is no reason to think of finding the true δ that is underlying the data generation. Bayesians do not condition on any single unknown truth. All we have are our beliefs and the data. Define your beliefs however you will and then conditionally update them based on your data. Assuming a true unknown effect size, and then tallying the “errors” makes no sense. There are no errors, there are only our beliefs and our data and we do what we can with them through Bayes’ theorem.
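For contrast, in the plain likelihood setting with two simple (point) hypotheses the universal bound is easy to check directly, at least for binomial data. Below is a rough sketch under my own assumptions: the helper name `prob_misleading`, the example values, and the threshold k = 8 are illustrative choices, not something taken from Royall or from Sanborn & Hill.

```r
## Hypothetical helper: exact probability of misleading evidence for two
## simple binomial hypotheses. Assumes p_true generates the data and asks how
## often the likelihood ratio favors p_false by a factor of k or more.
## Both p_true and p_false should be strictly between 0 and 1.
prob_misleading <- function(n, p_true, p_false, k = 8) {
  s <- 0:n                                            # every possible outcome
  lr <- dbinom(s, n, p_false) / dbinom(s, n, p_true)  # LR in favor of the false value
  sum(dbinom(s, n, p_true)[lr >= k])                  # total chance of those outcomes
}

prob_misleading(n = 20, p_true = .5, p_false = .75, k = 8)
1 / 8   # the universal bound: the probability above cannot exceed 1/k
```

Whatever n, p_true and p_false get plugged in, the result should stay at or below 1/k. That guarantee is for simple versus simple comparisons only, which is why the composite case is where the arguments start.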


The likelihood post has gotten nearly 1400 views in 5 days! Wow. I honestly was pretty proud of that post but I didn’t think it would be nearly that popular. Holy moly. What to write next? Maybe one explaining priors and how likelihoods update into posteriors. Hmm but that’s kind of boring. Maybe an introduction to the terminology? That might be more fun. What are the confusing bits? Well there is the difference between prior probabilities/odds and prior distributions over parameters. Likelihood and marginal likelihood. I guess the symbols can get confusing for people not used to them (they were for me!!). Improper priors I guess count. Conjugate distributions would be good. I guess plots of the different distributions could help. Figure 4.12 in Dienes’s book, contrasting precision of different prior distributions. Basically any of the figures in that chapter. One-sided prior distributions vs one-sided p-value tests. Objective priors vs subjective priors.


I was inspired by a twitter conversation to write my latest post about how silly 1-sided p-values are if you try to use them as evidence. Check it.


Richard Morey, Rink Hoekstra, Michael Lee, EJ Wagenmakers and Jeff Rouder distributed their updated CI fallacy manuscript today. SO MUCH BETTER than the first circulation. The didactic style is much more warm and open than before. The first draft felt combative in a way. This one simply lays out the information and then at the end says, “look, bayes is better so use bayes”. I’m paraphrasing of course! But really, I thought this paper was excellent and should be required reading. The part tearing down omega squared confidence intervals was a real treat. Adding in the extra part (and taking out the relevant subsets confusing bits) means more researchers can relate to the material and not feel like Morey and colleagues are simply nitpicking or playing with toy examples.


I decided to put together a folder with some of my favorite bayesian papers. It doesn’t have all of the classics but I think if someone were looking for a place to start with bayesian inference this folder might help them out. Hopefully publishing companies won’t jump on me for sharing. This is a good faith effort after all.


A big first for my career today: I was invited to be a peer reviewer! So exciting. I can’t go into details (reviewer confidentiality and all that) but I think I can say that it is for the Journal of Mathematical Psychology. I definitely did not expect that when I woke up this morning but man did it bring my day up when I got the invitation. I feel like I’m a real scientist now contributing in a real way. Not that blogs aren’t real contributions, they certainly are. But there is a sense of legitimacy missing sometimes. But this could be a big boost for my CV, I don’t know anyone who has been an official reviewer before they even start grad school. A proud moment for me, and a chance to be thankful for the people who read my blog and seem to think highly of me. Thanks to all who are reading.


Made a plan to chat with Chris about Mizzou, probably will do that tomorrow. Didn’t have time for stats today (I know, weird) because it was just too too busy. We talked about a new coding rubric for a project at work, so I guess in a twisted way that is stat related. But not really. Boo.


Talked to Chris today about what Mizzou is like, and I really appreciate that he would take an hour to chat on the weekend. Seems like a nice place and a nice department.


Worked on my review a little more. Reviewing is hard! But I will take this as a challenge to meet, as a stepping stone for a start of a career. It does make me feel validated when I can point out errors in a manuscript written by really smart people. 😛


Doing some reading up on bayesian hierarchical models today. Reading the chapter by Rouder, Morey, and Pratte here. I feel like there just isn’t enough time in the world to learn everything I want to learn. My job ends this week though, so I should have plenty of time going forward. I wonder what my next blog will be. I want it to be something fun!


More work on my review today. No figures are labeled!! Very frustrating. Also, Daniel invited me to work on a pretty neat project. Details maybe to come soon, but for now I can say I can see some neat bayesian implementations he could do.


More work on the review. Not much to report. Oh, also put together a nice little folder on Bayesian history here. In case you are interested in the fascinating history of this topic.


Dr. R wrote a blog post today in which he denies the utility of bayes factors. But it is hard to take the arguments seriously when there are many glaring errors about basic terms in bayesian inference. For example, thinking that p-values are a part of bayes theorem. No, that part that looks like a p-value is actually a likelihood, as I explained to him here. Thinking that you can find the probability of the data under the alternative hypothesis by dividing the bayes factor by the p value. Thinking that prior probabilities are parameters to be estimated. No no no. He asked some bayesians to review and comment, but if the fundamentals of the argument are so off, then what is the point? This excerpt should make it clear that the rest is not worth reading. I’ve added emphasis in a few spots.

The problem for estimating the probability that the hypothesis is true given an empirical result depends on three more probabilities that are unrelated to the observed data, namely the probability that the hypothesis is true, P(H0), the probability that the alternative hypothesis is true, p(H1), and the probability that the data would have been observed if the alternative hypothesis is true, p(D|H1) [clearly this conditional is related to the observed data]. One approach to the problem of three unknowns is to use prior knowledge or empirical data to estimate these parameters. However, the problem for many empirical studies is that there is very little reliable a priori information that can be used to estimate these parameters.
