Going to visit Irvine this summer! Very exciting stuff. I’m going during the Math Psych conference, so I hopefully can meet all the cool bayesians who will be there from around the world. I’ve never been to California before, so I may go explore on my own for a week or two before the conference starts. I’m going to try to go hike the national parks and check out the beach 🙂 Flights out that way from Austin are cheap, especially if I can choose whatever days I want to go out and come back.
This piece by Terry Burnham (here) has been going around twitter. A result a while back showed that harder-to-read questions led people to score higher on tests. Quick bits:
The original paper reached its conclusions based on the test scores of 40 people. In our paper, we analyze a total of over 7,000 people by looking at the original study and 16 additional studies.
Roughly 3 years later, Andrew Meyer, Shane Frederick, and 8 other authors (including me) have published a paper that argues the hard-to-read presentation does not lead to higher performance.
It highlights the real world importance of the replication movement. Policy actions were influenced by small sample, unreplicable research. We have a responsibility as researchers to provide evidence for our hypotheses that is strong if it could possibly be influencing children’s lives.
Topic of estimation vs model comparison came up on twitter today. I like model comparison personally, but they sort of do different things. And you run into all sorts of trouble if you try to use intervals for inference! I’ll have to write a longer explanation to give Frederik an example of where trouble brews.
So much stats I need to read. So much to talk about on twitter. So much to write about! All of it bayesian.
Finally ordered some books I’ve been wanting. Kruschke’s second edition DBDA, Lee & Wagenmakers’s Bayesian Cognitive Models, and McGrayne’s theory that would not die book. Have read snippets of some of these before but it will be good to have them in hand for the summer of bayes! Also did some serious work on my review, it’s nearly done! Woot woot.
The summer reading list is ready. Check out my Brain Bayes Factors page to see it! I plan on rotating through each book chapter by chapter. First goal is to finish Barber and Edwards’s books, and then transition into full bayes mode with Lee & Wagenmakers, Jaynes, and Kruschke. This summer I should have time to write reviews for the books in the priors section.
Dr. R posed this question today on twitter:
Q1. What is the Bayes-Factor for 9/10 sign. results with p < .05
(A) .10, (B) 1 (C) 10 (D) 100 (E) > 1 billion
Now if you know bayes you know this question makes no sense. And if not you can read the replies to that tweet to see why. Eventually he makes the analogy of significance tests as coin flips, but even that leads to complications.
The downside of bayesian stats is that one has to actually think. Eventually Dr. R will realize this, and he will see why many of his one-off criticisms are misguided. As Edwards, Lindman, & Savage (1963) write:
The Bayesian approach is a common sense approach. It is simply a set of techniques for orderly expression and revision of your opinions with due regard for internal consistency among their various aspects and for the data. Naturally, then, much that Bayesians say about inference from data has been said before by experienced, intuitive, sophisticated empirical scientists and statisticians. In fact, when a Bayesian procedure violates your intuition, reflection is likely to show the procedure to have been incorrectly applied.
Rolf Zwaan posted an interesting blog today (go read it). There was also a rather lengthy twitter conversation. And then there was another one that had me chiming in. Interestingly, Rolf decided to do sequential testing using Frick’s method that has never really been used. (Probably for a reason.) Oddly enough, Rolf reports both Bayes factors and sequential (unadjusted) p-values at each of his data collection batches (perhaps for education purposes?) but does not report effect sizes at every batch. The first and last batch have nearly identical effect size (9% vs 10%). His evidence starts off relatively strong for the null and then gradually walks its way over to about the same strength evidence for the alternative.
This is a textbook case of what Richard Morey explained a little while ago. Almost a perfect example. It’s uncanny. When sample sizes are small, small observed effects support the null as they should. As sample size increases and the effect stays the same, the evidence gradually becomes divergent from the null and supports the alternative. Rolf implied that he was hesitant to put much stock in the bayes factor during his interim analyses, but surely had they come out differently they wouldn’t even be interim. Had his data come out different (read: strong for the alternative) he would have stopped data collection and simply taken them as they were. Seems like Rolf has a bias here. “Take bayes factors seriously when it’s convenient for you” is not a real philosophy, is it?
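A minimal sketch of that pattern, under assumptions that are mine and not Rolf's design: binomial data, a point null of theta = 0.5, a uniform beta(1,1) prior under the alternative, and an observed proportion held fixed at 0.55 while the sample grows. The function name `bf10_uniform` and the sample sizes are purely illustrative.

```python
import math

def bf10_uniform(k, n):
    """Bayes factor for H1: theta ~ beta(1, 1) against H0: theta = 0.5,
    given k successes in n binomial trials (computed on the log scale)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    # Under a uniform prior every count 0..n is equally likely: m1 = 1 / (n + 1).
    log_m1 = -math.log(n + 1)
    # Under the point null the data are plain binomial with theta = 0.5.
    log_m0 = log_choose + n * math.log(0.5)
    return math.exp(log_m1 - log_m0)

# Same observed proportion (11/20 = 0.55) at three hypothetical sample sizes.
bayes_factors = {n: bf10_uniform(11 * n // 20, n) for n in (20, 200, 2000)}

for n, bf in bayes_factors.items():
    print(f"n = {n:5d}  BF10 = {bf:.3f}")
```

With the small sample the Bayes factor sits below 1 (favoring the null), and with the large sample it sits well above 1 (favoring the alternative), even though the observed effect never changed.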
[Rolf and I chatted in the comments of his post and came to an understanding. I don’t think he has a bias anymore!]
Will it ever stop? (this) I’ll say it again, for I don’t know how many times now: people who are rather intelligent, yet acquire only a shallow, surface-level knowledge of something and then start fighting against it with people who actually understand it, are doomed to make fools of themselves.
Dr. R wrote another long-winded post today (here) in which he argues that Bayesians have been calculating Bayes factors incorrectly. He makes the mistake of thinking that one can simply multiply a set of Bayes factors from independent studies to obtain a cumulative Bayes factor. Will this ever stop? Does he really think that he has found some fatal error in Bayesian inference that invalidates all of the work these people have done? Come on. It’s just sad to see him wasting all of this effort. See the entry from yesterday again, I’m tired of writing it over and over.
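A minimal sketch of why naive multiplication fails when the alternative is a prior distribution rather than a point. The setup is my own assumption, not taken from Dr. R's post: two hypothetical binomial studies with 15 successes in 20 trials each, a point null of theta = 0.5, and a beta prior under the alternative. The helper names `log_beta` and `bf10` are illustrative.

```python
import math

def log_beta(a, b):
    """Log of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def bf10(k, n, a=1.0, b=1.0):
    """Bayes factor for H1: theta ~ beta(a, b) against H0: theta = 0.5."""
    # The binomial coefficient appears in both marginal likelihoods and cancels.
    log_m1 = log_beta(a + k, b + n - k) - log_beta(a, b)
    log_m0 = n * math.log(0.5)
    return math.exp(log_m1 - log_m0)

# Each study analyzed on its own with the same beta(1, 1) prior.
bf_study1 = bf10(15, 20)
bf_study2 = bf10(15, 20)
naive_product = bf_study1 * bf_study2

# All the data analyzed together.
bf_pooled = bf10(30, 40)

# Study 2 analyzed with the posterior from study 1, beta(1 + 15, 1 + 5), as its prior.
bf_study2_updated = bf10(15, 20, a=1 + 15, b=1 + 5)
chained = bf_study1 * bf_study2_updated

print(naive_product, bf_pooled, chained)
```

The naive product does not match the pooled Bayes factor, but the chained version does. Multiplication works only when the prior for the second study is updated by the first.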
Also- Turned in my review today. Signed it! OPEN SCIENCE. Knowing that I would be signing my review made me adjust my language before submission. Reduce the snark, mainly. It is liberating to know that I’ve started my career off on a good foot, without feeling the need to hide behind anonymity. I think I will do this for every review that I can. It really did make me rethink what kind of writing I wanted to have attached to my name. I definitely took longer than most people probably take, but give me a break it’s my first one. 😛
Reading Senn’s (2011) paper, “You may believe you are a bayesian but you are probably wrong.” Interesting stuff. I think he poses an interesting question. To be a true bayesian, one must be okay having the same posterior distribution for a parameter when they’ve used a beta (40,10) prior with binomial data (10,40) as when they have a beta (25,25) prior with binomial data (25,25). That is to say, one must be perfectly happy ending up with the same posterior distribution coming from wildly different priors and wildly different data. But why is this a problem? If the data serve to shape our beliefs, then the fact that different beliefs have been shaped by different data into the same final belief is not that weird. In fact, we are operating as expected!
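A minimal sketch of the arithmetic behind that example, assuming "binomial data (10,40)" means 10 successes and 40 failures (my reading, not spelled out above). The function name `beta_binomial_update` is illustrative.

```python
def beta_binomial_update(prior_a, prior_b, successes, failures):
    """Conjugate update: a beta(a, b) prior with binomial data
    gives a beta(a + successes, b + failures) posterior."""
    return (prior_a + successes, prior_b + failures)

# beta(40, 10) prior with data (10, 40) versus beta(25, 25) prior with data (25, 25).
posterior_1 = beta_binomial_update(40, 10, 10, 40)
posterior_2 = beta_binomial_update(25, 25, 25, 25)

print(posterior_1, posterior_2)
```

Both routes end at the same beta posterior because the update only adds counts to the prior's parameters.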
Richard Morey nails it in this comment. If you think you can read a few papers by psychologists and completely understand Bayesian inference you’re wildly mistaken. In fact, anyone studying this stuff should especially realize how little they know after reading psych papers because the papers necessarily gloss over details in order to reach a broader audience.
Also- See the entry for 5/8 again.
Soooo my comments mysteriously “disappeared” from Dr. R’s latest post… How nice. Let’s all start deleting every comment on our blogs that we don’t like. Oh well, hopefully he got the message.
Also- book progress is being made. I finished chapter 5 of Edwards’s book but can’t read any more because I don’t have enough math under my belt yet to follow along. I’ll come back to it later when I’m smarter 🙂 I’ve also read 2 chapters each of Kruschke’s book and Lee & Wagenmakers’s book. Pretty good so far. I think my main challenge with the Lee & Wagenmakers book is going to be using JAGS instead of WinBUGS. Shouldn’t be too bad though, and it will be a good learning experience! [5/12 edit-just noticed the website provides JAGS code too. Nice!]
Working on writing a shiny app for my next blog post. It should be a fun learning experience! There are shiny lessons here that seem fairly comprehensive. Luckily I have Fabian’s shiny app from my likelihood post to work from as a starting point. Made it through 4 lessons so far. Pretty awesome! Lots to customize.
Jeff had an interesting project idea. An app that will let you build free-form priors and then fit a spline and then calculate a bayes factor. Very cool, I’ll have to think about how I could do that.
Spent most of my day working in R. Super fun stuff! I’m writing the programs for my next blog post, and I think they’ll be pretty awesome. I’ve pretty much finished the code for the beta priors and binomial data, and now I’m going to figure out how to do the normals. The normals will be much harder I think, but now I have a basic structure to work off of already so that will help a lot. Then once I have the normals code written and working I need to port it all to shiny. I had considered doing it all in shiny from the start, but I don’t think that would be the most efficient way for me to get it working. I think getting the code to run and produce plots is first priority, and then port it over and make the user interface.
I still feel like I’m pretty slow at writing these scripts, but it’s a learning experience 🙂 Once I have more practice with this it will probably become 2-3 times faster. But then again, they will probably become more complicated so they may be just as slow 😛
So John Kruschke and Torrin Liddell posted a paper to SSRN titled, “The Bayesian New Statistics: Two Historical Trends Converge”. Kruschke also posted an excerpt to his blog so that readers could comment. My general impression of the paper was: meh, nothing new here. But I don’t think that was the point, so perhaps my reaction isn’t quite fair.
The paper goes like this: characterize some history, lay out the problems with classical statistics, and then highlight the pros of bayesian estimation. Everything you want to know is in the posterior. Cool examples from hierarchical modeling. Then a quick review of bayesian hypothesis testing. “Unfortunately, the general problems with hypothesis testing discussed in the beginning of the article apply both to NHST and to Bayesian hypothesis testing”. Ends by appealing to estimation thinking.
So here’s my impression. I think the intro and general takedown of classical methods was solid, except for the definition of confidence intervals. The way they describe it (all values inside the 95% CI are not rejected at the .05 level) is only one possible type of CI. I always think of the example from Richard Morey (paraphrasing), “If I draw an integer at random from 1-20, and any time it is 1-19 my CI is the entire real line and if it is 20 then my CI is just the point 0, this has coverage of .95 by definition. But it is not useful.” I think they do a great job highlighting the benefits of estimators (the analysts, not the technical term) going fully bayesian. The review of bayes factors, while generally negative, is more of the “of course, that’s the point” kind of thing I’m getting tired of. Here are some examples:
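A minimal simulation of that useless-but-valid interval, reading the draw of 20 as giving an interval that contains only the point 0 (my interpretation of the paraphrase). The true parameter value, seed, and function names are arbitrary choices for illustration.

```python
import math
import random

def trivial_interval(draw):
    """Return the 'interval' for one draw from 1..20; it ignores the data entirely."""
    if draw <= 19:
        return (-math.inf, math.inf)  # the entire real line
    return (0.0, 0.0)                 # just the point 0

def simulated_coverage(true_value, n_draws=200_000, seed=1):
    """Fraction of repetitions in which the interval contains true_value."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        lo, hi = trivial_interval(rng.randint(1, 20))  # randint is inclusive
        hits += lo <= true_value <= hi
    return hits / n_draws

coverage = simulated_coverage(true_value=3.7)
print(f"simulated coverage: {coverage:.3f} (exact: {19 / 20})")
```

The procedure covers any nonzero true value in 19 of every 20 repetitions, yet no single interval it produces tells you anything about the parameter.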
In other words, the value of [the BF] can change dramatically depending on the particular form of the alternative prior distribution… It is therefore crucial to use a theoretically meaningful and informed prior distribution for the alternative hypothesis.
Second, the BF does not indicate the posterior odds, and users must remember to take into account the prior odds of the hypotheses.
It is important to use an alternative prior distribution that expresses a theoretically meaningful distribution that previous data could have generated. … It is important to incorporate the prior probabilities of the hypotheses and not rely only on the Bayes factor to make decisions. Perhaps most importantly, do not make a black-and-white decision your goal and ending; also estimate the magnitude and uncertainty of the parameters…
I agree with that stuff. Surely I’m not the only one that all of that is obvious to? The piece of commentary on bayes factors is less of a criticism and more of a reminder not to use them like an idiot.
This has become a nice little twitter thread. Do researchers need to make decisions? Should researchers be defining utilities to decide what studies to run? They certainly could be doing full decision analysis, but I imagine right now people are deciding based on mostly intuition. And that makes sense to me. But these actions or decisions are, as EJ implies, personal for the researcher and their program. Should that be the focus of the analysis (as in Neyman-Pearson testing)? Other people make other decisions after reading your study, and to do that all they really need to know about your individual study is the evidence that came out of it.
When reporting and interpreting results from your study, nobody really cares if you personally are planning to do more studies after this one. Nobody really cares if you are weighing the costs/benefits of a conceptual replication, or if you are thinking another study might improve the probability your paper gets published or get you another grant. Or whatever. The only thing that matters to anyone else is the evidence they have now so they can use it as they please.
If people want to make decisions based on your evidence, then by all means they should do as they like. There are about infinity ways they might choose to do that. The only way to actually interpret data as evidence correctly is by bayesian methods, so we should all be reporting them in our results section. 😉 Feel free to make decisions about running studies (or about taking fewer showers to avoid being lonely, or putting a box on your desk to promote creative thinking) however you please; there’s no wrong way to do that. I personally have never made a decision using a full Bayesian analysis. I’ve run most studies because someone told me to run them 🙂
If you want to include a section in your paper’s discussion with a decision analysis that other researchers might use to weigh if they want to jump in, then that would actually be really cool. And really useful. I don’t know if I’ve ever actually seen that done. Perhaps more people should do that! But this personal decision plan doesn’t have any bearing on the evidence you’ve accumulated so far, and that’s really what I think other people are interested in when they read your paper.
Also- started reading “The theory that would not die.” It’s pretty good. Very well written, and I like the writing style a lot. Very interesting to read the preface and see that the author (McGrayne) was in close contact with many prominent bayesians throughout the book’s production. For me that lends a good bit of credibility to a book by an author I haven’t read before.
Had a good bit of time to read in the car today, so now I’m about 1/3 of the way through McGrayne’s book. It’s fascinating stuff! We’ve made it through Bayes, Laplace, Turing, Good, Bailey, Savage, Fisher, Neyman, and on to Lindley. This book is absolute gold. Anyone interested in this topic should read it to gain an appreciation for how hard bayesians used to have it.
Also- Add another entry to the “That’s the whole point!!” bin.
I came up with a better way to scale the likelihood in the binomial data calculator I’m building. Instead of scaling the maximum to match the posterior maximum, I’m going to plot the likelihood as a posterior distribution that started as a beta(1,1) prior. This means the likelihood will effectively be replaced by the beta(1+k,1+n-k) posterior. This can work because, under a beta(1,1) prior, the posterior takes the exact shape of the likelihood, so it provides a natural reference scale. Effectively, “This is the result if you had used a uniform prior and ‘let the data speak for itself’”, so to speak. Jeff suggested using the marginal likelihood, as that will provide the natural updating factor. I think that’s a neat idea actually, I’ll have to think about how to actually do it though! Still learning as I go. Hopefully I can start the normal data calculator soon; that’s where the real fun will be.
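A quick check of that claim, as a sketch in Python rather than the R code from the calculator itself. The data values (k = 7, n = 20) and the function names are made up for illustration. The beta(1+k,1+n-k) density should equal the binomial likelihood times a constant, and that constant works out to n + 1, the reciprocal of the marginal likelihood under a uniform prior.

```python
import math

def binomial_likelihood(theta, k, n):
    """Binomial probability of k successes in n trials at a given theta."""
    return math.comb(n, k) * theta**k * (1 - theta) ** (n - k)

def beta_pdf(theta, a, b):
    """Density of a beta(a, b) distribution at theta, for 0 < theta < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta))

k, n = 7, 20  # hypothetical data
thetas = (0.1, 0.35, 0.6, 0.9)

# Ratio of the uniform-prior posterior density to the likelihood at several points.
ratios = [beta_pdf(t, 1 + k, 1 + n - k) / binomial_likelihood(t, k, n) for t in thetas]
print(ratios)
```

The ratio is the same at every theta, so the two curves have identical shape and differ only by the scaling factor n + 1.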
I’m a little over halfway through McGrayne’s book. Still loving it.
Also- worked on my program a bit, and I’ve added the normal (2 sided prior) program. Now I think I am ready to port to shiny and write a blog. Then I can keep adding features like uniform prior, half-normal, cauchy prior, etc.
There was a big big retraction that just came out. I’ll repost my fb comment here:
You may have heard of the study from last year where people would have a conversation and interact with a gay canvasser (a person running the study), and in turn had extremely positive shifts in attitudes towards gay culture and gay marriage. It was recently covered on This American Life.
Turns out it was all faked. All of it. One of the authors just made up all of the data. The report is damning, and the retraction memo on the last page is quite clear:
Devastating. This was an uplifting study, but now it seems so obvious that it was too good to be true (huge, long lasting attitude shifts from just a 20 minute conversation on a controversial topic? Nobody ever changes their mind like that with other topics). But hindsight is 20/20, and it is really quite difficult to tell when a result actually *is* too good to be true.
Keep working on this normal function. I’ve found out that this calculator doesn’t work so well with small values for the hyperparameters so I’ve got to figure out how to fix it. Also need to figure out how to set the x axis dynamically so that it looks pretty. Work in progress!
Also- found this excellent, excellent, excellent reading list on Fabian Dablander’s github. I considered sharing on twitter BUT you never know what people want shared. This page also said it was a work in progress so I’m hesitant to share without asking first. I know I wouldn’t want people sharing my program-in-progress that I’m working on without at least letting me know!
Wrote a blog after a reader emailed me. Check it out!
Did some more reading of McGrayne’s book today. This is so great.
More reading of McGrayne. Still love it. Can’t recommend it highly enough. She has a real way with words and I think even non-stat-nerds would enjoy this book!
Thinking more about my next blog today. Not much to report.
No stats today. Taking a break to go to the lake 🙂
Michael Inzlicht writes to twitter asking for advice: When do you start to worry about just-significant p-values? Then people start wondering when we should be confident in asserting biased reporting. I think it is kind of funny that we take p-values with a grain of salt, but then we look at their distributions like they mean anything. If p-values were meaningful we’d take them seriously already! Trying to find rules for when we conclude biased reporting has occurred is just plain silly, especially when it comes in the form of a makeshift p-value. “How frequently would we find this pattern of p if the null is true/ bias in reporting/ true effect?” It’s all just so ad-hoc I can’t stand it.
A family friend has been in the hospital, so I’ve been spending most of my time there. Not much statistics except for a bit of reading of McGrayne. Almost done with the book! I did my first reading for pleasure at 4am one of these days, as I stayed up for my night shift with my friend in the ICU. He’s doing ok, so please nobody worry after reading this 🙂