Wow, starting the ninth month of this diary.
A really big article came out on the JEPS blog featuring an interview with Jonathon Love, lead developer of JASP. A great interview, and it seems to be getting a lot of coverage. People really hate SPSS, and they seem to love JASP, so it looks like a revolution is coming! Hopefully JASP can keep kicking ass (I would bet on it).
Dave Harris kindly presents the Bayesian RPP data in a density plot.
This is a nice video interview with Brian Nosek about reproducibility and the implications of the RPP.
There’s been a lot of discussion in the ISCON and psych methods fb groups about the reproducibility project (of course). I’m going to link them here so I can find them again, but I’m not going to discuss them. Link 1, 2, 3, 4, 5.
Sanjay Srivastava shared his new blog today, in which he asks how much credibility we should give to the idea that hidden moderators can explain the “failures” to replicate. He makes a convincing case that they could not, unless we grant that the way we study these effects is entirely fickle and uncontrolled (footnote ***** in the post).
Andrew Gelman wrote an awesome blog post responding to a recent news piece that said psychology was not in crisis. He crushed it. Hard. Great comments on that post too.
H. Colleen Sinclair wrote a great post about her experience working on the RPP. She says that she and many other authors fully expected their assigned study to replicate.
Interesting take on why results are often spurious, on the Reach and Touch lab blog (the author is Tobias Heed).
And I think I can finally be in the club!
I agree with Sam that sharing data and code will be one of the biggest improvements we can make to current scientific practice. People will be more careful in general when they have to present their code in a readable format, and then people can implement their own personal priors in the analysis if they have the code and come to their own conclusions.
Simon Farrell asked me a few days ago to re-analyze the RPP data using more default priors. Here is the result. The numbers generally agreed, but there were more extreme BFs in the replication BF set since the priors are so precise.
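Just to illustrate for myself why precise priors lead to more extreme Bayes factors, here is a toy Savage-Dickey computation in Python. This is a made-up normal model with made-up numbers, not the actual RPP re-analysis: a sketch of the mechanism only.

```python
import numpy as np
from scipy import stats

def bf10_savage_dickey(xbar, n, prior_mean, prior_sd, sigma=1.0):
    """BF10 for H0: delta = 0 via the Savage-Dickey density ratio,
    assuming x_i ~ N(delta, sigma^2) and prior delta ~ N(prior_mean, prior_sd^2)."""
    prior_prec = 1.0 / prior_sd**2
    data_prec = n / sigma**2
    post_prec = prior_prec + data_prec
    post_mean = (data_prec * xbar + prior_prec * prior_mean) / post_prec
    post_sd = np.sqrt(1.0 / post_prec)
    # BF10 = prior density at 0 divided by posterior density at 0
    return stats.norm.pdf(0, prior_mean, prior_sd) / stats.norm.pdf(0, post_mean, post_sd)

# Same data, two priors: a wide default prior vs a precise
# "replication-style" prior centered on a previous estimate.
bf_default = bf10_savage_dickey(xbar=0.4, n=50, prior_mean=0.0, prior_sd=1.0)
bf_replication = bf10_savage_dickey(xbar=0.4, n=50, prior_mean=0.4, prior_sd=0.1)
print(bf_default, bf_replication)  # the precise, well-centered prior gives a far more extreme BF
```

When the precise prior happens to sit right on the observed effect, the evidence against H0 is much stronger than under the diffuse default, which is exactly the pattern in the re-analysis.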
Andrew Gelman wrote up that crappy Psych Science seeing/feeling blue study. Totally demolished. I can’t say I’m surprised that Psych Science is still pumping out crappy papers, but I am a little disappointed that people don’t see this kind of thing and recognize it for what it is (garbage).
Just saw this very nice post from Dan Mirman on his experience with one of his papers being a target of reproducibility in the RPP. He says “My experience with the reproducibility project was that they were extremely careful and professional.” Of course! The people in the RPP were mostly expecting the original results to replicate, and I’m not surprised at all they were cautious and engaging with the original authors.
I also downloaded a few random papers: an interview with Zellner, Haldane’s paper where he formally lays out the Haldane prior, Kass and Wasserman’s 1996 paper on defining priors, and a few reviews of Jeffreys’s book.
I tinkered with my posts at the Winnower to see if I could fix some of the formatting. While it is nice that they allow you to copy/paste the url straight from your blog, their program introduces a lot of formatting errors. Hyperlinks disappear, figures get rearranged, etc. Frustrating. I posted one of my blogs from earlier this year, the one on p-value paradoxes. Here’s the link.
“Pierre Laplace” posts on why one should not “think like a Bayesian and check like a frequentist”.
I feel this quote by Diaconis (1991) about parapsychology applies to a lot of psychology these days (feeling/seeing blue, etc.). One difference is that, while low, my priors on these kinds of effects are nowhere near as small as my prior for psi.
I also found this awesome access policy from the Annals of Human Genetics (on the front page of one of Jeffreys’s papers). I wish every journal did this. Papers from over 50 years ago have no business being paywalled, let alone at $30–50 per article.
Scott Alexander responds to a NYT article that claims psychology is not in a crisis. His main message is the same as the title of the piece: “If you can’t make predictions, you’re still in a crisis.” 100% agree with that sentiment. Prediction is essential, and the advent of expectations of quantitative prediction will be the turning point for psychology’s turnaround. Unfortunately, he makes an elementary omission here: “The first concern is that it ignores publication bias. One out of every twenty studies will be positive by pure chance.” The rate is 1/20 only if you are studying an effect that does not exist. So “one out of every twenty studies” should have this added on: “in which we are studying nonexistent effects”.
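This point is easy to check by simulation. A quick Python sketch (arbitrary effect and sample sizes of my own choosing): the 1-in-20 positive rate holds only when the null is true; for a real effect, the positive rate is just the study's power.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def positive_rate(true_effect, n=30, reps=2000, alpha=0.05):
    """Fraction of two-sample t-tests that come out significant at alpha."""
    hits = 0
    for _ in range(reps):
        a = rng.normal(0.0, 1.0, n)
        b = rng.normal(true_effect, 1.0, n)
        if stats.ttest_ind(a, b).pvalue < alpha:
            hits += 1
    return hits / reps

r_null = positive_rate(0.0)  # effect does not exist
r_real = positive_rate(0.5)  # effect exists (d = 0.5)
print(r_null)  # ~0.05: one in twenty, as advertised
print(r_real)  # much higher: the rate is now the power, not alpha
```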
Ed Yong also responded to that piece, saying it was irresponsible to sweep the problems under the rug.
Yesterday and today I went on a massive download spree from projecteuclid. So much so that my IP address was blocked from downloading any more articles (multiple times). Apparently they think if you download 100 pdfs from their site in 2 days that you’re not doing so with good intentions. Well, in fact, I plan to read all of these papers at some point. So stop blocking me!!! All I want to do is read awesome things, is that so bad?
Today was a holiday, so I didn’t do much work. Ooops.
And also this from Felix Schonbrodt (in German; I used Google Translate).
I shared this really interesting piece of trivia on twitter: “Bayesian statistician Harold Jeffreys effectively used the h-index 50 years ago to grade his cycling skills.” Edwards writes to Nature after their piece on the h-index, saying Jeffreys used this decades ago to keep track of his cycling prowess. Pretty cool!
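For fun, the h-index rule (your h-index is h if at least h of your items have a count of at least h) is simple enough to sketch in a few lines of Python. The ride lengths below are made up for illustration, obviously not Jeffreys’s actual cycling log:

```python
def h_index(counts):
    """Largest h such that at least h of the counts are >= h
    (citations per paper, or, Jeffreys-style, miles per ride)."""
    counts = sorted(counts, reverse=True)
    h = 0
    for rank, c in enumerate(counts, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h

rides = [23, 12, 9, 6, 5, 3, 1]  # hypothetical miles per ride
print(h_index(rides))  # 5: five rides of at least 5 miles each
```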
Also, this awesome quote from Howie (2002), page 118: “Like an alcoholic uncle, [Bayes’s theorem] could be turned to in emergencies but was an object of shame nonetheless.”
Interesting post from Neuroskeptic. While I appreciate the argument, he is definitely wrong about the average power being 92%. The average power is way lower, because published effect estimates are inflated by selection for significance. But perhaps that is the point.
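A quick toy simulation (my own made-up numbers, not Neuroskeptic’s) of why power computed from published effect sizes overstates true power: conditioning on significance inflates the estimates that make it into print.

```python
import numpy as np

rng = np.random.default_rng(7)
d_true, n_per_group, n_studies = 0.3, 20, 50000
se = np.sqrt(2.0 / n_per_group)            # approximate standard error of d-hat
d_hat = rng.normal(d_true, se, n_studies)  # observed effect sizes across many studies
significant = d_hat > 1.96 * se            # crude directional significance filter

mean_all = d_hat.mean()              # the full literature, file drawer included
mean_pub = d_hat[significant].mean() # only what gets published
print(mean_all)  # ~0.30: all studies together recover the truth
print(mean_pub)  # much larger: the significant-only record is badly inflated
```

Power calculations based on `mean_pub` rather than `d_true` will claim far more power than the studies actually had.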
Troubling: “With more noise in the process, journals will be more likely to judge submissions using the reputation of the author’s university as a proxy for quality, with troubling consequences for early career researchers at lower-ranked institutions.”
Spent the day writing. Not much to report. Working on the paper with Daniel, and working on the data with Joachim.
This was very relevant to my application period coming up.
“Is college tuition too high?” Unequivocal yes.
EJ shares his new course materials, “Good science, bad science”. Looks great! Pablo Bernabeu finds a funny bit, “The data can be generated automatically (for instance using R) or Stapel-style (by hand)”. OUCH.
This twitter thread was somewhat frustrating. I think, despite the quips about clarity, we were not making ourselves clear in the conversation.
Daniel asks, “I’ve never read a convincing criticism against the correct use of p-values. If you have, I’d love a link to the paper.” My answer is to read Royall. There are a lot of different discussion threads on that tweet.
Also doing some reading of the Fisher correspondence collection. Some really great stuff in there. I love that quote from Jeffreys. He was so clever. It’s really, really cool to read direct correspondence between authors: you skip the formalisms present in papers and hear their unedited thoughts.
Speaking of clever, I hope to one day have titles as clever as Frank Lad.
Reading Lindley’s 2004 paper, “The wretched prior,” and, while interesting, it’s very shallow. I think it was an invited paper, since it only touches on a little history and surface-level commentary. Oh well. It has a great quote: “Objectivity is merely subjectivity when nearly everyone agrees.” Awesome.
Very neat article by Joseph Berkson. He has a crude version of a likelihood-type argument: a low p is evidence against H0 only if a low p is frequent under a plausible H1, and equivalently for a high p. Reminds me of Daniel Lakens’s post from a few months ago. It’s similar in spirit, but Berkson is less technical. He is really just throwing out the idea, and I think he is one of the first to actually propose a critique like this. I’m also fairly confident that Berkson’s argument on page 332 surrounding table 1 is a prototype version of his “interocular traumatic test”.
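Berkson’s point can be boiled down to a single ratio: “p < alpha” is evidence against H0 only to the degree that significance occurs more often under H1 than under H0, i.e. the ratio of power to alpha. A quick sketch (the power values are arbitrary examples):

```python
# Likelihood ratio carried by the bare fact "p < alpha":
# how much more often significance occurs under H1 than under H0.
alpha = 0.05
lrs = {power: power / alpha for power in (0.06, 0.50, 0.90)}
for power, lr in lrs.items():
    print(f"power = {power:.2f}: significance is {lr:.1f}x "
          f"more probable under H1 than under H0")
```

With power of 0.06 the ratio is only 1.2, so a significant result barely discriminates between the hypotheses; with power of 0.90 the same “p < .05” carries an 18-to-1 ratio. Same p-value, very different evidential weight, exactly Berkson’s complaint.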
There’s this interesting (strange, snarky) blog post by Smut Clyde (a pseudonym, surely) about that feeling/seeing blue paper. He is funny, and the comments are too.
“We never imagined as we ran regressions on every possible combination of variables that one of them might lead to a publishable paper.” Also known as, what, you’re supposed to have a hypothesis before you test it? — (commenter Yastreblyansky)
That post points to the pubpeer discussion for that paper, and it does not look good for the authors. Calls for retraction, starting on roughly the same day that the paper was featured on Andrew Gelman’s blog (see entry from 9/3).
Also: a cool little Google group from the late 90s/early 00s about stats. Searching for Bayes reveals some very interesting conversations. I really love that Google did this whole group viewing setup thingy.
Interesting post from “Judge Starling” writing about the reproducibility project for cancer biology. He quotes Jeffrey Settleman, “You can’t give me and Julia Child the same recipe and expect an equally good meal.” And we are back to the Jason Mitchell defense. Skills of the replicators called into question, advocating the need for a magic touch to get results. It held no weight when Mitchell said it and it holds no weight when Settleman says it. There just simply is no reason to take a result seriously if only you can demonstrate it.
Another interesting thread from that Google group. What is a non-informative prior? I like reading these threads. It’s interesting to see how people with stats, engineering, and applied backgrounds approach these questions. One nice quote from Robert Dodier:
> The use of the phrase “noninformative prior” in an applied context
> is starting to make my skin crawl! [quoting a previous commentator]
>
> Well, I don’t know that it’s revolting or disgusting, but it does
> suggest there is room to build more information into the model.
> All this business about priors is sort of like brushing one’s teeth.
> You’re supposed to brush after every meal, and use dental floss too.
> Not everybody is motivated to do all that. But as much as you can do,
> is good for you, and doing nothing is definitely suboptimal.
I love that. Not everyone can be bothered to set up informative priors, but when you don’t you should recognize that you are working in a suboptimal fashion. I also liked this bit, also from Dodier, “Someone whose opinion I respect (Geoff Hinton) likes to say, “More data is always the best prior.””
I read a few more threads on that google group. Lots of interesting stuff!
I also watched a few Richard Feynman videos on youtube where he talks about quantum mechanics and how probability is inherent in the mechanics (the Heisenberg uncertainty principle, for example). Gave me something to think about re: personal probabilities. Is the probability present in quantum mechanics a personal probability? Or is it a property of the world? It was also interesting to see Feynman say that one cannot talk about something being either red or green before you measure it; you can only say that once you measure it, it can be red or green. Seems analogous, in a way, to thinking about statistical parameters. Some say there is a true parameter out there that we seek to measure, and its value is independent of our attempt to measure it; we simply try to measure it as best we can. But quantum mechanics appears to be saying, no, there is not a parameter out there with a true value that you are simply trying to figure out. It suggests instead that there is a measurement you can make, and that measurement leaves you with possible parameter values. Thinking about this makes my head hurt.
Tim van der Zee shared a neat new blog entitled, “how to science?” I 100% support students starting blogs. It’s such a great way to get your ideas out there and make connections. His blog is kind of a “here’s my journey” piece about how he currently thinks about stats. I also exchanged a few comments with commenter Bill on the blog. We obviously disagree on the value of Fisher-interpretation p-values, and that’s ok. I know a lot of people disagree with my interpretations of priors and posterior probabilities.
The Proceedings of the Natural Institute of Science (a parody science journal/news site, with acronym PNIS) wrote a brilliant piece where they “interviewed” Mark Leary of the APA. I shared my favorite quote on twitter, and apparently I am not alone in my appreciation for this kind of humor.
Gregory Hickok noticed something strange: the reproducibility project paper isn’t actually open access. They must have had it unlocked for a short period when it came out, because I was able to access it with no trouble before, but not anymore.
My copy of Jeffreys’s Scientific Inference (1937 edition) came today! It is so old, and yet seems to have never been read. I’m excited to have such a great old book to read and add to my bookshelf.
Not much else today, studying for GRE (verbal section, ugh).
Oh, I also found that Xenia Schmalz shared a new blog a few weeks ago that I missed. It’s a warning to remember that correlation is never causation. She uses some examples I had not heard before, and I found her points quite clear. Great post.
Today I read Berkson’s 1930 explanation of Bayes theorem. I shared a few parts that I thought were particularly good. They don’t seem original now, but for back then they would have been new to just about anyone reading them. I love reading old papers, the examples they use are so simple and clear.
Today was a writing day, working on the super secret bayes project mainly.
I also worked on a super secret other project, that is way way more secret than the bayes project. Sorry, but I’d get into a lot of trouble if I spilled these beans. But I can say that it involves discussion of stimulus sampling and fallaciously treating stimuli as fixed effects.
Daniel says on twitter: “In a ‘criticism’ on the #ReproducibilityProject, social psychologists show they still don’t understand statistics: [link]” and I think he is exactly right. That piece is just sad. Shame on you, Wolfgang.
Brent Roberts shared his new blog post, titled, “The new rules of research” and I think it’s pretty good. He proposes new rules, Bill Maher style, for psychological research: “The peer-reviewed paper is no longer the gold standard” and “Don’t trust anyone over 50” being just a couple.
Working on the super secret bayes project again.
Jeremy Fox, on his blog Dynamic Ecology, writes: “Post-publication review is still only for the scientific 1%, PubMed Commons edition” he continues, “Under post-publication “review”, the vast majority of papers don’t get reviewed in any meaningful way”. Interesting. I’ve done a little post-pub review, either in the blog or in this diary. I think the biggest thing I take away from it is that a LOT of crap makes it through pre-publication peer review, so we had better start catching it in post-pub peer review or nobody will.
Took a practice GRE verbal test today, scored pretty well. Around 80% or so, answering under the recommended pace. So I signed up to take the test in roughly 3 weeks (Oct 13). Shouldn’t be too hard to beef up on some vocab by then and score pretty well.
Gelman posted today on something I found really interesting. I often don’t include his posts in my diary unless they are really memorable or stick out to me. Today was a discussion of Morey’s blog from a while back, the frequentist cases against significance testing. Morey takes Neyman and turns him against Fisher. In the comments on Gelman’s blog, Mayo flatly disagrees with Morey’s interpretations, but I find her arguments sorely lacking. I mean, you can literally read Neyman and Fisher, and they say explicitly where they disagree. To say that in reality they didn’t disagree is crazy when they avidly argued with each other in and out of print.
Also, Andrew & Sabrina nail it today: “Psychology’s real crisis: stupid goddamn metaphor priming studies” I’ve been saying this too! It’s so annoying to see studies take a cutesy metaphor and try to turn it into a priming study. There is absolutely no underlying theory, and it just reeks of disingenuousness; they are playing the game, and we all know they don’t actually buy this stuff. Nobody can be that naive, can they?
I had the most frustrating time trying to book my flight to Amsterdam today. The bank would block the purchase and send me a fraud alert, I would contact them and tell them it was legit and they’d remove the alert, and then I’d go to purchase the tickets again and they would block it again! This went on and on for hours. So frustrating. In the end I got it though.
Reading the book I ordered a few days ago on the plane ride today. It’s really more of a proceedings than a book, but it has 5 articles and then a long discussion. There are papers by Jerome Cornfield, Bruce Hill, Dennis Lindley, Seymour Geisser, and Colin Mallows. On the plane I finished Cornfield’s paper, read the entire discussion, and got about halfway into Lindley’s paper before we landed. Really good stuff in there. The discussion was particularly great.
Today I am taking the day off. I am in Las Vegas and I walked ten miles around the city all day. So I didn’t really have time for much stat thinking/reading/doing. I mean, I guess casinos are like the ultimate applications of probability and statistics, but I’m really just here to play video poker at the bar and get comped drinks. And also to see the shows. Saw a great magician today, Mat Franco, who had some really awesome tricks.
Another day off 🙂
Reading that book again on the flight home. Thinking I’ll write a post about binomial vs negative binomial sampling and use a neat little example from Cornfield’s chapter.
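The standard version of the example (which may or may not be exactly the one in Cornfield’s chapter) goes like this: observe 9 successes and 3 failures, then test theta = 0.5. The two stopping rules yield proportional likelihoods but different p-values:

```python
from scipy.stats import binom, nbinom

k, n, r = 9, 12, 3  # 9 successes, 12 trials, 3 failures
theta0 = 0.5

# Binomial stopping rule (n = 12 fixed): P(X >= 9) under theta0.
p_binom = binom.sf(k - 1, n, theta0)

# Negative binomial stopping rule (sample until r = 3 failures):
# number of successes before the 3rd failure; P(X >= 9) under theta0.
# scipy's nbinom counts "failures before the r-th success", so we
# relabel the experiment's failures as scipy's successes.
p_nbinom = nbinom.sf(k - 1, r, theta0)

print(p_binom)   # ~0.073: not significant at .05
print(p_nbinom)  # ~0.033: significant at .05 -- same data, opposite verdict

# Yet both likelihoods share the kernel theta^9 * (1 - theta)^3,
# so the likelihood ratio between any two theta values is identical
# under either stopping rule (the binomial coefficients cancel).
def kernel(theta):
    return theta**k * (1 - theta)**(n - k)

print(kernel(0.75) / kernel(0.5))  # stopping-rule-independent evidence
```

Same data, same likelihood kernel, two different p-values straddling .05. That is the tension I want the post to turn on.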
Working on my research proposal for the NSF grad student grant. The main theme: tackling publication bias. Publication bias is arguably the most lethal threat to confidence in psychology’s findings. Another theme is statistical communication. What good is developing all these cool Bayesian methods if nobody knows about them?
I also tried to use an online binomial calculator, and it gave me a p-value of 1.2… What the hell? This is egregiously bad.
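My guess (purely speculative; I haven’t seen the calculator’s code) is that it doubles the smaller tail for a two-sided exact test without clamping at 1. That mistake produces an impossible p-value greater than 1 whenever the observed count sits near the null expectation:

```python
from scipy.stats import binom

# hypothetical inputs: 10 successes in 20 trials, testing theta = 0.5
k, n, p0 = 10, 20, 0.5
lower = binom.cdf(k, n, p0)      # P(X <= 10), about 0.588
upper = binom.sf(k - 1, n, p0)   # P(X >= 10), about 0.588
naive_two_sided = 2 * min(lower, upper)  # the buggy "double the smaller tail"
print(naive_two_sided)                   # ~1.18: an impossible p-value
print(min(1.0, naive_two_sided))         # a valid p-value must be clamped to [0, 1]
```

Doubling a tail that is already bigger than 0.5 is how a calculator could spit out something like 1.2 with a straight face.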
Reading more threads from that old statistics google group today. This thread is particularly interesting. And this one. Herman Rubin has some really good zingers. “Statistics has been called the religion of medicine, & the way that p values are used is only justified on religious grounds.” I really should be working on other stuff, I really should. But today I just couldn’t muster the motivation. Sometimes you just have to take a day to let your brain veg out watching football.
Also, “Pierre Laplace” blogged again. It looks like he is starting a series about Jaynes’s work. I am excited for this, since I find Jaynes to be an extremely fun read.
I think I’ll steal this phrase: “we are safe to assume all these experimental setups are bunk and produce stochastic pseudo-significant effects which may go either way.” So good.
Result: Non-significant p-value. Claim: No effect. Will things like this ever end? No, the problem doesn’t go away with a large sample. If something has a logical deficiency then just increasing your sample size won’t help.
Working on projects today, mainly the outline for the secret Bayes project. I now have 9 projects on my plate, and I might have spread myself a little too thin this month. Well, 8 if you don’t count taking the GRE in 2 weeks.
GRE studying and working on the super secret Bayes project and grant proposal. The proposal has to be reworked somewhat, since we won’t know who is reviewing it or whether they’ve seen a similar grant before. So now it’s the same basic idea but more about implementation, distribution, and use than development of the ideas. Which is okay 🙂 I think it will end up pretty solid, but it does need to be condensed a lot. That’s the thing with rough drafts, though: it’s easy to write long ones; shortening them is the hard part.
I added a new hub for the Understanding Bayes series (link). This basically explains what it is, why I’m doing it, and what people can expect for future topics. I think this will be useful for new visitors to my blog because they will see the tab at the top after the posts are long lost to the ever-updating (except this month) home page. It will also be handy to point at when people ask me questions about Bayes.
A neat new arXiv preprint was posted today, about Bayesian model averaging. I haven’t had a chance to read it in too much depth, but it seems cool. *thumbs up*
Finally made my way through my part of the JASP project (did I ever mention this?). I can’t believe it took me this long to get to it. I feel like I was the single holdout who hadn’t completed his assigned analyses. Feels really good to be able to cross this off my task list, though.
Worked a bit on the grant proposal. Joachim had some great comments that helped condense the material.
Managed to not study for the GRE today. Nice going, idiot.
Also there’s this nice page of latex tips (link). Very timely for me, as I’m currently learning this!
Some good GRE study today. Took a practice test, scored 56/60. Not bad but room to grow.
Also, this latest paper by Jeffreys is THE BEST. He is so clear, the examples are so simple, and yet he identifies the unstated assumptions in the criticisms of Bayes and shows that the critics are actually assuming Bayes all along. He first gives a simple derivation of Bayes’ theorem that I had never seen before. He covers risk and utilities (informally), the possibility of excluded alternative hypotheses, and the dissociation of action and inference. His discussion of Eddington’s penny example is truly genius.