Another fairly long discussion on twitter today. I ended up getting hung up on a point and talking past someone, and when I finally realized it I felt a bit silly. But that’s life. My point was legit, though! Just not exactly related to the conversation. I’m going to coin another bad research euphemism (as if we need more): Questionable Research Conclusion (QRC). Applicable whenever someone accepts a null hypothesis after a nonsignificant p-value.
A paper recently came out from Alec Beall and Jessica Tracy, titled “The puzzling attractiveness of male shame.” This is another one of those studies that tries to relate ovulation (crudely self-reported) to a pattern of “behavior” (ratings of faces in this case). The sample is a batch of MTurkers, and the two groups, high vs low conception risk, have Ns of 29 and 45, respectively. Now, don’t get me wrong, their claims could be dead on center and seem to be motivated by a theoretical position. But there is just nothing you can learn here from 74 MTurkers who only vaguely remember when their last ovulatory cycle started. Any real signal has to overcome the usual measurement and sampling errors, and then on top of that it must overcome the crude ovulatory cycle reporting. Just an utter waste of time and money doing these studies. [Hal Pashler and I just asked the journal to explain themselves, stay tuned]
Also- surely the sample of participants is only rating one possible sample of face stimuli, right? The stimuli should be treated as a random factor in their analysis (a la Jake Westfall and colleagues’ recent papers). Otherwise the alpha rate can be hugely inflated and the findings won’t generalize to another study that uses different, yet similar and relevant, stimuli. Add this to the poor measurements above and you have a straightforward example of wasted resources.
Today Ulrich Schimmack posted a blog entry titled, “Bayesian Statistics in Small Samples: Replacing Prejudice against the Null-Hypothesis with Prejudice in Favor of the Null-Hypothesis”. He reviews an adversarial collaboration that came out recently (covered in some detail here) investigating whether or not horizontal eye movements can improve someone’s memory. He finds it odd that the researchers used a one-tailed positive prior for their Bayes factor rather than a two-tailed prior, since a two-tailed prior would have scaled down the evidence for the null, given that the observed effect was negative. But as I commented on that post, all of the researchers agreed before the study started that the effect should be positive under this theory, and furthermore it would be nonsense to test a two-tailed model that nobody claimed was even a reasonable representation of anyone’s belief or theory. Model testing is only valuable if the models being tested are principled and theory driven. Plus, using the data to posit a hypothesis that is then tested on that same data is double-dipping, and that is cheating for everyone (bayesian or not).
Also interesting was Richard Morey’s comment explaining how Bayesian power can be unintuitive. Since the stopping rule was “sample until passing BF = X”, the power was much higher than the small sample suggests. Because Bayes factors tend toward the “truth” as N grows, these designs and stopping rules can have extremely high power while still having the potential to stop at small sample sizes. This biases effect size estimation, but sometimes the goal is to measure support for one theory vis-à-vis another, not necessarily to estimate the effect size precisely.
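To convince myself of Richard’s point I sketched a toy version (made-up point hypotheses and thresholds of my own choosing, nothing like the actual eye-movement analysis): sample from Normal(0.5, 1), update a simple likelihood-ratio Bayes factor for H1: μ = 0.5 versus H0: μ = 0, and stop whenever the BF crosses 10 or 1/10.

```python
import math, random

random.seed(1)

def run_sequential(mu_true, mu_alt=0.5, threshold=10.0, n_max=500):
    """Sample one observation at a time from Normal(mu_true, 1) and stop
    when the Bayes factor (here just a likelihood ratio, since both
    hypotheses are point hypotheses) crosses threshold or 1/threshold."""
    log_bf, log_k = 0.0, math.log(threshold)
    for n in range(1, n_max + 1):
        x = random.gauss(mu_true, 1.0)
        # log LR increment for Normal(mu_alt, 1) over Normal(0, 1)
        log_bf += mu_alt * x - 0.5 * mu_alt ** 2
        if log_bf >= log_k:
            return "H1", n
        if log_bf <= -log_k:
            return "H0", n
    return "undecided", n_max

results = [run_sequential(mu_true=0.5) for _ in range(2000)]
power = sum(r == "H1" for r, _ in results) / len(results)
ns = sorted(n for _, n in results)
print(f"P(stop at BF >= 10 | effect real) = {power:.2f}")
print(f"median stopping N = {ns[len(ns) // 2]}")
```

The striking part is exactly what Richard described: the rule reaches the correct boundary roughly 90% of the time, yet the typical run stops after only a couple dozen observations.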
How to calculate a bayes factor on an odds ratio? Wagenmakers to the rescue. He and his postdoc will have a paper out soon about it, and currently it is calculable using the BayesFactor R package.
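While waiting on that paper, here’s a back-of-the-envelope sketch of the general idea (my own toy parametrization and priors, not necessarily what Wagenmakers’s paper or the BayesFactor package actually does): put a normal prior on the log odds ratio under H1, fix it at zero under H0, and integrate out the baseline log-odds on a grid.

```python
import math

def log_binom_lik(y, n, p):
    """Binomial log-likelihood, dropping the n-choose-y constant
    (it is identical under both hypotheses, so it cancels in the BF)."""
    return y * math.log(p) + (n - y) * math.log(1 - p)

def norm_pdf(x, sd):
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def bf10_odds_ratio(y1, n1, y2, n2, prior_sd=1.0, step=0.05):
    """Grid-integrated BF for H1: log odds ratio psi ~ N(0, prior_sd)
    versus H0: psi = 0, with a shared N(0, 2) prior on the baseline."""
    def inv_logit(x):
        return 1.0 / (1.0 + math.exp(-x))
    grid = [-4 + step * i for i in range(int(8 / step) + 1)]
    m0 = m1 = 0.0
    for a in grid:
        w_a = norm_pdf(a, 2.0) * step
        # H0: both groups share the same log-odds a (psi = 0)
        m0 += w_a * math.exp(log_binom_lik(y1, n1, inv_logit(a)) +
                             log_binom_lik(y2, n2, inv_logit(a)))
        for psi in grid:
            w = w_a * norm_pdf(psi, prior_sd) * step
            m1 += w * math.exp(log_binom_lik(y1, n1, inv_logit(a - psi / 2)) +
                               log_binom_lik(y2, n2, inv_logit(a + psi / 2)))
    return m1 / m0

# made-up data: 10/50 successes in one group vs 25/50 in the other
bf10 = bf10_odds_ratio(10, 50, 25, 50)
print(f"BF10 for a nonzero log odds ratio: {bf10:.1f}")
```

With these made-up counts (sample odds ratio of 4) the marginal likelihoods come out strongly in favor of a nonzero log odds ratio, as they should.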
Also, added many new papers to my “to read” list. Some about Bayesian Hierarchical modeling, some about other bayesian potpourri.
A cool new paper came out in Psych Science, by Julia Shaw and Stephen Porter, about implanting false memories. Really cool, you can see my tweets about this here. I progressively realized the stats were a bit incoherent, but this is like those old pavlovian experiments: the fact that it works at all is amazing, independent of the stat nuance.
Read Jeff Rouder’s “Optional stopping, no problem for bayesians” paper again. As I said the last time I read it, using a simulation with a known, fixed “true” effect to evaluate bayesian methods, under which “true” effects are themselves random variables, makes very little sense to me.
Also, read Jeff Rouder and Richard Morey’s 2012 regression BF paper today on my plane ride. What an amazing paper. Honestly, this paper may be a candidate for the one that most shapes my statistical outlook. Just a real gem.
A new paper came out prescribing “calibrated bayes factors”. As Richard Morey and colleagues (Wagenmakers, Rouder) point out, these “calibrated” bayes factors are total nonsense. They lead to absurd results that nobody can accept, such as “calibrating” H1 and H2 as identical, thereby implying a constant BF of 1 for all possible data. Just silly.
Reading Edwards, Lindman, and Savage’s 1963 paper that introduced psychology to Bayes. A really long paper, and some of it shows its age, but it’s a perspective changer. It starts off by explaining the “principle of stable estimation,” the idea that if your prior distribution on the parameter of interest is not in strong contradiction to the likelihood implied by the data, then it is usually overwhelmed by said data and we are brought to a consensus. The paper spends a lot of time justifying this, and has a lot of heavy stat notation that mostly can be skipped without missing the main points. It then goes on to bayesian hypothesis testing and the likelihood principle. A hard paper to get through, and very long, but well worth the effort.
Stephen Heard wrote a piece today titled, “In defence of the p-value“. I don’t think it is a very strong defense, honestly, but I may be biased against p. He first acknowledges the gigantic literature critiquing p-values, and then brushes it aside: “Now, the fact that this literature exists is understandable. After all, it’s rather fun to be iconoclastic”. Sorry, but this isn’t just rebelling against the establishment for its own sake; these are principled critiques of a method that many think is unsound. He lays out a general list of what p-values are not or cannot do. Having read Royall’s 1997 book and Gigerenzer’s papers, I think his list contains some misconstruals of p-values (or maybe not misconstruals; it isn’t clear which school of p he belongs to, but they are misconstruals for Neyman-Pearson, that’s for sure). I think I can link to this post and cover most of it. Long story short, you can’t mix and match Neyman-Pearson inductive behavior with Fisher’s significance tests a la carte and expect to get anything statistically coherent.
[update] I commented (link above) asking Stephen what school he sympathizes with, and he has confirmed #2 from my post (though he isn’t sure 2 and 3 are really that different; I can see where he is coming from).
Wrote a new blog entry, quoting Edwards, Lindman, and Savage’s 1963 paper introducing bayes to psychologists. What a phenomenal paper.
People say Bayes factors will be abused just like any statistic is abused. What is stopping folks from conducting their tests using the most generous prior after the fact? Well, if someone uses a parameter prior that is obviously too generous, everyone will notice, because priors are public. And then reviewers can simply ask for a robustness check if they think the prior is having an overly strong influence on the results. (Of course, the prior should have some influence, but robust results shouldn’t qualitatively change because of a few tweaks to it.)
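To make the robustness-check idea concrete, here’s a minimal sketch with made-up numbers (a hypothetical effect estimate of 0.4 with standard error 0.1), using the closed-form BF for a normal prior on the effect against a point null:

```python
import math

def norm_pdf(x, sd):
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def bf10_normal(effect, se, prior_sd):
    """Closed-form BF for H1: delta ~ N(0, prior_sd) vs H0: delta = 0,
    given an observed effect estimate and its standard error.
    The marginal likelihood under H1 is Normal(0, se^2 + prior_sd^2)."""
    m1 = norm_pdf(effect, math.sqrt(se ** 2 + prior_sd ** 2))
    m0 = norm_pdf(effect, se)
    return m1 / m0

# hypothetical estimate: effect 0.4, standard error 0.1
for tau in (0.5, 1.0, 2.0):
    print(f"prior sd {tau}: BF10 = {bf10_normal(0.4, 0.1, tau):.0f}")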
Also- reading Neyman’s 1957 paper, in which he lays out his philosophy of statistics. He frames it in von Mises’s “limiting relative frequency” terms. And funnily, he says, “This attempt is frequently viewed as not entirely successful.” Yes, quite. At least he admits that the critique is there and doesn’t brush it under the rug! I am more of a proponent of de Finetti’s and Savage’s and Lindley’s personal probabilities.
More good comments on the R-index blog. Bayes is beautiful!
Also, continuing on Neyman’s paper. Not sure I’m loving his arguments.
Good twitter convo while I was half-asleep this morning. I think bayes can be confusing, and if one isn’t versed in the likelihood principle and other such concepts, it does sound kinda fishy. Or maybe not fishy, but kinda ‘off’ or counterintuitive. But these principles only run counter to intuitions implanted through a primarily frequentist stat education. The beauty of Bayes, to me, is that it separates strength of evidence from the probability of obtaining misleading evidence. That is a key difference between a bayes factor and a p value and I love it.
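This separation is exactly Royall’s point: for simple hypotheses, the probability of obtaining evidence of strength k favoring the wrong hypothesis is bounded by 1/k, no matter what. A toy simulation (my own made-up point hypotheses, normal data) to check the bound:

```python
import math, random

random.seed(2)

def log_lr(xs, mu_alt=0.5):
    """Log likelihood ratio for H1: Normal(mu_alt, 1) over H0: Normal(0, 1)."""
    return sum(mu_alt * x - 0.5 * mu_alt ** 2 for x in xs)

k, n, reps = 8.0, 10, 20000
# Generate data under H0 and count how often the evidence is misleading,
# i.e. the likelihood ratio favors H1 by a factor of k or more.
misleading = sum(
    log_lr([random.gauss(0.0, 1.0) for _ in range(n)]) >= math.log(k)
    for _ in range(reps)
) / reps
print(f"P(LR favoring H1 >= {k} | H0 true) = {misleading:.3f} "
      f"(universal bound: 1/k = {1 / k:.3f})")
```

The simulated rate of misleading evidence comes in well under the 1/k bound. The strength of the evidence in hand (the LR itself) is one question; how often a design can mislead you is a separate one.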
Also, what’s up with Jerzy Neyman block-quoting a German paper at length? I mean, I get that he is quoting Gauss, but come on!
Presented the Dienes paper today. It went pretty well, I think; a better reception than I expected. I was expecting pushback on it being so subjective, but there wasn’t much criticism of that kind. A good point was raised, asking how using cutoffs of 3 and 1/3 escapes the tyranny of the .05 cutoff. It’s a really good question and I think it could use its own blog post, but I’ll be quick here. There is no mandate in Bayes to use hard and fast cutoffs, and in fact some people say we shouldn’t use cutoffs at all (here is a good discussion on Richard Morey’s blog).
Also- I gave the blog the makeover it desperately needed. I hope the readers like it!
A snapshot from my calculus book for the holiday. I think this book does a great job of explaining the concepts without getting bogged down in the heavy stuff, and it is really witty and fun to read.
I think it would have been so awesome to see Neyman, Pearson, and Fisher argue in real time on twitter and blogs. I think they would benefit a lot from the current climate, because it would be really socially awkward not to address each other when tagged in the same twitter thread; they would basically be forced to speak to one another!
Reading the articles I snapshotted in the above entry by the stat founders. Fisher comes off as a real curmudgeon and Pearson is very matter of fact. Neyman has an interesting writing style and I love how he goes point by point through Fisher’s criticisms and nails down the inconsistencies. I still don’t like the concept of “inductive behavior”, but I like alternativeless significance tests (a la Fisher) even less.
More thinking about treating stimuli as fixed factors after reading the paper Chris Chabris linked the other day by Wells and Windschitl. It threatens any validity of the experiment. Tal suggests that the fixed-effect fallacy for stimuli may play a part in unscrupulous piloting: waiting to run an experiment until finding a stimulus set that “works”. I hadn’t thought of it that way before, but it makes sense. Inflating the alpha rate by cherry-picking beneficial stimulus sets. Think I might write a blog post about this. Also wondering how one would meta-analyze crossed participant/stimulus random effects; I don’t know of a method for it, but I’m still new to this literature. Also, Tal let me in on a secret that I can’t tell people about (I assume, he DM’d it to me) but it’s gonna be good 😉
Also- added my open blogging page at the top. Check out my works in progress! Even if for some the progress is only that I had the idea.
Thoughts about last night/this morning’s twitter conversation: Stopping rules seem to be a hairy subject. While it is true (as Tal pointed out) that sampling until BF > X introduces bias into the posterior, it is still a valid posterior. As I state whenever it comes up, the probability of obtaining (biased) evidence does not influence the evaluation of the strength of that evidence. A posterior obtained by updating via Bayes’ rule is a valid posterior, by definition. The long-run properties can and should be examined, and if estimation is the goal then eliminating bias is important, but the evidential value contained in the data is the same regardless of stopping rule. Richard Royall does a great job laying out this concept. Since stopping rules are not included in the likelihood function, they do not impact the evidence. Only data at hand are relevant. Simple.
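The classic illustration of this (the one Royall uses) is the binomial versus negative binomial case: 3 successes in 12 trials, observed either under a fixed-n design or a sample-until-3-successes design. The combinatorial constants differ, but they cancel in any likelihood ratio, so the evidence about p is identical:

```python
from math import comb

def binom_lik(y, n, p):
    """Fixed-n design: stop after n trials, count y successes."""
    return comb(n, y) * p ** y * (1 - p) ** (n - y)

def negbinom_lik(y, n, p):
    """Stop-at-y-successes design: the nth trial is the yth success."""
    return comb(n - 1, y - 1) * p ** y * (1 - p) ** (n - y)

y, n, p1, p0 = 3, 12, 0.5, 0.2
lr_fixed = binom_lik(y, n, p1) / binom_lik(y, n, p0)
lr_stop = negbinom_lik(y, n, p1) / negbinom_lik(y, n, p0)
print(lr_fixed, lr_stop)  # identical: the constants cancel
```

Two different stopping rules, the same data, the same likelihood ratio. (A frequentist p-value, by contrast, differs between the two designs.)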
Also- Daniel brings up that some say H0 can never be true, so type 1 errors can’t be possible. Maybe. But even if we grant that, there is the problem of what Gelman calls magnitude (Type M) and sign (Type S) errors. That is, saying an effect is big when it is small (or vice versa), and saying an effect is positive when it is really negative (or vice versa). Wells and Windschitl called this last one the diametric error, and they explain how these errors arise when stimuli aren’t properly sampled.
I think it’s time to learn R. I am going to embark on the data science specialization on coursera, a 3-4 month journey that should be pretty fun. My main goal is to become a functional R user and be able to hold my own using it for data analysis. In March I am going to take the 2 intro classes together, and then after that I may start taking 2 or 3 main modules, depending on the difficulty. I think I would qualify for their financial aid (I make almost no money); otherwise it’s $470 for the verified certificate. Ouch. In order to prep for these courses I’m getting started on learning R with Dan Navarro’s “Learning Statistics with R“. I don’t think the stats will be new to me, but that’s good because I want to be able to focus entirely on the R.
Re-reading Rouder and Morey’s default regression paper. I think I’ll write a blog post this weekend about how to use this method and give some examples. I’ll leave out the derivation and big equations since I think that turns people off, and really focus on using the method to richly analyze data. I think this is the paper that people getting into actually using Bayes stats should read, since it gives some great exposition to how priors are chosen and why Cauchy priors are so valuable. It also has a section at the end that addresses certain critiques of Bayesian methods.
Talked to a colleague about Bayes and when it adds value. Was asked to send along good reading that explain how to use it and when it adds to / changes classic interpretations. I think I will send along Rouder and Morey’s 2012 BF regression paper and probably the 855 ttest paper (can’t remember the author list).
Started learning R today. I’m using Ryne Sherman’s free R class (available here) and I’m also reading a free book that was recommended. I am going to be starting the Coursera series on data science in about 10 days, so this will give me a small boost there. R isn’t so hard, but the commands will take me a long time to remember by heart (I expect).
Learning R with Ryne Sherman’s course. It’s great, although a little hard to follow at first. But once you get rolling it’s easy to follow along and I feel like I’m starting to just scrape the surface of seeing what can be done with R. Currently working through the graphing/plotting lecture and I’m learning how to make cool histograms and plots. Ryne has been super helpful in answering my questions, and he gave me some really good advice.
R is very intimidating but once you get the hang of it the challenge is just recognizing which functions to use and looking them up. I’m honestly amazed at how many ways there are to plot data, and how flexible R is in customizing plots.
Also- here was my first attempt at using R with organic data. I plotted the simplest of simple histograms of my word counts for diary entries. Follows a really nice curve!
So a journal just banned significance testing. Is this really happening? I’m not sure that it’s a good idea. I want people to stop using significance tests and confidence intervals because they recognize the absurdity of the test, not because we outlaw them. But, maybe that’s just me being naive. Tal said, “[p] has been absurd for 100 years. every statistician thinks s/he knows the secret to weaning ppl off it”. Well, I guess I should get in line then!
Making my way into R some more. It’s so cool! Ryne’s course gives a good taste of all the different things one can do with R, and I feel like I’m learning a lot just by following along. About to start on the inferential stats lecture, which means I’m that much closer to being weaned off of SPSS. Although, JASP isn’t a bad option if I still want point-and-click stats.
Met with an Advisor and got some career advice about quant programs, where to apply, etc. He made it sound like where you go is everything for your career. That might be true, but I might still be too naive to accept it. I want to go somewhere I can learn the cool stuff, the useful stuff, not just the popular stuff. I have a lot to think about after that conversation because it went contrary to some of my expectations.
Also- I love that people are sharing their posters on the OSF. It makes me feel like I’m there (SPSP) even when I’m not! Chris and Joe are blazing a trail for open science, and I love where it is going.
Started writing my regression bayes factor blog today. It still has a ways to go, but it’s really fun to work on something like this. At first I was having a hard time figuring out what I was adding to the simple BayesFactor package manual that Richard Morey has already written. I think I’ve figured it out. I will give an introduction to why regressions are so useful. Then I will introduce bayes factors and why they are awesome. Then I will get back to regressions and explain why bayes factors make sense to use for regressions. Then I will start the how-to section, going step by step, largely adapted from Morey’s BF manual. The added value will be that this will be a guide accessible to anyone who hasn’t ever used R before. Each step will explain what we are doing and why, and then after each step I will interpret the functions we just computed. This feels like sufficient added value to warrant the blog post. (I hope people read it! Maybe I’ll cross post to the Winnower.)
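As a teaser of the model-comparison logic, here’s a rough sketch on made-up data. To be clear, this is not the JZS default Bayes factor from Rouder and Morey’s paper (that’s what the BayesFactor package computes); it’s just the crude BIC approximation to the BF, which conveys the same compare-two-models idea:

```python
import math, random

random.seed(3)

def ols_rss(x, y, with_slope=True):
    """Residual sum of squares for y ~ intercept (+ optional slope on x)."""
    n = len(y)
    ybar = sum(y) / n
    if not with_slope:
        return sum((yi - ybar) ** 2 for yi in y)
    xbar = sum(x) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sxy / sxx
    a = ybar - b * xbar
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

def bic(rss, n, k):
    # k = number of estimated regression parameters; the error variance
    # is absorbed into the n*log(rss/n) term, as in the usual regression BIC
    return n * math.log(rss / n) + k * math.log(n)

# fake data with a real slope of 1, so the slope model should win
n = 50
x = [random.gauss(0, 1) for _ in range(n)]
y = [1.0 * xi + random.gauss(0, 1) for xi in x]

bic0 = bic(ols_rss(x, y, with_slope=False), n, k=1)  # intercept-only model
bic1 = bic(ols_rss(x, y, with_slope=True), n, k=2)   # intercept + slope
bf10 = math.exp((bic0 - bic1) / 2)                   # BIC approximation to BF
print(f"approximate BF10 (slope vs intercept-only): {bf10:.1f}")
```

The actual post will walk through the real thing with the BayesFactor package in R; the nice part of the BF framing either way is that adding a predictor has to earn its keep against the simpler model.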
Worked again on my draft of the regression bayes factor tutorial. I think it can be good. I just need to introduce bayes factors in a way that makes sense, and then make sure the code that is embedded in the page works in a purely copy-paste fashion so that people can follow along easily.
Also, there was a commentary piece (here) on that nhst ban that was going around on twitter today. I still think the ban is silly!
Working on that blog post again. I am having trouble with the structure, but I think I have a good outline now. I think I’ll start with an intro to Bayes factors and then make the transition to using them in regression. Don’t know how long this post will be. Hopefully not more than 3000-4000 words.
Finished the first draft of my winnower piece today. It took me a while, but I think it’s an okay first draft. I don’t have Word on my new computer yet, so it’s hard to get the formatting correct. I went through the background of bayes factors and how they solve the null hypothesis asymmetry problem, then I explain how to use them in R, and then I very briefly touch on simulating posteriors. I don’t really add anything new to this, but as far as I can tell the background of bayes factors and the use of the BayesFactor package have not been written up together before. The regression bayes factor paper only talked about using the online calculator, and the online guide to the BayesFactor package has no background or explanation of how to use R. This fills that spot by giving theoretical background and rationale for the procedure while holding a non-R user’s hand so they can get the most out of the method.