Lots of twittering today. Topics: What is p-hacking? What is p-failing? Garden of forking paths. Embrace priors! Is saying someone p-hacked equivalent to accusing them of fraud? Is ignorance of the law an excuse? Lots of conversations and don’t feel like linking to them all.
Coursera data science track starts today. I’m currently learning the basics of the basics about R and all the current tech. Hard to say much about it since I just started, but the mobile app is fantastic. You can download all of the videos and watch them offline, which means I can watch them on my bus ride every day and get about 3 hours of work done every week on the bus. Very very handy.
Took the first quiz for the Data Scientist's Toolbox. It's week 1 so it's not hard at all. You get three tries, and only the highest one counts. They give you weekly quizzes and assignments, but they also give you 5 free late days to use, so there is some wiggle room. Going to start my R programming course tomorrow (probably). Taking them at the same time might be hairy, but hopefully it's not so bad. It will only get worse when I start taking 2-3 of the higher courses at once, so I'd better get used to it now.
Pretty interesting system they have at Coursera. So far so good.
Was working mostly on my winnower draft, and I think I have a good product right now. It can still be massively improved, but I think by the weekend I will have it in a state where it can be submitted. That way I can get feedback, reviews, etc. while still being able to update it as needed. I need to review the figures and code and make sure it is all up to par. I don't know if it will be fully APA format, since it may flow better without figure captions for every screen grab. But, we'll see.
More work done on my winnower draft. Had a friend read it over, and I think he really helped me solidify the intro. He knows nothing about stats or psychology, so if he can follow it without much trouble then I’m confident that anyone in psych could. I added more at the end about posterior estimation, which I think strengthened (and balanced out) the paper. Estimators and hypothesis testers can coexist, and certainly different folks can have a proclivity to choose one over the other. If we want these methods to be adopted then we need to give the people what they want. Some want bayes factors and some want posterior estimates, and the best part is they can have both.
Winnower manuscript is ready for submission I think, but my computer can't get the formatting right for the website. I emailed support and hopefully they can help me out. It's pretty freaking long… Hopefully it isn't too bloated; I did try to stick to the important stuff. It cites all the good stuff: Dienes, Rouder, Morey, Wagenmakers, Royall, Vanpaemel. Hopefully I added enough substance to warrant writing something so long.
Not much statistical on my mind today. Tried to format my Winnower piece but didn’t have time.
Finished editing and formatting today. Finally submitted! Feels good, it’s a relief to just get it out. The nice thing about the Winnower is that it can be edited before I assign the DOI, so I can come back to it in a few weeks with new perspective and give it some updates and fix the sure-to-be-there errors. Oh yeah, here is the link.
Coursera update: Finished week one of the R programming course. I think the lectures leave a lot to be desired, but the fact that the course implements swirl is cool. There are no running examples through the lectures to keep them focused, though. They go through all of the functions in a matter-of-fact way, which makes sense, but they don't invite you to program alongside the videos, so it's hard to really absorb the material. The quizzes are the only real place you get to practice using R if you don't elect to use swirl. They say in the lecture videos that swirl is totally optional, but if you really are in this to learn R as best you can, it is not. Honestly, you could probably just do the swirl exercises and not even listen to the lectures, but the quizzes have lecture-specific material, so that would not behoove you.
Been reading and rereading a few articles the last few weeks, figured I would catalog them. Lad's 2006 comment on the Berger and Goldstein articles. Yu et al.'s 2014 decision heuristics paper, Rouder's reply, and then their rejoinder. Gelman and Rubin's 1995 comment on Raftery's paper. Trafimow's 2003 Bayes paper, Lee and Wagenmakers's 2005 response, and then Trafimow's reply, and finally the Wagenmakers and Lee postscript. R. T. Cox's 1946 paper, "Probability, Frequency and Reasonable Expectation."
The new ManyLabs data came out (here is the link to Nosek's announcement). Very very interesting. It doesn't look good for most of the effects tested. Most aggregate estimates are quite small, and ALL of the interactions are nil. Well, that is quite interesting indeed. Stroop is as good as it always is; if it weren't, there would be serious problems. Notice the green triangles on the top graph. Those indicate the originally reported estimates for those effects. ALL effects were smaller than the original estimates, and not just slightly. We are talking full-scale reduction.
Coursera update: Home sick so I finished the coursera data scientist week two. Super easy quiz. 5 questions worth 20% of a course grade seems strange, but I suppose it is the first step into the data science track and so is basically a “how to install software” guide. Gets you set up on git and github, downloading packages in R, and using command line interface (mainly for git). I had never used git, github, or command lines before, so it was all new to me. But overall a pretty easy workload and super straightforward. Git and github seem like the most useful things ever.
Thinking more about the Sanborn et al. reply to Rouder's optional stopping commentary. Here is what I was tweeting. It seems that there is a really strong desire to evaluate methods based on the sampling distribution of the statistic. Sanborn et al. make the claim that we should care about the sampling distribution of the Bayes factor because we need to satisfy researchers' frequency stats intuitions. When BF and p disagree, then we must have a way to resolve the argument, and the way they suggest is to appeal to frequency stat intuitions. But those intuitions are ill-founded, and they get researchers into trouble. I mean, come on, those "intuitions" are half the reason people can't wrap their heads around p-values! If it were really so intuitive then people wouldn't categorically misunderstand these things. Then when they say something to the tune of "we must find common ground to resolve our differences," I want to stand up with brazen naivety and shout, "Math and logic do not need consensus! Appealing to what makes you comfortable is not an argument!" That won't change people's minds, but sometimes I think they need to hear it. And I'm still young enough to wear my principles on my sleeve.
de Groot has some great writing. I think I will try to read more of him. He writes with such clarity, and each piece of his I read has an impact on my thinking. I like this quote: "Although they can serve as convenient and useful approximations in some estimation problems, they are never appropriate for tests of significance. Under no circumstances should they be regarded as representing ignorance." He is addressing the challenge of uniquely defining a prior that represents pure ignorance. A diffuse prior does not represent pure ignorance, because the math makes it possible to get any arbitrary amount of support for the null just by spreading the prior out further. That means these priors do not work in hypothesis testing situations. In estimation they can work somewhat well in some situations, but they still do not represent ignorance; they merely imply that the specifics of the distribution chosen are not particularly important in some cases.
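You can see de Groot's point numerically. Here's a toy illustration of my own (not from his paper): testing a normal mean with known variance, H0: mu = 0 versus H1: mu ~ Normal(0, tau^2). As the prior scale tau grows "more ignorant," the Bayes factor for the null grows without bound, for the exact same data.

```r
# Toy illustration (my own, not de Groot's): a diffuse prior is not
# neutral. For a normal mean with known sigma, compare H0: mu = 0
# against H1: mu ~ Normal(0, tau^2). Widening tau inflates support
# for the null arbitrarily.

bf01 <- function(ybar, n, sigma = 1, tau) {
  # Marginal density of the sample mean under each hypothesis
  m0 <- dnorm(ybar, mean = 0, sd = sigma / sqrt(n))
  m1 <- dnorm(ybar, mean = 0, sd = sqrt(sigma^2 / n + tau^2))
  m0 / m1  # BF01: support for the null over the alternative
}

# Same data (ybar = 0.3, n = 25), increasingly "ignorant" priors:
sapply(c(1, 10, 100, 1000), function(tau) bf01(0.3, 25, tau = tau))
# The BF for the null keeps climbing as tau increases
```

So the "ignorance" is doing real inferential work, which is exactly why these priors fail for hypothesis testing.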
Downloaded a TON of articles. Very exciting, can’t wait to read them all. Articles by Kadane, DeGroot, Bernardo, Masson, the entire discussion of Diaconis and Freedman (1986) which is like 10 articles. It’s time to branch out and challenge myself, and I think this will really help. Bernardo is all about objective bayesian inference as far as I can tell and I don’t particularly subscribe to that, so I’m sure his arguments will give me a lot to think about.
Coursera update: The R programming course is HARD. Wow. I bet if you had previous programming experience it wouldn't be so bad, but for a noob like me it is super tough. The lectures are straightforward and easy to follow, and so is the swirl practice series, but then the homework assignments are 100x harder and have no guidance. The community site/forum has some people who can help you out, but you are only allowed to post 3 lines of code max or else you violate the honor code. That means you can basically only get help on a teeny tiny part of your problem. I get that they don't want people posting answers in the forums or giving excessive help, but as someone who had no idea where to start, even after watching the lectures, I had a lot of trouble. I get the feeling that the course is not meant for a total programming noob; when they say the requirements are the toolbox course (which is a joke) and that some knowledge of basic programming will be "useful," I think they mean to say "extremely important."
Took a detour to start reading T. X. Barber’s book, Pitfalls in Human Research: Ten Pivotal Points. This a good book. It was a precursor to the current “crisis of confidence” movement in that it identified many problematic procedures in psychology. Very quotable, I might do a post that runs through the different problems at some point, but I don’t have time to write out any long quotes now.
Coursera update: Week 3 of the data science toolbox complete, except for the course project. This class is easy, and I'm sure it was designed to be. This week was the first lecture series that goes beyond software installation and into substance. What is data? What kinds of questions should we ask? How do we deal with data to answer those questions? What is inference/prediction/description/etc.? Overall, easy course. I don't know if it is worth the $30 they charge, but I don't have any money so they didn't charge me for it. Don't get me wrong, this is a great service that they are offering and I understand that they have to keep the lights on somehow. I think this class in particular would be more reasonably priced at $15 or less because of how little work is involved on the part of the student. To be fair it does say 1-4 hours per week on the course page, so I should have known what to expect. Overall it's worth the time investment but not worth $30. The R course, on the other hand, is a ton of work and I think paying money for it is more justified. I'm sure the more advanced courses will end up being similar.
Downloaded a lot of papers again today. This time it was all of Brad Efron’s papers that I thought I might have a chance understanding. Which wasn’t many, but that’s okay. This guy has a lot lot lot of papers. And a lot lot lot of citations of those papers. But that’s what happens when you invent a highly used statistical technique (the bootstrap). He has some cool papers, can’t wait to start reading them.
Coursera update: watching week 3 videos for the R course, which spends a lot of time on the apply family of functions. Some cool stuff there. I need to finish up these videos so that I can get started on the week 2 (peer graded) homework assignment. Seems easy enough, but I'm sure there will be some snags. I like the idea of peer grading because it lets you see how other people who also barely know what they are doing are completing this stuff. The bad thing is that there is no chance for your grades to be changed if you feel you have been graded unfairly. That kind of sucks, but they do suggest that you be lenient to your classmates since we are all in this together (and some people in the course are not native English speakers).
Also, at the suggestion of Fabian I have started to read up on the Savage-Dickey density ratio method that EJ introduced to psych a few years ago. Jeff Rouder says Richard Morey has implemented better calculations for it, so that will be a natural follow-up. Plus, I really like this group’s (EJ, Jeff, Richard) didactic style. Notation and math where needed, but always explained because they know they are writing for a psych audience usually. This is in stark contrast to, say, Jose Bernardo’s or Jim Berger’s papers that are formula heavy and they expect that you are able to keep up without hand holding. Well, frankly I am happy to have my hand held at least until I have a few more years of this stuff under my belt.
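The Savage-Dickey idea itself is simple enough to sketch in a few lines of R. This is my own toy version (not code from any of their papers): for nested models, BF01 is the posterior density at the null value divided by the prior density at that value.

```r
# A minimal Savage-Dickey sketch (my own toy example): binomial data,
# H0: theta = 0.5, H1: theta ~ Beta(a, b). Because Beta is conjugate,
# both densities at theta0 are available in closed form.

savage_dickey_binom <- function(successes, n, theta0 = 0.5, a = 1, b = 1) {
  prior_dens <- dbeta(theta0, a, b)
  post_dens  <- dbeta(theta0, a + successes, b + n - successes)
  post_dens / prior_dens   # BF01: support for the null
}

savage_dickey_binom(7, 10)    # mildly informative data, BF01 near 1
savage_dickey_binom(50, 100)  # data right at 0.5, posterior piles up there
```

The appeal is that you only ever need to fit the full model; the test falls out of the prior and posterior densities at the point null.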
When I get to something that feels overly difficult (e.g., Bernardo/Berger) I remind myself that I only started on this path a year or so ago. I started from scratch with Bayes all by myself, and I am proud of where I am at this point. When I realize how far I have come in that year I feel a lot better. I remember reading Royall’s book (at the suggestion of EJ) and not understanding any of the notation. But I slugged through it and came back later (~5 months) and understood most of it until it got calc heavy. But I’m also slowly relearning calculus (very slowly it seems) so I’m sure that hurdle will be hopped soon enough.
Got a few new articles by LJ Savage today. I’m slowly filling up my folder with the work of all of the important bayesians.
Also, Daniel Lakens posted a new blog today about how probable different p values are. This post feels like it is just waiting to discover likelihoods. He is discussing the probability of finding p values within a certain range given different hypotheses and then comparing the pr(D|H1) to pr(D|H2). This approach is a poor man’s likelihood, and a homeless man’s bayes factor. For example,
If you have 95% power (…) and you observe a p-value between 0.04 and 0.05, the probability of observing this p-value is 1.1% when the alternative hypothesis is true. It is still 1% when the null hypothesis is true (…)
If he were to simply graph the likelihood function, he could “see” the relative support for all possible parameter values. Then he could compare any two parameters directly instead of circling around power values.
Coursera update: finished up week 3 lectures and quiz for R programming course. Easy enough, and the different apply functions are super cool. I imagine not knowing these functions leads to bloated code and inefficient time use. May update later today with more progress on the weekly assignment.
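For my own reference, the apply-family idioms from this week boil down to a few one-liners (toy data of my own):

```r
# Quick apply-family cheatsheet with toy data

m <- matrix(1:6, nrow = 2)   # 2 x 3 matrix, filled by column

apply(m, 1, sum)             # row sums: 9, 12
apply(m, 2, sum)             # column sums: 3, 7, 11
lapply(1:3, function(i) i^2) # a list: 1, 4, 9
sapply(1:3, function(i) i^2) # same, simplified to a vector: 1 4 9
tapply(c(1, 2, 3, 4), c("a", "a", "b", "b"), mean)  # group means: a = 1.5, b = 3.5
```

One call replaces a whole for loop, which is exactly the bloat-avoidance they're selling.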
Also, downloaded many of Dickey’s articles. One step closer to having all of the important bayes foundational papers.
Read a great paper by ET Jaynes today. He examines how Bernardo / Zellner / Siow / Jeffreys would agree with Laplace. Turns out, Laplace has everyone beat! That guy was amazing. Laplace's answers encompass everyone else's, in that they answer the same questions but also offer more information not provided by the others. Jeffreys does better than Bernardo, and nearly as well as Laplace. Jaynes then asks if model selection and estimation ask fundamentally different questions. He comes to the conclusion that they do not. See this tweet for a teaser.
That teaser became a very long discussion here and then it broke off. Talking again about the reasonability of strict nil hypotheses. I think they're possible; some think not. Rasmus says their impossibility can be logically deduced a priori, but I am not convinced. He may be right, and I could just be missing the point. I'd be interested in hearing more, but I think we ended up talking past each other a bit. Or maybe I was just talking past him. I do that sometimes, it seems. I'm stubborn and hard headed 😛 I think I made a good argument though, so read the links if you're interested and tell me if I sound like an idiot!
Downloaded many of AWF Edwards’s articles. He wrote a metric ton of book/article reviews. I guess he had a lot to say, or many people valued his opinion (probably both).
Coursera update: Finished the course project for the data scientist toolbox. Super easy. Set up R Studio, set up Github, set up a markdown file. Super simple stuff. The peer grading is easy as well. You are required to grade 4 peers but it is so short and simple I helped out and graded 20. The week 3 R programming assignment was easy as well. You just swap out some parts of a code template and add comments. Then you upload to github and you’re done. I decided I would use the github app that they have for macs. I don’t feel particularly comfortable with the command line so it is good and helpful for me. So far I would say the only real challenge of the R course is the week 2 (first programming) assignment. That thing was super hard, but I think it was just a steep learning curve.
Reading Royall again. I learn something new every time I come back to his book. He is super clear, but his book does have some typos which could make it confusing if you don’t know what he means.
Reading the exchange between Cornfield and Savage that supposedly made Cornfield jump on board fully. It's enlightening to read the early thinkers on this topic because so much of my thinking I take for granted as being obvious. It's really only obvious in hindsight. I may post a blog with a quote from Savage that is especially insightful. Stay tuned.
Simonsohn’s paper came out in Psych Science today, the small telescopes one. Interesting concept, I was enamored when the preprint was going around but I’ve since lost enthusiasm for it. Probably mainly because I (partially) think like a bayesian now. The whole process is convoluted.
Speaking of convoluted. R-Index blog has a new post today. Schimmack explains a method of thinking about post-experimental power in an interesting way. I think the whole concept of power is devoid of meaning for making inferences, but if someone makes a good case I might change my mind. Aw, who am I kidding. “Birnbaum doth declare, sufficiency and conditionality shall guide you towards the light. And the likelihood principle will shine bright.”
Read over Royall’s 2004 paper again today. Daniel asks why nobody calculates sample sizes this way, and I don’t know why they don’t. It’s very intuitive to try to minimize the chance that you obtain weak or misleading evidence.
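A simulation sketch in the spirit of that approach (my own toy numbers, not Royall's): pick a candidate n, generate data under one hypothesis, and estimate how often the likelihood ratio comes out weak or misleading.

```r
# Sample-size planning a la Royall (my own toy version): comparing
# H0: mu = 0 vs H1: mu = 0.5 for a normal mean with sigma = 1, using
# likelihood ratio k = 8 as the "strong evidence" cutoff. Simulate
# data from H0 and estimate the probabilities of misleading evidence
# (strong support for the wrong hypothesis) and weak evidence.

royall_probs <- function(n, delta = 0.5, sigma = 1, k = 8, reps = 10000) {
  set.seed(1)
  ybar <- rnorm(reps, mean = 0, sd = sigma / sqrt(n))  # data from H0
  # Likelihood ratio for H1 over H0 given the sample mean
  lr <- exp((n / sigma^2) * (delta * ybar - delta^2 / 2))
  c(misleading = mean(lr >= k),           # strong evidence for the wrong side
    weak       = mean(lr > 1/k & lr < k)) # not strong either way
}

royall_probs(n = 20)   # small n: weak evidence is common
royall_probs(n = 100)  # larger n: both probabilities shrink
```

You then just crank n up until both probabilities are as low as you can afford, which is the intuitive part.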
Sam wrote a very interesting post today. He does not buy Wagenmakers's power fallacy. I suppose I do though! Also, his bayes factor distribution graph actually shows that the method and sample size used in the study have minuscule chances of producing strong misleading evidence. 2 or 3% is pretty good in my opinion! Of course, a fair chance of weak data is to be expected from small samples. But even still, 60-70 percent of bayes factors from this kind of experiment would be informative, and that's not as bad as he makes it sound.
Sam doesn’t buy my argument that error rates shouldn’t influence our interpretations of strength of evidence. I think he has a different interpretation of what an “error” is than I do. I’m glad to encounter differences in opinion like this because it really makes me think about whether my view makes sense. First things first: defining errors. Can we make errors if we are simply evaluating evidence? Surely data can be misleading, but as long as we interpret the data appropriately (relatively strong, weak, BF = 20, BF = 1, etc.) we are immune to errors.
Second: what would count as an error if we grant that we could make one? Is coming away from an experiment saying, “I obtained weak evidence, so I didn’t learn anything” an error? Sam seems to think so (“In the end, if a BF is likely to mislead you into believing the wrong thing it still seems inadequate for solving our problems” despite 2-3% chance of strong misleading evidence), but I don’t see how. In fact, I think Sam is making an error when he says that saying “I didn’t learn much” is an error! Declaring ignorance is not an error, how could it be? My perception is that he is thinking of this in terms of type 1 and type 2 errors (you can tell by his use of “power” with quotations around it). Thinking about errors is fine for bayesians (I don’t think about them much though, honestly), but thinking about strict type 1 and 2 errors is a mistake. There are no fixed alphas, betas, or evidential thresholds. Evidence can be misleading, but obtaining weak evidence is not an error!
Coursera update: Finished the final lectures and final quiz for the R programming course. The topic of this week was simulating data and using R Profiler. R Profiler seems cool but I haven’t had a need for it yet. The simulation codes are pretty cool though. The bread and butter of many papers is to show examples through simulations so you’ve got to know it to keep up. Still have to do the last programming assignment (worth 25 points).
Coursera update: Finished the final assignment for R programming. Hard stuff, but not impossible. Once you manage to get the first program working for the assignment then the other 2 are just reconfigurations with slight tweaks.
Working on a new blog post. I am writing about likelihoods and why they are so cool. It’s also a good chance for me to practice writing functions in R. I’ve written a very simple one that lets the user plug in binomial data and it spits out the likelihood curve, along with a likelihood ratio result for two parameter values. Pretty fun stuff.
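Something along these lines, I'd imagine. To be clear, this is my own sketch of what such a function might look like, not the code from the post:

```r
# Sketch of a binomial likelihood plotter (my own guess at the shape
# of such a function): plot the likelihood curve over theta and
# return the likelihood ratio for two chosen parameter values.

binom_likelihood <- function(successes, n, p1 = 0.5, p2 = 0.75) {
  theta <- seq(0, 1, length.out = 500)
  like  <- dbinom(successes, n, theta)
  plot(theta, like / max(like), type = "l",
       xlab = "theta", ylab = "Relative likelihood")
  abline(v = c(p1, p2), lty = 2)  # mark the two compared values
  # Likelihood ratio for p1 vs p2
  dbinom(successes, n, p1) / dbinom(successes, n, p2)
}

binom_likelihood(6, 10)  # LR for theta = 0.5 vs 0.75 given 6 of 10
```

The curve makes the whole point visually: every parameter value's support is right there, and any two can be compared by eye or by ratio.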
Also- lots of discussion happening over at the BayesFactor blog. Richard Morey has had a correspondence with Greg Francis in the comments about the test of excess significance and done some simulations.
I made a comment on Richard’s post. I won’t type it all here or provide a summary, you’ll have to go read it yourself at Richard’s blog 😛 [update 3/31: comment disappeared…. 😦 eaten by the html monster.]
Sam Schwarzkopf posted a preprint of his paper that was sadly rejected recently. I thought it was a cool concept, and I hadn’t really read about bootstrapping before so it was pretty informative to me. I think it could have used a bit of an intro as to why bootstrapping is a desirable procedure. He had a sentence or two about avoiding model assumptions and assigning priors, but it read to me like he assumed a lot of background knowledge from the reader with regard to bootstrapping. A single, solid paragraph at the beginning detailing what bootstrapping is and its pros and cons would go a long way. Bootstrapping seems to be pretty handy if you have reason to think your modeling assumptions won’t hold. I don’t know enough about it to really be able to interpret the results Sam presents. Perhaps a topic to look more into.
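From what I've gathered so far, the core idea fits in a few lines. A toy example of my own (nothing to do with Sam's actual analysis): resample the data with replacement to approximate the sampling distribution of a statistic without distributional assumptions.

```r
# Minimal bootstrap sketch (my own toy example): a percentile
# bootstrap interval for the median of skewed data, where
# normal-theory formulas would be shaky.

set.seed(42)
x <- rexp(50, rate = 1)  # skewed sample

# Resample with replacement, recompute the median each time
boot_medians <- replicate(5000, median(sample(x, replace = TRUE)))

quantile(boot_medians, c(0.025, 0.975))  # percentile bootstrap 95% interval
```

The appeal is that the same recipe works for basically any statistic, which I assume is why Sam reaches for it when he wants to avoid model assumptions.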
Also, started a new page here called “Brain Bayes Factors” where I plan to write book reviews. Check it out if you are looking for reading material!