Dear collaborator,

Over the years I have collected a lot of data that I consider very valuable. This includes, in particular, behavioral, MEG, MRI, and genetic data on dyslexic and control adults and children. I have already published some analyses of most of this data. But this is rich data that can be used for many more analyses, some of which were not even imagined when the studies were conceived, but which may lead to important discoveries.

Some of these analyses I intend to carry out with my team. But maybe you have a great hypothesis that can be tested on my data, one that I didn’t think of. I’d be happy for one of us to test it. Or maybe you have mastered a great new analysis technique that I haven’t, one that might generate important insights. I’d love to hear about it and let you have a go at it.

I am also acutely aware that my datasets have a limited size, and therefore limited statistical power. The problems with small datasets in dyslexia neuroimaging research have been discussed at length in a review paper that my team members and I recently published (Ramus et al., in press). I am therefore very keen to pool my data with other compatible datasets in order to increase statistical power. We have already started doing so with some collaborators (Landerl et al., 2013; Becker et al., 2014; Jednoróg et al., 2015; Płoński et al., 2017). Of course, I am keen to pool my data with yours as well. Or perhaps you have created a data repository for our community, where my own data would play a useful role and might be analysed as part of a very large dataset, by you, me, or other researchers. I am open to that too.
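To give a rough sense of what pooling buys, here is a back-of-the-envelope power calculation. It is only a sketch with illustrative numbers, not the actual sizes or effect sizes of my datasets: for a two-sample comparison with a true effect of d = 0.4, in the range of group differences often reported in dyslexia neuroimaging, power roughly triples when going from 25 to 100 participants per group.

```python
# Illustrative power calculation for pooling datasets.
# The effect size and sample sizes are assumptions for illustration,
# not the actual characteristics of my datasets.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for n_per_group in (25, 50, 100, 200):
    # Power of a two-sided, two-sample t-test at alpha = .05
    power = analysis.power(effect_size=0.4, nobs1=n_per_group,
                           alpha=0.05, ratio=1.0)
    print(f"n = {n_per_group:>3} per group -> power = {power:.2f}")
```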

The only condition that I require for the use of my data is that any analysis including it should be preregistered.

[Image: Preregistration Challenge. Credit: Open Science Framework]

If you are not familiar with the concept, please read any of these articles:

Preregistration is now the standard in my team: for all new projects, before data collection; and for all new analyses of old data, before the start of the analysis. This is also the standard that I intend to require from any collaborator who uses my data.

Why? My team members and I have already explored our data quite extensively. It’s fine to continue exploring it, but the more we do, the higher the risk of finding patterns that are statistically significant just by chance. I don’t want to publish false positive results based on my data. I may not be able to prevent that completely, but at least I’d like to minimize the risk. For this purpose, we will constrain our explorations by registering very precisely the hypotheses that we are testing and the analyses that we are planning. The fact that the data is already collected is not a problem for preregistration, as long as we declare it and the preregistration is submitted before the analysis starts.
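The arithmetic behind this worry is simple: if each test has a 5% false positive rate, the probability of at least one spurious finding grows quickly with the number of independent analyses attempted, following 1 − (1 − α)^k. A minimal illustration:

```python
# Familywise false positive risk for k independent tests at alpha = 0.05:
# P(at least one false positive) = 1 - (1 - alpha)**k
alpha = 0.05
for k in (1, 5, 10, 20, 50):
    risk = 1 - (1 - alpha) ** k
    print(f"{k:>2} tests -> {risk:.2f} chance of at least one false positive")
```

With 20 uncorrected analyses, the chance of at least one false positive is already about 64%.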

Writing a preregistration is the perfect opportunity for you and me to discuss your specific hypothesis and your analysis method, and to get all the nitty-gritty details just right before you get into the data. The more detailed the preregistration, the better (yes, even the parameters of pre-processing pipelines, and all the covariates we need to include in the analysis). You may initially find that it takes a frustratingly long time to write a satisfactory preregistration. But in my experience, you make up for it at the time of data analysis (just unroll the plan, or even better, run the script that was written in advance) and at the time of paper writing (the methods section is already written). So do not consider my requirement a burden. Over the last year, I have found this self-imposed constraint immensely useful. I really think it improves the quality of our work by obliging us to think our analyses through fully, while limiting open-ended explorations and thus the risk of false positive results. It will improve the quality of yours too.
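For concreteness, a preregistered analysis can be as literal as a script written before anyone touches the data. The sketch below is entirely hypothetical: the file name, variable names, and covariates are placeholders, not the actual structure of my datasets.

```python
# Hypothetical pre-registered analysis script, written before data access.
# File name, variable names, and covariates are placeholders for illustration.
import pandas as pd
import statsmodels.formula.api as smf

# Pre-specified in the preregistration: one outcome, one model,
# one fixed set of covariates. No alternative models will be tried.
df = pd.read_csv("dyslexia_dataset.csv")  # placeholder path
model = smf.ols(
    "reading_score ~ group + age + sex + nonverbal_iq",  # pre-specified covariates
    data=df,
).fit()
print(model.summary())
```

On the day of analysis, the only step left is to run the script on the real data.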

So, if you’d like to use my data for new analyses, the best way to get started is to send me a draft preregistration with your hypothesis and analysis plan. We will then work on it together, adjusting it to the specifics of my data and turning it into the best possible analysis plan. Then we will post it on an appropriate website such as the OSF (https://osf.io/), using one of the available templates (https://osf.io/zab38/wiki/home/).

If you’d rather not burden yourself with such a cumbersome procedure, and prefer to dig into the data immediately, try a number of analyses, and then decide which one is best, that’s fine too. Just find other data to play with.


References

Becker, J., Czamara, D., Scerri, T. S., Ramus, F., Csépe, V., Talcott, J. B., … Schumacher, J. (2014). Genetic analysis of dyslexia candidate genes in the European cross-linguistic NeuroDys cohort. European Journal of Human Genetics, 22, 675‑680.
Jednoróg, K., Marchewka, A., Altarelli, I., Monzalvo, K., van Ermingen-Marbach, M., Grande, M., … Ramus, F. (2015). How reliable are grey matter disruptions in specific reading disability across multiple countries and languages? Insights from a large-scale voxel-based morphometry study. Human Brain Mapping, 36, 1741‑1754.
Landerl, K.*, Ramus, F.*, Moll, K., Lyytinen, H., Leppänen, P. H. T., Lohvansuu, K., … Schulte-Körne, G. (2013). Predictors of developmental dyslexia in European orthographies with varying complexity. Journal of Child Psychology and Psychiatry, 54, 686‑694.
Płoński, P., Gradkowski, W., Altarelli, I., Monzalvo, K., van Ermingen‐Marbach, M., Grande, M., … Jednoróg, K. (2017). Multi‐parameter machine learning approach to the neuroanatomical basis of developmental dyslexia. Human Brain Mapping, 38(2), 900‑908. https://doi.org/10.1002/hbm.23426
Ramus, F., Altarelli, I., Jednoróg, K., Zhao, J., & Scotto di Covella, L. (in press). Neuroanatomy of developmental dyslexia: pitfalls and promise. Neuroscience & Biobehavioral Reviews. https://doi.org/10.1016/j.neubiorev.2017.08.001