- Neuro... what? What on earth is neuroinformatics… you mean, bioinformatics?
Indeed, neuroinformatics takes inspiration from bioinformatics, which refers to the combination of omics data (genomic, transcriptomic, proteomic…) with the machine learning tools (and, well, other computer science disciplines) used to make sense of it. Bioinformatics has brought enormous advances, for instance in cancer research, which was itself one of the driving forces behind its further development. Bioinformatics probably crystallized as a discipline during the Human Genome Project. Genetics-related data sets are large. In several cases they are also public, an important point for what we are discussing today.
But let us start with size. The sheer volume of data made several techniques and computer science technologies obsolete, so a new research field was born. Analogously, the combination of neuroimaging data (including EEG), which is large as well, with informatics has been named neuroinformatics. I first came into contact with the neuroinformatics community at the conference of the International Neuroinformatics Coordinating Facility (INCF) in Leiden last August. My impression was that the community is building around the concept of Computer Science for Neuroimaging Big Data.
There I attended a very interesting talk by Michael Milham, Director of the Center for the Developing Brain at the Child Mind Institute. It dealt with the limited translational value of machine learning methods, as described in the literature, when one tries to bring them into the clinical domain. He was mostly referring to the study and discovery of biomarkers based on fMRI data, but I think this applies to any neuroimaging modality. For instance, Milham has authored some papers on how to transform machine learning performance measures, which we have discussed in older posts, into measures useful for clinicians, e.g. by combining disease prevalence with sensitivity and specificity. In particular, he underscored the importance of the effect size in any conclusion derived from a data analysis approach. As you may know, effect size is related to sample size, i.e. to the number of data records you include in your study: the smaller the effect, the more records you need to detect it reliably.
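To make the prevalence point concrete, here is a minimal sketch of one standard way to fold disease prevalence into sensitivity and specificity: computing positive and negative predictive values with Bayes' rule. This is not taken from Milham's papers; the function name `predictive_values` and the numbers used are my own assumptions, chosen purely for illustration.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary test via Bayes' rule.

    All three inputs are proportions in [0, 1]. Hypothetical helper,
    written for illustration only.
    """
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")

    # Expected fractions of the whole population in each outcome cell.
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)

    # PPV: probability of disease given a positive result.
    ppv = true_pos / (true_pos + false_pos)
    # NPV: probability of no disease given a negative result.
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Illustrative numbers (assumed, not from any study): the same classifier
# evaluated on a balanced sample and on a population with 1% prevalence.
ppv_balanced, npv_balanced = predictive_values(0.9, 0.9, 0.5)
ppv_rare, npv_rare = predictive_values(0.9, 0.9, 0.01)
print(f"balanced sample: PPV={ppv_balanced:.3f}, NPV={npv_balanced:.3f}")
print(f"1% prevalence:   PPV={ppv_rare:.3f}, NPV={npv_rare:.3f}")
```

With these assumed numbers, a classifier that looks good on a balanced data set (90% sensitivity and specificity) yields a positive predictive value of only about 8% when the condition affects 1% of the population, which is the kind of gap between reported performance and clinical usefulness the talk was about. Note that the sketch does not guard against degenerate inputs where a denominator becomes zero.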
Once you have the data … Jim Bezdek has defined pattern recognition as the discovery of structure in data. Pattern recognition was the precursor of data mining and therefore of Big Data. Let your data talk! This is the old mantra of Computational Intelligence practitioners, who let their models be driven by data. This data-driven approach is the opposite of the model-driven approach, where you start with an a priori model and try to validate its predictions against the data. Working with Computational Intelligence therefore requires data to drive your models, but where do you find it? Data is fundamental for this type of approach. The neuroinformatics community (like other communities, not only in research but also in innovation) has recognized the huge value of data. There is an ongoing effort to share neuroimaging data.
Already in 2006 the US government launched, through the NIH, the Neuroimaging Informatics Tools and Resources Clearinghouse (NITRC), which includes several data sets. The INCF, although a private organization, is building such a repository as well. Other health-related databases are included in Open Data initiatives all over the world: the US, Europe, and the United Kingdom. In my opinion, open data initiatives are important for epidemiological studies. More related data originates in public health agencies, which are committing more and more to an Open Data approach, like the US Department of Health and Human Services, the Health and Social Care Information Centre in the UK, and the World Health Organization. Even private companies are building data sharing platforms, like Quandl, or visualization platforms, like GapMinder, which includes some interesting plots of health data as well.
But you might be wondering: are these data repositories really Big Data? Let us leave that discussion for another post or, if you like, you can start commenting on this issue below …