To cif or not to cif, that is the question...
I spent a disheartening afternoon staring at some of the new extensions to mmCIF. mmCIF is the data dictionary for the RCSB Protein Data Bank (http://www.rcsb.org/pdb/). It was designed by and for X-ray crystallographers. Unfortunately, other people are now trying to use the data in ways that the committee that designed the dictionary did not foresee.
Bad database design seems to be a hallmark of bioinformatics projects. Most biologists don't understand the importance of a good data model, or how to anticipate the ways the data might be mined or used. People who know how to model data don't have the domain expertise to understand what they're modelling. So, dictionaries like mmCIF become entrenched. They're great for the handful of domain experts who want to capture every aspect of their experiments, but painfully hard to use in any sort of development project.
Well, enough ranting. The moral of the story is, if you're a biologist, get to know a databaser before you do your next project. If you're a databaser, talk to a biologist one day - they might just need your help.
10:59:58 PM