Integrated Software for Integrative Structural Biology

Conference
Date: 15-Aug-2017 7:43 CEST to -

Contact: Chris Morris

 

 

Structural biologists use a variety of software tools to help their work, from data collection, through the creation of structural models, to finding biological significance in the results. Some of these tools work together well, with seamless data transfer and a consistent user interface. Others do not, often because they have been developed separately, by groups that are part of different subdisciplines of structural biology, e.g. X-ray crystallography and nuclear magnetic resonance spectroscopy. 
Now structural biologists are targeting mesoscale structures including the macromolecular machinery of the cell. Increasingly often, they combine different techniques in a single large research project, aiming to create multiscale models. This raises the challenge to software developers of working together to create an integrated and extensible toolset that supports a range of experimental techniques, as well as modelling and simulation methods.
Such a toolset will also allow synergy between researchers beyond planned collaborations, by ensuring for example that a model that has been deposited in a public database can easily be reused within an investigation that is based on complementary techniques.
This workshop will discuss progress towards these goals and challenges along the way. The workshop is timely as there is now a strong drive for structural biologists to look beyond their own subdiscipline. The European ESFRI projects, such as INSTRUCT (http://www.structuralbiology.eu/) and ELIXIR (http://www.elixir-europe.org/), are encouraging multi-disciplinary approaches, and there is also a desire to fit individual experimental results into a systems view of the cell or organism. These scientific drivers must be supported by a suitable software environment. While this is widely recognised, there is as yet no coherent effort in this direction.
The workshop will consist of invited talks from leading computational structural biologists and modellers, supplemented by talks selected from responses to a Call for Papers. We take this approach in order to ensure input from younger software developers. Areas to be covered in the Call include (but are not limited to):

* work to connect existing software packages

* novel software to support combined techniques

* formats for structural data

* the "last mile" problem: securing community take up of innovative tools

* position papers about future challenges

Submissions will be valued more if they report not only successes but also problems that remain to be solved. We will also schedule significant time for discussion.

Individual techniques are well supported by existing software, for example CCP4 [1] and ARP/wARP [2] for macromolecular X-ray crystallography, CCPN for NMR spectroscopy [3], Xmipp for electron microscopy [4], SWISS-MODEL [5] for protein homology modelling, GROMACS [6] for molecular simulation, and many others.

Software for interdisciplinary studies is patchier, and usually covers specific cases. For example, HADDOCK [7] is primarily a docking program, but can use restraints taken from a wide variety of experiments, such as changes in NMR chemical shifts or mutation data. Fitting atomistic models from MX or NMR into lower-resolution electron microscopy maps is increasingly relevant as cryoEM gains in importance, and a number of software tools are dedicated to this, for example VEDA (Jorge Navaza). The molecular dynamics flexible fitting (MDFF) method implemented in NAMD [8] can be used to flexibly fit atomic structures into EM density maps.

Different experimental techniques relate to each other through aspects of the structural model. These could be coordinate data (e.g. from MX or NMR), distance restraints (NMR or FRET), volume data (EM, SAXS/SANS or low-resolution MX), or features (segmentation of EM volumes or tomograms). A prerequisite for interdisciplinary software is a common understanding of structural features: for example, how to interpret multiple side chain conformations (e.g. from high-resolution crystallography) or ensembles of models (e.g. from NMR or from modelling).
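
As a minimal illustration of how two kinds of data can meet in one structural model, the sketch below checks a distance restraint (as might come from NMR or FRET) against coordinate data. The function name, the atom labels and the dictionary layout are hypothetical choices made for this example only; they are not taken from any of the packages mentioned here.

```python
import math


def restraint_violation(coords, atom_a, atom_b, lower, upper):
    """Return by how much the distance between two atoms breaks a restraint.

    coords maps a (hypothetical) atom label to an (x, y, z) tuple in Angstrom.
    lower and upper are the allowed distance bounds in Angstrom.
    The result is 0.0 when the distance lies within the bounds.
    """
    distance = math.dist(coords[atom_a], coords[atom_b])
    if distance < lower:
        return lower - distance
    if distance > upper:
        return distance - upper
    return 0.0


if __name__ == "__main__":
    # Illustrative coordinates only, not taken from a real structure.
    model = {"A:10:CA": (0.0, 0.0, 0.0), "B:42:CA": (3.0, 4.0, 0.0)}
    print(restraint_violation(model, "A:10:CA", "B:42:CA", 2.0, 4.0))
```

A real tool would also need to agree on atom naming, units and how to treat ensembles, which is exactly the common understanding discussed above.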

Interdisciplinary studies using multiple software packages can be aided to some extent by standard data formats. Despite some well-known limitations, the Protein Data Bank (PDB) format is widely used for representing atomic coordinate data. The MRC format for cryoEM volume data is derived from the CCP4 format for electron density, and in principle these are interoperable. Nevertheless, formats diverge and work is needed to maintain interoperability.
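
To make the role of a shared format concrete, here is a minimal sketch of reading one coordinate record in the fixed-column PDB format. The function name and the returned dictionary keys are assumptions made for this example. The column positions follow the published ATOM/HETATM record layout, but the sketch ignores occupancy, B-factor, element and all other record types.

```python
def parse_atom_line(line):
    """Parse a single ATOM or HETATM record from a PDB-format file.

    Returns a dictionary of selected fields, or None for other record types.
    Slices are 0-based, so PDB columns 31-38 become line[30:38].
    """
    if line[0:6].strip() not in ("ATOM", "HETATM"):
        return None
    return {
        "serial": int(line[6:11]),       # atom serial number
        "name": line[12:16].strip(),     # atom name
        "alt_loc": line[16].strip(),     # alternate location indicator
        "res_name": line[17:20].strip(), # residue name
        "chain": line[21].strip(),       # chain identifier
        "res_seq": int(line[22:26]),     # residue sequence number
        "x": float(line[30:38]),         # orthogonal coordinates in Angstrom
        "y": float(line[38:46]),
        "z": float(line[46:54]),
    }


if __name__ == "__main__":
    # Illustrative record only, not taken from a deposited entry.
    example = "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67"
    print(parse_atom_line(example))
```

The alternate location field is where multiple side chain conformations are recorded, so even this small reader has to decide how to interpret them.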

Moving beyond file formats, there is a need for ontologies that cover broad areas of structural biology. These must include metadata for different experiment types, as well as the raw experimental data. One of the most comprehensive data formats in use for structural biology experiments is the pepcDB data interchange format:

http://pepcdb.pdb.org/PepcDB/help/pepcdbhowtoprep.html

The problems of working on a multi-disciplinary structural biology project, using a diverse set of software tools, are well known, but it is not clear how to address them. The workshop will seek to identify the specific areas where progress can be made, and discuss possible solutions. Questions for consideration include:

* multi-disciplinary software versus data conversion software to transfer between stand-alone packages

* the need for standardisation of data formats or ontologies

* validation and comparison of results between different techniques

* ensuring easy availability and user-friendliness of software

Chris Morris

Martyn Winn

Alexandre Bonvin

Jose Carazo

Keith Wilson