What is MFIB?

Mutual Folding Induced by Binding (MFIB) is a repository of protein complexes for which the folding of each constituent protein chain is coupled to the interaction forming the complex. This means that while the complexes are stable enough to have their structures solved by conventional structure determination methods (such as X-ray, NMR or cryo-EM), the proteins or protein regions involved in the interaction do not have a stable structure in their free monomeric form (i.e. they are intrinsically disordered/unstructured).

What constitutes being intrinsically disordered/unstructured?

All interacting protein regions in MFIB complexes are required to be intrinsically disordered/unstructured. This means that in their monomeric state these protein regions lack a stable tertiary structure and thus their structure cannot be determined. Accordingly, structures presented in MFIB were checked to exclude protein regions that have a solved monomeric structure in the PDB. However, this condition leaves room for proteins with a wide range of different structural properties.

Some protein regions, such as the ACTR domain of the nuclear receptor coactivator 3, are near-random coils. This means that in their isolated monomeric form the protein segments alternate rather freely through a wide range of different conformations, exhibiting the 'most disordered' state. In contrast, the binding partner of ACTR (the NCBD domain from CBP) is a molten globule, meaning that although it is not fully stable, it does contain a significant amount of residual structure (stable and near-stable secondary structural elements) in its unbound form. The interaction of ACTR and CBP forms an ordered complex where the two different kinds of disordered proteins stabilize each other, and accordingly it is included in MFIB (MF2201001).

Another, quite extreme, example of such non-stable structural elements in monomeric form is presented by the nucleoside diphosphate kinase. The native monomeric enzyme has no stable structure, but it forms a stable hexamer with six identical chains in interaction. However, a single amino acid mutation (the P105G substitution), which affects a loop implicated in subunit contacts, yields a protein that reversibly dissociates to folded monomers. This means that the monomeric kinase subunits are on the verge of order and hence mark the other, 'least disordered' end of the spectrum. However, as this behaviour still fits the criteria of unstructured monomers forming an ordered complex, the native form of nucleoside diphosphate kinase is included in MFIB (MF6110001).

What evidence is needed for complexes to be included?

The primary requirement for including a complex in MFIB is experimental data showing that all constituent protein chains that take part in the interaction adopt a stable structure only as a result of complex formation. There are two principal ways this can be shown:

  • First, in some cases all protein chains (or at least the interacting regions of the proteins) have been shown to be intrinsically disordered in their monomeric form (such as for MF2201002). This is usually a challenging task, as disordered proteins are difficult to handle experimentally (they are generally unstable and prone to proteolysis). Furthermore, this approach can only be applied to heteromeric complexes, as the constituent chains of, e.g., a homodimer cannot be studied in their monomeric form under native conditions (they would dimerize).

  • Second, in some cases the folding of all participating chains was measured simultaneously in the context of the complex. In essence, the basis of these measurements is that a protein complex is dissociated by varying some external factor (most commonly the temperature or the concentration of some denaturing agent, such as urea or guanidine hydrochloride). The structure content of the protein complex (or the solution of monomeric proteins after denaturation) is monitored throughout the process. If the tertiary structure of the monomers disappears exactly when the complex is broken up, the complex is thought to form via a mutually coupled folding-and-binding process, where the folding of the monomers only happens upon interaction. This approach is particularly useful and common for homo-oligomeric complexes (such as for MF2110017).

For each entry in MFIB there should be enough experimental evidence for including it. This means that either there is evidence for the intrinsically unstructured nature of all participating protein chains, or there is evidence for the structured complex itself to arise directly from the interaction of unstructured monomers. In some rare cases both types of evidence are available for a complex, as in the case of MF2201001.

While the majority of entries in MFIB have direct experimental evidence supporting the disordered nature of all interacting chains, in some cases the evidence of disorder was transferred from proteins with a high level of homology. It has been shown that for ordered proteins a 30% sequence identity means true homology in the overwhelming majority of cases and, for sufficiently long alignments, the adoption of the same fold (Rost, 1999, PMID:10195279). No such systematic study has been conducted concerning protein disorder. However, it is safe to assume that if 30% identity is generally sufficient for two ordered proteins to share the same fold, the significantly higher level of identity/similarity guaranteed by belonging to the same UniRef90 cluster or sharing the same Pfam entry should be sufficient for belonging to the same structural class (ordered or disordered).

Why aren't there complexes with DNA/RNA/other macromolecules?

The primary focus of MFIB is the collection of complexes where the folding of each participating macromolecule is linked to the interaction that stabilizes the complex. While there are proteins that only fold upon interaction with DNA, RNA or other molecules (such as lipids or the membrane itself), such complexes are not included. The primary reason is that protein-protein interactions are markedly different from protein-DNA or protein-RNA interactions, and we opted to keep MFIB specific to the former. Furthermore, in complexes where proteins fold with the help of DNA (like many dimeric transcription factors) or RNA (like many ribosomal proteins), the structure formation of the proteins is linked to the interaction, but the DNA/RNA partner usually already has a stable structure prior to complex formation, so the folding is not mutual.

I know a certain complex fits the above criteria, but it still isn't included in MFIB. Why?

During the construction of MFIB several databases (such as PDB, UniProt, Pfam, IDEAL and DisProt) were integrated to enable the systematic collection of protein complexes with mutual synergistic folding. The results of this collection were manually curated and complemented with extensive literature searches to widen the coverage of MFIB as much as possible. Nevertheless, there are undoubtedly many complexes that would fit MFIB but are not included yet. If you know of such a complex, please let us know at mfib(at)ttk.mta.hu so we can include it.

Are proteins in MFIB disordered on their entire length? Or can they contain domains?

Many proteins are modular and contain domains that act mostly independently from each other in a structural sense. Accordingly, inclusion in MFIB only requires that the regions of the proteins that directly take part in the interaction be disordered in their monomeric form. Other regions of the interacting proteins that do not form part of the complex can be either disordered or ordered, as they do not have a primary effect on the interaction covered by MFIB.

While MFIB only concentrates on the intrinsically unstructured regions of the interacting proteins, it indicates the extent of other regions of the same proteins as well. This is given as the 'UniProt coverage' for each protein of every entry. This value describes the fraction of the whole protein that directly contributes to the interaction (and hence is visible in the corresponding structure).
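As a rough illustration of what such a coverage value expresses, the fraction can be sketched as below. This is only a minimal sketch under stated assumptions: the function name uniprot_coverage is hypothetical, the interacting region is assumed to be a single contiguous stretch given in 1-based inclusive positions, and the numbers are invented for the example. It is not necessarily how the value shown in MFIB is computed.

```python
def uniprot_coverage(region_start: int, region_end: int, protein_length: int) -> float:
    """Fraction of the full-length protein covered by the interacting region.

    Assumptions of this sketch (not taken from MFIB itself):
    - positions are 1-based and inclusive,
    - the region is one contiguous stretch of the UniProt sequence.
    """
    if not (1 <= region_start <= region_end <= protein_length):
        raise ValueError("region must lie within the protein")
    return (region_end - region_start + 1) / protein_length


# Invented numbers, not from any MFIB entry:
# a 50-residue interacting region in a 200-residue protein.
print(uniprot_coverage(101, 150, 200))  # 0.25
```

A value close to 1 would indicate that almost the whole protein is part of the complex, while a small value would indicate that the interacting region is a short segment of a larger, possibly multi-domain protein.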

How are MFIB accessions generated?

Each MFIB entry is assigned a unique accession, which is composed of the letters 'MF' at the beginning, followed by 7 digits. The 7 digits form a randomly assigned number that guarantees the uniqueness of the accession.
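For users who handle accessions in scripts, the format described above can be checked with a simple pattern. The following is a minimal sketch; the helper name is_mfib_accession is hypothetical and not part of any MFIB tool. It only checks the form of the string, not whether an entry with that accession actually exists.

```python
import re

# Pattern derived from the description above: 'MF' followed by exactly 7 digits.
MFIB_ACCESSION = re.compile(r"MF[0-9]{7}")


def is_mfib_accession(text: str) -> bool:
    """Return True if text has the form of an MFIB accession (hypothetical helper)."""
    return MFIB_ACCESSION.fullmatch(text) is not None


print(is_mfib_accession("MF2201001"))  # well-formed accession
print(is_mfib_accession("MF220100"))   # only 6 digits, rejected
```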

Why are certain PDB structures modified?

All protein complexes that are included in MFIB have a solved structure deposited in the PDB. However, in some cases the original PDB structure does not show (or does not only show) the biologically relevant, core interaction. To remedy this, we generated a modified PDB file in these cases. A description of the transformations applied to the PDB structure is given for each entry where relevant. These transformations can be the omission of protein chains (to reduce possible redundancy present in the PDB structure, such as for MF2120011), the generation of protein chains (based on the biomatrices described in the PDB file, e.g. for MF3140001), or the truncation of protein chains (to only include regions of proteins that mediate the highlighted interaction, e.g. MF2120024). For each entry the modified PDB files are available for download and are displayed in the embedded structure viewer.
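To give an idea of what chain omission and truncation mean at the level of a PDB file, the sketch below filters coordinate records by chain identifier and residue number. It is an illustration only and not the procedure used to build MFIB: the function names filter_pdb_lines and atom_line are hypothetical, only ATOM and HETATM records are filtered (other records such as TER or ANISOU are passed through untouched), and the generation of chains from biomatrices is not covered.

```python
def filter_pdb_lines(lines, keep_chains, residue_ranges=None):
    """Keep ATOM/HETATM records for selected chains and residue ranges.

    keep_chains: set of chain IDs to retain (chain omission).
    residue_ranges: optional dict mapping chain ID -> (first, last) residue
        numbers, inclusive (chain truncation). Chains without an entry are
        kept whole.
    All other record types are passed through unchanged.
    """
    residue_ranges = residue_ranges or {}
    kept = []
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            chain = line[21]           # column 22: chain identifier
            resnum = int(line[22:26])  # columns 23-26: residue sequence number
            if chain not in keep_chains:
                continue
            if chain in residue_ranges:
                first, last = residue_ranges[chain]
                if not first <= resnum <= last:
                    continue
        kept.append(line)
    return kept


def atom_line(serial, name, resname, chain, resnum):
    """Build a fixed-column ATOM record with dummy coordinates (demo helper)."""
    return (f"ATOM  {serial:5d} {name:<4s} {resname:>3s} {chain}{resnum:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")


# Invented toy input: two residues of chain A, one of chain B.
demo = [
    atom_line(1, "CA", "ALA", "A", 5),
    atom_line(2, "CA", "GLY", "A", 50),
    atom_line(3, "CA", "SER", "B", 5),
    "END",
]

# Omit chain B and truncate chain A to residues 1-10.
for record in filter_pdb_lines(demo, keep_chains={"A"}, residue_ranges={"A": (1, 10)}):
    print(record)
```

The modified files provided for download already contain the result of the curated transformations, so a filter of this kind is only needed when reproducing a similar selection on other structures.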

How are classes and subclasses defined/assigned?

MFIB entries are grouped into classes and subclasses. As of now, 9 classes and their constituent subclasses - 119 in total - are defined.

Each complex in MFIB is assigned a class and a subclass during the manual annotation and curation step. This means that there is no automated categorization and no algorithm that places a complex into a class or subclass. The grouping is done by the curators of the given entry; however, as almost all classes and subclasses represent a structurally well-defined set of complexes, the grouping is near-trivial in most cases. The only class that does not represent a structurally homogeneous set of complexes is 'Homooligomeric enzymes'. In contrast to other classes, the elements of this class cover a wide range of structures; however, the function of the constituent complexes provides a firm basis for classification.

Can I use MFIB for my work?

MFIB is freely available for use in academic work - we only ask that you cite MFIB if it contributes substantially to your project. Please use the reference below:

Erzsébet Fichó, István Reményi, István Simon and Bálint Mészáros:
MFIB: a repository of protein complexes with mutual folding induced by binding
Bioinformatics. 2017 Nov 15;33(22):3682-3684
PMID: 29036655
doi: 10.1093/bioinformatics/btx486

If you would like to use MFIB in a non-academic environment, please contact us at mfib(at)ttk.mta.hu