What is MFIB?

Mutual Folding Induced by Binding 2.0 (MFIB 2.0) is a repository of protein complexes for which the folding of each constituent protein chain is coupled to the interaction forming the complex. This means that while the complexes are stable enough to have their structures solved by conventional structure determination methods (such as X-ray, NMR or cryo-EM), the proteins or protein regions involved in the interaction do not have a stable structure in their free monomeric form (i.e. they are intrinsically disordered/unstructured/unstable).

What constitutes being intrinsically disordered/unstructured?

All interacting protein regions in MFIB complexes are required to be intrinsically disordered/unstructured. This means that in their monomeric state these protein regions lack a stable tertiary structure and thus their structure cannot be determined. Accordingly, structures presented in MFIB were checked to exclude protein regions that have a solved monomeric structure in the PDB. However, this condition leaves room for proteins with a wide range of structural properties: random coil-like extended chains, pre-molten globules and molten globules all classify as intrinsically disordered proteins (IDPs), even though they represent different levels of compactness and secondary structure content.

What evidence is needed for complexes to be included?

In MFIB 2.0, three evidence levels were introduced: direct evidence, indirect evidence, and insufficient evidence (candidate).

The primary requirement for including a complex in MFIB with the “Direct evidence” label is experimental data showing that all constituent protein chains that take part in the interaction only adopt a stable structure as a result of complex formation. There are three principal ways to conclude that a protein complex goes through mutual synergistic folding (MSF):

  • First, in some cases all protein chains (or at least the interacting regions of the proteins) have been shown to be intrinsically disordered in their monomeric form (such as for MF2201002). This is usually a challenging task, as disordered proteins are difficult to handle experimentally (they are generally unstable and prone to proteolysis). Furthermore, this approach can only be applied to heteromeric complexes, as the constituent chains of e.g. a homodimer cannot be studied in their monomeric form under native conditions (as they would dimerize). For protein complexes having disorder evidence for all chains, MSF is supported by direct evidence, therefore they were labeled as such.

  • Second, in some cases the folding/unfolding of all participating chains was measured simultaneously in the context of the complex. In essence, the basis of these measurements is that a protein complex is dissociated by varying some external factor (most commonly the temperature or the concentration of some denaturing agent, such as urea or guanidine hydrochloride). The structure content of the protein complex (or the solution of monomeric proteins after denaturation) is monitored throughout the process. If the tertiary structure of the monomers disappears exactly when the complex is broken up, the complex is thought to form via a mutually coupled folding-and-binding process, where the folding of the monomers only happens upon interaction. These complexes usually show two-state folding/unfolding behavior, or if there are more than two states, none of those is a folded monomer. This approach is particularly useful and common for homo-oligomeric complexes (such as for MF2110017). For protein complexes showing two-state folding/unfolding behavior, or a lack of folded monomeric forms along their folding paths, MSF is supported by direct evidence, therefore they were labeled as such.

  • Third, in some cases the authors of the article describing the complex claimed that certain features of the complex structure clearly imply that the isolated monomers would not be stable. Author statements claiming that the hydrophobic core of the dimer extends through the monomer-monomer interface fall into this category. Complexes with such author statements were also labeled with “Direct evidence”.

In many cases there is no direct evidence that the individual chains are disordered, or that the complex loses all structure content at the same time as the interaction is broken up. In such cases characteristic features of the complexes can still serve as indicative evidence for MSF. Complexes showing at least 3-4 of the following features (mainly based on author statements but also on observations of the curators) received the “Indirect evidence” label:

  • Very large relative interaction surface
  • The buried surface is mainly hydrophobic
  • Highly intertwined/interdigitated/intimate complex structure and/or extensive domain swapping
  • Beta sheet augmentation or helix packing/coiled-coil/helix bundle forming interactions can be observed between segments of the monomers
  • Functional sites of the complex (active site, cofactor-binding site, etc.) lie on the monomer-monomer interface, therefore the complex would lose its function if dissociated to individual monomers
  • Only oligomeric species could be detected in solution by dedicated biophysical methods (such as gel filtration, analytical ultracentrifugation (AUC), dynamic light scattering (DLS), SDS-PAGE); monomeric species could not be detected even at low protein concentrations

Finally, there are some cases (~5% of the MFIB 2.0 database) where the properties of the structure (ratio of molecular surfaces, types of atomic contacts etc.) indicate that the complex is likely formed through mutual synergistic folding. The authors of the article describing the complex structure may make some statements that support this, but there isn’t enough direct or indirect evidence to conclude that MSF takes place. At the same time, the curators couldn’t identify any evidence against MSF. These complexes received the “Insufficient evidence (candidate)” evidence level label.

How were similar structures collected?

It has been shown that for ordered proteins 30% sequence identity indicates true homology in the overwhelming majority of cases and, for sufficiently long alignments, the adoption of the same fold (Rost, 1999, PMID:10195279). When we scanned the PDB for candidate entries, we found many structurally similar cases bearing the same sequence domain(s). We clustered all entries using Foldseek (with a TM-score threshold of 0.3) and Pfam annotations. The resulting clusters were sometimes manually adjusted when a subgroup in the cluster had additional elements or domains. Clusters with more than one structure were assigned a representative domain. In addition to the automatic collection of these structures, all entries in these clusters were manually checked to avoid the sporadic errors of sequence and structure searching algorithms.
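As a rough illustration of what thresholding on a similarity score means for grouping, the sketch below puts entries into the same cluster whenever they are linked by a chain of pairs scoring at or above 0.3. This is a simplified stand-in (connected components via union-find), not Foldseek's own clustering procedure, and it does not model the Pfam annotation step or the manual adjustments. The entry labels and scores are made up for the example, and the function name is hypothetical.

```python
from collections import defaultdict


def cluster_by_similarity(entries, pair_scores, threshold=0.3):
    """Group entries into connected components of the graph whose edges
    are the pairs with score >= threshold.

    entries:     iterable of entry identifiers (hypothetical labels here)
    pair_scores: dict mapping (entry_a, entry_b) -> similarity score;
                 every identifier used in a pair must also be in `entries`
    """
    parent = {e: e for e in entries}

    def find(x):
        # Follow parent links to the cluster root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in pair_scores.items():
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two clusters

    groups = defaultdict(list)
    for e in entries:
        groups[find(e)].append(e)
    # Sort members and clusters so the output is deterministic.
    return sorted(sorted(g) for g in groups.values())


# Made-up scores: A-B and B-C pass the threshold, C-D does not.
example_scores = {("A", "B"): 0.8, ("B", "C"): 0.35, ("C", "D"): 0.1}
clusters = cluster_by_similarity(["A", "B", "C", "D"], example_scores)
print(clusters)  # [['A', 'B', 'C'], ['D']]
```

Note that A and C end up together even though they were never compared directly; this chaining is a property of the connected-components simplification and is one reason why manual checking of the resulting clusters is useful.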

How are classes and subclasses defined/assigned?

MFIB entries are grouped into classes and subclasses. As of now, 12 classes and their constituent subclasses are defined.

Each complex in MFIB is assigned a class and a subclass during the manual annotation and curation step by the curators (the classification is not automated). While almost all classes represent a structurally well-defined set of complexes, some classes do not represent a structurally homogeneous set of complexes, such as ‘Homooligomeric enzymes’ and ‘Bacterial toxin-antitoxin systems’. In contrast to other classes, the elements of these cover a wide range of structures; however, the function of the constituent complexes provides a firm basis for classification.

Class/subclass definitions are more general than representative sequence domain definitions based on clustering.

How are surface and contact properties calculated?

Several unique structural features of MSF complexes can be used to discriminate them from other types of protein complexes (see more at ‘What evidence is needed for complexes to be included?’). Many of these can be automatically calculated from the PDB structure. We selected those features that best discriminate MSF complexes, and we display their distribution over the database (see “Surface and contacts features” in the Evidence panel), as well as the calculated values of the given structure on each page. The selected features are:

  • buried/surface (ratio of buried area/all surface area)
  • interface/all (ratio of interface area/all surface area)
  • interchain/all contacts (ratio of interchain atomic contacts/all atomic contacts)
  • interchain HH (ratio of interchain hydrophobic-hydrophobic/all atomic contacts)
  • interchain HP (ratio of interchain hydrophobic-polar/all atomic contacts)
  • interchain MM (ratio of interchain main-main (or backbone-backbone)/all atomic contacts)

For a detailed description of how these properties were calculated, please see PMID: 31415767. The only change in how these features were calculated for MFIB 2.0 is that Voronota was used to assess contact and surface values.
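The six features listed above are all simple ratios, which the following minimal sketch makes explicit. It assumes the raw surface areas and contact totals have already been obtained from a structure analysis tool; the class, field and function names are hypothetical, the numbers are made up, and whether contacts are expressed as counts or as contact areas is left open here because the text above does not specify it. For the actual procedure, refer to the cited publication.

```python
from dataclasses import dataclass


@dataclass
class ComplexMeasurements:
    """Hypothetical container for raw values of one complex."""
    buried_area: float          # buried area
    interface_area: float       # chain-chain interface area
    total_surface_area: float   # "all surface area" (denominator)
    all_contacts: float         # all atomic contacts (denominator)
    interchain_contacts: float  # contacts between atoms of different chains
    interchain_hh: float        # interchain hydrophobic-hydrophobic contacts
    interchain_hp: float        # interchain hydrophobic-polar contacts
    interchain_mm: float        # interchain main chain-main chain contacts


def feature_ratios(m):
    """Return the six ratios, keyed by the feature names used in the list above."""
    if m.total_surface_area <= 0 or m.all_contacts <= 0:
        raise ValueError("surface area and contact totals must be positive")
    return {
        "buried/surface": m.buried_area / m.total_surface_area,
        "interface/all": m.interface_area / m.total_surface_area,
        "interchain/all contacts": m.interchain_contacts / m.all_contacts,
        "interchain HH": m.interchain_hh / m.all_contacts,
        "interchain HP": m.interchain_hp / m.all_contacts,
        "interchain MM": m.interchain_mm / m.all_contacts,
    }


# Made-up values, for illustration only.
example = ComplexMeasurements(
    buried_area=4000.0,
    interface_area=3000.0,
    total_surface_area=10000.0,
    all_contacts=2000.0,
    interchain_contacts=500.0,
    interchain_hh=200.0,
    interchain_hp=100.0,
    interchain_mm=50.0,
)
ratios = feature_ratios(example)
for name, value in ratios.items():
    print(f"{name}: {value:.3f}")
```

All contact-based features share the same denominator (all atomic contacts), so the HH, HP and MM values are directly comparable with the overall interchain/all contacts ratio.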

Why aren't there complexes with DNA/RNA/other macromolecules?

The primary focus of MFIB is the collection of complexes where the folding of each participating macromolecule is linked to the interaction that stabilizes the complex. While there are proteins that only fold upon interaction with DNA/RNA or other molecules (such as lipids or the membrane itself), such complexes are not included. The primary reason behind this is that protein-protein interactions are markedly different from protein-DNA or protein-RNA interactions and we opted to keep MFIB specific to the former. Furthermore, in complexes where proteins fold with the help of DNA (like many dimeric transcription factors) or RNA (like many ribosomal proteins), the structure formation of the proteins is linked to the interaction, but the DNA/RNA partner usually already has a stable structure prior to complex formation, so the folding is not mutual.

I know a certain complex fits the above criteria, but it still isn't included in MFIB. Why?

During the construction of MFIB several databases were integrated (like PDB, UniProt, Pfam and DisProt) to provide a means for the systematic collection of protein complexes with mutual synergistic folding. The results of this collection were manually curated and complemented with extensive literature searches to widen the coverage of MFIB as much as possible. However, undoubtedly many complexes would fit MFIB but are not included yet. If you know such a complex, please let us know at mfib(at)ttk.hu so we can include it.

Are proteins in MFIB disordered on their entire length? Or can they contain domains?

Many proteins are modular and contain domains that act mostly independently from each other in a structural sense. Accordingly, inclusion in MFIB only requires that the regions of the proteins that directly take part in the interaction be disordered in their monomeric forms. Other regions of the interacting proteins that do not form part of the complex can be either disordered or ordered, as they do not have a primary effect on the interaction covered by MFIB.

Are MFIB complexes undergoing mutual synergistic folding on their entire length? Or can they just contain MSF-based (sub)domains?

In MFIB 2.0 we included structures with multi(sub)domain monomers where only one of the (sub)domains undergoes mutual synergistic folding, but the other(s) do not. The “evidence coverage” field in the bottom part of each entry page explicitly states whether the full length of the complex was shown to undergo MSF or only some parts of it. Partial MSF was annotated for 20% of the entries.

Why are certain chains not considered in the PDB structures?

All protein complexes that are included in MFIB have a solved structure deposited in the PDB. In all cases we used the most probable oligomeric state as defined by PDBe. However, in some cases the original PDB structure contains additional elements (DNA, other protein chains) that are not relevant for mutual synergistic folding. To remedy this, we removed such elements. The “Note” section in the structure summary contains information about which chains were considered during the annotation (such as for MF2120007).

Can I use MFIB for my work?

MFIB is freely available for use in academic works. We only ask that you cite MFIB 2.0 if it contributes substantially to your project. Please use the reference below:

Erzsébet Fichó, István Reményi, István Simon and Bálint Mészáros:
MFIB: a repository of protein complexes with mutual folding induced by binding
Bioinformatics. 2017 Nov 15;33(22):3682-3684
PMID: 29036655
doi: 10.1093/bioinformatics/btx486

If you would like to use MFIB in a non-academic environment, please contact us at mfib(at)ttk.hu.