Determining Membrane Orientation of Transmembrane Proteins

Description

Background

The TmDet algorithm, developed in the early 2000s by the Protein Structure Research Group at the Institute of Enzymology (now the Institute of Molecular Life Sciences), determines the position of a transmembrane protein relative to the membrane based solely on the 3D coordinates of the protein's atoms. A significant advantage of the algorithm is that it works well on low-resolution structures in which only the backbone atoms are visible or parts of the structure are missing. The initial version could also annotate protein chains: once the membrane position was known, it determined the locations of transmembrane regions, re-entrant loops, interfacial helices, and beta-barrel interior segments.

When the method was developed, only a few hundred transmembrane protein structures were known. Since then, thousands of structures have been solved, many with structural properties that differ significantly from those of the first ones. Structures of transmembrane proteins in curved membrane regions, such as mechanosensitive ion channels, have emerged, along with complex membrane structures embedded in double membranes, like bacterial transport systems or connexins in epithelial cells. The advent of the AlphaFold structure prediction method introduced a new challenge: how to determine the membrane localization of a poorly modeled transmembrane protein in which the prediction has folded globular domains alongside the membrane region. Because the TmDet algorithm was designed to detect the actual membrane localization, the incorrectly localized globular domains prevented it from determining the membrane regions in these cases.

Aims

To address these issues, we developed a new version of the TmDet algorithm that can:

  • more accurately annotate segment types after determining membrane localization
  • detect curved membrane surfaces
  • determine the location of both membranes of transmembrane proteins embedded in double membranes
  • determine the potential membrane plane in poorly modeled protein chains, identify chain segments that are placed in the membrane but do not belong to it (false positive membrane segments) and segments that belong to the membrane but are modeled outside it (false negative membrane segments), and thus provide an estimate of the model's quality without comparing it to an experimentally determined structure

Method

The program first determines the relative position of the membrane and the protein, and then annotates the protein chain regions on that basis, determining the sequence boundaries and type of each region. To find the membrane planes, the algorithm first searches for a rotational symmetry axis between chains with identical amino acid sequences. If one is found, it checks whether the symmetry axis can serve as the normal vector of the membrane plane; if so, the algorithm proceeds to annotation. If no membrane planes can be identified along the symmetry axis, or if no symmetry axis is found between the chains, the algorithm searches for the best membrane definition by rotating a normal vector through the protein's center of mass in discrete steps over the full 4π solid angle, as in the original algorithm. For curved membranes, the algorithm places the center of a sphere at discrete points along the normal vector and tests whether the protein fits within the resulting spherical shells.
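The direction search and the spherical-shell test described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the Fibonacci-lattice sampling, the unweighted centroid used in place of the center of mass, the function names, and in particular `toy_score` are all hypothetical and are not taken from TmDet, whose actual objective function is not described here.

```python
import math

def fibonacci_directions(n):
    """Return n roughly evenly spaced unit vectors covering the full sphere."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    directions = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n      # from just below +1 to just above -1
        r = math.sqrt(1.0 - z * z)         # radius of the circle at height z
        phi = i * golden_angle
        directions.append((r * math.cos(phi), r * math.sin(phi), z))
    return directions

def centroid(coords):
    """Unweighted centroid, used here as a stand-in for the center of mass."""
    n = len(coords)
    return tuple(sum(c[k] for c in coords) / n for k in range(3))

def best_normal(coords, score_fn, n_directions=500):
    """Try every sampled direction as membrane normal; keep the best-scoring one."""
    origin = centroid(coords)
    return max(fibonacci_directions(n_directions),
               key=lambda normal: score_fn(coords, origin, normal))

def shell_fraction(coords, center, radius, half_width):
    """Fraction of atoms lying in the spherical shell radius +/- half_width."""
    inside = sum(1 for c in coords
                 if abs(math.dist(c, center) - radius) <= half_width)
    return inside / len(coords)

def toy_score(coords, origin, normal):
    """Placeholder objective (NOT TmDet's): spread of atoms along the normal."""
    proj = [sum((c[k] - origin[k]) * normal[k] for k in range(3)) for c in coords]
    return sum(p * p for p in proj)

# Toy input: a straight rod of points along the z axis.
rod = [(0.0, 0.0, float(z)) for z in range(-10, 11)]
normal = best_normal(rod, toy_score)
```

With the toy score, the selected normal for the rod is nearly parallel to the z axis. In a real search, `score_fn` would be replaced by whatever measure of membrane fit is being optimized, and `shell_fraction` would be evaluated for sphere centers placed at discrete points along the chosen normal.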

The first step of annotation is to determine the type of each chain in the protein (non-TM, alpha, or beta). For beta chains, the outer segments of the beta barrel and the pieces inside it are identified from the secondary structure and the externally accessible surface. For alpha-helical chains, loops, interfacial helices, and transmembrane helices are identified from vectors fitted to the secondary structure elements and from their coordinates relative to the membrane.
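To make the idea of annotating a chain from its coordinates relative to the membrane more concrete, here is a deliberately simplified sketch. It assumes a flat membrane and a precomputed signed distance of each residue from the membrane mid-plane; it ignores secondary structure and the fitted helix vectors that the method described above relies on. The labels, the function names, and the crossing versus re-entrant rule are illustrative assumptions, not TmDet's actual criteria.

```python
from itertools import groupby

def label_residues(signed_distances, half_width):
    """Label each residue as membrane ('M') or as lying on side '1' or '2'."""
    labels = []
    for d in signed_distances:
        if abs(d) <= half_width:
            labels.append("M")
        elif d > 0:
            labels.append("1")
        else:
            labels.append("2")
    return labels

def to_segments(labels):
    """Collapse per-residue labels into (start, end, label) runs (0-based, inclusive)."""
    segments = []
    start = 0
    for label, run in groupby(labels):
        length = len(list(run))
        segments.append((start, start + length - 1, label))
        start += length
    return segments

def classify_membrane_runs(segments):
    """Classify each membrane run by the sides of its flanking segments."""
    result = []
    for i, (start, end, label) in enumerate(segments):
        if label != "M":
            continue
        before = segments[i - 1][2] if i > 0 else None
        after = segments[i + 1][2] if i + 1 < len(segments) else None
        if before is None or after is None:
            kind = "unresolved"            # run touches a chain terminus
        elif before != after:
            kind = "crossing"              # enters on one side, leaves on the other
        else:
            kind = "re-entrant candidate"  # returns to the side it came from
        result.append((start, end, kind))
    return result

# Hypothetical signed distances (in angstroms) for a 12-residue toy chain.
distances = [20, 18, 10, 5, -5, -10, -18, -20, -10, -5, -12, -20]
segments = to_segments(label_residues(distances, half_width=15))
membrane_runs = classify_membrane_runs(segments)
```

In this toy chain, the first membrane run is flanked by opposite sides and is labeled as a crossing, while the second returns to the side it started from and is labeled as a re-entrant candidate.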

Fragment analysis currently works on a single chain only. Globular, compact units are first detected from the protein's structure, and the protein is cut into these pieces. The TmDet algorithm is then run separately on each piece. Based on the per-piece results and on assembling those pieces whose membrane definitions are compatible, TmDet reports the correctly and incorrectly localized segments, as described earlier.
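The text above does not specify how compatibility between per-fragment membrane definitions is judged. Purely as an illustration, the sketch below assumes that each fragment yields one membrane normal in a shared coordinate frame and that two definitions are compatible when their normals agree within an angular tolerance. The tolerance value, the function names, and the grouping rule are hypothetical.

```python
import math

def angle_between(u, v):
    """Angle in degrees between two plane normals, ignoring their sign."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # abs(): a normal and its opposite describe the same plane orientation
    cos_angle = max(-1.0, min(1.0, abs(dot) / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_angle))

def largest_compatible_group(normals, max_angle_deg=20.0):
    """Indices of the largest set of fragments whose normals agree with one reference."""
    best = []
    for ref in normals:
        group = [j for j, n in enumerate(normals)
                 if angle_between(ref, n) <= max_angle_deg]
        if len(group) > len(best):
            best = group
    return best

# Hypothetical per-fragment membrane normals for a four-fragment chain.
fragment_normals = [(0.0, 0.0, 1.0), (0.1, 0.0, 1.0), (0.0, 0.0, -1.0), (1.0, 0.0, 0.0)]
consistent = largest_compatible_group(fragment_normals)
outliers = [i for i in range(len(fragment_normals)) if i not in consistent]
```

Here the first three fragments agree on the membrane orientation, and the fourth is flagged as an outlier, which in the spirit of the description above would mark it as a candidate for an incorrectly localized piece. A fuller treatment would also need to compare plane offsets and handle fragments in which no membrane is detected.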