Comparison of E.coli and H.pylori networks using XINViewer

CH391L, Fall 2004

Eva-Maria Strauch, Razvan Surdulescu (c) 2004


Introduction

Recent improvements in genome-scale approaches such as two-hybrid screens, protein chips, and mass spectrometry have generated enormous amounts of protein-protein interaction data. The emerging problem is to cross-link and verify these interactions in order to construct high-quality protein interaction networks.

In this project, we examined the DIP data containing experimentally identified protein-protein interactions for E.coli and H.pylori. We developed a graphical tool for displaying these networks in order to facilitate their analysis, and we programmatically identified clusters of proteins that might have salient biological meaning. We look for two high-level types of clusters that tend to appear in protein interaction networks: hubs and cliques. A hub is a cluster that has a central node from which radiate edges towards outer nodes, much like the spokes in a wheel. A clique is a cluster where every node is fully connected with all the other nodes in the cluster. We attempt to compare clusters in E.coli with clusters in H.pylori, as well as identify potential missing proteins from these clusters in order to augment the networks.
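The two cluster types can be illustrated with a minimal Java sketch. It is not the XINViewer implementation: the class `ClusterSketch`, its method names, and the degree threshold used to call a node a hub are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper (not part of XINViewer): an undirected interaction
// graph with simple hub and clique checks.
class ClusterSketch {
    private final Map<String, Set<String>> adjacency = new HashMap<>();

    // Record an undirected interaction between proteins a and b.
    void addInteraction(String a, String b) {
        adjacency.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        adjacency.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    // Hub candidates: nodes with at least minDegree neighbors.
    // The threshold is an assumption; the report does not state one.
    List<String> hubs(int minDegree) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : adjacency.entrySet()) {
            if (entry.getValue().size() >= minDegree) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    // A group of nodes is a clique when every pair is directly connected.
    boolean isClique(List<String> nodes) {
        for (int i = 0; i < nodes.size(); i++) {
            Set<String> neighbors = adjacency.get(nodes.get(i));
            for (int j = i + 1; j < nodes.size(); j++) {
                if (neighbors == null || !neighbors.contains(nodes.get(j))) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Toy node names, not real proteins.
        ClusterSketch graph = new ClusterSketch();
        graph.addInteraction("A", "B");
        graph.addInteraction("A", "C");
        graph.addInteraction("A", "D");
        graph.addInteraction("B", "C");
        System.out.println("hub candidates: " + graph.hubs(3));
        System.out.println("A, B, C form a clique: " + graph.isClique(List.of("A", "B", "C")));
    }
}
```

Checking a given group of nodes for the clique property is cheap; enumerating all maximal cliques in a large network is much more expensive and is not attempted in this sketch.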

Methodology

  1. We developed the tool XINViewer and used it to display the protein-protein interaction networks around the proteins of interest in Helicobacter pylori and Escherichia coli.
  2. Two example proteins, the membrane protein SecG and the cell-division protein FtsZ, were analyzed to identify possible new interaction partners in E.coli based on the clusters found in H.pylori.
  3. The protein sequences of H.pylori were obtained from the Entrez Protein database on the NCBI web page.
  4. The protein sequences were compared to the E.coli genome using the sequence alignment programs provided by the Colibri database (this database exists exclusively for E.coli).
    • FASTA includes an additional step in the calculation of the initial pairwise similarity score. This allows multiple regions of similarity to be joined to increase the score of related sequences, and thereby allows proteins that share only short, highly conserved regions to be detected. You can read more about this in "Improved tools for biological sequence comparison" (Proc. Natl. Acad. Sci. U.S.A. 1988 April; 85(8): 2444-2448).
    • BLAST was used to confirm the results.

Benefits of XINViewer

The protein-protein interaction XIN data from DIP comes in raw XML format. This format is difficult for a human to read, especially when it comes to searching for groups of related nodes that have some desired property. Since the XIN data represents a network (a set of nodes with links between them), displaying it in graphical form makes it far easier for a human to manipulate, visually notice patterns, and in general work with the network. This was the original impetus behind developing XINViewer.

The network is laid out in a radial fashion, where the selected node is in the center, its neighbors are laid out in a circle around it, their neighbors in another circle and so on. This makes it easy to focus and see the characteristics of the center node, as well as possible paths that radiate out from the center node to various degrees. Additionally, the user can zoom in and out of the network, which makes it easier to see either global features of the network or local features.
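A radial layout of this kind can be sketched as a breadth-first search from the selected node, followed by placing each distance class on its own circle. The class `RadialLayoutSketch`, the `spacing` parameter, and the even angular spacing are assumptions for illustration; the actual layout code in XINViewer may differ.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a radial layout (not the XINViewer source).
class RadialLayoutSketch {

    // Breadth-first search: ring index = number of links from the center.
    static Map<String, Integer> rings(Map<String, List<String>> graph, String center) {
        Map<String, Integer> ring = new LinkedHashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        ring.put(center, 0);
        queue.add(center);
        while (!queue.isEmpty()) {
            String node = queue.remove();
            for (String next : graph.getOrDefault(node, List.of())) {
                if (!ring.containsKey(next)) {
                    ring.put(next, ring.get(node) + 1);
                    queue.add(next);
                }
            }
        }
        return ring;
    }

    // Place ring r on a circle of radius r * spacing, with the nodes of
    // each ring spaced evenly by angle. Returns {x, y} per node.
    static Map<String, double[]> layout(Map<String, List<String>> graph,
                                        String center, double spacing) {
        Map<Integer, List<String>> byRing = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : rings(graph, center).entrySet()) {
            byRing.computeIfAbsent(entry.getValue(), k -> new ArrayList<>()).add(entry.getKey());
        }
        Map<String, double[]> position = new LinkedHashMap<>();
        for (Map.Entry<Integer, List<String>> entry : byRing.entrySet()) {
            List<String> members = entry.getValue();
            double radius = entry.getKey() * spacing;
            for (int i = 0; i < members.size(); i++) {
                double angle = 2 * Math.PI * i / members.size();
                position.put(members.get(i),
                        new double[] {radius * Math.cos(angle), radius * Math.sin(angle)});
            }
        }
        return position;
    }

    public static void main(String[] args) {
        // Toy adjacency lists, not real interaction data.
        Map<String, List<String>> graph = Map.of(
                "A", List.of("B", "C"),
                "B", List.of("A", "D"),
                "C", List.of("A"),
                "D", List.of("B"));
        System.out.println(rings(graph, "A"));
    }
}
```

Nodes that cannot be reached from the center are simply left out of this sketch; a real viewer would need a policy for disconnected components.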

When searching for text in the network, the node that contains that text is highlighted. The user can immediately see both the node and all of its neighbors. This provides far more information than simply searching for text in the XIN file, where the node neighborhoods are not visible.

When searching for hubs or cliques, these clusters are also highlighted in the network. The user can immediately see their layout relative to other nodes, the neighbors-of-neighbors, and other revealing features. Furthermore, a particular hub or clique can be extracted from the original network into its own separate window where it can be more easily seen and manipulated.

Lastly, detailed statistics about the network can be computed easily, such as the average diameter of the network and the histograms of average path lengths and clustering coefficients. This information can be very helpful in ascertaining, for example, whether a protein-protein interaction network is a small-world network.
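Two of these statistics can be sketched with their standard textbook definitions: the local clustering coefficient of a node (the fraction of pairs of its neighbors that are themselves linked) and the average shortest-path length (computed by breadth-first search from every node). The class `NetworkStatsSketch` and its method names are hypothetical and are not taken from XINViewer.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of two network statistics (not the XINViewer source).
class NetworkStatsSketch {

    // Local clustering coefficient: linked neighbor pairs / possible pairs.
    static double clusteringCoefficient(Map<String, Set<String>> graph, String node) {
        List<String> neighbors = new ArrayList<>(graph.getOrDefault(node, Set.of()));
        int k = neighbors.size();
        if (k < 2) {
            return 0.0; // fewer than two neighbors: no pairs to compare
        }
        int links = 0;
        for (int i = 0; i < k; i++) {
            for (int j = i + 1; j < k; j++) {
                if (graph.getOrDefault(neighbors.get(i), Set.of()).contains(neighbors.get(j))) {
                    links++;
                }
            }
        }
        return 2.0 * links / (k * (k - 1));
    }

    // Average shortest-path length over all connected pairs of nodes.
    static double averagePathLength(Map<String, Set<String>> graph) {
        long total = 0;
        long pairs = 0;
        for (String source : graph.keySet()) {
            Map<String, Integer> distance = new HashMap<>();
            Deque<String> queue = new ArrayDeque<>();
            distance.put(source, 0);
            queue.add(source);
            while (!queue.isEmpty()) {
                String node = queue.remove();
                for (String next : graph.getOrDefault(node, Set.of())) {
                    if (!distance.containsKey(next)) {
                        distance.put(next, distance.get(node) + 1);
                        queue.add(next);
                    }
                }
            }
            for (int d : distance.values()) {
                if (d > 0) {
                    total += d;
                    pairs++;
                }
            }
        }
        return pairs == 0 ? 0.0 : (double) total / pairs;
    }

    public static void main(String[] args) {
        // Toy graph: a triangle A-B-C with D attached to C.
        Map<String, Set<String>> graph = Map.of(
                "A", Set.of("B", "C"),
                "B", Set.of("A", "C"),
                "C", Set.of("A", "B", "D"),
                "D", Set.of("C"));
        System.out.println("clustering coefficient of C: " + clusteringCoefficient(graph, "C"));
        System.out.println("average path length: " + averagePathLength(graph));
    }
}
```

A network is usually called small-world when its average path length is short, comparable to that of a random graph of the same size, while its clustering coefficients are much higher.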

Application: Cluster comparison between E.coli and H.pylori


The goal of this project is to compare clusters in H.pylori with clusters in E.coli. In particular, the clusters were examined around SecG and FtsZ.

The second example on FtsZ may lead to an interesting possible functional connection between the cell division machinery and a (so far) poorly characterized periplasmic chaperone: PpiD.

Starting material

The datasets for the protein-protein interactions in H.pylori and E.coli were obtained from the Database of Interacting Proteins (DIP). DIP provides information on interacting protein networks either as tab-delimited text files or in the XIN format (which is basically XML).

For graphical display, we developed a tool called XINViewer, written from scratch for this project, in Java. For additional information, see the installation instructions and the user manual for this tool.

It is important to note that the existing H.pylori data set was obtained mainly through a high-throughput yeast two-hybrid screen (Rain et al., Nature 2001), whereas the network between E.coli proteins is a collection of different individual experiments (so the E.coli network will appear much sparser).

I. SecG: Part of the general secretion machinery (Sec-pathway)

The major route of protein translocation in bacteria is the so-called general secretion pathway (Sec-pathway). This route has been extensively studied in Escherichia coli and other bacteria. The transporter complex is built from multimeric proteins that span the inner membrane of E.coli. The core of the translocase consists of a proteinaceous channel formed by the SecYEG protein complex and the peripheral adenosine triphosphatase (ATPase) SecA as the molecular motor.


Fig. 1 The general secretion machinery in E.coli
(from Trends in Microbiology 9(10), p494)
SecG, SecE and SecY form the minimum set for translocation, with SecA as the motor using ATP as fuel. Additional Sec proteins, e.g. SecD, may contribute for different kinds of substrates.

The preprotein is represented by a black line, with the gray region showing the signal sequence. Steps 1-3, targeting: SecB, the Sec-system-specific chaperone, passes the preprotein to SecA and possibly also to the translocation channel. SecA then binds to the membrane. SecY, SecE and SecG form the transporter complex. The preprotein is then threaded through the membrane like a thread through the eye of a needle.

Translocation is thought to occur in a step-wise fashion, 20-30 amino acid residues at a time, so the whole process is repetitive. Little is known about the process on the periplasmic side of the membrane (the space between the inner and outer membranes) that leads to the release and/or folding of the substrate protein.

Here is a fragment of a relevant cluster for H.pylori as displayed in XINViewer:

Fig.2 Protein interactions with SecG in H.pylori

The first protein we looked at was SecG, which is part of the general protein secretion machinery in the cytoplasmic membrane of most bacteria.

The small cluster of neighbors around SecG contains:

Here is a fragment of a similarly relevant cluster for E.coli as displayed in XINViewer:


Fig. 3 Protein-protein interaction network around the cytoplasmic membrane transporter protein SecG in E.coli
There are clearly fewer protein interactions reported in E.coli, but the reported proteins have been characterized.

The E.coli interaction data was obtained mainly from individual experiments, unlike the H.pylori data, which was obtained through high-throughput technology.

The small cluster of neighbors around SecG contains:

Possible drawbacks and problems

The first problem is that the database contains many false positives. For example, uvrB is a major component of the ABC endonuclease complex involved in DNA repair; it has been described in the context of the nucleotide excision repair (NER) mechanism. It is therefore very unlikely to be part of the general secretion machinery.

A similar example is SecD in H.pylori, which shows an interaction with elongation factor P, another unlikely interaction.

Therefore, any conclusion from this dataset has to be drawn carefully.

Another factor that needs to be taken into account is that neither dataset is complete. DIP claims to update its data sets every 3 months, but the example of SecG in E.coli shows that not all known interaction partners have been reported yet. The literature shows that SecG interacts with at least SecE and SecY, and possibly also with SecA.

There are several other databases available online which could be considered, but they use different file formats, so it is relatively difficult to integrate and/or overlap them in the context of this project.

The protein sequences of the SecG cluster elements in H.pylori were compared to the E.coli database Colibri. Only hits with a sufficiently low E-value (that is, sufficiently high similarity) were considered for further analysis.
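Filtering hits by E-value can be sketched as follows. The sketch assumes hit lines of the form `E.coli |EG14057|yegN Function unknown (1040) 1544 300.4 1.7e-81`, with the E-value in the last column, as they appear in the FASTA results of this report. The class `EValueFilterSketch` is hypothetical, and the cutoff is left as a parameter because the report does not state the value that was used.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of XINViewer or FASTA): keeps the hits
// whose E-value is at or below a caller-supplied cutoff.
class EValueFilterSketch {

    // Returns the gene names of all hit lines with E-value <= maxEValue.
    // Lines that do not look like hit lines are skipped.
    static List<String> significantHits(List<String> hitLines, double maxEValue) {
        List<String> genes = new ArrayList<>();
        for (String line : hitLines) {
            String[] bars = line.split("\\|");          // database | id | gene + description
            String[] columns = line.trim().split("\\s+");
            if (bars.length < 3 || columns.length < 2) {
                continue;
            }
            double eValue;
            try {
                // The E-value is the last whitespace-separated column.
                eValue = Double.parseDouble(columns[columns.length - 1]);
            } catch (NumberFormatException ex) {
                continue;
            }
            if (eValue <= maxEValue) {
                // The gene name is the first word after the second bar.
                genes.add(bars[2].trim().split("\\s+")[0]);
            }
        }
        return genes;
    }

    public static void main(String[] args) {
        List<String> hits = List.of(
                "E.coli |EG14057|yegN Function unknown (1040) 1544 300.4 1.7e-81",
                "E.coli |EG12931|yhgE Function unknown ( 574) 104 30.5 0.51",
                "E.coli |EG11700|pnuC Nicotinamide mononucleotid ( 239) 149 41.5 6.9e-05");
        // 1e-3 is an illustrative cutoff only.
        System.out.println(significantHits(hits, 1e-3));
    }
}
```

A smaller E-value means the match is less likely to have arisen by chance, so the filter keeps values below the cutoff rather than above it.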

Sequence similarities

The following results were obtained through sequence alignment by the FASTA program.

The H.pylori protein sequences neighboring SecG were compared to the E.coli genome.

H.pylori proteins vs E.coli:

PIR:G64595 acriflavine resistance protein
E.coli |EG14057|yegN Function unknown            (1040) 1544 300.4 1.7e-81
E.coli |EG10014|acrD Sensitivity to acriflavine  (1037) 1160 228.4 7.8e-60
E.coli |EG14058|yegO Function unknown            (1025) 1139 224.5 1.2e-58
E.coli |EG10267|acrF Encodes lipoprotein with s  (1034) 1051 208.0 1.1e-53
E.coli |EG12241|yhiV Function unknown            (1037) 1047 207.2 1.8e-53
E.coli |EG11704|acrB AcrAB system has major rol  (1049)  966 192.1 6.8e-49
E.coli |EG12367|ybdE Function unknown            (1047)  412  88.3 1.2e-17

PIR:A64641 cation efflux system protein
E.coli |EG14057|yegN Function unknown            (1040) 1072 216.4 3.3e-56
E.coli |EG14058|yegO Function unknown            (1025) 1061 214.2 1.4e-55
E.coli |EG12367|ybdE Function unknown            (1047)  939 190.9 1.6e-48
E.coli |EG12241|yhiV Function unknown            (1037)  715 147.9 1.3e-35
E.coli |EG10014|acrD Sensitivity to acriflavine  (1037)  656 136.6 3.4e-32
E.coli |EG10267|acrF Encodes lipoprotein with s  (1034)  528 112.0 8.3e-25
E.coli |EG11704|acrB AcrAB system has major rol  (1049)  473 101.5 1.3e-21

E.coli |EG13246|ybaR Function unknown            ( 834) 1303 276.2 1.8e-74
E.coli |EG12215|zntA Zn2+ translocating P-type   ( 732)  703 153.7 1.3e-37
E.coli |EG10514|kdpB High-affinity potassium tr  ( 682)  342  79.9 1.8e-15

E.coli |EG12931|yhgE Function unknown            ( 574)  104  30.5    0.51
E.coli |EG12623|ybaL Function unknown            ( 558)  100  29.7    0.87
E.coli |EG12853|abgT Para-aminobenzoyl-glutamat  ( 510)   99  29.5    0.94

PIR:B64681 nicotinamide mononucleotide transporter
E.coli |EG11700|pnuC Nicotinamide mononucleotid  ( 239)  149  41.5 6.9e-05

secD protein-export membrane protein
E.coli |EG10940|secF Membrane protein with prot  ( 323)  685 155.6 6.2e-39
E.coli |EG10938|secD Membrane component of prot  ( 615)  209  54.0 4.4e-08

PIR:G64605 (FeoB) iron(II) transport protein
E.coli |EG12102|feoB Membrane protein of ferrou  ( 773)  597 129.0 2.9e-30

E.coli |EG11062|uvrB Excision nuclease subunit   ( 673) 2356 488.3 1.9e-138
E.coli |EG11619|mfd Transcription repair coupli  (1148)  188  48.4 8.6e-06
E.coli |EG11235|rhlE DEAD-box protein family; A  ( 454)  168  44.0 7.4e-05

Discussion

As already mentioned, uvrB will not be considered, since a DNA repair enzyme does not seem likely to be interacting with a transporter protein complex.

It is quite interesting to see that many of the neighbors of SecG are independent (*) transporter molecules, such as the cation efflux system protein, the copper transporting P-type ATPase, the nicotinamide mononucleotide transporter, the nickel transporter protein and the iron(II) transport protein.
* This can be assumed since the Sec-pathway has not been shown to transport metal ions.

These interactions could be artifacts as well, depending on the method used to detect them. Since these are all transmembrane proteins, they may simply have aggregated through their hydrophobic transmembrane sequences. Another possibility is that they are all substrates of the Sec-pathway: in addition to protein secretion, Sec also integrates transmembrane proteins into the cytoplasmic membrane.

On the other hand, it is quite interesting to notice that the cation efflux system protein and the acriflavine resistance protein seem to share similarities with YegN and YegO of E.coli, as well as, to some extent, with the Acr proteins. However, when we tried to find some information on YegN or YegO, all we could find in the literature was the following limited description:

"resistance-nodulation-cell division-type multidrug transporters" (Baranova et al., 2002)

which basically describes our finding as well: a protein that is somehow involved in drug resistance (acr genes) and is at the same time a transporter of the cell-division type. Either way, it could be interesting to determine its interaction with the Sec-translocon.

Independent of the cluster analysis, it was interesting to see that the SecD sequence of H.pylori is 23% similar to E.coli SecD and 35% similar to E.coli SecF. This appears to be a gene fusion.

Judging by the E-value, nixA does not seem to have a homolog in E.coli.

II. Cell division apparatus

The second cluster we looked at was the cluster around the cell division complex.

FtsZ is the essential component for cell division; it assembles into a ring-like structure called the Z-ring. It undergoes GTP-induced self-assembly into dynamic filamentous structures to build up the division ring.


Fig. 4 Cell cycle in E.coli
The picture shows the replication cycle of Escherichia coli.

The cell is a growing cylinder with hemispherical poles. The shape of the cell is maintained by the peptidoglycan sacculus (shown as a thick gray layer), which lies in the space between the inner and outer membranes.

On the right side the folded chromosomal DNA is shown as a grey ellipse within the cell. The 'newborn' cell has a ring of MinE protein at the cell center and a coating of MinC + MinD proteins around the cell membrane in one-half of the cell (thick black curved line). This coating switches from one half of the cell to the other about every 40 s and prevents polymerization of FtsZ protein (shown as individual dots in the cytoplasm) at these locations. At about the same time as completion of chromosome replication, the cell reaches twice its minimum 'unit' length and FtsZ proteins polymerize into a ring at the cell center (where MinE blocks the action of the MinCD proteins) and recruit ZipA and FtsA to the ring. Initial contraction of the ring (in this model) initiates the synthesis of the septum by ingrowth of the sacculus at this location. Growth of the sacculus accompanies contraction of the FtsZ_ZipA_FtsA ring and requires the localization and action of the remaining cell division proteins (FtsI, Q, L, W and N) at this site. After completion of the septum and separation of the two peptidoglycan layers, the outer membrane must also separate to cover the two sister cells. (Taken from Molecular Microbiology 40 (4), 779-785).

Here is a fragment of a relevant cluster for H.pylori as displayed in XINViewer:

Fig.5 cluster around FtsZ in H.pylori
The neighboring proteins of FtsZ in H.pylori reported in the database are:

Here is a fragment of a similarly relevant cluster for E.coli as displayed in XINViewer:

Fig.6 cluster around FtsZ in E.coli
The cluster around the E.coli version of FtsZ is better characterized than the one in H.pylori, and this is also reflected in the DIP database. It contains:

Results: sequence similarities

Analysis of H.pylori proteins vs. the E.coli genome

PIR:A64642 conserved hypothetical secreted protein HP0977
E.coli |EG13249|ppiD Peptidylprolyl-cis-trans-i  ( 623)  295  72.1 2.5e-13

TrpB Tryptophan synthase beta chain
E.coli |EG11025|trpB Tryptophan synthase subuni  ( 397) 1721 378.0 1.1e-105

FtsA septum formation protein
E.coli |EG10339|ftsA Cell division, septation    ( 420)  529 124.0   4e-29

Discussion and findings

FtsA has been shown to interact with FtsZ in the cell division complex.

What is more interesting in this context is PIR: A64642, which seems to be the homolog of ppiD. PpiD has peptidyl-prolyl isomerase (PPIase) activity (for more information see Dartigalongue et al., 1998). It has a transmembrane anchor, but its catalytic domain faces the periplasm (the space between the two membranes of a gram-negative bacterium). Of course, this interaction could be an artifact as described above, but PpiD could also be a true interaction partner of the cell division apparatus.

This could be important with respect to cell division, especially since the bacterial cell wall formed by the peptidoglycan sacculus, which is mainly cross-linked through amino acids, has to be cleaved during division. Another aspect is that FtsZ oligomerizes to form the septal ring, which drives the division of the cell into two daughter cells.

These aspects seem to be interesting for further experimental investigation.

Possible improvements and future directions

There are many future avenues for improving the functionality of XINViewer. Here are some of them:

Links