From here on, we modified the B2N code to allow the use of the MCL algorithm with a similarity measure corresponding to the normalized alignment bit score between homologous sequences:
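
Assuming the normalization divides each pairwise bit score by the self-alignment score of the query (the symbol $\bar{S}_{ij}$ is introduced here only for illustration), the measure can be written as:

```latex
\bar{S}_{ij} = \frac{S_{ij}}{S_{ii}}
```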

where $S_{ii}$ is the maximal score attainable with the i-th query, corresponding to the query aligned with itself. The adjacency matrix is normalized to make it stochastic, a prerequisite for the MCL algorithm used to define clusters of orthologous sequences. The MCL algorithm simulates flow by alternating two algebraic operations on matrices: expansion of the input matrix ($M_{out} = M_{in} \cdot M_{in}$), which models the spreading out of flow, and inflation ($m_{ij} = m_{ij}^{r}$), which models its contraction. The parameter r controls the granularity of the clustering and is set to 2. After these two steps we apply diagonal scaling to keep the matrix stochastic and ready for the next iteration. Under inflation, flow becomes thicker in regions of higher current and thinner in regions of lower current. As a consequence, flow spreads out within clusters while evaporating between clusters, leaving at convergence an idempotent matrix that reveals the clusters hidden in the original adjacency matrix.

Plasmid analysis

Concerning the identification of VirR targets, we analysed plasmids with the same procedure used for genomes. Phylogenetic profiling and the hypergraph describing the similarity in gene content of different plasmid molecules were calculated with the software Blast2network [13] and visualized with the software Visone [17]. The phylogenetic profiling technique is described in detail in several papers (e.g. [18, 19]), so we will not discuss it in detail here. It is enough to say that, by comparing the distribution of different genes in different plasmids, we can quantify the extent to which proteins tend to co-occur, which is an indication of the degree of functional overlap between different proteins.

We want to spend a few words on the hypergraph shown in figure 3. Suppose we have an adjacency matrix describing homologies between proteins encoded by several different plasmids. In this matrix, element $m_{ij}$ corresponds to the similarity between sequences i and j. However, these matrices can be quite large (their size is the total number of proteins in the study set), so it is useful to apply a dimensionality reduction approach to extract the information of interest. In our case, given the mobility of genes encoded on plasmids, we wanted to assess the degree of similarity between plasmids in terms of gene content, and to identify the most plausible routes for gene exchange in the strains under analysis. One way to do this is to calculate the similarity between the phylogenetic profiles of the plasmids and then reduce the original matrix to a new one whose size corresponds to the number of plasmids in the dataset.
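
As an illustration only, the two matrix computations described in this section (MCL clustering of the score matrix, then reduction to a plasmid-by-plasmid matrix) can be sketched in plain Python. The function names (`mcl`, `read_clusters`, `plasmid_similarity`), the toy scores, the convergence tolerance and the plasmid assignment are all hypothetical. In particular, representing each plasmid by a presence/absence profile over clusters and comparing profiles with the Jaccard index is an assumption of this sketch, not necessarily the measure implemented in B2N.

```python
def normalize_columns(m):
    """Diagonal scaling: divide every column by its sum (column-stochastic)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def mcl(adjacency, r=2.0, max_iter=100, tol=1e-9):
    """Alternate expansion and inflation until the matrix stops changing."""
    n = len(adjacency)
    m = normalize_columns(adjacency)
    for _ in range(max_iter):
        # Expansion: M_out = M_in * M_in (flow spreads out).
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
                    for i in range(n)]
        # Inflation: entrywise power r, then rescaling (flow contracts).
        new = normalize_columns([[x ** r for x in row] for row in expanded])
        change = max(abs(new[i][j] - m[i][j])
                     for i in range(n) for j in range(n))
        m = new
        if change < tol:
            break
    return m


def read_clusters(m, eps=1e-6):
    """Each row that still carries flow lists the members of one cluster."""
    found = {tuple(j for j, x in enumerate(row) if x > eps) for row in m}
    return sorted(c for c in found if c)


def plasmid_similarity(clusters, plasmid_of, n_plasmids):
    """Jaccard index between presence/absence profiles (assumed measure)."""
    profiles = [set() for _ in range(n_plasmids)]
    for c, members in enumerate(clusters):
        for protein in members:
            profiles[plasmid_of[protein]].add(c)
    sim = [[0.0] * n_plasmids for _ in range(n_plasmids)]
    for p in range(n_plasmids):
        for q in range(n_plasmids):
            union = profiles[p] | profiles[q]
            if union:
                sim[p][q] = len(profiles[p] & profiles[q]) / len(union)
    return sim


# Toy normalized scores for five proteins (made-up values; diagonal = self-score).
scores = [
    [1.0, 0.8, 0.7, 0.0, 0.0],
    [0.8, 1.0, 0.9, 0.0, 0.0],
    [0.7, 0.9, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.6],
    [0.0, 0.0, 0.0, 0.6, 1.0],
]
plasmid_of = [0, 1, 2, 0, 1]  # hypothetical: index of the plasmid encoding each protein

flow = mcl(scores)
clusters = read_clusters(flow)
similarity = plasmid_similarity(clusters, plasmid_of, n_plasmids=3)
```

In this toy case the 5 x 5 protein-level matrix is reduced to a 3 x 3 plasmid-level matrix, whose entries could then be thresholded to decide which plasmids to connect in a graph.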
