The Gaussian Linking Number (GLN) is another measure of entanglement between a pair of chromosomes. It indicates how many times (and in which direction) one chromosome winds around the other one. There are two main reasons to evaluate GLN:
The GLN makes it possible to distinguish between linked and unlinked pairs of chromosomes when their linking invariant is classified as Other (Other denotes a very complex polynomial, with more than 10 crossings). For instance, GLN values close to 1 indicate that chromosome one winds around chromosome two approximately once.
The GLN provides more information about the entanglement between chromosomes: it makes it possible to identify the specific fragment of a chromosome that winds around another one.
A definition of a linking number between two closed curves in 3 dimensions (e.g. gamma 1 and gamma 2) is given by the Gauss double integral, Eq. 1.
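For reference, the Gauss double integral for two closed curves is usually written in the form below. This is the standard textbook expression, given here on the assumption that it is what Eq. 1 denotes; the symbols \vec{r}^{(1)} and \vec{r}^{(2)} for points on gamma 1 and gamma 2 are notation chosen for this sketch.

```latex
\mathrm{GLN}(\gamma_1,\gamma_2) \;=\; \frac{1}{4\pi}
\oint_{\gamma_1}\oint_{\gamma_2}
\frac{\vec{r}^{(1)}-\vec{r}^{(2)}}{\left|\vec{r}^{(1)}-\vec{r}^{(2)}\right|^{3}}
\cdot\left(d\vec{r}^{(1)}\times d\vec{r}^{(2)}\right)
```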
In the case of chromosomes, chains become collections of points, i.e. positions of beads (base pairs), and the integrals can be replaced by sums over segments. Gauss proved that for closed curves this integral is always an integer, which is invariant under isotopy and indicates how many times one curve winds around the other. Chromosomes are open chains, so we relax the requirement of an integer indicator of linking and evaluate the discrete Gauss double integral over open chains. Thus the discrete Gauss integral over the open chains takes the form:
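One common way to evaluate such a discrete sum is to take the integrand at the midpoints of the segments. The sketch below follows that approach; the midpoint rule, the function name `gln` and the toy coordinates are assumptions made for illustration and are not taken from the original method.

```python
import numpy as np


def gln(chain1, chain2):
    """Midpoint approximation of the Gauss double sum for two open chains.

    Each chain is an (N, 3) array of bead positions; consecutive beads
    define the segments.
    """
    a = np.asarray(chain1, dtype=float)
    b = np.asarray(chain2, dtype=float)
    d1, d2 = np.diff(a, axis=0), np.diff(b, axis=0)        # segment vectors
    m1, m2 = (a[1:] + a[:-1]) / 2, (b[1:] + b[:-1]) / 2    # segment midpoints
    r = m1[:, None, :] - m2[None, :, :]                    # midpoint separations
    cross = np.cross(d1[:, None, :], d2[None, :, :])       # dr1 x dr2 for each pair
    dist3 = np.linalg.norm(r, axis=2) ** 3
    terms = np.einsum("ijk,ijk->ij", r, cross) / dist3
    return float(terms.sum() / (4.0 * np.pi))


# Toy check (hypothetical data): two interlocked rings and two distant rings.
t = np.linspace(0.0, 2.0 * np.pi, 201)
ring1 = np.column_stack([np.cos(t), np.sin(t), np.zeros_like(t)])
ring2 = np.column_stack([1.0 + np.cos(t), np.zeros_like(t), np.sin(t)])
far_ring = ring2 + np.array([10.0, 0.0, 0.0])

print(gln(ring1, ring2))     # magnitude close to 1 for the interlocked rings
print(gln(ring1, far_ring))  # close to 0 for the separated rings
```

For closed, finely sampled curves the result approaches an integer; for open chains such as chromosomes it is in general a real number.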
Using the following methods, introduced by us for proteins in [1], we analyze the following four quantities for each pair of chromosomes:
whGLN: describes entanglement between the whole chromosome 1 and the whole chromosome 2,
minGLN, maxGLN: denote the minimum and maximum values, respectively, of the GLN between chromosome 1 and any fragment of chromosome 2, and between chromosome 2 and any fragment of chromosome 1,
max|GLN| = max{maxGLN, - minGLN},
maxshort|GLN|: describes the shortest fragments of the chromosomes which wind around each other.
We distinguish the directions of winding with respect to the natural direction of the chromosome chains, from 5' to 3'. Thus a high maxGLN or a low minGLN indicates that the corresponding part of the chromosome winds significantly around the other chromosome in a "positive" or "negative" direction, respectively. A high max|GLN| combines those two cases and thus indicates significant winding in either direction.
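The role of chain direction can be illustrated with a small sketch: traversing one chain in the opposite direction flips the sign of the GLN while leaving its magnitude unchanged. The helper `gln` uses a midpoint discretization and the two rings are toy data; both are assumptions made for this illustration.

```python
import numpy as np


def gln(chain1, chain2):
    """Midpoint approximation of the Gauss double sum (illustrative helper)."""
    a = np.asarray(chain1, dtype=float)
    b = np.asarray(chain2, dtype=float)
    d1, d2 = np.diff(a, axis=0), np.diff(b, axis=0)
    r = ((a[1:] + a[:-1]) / 2)[:, None, :] - ((b[1:] + b[:-1]) / 2)[None, :, :]
    cross = np.cross(d1[:, None, :], d2[None, :, :])
    dist3 = np.linalg.norm(r, axis=2) ** 3
    return float((np.einsum("ijk,ijk->ij", r, cross) / dist3).sum() / (4.0 * np.pi))


# Two interlocked rings (hypothetical coordinates).
t = np.linspace(0.0, 2.0 * np.pi, 201)
ring1 = np.column_stack([np.cos(t), np.sin(t), np.zeros_like(t)])
ring2 = np.column_stack([1.0 + np.cos(t), np.zeros_like(t), np.sin(t)])

forward = gln(ring1, ring2)          # chain 2 read in its original direction
backward = gln(ring1, ring2[::-1])   # chain 2 read in the opposite direction

print(np.isclose(forward, -backward))  # True
```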
Note that if two chromosomes are linked, the subchains identified via the maxshort|GLN| method are significantly shorter than the subchains determined via the max|GLN| method (see e.g. Fig. 1). When chromosomes are unlinked, the subchains identified via the maxshort|GLN| method are still rather long in comparison with those identified via the max|GLN| method (see e.g. Fig. 1). Comparing the subchains determined by max|GLN| and maxshort|GLN| therefore provides an additional descriptor to distinguish between linked and unlinked chromosomes.
Fig. 1 An example of entanglement analysis of pairs of chromosomes based on the max|GLN| and maxshort|GLN| methods.
(A) Chromosomes a and d (model 1) from [1]; whGLN: 0.37; max|GLN| and maxshort|GLN|: 1.01 and 0.85, respectively, for the identified subchains of chromosome a: 432-1425 (red) and 432-1425 (red, the range of this subchain did not change), and of chromosome d: 308-1372 (blue) and 839-1371 (cyan). These chromosomes are unlinked.
(B) Chromosomes a and c (model 1) from [1]; whGLN: 1; max|GLN| and maxshort|GLN|: 2.55 and 2.09, respectively, for the identified subchains of chromosome a: 762-1212 (red) and 762-1212 (red), and of chromosome c: 170-1469 (blue) and 1368-1450 (cyan). These chromosomes are unlinked.
(C) Chromosomes a and h (model 1) from [1]; whGLN: -1.77; max|GLN| and maxshort|GLN|: -2.72 and -2.51, respectively, for the identified subchains of chromosome a: 674-1394 (red) and 674-1394 (red, the range of this subchain did not change), and of chromosome h: 309-525 (blue) and 367-421 (cyan). These chromosomes are linked. Note the significant decrease in the length of the subchain of chromosome h.
(D) Chromosomes a and n (model 1) from [1]; whGLN: -0.7; max|GLN| and maxshort|GLN|: -1.55 and -1.36, respectively, for the identified subchains of chromosome a: 357-767 (red) and 357-767 (red, the range of this subchain did not change), and of chromosome n: 11-358 (blue) and 172-259 (cyan). These chromosomes are linked. Note the significant decrease in the length of the subchain of chromosome n.
To identify max|GLN| we conduct the following procedure:
determine the max|GLN| value between the whole chain 1 and all subchains of chain 2 (meaning all possible subchains),
determine the max|GLN| value between the whole chain 2 and all subchains of chain 1,
present the data as max|GLN| with the corresponding ranges of subchains from chain 1 and chain 2.
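The search over all subchains can be sketched with prefix sums: once the contribution of every segment pair is known, the GLN between a whole chain and any contiguous fragment of the other chain is a difference of two cumulative sums. The code below is a minimal illustration under stated assumptions: it uses a midpoint discretization, the names `gln_matrix`, `scan_fragments` and `max_abs_gln` are hypothetical, and the toy chain 2 winds once around chain 1 and then unwinds.

```python
import numpy as np


def gln_matrix(chain1, chain2):
    """Contribution of every segment pair to the midpoint Gauss sum."""
    a = np.asarray(chain1, dtype=float)
    b = np.asarray(chain2, dtype=float)
    d1, d2 = np.diff(a, axis=0), np.diff(b, axis=0)
    r = ((a[1:] + a[:-1]) / 2)[:, None, :] - ((b[1:] + b[:-1]) / 2)[None, :, :]
    cross = np.cross(d1[:, None, :], d2[None, :, :])
    dist3 = np.linalg.norm(r, axis=2) ** 3
    return np.einsum("ijk,ijk->ij", r, cross) / (4.0 * np.pi * dist3)


def scan_fragments(per_segment):
    """Scan all contiguous fragments given per-segment contributions."""
    prefix = np.concatenate(([0.0], np.cumsum(per_segment)))
    values = prefix[None, :] - prefix[:, None]      # values[j, k]: beads j..k
    values[np.tril_indices_from(values)] = np.nan   # keep only j < k
    lo = np.unravel_index(np.nanargmin(values), values.shape)
    hi = np.unravel_index(np.nanargmax(values), values.shape)
    min_gln, max_gln = float(values[lo]), float(values[hi])
    best = hi if max_gln >= -min_gln else lo
    return {
        "minGLN": min_gln,
        "maxGLN": max_gln,
        "max|GLN|": max(max_gln, -min_gln),
        "beads": (int(best[0]), int(best[1])),      # 0-based bead range
    }


def max_abs_gln(chain1, chain2):
    m = gln_matrix(chain1, chain2)
    return {
        "whGLN": float(m.sum()),
        # whole chain 1 against all fragments of chain 2
        "chain2_fragments": scan_fragments(m.sum(axis=0)),
        # whole chain 2 against all fragments of chain 1
        "chain1_fragments": scan_fragments(m.sum(axis=1)),
    }


# Toy data: chain 2 goes once around ring 1 and then returns along a
# slightly shifted path, so the whole-chain value nearly cancels.
t = np.linspace(0.0, 2.0 * np.pi, 201)
ring1 = np.column_stack([np.cos(t), np.sin(t), np.zeros_like(t)])
ring2 = np.column_stack([1.0 + np.cos(t), np.zeros_like(t), np.sin(t)])
chain2 = np.vstack([ring2, ring2[::-1] + np.array([0.0, 0.1, 0.0])])

result = max_abs_gln(ring1, chain2)
print(result["whGLN"])
print(result["chain2_fragments"])
```

In this toy case whGLN is close to 0 although a fragment of chain 2 winds around chain 1 about once, which is the kind of situation where max|GLN| carries more information than whGLN.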
To identify maxshort|GLN|, i.e. the shortest fragments of chromosomes which wind around each other, we conduct the following procedure:
determine the fragment of chain 2 with max|GLN|, based on the whole chain 1 and all subchains of chain 2,
determine the fragment of chain 1 with max|GLN| by searching all subchains of chain 1 while treating the fragment of chain 2 found in step 1 as the full chain 2,
determine the fragment of chain 2 with max|GLN| by searching all subchains of the fragment of chain 2 found in step 1, while treating the fragment of chain 1 found in step 2 as the full chain 1,
repeat the above steps until the smallest fragments of both chains are found and max|GLN| does not change.
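The alternating refinement can be sketched as a loop that narrows the fragment of one chain while the fragment of the other is held fixed, and stops when neither range changes. This is a literal reading of the steps above under a midpoint discretization, with hypothetical names (`gln_matrix`, `best_fragment`, `maxshort_gln`) and toy data. Read literally, each refinement can only keep or increase |GLN|, so the actual procedure may apply further criteria (for instance a preference for shorter fragments) that this sketch does not model.

```python
import numpy as np


def gln_matrix(chain1, chain2):
    """Contribution of every segment pair to the midpoint Gauss sum."""
    a = np.asarray(chain1, dtype=float)
    b = np.asarray(chain2, dtype=float)
    d1, d2 = np.diff(a, axis=0), np.diff(b, axis=0)
    r = ((a[1:] + a[:-1]) / 2)[:, None, :] - ((b[1:] + b[:-1]) / 2)[None, :, :]
    cross = np.cross(d1[:, None, :], d2[None, :, :])
    dist3 = np.linalg.norm(r, axis=2) ** 3
    return np.einsum("ijk,ijk->ij", r, cross) / (4.0 * np.pi * dist3)


def best_fragment(per_segment, offset=0):
    """Segment range [start, stop) of the fragment with the largest |sum|."""
    prefix = np.concatenate(([0.0], np.cumsum(per_segment)))
    values = np.abs(prefix[None, :] - prefix[:, None])
    values[np.tril_indices_from(values)] = -1.0     # exclude empty ranges
    j, k = np.unravel_index(np.argmax(values), values.shape)
    return (int(j) + offset, int(k) + offset)


def maxshort_gln(chain1, chain2, max_rounds=50):
    m = gln_matrix(chain1, chain2)
    r1 = (0, m.shape[0])    # current segment range in chain 1
    r2 = (0, m.shape[1])    # current segment range in chain 2
    for _ in range(max_rounds):
        # refine chain 2 against the current fragment of chain 1
        new2 = best_fragment(m[r1[0]:r1[1], r2[0]:r2[1]].sum(axis=0), r2[0])
        # refine chain 1 against the refined fragment of chain 2
        new1 = best_fragment(m[r1[0]:r1[1], new2[0]:new2[1]].sum(axis=1), r1[0])
        if (new1, new2) == (r1, r2):
            break
        r1, r2 = new1, new2
    value = float(m[r1[0]:r1[1], r2[0]:r2[1]].sum())
    # segments [s, e) correspond to beads s..e (0-based, inclusive)
    return value, r1, r2


# Toy data: chain 2 winds once around ring 1 and then unwinds.
t = np.linspace(0.0, 2.0 * np.pi, 201)
ring1 = np.column_stack([np.cos(t), np.sin(t), np.zeros_like(t)])
ring2 = np.column_stack([1.0 + np.cos(t), np.zeros_like(t), np.sin(t)])
chain2 = np.vstack([ring2, ring2[::-1] + np.array([0.0, 0.1, 0.0])])

value, beads1, beads2 = maxshort_gln(ring1, chain2)
print(value, beads1, beads2)
```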
Fig. 2 Sketch of a method to determine max|GLN| between chain 1 and all subchains of chain 2.
Application - examples
The GLN can help to distinguish between linked and unlinked pairs of chromosomes, especially when the determined polynomial is classified as Other. Other has two possible meanings: either the pair of chromosomes is linked in a very complex manner, or the chromosomes are unlinked but each of them possesses a complex knot. Fig. 3 presents an example of linked chromosomes, the pair e and n (Cell 1, model 1, [2]). In this case whGLN = 2.88 and max|GLN| = 3.38, strongly suggesting that they are linked. The highest value of max|GLN| is found for the subchains 35-946 of chain e and 25-895 of chain n. These fragments are rather broad. A more precise location of the winding is provided to users by the additional local search over the winding fragments (as described above). The first round of this method gives max|GLN| = 3.31 for chain e: 35-333 and chain n: 25-370. The values of min and max GLN are presented as matrices in Fig. 4. The method saturates at max|GLN| = 2.76 for chain e: 53-320 and chain n: 170-200 (this fragment consists of only 30 beads). With adequate coloring of the subchains it is now possible to see, even with the naked eye, that these chromosomes are linked.
Fig. 3 GLN can be used to identify the shortest fragments of chromosomes which wind around each other. Left panel: these fragments in chromosomes n and e are indicated in red and blue. Right panel: mechanical smoothing of the same chromosomes.
Fig. 4 The GLN matrices presenting min and max GLN between chromosomes n and e. Left panel: values between the whole chromosome n and all subchains of chromosome e. Right panel: the opposite situation.
[1] A Gierut, W Niemyska, P Dabrowski-Tumanski, P Sułkowski, JI Sulkowska, PyLasso: a PyMOL plugin to identify lassos, Bioinformatics (2017), btx493
[2] Stevens, T.J.; Lando, D.; Basu, S.; Atkinson, L.P.; Cao, Y.; Lee, S.F.; Leeb, M.; Wohlfahrt, K.J.; Boucher, W.; O'Shaughnessy-Kirwan, A.; et al. 3D structures of individual mammalian genomes studied by single-cell Hi-C. Nature 2017, 544, 59-64.