The Gaussian Linking Number (GLN) is another measure of entanglement between a pair of chromosomes. It indicates how many times (and in which direction) one chromosome winds around the other one. There are two main reasons to evaluate GLN:

- The GLN makes it possible to distinguish between linked and unlinked pairs of chromosomes when their linking invariant is classified as Other (Other denotes a very complex polynomial, with more than 10 crossings). For instance, a GLN value close to 1 indicates that chromosome one winds around chromosome two roughly once.
- The GLN provides more information about the entanglement between chromosomes. In particular, it makes it possible to identify the specific fragment of one chromosome that winds around the other.

The linking number between two closed curves in three dimensions, γ1 and γ2, is defined by the Gauss double integral (Eq. 1).
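For reference, the Gauss double integral is commonly written as below. The symbols $\vec{r}^{(1)}$ and $\vec{r}^{(2)}$ for points on the two curves are notation assumed here, and the overall sign convention varies between sources.

$$
\mathrm{GLN}(\gamma_1,\gamma_2) = \frac{1}{4\pi}\oint_{\gamma_1}\oint_{\gamma_2}
\frac{\vec{r}^{(1)}-\vec{r}^{(2)}}{\left|\vec{r}^{(1)}-\vec{r}^{(2)}\right|^{3}}
\cdot\left(d\vec{r}^{(1)}\times d\vec{r}^{(2)}\right)
\tag{1}
$$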

In the case of chromosomes, the chains become collections of points, i.e. positions of beads (base pairs), and the integrals may be replaced by sums over segments. Gauss proved that for closed curves this integral is always an integer, that it is invariant under isotopies, and that it indicates how many times one curve winds around the other. Chromosomes are open chains, so we relax the requirement of an integer linking indicator and evaluate the discrete Gauss double integral over open chains, where it generally takes non-integer values. Thus the discrete Gauss integral over the open chains takes the form of a double sum over all pairs of segments, one from each chain.
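The double sum over segments can be sketched in a few lines of Python. This is a minimal illustration that assumes a midpoint approximation (each segment is represented by its midpoint and its direction vector). It is not necessarily the discretization used by the server, and the names `gln_matrix`, `gln` and `circle` are hypothetical.

```python
import math

def _segments(chain):
    """Return (midpoint, direction vector) for each consecutive pair of beads."""
    segs = []
    for p, q in zip(chain, chain[1:]):
        mid = tuple((a + b) / 2.0 for a, b in zip(p, q))
        vec = tuple(b - a for a, b in zip(p, q))
        segs.append((mid, vec))
    return segs

def gln_matrix(chain1, chain2):
    """Contribution of every segment pair to the discrete Gauss integral.

    Assumption: midpoint approximation of the integrand on each segment pair.
    """
    matrix = []
    for m1, d1 in _segments(chain1):
        row = []
        for m2, d2 in _segments(chain2):
            r = (m1[0] - m2[0], m1[1] - m2[1], m1[2] - m2[2])
            dist3 = (r[0] ** 2 + r[1] ** 2 + r[2] ** 2) ** 1.5
            cross = (d1[1] * d2[2] - d1[2] * d2[1],
                     d1[2] * d2[0] - d1[0] * d2[2],
                     d1[0] * d2[1] - d1[1] * d2[0])
            dot = r[0] * cross[0] + r[1] * cross[1] + r[2] * cross[2]
            row.append(dot / (4.0 * math.pi * dist3))
        matrix.append(row)
    return matrix

def gln(chain1, chain2):
    """GLN between two whole chains: the sum of all segment-pair contributions."""
    return sum(sum(row) for row in gln_matrix(chain1, chain2))

def circle(center, axis_u, axis_v, n=200):
    """Polygon approximating a unit circle; the last bead repeats the first."""
    pts = []
    for k in range(n + 1):
        t = 2.0 * math.pi * k / n
        pts.append(tuple(c + math.cos(t) * u + math.sin(t) * v
                         for c, u, v in zip(center, axis_u, axis_v)))
    return pts

# Toy check on closed curves, where the result should be close to an integer.
ring_a = circle((0, 0, 0), (1, 0, 0), (0, 1, 0))
ring_b = circle((1, 0, 0), (1, 0, 0), (0, 0, 1))  # threads through ring_a
ring_c = circle((5, 0, 0), (1, 0, 0), (0, 0, 1))  # far from ring_a
print(gln(ring_a, ring_b), gln(ring_a, ring_c))
```

For the two interlocked rings the magnitude should come out close to 1, and for the separated rings close to 0. The same functions accept open chains, in which case the result is generally not an integer.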

Using methods that we introduced for proteins in [1], we analyze the following four quantities for each pair of chromosomes:

- **whGLN:** describes the entanglement between the whole chromosome 1 and the whole chromosome 2,
- **minGLN, maxGLN:** denote the minimum and maximum values, respectively, of the GLN between one whole chromosome and any fragment of the other (chromosome 1 against any fragment of chromosome 2, and chromosome 2 against any fragment of chromosome 1),
- **max|GLN|** = max{maxGLN, - minGLN},
- **maxshort|GLN|:** describes the shortest fragments of the chromosomes which wind around each other.

We distinguish the directions of winding with respect to the natural direction of the chromosome chains, from 5' to 3'. Thus a high maxGLN or a low minGLN indicates that the corresponding part of one chromosome winds significantly around the other chromosome in a "positive" or "negative" direction, respectively. A high max|GLN| combines those two cases and thus indicates significant winding in either direction. Note that if two chromosomes are linked, the subchains identified via the maxshort|GLN| method are significantly shorter than the subchains determined via the max|GLN| method (see e.g. Fig. 1). When chromosomes are unlinked, the subchains identified via the maxshort|GLN| method remain rather long in comparison to those identified via the max|GLN| method (see e.g. Fig. 1). Comparing the subchains determined by max|GLN| and maxshort|GLN| therefore provides an additional descriptor for distinguishing between linked and unlinked chromosomes.

The max|GLN| method proceeds as follows:

- determine the max|GLN| value between the whole chain 1 and all subchains (i.e. all possible contiguous fragments) of chain 2,
- determine the max|GLN| value between the whole chain 2 and all subchains of chain 1,
- present the data as max|GLN| together with the corresponding ranges of the subchains from chain 1 and chain 2.
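The steps above can be sketched in Python as follows. The sketch assumes a precomputed matrix of segment-pair contributions (rows for segments of chain 1, columns for segments of chain 2) and uses a brute-force search with prefix sums. The function names, the 0-based inclusive segment ranges, and the report layout are hypothetical and not taken from the server.

```python
def extreme_fragments(contrib):
    """Find the contiguous fragments with the lowest and highest summed value.

    contrib[j] is the contribution of segment j of one chain against the
    whole other chain. Returns ((min_value, range), (max_value, range)),
    where each range is a pair of inclusive 0-based segment indices.
    """
    prefix = [0.0]
    for c in contrib:
        prefix.append(prefix[-1] + c)
    best_min = best_max = None
    for a in range(len(contrib)):
        for b in range(a + 1, len(contrib) + 1):
            value = prefix[b] - prefix[a]  # sum of contrib[a..b-1]
            if best_max is None or value > best_max[0]:
                best_max = (value, (a, b - 1))
            if best_min is None or value < best_min[0]:
                best_min = (value, (a, b - 1))
    return best_min, best_max

def max_abs_gln(matrix):
    """Report max|GLN| = max{maxGLN, -minGLN} and its fragment for each chain."""
    row_sums = [sum(row) for row in matrix]  # chain 1 segments vs whole chain 2
    col_sums = [sum(row[j] for row in matrix) for j in range(len(matrix[0]))]
    report = {}
    for name, contrib in (("chain1", row_sums), ("chain2", col_sums)):
        (lo, lo_range), (hi, hi_range) = extreme_fragments(contrib)
        report[name] = (hi, hi_range) if hi >= -lo else (-lo, lo_range)
    return report

# Toy input: chain 1 has a single segment, chain 2 has five segments.
toy = [[0.5, -0.2, 0.9, -1.5, 0.3]]
print(max_abs_gln(toy))
```

The double loop is quadratic in the number of segments, which keeps the sketch short. The same extremes can be found in linear time with a maximum-subarray scan if needed.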

The maxshort|GLN| method refines these fragments iteratively:

1. determine the fragment of chain 2 with max|GLN|, based on the whole chain 1 and all subchains of chain 2,
2. determine the fragment of chain 1 with max|GLN| by searching all subchains of chain 1 while treating the fragment of chain 2 found in step 1 as the full chain 2,
3. determine the fragment of chain 2 with max|GLN| by searching all subchains of the fragment of chain 2 found in step 1, while treating the fragment of chain 1 found in step 2 as the full chain 1,
4. repeat the above steps until the smallest fragments of both chains are found and max|GLN| no longer changes.
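A literal reading of this alternating search can be sketched as follows. It again assumes a precomputed matrix of segment-pair contributions (rows for chain 1, columns for chain 2). It searches only inside the current fragments and breaks ties in favour of the shorter fragment. These choices, and the names `best_fragment` and `maxshort_fragments`, are assumptions. The sketch does not attempt to reproduce any further criterion the server may apply to prefer shorter fragments.

```python
def best_fragment(contrib, start, end):
    """Fragment (a, b) inside [start, end] that maximizes |sum(contrib[a..b])|.

    Indices are inclusive and 0-based; on equal values the shorter one wins.
    """
    prefix = [0.0]
    for c in contrib:
        prefix.append(prefix[-1] + c)
    best_key, best_range = None, None
    for a in range(start, end + 1):
        for b in range(a, end + 1):
            key = (abs(prefix[b + 1] - prefix[a]), -(b - a))
            if best_key is None or key > best_key:
                best_key, best_range = key, (a, b)
    return best_range

def maxshort_fragments(matrix):
    """Alternate between the chains until neither fragment changes."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    rows, cols = (0, n_rows - 1), (0, n_cols - 1)
    while True:
        # Chain 2 segments against the current fragment of chain 1.
        col_contrib = [sum(matrix[i][j] for i in range(rows[0], rows[1] + 1))
                       for j in range(n_cols)]
        new_cols = best_fragment(col_contrib, cols[0], cols[1])
        # Chain 1 segments against the fragment of chain 2 just found.
        row_contrib = [sum(matrix[i][j] for j in range(new_cols[0], new_cols[1] + 1))
                       for i in range(n_rows)]
        new_rows = best_fragment(row_contrib, rows[0], rows[1])
        if new_rows == rows and new_cols == cols:
            break
        rows, cols = new_rows, new_cols
    value = sum(matrix[i][j]
                for i in range(rows[0], rows[1] + 1)
                for j in range(cols[0], cols[1] + 1))
    return rows, cols, value

# Toy matrix with a strongly winding 2 x 2 block in the middle.
toy = [[-0.1, -0.1,  0.0, -0.1],
       [ 0.0,  0.5,  0.4, -0.2],
       [ 0.0,  0.6,  0.5,  0.0],
       [-0.3,  0.0, -0.1,  0.0]]
print(maxshort_fragments(toy))
```

Because each search is restricted to the fragment found in the previous round, the ranges can only shrink, so the loop always terminates.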

The GLN can help to distinguish between linked and unlinked pairs of chromosomes, especially when the determined polynomial is classified as Other. Other has two possible meanings: either the pair of chromosomes is linked in a very complex manner, or the chromosomes are unlinked but each of them possesses a complex knot. Figure 3 presents an example of linked chromosomes: the pair **e** and **n** (Cell 1, model 1, [2]). In this case whGLN=2.88 and max|GLN|=3.38, strongly suggesting that they are linked. The highest value of max|GLN| is found for the following subchains: **chain e:** 35-946, **chain n:** 25-895. These fragments are rather broad. A more precise location of the winding is provided to users by the additional local search (as described above) over the winding fragments. The first round of this search gives max|GLN|=3.31 for **chain e:** 35-333 and **chain n:** 25-370. The values of minGLN and maxGLN are presented as matrices in Fig. 4. The method saturates at max|GLN|=2.76 for **chain e:** 53-320 and **chain n:** 170-200 (this fragment consists of only 30 beads). With adequate coloring of the subchains it is now possible to see, even with the naked eye, that these chromosomes are linked.

**Fig. 3** GLN can be used to identify the shortest fragments of chromosomes which wind around each other. These fragments in chromosomes **n** and **e** are indicated in red and blue (left panel). Mechanical smoothing of the same chromosomes (right panel).

[1] Gierut, A.; Niemyska, W.; Dabrowski-Tumanski, P.; Sułkowski, P.; Sulkowska, J.I. PyLasso – a PyMOL plugin to identify lassos. Bioinformatics 2017, btx493.

[2] Stevens, T.J.; Lando, D.; Basu, S.; Atkinson, L.P.; Cao, Y.; Lee, S.F.; Leeb, M.; Wohlfahrt, K.J.; Boucher, W.; O'Shaughnessy-Kirwan, A.; et al. 3D structures of individual mammalian genomes studied by single-cell Hi-C. Nature 2017, 544, 59–64.

KnotGenom | Interdisciplinary Laboratory of Biological Systems Modelling