(A) Photographs of the mammoth sample used in this study (specimen ID IN18-032) showing intact hair and ear. A close-up (left) shows one of the two skin fragments that have been collected for research purposes. The edge of the skin, from which the two fragments were cut side-by-side, is shown in the lower-right.
(B) Light microscopy, Van Gieson/Giemsa and DAPI staining of the mammoth sample, and successive zoom-ins of the regions shown in Figure 1C. Top row: Derma (Van Gieson/Giemsa staining): general view (left), successive zoom-ins (center and right panel). Middle row: Hair follicle (Giemsa staining): general view (left), and zoom-in on the hair shaft and outer root sheath cells (center, right). Bottom row: Skeletal muscle (Van Gieson/Giemsa staining and DAPI): general view (left), zoom-in (center), and DAPI staining of nuclei-like structures (laser scanning confocal microscopy). More images are available at https://doi.org/10.5281/zenodo.11193545.
(C) Contaminants in the mammoth sample: fungi (top, Van Gieson/Eosin), unidentified (possibly parasites) (middle, Van Gieson/Giemsa), bacteria (bottom, Giemsa). More images are available at https://doi.org/10.5281/zenodo.11193545.
(D) Top: Starting with the Asian elephant draft genome assembly ASM1433276v1 from Tollis et al.,29 we use in situ Hi-C data (left) to error-correct, anchor, order and orient the draft sequences to produce a chromosome-length de novo assembly for the Asian elephant, ASM1433276v1_HiC (right). Rainbow tracks on top and to the left of the contact maps are used to highlight corresponding loci between the two assemblies: the same color is used to show matching sequences. The draft sequences on the left are ordered by size, from largest to smallest. The chromosome-length sequences on the right are ordered according to the published Asian elephant karyotype.76 The ovals shaping the rainbow track on the right outline the boundaries of the 28 chromosomes in ASM1433276v1_HiC. The dashed oval after the 28th chromosome-length scaffold represents unanchored sequences. The dashed oval around the draft assembly rainbow track highlights that all sequences in the draft are unanchored. Bottom: Starting with the African elephant draft genome assembly Loxafr3.0 from Palkopoulou et al.,28 we use in situ Hi-C data (left) from Álvarez-González et al.27 to error-correct, anchor, order and orient the draft sequences to produce a chromosome-length de novo assembly for the African elephant, Loxafr3.0_HiC, on the right. Rainbow tracks on top and to the left of the contact maps are used to highlight corresponding loci between the two assemblies: the same color is used to show matching sequences. The draft sequences on the left are ordered by size. The chromosome-length sequences on the right are ordered according to the published African elephant karyotype.76 The ovals shaping the rainbow track on the right outline the boundaries of the 28 chromosomes in Loxafr3.0_HiC. The dashed oval after the 28 chromosome-length scaffolds represents unanchored sequences. The dashed oval around the draft assembly rainbow track highlights that all sequences in the draft are unanchored. 
An interactive version of the contact maps shown in this figure can be found at https://t.3dg.io/3d-mammoth-Fig-S1D.
(E) Bottom: As is often the case with ancient DNA (aDNA) sequencing, many of the read pairs in PaleoHi-C align to a variety of contaminants. The stacked bar chart shows the taxonomic composition of 22 BGI libraries spanning six biological tissue replicates, labeled alphabetically (see Table S1, #3). The replicate ID is listed along the x axis. The y axis shows the number of reads in each library assigned to one of the NCBI Foreign Contamination Screen’s Genome Cross-species aligner (FCS-GX) taxonomic divisions shown below the bar chart. These include seven top-scoring known taxonomic divisions across all datasets (b-proteobacteria, g-proteobacteria, CFB group bacteria, insects, mammals, high GC Gram+, and plants) as well as unidentified sequences (“n/a”). All other identified taxonomic categories are bundled together and labeled as “other”. Endogenous mammoth sequences are expected to fall into the “mammals” taxonomic division. The libraries are bundled into four groups: EtOH-p for “EtOH precipitate,” the main PaleoHi-C dataset; EtOH-s for “EtOH supernatant,” a supplementary dataset prepared from the tissue lysis supernatant leftovers; no-EtOH-p for “No EtOH precipitate,” a supplementary dataset generated from the sample handled without ethanol; and no-EtOH-s for “No EtOH supernatant,” a supplementary dataset generated from the supernatant collected during no-ethanol tissue lysis. The total number of reads analyzed from each library is 20 million (10 million read pairs). Top: Immediate on-site preservation of samples in ethanol is compatible with PaleoHi-C and, along with mechanical removal of exterior contaminants, can contribute to curbing microbial activity and improve isolation of endogenous mammoth Hi-C sequences. Although with lower efficiency, endogenous contact data can be collected from non-ethanol samples as well. 
The percent of total alignable read pairs (calculated with respect to the total raw read pair count) is listed as reported by the Juicer pipeline.23,77 Alignment statistics are shown for the same 22 BGI libraries as are analyzed in the bottom panel.
(F) Top: Barplot showing the percentage of reads that display damage related to cytosine deamination in the woolly mammoth sample and the modern elephant samples, as determined by PMDtools.30 As expected, the mammoth-derived PaleoHi-C reads have an elevated degradation signature, on par with the non-USER (no-U) treated aDNA-Seq data. The damage is higher than in any of the modern elephant datasets. Note the elevated damage signature in the skin necropsy for the Asian elephant as compared to the fibroblast-derived African elephant dataset. The relative increase in percent of damaged reads is consistent with the necropsy sample being stored under suboptimal conditions for ∼20 years. Bottom: Barplot showing the percentage of mammoth alleles in reads overlapping mammoth-specific fixed genetic variants from Díez-del-Molino et al.24 in PaleoHi-C data vs. data derived from modern elephants. Whenever such diagnostic reads are examined in the mammoth, they overwhelmingly show the mammoth allele, confirming that PaleoHi-C data derives from ancient mammoth DNA rather than from modern elephantid contaminants. Of course, the diagnostic reads from modern elephants overwhelmingly show the modern elephant allele. The results of both the read damage and allele analyses remain unchanged if only reads representing candidate long-range contacts are analyzed (Long) as opposed to all reads in the library (All) and are robust to the choice of sequencing platform (BGI or Illumina [IL]).
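The allele tally described in the bottom panel of (F) can be illustrated with a minimal sketch. It is not the pipeline used in the study: the input layout (a dictionary of diagnostic sites and a list of per-read base observations) and the function name `mammoth_allele_percent` are hypothetical, chosen only to show how a percentage of mammoth alleles at fixed diagnostic sites could be computed.

```python
import math


def mammoth_allele_percent(observations, diagnostic_sites):
    """Percentage of diagnostic base calls that carry the mammoth allele.

    observations: iterable of (site, base) pairs, one per read base that
        overlaps a candidate site (hypothetical input layout).
    diagnostic_sites: dict mapping site -> (mammoth_allele, elephant_allele).
    Bases matching neither allele (e.g. "N") and sites that are not
    diagnostic are ignored.
    """
    mammoth = 0
    elephant = 0
    for site, base in observations:
        alleles = diagnostic_sites.get(site)
        if alleles is None:
            continue  # read does not overlap a diagnostic site
        mammoth_allele, elephant_allele = alleles
        if base == mammoth_allele:
            mammoth += 1
        elif base == elephant_allele:
            elephant += 1
    informative = mammoth + elephant
    if informative == 0:
        return math.nan  # no informative base calls
    return 100.0 * mammoth / informative
```

Under these assumptions, the same function would be applied to each library (and, separately, to the subset of reads representing candidate long-range contacts) to obtain one bar per dataset.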
(G) Supplementary details on the probability of observing a read pair in which the two reads map to positions in the genome separated by a given distance in base pairs. Top: A small fraction of PaleoHi-C data reflects contacts between loci that lie far away in 1D. No such fraction exists in aDNA-Seq datasets. The image is analogous to that shown in Figure 1E, with the following modifications: 1) the PaleoHi-C curve is based on data collated from both the BGI and Illumina sequencing platforms; 2) two aDNA-Seq datasets are included: the USER-treated (U) datasets (from across 32 replicates) from Díez-del-Molino et al.24 and the non-USER (no-U) treated dataset generated for the same sample as part of this study. To be maximally conservative, the dataset with the heaviest tail from Díez-del-Molino et al.24 was chosen to be plotted alongside PaleoHi-C data in Figure 1E. Middle: Contact probability curves for PaleoHi-C of woolly mammoth skin and in situ Hi-C of Asian elephant skin. The slopes of the two curves are highly consistent across a wide range of distances. The curve is based on a combined dataset from both the BGI and Illumina platforms. Bottom: Contact probability curves plotted separately for the four library groups: EtOH-p (EtOH precipitate), EtOH-s (EtOH supernatant), no-EtOH-p (no EtOH precipitate) and no-EtOH-s (no EtOH supernatant).
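As an illustration of the quantity plotted in (G), the following is a minimal sketch of how a contact probability curve could be estimated from mapped read pairs. It assumes that both reads of each pair map to the same chromosome and that their coordinates are available as two arrays; the function name `contact_probability`, the logarithmic binning, and the default distance range are assumptions made for this example and are not taken from the study.

```python
import numpy as np


def contact_probability(pos1, pos2, n_bins=20, min_dist=10.0, max_dist=1e8):
    """Estimate the probability density of read-pair separations.

    pos1, pos2: mapped coordinates (bp) of the two reads in each pair,
        assumed to lie on the same chromosome.
    Returns (edges, density): logarithmically spaced bin edges and the
    fraction of in-range pairs per base pair of separation in each bin.
    """
    dist = np.abs(np.asarray(pos2, dtype=float) - np.asarray(pos1, dtype=float))
    # Log-spaced bins, since separations span many orders of magnitude.
    edges = np.logspace(np.log10(min_dist), np.log10(max_dist), n_bins + 1)
    counts, _ = np.histogram(dist, bins=edges)  # out-of-range pairs are dropped
    total = counts.sum()
    widths = np.diff(edges)
    if total == 0:
        return edges, np.zeros(n_bins)
    # Divide by bin width so that bins of different sizes are comparable.
    density = counts / (total * widths)
    return edges, density
```

Plotting `density` against the geometric bin centers on log-log axes gives a curve of the kind shown here; a heavier tail at large separations corresponds to a larger share of long-range contacts.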