Description
This track shows multiple alignments of 90 human genomes generated by the Minigraph-Cactus
pangenome pipeline, which creates pangenomes directly from whole-genome alignments. This method
builds graphs containing all forms of genetic variation while allowing use of current mapping and
genotyping tools.
Display Conventions and Configuration
In full and pack display modes, conservation scores are displayed as a
wiggle track (histogram) in which the height reflects the
size of the score.
The conservation wiggles can be configured in a variety of ways to
highlight different aspects of the displayed information.
Click the Graph configuration help link for an explanation
of the configuration options.
Pairwise alignments of each species to the human genome are
displayed below the conservation histogram as a grayscale density plot (in
pack mode) or as a wiggle (in full mode) that indicates alignment quality.
In dense display mode, conservation is shown in grayscale using
darker values to indicate higher levels of overall conservation
as scored by phastCons.
Checkboxes on the track configuration page allow selection of the
species to include in the pairwise display.
Note that excluding species from the pairwise display does not alter the
conservation score display.
To view detailed information about the alignments at a specific
position, zoom the display in to 30,000 or fewer bases, then click on
the alignment.
Gap Annotation
The Display chains between alignments configuration option
enables display of gaps between alignment blocks in the pairwise alignments in
a manner similar to the Chain track display. The following
conventions are used:
- Single line: No bases in the aligned species. Possibly due to a
lineage-specific insertion between the aligned blocks in the human genome
or a lineage-specific deletion between the aligned blocks in the aligning
species.
- Double line: Aligning species has one or more unalignable bases in
the gap region. Possibly due to excessive evolutionary distance between
species or independent indels in the region between the aligned blocks in both
species.
- Pale yellow coloring: Aligning species has Ns in the gap region.
Reflects uncertainty in the relationship between the DNA of both species, due
to lack of sequence in relevant portions of the aligning species.
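The three gap conventions above can be summarized as a small decision rule. The following is only an illustrative sketch, not the browser's implementation: the function name `gap_style`, its two parameters, and the choice to let Ns take precedence over other unaligned bases are all assumptions made for this example.

```python
def gap_style(unaligned_bases: int, n_bases: int) -> str:
    """Pick a drawing style for the gap between two alignment blocks.

    unaligned_bases: bases in the aligning sequence that fall in the gap
                     (hypothetical input, counted including any Ns).
    n_bases:         how many of those bases are N (hypothetical input).
    """
    if n_bases > 0:
        # Assumption: Ns are checked first, since they signal missing sequence.
        return "pale yellow"
    if unaligned_bases > 0:
        # One or more unalignable bases in the gap region.
        return "double line"
    # No bases at all in the aligning sequence.
    return "single line"


if __name__ == "__main__":
    for bases, ns in [(0, 0), (25, 0), (25, 10)]:
        print(bases, ns, gap_style(bases, ns))
```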
Genomic Breaks
Discontinuities in the genomic context (chromosome, scaffold or region) of the
aligned DNA in the aligning species are shown as follows:
- Vertical blue bar: Represents a discontinuity that persists indefinitely
on either side, e.g. a large region of DNA on either side of the bar
comes from a different chromosome in the aligned species due to a large scale
rearrangement.
- Green square brackets: Enclose shorter alignments consisting of DNA from
one genomic context in the aligned species nested inside a larger chain of
alignments from a different genomic context. The alignment within the
brackets may represent a short misalignment, a lineage-specific insertion of a
transposon in the human genome that aligns to a paralogous copy somewhere
else in the aligned species, or other similar occurrence.
Base Level
When zoomed in to the base-level display, the track shows the base
composition of each alignment. The numbers and symbols on the Gaps
line indicate the lengths of gaps in the human sequence at those
alignment positions relative to the longest non-human sequence.
If there is sufficient space in the display, the size of the gap is shown.
If the space is insufficient and the gap size is a multiple of 3, a
"*" is displayed; other gap sizes are indicated by "+".
Methods
The MAF was generated with cactus v2.6.4 from the HPRC v1.0 minigraph-cactus
HAL file, after halRenameGenomes had been used to replace all "." characters
in sample names with "#", as follows:
cactus-hal2maf ./js ./hprc-v1.0-mc-grch38.hal hprc-v1.0-mc-grch38.maf.gz \
    --noAncestors --refGenome GRCh38 --filterGapCausingDupes \
    --chunkSize 100000 --batchCores 96 --batchCount 10 --noAncestors \
    --batchParallelTaf 32 --batchSystem slurm \
    --logFile hprc-v1.0-mc-grch38.maf.gz.log
zcat hprc-v1.0-mc-grch38.maf.gz | mafDuplicateFilter -m - -k | bgzip > \
    hprc-v1.0-mc-grch38-single-copy.maf.gz
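The sample renaming described at the start of this section amounts to a simple character substitution. The sketch below shows one way a rename table could be prepared. It assumes that halRenameGenomes reads a tab-separated file of old and new genome names, which should be checked against the tool's own help text. The helper names and the example sample names are illustrative only.

```python
def renamed(sample: str) -> str:
    """Replace every "." in a sample name with "#"."""
    return sample.replace(".", "#")


def rename_table(names: list[str]) -> str:
    """Build tab-separated "old<TAB>new" lines for names that change.

    Assumption: this two-column layout is what the renaming tool expects.
    """
    lines = []
    for old in names:
        new = renamed(old)
        if new != old:  # names without "." need no entry
            lines.append(f"{old}\t{new}\n")
    return "".join(lines)


if __name__ == "__main__":
    # Illustrative names; the real list would come from the HAL file.
    print(rename_table(["HG00438.1", "HG00438.2", "GRCh38"]), end="")
```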
Credits
Thank you to Glenn Hickey for providing the HAL file from the HPRC project.
References
Liao WW, Asri M, Ebler J, Doerr D, Haukness M, Hickey G, Lu S, Lucas JK, Monlong J, Abel HJ et al.
A draft human pangenome reference.
Nature. 2023 May;617(7960):312-324.
DOI: 10.1038/s41586-023-05896-x; PMID: 37165242; PMC: PMC10172123
Hickey G, Monlong J, Ebler J, Novak AM, Eizenga JM, Gao Y, Human Pangenome Reference Consortium,
Marschall T, Li H, Paten B.
Pangenome graph construction from genome alignments with Minigraph-Cactus.
Nat Biotechnol. 2023 May 10.
DOI: 10.1038/s41587-023-01793-w; PMID: 37165083; PMC: PMC10638906
Armstrong J, Hickey G, Diekhans M, Fiddes IT, Novak AM, Deran A, Fang Q, Xie D, Feng S, Stiller J et al.
Progressive Cactus is a multiple-genome aligner for the thousand-genome era.
Nature. 2020 Nov;587(7833):246-251.
DOI: 10.1038/s41586-020-2871-y; PMID: 33177663; PMC: PMC7673649
Paten B, Earl D, Nguyen N, Diekhans M, Zerbino D, Haussler D.
Cactus: Algorithms for genome multiple sequence alignment.
Genome Res. 2011 Sep;21(9):1512-28.
DOI: 10.1101/gr.123356.111; PMID: 21665927; PMC: PMC3166836