ENCODE Genome Segmentations Track Settings
 
Genome Segmentations based on ENCODE data

Tier  Cell Line  Method     Track Name
1     GM12878    Combined   GM12878 Combined Genome Segmentation
1     GM12878    ChromHMM   GM12878 ChromHMM Genome Segmentation
1     GM12878    SegwayDBN  GM12878 Segway Genome Segmentation
1     H1-hESC    Combined   H1-hESC Combined Genome Segmentation
1     H1-hESC    ChromHMM   H1-hESC ChromHMM Genome Segmentation
1     H1-hESC    SegwayDBN  H1-hESC Segway Genome Segmentation
1     K562       Combined   K562 Combined Genome Segmentation
1     K562       ChromHMM   K562 ChromHMM Genome Segmentation
1     K562       SegwayDBN  K562 Segway Genome Segmentation
2     HUVEC      Combined   HUVEC Combined Genome Segmentation
2     HUVEC      ChromHMM   HUVEC ChromHMM Genome Segmentation
2     HUVEC      SegwayDBN  HUVEC Segway Genome Segmentation
2     HeLa-S3    Combined   HeLa-S3 Combined Genome Segmentation
2     HeLa-S3    ChromHMM   HeLa-S3 ChromHMM Genome Segmentation
2     HeLa-S3    SegwayDBN  HeLa-S3 Segway Genome Segmentation
2     HepG2      Combined   HepG2 Combined Genome Segmentation
2     HepG2      ChromHMM   HepG2 ChromHMM Genome Segmentation
2     HepG2      SegwayDBN  HepG2 Segway Genome Segmentation
Assembly: Human Feb. 2009 (GRCh37/hg19)


Note: ENCODE Project

Summary

This set of tracks represents multivariate genome-segmentation results based on ENCODE data (ENCODE Project Consortium, 2012). Using two different unsupervised machine learning techniques (ChromHMM and Segway), the genome was automatically partitioned into disjoint segments. Each segment belongs to one of a small number of genomic "states", each of which is assigned an intuitive label. Each state represents a particular combination and distribution of signals from ENCODE functional data tracks, such as histone modifications, open chromatin data, and specific transcription factor (TF) binding data. A consensus, unified segmentation was also generated by reconciling the results of the individual segmentations. The segmentations were performed on six ENCODE cell lines, and the specific descriptions for each segmentation are listed below.

ChromHMM Segmentations

Description

A common set of states across six human cell types (GM12878, H1-hESC, K562, HeLa-S3, HepG2, HUVEC) was learned by computationally integrating ChIP-seq data for eight chromatin marks, input, and the CTCF transcription factor, together with two DNase-seq assays and one FAIRE-seq assay, using a Hidden Markov Model (HMM). In total, twenty-five states were used to segment the genome; these states were then grouped and colored to highlight predicted functional elements. There are six ChromHMM tracks, one for each cell line.
The segmentations can be downloaded from http://ftp.ebi.ac.uk/pub/databases/ensembl/encode/integration_data_jan2011/byDataType/segmentations/jan2011/.
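Once a segmentation file has been downloaded, a quick sanity check is to total the number of bases assigned to each state and confirm that segments do not overlap. The sketch below assumes a tab-separated, BED-style layout with the state label in the fourth column, 0-based half-open coordinates, and segments sorted within each chromosome; these are assumptions, so check the actual files before relying on it. The labels in the example are the combined-segmentation codes, used only for illustration.

```python
from collections import defaultdict

def bp_per_state(bed_lines):
    """Total bases per state label from BED-style lines.

    Assumes tab-separated fields chrom, start, end, label (further columns
    ignored), 0-based half-open coordinates, and lines sorted by start within
    each chromosome. Raises ValueError if two segments overlap.
    """
    totals = defaultdict(int)
    last_end = {}
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment, and header lines
        chrom, start, end, label = line.rstrip("\n").split("\t")[:4]
        start, end = int(start), int(end)
        if start < last_end.get(chrom, 0):
            raise ValueError(f"overlapping segments on {chrom} at {start}")
        last_end[chrom] = end
        totals[label] += end - start
    return dict(totals)

lines = [
    "track name=example",
    "chr1\t0\t10000\tR",
    "chr1\t10000\t10400\tTSS",
    "chr1\t10400\t11200\tPF",
    "chr1\t11200\t20000\tR",
]
print(bp_per_state(lines))  # {'R': 18800, 'TSS': 400, 'PF': 800}
```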

Display Conventions and Configuration

The candidate annotations and associated segment colors are as follows:
  •  Bright Red  - Active Promoter
  •  Light Red  - Promoter Flanking
  •  Purple  - Inactive Promoter
  •  Orange  - Candidate Strong enhancer
  •  Yellow  - Candidate Weak enhancer
  •  Blue  - Distal CTCF/Candidate Insulator
  •  Dark Green  - Transcription associated
  •  Light Green  - Low activity proximal to active states
  •  Gray  - Polycomb repressed
  •  Light Gray  - Heterochromatin/Repetitive/Copy Number Variation

Methods

ChIP-seq data from the ENCODE Consortium was used to generate this track, and the ChromHMM program was used to perform the segmentation. For each of the six cell types, data for eight chromatin marks, input, and the CTCF transcription factor, together with two DNase-seq assays and one FAIRE-seq assay, was binarized separately at 200 base pair resolution based on a Poisson background model. The chromatin states were learned from this binarized data using a multivariate Hidden Markov Model (HMM) that explicitly models the combinatorial patterns of observed modifications (Ernst and Kellis, 2010). To learn a common set of states across the six cell types, the genomes of the cell types were first concatenated. Each 200 base pair interval in each cell type was then assigned to its most likely state under the model.
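As a rough illustration of the binarization step, the sketch below calls a mark present in a 200 bp bin when its read count is improbably high under a Poisson background. The single genome-wide background mean and the 1e-4 significance threshold are assumptions chosen for illustration; ChromHMM's own binarization handles background estimation in more detail.

```python
import math

def poisson_upper_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam), via 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0); assumes lam is modest (no underflow)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) computed from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)

def binarize(counts, background_mean, threshold=1e-4):
    """Mark each bin 1 if its read count is significantly above a Poisson
    background with the given mean, else 0.

    background_mean and threshold are illustrative assumptions.
    """
    return [1 if poisson_upper_tail(c, background_mean) < threshold else 0
            for c in counts]

# Read counts for one mark in consecutive 200 bp bins (made-up numbers).
counts = [0, 2, 5, 12, 30]
print(binarize(counts, background_mean=2.0))  # [0, 0, 0, 1, 1]
```

In the real pipeline one such 0/1 vector is produced per mark and cell type, and the vectors for all marks form the multivariate observations the HMM is trained on.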

References

Ernst J, Kellis M. Discovery and characterization of chromatin states for systematic annotation of the human genome. Nature Biotechnology 2010 Jul 25;28:817-825.


Segway Segmentations

Description

Sets of states for six human cell types (GM12878, H1-hESC, K562, HeLa-S3, HepG2, HUVEC) were learned by computationally integrating ChIP-seq data for eight chromatin marks, input, and the CTCF transcription factor, together with two DNase-seq assays and one FAIRE-seq assay, using a Dynamic Bayesian Network (DBN). In total, twenty-five states were used to segment the genome; these states were then grouped and colored to highlight predicted functional elements. There are six Segway tracks, one for each cell line.

Display Conventions and Configuration

The candidate annotations and associated segment colors are as follows:
  •  Bright Red  - Active Promoter
  •  Light Red  - Promoter Flanking
  •  Purple  - Inactive Promoter
  •  Orange  - Candidate Strong enhancer
  •  Yellow  - Candidate Weak enhancer
  •  Blue  - Distal CTCF/Candidate Insulator
  •  Dark Green  - Transcription associated
  •  Light Green  - Low activity proximal to active states
  •  Gray  - Polycomb repressed
  •  Light Gray  - Heterochromatin/Repetitive/Copy Number Variation

Methods

ChIP-seq data from the ENCODE Consortium was used to generate this track, and the Segway program was used to perform the segmentation. For each of the six cell types, data for ten factors plus input, two DNase-seq assays, and one FAIRE-seq assay was converted to real-valued signal data using the Wiggler program. Using the ENCODE regions (spanning 1% of the human genome), the chromatin states were learned from this data using a Dynamic Bayesian Network (DBN) (Hoffman et al., 2012). Models were learned separately for each of the six cell types. For each cell type, the Viterbi algorithm was then used to assign genomic regions to individual state labels at single base pair resolution across the entire genome.
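The Viterbi decoding step can be illustrated with a deliberately simplified model: a two-state HMM over a single real-valued signal with Gaussian emissions. This is not Segway's DBN, which models many signal tracks jointly and has additional structure; the states, parameters, and signal values below are hypothetical. The sketch only shows how Viterbi picks the single most probable state path, one label per position.

```python
import math

def gaussian_logpdf(x, mean, sd):
    """Log density of a univariate normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)

def viterbi(obs, log_start, log_trans, means, sds):
    """Return the most probable state path for a real-valued signal.

    log_start[s] and log_trans[s][t] are log probabilities; means[s] and
    sds[s] parameterize the Gaussian emission of state s.
    """
    n = len(log_start)
    # score[s]: best log probability of any path ending in state s so far.
    score = [log_start[s] + gaussian_logpdf(obs[0], means[s], sds[s]) for s in range(n)]
    backptrs = []
    for x in obs[1:]:
        best_prev, new_score = [], []
        for t in range(n):
            s_best = max(range(n), key=lambda s: score[s] + log_trans[s][t])
            best_prev.append(s_best)
            new_score.append(score[s_best] + log_trans[s_best][t]
                             + gaussian_logpdf(x, means[t], sds[t]))
        backptrs.append(best_prev)
        score = new_score
    # Trace back from the highest-scoring final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for best_prev in reversed(backptrs):
        state = best_prev[state]
        path.append(state)
    return path[::-1]

# State 0 = low signal, state 1 = high signal (hypothetical parameters).
log_start = [math.log(0.5), math.log(0.5)]
log_trans = [[math.log(0.9), math.log(0.1)],
             [math.log(0.1), math.log(0.9)]]
signal = [0.1, -0.3, 4.8, 5.2, 5.1, 0.2]
print(viterbi(signal, log_start, log_trans, means=[0.0, 5.0], sds=[1.0, 1.0]))
# [0, 0, 1, 1, 1, 0]
```

Working in log space avoids numerical underflow when the signal covers millions of positions.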

References

Hoffman MM, Buske OJ, Wang J, Weng Z, Bilmes JA, Noble WS. Unsupervised pattern discovery in human chromatin structure through genomic segmentation. Nature Methods 2012 Mar 18;9(5):473-476.


Combined Segmentations

Description

These tracks display chromatin state segmentations for six cell lines, produced by a consensus merge of the segmentations generated by the ChromHMM and Segway software. For both segmentations, sets of states across the six cell types were learned by computationally integrating ChIP-seq data for ten factors plus input, two DNase-seq assays, and one FAIRE-seq assay, and twenty-five states were used to segment the genome. These states were then grouped and colored to highlight predicted functional elements. For ease of comprehension and display, the merged segmentation uses only seven states.

Display Conventions and Configuration

The seven states of the combined segmentation, with their candidate annotations and segment colors, are as follows:
State  Color       Candidate annotation
TSS    Bright Red  Predicted promoter region including TSS
PF     Light Red   Predicted promoter flanking region
E      Orange      Predicted enhancer
WE     Yellow      Predicted weak enhancer or open chromatin cis-regulatory element
CTCF   Blue        CTCF-enriched element
T      Dark Green  Predicted transcribed region
R      Gray        Predicted repressed or low-activity region

Methods

ChIP-seq data from the ENCODE Consortium was used to generate this track, and the ChromHMM and Segway programs were used to perform the segmentation. Both original segmentations used data for ten factors plus input, two DNase-seq assays, and one FAIRE-seq assay in each of the six cell types.

For ChromHMM, the data was binarized separately at a 200 base pair resolution based on a Poisson background model. The chromatin states were learned from this binarized data using a multivariate Hidden Markov Model (HMM) that explicitly models the combinatorial patterns of observed modifications (Ernst and Kellis, 2010). To learn a common set of states across the six cell types, first the genomes were concatenated across the cell types. For each of the six cell types, each 200 base pair interval was then assigned to its most likely state under the model.

For Segway, the data was converted to real-valued signal data using the Wiggler program. Using the ENCODE regions (spanning 1% of the human genome), the chromatin states were learned from this data using a Dynamic Bayesian Network (DBN) (Hoffman et al., 2012). Models were learned separately for each of the six cell types. For each cell type, the Viterbi algorithm was then used to assign genomic regions to individual state labels at single base pair resolution across the entire genome.

To form the combined segmentation, we first identified, within each original segmentation, states that could be grouped based on similar signal patterns. For the ChromHMM segmentation, the states were grouped manually based on their mean signal values across multiple cell lines. For the Segway segmentations, which were run independently on each cell line, several hierarchical clustering techniques were applied across all states to identify the clustering that was most consistent both across cell lines and with existing biological knowledge. Based on these criteria, Ward clustering on Euclidean distances between mean signal scores transformed to the unit interval was chosen to cluster the Segway state labels. Next, pairwise relationships between the grouped ChromHMM and Segway states were identified using both overlap calculations and manual annotation (Hoffman, Ernst et al., 2013). Pairs of states judged to be concordant were assigned to one of the seven state classes, and regions of the genome occupied by concordant states in the two initial segmentations were relabeled with the new summary labels. Some combinations of states from the two segmentations could not be reconciled and were treated as discordant. Regions with discordant states were not assigned a state label and were dropped from the combined summary segmentation.
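The final reconciliation step can be sketched as an interval intersection followed by a lookup in a concordance table. The grouped labels and the concordance table below are hypothetical (not the published mapping), and the sketch assumes both inputs are sorted, non-overlapping (start, end, label) intervals on a single chromosome whose states have already been grouped as described above.

```python
def intersect(a, b):
    """Yield (start, end, label_a, label_b) for each overlap between two
    sorted, non-overlapping interval lists of (start, end, label)."""
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            yield start, end, a[i][2], b[j][2]
        # Advance whichever interval ends first.
        if a[i][1] <= b[j][1]:
            i += 1
        else:
            j += 1

def combine(chromhmm, segway, concordance):
    """Relabel concordant overlaps with summary states; drop discordant ones."""
    merged = []
    for start, end, ch, sw in intersect(chromhmm, segway):
        label = concordance.get((ch, sw))
        if label is None:
            continue  # discordant pair: region gets no label
        if merged and merged[-1][1] == start and merged[-1][2] == label:
            merged[-1] = (merged[-1][0], end, label)  # extend an abutting run
        else:
            merged.append((start, end, label))
    return merged

# Hypothetical grouped labels and concordance table.
concordance = {
    ("ch_promoter", "sw_tss"): "TSS",
    ("ch_transcribed", "sw_transcribed"): "T",
    ("ch_repressed", "sw_quiescent"): "R",
    ("ch_low", "sw_quiescent"): "R",
}
chromhmm = [(0, 400, "ch_promoter"), (400, 1000, "ch_transcribed"),
            (1000, 1600, "ch_repressed"), (1600, 2000, "ch_low")]
segway = [(0, 300, "sw_tss"), (300, 1200, "sw_transcribed"),
          (1200, 2000, "sw_quiescent")]
print(combine(chromhmm, segway, concordance))
# [(0, 300, 'TSS'), (400, 1000, 'T'), (1200, 2000, 'R')]
```

The gaps at 300-400 and 1000-1200 are discordant overlaps that receive no label, mirroring how discordant regions drop out of the combined segmentation.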

References

ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 2012 Sep 6;489(7414):57-74.

Ernst J, Kellis M. Discovery and characterization of chromatin states for systematic annotation of the human genome. Nature Biotechnology 2010 Jul 25;28:817-825.

Hoffman MM, Buske OJ, Wang J, Weng Z, Bilmes JA, Noble WS. Unsupervised pattern discovery in human chromatin structure through genomic segmentation. Nature Methods 2012 Mar 18;9(5):473-476.

Hoffman MM, Ernst J, Wilder SP, Kundaje A, Harris RS, Libbrecht M, Bilmes JA, Giardine B, Birney E, Hardison RC, et al. Integrative annotation of chromatin elements from ENCODE data. Nucleic Acids Research 2013 Jan 1;41(2):827-841.

Data Release Policy

Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months after the release of the dataset. This date is listed in the Restricted Until column on the track configuration page and the download page. The full data release policy for ENCODE is available from the ENCODE Project.

There is no restriction on the use of segmentation data.

Contact

Michael Hoffman