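In the GATK (Genome Analysis Toolkit) library, ReferenceDataSource is an important interface that represents a data source for a reference genome. It provides a standardized way to access and work with reference genome data coming from different sources. The ReferenceMemorySource and ReferenceFileSource classes implement the ReferenceDataSource interface. They manage reference genome data held in memory and stored in files, respectively.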
ReferenceDataSource Interface Overview
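The ReferenceDataSource interface defines operations for accessing reference genome data, including retrieving the reference sequence at a specific genomic location. As its Javadoc states, it supports both targeted queries by interval and traversals over the entire reference. GATK tools typically use it to process and access reference data.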
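To make "retrieving the reference sequence at a specific location" concrete, here is a minimal sketch. The names `ReferenceLookup` and `MapReferenceLookup` are hypothetical and not part of GATK; the sketch simply stores contigs in a map. It assumes 1-based, closed intervals, which is the convention GATK's SimpleInterval uses.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical minimal interface (not GATK's API): fetch the bases of a 1-based, closed interval.
interface ReferenceLookup {
    byte[] getBases(String contig, int start, int end);
}

// In-memory implementation backed by a contig-name -> sequence map.
final class MapReferenceLookup implements ReferenceLookup {
    private final Map<String, String> contigs;

    MapReferenceLookup(Map<String, String> contigs) {
        this.contigs = Map.copyOf(contigs);
    }

    @Override
    public byte[] getBases(String contig, int start, int end) {
        String seq = contigs.get(contig);
        if (seq == null) {
            throw new IllegalArgumentException("Unknown contig: " + contig);
        }
        if (start < 1 || end < start || end > seq.length()) {
            throw new IllegalArgumentException("Interval out of bounds: " + contig + ":" + start + "-" + end);
        }
        // Convert the 1-based closed interval [start, end] to the 0-based half-open range [start - 1, end).
        return seq.substring(start - 1, end).getBytes(StandardCharsets.US_ASCII);
    }
}
```

For example, querying `chr1:2-4` on the sequence `ACGTACGT` yields the bases `CGT`.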
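Main Features
- Access to reference data: provides a standardized way to access reference genome data.
- Support for different data sources: reference sequences can be obtained from different sources, such as a FASTA file.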
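As a rough illustration of obtaining reference sequences from a FASTA source, the hypothetical helper `FastaLoader.load` below (not a GATK or htsjdk API) parses FASTA text from any `Reader` into a contig-name-to-sequence map, so the same code works for a file or an in-memory string. It takes the contig name from the first whitespace-delimited token of each header line. It keeps bases exactly as written, with no upper-casing and no IUPAC-to-`N` conversion. Because it does no indexing and loads everything into memory, it is only suitable for small references.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of GATK): load a small FASTA into memory, preserving contig order.
final class FastaLoader {
    private FastaLoader() {}

    static Map<String, String> load(Reader source) {
        Map<String, String> contigs = new LinkedHashMap<>();
        String name = null;
        StringBuilder seq = new StringBuilder();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.strip();
                if (line.isEmpty()) {
                    continue;
                }
                if (line.startsWith(">")) {
                    // Flush the previous record before starting a new one.
                    if (name != null) {
                        contigs.put(name, seq.toString());
                    }
                    // Contig name = first whitespace-delimited token after '>'.
                    name = line.substring(1).strip().split("\\s+", 2)[0];
                    seq.setLength(0);
                } else {
                    if (name == null) {
                        throw new IllegalArgumentException("Sequence data before the first '>' header");
                    }
                    seq.append(line); // bases are kept exactly as written
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (name != null) {
            contigs.put(name, seq.toString());
        }
        return contigs;
    }
}
```

Wrapping a file in `Files.newBufferedReader(path)` or a literal in `new StringReader(...)` gives the same result, which is the point of reading through a common abstraction.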
Interface implementations:
The ReferenceMemorySource and ReferenceFileSource classes implement the ReferenceDataSource interface.
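Callers program against the single interface, and the static `of(Path)` factory in the source code listed below returns a ReferenceFileSource without the caller naming that class. The sketch below imitates this "one interface, several sources" design using hypothetical names (`RefSource`, `IndexedFileSource`) that are not GATK classes. It pairs an in-memory source with a file source that keeps only a per-contig index in memory and seeks into the file for each query. The index fields (byte offset of the first base, bases per line, bytes per line) mirror what a samtools `.fai` index records. The Javadoc below requires a companion `.fai` file for this kind of random access. Here the index is instead built by scanning the file once, assuming ASCII content, `\n` line endings, and a fixed line width within each contig.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface (not GATK's API): one abstraction, several backing sources.
interface RefSource {
    // Returns the bases of the 1-based, closed interval [start, end] on the given contig.
    String bases(String contig, int start, int end);

    // Whole sequences held in memory (no bounds checking in this sketch).
    static RefSource inMemory(Map<String, String> contigs) {
        return (contig, start, end) -> contigs.get(contig).substring(start - 1, end);
    }

    // Similar in spirit to a static factory: callers never name the concrete class.
    static RefSource fromFile(Path fasta) {
        return new IndexedFileSource(fasta);
    }
}

// File-backed source: keeps only a per-contig index in memory and seeks into the file per query.
final class IndexedFileSource implements RefSource {
    // The same layout facts a samtools .fai line records.
    private static final class Entry {
        final long length, offset;      // contig length in bases; byte offset of its first base
        final int lineBases, lineBytes; // bases per line; bytes per line including '\n'
        Entry(long length, long offset, int lineBases, int lineBytes) {
            this.length = length; this.offset = offset; this.lineBases = lineBases; this.lineBytes = lineBytes;
        }
    }

    private final Path fasta;
    private final Map<String, Entry> index = new HashMap<>();

    IndexedFileSource(Path fasta) {
        this.fasta = fasta;
        buildIndex();
    }

    // Scans the file once. Assumes ASCII, '\n' line endings, and a fixed line width per contig.
    private void buildIndex() {
        try {
            long pos = 0, length = 0, offset = 0;
            int lineBases = 0;
            String name = null;
            for (String line : Files.readAllLines(fasta, StandardCharsets.US_ASCII)) {
                if (line.startsWith(">")) {
                    if (name != null) index.put(name, new Entry(length, offset, lineBases, lineBases + 1));
                    name = line.substring(1).strip().split("\\s+", 2)[0];
                    length = 0;
                    lineBases = 0;
                    offset = pos + line.length() + 1; // first base follows the header's '\n'
                } else if (name != null && !line.isEmpty()) {
                    if (lineBases == 0) lineBases = line.length(); // width of the first sequence line
                    length += line.length();
                }
                pos += line.length() + 1; // +1 for the '\n' terminator
            }
            if (name != null) index.put(name, new Entry(length, offset, lineBases, lineBases + 1));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // .fai-style arithmetic: skip whole lines, then step within the last one.
    private static long byteOffset(Entry e, long zeroBasedPos) {
        return e.offset + (zeroBasedPos / e.lineBases) * e.lineBytes + zeroBasedPos % e.lineBases;
    }

    @Override
    public String bases(String contig, int start, int end) {
        Entry e = index.get(contig);
        if (e == null || start < 1 || end < start || end > e.length) {
            throw new IllegalArgumentException("Bad interval " + contig + ":" + start + "-" + end);
        }
        long from = byteOffset(e, start - 1L);
        long to = byteOffset(e, end - 1L) + 1; // exclusive
        byte[] buf = new byte[(int) (to - from)];
        try (RandomAccessFile raf = new RandomAccessFile(fasta.toFile(), "r")) {
            raf.seek(from);
            raf.readFully(buf);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        // Drop the line terminators that fall inside the requested byte range.
        return new String(buf, StandardCharsets.US_ASCII).replace("\n", "");
    }
}
```

Both sources answer the same query identically. The file-backed one trades a seek per query for a memory footprint that does not grow with genome size, which is the usual reason to prefer an indexed file over loading the whole reference.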
ReferenceDataSource interface source code:
package org.broadinstitute.hellbender.engine;
import htsjdk.samtools.SAMSequenceDictionary;
import htsjdk.samtools.reference.ReferenceSequence;
import org.broadinstitute.hellbender.utils.SimpleInterval;
import org.broadinstitute.hellbender.utils.iterators.ByteArrayIterator;
import org.broadinstitute.hellbender.utils.reference.ReferenceBases;
import java.nio.file.Path;
import java.util.Iterator;
/**
 * Manages traversals and queries over reference data.
 *
 * Supports targeted queries over the reference by interval and over the entire reference.
 */
public interface ReferenceDataSource extends GATKDataSource<Byte>, AutoCloseable {
    /**
     * Initialize this data source using a fasta file.
     *
     * The provided fasta file must have companion .fai and .dict files.
     *
     * @param fastaPath reference fasta Path
     */
    public static ReferenceDataSource of(final Path fastaPath) {
        return new ReferenceFileSource(fastaPath);
    }
    /**
     * Initialize this data source using a fasta file.
     *
     * The provided fasta file must have companion .fai and .dict files.
     *
     * If {@code preserveFileBases} is {@code true}, will NOT convert IUPAC bases in the file to `N` and will NOT capitalize lower-case bases.
     *
     * NOTE: Most GATK tools do not support data created by setting {@code preserveFileBases} to {@code true}.
     *
     * @param fastaPath reference fasta Path
     * @param preserveAmbiguityCodesAndCapitalization Whether to preserve the original bases in the given reference file path.
     */
    public static ReferenceDataSource of(final Path fastaPath, final boolean preserveAmbiguityCodesAndCapitalization) {
r