RefSeq
In kerasy, you can use the kerasy.datasets.ncbi.getSeq method to collect sequence data from the Reference Sequence (RefSeq) database, an open-access, annotated, and curated collection of publicly available nucleotide sequences (DNA, RNA) and their protein products.
The database is built by NCBI (National Center for Biotechnology Information). Unlike GenBank, which is also built by NCBI, RefSeq provides only a single record for each natural biological molecule (i.e. DNA, RNA, or protein) for major organisms ranging from viruses to bacteria to eukaryotes.
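For readers curious how a RefSeq record can be retrieved over HTTP, here is a minimal sketch that builds an NCBI E-utilities `efetch` request for an accession number and downloads the record as FASTA text using only the standard library. This is an assumption about one possible approach, not necessarily what `getSeq` does internally; the helper names `efetch_url` and `fetch_fasta` are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities efetch endpoint.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Assumption for this sketch: protein accessions (NP, XP, WP) are queried in
# the "protein" database, all other prefixes in the "nuccore" database.
PROTEIN_PREFIXES = {"NP", "XP", "WP"}


def efetch_url(accession: str) -> str:
    """Hypothetical helper: build an efetch URL that returns FASTA text."""
    prefix = accession.split("_", 1)[0].upper()
    db = "protein" if prefix in PROTEIN_PREFIXES else "nuccore"
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return f"{EFETCH}?{urlencode(params)}"


def fetch_fasta(accession: str, timeout: float = 30.0) -> str:
    """Hypothetical helper: download one record as FASTA (needs network access)."""
    with urlopen(efetch_url(accession), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only prints the URL; call fetch_fasta(...) yourself to hit the network.
    print(efetch_url("NM_000546.6"))
```

Keeping URL construction separate from the download makes the request easy to inspect without network access. Note that NCBI limits E-utilities clients without an API key to about three requests per second, so batch downloads should be throttled.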
def kerasy.datasets.ncbi.getSeq(refSeq_num, asfasta=False, path="")
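The `asfasta` and `path` parameters suggest that `getSeq` can produce the record in FASTA format; exactly what they do is an assumption here, since their behavior is not described on this page. If you do end up with FASTA text or a FASTA file, a small standard-library parser such as the hypothetical `parse_fasta` below reads it into a dictionary mapping each header line to its sequence.

```python
from typing import Dict, Iterable, List, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Hypothetical helper: map each FASTA header (minus '>') to its sequence."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # flush the previous record
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are usually wrapped
    if header is not None:
        records[header] = "".join(chunks)  # flush the last record
    return records


# Made-up two-record FASTA text, for illustration only.
example = """>seq1 hypothetical nucleotide record
ACGTACGT
ACGT
>seq2 hypothetical protein record
MKV
"""
print(parse_fasta(example.splitlines()))
```

Because the parser accepts any iterable of lines, an open file handle works too, e.g. `with open("record.fasta") as fh: records = parse_fasta(fh)` (the file name is illustrative). For heavier use, Biopython's `SeqIO.parse(handle, "fasta")` is a well-tested alternative.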
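RefSeq Number
The two-letter prefix of a RefSeq accession number (the letters before the underscore, e.g. NM in NM_000546) indicates the type of record: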
| Accession prefix | Description |
|---|---|
| NC | Complete genomic molecules |
| NG | Incomplete genomic region |
| NM | mRNA |
| NR | ncRNA |
| NP | Protein |
| XM | Predicted mRNA model |
| XR | Predicted ncRNA model |
| XP | Predicted protein model (eukaryotic sequences) |
| WP | Predicted protein model (prokaryotic sequences) |
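The table can be turned into a lookup that tells you what kind of record an accession number refers to before you download it. The sketch below is a hypothetical helper, not part of kerasy; its dictionary follows the prefix table above.

```python
from typing import Dict

# Prefix -> description, following the RefSeq Number table.
REFSEQ_PREFIXES: Dict[str, str] = {
    "NC": "Complete genomic molecules",
    "NG": "Incomplete genomic region",
    "NM": "mRNA",
    "NR": "ncRNA",
    "NP": "Protein",
    "XM": "Predicted mRNA model",
    "XR": "Predicted ncRNA model",
    "XP": "Predicted protein model (eukaryotic sequences)",
    "WP": "Predicted protein model (prokaryotic sequences)",
}


def describe_accession(accession: str) -> str:
    """Hypothetical helper: look up a RefSeq accession's prefix in the table."""
    prefix, sep, _ = accession.strip().partition("_")
    if not sep:
        raise ValueError(f"not a RefSeq accession (no '_'): {accession!r}")
    try:
        return REFSEQ_PREFIXES[prefix.upper()]
    except KeyError:
        raise ValueError(f"prefix not in the table: {prefix!r}") from None


print(describe_accession("NM_000546.6"))  # mRNA
```

The table lists only some prefixes; RefSeq also uses others (for example NT, NW, NZ, and YP), which this sketch rejects with a `ValueError` rather than guessing.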
Ref: Table 7, "Entrez queries to retrieve sets of RefSeq records", The NCBI Handbook, NCBI Bookshelf.
FAQ
If the method doesn't work, please refer to NCBI's Genomes Download FAQ.