RefSeq

In kerasy, we can use the kerasy.datasets.ncbi.getSeq method to collect sequence data from the Reference Sequence (RefSeq) database, an open-access, annotated, and curated collection of publicly available nucleotide sequences (DNA, RNA) and their protein products.

This database is built by NCBI (National Center for Biotechnology Information). Unlike GenBank, which is also built by NCBI, RefSeq provides only a single record for each natural biological molecule (i.e. DNA, RNA, or protein) for major organisms ranging from viruses to bacteria to eukaryotes.

def kerasy.datasets.ncbi.getSeq(refSeq_num, asfasta=False, path="")
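As a minimal sketch of how the method might be called, the helper below fetches several accessions in one go. The name fetch_sequences and its getter parameter are hypothetical and exist only for illustration. The import uses the kerasy.datasets module path given in the prose above, and only the positional refSeq_num argument is passed, because this page does not describe what asfasta and path do or what getSeq returns.

```python
def fetch_sequences(accessions, getter=None):
    """Call a getSeq-style function once per accession and collect the results.

    `getter` is a hypothetical hook: any callable that takes an accession
    string. When it is omitted, kerasy's getSeq is used.
    """
    if getter is None:
        # Imported lazily so this sketch still loads where kerasy is not installed.
        # Assumption: the module path is kerasy.datasets.ncbi, as in the prose.
        from kerasy.datasets.ncbi import getSeq as getter
    # Only the positional argument is passed; asfasta and path keep their defaults.
    # Whatever the getter returns is stored unchanged.
    return {accession: getter(accession) for accession in accessions}
```

With kerasy installed and network access available, a call such as fetch_sequences(["NM_000546"]) would delegate to getSeq. Passing a stand-in function as getter lets us try the helper offline.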

RefSeq Number

Accession prefix  Description
NC                Complete genomic molecules
NG                Incomplete genomic region
NM                mRNA
NR                ncRNA
NP                Protein
XM                predicted mRNA model
XR                predicted ncRNA model
XP                predicted Protein model (eukaryotic sequences)
WP                predicted Protein model (prokaryotic sequences)

Ref: Table 7, "Entrez queries to retrieve sets of RefSeq records", The NCBI Handbook, NCBI Bookshelf
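The prefix table can also be used to check an accession before requesting it. The sketch below is not part of kerasy; the names REFSEQ_PREFIXES and describe_accession are hypothetical. It assumes the usual RefSeq accession layout of a two-letter prefix, an underscore, digits, and an optional version suffix (for example NM_000546.6), and it only knows the prefixes listed in the table.

```python
import re

# Prefix descriptions copied from the table above.
REFSEQ_PREFIXES = {
    "NC": "Complete genomic molecules",
    "NG": "Incomplete genomic region",
    "NM": "mRNA",
    "NR": "ncRNA",
    "NP": "Protein",
    "XM": "predicted mRNA model",
    "XR": "predicted ncRNA model",
    "XP": "predicted Protein model (eukaryotic sequences)",
    "WP": "predicted Protein model (prokaryotic sequences)",
}

# Assumed layout: two uppercase letters, underscore, digits, optional ".version".
_ACCESSION = re.compile(r"^([A-Z]{2})_(\d+)(?:\.(\d+))?$")


def describe_accession(accession):
    """Return the table description for an accession, or None for an unlisted prefix."""
    match = _ACCESSION.match(accession.strip())
    if match is None:
        raise ValueError(f"not a RefSeq-style accession: {accession!r}")
    return REFSEQ_PREFIXES.get(match.group(1))
```

For example, describe_accession("NM_000546.6") returns "mRNA", while a string without the underscore-separated prefix raises ValueError before any request is made.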

FAQ

If the method doesn't work, please see the Genomes Download FAQ.