2021. 1. 23. 21:05

This script takes a taxon name or a taxonomy ID and prints its NCBI lineage using Biopython's Entrez module. There is also a way to do this without connecting to the NCBI server, by parsing the taxdump (which I posted here).

 

Your input is either a taxon name such as "Lentinula edodes" or a taxonomy ID such as "5353". The output is the full lineage, for example "Eukaryota; Fungi; Dikarya; Basidiomycota; Agaricomycotina; Agaricomycetes; Agaricomycetidae; Agaricales; Omphalotaceae; Lentinula". The script below prints each entry of that lineage on its own line, together with its rank.

 

The code is here:

import re
import sys
from Bio import Entrez
Entrez.email = 'A.N.Other@example.com'

def get_ncbi_tax(taxon):
    '''Get NCBI taxonomy'''
    # If the input is not purely numeric, treat it as a taxon name
    if not re.fullmatch(r'\d+', taxon):
        # Get taxonomy ID using Entrez
        taxon2 = '"' + taxon + '"'
        # esearch returns XML by default, which Entrez.read() can parse
        handle = Entrez.esearch(db='taxonomy', term=taxon2)
        record = Entrez.read(handle, validate=False)
        handle.close()
        # If there's no result
        if not record['IdList']:
            sys.exit(
                '[ERROR] The taxon "{}" you provided is invalid. '
                'Please check NCBI Taxonomy'.format(taxon))
        tax_id = record['IdList'][0]  # Use the first matching ID
    else:
        tax_id = taxon

    # Now query NCBI again using the tax_id
    # Entrez.efetch returns the full taxonomy record, including the lineage
    handle2 = Entrez.efetch(db='taxonomy', id=tax_id, retmode='xml')
    record2 = Entrez.read(handle2, validate=False)
    handle2.close()

    tax_list = record2[0]['LineageEx']
    for tax_element in tax_list:
        print('{}: {}'.format(
            tax_element['Rank'], tax_element['ScientificName']))
            
# Now call the function
get_ncbi_tax('5353')  # Using tax ID
get_ncbi_tax('Lentinula edodes')  # Using tax name
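
The function above prints one "rank: name" pair per line. If you prefer the single semicolon-separated string shown earlier, here is a minimal sketch. The helper name format_lineage is my own, and example_record is a hand-made, abbreviated stand-in for record2[0], not real NCBI output.

```python
def format_lineage(taxon_record, sep='; '):
    """Join the scientific names in LineageEx into one lineage string."""
    # LineageEx is a list of dict-like entries, one per ancestor taxon
    names = [node['ScientificName']
             for node in taxon_record.get('LineageEx', [])]
    return sep.join(names)


# Hand-made, abbreviated stand-in for record2[0] (assumption, not NCBI output)
example_record = {
    'ScientificName': 'Lentinula edodes',
    'LineageEx': [
        {'Rank': 'kingdom', 'ScientificName': 'Fungi'},
        {'Rank': 'phylum', 'ScientificName': 'Basidiomycota'},
        {'Rank': 'class', 'ScientificName': 'Agaricomycetes'},
        {'Rank': 'order', 'ScientificName': 'Agaricales'},
        {'Rank': 'family', 'ScientificName': 'Omphalotaceae'},
        {'Rank': 'genus', 'ScientificName': 'Lentinula'},
    ],
}

print(format_lineage(example_record))
```

With a real record, you would call format_lineage(record2[0]) right after Entrez.read().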

 

Explanation

1. If you don't have an ID, you must first search for it using Entrez.esearch(). The taxon name can be of any rank (e.g., species, genus).

taxon2 = '"' + taxon + '"'
handle = Entrez.esearch(db='taxonomy', term=taxon2)
record = Entrez.read(handle, validate=False)
handle.close()
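
The two small pieces of logic around the search (quoting the name so it is searched as one phrase, and reading the ID out of the result) can be sketched as separate helpers. The names quote_term and pick_tax_id are hypothetical, and the dictionaries in the checks are hand-made stand-ins for the parsed esearch record.

```python
def quote_term(taxon):
    """Wrap a taxon name in double quotes for use as an esearch term."""
    return '"{}"'.format(taxon.strip())


def pick_tax_id(search_record):
    """Return the first ID from a parsed esearch record, or None if empty."""
    id_list = search_record.get('IdList', [])
    return str(id_list[0]) if id_list else None


# Hand-made stand-ins for the record returned by Entrez.read()
print(quote_term('Lentinula edodes'))
print(pick_tax_id({'IdList': ['5353']}))
print(pick_tax_id({'IdList': []}))
```

Returning None instead of calling sys.exit() lets the caller decide what to do when a name is not found, for example skipping it while looping over many names.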

2. After you get the ID, use Entrez.efetch() to get the lineage. The record also contains other data, such as the common name or acronym; explore record2 for details.

handle2 = Entrez.efetch(db='taxonomy', id=tax_id, retmode='xml')
record2 = Entrez.read(handle2, validate=False)
handle2.close()
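
As a starting point for exploring the record, here is a small sketch that pulls a few fields out of record2[0]. The key names (TaxId, ScientificName, Rank, OtherNames, GenbankCommonName) are what I expect in a taxonomy record, but treat them as assumptions and check sorted(record2[0].keys()) on your own result. The function name summarize_taxon and the mock record with its illustrative values are mine.

```python
def summarize_taxon(taxon_record):
    """Collect a few fields from a taxonomy record; missing ones become None."""
    # OtherNames (assumed key) holds common names, synonyms, acronyms, etc.
    other_names = taxon_record.get('OtherNames', {})
    return {
        'tax_id': taxon_record.get('TaxId'),
        'scientific_name': taxon_record.get('ScientificName'),
        'rank': taxon_record.get('Rank'),
        'common_name': other_names.get('GenbankCommonName'),
    }


# Hand-made mock of record2[0]; values are illustrative, not copied from NCBI
mock_record = {
    'TaxId': '5353',
    'ScientificName': 'Lentinula edodes',
    'Rank': 'species',
    'OtherNames': {'GenbankCommonName': 'shiitake mushroom'},
}

print(sorted(mock_record.keys()))  # list the available fields
print(summarize_taxon(mock_record))
```

Using .get() everywhere keeps the helper from raising KeyError on taxa that lack a common name.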
