BioInterface


BioInterface is a Biopython-based package that extracts protein-DNA interfaces from PDB structures.

Get Started

This is a short tutorial on how to use the BioInterface package.

The official release is available on the Python Package Index (PyPI):

$ pip install biointerface

You can extract all protein-nucleic acid interfaces from an entire structure:

from Bio.PDB.PDBList import PDBList
from Bio.PDB.MMCIFParser import MMCIFParser
from biointerface import InterfaceBuilder

# retrieve the assembly file from the PDB using Biopython
pdbl = PDBList()
pdbl.retrieve_assembly_file(pdb_code="1A02", assembly_num=1, pdir=".")
# ... or use your own file

# parse and build structure with Biopython
parser = MMCIFParser()
structure = parser.get_structure(
    structure_id="1A02", filename="1a02-assembly1.cif"
)

face_builder = InterfaceBuilder(search_radius=4.0)
face_list = face_builder.build_interfaces(entity=structure)
face_list
[<Interface protein_chains=N nucleic_chains=A,B contacts=143 search_radius=4.0>,
 <Interface protein_chains=F nucleic_chains=A,B contacts=73 search_radius=4.0>,
 <Interface protein_chains=J nucleic_chains=A,B contacts=59 search_radius=4.0>]
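The contacts counted above depend on search_radius. As a rough illustration of what a distance cutoff means, here is a minimal pure-Python sketch. It is not BioInterface's implementation: the Atom record, the find_contacts helper, the coordinates, and the inclusive cutoff are all assumptions made for this example.

```python
from math import dist
from typing import NamedTuple, Tuple


class Atom(NamedTuple):
    """Hypothetical minimal atom record (not a Biopython or BioInterface type)."""

    chain: str
    name: str
    coord: Tuple[float, float, float]


def find_contacts(protein_atoms, nucleic_atoms, search_radius=4.0):
    """Return every (protein atom, nucleic atom) pair at most search_radius apart."""
    return [
        (p, n)
        for p in protein_atoms
        for n in nucleic_atoms
        if dist(p.coord, n.coord) <= search_radius  # inclusive cutoff is an assumption
    ]


# Made-up coordinates, in angstroms, chosen only to exercise the cutoff.
protein = [Atom("N", "NZ", (0.0, 0.0, 0.0)), Atom("N", "CA", (10.0, 0.0, 0.0))]
nucleic = [Atom("A", "OP1", (3.0, 0.0, 0.0)), Atom("B", "N7", (0.0, 8.0, 0.0))]

contacts = find_contacts(protein, nucleic, search_radius=4.0)
print([(p.name, n.name) for p, n in contacts])  # [('NZ', 'OP1')]
```

The brute-force double loop is only meant to be readable. For real structures, a spatial index such as Biopython's Bio.PDB.NeighborSearch is likely a better fit.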

Check the official documentation for more information.

Features

  • Extract all protein-DNA interfaces in a PDB entity, be it a structure, model or chain;

  • Get all interacting residues in an interface, from protein and nucleic acids;

  • Get all interacting atoms in an interface, from protein and nucleic acids;

  • Interface data as a pandas DataFrame;

  • Get the nucleic acid binding protein;

  • Get the nucleic acid binding domain;

  • Get all protein-bound nucleic acids;

  • Get all protein-bound double-stranded nucleic acids;

  • Get all continuous protein-bound nucleic acids, i.e. the minimum nucleic subsequence that contains all protein-bound nucleotides;

  • Get all continuous protein-bound double-stranded nucleic acids, i.e. the minimum nucleic subsequence that contains all protein-bound base pairs;

  • Optionally fuse together polypeptides coming from a single protein chain.
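To make the "minimum nucleic subsequence" idea concrete, the sketch below takes the positions of protein-bound nucleotides on one strand and returns the shortest continuous slice that covers them all. The continuous_span function, the example strand, and the bound positions are hypothetical and only illustrate the concept; they are not part of the BioInterface API.

```python
def continuous_span(sequence, bound_positions):
    """Return (start, end, subsequence) for the shortest slice covering all bound positions.

    Positions are 0-based indices into sequence; end is inclusive.
    Returns None when no nucleotide is bound.
    """
    if not bound_positions:
        return None
    start, end = min(bound_positions), max(bound_positions)
    return start, end, sequence[start : end + 1]


strand = "GCATTTCCGGAAATGC"  # made-up strand
bound = {3, 4, 7, 9}  # made-up indices of protein-bound nucleotides

print(continuous_span(strand, bound))  # (3, 9, 'TTTCCGG')
```

Unbound nucleotides that lie between bound ones (positions 5, 6 and 8 here) are kept, which is what makes the result continuous.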

To Do

  • Features to discuss (not yet decided whether to implement):

      • Extract interfaces from a given polypeptide

      • Extract interfaces from a given nucleic acid

      • Extract interfaces from a given double-stranded nucleic acid

      • Extract interfaces from given protein atoms

      • Extract interfaces from given nucleic atoms

      • Maybe, instead of taking protein or nucleic atoms, provide a Selector class or similar to filter contacts and rebuild

  • Add padding as an init parameter for the interface builder, applied to the binding domain and nucleic acids

  • Maybe merge the methods that return trimmed and non-trimmed results, with something like a trimmed: bool flag. The trimmed variant could also add padding or trim even further. To be discussed

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.