Usage
To use BioInterface in a Python project:
import biointerface
Extract All Protein-Nucleic Acid Interfaces
You can extract all protein-nucleic acid interfaces from an entire structure.
from Bio.PDB.PDBList import PDBList
from Bio.PDB.MMCIFParser import MMCIFParser
from biointerface import InterfaceBuilder
# retrieve the assembly file from the PDB using Biopython
pdbl = PDBList()
pdbl.retrieve_assembly_file(pdb_code="1A02", assembly_num=1, pdir=".")
# ... or use your own mmCIF file
# parse and build structure with Biopython
parser = MMCIFParser()
structure = parser.get_structure(
structure_id="1A02", filename="1a02-assembly1.cif"
)
face_builder = InterfaceBuilder(search_radius=4.0)
face_list = face_builder.build_interfaces(entity=structure)
face_list
[<Interface chains=N:B contacts=143 search_radius=4.0>,
<Interface chains=F:AB contacts=73 search_radius=4.0>,
<Interface chains=J:AB contacts=59 search_radius=4.0>]
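To make the role of `search_radius` concrete, here is a minimal brute-force sketch of a distance-based contact search. It is not BioInterface's implementation. It assumes that `search_radius` is a maximum interatomic distance in Å and, as the `chains=F:AB` notation above suggests, that contacts are grouped into one interface per protein chain. The toy atoms, `find_contacts` and `group_by_protein_chain` are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical toy atoms: (chain_id, residue_name, atom_name, (x, y, z) in Å).
protein_atoms = [
    ("F", "LYS", "NZ", (0.0, 0.0, 0.0)),
    ("F", "ARG", "NH2", (10.0, 0.0, 0.0)),
    ("N", "ARG", "NE", (20.0, 0.0, 0.0)),
]
nucleic_atoms = [
    ("A", "DT", "OP2", (0.0, 3.0, 0.0)),
    ("B", "DA", "O5'", (10.0, 0.0, 3.5)),
    ("B", "DG", "N7", (20.0, 5.0, 0.0)),
]


def find_contacts(protein_atoms, nucleic_atoms, search_radius=4.0):
    """Return (protein_atom, nucleic_atom, distance) for every pair within search_radius.

    Brute force, O(n * m): fine for a toy example only.
    """
    contacts = []
    for p_atom in protein_atoms:
        for n_atom in nucleic_atoms:
            distance = math.dist(p_atom[3], n_atom[3])
            if distance <= search_radius:
                contacts.append((p_atom, n_atom, distance))
    return contacts


def group_by_protein_chain(contacts):
    """Group contacts into one 'interface' per protein chain, labelled like 'F:AB'."""
    na_chains = defaultdict(set)
    counts = defaultdict(int)
    for p_atom, n_atom, _ in contacts:
        na_chains[p_atom[0]].add(n_atom[0])
        counts[p_atom[0]] += 1
    return {
        f"{chain}:{''.join(sorted(na_chains[chain]))}": counts[chain]
        for chain in sorted(counts)
    }


contacts = find_contacts(protein_atoms, nucleic_atoms, search_radius=4.0)
print(group_by_protein_chain(contacts))  # {'F:AB': 2}
```

On real structures a spatial index scales much better than the double loop; Biopython's `Bio.PDB.NeighborSearch` (for example its `search_all(radius)` method) is one option.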
Extract One Protein-Nucleic Acid Interface
You can also select the interface formed by a single protein chain by filtering this list.
# keep the interface whose first interacting amino acid belongs to protein chain F
faces_filt = [face for face in face_list if face.get_aminoacids()[0].parent.id == "F"]
faces_filt
[<Interface chains=F:AB contacts=73 search_radius=4.0>]
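The filter above looks only at the first interacting amino acid of each interface. A slightly stricter variant, sketched below, checks the chain of every interacting amino acid and builds a lookup by protein chain id. Like the filter above, it relies only on `get_aminoacids()` and Biopython's `Residue.parent.id`; `protein_chain_ids`, `index_by_protein_chain` and the stand-in objects are hypothetical helpers for illustration.

```python
from types import SimpleNamespace


def protein_chain_ids(face):
    """Set of chain ids of all interacting amino acids in an interface."""
    return {res.parent.id for res in face.get_aminoacids()}


def index_by_protein_chain(face_list):
    """Map each protein chain id to its interface; fail if an interface spans several chains."""
    index = {}
    for face in face_list:
        chain_ids = protein_chain_ids(face)
        if len(chain_ids) != 1:
            raise ValueError(f"expected one protein chain, got {sorted(chain_ids)}")
        index[chain_ids.pop()] = face
    return index


# Hypothetical stand-ins exposing only what the helpers use
# (get_aminoacids() and residue.parent.id), for demonstration only.
def make_fake_face(chain_id, n_residues):
    chain = SimpleNamespace(id=chain_id)
    residues = [SimpleNamespace(parent=chain) for _ in range(n_residues)]
    return SimpleNamespace(label=f"face-{chain_id}", get_aminoacids=lambda: residues)


faces = [make_fake_face("N", 3), make_fake_face("F", 2), make_fake_face("J", 4)]
by_chain = index_by_protein_chain(faces)
print(by_chain["F"].label)  # face-F
```

With the real interfaces you would call `index_by_protein_chain(face_list)["F"]`.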
Get All Interacting Residues
You can access all interacting residues in a protein-nucleic acid interface, both amino acids and nucleotides.
face = faces_filt[0]
face.get_aminoacids()
[<Residue ARG het= resseq=146 icode= >,
<Residue ASN het= resseq=147 icode= >,
<Residue LYS het= resseq=148 icode= >,
<Residue ARG het= resseq=158 icode= >,
<Residue SER het= resseq=154 icode= >,
<Residue ARG het= resseq=155 icode= >,
<Residue ALA het= resseq=151 icode= >,
<Residue LYS het= resseq=153 icode= >,
<Residue ALA het= resseq=150 icode= >,
<Residue ARG het= resseq=157 icode= >,
<Residue ARG het= resseq=144 icode= >]
face.get_nucleotides()
[<Residue DA het= resseq=5005 icode= >,
<Residue DT het= resseq=4015 icode= >,
<Residue DT het= resseq=5004 icode= >,
<Residue DC het= resseq=4016 icode= >,
<Residue DG het= resseq=5007 icode= >,
<Residue DA het= resseq=4017 icode= >,
<Residue DT het= resseq=4013 icode= >,
<Residue DT het= resseq=5006 icode= >,
<Residue DT het= resseq=4014 icode= >]
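In the output above, the residues are not listed in sequence order. As a small post-processing sketch, the snippet below counts amino acid types and sorts residues by residue number. To stay self-contained it uses the (name, number) pairs copied from the `get_aminoacids()` output; with real Biopython residues you would build the same pairs with `res.get_resname()` and `res.id[1]` (the residue sequence number).

```python
from collections import Counter

# (residue name, residue number) pairs copied from the get_aminoacids() output above.
# With real residues: [(res.get_resname(), res.id[1]) for res in face.get_aminoacids()]
aminoacids = [
    ("ARG", 146), ("ASN", 147), ("LYS", 148), ("ARG", 158),
    ("SER", 154), ("ARG", 155), ("ALA", 151), ("LYS", 153),
    ("ALA", 150), ("ARG", 157), ("ARG", 144),
]

# How often each amino acid type occurs at the interface.
composition = Counter(name for name, _ in aminoacids)
print(composition.most_common(1))  # [('ARG', 5)]

# Residues sorted by residue number, i.e. in sequence order.
in_sequence_order = sorted(aminoacids, key=lambda item: item[1])
print([number for _, number in in_sequence_order])
# [144, 146, 147, 148, 150, 151, 153, 154, 155, 157, 158]
```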
Get All Interacting Atoms
You can access all interacting atoms in a protein-nucleic acid interface.
First, you can get all interacting atoms as atom pairs.
contacts = face.get_atomic_contacts()
contacts[:5]
[(<Atom C5'>, <Atom CE>),
(<Atom O5'>, <Atom CE>),
(<Atom OP2>, <Atom NZ>),
(<Atom OP2>, <Atom CE>),
(<Atom OP2>, <Atom CD>)]
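Each pair is one atomic contact. To focus on the closest contacts only, you can re-filter the pairs with a tighter cutoff. The sketch below uses hypothetical (name, coordinates) tuples so that it runs on its own; with Biopython `Atom` objects, subtracting two atoms (`atom1 - atom2`) returns their distance, so the same filter can be written as `[(a, b) for a, b in contacts if a - b < cutoff]`. The helper `closer_than` is hypothetical.

```python
import math


def closer_than(pairs, cutoff):
    """Keep the atom pairs whose interatomic distance is strictly below cutoff (Å)."""
    return [(a, b) for a, b in pairs if math.dist(a[1], b[1]) < cutoff]


# Hypothetical (atom_name, (x, y, z)) tuples, for illustration only.
toy_pairs = [
    (("OP2", (0.0, 0.0, 0.0)), ("NZ", (0.0, 0.0, 2.8))),
    (("C5'", (1.0, 0.0, 0.0)), ("CE", (1.0, 3.9, 0.0))),
    (("O5'", (0.0, 2.0, 0.0)), ("CE", (0.0, 2.0, 3.4))),
]
close = closer_than(toy_pairs, cutoff=3.5)
print([(a[0], b[0]) for a, b in close])  # [('OP2', 'NZ'), ("O5'", 'CE')]
```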
You can also get the interacting protein atoms and nucleic acid atoms separately.
atoms = face.get_protein_atoms()
atoms[:5]
[<Atom CA>, <Atom CD>, <Atom NH2>, <Atom O>, <Atom NE>]
atoms = face.get_nucleic_acid_atoms()
atoms[:5]
[<Atom O5'>, <Atom O5'>, <Atom C3'>, <Atom C2'>, <Atom OP2>]
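A natural follow-up is which part of the nucleic acid the protein touches. The sketch below classifies nucleic acid atom names as sugar-phosphate backbone or base, assuming standard PDB atom nomenclature; the name set and `split_backbone_and_base` are illustrative and not part of BioInterface. Applied to the five atom names printed above, all of them turn out to be backbone atoms.

```python
# Sugar-phosphate backbone atom names in standard PDB nomenclature
# (O2' only occurs in RNA). Every other nucleotide atom is treated as a base atom.
BACKBONE_ATOM_NAMES = frozenset({
    "P", "OP1", "OP2", "OP3",
    "O5'", "C5'", "C4'", "O4'", "C3'", "O3'", "C2'", "O2'", "C1'",
})


def split_backbone_and_base(atom_names):
    """Split atom names into (backbone, base) lists, preserving order."""
    backbone = [name for name in atom_names if name in BACKBONE_ATOM_NAMES]
    base = [name for name in atom_names if name not in BACKBONE_ATOM_NAMES]
    return backbone, base


# Names from the get_nucleic_acid_atoms()[:5] output above; with real atoms use
# [atom.get_name() for atom in face.get_nucleic_acid_atoms()].
names = ["O5'", "O5'", "C3'", "C2'", "OP2"]
backbone, base = split_backbone_and_base(names)
print(len(backbone), len(base))  # 5 0
```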
Interface Data as DataFrame
You can get all interface features as a pandas DataFrame, with one row per atomic contact.
df = face.as_dataframe()
df.columns
Index(['protein_chain_id', 'prot_res_hetfield', 'prot_res_number',
'prot_res_icode', 'prot_res_name', 'prot_atom_name', 'prot_atom_altloc',
'prot_atom_element', 'prot_atom_coord_x', 'prot_atom_coord_y',
'prot_atom_coord_z', 'dna_chain_id', 'dna_res_hetfield',
'dna_res_number', 'dna_res_icode', 'dna_res_name', 'dna_atom_name',
'dna_atom_altloc', 'dna_atom_element', 'dna_atom_coord_x',
'dna_atom_coord_y', 'dna_atom_coord_z', 'euclidean_distance'],
dtype='object')
df
protein_chain_id prot_res_hetfield prot_res_number prot_res_icode ... euclidean_distance
0 F 148 ... 3.964944
1 F 148 ... 3.271817
2 F 148 ... 3.719538
3 F 148 ... 3.183074
4 F 148 ... 3.976905
.. ... ... ... ... ... ...
68 F 150 ... 3.279753
69 F 150 ... 3.368906
70 F 150 ... 3.584772
71 F 154 ... 3.474709
72 F 154 ... 3.930451
[73 rows x 23 columns]
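Because each row is one contact, per-residue summaries come down to grouping. In pandas, `df.groupby(["prot_res_number", "prot_res_name"])["euclidean_distance"].min()` gives the closest contact of each interacting amino acid. The dependency-free sketch below does the same over plain row dictionaries (for a real DataFrame, `df.to_dict("records")` yields rows in this form). The sample rows reuse residue numbers and distances visible in the output above, with residue names taken from the `get_aminoacids()` listing; `closest_contact_per_residue` is a hypothetical helper.

```python
def closest_contact_per_residue(rows):
    """Return {(chain, res_number, res_name): smallest euclidean_distance}."""
    closest = {}
    for row in rows:
        key = (row["protein_chain_id"], row["prot_res_number"], row["prot_res_name"])
        distance = row["euclidean_distance"]
        if key not in closest or distance < closest[key]:
            closest[key] = distance
    return closest


def make_rows(chain, number, name, distances):
    """Build DataFrame-like rows holding only the columns used here."""
    return [
        {"protein_chain_id": chain, "prot_res_number": number,
         "prot_res_name": name, "euclidean_distance": d}
        for d in distances
    ]


# A few rows taken from the DataFrame shown above.
rows = (
    make_rows("F", 148, "LYS", (3.964944, 3.271817, 3.719538, 3.183074, 3.976905))
    + make_rows("F", 150, "ALA", (3.279753, 3.368906, 3.584772))
    + make_rows("F", 154, "SER", (3.474709, 3.930451))
)

for key, distance in sorted(closest_contact_per_residue(rows).items()):
    print(key, distance)
# ('F', 148, 'LYS') 3.183074
# ('F', 150, 'ALA') 3.279753
# ('F', 154, 'SER') 3.474709
```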
Protein-Bound Nucleic Acids
BioInterface can extract all double-stranded nucleic acids bound by
the protein, as DoubleStrandNucleicAcid objects from the
PDBNucleicAcids package.
bound_dsna_list = face.get_bound_double_strands()
bound_dsna = bound_dsna_list[0]
bound_dsna
<DoubleStrandNucleicAcid type='dsDNA' strand ids='A:B' length=7>
The DoubleStrandNucleicAcid class provides further methods, for example to get the sequence of a strand.
bound_dsna.get_i_strand().get_seq()
Seq('TTTCATA')
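Assuming a fully Watson-Crick paired duplex, the partner strand read 5' to 3' should be the reverse complement of this sequence; mismatches or overhangs in the actual structure would make the two differ. Biopython's `Seq.reverse_complement()` computes this directly. The dependency-free sketch below, for DNA only, shows what it does; `reverse_complement` here is a hypothetical helper.

```python
# Complement table for DNA bases (upper and lower case); RNA (U) is not handled here.
_DNA_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence given as a str or Biopython Seq."""
    return str(seq).translate(_DNA_COMPLEMENT)[::-1]


print(reverse_complement("TTTCATA"))  # TATGAAA
```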
Nucleic Acid Binding Domain
BioInterface can extract the nucleic acid binding domain of the protein, defined as the shortest contiguous protein subsequence that contains all nucleic acid binding amino acids.
bd = face.get_binding_domain()
bd.get_sequence()
Seq('RERNKMAAAKSRNRR')
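Here the 11 interacting amino acids run from residue 144 to residue 158, which matches the 15-residue sequence above. Assuming contiguous residue numbering without gaps or insertion codes, the domain boundaries can be reproduced in a few lines. `binding_span`, `slice_by_residue_number` and the toy chain are hypothetical and only illustrate the idea, not BioInterface's implementation.

```python
def binding_span(residue_numbers):
    """First and last residue number of the shortest stretch covering all binding residues."""
    return min(residue_numbers), max(residue_numbers)


def slice_by_residue_number(seq_by_number, first, last):
    """Concatenate one-letter codes for residue numbers first..last (inclusive)."""
    return "".join(seq_by_number[number] for number in range(first, last + 1))


# Residue numbers from the get_aminoacids() output above.
binding_numbers = [146, 147, 148, 158, 154, 155, 151, 153, 150, 157, 144]
first, last = binding_span(binding_numbers)
print(first, last, last - first + 1)  # 144 158 15

# Hypothetical toy chain: residue number -> one-letter code.
toy_chain = {10: "M", 11: "K", 12: "R", 13: "A", 14: "E"}
print(slice_by_residue_number(toy_chain, *binding_span([13, 11])))  # KRA
```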