Usage

To use BioInterface in a Python project:

import biointerface

Extract All Protein-Nucleic Acid Interfaces

You can extract all protein-nucleic acid interfaces from an entire structure.

from Bio.PDB.PDBList import PDBList
from Bio.PDB.MMCIFParser import MMCIFParser
from biointerface import InterfaceBuilder

# retrieve the assembly file from the PDB using Biopython
pdbl = PDBList()
pdbl.retrieve_assembly_file(pdb_code="1A02", assembly_num=1, pdir=".")
# ... or else use your own

# parse and build structure with Biopython
parser = MMCIFParser()
structure = parser.get_structure(
    structure_id="1A02", filename="1a02-assembly1.cif"
)

face_builder = InterfaceBuilder(search_radius=4.0)
face_list = face_builder.build_interfaces(entity=structure)
face_list
[<Interface protein_chains=N nucleic_chains=A,B contacts=143 search_radius=4.0>,
 <Interface protein_chains=F nucleic_chains=A,B contacts=73 search_radius=4.0>,
 <Interface protein_chains=J nucleic_chains=A,B contacts=59 search_radius=4.0>]
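The search_radius argument acts as a distance cutoff between protein atoms and nucleic acid atoms. The sketch below illustrates the general idea of a radius-based contact search on toy coordinates. It is not BioInterface's implementation: the function name find_contacts and all coordinates are made up for illustration.

```python
from itertools import product
from math import dist


def find_contacts(protein_atoms, nucleic_atoms, search_radius=4.0):
    """Return (nucleic_name, protein_name, distance) for every pair within the radius.

    Hypothetical helper for illustration only; atoms are (name, (x, y, z)) tuples.
    """
    contacts = []
    for (p_name, p_xyz), (n_name, n_xyz) in product(protein_atoms, nucleic_atoms):
        d = dist(p_xyz, n_xyz)
        if d <= search_radius:
            contacts.append((n_name, p_name, d))
    return contacts


# Toy coordinates, not taken from 1A02.
protein_atoms = [("NZ", (0.0, 0.0, 0.0)), ("CA", (10.0, 0.0, 0.0))]
nucleic_atoms = [("OP2", (3.0, 0.0, 0.0)), ("C5'", (0.0, 5.0, 0.0))]

contacts = find_contacts(protein_atoms, nucleic_atoms)
print(contacts)
```

Only the NZ/OP2 pair lies within 4.0 of each other, so it is the only contact kept. A double loop like this is quadratic in the number of atoms; on real structures a spatial index, for example Biopython's NeighborSearch, is the usual choice.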

Extract Protein-Nucleic Acid Interfaces

You can also filter protein-nucleic acid interfaces by various properties, for example by protein chain.

# filter by protein chain
faces_filt = [
    face
    for face in face_list
    if face.get_binding_proteins()[0].parent.id == "F"
]
faces_filt
[<Interface protein_chains=F nucleic_chains=A,B contacts=73 search_radius=4.0>]

Get All Interacting Residues

You can access all interacting residues in a protein-DNA interface, both amino acids and nucleotides.

face = faces_filt[0]
face.get_aminoacids()
[<Residue ARG het=  resseq=146 icode= >,
 <Residue ASN het=  resseq=147 icode= >,
 <Residue LYS het=  resseq=148 icode= >,
 <Residue ARG het=  resseq=158 icode= >,
 <Residue SER het=  resseq=154 icode= >,
 <Residue ARG het=  resseq=155 icode= >,
 <Residue ALA het=  resseq=151 icode= >,
 <Residue LYS het=  resseq=153 icode= >,
 <Residue ALA het=  resseq=150 icode= >,
 <Residue ARG het=  resseq=157 icode= >,
 <Residue ARG het=  resseq=144 icode= >]
face.get_nucleotides()
[<Residue DA het=  resseq=5005 icode= >,
 <Residue DT het=  resseq=4015 icode= >,
 <Residue DT het=  resseq=5004 icode= >,
 <Residue DC het=  resseq=4016 icode= >,
 <Residue DG het=  resseq=5007 icode= >,
 <Residue DA het=  resseq=4017 icode= >,
 <Residue DT het=  resseq=4013 icode= >,
 <Residue DT het=  resseq=5006 icode= >,
 <Residue DT het=  resseq=4014 icode= >]

Get All Interacting Atoms

You can access all interacting atoms in a protein-nucleic acid interface.

First, you can get all interacting atoms as atom pairs. In the output below, each pair lists the nucleic acid atom first and the protein atom second.

contacts = face.get_atomic_contacts()
contacts[:5]
[(<Atom C5'>, <Atom CE>),
 (<Atom O5'>, <Atom CE>),
 (<Atom OP2>, <Atom NZ>),
 (<Atom OP2>, <Atom CE>),
 (<Atom OP2>, <Atom CD>)]

You can also get the interacting protein atoms or the interacting nucleic acid atoms on their own.

atoms = face.get_protein_atoms()
atoms[:5]
[<Atom CA>, <Atom CD>, <Atom NH2>, <Atom O>, <Atom NE>]
atoms = face.get_nucleic_acid_atoms()
atoms[:5]
[<Atom O5'>, <Atom O5'>, <Atom C3'>, <Atom C2'>, <Atom OP2>]

Interface Data as DataFrame

You can get all Protein-DNA interface features as a pandas DataFrame.

df = face.as_dataframe()
df.columns
Index(['protein_chain_id', 'prot_res_hetfield', 'prot_res_number',
   'prot_res_icode', 'prot_res_name', 'prot_atom_name', 'prot_atom_altloc',
   'prot_atom_element', 'prot_atom_coord_x', 'prot_atom_coord_y',
   'prot_atom_coord_z', 'dna_chain_id', 'dna_res_hetfield',
   'dna_res_number', 'dna_res_icode', 'dna_res_name', 'dna_atom_name',
   'dna_atom_altloc', 'dna_atom_element', 'dna_atom_coord_x',
   'dna_atom_coord_y', 'dna_atom_coord_z', 'euclidean_distance'],
  dtype='object')
df
   protein_chain_id prot_res_hetfield  prot_res_number prot_res_icode  ... euclidean_distance
0                 F                                148                 ...           3.964944
1                 F                                148                 ...           3.271817
2                 F                                148                 ...           3.719538
3                 F                                148                 ...           3.183074
4                 F                                148                 ...           3.976905
..              ...               ...              ...            ...  ...                ...
68                F                                150                 ...           3.279753
69                F                                150                 ...           3.368906
70                F                                150                 ...           3.584772
71                F                                154                 ...           3.474709
72                F                                154                 ...           3.930451

[73 rows x 23 columns]
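Once the interface is a DataFrame, ordinary pandas operations apply. As a minimal sketch, the example below counts contacts per protein residue and reports the shortest distance for each. It uses a small hand-made DataFrame as a stand-in for face.as_dataframe(): the column names come from the listing above, but the rows are illustrative values, not real output.

```python
import pandas as pd

# Toy stand-in for face.as_dataframe(); only three of the 23 columns are included
# and the values are illustrative.
df = pd.DataFrame(
    {
        "prot_res_number": [148, 148, 150, 154],
        "prot_res_name": ["LYS", "LYS", "ALA", "SER"],
        "euclidean_distance": [3.96, 3.27, 3.28, 3.47],
    }
)

# Count contacts and find the closest approach for each protein residue.
summary = (
    df.groupby(["prot_res_number", "prot_res_name"])["euclidean_distance"]
    .agg(n_contacts="count", min_distance="min")
    .reset_index()
)
print(summary)
```

The same pattern works on the dna_* columns if you want a per-nucleotide summary instead.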

Nucleic Acid Binding Protein

BioInterface can extract the nucleic acid binding protein.

pp_list = face.get_binding_proteins()
pp = pp_list[0]
pp.get_sequence()
Seq('RRIRRERNKMAAAKSRNRRRELTDTLQAETDQLEDEKSALQTEIANLLKEKEK')

BioInterface can also extract the nucleic acid binding domain of the input protein. The binding domain is the shortest protein subsequence that contains all nucleic acid binding amino acids.

bd_list = face.get_binding_domains()
bd = bd_list[0]
bd.get_sequence()
Seq('RERNKMAAAKSRNRR')
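The idea behind the binding domain can be sketched with plain Python: take the lowest and highest residue numbers among the binding amino acids and slice the sequence between them. The helper trim_to_binding_span is hypothetical, and the starting residue number 140 is an assumption inferred from the outputs above (it is not stated anywhere), as is contiguous numbering.

```python
def trim_to_binding_span(sequence, first_resseq, binding_resseqs):
    """Slice `sequence` from the lowest to the highest binding residue number.

    Hypothetical helper; assumes contiguous numbering starting at `first_resseq`.
    """
    lo, hi = min(binding_resseqs), max(binding_resseqs)
    return sequence[lo - first_resseq : hi - first_resseq + 1]


sequence = "RRIRRERNKMAAAKSRNRRRELTDTLQAETDQLEDEKSALQTEIANLLKEKEK"
# Residue numbers reported by face.get_aminoacids() above.
binding_resseqs = [146, 147, 148, 158, 154, 155, 151, 153, 150, 157, 144]
# Assumption: the polypeptide starts at residue 140 (inferred, not stated).
domain = trim_to_binding_span(sequence, 140, binding_resseqs)
print(domain)
```

Under these assumptions the slice covers residues 144 to 158 and reproduces the sequence returned by get_binding_domains() above.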

Protein-Bound Nucleic Acids

BioInterface can extract all nucleic acids bound by the input protein, as a NucleicAcid class from the package PDBNucleicAcids.

face.get_bound_nucleic_acids()
[<NucleicAcid chain='A' type='DNA' start=4001 end=4020>,
 <NucleicAcid chain='B' type='DNA' start=5001 end=5020>]

BioInterface can also extract all double strand nucleic acids bound by the input protein, as a DoubleStrandNucleicAcid class from the package PDBNucleicAcids.

face.get_bound_double_strands()
[<DoubleStrandNucleicAcid type='dsDNA' strand ids='A:B' length=18>]

BioInterface can also trim each bound nucleic acid to the shortest subsequence that contains all protein-bound nucleotides.

face.get_trimmed_nucleic_acids()
[<NucleicAcid chain='A' type='DNA' start=4013 end=4017>,
 <NucleicAcid chain='B' type='DNA' start=5004 end=5007>]
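As a rough illustration of the trimming step, the sketch below groups the bound nucleotides by chain and keeps the lowest and highest residue number for each. The function trimmed_ranges is hypothetical, and the chain assignment of each nucleotide (40xx on chain A, 50xx on chain B) is an assumption based on the NucleicAcid outputs above.

```python
from collections import defaultdict


def trimmed_ranges(bound_nucleotides):
    """Map each chain id to the (start, end) span covering its bound nucleotides."""
    by_chain = defaultdict(list)
    for chain_id, resseq in bound_nucleotides:
        by_chain[chain_id].append(resseq)
    return {chain: (min(nums), max(nums)) for chain, nums in sorted(by_chain.items())}


# Residue numbers from face.get_nucleotides() above; chain ids are assumed.
bound = [
    ("B", 5005), ("A", 4015), ("B", 5004), ("A", 4016), ("B", 5007),
    ("A", 4017), ("A", 4013), ("B", 5006), ("A", 4014),
]
ranges = trimmed_ranges(bound)
print(ranges)
```

The resulting spans, 4013 to 4017 on chain A and 5004 to 5007 on chain B, agree with the trimmed NucleicAcid objects shown above.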

Double-strand nucleic acids can be trimmed the same way, to the shortest segment that contains all protein-bound base pairs.

face.get_trimmed_double_strands()
[<DoubleStrandNucleicAcid type='dsDNA' strand ids='A:B' length=7>]

The length of the DoubleStrandNucleicAcid dropped from 18 to 7 because only the portion of interest was extracted: the portion in direct contact with the protein.

Note: Classes from PDBNucleicAcids have useful methods of their own.

dsna_list = face.get_trimmed_double_strands()
dsna = dsna_list[0]
dsna.get_i_strand().get_seq()
Seq('TTTCATA')

Build Interfaces by Chain or Structure

By default, an interface corresponds to a single polypeptide.

However, many PDB files contain proteins with unmodeled segments, which causes a single protein chain to be split into several polypeptides during the building process. BioInterface can address this by building one interface from all the polypeptides that belong to the same protein chain.

First, this is the default behavior:
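To make the splitting concrete, here is a minimal sketch that breaks a chain into fragments wherever the residue numbering jumps. How BioInterface detects breaks is not described here, and polypeptide builders such as Biopython's decide from inter-residue distances rather than numbering; gaps in numbering are used below only as a simpler stand-in. The function split_at_gaps and the toy numbering are illustrative.

```python
def split_at_gaps(resseqs):
    """Group sorted residue numbers into (start, end) runs of consecutive numbers."""
    fragments = []
    start = prev = resseqs[0]
    for num in resseqs[1:]:
        if num != prev + 1:  # a jump in numbering marks an unmodeled stretch
            fragments.append((start, prev))
            start = num
        prev = num
    fragments.append((start, prev))
    return fragments


# Toy chain: residues 3-96, 102-141 and 149-245 are modeled, the rest are missing.
modeled = list(range(3, 97)) + list(range(102, 142)) + list(range(149, 246))
fragments = split_at_gaps(modeled)
print(fragments)
```

Each fragment would become its own polypeptide, and therefore its own interface, unless interfaces are built per chain or per structure.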

pdbl = PDBList()
pdbl.retrieve_assembly_file(pdb_code="1BSS", assembly_num=1, pdir=".")

# parse and build structure with Biopython
parser = MMCIFParser()
structure = parser.get_structure(
    structure_id="1BSS", filename="1bss-assembly1.cif"
)

face_builder = InterfaceBuilder(search_radius=4.0)
face_list = face_builder.build_interfaces(entity=structure, by="polypeptide")

for face in face_list:
    print(face)
    print(face.get_binding_proteins())
<Interface protein_chains=A nucleic_chains=C,D contacts=41 search_radius=4.0>
[<Polypeptide start=3 end=96>]
<Interface protein_chains=A nucleic_chains=C contacts=35 search_radius=4.0>
[<Polypeptide start=102 end=141>]
<Interface protein_chains=A nucleic_chains=C,D contacts=84 search_radius=4.0>
[<Polypeptide start=149 end=245>]
<Interface protein_chains=B nucleic_chains=C,D contacts=55 search_radius=4.0>
[<Polypeptide start=19 end=97>]
<Interface protein_chains=B nucleic_chains=D contacts=40 search_radius=4.0>
[<Polypeptide start=102 end=140>]
<Interface protein_chains=B nucleic_chains=C,D contacts=63 search_radius=4.0>
[<Polypeptide start=147 end=245>]

And this is the result of building interfaces by chain:

face_list = face_builder.build_interfaces(entity=structure, by="chain")

for face in face_list:
    print(face)
    print(face.get_binding_proteins())
<Interface protein_chains=A,A,A nucleic_chains=C,D contacts=160 search_radius=4.0>
[<Polypeptide start=3 end=96>, <Polypeptide start=102 end=141>, <Polypeptide start=149 end=245>]
<Interface protein_chains=B,B,B nucleic_chains=C,D contacts=158 search_radius=4.0>
[<Polypeptide start=19 end=97>, <Polypeptide start=102 end=140>, <Polypeptide start=147 end=245>]

It’s also possible to build a single interface for the whole PDB structure.

face_list = face_builder.build_interfaces(entity=structure, by="structure")
face = face_list[0]
print(face)
print(face.get_binding_proteins())
<Interface protein_chains=A,A,A,B,B,B nucleic_chains=C,D contacts=318 search_radius=4.0>
[<Polypeptide start=3 end=96>,
 <Polypeptide start=102 end=141>,
 <Polypeptide start=149 end=245>,
 <Polypeptide start=19 end=97>,
 <Polypeptide start=102 end=140>,
 <Polypeptide start=147 end=245>]

In these cases, when using by="chain" or by="structure", you might want to concatenate the list of binding proteins into a single polypeptide per chain. The helper function concat_polypeptides does exactly this. It can also be useful to assign its result back to Interface.binding_pp_list, so that the interface itself reflects the concatenated polypeptides.

face_list = face_builder.build_interfaces(entity=structure, by="structure")
face = face_list[0]
print(face)
print(face.get_binding_proteins())

face.binding_pp_list = concat_polypeptides(face.get_binding_proteins())
print(face)
print(face.get_binding_proteins())
<Interface protein_chains=A,A,A,B,B,B nucleic_chains=C,D contacts=318 search_radius=4.0>
[<Polypeptide start=3 end=96>,
 <Polypeptide start=102 end=141>,
 <Polypeptide start=149 end=245>,
 <Polypeptide start=19 end=97>,
 <Polypeptide start=102 end=140>,
 <Polypeptide start=147 end=245>]
<Interface protein_chains=A,B nucleic_chains=C,D contacts=318 search_radius=4.0>
[<Polypeptide start=3 end=245>, <Polypeptide start=19 end=245>]