=====
Usage
=====

To use BioInterface in a Python project:

.. code-block:: python

    import biointerface

Extract All Protein-Nucleic Acid Interfaces
-------------------------------------------

You can extract all protein-nucleic acid interfaces from an entire structure.

.. code-block:: python

    from Bio.PDB.PDBList import PDBList
    from Bio.PDB.MMCIFParser import MMCIFParser
    from biointerface import InterfaceBuilder

    # retrieve file from PDB using Biopython
    pdbl = PDBList()
    pdbl.retrieve_assembly_file(pdb_code="1A02", assembly_num=1, pdir=".")
    # ... or else use your own

    # parse and build structure with Biopython
    parser = MMCIFParser()
    structure = parser.get_structure(
        structure_id="1A02", filename="1a02-assembly1.cif"
    )

    face_builder = InterfaceBuilder(search_radius=4.0)
    face_list = face_builder.build_interfaces(entity=structure)
    face_list

.. code-block:: console

    [, , ]

Extract Protein-Nucleic Acid Interfaces
---------------------------------------

You can also filter protein-nucleic acid interfaces by various properties.

.. code-block:: python

    # filter by protein chain
    faces_filt = [
        face
        for face in face_list
        if face.get_binding_protein()[0].parent.id == "F"
    ]
    faces_filt

.. code-block:: console

    []

Get All Interacting Residues
----------------------------

You can access all interacting residues in a protein-nucleic acid interface,
both amino acids and nucleotides.

.. code-block:: python

    face = faces_filt[0]
    face.get_aminoacids()

.. code-block:: console

    [, , , , , , , , , , ]

.. code-block:: python

    face.get_nucleotides()

.. code-block:: console

    [, , , , , , , , ]

Get All Interacting Atoms
-------------------------

You can access all interacting atoms in a protein-nucleic acid interface.

First, you can get all interacting atoms as atom pairs.

.. code-block:: python

    contacts = face.get_atomic_contacts()
    contacts[:5]

.. code-block:: console

    [(, ), (, ), (, ), (, ), (, )]

You can also get the interacting protein atoms and the interacting nucleic
acid atoms separately.

.. code-block:: python

    atoms = face.get_protein_atoms()
    atoms[:5]

.. code-block:: console

    [, , , , ]

.. code-block:: python

    atoms = face.get_nucleic_acid_atoms()
    atoms[:5]

.. code-block:: console

    [, , , , ]

Interface Data as DataFrame
---------------------------

You can get all protein-nucleic acid interface features as a ``pandas``
DataFrame, with one row per atomic contact.

.. code-block:: python

    df = face.as_dataframe()
    df.columns

.. code-block:: console

    Index(['protein_chain_id', 'prot_res_hetfield', 'prot_res_number',
           'prot_res_icode', 'prot_res_name', 'prot_atom_name',
           'prot_atom_altloc', 'prot_atom_element', 'prot_atom_coord_x',
           'prot_atom_coord_y', 'prot_atom_coord_z', 'dna_chain_id',
           'dna_res_hetfield', 'dna_res_number', 'dna_res_icode', 'dna_res_name',
           'dna_atom_name', 'dna_atom_altloc', 'dna_atom_element',
           'dna_atom_coord_x', 'dna_atom_coord_y', 'dna_atom_coord_z',
           'euclidean_distance'],
          dtype='object')

.. code-block:: python

    df

.. code-block:: console

       protein_chain_id prot_res_hetfield  prot_res_number prot_res_icode  ...  euclidean_distance
    0                 F                                148                 ...            3.964944
    1                 F                                148                 ...            3.271817
    2                 F                                148                 ...            3.719538
    3                 F                                148                 ...            3.183074
    4                 F                                148                 ...            3.976905
    ..              ...               ...              ...            ...  ...                 ...
    68                F                                150                 ...            3.279753
    69                F                                150                 ...            3.368906
    70                F                                150                 ...            3.584772
    71                F                                154                 ...            3.474709
    72                F                                154                 ...            3.930451

    [73 rows x 23 columns]

Nucleic Acid Binding Protein
----------------------------

BioInterface can extract the nucleic acid binding protein.

.. code-block:: python

    pp_list = face.get_binding_proteins()
    pp = pp_list[0]
    pp.get_sequence()

.. code-block:: console

    Seq('RRIRRERNKMAAAKSRNRRRELTDTLQAETDQLEDEKSALQTEIANLLKEKEK')

BioInterface can also extract the nucleic acid binding domain of the input
protein, defined as the shortest protein subsequence that contains all nucleic
acid binding amino acids.

.. code-block:: python

    bd_list = face.get_binding_domains()
    bd = bd_list[0]
    bd.get_sequence()

.. code-block:: console

    Seq('RERNKMAAAKSRNRR')

Protein-Bound Nucleic Acids
---------------------------

BioInterface can extract all nucleic acids bound by the input protein, as
instances of the ``NucleicAcid`` class from the package PDBNucleicAcids_.

.. code-block:: python

    face.get_bound_nucleic_acids()

.. code-block:: console

    [, ]

BioInterface can also extract all double strand nucleic acids bound by the
input protein, as instances of the ``DoubleStrandNucleicAcid`` class from the
package PDBNucleicAcids_.

.. code-block:: python

    face.get_bound_double_strands()

.. code-block:: console

    []

BioInterface can also trim each bound nucleic acid to the shortest subsequence
that contains all protein-bound nucleotides.

.. code-block:: python

    face.get_trimmed_nucleic_acids()

.. code-block:: console

    [, ]

The same applies to double strand nucleic acids, which are trimmed to the
shortest subsequence that contains all protein-bound base pairs.

.. code-block:: python

    face.get_trimmed_double_strands()

.. code-block:: console

    []

The length of the ``DoubleStrandNucleicAcid`` was halved because we extracted
only the portion of interest, the portion that binds directly to the protein.

**Note**: Classes from PDBNucleicAcids_ have useful methods of their own.

.. code-block:: python

    dsna_list = face.get_trimmed_double_strands()
    dsna = dsna_list[0]
    dsna.get_i_strand().get_seq()

.. code-block:: console

    Seq('TTTCATA')

Build Interfaces by Chain or Structure
--------------------------------------

By default, an interface corresponds to a single polypeptide. However, many PDB
files contain unmodeled protein segments, which cause a single protein chain to
be split into several polypeptides during the building process.

BioInterface can address this issue by building an interface from all the
polypeptides that belong to a single protein chain.

This is the default behavior:

.. code-block:: python

    pdbl = PDBList()
    pdbl.retrieve_assembly_file(pdb_code="1BSS", assembly_num=1, pdir=".")

    # parse and build structure with Biopython
    parser = MMCIFParser()
    structure = parser.get_structure(
        structure_id="1BSS", filename="1bss-assembly1.cif"
    )

    face_builder = InterfaceBuilder(search_radius=4.0)
    face_list = face_builder.build_interfaces(entity=structure, by="polypeptide")

    for face in face_list:
        print(face)
        print(face.get_binding_protein())

.. code-block:: console

    []
    []
    []
    []
    []
    []

And this is building interfaces by chain:

.. code-block:: python

    face_list = face_builder.build_interfaces(entity=structure, by="chain")

    for face in face_list:
        print(face)
        print(face.get_binding_proteins())

.. code-block:: console

    [, , ]
    [, , ]

It is also possible to build a single interface for the whole PDB structure.

.. code-block:: python

    face_list = face_builder.build_interfaces(entity=structure, by="structure")
    face = face_list[0]

    print(face)
    print(face.get_binding_proteins())

.. code-block:: console

    [, , , , , ]

When using ``by="chain"`` or ``by="structure"``, we might want to concatenate
the list of binding proteins into a single polypeptide. The helper function
``concat_polypeptides`` was developed for this task. It can also be helpful to
reset ``Interface.binding_pp_list`` with its result.

.. code-block:: python

    face_list = face_builder.build_interfaces(entity=structure, by="structure")
    face = face_list[0]

    print(face)
    print(face.get_binding_proteins())

    # replace the binding polypeptides with their concatenation
    face.binding_pp_list = concat_polypeptides(face.get_binding_proteins())

    print(face)
    print(face.get_binding_proteins())

.. code-block:: console

    [, , , , , ]
    [, ]

.. _PDBNucleicAcids: https://gitlab.com/MorfeoRenai/pdbnucleicacids
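
The three ``by`` modes differ only in how polypeptide fragments are grouped
into interfaces. The sketch below illustrates that grouping idea with plain
tuples. It is not BioInterface's implementation: ``group_fragments`` and the
fragment residue ranges are hypothetical, chosen only to mirror the
six-polypeptide, two-chain layout seen in the 1BSS output above.

.. code-block:: python

    # Hypothetical fragments: (chain_id, first_residue, last_residue).
    # The residue ranges are made up for illustration.
    fragments = [
        ("A", 1, 40), ("A", 55, 90), ("A", 101, 130),
        ("B", 1, 40), ("B", 55, 90), ("B", 101, 130),
    ]


    def group_fragments(fragments, by="polypeptide"):
        """Group fragments the way the ``by`` modes are described above."""
        if by == "polypeptide":
            # one group per fragment
            return [[frag] for frag in fragments]
        if by == "chain":
            # one group per chain identifier, in order of first appearance
            groups = {}
            for frag in fragments:
                groups.setdefault(frag[0], []).append(frag)
            return list(groups.values())
        if by == "structure":
            # a single group holding every fragment
            return [list(fragments)]
        raise ValueError(f"unknown grouping mode: {by!r}")


    for mode in ("polypeptide", "chain", "structure"):
        groups = group_fragments(fragments, by=mode)
        print(mode, [len(group) for group in groups])

The printed group sizes are six singletons, two groups of three, and one group
of six, which is the same shape as the lists of binding proteins shown for each
mode.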