biointerface package

Submodules

biointerface.core module

Core module for extracting Protein-DNA interfaces.

class biointerface.core.Interface(structure, protein_chain_id, search_radius=4.0)

Bases: object

Extract the protein-DNA interface of a single protein chain.

Parameters

structure : Bio.PDB.Structure

Biopython Structure entity.

protein_chain_id : str

Chain id of a protein that may interact with DNA.

search_radius : float | int, optional

Search radius, in Angstrom, within which protein-DNA interactions are detected. Default is 4.0.
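To make the role of search_radius concrete, here is a minimal sketch of a brute-force contact search, assuming each atom is reduced to a name, a molecule type and coordinates. SimpleAtom and find_contacts are hypothetical names used for illustration only; they are not part of biointerface, and the library may well use a different search strategy (Biopython's Bio.PDB.NeighborSearch, for instance, provides a k-d tree based search). Pairs are returned DNA atom first, matching the ordering documented for get_atomic_contacts.

```python
import math
from typing import NamedTuple


class SimpleAtom(NamedTuple):
    """Hypothetical stand-in for a Biopython Atom (not part of biointerface)."""

    name: str
    molecule: str  # "dna" or "protein"
    coord: tuple[float, float, float]


def find_contacts(
    atoms: list[SimpleAtom], search_radius: float = 4.0
) -> list[tuple[SimpleAtom, SimpleAtom]]:
    """Return (DNA atom, protein atom) pairs no farther apart than search_radius (Angstrom)."""
    dna = [a for a in atoms if a.molecule == "dna"]
    protein = [a for a in atoms if a.molecule == "protein"]
    contacts = []
    for d in dna:
        for p in protein:
            # Brute-force O(N*M) distance check: fine for a sketch, slow for large structures.
            if math.dist(d.coord, p.coord) <= search_radius:
                contacts.append((d, p))
    return contacts


atoms = [
    SimpleAtom("OP1", "dna", (0.0, 0.0, 0.0)),
    SimpleAtom("NZ", "protein", (3.0, 0.0, 0.0)),
    SimpleAtom("CA", "protein", (10.0, 0.0, 0.0)),
]
contacts = find_contacts(atoms)
print([(d.name, p.name) for d, p in contacts])  # [('OP1', 'NZ')]
```

With the default 4.0 Angstrom radius, only the NZ atom 3 Angstrom away is reported; the CA atom 10 Angstrom away is not.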

as_dataframe() -> DataFrame

Get all data from the interface, as a dataframe.

Contains the following data fields, for both the protein atom and the DNA atom of each contact: residue hetero field, residue number, residue insertion code, residue name, atom name, atom alternate location, atom element and atomic coordinates (x, y, z). It also contains the Euclidean distance between the two atoms in contact.

Returns

df : pd.DataFrame

All data from the interface.
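The fields listed above can be pictured as one flat row per contact. The sketch below assumes a simple AtomRecord dataclass and column names chosen for illustration (both hypothetical, not the actual as_dataframe columns); it flattens a DNA atom and a protein atom into a single dict and adds their Euclidean distance. A list of such dicts could then be passed to pandas.DataFrame.

```python
import math
from dataclasses import asdict, dataclass


@dataclass
class AtomRecord:
    """Hypothetical per-atom record holding the residue and atom fields listed above."""

    hetero_field: str  # Biopython convention: " " standard, "W" water, "H_<name>" hetero
    residue_number: int
    insertion_code: str
    residue_name: str
    atom_name: str
    altloc: str
    element: str
    coord: tuple[float, float, float]


def contact_row(dna: AtomRecord, protein: AtomRecord) -> dict:
    """Flatten one DNA-protein contact into a single row (illustrative column names)."""
    row = {}
    for prefix, record in (("dna", dna), ("protein", protein)):
        for key, value in asdict(record).items():
            if key == "coord":
                # Split the coordinates into separate x, y, z columns.
                for axis, component in zip("xyz", value):
                    row[f"{prefix}_{axis}"] = component
            else:
                row[f"{prefix}_{key}"] = value
    row["distance"] = math.dist(dna.coord, protein.coord)
    return row


dna_atom = AtomRecord(" ", 5, " ", "DA", "OP1", " ", "O", (0.0, 0.0, 0.0))
protein_atom = AtomRecord(" ", 42, " ", "LYS", "NZ", " ", "N", (3.0, 4.0, 0.0))
row = contact_row(dna_atom, protein_atom)
print(row["protein_residue_name"], row["distance"])  # LYS 5.0
```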

get_aminoacids() -> list[Residue | None]

Get only protein residues in the protein-DNA interface.

Returns

list[Residue]

List of protein residues in the interface.

get_atomic_contacts() -> list[tuple[Atom, Atom]]

Get interface contacts as pairs of atoms.

Returns

list[tuple[Atom, Atom]]

List of atom pairs; in each pair, the first atom belongs to the DNA and the second to the protein.
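Because every contact pair is ordered DNA atom first, the DNA-side and protein-side atoms of the interface can be recovered from the pairs alone. A minimal sketch of that idea, with strings standing in for Atom objects and a hypothetical split_contacts helper (not necessarily how get_dna_atoms and get_protein_atoms are implemented). Duplicates are removed while keeping first-seen order, since one atom can take part in several contacts.

```python
def split_contacts(contacts: list[tuple[str, str]]) -> tuple[list[str], list[str]]:
    """Split (DNA, protein) pairs into unique DNA and protein atoms, in first-seen order."""
    # dict.fromkeys keeps insertion order and drops repeated keys.
    dna_atoms = list(dict.fromkeys(dna for dna, _ in contacts))
    protein_atoms = list(dict.fromkeys(prot for _, prot in contacts))
    return dna_atoms, protein_atoms


contacts = [("OP1", "NZ"), ("OP2", "NZ"), ("OP1", "OG")]
print(split_contacts(contacts))  # (['OP1', 'OP2'], ['NZ', 'OG'])
```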

get_binding_domain(upstream_pad: int = 0, downstream_pad: int = 0, radius: int | float = 1.8, aa_only: int = 1)

Get the nucleic acid (NA) binding domain of the protein.

The output is the “gapped” subsequence of the full protein found in the structure that binds NAs.

This method allows “gaps” of unbound amino acids inside the binding domain: only the unbound amino acids at either end are trimmed.

A visual example of “gaps”:

    Input full protein:     MQMLLNHKPTKFNGAIDERFHWKVIQRISGSEG
    NA-bound:                          ****  **
    Output binding domain:             FNGAIDER

This method only infers the NA-binding domain: the output will likely overlap the annotated domain, but it will probably not cover all of it. A domain is defined by its folding properties, whereas this method only looks at which residues contact NAs. This is why I added optional “padding” on both ends of the binding domain, which makes the estimated extent of the domain more lenient.
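The trimming and padding described above can be sketched on a plain string with a per-residue boolean mask. trim_to_bound is a hypothetical helper for illustration, not the library's implementation, which works on residues from the structure. The usage reproduces the example above: positions 11 to 14 and 17 to 18 (0-based) of the input protein are marked as bound.

```python
def trim_to_bound(
    sequence: str, bound: list[bool], upstream_pad: int = 0, downstream_pad: int = 0
) -> str:
    """Return sequence from the first to the last bound position, internal gaps included,
    extended by upstream_pad/downstream_pad positions and clamped to the sequence ends."""
    if len(sequence) != len(bound):
        raise ValueError("sequence and bound must have the same length")
    bound_positions = [i for i, is_bound in enumerate(bound) if is_bound]
    if not bound_positions:
        return ""
    start = max(bound_positions[0] - upstream_pad, 0)
    end = min(bound_positions[-1] + downstream_pad, len(sequence) - 1)
    return sequence[start : end + 1]


protein = "MQMLLNHKPTKFNGAIDERFHWKVIQRISGSEG"
bound = [i in {11, 12, 13, 14, 17, 18} for i in range(len(protein))]
print(trim_to_bound(protein, bound))  # FNGAIDER
print(trim_to_bound(protein, bound, upstream_pad=2, downstream_pad=1))  # TKFNGAIDERF
```

Padding is clamped at the sequence ends, so a pad larger than the available flank simply reaches the first or last residue.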

Parameters

upstream_pad : int, optional

Number of non-binding residues upstream of the first binding residue to include in the binding domain. Allows some leniency on what is considered a binding domain. Default is 0.

downstream_pad : int, optional

Number of non-binding residues downstream of the last binding residue to include in the binding domain. Allows some leniency on what is considered a binding domain. Default is 0.

radius : int | float, optional

Maximum distance, in Angstrom, between the C atom of one residue and the N atom of the next for the two residues to be considered connected. Default is 1.8.

aa_only : int, optional

If 1, residues must be standard amino acids. Default is 1.

Returns

binding_domain : Polypeptide

Nucleic acid binding domain.
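The radius parameter governs when two consecutive residues count as connected. A minimal sketch of such a test, assuming each residue is reduced to the coordinates of its backbone C and N atoms (split_connected is a hypothetical helper): a distance larger than radius between the C atom of residue i and the N atom of residue i + 1 starts a new segment. A peptide C-N bond is about 1.33 Angstrom long, so the 1.8 default leaves some tolerance while still catching chain breaks. Biopython's PPBuilder uses the same kind of distance test, with the same 1.8 Angstrom default.

```python
import math

Coord = tuple[float, float, float]


def split_connected(
    c_coords: list[Coord], n_coords: list[Coord], radius: float = 1.8
) -> list[list[int]]:
    """Group residue indices into connected segments.

    Residues i and i + 1 are connected when the distance between the C atom of
    residue i and the N atom of residue i + 1 is at most radius (Angstrom).
    """
    if len(c_coords) != len(n_coords):
        raise ValueError("need one C and one N coordinate per residue")
    if not c_coords:
        return []
    segments = [[0]]
    for i in range(len(c_coords) - 1):
        if math.dist(c_coords[i], n_coords[i + 1]) <= radius:
            segments[-1].append(i + 1)  # peptide bond: extend the current segment
        else:
            segments.append([i + 1])  # chain break: start a new segment
    return segments


# Residues 0 -> 1 are bonded (1.33 A apart); residues 1 -> 2 have a 5 A break.
c_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (11.0, 0.0, 0.0)]
n_coords = [(-1.3, 0.0, 0.0), (1.33, 0.0, 0.0), (8.8, 0.0, 0.0)]
print(split_connected(c_coords, n_coords))  # [[0, 1], [2]]
```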

get_bound_double_strands() -> list[DoubleStrandNucleicAcid]

Get all double-strand nucleic acids bound by the protein.

The output double-stranded nucleic acids (DSNAs) are subsequences of the full DSNAs found in the structure, since proteins usually do not bind the whole DSNA.

This method allows “gaps” of unbound base pairs inside the DSNA: only the unbound base pairs at either end are trimmed.

A visual example of “gaps”:

    Input full DSNA:            GATATACAAGCCA
    Protein-bound:                ****  **
    Output protein-bound DSNA:    TATACAAG

Returns

bound_dsna_list : list[DoubleStrandNucleicAcid]

List of double-strand nucleic acids bound by the protein.
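For double strands the trimming works on base pairs rather than single residues, so both strands are cut at the same pairs. The sketch below assumes an ideal Watson-Crick duplex described by its forward strand alone, with the partner strand derived as the reverse complement. trim_double_strand is a hypothetical helper; the library's DoubleStrandNucleicAcid objects come from the structure itself, which this sketch does not model. The usage marks base pairs 2 to 5 and 8 to 9 (0-based) as bound, as in the example above.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def trim_double_strand(forward: str, bound_pairs: list[bool]) -> tuple[str, str]:
    """Trim unbound base pairs from both ends of an ideal DNA duplex.

    forward is written 5'->3'; the partner strand is its reverse complement.
    Returns (forward, reverse), both trimmed and written 5'->3'.
    """
    if len(forward) != len(bound_pairs):
        raise ValueError("need one bound flag per base pair")
    bound_positions = [i for i, is_bound in enumerate(bound_pairs) if is_bound]
    if not bound_positions:
        return "", ""
    # Cutting a pair removes one base from each strand, so both strands shrink together.
    core = forward[bound_positions[0] : bound_positions[-1] + 1]
    reverse = "".join(COMPLEMENT[base] for base in reversed(core))
    return core, reverse


dsna = "GATATACAAGCCA"
bound_pairs = [i in {2, 3, 4, 5, 8, 9} for i in range(len(dsna))]
print(trim_double_strand(dsna, bound_pairs))  # ('TATACAAG', 'CTTGTATA')
```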

get_dna_atoms() -> list[Atom]

Get only DNA atoms in the protein-DNA interface.

Returns

list[Atom]

List of DNA atoms in the interface.

get_nucleotides() -> list[Residue | None]

Get only DNA residues in the protein-DNA interface.

Returns

list[Residue]

List of DNA residues in the interface.

get_protein_atoms() -> list[Atom]

Get only protein atoms in the protein-DNA interface.

Returns

list[Atom]

List of protein atoms in the interface.

biointerface.core.build_interfaces(structure, search_radius=4.0) -> list[Interface]

Extract all Protein-DNA interfaces found in a structure.

Parameters

structure : Bio.PDB.Structure

Biopython Structure entity.

search_radius : float | int, optional

Search radius, in Angstrom, within which protein-DNA interactions are detected. Default is 4.0.

Returns

list[Interface]

List of all Protein-DNA interfaces found in a structure.
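Since Interface takes a single protein_chain_id, build_interfaces plausibly produces one interface per protein chain that contacts DNA. Under that assumption, the chain-selection step could look like the sketch below: keep the ids of protein chains with at least one atom within search_radius of a DNA atom. chains_in_contact and its plain-coordinate inputs are hypothetical and only illustrate the idea; this is not the library's code.

```python
import math

Coord = tuple[float, float, float]


def chains_in_contact(
    protein_chains: dict[str, list[Coord]],
    dna_coords: list[Coord],
    search_radius: float = 4.0,
) -> list[str]:
    """Return ids of protein chains with any atom within search_radius (Angstrom) of DNA."""
    # any() stops at the first qualifying atom pair, so each chain is listed at most once.
    return [
        chain_id
        for chain_id, coords in protein_chains.items()
        if any(math.dist(p, d) <= search_radius for p in coords for d in dna_coords)
    ]


protein_chains = {
    "A": [(0.0, 0.0, 3.5), (0.0, 0.0, 9.0)],  # one atom 3.5 A from the DNA
    "B": [(20.0, 0.0, 0.0)],  # 20 A away: no contact
}
dna_coords = [(0.0, 0.0, 0.0)]
print(chains_in_contact(protein_chains, dna_coords))  # ['A']
```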

Module contents

Top-level package for BioInterface.

class biointerface.Interface(structure, protein_chain_id, search_radius=4.0)

Exposed at the package level; its documentation is identical to that of biointerface.core.Interface above.

biointerface.build_interfaces(structure, search_radius=4.0) -> list[Interface]

Exposed at the package level; its documentation is identical to that of biointerface.core.build_interfaces above.