biointerface package¶

Submodules¶

biointerface.core module¶

Core module for extracting Protein-nucleic acid interfaces.

class biointerface.core.Interface(pp_atoms: list[Atom], na_atoms: list[Atom], search_radius: int | float = 4.0, pp_list: list[Polypeptide] = [], na_list: list[NucleicAcid] = [], dsna_list: list[DoubleStrandNucleicAcid] = [])[source]¶

Bases: object

Class for Protein-nucleic acid interface.

Parameters:

pp_atoms (list[Atom]) – Protein atoms.
na_atoms (list[Atom]) – Nucleic acid atoms.
search_radius (float | int) – Search radius, in ångströms, within which protein-nucleic acid interactions are searched for. Default is 4.0.
pp_list (list[Polypeptide]) – List of proteins found in the entity given to the builder. Default is an empty list.
na_list (list[NucleicAcid]) – List of nucleic acids found in the entity given to the builder. Default is an empty list.
dsna_list (list[DoubleStrandNucleicAcid]) – List of double stranded nucleic acids found in the entity given to the builder. Default is an empty list.

as_dataframe() → DataFrame[source]¶

Get all data from the interface, as a dataframe.

Contains the following data fields, for both the protein atom and the nucleic acid atom of each contact:

Residue hetero field
Residue number
Residue insertion code
Residue name
Atom name
Atom alternate location
Atom element
Atomic coordinates (x, y, z)

It also contains the Euclidean distance between the atom pair in contact.

Returns¶

df (pd.DataFrame): All data from the interface.

get_aminoacids() → list[Residue][source]¶

Get only protein residues in the protein-nucleic acid interface.

Returns¶

list[Residue]: List of protein residues in the interface.

get_atomic_contacts() → list[tuple[Atom, Atom]][source]¶

Get interface contacts as pairs of atoms.

Returns¶

list[tuple[Atom, Atom]]: List of atom pairs. In each pair, the first atom belongs to the nucleic acid and the second to the protein.
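To make the notion of a contact concrete, here is a minimal, self-contained sketch of a distance-based search. It is not the library's implementation: `SimpleAtom` and `find_contacts` are hypothetical names, atoms are reduced to a name and coordinates, the search is brute force, and treating the radius as inclusive is an assumption. It only illustrates the (nucleic acid atom, protein atom) ordering of the pairs and the role of a search radius in ångströms.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class SimpleAtom:
    """Hypothetical stand-in for an atom: a name and (x, y, z) coordinates."""

    name: str
    coord: tuple[float, float, float]


def find_contacts(na_atoms, pp_atoms, search_radius=4.0):
    """Return (nucleic acid atom, protein atom) pairs within search_radius.

    Brute-force illustration only; the inclusive comparison is an assumption.
    """
    return [
        (na, pp)
        for na in na_atoms
        for pp in pp_atoms
        if dist(na.coord, pp.coord) <= search_radius
    ]


# Toy coordinates in ångströms (made up for the illustration).
na_atoms = [SimpleAtom("OP1", (0.0, 0.0, 0.0)), SimpleAtom("N7", (20.0, 0.0, 0.0))]
pp_atoms = [SimpleAtom("NZ", (3.0, 0.0, 0.0)), SimpleAtom("CA", (9.0, 0.0, 0.0))]

contacts = find_contacts(na_atoms, pp_atoms, search_radius=4.0)
```

Only the OP1/NZ pair lies within 4.0 Å here, so `contacts` holds a single tuple with the nucleic acid atom first.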

get_binding_domains(upstream_pad: int = 0, downstream_pad: int = 0) → list[Polypeptide][source]¶

Get nucleic acid binding domains from the binding proteins.

The output is the binding “gapped” subsequence of the full protein found in the structure.

This method allows for “gaps” of unbound amino acids inside the binding domain: only the amino acids at the two ends are trimmed, depending on whether they are bound to nucleic acids (NAs) or not.

A visual example of “gaps”::

    Input full protein:     MQMLLNHKPTKFNGAIDERFHWKVIQRISGSEG
    NA-bound:                          ****  **
    Output binding domain:             FNGAIDER

This method only infers the NA-binding domain: the output will likely align with the annotated true domain, but it will probably not cover the whole domain. This is because a domain is defined by folding properties, while this method is much more naive. For this reason, the upstream_pad and downstream_pad parameters add some “padding” on both ends of the binding domain, which allows more leniency about its extent.

Parameters:

upstream_pad (int) – Number of non-binding residues, upstream of the first binding residue, to include in the binding domain. Allows some leniency on what is considered a binding domain. Default is 0.
downstream_pad (int) – Number of non-binding residues, downstream of the last binding residue, to include in the binding domain. Allows some leniency on what is considered a binding domain. Default is 0.

Returns¶

list[Polypeptide]: List of nucleic acid binding domains.
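The trimming and padding rule described for this method can be sketched on plain strings. This is an illustration under stated assumptions, not the library's code: `trim_to_bound` is a hypothetical helper, residues are single letters, and the bound positions are read off the visual example in this section (FNGA and ER bound, ID as an internal gap).

```python
def trim_to_bound(sequence, bound, upstream_pad=0, downstream_pad=0):
    """Trim a sequence to the span between its first and last bound positions.

    sequence: residues as a string (one letter per residue).
    bound: booleans of the same length, True where the residue is NA-bound.
    Unbound residues inside the span ("gaps") are kept; padding extends the
    span at either end and is clipped to the sequence boundaries.
    """
    if len(sequence) != len(bound):
        raise ValueError("sequence and bound must have the same length")
    bound_positions = [i for i, is_bound in enumerate(bound) if is_bound]
    if not bound_positions:
        return ""  # nothing is bound, so there is no domain to report
    start = max(bound_positions[0] - upstream_pad, 0)
    stop = min(bound_positions[-1] + 1 + downstream_pad, len(sequence))
    return sequence[start:stop]


protein = "MQMLLNHKPTKFNGAIDERFHWKVIQRISGSEG"
# Assumed bound residues: indices 11-14 (FNGA) and 17-18 (ER).
bound = [i in {11, 12, 13, 14, 17, 18} for i in range(len(protein))]

domain = trim_to_bound(protein, bound)
padded = trim_to_bound(protein, bound, upstream_pad=2, downstream_pad=3)
```

With no padding the sketch returns `FNGAIDER`; with two residues upstream and three downstream it returns `TKFNGAIDERFHW`.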

get_binding_proteins() → list[Polypeptide][source]¶

Get all nucleic acid binding proteins.

Returns¶

list[Polypeptide]: List of nucleic acid binding proteins.

get_bound_double_strands() → list[DoubleStrandNucleicAcid][source]¶

Get all double strand nucleic acids bound by the protein.

Returns¶

list[DoubleStrandNucleicAcid]: List of double strand nucleic acids bound by the protein.

get_bound_nucleic_acids() → list[NucleicAcid][source]¶

Get all nucleic acids bound by the protein.

Returns¶

list[NucleicAcid]: List of nucleic acids bound by the protein.

get_nucleic_acid_atoms() → list[Atom][source]¶

Get only nucleic acid atoms in the protein-nucleic acid interface.

Returns¶

list[Atom]: List of nucleic acid atoms in the interface.

get_nucleotides() → list[Residue][source]¶

Get only nucleic acid residues in the protein-nucleic acid interface.

Returns¶

list[Residue]: List of nucleic acid residues in the interface.

get_protein_atoms() → list[Atom][source]¶

Get only protein atoms in the protein-nucleic acid interface.

Returns¶

list[Atom]: List of protein atoms in the interface.

get_trimmed_double_strands() → list[DoubleStrandNucleicAcid][source]¶

Get all double-strand nucleic acids bound by the protein, but trimmed by binding.

The output double stranded nucleic acids (DSNAs) are subsequences of the full DSNAs found in the structure, since proteins usually do not bind the whole DSNA found in a PDB structure.

This method allows for “gaps” of unbound base pairs inside the DSNA: only the base pairs at the two ends are trimmed, depending on whether they are protein-bound or not.

A visual example of “gaps”:

```
Input full DSNA:            GATATACAAGCCA
                            TGGCTTGTATATC
Protein-bound:                ****  **
Output protein-bound DSNA:    TATACAAG
                              CTTGTATA
```

Returns¶

list[DoubleStrandNucleicAcid]: List of double stranded nucleic acids bound by the protein, but trimmed by binding.
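The same end-trimming idea applies to a duplex. The sketch below is illustrative only and makes several assumptions: `trim_double_strand` and `reverse_complement` are hypothetical helpers, the duplex is DNA with perfect Watson-Crick pairing, both strands are written 5'→3' (as in the visual example in this section), and the bound base pairs are indexed along the first strand.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand):
    """Return the reverse complement of a DNA strand written 5'->3'."""
    return strand.translate(COMPLEMENT)[::-1]


def trim_double_strand(strand, bound_pairs):
    """Trim a duplex to the span between its first and last bound base pairs.

    strand: first strand, written 5'->3'.
    bound_pairs: one boolean per base pair, indexed along the first strand.
    Returns both trimmed strands, each written 5'->3'.
    """
    positions = [i for i, is_bound in enumerate(bound_pairs) if is_bound]
    if not positions:
        return "", ""
    trimmed = strand[positions[0] : positions[-1] + 1]
    return trimmed, reverse_complement(trimmed)


strand = "GATATACAAGCCA"
# Assumed bound base pairs: indices 2-5 and 8-9 along the first strand.
bound_pairs = [i in {2, 3, 4, 5, 8, 9} for i in range(len(strand))]

first, second = trim_double_strand(strand, bound_pairs)
```

For this input the sketch yields `TATACAAG` and `CTTGTATA`, matching the visual example.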

get_trimmed_nucleic_acids() → list[NucleicAcid][source]¶

Get all nucleic acids bound by the protein, but trimmed by binding.

The output nucleic acids (NAs) are subsequences of the full NAs found in the structure, since proteins might not bind the whole NA.

This method allows for “gaps” of unbound nucleotides inside the NA: only the nucleotides at the two ends are trimmed, depending on whether they are protein-bound or not.

A visual example of “gaps”::

    Input full NA:            GATATACAAGCCA
    Protein-bound:              ****  **
    Output protein-bound NA:    TATACAAG

Returns¶

list[NucleicAcid]: List of nucleic acids bound by the protein, but trimmed by binding.

class biointerface.core.InterfaceBuilder(search_radius: float | int = 4.0, pp_builder: PPBuilder = <Bio.PDB.Polypeptide.PPBuilder object>, na_builder: NABuilder = <PDBNucleicAcids.NucleicAcid.NABuilder object>, dsna_builder: DSNABuilder = <PDBNucleicAcids.NucleicAcid.DSNABuilder object>)[source]¶

Bases: object

Use atomic distance to find Protein-Nucleic acid interfaces.

By default, only standard nucleotides and amino acids are considered.

Parameters¶

search_radius (float | int, optional) – Search radius, in ångströms, within which protein-nucleic acid interactions are searched for. Default is 4.0.
pp_builder (PPBuilder, optional) – Polypeptide builder from Biopython. Default is PPBuilder with default parameters.
na_builder (NABuilder, optional) – Nucleic acid builder from PDBNucleicAcids. Default is NABuilder with default parameters.
dsna_builder (DSNABuilder, optional) – Double-stranded nucleic acid builder from PDBNucleicAcids. Default is DSNABuilder with default parameters.

build_interfaces(entity: Structure | Model | Chain, by: Literal['polypeptide', 'chain', 'structure']='polypeptide', standard_aminoacids: bool = True, standard_nucleotides: bool = True, pairing_rules: BasePairRules = <PDBNucleicAcids.BasePairRules.WatsonCrickBasePairRules object>) → list[Interface][source]¶

Extract all Protein-Nucleic acid interfaces found in a PDB entity.

Parameters¶

entity (Structure, Model or Chain) – Object in which protein-nucleic acid interfaces are searched for. A Structure is the suggested input.
by (str, optional) – If ‘polypeptide’, each interface is extracted between one polypeptide and the nucleic acids it binds. If ‘chain’, each interface is extracted between one protein chain, composed of one or more polypeptides, and the nucleic acids it binds. If ‘structure’, interfaces are extracted between all protein chains present in the structure (most likely several polypeptides) and the nucleic acids they bind. Default is ‘polypeptide’.
standard_aminoacids (bool, optional) – Use only standard amino acids. This is the aa_only parameter of the PPBuilder.build_peptides() method. Default is True.
standard_nucleotides (bool, optional) – Use only standard nucleotides. This parameter is used by the NABuilder.build_nucleic_acids() and DSNABuilder.build_double_strands() methods. Default is True.
pairing_rules (BasePairRules, optional) – Base-pairing rules, given as a class instance from PDBNucleicAcids. This parameter is used by the DSNABuilder.build_double_strands() method. Default is WatsonCrickBasePairRules() with default parameters.

Raises¶

PDBConstructionException: In case there is no protein in the input entity.

PDBConstructionException: In case there is no nucleic acid in the input entity.

Returns¶

list[Interface]: List of all Protein-Nucleic acid interfaces found in a PDB entity.
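A possible end-to-end use of the builder, based only on the signatures documented on this page, might look like the sketch below. The helper names `extract_interfaces` and `summarize`, the structure id "example", and the file path passed by the caller are all hypothetical; the imports are kept inside the function so the sketch can be loaded even where Biopython or biointerface is not installed.

```python
def extract_interfaces(pdb_path, search_radius=4.0, by="polypeptide"):
    """Parse a PDB file and return its protein-nucleic acid interfaces.

    pdb_path is supplied by the caller; nothing about its content is assumed
    beyond it being a PDB-format file readable by Biopython.
    """
    # Local imports: optional dependencies for this illustration.
    from Bio.PDB import PDBParser
    from biointerface import InterfaceBuilder

    structure = PDBParser(QUIET=True).get_structure("example", pdb_path)
    builder = InterfaceBuilder(search_radius=search_radius)
    return builder.build_interfaces(structure, by=by)


def summarize(interfaces):
    """Return (number of atomic contacts, binding proteins) per interface."""
    return [
        (len(interface.get_atomic_contacts()), interface.get_binding_proteins())
        for interface in interfaces
    ]
```

As documented under Raises, a call on an entity without protein or without nucleic acid is expected to raise PDBConstructionException, so callers may want to catch it when scanning many structures.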

biointerface.core.concat_polypeptides(pp_list: list[Polypeptide]) → list[Polypeptide][source]¶

Module contents¶

Top-level package for BioInterface.

The following objects are also available at the package level, with the same signatures and documentation as their biointerface.core counterparts above:

biointerface.Interface
biointerface.InterfaceBuilder
biointerface.concat_polypeptides
