Sequence format transformation


Sequence format transformation is a computational process that converts peptide sequences between five formats: one-letter code, IUPAC condensed, amino acid chain, graph representation, and sequence graph. The details of the algorithm are available in the dfwlab/cyclicpepedia repository on GitHub.
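
As an illustration of the simplest conversion, the sketch below expands a one-letter sequence into three-letter codes joined with the "--" separator of the amino acid chain format. The function name `one_letter_to_chain` and its `separator` parameter are hypothetical and are not taken from the CyclicPepedia code; only the standard one-letter to three-letter mapping for the 20 proteinogenic amino acids is assumed.

```python
# Standard one-letter to three-letter codes for the 20 proteinogenic amino acids.
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def one_letter_to_chain(sequence, separator="--"):
    """Hypothetical helper: expand a one-letter sequence into three-letter codes."""
    try:
        return separator.join(ONE_TO_THREE[residue] for residue in sequence.upper())
    except KeyError as err:
        # Non-standard letters are rejected rather than silently skipped.
        raise ValueError(f"unknown one-letter code: {err.args[0]!r}") from None


print(one_letter_to_chain("FGIKPPQR"))
# Phe--Gly--Ile--Lys--Pro--Pro--Gln--Arg
```

Because the one-letter code carries no ring information, the result is a linear chain; ring closures have to come from one of the richer formats.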

Format | Example | Detail
--- | --- | ---
One-letter code | FGIKPPQR | The simplest representation of a peptide sequence. It ignores all loops (ring closures).
IUPAC condensed | cyclo[DL-N(Me)Ala-DL-Leu-N(Me)Phe(a,b-dehydro)-Gly] | Developed by the International Union of Pure and Applied Chemistry (IUPAC). The prefix "cyclo" indicates a head-to-tail cyclization. The amino acids are written as standard three-letter codes separated by "-". Modifications are written inline: for example, "D" and "L" give the chirality of an amino acid, and ring closure bonds are written as "(num)". Multiple chains are separated by ".", for example: D-N(1)Ala-Arg(CONHMe)-N(Me)Phe-Asp(2)-OH.N(2)Asp(1)-OH.
Amino acid chain | Gly(1)--Cys(2)--Asn--4OH-Pro--Ile--Trp(2)--Gly--Ile(1) | Defined by CyclicPepedia and largely consistent with IUPAC condensed. The separator is changed to "--" so that amino acid units (monomers) whose names contain "-" can be represented.
Graph representation | aThr,Tyr,dhAbu,bOH-Gln,Gly,Gln,His,Dab,C13:2(t4.t6)-OH(2.3),Lyx,dhAbu @1,5 @6,10 @0,8 | Inspired by the NOR format of the Norine database. Monomers are separated by commas, and ring closure bonds are written as "@idx,idy".
Sequence graph | G(nodes=[]; edges=[]) | A graph built with networkx from a list of nodes and a list of edges. For example: nodes = [(0, 'Gly'), (1, '4OH-Pro'), (2, 'Ala')], edges = [(0, 1), (1, 2), (0, 2)].
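
The relationship between the amino acid chain, sequence graph, and graph representation formats can be sketched in a few lines. This is a minimal illustration, not the CyclicPepedia implementation: the helper names `chain_to_graph` and `to_graph_representation` are hypothetical. The sketch assumes that a ring-closure label is a single trailing "(num)" on a monomer, that consecutive monomers are joined by an implicit backbone edge, and that the graph representation lists only the ring-closure bonds, with zero-based indices.

```python
import re

# Assumption: a ring-closure label is a single trailing "(number)" on a monomer.
CLOSURE_LABEL = re.compile(r"\((\d+)\)$")


def chain_to_graph(chain):
    """Hypothetical helper: amino acid chain string -> (nodes, edges)."""
    nodes, open_labels, closures = [], {}, []
    for index, token in enumerate(chain.split("--")):
        match = CLOSURE_LABEL.search(token)
        if match:
            token = token[:match.start()]  # strip the label from the monomer name
            label = match.group(1)
            if label in open_labels:
                # Second occurrence of a label closes the ring.
                closures.append((open_labels.pop(label), index))
            else:
                open_labels[label] = index
        nodes.append((index, token))
    if open_labels:
        raise ValueError(f"unpaired ring-closure labels: {sorted(open_labels)}")
    # Assumption: consecutive monomers are linked by a backbone edge.
    backbone = [(i, i + 1) for i in range(len(nodes) - 1)]
    return nodes, backbone + closures


def to_graph_representation(nodes, edges):
    """Hypothetical helper: monomers joined by commas, then closures as '@i,j'."""
    monomers = ",".join(name for _, name in nodes)
    # Edges between neighbouring indices are treated as backbone and not listed.
    closures = [f"@{i},{j}" for i, j in edges if abs(i - j) != 1]
    return " ".join([monomers] + closures)


nodes, edges = chain_to_graph("Gly(1)--Cys(2)--Asn--4OH-Pro--Ile--Trp(2)--Gly--Ile(1)")
print(nodes)
print(edges)
print(to_graph_representation(nodes, edges))
# Gly,Cys,Asn,4OH-Pro,Ile,Trp,Gly,Ile @1,5 @0,7
```

The edge list produced here can be passed to networkx's `Graph.add_edges_from`; the monomer names would then need to be attached to the nodes as attributes.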