* All technologies are under intellectual property protection.


Key Technologies

Protein Folding Shape Code (PFSC)

Protein Folding Shape Code (PFSC) is a one-dimensional alphabetic description of the folding vector formed by five consecutive residues. Any protein with a known 3D structure can be expressed as a PFSC string. Moving along the alpha-carbon (Cα) backbone from the N-terminus to the C-terminus, the folding conformation of the entire protein is described by a one-dimensional string of digitized alphabetic letters.
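As a minimal sketch of the windowing idea only, the Python below slides a five-residue window along a list of Cα coordinates and emits one letter per window. The classifier `toy_classifier`, its 7 Å and 12 Å cut-offs and its letters h, e and x are hypothetical placeholders, not the PFSC assignment rules, which use 27 folding vectors. The sketch also emits one letter per complete window, so a chain of N residues gives N - 4 letters; how PFSC handles the chain ends is not shown here.

```python
import math
from typing import Callable, List, Sequence, Tuple

Coord = Tuple[float, float, float]
Window = Tuple[Coord, ...]


def five_residue_windows(ca: Sequence[Coord]) -> List[Window]:
    """All windows of 5 consecutive C-alpha coordinates, N- to C-terminus."""
    return [tuple(ca[i:i + 5]) for i in range(len(ca) - 4)]


def toy_classifier(window: Window) -> str:
    """HYPOTHETICAL stand-in for the PFSC vector assignment.

    Looks only at the distance between the 1st and 5th C-alpha atoms:
    compact windows -> "h", extended windows -> "e", anything else -> "x".
    The real PFSC uses 27 folding vectors, not these three toy letters.
    """
    d = math.dist(window[0], window[4])
    if d < 7.0:
        return "h"
    if d > 12.0:
        return "e"
    return "x"


def encode(ca: Sequence[Coord],
           classify: Callable[[Window], str] = toy_classifier) -> str:
    """Map every 5-residue window to one letter, giving a 1D string."""
    return "".join(classify(w) for w in five_residue_windows(ca))


# Usage: a fully extended toy chain, 3.8 Å between consecutive C-alpha atoms.
chain = [(3.8 * i, 0.0, 0.0) for i in range(7)]
print(encode(chain))  # eee  (7 residues -> 3 windows)
```

Passing a different `classify` function is where a real folding-vector assignment would plug in.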

PFSC provides a complete description of protein conformation, covering both regular secondary-structure fragments and irregular tertiary fragments. The PFSC assignment of secondary-structure fragments agrees overall with the assignments recorded in the PDB, and it also reflects the flexibility in folding shape and length of those fragments.

PFSC serves as a fingerprint of protein conformation. It supports structural alignment for comparing proteins, and it can be applied to high-throughput screening of large protein structure datasets.


Protein Structure Fingerprint

The Protein Structure Fingerprint is a one-dimensional digitized description of a protein that combines its sequence, structural folding conformation and physicochemical properties. The fingerprint is generated with the Protein Folding Shape Code (PFSC). With this approach, researchers can effectively compare protein structures and assess their similarity. More importantly, it enables high-throughput screening of protein databases to obtain lead information for drug discovery and for the development of molecular markers.

Protein Structure Fingerprint for a fragment
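A minimal sketch of how such a fingerprint might be stored and screened, assuming a layout the text does not specify: the hypothetical `Fingerprint` class keeps the sequence and the PFSC letters as two aligned tracks (a physicochemical track could be added the same way), and the hypothetical `screen` function slides a query fragment over each database entry, keeping windows whose residue identity and folding-letter identity both reach the given thresholds. The letters in the usage line are placeholders.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Fingerprint:
    """HYPOTHETICAL fingerprint layout: aligned tracks, one character per position."""
    sequence: str  # one-letter amino-acid codes
    pfsc: str      # one folding-shape letter per position

    def __post_init__(self) -> None:
        if len(self.sequence) != len(self.pfsc):
            raise ValueError("sequence and pfsc tracks must have equal length")

    def window(self, start: int, length: int) -> "Fingerprint":
        """Sub-fingerprint covering positions start .. start + length - 1."""
        return Fingerprint(self.sequence[start:start + length],
                           self.pfsc[start:start + length])


def identity(a: Fingerprint, b: Fingerprint) -> Tuple[float, float]:
    """Fraction of positions where residues match and where folding letters match."""
    n = len(a.sequence)
    seq = sum(x == y for x, y in zip(a.sequence, b.sequence)) / n
    fold = sum(x == y for x, y in zip(a.pfsc, b.pfsc)) / n
    return seq, fold


def screen(query: Fingerprint, database: Dict[str, Fingerprint],
           min_seq: float, min_fold: float) -> List[Tuple[str, int, float, float]]:
    """Slide the query over every entry; keep windows passing both thresholds."""
    k = len(query.sequence)
    hits = []
    for name, fp in database.items():
        for start in range(len(fp.sequence) - k + 1):
            seq, fold = identity(query, fp.window(start, k))
            if seq >= min_seq and fold >= min_fold:
                hits.append((name, start, seq, fold))
    return hits


db = {"protein_1": Fingerprint("GGACDEGG", "XXAAKBXX")}
print(screen(Fingerprint("ACDE", "AAKB"), db, 1.0, 1.0))  # [('protein_1', 2, 1.0, 1.0)]
```

Lowering `min_seq` while keeping `min_fold` high would surface fragments that share a folding shape despite a different sequence.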


Probe Binding Site for Drug Discovery (PBSDD)


A drug binding site can be described by its protein structure fingerprint, which indicates the locations of the fragment motifs around the drug-binding pocket as well as the residues with stronger interactions. The amino acid sequence, folding conformation, physicochemical properties and other features of the binding site are expressed explicitly in the fingerprint.

Building on the protein structure fingerprint, the Probe Binding Site for Drug Discovery (PBSDD) technology was developed. The PBSDD platform can efficiently perform high-throughput screening of large volumes of protein data, acquire lead information for drug discovery and guide the direction of bioassays. For a specific drug molecule, the platform can identify multiple protein targets and new therapeutic indications, and predict the drug's side effects. It also provides information for studying drug mechanisms and drug pathways.
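One way to picture fingerprint-based binding-site screening is sketched below, under loudly stated assumptions: a binding site is reduced to a list of PFSC fragment motifs, and each protein in a hypothetical `library` is scored by the fraction of those motifs that occur in its PFSC string. The function names and the 0.5 cut-off are illustrative only, and exact substring matching is a deliberate simplification; a real screen could score each motif with a similarity-aware alignment instead.

```python
from typing import Dict, List, Tuple


def motif_coverage(site_motifs: List[str], pfsc: str) -> float:
    """Fraction of binding-site motifs found anywhere in a protein's PFSC string."""
    if not site_motifs:
        return 0.0
    return sum(motif in pfsc for motif in site_motifs) / len(site_motifs)


def rank_candidates(site_motifs: List[str], library: Dict[str, str],
                    cutoff: float = 0.5) -> List[Tuple[str, float]]:
    """Score every protein, keep those at or above the cutoff, best first."""
    scored = [(name, motif_coverage(site_motifs, pfsc))
              for name, pfsc in library.items()]
    kept = [item for item in scored if item[1] >= cutoff]
    # Sort by descending coverage, then by name for a stable, readable order.
    return sorted(kept, key=lambda item: (-item[1], item[0]))


site = ["AAK", "BXC"]  # placeholder motif letters
library = {"p1": "ZZAAKZZBXCZ", "p2": "AAKQQ", "p3": "QQQ"}
print(rank_candidates(site, library))  # [('p1', 1.0), ('p2', 0.5)]
```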


Protein Folding Variation Matrix (PFVM)


The Protein Folding Variation Matrix (PFVM) assembles all possible local folding variations along a protein sequence. The PFVM has several prominent features. First, it shows the fluctuation patterns in the number of folding variations and in the folding shapes along the sequence, revealing how protein folding is related to the order of amino acids in the sequence. Second, because the description is simplified to alphabetic letters, all folding variations of an entire protein can be grasped at a glance within the PFVM. Third, the astronomical number of possible conformations can be determined from the local folding variations in the PFVM, so the total number of conformations is no longer ambiguous for any protein. Finally, any possible folding conformation, especially the most probable ones, can be obtained for protein structure prediction. This approach therefore provides significant information for further study of protein folding.
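The counting and selection ideas above can be sketched in Python, assuming a hypothetical PFVM layout in which each sequence position maps every observed folding letter to an occurrence count. Under the further simplifying assumption that positions vary independently, the total number of conformations is the product of the number of letters in each column, and a most probable conformation takes the most frequent letter at each position. Because neighbouring five-residue windows overlap, not every combination may be physically consistent, so the product is best read as an upper bound.

```python
import math
from typing import Dict, List

# HYPOTHETICAL PFVM layout: one column per sequence position, mapping each
# observed folding letter to how often it occurs at that position.
PFVM = List[Dict[str, int]]


def total_conformations(pfvm: PFVM) -> int:
    """Number of letter combinations, assuming positions vary independently."""
    return math.prod(len(column) for column in pfvm)


def most_probable(pfvm: PFVM) -> str:
    """Most frequent letter at each position (ties broken alphabetically)."""
    return "".join(min(column, key=lambda letter: (-column[letter], letter))
                   for column in pfvm)


pfvm = [{"A": 5, "B": 1}, {"K": 2, "A": 2, "C": 1}, {"B": 4}]
print(total_conformations(pfvm))  # 6
print(most_probable(pfvm))        # AAB
```

Python integers are unbounded, so the product stays exact even when it is astronomical: 300 positions with 5 variants each give 5**300, a number with 210 digits.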


Protein Folding Shape Alignment

With the one-dimensional PFSC alphabetic description, protein conformations can be compared by the protein folding shape alignment (PFSA) approach (Yang, 2011). As in sequence alignment, the PFSC strings of two proteins are aligned to find their similarity. PFSA uses the Needleman-Wunsch dynamic programming algorithm (Needleman & Wunsch, 1970) for the structural alignment, so the structural similarity of two proteins can be discovered by alphabetic alignment of their PFSC strings.

In the PFSA approach, a substitution matrix for the 27 PFSC vectors is defined according to the similarity relationships among the vectors. Each element S[i, j] of the substitution matrix S is determined by the similarity between PFSC[i] and PFSC[j], which follows from the integrated relationships among the 27 PFSC vectors (Yang, 2008). For identical folding shapes, S[i, i] = 2; for analogous folding shapes, S[i, j] = 1; and for different folding shapes, S[i, j] = 0. The substitution matrix S is displayed below.
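The 2/1/0 rule for S can be sketched in Python. The actual analogy relationships come from the integrated relationships among the 27 PFSC vectors (Yang, 2008) and are not reproduced in this text, so two things below are assumptions: the 27 symbols are written as "A" to "Z" plus "$", and the analogy groups are simply consecutive triples of that alphabet, a hypothetical placeholder.

```python
from typing import Dict, List

# ASSUMPTION: the 27 PFSC symbols written as "A".."Z" plus "$".
ALPHABET = [chr(c) for c in range(ord("A"), ord("Z") + 1)] + ["$"]

# HYPOTHETICAL analogy groups (consecutive triples), for illustration only.
ANALOGOUS_GROUPS = [ALPHABET[i:i + 3] for i in range(0, 27, 3)]


def build_substitution_matrix(alphabet: List[str],
                              groups: List[List[str]]) -> Dict[str, Dict[str, int]]:
    """S[a][b] = 2 if identical, 1 if in the same analogy group, else 0."""
    group_of = {letter: index for index, group in enumerate(groups) for letter in group}
    matrix: Dict[str, Dict[str, int]] = {}
    for a in alphabet:
        row = {}
        for b in alphabet:
            if a == b:
                row[b] = 2
            elif a in group_of and group_of.get(b) == group_of[a]:
                row[b] = 1
            else:
                row[b] = 0
        matrix[a] = row
    return matrix


S = build_substitution_matrix(ALPHABET, ANALOGOUS_GROUPS)
print(S["A"]["A"], S["A"]["B"], S["A"]["D"])  # 2 1 0
```

A dictionary of dictionaries keeps lookups by letter readable; swapping in the real groups only changes `ANALOGOUS_GROUPS`.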

Similarity score: once the optimal alignment is found, the protein structural similarity score is calculated. Each match of identical folding shapes scores 2, each analogous pair scores 1 and each different pair scores 0; opening a gap carries a penalty of -2 and extending a gap a penalty of -0.25. The protein folding structure alignment score (PFSA-S) is the total contribution of identical folding shapes, analogous folding shapes and gaps, normalized with the following function:

$$\mathrm{PFSA\text{-}S} = \frac{2\,\mathrm{IDFS} + \mathrm{ANFS} - 2\,\mathrm{GPO} - 0.25\,\mathrm{GPE}}{2\,\mathrm{TSQ}}$$

Here IDFS is the number of identical folding shapes, ANFS the number of analogous folding shapes, GPO the number of open gaps, GPE the number of extended gaps and TSQ the length of the protein's PFSC string. The denominator, 2 × TSQ, ensures that PFSA-S equals one when two identical structures are compared. As the similarity between two protein structures decreases, PFSA-S decreases. When two proteins share little similarity, the alignment produces a larger number of gaps, which may make PFSA-S negative, signifying that no noteworthy similarity exists. For normalization, PFSA-S is limited to values greater than or equal to zero, so any negative value is set to zero. The PFSA approach therefore provides a normalized score between zero and one for evaluating protein structural similarity.
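The scoring step can be sketched in Python. This is a hedged illustration rather than the published PFSA code: it uses Gotoh's affine-gap form of Needleman-Wunsch, assumes a gap of length L counts as one opening (-2) plus L - 1 extensions (-0.25 each), takes TSQ to be the length of the query string, and replaces the 27 × 27 matrix with a hypothetical `GROUP` map of analogous shapes. Under those assumptions the optimal alignment score equals the numerator 2·IDFS + ANFS - 2·GPO - 0.25·GPE, so dividing it by 2·TSQ and clipping at zero gives PFSA-S.

```python
from typing import Dict

# HYPOTHETICAL analogy groups for illustration; the real matrix is not reproduced.
GROUP: Dict[str, int] = {"A": 0, "B": 0, "C": 0, "K": 1, "L": 1}
NEG = float("-inf")


def substitution(a: str, b: str) -> int:
    """2 for identical, 1 for analogous, 0 for different folding shapes."""
    if a == b:
        return 2
    if a in GROUP and GROUP.get(b) == GROUP[a]:
        return 1
    return 0


def global_affine_score(x: str, y: str, gap_open: float = 2.0,
                        gap_extend: float = 0.25) -> float:
    """Needleman-Wunsch global alignment with affine gaps (Gotoh's recurrences).

    A gap of length L costs gap_open + gap_extend * (L - 1).
    """
    n, m = len(x), len(y)
    # M: ends with x[i-1] paired to y[j-1]; X: x[i-1] against a gap; Y: y[j-1] against a gap.
    M = [[NEG] * (m + 1) for _ in range(n + 1)]
    X = [[NEG] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = -gap_open - gap_extend * (i - 1)
    for j in range(1, m + 1):
        Y[0][j] = -gap_open - gap_extend * (j - 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = max(M[i-1][j-1], X[i-1][j-1], Y[i-1][j-1]) + substitution(x[i-1], y[j-1])
            X[i][j] = max(M[i-1][j] - gap_open, X[i-1][j] - gap_extend, Y[i-1][j] - gap_open)
            Y[i][j] = max(M[i][j-1] - gap_open, Y[i][j-1] - gap_extend, X[i][j-1] - gap_open)
    return max(M[n][m], X[n][m], Y[n][m])


def pfsa_s_from_counts(idfs: int, anfs: int, gpo: int, gpe: int, tsq: int) -> float:
    """PFSA-S from alignment counts, clipped at zero."""
    raw = (2 * idfs + anfs - 2 * gpo - 0.25 * gpe) / (2 * tsq)
    return max(raw, 0.0)


def pfsa_s(query: str, target: str) -> float:
    """Optimal score normalized by 2 * TSQ (TSQ = query length), clipped at zero."""
    return max(global_affine_score(query, target) / (2 * len(query)), 0.0)


print(pfsa_s("AAKKB", "AAKKB"))  # 1.0
print(pfsa_s("AAKKB", "AAKB"))   # 0.6
print(pfsa_s("AK", "BL"))        # 0.5
```

For example, `pfsa_s("AAKKB", "AAKB")` pairs four identical shapes and opens one gap, so the score is (2·4 - 2)/(2·5) = 0.6, the same value `pfsa_s_from_counts(4, 0, 1, 0, 5)` gives from the counts.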

* Yang J & Lee WH, Protein Structure Alphabetic Alignment, Protein Structure, Edited by Eshel Faraggi, InTech Publishers, (ISBN 978-953-51-0555-8), 2012, 133-156
