An Integrated Approach for Statistical Genome Sequence Analysis between Genetic Datasets

Authors

Hassan Mathkour, Muneer Ahmad, Hassan Mahmood Khan

Corresponding author email: binmathkour@yahoo.com

Published: 17 Jan 2014
Volume: IJtech Vol. 1, No. 1 (2010)
DOI: https://doi.org/10.14716/ijtech.v1i1.31

Cite this article as:
Mathkour, H., Ahmad, M., Khan, H.M., 2010. An Integrated Approach for Statistical Genome Sequence Analysis between Genetic Datasets. International Journal of Technology. Volume 1(1), pp. 1-10

Hassan Mathkour: Department of Computer Science, College of Computer and Information Sciences, King Saud University, P.O. Box 51178, Riyadh 11543, Kingdom of Saudi Arabia
Muneer Ahmad: Department of Computer Science, College of Computer and Information Sciences, King Saud University, P.O. Box 51178, Riyadh 11543, Kingdom of Saudi Arabia
Hassan Mahmood Khan: Department of Computer Science, College of Computer and Information Sciences, King Saud University, P.O. Box 51178, Riyadh 11543, Kingdom of Saudi Arabia

Abstract

Genome sequence analysis of genetic datasets using open reading frame (ORF) techniques is an active area of bioinformatics research. Much current work focuses on comparative analysis of the genetic behavior and diversity of different species. Rather than analyzing whole genome sequences, researchers are increasingly concentrating on layered analysis to gain better insight into the relationships among genetic datasets, which should lead to a better understanding of the species involved. This paper presents an ORF-based statistical analysis of the genetic datasets of Chimaera monstrosa and Polyodontidae. The analysis uses a hybrid approach that combines a generic mechanism for statistical analysis with a specific approach designed to improve performance. First, the genetic datasets are refined (pre-processed) for use at the next level. The refined sets are then passed through layers of filters that perform DNA-to-protein translation, and the statistical comparison is carried out during this translation. This layered architecture helps clarify the degree of similarity and difference between the genomic sequences.
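The abstract names the layers of the pipeline (pre-processing filter, ORF detection, DNA-to-protein translation, codon counting, statistical comparison) but not the algorithms behind them. The Python sketch below illustrates one way such layers could fit together; it is not the authors' implementation. Assumptions: the standard genetic code (NCBI translation table 1) is used, ORFs are ATG-to-stop runs on the three forward-strand frames only, the 30-codon minimum length is a hypothetical threshold, and total variation distance between amino-acid compositions stands in for the paper's unspecified comparison statistic. All function names are hypothetical.

```python
from collections import Counter

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def preprocess(fasta_text):
    """Pre-processing filter: drop FASTA headers and whitespace, uppercase, U -> T."""
    lines = [ln for ln in fasta_text.splitlines() if not ln.startswith(">")]
    return "".join("".join(lines).split()).upper().replace("U", "T")


def find_orfs(seq, min_codons=30):
    """Forward-strand ORFs as (start, end, frame); end includes the stop codon.

    min_codons=30 is a hypothetical threshold, not a value from the paper.
    """
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # first in-frame start codon opens an ORF
            elif start is not None and CODON_TABLE.get(codon) == "*":
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, frame))
                start = None  # stop codon closes the ORF
    return orfs


def translate(orf_seq):
    """DNA-to-protein translation up to (not including) the first stop codon."""
    protein = []
    for pos in range(0, len(orf_seq) - 2, 3):
        aa = CODON_TABLE.get(orf_seq[pos:pos + 3], "X")  # X marks ambiguous codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def codon_counts(seq, orfs):
    """Count every codon (stop codon included) inside the given ORFs."""
    counts = Counter()
    for start, end, _frame in orfs:
        for pos in range(start, end, 3):
            counts[seq[pos:pos + 3]] += 1
    return counts


def aa_composition(proteins):
    """Relative amino-acid frequencies pooled over all translated ORFs."""
    counts = Counter("".join(proteins))
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()} if total else {}


def composition_distance(p, q):
    """Total variation distance (0 = identical, 1 = disjoint); a stand-in statistic."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def compare_datasets(fasta_a, fasta_b, min_codons=30):
    """Pass both datasets through the filter layers and compare the results."""
    summary = {}
    for name, text in (("a", fasta_a), ("b", fasta_b)):
        seq = preprocess(text)
        orfs = find_orfs(seq, min_codons)
        proteins = [translate(seq[s:e]) for s, e, _ in orfs]
        summary[name] = {
            "orfs": len(orfs),
            "codons": codon_counts(seq, orfs),
            "composition": aa_composition(proteins),
        }
    summary["distance"] = composition_distance(
        summary["a"]["composition"], summary["b"]["composition"]
    )
    return summary
```

With real data, `fasta_a` and `fasta_b` would hold the FASTA text for the two species being compared. A fuller treatment would also scan the reverse complement (six frames in total) and might swap the distance for a formal test such as a chi-square test on codon or amino-acid counts.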

Keywords

Amino acid, Codon count, Distributed generation, Open Reading Frame, Pre-processing filter
