Bioinformatics – An Emerging Field


Savita More*, Vijay Raje, Namita Phalke, Sarika Lokhande

GES College of Pharmacy (D.Pharm), Limb, Satara, Maharashtra, India

*Corresponding Author E-mail:



Bioinformatics is mainly used in drug discovery, where it draws on various databases. These technological advances are likely to have a profound impact on our knowledge of the etiology of complex diseases and on new approaches to their prevention and treatment. In drug discovery, bioinformatics will supply an increasing amount and variety of data to improve experimentation in pharmaceutical development. "Bioinformatics is the field of science in which biology, computer science, and information technology merge into a single discipline. There are three important sub-disciplines within bioinformatics: the development of new algorithms and statistics with which to assess relationships among members of large data sets; the analysis and interpretation of various types of data including nucleotide and amino acid sequences, protein domains, and protein structures; and the development and implementation of tools that enable efficient access and management of different types of information." In this article, we make the interested reader aware of the possibilities of feature selection, providing a basic taxonomy of feature selection techniques and discussing their use, variety and potential in a number of both common and upcoming bioinformatics applications.


KEYWORDS: Etiology, information technology, algorithms and statistics




A GENE is the unit of heredity in every living organism. Genes are encoded by nucleic acid molecules known as DNA or RNA, and direct the physical development and behavior of the organism. Most genes encode proteins, which are biological macromolecules comprising linear chains of amino acids that affect most of the chemical reactions carried out by the cell. Some genes do not encode proteins, but produce non-coding RNA molecules that play key roles in protein biosynthesis and gene regulation. Molecules that result from gene expression, whether RNA or protein, are collectively known as gene products.1


Most genes contain non-coding regions that do not code for the gene products but often regulate gene expression.

Since Mendel's time, genetic record keeping has come a long way. The understanding of genetics has advanced remarkably in the last thirty years, which has led to the development of new fields of research and new techniques.


BIOINFORMATICS is a new and emerging field. In general, bioinformatics (bio: life; informatics: computer science) means applying computer science to the life sciences, mainly to determine gene structure relationships.2


Today, bioinformatics embraces protein structure analysis, gene and protein functional information, data from patients, pre-clinical and clinical trials, and the metabolic pathways of numerous species. Many developments in genomics, protein and RNA analysis, and single nucleotide polymorphism (SNP) studies owe much to bioinformatics. All these data associated with genes are being added to our knowledge of the human genome. The new challenge in bioinformatics is to identify the relationships that connect the expression and the action of genes.


In the pharmaceutical sector, bioinformatics is mainly used in drug discovery, where it draws on various databases. These technological advances are likely to have a profound impact on our knowledge of the etiology of complex diseases and on new approaches to their prevention and treatment. In drug discovery, bioinformatics will supply an increasing amount and variety of data to improve experimentation in pharmaceutical development.


"Bioinformatics is the field of science in which biology, computer science, and information technology merge into a single discipline. There are three important sub-disciplines within bioinformatics: the development of new algorithms and statistics with which to assess relationships among members of large data sets; the analysis and interpretation of various types of data including nucleotide and amino acid sequences, protein domains, and protein structures; and the development and implementation of tools that enable efficient access and management of different types of information."3



The general concept of bioinformatics can be elaborated as follows:

A mixture of Biochemistry, Molecular Biology, and Computer Science.

Obtaining, storing, organizing, and analyzing biological and genetic information for understanding its activity in living organisms.

The main goal of bioinformatics is to convert a multitude of complex data into useful information and knowledge.

Data includes gene and protein sequences, cDNA, and nucleotide sequences.

Data comes from gene sequencing, combinatorial chemical synthesis, gene-expression investigations, pharmacogenomics, proteomic studies, and other methods of study.

Information used to build synthetic and predictive models allowing scientists to better understand complex living systems.4


Bioinformatics has various applications in research in medicine, biotechnology, agriculture, etc. The following research fields6 have bioinformatics as an integral component:

1. Computational Biology: The development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems.

2. Genomics: Genomics is any attempt to analyze or compare the entire genetic complement of one species or of several species. It is, of course, possible to compare genomes by comparing more-or-less representative subsets of genes within genomes.


3. Proteomics: Proteomics is the study of proteins - their location, structure and function. It is the identification, characterization and quantification of all proteins involved in a particular pathway, organelle, cell, tissue, organ or organism that can be studied in concert to provide accurate and comprehensive data about that system.


Proteomics is the study of the function of all expressed proteins. The study of the proteome, called proteomics, now evokes not only all the proteins in any given cell, but also the set of all protein isoforms and modifications, the interactions between them, the structural description of proteins and their higher-order complexes, and for that matter almost everything 'post-genomic'.


4. Pharmacogenomics: Pharmacogenomics is the application of genomic approaches and technologies to the identification of drug targets. In short, pharmacogenomics uses genetic information to predict whether a drug will help make a patient well or sick. It studies how genes influence the response of humans to drugs, from the population level to the molecular level.


5. Pharmacogenetics: Pharmacogenetics is the study of how the actions of and reactions to drugs vary with the patient's genes. Individuals respond differently to drug treatments: some positively, others with little obvious change in their condition, and yet others with side effects or allergic reactions. Much of this variation is known to have a genetic basis. Pharmacogenetics is a subset of pharmacogenomics that uses genomic and bioinformatic methods to identify genomic correlates, for example SNPs (single nucleotide polymorphisms), characteristic of particular patient response profiles, and uses those markers to inform the administration and development of therapies. Strikingly, such approaches have been used to "resurrect" drugs previously thought to be ineffective but subsequently found to work in a subset of patients, and to optimize chemotherapy doses for particular patients.


6. Cheminformatics: 'The mixing of those information resources [information technology and information management] to transform data into information and information into knowledge for the intended purpose of making better decisions faster in the arena of drug lead identification and optimization.' Related terms are chemi-informatics, chemometrics, computational chemistry, chemical informatics, chemical information management/science, and chemoinformatics.
However, chemoinformatics and chemical informatics can be distinguished as follows:


Chemical informatics: 'Computer-assisted storage, retrieval and analysis of chemical information, from data to chemical knowledge.' This definition is distinct from 'chemoinformatics' (and the synonymous cheminformatics and chemiinformatics), which focuses on drug design.


Chemometrics: The application of statistics to the analysis of chemical data (from organic, analytical or medicinal chemistry) and design of chemical experiments and simulations.

Computational chemistry: A discipline using mathematical methods for the calculation of molecular properties or for the simulation of molecular behavior.  It also includes, e.g., synthesis planning, database searching, combinatorial library manipulation.


7. Structural genomics or structural bioinformatics refers to the analysis of macromolecular structure particularly proteins, using computational tools and theoretical frameworks.


8. Comparative genomics: The study of human genetics by comparisons with model organisms such as mice, the fruit fly, and the bacterium E. coli.


9. Biophysics: The British Biophysical Society defines biophysics as: "an interdisciplinary field which applies techniques from the physical sciences to understanding biological structure and function".


10. Biomedical informatics/ Medical informatics: "Biomedical Informatics is an emerging discipline that has been defined as the study, invention, and implementation of structures and algorithms to improve communication, understanding and management of medical information."


11. Mathematical Biology: Mathematical biology also tackles biological problems, but the methods it uses to tackle them need not be numerical and need not be implemented in software or hardware. It includes things of theoretical interest which are not necessarily algorithmic, not necessarily molecular in nature, and are not necessarily useful in analyzing collected data.


12. Computational chemistry: Computational chemistry is the branch of theoretical chemistry whose major goals are to create efficient computer programs that calculate the properties of molecules (such as total energy, dipole moment, vibrational frequencies) and to apply these programs to concrete chemical objects. It is also sometimes used to cover the areas of overlap between computer science and chemistry.


13. Functional genomics: Functional genomics is a field of molecular biology that attempts to make use of the vast wealth of data produced by genome sequencing projects to describe genome function. Functional genomics uses high-throughput techniques such as DNA microarrays, proteomics, metabolomics and mutation analysis to describe the function and interactions of genes.


15. Pharmacoinformatics: Pharmacoinformatics concentrates on the aspects of bioinformatics dealing with drug discovery.


In silico ADME-Tox Prediction: Drug discovery is a complex and risky treasure hunt for the most efficacious molecule, one that has no toxic effects and at the same time has the desired pharmacokinetic profile. The hunt starts when researchers examine the binding affinity of the molecule to its target. A huge amount of research is required to arrive at a molecule with a reliable binding profile.


16. Agroinformatics/ Agricultural informatics: Agroinformatics concentrates on the aspects of bioinformatics dealing with plant genomes.


17. Systems biology: Systems biology is the coordinated study of biological systems by investigating the components of cellular networks and their interactions, by applying experimental high-throughput and whole-genome techniques, and by integrating computational methods with experimental efforts.5,6


Bioinformatics is used for a virtually limitless number of tasks, but some of the most common are:

1.    Finding homologs ('twins') of a gene in your favorite species given a sequence you have in a model species, e.g. finding a rice gene given the sequence of an Arabidopsis gene that has already been characterized.

2.    Comparing two or more gene sequences to measure their similarity and hence their relatedness.  This can be used to group genes into subsets (for example, orthologs and paralogs), which may indicate the function or activity of the members of each subset based on what is already known about the proteins they encode.  Comparisons also allow taxonomy to be examined and phylogenetic trees (trees of relatedness) to be drawn, and give insight into sequence evolution.

3.    Design of primers for PCR.  Online and offline tools allow individuals and whole projects (e.g. sequencing projects) to have their computers design thousands of primers with little effort.7
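As an illustration of the second task above, the following minimal Python sketch computes percent identity between sequences that are assumed to be already aligned (equal length, with "-" marking gaps). The function names and the toy sequences are hypothetical and are not taken from any particular tool; columns containing a gap are simply skipped, which is only one of several conventions in use.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # gap columns are ignored (an assumption of this sketch)
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def identity_matrix(aligned: dict) -> dict:
    """Percent identity for every ordered pair of named, pre-aligned sequences."""
    return {
        (a, b): percent_identity(aligned[a], aligned[b])
        for a in aligned
        for b in aligned
    }


if __name__ == "__main__":
    # Toy, made-up sequences used purely for illustration.
    toy = {"gene_x": "ACGT-A", "gene_y": "ACCTGA"}
    print(identity_matrix(toy)[("gene_x", "gene_y")])
```

A matrix of this kind is a common starting point for grouping related genes, although real phylogenetic methods use explicit models of sequence evolution rather than raw identity.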



Human Genome Project - An Introduction

The Human Genome Project has encouraged a series of paradigm changes, leading to the view that biology is an informational science. The draft of the human genome has given us a genetic parts list of what is necessary for building a human: approximately 35,000 genes, their regulatory regions, a lexicon of motifs that are the building-block components of proteins and genes, and access to the human variability that makes each of us different from one another.8


Genomes - Discovering Methodology and Study

Discovery science defines all of the elements in a biological system. Examples are the sequence of the genome and the identification and quantitation of all of the mRNAs or proteins in a particular cell type: respectively, the genome, the transcriptome, and the proteome. Discovery science creates databases of information, in contrast to the more classical hypothesis-driven science that formulates hypotheses and attempts to test them. High-throughput tools both provide the means for discovery science and can assay how global information sets, for example transcriptomes or proteomes, change as systems are perturbed.


System Biology

Biology is a highly informational science. There are mainly two types of biological information.



The main tools of a bioinformatician are computer software programs and the internet. A fundamental activity is sequence analysis of DNA and proteins using various programs and databases available on the World Wide Web. Anyone, from clinicians to molecular biologists, with access to the internet and relevant websites can now freely discover the composition of biological molecules such as nucleic acids and proteins by using basic bioinformatic tools. This does not imply that handling and analysis of raw genomic data can easily be carried out by all. Bioinformatics is an evolving discipline, and expert bioinformaticians now use complex software programs for retrieving, sorting out, analysing, predicting, and storing DNA and protein sequence data.


Large commercial enterprises such as pharmaceutical companies employ bioinformaticians to perform and maintain the large scale and complicated bioinformatic needs of these industries. With an ever-increasing need for constant input from bioinformatic experts, most biomedical laboratories may soon have their own in-house bioinformatician. The individual researcher, beyond a basic acquisition and analysis of simple data, would certainly need external bioinformatic advice for any complex analysis.9

The Bioinformatics Tools may be categorized into following categories

1.    Homology and Similarity Tools

2.    Protein Function Analysis

3.    Structural Analysis

4.    Sequence Analysis  


1. Homology and Similarity Tools

The term homology implies a common evolutionary relationship between two traits, whether they are DNA sequences or bristle patterns on a fly's nose. Homologous sequences are sequences that are related by divergence from a common ancestor. Thus the degree of similarity between two sequences can be measured, whereas their homology is either true or false. This set of tools can be used to identify similarities between novel query sequences of unknown structure and function and database sequences whose structure and function have been elucidated.
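The idea of scoring a query against database sequences can be sketched in Python as follows. This is a deliberately crude illustration that ranks sequences by shared k-mers (Jaccard similarity); it is not how any particular search tool works, and the function names and toy database are hypothetical. A high score indicates similarity only; whether two sequences are homologous is a separate inference.

```python
def kmers(seq: str, k: int = 3) -> set:
    """Set of all overlapping substrings of length k in the sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Shared elements divided by total distinct elements (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_database(query: str, database: dict, k: int = 3) -> list:
    """Return (name, score) pairs sorted from most to least similar to the query."""
    query_kmers = kmers(query, k)
    scored = [
        (name, jaccard(query_kmers, kmers(seq, k)))
        for name, seq in database.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up database entries, for illustration only.
    toy_db = {"entry_1": "ACGTACGT", "entry_2": "TTTTTTTT", "entry_3": "ACGTTTTT"}
    for name, score in rank_database("ACGTACGT", toy_db):
        print(name, round(score, 2))
```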


2. Protein Function Analysis

Function analysis is the identification and mapping of all functional elements (both coding and non-coding) in a genome. This group of programs allows you to compare your protein sequence to the secondary (or derived) protein databases that contain information on motifs, signatures and protein domains. Highly significant hits against these different pattern databases allow you to approximate the biochemical function of your query protein.


3. Structural Analysis

This set of tools allows you to compare structures with the known structure databases. The function of a protein is more directly a consequence of its structure rather than its sequence with structural homologs tending to share functions. The determination of a protein's 2D/3D structure is crucial in the study of its function.


4. Sequence Analysis

This set of tools allows you to carry out further, more detailed analysis of your query sequence, including evolutionary analysis and the identification of mutations, hydropathy regions, CpG islands and compositional biases. These and other biological properties are all clues that aid the search to elucidate the specific function of your sequence.
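Two of the properties mentioned above, compositional bias and CpG islands, can be illustrated with a short Python sketch. The observed/expected CpG ratio used here is the widely quoted formula (CpG count × length) / (C count × G count), and the default thresholds (at least 200 bp, GC content above 50%, ratio above 0.6) are commonly cited criteria that individual tools may vary; they are assumptions of this sketch rather than values from the article, and the function names are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_observed_expected(seq: str) -> float:
    """Observed/expected CpG ratio: (CpG count * length) / (C count * G count)."""
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c_count * g_count)


def looks_like_cpg_island(window: str, min_len: int = 200,
                          min_gc: float = 0.5, min_ratio: float = 0.6) -> bool:
    """Apply the three assumed criteria to a single candidate window."""
    return (
        len(window) >= min_len
        and gc_content(window) > min_gc
        and cpg_observed_expected(window) > min_ratio
    )


if __name__ == "__main__":
    candidate = "CG" * 100  # artificial 200 bp window, for illustration only
    print(gc_content(candidate), cpg_observed_expected(candidate))
    print(looks_like_cpg_island(candidate))
```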



Bioinformatics is the use of IT in biotechnology for data storage, data warehousing and the analysis of DNA sequences. Bioinformatics requires knowledge of many branches of science, such as biology, mathematics, computer science, and the laws of physics and chemistry, and of course a sound knowledge of IT to analyze biotech data. Bioinformatics is not limited to computing data; in reality it can be used to solve many biological problems and to find out how living things work.10


A. Nucleotide Applications:

1.        Information Retrieval

There are numerous databases around the world containing information useful for computational biologists.  The main ones are hosted by the National Center for Biotechnology Information (NCBI), the European Bioinformatics Institute (EBI), and the DNA Data Bank of Japan (DDBJ). The following applications are tools that search these sites to find a particular sequence or to identify a sequence already known to you.


2.        Sequence Retrieval – Find the nucleotide sequence for a gene of interest.


3.        Sequence Identification – Find the function and possible origin of a gene from a sequence.


4.        Sequence Analysis

With these applications we can align two sequences, align multiple sequences, and perform phylogenetic analyses. One reason to do this is to determine which parts of the sequences are conserved from one species to the next. Another is to see how much an organism has diverged from other organisms simply by comparing their DNA sequences. The more similar two gene sequences are to one another, the more closely the organisms are related; the more dissimilar the two sequences, the more distantly the two genes are related. In this way we can compare sequences to determine how organisms have diverged, possibly as a result of evolution.


5.        Single Sequence Alignments – Compares desired sequence to a database with many sequences in it for similarity.


6.        Aligning Two Sequences – Compare two sequences with one another for similarity and % identity.


7.        Multiple Sequence Alignments – Compare multiple sequences for similarity so that we may determine the % identity of the sequences. Analogous to phylogenetic studies.


8.        Restriction Enzyme Mapping – Determine cut sites in a sequence.


9.        Entelechon GmbH's Sequence inversion - This program takes a sequence and can invert it or output the complementary strand.  Instructions are included at the site and are very simple to use.


10.     Oligo-Primer Properties Calculator - This program will calculate the melting point temperature and the OD of your oligo.
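Several of the nucleotide utilities listed above (strand inversion and complement, restriction site mapping, and oligo melting temperature) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reimplementation of the programs named in the list: the function names are hypothetical, restriction sites are reported as 0-based start positions of the recognition sequence rather than exact cut points, and the melting temperature uses the simple Wallace rule, Tm = 2(A+T) + 4(G+C) in °C, which is only a rough estimate for short oligos.

```python
# Translation table mapping each base to its complement (case preserved).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse, giving the opposite strand 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq: str, site: str) -> list:
    """0-based start positions of every (possibly overlapping) occurrence of site."""
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def wallace_tm(oligo: str) -> int:
    """Rough melting temperature in degrees C by the Wallace rule (short oligos only)."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    dna = "AAGAATTCTTGAATTC"  # made-up sequence for illustration
    print(reverse_complement(dna))
    print(find_sites(dna, "GAATTC"))  # GAATTC is the EcoRI recognition sequence
    print(wallace_tm("ATGCATGCATGCATGC"))
```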


B. Sequence Translation

Computational biologists need to analyze their nucleotide sequences, and one of the best ways to do that is to study the protein product.  The following programs will either convert your DNA sequence into an amino acid (protein) sequence or take your protein and convert it into complementary DNA (cDNA) sequences.  These protein and DNA sequences can then be analyzed using other applications on this page.

1.    Translation – Converts nucleotide sequences into protein sequences.

2.    Backtranslation – Converts protein sequences into nucleotide sequences or complementary DNA (cDNA).
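The two conversions above can be sketched in Python using the standard genetic code. The sketch assumes a DNA coding sequence read in frame from its first base; unknown codons are reported as "X" and stop codons as "*". Because the genetic code is degenerate, back-translation is not unique, so the function below simply picks one arbitrary codon per amino acid, which is an assumption of this illustration and ignores codon usage. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons ordered by base T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate in frame from the first base; '*' is a stop, 'X' an unknown codon."""
    dna = dna.upper().replace("U", "T")  # accept RNA input as well
    protein = []
    for i in range(0, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


def back_translate(protein: str) -> str:
    """Return one of many possible DNA sequences encoding the protein."""
    first_codon = {}
    for codon, aa in CODON_TABLE.items():
        first_codon.setdefault(aa, codon)  # keep the first codon seen per residue
    return "".join(first_codon[aa] for aa in protein.upper())


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))
    print(back_translate("MAW"))
```

Translating the output of back_translate recovers the original protein, but the reverse is not true in general: many different DNA sequences translate to the same protein.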


C. Protein Applications:

1.    Information Retrieval

      The numerous information retrieval sites on the Internet can give very valuable information concerning the sequence and properties of a protein.  Numerous databases exist and each database is accessible through convenient search programs.  This section will introduce useful sites that provide database search capabilities.


2.    Protein Sequence Retrieval – Allows user to retrieve sequence from protein name, accession number, or GI identification number.


3.    Protein Identification – Allows user to retrieve a protein name or accession and GI numbers from polypeptide sequence.


D. Protein Analysis:

After obtaining the identity or sequence of a protein, there are several valuable tools that allow further analysis of the protein.  Information can be obtained concerning the characteristic properties of the proteins from the sequence.  Another valuable tool is sequence alignment applications that establish the degree of similarity between two proteins or multiple proteins.


1.    Determining Protein Sequence Properties – User can find molecular weight (MW), isoelectric point (pI), titration curves, hydrophobicity  etc. for particular protein


2.    Protein Sequence Alignment – Align a single sequence to sequences in a database.


3.    Pair wise Sequence Alignment – Align two protein sequences to each other.



4.    Multiple Sequence Alignment – Align many sequences with one another.
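Pairwise alignment, as listed above, is classically done by dynamic programming. The following Python sketch implements a basic global alignment in the style of the Needleman-Wunsch algorithm. The scoring scheme (match +1, mismatch -1, linear gap penalty -2) is an arbitrary assumption chosen for illustration; real protein alignment tools use substitution matrices and more elaborate gap penalties, and the function name is hypothetical.

```python
def needleman_wunsch(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -2):
    """Global alignment of a and b; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            score[i][j] = max(diag, up, left)

    # Trace back from the bottom-right corner, preferring diagonal moves.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


if __name__ == "__main__":
    # Made-up peptide fragments, for illustration only.
    top, bottom, total = needleman_wunsch("MKTAYIA", "MKTYIA")
    print(top)
    print(bottom)
    print(total)
```

When several alignments share the best score, the traceback above returns only one of them; which one depends on the order in which the three moves are tested.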


E. Structure Analysis:

Several programs have been created that give scientists the ability to look at the three-dimensional shape of proteins and nucleotides.  Examining a protein in 3D allows for greater understanding of protein functions, as well as providing students with a visual understanding that cannot always be conveyed through still photographs or descriptions.  We have found that the best 3D program to date is RasMol, originally developed by Roger Sayle.  To use this program, it must first be downloaded onto your computer.




Molecular medicine; Personalized medicine; Preventative medicine

Gene therapy; Drug development; Microbial genome applications

Waste cleanup; Climate change studies; Alternative energy sources

Biotechnology; Antibiotic resistance; Forensic analysis of microbes

Bio-weapon creation; Evolutionary studies; Crop improvement

Insect resistance; Improvement of nutritional quality

Development of drought-resistant varieties; Veterinary science



The clinical applications of bioinformatics can be viewed in the immediate, short, and long term.


Basic bioinformatic tools are already accessed in certain clinical situations to aid in diagnosis and treatment plans. For example, PubMed is accessed freely for biomedical journals cited in Medline, and OMIM (Online Mendelian Inheritance in Man), a search tool for human genes and genetic disorders, is used by clinicians to obtain information on genetic disorders in the clinic or hospital setting. An example of the application of bioinformatics in new therapeutic advances is the development of novel designer targeted drugs such as imatinib mesylate (Gleevec), which interferes with the abnormal protein made in chronic myeloid leukaemia. (Imatinib mesylate was synthesised at Novartis Pharmaceuticals by identifying a lead in a high-throughput screen for tyrosine kinase inhibitors and optimising its activity for specific kinases.) The ability to identify and target specific genetic markers by using bioinformatic tools facilitated the discovery of this drug.


In the short term, as a result of the emerging bioinformatic analysis of the human genome project, more disease genes will be identified and new drug targets will be simultaneously discovered. Bioinformatics will serve to identify susceptibility genes and illuminate the pathogenic pathways involved in illness, and will therefore provide an opportunity for development of targeted therapy. Recently, potential targets in cancers were identified from gene expression profiles.


In the longer term, integrative bioinformatic analysis of genomic, pathological, and clinical data in clinical trials will reveal potential adverse drug reactions in individuals by use of simple genetic tests. Ultimately, pharmacogenomics (using genetic information to individualise drug treatment) is likely to bring about a new age of personalised medicine; patients will carry gene cards with their own unique genetic profile for certain drugs aimed at individualised therapy and targeted medicine free from side effects.



With the confluence of biology and computer science, computer applications in molecular biology are drawing greater attention from life science researchers. As it becomes imperative to seek the help of information technology professionals to meet the ever-growing computational requirements of many challenging biological problems, the synergy between modern biology and computer science is set to blossom in the days to come. The research scope for mathematical techniques and algorithms, coupled with programming languages and software development and deployment tools, is therefore set to get a real boost. In addition, information technologies such as databases, middleware, graphical user interface (GUI) design, distributed object computing, storage area networks (SAN), data compression, networking and communication, and remote management are all set to play a critical role in advancing the goals for which the field of bioinformatics came into existence.


Moreover, the clinical applications of bioinformatics are important. They can be viewed in the immediate, short, and long term, and bioinformatics tools make clinical trials easier to conduct.



The future of bioinformatics is integration. For example, integration of a wide variety of data sources, such as clinical and genomic data, will allow us to use disease symptoms to predict genetic mutations and vice versa. The integration of GIS data, such as maps and weather systems, with crop health and genotype data will allow us to predict successful outcomes of agricultural experiments. Another future area of research in bioinformatics is large-scale comparative genomics. For example, the development of tools that can do 10-way comparisons of genomes will push forward the discovery rate in this field. Along these lines, the modeling and visualization of full networks of complex systems could be used in the future to predict how the system (or cell) reacts, for example, to a drug. Bioinformatics faces a set of technical challenges that are being addressed by faster computers, advances in disk storage, and increased bandwidth, but one of the biggest hurdles facing bioinformatics today is the small number of researchers in the field. This is changing as bioinformatics moves to the forefront of research, but the lag in expertise has led to real gaps in the knowledge of bioinformatics in the research community. Finally, a key research question for the future of bioinformatics will be how to computationally compare complex biological observations, such as gene expression patterns and protein networks. Bioinformatics is about converting biological observations into a model that a computer can process. This is a very challenging task, since biology can be very complex. The problem of how to digitize phenotypic data such as behavior, electrocardiograms, and crop health into a computer-readable form offers exciting challenges for future bioinformatics.



The authors wish to acknowledge GES College of Pharmacy, Limb, Satara for providing valuable help, and are also thankful to Mr. Raje V. N., Principal, GES College of Pharmacy, Limb, Satara, for providing the necessary guidance for this article.




1.     Reichhardt T. It’s sink or swim as a tidal wave of data approaches. Nature 1999; 399(6736):517-20.

2.     Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Rapp BA, Wheeler DL. GenBank. Nucleic Acids Res 2000; 28 (1):15-8.

3.     Bairoch A, Apweiler R. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res 2000; 28(1):45-8.

4.     Fleischmann RD, Adams MD, White O, Clayton RA, Kirkness EF, Kerlavage AR, et al. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 1995;269 (5223):496-512.

5.     Sílvia A. Sousa, Jorge H. Leitão, Raul C. Martins, João M. Sanches, Jasjit S. Suri,   and Alejandro Giorgetti ,Bioinformatics Applications in Life Sciences and Technologies Biomed Res Int. 2016; 4(1)360-68.

6.     Ardeshir Bayat, Bioinformatics BMJ. 2002 Apr 27; 324(7344): 1018–1022.

7.     Fraser CM, Gocayne JD, White O, Adams MD, Clayton RA, Fleischmann RD, et al. The minimal gene complement of Mycoplasma genitalium. Science. 1995; 270:397–403.

8.     Tsoka S, Ouzounis CA. Recent developments and future directions in computational genomics. FEBS Lett. 2000; 480:42–48.

9.     Patrick Lambrix Manal Habbouche Marta Pérez, Evaluation of ontology development tools for bioinformatics Bioinformatics, Volume 19, Issue 12, 12 August 2003, Pages 1564–1571,

10.   Matsunaga A, Tsugawa M, Fortes J. CloudBLAST: Combining MapReduce and Virtualization on Distributed Resources for Bioinformatics Applications. Fourth IEEE International Conference on eScience; 2008.

11.   F.S Collins. Medical and Societal Consequences of the Human Genome Project N. Engl. J. Med., 341 (1999), pp. 28-37

12.   I.S Kohane. Bioinformatics and clinical informatics: the imperative to collaborate J. Am. Med. Inform. Assoc., 7 (5) (2000), pp. 512-516

13.   G.J.E De Moor, I Iakovidis, S Norager, F Martin Sanchez Special Issue on synergy between research in medical informatics, bioinformatics and neuroInformatics Methods Inf. Med., 42 (2003), pp. 111-189

14.   M Tsiknakis, D.G Katehakis, S.C Orphanoudakis An open, component-based information infrastructure for integrated health information networks Int. J. Med. Inform., 68 (1–3) (2002), pp. 3-26

15.   A.D Roses Pharmacogenetics and future drug development and delivery Lancet, 355 (9212) (2000), pp. 1358-1361.




Received on 17.07.2018                Modified on 27.09.2018

Accepted on 25.10.2018            © A&V Publications All right reserved

Asian J. Res. Pharm. Sci. 2018; 8(4): 185-191.

DOI: 10.5958/2231-5659.2018.00032.2