Human Genome Project – History, Goals, Methodologies, Features, Applications

In this article, we discuss various aspects of the Human Genome Project, a megaproject that ran for around 13 years, including its process, goals, and methodologies.

The genetic make-up of an organism is encoded in its DNA sequence. Individuals differ from one another because their DNA sequences differ. This motivated a worldwide project to determine the complete DNA sequence of the human genome. We first look at what the project was and its history, and then discuss its goals, methodologies, and applications.

What is the Human Genome Project (HGP)?

The Human Genome Project (HGP) was a massive international research project that aimed to sequence every base in the human genome. It started in 1990 and was completed in 2003, sequencing around 3.2 billion base pairs of the human genome. Often called a megaproject, it was led for much of its duration by American geneticist Francis Collins. The HGP was also closely associated with the rapid development of a new area of biology called bioinformatics.

History of the Human Genome Project

As noted above, the Human Genome Project was a 13-year project completed in 2003. It was supported by the U.S. Department of Energy and the National Institutes of Health. Early on, the Wellcome Trust (U.K.) became a major partner. The effort was soon joined by scientists from countries around the world, including Japan, France, Germany, and China.

Advances in sequencing technology also helped the project progress rapidly.

Goals of the Human Genome Project

The human genome project aimed at achieving various goals. Some of them are as follows:

  1. To identify all the genes in human DNA, estimated at approximately 20,000-25,000.
  2. To determine the sequence of the 3 billion chemical base pairs (adenine [A], thymine [T], guanine [G], and cytosine [C]) that make up human DNA.
  3. To store the gathered information in databases.
  4. To transfer related technologies to other sectors, such as industry.
  5. To improve tools for data analysis and the technologies used to interpret genomic sequences.
  6. To address the ethical, legal, and social issues (ELSI) that might arise from the project or from defining the entire human genomic sequence.

Methodologies of HGP

The project primarily involved the following two methods:

  1. Expressed Sequence Tags:
    • This method focused on identifying all the genes that are expressed as RNA; the resulting sequences are referred to as Expressed Sequence Tags (ESTs).
  2. Sequence Annotation:
    • This method focused on sequencing the whole genome, including all coding and non-coding sequences, and later assigning functions to different regions of the sequence.

Procedure of the Human Genome Project

The HGP procedure involved the following steps:

  1. Sequencing begins by isolating the total DNA from a cell.
  2. The DNA is then broken into random fragments of relatively small size.
  3. The fragments are cloned in a suitable host using specialised vectors such as BACs (bacterial artificial chromosomes) and YACs (yeast artificial chromosomes). This step amplifies the DNA fragments, which makes sequencing easier.
  4. The fragments are sequenced using automated DNA sequencers.
  5. The sequences are arranged on the basis of their overlapping regions.
  6. The genome sequence information is stored in computer databases.
  7. The sequences are annotated and assigned to their chromosomes.
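
Step 5, arranging sequences by their overlapping regions, can be illustrated with a small sketch. The code below is only a toy: it assumes error-free fragments from a single strand, and it uses a simple greedy merge, which is one possible illustration and not necessarily the method used by the project's assembly software. The function names and the example fragments are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that equals a prefix of `b`.

    Overlaps shorter than `min_len` are ignored and reported as 0.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two fragments with the longest suffix/prefix overlap."""
    # Remove exact duplicates (keeping input order) and fragments
    # that are fully contained inside another fragment.
    frags = list(dict.fromkeys(fragments))
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]

    while len(frags) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: the rest stay as separate contigs
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Hypothetical fragments "read" from the made-up sequence ATGCGTACGTTAGC
reads = ["GTTAGC", "ATGCGTAC", "GTACGTTA"]
print(greedy_assemble(reads))  # -> ['ATGCGTACGTTAGC']
```

Real genomes contain long repeated sequences and sequencing errors, so a plain greedy merge like this can join fragments incorrectly; it is meant only to show the idea of ordering fragments by overlap.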

Observations or Features of the Human Genome Project

The project yielded several observations, such as:

  1. The human genome contains 3164.7 million nucleotide bases.
  2. The total number of genes is around 30,000, which is much lower than previous estimates of 80,000 to 140,000 genes.
  3. Almost all (99.9%) of the nucleotide bases are the same in all people.
  4. Gene size varies greatly. The largest known human gene is dystrophin, at 2.4 million bases.
  5. The average gene contains 3,000 bases.
  6. Chromosome 1 contains the most genes (2,968), and the Y chromosome has the fewest (231).
  7. The functions of more than 50% of the discovered genes are unknown.
  8. Less than 2% of the genome codes for protein.
  9. A major part of the human genome consists of repeated sequences. These sequences have no coding function, but they give information about chromosome structure, dynamics, and evolution.
  10. In humans, around 1.4 million locations show single-base DNA differences (single nucleotide polymorphisms, or SNPs). This information helps in finding chromosomal locations for disease-associated sequences and in tracing human history.
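
The idea of a single-base difference can be made concrete with a short sketch. It assumes two sequences that are already aligned and of equal length, and it simply reports the positions where the bases differ. The function names and the example sequences are hypothetical, and identifying real SNPs involves more than this (for example, sequence alignment and comparison across many individuals).

```python
def single_base_differences(ref: str, sample: str) -> list[tuple[int, str, str]]:
    """List (position, reference base, sample base) wherever two aligned sequences differ.

    Positions are 0-based. Both sequences must have the same length.
    """
    if len(ref) != len(sample):
        raise ValueError("sequences must be aligned and of equal length")
    return [(i, r, s) for i, (r, s) in enumerate(zip(ref, sample)) if r != s]


def percent_identity(ref: str, sample: str) -> float:
    """Percentage of positions at which the two aligned sequences carry the same base."""
    diffs = single_base_differences(ref, sample)
    return 100.0 * (len(ref) - len(diffs)) / len(ref)


# Made-up example sequences that differ at one position
reference = "ATGCGTACGT"
individual = "ATGCGTTCGT"
print(single_base_differences(reference, individual))  # -> [(6, 'A', 'T')]
print(percent_identity(reference, individual))         # -> 90.0
```

Applied to two people's genomes, the same kind of comparison is what lies behind the 99.9% figure in the list above: about one position in a thousand differs.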

Applications of the Human Genome Project

  1. All the genes in a genome can be studied together.
  2. It enables new approaches to biological research.
  3. It helps us understand how tens of thousands of genes and proteins work together in interconnected networks.
  4. The human genome database helps identify genes that are associated with certain diseases.
  5. Knowledge of a patient’s entire genome sequence can help in providing effective care for that patient.

