The Basics for Understanding the HGP
The complete set of instructions for
making an organism is called its genome. It contains the master blueprint
for all cellular structures and activities for the lifetime of the cell or
organism. Found in every nucleus of a person's many trillions of cells, the
human genome consists of tightly coiled threads of deoxyribonucleic acid (DNA)
and associated protein molecules, organized into structures called chromosomes.
Glossary: DNA, Gene, Chromosome, Genome, Human Genome, Sequencing, Mapping, Cloning
DNA
In humans, as in other higher organisms, a DNA molecule consists of two strands
that wrap around each other to resemble a twisted ladder whose sides, made of
sugar and phosphate molecules, are connected by rungs of nitrogen-containing
chemicals called bases. Each strand is a linear arrangement of repeating similar
units called nucleotides, which are each composed of one sugar, one
phosphate, and a nitrogenous base. Four different bases are present in DNA:
adenine (A), thymine (T), cytosine (C), and guanine (G). The particular
order of the bases arranged along the sugar-phosphate backbone is called the DNA
sequence; the sequence specifies the exact genetic instructions required to
create a particular organism with its own unique traits.
The two DNA strands are held together by weak bonds
between the bases on each strand, forming base pairs (bp). Genome size is
usually stated as the total number of base pairs; the human genome contains
roughly 3 billion bp.
Some DNA details
If unwound and tied together, the strands of DNA would stretch more than 5 feet but would be only 50 trillionths of an inch wide. For each organism, the components of these slender threads encode all the information necessary for building and maintaining life, from simple bacteria to remarkably complex human beings. Understanding how DNA performs this function requires some knowledge of its structure and organization.
Each time a cell divides into two daughter cells, its full genome is duplicated; for humans and other complex organisms, this duplication occurs in the nucleus. During cell division the DNA molecule unwinds and the weak bonds between the base pairs break, allowing the strands to separate. Each strand directs the synthesis of a complementary new strand, with free nucleotides matching up with their complementary bases on each of the separated strands. Strict base-pairing rules are adhered to; adenine will pair only with thymine (an A-T pair) and cytosine with guanine (a C-G pair). Each daughter cell receives one old and one new DNA strand. The cells' adherence to these base-pairing rules ensures that the new strand is an exact copy of the old one. This minimizes the incidence of errors (mutations) that may greatly affect the resulting organism or its offspring.
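The base-pairing rules above can be sketched in a few lines of Python. This is only an illustration, not a model of the real cellular machinery: the function names and the toy 8-base sequence are invented for the example, and the sketch ignores strand orientation, which the text does not discuss.

```python
# Base-pairing rules from the text: A pairs only with T, C pairs only with G.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complementary_strand(template):
    """Return the base-by-base complement of a template strand."""
    return "".join(PAIRING[base] for base in template.upper())


def replicate(strand_one, strand_two):
    """Model one round of duplication: each old strand templates a new partner.

    Returns two daughter duplexes, each an (old strand, new strand) pair.
    """
    daughter_a = (strand_one, complementary_strand(strand_one))
    daughter_b = (strand_two, complementary_strand(strand_two))
    return daughter_a, daughter_b


old_one = "ATGCCGTA"                      # hypothetical toy sequence
old_two = complementary_strand(old_one)   # its partner in the parent molecule
daughter_a, daughter_b = replicate(old_one, old_two)
print(daughter_a)
print(daughter_b)
```

Each daughter pair holds one old strand and one newly built strand, and taking the complement twice returns the original sequence, which is why the copy is exact when the pairing rules are followed.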
Genes
Each DNA molecule contains many genes--the basic physical and functional units
of heredity. A gene is a specific sequence of nucleotide bases that
carries the information required for constructing proteins, which
provide the structural components of cells and tissues as well as enzymes for
essential biochemical reactions. The human genome is estimated to comprise
approximately 80,000-100,000 genes.
Human genes vary widely in length, often extending
over thousands of bases, but only about 10% of the genome is known to include
the protein-coding sequences (exons) of genes. Interspersed within many genes are
intron sequences, which have no coding function. The balance of the genome is
thought to consist of other noncoding regions (such as control sequences and
intergenic regions), whose functions are obscure.
From genes to proteins
All living organisms are composed largely of proteins; humans can synthesize about 80,000 different kinds. Proteins are large, complex molecules made up of long chains of subunits called amino acids. Twenty different kinds of amino acids are usually found in proteins. Within the gene, each specific sequence of three DNA bases (a codon) directs the cell's protein-synthesizing machinery to add a specific amino acid. For example, the base sequence ATG codes for the amino acid methionine. Since 3 bases code for 1 amino acid, the protein coded by an average-sized gene (3000 bp) will contain 1000 amino acids. The genetic code is thus a series of codons that specify which amino acids are required to make up specific proteins.
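The codon idea described above can be sketched in Python. The table below is deliberately partial (the full genetic code has 64 codons), and the function names and the toy sequence are assumptions made for illustration; only the ATG-to-methionine entry comes from the text itself.

```python
# A partial codon table using DNA codons; only the entries needed here.
CODON_TABLE = {
    "ATG": "Met",  # methionine, the example given in the text
    "TTT": "Phe",  # phenylalanine
    "GGC": "Gly",  # glycine
    "AAA": "Lys",  # lysine
    "TGG": "Trp",  # tryptophan
}


def split_codons(sequence):
    """Split a sequence into consecutive, non-overlapping three-base codons."""
    usable = len(sequence) - len(sequence) % 3  # drop any trailing partial codon
    return [sequence[i:i + 3] for i in range(0, usable, 3)]


def translate(sequence):
    """Map each codon to its amino acid; codons missing from the table become '???'."""
    return [CODON_TABLE.get(codon, "???") for codon in split_codons(sequence)]


print(translate("ATGTTTGGCAAATGG"))   # hypothetical toy sequence of five codons
print(len(split_codons("A" * 3000)))  # a 3000-base stretch holds 1000 codons
```

The second print reproduces the arithmetic in the text: 3000 bases read three at a time give 1000 codons, and so 1000 amino acids.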
Chromosomes
The 3 billion bp in the human genome are organized into 24 distinct, physically
separate microscopic units called chromosomes. All genes are arranged linearly
along the chromosomes. The nucleus of most human cells contains two sets of
chromosomes, one set given by each parent. Each set has 23 single
chromosomes--22 autosomes and an X or Y sex chromosome. (A normal female will
have a pair of X chromosomes; a male will have an X and Y pair.) Chromosomes
contain roughly equal parts of protein and DNA; chromosomal DNA contains an
average of 150 million bases. DNA molecules are among the largest molecules now
known.
Chromosomes can be seen under a light microscope and, when stained with certain dyes, reveal a pattern of light and dark bands reflecting regional variations in the amounts of A and T vs G and C. Differences in size and banding pattern allow the 24 chromosomes to be distinguished from each other, an analysis called a karyotype. A few types of major chromosomal abnormalities, including missing or extra copies or gross breaks and rejoinings (translocations), can be detected by microscopic examination; Down's syndrome, in which an individual's cells contain a third copy of chromosome 21, is diagnosed by karyotype analysis.
Most changes in DNA, however, are too subtle to be detected by this technique and require molecular analysis. These subtle DNA abnormalities (mutations) are responsible for many inherited diseases such as cystic fibrosis and sickle cell anemia or may predispose an individual to cancer, major psychiatric illnesses, and other complex diseases.
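The banding pattern mentioned above reflects regional differences in the proportion of A and T versus G and C. The sketch below shows one way such a regional measure could be computed from sequence data. It is a toy illustration only: the 30-base sequence and 10-base window are invented, real bands span millions of bases, and the code makes no claim about which composition stains light or dark.

```python
def at_fraction(sequence):
    """Fraction of bases in the sequence that are A or T."""
    if not sequence:
        return 0.0
    at_count = sum(1 for base in sequence.upper() if base in "AT")
    return at_count / len(sequence)


def windowed_at_fraction(sequence, window):
    """AT fraction for consecutive, non-overlapping windows of the given size."""
    return [
        at_fraction(sequence[start:start + window])
        for start in range(0, len(sequence), window)
    ]


# Hypothetical region: an AT-rich stretch, a GC-rich stretch, a mixed stretch.
toy_region = "ATATATTAAT" + "GCGCGGCCGC" + "ATGCATGCAT"
print(windowed_at_fraction(toy_region, 10))
```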
Genome
A genome is all the DNA in an organism, including its genes. Genes carry
information for making all the proteins required by all organisms. These
proteins determine, among other things, how the organism looks, how well its
body metabolizes food or fights infection, and sometimes even how it behaves.
DNA is made up of four similar chemicals (called bases and abbreviated A, T, C, and G) that are repeated millions or billions of times throughout a genome. The human genome, for example, has 3 billion pairs of bases.
The particular order of As, Ts, Cs, and Gs is
extremely important. The order underlies all of life's diversity, even dictating
whether an organism is human or another species such as yeast, rice, or fruit
fly, all of which have their own genomes and are themselves the focus of genome
projects. Because all organisms are related through similarities in DNA
sequences, insights gained from nonhuman genomes often lead to new knowledge
about human biology.
The Human Genome
The human genome is made up of DNA, which has four different chemical
building blocks. These are called bases and abbreviated A, T, C, and G. In the
human genome, about 3 billion bases are arranged along the chromosomes in a
particular order for each unique individual. To get an idea of the size of the
human genome present in each of our cells, consider the following analogy: If
the DNA sequence of the human genome were compiled in books, the equivalent of
200 volumes the size of a Manhattan telephone book (at 1000 pages each) would be
needed to hold it all.
It would take about 9.5 years to read out loud (without stopping) the 3 billion bases in a person's genome sequence. This is calculated on a reading rate of 10 bases per second, equaling 600 bases/minute, 36,000 bases/hour, 864,000 bases/day, 315,360,000 bases/year.
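The reading-time figure can be checked with a short calculation. The sketch below uses only the numbers given above and assumes a 365-day year with no leap days; the variable names are chosen for the example.

```python
GENOME_BASES = 3_000_000_000   # roughly 3 billion bases, as stated in the text
BASES_PER_SECOND = 10          # the reading rate assumed in the text

bases_per_minute = BASES_PER_SECOND * 60
bases_per_hour = bases_per_minute * 60
bases_per_day = bases_per_hour * 24
bases_per_year = bases_per_day * 365   # assumes a 365-day year

years_to_read = GENOME_BASES / bases_per_year
print(bases_per_minute, bases_per_hour, bases_per_day, bases_per_year)
print(round(years_to_read, 1))
```

Dividing 3 billion bases by 315,360,000 bases per year gives a little over 9.5 years, in line with the estimate.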
Storing all this information is a great challenge to computer experts known as bioinformatics specialists. One million bases (called a megabase and abbreviated Mb) of DNA sequence data is roughly equivalent to 1 megabyte of computer data storage space. Since the human genome is 3 billion base pairs long, 3 gigabytes of computer data storage space are needed to store the entire genome. This includes nucleotide sequence data only and does not include data annotations and other information that can be associated with sequence data.
As time goes on, more annotations will be entered as a result of laboratory findings, literature searches, data analyses, personal communications, automated data-analysis programs, and auto annotators. These annotations associated with the sequence data will likely dwarf the amount of storage space actually taken up by the initial 3 billion nucleotide sequence. Of course, that's not much of a surprise because the sequence is merely one starting point for much deeper biological understanding!
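The storage estimate for the raw sequence follows from storing each base as one text character. The sketch below assumes one byte per base (a single ASCII letter) and decimal units, where 1 megabyte is 1,000,000 bytes; the function name is hypothetical, and annotations are left out, as in the estimate above.

```python
def storage_bytes(num_bases, bytes_per_base=1):
    """Bytes needed for a sequence stored as plain text, one character per base."""
    return num_bases * bytes_per_base


MEGABASE = 1_000_000
GENOME_BASES = 3_000_000_000

# Sanity check: four bases written as ASCII letters occupy four bytes.
assert len("ACGT".encode("ascii")) == 4

megabytes_per_megabase = storage_bytes(MEGABASE) / 1_000_000
gigabytes_for_genome = storage_bytes(GENOME_BASES) / 1_000_000_000
print(megabytes_per_megabase)
print(gigabytes_for_genome)
```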
DNA sequencing
DNA sequencing, the process of determining the exact order of the 3 billion
chemical building blocks (called bases and abbreviated A, T, C, and G) making up
the DNA of the 24 different human chromosomes, is the greatest technical
challenge in the Human Genome Project. Achieving this goal will help reveal the
estimated 100,000 human genes within our DNA as well as the regions controlling
them. The resulting DNA sequence maps will be used by 21st century scientists to
explore human biology and other complex phenomena.
Meeting Human Genome Project sequencing goals by 2003 will require continual improvements in sequencing speed, reliability, and costs. Standard methods are based on separating DNA fragments by gel electrophoresis, which is extremely labor intensive and expensive. Total sequencing output in the community was about 200 Mb for 1998.
Encouraging progress ranges from enhancements of gel-based technologies to the development of novel, gel-less, automatable approaches, such as the use of DNA fragments bound to a solid surface (DNA chips) and the separation of fragments by mass spectrometry. New gel-based sequencers use multiple tiny (capillary) tubes to run standard electrophoretic separations. These separations are much faster because the tubes dissipate heat well and allow the use of much higher electric fields.
Chromosome Mapping
Mapping is the construction of a series of chromosome descriptions that
depict the position and spacing of unique, identifiable biochemical landmarks,
including some genes, that occur on the DNA of chromosomes.
Cloning
To Human Genome Project researchers, cloning refers to copying genes and other
pieces of chromosomes to generate enough identical material for further study.
Two other types of cloning produce complete, genetically identical animals.
Blastomere separation (sometimes called "twinning" after the naturally
occurring process that creates identical twins) involves splitting a developing
embryo soon after fertilization of the egg by a sperm (sexual reproduction) to
give rise to two or more embryos. The resulting organisms are identical twins
(clones) containing DNA from both the mother and the father. Dolly, on the other
hand, is the result of another type of cloning that produces an animal carrying
the DNA of only one parent. Using somatic cell nuclear transfer, scientists
transferred genetic material from the nucleus of an adult sheep's udder cell to
an egg whose nucleus, and thus its genetic material, had been removed. (All
cells that are not egg or sperm cells are somatic cells.)
For more background on the HGP, visit About Biotech, a website offering basic biological knowledge.