[Intellectual contribution]
An international collaborative effort completed the genome sequencing of Asian
cultivated rice, Oryza sativa ssp. japonica cv. Nipponbare, in 2004. At that time,
it was still a challenge to decipher all of the genes in the genome so that the
genomic information could be fully utilized in further experimental studies. The
process of assigning gene positions and functions is called annotation. Annotation
is now recognized as a crucial step toward understanding the biological role of a
DNA or protein sequence. With this in mind, we decided to organize a new
international group for genome-wide annotation of rice, the Rice Annotation
Project (RAP), which was composed of 35 institutions.
While gene prediction works well for prokaryotic genomes, complex exon-intron
structures hamper computational prediction of genes in higher eukaryotes.
Therefore, full-length cDNAs (FLcDNAs) are thought to provide strong evidence of
genes in these species. We compared the rice FLcDNAs and ESTs with the genome
sequence and identified 29,550 loci. However, some loci are likely to lack cDNAs
in the current clone libraries. Using both the cDNAs and gene predictions, we
estimated the number of rice genes to be ~32,000. Rice transcripts were longer on
average than those of Arabidopsis (Arabidopsis thaliana) (Table 1). This was
mainly because transposable elements were enriched in non-coding regions of the
rice genome, so that introns and untranslated regions were longer in rice than in
Arabidopsis.
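The step of defining loci from cDNAs and ESTs placed on the genome can be
illustrated with a minimal sketch. It assumes that each transcript has already
been aligned to the genome, and that alignments overlapping on the same
chromosome and strand belong to one locus. The actual criteria used by RAP are
not given in this summary, so the overlap rule, the Alignment record, and the
function name cluster_into_loci are all hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Alignment(NamedTuple):
    """Genomic span of one mapped cDNA or EST (hypothetical record layout)."""
    transcript_id: str
    chrom: str
    strand: str  # "+" or "-"
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive


def cluster_into_loci(alignments):
    """Group alignments that overlap on the same chromosome and strand.

    Assumption: any positional overlap merges two transcripts into one locus.
    Returns a list of loci, each a list of Alignment records.
    """
    # Transcripts on different chromosomes or strands never share a locus.
    by_region = defaultdict(list)
    for aln in alignments:
        by_region[(aln.chrom, aln.strand)].append(aln)

    loci = []
    for key in sorted(by_region):
        current, current_end = [], None
        # Sweep left to right, extending the open locus while spans overlap.
        for aln in sorted(by_region[key], key=lambda a: (a.start, a.end)):
            if current and aln.start <= current_end:
                current.append(aln)
                current_end = max(current_end, aln.end)
            else:
                if current:
                    loci.append(current)
                current, current_end = [aln], aln.end
        if current:
            loci.append(current)
    return loci
```

Counting the returned groups gives the number of transcript-supported loci under
this assumption. A stricter rule, such as requiring shared exons rather than any
positional overlap, would split some of these groups.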
To annotate the genes of the rice genome, a jamboree-style annotation meeting was
held in Japan, and all of the functional descriptions were curated by experts.
Because automated methods inevitably produce some erroneous annotations, manual
curation of computational results is essential before a database is released to
the public. As a result, we assigned functions to 19,969 (70.0%) of 28,540
probable protein-coding loci. In addition, 131 convincing candidates for
non-coding RNA genes were found. For details of the annotations, see
http://rapdb.dna.affrc.go.jp/.
Our comparison of the gene sets of rice and Arabidopsis suggested that over half
of the genes were highly conserved during evolution, but that each species
possessed thousands of species-specific genes (Fig. 1). These unique genes might
be related to the characteristics that led to the evolutionary differences
between the species.
A complete genome sequence is a basis for understanding the full range of
biological processes in a species. We expect that our curated genome annotation
will contribute to future functional analyses of rice. Furthermore, the
annotation information presented in this study will be an important resource for
the genomics of rice cultivars as well as of other cereals such as wheat and
barley.
Table 1 Comparison of O. sativa and A. thaliana transcripts
Fig. 1 Comparison of the gene sets between rice and Arabidopsis. The protein sequences of the genes were searched against UniProtKB. The proteins were classified by level of sequence conservation.