[Intellectual contribution]

Genome annotation of Oryza sativa ssp. japonica cv. Nipponbare

Takeshi ITOH, Tsuyoshi TANAKA, Hiroaki SAKAI and Hisataka NUMA
Bioinformatics Research Unit

   An international collaborative effort completed the genome sequencing of Asian cultivated rice, Oryza sativa ssp. japonica cv. Nipponbare, in 2004. At that time, deciphering all of the genes in the genome remained a challenge, and meeting it was necessary for the genomic information to be fully utilized in further experimental studies. The process of assigning positions and functions to genes is called annotation, and it is now recognized as a crucial step toward understanding the biological roles of DNA and protein sequences. With this in mind, we organized a new international group for genome-wide annotation of rice, the Rice Annotation Project (RAP), composed of 35 institutions.
   While gene prediction is efficient in prokaryotic genomes, complex exon-intron structures hamper computational prediction of genes in higher eukaryotes. Full-length cDNAs (FLcDNAs) are therefore regarded as strong evidence for genes in these species. By comparing the rice FLcDNAs and ESTs with the genome sequence, we identified 29,550 loci. However, some loci are likely to lack cDNAs in the current clone libraries. Combining the cDNAs with gene predictions, we estimated the number of rice genes to be ~32,000. The average length of rice transcripts was longer than that of Arabidopsis (Arabidopsis thaliana) transcripts (Table 1). This was mainly because transposable elements were enriched in non-coding regions of the rice genome, so that introns and untranslated regions were longer in rice than in Arabidopsis.
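   As a rough illustration of how mapped cDNAs and ESTs can be grouped into loci, the following minimal sketch merges alignments that overlap on the same chromosome and strand. It is not the RAP pipeline: the overlap rule, the Alignment record and its field names, and the toy coordinates are all assumptions made for this example.

```python
from collections import defaultdict
from typing import NamedTuple


class Alignment(NamedTuple):
    """One cDNA/EST aligned to the genome (hypothetical record layout)."""
    transcript_id: str
    chrom: str
    strand: str  # "+" or "-"
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive


def cluster_into_loci(alignments):
    """Group alignments that overlap on the same chromosome and strand.

    Assumption: any positional overlap places two transcripts in one locus.
    """
    by_region = defaultdict(list)
    for aln in alignments:
        by_region[(aln.chrom, aln.strand)].append(aln)

    loci = []
    for _, group in sorted(by_region.items()):
        # Sweep left to right, extending the current locus while
        # the next alignment starts before the locus ends.
        group.sort(key=lambda a: (a.start, a.end))
        current = [group[0]]
        current_end = group[0].end
        for aln in group[1:]:
            if aln.start <= current_end:
                current.append(aln)
                current_end = max(current_end, aln.end)
            else:
                loci.append(current)
                current = [aln]
                current_end = aln.end
        loci.append(current)
    return loci


# Toy input (invented coordinates, for illustration only)
example = [
    Alignment("cdna_A", "chr01", "+", 100, 900),
    Alignment("est_B", "chr01", "+", 850, 1500),
    Alignment("cdna_C", "chr01", "-", 200, 700),
    Alignment("cdna_D", "chr01", "+", 5000, 6200),
]
loci = cluster_into_loci(example)
print(len(loci))  # 3 loci for this toy input
```

   Keeping the two strands separate means that overlapping transcripts on opposite strands are counted as distinct loci. A real pipeline would also need to weigh alignment quality and exon structure, which this sketch ignores.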
   To annotate the genes of the rice genome, a jamboree-style annotation meeting was held in Japan, and all of the functional descriptions were curated by experts. Because automated methods inevitably produce some erroneous annotations, curation of the computational results is essential before public release of a database. As a result, we assigned functions to 19,969 (70.0%) of 28,540 probable protein-coding loci. In addition, 131 convincing candidates for non-coding RNA genes were found. For details of the annotations, see http://rapdb.dna.affrc.go.jp/. Our comparison of the gene sets of rice and Arabidopsis suggested that over half of the genes were highly conserved during evolution, but that each species possessed thousands of species-specific genes (Fig. 1). These unique genes might be related to the characteristics of each species that underlie their evolutionary differences.
   A complete genome sequence is the basis for understanding the biology of a species as a whole. We expect that our curated genome annotation will contribute to future functional analyses of rice. Furthermore, the annotation information presented in this study will be an important resource for the genomics of rice cultivars as well as of other cereals such as wheat and barley.




Table 1  Comparison of O. sativa and A. thaliana transcripts


Fig. 1 Comparison of the gene sets between rice and Arabidopsis
The protein sequences of the genes were searched against UniProtKB. The proteins were classified by level of sequence conservation.

