Journal of Zhejiang University SCIENCE
(ISSN 1009-3095, Monthly)
2005 Vol. 6A No. 11 p.1312-1317
A sustainable development OCR system in CADAL application
HUANG Chen¹, ZHAO Ji-hai¹, HU Xiao²
(¹Zhejiang University Libraries, Zhejiang University, Hangzhou 310027, China)
(²Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign, IL 61801, USA)
E-mail: chuang@lib.zju.edu.cn; jhzhao@lib.zju.edu.cn; xiaohu@uiuc.edu
Received Aug. 5, 2005; revision accepted Sept. 10, 2005
Abstract: This paper briefly introduces the main ideas behind a sustainable development optical character recognition (OCR) system based on open architecture techniques, and then describes the construction of an OCR center built on computer clusters. The purpose of the center is to dynamically improve the recognition precision of the digitized texts of the one million volumes produced by the China-US Million Books Digital Library (CADAL) Project. The experience gained at this center should provide a helpful reference for other digital library projects.
Key words: Sustainable development, Digital library, Optical character recognition (OCR), China-US Million Books Digital Library (CADAL)
doi:10.1631/jzus.2005.A1312 CLC number: TP391