Clustering Low Quality Farsi Sub-words For Word Recognition
Venue: The 12th Iranian National Conference on Intelligent Systems
Year of publication: 1392 (Solar Hijri; 2013–14)
Document type: Conference paper
Language: English
Full text: 5 pages (PDF)
National scientific document ID: ICS12_181
Indexing date: 11 Mordad 1393 (Solar Hijri; 2 August 2014)
Abstract:
OCR of low-resolution documents is uncommon because such images pose many problems. However, there are now several archives of digital documents that were scanned at low resolution to consume less storage. These documents, which usually have a resolution of 100 to 150 dpi, need to be converted into searchable documents. This paper presents a new method for clustering low-quality printed Persian sub-words. Clustering is needed to reduce the number of sub-word classes and thereby improve the overall recognition rate. Two popular clustering methods, hierarchical and k-means, were implemented and compared. Local binary patterns (LBP) and zoning were used for feature extraction; both are fast and represent global shape information well. We also used different distance measures to assess the similarity of feature vectors. We applied our algorithms to a dataset of 10,700 images of distinct Persian sub-words at 96 dpi. Experimental results show that hierarchical clustering with the correlation distance measure performs best among the clustering methods and distance measures tested.
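The pipeline the abstract outlines (LBP and zoning features, then hierarchical clustering under a correlation distance) can be sketched in pure Python. This is a minimal illustration, not the authors' implementation, and every parameter is an assumption because the abstract specifies none of them: images are binarized lists of rows with ink = 1, LBP is the basic 8-neighbour 3×3 operator, zoning is the ink density in each cell of a grid, and the hierarchical variant is naive average linkage stopped at a target cluster count `k`. All function names and the toy images `A`, `B`, `C` are hypothetical.

```python
from itertools import combinations
from math import sqrt
from typing import List, Sequence

Image = Sequence[Sequence[int]]  # hypothetical format: binarized rows, ink = 1

# Eight neighbours of a 3x3 window, visited clockwise from the top-left.
_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_histogram(img: Image) -> List[float]:
    """Normalized 256-bin histogram of basic 3x3 LBP codes (border pixels skipped)."""
    hist = [0.0] * 256
    count = 0
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            center = img[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(_OFFSETS):
                if img[y + dy][x + dx] >= center:  # threshold each neighbour at the centre value
                    code |= 1 << bit
            hist[code] += 1
            count += 1
    return [v / count for v in hist] if count else hist


def zoning(img: Image, zr: int = 4, zc: int = 4) -> List[float]:
    """Ink density (fraction of 1-pixels) in each cell of a zr x zc grid, row-major."""
    h, w = len(img), len(img[0])
    feats = []
    for i in range(zr):
        r0, r1 = i * h // zr, (i + 1) * h // zr
        for j in range(zc):
            c0, c1 = j * w // zc, (j + 1) * w // zc
            cells = [img[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            feats.append(sum(cells) / len(cells) if cells else 0.0)
    return feats


def correlation_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - Pearson correlation; returns 1.0 when either vector is constant (assumed convention)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    du = [a - mu for a in u]
    dv = [b - mv for b in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return 1.0 - num / den if den else 1.0


def average_linkage(points, k, dist=correlation_distance) -> List[List[int]]:
    """Naive agglomerative clustering (average linkage) until k clusters remain."""
    d = {(i, j): dist(points[i], points[j]) for i, j in combinations(range(len(points)), 2)}
    clusters = [[i] for i in range(len(points))]

    def link(a, b):
        # Mean pairwise distance between the members of clusters a and b.
        return sum(d[min(i, j), max(i, j)] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: link(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a stays valid
    return clusters


# Hypothetical toy 4x4 "sub-words": two left-hand strokes of different thickness, one top bar.
A = [[1, 1, 0, 0]] * 4
B = [[1, 0, 0, 0], [1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 0, 0]]
C = [[1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]]

if __name__ == "__main__":
    feats = [zoning(img, 2, 2) for img in (A, B, C)]
    print(average_linkage(feats, k=2))  # [[0, 1], [2]]
```

In the toy run, `B`'s zoning vector is a scaled copy of `A`'s, so their correlation distance is 0 and they merge first, while the horizontal bar `C` stays apart. Because Pearson correlation ignores positive scaling and offset, this measure tolerates stroke-thickness variation, which may be one reason it suits degraded glyphs; that is a hedged interpretation, not a claim from the paper. k-means, the other method the abstract compares, is omitted here for brevity.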
Authors
Hamed ArabYarmohammadi
Faculty of Electrical and Robotic Engineering, Shahrood University of Technology, Shahrood, Iran
Alireza AhmadyFard
Faculty of Electrical and Robotic Engineering, Shahrood University of Technology, Shahrood, Iran
Hossein Khosravi
Faculty of Electrical and Robotic Engineering, Shahrood University of Technology, Shahrood, Iran