String Similarity Based on Phonetic in the Gujarati Language Using Gujsim Algorithm

EasyChair Preprint 2153

9 pages • Date: December 12, 2019

Abstract

Searching the top 10 search engines for “ગાન્ધીજી” and “ગાંધીજી” gives surprisingly different results from one spelling to the other, even though both strings are correct in Gujarati. A string similarity algorithm is therefore useful for text mining applications. Basic string similarity compares the two strings character by character, but this may not give accurate results for Gujarati, whose rich script allows different writing styles depending on matras, reph, vatu, and diacritics on simple and compound letters. The GUJSIM (GUJarati SIMilarity) algorithm is a hybrid approach to string similarity for the Gujarati language. The authors compare 70 string pairs, and the GUJSIM algorithm gives good percentage results.
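
To illustrate why character-by-character comparison underrates the two spellings above, here is a minimal Python sketch. It is not the GUJSIM algorithm, which this abstract does not specify. The function names (levenshtein, similarity, normalize_nasal) and the single normalization rule (a half "na" before a dental consonant rewritten as an anusvara) are assumptions made for illustration only.

```python
import re

# The two spellings from the abstract, written as explicit code points.
# "ગાન્ધીજી": ga, aa-matra, na, virama, dha, ii-matra, ja, ii-matra
WITH_HALF_NA = "\u0a97\u0abe\u0aa8\u0acd\u0aa7\u0ac0\u0a9c\u0ac0"
# "ગાંધીજી": ga, aa-matra, anusvara, dha, ii-matra, ja, ii-matra
WITH_ANUSVARA = "\u0a97\u0abe\u0a82\u0aa7\u0ac0\u0a9c\u0ac0"


def levenshtein(a: str, b: str) -> int:
    """Plain edit distance over Unicode code points."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the longer string's length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


# Hypothetical rule: "na" + virama directly before a dental consonant
# (U+0AA4..U+0AA7: ta, tha, da, dha) is treated as an anusvara.
HALF_NA_BEFORE_DENTAL = re.compile("\u0aa8\u0acd(?=[\u0aa4-\u0aa7])")


def normalize_nasal(text: str) -> str:
    """Rewrite the half-na spelling to the anusvara spelling (assumed rule)."""
    return HALF_NA_BEFORE_DENTAL.sub("\u0a82", text)


if __name__ == "__main__":
    print(similarity(WITH_HALF_NA, WITH_ANUSVARA))  # 0.75
    print(similarity(normalize_nasal(WITH_HALF_NA),
                     normalize_nasal(WITH_ANUSVARA)))  # 1.0
```

The raw comparison scores the pair at 0.75 because the half "na" occupies two code points where the anusvara occupies one. After the assumed normalization the two strings are identical. A real phonetic approach would need rules for the other nasal classes, reph, vatu, and matras as well.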

Keyphrases: Gujarati language, phonetic, string distance, string similarity

Links:

https://easychair.org/publications/preprint/TzRC

BibTeX entry

BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:

@booklet{EasyChair:2153,
  author    = {Chandrakant Patel and Jayesh M Patel},
  title     = {String Similarity Based on Phonetic in the Gujarati Language Using Gujsim Algorithm},
  howpublished = {EasyChair Preprint 2153},
  year      = {EasyChair, 2019}}
