Customize Normalization in SQL Server Full Text Search by replacing characters

307 views Asked by At

I want to customize SQL Server FTS to handle language specific features better.

In many language like Persian and Arabic there are similar characters that in a proper search behavior they should consider as identical char like these groups:

  • ['آ' , 'ا' , 'ء' , 'ا']
  • ['ي' , 'ی' , 'ئ']

Currently my best solution is to store duplicate data in new column and replace these characters with a representative member and also normalize search term and perform search in the duplicated column.

Is there any way to tell SQL Server to treat any members of these groups as an identical character?

1

There are 1 answers

0
Sina Gz On

as far as i understand ,this would be used for suggestioning purposes so the being so accurate is not important. so in farsi actually none of the character in list above doesn't share same meaning but we can say they do have a shared short form in some writing cases ('آ' != 'اِ' but they both can write as 'ا' )

SCENARIO 1 : THE INPUT TEXT IS IN COMPLETE FORM imagine "محمّد" is a record in a table formatted (id int,text nvarchar(12))named as 'table'. after removing special character we can use following command :

select * from [db].[dbo].[table] where text REPLACE(text,' ّ ','') = REPLACE(N'محمد',' ّ ','');

the result would be

enter image description here

SCENARIO 2: THE INPUT IS IN SHORT FORMAT imagine "محمد" is a record in a table formatted (id int,text nvarchar(12))named as 'table'.

in this scenario we need to do some logical operation on text before we query in data base
for e.g. if "محمد" is input as we know and have a list of this special character ,it should be easily searched in query as :

select * from [db].[dbo].[table] where REPLACE(text,' ّ ','') = 'محمد';

note:


this solution is not exactly a best one because the input should not be affected in client side it, would be better if the sql server configure to handle this. for people who doesn't understand farsi simply he wanna tell sql that َA =["B","C"] and a have same value these character in the list so : when a "dad" word searched, if any word "dbd" or "dcd" exist return them too.

add:

some set of characters can have same meaning some of some times not ( ['ي','أ'] are same but ['آ','اِ'] not) so in we got first scenario :

select * from [db].[dbo].[table] where text like N'%هی[أي]ت' and text like N'هی[أي]ت%';