I'm looking for a solution in C# to format a search string entered by the user before passing it to a SQL query.
A full-text index is enabled on the table, and the query looks like this:

    select [title] from publications where contains([title], @searchString)
Main issues:

1) Add `OR` by default between two words (examples c and c-1 below).
2) Remove adjacent duplicates from the search string (examples a, b, b-1 and e below).
3) Remove a trailing `AND` or `OR` from the end of the string (example d below).
Examples (input => output):

    a)   "oyster and oyster or fish and clean water" => "oyster or fish and clean OR water"
    b)   "oyster and and fish and clean water" => "oyster and fish and clean OR water"
    b-1) "oyster oyster fish fish clean and water" => "oyster or fish or clean and water"
    c)   "oyster fish" => "oyster or fish"
    c-1) "oyster fish clean water" => "oyster or fish or clean or water"
    d)   "oyster and" => "oyster"
    e)   "oyster and oyster" => "oyster"
Current code (which fails for cases a, b and b-1; it works for c-1, d and e):

    string Format(string str)
    {
        List<string> searchKeywords = new List<string> { "and", "or" };
        // convert to lower case
        str = str.Replace(",", " ").ToLower();
        Regex regex = new Regex(@"[ ]{2,}", RegexOptions.None);
        // remove extra whitespace with space
        str = regex.Replace(str, @" ");
        // split string
        string[] strArray = str.Split(' ');
        List<string> outputArray = new List<string>();
        string output = "";
        string prevStr = "";
        string currStr = "";
        bool keywordFlag = false;
        bool duplicateFlag = false;
        // remove adjacent keyword or same words
        foreach (var item in strArray)
        {
            currStr = item.Trim();
            keywordFlag = searchKeywords.Contains(prevStr) && searchKeywords.Contains(currStr);
            duplicateFlag = outputArray.Contains(currStr) && !searchKeywords.Contains(currStr);
            if (!currStr.Equals(prevStr) && !keywordFlag && !duplicateFlag)
            {
                outputArray.Add(currStr);
                prevStr = currStr;
            }
        }
        if (outputArray.Count() == 2 && searchKeywords.Contains(outputArray[1]))
        {
            outputArray.Remove(outputArray[1]);
        }
        output = string.Join(" ", outputArray);
        if (output.Contains(" ") && !output.Contains("and") && !output.Contains("or"))
        {
            return string.Join(" or ", output.Split(' ').Select(I => I.Trim()));
        }
        return output;
    }
![Current output][1]

    oyster and fish and clean water
    oyster and fish and clean water
    oyster fish clean and water
    oyster or fish or clean or water
    oyster or fish
    oyster
    oyster
Since you haven't shown what you've done so far, I'm assuming that you haven't started on a solution, so here's a high-level algorithm:

In that case, use `String.Split(' ')` to split the search string on each space. Use a `foreach` loop over the resulting array of strings and build the result with string concatenation. If a word that is not `or` or `and` was already used before, don't add it to the resulting string. If the previous word was `or` or `and` and the current one also is, don't add it to the resulting string. If the previous word wasn't `or` or `and` and the current one isn't either, add `or` to the resulting string before the current word.

EDIT: Now that the code has been posted, I can see what's wrong.
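The body of this conditional, `if (output.Contains(" ") && !output.Contains("and") && !output.Contains("or"))`, only runs if the output doesn't contain any instance of `and` or `or`, so a string that mixes explicit operators with bare word pairs never gets the missing `or` inserted.

Do the check to see if `or` needs to be added within your `foreach` loop, and get rid of that conditional.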
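
As a rough sketch of moving the check into the loop, something like the following could work. The class name `SearchStringFormatter` and the "pending operator" bookkeeping are my own illustration, not code from the question. It assumes that only adjacent duplicate words should be dropped, and that when several operators appear in a row the first one wins.

```csharp
using System;
using System.Collections.Generic;

public static class SearchStringFormatter
{
    // Operators recognised in the user's input (hypothetical helper set).
    private static readonly HashSet<string> Operators =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "and", "or" };

    public static string Format(string input)
    {
        // Treat commas as separators and drop empty tokens caused by repeated spaces.
        string[] tokens = input.ToLowerInvariant()
            .Split(new[] { ' ', ',' }, StringSplitOptions.RemoveEmptyEntries);

        var output = new List<string>();
        string pendingOperator = ""; // operator seen since the last emitted word
        string lastWord = "";        // last search word that was emitted

        foreach (string token in tokens)
        {
            if (Operators.Contains(token))
            {
                // Assumption: keep only the first of several consecutive operators.
                if (pendingOperator.Length == 0)
                {
                    pendingOperator = token;
                }
                continue;
            }

            if (token == lastWord)
            {
                // Adjacent duplicate: skip it together with the operator in front of it.
                pendingOperator = "";
                continue;
            }

            if (output.Count > 0)
            {
                // Default to "or" when the user gave no operator between two words.
                output.Add(pendingOperator.Length > 0 ? pendingOperator : "or");
            }

            output.Add(token);
            lastWord = token;
            pendingOperator = "";
        }

        // Operators are only written in front of a word, so a leading or
        // trailing "and"/"or" never reaches the output.
        return string.Join(" ", output);
    }
}
```

Because an operator is only emitted once the next word is known, the separate trailing-operator check and the final `Contains` test are no longer needed. For example, `SearchStringFormatter.Format("oyster and")` returns `"oyster"`.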
Also, where you're checking to see if there are only 2 tokens in the array, you're only accounting for the case where they put `or` or `and` after a word. What happens if they put in `or Oyster` as an input string? The leading `or` would be left at the start of the resulting string, so you need to account for that as well.
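
If you would rather keep your existing loop, one possible way to cover both ends is a small helper that strips operators from the start and the end of the token list, whatever its length. `OperatorTrimmer` and `TrimOperators` are hypothetical names used only for this sketch.

```csharp
using System;
using System.Collections.Generic;

public static class OperatorTrimmer
{
    // Operators to strip (compared case-insensitively).
    private static readonly HashSet<string> Operators =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "and", "or" };

    // Removes operator tokens from both ends of the list, in place.
    public static void TrimOperators(List<string> tokens)
    {
        // Drop leading operators, e.g. "or oyster" -> "oyster".
        while (tokens.Count > 0 && Operators.Contains(tokens[0]))
        {
            tokens.RemoveAt(0);
        }

        // Drop trailing operators, e.g. "oyster and" -> "oyster".
        while (tokens.Count > 0 && Operators.Contains(tokens[tokens.Count - 1]))
        {
            tokens.RemoveAt(tokens.Count - 1);
        }
    }
}
```

Calling `OperatorTrimmer.TrimOperators(outputArray);` in place of the `outputArray.Count() == 2` check would handle lists of any length, not just two tokens.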