C# full-text search string formatting: remove adjacent duplicates and join words with 'AND'/'OR'

I'm looking for a solution in C# to format a search string entered by a user before passing it to a SQL query.

A full-text index is enabled on the table, and the query looks like the following:

    select [title] from publications where contains([title], @searchString)

Main issues:

1) Add 'OR' by default between two words (examples c and c-1 below).
2) Remove adjacent duplicates from the search string (examples a, b, b-1 and e below).
3) Remove a trailing 'AND'/'OR' at the end of the string (example d below).

Examples (input => output):

    a)   "oyster and oyster or fish and clean water" => "oyster or fish and clean or water"
    b)   "oyster and and fish and clean water" => "oyster and fish and clean or water"
    b-1) "oyster oyster fish fish clean and water" => "oyster or fish or clean and water"
    c)   "oyster fish" => "oyster or fish"
    c-1) "oyster fish clean water" => "oyster or fish or clean or water"
    d)   "oyster and" => "oyster"
    e)   "oyster and oyster" => "oyster"
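For reference, the rules the examples imply can be captured in a minimal sketch written independently of the code below. It assumes that 'and' and 'or' are the only connectors, that matching is case-insensitive, that a repeated word is dropped together with the keyword in front of it (which is what example a needs), that the first of several consecutive keywords wins, and that a dangling keyword at either end is discarded. The names `SearchStringFormatter` and `Format` are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class SearchStringFormatter
{
    // Connector keywords recognised in user input (assumption: only these two).
    private static readonly HashSet<string> Keywords = new HashSet<string> { "and", "or" };

    public static string Format(string input)
    {
        // Lower-case, treat commas as spaces, and drop empty tokens left by repeated spaces.
        string[] tokens = input.ToLowerInvariant()
            .Replace(",", " ")
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        var seen = new HashSet<string>();   // words already written to the output
        var output = new List<string>();
        string pending = null;              // keyword waiting for the next kept word

        foreach (string token in tokens)
        {
            if (Keywords.Contains(token))
            {
                // Keep only the first of several consecutive keywords ("and and" -> "and").
                if (pending == null)
                    pending = token;
                continue;
            }

            if (!seen.Add(token))
            {
                // Repeated word: drop it together with the keyword that led up to it.
                pending = null;
                continue;
            }

            // Connect to the previous word; "or" is the default connector.
            // A keyword before the very first word is simply ignored.
            if (output.Count > 0)
                output.Add(pending ?? "or");
            output.Add(token);
            pending = null;
        }

        // Any keyword still pending here was trailing, so it is not emitted.
        return string.Join(" ", output);
    }
}
```

Holding the keyword in `pending` instead of writing it straight to the output is what lets a duplicate take its connector with it; with that in place, example a comes out as "oyster or fish and clean or water".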

Current code (fails for cases a, b and b-1; works for c, c-1, d and e):

    string Format(string str)
    {
        List<string> searchKeywords = new List<string> { "and", "or" };
        // replace commas with spaces and convert to lower case
        str = str.Replace(",", " ").ToLower();

        Regex regex = new Regex(@"[ ]{2,}", RegexOptions.None);
        // collapse runs of two or more spaces into a single space
        str = regex.Replace(str, @" ");

        //split string 
        string[] strArray = str.Split(' ');

        List<string> outputArray = new List<string>();
        string output = "";
        string prevStr = "";
        string currStr = "";
        bool keywordFlag = false;
        bool duplicateFlag = false;

        //remove adjacent keyword or same words
        foreach (var item in strArray)
        {
            currStr = item.Trim();
            keywordFlag = searchKeywords.Contains(prevStr) && searchKeywords.Contains(currStr);
            duplicateFlag = outputArray.Contains(currStr) && !searchKeywords.Contains(currStr);
            if (!currStr.Equals(prevStr) && !keywordFlag && !duplicateFlag)
            {
                outputArray.Add(currStr);
                prevStr = currStr;
            }
        }

        if (outputArray.Count() == 2 && searchKeywords.Contains(outputArray[1]))
        {
            outputArray.Remove(outputArray[1]);
        }

        output = string.Join(" ", outputArray);
        if (output.Contains(" ") && !output.Contains("and") && !output.Contains("or"))
        {
            return string.Join(" or ", output.Split(' ').Select(I => I.Trim()));
        }
        return output;
    }


Current output:

    oyster and fish and clean water
    oyster and fish and clean water
    oyster fish clean and water
    oyster or fish or clean or water
    oyster or fish
    oyster
    oyster

Answer by Saggio (best answer)

Since you haven't shown what you've done so far, I'm assuming you haven't started on a solution, so here's a high-level algorithm:

Use String.Split(' ') to split the search string on spaces.

Then use a foreach loop over the resulting array of strings and build the result with string concatenation:

- If a word other than 'or'/'and' has already been used, don't add it to the resulting string.
- If the previous word was 'or' or 'and' and the current one is too, don't add it to the resulting string.
- If neither the previous word nor the current one is 'or'/'and', add 'or' to the resulting string before the current word.
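A minimal sketch of the three rules above, building the result with a StringBuilder. It assumes "previous word" means the last word actually appended; the names `HighLevelSketch` and `Format` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class HighLevelSketch
{
    private static readonly HashSet<string> Keywords = new HashSet<string> { "and", "or" };

    public static string Format(string searchString)
    {
        var result = new StringBuilder();
        var used = new HashSet<string>();   // words appended so far
        string previous = null;             // last word appended

        foreach (string word in searchString.ToLowerInvariant().Split(' '))
        {
            if (word.Length == 0)
                continue; // skip empty entries produced by repeated spaces

            bool isKeyword = Keywords.Contains(word);
            bool previousIsKeyword = previous != null && Keywords.Contains(previous);

            // Rule 1: a non-keyword word that was already used is skipped.
            if (!isKeyword && used.Contains(word))
                continue;
            // Rule 2: a keyword directly after another keyword is skipped.
            if (isKeyword && previousIsKeyword)
                continue;
            // Rule 3: two ordinary words in a row get "or" between them.
            if (!isKeyword && previous != null && !previousIsKeyword)
                result.Append(" or");

            if (result.Length > 0)
                result.Append(' ');
            result.Append(word);
            used.Add(word);
            previous = word;
        }
        return result.ToString();
    }
}
```

On the question's examples these rules alone reproduce b, b-1, c and c-1, but not d or a: "oyster and" stays "oyster and", and case a gives "oyster and fish and clean or water", because the 'and' in front of the dropped duplicate is kept instead of the 'or' after it.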

EDIT: Now that the code has been posted, I can see what's wrong.

This conditional:

    if (output.Contains(" ") && !output.Contains("and") && !output.Contains("or"))
    {
        return string.Join(" or ", output.Split(' ').Select(I => I.Trim()));
    }

only runs when the output contains no 'and' or 'or' at all.

Instead, check whether 'or' needs to be added inside your foreach loop, and get rid of that conditional.

For example:

            foreach (var item in strArray)
            {
                currStr = item.Trim();
                keywordFlag = searchKeywords.Contains(prevStr) && searchKeywords.Contains(currStr);
                duplicateFlag = outputArray.Contains(currStr) && !searchKeywords.Contains(currStr);
                if (!currStr.Equals(prevStr) && !keywordFlag && !duplicateFlag)
                {
                    // two ordinary words in a row: join them with 'or'
                    if (!searchKeywords.Contains(prevStr) && !searchKeywords.Contains(currStr) && prevStr != "")
                    {
                        outputArray.Add("or");
                    }
                    outputArray.Add(currStr);
                    prevStr = currStr;
                }
            }

Also, where you check whether there are only 2 tokens in the array, you only account for 'or' or 'and' coming after a word. What happens if the input string is "or Oyster"? The leading 'or' is never removed, so the result is "or oyster".

You need to account for that:

            if (outputArray.Count() == 2)
            {
                if(searchKeywords.Contains(outputArray[0]))
                    outputArray.Remove(outputArray[0]);
                else if(searchKeywords.Contains(outputArray[1]))
                    outputArray.Remove(outputArray[1]);
            }
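The two-token check only fires when exactly one word and one keyword remain. With the loop above, an input such as "oyster fish and" yields the four tokens oyster, or, fish, and, so the trailing 'and' survives. A minimal sketch of a more general alternative, which trims keywords from both ends of the token list whatever its length; `KeywordTrimmer` and `TrimKeywords` are hypothetical names:

```csharp
using System.Collections.Generic;

public static class KeywordTrimmer
{
    private static readonly HashSet<string> Keywords = new HashSet<string> { "and", "or" };

    // Removes every leading and trailing "and"/"or" token in place.
    public static void TrimKeywords(List<string> tokens)
    {
        while (tokens.Count > 0 && Keywords.Contains(tokens[0]))
            tokens.RemoveAt(0);
        while (tokens.Count > 0 && Keywords.Contains(tokens[tokens.Count - 1]))
            tokens.RemoveAt(tokens.Count - 1);
    }
}
```

Calling KeywordTrimmer.TrimKeywords(outputArray) after the loop would take the place of the Count() == 2 block.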
Answer by BeingDev

Not sure if this is the correct answer; thank you very much, @saggio, for the suggestions.

    private string FormatSearchString(string str)
    {
        List<string> searchKeywords = new List<string> { "and", "or" };
        // replace commas with spaces and convert to lower case
        str = str.Replace(",", " ").ToLower();

        Regex regex = new Regex(@"[ ]{2,}", RegexOptions.None);
        // collapse runs of two or more spaces into a single space
        str = regex.Replace(str, @" ");

        //split string 
        string[] strArray = str.Split(' ');

        List<string> outputArray = new List<string>();
        string output = "";
        string prevStr = "";
        string currStr = "";
        bool keywordFlag = false;
        bool duplicateFlag = false;

        //remove adjacent keyword or same words
        foreach (var item in strArray)
        {
            currStr = item.Trim();
            keywordFlag = searchKeywords.Contains(prevStr) && searchKeywords.Contains(currStr);
            duplicateFlag = outputArray.Contains(currStr) && !searchKeywords.Contains(currStr);
            if (!currStr.Equals(prevStr) && !keywordFlag && !duplicateFlag)
            {
                if (!searchKeywords.Contains(prevStr) && !searchKeywords.Contains(currStr) && prevStr != "")
                {
                    outputArray.Add("or");
                }
                outputArray.Add(currStr);
                prevStr = currStr;
            }
        }

        if (outputArray.Count() == 2)
        {
            if (searchKeywords.Contains(outputArray[0]))
                outputArray.Remove(outputArray[0]);
            else
                outputArray.Remove(outputArray[1]);
        }

        output = string.Join(" ", outputArray);

        return output;
    }