German characters not encoded correctly


I have the following function, which URL-encodes a string. It loops through the string one character at a time and percent-encodes every character that is not in the allowed set.

CREATE FUNCTION [dbo].[UrlEncode](@url nvarchar(MAX))
    RETURNS nvarchar(3072)
AS
BEGIN
    DECLARE @c nchar(1);
    DECLARE @count int = LEN(@url);
    DECLARE @i int = 1;
    DECLARE @urlReturn nvarchar(max) = '';

    WHILE (@i <= @count)
    BEGIN
        SET @c = SUBSTRING(@url, @i, 1);
    
        IF @c LIKE N'[A-Za-z0-9()''*\-._!~]' COLLATE Latin1_General_BIN ESCAPE N'\' COLLATE Latin1_General_BIN
        BEGIN
            SET @urlReturn = CONCAT(@urlReturn, @c);
        END
        ELSE
        BEGIN
            SET @urlReturn = CONCAT(@urlReturn, '%',
                SUBSTRING(sys.fn_varbintohexstr(CAST(@c AS varbinary(MAX))), 3, 2),
                ISNULL(NULLIF(SUBSTRING(sys.fn_varbintohexstr(CAST(@c AS varbinary(MAX))), 5, 2), '00'), ''));
        END
    
        SET @i = @i + 1;
    END

    RETURN @urlReturn;
END

When I do:

SELECT dbo.UrlEncode('ü')

I get: %fc

However, the correctly encoded value should be %C3%BC.

What am I missing?

Answer by Thom A

I took a completely different direction here. Using a WHILE loop to achieve this will be awfully slow, and so would a scalar function. I switched to a set-based approach and designed it as an inline table-valued function (iTVF) instead.

This could very likely be streamlined, but it's just the direction I went as I wrote the solution:

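CREATE FUNCTION [dbo].[UrlEncode](@url nvarchar(MAX))
RETURNS table
AS RETURN
    -- Prefix each two-character hex pair with '#' and aggregate the pairs in character order, then byte order.
    SELECT LOWER(CONCAT('#',STRING_AGG(SUBSTRING(B.VB,BGS.value,2),'#') WITHIN GROUP (ORDER BY GS.value, BGS.value))) AS UrlReturn
    -- Convert the input to UTF-8 by casting it to varchar under a UTF-8 collation.
    FROM (VALUES(CONVERT(varchar(MAX),@url COLLATE LATIN1_GENERAL_100_CI_AS_SC_UTF8)))V(url)
         -- One row per character position in the input.
         CROSS APPLY GENERATE_SERIES(CONVERT(bigint,1), LEN(@url)) GS
         -- The single character at that position.
         CROSS APPLY (VALUES(SUBSTRING(V.url,GS.value,1)))SS(C)
         -- That character's UTF-8 bytes as a hex string (style 2 omits the 0x prefix).
         CROSS APPLY (VALUES(CONVERT(varchar(8),CONVERT(varbinary(10),SS.C),2)))B(VB)
         -- One row per byte: offsets 1, 3, 5, ... into the hex string.
         CROSS APPLY GENERATE_SERIES(1, LEN(B.VB),2) BGS;
GO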
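
For context, here is a minimal sketch (not part of the original answer) of why the question's function returns %fc, and of how an iTVF like the one above is called. It assumes SQL Server 2022 or later, since GENERATE_SERIES is not available before that, and that dbo.UrlEncode has been created as the iTVF above. The aliases V, Url and UE are illustrative names only.

```sql
-- Assumption: SQL Server 2019+ for the UTF-8 collation, 2022+ for the function call below.
DECLARE @c nchar(1) = N'ü';

-- Compare the bytes of the same character in the two encodings.
SELECT CONVERT(varbinary(4), @c) AS Utf16Bytes,   -- nchar is stored as little-endian UTF-16
       CONVERT(varbinary(4),
               CONVERT(varchar(4), @c COLLATE Latin1_General_100_CI_AS_SC_UTF8)) AS Utf8Bytes;

-- An inline table-valued function is called in the FROM clause (here via CROSS APPLY),
-- not in the SELECT list like the scalar function in the question.
SELECT UE.UrlReturn
FROM (VALUES (N'ü')) AS V(Url)
     CROSS APPLY dbo.UrlEncode(V.Url) AS UE;
```

An nchar value is stored as UTF-16, so ü (U+00FC) is the bytes FC 00. The question's function emits the first byte and drops the 00, which is where %fc comes from. Casting to varchar under a UTF-8 collation yields the bytes C3 BC instead, which is what percent-encoding expects. Note that the function as posted separates the bytes with '#', lowercases the hex, and encodes every character, including letters and digits. To match the expected %C3%BC, the separator, the casing, and the unreserved-character check from the question would still need to be adapted.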