Lua | String Pattern Replacement

local str = {
    ["red"] = "ff0000",
    ["blue"] = "4C9FFF",
    ["purple"] = "C33AFF",
    ["green"] = "53FF4A",
    ["gray"] = "E2E2E2",
    ["black"] = "000000",
    ["white"] = "ffffff",
    ["pink"] = "FB8DFF",
    ["orange"] = "FF8E1C",
    ["yellow"] = "FAFF52",
    --TODO Add Colors
}

function str:generate_string(text)
    assert(text and type(text) == "string")

    local function replaceColor(match)
        local colorName = match:sub(2)
        local colorValue = self[colorName]
        if colorValue then
            return "#" .. colorValue
        else
            return match
        end
    end

    local pattern = "(&%w+)"

    local result = text:gsub(pattern, replaceColor)

    return result
end

local text = "&whiteHello&white World"
print(str:generate_string(text))

Hello. I have this code. I want to pass a string to the function and have it replace every '&'-prefixed color name with its hexadecimal code. The code works fine as long as there is a space after the last letter of the color name. For example:

"Hello&green World" results in "Hello#53FF4A World"

However, if there is no space after the name, as in "Hello &greenWorld", the code behaves incorrectly and returns the string unchanged: "Hello &greenWorld". Is there a way to fix this issue? Also, if there's a more efficient way to implement this code, I'd appreciate your input.

I expect the code to produce this result: "Hello &greenWorld" should become "Hello #53FF4AWorld".


There are 3 answers

Answer by darkfrei

Please try a simpler solution: put the whole token, including the '&', in the key and the '#' in the value, e.g. ["&green"] = "#53FF4A".

Answer by Organis

I agree with @darkfrei: you seem to be overcomplicating things. This is much simpler and works:

local MAP = {
    ["&red"] = "#FF0000",
    ["&blue"] = "#4C9FFF",
    ["&purple"] = "#C33AFF",
    ["&green"] = "#53FF4A",
    ["&gray"] = "#E2E2E2",
    ["&black"] = "#000000",
    ["&white"] = "#FFFFFF",
    ["&pink"] = "#FB8DFF",
    ["&orange"] = "#FF8E1C",
    ["&yellow"] = "#FAFF52",
    --TODO Add Colors
}

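local function colorate(text)
    assert(text and type(text) == "string")

    local result = text

    -- pairs() visits the keys in no particular order; that is fine here
    -- because no color name in MAP is a prefix of another one
    for k, v in pairs(MAP) do
        result = result:gsub(k, v)
    end

    return result
end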

local text = "&whiteHello&white World Hello &greenWorld"
print(colorate(text))
-- Output: #FFFFFFHello#FFFFFF World Hello #53FF4AWorld
Answer by Luatic

However, if there is no space after the pattern, like: "Hello &greenWorld" Then the code behaves incorrectly

The reason for this is that %w "represents all alphanumeric characters", and + is greedy. So your match will be &greenWorld rather than just &green; the table lookup for greenWorld then returns nil, and no replacement is performed.
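A quick way to see this is to compare what the two character classes capture on the input from the question. This is a small illustrative sketch using string.match:

```lua
local s = "Hello &greenWorld"

-- %w matches letters of either case and digits, so the capture runs on
-- through "World"
print(s:match("&(%w+)"))    --> greenWorld

-- [a-z] only matches lower-case ASCII letters, so the capture stops at "W"
print(s:match("&([a-z]+)")) --> green
```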

Is there a way to fix this issue?

Is the color name delimited? HTML escape codes, for example, end with a semicolon, like this: &amp;.

If your color names are guaranteed not to be followed by a lowercase letter, you could replace %w with [a-z].
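Applied to the question's code, that is a one-pattern change. The sketch below keeps generate_string as written and only swaps %w for [a-z]; the color table is shortened for brevity, and it still assumes no lower-case letter directly follows a color name:

```lua
local str = {
    ["red"] = "ff0000",
    ["green"] = "53FF4A",
    ["white"] = "ffffff",
    -- other colors omitted for brevity
}

function str:generate_string(text)
    assert(text and type(text) == "string")

    local function replaceColor(match)
        local colorName = match:sub(2)   -- drop the leading '&'
        local colorValue = self[colorName]
        if colorValue then
            return "#" .. colorValue
        else
            return match                 -- unknown name: leave it as it was
        end
    end

    -- [a-z]+ instead of %w+: the name ends at the first character that is
    -- not a lower-case letter, so "&greenWorld" matches only "&green"
    local pattern = "(&[a-z]+)"

    local result = text:gsub(pattern, replaceColor)

    return result
end

print(str:generate_string("Hello &greenWorld")) --> Hello #53FF4AWorld
```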

Wanting to "map" some kind of "names" to "values" using gsub is a frequent occurrence, which is why gsub provides a special feature to make this particularly convenient: the replacement argument (the second argument in the method form text:gsub(pattern, repl)) need not be a function; it can also be a table. The first capture (or the whole match, if the pattern has no captures) will then be used to index this table. If the resulting value is nil or false, no substitution will happen. Otherwise, the entire match will be substituted with the value.

Using this, you could just write:

local name_to_colorcode = {
    ["red"] = "#FF0000",
    ["blue"] = "#4C9FFF",
    ["purple"] = "#C33AFF",
    ["green"] = "#53FF4A",
    ["gray"] = "#E2E2E2",
    ["black"] = "#000000",
    ["white"] = "#FFFFFF",
    ["pink"] = "#FB8DFF",
    ["orange"] = "#FF8E1C",
    ["yellow"] = "#FAFF52",
    -- TODO add more colors
}

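local text = "&whiteHello&white World"
-- the extra parentheses keep only gsub's first return value (the string)
print((text:gsub("&([a-z]+)", name_to_colorcode)))  --> #FFFFFFHello#FFFFFF World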
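To make the table-replacement behavior concrete: names missing from the table are left untouched, and gsub also returns the number of matches as a second value, so a bare print(text:gsub(...)) prints that count as well. A short sketch with a shortened table and an made-up unknown name (&mauve):

```lua
local name_to_colorcode = {
    ["green"] = "#53FF4A",
    ["white"] = "#FFFFFF",
}

local text = "&whiteHello &greenWorld &mauve"

-- gsub returns the new string *and* the number of matches
local result, count = text:gsub("&([a-z]+)", name_to_colorcode)

print(result) --> #FFFFFFHello #53FF4AWorld &mauve
print(count)  --> 3  (the unreplaced "&mauve" still counts as a match)
```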

This solution is pretty much strictly better than repeated substitution: It is more performant (linear time O(n) vs O(nm) where m is the number of words in your "map"). And yes, for a map of this size, the difference will be measurable in practice.
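If you want to check the performance claim on your own Lua build, a rough micro-benchmark along these lines compares the two approaches. The sample text and iteration count are arbitrary choices for illustration, and timings depend on the machine and Lua version, so no numbers are given here:

```lua
local names = {
    red = "#FF0000", blue = "#4C9FFF", purple = "#C33AFF", green = "#53FF4A",
    gray = "#E2E2E2", black = "#000000", white = "#FFFFFF", pink = "#FB8DFF",
    orange = "#FF8E1C", yellow = "#FAFF52",
}

-- one scan of the text; each "&name" is looked up in the table
local function single_pass(text)
    return (text:gsub("&([a-z]+)", names))
end

-- one full scan of the text per color, as in the multi-pass answer
local function multi_pass(text)
    for name, code in pairs(names) do
        text = text:gsub("&" .. name, code)
    end
    return text
end

-- synthetic input: the same short phrase repeated a few thousand times
local sample = string.rep("&green World &white Hello &pink! ", 2000)

local function seconds(f, iterations)
    local start = os.clock()
    for _ = 1, iterations do f(sample) end
    return os.clock() - start
end

print(("single pass: %.3f s"):format(seconds(single_pass, 50)))
print(("multi pass:  %.3f s"):format(seconds(multi_pass, 50)))
```

Both functions produce the same output on this input, so only the time differs.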

It is also less error-prone; there are no issues with substitutions happening "in the wrong order". Suppose you had &rose and &rosegold. This one-pass solution would greedily replace &rosegold with the appropriate color code. In the multi-pass replacement solution, you would have to be careful to first replace the longer names, then the prefixes of those.
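For the multi-pass variant, "longer names first" can be enforced by sorting the keys by length before substituting. A minimal sketch; the &rose and &rosegold entries and their color codes are hypothetical values chosen only to create a prefix clash:

```lua
-- Hypothetical colors: "&rose" is a prefix of "&rosegold"
local MAP = {
    ["&rose"] = "#FF007F",
    ["&rosegold"] = "#B76E79",
}

-- sort the keys longest first, so "&rosegold" is replaced before
-- "&rose" can consume its prefix
local keys = {}
for k in pairs(MAP) do keys[#keys + 1] = k end
table.sort(keys, function(a, b) return #a > #b end)

local function colorate(text)
    for _, k in ipairs(keys) do
        text = text:gsub(k, MAP[k])
    end
    return text
end

print(colorate("&rosegold and &rose")) --> #B76E79 and #FF007F

-- the single-pass version needs no ordering at all
local by_name = { rose = "#FF007F", rosegold = "#B76E79" }
print((("&rosegold and &rose"):gsub("&([a-z]+)", by_name))) --> #B76E79 and #FF007F
```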

That said, it has the limitation of requiring some kind of (implicit) delimiter. I would argue that this is not much of an issue in practice. Most sensible "escape" codes have this; consider for example string (backslash) escape codes or HTML escape codes ("entities"). If your spec requires you to treat &greenhelloworld as #53FF4Ahelloworld, I'd argue that that is quite confusing to tokenize for a human and thus probably a bad idea.
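With an explicit terminator, the same table-based gsub works no matter what follows the name. The sketch below assumes a hypothetical HTML-entity-like syntax where names end with ';' (the colorize name is also illustrative); the terminator is part of the match, so it is removed along with the name:

```lua
local name_to_colorcode = { green = "#53FF4A", white = "#FFFFFF" }

-- Hypothetical syntax: "&name;" with a mandatory ';' terminator
local function colorize(text)
    return (text:gsub("&(%a+);", name_to_colorcode))
end

print(colorize("&green;helloworld"))   --> #53FF4Ahelloworld
print(colorize("&white;Hello &nope;")) --> #FFFFFFHello &nope;
```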

If you wanted to write an optimally performant single-pass solution, you have two options:

  • Use some kind of proper regex engine for Lua (Lua patterns are not regular expressions and have no alternation operator). Then you could write a pattern of the form &(red|green|blue|...) and substitute based on that. This is very likely to be more performant than the multi-pass solution. In particular, with an automaton-based engine it can be linear time, where the constant factor is not influenced by how many colors you have.
  • Do it yourself, manually, using a trie (prefix tree). This is indeed probably not worth your time.
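For the curious, here is roughly what the trie option involves. This is a sketch, not a tuned implementation, and build_trie and replace_colors are hypothetical helper names: it builds a trie from a name-to-code table, then scans the text once, replacing each '&' followed by the longest known color name and copying unknown names through unchanged. It deliberately implements the undelimited behavior, so "&greenworld" becomes "#53FF4Aworld":

```lua
local function new_node() return { children = {} } end

-- Each node maps one character to a child node; `value` marks the end of
-- a complete color name
local function build_trie(map)
    local root = new_node()
    for name, code in pairs(map) do
        local node = root
        for i = 1, #name do
            local c = name:sub(i, i)
            node.children[c] = node.children[c] or new_node()
            node = node.children[c]
        end
        node.value = code
    end
    return root
end

local function replace_colors(text, trie)
    local out, i, n = {}, 1, #text
    while i <= n do
        local amp = text:find("&", i, true)    -- plain search for the next '&'
        if not amp then
            out[#out + 1] = text:sub(i)        -- no more '&': copy the rest
            break
        end
        out[#out + 1] = text:sub(i, amp - 1)   -- text before the '&'
        -- walk the trie as far as the text allows, remembering the longest
        -- complete name seen so far
        local node, best_value, best_end = trie, nil, nil
        local j = amp + 1
        while j <= n do
            node = node.children[text:sub(j, j)]
            if not node then break end
            if node.value then best_value, best_end = node.value, j end
            j = j + 1
        end
        if best_value then
            out[#out + 1] = best_value
            i = best_end + 1
        else
            out[#out + 1] = "&"                -- not a known name: keep the '&'
            i = amp + 1
        end
    end
    return table.concat(out)
end

-- rose/rosegold codes are hypothetical example values
local trie = build_trie({ green = "#53FF4A", white = "#FFFFFF",
                          rose = "#FF007F", rosegold = "#B76E79" })
print(replace_colors("Hello &greenworld", trie))   --> Hello #53FF4Aworld
print(replace_colors("&rosegold &roses &x", trie)) --> #B76E79 #FF007Fs &x
```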

If you do need undelimited names and you can't change the spec to require color names to be delimited one way or another, you should probably just go with the multi-pass solution until you have demonstrated a performance issue; do not optimize prematurely.

Also, if there's a more efficient way to implement this code, I'd appreciate your input.

Your code has three very minor issues:

  • You don't use a capture, so you resort to substringing. This is unnecessary.
  • You prepend the # to the strings on demand, rather than storing them in the table with the # already prepended. This is also (very slightly) wasteful.
  • You use a function when a table would suffice.

My proposed solution should address all three (not that they would be very relevant for performance).

Much more importantly, I find it to be much simpler, and by adding a minor requirement, it can handle at least the test case you have given correctly.