I need to search for fields and values within a text and turn them into an object.
Example text:
// condition 1
<@if VERSION = "A1" || VERSION = "A3">
<@assign CTA = "blue">
<@assign CTA2 = "green">
<@assign TEXT1 = "Hello<br/>World">
<@elseif VERSION = "A2">
<@assign CTA = "red">
<@assign CTA2 = "yellow">
<@assign CTA3 = "brown">
<@assign TEXT1 = "Click <a href='https://example.com' style='text-decoration:none;color:#000000;'>here</a>">
<@else>
<@assign CTA = "black">
<@assign CTA2 = "white">
<@assign CTA3 = "pink">
</@if>
// condition 2
<@if VERSION = "A4" || VERSION = "A5">
<@assign CTA = "purple">
<@assign CTA2 = "orange">
<@assign TEXT1 = "Hi <span style='font-weight:bold;'>John</span>">
</@if>
// condition 3
<@if LANG = "en_US">
<@assign TITLE = "English">
<@else>
<@assign TITLE = "French">
</@if>
If a condition contains "@assign", an object must be constructed from it.
The code I'm trying:
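const jsonObj = [];
// visit every <@if ...> ... </@if> block of the markup
const hidden_text = html_c.replace(/<@IF[\s\S]*?<\/@IF>/gi, function (i) {
  // visit the opening <@if ...> tag of the block
  i = i.replace(/<@IF[\s\S]*?>/gi, function (k) {
    let $ogg;
    const item = {};
    // the text between "@if" and the next "=" becomes the field name (e.g. VERSION)
    k = k.replace(/(^(?!.*@IF)|(?<=@IF)).*?((?=\=))/gi, function (x) {
      x = x.replace(/^\s+|\s+$|\s+(?=\s)/g, "");
      item[x] = [];
      $ogg = x;
      return x;
    });
    jsonObj.push(item);
    const item2 = {};
    // every double-quoted value (quotes included, e.g. "A1") becomes a key of item2
    k = k.replace(/"[\s\S]*?"/gi, function (y) {
      item2[y] = [];
      return y;
    });
    item[$ogg].push(item2);
    return k;
  });
  return i;
});
console.log(jsonObj);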
<script>
const html_c = `
<@if VERSION = "A1" || VERSION = "A3">
<@assign CTA = "blue">
<@assign CTA2 = "green">
<@assign TEXT1 = "Hello<br/>World">
<@elseif VERSION = "A2">
<@assign CTA = "red">
<@assign CTA2 = "yellow">
<@assign CTA3 = "brown">
<@assign TEXT1 = "Click <a href='https://example.com'
style='text-decoration:none;color:#000000;'>here</a>">
</@if>
// condition 2
<@if VERSION = "A4" || VERSION = "A5">
<@assign CTA = "purple">
<@assign CTA2 = "orange">
<@assign TEXT1 = "Hi <span style='font-
weight:bold;'>John</span>">
</@if>
// condition 3
<@if LANG = "en_US">
<@assign TITLE = "English">
<@else>
<@assign TITLE = "French">
</@if>
`;
</script>
With this code I can create the first part of the object, but I don't know how to continue. Moreover, if there is more than one condition with the same field name (e.g. VERSION), a new object is created, whereas I would like it to update the existing one.
The result I want to get is the following, considering that:
- "VERSION" could have any other name; the script must take the name it finds.
- The values of the <@assign> variables may contain HTML code.
- The if and elseif conditions could also use the double operator ==.
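Collecting the field name instead of hard-coding VERSION, while accepting both = and ==, could look like the following minimal sketch. parseCondition is a hypothetical helper; it assumes field names made of letters, digits, "_" or "-" and double-quoted condition values that contain no further double quotes.

```javascript
// Hypothetical helper: list the field/value pairs of one <@if ...> or <@elseif ...> tag.
// Assumes names made of letters, digits, "_" or "-" and double-quoted values;
// "==?" accepts both "=" and "==" as the comparison operator.
function parseCondition(tag) {
  const pairPattern = /([\p{L}\p{N}_-]+)\s*==?\s*"([^"]*)"/gu;
  return [...tag.matchAll(pairPattern)].map(([, field, value]) => ({ field, value }));
}

console.log(JSON.stringify(parseCondition('<@if VERSION = "A1" || VERSION == "A3">')));
// [{"field":"VERSION","value":"A1"},{"field":"VERSION","value":"A3"}]
```

Because the field name is captured rather than hard-coded, the same pattern yields LANG for condition 3.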
In the case of the first condition:
[{
  "VERSION": [{
    "A1": [{
      "CTA": "blue",
      "CTA2": "green",
      "TEXT1": "Hello<br/>World",
    }, ],
    "A3": [{
      "CTA": "blue",
      "CTA2": "green",
      "TEXT1": "Hello<br/>World",
    }, ],
    "A2": [{
      "CTA": "red",
      "CTA2": "yellow",
      "CTA3": "brown",
      "TEXT1": "Click <a href='https://example.com' style='text-decoration:none;color:#000000;'>here</a>",
    }, ],
    "ELSE": [{
      "CTA": "black",
      "CTA2": "white",
      "CTA3": "pink",
    }, ],
  }]
}, ]
Afterwards, if there is another condition that contains '@assign', the object must be updated.
In the case of condition 2, the field 'VERSION' already exists within the object, so it has to be updated by adding the values found:
"A4": [{
  "CTA": "purple",
  "CTA2": "orange",
  "TEXT1": "Hi <span style='font-weight:bold;'>John</span>",
}, ],
"A5": [{
  "CTA": "purple",
  "CTA2": "orange",
  "TEXT1": "Hi <span style='font-weight:bold;'>John</span>",
}, ],
In the case of condition 3, the LANG field does not exist in the object yet and therefore has to be created:
"LANG": [{
  "en_US": [{
    "TITLE": "English",
  }, ],
}]
The variables declared in a possible <@else> branch update that field's "ELSE" entry.
Final object:
[{
  "VERSION": [{
    "A1": [{
      "CTA": "blue",
      "CTA2": "green",
      "TEXT1": "Hello<br/>World",
    }, ],
    "A3": [{
      "CTA": "blue",
      "CTA2": "green",
      "TEXT1": "Hello<br/>World",
    }, ],
    "A2": [{
      "CTA": "red",
      "CTA2": "yellow",
      "CTA3": "brown",
      "TEXT1": "Click <a href='https://example.com' style='text-decoration:none;color:#000000;'>here</a>",
    }, ],
    "A4": [{
      "CTA": "purple",
      "CTA2": "orange",
      "TEXT1": "Hi <span style='font-weight:bold;'>John</span>",
    }, ],
    "A5": [{
      "CTA": "purple",
      "CTA2": "orange",
      "TEXT1": "Hi <span style='font-weight:bold;'>John</span>",
    }, ],
    "ELSE": [{
      "CTA": "black",
      "CTA2": "white",
      "CTA3": "pink",
    }, ],
  }],
  "LANG": [{
    "en_US": [{
      "TITLE": "English",
    }, ],
    "ELSE": [{
      "TITLE": "French",
    }, ],
  }],
}, ]
UPDATE
Condition 4
If a new condition calls up an existing field and value pair, the object must update. For example:
<@if VERSION = "A1">
<@assign CTA = "black">
<@assign TEXT2 = "Hello world">
</@if>
VERSION: "A1" has already been created in the object, so it needs to be updated:
- CTA is already present within it, so the value will be replaced with "black"
- TEXT2 was not yet present, so it will be added
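The create-or-update rule (new fields and values are created, existing keys such as CTA are overwritten, new keys such as TEXT2 are added) can be sketched like this for the nested array shape shown above. mergeAssignments is a hypothetical helper, not part of any library, and an <@else> branch is assumed to simply use the value "ELSE".

```javascript
// Hypothetical helper: merge one branch's assignments into the nested
// [{ FIELD: [{ VALUE: [{ ...assignments }] }] }] shape from the question.
// A new field or value is created; existing keys are overwritten, new keys are added.
function mergeAssignments(result, field, value, assignments) {
  const root = result[0];
  if (!root[field]) root[field] = [{}];     // e.g. LANG appears for the first time
  const values = root[field][0];
  if (!values[value]) values[value] = [{}]; // e.g. A4, A5 or ELSE
  Object.assign(values[value][0], assignments);
  return result;
}

const result = [{}];
mergeAssignments(result, "VERSION", "A1", { CTA: "blue", CTA2: "green" });
mergeAssignments(result, "VERSION", "A1", { CTA: "black", TEXT2: "Hello world" }); // condition 4
console.log(JSON.stringify(result));
// [{"VERSION":[{"A1":[{"CTA":"black","CTA2":"green","TEXT2":"Hello world"}]}]}]
```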
From one of my above comments ...
In order to create parsable markup one needs to ...
- remove any @ character from the provided markup's custom tags,
- replace any custom <assign ...> tag with a closed version of itself,
- replace any VERSION attribute with a unique 'version'-related name, here with a suffix which uses the index of each matched VERSION attribute.
The result of the steps described above can be passed to a DOMParser's parseFromString method in order to create, e.g., an HTML document.
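A minimal sketch of these three steps for the simple case, where values contain no HTML. sanitizeMarkup and its regexes are assumptions, not the answer's original code; since DOMParser only exists in browsers, the sketch stops at the sanitized string.

```javascript
// Sketch of the three steps for the simple case (values without HTML markup).
// The regexes below are assumptions, not the ones from the original answer code.
function sanitizeMarkup(markup) {
  let index = 0;
  return markup
    // 1. "<@if", "<@elseif", "<@else", "<@assign", "</@if"  ->  "<if", ..., "</if"
    .replace(/(<\/?)@/g, "$1")
    // 2. close every <assign ...> tag explicitly
    .replace(/<assign\b[^>]*>/g, "$&</assign>")
    // 3. make every VERSION attribute unique: VERSION-0, VERSION-1, ...
    //    (simple case: VERSION is assumed never to appear inside a value)
    .replace(/\bVERSION\b/g, () => `VERSION-${index++}`);
}

const sanitized = sanitizeMarkup(
  '<@if VERSION = "A1" || VERSION = "A3">\n<@assign CTA = "blue">\n<@else>\n<@assign CTA = "black">\n</@if>'
);
console.log(sanitized);
// In a browser the string could then be parsed via
// new DOMParser().parseFromString(sanitized, "text/html")
```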
Such a DOM can be queried in the usual way, for instance via querySelectorAll. If one spreads the retrieved node lists into arrays, one can programmatically create and aggregate the target data structure via nested reduce-based passages.
Edit ... which targets/covers all the additional requirements the OP came up with at a later point.
In order to fulfill the additional requirements, which on the one hand are more generic regarding unknown attribute names, but on the other hand are more restrictive regarding attribute values, since the latter can contain HTML markup, the above approach has to be changed in terms of ...
<@if ...> and <@elseif ...>.
Regarding 1), the regex patterns from the first example code posted above not only need to be adapted, but new ones have to be utilized as well.
One would start by targeting every attribute-name/attribute-value pair of the originally provided custom markup. The regex used ...
/(?<name>[\p{L}\p{N}_-]+)\s*=\s*"(?<value>.*?)(?<!\\)"/gus
... even matches line breaks within a value's content. The string replacement fixes/sanitizes each attribute's name-value assignment by removing unnecessary white space and line breaks, but most importantly by escaping the value sequence via encodeURI, thus enabling further regex-based parsing in the first place ...
The next one we already know ... it's
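The first step could look roughly as follows. This is a sketch under assumptions: the replacement callback is not taken from the answer, the operator part is widened to ==? so that the double operator == is covered as well, and line breaks inside a value (with their surrounding white space) are collapsed into a single space.

```javascript
// Sketch of the first sanitizing step. Differences from the answer's regex, both
// assumptions: "==?" also accepts the double operator ==, and line breaks inside a
// value (together with surrounding white space) are collapsed into one space.
const assignmentPattern = /(?<name>[\p{L}\p{N}_-]+)\s*==?\s*"(?<value>.*?)(?<!\\)"/gsu;

function encodeAssignments(markup) {
  return markup.replace(assignmentPattern, (...args) => {
    const { name, value } = args[args.length - 1]; // named capture groups object
    return `${name}="${encodeURI(value.replace(/\s*\n\s*/g, " "))}"`;
  });
}

console.log(encodeAssignments('<@assign TEXT1 = "Hello<br/>World">'));
// <@assign TEXT1="Hello%3Cbr/%3EWorld">
```

After encoding, a value no longer contains "<", ">" or a line break, which is what lets the later patterns such as /<assign.*?>/g stop at the real end of a tag.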
/(<\/?)@(?=if|elseif|assign)/g
... which, used with the correct replacement, removes any @ character from the provided markup's custom tags ...
Third, one assures the correct closing of every <assign ...> tag ...
/<assign.*?>/g
... Last, one provides a number-based suffix to each attribute name of the conditional <if ...> and <elseif ...> tags ...
/(?:(if\s+)|\|\|\s*)(?<attrName>[\p{L}\p{N}_-]+)(?==")/gu
... which is necessary in order to guarantee unique attribute names only. These specially treated attribute names will be transformed back into their normal/initial form when the data structure gets parsed from the HTML document.
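Chaining steps two to four might look like this. toParsableMarkup is a hypothetical name and the replacement strings are assumptions; in addition, else was added to the first pattern so that <@else> also turns into a regular tag.

```javascript
// Sketch of steps two to four, applied to markup whose values are already URI-encoded
// (so no "<", ">" or line break remains inside a value). The replacements are
// assumptions; "else" was added to the first pattern so that <@else> becomes a tag too.
function toParsableMarkup(encodedMarkup) {
  let index = 0;
  return encodedMarkup
    // remove the "@" from <@if>, <@elseif>, <@else>, <@assign> and </@if>
    .replace(/(<\/?)@(?=if|elseif|else|assign)/g, "$1")
    // close every <assign ...> tag
    .replace(/<assign.*?>/g, "$&</assign>")
    // suffix each condition attribute name with a running number and drop the "||"
    .replace(
      /(?:(if\s+)|\|\|\s*)(?<attrName>[\p{L}\p{N}_-]+)(?==")/gu,
      (match, ifPrefix = "", attrName) => `${ifPrefix}${attrName}-${index++}`
    );
}

console.log(toParsableMarkup('<@if VERSION="A1" || VERSION="A3">\n<@assign CTA="blue">\n</@if>'));
```

Dropping the "||" keeps each condition value as its own attribute (VERSION-0, VERSION-1, ...) instead of leaving an extra "||" attribute behind in the parsed document.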
Regarding 2), one needs the help of yet another regex ...
/^([\p{L}\p{N}_-]+)-\d+$/u
... in order to verify ...
... and restore the mutated attribute names ...
... of the conditional if and elseif DOM nodes.
During parsing/aggregating the final data structure from the HTML document, there is still another restoration to do ... each URI-encoded value has to be decoded via decodeURI.
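The restoration and the nested reduce-based aggregation can be sketched without a browser by feeding in plain {name, value} objects, which is what spreading element.attributes would yield. Everything here is hypothetical: the input layout, the helper names restoreName and aggregate, and the simplified output shape (plain nested objects instead of the single-element arrays from the question). Note also that in a document created with "text/html", the HTML parser lower-cases attribute names, so VERSION-3 would be read back as version-3.

```javascript
// Sketch only. Outside a browser there is no DOM, so the input is a hypothetical
// stand-in: one array per <if> block, one entry per branch (if / elseif / else) holding
// the {name, value} pairs that [...element.attributes] would yield, plus one such list
// per child <assign> element.
const suffixedName = /^([\p{L}\p{N}_-]+)-\d+$/u;

// Verify and restore a mutated condition name ("VERSION-3" -> "VERSION");
// null means "not a condition attribute", e.g. a stray "||".
function restoreName(name) {
  const match = suffixedName.exec(name);
  return match ? match[1] : null;
}

function aggregate(conditionBlocks) {
  return conditionBlocks.reduce((result, branches) => {
    let elseField = null; // an <else> branch inherits the field of its block's first branch
    return branches.reduce((acc, { attributes, assigns }) => {
      // Flatten all <assign> elements of this branch into one record of decoded values.
      const record = assigns.reduce(
        (rec, attrs) =>
          attrs.reduce((r, { name, value }) => ({ ...r, [name]: decodeURI(value) }), rec),
        {}
      );
      let pairs = attributes
        .map(({ name, value }) => [restoreName(name), decodeURI(value)])
        .filter(([field]) => field !== null);
      if (pairs.length === 0 && elseField !== null) pairs = [[elseField, "ELSE"]];
      if (elseField === null && pairs.length > 0) elseField = pairs[0][0];
      // Create missing fields/values; later branches overwrite or extend earlier ones.
      pairs.forEach(([field, value]) => {
        acc[field] = acc[field] || {};
        acc[field][value] = { ...acc[field][value], ...record };
      });
      return acc;
    }, result);
  }, {});
}
```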
decodeURI.And the first solution's example code does finally change to the following one ...