Getting UTF-8 JSON file instead of UTF-16 LE BOM using batch script curl

324 views Asked by At

I have a batch script which does a curl command using:

powershell -Command "curl.exe -k -S -X GET -H 'Version: 16.0' -H 'Accept: application/json' 'https://IntendedURL' > result.json".

I am then using the result.json as an input file to another program.

The issue I am facing is that running the batch script results in a result.json file with format UTF-16 LE BOM. With UTF-16 LE BOM format, the .json file cannot even be opened and observed directly using a normal browser. It has exception of "SyntaxError: JSON.parse: unexpected character at line 1 column 1 of the JSON data".

I did some searching around and the answer that came back is that curl should not result in BOM encoding being saved. Nonetheless, I ended up with result.json which is with BOM encoding. If there is something that can be done to the batch script to make it output without BOM, help is appreciated.

For the second part of my issue, I can manually copy the content of result.json and save it in a new text document and specifically choose UTF-8 format. I searched around as well and found a bit of a solution like here: Batch script remove BOM () from file , but the solution script to remove the BOM seems rather complicated.

Thus, any help to remove the BOM encoding without resorting to manually copying the content and saving it in the specific desired format is appreciated as well.

Best regards, cyborg1234

1

There are 1 answers

2
mklement0 On

Note:

  • See the bottom section for a way to avoid the problem by calling curl.exe directly from your batch file.

In PowerShell, > is (in effect) an alias of the Out-File cmdlet, whose default in Windows PowerShell is UTF-16LE (in PowerShell (Core) 7+, it is now the more sensible (BOM-less) UTF-8, across all cmdlets).

You have two options (spoiler alert: choose the 2nd):

  • In principle, you can use Out-File explicitly (or preferably, with text input, Set-Content), in which case you can use its -Encoding parameter to control the output character encoding. However:

    • In Windows PowerShell -Encoding UTF8 invariably creates a UTF-8 file with a BOM. Short of using .NET APIs directly, the only way to create BOM-less UTF-8 files in Windows PowerShell is to use New-Item - see this answer.

    • Even clearing that hurdle is typically not enough, and the following also affects PowerShell (Core) up to v7.3.x (but is no longer a problem in v7.4+ - see this answer):

      • When PowerShell captures output from external programs it currently invariably decodes them into .NET strings first - even when redirecting to a file - using the character encoding stored in [Console]::OutputEncoding, and then re-encodes on output, based on the specified or default character encoding of the cmdlet used.

      • If that encoding doesn't match the character encoding of what curl.exe outputs, the output will be misinterpreted - and given that [Console]::OutputEncoding defaults to the legacy system locale's OEM code page, misinterpretation will occur if the output is UTF-8-encoded, so - unless you've explicitly configured your system to use UTF-8 system-wide (see this answer) - you'll need to (temporarily) set [Console]::OutputEncoding = [Text.UTF8Encoding]::new() before the curl.exe call.

  • Preferably, let curl.exe itself write the output file, i.e. replace > result.json with -o result.json: this will save the output as-is to the target file.


Taking a step back:

  • You can make your curl.exe call directly from your batch file, which avoids the character-encoding problems.

  • cmd.exe's > operator, unlike PowerShell's is a raw byte conduit, so no data corruption should occur - that said, you're free to use -o result.json instead.

  • When calling from cmd.exe (a batch file), only "..." quoting is supported, so the '...' strings have been transformed into "..." ones below.

:: Directly from your batch file, using "..." quoting:
curl.exe -k -S -X GET -H "Version: 16.0" -H "Accept: application/json" "https://IntendedURL" > result.json