Regex with PowerShell won't work with \n (newline)

For some reason, newline characters will not work with my regex.

Here's the first part of the PowerShell script. This piece of code is used to get a set of markdown files. For testing purposes, I'm only using the first file.

The code then gets the content inside the file.

$path = 'C:\Users\Will\Desktop\ProjectTemp'

$filelist = ls $path -filter *.md -recurse

[string[]]$currentfile_path = $filelist[0].FullName

$currentfile_data = Get-Content $currentfile_path

This is the problematic code. The same regex works fine everywhere else I have tried it, but PowerShell does not find the matches, so nothing gets replaced. Specifically, I have found that the newline characters do not match. Nothing I have tried works.

$currentfile_data -Replace '([ \t]*)\| (.*) \| (.*) \| (.*) \|\r\n[ \t]*\| .* \| .* \| .* \|\r\n[ \t]*\| (.*) \| (.*) \| (.*) \|','$1<table>\n$1\t<tr align="center" valign="middle">\n$1\t\t<th>$2</th>\n$1\t\t<th>$3</th>\n$1\t\t<th>$4</th>\n$1\t</tr>\n$1\t<tr align="center" valign="middle">\n$1\t\t<td>$5</td>\n$1\t\t<td>$6</td>\n$1\t\t<td>$7</td>\n$1\t</tr>\n$1</table>'

This is meant to find the regex: '([ \t]*)\| (.*) \| (.*) \| (.*) \|\r\n[ \t]*\| .* \| .* \| .* \|\r\n[ \t]*\| (.*) \| (.*) \| (.*) \|'

and replace it with:

$1<table>\n$1\t<tr align="center" valign="middle">\n$1\t\t<th>$2</th>\n$1\t\t<th>$3</th>\n$1\t\t<th>$4</th>\n$1\t</tr>\n$1\t<tr align="center" valign="middle">\n$1\t\t<td>$5</td>\n$1\t\t<td>$6</td>\n$1\t\t<td>$7</td>\n$1\t</tr>\n$1</table>

What the input (the Markdown table) and the output (the HTML table) should look like:

| th1 | th2 | th3 |
| :-: | :-: | :-: |
| td1 | td2 | td3 |

<table>
    <tr align="center" valign="middle">
        <th>th1</th>
        <th>th2</th>
        <th>th3</th>
    </tr>
    <tr align="center" valign="middle">
        <td>td1</td>
        <td>td2</td>
        <td>td3</td>
    </tr>
</table>

Is there any way this can be fixed? Am I doing something wrong?

There is 1 answer

mklement0

tl;dr

$currentfile_data = Get-Content -Raw $currentfile_path

-Raw makes Get-Content read the input file as a whole, into a single (typically multi-line) string, so that the subsequent -replace operation can match across lines.
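For reference, here is a minimal sketch of the whole replacement applied to a single multi-line string, as -Raw would produce it. It goes slightly beyond the fix above and rests on two assumptions worth verifying against your files. First, \r?\n is used instead of \r\n so that files with LF-only line endings also match. Second, .NET replacement strings recognize $1-style substitutions but not character escapes, so \n and \t in the replacement would come out as literal backslash sequences; the sketch uses a single-quoted here-string with real line breaks and four-space indentation instead. The variable names $pattern, $replacement and $result are illustrative only.

```powershell
# Sample input as one multi-line string; in the real script this would come from
#   Get-Content -Raw $currentfile_path
$currentfile_data = "| th1 | th2 | th3 |`r`n| :-: | :-: | :-: |`r`n| td1 | td2 | td3 |`r`n"

# Same pattern as in the question, except that \r?\n accepts both CRLF and LF.
$pattern = '([ \t]*)\| (.*) \| (.*) \| (.*) \|\r?\n[ \t]*\| .* \| .* \| .* \|\r?\n[ \t]*\| (.*) \| (.*) \| (.*) \|'

# Single-quoted here-string: PowerShell leaves $1..$7 alone, so the regex
# engine sees them, and the line breaks are real characters rather than \n.
$replacement = @'
$1<table>
$1    <tr align="center" valign="middle">
$1        <th>$2</th>
$1        <th>$3</th>
$1        <th>$4</th>
$1    </tr>
$1    <tr align="center" valign="middle">
$1        <td>$5</td>
$1        <td>$6</td>
$1        <td>$7</td>
$1    </tr>
$1</table>
'@

$result = $currentfile_data -replace $pattern, $replacement
$result
```

If tab indentation is preferred, the four-space runs in the here-string can be swapped for literal tab characters.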


As for what you tried:

$currentfile_data = Get-Content $currentfile_path

This saves an array of the lines from the input file in variable $currentfile_data, because Get-Content by default streams the target file's lines one by one.

$currentfile_data -Replace '...'

This makes the -replace operator act on each element of the array stored in $currentfile_data, i.e. on each line of the original file rather than across lines.
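The difference can be seen directly with a small experiment. The sketch below writes a throwaway three-line file (the file name and the variables $tmp, $lines, $raw, $pattern, $lineHits and $rawHit are made up for illustration), reads it both ways, and tests a pattern that spans a line break.

```powershell
# Create a throwaway three-line file in the temp folder (name is illustrative).
$tmp = Join-Path ([IO.Path]::GetTempPath()) 'regex-demo.md'
Set-Content -LiteralPath $tmp -Value '| th1 |', '| :-: |', '| td1 |'

$lines = Get-Content -LiteralPath $tmp        # array of lines, no newlines inside
$raw   = Get-Content -LiteralPath $tmp -Raw   # one string with embedded newlines

$lines.GetType().Name   # Object[]
$raw.GetType().Name     # String

# A pattern that spans a line break; \r?\n tolerates CRLF and LF files.
$pattern = '\| th1 \|\r?\n\| :-: \|'

# With an array on the left, -match filters the elements one by one,
# and no single line contains a newline, so nothing is returned.
$lineHits = @($lines -match $pattern)
$lineHits.Count         # 0

# With a single string on the left, the pattern can cross the line break.
$rawHit = $raw -match $pattern
$rawHit                 # True

Remove-Item -LiteralPath $tmp
```

The same per-element behavior applies to -replace: each line is passed through unchanged because the pattern never matches within one line.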


A PowerShell (Core) 7+ alternative:

As Santiago Squarzon points out, PowerShell (Core) 7+ ships with the ConvertFrom-Markdown cmdlet, which can directly transform your input file to HTML (albeit not in the exact same format):

(@'
| th1 | th2 | th3 |
| :-: | :-: | :-: |
| td1 | td2 | td3 |
'@ | ConvertFrom-Markdown).Html

Output:

<table>
<thead>
<tr>
<th style="text-align: center;">th1</th>
<th style="text-align: center;">th2</th>
<th style="text-align: center;">th3</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center;">td1</td>
<td style="text-align: center;">td2</td>
<td style="text-align: center;">td3</td>
</tr>
</tbody>
</table>
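To apply this to files rather than a literal string, something along the following lines should work. It is only a sketch: it assumes PowerShell 7+, uses a throwaway folder in the temp directory in place of the real project folder, and chooses, purely for illustration, to save each result next to its source with an .html extension. The names $demoDir, $html, $target and $htmlOut are hypothetical.

```powershell
#requires -Version 7.0

# Throwaway folder with one Markdown file, standing in for the real project folder.
$demoDir = Join-Path ([IO.Path]::GetTempPath()) 'md-demo'
$null = New-Item -ItemType Directory -Path $demoDir -Force
Set-Content -LiteralPath (Join-Path $demoDir 'table.md') -Value @'
| th1 | th2 | th3 |
| :-: | :-: | :-: |
| td1 | td2 | td3 |
'@

# Convert every .md file and write the HTML next to it.
Get-ChildItem -LiteralPath $demoDir -Filter *.md -Recurse | ForEach-Object {
    $html   = (ConvertFrom-Markdown -LiteralPath $_.FullName).Html
    $target = [IO.Path]::ChangeExtension($_.FullName, '.html')
    Set-Content -LiteralPath $target -Value $html
}

# Show the generated HTML for the sample file.
$htmlOut = Get-Content -Raw -LiteralPath (Join-Path $demoDir 'table.html')
$htmlOut
```

Because the cmdlet parses the Markdown itself, no cross-line regex is needed, at the cost of not controlling the exact HTML layout.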