My goal is to parse an HTML file retrieved with Invoke-WebRequest. If possible, I'd like to avoid any external libraries.
The problem I am facing is that, since PowerShell 6, Invoke-WebRequest returns a BasicHtmlWebResponseObject instead of an HtmlWebResponseObject. The Basic version lacks the ParsedHtml property. Is there a good alternative for parsing HTML in PowerShell Core 6?
I've tried to use Select-Xml, but my HTML is not entirely valid (e.g. a missing closing tag), so it fails to parse the result.
Another alternative I've found is to use New-Object -ComObject "HTMLFile", but from my understanding this relies on Internet Explorer for parsing, which I'd like to avoid.
There is a very similar question here, but sadly it has had no answers or activity for 8 months.
As mentioned in the comments, it is not really possible without a library. One very good library you could use is AngleSharp for .NET. It has great HTML parsing capabilities, and .NET code interoperates well with PowerShell; have a look at this link.
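As a rough illustration of how this could look from PowerShell, here is a minimal sketch. It assumes AngleSharp 0.10 or later (where the parser lives in the AngleSharp.Html.Parser namespace) and that AngleSharp.dll has been extracted from the NuGet package to a local path; the ./lib/AngleSharp.dll path and the sample HTML string are hypothetical placeholders, not something from the question.

```powershell
# Assumption: AngleSharp.dll (0.10+) was unpacked from the NuGet package
# into this hypothetical location. Adjust the path to wherever it lives.
Add-Type -Path './lib/AngleSharp.dll'

# Deliberately invalid HTML: the <p> element is never closed,
# which is the kind of input that makes Select-Xml fail.
$html = '<html><body><p>Hello <a href="https://example.com">link</a></body></html>'

# HtmlParser follows the HTML5 parsing rules, so it tolerates missing tags.
$parser   = [AngleSharp.Html.Parser.HtmlParser]::new()
$document = $parser.ParseDocument($html)

# Query the parsed DOM with a CSS selector and read an attribute.
foreach ($anchor in $document.QuerySelectorAll('a')) {
    $anchor.GetAttribute('href')
}
```

To apply this to a downloaded page, the Content string of the response should work in place of the literal, e.g. $parser.ParseDocument((Invoke-WebRequest $uri).Content), where $uri is whatever address you are fetching.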
Here is an example from their website: