I'm in the middle of integrating MarkdownSharp, a server-side Markdown compilation library. I have that working, but now I need to sanitize the generated HTML.
I took a look at the Stack Exchange Data Explorer source code to see how they sanitize their HTML, and saw that they use the following regex to whitelist images post-conversion:
    private static readonly Regex _whitelist_img =
        new Regex(
            @"
            ^<img\s
            src=""https?://[-a-z0-9+&@#/%?=~_|!:,.;\(\)]+""
            (\swidth=""\d{1,3}"")?
            (\sheight=""\d{1,3}"")?
            (\salt=""[^""<>]*"")?
            (\stitle=""[^""<>]*"")?
            \s?/?>$",
            RegexOptions.Singleline | RegexOptions.ExplicitCapture | RegexOptions.Compiled |
            RegexOptions.IgnorePatternWhitespace);
I've been wrestling with how to write an analogous regex, whitelist_iframe, that ensures the iframe's src points to YouTube or Vimeo. The following embed snippets are examples of what I'd like to allow:
<iframe width="560" height="315" src="//www.youtube.com/embed/IZ_ScEebDOM?rel=0" frameborder="0" allowfullscreen></iframe>
<iframe src="//player.vimeo.com/video/80825843?title=0&byline=0&portrait=0" width="500" height="281" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe>
So I believe the above needs to be modified to:
- Account for // instead of http or https
- Account for the </iframe> closing tag
- Account for //www.youtube.com or //player.vimeo.com being required in the beginning of the src tag.
I'm in the middle of butchering this up as my first regex... any help with this would be much appreciated.
Note that I am not looking to introduce additional libraries or complexity here with a better overall approach; I just want to augment code that's already working, with regex.
Account for // instead of http or https
Remove the "https?:" from the existing regex:
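A minimal sketch of that change, isolating just the src attribute. The class and field names are made up for illustration. RegexOptions.IgnoreCase is my own addition: the character class only lists a-z, and the YouTube ID in the question (IZ_ScEebDOM) contains uppercase letters.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ProtocolRelativeSrcDemo
{
    // Same character class as _whitelist_img, with the "https?:" prefix dropped
    // so that only protocol-relative URLs (starting with //) are accepted.
    // IgnoreCase is an assumption, needed for mixed-case video IDs.
    public static readonly Regex SrcAttribute = new Regex(
        @"^src=""//[-a-z0-9+&@#/%?=~_|!:,.;\(\)]+""$",
        RegexOptions.IgnoreCase);

    public static void Main()
    {
        // Protocol-relative URL from the question.
        Console.WriteLine(SrcAttribute.IsMatch(
            "src=\"//www.youtube.com/embed/IZ_ScEebDOM?rel=0\""));      // True

        // An explicit scheme no longer matches once "https?:" is removed.
        Console.WriteLine(SrcAttribute.IsMatch(
            "src=\"http://www.youtube.com/embed/IZ_ScEebDOM?rel=0\"")); // False
    }
}
```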
Account for </iframe> closing tag
Add the closing tag at the end of the pattern, just before the $ anchor:
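As a rough sketch, the img pattern's ending `\s?/?>$` becomes `></iframe>$`. The example below is deliberately stripped down to a src attribute plus the closing tag; the class name and the example.com URLs are placeholders, not part of the original code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ClosingTagDemo
{
    // Simplified pattern: one protocol-relative src attribute, then the
    // explicit </iframe> closing tag in place of the img pattern's "\s?/?>".
    public static readonly Regex IframeWithClose = new Regex(
        @"^<iframe\ssrc=""//[-a-z0-9+&@#/%?=~_|!:,.;\(\)]+""></iframe>$",
        RegexOptions.IgnoreCase);

    public static void Main()
    {
        Console.WriteLine(IframeWithClose.IsMatch(
            "<iframe src=\"//example.com/video/1\"></iframe>")); // True

        // Missing closing tag is rejected.
        Console.WriteLine(IframeWithClose.IsMatch(
            "<iframe src=\"//example.com/video/1\">"));          // False

        // Self-closing form, which the img pattern allowed, is rejected too.
        Console.WriteLine(IframeWithClose.IsMatch(
            "<iframe src=\"//example.com/video/1\" />"));        // False
    }
}
```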
Account for //www.youtube.com or //player.vimeo.com being required in the beginning of the src tag.
Add the desired domains as an alternation group (alternatives separated by |):
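Putting the three changes together might look something like the sketch below. Treat it as a starting point rather than a drop-in: the name _whitelist_iframe, the IsAllowed helper, and the attribute handling are my assumptions. The two examples in the question list their attributes in different orders, so instead of the fixed order used for img this version accepts any sequence of whitelisted attributes and uses a lookahead to require a src. Note the `/` right after the domain group, which stops hosts like www.youtube.com.evil.example from slipping through.

```csharp
using System;
using System.Text.RegularExpressions;

public static class IframeWhitelistDemo
{
    // Hypothetical counterpart to _whitelist_img (name and attribute list assumed).
    // IgnoreCase is added because the video IDs can contain uppercase letters.
    private static readonly Regex _whitelist_iframe =
        new Regex(
            @"
            ^<iframe
            (?=[^>]*\ssrc=)
            (
                \ssrc=""//(www\.youtube\.com|player\.vimeo\.com)/[-a-z0-9+&@#/%?=~_|!:,.;\(\)]+""
              | \s(width|height|frameborder)=""\d{1,3}""
              | \s(webkit|moz)?allowfullscreen
            )+
            ></iframe>$",
            RegexOptions.Singleline | RegexOptions.ExplicitCapture | RegexOptions.Compiled |
            RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase);

    // Small helper for the demo; not part of the original sanitizer.
    public static bool IsAllowed(string tag)
    {
        return _whitelist_iframe.IsMatch(tag);
    }

    public static void Main()
    {
        // The two embeds from the question.
        Console.WriteLine(IsAllowed(
            "<iframe width=\"560\" height=\"315\" src=\"//www.youtube.com/embed/IZ_ScEebDOM?rel=0\" frameborder=\"0\" allowfullscreen></iframe>")); // True
        Console.WriteLine(IsAllowed(
            "<iframe src=\"//player.vimeo.com/video/80825843?title=0&byline=0&portrait=0\" width=\"500\" height=\"281\" frameborder=\"0\" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe>")); // True

        // Other hosts, including look-alike hosts, are rejected.
        Console.WriteLine(IsAllowed(
            "<iframe src=\"//evil.example.com/embed/x\"></iframe>"));         // False
        Console.WriteLine(IsAllowed(
            "<iframe src=\"//www.youtube.com.evil.example/x\"></iframe>"));   // False
    }
}
```

The any-order attribute group is looser than the img pattern (it would accept a repeated width, for instance), but every attribute still has to be one of the whitelisted forms, and any src that appears must point at one of the two hosts.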