I need to do some grammar validation, for example adding spaces after dots. The problem is that it shouldn't be done everywhere: not in "e.g.", not in www.example.com, and not in more advanced exceptions like 999.77.SA.
My idea was to use preg_replace(), and this one works perfectly for everything, but it leaves no room for exceptions.
// add a space after a dot or comma that is directly followed by a letter
$string = preg_replace('/(?<=[.,])(?=\p{L}+)/u', ' ', $string);
I could try adding those exceptions to the regexp itself, but we have many of them and the expression would become terribly complicated.
I also tried preg_match() and preg_replace_callback(), but the match array contains only empty strings, so that doesn't help.
Example text:
Hello.This is my example.In some cases space shouldn't be added. Like in e.g. or www.example.com or 88.ASD
Should be changed to:
Hello. This is my example. In some cases space shouldn't be added. Like in e.g. or www.example.com or 88.ASD
Do you have any idea how to do this in the cleanest way?
This task is not solvable in a general way with a simple regex (or even a very complicated one).
The problem is that, without a semantic understanding of the sentence, there is no way to classify whether a "." is part of an abbreviation or a separator between sentences.
Take for example something similar to your example, 77.I: depending on the context it could be an abbreviation (position with code 77.I named "Other") or parts of two sentences (Elm street 77.I live there.).
Also, consider that "." may have additional meanings in other languages. For example, in Latvian 2.A means "second A", and while a space there might be acceptable, there is no clear reason for it to appear.
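That said, if a fixed, hand-maintained list of exceptions is good enough for your data, one possible sketch is to let the regex consume the exceptions first and discard them with the PCRE verbs (*SKIP)(*FAIL), so the original lookaround pattern only runs on the remaining text. The function name addSpaceAfterDots(), the exception list, and the two "URL" and "digit-led code" patterns are assumptions made for illustration; they are not a complete or correct classification, for the reasons given above.

```php
<?php
// Sketch only: addSpaceAfterDots() and the patterns below are illustrative assumptions.
function addSpaceAfterDots(string $text, array $exceptions): string
{
    // Quote every literal exception so its dots are matched literally.
    $skip = array_map(function ($e) {
        return preg_quote($e, '/');
    }, $exceptions);

    // Assumption: anything starting with "www." is a URL and must stay intact.
    $skip[] = '\bwww\.\S+';
    // Assumption: digit-led tokens such as 88.ASD or 999.77.SA are codes.
    $skip[] = '\b\d+(?:\.\w+)+';

    // Left branch: match an exception, then force a failure and resume after it.
    // Right branch: the original zero-width position after "." or "," before a letter.
    $pattern = '/(?:' . implode('|', $skip) . ')(*SKIP)(*FAIL)|(?<=[.,])(?=\p{L})/u';

    // preg_replace() returns null on error; fall back to the untouched input.
    return preg_replace($pattern, ' ', $text) ?? $text;
}

$input = "Hello.This is my example.In some cases space shouldn't be added. Like in e.g. or www.example.com or 88.ASD";
echo addSpaceAfterDots($input, ['e.g.', 'i.e.']), "\n";
```

With the example text from the question this produces the requested result. It also shows the limitation: "Elm street 77.I live there." is left unchanged, because 77.I is treated as a code in every context, and a URL directly followed by a sentence-ending dot would be swallowed by the URL pattern.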