itextpdf inserts a space between 7 and dot after extracting text


This image shows my problem: http://185.49.12.119/~pogdan/7spacedot/7spacedot.jpg

Input file: http://185.49.12.119/~pogdan/7spacedot/monitor_2016_99.pdf

Output file: http://185.49.12.119/~pogdan/7spacedot/monitor_2016_99.txt

The full set of files, including the jar and the Java source: http://185.49.12.119/~pogdan/7spacedot/

Why does itextpdf insert the space? How can I remove it? Replacing "7 ." with "7." after extraction does not solve it for me.


There is 1 answer

mkl On

Why does itextpdf insert the space?

iText inserts spaces whenever there is a gap between two consecutive text chunks which is larger than a certain amount, or if two consecutive text chunks overlap. It does so to signal that the chunks do not follow each other in a normal way.

In your document, a dot following a seven is often moved left as far as possible, so that the character bounding boxes overlap:

[Image: sample of an overlapping 7 and .]

How can I remove it?

If you don't want this, you have to adjust the text extraction strategy you use accordingly.

In the current iText version, 5.5.9, the relevant code looks like this:

if (result.charAt(result.length()-1) != ' ' && renderInfo.getText().length() > 0 && renderInfo.getText().charAt(0) != ' '){ // we only insert a blank space if the trailing character of the previous string wasn't a space, and the leading character of the current string isn't a space
    float spacing = lastEnd.subtract(start).length();
    if (spacing > renderInfo.getSingleSpaceWidth()/2f){
        appendTextChunk(" ");
        //System.out.println("Inserting implied space before '" + renderInfo.getText() + "'");
    }
}

The source of your ancient iText version might still look similar here. And this is where you have to change the logic to not insert spaces for backsteps or at least only for larger ones.
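
To see why a backstep triggers the space, here is a minimal sketch of the check above reduced to one dimension. It is plain Java and does not use iText; the class name, the method name, and the sample widths are made up for illustration. The point is that the distance is unsigned, so a dot pulled back over the 7 looks the same to the check as a real gap of the same size.

```java
// Simplified 1-D model of the stock check; illustrative only, not iText code.
class ImpliedSpaceSketch {

    /** Unsigned distance between previous end and new start, compared to half a space width. */
    static boolean stockRuleInsertsSpace(float lastEndX, float startX, float singleSpaceWidth) {
        // Math.abs stands in for lastEnd.subtract(start).length(), which is never negative.
        float spacing = Math.abs(lastEndX - startX);
        return spacing > singleSpaceWidth / 2f;
    }

    public static void main(String[] args) {
        float spaceWidth = 2.5f; // hypothetical width of a space glyph, in points

        // The dot starts exactly where the 7 ends.
        System.out.println(stockRuleInsertsSpace(100f, 100f, spaceWidth)); // false
        // The dot starts 2 pt to the right of the 7: a real gap.
        System.out.println(stockRuleInsertsSpace(100f, 102f, spaceWidth)); // true
        // The dot is moved 2 pt back over the 7: an overlap, but still reported as a space.
        System.out.println(stockRuleInsertsSpace(100f, 98f, spaceWidth));  // true
    }
}
```

So any fix has to look at the sign of the step, not only at its size.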


As the OP explained in a comment, using

float spaceWidth = renderInfo.getSingleSpaceWidth() * 3f/2f;
float diffI1 = start.subtract(lastEnd).get(Vector.I1);
if (spacing > spaceWidth && diffI1 > 0)
{
    result.append(" ");
}

works well in their case. This does not mean, though, that one should generally change the strategy code this way, as it assumes that text is written in the direction of the positive x axis. Furthermore, the optimal value of the constant by which renderInfo.getSingleSpaceWidth() is multiplied also depends on the document type at hand, cf. e.g. this case.
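
If the x-axis assumption is a concern, one possible variant (my own sketch, not the OP's code and not part of iText) is to measure the signed step along the writing direction of the previous chunk instead of along x. The example below is plain Java with points as {x, y} arrays; the class name, method name, and the factor parameter are hypothetical, and the factor still has to be tuned per document type as noted above.

```java
// Plain-Java sketch of a direction-aware check; names are illustrative, not iText API.
class DirectionalSpaceSketch {

    /**
     * Decides whether to imply a space between the previous chunk
     * (lastStart -> lastEnd) and a new chunk beginning at start.
     * The step is measured with its sign along the previous chunk's direction,
     * so backsteps (negative advance) never produce a space.
     */
    static boolean insertsSpace(float[] lastStart, float[] lastEnd, float[] start,
                                float singleSpaceWidth, float factor) {
        float dirX = lastEnd[0] - lastStart[0];
        float dirY = lastEnd[1] - lastStart[1];
        float len = (float) Math.sqrt(dirX * dirX + dirY * dirY);
        if (len == 0f) {
            return false; // previous chunk has no extent, so there is no direction to measure along
        }
        // Signed projection of (start - lastEnd) onto the unit writing direction.
        float advance = ((start[0] - lastEnd[0]) * dirX + (start[1] - lastEnd[1]) * dirY) / len;
        return advance > singleSpaceWidth * factor;
    }

    public static void main(String[] args) {
        float w = 2.5f; // hypothetical space width in points
        float f = 1.5f; // same multiplier as in the OP's comment

        // Left-to-right text, dot moved 2 pt back over the 7: no space.
        System.out.println(insertsSpace(new float[]{90, 0}, new float[]{100, 0}, new float[]{98, 0}, w, f));  // false
        // Left-to-right text, next chunk 5 pt further right: space.
        System.out.println(insertsSpace(new float[]{90, 0}, new float[]{100, 0}, new float[]{105, 0}, w, f)); // true
        // Right-to-left oriented text, next chunk 6 pt further left: still a space.
        System.out.println(insertsSpace(new float[]{100, 0}, new float[]{90, 0}, new float[]{84, 0}, w, f));  // true
    }
}
```

The last call is the situation where a check on the x component alone would suppress the space, because the step along x is negative even though the text advances normally.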