Using pypandoc.convert_file from docx to txt - how to prevent text wrapping in table columns?


I am using pypandoc to convert docx files to txt:

import pypandoc

f = 'some file.docx'
# With outputfile set, convert_file writes to disk and returns an empty string
o = pypandoc.convert_file(f, 'plain', outputfile='file.txt')
assert o == '', o

The problem is that the output is laid out for visual readability: the text in table columns is wrapped, so it can't be read programmatically.

For example, the word "similar" is split: "s" appears first, followed by spaces and the words from the other columns, and the rest of the word, "imilar", appears on the next line, like this:

|s |words|words|

|imilar|words|words|

So the word "similar" cannot be read programmatically.

I need the kind of result MS Word produces when saving a docx as txt: non-wrapped text. Unfortunately, I am limited in my choice of Python libraries.

Is it possible to turn off word wrapping in pypandoc.convert_file?

Answer by Abhijeet Banerjee:

You can pass the extra argument --wrap=none to pandoc:

extra_args=('--standalone','--wrap=none')

so the call will look like this:

pypandoc.convert_file(f, 'plain', extra_args=('--standalone', '--wrap=none'), outputfile='file.txt')
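
For reuse, the same call could be wrapped up roughly as follows. This is only a minimal sketch: pandoc_wrap_args and docx_to_txt are hypothetical helper names, not part of pypandoc, and the only assumption about pandoc itself is that --wrap accepts auto, none, or preserve. Whether table cells end up on a single line may still depend on the pandoc version and the table layout, so check the output on a real file.

```python
def pandoc_wrap_args(wrap="none", standalone=True):
    """Build extra pandoc arguments (hypothetical helper, not part of pypandoc)."""
    # pandoc's --wrap option accepts exactly these three values
    if wrap not in ("auto", "none", "preserve"):
        raise ValueError(f"unsupported --wrap value: {wrap!r}")
    args = ["--standalone"] if standalone else []
    args.append(f"--wrap={wrap}")
    return args


def docx_to_txt(src, dst):
    """Convert a .docx file to plain text, asking pandoc not to re-wrap lines."""
    # Imported here so pandoc_wrap_args can be used without pypandoc installed
    import pypandoc

    # With outputfile set, convert_file writes to dst and returns an empty string
    return pypandoc.convert_file(
        src, "plain", extra_args=pandoc_wrap_args(), outputfile=dst
    )
```

Calling docx_to_txt('some file.docx', 'file.txt') is then equivalent to the one-line call above.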