How to convert fixed-position PDF documents to free-flowing text (TXT, RTF, DOC, etc.) with pdftotext on Linux


Have you ever gotten a document that was saved in PDF format and wanted to edit it? You convert it to text with a utility, but every line ends in a newline, so you have to remove each one by hand. Really annoying, right?

I thought I would record how I did it even though it's unlikely that anyone will find this file through a search engine. Imagine if everyone who found solutions like this published their work!

First take the PDF document and convert it to formatted text with a single-byte encoding. The text document must be in a format where each paragraph ends with two newline characters in a row, or this won't work. Then convert all "\n" characters to = characters, or to some other character that isn't used in the document. Use sed to convert every single instance of = into a space while leaving runs of two or more, like ==, alone. Then convert = back into "\n". Now the document is free-flowing text. Open it in a word processor and save it as any file type you like. You'll have to manually recreate bold headings and any other text styles that were lost in the conversion to text. Here are the commands for a GNU/Linux system:
 
$ pdftotext -enc Latin1 -eol unix -layout file.pdf file.converted.pdf.txt
$ cat file.converted.pdf.txt | tr "\n" "=" | sed 's/=\{1,\}/\x0&\x1/g; s/\x0=\x1/ /g; s/[\x0\x1]//g;' | tr "=" "\n" > file_flowing_text.txt
$ Ted file_flowing_text.txt # Open in a word processor and save as anything you like.
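
To see what the tr/sed step does without needing a PDF, here is a minimal sketch that wraps the same pipeline in a function and runs it on a small sample. It assumes GNU sed (for the \x0 and \x1 escapes) and uses = as the placeholder. The name unwrap_file and the up-front grep check are illustrative additions, not part of pdftotext or sed.

```shell
#!/bin/sh
# Sketch only: the tr/sed pipeline wrapped in a function.
# Assumes GNU sed (the \x0 and \x1 escapes) and "=" as the placeholder.

# unwrap_file is a hypothetical helper name used for illustration.
unwrap_file() {
    input=$1
    # Refuse to run if the placeholder already occurs in the text,
    # because every "=" would come back out as a newline.
    if grep -q '=' "$input"; then
        echo "placeholder '=' occurs in $input; pick another character" >&2
        return 1
    fi
    # 1. newlines -> "="   2. lone "=" -> space, runs kept   3. "=" -> newlines
    tr '\n' '=' < "$input" |
        sed 's/=\{1,\}/\x0&\x1/g; s/\x0=\x1/ /g; s/[\x0\x1]//g;' |
        tr '=' '\n'
}

# Sample input: two hard-wrapped paragraphs separated by a blank line.
sample=$(mktemp)
printf 'first line\nsecond line\n\nnext paragraph\nwraps here\n' > "$sample"

unwrap_file "$sample"
echo    # the file's final newline became a space, so finish the line here
rm -f "$sample"
```

Each paragraph of the sample comes out on one line, with the blank line between them kept. The final newline of the file is a lone newline too, so it ends up as a trailing space.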

You can now edit the file and read it at any screen width. I hope you found this useful!
People have sent me more methods:
go|dfish: $ echo '= == === ==== = =' | perl -pe 's/(?<!=)=(?!=)/X/g'

perl -p -00 -E 's/(?<!\n)\n(?!\n)/ /g' file.converted.pdf.txt > file_flowing_text.txt

Once you find a solution the other solutions from people just keep coming in!
The vim editor solution:
Open the file in vim, set the text width to a very large value, run gqG to reformat everything from the cursor to the end of the file, then save it under a new name, like this:
vim file.txt
:set tw=10000
gqG
:w file_free_flowing.txt
:q
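
For a non-interactive equivalent of the vim steps, a minimal sketch using awk's paragraph mode might look like this. It is not one of the methods that were sent in. It assumes a POSIX awk is installed and that paragraphs are separated by truly empty lines. join_paragraphs is a made-up helper name.

```shell
#!/bin/sh
# Sketch only: join hard-wrapped lines with awk's paragraph mode.
# With RS="" awk treats runs of empty lines as record separators,
# so each record is one paragraph.

# join_paragraphs is a hypothetical helper name used for illustration.
join_paragraphs() {
    awk 'BEGIN { RS = ""; ORS = "\n\n" }
         {
             # Replace each line break, plus any spaces or tabs around it,
             # with a single space, then print the paragraph.
             gsub(/[ \t]*\n[ \t]*/, " ")
             print
         }' "$@"
}

# Sample input: two hard-wrapped paragraphs separated by a blank line.
printf 'first line\nsecond line\n\nnext paragraph\nwraps here\n' | join_paragraphs
```

Since no placeholder character is involved, a document that happens to contain = is not a problem here. The trade-off is that several blank lines in a row collapse into one paragraph break.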