Compare split lines with non-split lines

Posted: Wed Jul 03, 2019 10:50 pm
by fiddyschmitt
Hi,

I'm comparing a Word document (left) with a PDF file (right). The PDF was generated from a newer revision of the Word document.

The PDF has hard line breaks in places where the Word document only wraps visually to the next line. As a result, ExamDiff concludes that the two files are very different.

Is there a way to get ExamDiff to examine sentences rather than lines?

I've tried:
  • Replacing \n with spaces in ExamDiff
  • Playing with the wrap type and width settings in ExamDiff
  • Converting the Word document to PDF, but this produces a PDF whose wrapping differs slightly from that of the other PDF
  • Playing with the margins in Microsoft Word to achieve the same PDF wrapping
but none of that worked...

Thanks,
Fidel

Re: Compare split lines with non-split lines

Posted: Thu Jul 04, 2019 10:04 am
by psguru
No, there's really no way to do it in EDP. If there were a tool that converts line breaks within sentences to spaces, it could be used as a plug-in in EDP, but I don't know of such a tool.

Re: Compare split lines with non-split lines

Posted: Thu Jul 04, 2019 2:58 pm
by fiddyschmitt
No worries, thanks guru.

In the end I used this process to make the files comparable:

- From ExamDiff, copy the text from the left pane into Notepad++
- In Notepad++
   //The following removes the line breaks (\r\n), joining all lines into one
   Press Ctrl+H
      Find what: \r\n
      Replace with: (leave this blank)
      Search mode: Regular Expression
      Click 'Replace All'

   //The following places each sentence on its own line
   Press Ctrl+H
      Find what: \.
      Replace with: \n
      Search mode: Regular Expression
      Click 'Replace All'

- Do the same for the PDF content from ExamDiff
- Now create a new ExamDiff window and compare the two texts from Notepad++
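
For anyone who wants to avoid the manual copy and paste, the two Replace All steps above could be sketched in Python roughly as follows. This is only an illustration of the same idea, not part of ExamDiff or Notepad++. The function name one_sentence_per_line and the joiner parameter are made up for this example. The default joiner of "" mirrors the blank "Replace with" field; passing " " instead is an assumption that may help when the wrapped lines carry no trailing space and words would otherwise be fused.

```python
def one_sentence_per_line(text: str, joiner: str = "") -> str:
    """Mirror the two Notepad++ replacements described in the post.

    `joiner` is a hypothetical knob: "" matches the blank "Replace with"
    field, while " " keeps words apart if lines have no trailing space.
    """
    # Step 1: remove Windows line breaks (\r\n), like the first Replace All.
    joined = text.replace("\r\n", joiner)
    # Step 2: replace every period with a line break, like the second one.
    return joined.replace(".", "\n")


# The same text with and without a hard line break becomes identical.
wrapped = "The quick brown \r\nfox jumps. It lands."
unwrapped = "The quick brown fox jumps. It lands."
print(one_sentence_per_line(wrapped) == one_sentence_per_line(unwrapped))
```

Note that, just like the Notepad++ version, this splits on every period, so abbreviations and decimal numbers are also broken across lines. That is usually harmless for diffing as long as both sides go through the same treatment.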

Re: Compare split lines with non-split lines

Posted: Thu Jul 04, 2019 3:31 pm
by psguru
If it's the process you described, you could write two scripts, one for .DOC files and the other for .PDF, using, say, sed, and use them as additional plug-ins for these respective file types (Options | Tools | Plug-ins).
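
A rough sketch of what such a script might look like, written in Python here rather than sed. Several things are assumptions and not taken from the EDP documentation: that the plug-in is called with an input path and an output path as its two arguments, that the input is already plain UTF-8 text (extracting text from the .DOC or .PDF itself is not handled), and the name normalize_file, which is made up for this example.

```python
import sys


def normalize_file(src: str, dst: str) -> None:
    """Write a one-sentence-per-line copy of `src` to `dst`.

    Assumes `src` is plain UTF-8 text; the argument order is hypothetical.
    """
    # newline="" stops Python from translating \r\n on read or \n on write.
    with open(src, "r", encoding="utf-8", newline="") as f:
        text = f.read()
    # Same two replacements as the Notepad++ steps: drop \r\n, split on periods.
    text = text.replace("\r\n", "").replace(".", "\n")
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.write(text)


# Runs only when called as: python normalize.py INPUT OUTPUT
if __name__ == "__main__" and len(sys.argv) == 3:
    normalize_file(sys.argv[1], sys.argv[2])
```

Check the plug-in dialog for how EDP actually passes file names before wiring this in, and adjust the argument handling to match.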

Re: Compare split lines with non-split lines

Posted: Thu Jul 04, 2019 5:15 pm
by fiddyschmitt
Brilliant, thanks guru