
Bug: Silently changing UTF-8 characters

Posted: Sun Nov 10, 2019 1:06 pm
by Alexo
Version: EDP 10.0.1.17 64-bit
Use plug-ins: off
Use document type-settings: on

Scenario:
Comparing 2 directories.
Comparing one of the changed .XML files by double-clicking on it.
The file has some non-encoded UTF-8 emojis (specifically: F0-9F-91-91), which are shown correctly in Notepad++, Windows 10 Notepad and other editors.
Saving the file from EDP silently changes the character from F0-9F-91-91 to ED-A0-BD-ED, which breaks the application that uses that file since it considers it to be a different string.
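
For illustration, the byte change described above can be reproduced in a few lines of Python. This is only a sketch under an assumption that the thread does not confirm: that the saver works on UTF-16 internally and encodes each surrogate code unit separately (CESU-8 style) instead of encoding the combined code point. `encode_unit_as_3_bytes` is a hypothetical helper written for this example.

```python
# The four bytes from the report decode to a single code point, U+1F451.
original = bytes.fromhex("F09F9191")
char = original.decode("utf-8")
assert len(char) == 1 and ord(char) == 0x1F451

# Assumption: the text is held as UTF-16, where this character is a
# surrogate pair (two 16-bit code units).
utf16 = char.encode("utf-16-be")
units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]


def encode_unit_as_3_bytes(unit: int) -> bytes:
    """Encode one 16-bit value using the 3-byte UTF-8 bit pattern."""
    return bytes([
        0xE0 | (unit >> 12),
        0x80 | ((unit >> 6) & 0x3F),
        0x80 | (unit & 0x3F),
    ])


# Encoding each surrogate on its own yields six bytes instead of four.
broken = b"".join(encode_unit_as_3_bytes(u) for u in units)
print(broken.hex("-").upper())  # ED-A0-BD-ED-B1-91
```

The first four bytes of that result match the ED-A0-BD-ED sequence quoted in the report. A strict UTF-8 decoder rejects these bytes, which would be consistent with the application treating the saved value as a different string.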

I found no setting related to UTF-8 that will allow me to save the file without the silent changes.

Result:
It is impossible to reconcile the files with EDP.

Expected behaviour:
EDP should not change any parts of the file behind the user's back.

Note:
The file is not a "real" XML file, it is just a settings file used by the application which has an XML structure. It has no BOM and can include non-encoded UTF-8 characters. I cannot make arbitrary changes to the file (like adding a BOM) since it will likewise break the application.

Note2: Treating the files as binary is not feasible.

Please fix ASAP!

Re: Bug: Silently changing UTF-8 characters

Posted: Sun Nov 10, 2019 2:33 pm
by psguru
We'll need your pair of files to debug this issue. Is it possible for you to attach them here? If not, please email them to examdiffpro [at] prestosoft.com.

Re: Bug: Silently changing UTF-8 characters

Posted: Sun Nov 10, 2019 2:44 pm
by Alexo
Those files contain private information but I'll see if I can extract just the relevant parts.

Re: Bug: Silently changing UTF-8 characters

Posted: Sun Nov 10, 2019 3:18 pm
by Alexo
Files and settings sent in PM.

The "You cannot make another post so soon after your last." is annoying as hell.

Re: Bug: Silently changing UTF-8 characters

Posted: Mon Nov 11, 2019 12:33 pm
by MSpagni
Maybe (but only maybe) I had the same problem, but I was in a hurry and I didn't investigate.
Alexo wrote: Sun Nov 10, 2019 3:18 pm The "You cannot make another post so soon after your last." is annoying as hell.
I agree!

Re: Bug: Silently changing UTF-8 characters

Posted: Mon Nov 11, 2019 12:37 pm
by psguru
Alexo wrote: Sun Nov 10, 2019 3:18 pm Files and settings sent in PM.
There were no attachments in your PM.
Alexo wrote: Sun Nov 10, 2019 3:18 pm The "You cannot make another post so soon after your last." is annoying as hell.
It's for spam protection. I changed the setting to 60 seconds.

Re: Bug: Silently changing UTF-8 characters

Posted: Mon Nov 11, 2019 3:28 pm
by psguru
Got the files, thanks. Yes, since the files are treated as UTF-8 because of the UTF-8 markers within them (Notepad also thinks they are UTF-8), EDP tries to save them as such. However, a bug in UTF-8 saving, triggered by the specific circumstances your files present, causes them to be saved incorrectly.
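
To check whether a saved file was affected, one option is to scan it for UTF-16 surrogates that were written out as 3-byte sequences. The sketch below is not EDP code. It assumes the corruption takes that form, which the bytes quoted in the first post suggest but the thread does not confirm. `find_encoded_surrogates` is a hypothetical helper, and the six-byte sample is the surrogate-pair encoding of U+1F451, whose first four bytes match the report.

```python
import re

# In well-formed UTF-8 the lead byte ED is always followed by 80..9F, so
# ED followed by A0..BF can only be a surrogate encoded as three bytes.
ENCODED_SURROGATE = re.compile(rb"\xED[\xA0-\xBF][\x80-\xBF]")


def find_encoded_surrogates(data: bytes) -> list[int]:
    """Return the byte offset of every 3-byte encoded surrogate in data."""
    return [m.start() for m in ENCODED_SURROGATE.finditer(data)]


good = b"<name>" + bytes.fromhex("F09F9191") + b"</name>"
bad = b"<name>" + bytes.fromhex("EDA0BDEDB191") + b"</name>"

print(find_encoded_surrogates(good))  # []
print(find_encoded_surrogates(bad))   # [6, 9]
```

An empty result means no such sequences were found. It does not prove that the file is otherwise unchanged.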

The fix will appear in the next builds of 10.0 and 11.0 Beta.

Re: Bug: Silently changing UTF-8 characters

Posted: Mon Nov 11, 2019 3:41 pm
by Alexo
Thanks!

I hope it arrives soon; I have to work around this problem often.
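
Until a fixed build is available, a damaged file could in principle be repaired outside EDP. The following is a minimal sketch, assuming the damage is limited to surrogate pairs written as two 3-byte sequences (an assumption based on the bytes in the first post, not something confirmed in this thread). `repair_encoded_surrogates` is a hypothetical helper name.

```python
def repair_encoded_surrogates(data: bytes) -> bytes:
    """Recombine 3-byte encoded surrogate pairs into 4-byte UTF-8."""
    # "surrogatepass" lets the encoded surrogates through as code points
    # instead of raising UnicodeDecodeError.
    text = data.decode("utf-8", errors="surrogatepass")
    # A UTF-16 round trip joins each high/low pair into one code point.
    # An unpaired surrogate makes the strict decode below raise.
    text = text.encode("utf-16-le", errors="surrogatepass").decode("utf-16-le")
    return text.encode("utf-8")


damaged = bytes.fromhex("EDA0BDEDB191")
print(repair_encoded_surrogates(damaged).hex("-").upper())  # F0-9F-91-91
```

Input that is already well-formed UTF-8 passes through unchanged, so the function can be run over a whole file. Work on a copy, since any other kind of damage is not handled.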

Re: Bug: Silently changing UTF-8 characters

Posted: Mon Nov 11, 2019 3:48 pm
by psguru
We are targeting tomorrow.

Re: Bug: Silently changing UTF-8 characters

Posted: Tue Nov 12, 2019 11:22 am
by psguru

Re: Bug: Silently changing UTF-8 characters

Posted: Tue Nov 12, 2019 11:53 am
by MSpagni
psguru wrote: Mon Nov 11, 2019 12:37 pm It's for spam protection.
I had no doubt about it! :D