comparison is tripping

Posted: Thu Jan 13, 2022 3:19 pm
by cmal
I compared a larger file with a smaller one.
The larger file contains most of the smaller one twice, but the first occurrence is not found:

[Image: photo of the comparison result]

The small file has 4 leading bytes ("0x00000001"), some of which can also be found after the start of the first block (marked dark blue); that's why only the second block is found. That seems to be a bug.

By the way: I had to use a camera to take the picture, because as soon as the program loses focus, the highlighting (dark blue) disappears. It would be useful to keep the highlighting when one needs to mark something and then work in third-party software...

Posted: Fri Jan 14, 2022 10:41 am
by psguru
The diff algorithm finds equivalences in file order, so it matched those 00 bytes first, which prevented the match you expected. The result is still correct, albeit not as pleasing to the eye.

BTW, you will get a different (better?) result using byte-by-byte comparison (Binary options).
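
A minimal sketch of why "correct" and "what a human would do" can differ, assuming the classic algorithm is an LCS-based diff of the Myers / Hunt-McIlroy family (the tool's actual implementation is not shown in this thread). It is a textbook longest-common-subsequence table over bytes plus a hypothetical `prefer_early_b` flag that only changes how ties between equally long alignments are broken. With the small input b"\x00\x01AB" and the large input b"\x00AB\x00\x01AB", both alignments keep 4 bytes, but the "early" one matches the leading 00 to an isolated byte at the very start of the large input, much like the situation in the first post.

```python
def lcs_alignment(a: bytes, b: bytes, prefer_early_b: bool = True) -> list[tuple[int, int]]:
    """Return the (i, j) index pairs of one longest common subsequence of a and b.

    prefer_early_b (hypothetical flag) only decides between equally long
    alignments: True matches a byte at the first possible offset in b,
    False postpones a match whenever skipping b[j] costs nothing.
    """
    n, m = len(a), len(b)
    # L[i][j] = length of the LCS of the suffixes a[i:] and b[j:]
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                L[i][j] = L[i + 1][j + 1] + 1
            else:
                L[i][j] = max(L[i + 1][j], L[i][j + 1])

    # Walk forward, always staying on a path that keeps the LCS length optimal.
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            if not prefer_early_b and L[i][j + 1] == L[i][j]:
                j += 1  # skipping b[j] loses nothing: look for a later partner
            else:
                pairs.append((i, j))
                i, j = i + 1, j + 1
        elif L[i + 1][j] >= L[i][j + 1]:
            i += 1  # leave a[i] unmatched (a deletion in the edit script)
        else:
            j += 1  # leave b[j] unmatched (an insertion in the edit script)
    return pairs


small = b"\x00\x01AB"
large = b"\x00AB\x00\x01AB"
print(lcs_alignment(small, large, prefer_early_b=True))   # [(0, 0), (1, 4), (2, 5), (3, 6)]
print(lcs_alignment(small, large, prefer_early_b=False))  # [(0, 3), (1, 4), (2, 5), (3, 6)]
```

Both results are equally short diffs; only the second one looks "right" to a person. The full table needs len(a) x len(b) cells, so this exact approach only suits toy inputs; practical diff implementations such as Myers' O(ND) algorithm avoid building it.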

Posted: Fri Jan 14, 2022 11:54 am
by cmal
I don't think byte-by-byte comparison gives better results:

[Image: byte-by-byte comparison result]

And I would expect it to find the first match; showing the second one suggests that there is nothing before it...

Posted: Fri Jan 14, 2022 11:56 am
by psguru
Byte-by-byte is dumb: it simply compares bytes at the same positions. Hence the "?".
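
To illustrate what "compares bytes at the same positions" means, here is a minimal sketch (the function name is hypothetical, not the tool's code): offset i of one file is compared only with offset i of the other. A 4-byte prefix such as the 00 00 00 01 from the first post shifts every following byte, so a block that exists intact in both inputs is still reported as entirely different.

```python
def byte_by_byte(a: bytes, b: bytes) -> list[int]:
    """Return the offsets at which a and b differ, comparing a[i] only with b[i]."""
    diffs = []
    for i in range(max(len(a), len(b))):
        # Past the end of the shorter input, a position always counts as different.
        x = a[i] if i < len(a) else None
        y = b[i] if i < len(b) else None
        if x != y:
            diffs.append(i)
    return diffs


block = b"ABCDEFGH"
shifted = b"\x00\x00\x00\x01" + block  # same block, moved 4 bytes to the right
print(byte_by_byte(block, shifted))     # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print(shifted.find(block))              # 4: the block is present, just not aligned
```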

Posted: Sat Jan 15, 2022 10:05 am
by cmal
Something more:
Please check the following binary files in the zip:
https://mega.nz/file/vp9kmJ7S#UYGyo58UA ... 20qo0Y-8lw

- main_0.prt -> a larger file containing the data of the other two files
- main_0_1.prt -> the first portion of main_0.prt
- sub.prt -> part of main_0_1.prt (and therefore obviously also of main_0.prt)

If you compare main_0 against main_0_1, the match is detected;
if you compare sub against main_0, it is not detected (or at least not displayed correctly);
if you compare sub against main_0_1, it is detected and displayed correctly.
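
Before comparing, it can help to check independently whether sub.prt really occurs as one contiguous run of bytes inside main_0.prt and main_0_1.prt, and at which offset(s). A minimal sketch using only Python's built-in `bytes.find` (the helper name `find_all` is made up for illustration):

```python
def find_all(haystack: bytes, needle: bytes) -> list[int]:
    """Return every offset at which needle occurs in haystack (overlaps included)."""
    if not needle:
        return []
    offsets = []
    pos = haystack.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = haystack.find(needle, pos + 1)  # continue one byte later to allow overlaps
    return offsets


print(find_all(b"xxABxAB", b"AB"))  # [2, 5]
```

For the files above, something like `find_all(Path("main_0.prt").read_bytes(), Path("sub.prt").read_bytes())` (with `from pathlib import Path`) would list the offsets. An empty list would mean sub.prt is not contained as one contiguous block, which changes what a diff can be expected to display.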

Posted: Sat Jan 15, 2022 11:49 am
by psguru
Please post screenshots of the results. My guess is that, as in the first case, the comparison is correct but not what a human would necessarily do.

Posted: Sat Jan 15, 2022 12:21 pm
by MSpagni
Probably a naive question: isn't it possible to apply the algorithms now selectable for text (or at least some of them) to binary files?

Posted: Sat Jan 15, 2022 12:44 pm
by psguru
I should have predicted this question :). I'll have to check on this.

Posted: Sat Jan 15, 2022 1:10 pm
by psguru
I can confirm that binary comparison uses only the classic diff algorithm. This is due to the design limitations of the diff library.
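
For contrast, Python's standard `difflib.SequenceMatcher` uses a different strategy from an LCS-based diff: it first finds the longest contiguous matching block, then recurses on the parts to its left and right. Its documentation notes that this does not yield minimal edit sequences but tends to give matches that "look right" to people. The sketch below only illustrates that strategy on bytes (the wrapper name is hypothetical); it is not one of the tool's text algorithms, and whether those can be applied to binary data is a question for the developers.

```python
from difflib import SequenceMatcher


def longest_block_first(a: bytes, b: bytes) -> list[tuple[int, int, int]]:
    """Return matching blocks as (offset in a, offset in b, length), found longest-block-first."""
    # autojunk=False: when b has 200 or more items, difflib would otherwise treat
    # very frequent bytes (e.g. 0x00 padding) as "popular" and ignore them.
    sm = SequenceMatcher(None, a, b, autojunk=False)
    # get_matching_blocks() is ordered by offset and ends with a (len(a), len(b), 0) sentinel.
    return [tuple(m) for m in sm.get_matching_blocks() if m.size > 0]


small = b"\x00\x01AB"
large = b"\x00AB\x00\x01AB"
print(longest_block_first(small, large))  # [(0, 3, 4)]: the whole small input at offset 3
```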

Posted: Sat Jan 15, 2022 1:35 pm
by MSpagni
What a pity... :(

Posted: Sat Jan 15, 2022 4:04 pm
by cmal
psguru wrote: Sat Jan 15, 2022 11:49 am Please post screenshots with results. My guess is that, like in the first case, comparison is correct but not what a human would necessarily do.
If you download the files and perform the comparison as described, you will see for yourself.
The larger file is bigger than one screen, so no single screenshot will capture the problem...

Regarding the "first case":
I purchased the tool for analytical tasks. From my point of view, it is not acceptable that it overlooks a 52-byte block because it found 4 isolated bytes before it... One could not rely on the outcome of any comparison, which makes it useless for serious work.

Posted: Sat Jan 15, 2022 4:29 pm
by psguru
Diff results are subjective. What the tool guarantees is their correctness, that is, that you can mentally convert one file into the other using the results.

We are also looking into the possibility of using advanced diff algorithms for binary comparison. It will require more memory, but at least it may become available. The change, if we go ahead with it, will appear in the next major version.
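
The correctness guarantee described in this post can be made concrete with a small sketch (hypothetical helper, not the tool's API). A list of matched (offset in a, offset in b) pairs is a valid diff if the pairs increase in both inputs and join equal bytes; deleting the unmatched bytes of a and inserting the unmatched bytes of b then always rebuilds b. Two different alignments of the same inputs, such as matching a leading 00 either at offset 0 or at offset 3 of the larger input below, are therefore both "correct".

```python
def apply_alignment(a: bytes, b: bytes, pairs: list[tuple[int, int]]) -> bytes:
    """Replay the edit script implied by matched (i, j) pairs and return the rebuilt bytes."""
    out = bytearray()
    prev_i = prev_j = 0
    for i, j in pairs:
        # A valid alignment is strictly increasing in both inputs and pairs equal bytes.
        if not (prev_i <= i < len(a) and prev_j <= j < len(b)) or a[i] != b[j]:
            raise ValueError(f"invalid match ({i}, {j})")
        out += b[prev_j:j]  # insert the unmatched bytes of b before this match
        out.append(a[i])    # keep the matched byte (a[prev_i:i] is deleted, i.e. skipped)
        prev_i, prev_j = i + 1, j + 1
    out += b[prev_j:]       # insert whatever remains of b
    return bytes(out)


small = b"\x00\x01AB"
large = b"\x00AB\x00\x01AB"
for pairs in ([(0, 0), (1, 4), (2, 5), (3, 6)], [(0, 3), (1, 4), (2, 5), (3, 6)]):
    print(apply_alignment(small, large, pairs) == large)  # True for both alignments
```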

Posted: Sun Jan 16, 2022 1:28 am
by JeremyNicoll
cmal wrote: Sat Jan 15, 2022 4:04 pm the larger file is bigger than one screen; so, no single screenshot will catch the problem...
Two things:
1. you should be able to create a test case where the files are somewhat smaller
2. a temporary change to the resolution/scaling of your desktop might enable you to fit a bigger window into one screenful

Posted: Sun Jan 16, 2022 10:11 am
by cmal
JeremyNicoll wrote: Sun Jan 16, 2022 1:28 am
cmal wrote: Sat Jan 15, 2022 4:04 pm the larger file is bigger than one screen; so, no single screenshot will catch the problem...
Two things:
1. you should be able to create a test case where the files are somewhat smaller
2. a temporary change to the resolution/scaling of your desktop might enable you to fit a bigger window into one screenful
The "large" file is 3 MB... Is such a file already considered too big? Maybe the size is a key factor in this problem.

Edit:
To "be able to create a test case", one needs to already know what the root cause is... That is not a task for the customer but for the developer...
In general, in the very few days I have been using the tool (which looks promising, don't get me wrong), I have had the impression that it struggles with files of (significantly) different sizes...

Posted: Sun Jan 16, 2022 11:42 am
by JeremyNicoll
cmal wrote: Sun Jan 16, 2022 10:11 am
JeremyNicoll wrote: Sun Jan 16, 2022 1:28 am
cmal wrote: Sat Jan 15, 2022 4:04 pm the larger file is bigger than one screen; so, no single screenshot will catch the problem...
Two things:
1. you should be able to create a test case where the files are somewhat smaller
2. a temporary change to the resolution/scaling of your desktop might enable you to fit a bigger window into one screenful
the "large" file is 3 MB... is such a file already considered too big? maybe the size is a key in this problem.
Of course the file is not "too big" from a comparison point of view. But it was YOU who said that the larger file is more than a screenful, which makes screenshots of the problem impractical. So try to invent demonstration files that are small enough to fit on a single screen for screenshots.
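
Following that suggestion, a minimal sketch for writing a pair of tiny demonstration files. The layout is only a guess at the pattern in the first post (a small file that starts with the 4 bytes 00 00 00 01, and a larger file that contains the block once without that prefix and once with it); the helper name and the file names small.bin and large.bin are hypothetical.

```python
from pathlib import Path

HEADER = b"\x00\x00\x00\x01"  # the 4 leading bytes mentioned in the first post


def make_repro(directory: Path, block: bytes) -> tuple[Path, Path]:
    """Write small.bin = HEADER + block and large.bin = block + HEADER + block (guessed layout)."""
    small_path = directory / "small.bin"  # hypothetical file names
    large_path = directory / "large.bin"
    small_path.write_bytes(HEADER + block)
    large_path.write_bytes(block + HEADER + block)
    return small_path, large_path
```

For example, `make_repro(Path("."), bytes(range(52)))` uses a 52-byte stand-in block that itself starts with 00 01, so some of the prefix bytes also occur at the start of the first copy. The files are then 56 and 108 bytes, i.e. 4 and 7 rows in a hex view, assuming 16 bytes per row.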