trying to understand...

General questions about using ExamDiff Pro, ideas for new features, bug reports, and usage tips.
differ
Junior Member
Posts: 11
Joined: Thu Sep 09, 2004 12:26 pm

trying to understand...

Post by differ »

Hi.

In order to get more out of ExamDiff Pro, I'm trying to understand how it works. I could not find any high-level discussion on the website of the algorithms used. Some of the questions I have are:

* might the order of directory/file selection affect efficiency?

* why are CRCs calculated even after comparison results are complete? just for fun?

* what's the relationship of number/depth of directory/files to memory usage and overall compare time (linear?)

* what's the relationship of file size to memory usage and overall compare time (they do not appear linear!)

* what's the practical difference between "Consider files with different CRCs different" and "Ignore the ignores"?

* how are files' text/binary type determined? file extension? non-alphanumeric ascii codes?

Thanks.
psguru
Site Admin
Posts: 2231
Joined: Sat May 15, 2004 4:23 pm
Location: California

Post by psguru »

> * might the order of directory/file selection affect efficiency?

No (unless I misunderstood your definition of "order of directory/file selection").

> * why are CRCs calculated even after comparison results are complete? just for
> fun?

CRCs are calculated after the comparison is done only if the CRC column is
selected for display (Options | Display | Dir Columns).

> * what's the relationship of number/depth of directory/files to memory usage
> and overall compare time (linear?)

Depends on options. If full (content-based) file comparison (see Options |
Dir Comparison) is used in directory comparison, then time is a linear function
of total file size. Otherwise, it's a linear function of total file count.

> * what's the relationship of file size to memory usage and overall compare
> time (they do not appear linear!)

Linear function of file size. However, the slope is much steeper if you choose
to compare binary files in HEX form rather than comparing text files.

> * what's the practical difference between "Consider files with different CRCs
> different" and "Ignore the ignores"?

These are completely different options. "Consider files with different CRCs
different" is one of the ways to avoid content-based (thus slower) file comparison
during directory comparison. When chosen, CRCs of matching files are calculated, and if
they are different, the files are also marked as different.
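
To illustrate the idea of a CRC-based shortcut, here is a minimal Python sketch. It is not EDP's actual code: the function names `file_crc32` and `differ_by_crc` are made up for this example, and it assumes a plain CRC-32 as provided by Python's `zlib` module.

```python
import zlib


def file_crc32(path, chunk_size=65536):
    """Return the CRC-32 of a file, reading it in chunks to bound memory use."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)  # feed the running CRC back in
    return crc & 0xFFFFFFFF


def differ_by_crc(left_path, right_path):
    """Return True if the two files have different CRCs.

    Different CRCs prove the contents differ. Equal CRCs only make
    equality very likely, since collisions are possible.
    """
    return file_crc32(left_path) != file_crc32(right_path)
```

Each file is read once and never compared line by line, which is why this kind of check is cheaper than a content-based comparison.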

"Ignore the ignores" command may be useful after a comparison if you selected
some "Ignore..." options in Options | Compare, and then you want to see comparison
results as if none of the "Ignore..." options were selected. This applies to both
text file comparison, and to directory comparison where content-based
file comparison is used.

> * how are files' text/binary type determined? file extension?
> non-alphanumeric ascii codes?

Files are considered binary based on a heuristic that finds and counts non-alphanumeric
characters in a file.
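
A heuristic of this general kind might look like the Python sketch below. This is only an assumption about the approach, not EDP's real rule: the name `looks_binary`, the 30% threshold, the NUL-byte shortcut, and the set of bytes treated as "text" are all invented for illustration.

```python
# Bytes treated as "text" in this sketch: printable ASCII plus common
# whitespace and control characters. This set is an assumption.
TEXT_BYTES = bytes(range(0x20, 0x7F)) + b"\t\n\r\f\b"


def looks_binary(data, threshold=0.30):
    """Guess whether a byte sample is binary by counting non-text bytes."""
    if not data:
        return False  # treat an empty file as text
    if b"\x00" in data:
        return True  # NUL bytes almost never occur in plain text
    # Delete every text byte; whatever remains is counted as non-text.
    non_text = data.translate(None, TEXT_BYTES)
    return len(non_text) / len(data) > threshold
```

Note that this sketch counts every byte above 0x7E as non-text, so text in UTF-8 or another non-ASCII encoding could be misclassified. A real tool would likely need something more careful.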
psguru
PrestoSoft
differ
Junior Member
Posts: 11
Joined: Thu Sep 09, 2004 12:26 pm

Post by differ »

Thanks for your reply! Now I have a comment, a follow-up, and a new question. :lol:


<comment> "order of directory/file selection"

I meant simply which file/dir is on the right or left in the EDP display


<follow-up> CRCs

yes, but if I have no ignores (equivalent to "ignore the ignores"), are not files with different CRCs always different?


<new question> why does EDP re-compare files it already knows to be identical when I "dig down" into a subdirectory

i.e. I have selected "Compare subdirectories only to determine their status", I see a subdirectory with a difference, so I double click it to compare deeper, and even though there may be only 1 file out of many, perhaps several subdirectory layers down, all files are re-compared

is there some reason, something I'm missing?

Once again, thanks!
psguru
Site Admin
Posts: 2231
Joined: Sat May 15, 2004 4:23 pm
Location: California

Post by psguru »

> I meant simply which file/dir is on the right or left in the EDP display

No, this will not affect efficiency of comparison.

> yes, but if I have no ignores (equivalent to "ignore the ignores"), are not files with
> different CRCs always different?

Yes. But I see no harm in asking for the "Consider files with different CRCs different" option. Perhaps you are right though, and this could be inferred from having nothing to ignore. Let me think about it.

> <new question> why does EDP re-compare files it already knows to be identical when I
> "dig down" into a subdirectory i.e. I have selected "Compare subdirectories only to
> determine their status", I see a subdirectory with a difference, so I double click
> it to compare deeper, and even though there may be only 1 file out of many, perhaps
> several subdirectory layers down, all files are re-compared

When you re-compare, or compare a new pair, EDP always compares as if it's the first time: some files might have changed. It doesn't keep track of previous comparisons. When you double-click on a directory after a comparison, EDP will simply start a new comparison. You could argue that nothing could change in such a short time, but how would EDP know what a "short time" is? Also, keeping track of previous comparisons would be tricky.
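
To show why tracking previous comparisons gets tricky, here is a hypothetical Python sketch of a result cache. Nothing like this exists in EDP as far as this thread says. `CompareCache` and `fingerprint` are invented names, and using size plus modification time as a "has it changed?" test is an assumption that can be fooled, for example by a tool that rewrites a file and restores its timestamp.

```python
import os


def fingerprint(path):
    """Cheap change detector: file size and modification time in nanoseconds."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns)


class CompareCache:
    """Remembers whether a pair of files was identical when last compared."""

    def __init__(self):
        self._results = {}

    def store(self, left, right, identical):
        self._results[(left, right)] = (
            fingerprint(left), fingerprint(right), identical)

    def lookup(self, left, right):
        """Return the cached result, or None if it is missing or stale."""
        entry = self._results.get((left, right))
        if entry is None:
            return None
        left_fp, right_fp, identical = entry
        if (fingerprint(left), fingerprint(right)) != (left_fp, right_fp):
            return None  # a file changed since the result was stored
        return identical
```

Even this small version has to stat every file on each lookup, and it says nothing about files that were added or deleted, or about compare options that changed in between.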
psguru
PrestoSoft
differ
Junior Member
Posts: 11
Joined: Thu Sep 09, 2004 12:26 pm

Post by differ »

RE the re-compare issue, I see it simply as a data display issue. Sometimes I prefer to see the results hierarchically (i.e. sort of a "Windows Explorer" view), and other times linearly (a "Compare subdirectories recursively" view). In fact it would be awesome if one could toggle back-and-forth between the two. Maybe I'll add it to the Wish List one of these days. :D

Well, I've taken enough of your time. Thanks for responding to all of my recent posts.

ExamDiff Pro is a great product!

Best regards.

----------
Jim

BTW, I just upgraded from my original registration back in January of 2000. Hope to see you again in about 5 years! :wink:
psguru
Site Admin
Posts: 2231
Joined: Sat May 15, 2004 4:23 pm
Location: California

Post by psguru »

There's already a feature on the wish list that seems to fit your needs: "Option to switch between list views and collapsible/expandable tree views in recursive directory comparison". Perhaps you need to add your vote there.
psguru
PrestoSoft
psguru
Site Admin
Posts: 2231
Joined: Sat May 15, 2004 4:23 pm
Location: California

Post by psguru »

A correction regarding:

> > yes, but if I have no ignores (equivalent to "ignore the ignores"), are not files with
> > different CRCs always different?
>
> Yes. But I see no harm in asking for the "Consider files with different CRCs different"
> option. Perhaps you are right though, and this could be inferred from having nothing
> to ignore. Let me think about it.

If full file comparison is selected (no "Do not perform file comparison" options are
used under Options | Dir Comparison), EDP calculates full diff statistics, including
the total number of different lines in text files. Using CRCs to determine whether files
are different (when no ignores are specified) would not allow EDP to generate proper stats.
So when you mean to use CRCs, you need to say it explicitly in the "Do not perform file
comparison" options.
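
The difference between the two kinds of result can be sketched in Python. This is an illustration only, using the standard `difflib` and `zlib` modules rather than anything from EDP. The names `crc_differs` and `diff_stats` and the way changed lines are counted are assumptions made for the example.

```python
import difflib
import zlib


def crc_differs(left_text, right_text):
    """A CRC check yields only a yes/no answer."""
    return zlib.crc32(left_text.encode()) != zlib.crc32(right_text.encode())


def diff_stats(left_text, right_text):
    """A content-based comparison can also count the differing lines."""
    left = left_text.splitlines()
    right = right_text.splitlines()
    added = deleted = changed = 0
    matcher = difflib.SequenceMatcher(None, left, right, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "insert":
            added += j2 - j1
        elif tag == "delete":
            deleted += i2 - i1
        elif tag == "replace":
            # Count a replaced block by its longer side (a choice made here).
            changed += max(i2 - i1, j2 - j1)
    return {"added": added, "deleted": deleted, "changed": changed}
```

For the inputs `"a\nb\nc\n"` and `"a\nx\nc\nd\n"`, `crc_differs` can only report that the texts differ, while `diff_stats` reports one changed line and one added line. That extra detail is what a CRC alone cannot provide.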
psguru
PrestoSoft