calling lines "changed" is weird

Posted: Sat Dec 10, 2011 9:40 am
by DyNama
hi, gang! new user of ExamDiff!

i'm using ExamDiff to compare my newly downloaded tv listings with an archived version just to see what's been added and subtracted. the 2 files are just alphabetical lists of tv shows, 1 show per line. when run, ED calls a number of lines "changed" which is really weird.

i have no problem when "Ann King Sterling and 18K Gold Jewelry" is marked changed from "Ann King Sterling And 18K Gold Jewelry", that is changed. but the results also call these lines changed:
Beverly Hills Fabulous => Beverly Hills Cop
BrainSurge => Bram Stoker's Way of the Vampire
Buddha's Birthday Jade Jewelry Event => Buddy Holly: Listen to Me -- The Ultimate Buddy Party

since lines before and after were added and deleted, i don't know why ED picked 1 line from the 1st file to compare to another in the 2nd file--the line numbers are not the same.

this would under-report the number of changes to the files as the 1st of each pair was actually deleted and the 2nd of each pair was actually added, so there are 6 changes to the files rather than 3.

i'd just as soon eliminate the category of "changed" lines, or somehow specify stricter criteria for picking lines to compare, but, since i'm just looking at this for the novelty, it isn't all that important in my particular case. just thought i'd mention it. thanx for ExamDiff!

Re: calling lines "changed" is weird

Posted: Sat Dec 10, 2011 3:52 pm
by psguru
This is how most diff algorithms work. "Changes" are blocks that differ between the two files and are surrounded by blocks that are identical in both files. If a differing block has no counterpart in the other file, it is reported as added or deleted instead.
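
A rough illustration of that rule, using Python's standard difflib module rather than ExamDiff's own algorithm (which this thread does not describe). The function names classify_blocks and count_strict are made up for this sketch, and every show title in the demo other than the "Beverly Hills" pair is a placeholder. A differing block that sits between two identical blocks and has lines on both sides comes out as "changed"; a block with lines on only one side comes out as "added" or "deleted". The second function shows the stricter count asked for above, where every changed line is treated as one deletion plus one addition.

```python
import difflib

# Map difflib's opcode tags onto the labels used in this thread.
LABELS = {"replace": "changed", "delete": "deleted", "insert": "added"}


def classify_blocks(old_lines, new_lines):
    """Return (label, old_block, new_block) for each non-identical block."""
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    blocks = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # identical in both files, nothing to report
        blocks.append((LABELS[tag], old_lines[i1:i2], new_lines[j1:j2]))
    return blocks


def count_strict(blocks):
    """Count lines as (deleted, added), splitting "changed" blocks in two."""
    deleted = sum(len(old) for _, old, _ in blocks)
    added = sum(len(new) for _, _, new in blocks)
    return deleted, added


if __name__ == "__main__":
    # Placeholder listings; only the "Beverly Hills" pair is from the thread.
    old = ["Batman", "Beverly Hills Fabulous", "Bewitched", "Bonanza"]
    new = ["Airwolf", "Batman", "Beverly Hills Cop", "Bewitched"]
    blocks = classify_blocks(old, new)
    for label, old_block, new_block in blocks:
        print(label, old_block, new_block)
    print("deleted, added:", count_strict(blocks))
```

Here "Beverly Hills Fabulous" and "Beverly Hills Cop" are paired only because they occupy the same gap between two unchanged lines, not because the titles resemble each other. That is probably also why the line numbers of a "changed" pair need not match.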