EBCDIC is not being displayed correctly

General questions about using ExamDiff Pro, ideas for new features, bug reports, and usage tips.
David B. Trout
Full Member
Posts: 28
Joined: Wed Jan 06, 2010 4:21 am

Post by David B. Trout »

EDP has a Binary Comparison Character set option to display binary data in either EBCDIC or ASCII. As an IBM Mainframe programmer, I use EBCDIC a lot, and noticed some characters are not displaying correctly: :(

[Attachment: edp-ebcdic.png]

On the left is the string: "Success ! CDSG, STPQ and LPQ: OK".
On the right is the string: "Success! CDSG, STPQ and LPQ: OK!".

(A blank was removed before the first exclamation mark and added to the end of the string after "OK".)

As you can see, the exclamation mark is being incorrectly displayed as a right square bracket.

I don't know what Code Page EDP is using, but in the CP037 Code Page (which is the one I would expect to be used), hex 5A is an exclamation mark (ASCII hex 21), not a right square bracket:

* https://www.kreativekorp.com/charset/encoding/CP037/
* https://en.wikipedia.org/wiki/Code_page_37

Can this either be fixed or a new option provided so the user can choose which Code Page they prefer to be used instead of whatever code page EDP is currently using?

Thanks!

Keep up the otherwise good work! :)
Last edited by David B. Trout on Thu Jan 05, 2023 7:03 pm, edited 1 time in total.
"Fish" (David B. Trout)
"Programming today is a race between
software engineers striving to build bigger
and better idiot-proof programs, and the
Universe trying to produce bigger and better
idiots. So far, the Universe is winning"
- Rich Cook

Re: EBCDIC is not being displayed correctly

Post by David B. Trout »

FYI: Other programs seem to display EBCDIC data just fine:

[Attachment: HXD.png]

[Attachment: hexedit.png]

Re: EBCDIC is not being displayed correctly

Post by David B. Trout »

P.S. It would also be nice if the left-hand file offset column weren't so wide. In the EDP comparison example I posted, the file is only 224 bytes in size. Yet the left-hand file offset column is 16 hexadecimal digits wide!

I seriously doubt anyone would be comparing two 16-exabyte binary files with EDP. :P

IMHO, an 8-character (8 hex digits = 32 bits) wide file offset column should be plenty. :wink:
Last edited by David B. Trout on Thu Jan 05, 2023 7:08 pm, edited 2 times in total.
psguru
Site Admin
Posts: 2228
Joined: Sat May 15, 2004 4:23 pm
Location: California

Re: EBCDIC is not being displayed correctly

Post by psguru »

We use a third-party Hex Editor library, and here's their conversion table:

const int e2a [256] =
{
//0    1    2    3    4    5    6    7    8    9    A    B    C    D    E    F
  0,   1,   2,   3, 156,   9, 134, 127, 151, 141, 142,  11,  12,  13,  14,  15,	// 0
 16,  17,  18,  19, 157, 133,   8, 135,  24,  25, 146, 143,  28,  29,  30,  31,	// 1
128, 129, 130, 131, 132,  10,  23,  27, 136, 137, 138, 139, 140,   5,   6,   7,	// 2
144, 145,  22, 147, 148, 149, 150,   4, 152, 153, 154, 155,  20,  21, 158,  26,	// 3
' ', 160, 161, 162, 163, 164, 165, 166, 167, 168,  91, '.', '<', '(', '+',  33,	// 4
'&', 169, 170, 171, 172, 173, 174, 175, 176, 177,  93, '$', '*', ')', ';',  94,	// 5
'-', '/', 178, 179, 180, 181, 182, 183, 184, 185, 124, ',', '%',  95, '>', '?',	// 6
186, 187, 188, 189, 190, 191, 192, 193, 194,  96, ':', '#', '@',  39, '=',  34,	// 7
195, 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 196, 197, 198, 199, 200, 201,	// 8
202, 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 203, 204, 205, 206, 207, 208,	// 9
209, 126, 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 210, 211, 212, 213, 214, 215,	// A
216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231,	// B
123, 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 232, 233, 234, 235, 236, 237,	// C
125, 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 238, 239, 240, 241, 242, 243,	// D
 92, 159, 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 244, 245, 246, 247, 248, 249,	// E
'0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 250, 251, 252, 253, 254, 255	// F
};
So yes, 5A is converted to ASCII code 93, which is the closing bracket. We can change it to '!' but there may be other problems in this table, so perhaps you could take a look.

As for the address column, it's the standard address length for 64 bits. A 32-bit build has half of this length.
psguru
PrestoSoft

Re: EBCDIC is not being displayed correctly

Post by David B. Trout »

psguru wrote: Thu Jan 05, 2023 3:25 pm We use a third-party Hex Editor library, and here's their conversion table:
Just out of curiosity, do they document where THEY got it from?

psguru wrote: Thu Jan 05, 2023 3:25 pm We can change it to '!' but there may be other problems in this table, so perhaps you could take a look.
I do not wish to be rude, but is there a reason why you guys can't do that? I posted the URLs to the official CP037 table, which is the one you should be using IMO:

* https://www.kreativekorp.com/charset/encoding/CP037/
* https://en.wikipedia.org/wiki/Code_page_37

Nevertheless, I shall take a look at it myself and will let you know which code points appear to be incorrect. Thanks.

psguru wrote: Thu Jan 05, 2023 3:25 pm As for the address column, it's the standard address length for 64 bits. A 32-bit build has half of this length.
Duh! :P

My point was that you do not need such a wide file offset column, since the files being compared are highly unlikely to be large enough to need it.

Regardless of the bitness of the host operating system or its file system (32 vs. 64), the FILES being compared are almost always going to be much SMALLER than 4GB. So there's no need to have such a wide file offset column. The width of the file offset column should depend on the size of the files being compared, not on the bitness of the host operating system or the file system on which the files reside.

So IMHO, the default should be to use only a 32-bit file offset column width, and only switch to a 64-bit file offset column width if/when such is actually needed. (which in my opinion is likely to be never)
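
For illustration only, here is a minimal sketch of that rule in C. The function name `offset_column_digits` is hypothetical and is not part of EDP or its hex editor library; it simply picks 8 hex digits whenever the largest offset in the file fits in 32 bits, and 16 otherwise.

```c
#include <stdint.h>

/* Hypothetical helper (not an EDP or library function): number of hex
   digits to show in the file offset column for a file of file_size bytes. */
int offset_column_digits(uint64_t file_size)
{
    /* The largest offset ever displayed is file_size - 1 (0 for an empty file). */
    uint64_t max_offset = file_size ? file_size - 1 : 0;

    /* Offsets that fit in 32 bits get the narrow 8-digit column;
       only larger files need the full 16 digits. */
    return (max_offset <= UINT32_MAX) ? 8 : 16;
}
```

With the 224-byte file from the screenshot, this would select the 8-digit column; the width only doubles once a file exceeds 4 GiB.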

Does that make sense now?

In any case, I thank you for your response. I really appreciate it. I will post my analysis of your "e2a" table in a few minutes.

Thanks.
Last edited by David B. Trout on Wed Mar 01, 2023 8:03 pm, edited 1 time in total.

Re: EBCDIC is not being displayed correctly

Post by psguru »

David B. Trout wrote: I do not wish to be rude, but is there a reason why you guys can't do that?
Because it's not something we know well. We did look at the web resources, and they do not seem very clear, at least with our level of knowledge of EBCDIC encoding.

David B. Trout wrote: So IMHO, the default should be to use only a 32-bit file offset column width, and only switch to a 64-bit file offset column width if/when such is actually needed. (which in my opinion is likely to be never)
Unfortunately, the code in the library is not easy to change in this area, so it's likely to stay as is.

David B. Trout wrote: In any case, I thank you for your response. I really appreciate it. I will post my analysis of your "e2a" table in a few minutes.
Thank you.

Re: EBCDIC is not being displayed correctly

Post by David B. Trout »

Nevertheless, I shall take a look at it myself and will let you know which code points appear to be incorrect. Thanks.
I will post my analysis of your "e2a" table in a few minutes.

I THINK I FOUND THE PROBLEM!

EDP appears to be using code page 500! (not 37):

(https://www.kreativekorp.com/charset/encoding/CP500/)
(https://en.wikipedia.org/wiki/Code_page ... code_pages):
Code page 500, known as "International EBCDIC", "International Latin-1" or "International Number 5", is the other major EBCDIC encoding for the ISO/IEC 8859-1 repertoire. It is used in Belgium, Switzerland and on AS/400 systems in Canada. It is related to code page 37 and has the same repertoire, but differs in seven positions; in particular, it encodes [ and ] at 4A hex and 5A hex respectively, which are used for the cent sign (¢) and exclamation point (!) in code page 37. The caret (^) is also encoded at 5F hex, similarly to code page 1047. The ¢ is encoded at B0 hex, the ¬ at BA hex, the ! at 4F hex and the pipe character (|) at BB hex.
Which exactly matches the translation table you posted.
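
As a rough sketch of what supporting both code pages could look like, the seven differing positions can be kept in a small patch list and applied on top of a shared 256-entry table. Everything named here (`ebcdic_codepage`, `codepage_diff`, `CP_DIFFS`, `build_e2a`) is hypothetical and not taken from the hex editor library. The target values are written as ISO 8859-1 codes, which is an assumption about how the display side interprets values above 127.

```c
#include <string.h>

typedef enum { EBCDIC_CP500, EBCDIC_CP037 } ebcdic_codepage;

/* One position where code pages 500 and 37 disagree. */
typedef struct {
    unsigned char ebcdic;  /* EBCDIC byte value */
    int cp500;             /* ISO 8859-1 code under code page 500 */
    int cp037;             /* ISO 8859-1 code under code page 37 */
} codepage_diff;

/* The seven positions that differ between code pages 500 and 37. */
static const codepage_diff CP_DIFFS[] = {
    { 0x4A, 0x5B /* [    */, 0xA2 /* cent */ },
    { 0x4F, 0x21 /* !    */, 0x7C /* |    */ },
    { 0x5A, 0x5D /* ]    */, 0x21 /* !    */ },
    { 0x5F, 0x5E /* ^    */, 0xAC /* not  */ },
    { 0xB0, 0xA2 /* cent */, 0x5E /* ^    */ },
    { 0xBA, 0xAC /* not  */, 0x5B /* [    */ },
    { 0xBB, 0x7C /* |    */, 0x5D /* ]    */ },
};

/* Copy a base EBCDIC-to-ASCII table into out, then overwrite the seven
   differing positions with the values for the requested code page. */
void build_e2a(const int base[256], ebcdic_codepage cp, int out[256])
{
    size_t i;

    memcpy(out, base, 256 * sizeof out[0]);
    for (i = 0; i < sizeof CP_DIFFS / sizeof CP_DIFFS[0]; ++i) {
        const codepage_diff *d = &CP_DIFFS[i];
        out[d->ebcdic] = (cp == EBCDIC_CP037) ? d->cp037 : d->cp500;
    }
}
```

Keeping the differences in one list means a user-facing code page option only has to choose which column to apply; the other 249 entries stay shared.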


BUT... according to Wikipedia, code page 37 is actually one of the most used and best supported EBCDIC code pages in the world:

(https://www.kreativekorp.com/charset/encoding/CP037/)
(https://en.wikipedia.org/wiki/Code_page_37):
Code page 37 is one of the most-used and best-supported EBCDIC code pages. It is used as the default z/OS code page in the United States and other English speaking countries. It is considered the "required" EBCDIC code page for the United States, and also used in Australia, New Zealand, the Netherlands, Portugal and Brazil, and on ESA/390 systems in Canada, but not on Canadian AS/400 systems, which use Code page 500 instead. It is one of four EBCDIC code pages (alongside 500, 875 and 1026) with mapping data supplied by Microsoft to the Unicode Consortium, and one of seven (alongside 273, 424, 500, 875, 1026 and 1140) supported by Python as standard.

So in my opinion the default table that EDP should be using should be 37 (not 500), and you should provide an option (two radio buttons?) to allow the user to choose which code page they prefer (37 or 500).

Doing that would provide EDP with the widest compatibility range possible, and should make the largest number of customers happy: those who prefer code page 500 and those who, like me, prefer code page 37 (one of the most widely used and best supported EBCDIC code pages in the world).

Is there any chance of that maybe happening at some point in the future? I'm a Windows C/C++ GUI programmer myself, and in my experience the change seems, in all honesty, to be fairly simple and straightforward.

Thank you for listening, and thank you for considering this change (bug fix?) request! :D
MSpagni
Expert Member
Posts: 537
Joined: Mon Mar 30, 2009 12:53 am
Location: Italy

Re: EBCDIC is not being displayed correctly

Post by MSpagni »

It is one of four EBCDIC code pages (alongside 500, 875 and 1026) with mapping data supplied by Microsoft to the Unicode Consortium, and one of seven (alongside 273, 424, 500, 875, 1026 and 1140) supported by Python as standard.
Wow! The best way to create a mess, I think. :D

EBCDIC... And you call me archaic! :lol:

I agree with David: what's the use of so many digits for the file offset? A lot of screen real estate is wasted.
(N.B. I most often use the 32-bit version, so I'm not particularly concerned with this problem, but anyway...)

Re: EBCDIC is not being displayed correctly

Post by psguru »

We'll add the following requests to the list of planned features:

Binary comparison improvements
  • Ability to switch between code pages 500 and 37 for EBCDIC encoding
  • Reduce the size of the address column

Re: EBCDIC is not being displayed correctly

Post by David B. Trout »

psguru wrote: Fri Jan 06, 2023 11:17 am We'll add the following requests to the list of planned features:

Binary comparison improvements
Ability to switch between code pages 500 and 37 for EBCDIC encoding
Reduce the size of the address column
THANK YOU!! :D

You guys are the greatest!

EDP totally rocks!

(And it keeps getting better!) :D :D

Re: EBCDIC is not being displayed correctly

Post by psguru »

Could you please provide some sample files that are encoded with code page 37? And maybe a couple of code page 500 files?

Re: EBCDIC is not being displayed correctly

Post by David B. Trout »

Just create a 256-byte binary file containing the values hex 00 to hex FF.

Then translate it from code page 37 to ASCII, and display/dump the results.

Then do the exact same thing, but translate from code page 500 to ASCII.

Then simply eyeball each result to make sure each byte was translated correctly.

Then do the same thing, but in reverse: translate the same hex 00 to hex FF table from ASCII to code page 37, then to code page 500, and display/dump the results, and compare (eyeball) each against the ASCII output of the first test to, again, make sure things are being translated properly.
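
The first steps of that procedure can be sketched in a few lines of C. The helper names below (`fill_byte_ramp`, `write_byte_ramp`, `translate_bytes`) are made up for this example; the translation step assumes a 256-entry `int` table shaped like the "e2a" table posted earlier in the thread.

```c
#include <stdio.h>

/* Fill buf with the byte values 0x00..0xFF, in order. */
void fill_byte_ramp(unsigned char buf[256])
{
    int i;

    for (i = 0; i < 256; ++i)
        buf[i] = (unsigned char)i;
}

/* Write the 256-byte ramp to path. Returns 0 on success, -1 on failure. */
int write_byte_ramp(const char *path)
{
    unsigned char buf[256];
    size_t written;
    FILE *fp = fopen(path, "wb");  /* binary mode: no newline translation */

    if (fp == NULL)
        return -1;
    fill_byte_ramp(buf);
    written = fwrite(buf, 1, sizeof buf, fp);
    if (fclose(fp) != 0)
        return -1;
    return (written == sizeof buf) ? 0 : -1;
}

/* Translate n input bytes through a 256-entry table, one lookup per byte. */
void translate_bytes(const int table[256], const unsigned char *in,
                     int *out, size_t n)
{
    size_t i;

    for (i = 0; i < n; ++i)
        out[i] = table[in[i]];
}
```

Because the input is simply 00 to FF in order, the translated output is the table itself laid out by position, which makes eyeballing it against a published code page chart straightforward.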

I'm sure I could certainly sit down and write such a program if I had the time to do so, but I'm not grasping why I'm the one that needs to do it.

I understand that it is me that is requesting the change to EDP, but if you're already in the process of adding code to your product to perform such translations, wouldn't it then be trivially easy to test such code using the technique I described?

(sigh) Give me some time and I will try to create some test files for you. :(

In the meantime, while you are waiting for me, please give the hex 00 to hex FF table a try. I'm sure it should work just as well as any test file I could provide to you.

I do appreciate that you are making this change for me! Thank you for that! :D

Re: EBCDIC is not being displayed correctly

Post by David B. Trout »

Here are two test files for you:

[Attachment: cp037-cp500.zip (two test files, one encoded in CP037, the other in CP500)]

I hope that helps!

Re: EBCDIC is not being displayed correctly

Post by psguru »

Thanks. We were thinking about these code pages... Here's an idea: why should EBCDIC files be treated as second-class citizens and compared as binary files? Or, for that matter, files in any other non-Unicode (ANSI) code page? So one potential approach would be to have an option in EDP to define the default code page (with the default set to the Windows system code page, typically 1252 in the US). This would allow, e.g., EBCDIC files to be opened and saved as text files, not as binary. Of course, with the option set to, say, code page 37, all "regular" text files would look like garbage.

Another (perhaps a future) approach is to specify the file's code page in the File Open dialog, to override the default setting. This way you could compare EBCDIC files by setting the code page to 37/500 just for them.
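
A minimal sketch of how the two approaches could combine, with a global default and an optional per-file override. The names `compare_options`, `CODEPAGE_NONE` and `effective_codepage` are purely illustrative and do not describe how EDP actually stores its settings.

```c
/* Hypothetical sentinel meaning "no per-file override was chosen". */
#define CODEPAGE_NONE 0

/* Hypothetical options record holding the global default code page,
   e.g. 1252 (Windows system default in the US), 37 or 500. */
typedef struct {
    int default_codepage;
} compare_options;

/* Code page to use for one file: the override picked in the File Open
   dialog if there is one, otherwise the global default. */
int effective_codepage(const compare_options *opts, int file_override)
{
    return (file_override != CODEPAGE_NONE) ? file_override
                                            : opts->default_codepage;
}
```

With this shape, leaving the default at 1252 keeps "regular" text files readable, while an EBCDIC file opened with an override of 37 or 500 is decoded on its own terms.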

Re: EBCDIC is not being displayed correctly

Post by David B. Trout »

psguru wrote: Fri Feb 24, 2023 9:58 am Thanks.
You're very welcome. Because I very much want to see this option implemented, I took a break from my busy schedule and decided to try and create the test files for you. (I needed a break anyway.)

It was easier than I expected. I didn't even need to write any program at all! Our existing product (https://en.wikipedia.org/wiki/Hercules_(emulator)) allowed me to create them very quickly and easily. It supports a multitude of different code pages.

FYI: I also have a test file for EBCDIC code page 1047 too, if you're interested in it. It's another semi-popular EBCDIC code page.
psguru wrote: Fri Feb 24, 2023 9:58 am We were thinking about these code pages... Here's an idea: why should EBCDIC files be treated as second-class citizens and compared as binary files? Or, for that matter, files in any other non-Unicode (ANSI) code page?
Precisely! :D
psguru wrote: Fri Feb 24, 2023 9:58 am So one potential approach would be to have an option in EDP to define the default code page (with the default set to the Windows system code page, typically 1252 in the US). This would allow, e.g., EBCDIC files to be opened and saved as text files, not as binary.
That sounds ideal! :D
psguru wrote: Fri Feb 24, 2023 9:58 am Of course, with the option set to, say, page 37, this will make all "regular" text files look like garbage.
True, but that's to be expected. When you save a text file using an EBCDIC code page, "it looks like garbage" when you open it in e.g. Notepad too, because it's not in (Duh!) ASCII. It's in EBCDIC.

But oftentimes one needs to deal with EBCDIC files when working on mainframes. While the mainframes themselves are always EBCDIC, many mainframers use Windows, so when a file is transferred from the mainframe to Windows (and you want the file you receive to be an EXACT copy of what's on the mainframe and thus transfer the file in binary mode), it'd be nice if there was a tool such as EDP that could properly deal with these EBCDIC files. Hence my request.
psguru wrote: Fri Feb 24, 2023 9:58 am Another (perhaps a future) approach is to specify the file's code page in the File Open dialog, to override the default setting. This way you could compare EBCDIC files by setting the code page to 37/500 just for them.
That would work too.

I REALLY appreciate you guys looking into and seriously considering this request!

EDP ROCKS! :D :D :D
Last edited by David B. Trout on Fri Feb 24, 2023 11:10 am, edited 1 time in total.