EBCDIC is not being displayed correctly

Posted: Wed Jan 04, 2023 7:49 pm
by David B. Trout
EDP has a Binary Comparison Character set option to display binary data in either EBCDIC or ASCII. As an IBM Mainframe programmer, I use EBCDIC a lot, and noticed some characters are not displaying correctly: :(

[Attachment: edp-ebcdic.png]

On the left is the string: "Success ! CDSG, STPQ and LPQ: OK".
On the right is the string: "Success! CDSG, STPQ and LPQ: OK!".

(A blank was removed before the first exclamation mark and added to the end of the string after "OK".)

As you can see, the exclamation mark is being incorrectly displayed as a right square bracket.

I don't know what Code Page EDP is using, but in the CP037 Code Page (which is the one I would expect to be used), hex 5A is an exclamation mark (ASCII hex 21), not a right square bracket:

* https://www.kreativekorp.com/charset/encoding/CP037/
* https://en.wikipedia.org/wiki/Code_page_37

Can this either be fixed, or can a new option be provided so that users can choose the code page they prefer instead of whatever code page EDP currently uses?

Thanks!

Keep up the otherwise good work! :)

Re: EBCDIC is not being displayed correctly

Posted: Wed Jan 04, 2023 7:58 pm
by David B. Trout
FYI: Other programs seem to display EBCDIC data just fine:

[Attachment: HXD.png]
[Attachment: hexedit.png]

Re: EBCDIC is not being displayed correctly

Posted: Wed Jan 04, 2023 8:15 pm
by David B. Trout
P.S. It would also be nice if the left-hand file offset column weren't so wide. In the EDP comparison example I posted, the file is only 224 bytes in size. Yet the left-hand file offset column is 16 hexadecimal digits wide!

I seriously doubt anyone would be comparing two 16-exabyte binary files with EDP. :P

IMHO, an 8-character-wide file offset column (8 hex digits = 32 bits) should be plenty. :wink:

Re: EBCDIC is not being displayed correctly

Posted: Thu Jan 05, 2023 3:25 pm
by psguru
We use a third-party Hex Editor library, and here's their conversion table:

Code:

const int e2a [256] =
{
//0    1    2    3    4    5    6    7    8    9    A    B    C    D    E    F
  0,   1,   2,   3, 156,   9, 134, 127, 151, 141, 142,  11,  12,  13,  14,  15,	// 0
 16,  17,  18,  19, 157, 133,   8, 135,  24,  25, 146, 143,  28,  29,  30,  31,	// 1
128, 129, 130, 131, 132,  10,  23,  27, 136, 137, 138, 139, 140,   5,   6,   7,	// 2
144, 145,  22, 147, 148, 149, 150,   4, 152, 153, 154, 155,  20,  21, 158,  26,	// 3
' ', 160, 161, 162, 163, 164, 165, 166, 167, 168,  91, '.', '<', '(', '+',  33,	// 4
'&', 169, 170, 171, 172, 173, 174, 175, 176, 177,  93, '$', '*', ')', ';',  94,	// 5
'-', '/', 178, 179, 180, 181, 182, 183, 184, 185, 124, ',', '%',  95, '>', '?',	// 6
186, 187, 188, 189, 190, 191, 192, 193, 194,  96, ':', '#', '@',  39, '=',  34,	// 7
195, 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 196, 197, 198, 199, 200, 201,	// 8
202, 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 203, 204, 205, 206, 207, 208,	// 9
209, 126, 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 210, 211, 212, 213, 214, 215,	// A
216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231,	// B
123, 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 232, 233, 234, 235, 236, 237,	// C
125, 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 238, 239, 240, 241, 242, 243,	// D
 92, 159, 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 244, 245, 246, 247, 248, 249,	// E
'0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 250, 251, 252, 253, 254, 255	// F
};
So yes, 5A is converted to ASCII code 93, which is the closing bracket. We can change it to '!' but there may be other problems in this table, so perhaps you could take a look.
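
To make the lookup concrete, here is a minimal sketch that reproduces the symptom from the first post. It copies only six entries from the table above; the names `e2a_excerpt` and `e2a_translate` are made up for illustration and are not part of the third-party library.

```c
#include <stddef.h>

/* Six entries copied from the e2a table above. Every other slot is
 * left at 0 because this excerpt only illustrates the lookup. */
static const int e2a_excerpt[256] = {
    [0x4A] = 91,   /* '[' */
    [0x4F] = 33,   /* '!' */
    [0x5A] = 93,   /* ']' */
    [0x5F] = 94,   /* '^' */
    [0xD2] = 'K',
    [0xD6] = 'O'
};

/* Hypothetical helper: translate n EBCDIC bytes into a
 * NUL-terminated string; out must have room for n + 1 chars. */
static void e2a_translate(const unsigned char *in, size_t n, char *out)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = (char)e2a_excerpt[in[i]];
    out[n] = '\0';
}
```

Feeding it the bytes D6 D2 5A (which are "OK!" in CP037) produces "OK]", which is what the screenshot shows. With this table an exclamation mark only comes out for byte 4F.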

As for the address column, it's the standard address length for 64 bits. A 32-bit build has half of this length.

Re: EBCDIC is not being displayed correctly

Posted: Thu Jan 05, 2023 4:35 pm
by David B. Trout
psguru wrote: Thu Jan 05, 2023 3:25 pm We use a third-party Hex Editor library, and here's their conversion table:
Just out of curiosity, do they document where THEY got it from?

psguru wrote: Thu Jan 05, 2023 3:25 pm We can change it to '!' but there may be other problems in this table, so perhaps you could take a look.
I do not wish to be rude, but is there a reason why you guys can't do that? I posted the URLs to the official CP037 table, which is the one you should be using IMO:

* https://www.kreativekorp.com/charset/encoding/CP037/
* https://en.wikipedia.org/wiki/Code_page_37

Nevertheless, I shall take a look at it myself and will let you know which code points appear to be incorrect. Thanks.

psguru wrote: Thu Jan 05, 2023 3:25 pm As for the address column, it's the standard address length for 64 bits. A 32-bit build has half of this length.
Duh! :P

My point was that you do not need such a wide file offset column, since the files being compared are highly unlikely to be large enough to need it.

Regardless of the bitness of the host operating system or its file system (32 vs. 64), the FILES being compared are almost always going to be much SMALLER than 4 GB, so there's no need for such a wide file offset column. The width of the file offset column should depend on the size of the files being compared, not on the bitness of the host operating system or the file system on which the files reside.

So IMHO, the default should be to use only a 32-bit file offset column width, and only switch to a 64-bit file offset column width if/when such is actually needed. (which in my opinion is likely to be never)
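
A rough sketch of that rule, purely as an illustration: the function name `offset_column_digits` is hypothetical, and nothing here reflects how the Hex Editor library actually lays out its address column.

```c
#include <stdint.h>

/* Hypothetical helper: number of hex digits for the offset column,
 * given the size in bytes of the larger of the two files. */
static int offset_column_digits(uint64_t larger_file_size)
{
    /* Offsets run from 0 to size - 1; an empty file still shows offset 0. */
    uint64_t last_offset = larger_file_size ? larger_file_size - 1 : 0;

    /* 8 digits cover every offset up to FFFFFFFF; anything beyond
     * that falls back to the full 16-digit column. */
    return last_offset <= UINT32_MAX ? 8 : 16;
}
```

The result could then be used as the field width when printing an offset, for example `printf("%0*llX", digits, (unsigned long long)offset)`. For the 224-byte file in the screenshot this gives 8 digits.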

Does that make sense now?

In any case, I thank you for your response. I really appreciate it. I will post my analysis of your "e2a" table in a few minutes.

Thanks.

Re: EBCDIC is not being displayed correctly

Posted: Thu Jan 05, 2023 4:58 pm
by psguru

David B. Trout wrote: Thu Jan 05, 2023 4:35 pm I do not wish to be rude, but is there a reason why you guys can't do that?
Because it's not something we know well. We did look at the web resources, but they do not seem very clear, at least with our level of knowledge of EBCDIC encoding.

David B. Trout wrote: Thu Jan 05, 2023 4:35 pm So IMHO, the default should be to use only a 32-bit file offset column width, and only switch to a 64-bit file offset column width if/when such is actually needed. (which in my opinion is likely to be never)
Unfortunately, the code in the library is not easy to change in this area, so it's likely to stay as is.

David B. Trout wrote: Thu Jan 05, 2023 4:35 pm In any case, I thank you for your response. I really appreciate it. I will post my analysis of your "e2a" table in a few minutes.
Thank you.

Re: EBCDIC is not being displayed correctly

Posted: Thu Jan 05, 2023 6:55 pm
by David B. Trout
David B. Trout wrote: Thu Jan 05, 2023 4:35 pm Nevertheless, I shall take a look at it myself and will let you know which code points appear to be incorrect. Thanks.
David B. Trout wrote: Thu Jan 05, 2023 4:35 pm I will post my analysis of your "e2a" table in a few minutes.

I THINK I FOUND THE PROBLEM!

EDP appears to be using code page 500! (not 37):

(https://www.kreativekorp.com/charset/encoding/CP500/)
(https://en.wikipedia.org/wiki/Code_page ... code_pages):
Code page 500, known as "International EBCDIC", "International Latin-1" or "International Number 5", is the other major EBCDIC encoding for the ISO/IEC 8859-1 repertoire. It is used in Belgium, Switzerland and on AS/400 systems in Canada. It is related to code page 37 and has the same repertoire, but differs in seven positions; in particular, it encodes [ and ] at 4A hex and 5A hex respectively, which are used for the cent sign (¢) and exclamation point (!) in code page 37. The caret (^) is also encoded at 5F hex, similarly to code page 1047. The ¢ is encoded at B0 hex, the ¬ at BA hex, the ! at 4F hex and the pipe character (|) at BB hex.
Which matches the translation table you posted at positions 4A, 4F, 5A and 5F.
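
For reference, the seven differing positions can be written down as data. This is only a sketch under a few assumptions: the characters are given as ISO 8859-1 (Latin-1) codes, the struct and function names are invented for illustration, and the posted e2a table is `const` (and does not appear to use Latin-1 codes at B0, BA and BB), so a writable copy would be needed before applying anything like this.

```c
/* The seven positions where CP037 and CP500 differ, with each
 * character given as its ISO 8859-1 (Latin-1) code. */
struct cp_diff {
    unsigned char ebcdic;  /* EBCDIC byte value */
    int cp500;             /* Latin-1 code in code page 500 */
    int cp037;             /* Latin-1 code in code page 37 */
};

static const struct cp_diff cp_diffs[7] = {
    { 0x4A, '[',  0xA2 },  /* cent sign in CP037 */
    { 0x4F, '!',  '|'  },
    { 0x5A, ']',  '!'  },
    { 0x5F, '^',  0xAC },  /* not sign in CP037 */
    { 0xB0, 0xA2, '^'  },  /* cent sign in CP500 */
    { 0xBA, 0xAC, '['  },  /* not sign in CP500 */
    { 0xBB, '|',  ']'  }
};

/* Hypothetical helper: overwrite the seven differing slots of a
 * writable 256-entry table so that it follows CP037. */
static void apply_cp037_overrides(int table[256])
{
    for (int i = 0; i < 7; ++i)
        table[cp_diffs[i].ebcdic] = cp_diffs[i].cp037;
}
```

Keeping both columns in one place means the same data could drive a CP500 variant too, simply by writing the `cp500` field instead.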


BUT... according to Wikipedia, code page 37 is actually one of the most-used and best-supported EBCDIC code pages:

(https://www.kreativekorp.com/charset/encoding/CP037/)
(https://en.wikipedia.org/wiki/Code_page_37):
Code page 37 is one of the most-used and best-supported EBCDIC code pages. It is used as the default z/OS code page in the United States and other English speaking countries. It is considered the "required" EBCDIC code page for the United States, and also used in Australia, New Zealand, the Netherlands, Portugal and Brazil, and on ESA/390 systems in Canada, but not on Canadian AS/400 systems, which use Code page 500 instead. It is one of four EBCDIC code pages (alongside 500, 875 and 1026) with mapping data supplied by Microsoft to the Unicode Consortium, and one of seven (alongside 273, 424, 500, 875, 1026 and 1140) supported by Python as standard.

So in my opinion the default table that EDP should be using should be 37 (not 500), and you should provide an option (two radio buttons?) to allow the user to choose which code page they prefer (37 or 500).

Doing that would provide EDP with the widest compatibility range possible, and should make the largest number of customers happy: those who prefer code page 500 and those who, like me, prefer code page 37 (one of the most widely used and best supported EBCDIC code pages).

Is there any chance of that maybe happening at some point in the future? I'm a Windows C/C++ GUI programmer myself, and in my experience the change seems, in all honesty, to be fairly simple and straightforward.

Thank you for listening, and thank you for considering this change (bug fix?) request! :D

Re: EBCDIC is not being displayed correctly

Posted: Fri Jan 06, 2023 2:42 am
by MSpagni
It is one of four EBCDIC code pages (alongside 500, 875 and 1026) with mapping data supplied by Microsoft to the Unicode Consortium, and one of seven (alongside 273, 424, 500, 875, 1026 and 1140) supported by Python as standard.
Wow! The best way to create a mess, I think. :D

EBCDIC... And you call me archaic! :lol:

I agree with David: what's the use of so many digits for the file offset? A lot of screen real estate is wasted.
(N.B. I most often use the 32-bit version, so I'm not particularly concerned with this problem, but anyway...)

Re: EBCDIC is not being displayed correctly

Posted: Fri Jan 06, 2023 11:17 am
by psguru
We'll add the following requests to the list of planned features:

Binary comparison improvements
  • Ability to switch between code pages 500 and 37 for EBCDIC encoding
  • Reduce the size of the address column

Re: EBCDIC is not being displayed correctly

Posted: Fri Jan 06, 2023 8:23 pm
by David B. Trout
psguru wrote: Fri Jan 06, 2023 11:17 am We'll add the following requests to the list of planned features:

Binary comparison improvements
  • Ability to switch between code pages 500 and 37 for EBCDIC encoding
  • Reduce the size of the address column
THANK YOU!! :D

You guys are the greatest!

EDP totally rocks!

(And it keeps getting better!) :D :D