CHD Disk Image Format (MAME)

Post by **nocash** » September 10th, 2022, 3:14 am

null wrote: ↑September 9th, 2022, 4:04 am Oh did you have browser problems with GitHub?

Always.

  v0.146 V5  CHT2  bad ;\says output file already exists (crashes on -f force)
  v0.150 V5  CHT2  bad ;/
  v0.159 V5  CHT2  bad ;-crashes instantly
  v0.174 V5  CHT2  bad ;\missing KERNEL32.DLL:AddVectoredExceptionHandler
  v0.217 V5  CHT2  bad ;/
  v0.218 V5  CHT2  ?   ;-64bit?
  v0.246 V5  CHT2  bad ;-64bit? ;-requires "newer version of windows"

I am afraid that V5 is seamlessly switching from one problem to another.
But maybe something between v0.150 and v0.159 could work.
Or something between v0.159 and v0.174 (you could narrow those down with a hex editor and searching for the first/last version with/without string "AddVectoredExceptionHandler").
Having v0.218 would be neat, too, I guess it won't help for anything, but it would be nice to see if it's giving the same error message as v0.246.
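
The hex-editor search suggested above could also be scripted. This is only a minimal sketch, assuming the import name is stored as plain ASCII text inside the executable; `imports_symbol` and `scan` are made-up helper names, not part of any tool mentioned in this thread.

```python
def imports_symbol(exe_bytes: bytes, name: str = "AddVectoredExceptionHandler") -> bool:
    """Return True if the ASCII name occurs anywhere in the file image."""
    return name.encode("ascii") in exe_bytes


def scan(paths):
    """Map each file path to True/False depending on whether the string is present."""
    results = {}
    for path in paths:
        with open(path, "rb") as f:
            results[path] = imports_symbol(f.read())
    return results
```

Running `scan` over the chdman.exe files of several versions would show the first and last version that contains the string.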

I've got the "cdzl" compressed file working, too. It has the ECC bytes filtered out (which is good), and some sectors also have the 12-byte sync mark filtered out... but most sectors still contain the original unfiltered sync marks (looks like a bug in the compressor), and the sector header with the MM:SS:FF:Mode bytes isn't filtered at all. It doesn't waste too much memory, but the sectors could have been a few bytes smaller with better filtering.
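
As a rough illustration of which bytes are redundant in a raw 2352-byte data sector, the sketch below blanks the 12-byte sync mark and the 276-byte ECC area of a Mode 1 or Mode 2 Form 1 sector. The offsets follow the standard CD-ROM sector layout; `blank_redundant` is a made-up name, and this is not the CHD codec itself, only the idea behind the filtering.

```python
SYNC = bytes([0x00] + [0xFF] * 10 + [0x00])  # standard 12-byte sync mark
SECTOR_SIZE = 2352
ECC_OFFSET = 0x81C                           # P/Q parity occupies the sector tail
ECC_SIZE = SECTOR_SIZE - ECC_OFFSET          # 276 bytes


def blank_redundant(sector: bytes) -> bytes:
    """Zero the sync mark (if it is the standard one) and the ECC area.

    The 4-byte MM:SS:FF:Mode header at offset 12 is left untouched,
    matching what the cdzl data described above appears to do.
    """
    if len(sector) != SECTOR_SIZE:
        raise ValueError("expected a raw 2352-byte sector")
    out = bytearray(sector)
    if out[:12] == SYNC:
        out[:12] = bytes(12)
    out[ECC_OFFSET:] = bytes(ECC_SIZE)
    return bytes(out)
```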

I'll have to learn about LZMA and FLAC decompression next. And create some MODE1/MODE2 toc/bin images with different sector sizes for examining the old CHCD metadata.

The Rust source code looks nice. The comments are neat and the source looks more compact. I haven't looked too closely at the actual code and don't know yet if I can understand the Rust language. Then again, I am not really familiar with C languages either.

For the CHCD stuff, the rust metadata.rs file assigns "CdRomOld = make_tag(b"CHCD")", but I can't see any code that handles that CdRomOld type (maybe it's unsupported?). Then again, I haven't found any code that handles the newer CHT2 stuff with things like PREGAP either (and I guess it must support that somewhere?).

Post by **null** » September 10th, 2022, 12:08 pm

ZIP contains:
CHDMAN v0.151 - v0.158, v0.160 - v0.173, v0.218, v0.247

Post by **nocash** » September 10th, 2022, 6:40 pm

Note to self: github "raw" pages are still working with old browsers, for lzma that would be...
https://raw.githubusercontent.com/ilyak ... ic.x86.asm
https://raw.githubusercontent.com/upx/u ... 6/lzma_d.S

Post by **nocash** » September 11th, 2022, 5:19 am

Okay, there's really no working V5 version. They all have the same problems. Interestingly, "crashes instantly" was only temporarily implemented in v0.155-v0.160, and v0.161 then fell back to the old "file already exists" feature.

  v0.146 V5  CHT2  bad ;\says output file already exists (crashes on -f force)
  v0.154 V5  CHT2  bad ;/
  v0.155 V5  CHT2  bad ;\crashes instantly (shortly before CreateEventW)
  v0.160 V5  CHT2  bad ;/
  v0.161 V5  CHT2  bad ;\says output file already exists (crashes on -f force)
  v0.169 V5  CHT2  bad ;/
  v0.170 V5  CHT2  bad ;\missing KERNEL32.DLL:AddVectoredExceptionHandler
  v0.217 V5  CHT2  bad ;/
  v0.218 V5  CHT2  bad ;\requires "newer version of windows" (64bit)
  v0.247 V5  CHT2  bad ;/

I am still struggling with getting an idea how LZMA is working.

The upx asm source code doesn't contain any asm source code at all (apart from a series of "push" instructions). It's also using two include files, which look like weird look-up tables... or maybe they are meant to contain the actual 80x86 decompression code in binary form.

The static asm source code is mean. It does (almost) look like a compact simple decompression function... but it contains several nonsense immediates (like 0x12345678 and 0x55), I guess one is supposed to patch the binary and replace those values by whatever parameters... but the comments and documentation don't tell what is to be replaced by what.

Post by **null** » September 12th, 2022, 8:09 pm

Okay I found one that works on WinXP 32bit. I don't know if it works on Win98.

It's also working on my Win7 32bit.

This build of CHDMAN is unofficial because MAME stopped supporting Windows 7 32bit and older operating systems.

This build is compiled by Retro Danuart.

ZIP contains:
Unofficial CHDMAN v0.247

Post by **null** » September 13th, 2022, 2:56 am

nocash wrote: ↑September 11th, 2022, 5:19 am The upx asm source code doesn't contain any asm source code at all.

Did you check this one?
http://raw.githubusercontent.com/upx/up ... zma_d_cn.S

nocash wrote: ↑September 11th, 2022, 5:19 am The static asm source code is mean. It does (almost) look like a compact simple decompression function... but the comments and documentation don't tell what is to be replaced by what.

Yup, there's no proper documentation or comments.

The micro-lzmadec is designed to be compact at the cost of speed, while the upx one focuses on speed at the cost of size. I don't know if there's a way to combine those contrasting features.

Post by **nocash** » September 13th, 2022, 8:09 am

Thanks for the unofficial chdman build... but it doesn't work : \
It's just doing the missing KERNEL32.DLL:AddVectoredExceptionHandler thing.

null wrote: ↑September 13th, 2022, 2:56 am Did you check this one?
http://raw.githubusercontent.com/upx/up ... zma_d_cn.S

Yes, that's the problem. It's definitely not source code.

But I have mostly figured out how to use the micro/static source code.
The five "_rel_xxx" labels are meant to indicate the patch locations (whereas the labels are pointing to the next line after the patch location).
The comments at the beginning of the source describe how to obtain four of the five patch values (code, lc, lp, pb). The only unknown value is "tsize"; it's the size/2 of the Temp buffer, but I don't know how to obtain the required size... it might become more obvious when gazing at the lzip manual.

Yeah, it's size-optimized, I don't know if that's making things significantly faster or slower (the basic rule is "small=fast", except one could use macros and perhaps some look-up tables here or there).
And I'll remove some of the trickery like POP+PUSH and CALL+POP, I guess the speed will be about same, but the source will be easier to understand without that stuff.
For now I'll be happy to get it working... and then check if it's fast enough.

Post by **nocash** » September 16th, 2022, 10:37 pm

I got the LZMA decompression working, I can now decompress .lzma files (lzma_alone), .lz files (lzip), and the dball.chd file from chdman v0.146.

The micro lzma asm code was really helpful, although it did include some obstacles:

The secret formula for computing tsize is "tsize=(300h shl (lc+lp))+800h", and it is hidden in the "test_static.c" file (probably the last place where one would look at when writing asm code).

The _rel_lp and _rel_pb variables are slightly misnamed, lp_mask and pb_mask would be more appropriate (because they contain masks computed from the actual lp and pb values).
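
For reference, the tsize formula quoted above works out as follows. The `mask` helper assumes the usual "low n bits set" form for lp_mask and pb_mask; that form is an assumption here, since the exact layout of the patched values isn't spelled out in this thread.

```python
def tsize(lc: int, lp: int) -> int:
    """tsize = (300h shl (lc+lp)) + 800h, as found in test_static.c."""
    return (0x300 << (lc + lp)) + 0x800


def mask(bits: int) -> int:
    """Assumed mask form: the lowest `bits` bits set."""
    return (1 << bits) - 1


# lzip and chd use the hardcoded parameters lc=3, lp=0, pb=2
LC, LP, PB = 3, 0, 2
TEMP_BUFFER_BYTES = 2 * tsize(LC, LP)  # tsize is size/2 of the Temp buffer
```

With lc=3 and lp=0 this gives tsize=2000h, so the Temp buffer would be 4000h bytes.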

The "cdq" opcodes sign-extend eax to edx:eax. After lots of research, I think that the source code always intends to use them with positive numbers in eax, so "cdq" is used as a 1-byte opcode to set edx=0. That would be a really neat programming trick - if it were done with a comment saying ";edx=0". The most confusing trickery is this part:

        call    _rc_bit
        ...
        call    _rc_norm
        ...
        cdq             ; Align

The "Align" comment is clearly suggesting that cdq is used to compute an alignment offset.
But... the "_rc_bit" function does always return eax=positive, so cdq will simply set edx=0.
But...... the "_rc_norm" function (in the 32bit source code version) does occasionally set eax=source address.
I guess that's done unintentionally, although it might actually work "stable" in practice (assuming that most OSes may tend to allocate memory in lower half of the 4GB memory space, or at least the OS might do so if there isn't any other already allocated memory occupying that region).
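
To make the cdq concern concrete, here is a tiny model of what the opcode leaves in edx. It only mimics the x86 semantics in Python and is not taken from the asm source.

```python
def cdq(eax: int) -> int:
    """Return the EDX value that CDQ produces for a given 32-bit EAX."""
    eax &= 0xFFFFFFFF
    return 0xFFFFFFFF if eax & 0x80000000 else 0
```

So edx=0 holds only while eax stays below 80000000h; a source address in the upper half of the 4GB space would yield edx=FFFFFFFFh instead.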

And then there are a bunch of LZMA variants (with different headers, optional end codes, and trailing dummy bytes); the asm code lacks support (and comments/cautions) for those things:
It's apparently expecting lzma_alone headers (lzip and chd don't have such headers, and they both use hardcoded lc=3, lp=0, pb=2).
It's always expecting an EOS end code (chd files and (some) lzma_alone sample files in the LZMA SDK don't have that EOS end code; they instead require checking the destination end address).
LZMA seems to always do the "normalization" after reading from the bitstream, which means the compressed data will include a trailing dummy byte (if the normalization happens to fetch an unused extra byte after the last compression code). When needing to know the exact end of the compressed data, one would need to issue a final "call _rc_norm" after decompression. In case of .lzma files, one would additionally need to skip the optional end code, if it's there. In case of .chd files, dball.chd does have those trailing dummy bytes after some sectors, but one could more or less ignore them because chd also contains a compressed size entry, which can be used to find the start of the compressed subchannels.
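
For the lzma_alone case, the 13-byte header can be unpacked as sketched below. The layout (one properties byte, a 32-bit dictionary size, a 64-bit uncompressed size, all little-endian) is the commonly documented .lzma format; the function name and the returned dictionary keys are made up for this example.

```python
import struct


def parse_lzma_alone_header(data: bytes) -> dict:
    """Parse the 13-byte .lzma (lzma_alone) header."""
    if len(data) < 13:
        raise ValueError("need at least 13 header bytes")
    props, dict_size, unpacked = struct.unpack_from("<BIQ", data, 0)
    if props >= 9 * 5 * 5:
        raise ValueError("invalid properties byte")
    # props = (pb * 5 + lp) * 9 + lc
    lc = props % 9
    lp = (props // 9) % 5
    pb = props // 45
    # An all-ones size means "unknown", so the stream must end with an EOS code.
    size = None if unpacked == 0xFFFFFFFFFFFFFFFF else unpacked
    return {"lc": lc, "lp": lp, "pb": pb,
            "dict_size": dict_size, "unpacked_size": size}
```

The common properties byte 5Dh decodes to lc=3, lp=0, pb=2, the same values that lzip and chd use without storing them.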

One oddity in the LZMA format itself is that it does always start with an "ignored" byte (which seems to be usually/always 00h). I've no idea if that's officially documented anywhere, and if it's some kind of alignment padding, version number, reserved for future, optional flags, design mistake... or whatever. It's basically just wasting one byte of memory... but it's apparently always included in the compressed data (.lzma and .lz and .chd are all having those leading ignored bytes).
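
The leading byte shows up in how a range decoder is usually initialised: five bytes are read, but only the last four end up in the code register. As far as I recall, the specification text shipped with the LZMA SDK treats a nonzero first byte as a corrupt stream, which would match the observation above; treat that as an unverified recollection. A minimal sketch with a made-up function name:

```python
def rc_init(stream: bytes):
    """Initialise range-decoder state from the first five bytes.

    Returns (range, code, first_byte). The first byte does not
    contribute to `code`; the next four are read big-endian.
    """
    if len(stream) < 5:
        raise ValueError("need at least 5 bytes")
    first = stream[0]
    code = int.from_bytes(stream[1:5], "big")
    return 0xFFFFFFFF, code, first
```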

Oh, and I've roughly figured out what LZMA is doing... I would describe it as so:

LZMA is combining LZ+Huffman+Probabilities. The LZ+Huffman bitstream is rather simple (using hardcoded huffman trees), the high compression ratio is reached by predicting probabilities for the bitstream values (that is, the final compressed data is smaller than the bitstream).
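
The "probabilities" part can be sketched as an adaptive binary range decoder. This follows the commonly published LZMA scheme (11-bit probabilities, adaptation shift of 5, normalization after each bit); it is not a transcription of the micro-lzmadec code, and the class name is made up.

```python
class RangeDecoder:
    TOP = 1 << 24      # renormalise when range drops below this
    PROB_BITS = 11     # probabilities are 11-bit fixed point (0..2047)
    MOVE_BITS = 5      # adaptation speed

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 5                                  # byte 0 is ignored
        self.range = 0xFFFFFFFF
        self.code = int.from_bytes(data[1:5], "big")

    def _next_byte(self) -> int:
        # Past the end of the input, feed zeros (the "dummy byte" case).
        b = self.data[self.pos] if self.pos < len(self.data) else 0
        self.pos += 1
        return b

    def decode_bit(self, probs: list, index: int) -> int:
        """Decode one bit and adapt probs[index] towards the decoded value."""
        prob = probs[index]
        bound = (self.range >> self.PROB_BITS) * prob
        if self.code < bound:
            self.range = bound
            probs[index] = prob + (((1 << self.PROB_BITS) - prob) >> self.MOVE_BITS)
            bit = 0
        else:
            self.range -= bound
            self.code -= bound
            probs[index] = prob - (prob >> self.MOVE_BITS)
            bit = 1
        if self.range < self.TOP:                     # normalization
            self.range <<= 8
            self.code = (self.code << 8) | self._next_byte()
        return bit
```

Each probability starts at 1024 (a 50% chance of "0") and drifts towards whatever the stream actually contains, so frequently repeated decisions cost much less than one bit each.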

Post by **null** » September 17th, 2022, 12:11 am

Nice, thanks for this great analysis!!

There is news about the future of CHDv6, but they haven't implemented it yet. It's been on hiatus for 2 years now; R. Belmont already confirmed this. But if you also want to document it I'll put it here.

http://github.com/mamedev/mame/issues/7402

-claunia is going to collaborate in making the new chdman.
-They are planning to implement the Zstandard and LZMA2 algorithms. Zstd was made by a Facebook employee; its compression ratio is comparable to DEFLATE but decompression is faster. LZMA2 is the successor of LZMA, but its compression is worse than LZMA for bigger files.
-Support for merging multi-disc games (GameCube, PSX, etc.).

Post by **null** » September 18th, 2022, 3:11 am

CHD samples for every type of compression used in the latest CHDMAN v0.247.

CHDMAN v0.247
default - default compression
default - CREATECD
D:\CHDMAN>chdman createcd -i DBALL.cue -o DBALL.chd
chdman - MAME Compressed Hunks of Data (CHD) manager 0.247 (mame0247)
Output CHD:   DBALL.chd
Input file:   DBALL.cue
Input tracks: 1
Input length: 00:07:58
Compression:  cdlz (CD LZMA), cdzl (CD Deflate), cdfl (CD FLAC)
Logical size: 1,429,632
Compression complete ... final ratio = 17.6%
default - INFO
D:\CHDMAN>chdman info -i DBALL.chd -v
chdman - MAME Compressed Hunks of Data (CHD) manager 0.247 (mame0247)
Input file:   DBALL.chd
File Version: 5
Logical size: 1,429,632 bytes
Hunk Size:    19,584 bytes
Total Hunks:  73
Unit Size:    2,448 bytes
Total Units:  584
Compression:  cdlz (CD LZMA), cdzl (CD Deflate), cdfl (CD FLAC)
CHD size:     252,104 bytes
Ratio:        17.6%
SHA1:         557b44e9efc6a2740b906876614d0b17dd183d4f
Data SHA1:    6753a724b5c9b70be1220a867d2d2d57856b536f
Metadata:     Tag='CHT2'  Index=0  Length=90 bytes
              TRACK:1 TYPE:MODE2_RAW SUBTYPE:NONE FRAMES:583 PREGAP:0 PGTYPE:MODE1 PGSUB:NONE POSTGAP:0.

     Hunks  Percent  Name
----------  -------  ------------------------------------
        65    89.0%  CD LZMA
         8    11.0%  CD Deflate
none - uncompressed
none - CREATECD
D:\CHDMAN>chdman createcd -c none -i DBALL.cue -o DBALL.chd
chdman - MAME Compressed Hunks of Data (CHD) manager 0.247 (mame0247)
Output CHD:   DBALL.chd
Input file:   DBALL.cue
Input tracks: 1
Input length: 00:07:58
Compression:  none
Logical size: 1,429,632
Compression complete ... final ratio = 100.0%
none - INFO
D:\CHDMAN>chdman info -i DBALL.chd -v
chdman - MAME Compressed Hunks of Data (CHD) manager 0.247 (mame0247)
Input file:   DBALL.chd
File Version: 5
Logical size: 1,429,632 bytes
Hunk Size:    19,584 bytes
Total Hunks:  73
Unit Size:    2,448 bytes
Total Units:  584
Compression:  none
CHD size:     1,449,216 bytes
Metadata:     Tag='CHT2'  Index=0  Length=90 bytes
              TRACK:1 TYPE:MODE2_RAW SUBTYPE:NONE FRAMES:583 PREGAP:0 PGTYPE:MODE1 PGSUB:NONE POSTGAP:0.

     Hunks  Percent  Name
----------  -------  ------------------------------------
        73   100.0%  Uncompressed
zlib
zlib - CREATECD
D:\CHDMAN>chdman createcd -c zlib -i DBALL.cue -o DBALL.chd
chdman - MAME Compressed Hunks of Data (CHD) manager 0.247 (mame0247)
Output CHD:   DBALL.chd
Input file:   DBALL.cue
Input tracks: 1
Input length: 00:07:58
Compression:  zlib (Deflate)
Logical size: 1,429,632
Compression complete ... final ratio = 19.4%
zlib - INFO
D:\CHDMAN>chdman info -i DBALL.chd -v
chdman - MAME Compressed Hunks of Data (CHD) manager 0.247 (mame0247)
Input file:   DBALL.chd
File Version: 5
Logical size: 1,429,632 bytes
Hunk Size:    19,584 bytes
Total Hunks:  73
Unit Size:    2,448 bytes
Total Units:  584
Compression:  zlib (Deflate)
CHD size:     277,522 bytes
Ratio:        19.4%
SHA1:         557b44e9efc6a2740b906876614d0b17dd183d4f
Data SHA1:    6753a724b5c9b70be1220a867d2d2d57856b536f
Metadata:     Tag='CHT2'  Index=0  Length=90 bytes
              TRACK:1 TYPE:MODE2_RAW SUBTYPE:NONE FRAMES:583 PREGAP:0 PGTYPE:MODE1 PGSUB:NONE POSTGAP:0.

     Hunks  Percent  Name
----------  -------  ------------------------------------
        73   100.0%  Deflate
lzma
lzma - CREATECD
D:\CHDMAN>chdman createcd -c lzma -i DBALL.cue -o DBALL.chd
chdman - MAME Compressed Hunks of Data (CHD) manager 0.247 (mame0247)
Output CHD:   DBALL.chd
Input file:   DBALL.cue
Input tracks: 1
Input length: 00:07:58
Compression:  lzma (LZMA)
Logical size: 1,429,632
Compression complete ... final ratio = 17.7%
lzma - INFO
D:\CHDMAN>chdman info -i DBALL.chd -v
chdman - MAME Compressed Hunks of Data (CHD) manager 0.247 (mame0247)
Input file:   DBALL.chd
File Version: 5
Logical size: 1,429,632 bytes
Hunk Size:    19,584 bytes
Total Hunks:  73
Unit Size:    2,448 bytes
Total Units:  584
Compression:  lzma (LZMA)
CHD size:     253,836 bytes
Ratio:        17.8%
SHA1:         557b44e9efc6a2740b906876614d0b17dd183d4f
Data SHA1:    6753a724b5c9b70be1220a867d2d2d57856b536f
Metadata:     Tag='CHT2'  Index=0  Length=90 bytes
              TRACK:1 TYPE:MODE2_RAW SUBTYPE:NONE FRAMES:583 PREGAP:0 PGTYPE:MODE1 PGSUB:NONE POSTGAP:0.

     Hunks  Percent  Name
----------  -------  ------------------------------------
        73   100.0%  LZMA
flac
flac - CREATECD
D:\CHDMAN>chdman createcd -c flac -i DBALL.cue -o DBALL.chd
chdman - MAME Compressed Hunks of Data (CHD) manager 0.247 (mame0247)
Output CHD:   DBALL.chd
Input file:   DBALL.cue
Input tracks: 1
Input length: 00:07:58
Compression:  flac (FLAC)
Logical size: 1,429,632
Compression complete ... final ratio = 53.6%
flac - INFO
D:\CHDMAN>chdman info -i DBALL.chd -v
chdman - MAME Compressed Hunks of Data (CHD) manager 0.247 (mame0247)
Input file:   DBALL.chd
File Version: 5
Logical size: 1,429,632 bytes
Hunk Size:    19,584 bytes
Total Hunks:  73
Unit Size:    2,448 bytes
Total Units:  584
Compression:  flac (FLAC)
CHD size:     770,345 bytes
Ratio:        53.9%
SHA1:         557b44e9efc6a2740b906876614d0b17dd183d4f
Data SHA1:    6753a724b5c9b70be1220a867d2d2d57856b536f
Metadata:     Tag='CHT2'  Index=0  Length=90 bytes
              TRACK:1 TYPE:MODE2_RAW SUBTYPE:NONE FRAMES:583 PREGAP:0 PGTYPE:MODE1 PGSUB:NONE POSTGAP:0.

     Hunks  Percent  Name
----------  -------  ------------------------------------
        73   100.0%  FLAC
huff - huffman
huff - CREATECD
D:\CHDMAN>chdman createcd -c huff -i DBALL.cue -o DBALL.chd
chdman - MAME Compressed Hunks of Data (CHD) manager 0.247 (mame0247)
Output CHD:   DBALL.chd
Input file:   DBALL.cue
Input tracks: 1
Input length: 00:07:58
Compression:  huff (Huffman)
Logical size: 1,429,632
Compression complete ... final ratio = 100.0%
huff - INFO
D:\CHDMAN>chdman info -i DBALL.chd -v
chdman - MAME Compressed Hunks of Data (CHD) manager 0.247 (mame0247)
Input file:   DBALL.chd
File Version: 5
Logical size: 1,429,632 bytes
Hunk Size:    19,584 bytes
Total Hunks:  73
Unit Size:    2,448 bytes
Total Units:  584
Compression:  huff (Huffman)
CHD size:     556,819 bytes
Ratio:        38.9%
SHA1:         557b44e9efc6a2740b906876614d0b17dd183d4f
Data SHA1:    6753a724b5c9b70be1220a867d2d2d57856b536f
Metadata:     Tag='CHT2'  Index=0  Length=90 bytes
              TRACK:1 TYPE:MODE2_RAW SUBTYPE:NONE FRAMES:583 PREGAP:0 PGTYPE:MODE1 PGSUB:NONE POSTGAP:0.

     Hunks  Percent  Name
----------  -------  ------------------------------------
        73   100.0%  Huffman
cdzl - cd deflate
cdzl - CREATECD
D:\CHDMAN>chdman createcd -c cdzl -i DBALL.cue -o DBALL.chd
chdman - MAME Compressed Hunks of Data (CHD) manager 0.247 (mame0247)
Output CHD:   DBALL.chd
Input file:   DBALL.cue
Input tracks: 1
Input length: 00:07:58
Compression:  cdzl (CD Deflate)
Logical size: 1,429,632
Compression complete ... final ratio = 19.2%
cdzl - INFO
D:\CHDMAN>chdman info -i DBALL.chd -v
chdman - MAME Compressed Hunks of Data (CHD) manager 0.247 (mame0247)
Input file:   DBALL.chd
File Version: 5
Logical size: 1,429,632 bytes
Hunk Size:    19,584 bytes
Total Hunks:  73
Unit Size:    2,448 bytes
Total Units:  584
Compression:  cdzl (CD Deflate)
CHD size:     275,396 bytes
Ratio:        19.3%
SHA1:         557b44e9efc6a2740b906876614d0b17dd183d4f
Data SHA1:    6753a724b5c9b70be1220a867d2d2d57856b536f
Metadata:     Tag='CHT2'  Index=0  Length=90 bytes
              TRACK:1 TYPE:MODE2_RAW SUBTYPE:NONE FRAMES:583 PREGAP:0 PGTYPE:MODE1 PGSUB:NONE POSTGAP:0.

     Hunks  Percent  Name
----------  -------  ------------------------------------
        73   100.0%  CD Deflate
cdlz - cd lzma
cdlz - CREATECD
D:\CHDMAN>chdman createcd -c cdlz -i DBALL.cue -o DBALL.chd
chdman - MAME Compressed Hunks of Data (CHD) manager 0.247 (mame0247)
Output CHD:   DBALL.chd
Input file:   DBALL.cue
Input tracks: 1
Input length: 00:07:58
Compression:  cdlz (CD LZMA)
Logical size: 1,429,632
Compression complete ... final ratio = 17.6%
cdlz - INFO
D:\CHDMAN>chdman info -i DBALL.chd -v
chdman - MAME Compressed Hunks of Data (CHD) manager 0.247 (mame0247)
Input file:   DBALL.chd
File Version: 5
Logical size: 1,429,632 bytes
Hunk Size:    19,584 bytes
Total Hunks:  73
Unit Size:    2,448 bytes
Total Units:  584
Compression:  cdlz (CD LZMA)
CHD size:     252,156 bytes
Ratio:        17.6%
SHA1:         557b44e9efc6a2740b906876614d0b17dd183d4f
Data SHA1:    6753a724b5c9b70be1220a867d2d2d57856b536f
Metadata:     Tag='CHT2'  Index=0  Length=90 bytes
              TRACK:1 TYPE:MODE2_RAW SUBTYPE:NONE FRAMES:583 PREGAP:0 PGTYPE:MODE1 PGSUB:NONE POSTGAP:0.

     Hunks  Percent  Name
----------  -------  ------------------------------------
        73   100.0%  CD LZMA
cdfl - cd flac
cdfl - CREATECD
D:\CHDMAN>chdman createcd -c cdfl -i DBALL.cue -o DBALL.chd
chdman - MAME Compressed Hunks of Data (CHD) manager 0.247 (mame0247)
Output CHD:   DBALL.chd
Input file:   DBALL.cue
Input tracks: 1
Input length: 00:07:58
Compression:  cdfl (CD FLAC)
Logical size: 1,429,632
Compression complete ... final ratio = 54.7%
cdfl - INFO
D:\CHDMAN>chdman info -i DBALL.chd -v
chdman - MAME Compressed Hunks of Data (CHD) manager 0.247 (mame0247)
Input file:   DBALL.chd
File Version: 5
Logical size: 1,429,632 bytes
Hunk Size:    19,584 bytes
Total Hunks:  73
Unit Size:    2,448 bytes
Total Units:  584
Compression:  cdfl (CD FLAC)
CHD size:     785,934 bytes
Ratio:        55.0%
SHA1:         557b44e9efc6a2740b906876614d0b17dd183d4f
Data SHA1:    6753a724b5c9b70be1220a867d2d2d57856b536f
Metadata:     Tag='CHT2'  Index=0  Length=90 bytes
              TRACK:1 TYPE:MODE2_RAW SUBTYPE:NONE FRAMES:583 PREGAP:0 PGTYPE:MODE1 PGSUB:NONE POSTGAP:0.

     Hunks  Percent  Name
----------  -------  ------------------------------------
        73   100.0%  CD FLAC
avhu - a/v huffman
avhu - CREATECD
D:\CHDMAN>chdman createcd -c avhu -i DBALL.cue -o DBALL.chd
chdman - MAME Compressed Hunks of Data (CHD) manager 0.247 (mame0247)
Output CHD:   DBALL.chd
Input file:   DBALL.cue
Input tracks: 1
Input length: 00:07:58
Compression:  avhu (A/V Huffman)
Logical size: 1,429,632
Compression complete ... final ratio = 100.0%
avhu - INFO
D:\CHDMAN>chdman info -i DBALL.chd -v
chdman - MAME Compressed Hunks of Data (CHD) manager 0.247 (mame0247)
Input file:   DBALL.chd
File Version: 5
Logical size: 1,429,632 bytes
Hunk Size:    19,584 bytes
Total Hunks:  73
Unit Size:    2,448 bytes
Total Units:  584
Compression:  avhu (A/V Huffman)
CHD size:     1,430,031 bytes
Ratio:        100.0%
SHA1:         557b44e9efc6a2740b906876614d0b17dd183d4f
Data SHA1:    6753a724b5c9b70be1220a867d2d2d57856b536f
Metadata:     Tag='CHT2'  Index=0  Length=90 bytes
              TRACK:1 TYPE:MODE2_RAW SUBTYPE:NONE FRAMES:583 PREGAP:0 PGTYPE:MODE1 PGSUB:NONE POSTGAP:0.

     Hunks  Percent  Name
----------  -------  ------------------------------------
        73   100.0%  Uncompressed
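
The figures in these listings are consistent with each other. A quick sanity check, using plain arithmetic on the values printed above (nothing here is taken from the CHD source):

```python
UNIT_BYTES = 2448        # 2352 sector bytes + 96 subchannel bytes
HUNK_BYTES = 19584
LOGICAL_BYTES = 1429632

units_per_hunk = HUNK_BYTES // UNIT_BYTES        # sectors per hunk
total_units = LOGICAL_BYTES // UNIT_BYTES
total_hunks = -(-LOGICAL_BYTES // HUNK_BYTES)    # ceiling division


def ratio_percent(chd_size: int) -> float:
    """CHD size as a percentage of the logical size, one decimal place."""
    return round(100 * chd_size / LOGICAL_BYTES, 1)
```

This gives 8 units per hunk, 584 units and 73 hunks, and reproduces the ratios reported by "info" (for example 252,104 bytes for the default settings comes out as 17.6%). The track itself has 583 frames while the file has 584 units; that would fit padding to a multiple of 4 frames, but that is only a guess here.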

Post by **nocash** » September 19th, 2022, 10:26 am

Cool, I was almost about to ask if you could make that kind of test files.
I guess things like AVHUFF and raw FLAC are never used for CDROMs, but it'd be nice to get them supported & documented for completeness.

The older V3/V4 format did apparently support five compression methods: NONE, ZLIB, ZLIB+, AV, and a secondary method that is "usually FLAC CDDA". But the old CHDMAN versions don't seem to have a commandline option for forcing a specific method (or did I miss something?).
The V3/V4 versions seem to be always having the header byte at 0014h set to 02h=ZLIB+.
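
Reading that field could look like the sketch below. It assumes the header starts with the 8-byte "MComprHD" tag followed by big-endian 32-bit length, version, flags and compression fields, so that the compression value begins at 0014h; that layout is my understanding of the MAME header and should be checked against the source. The names are made up for this example.

```python
import struct

# Method numbers as listed above for V3/V4 (the AV value is assumed to be 3).
V34_COMPRESSION = {0: "NONE", 1: "ZLIB", 2: "ZLIB+", 3: "AV"}


def parse_chd_prefix(header: bytes) -> dict:
    """Read tag, header length and version; for V3/V4 also flags and compression."""
    magic, length, version = struct.unpack_from(">8sII", header, 0)
    if magic != b"MComprHD":
        raise ValueError("not a CHD file")
    info = {"length": length, "version": version}
    if version in (3, 4):
        flags, compression = struct.unpack_from(">II", header, 0x10)
        info["flags"] = flags
        info["compression"] = V34_COMPRESSION.get(compression, "unknown(%d)" % compression)
    return info
```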

For FLAC, I've found this source code file:
https://www.nayuki.io/page/simple-flac-implementation - 366 lines
I guess I'll use that one to write a FLAC decoder in ASM.

There is also something called "Tiny FLAC", but it's about 1000-2000 lines tall, which is not so tiny at all.
There is also some ASM version with MMX opcodes, which sounds interesting, but I am not sure where to download the source code (there seem to be webpages with different versions and change notes), and the CPU requirements are unclear to me (MMX or SSE2 or even SSE3), and it seems to be some ASM+C mixup (with ASM code used only in subfunctions).
There are probably some huge source packages with hundreds of files that might include ASM code, but I doubt that I could figure out which of those files are needed for making a simple flac decoder.

Post by **null** » September 19th, 2022, 5:59 pm

nocash wrote: ↑September 11th, 2022, 5:19 am But the old CHDMAN versions don't seem to have a commandline option for forcing a specific method (or did I miss something?).

That's right, it's always the default option.

Post by **nocash** » September 25th, 2022, 4:33 am

I've got the FLAC decoding working, and I can now decode most of the CHD files (except the HUFF and AVHUFF ones).

FLAC
For FLAC, I've ported the "simple-flac-implementation" to asm. CHD is using only the raw FLAC "Frames" (starting with the 14bit 3FFEh sync mark), without the FLAC file header and FLAC metadata. The CHD "hunks" do contain about four such Frames.
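
Spotting the start of such a raw Frame is simple, since the sync mark is the top 14 bits of the first two bytes. A minimal sketch (the function name is made up):

```python
def is_flac_frame_sync(buf: bytes, pos: int = 0) -> bool:
    """Return True if the two bytes at `pos` begin with the 14-bit sync 3FFEh."""
    if pos + 2 > len(buf):
        return False
    word = (buf[pos] << 8) | buf[pos + 1]
    return (word >> 2) == 0x3FFE
```

In practice that means a Frame starts with the bytes FFh,F8h or FFh,F9h.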
One oddity in the simple FLAC decoder is that the filter function is doing this:

sum += result[i - 1 - j] * coefs[j];

using "-j" as sample index, and "+j" as coefficient index. That isn't too much of a problem, but I've reversed the ordering of the coefficients in memory, so the filter function can use "-j" for both samples and coefficient indices.
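
The two index orders give identical results, which can be checked with a small model of the filter loop. This is a Python sketch of the idea only; the second function stores the coefficients reversed so that samples and coefficients are walked in the same direction, and both function names are made up.

```python
def lpc_restore(result, coefs, shift):
    """Original order: newest sample pairs with coefs[0]."""
    out = list(result)
    order = len(coefs)
    for i in range(order, len(out)):
        total = 0
        for j in range(order):
            total += out[i - 1 - j] * coefs[j]
        out[i] += total >> shift
    return out


def lpc_restore_reversed(result, rev_coefs, shift):
    """Same filter with coefficients stored in reversed order."""
    out = list(result)
    order = len(rev_coefs)
    for i in range(order, len(out)):
        window = out[i - order:i]            # oldest sample first
        total = sum(s * c for s, c in zip(window, rev_coefs))
        out[i] += total >> shift
    return out
```

Calling `lpc_restore(samples, coefs, shift)` and `lpc_restore_reversed(samples, coefs[::-1], shift)` should return the same list.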

The arrays and variables in the "simple" code are always using type "long", which does probably mean 64bit, and that's a bit overkill when dealing with tiny 16bit samples. I've tried to change it to 16bit, but that didn't work. There are some cases where one does need 32bit (or at least 17bit precision):

when reading compressed samples, readRiceSignedInt does occasionally return values that are 17bit wide
when doing the final "chanAsgn == 10" (mid/side) decoding, "side" needs to be 17bit to get correct "side/2" results.
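
The second case is easy to demonstrate. In the "chanAsgn == 10" case (which the FLAC format calls mid/side), "side" is the difference between two 16bit samples and can therefore need 17 bits. The sketch below models that decoding step and shows what happens when side is truncated to 16 bits; the helper names are made up.

```python
def to_int16(x: int) -> int:
    """Wrap a value to signed 16-bit, as a too-small array would."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def decode_mid_side(mid: int, side: int):
    """Rebuild (left, right) from the mid and side values."""
    right = mid - (side >> 1)
    left = right + side
    return left, right


# Worst case: left = +32767, right = -32768
left, right = 32767, -32768
side = left - right            # 65535, does not fit in signed 16-bit
mid = (left + right) >> 1      # -1
```

With the full 17bit side value the original samples come back; with `to_int16(side)` they do not.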

I think one could calculate the final 16bit results by doing all those 32bit calculations on the fly (without ever needing to store 32bit values in temporary arrays), but the code seems to be already fast enough to play audio tracks on my old PC.
On the other hand, the Psalm69 audio seems to use FLAC only on a few audio sectors (the other audio sectors seem to use deflate or lzma compression), and audio decoding might get slower when simultaneously doing lots of GPU/CPU work. Anyways, for now it seems to be working fast enough.
For the coefficient sum, I am not sure if "sum" needs to be 32bit or 64bit. I've implemented both, but with the CHD test files, sum does never overflow 32bit range (and even if it did: positive/negative overflows might compensate each other, so one could perhaps ignore them).

CHD Audio
CHD is apparently storing all Audio sectors in big-endian format (that is opposite of normal cdrom images like CUE/BIN). I've no idea why it's doing that, maybe audio is actually stored in that form on physical discs, but it's kinda annoying when wanting to deal with normal little-endian values. One basic scenario would be:

decode FLAC audio
convert it to big-endian (because method "cdfl" works as so)
compute the CHD CRC checksum
convert it back to little-endian (because type "audio" is stored in big-endian)

One could avoid the double endianness conversion by using a special CRC function that reads data in byte-swapped order (or completely omit the CRC check), but there are also situations like this:

Audio can be compressed via zlib/lzma instead of big-endian-flac
Data can be compressed via big-endian-flac
compressed "hunks" may contain a mixup of Data and Audio sectors

So it's easiest/safest to always do those endian conversions:

always convert to big-endian (when method is "cdfl")
always convert to little-endian (when metadata for current sector is type audio)
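
The conversion itself is just a byte swap of every 16bit sample, and applying it twice restores the original data. A minimal sketch (the function name is made up):

```python
def swap16(data: bytes) -> bytes:
    """Swap the two bytes of every 16-bit sample."""
    if len(data) % 2:
        raise ValueError("length must be even")
    out = bytearray(len(data))
    out[0::2] = data[1::2]
    out[1::2] = data[0::2]
    return bytes(out)
```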

Deathball and more CHDMAN bugs
I've just noticed that Deathball isn't actually a valid CDROM image: The ECC and EDC values are just filled with placeholders (like 02,02,02,02...), which won't work on real hardware (unless the cdrom burner is fixing those values). And the compression ratios for the dball.chd files are a bit misleading: The zlib and cdzl files are almost the same size. In reality, with actual ECC values, zlib should be a good bit bigger than cdzl.
Looking closer at the "cdzl" and "cdlz" compression: It's merely removing the ECC values, but keeps the 4-byte EDC values unchanged, so the compressed sectors will be about 4 bytes bigger than needed : /
And a funny bug, spotted when looking at dball file sizes:

 1340Kbytes - uncompressed CUE/BIN
  278Kbytes - compressed CHD files
 3578Kbytes - compressed CHD files generated by CHDMAN v0.112 through v0.118, whoops.

PGTYPE
I've found where the "VAUDIO" stuff comes from. It's in https://raw.githubusercontent.com/mamed ... /cdrom.cpp

  if (track->pregap > 0) {
    if (pgtype[0] == 'V') {                                    // eg. "VAUDIO"
      convert_type_string_to_pregap_info(&pgtype[1], track);   // skip the 'V' prefix
    }
    convert_subtype_string_to_pregap_info(pgsub, track);       // eg. "RW"
  }
  [...]
  if (toc.tracks[i].pgdatasize > 0) {
    strcpy(&submode[1], get_type_string(toc.tracks[i].pgtype));
    submode[0] = 'V';   // indicate valid submode
  } else {
    strcpy(submode, get_type_string(toc.tracks[i].pgtype));
  }

Older CHDMAN versions (eg. v0.146) did use nonsense "PGTYPE:MODE1" for all tracks (including audio tracks), later versions (eg. v0.246) did fix that issue; those newer files include a "V" prefix to indicate that the entry contains "valid" info (eg. "PGTYPE:VAUDIO") (except, Track 1 keeps using "PGTYPE:MODE1" without "V" and it's "MODE1" even on MODE2 discs).
Well, that's where it comes from, but I don't really know how the presence/absence of "V" will affect the actual cdrom decoding... and actually I don't even know what the PREGAP, POSTGAP, PGxxx stuff is meant to do exactly... especially, I don't know if PREGAPs are included as compressed sectors in the CHD file, or if they aren't included.
A cdrom test image with voice recordings saying "Two", "Three", "Four" on track 2-4 would be helpful for testing the starting location of the tracks and gaps.
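
For experimenting with the metadata, the CHT2 string can be split into fields and the "V" prefix separated from the pregap type. This is only a sketch based on the strings printed by "chdman info"; `parse_cht2` and the extra "PGTYPE_VALID" key are made up, and the trailing "." in the listing is assumed to be a printed terminator byte.

```python
def parse_cht2(text: str) -> dict:
    """Split a CHT2 metadata string into a dict of fields."""
    fields = {}
    for token in text.rstrip(". \0").split():
        key, _, value = token.partition(":")
        fields[key] = int(value) if value.isdigit() else value
    pgtype = fields.get("PGTYPE", "")
    fields["PGTYPE_VALID"] = pgtype.startswith("V")
    if fields["PGTYPE_VALID"]:
        fields["PGTYPE"] = pgtype[1:]   # drop the 'V' prefix
    return fields
```

For the dball.chd entry this yields PGTYPE "MODE1" with PGTYPE_VALID false, while a string containing "PGTYPE:VAUDIO" would yield "AUDIO" with PGTYPE_VALID true.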

CHD hunk size
The chd compression blocks are quite small (only 4-8 sectors), I am not sure if that's optimal... it's good for fast random access... but I am wondering if bigger blocks (with commandline --hunksize) would compress better? Sectors are 2448 bytes so something like --hunksize 244800 or --hunksize 2448000 might be worth trying (or the closest multiple of 2448 below 512K, which appears to be the maximum size according to chd source code).
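
The "closest multiple of 2448 below 512K" is easy to work out. The 512K limit is taken from the paragraph above and not verified here; the function name is made up.

```python
UNIT_BYTES = 2448


def largest_hunk_size(limit: int, unit: int = UNIT_BYTES) -> int:
    """Largest multiple of `unit` that does not exceed `limit`."""
    return (limit // unit) * unit


max_hunk = largest_hunk_size(512 * 1024)
sectors_per_hunk = max_hunk // UNIT_BYTES
```

That gives 523872 bytes, or 214 sectors per hunk.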

The results will probably vary for different games, depending on whether they have repeating data across several sectors.
It might even work for compressing some movies (probably not so much for normal animated movies, but it could compress very well if there are any movies with still images).

The downside is that bigger compression blocks would increase random access seek times.
The current size is so small that one could pause the emulation and decompress the whole block at once (without too much affecting the emulation frame rate).
With larger block sizes, one would need to pause the decompression after each sector and resume emulation (or use some multi-threading on dual core cpus for that).
That should work smoothly for continuous reading (but it would be annoying to implement that for all of the different methods: deflate, lzma, flac, etc.)
And it won't work too smoothly when seeking different cdrom sectors... on the other hand, seeking is kinda slow on real cdrom drives, too. So it might be acceptable if the random access isn't slower than "average" seek times on real hardware... I don't have any benchmarks for average PSX seek times to nearby (or far-away) sectors though.

That said, I am now near burn-out. The CHD stuff is getting more and more complicated... I hope I can sort out that mess and write up some kind of compact and legible CHD file format description.

Post by **null** » September 25th, 2022, 3:27 pm

nocash wrote: ↑September 11th, 2022, 5:19 am but I am wondering if bigger blocks (with commandline --hunksize) would compress better?

From GitHub issues page wrote: chdman compression is worse with larger hunk size #7135
http://github.com/mamedev/mame/issues/7135

whocares0101 commented Aug 24, 2020

1Xtreme (USA)
http://redump.org/disc/15241/

Using mame0223b_64bit on windows 10.
chdman.exe createcd -np 12 -hs 195840

default, 19584
301 MiB

39168
338 MiB

195840
339 MiB

As the numbers show, for this game the compression gets worse with a larger hunk size. Shouldn't it at least stay about the same size if it's not better compressible? Just for reference 7z can crush this to 150 MiB so there should still be room for improvement with a larger hunk size.

DavidHaywood commented Aug 25, 2020

CHD is designed to be streamable and quick to decompress, larger hunk size defeats the purpose and will introduce stutter. Also CHD does some basic duplicate hunk checks, if you increase the size you're less likely to have small duplicate hunks, which is probably why compression gets worse.

nocash wrote: ↑September 11th, 2022, 5:19 am Looking closer at the "cdzl" and "cdlz" compression: It's merely removing the ECC values, but keeps the 4-byte EDC values unchanged, so the compressed sectors will be about 4 bytes bigger than needed : /

They've already addressed this issue on the GitHub page, but the fix is for CHDv6, not CHDv5.

nocash wrote: ↑September 11th, 2022, 5:19 am A cdrom test image with voice recordings saying "Two", "Three", "Four" on track 2-4 would be helpful for testing the starting location of the tracks and gaps.

I don't know any PSX ROMs that do that. But there are hundreds of free PSX demo alpha/beta ROMs that you can try.
http://hiddenpalace.org/Category:PlayStation_prototypes
https://tcrf.net/Category:PlayStation_ROMs

Post by **nocash** » September 25th, 2022, 7:29 pm

From GitHub issues page wrote: 1Xtreme (USA) http://redump.org/disc/15241/
19584 - 301 MiB
39168 - 338 MiB
CHD is designed to be streamable and quick to decompress, larger hunk size defeats the purpose and will introduce stutter. Also CHD does some basic duplicate hunk checks, if you increase the size you're less likely to have small duplicate hunks, which is probably why compression gets worse.

The 1Xtreme game is a weird corner case: the redump page says that the last two tracks have identical checksums, probably because the disc contains some unused padding stuff in the last track(s). For most other CDROMs such duplicated hunks won't happen...

CDROM Data sectors cannot have duplicated hunks (because all CHD versions fail to filter out the increasing MM:SS:FF values in the sector header).
CDROM Audio sectors can have duplicated hunks (eg. duplicated audio tracks, or identical sections with looping data, aligned to the hunk size boundary... but that's very unlikely to happen in practice; except on the 1Xtreme disc).
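
The duplicate-hunk argument can be checked on a raw image with a few lines of Python. This is only a sketch under assumptions: `count_duplicate_hunks` is a made-up helper, it compares SHA-1 hashes of fixed-size hunks (which is not necessarily how chdman detects duplicates), and it expects a plain BIN with 2352-byte sectors and no subchannel data.

```python
import hashlib

SECTOR_SIZE = 2352  # raw sector size in a CUE/BIN image (assumption: no subchannel data)

def count_duplicate_hunks(data: bytes, sectors_per_hunk: int = 8) -> int:
    """Count hunks whose content already occurred earlier in the image."""
    hunk_size = SECTOR_SIZE * sectors_per_hunk
    seen = set()
    duplicates = 0
    for offset in range(0, len(data), hunk_size):
        # Hash each hunk; an identical hash means identical content (for practical purposes).
        digest = hashlib.sha1(data[offset:offset + hunk_size]).digest()
        if digest in seen:
            duplicates += 1
        else:
            seen.add(digest)
    return duplicates
```

On data tracks the count should stay at zero, since every sector header carries its own MM:SS:FF value. Silence or repeated audio that happens to be aligned to the hunk boundary is what produces duplicates.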

The compression ratio vs hunk size depends on whether the disc contains audio tracks: Data tracks should be getting smaller with bigger hunks. Audio tracks (with FLAC compression) should stay the same (because CHD is apparently compressing each 1-2 sectors as separate FLAC Frames, regardless of the hunk size).

nocash wrote: ↑September 11th, 2022, 5:19 am A cdrom test image with voice recordings saying "Two", "Three", "Four" on track 2-4 would be helpful for testing the starting location of the tracks and gaps.

I meant like somebody taking a microphone (or ripping some voice recordings) and making a CUE/BIN or TOC/BIN file... ie. making a raw audio disc with 4 audio tracks... or appending three audio tracks to the DBALL.CUE file. A neat bonus would be a stereo track saying "Left, Right" on the corresponding sides.

Post by **null** » September 26th, 2022, 5:17 pm

nocash wrote: compression ratio vs hunk size

I think you're right.

R. Belmont wrote: In CHD terms a "hunk" is a single compressed data blob containing one or more blocks from the original medium. Larger hunks give better compression ratios, but make actually reading the data more expensive.

Additional info.

MameHaze wrote: Larger hunk sizes will introduce more 'microstutter' as you're giving the emulator more work to do each frame where something needs decompressing, meaning your frame load ends up very uneven.

For anything making a lot of small random accesses it will also have a severe negative effect on overall performance as you'll be decompressing large amounts of data you don't even need.

nocash wrote: --hunksize ??? might be worth trying

Do you need a sample of CHDs with different hunk sizes?

nocash wrote: I meant like somebody taking a microphone (or ripping some voice recordings) and making a CUE/BIN or TOC/BIN file...

I know how to make a WAV audio file using Google's AI for TTS (text to speech) but I don't know how to convert that to CUE/BIN that works properly with the real PSX hardware. Maybe the PSX SDKs can do that but I don't have any PSX SDK installed on my computer.

Post by **nocash** » September 26th, 2022, 11:37 pm

null wrote: ↑September 25th, 2022, 3:27 pm EDC: They've already addressed this issue on the GitHub page but the fix is for CHDv6 not on CHDv5.

I hope they are aware that the EDC is optional in MODE2-FORM2 sectors, there should be a flag whether or not to filter EDC in FORM2 sectors (a global flag in the file header should do it, or one could store the flag in metadata or anywhere else, if it doesn't take up too much space).
Namely, Sony seems to have invented discs with EDC alongsides when releasing "Official U.S. PlayStation Magazine Demo Disc 43" (which exists in two versions: with and without EDC).

null wrote: ↑September 26th, 2022, 5:17 pm Do you need a sample of CHDs with different hunk sizes?

Not really, I would be more interested in a list with compressed sizes for different hunk sizes for 5-10 retail games. Just to get an idea if the hunk sizes can produce significantly smaller files. Best for different types of games, for example:
Wipeout 2097/XL - has lots of CD-DA audio tracks
Pandemonium 2 - has several large STR movies
something with lots of .XA audio files
something with http://problemkaputt.de/psxspx-cdrom-fi ... ession.htm
Deathball as we've already used that a lot for testing.
Or whatever you have around, best some mixup of older and newer games and different genres and publishers.

Interesting would be these three hunk sizes:
122400 = 2448*50
244800 = 2448*100
489600 = 2448*200
The maximum hunk size should be 523872 = 2448*214 (just below the official 512K limit).
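
The arithmetic behind those numbers, as a small sketch (the names `FRAME_SIZE`, `LIMIT` and `hunk_size` are only for illustration; the 512K limit is the one mentioned above):

```python
FRAME_SIZE = 2448      # 2352 bytes sector data + 96 bytes subchannel data
LIMIT = 512 * 1024     # the 512K limit mentioned above

def hunk_size(sectors: int) -> int:
    """Hunk size in bytes for a given number of sectors per hunk."""
    return sectors * FRAME_SIZE

# Largest whole number of sectors that still fits below the limit.
max_sectors = LIMIT // FRAME_SIZE
max_hunk_size = hunk_size(max_sectors)

for sectors in (8, 50, 100, 200, max_sectors):
    print(f"{sectors:3d} sectors -> hunk size {hunk_size(sectors)}")
```

The 8-sector entry is the default hunk size of 19584 bytes from the GitHub issue quoted earlier.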

MameHaze wrote: Larger hunk sizes will introduce more 'microstutter' as you're giving the emulator more work to do each frame where something needs decompressing, meaning your frame load ends up very uneven.

No, that's wrong. Even the old crappy CDZ cdrom compression format was suggesting to use multithreading (to keep the emulation running smoothly, and to read-ahead-and-decompress new sectors in background).
I am not too motivated to implement that - but it might be worth doing it if the compression ratio turns out to be significantly better.

MameHaze wrote: For anything making a lot of small random accesses it will also have a severe negative effect on overall performance as you'll be decompressing large amounts of data you don't even need.

That's true, and a really large hunk size (like 1Mbyte or more) would screw up random access (and then one could as well use zip or 7z instead of chd for compression). But I think 128K-512K might be reasonable, especially for emulating slow CDROM drives. I have no benchmarks with exact timings, but CDROM seek works about as so:

far seek: very slow (you can hear the sled moving when seeking from first to last audio track)
medium seek: drive head moves back-and-forth and reads several sectors until it's near the desired sector.
near seek: drive is skipping a few sector(s) until reaching the desired sector.
lucky seek: drive is exactly at desired location and can return data almost instantly.

Most of those cases could be neatly emulated: You just tell the emulation that the drive mechanics are busy during decompression. The only issue would be the lucky case, if the desired sector is shortly after the recent sector then it's no problem (you should have already prefetched and decompressed the next few sectors).
But, say, the recent sector was #100, and the drive is paused/circling between sector #97 and #105, and the game does then want to read from sector #99, then real hardware might instantly return the desired sector data, whilst chd might first need to decompress sector #50-#99.
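
To put numbers on that example: the sketch below computes which sectors share a hunk with a given sector, and how many sectors have to be decoded before it becomes available. It assumes a hunk is decoded front to back with nothing cached; `hunk_span` and `sectors_to_decode` are hypothetical helper names, not anything from the CHD source.

```python
def hunk_span(sector: int, sectors_per_hunk: int) -> range:
    """Return the range of sectors stored in the hunk that contains `sector`."""
    first = (sector // sectors_per_hunk) * sectors_per_hunk
    return range(first, first + sectors_per_hunk)

def sectors_to_decode(sector: int, sectors_per_hunk: int) -> int:
    """Sectors that must be decompressed before `sector` is available
    (assumption: front-to-back decoding, no cached hunk)."""
    first = hunk_span(sector, sectors_per_hunk).start
    return sector - first + 1

# Sector #99 with 50 sectors per hunk (hunk size 122400) vs. the default 8.
print(hunk_span(99, 50), sectors_to_decode(99, 50))
print(hunk_span(99, 8), sectors_to_decode(99, 8))
```

With 50 sectors per hunk, sector #99 is the last sector of its hunk, so the whole hunk (#50-#99) has to be decoded first. With the default 8 sectors per hunk only #96-#99 are needed.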

null wrote: ↑September 25th, 2022, 3:27 pm I know how to make a WAV audio file using Google's AI for TTS (text to speech) but I don't know how to convert that to CUE/BIN that works properly with the real PSX hardware. Maybe the PSX SDKs can do that but I don't have any PSX SDK installed on my computer.

Actually, a raw Audio disc image would be easier for testing - no need for PSX SDK and Data tracks or the like.
Making a CUE file in a text editor is relatively simple. Normally, an audio disc would look like this:

Code: Select all

FILE "Track 01.bin" BINARY
  TRACK 01 AUDIO
    INDEX 01 00:00:00
FILE "Track 02.bin" BINARY
  TRACK 02 AUDIO
    INDEX 00 00:00:00
    INDEX 01 00:02:00
FILE "Track 03.bin" BINARY
  TRACK 03 AUDIO
    INDEX 00 00:00:00
    INDEX 01 00:02:00
FILE "Track 04.bin" BINARY
  TRACK 04 AUDIO
    INDEX 00 00:00:00
    INDEX 01 00:02:00

Above does assume that Track 2-4 contain leading gaps with index 0. That's usually the case when dumping real discs. When making custom discs, it should be easier to insert those gaps via PREGAP, as so:

Code: Select all

FILE "Track 01.bin" BINARY
  TRACK 01 AUDIO
    INDEX 01 00:00:00
FILE "Track 02.bin" BINARY
  TRACK 02 AUDIO
    PREGAP 00:02:00
    INDEX 01 00:00:00
FILE "Track 03.bin" BINARY
  TRACK 03 AUDIO
    PREGAP 00:02:00
    INDEX 01 00:00:00
FILE "Track 04.bin" BINARY
  TRACK 04 AUDIO
    PREGAP 00:02:00
    INDEX 01 00:00:00
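
For more than a handful of tracks, the same kind of CUE sheet can be generated instead of typed. A minimal sketch (`make_cue` is a made-up helper, and the layout simply follows the PREGAP example: one file per track, consecutive track numbers, a two-second PREGAP on every track except the first):

```python
def make_cue(track_files, pregap="00:02:00"):
    """Build a CUE sheet for an audio disc with one BINARY file per track."""
    lines = []
    for number, name in enumerate(track_files, start=1):
        lines.append(f'FILE "{name}" BINARY')
        lines.append(f"  TRACK {number:02d} AUDIO")
        if number > 1:
            # Track 1 gets no PREGAP line, matching the example sheet.
            lines.append(f"    PREGAP {pregap}")
        lines.append("    INDEX 01 00:00:00")
    return "\n".join(lines) + "\n"

print(make_cue([f"Track {n:02d}.bin" for n in range(1, 5)]))
```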

Theoretically you could also directly use .wav files instead of .bin files:

Code: Select all

FILE "Track 01.wav" WAVE
  TRACK 01 AUDIO
    INDEX 01 00:00:00
FILE "Track 02.wav" WAVE
  TRACK 02 AUDIO
    PREGAP 00:02:00
    INDEX 01 00:00:00
FILE "Track 03.wav" WAVE
  TRACK 03 AUDIO
    PREGAP 00:02:00
    INDEX 01 00:00:00
FILE "Track 04.wav" WAVE
  TRACK 04 AUDIO
    PREGAP 00:02:00
    INDEX 01 00:00:00

I don't know if chdman supports PREGAP and CUE/WAV though. No$psx doesn't support CUE/WAV (anyways if chdman can convert it to CHD then the resulting CHD should work in no$psx).
There are probably tools to convert WAV to BIN. The data should be 44100Hz 16bit stereo (the tools might insist on the WAV file to use that format).
Any cdrom-dedicated tools should hopefully pad the BIN file size to a multiple of the cdrom sector size.
cdrom burning tools can probably convert formats (like load CUE/WAV, then save as CUE/BIN) (or burn CUE/WAV to virtual disc, then dump it to CUE/BIN).
The BINARY audio in CUE/BIN should be little-endian. I've just noticed that "AUDIOFILE" in TOC/BIN seems to want it in big-endian (maybe that's where the uncommon endianness in CHD comes from).
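
If no conversion tool is at hand, the WAV to BIN step can be sketched with Python's standard `wave` module. This is an untested-on-real-hardware sketch under the assumptions stated above: the input must already be 44100Hz 16bit stereo, the output is padded with silence to a multiple of the 2352-byte sector size, and `wav_to_bin` is a hypothetical helper name.

```python
import wave

SECTOR_SIZE = 2352  # bytes per raw audio sector (588 stereo 16-bit samples)

def wav_to_bin(wav_path: str, bin_path: str) -> int:
    """Copy the PCM data of a CD-format WAV into a raw BIN, padded to whole sectors.

    Returns the number of sectors written.
    """
    with wave.open(wav_path, "rb") as wav:
        fmt = (wav.getnchannels(), wav.getsampwidth(), wav.getframerate())
        if fmt != (2, 2, 44100):
            raise ValueError("expected 44100 Hz, 16-bit, stereo")
        pcm = wav.readframes(wav.getnframes())  # 16-bit WAV PCM is little-endian
    padding = -len(pcm) % SECTOR_SIZE
    with open(bin_path, "wb") as out:
        out.write(pcm + bytes(padding))  # pad the last sector with silence
    return (len(pcm) + padding) // SECTOR_SIZE
```

No byte swapping is needed here, because the samples in a WAV file are little-endian, the same as the BINARY audio in CUE/BIN.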

Post by **null** » September 28th, 2022, 6:16 pm

nocash wrote: A cdrom test image with voice recordings.

Okay I did it!!

These are the tracks:
01 - Left side stereo only
02 - Right side stereo only
03 - Left side stereo only
04 - Right side stereo only

I used VLC Player to convert WAV to CDDA WAV, then I used CDmage to convert CDDA WAV to CUE/BIN. The shortest audio clip that can be converted to an audio CD WAV is 418ms.

CHDMAN/CDmage will permanently merge the four BINs into one BIN. The CUE is then modified automatically by CHDMAN/CDmage (both apps produce the same CUE) with correct values. If you want to split the BIN, you need some other third-party app to split the one BIN back into four BINs.

CDmage can play the audio tracks continuously, but no$psx behaves like this:

no$psx audio output wrote: no$psx voice log:
one, two, three, four hmmm...
two, three, four hmmm...
three, four hmmm...
four hmmm...
hmmm...

nocash wrote: List with compressed sizes for different hunk sizes for 5-10 retail games.

I'll do it later.

Post by **nocash** » September 29th, 2022, 8:22 pm

Perfect. That's a wonderful small cdrom-image, very useful for testing things!

Ah, I see, Track 1 should only say "one" (plus silence), not "one two three four hmmmm" as happening in no$psx. I am currently mirroring to "random" sectors during PREGAPs (hence the "two three four") and I am mirroring to I-don't-know-what at end of disc (the "hmmm" is probably from an empty data sector with ECC repeating at 75Hz rate). I'll look into fixing that - most PSX games don't require such pregaps & end of track handling, so I have never accurately tested & emulated those things.

The chd v0.145 disc image is using Deflate for audio tracks, that's unexpected because it's the last V4 version, and the CHD source code says that V3/V4 can use "FLAC CDDA" as secondary compression method... but apparently v0.145 doesn't do that.

Playing the two chd files in no$psx, I currently hear only track 1; tracks 2-4 are silent. I guess chd files don't contain any compressed/zerofilled data for the pregap areas (so tracks 2-4 appear at other locations than expected).

I've converted the v0.145 CHD file to CUE/BIN (and TOC/BIN) using the "chdman -extractcd x.chd x.cue x.bin". The CUE file looks okay. But the TOC file is almost entirely different than described at https://linux.die.net/man/1/cdrdao
It's using "CD_ROM" instead of "CD_DA"
It's using "ZERO AUDIO" instead of just "SILENCE" (or "ZERO" or "PREGAP")
It's using "#70560" as an undocumented extra parameter for DATAFILE
It's expecting big-endian audio DATAFILEs (the cdrdao manual leaves the endianness for DATAFILEs unspecified).
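
Converting between the two byte orders is just a swap of every byte pair. A small sketch for 16-bit samples (`swap_sample_endianness` is a made-up name; the same call converts in either direction):

```python
def swap_sample_endianness(pcm: bytes) -> bytes:
    """Swap the two bytes of every 16-bit sample (little-endian <-> big-endian)."""
    if len(pcm) % 2:
        raise ValueError("16-bit PCM data must have an even length")
    swapped = bytearray(len(pcm))
    swapped[0::2] = pcm[1::2]  # high bytes move to the even positions
    swapped[1::2] = pcm[0::2]  # low bytes move to the odd positions
    return bytes(swapped)
```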

Btw. if you have some conversion tool around, it would be interesting to have the 1-2-3-4 test disc also in formats like CCD, CDI, MDS, NRG, PBP, CUE/BIN and TOC/BIN (other than CHD's cue/toc format) or in any other formats you can think of.

Post by **nocash** » September 30th, 2022, 8:17 am

Sector Sizes
Decompressed CHD sectors are always 990h bytes tall (930h bytes sector data, plus 60h bytes subchannel data).
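
In code, splitting a decompressed hunk into its parts could look as below. This is a sketch that assumes each 990h-byte unit holds the 930h bytes of sector data first, directly followed by its 60h bytes of subchannel data; `split_hunk` is a hypothetical helper.

```python
SECTOR_DATA = 0x930               # 2352 bytes of sector data
SUBCHANNEL = 0x60                 # 96 bytes of subchannel data
FRAME = SECTOR_DATA + SUBCHANNEL  # 0x990 = 2448 bytes per decompressed sector

def split_hunk(hunk: bytes):
    """Yield (sector_data, subchannel_data) pairs from one decompressed hunk.

    Assumption: sector data comes first in each 990h-byte unit.
    """
    if len(hunk) % FRAME:
        raise ValueError("hunk size must be a multiple of 990h bytes")
    for offset in range(0, len(hunk), FRAME):
        frame = hunk[offset:offset + FRAME]
        yield frame[:SECTOR_DATA], frame[SECTOR_DATA:]
```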

For the 930h-byte sector data, all my CUE/BIN disc-images are actually stored in 930h-byte sector format, but CHD metadata does also support other sector sizes like 800h bytes (2048 decimal).
Does somebody have a small CUE/BIN image with such smaller sector sizes for testing? It doesn't need to be a PSX MODE2 image, a normal PC MODE1 image would be fine, too.

For the 60h-byte subchannel data, the existing CHD files have SUBTYPE:NONE, meaning that those 60h bytes are just zerofilled. Are there ways to create CHD files with subchannel data? Like supplying a .SUB file in addition to the .CUE file?
