I've got the FLAC decoding working, and I can now decode most of the CHD files (except the HUFF and AVHUFF ones).
For FLAC, I've ported the "simple-flac-implementation" to asm. CHD uses only raw FLAC "Frames" (starting with the 14bit 3FFEh sync mark), without the FLAC file header and FLAC metadata. Each CHD "hunk" contains about four such Frames.
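Just for reference, checking for that sync mark boils down to two byte compares (a minimal C sketch; the function name is mine):
Code: Select all
#include <stdint.h>

/* The 14bit sync code 3FFEh (11111111111110b) occupies the first byte
   plus the upper six bits of the second byte of a FLAC frame header. */
static int is_flac_frame_sync(const uint8_t *p)
{
    return p[0] == 0xFF && (p[1] & 0xFC) == 0xF8;
}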
One oddity in the simple FLAC decoder is that the filter function is doing this:
Code: Select all
sum += result[i - 1 - j] * coefs[j];
using "-j" as sample index, and "+j" as coefficient index. That isn't too much of a problem, but I've reversed the ordering of the coefficients in memory, so the filter function can use "-j" for both samples and coefficient indices.
The arrays and variables in the "simple" code always use type "long", which probably means 64bit, and that's a bit of overkill when dealing with tiny 16bit samples. I've tried to change it to 16bit, but that didn't work. There are some cases where one does need 32bit (or at least 17bit) precision:
- when reading compressed samples, readRiceSignedInt does occasionally return values that are 17bit tall
- when doing the final "chanAsgn == 10" mid/side filtering (see the sketch after this list), "side" needs to be 17bit to get correct "side/2" results.
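For instance, the mid/side step can be done with 32bit intermediates like this (a sketch with my own variable names, same logic as the simple decoder):
Code: Select all
#include <stdint.h>

/* chanAsgn == 10: ch0 holds "mid", ch1 holds "side" (up to 17bit signed).
   With only 16bit math, the side >> 1 term would lose its top bit. */
static void decode_mid_side(int32_t *ch0, int32_t *ch1, int blockSize)
{
    for (int i = 0; i < blockSize; i++) {
        int32_t side  = ch1[i];                /* 17bit signed          */
        int32_t right = ch0[i] - (side >> 1);  /* arithmetic shift = /2 */
        ch1[i] = right;                        /* final 16bit right     */
        ch0[i] = right + side;                 /* final 16bit left      */
    }
}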
I think one could calculate the final 16bit results by doing all those 32bit calculations on the fly (without ever needing to store 32bit values in temporary arrays), but the code already seems fast enough to play audio tracks on my old PC.
On the other hand, the Psalm69 audio seems to use FLAC only on a few audio sectors (the other audio sectors seem to use deflate or lzma compression), and audio decoding might get slower when simultaneously doing lots of GPU/CPU work. Anyways, for now it seems to be working fast enough.
For the coefficient sum, I am not sure if "sum" needs to be 32bit or 64bit. I've implemented both, but with the CHD test files, sum never overflows the 32bit range (and even if it did: two's-complement addition wraps modulo 2^32, so intermediate positive/negative overflows compensate each other as long as the final sum fits in 32bit, and one could perhaps ignore them).
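A quick demonstration of that compensation effect (the intermediate wraps don't matter as long as the final value fits in 32bit):
Code: Select all
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t sum = 0;
    sum += 0x7FFF0000u;
    sum += 0x7FFF0000u;  /* wraps past the 32bit signed maximum here */
    sum -= 0x7FFF0000u;
    sum -= 0x7FFE0000u;
    printf("%08X\n", (unsigned)sum);  /* 00010000, the correct final sum */
    return 0;
}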
CHD apparently stores all Audio sectors in big-endian format (the opposite of normal cdrom images like CUE/BIN). I've no idea why it's doing that; maybe audio is actually stored in that form on physical discs, but it's kinda annoying when one wants to deal with normal little-endian values. One basic scenario would be (with the byte-swap function sketched after this list):
- decode FLAC audio
- convert it to big-endian (because that's how the "cdfl" method works)
- compute the CHD CRC checksum
- convert it back to little-endian (because type "audio" is stored big-endian in the CHD, but normal little-endian values are what's wanted)
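The byte-swapping itself is trivial (a sketch; the function name is mine, and it's reused in the snippet further below):
Code: Select all
#include <stdint.h>
#include <stddef.h>

/* Swap the byte order of each 16bit sample in place; a 2352-byte
   audio sector holds 588 left/right sample pairs. */
static void swap16_buffer(uint8_t *buf, size_t nbytes)
{
    for (size_t i = 0; i + 1 < nbytes; i += 2) {
        uint8_t t  = buf[i];
        buf[i]     = buf[i + 1];
        buf[i + 1] = t;
    }
}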
One could avoid the double endianness conversion by using a special CRC function that reads data in byte-swapped order (or by completely omitting the CRC check), but there are also situations like this:
- Audio can be compressed via zlib/lzma instead of big-endian FLAC
- Data can be compressed via big-endian FLAC
- compressed "hunks" may contain a mix of Data and Audio sectors
So it's easiest/safest to always do those endian conversions (sketched below):
- always convert to big-endian (when the method is "cdfl")
- always convert to little-endian (when the metadata for the current sector is type audio)
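In pseudo-C, with made-up names (the method/type flags would come from the hunk map and track metadata), the rule amounts to:
Code: Select all
/* after decompressing a sector from a hunk: */
if (method_is_cdfl)               /* FLAC decodes to little-endian...      */
    swap16_buffer(sector, 2352);  /* ...so convert to big-endian first     */
/* ...the CHD CRC is computed over the big-endian data at this point...    */
if (sector_type_is_audio)         /* audio is stored big-endian...         */
    swap16_buffer(sector, 2352);  /* ...convert back for little-endian use */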
Deathball and more CHDMAN bugs
I've just noticed that Deathball isn't actually a valid CDROM image: The ECC and EDC values are just filled with placeholders (like 02,02,02,02,...), which won't work on real hardware (unless the cdrom burner fixes those values). And the compression ratios for the dball.chd files are a bit misleading: The zlib and cdzl files are almost the same size; in reality, with actual ECC values, zlib should be a good bit bigger than cdzl.
Looking closer at the "cdzl" and "cdlz" compression: It merely removes the ECC values but keeps the 4-byte EDC values unchanged, so the compressed sectors end up about 4 bytes bigger than needed : /
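For reference, the standard MODE1 sector layout, with the two checksum fields in question:
Code: Select all
000h 12    Sync (00h, 10 x FFh, 00h)
00Ch 4     Header (MIN,SEC,FRA,MODE)
010h 800h  Data (2048 bytes)
810h 4     EDC checksum                   <-- kept by cdzl/cdlz
814h 8     Zerofilled
81Ch 114h  ECC error correction (276 bytes) <-- stripped by cdzl/cdlz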
And a funny bug, spotted when looking at dball files sizes:
Code: Select all
1340 Kbytes - uncompressed CUE/BIN
 278 Kbytes - compressed CHD files
3578 Kbytes - compressed CHD files generated by CHDMAN v0.112 through v0.118, whoops.
I've found where the "VAUDIO" stuff comes from. It's in https://raw.githubusercontent.com/mamed ... /cdrom.cpp
Code: Select all
if (track->pregap > 0)
  if (pgtype == 'V')                                    // eg. "VAUDIO"
    convert_type_string_to_pregap_info(&pgtype, track);
  convert_subtype_string_to_pregap_info(pgsub, track);  // eg. "RW"

if (toc.tracks[i].pgdatasize > 0)
  submode = 'V';                                        // indicate valid submode
Older CHDMAN versions (eg. v0.146) used a nonsense "PGTYPE:MODE1" entry for all tracks (including audio tracks); later versions (eg. v0.246) fixed that issue: those newer files include a "V" prefix to indicate that the entry contains "valid" info (eg. "PGTYPE:VAUDIO"). Except that Track 1 keeps using "PGTYPE:MODE1" without "V", and it's "MODE1" even on MODE2 discs.
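If I've parsed the metadata code correctly, the track metadata entries look roughly like this (track/frame numbers are made up for illustration; only the PGTYPE value differs):
Code: Select all
TRACK:2 TYPE:AUDIO SUBTYPE:NONE FRAMES:18000 PREGAP:150 PGTYPE:MODE1 PGSUB:NONE POSTGAP:0   <-- old
TRACK:2 TYPE:AUDIO SUBTYPE:NONE FRAMES:18000 PREGAP:150 PGTYPE:VAUDIO PGSUB:NONE POSTGAP:0  <-- new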
Well, that's where it comes from, but I don't really know how the presence/absence of "V" affects the actual cdrom decoding... and actually I don't even know what the PREGAP, POSTGAP, PGxxx stuff is meant to do exactly... in particular, I don't know whether PREGAPs are included as compressed sectors in the CHD file, or if they aren't included.
A cdrom test image with voice recordings saying "Two", "Three", "Four" on tracks 2-4 would be helpful for testing the starting locations of the tracks and gaps.
CHD hunk size
The chd compression blocks are quite small (only 4-8 sectors), and I am not sure if that's optimal... it's good for fast random access... but I am wondering if bigger blocks (via the commandline option --hunksize) would compress better? Sectors are 2448 bytes, so something like --hunksize 244800 or --hunksize 2448000 might be worth trying (or rather the closest multiple of 2448 below 512K, which appears to be the maximum size according to the chd source code: that's 214 sectors, ie. --hunksize 523872).
The results will probably vary for different games, depending on whether they have repeating data across several sectors.
It might even work for compressing some movies (probably not so much for normal animated movies, but it could compress very well if there are any movies with still images).
The downside is that bigger compression blocks would increase random access seek times.
The current size is so small that one could pause the emulation and decompress the whole block at once (without affecting the emulation frame rate too much).
With larger block sizes, one would need to pause the decompression after each sector and resume emulation (or use some multi-threading on dual core cpus for that).
That should work smoothly for continuous reading (though it would be annoying to implement for all of the different methods: deflate, lzma, flac, etc.).
And it won't work too smoothly when seeking to different cdrom sectors... on the other hand, seeking is kinda slow on real cdrom drives, too. So it might be acceptable as long as random access isn't slower than "average" seek times on real hardware... I don't have any benchmarks for average PSX seek times to nearby (or far-away) sectors though.
That said, I am now near burn-out. The CHD stuff is getting more and more complicated... I hope I can sort out that mess and write up some kind of compact and legible CHD file format description.