Twilight Syndrome Tensaku-hen

Post by **imagoth2004** » May 5th, 2024, 12:55 pm

So, I think I'll be using this section to discuss what I started working on recently.

One of the games I came across when I was in Japan was the Twilight Syndrome series. This was before I was even aware of the reference in Danganronpa or spin-offs of the core series, so I decided to learn about the PS1 structure by sinking my teeth into reverse engineering it for translation.

I started off quite simply, using mkpsxiso to extract the contents of the disc. At first glance, most of it seemed pretty straightforward. However, while looking through file format documentation, I noticed there were a ton of .cdb files, which I assumed were generic constant databases. I had come across that format in the past, but I decided to set them aside and start with something quicker.

I have a background in video editing and audio engineering/music production, so when I saw the video files (.str), I figured I'd start there.

The files would play easily in PSMplay, and from there, it was pretty easy to toss them into Adobe Premiere and add subtitles to the raw video itself. The downside is that the newer versions of Premiere don't allow exporting 37.8 kHz audio. However, when I tested a simple extract/inject with a file saved at 44.1 kHz, the game ran fine, so I assumed MC32 (what I used to inject) could convert the file's audio. I also figured this wouldn't further compress the audio that was already in the file.

So I was set to go, right? I extracted the raw video with its interleaved audio (which had the spoken dialogue), added my translated subtitles, and put the files back in. I rebuilt the disc image and ran it on an emulator and on hardware. The intro splashes and intro videos played with my subtitles and looped as they should. It wasn't until I played past the prologue that the next translated video played. Everything was going fine, and then another video began to play at the end. "Uh... That's the final video of the game."

So I was left scratching my head: Why did it do that? The end of the prologue video caused the game to think there was more video to play, so it played the next file in the list, which was a teaser trailer for the next game. So I tossed the file into HxD to see what might have changed. Well, the header was completely different. I couldn't tell exactly what it did, but I noticed the formatting was far off from what it should be.

I spent a week trying to export the video from Premiere into various other formats before tossing them back into the game, but of course, the same thing kept happening. So, I scrapped that and tried different ways to return the video to the .str format. All were fruitless, and another week passed before I realized that when I was taking the interleaved export from Adobe into MC32, MC32 wasn't compressing the audio in the file correctly.

At least it didn't take more than an hour for me to rip the audio separately into a raw .wav at the native 37.8 kHz, make my edits, export video only from Adobe, and use MC32 to manually combine the video/audio data and toss it back into the .str. The problem was instantly solved, and the in-game videos worked flawlessly.

--------------

That leads me to the next part: those .cdb files or, more importantly, fonts.

In the game files is a file titled "FONT1.TIM" - It seemed straightforward until I opened it and noticed it was just a generic file, most likely left over from the production side as it was only English text, but it didn't even look like the English text sprinkled in the game.

However, there is a folder helpfully labeled Font with a, you guessed it, .cdb format file. So, I began my research. Some sites talk about the file types used with PS1 games but don't explain much. Also, when comparing the file header in HxD, it seemed to match the header info referenced for an Ace Combat game. However, most of the other information didn't match what I was looking at, so I took a step back and returned to what I knew about .cdb files in general: the Constant Database format by D.J. Bernstein, which I was already familiar with. I downloaded a few Python libraries to extract and read their contents, which leads me to today.

I tried to use it to extract the files based on what I saw in the Hex. The game uses multiple fonts, and my theory was solidified when I came across a much, much older post on Twitter by another individual who was working on translating this game as well, where they mentioned the same thing.

The problem is that the Python library doesn't seem to recognize the file. Maybe it was how I was doing it, or maybe my syntax was wrong somehow. But after spending a few weeks fixing the syntax errors and reading the others that emerged, I can only assume the file is compressed. So, I tossed it into No$psx to see if I could understand what I was trying to accomplish.
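For anyone wanting to rule the Bernstein format in or out before fighting with a library, here is a minimal sketch of a sanity check. It relies only on the documented layout of that format: a 2048-byte header of 256 little-endian (position, slot count) pairs that point at hash tables later in the file. The function name `looks_like_djb_cdb` is my own, and passing the check only means the header is plausible, not that the file is definitely a cdb.

```python
import struct

HEADER_SIZE = 2048  # 256 entries * 8 bytes in a D.J. Bernstein cdb


def looks_like_djb_cdb(data: bytes) -> bool:
    """Return True if the first 2048 bytes could be a Bernstein cdb header."""
    if len(data) < HEADER_SIZE:
        return False
    for i in range(256):
        # Each entry: 32-bit table position, 32-bit number of 8-byte slots.
        pos, slots = struct.unpack_from("<II", data, i * 8)
        # Tables live after the header and must fit inside the file.
        if pos < HEADER_SIZE or pos + slots * 8 > len(data):
            return False
    return True
```

Reading the file with `open(path, "rb").read()` and passing the bytes in is enough. A `False` result suggests the .cdb extension means something else for this game, rather than the file being a compressed constant database.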

Maybe it isn't in that file and is located somewhere else? So, I tossed the game into the emulator and tried to find the text using the VRAM viewer, leading to further confusion. I have no idea how to read this program. I understand how it adjusts the registers and moves values around, but when I try to view one of the English characters in the game, I see it's shown as a QuadTex. I'm inferring that the information in the separate window below the list of textures is the code sent to the GPU, but even then, I'm still unsure what it's accomplishing. So I've been trying to research QuadTex and to work my way back up the code to determine which file the font is located in, or is it coming from the kernel itself?

I'm a bit stuck and can't find much info on making sense of the debug emulator, but at least I'm getting a lot of reading in with the documentation (granted, with the drain bamage I have, I'll be re-reading these documents for eternity after). Does anyone have places they can point me to about the .cdb format or using No$psx, or has anyone maybe tried to mess around with this game before? I'd love to compare notes!

Post by **jype** » May 7th, 2024, 9:34 pm

Nice work adding subtitles to the STR.

Human Entertainment games are awesome, I'd like to see the Twilight Syndrome games translated one day.

I looked into the KFONT.CDB format and it actually has nothing to do with D.J. Bernstein's cdb format. The CDB extension might stand for Compressed Data Blob or something else, who knows.

KFONT.CDB begins with 8 pairs of 16-bit values, each pair being a CD sector offset and a sector size (both in 2kB units), that point to fonts within KFONT.CDB. The fonts are compressed with different types of encoding. In each encoded blob the first byte "len" determines the uncompressed length and encoding type: (len < 0x80) => uncompressed, (len < 0xc0) => run-length encoding, (len < 0xe0) => look-back, (len < 0xf0) => delta encoding. (len == 0xff) indicates end of data.
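A rough Python sketch of that layout, for illustration only (the actual tool is the C code linked below). It assumes the 16-bit values are little-endian and that each pair is stored offset first, then size; neither is stated above. The names `read_font_table` and `classify_len_byte` are hypothetical, and the sketch only classifies the "len" byte, it does not decompress anything.

```python
import struct

SECTOR_SIZE = 2048  # the "2kB units" mentioned above


def read_font_table(data: bytes, count: int = 8):
    """Return a list of (byte_offset, byte_size), one per font.

    Assumption: little-endian 16-bit values, offset before size.
    """
    entries = []
    for i in range(count):
        offset_sectors, size_sectors = struct.unpack_from("<HH", data, i * 4)
        entries.append((offset_sectors * SECTOR_SIZE,
                        size_sectors * SECTOR_SIZE))
    return entries


def classify_len_byte(length_byte: int) -> str:
    """Map the leading "len" byte to the encoding ranges described above."""
    if length_byte == 0xFF:
        return "end"
    if length_byte < 0x80:
        return "uncompressed"
    if length_byte < 0xC0:
        return "run-length"
    if length_byte < 0xE0:
        return "look-back"
    if length_byte < 0xF0:
        return "delta"
    return "unknown"  # 0xF0..0xFE is not covered by the description
```

Calling `read_font_table(open("KFONT.CDB", "rb").read())` would list where each font starts and how many bytes it spans, which is handy for checking the table against what a hex editor shows.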

Within each decompressed set of blobs the first blob is a CLUT with 256 entries for different text colors, and the remaining blobs are 4-bit 16x16 glyph textures.
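To make the glyph layout concrete, here is a small Python sketch that unpacks one decompressed 4-bit 16x16 glyph (128 bytes) into rows of palette indices. It assumes the usual PS1 4-bit texture packing, where the low nibble of each byte is the left pixel, and that the blob holds raw pixel data with no extra header; both are assumptions rather than something confirmed for this file. The function names are made up for the example.

```python
GLYPH_SIZE = 16
GLYPH_BYTES = GLYPH_SIZE * GLYPH_SIZE // 2  # 4 bits per pixel -> 128 bytes


def unpack_glyph_4bpp(blob: bytes):
    """Return 16 rows of 16 palette indices (0..15) from a 4bpp glyph."""
    if len(blob) < GLYPH_BYTES:
        raise ValueError("glyph blob is shorter than 128 bytes")
    bytes_per_row = GLYPH_SIZE // 2
    rows = []
    for y in range(GLYPH_SIZE):
        row = []
        for byte in blob[y * bytes_per_row:(y + 1) * bytes_per_row]:
            row.append(byte & 0x0F)  # assumed: low nibble is the left pixel
            row.append(byte >> 4)    # high nibble is the right pixel
        rows.append(row)
    return rows


def render_ascii(rows) -> str:
    """Draw non-zero palette indices as '#' for a quick visual check."""
    return "\n".join("".join("#" if p else "." for p in row) for row in rows)
```

Printing `render_ascii(unpack_glyph_4bpp(blob))` for a dumped blob is a quick way to see whether the nibble order is right: if the characters look mirrored in pairs of columns, swap the two `append` lines.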

Here's a quick and dirty tool to dump the blobs and create TIMs from KFONT.CDB. The output and converted BMPs are attached. The tool doesn't work with other CDB files; they're somehow different, but I haven't looked further into it.
https://gitlab.com/jype/twilight_syndrome_tools

A lot of the glyphs are duplicated between fonts but often with different indices. If the glyphs need to be mapped to kanji, the best way would be to compare blob checksums to produce a mapping table ("sha1sum *.bin | sort" for a quick comparison).
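The same checksum comparison can be sketched in Python if a script is handier than the shell one-liner. This groups dumped blob files by SHA-1 and keeps only the groups with more than one file, so each group is one glyph that appears under several indices. The function name `group_by_sha1` and any file layout you pass in are illustrative, not what the dump tool necessarily produces.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def group_by_sha1(paths):
    """Map SHA-1 hex digest -> sorted list of files with identical content.

    Only digests shared by two or more files are returned.
    """
    groups = defaultdict(list)
    for path in paths:
        digest = hashlib.sha1(Path(path).read_bytes()).hexdigest()
        groups[digest].append(str(path))
    return {digest: sorted(names)
            for digest, names in groups.items() if len(names) > 1}
```

For example, `group_by_sha1(Path(".").glob("**/*.bin"))` run from the dump directory would list every glyph shared between fonts, which is the raw material for a mapping table.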

Kanji fonts are typically drawn with textured quads, either one glyph or one line of text at a time. GPU packet and VRAM dumps are useful for figuring out how the primitives are drawn, but not for figuring out the source data format unless the textures are copied directly from the source as is.

I highly recommend using Ghidra to make sense of the code. I'm using version 10.4 with ghidra_psx_ldr. The CDB decompression code in dump_cdb.c was taken straight from Ghidra code analysis of CAP0.EXE. I can share my findings later once I figure out the best way to export stuff from Ghidra.