I've reverse-engineered most of the .SYM file chunks. The five chunk numbers from the ASM68K version also match for the PSX version (but with slightly different parameter values). But there are a lot more than five chunks... lameguy's sample uses 12 different chunks, and the tomb5.zip package contains yet four more, so, as of now, there are 16 known chunks. The whole .sym file consists of an 8-byte file header, which is then followed by the various chunks.
PsyQ .SYM Files
Fileheader
00h 4 File ID ("MND",01h)
04h 4 Whatever (0,0,0,0) ;TOMB5: 0,02h,0,0
Chunk 01h: Symbol (Immediate?) (eg. memsize, or membase)
Chunk 02h: Symbol (Function Address for External Function?)
Chunk 05h: Symbol (?)
Chunk 06h: Symbol (?)
00h 4 Address/Value
04h 1 Chunk ID (01h/02h/05h/06h)
05h 1 Symbol Length (LEN)
06h LEN Symbol (eg. "VSync")
Chunk 80h: Source Code Line Numbers: Address for 1 Line
00h 4 Address (for current line)
04h 1 Chunk ID (80h)
Chunk 82h: Source Code Line Numbers: Address for N Lines
00h 4 Address (for N lines, starting at current line)
04h 1 Chunk ID (82h)
05h 1 Number of Lines (00h=None, or 02h and up?)
Chunk 84h: Source Code Line Numbers: Address for N Lines (16bit?)
00h 4 Address (for N lines, starting at current line)
04h 1 Chunk ID (84h)
05h 2 Number of Lines (?)
Chunk 86h: Source Code Line Numbers: Address for Line (32bit???)
00h 4 Address (for N lines, starting at current line)
04h 1 Chunk ID (86h)
05h 4 Absolute Line Number (rather than number of lines) (?)
Chunk 88h: Source Code Line Numbers: Start with Filename
00h 4 Address (start address)
04h 1 Chunk ID (88h=Filename)
05h 4 First Line Number (after comments/definitions) (32bit?)
09h 1 Filename Length (LEN)
0Ah LEN Filename (eg. "C:\path\main.c")
Chunk 8Ah: Source Code Line Numbers: End of Source Code
00h 4 Address (end address)
04h 1 Chunk ID (8Ah)
Chunk 8Ch: Internal Function: Start with Filename
00h 4 Address
04h 1 Chunk ID (8Ch)
05h 4 Whatever (1Eh,00h,20h,00h) ;or 1Eh,00h,18h,00h
09h 4 Whatever (00h,00h,1Fh,00h)
0Dh 4 Whatever (00h,00h,00h,C0h)
11h 4 Whatever (FCh,FFh,FFh,FFh) ;mask? neg.offset?
15h 4 Whatever (10h,00h,00h,00h) <-- line number (32bit?)
19h 1 Filename Length (LEN1)
1Ah LEN1 Filename (eg. "C:\path\main.c")
xxh 1 Symbol Length (LEN2)
xxh LEN2 Symbol (eg. "VSync")
Chunk 8Eh: Internal Function: End of Function (end of chunk 8Ch)
00h 4 Address
04h 1 Chunk ID (8Eh)
05h 4 Line Number <-- line number (32bit?)
Chunk 90h: Internal Function:Whatever90h... first instruction in main func?
Chunk 92h: Internal Function:Whatever92h... last instruction in main func?
Maybe line numbers? Or end of definitions for incoming parameters?
00h 4 Address
04h 1 Chunk ID (90h/92h)
05h 4 Whatever (1Fh,00h,00h,00h) <-- line number relative to main.start?
Chunk 94h: Type/Symbol (Simple Types?)
00h 4 Offset (when used within a structure, or stack-N, or otherwise zero)
04h 1 Chunk ID (94h)
05h 2 Class (000Dh=Type.alias, 000Ah=Address, 0001h=Stack, 0002h=Addr)
07h 2 Type (XX = 8bit,16bit,signed,etc.?)
09h 4 Zero, or Size in Bytes (for "memblocks")
0xh 1 Symbol Name Length (LEN)
0xh LEN Symbol Name (eg. "size_t")
Chunk 96h: Type/Symbol (Complex Structures/Arrays?)
00h 4 Offset (when used within a structure, otherwise zero)
04h 1 Chunk ID (96h)
05h 2 Class (02h=Array,08h=RefToStruct,0Dh=DefineAlias,66h=StructEnd)
07h 2 Type (0xh=Small, 3xh=WithArrayStuff?) (same/similar as in chunk 94h)
09h 4 Struct Size in Bytes
0Dh 2 Array Dimensions (DIM) (0=none) ;eg. [3][4] --> 0002h
0Fh DIM*4 Array Entries per Dimension ;eg. [3][4] --> 00000003h,00000004h
xxh 1 Internal Fake Name Length (LEN1) (0=none)
xxh LEN1 Internal Fake Name (eg. ".1fake")
xxh 1 Symbol Name Length (LEN2)
xxh LEN2 Symbol Name (eg. "r")
Class definition (in chunk 94h) (and somewhat same/similar in chunk 96h)
(looks same/similar as C_xxx class values in COFF files!)
0001h = Local variable (with Offset = negative stack offset)
0002h = Global variable or Function (with Offset = address)
0008h = Item in Structure (with Offset = offset within struct)
0009h = Incoming Function param (with Offset = index; 0,4,8,etc.)
000Ah = Type address / struc start? (with Offset = zero)
000Dh = Type alias (with Offset = zero)
Type definition (in chunk 94h/96h)
(maybe lower 4bit=type, and next 4bit=usage/variant?)
(looks same/similar as T_xxx type values in COFF files!)
0000h =
0001h =
0002h =
0003h = (16bit signed?)
0004h = int (32bit signed?)
0005h =
0006h =
0007h =
0008h = (address) (32bit unsigned?) (with Definition=000Ah)
0009h =
000Ah =
000Bh =
000Ch = u_char (8bit unsigned?)
000Dh = u_short,ushort (16bit unsigned?)
000Eh = u_int (32bit unsigned?)
000Fh = u_long (64bit unsigned?) (or rather SAME as above?)
0021h = function with 0 params, and/or return="nothing"?
0024h = main function with 2 params, and/or return="int"?
0052h = argv (string maybe?)
0038h = GsOT (huh?)
00F8h = GsOT_TAG (huh?)
00FCh = PACKET (huh?)
?? = float,bool,string,ptr,packet,(un-)signed8/16/32/64bit,etc
?? = custom struct
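As an illustration of the variable-length layout of chunk 96h listed above, here is a minimal Python sketch that reads one such record. It assumes little-endian fields and single-byte name lengths as in the table; the function names `parse_chunk_96` and `read_pstring` and the returned dictionary keys are made up for this example and are not taken from any existing tool.

```python
import struct

def read_pstring(data, pos):
    """Read a length-prefixed string (1 length byte, then LEN bytes)."""
    length = data[pos]
    text = data[pos + 1:pos + 1 + length].decode("latin-1")
    return text, pos + 1 + length

def parse_chunk_96(data, off=0):
    """Parse one chunk 96h record at `off`; returns (fields, next_offset)."""
    # 00h offset(4), 04h chunk ID(1), 05h class(2), 07h type(2),
    # 09h size(4), 0Dh array dimensions(2) -> 15 bytes, assumed little-endian
    offset, chunk_id, cls, typ, size, dim = struct.unpack_from("<IBHHIH", data, off)
    if chunk_id != 0x96:
        raise ValueError("not a chunk 96h record")
    pos = off + 0x0F
    # 0Fh: one 32-bit entry count per dimension
    dims = list(struct.unpack_from("<%dI" % dim, data, pos))
    pos += 4 * dim
    fake_name, pos = read_pstring(data, pos)   # e.g. ".1fake", may be empty
    name, pos = read_pstring(data, pos)        # e.g. "r"
    fields = {"offset": offset, "class": cls, "type": typ, "size": size,
              "dims": dims, "fake_name": fake_name, "name": name}
    return fields, pos
```

The returned `next_offset` is what a chunk walker would need, since the record length depends on DIM, LEN1 and LEN2.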
What is that stohrendorf project for Tomb Raider and other games? I am double-confused...
Do you mean that some/all (?) PSX games were shipped with .SYM files on the retail discs?
Or did you find a tomb raider .SYM file from some non-retail prototype version?
Looking at the stohrendorf/symdump source code...
The Block.cs file seems to be somehow dealing with chunk ID numbers, but it's masking them to 7bit values, and then deals with them in decimal. For example, value "20" (aka 14h) seems to refer to chunk 94h in 8bit notation.
That said, I can't really say that I understand what the stohrendorf source code is supposed to do. The chunk numbers also seem to be processed in various files other than Blocks.cs, but I haven't yet found if or where it's doing the "real" stuff (like processing the parameters that belong to those chunks).
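The 7bit/8bit relation mentioned above is just a mask on bit 7. The following lines only illustrate that arithmetic; the helper names are made up and this is not how symdump itself is written.

```python
def to_7bit(chunk_id):
    """Drop bit 7 of an 8bit chunk ID (e.g. 94h -> 14h = 20 decimal)."""
    return chunk_id & 0x7F

def to_8bit(masked):
    """Restore bit 7, assuming the original ID was in the 80h..FFh range."""
    return masked | 0x80

print(to_7bit(0x94))        # 20
print(hex(to_8bit(20)))     # 0x94
```

Note that the symbol chunks (01h/02h/05h/06h) have bit 7 clear anyway, so the reverse mapping is only meaningful for the 80h and up chunks.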
*** SYM FILE CONTENTS ***
The stuff in the chunks consists of three main parts:
- Source code: filenames/line numbers
- Type Info: variables/structures/arrays
- Symbol: immediates/function labels
**** SYMBOLS ****
That part looks fairly simple, just assigning label strings to addresses (and immediates).
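A minimal sketch of reading those symbol chunks, assuming little-endian values and the 8-byte file header described above. It only understands chunks 01h/02h/05h/06h and stops at the first chunk of any other kind; the function name `read_leading_symbols` is made up for this example.

```python
import struct

SYMBOL_CHUNK_IDS = {0x01, 0x02, 0x05, 0x06}

def read_leading_symbols(data):
    """Check the file ID, then collect symbol chunks until another chunk type shows up."""
    if data[:4] != b"MND\x01":
        raise ValueError("not a PsyQ .SYM file")
    symbols = {}
    pos = 8  # skip the 8-byte file header
    while pos + 6 <= len(data):
        # 00h value(4), 04h chunk ID(1), 05h symbol length(1)
        value, chunk_id, length = struct.unpack_from("<IBB", data, pos)
        if chunk_id not in SYMBOL_CHUNK_IDS:
            break  # other chunk types need their own layouts
        name = data[pos + 6:pos + 6 + length].decode("latin-1")
        symbols[name] = value
        pos += 6 + length
    return symbols, pos
```

A real loader would continue with the other chunk types from the returned position instead of stopping there.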
**** TYPE INFO ****
I am not really planning to support that in no$psx. Anyways, this stuff allows defining structures, and assigning those structures (or predefined basic types) to variables.
The "Class/Type" values seem to be same as in COFF files (apart from that, the "chunk-based" .SYM files are quite different than "section-based" COFF files).
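If the Type field really follows the COFF convention, it would split into a 4bit base type plus 2bit "derived type" fields above it (1=pointer, 2=function, 3=array). The sketch below decodes a value under that assumption; the base type names are the standard COFF T_xxx ones and have not been verified against .SYM files, and the function name `describe_type` is made up.

```python
# Standard COFF base types (T_NULL..T_ULONG); assumed, not verified for .SYM
BASE_TYPES = ["null", "void", "char", "short", "int", "long", "float", "double",
              "struct", "union", "enum", "member of enum",
              "unsigned char", "unsigned short", "unsigned int", "unsigned long"]

# Standard COFF derived types (DT_PTR, DT_FCN, DT_ARY)
DERIVED = {1: "pointer to", 2: "function returning", 3: "array of"}

def describe_type(type_value):
    """Decode a 16bit Type value the way COFF does (outermost derived type first)."""
    base = BASE_TYPES[type_value & 0x0F]
    words = []
    rest = type_value >> 4
    while rest:
        words.append(DERIVED.get(rest & 3, "?"))
        rest >>= 2
    return " ".join(words + [base])

for value in (0x0021, 0x0024, 0x0052, 0x0038, 0x00F8, 0x00FC):
    print("%04Xh = %s" % (value, describe_type(value)))
```

Under this reading, 0021h and 0024h come out as functions returning void and int, 0052h as a pointer to a pointer to char (which would fit argv), and 0038h/00F8h/00FCh as one- or two-dimensional arrays of struct or unsigned char.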
Anyways, it would be nice to verify if the Type values are really the same as in COFF. Could somebody make a .SYM file that defines variables with all possible basic types? I.e. boolean, strings, and signed & unsigned 8bit/16bit/32bit values, pointers, and 64bit values and float values (if that's supported), etc?
Best would be if the names of the variables reflect the type (eg. "my_int = int"). Plus some explanation of what it's doing (for example, "int"... I guess... that means a "signed 32bit" value... or is that wrong?)
And, also make a structure, where one of the structure elements refers to another structure!
**** LINE NUMBERS ****
Decoding the line numbers works quite well for lameguy's sample. But for tomb5, I am ending up with DoGameflow in Line=1771 but going by the
https://github.com/Gh0stBlade/TOMB5/blo ... GAMEFLOW.C file, it should be around Line=93. Any ideas what's going wrong there?
Is the .sym file based on the GAMEFLOW.C source code? Is it an older prototype .sym that doesn't really belong to the decompiled source code? Or would it be possible that leading include files "insert" extra lines?
Also, I am not sure how the more complex line number cases are supposed to work, for example, this snippet from within Tomb5\GAME\CONTROL.C (the line numbers seem to suffer the same problem as mentioned above) (CONTROL.C has only 943 lines):
Src.Line32 Addr=0001E38C, Line=451 ;set absolute line
Src.Line8 Addr=0001E3A4, Line=452..456 ;assign several lines (or assign one line, and SKIP the other lines?)
Src.Line Addr=0001E3B8, Line=457 ;assign one line (and increment to next line)
Src.Line16 Addr=0001E3B8, Line=458..1113 ;assign several lines (or assign one line, and SKIP the other lines?)
Src.Line Addr=0001E3E4, Line=1114 ;assign one line (and increment to next line)
Src.Line16 Addr=0001E3E4, Line=1115..1377 ;assign several lines (or assign one line, and SKIP the other lines?)
Src.Line8 Addr=0001E3F4, Line=1378..1379 ;assign several lines (or assign one line, and SKIP the other lines?)
Src.Line32 Addr=0001E3F8, Line=1377 ;set absolute line
Src.Line8 Addr=0001E400, Line=1378..1384 ;assign several lines (or assign one line, and SKIP the other lines?)
Src.Line32 Addr=0001E410, Line=1377 ;set absolute line
That's somehow assigning multiple lines to address 0001E3F4.
And the other way around, line 1378 is assigned to multiple addresses.
I guess such things might happen if a HLL read-modify-write operation gets scattered across several ASM opcodes (with the opcodes being ordered non-contiguously for load-delay and branch-delay reasons).
But I am not sure how that stuff could/should be displayed in a debugger window. A screenshot showing how the code near 0001E3F4 is getting displayed in PsyQ debugger would be nice.
Btw. people will probably hate that... but I'd like to keep displaying ASM code in no$psx, and have it displayed "mixed" with HLL code (ie. HLL source code lines, followed by the corresponding disassembled ASM opcodes) (however that would work out when the ASM opcodes are ordered differently than the HLL code lines).
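For reference, the CONTROL.C trace above can be reproduced with a simple running line counter. This sketch takes already-decoded records (chunk ID, address, argument) and uses the "assign several lines" interpretation, which is only one of the two readings mentioned in the trace comments. The function name `expand_line_records` is made up, and the line counts in the usage note are inferred from the displayed ranges.

```python
def expand_line_records(records, first_line=1):
    """Turn (chunk_id, addr, arg) records into (addr, first_line, last_line) tuples."""
    cur = first_line            # would normally come from chunk 88h
    out = []
    for chunk_id, addr, arg in records:
        if chunk_id == 0x86:            # absolute line number (32bit)
            cur = arg
            count = 1
        elif chunk_id in (0x82, 0x84):  # N lines (8bit or 16bit count)
            count = arg
        elif chunk_id == 0x80:          # exactly one line
            count = 1
        else:
            raise ValueError("not a line number chunk: %02Xh" % chunk_id)
        if count > 0:
            out.append((addr, cur, cur + count - 1))
        cur += count                    # continue after the assigned lines
    return out
```

Feeding it `(0x86, 0x1E38C, 451)`, `(0x82, 0x1E3A4, 5)`, `(0x80, 0x1E3B8, None)`, `(0x84, 0x1E3B8, 656)` yields lines 451, 452..456, 457 and 458..1113, same as the first four entries of the trace.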