Playstation CPU reversing [56k Warning]

Post by **Akari** » March 20th, 2014, 11:40 pm

I created this topic to keep this community updated with recent findings in PSX CPU reversing. A lot of other news can be found on http://psxdev.ru/ and in this topic http://forum.emu-russia.net/viewtopic.php?f=13&t=4106 (in Russian).

It has already been three and a half months since the CPU reversing began. It started from the top left corner, where the MDEC logic resides.

There are 8 big TIF files with a lot of layers where I place logic cells. For now almost all of the IDCT conversion is complete. Now I am moving down towards unit03 and unit04, which contain the Scale Table Matrix, and towards unit05, where the RLE decoded data is stored.

Logic of IDCT multiplication looks like this:

Here we can find random logic that controls IDCT. There are 3 counters there and a lot of related logic.

The latest finding is that the unit03 output goes to unit00 (13-bit scale table matrix for IDCT) and to unit04. I don't know why the data should go there. It doesn't seem that a selectable matrix would be used anywhere outside of IDCT.

By the way this is the latest progress =)

Post by **nocash** » March 21st, 2014, 11:27 am

Good that you are posting here, too. The PSX decapping project is just great! Aside from the Russian pages, maybe you should also mention that the project also has an English forum, http://board.psxdev.ru/ (I don't think it's competing with the psxdev.net forum; I am just glad that you have some English page at all). An English version of the wiki http://wiki.psxdev.ru/index.php/ would also be great :-)

Brief summary: The above IDCT, RLE, ScaleTable stuff all belongs to the MDEC picture/movie decompression unit located in upper-left of the PSX Main CPU chip.
Theoretically, the IDCT multiplication circuit can be already simulated via logisim http://ozark.hendrix.edu/~burch/logisim/ so one could reproduce the exact fractional roundings for the MDEC calculations.

Akari, did you ever get that working? Feeding two factors to the circuit, and receiving some intact looking multiplication result from the outputs? I couldn't test it myself since logisim requires java. Would be very interesting to know if it works, and to know how many fractional bits it's stripping from the result!

Post by **Akari** » March 21st, 2014, 4:12 pm

nocash wrote:Akari, did you ever get that working? Feeding two factors to the circuit, and receiving some intact looking multiplication result from the outputs? I couldn't test it myself since logisim requires java. Would be very interesting to know if it works, and to know how many fractional bits it's stripping from the result!

No. The Logisim circuit created by Org produces some strange results. I don't want to spend too much time debugging the circuit now; I want to reproduce the whole picture of the MDEC first. If someone can take this part and do it, it will be a great speed-up for the project.

Post by **Akari** » March 22nd, 2014, 1:08 am

Just finished the outputs from unit04. Now I doubt that this is the Scale Table Matrix, because the data is handled in a very strange way.

The outputs go to a set of MUXes, ANDs, NORs, inverters and triggers (flip-flops). Data goes to the first set of MUXes as a pair of 8 bits. The triggers receive 10 bits, the inverters get the next 6 bits. NOR handles the first 9 bits while AND handles the next 6 bits.

Any ideas? =)

All cells this time are in part 5.

Post by **Akari** » March 23rd, 2014, 5:56 am

NOR handles the first 9 bits while AND handles the next 6 bits.

And all of them and the last bit are NANDed. That gives us detection of the value 0xFE00. So it seems to be RLE.

Post by **nocash** » March 23rd, 2014, 11:51 am

Lower 9 bits all zero and upper 6+1 bits all ones for the special FE00h RLE value sounds perfect.

And the case where the 16 bits are split into 10bit + 6bit should be the actual RLE data/length stuff. In this case, the 6 bits should be in the upper bits, and the 10 bits in the lower bits, right?

That RLE stuff should receive compressed data from the MDEC DMA IN FIFO array, and output decompressed data to an 8x8 pixel array.
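
To make the bit layout discussed above concrete, here is a minimal Python sketch. It assumes the reading given in this thread: a NOR over the lower 9 bits, an AND over the next 6 bits, plus the top bit, combined to detect FE00h, and a 6-bit field in the upper bits with a 10-bit field in the lower bits for ordinary words. The function names are made up for illustration, and the fields are returned raw, without any sign handling.

```python
def is_fe00(word):
    """Mimic the gate-level detection described in the thread (hypothetical helper)."""
    low9_all_zero = (word & 0x01FF) == 0       # NOR over bits 0..8
    mid6_all_one = (word & 0x7E00) == 0x7E00   # AND over bits 9..14
    top_bit = (word >> 15) & 1                 # the "last bit"
    return bool(low9_all_zero and mid6_all_one and top_bit)


def split_rle_word(word):
    """Split a 16-bit word into (upper 6 bits, lower 10 bits), both raw/unsigned."""
    upper6 = (word >> 10) & 0x3F
    lower10 = word & 0x3FF
    return upper6, lower10


if __name__ == "__main__":
    print(is_fe00(0xFE00), is_fe00(0xFE01))
    print(split_rle_word(0x0C05))
```

Only 0xFE00 passes all three conditions at once, which is why this gate combination works as a detector for that single value.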

Post by **Akari** » March 27th, 2014, 12:01 am

Scheme reordering is over. Now the data inputs (unit 03) and RLE (units 04 and 05) are separated. All RLE-related schemes are merged together.

The newest RLE progress is a few outputs from unit 04 (a row of NOTs and a row of MUXes) and the outputs from unit 05. Data from the top of the picture goes directly to the IDCT multiplication. Now the space between these parts is the only thing that stops us from understanding the RLE algorithm.

All these cells are placed in part 5.

Post by **nocash** » March 27th, 2014, 3:15 am

Fine.

What size is Unit 04?

And are you sure that it's used only for RLE encoded data, or could it be a general purpose FIFO for incoming MDEC data?
It might be also used for other incoming data (ie. from scaletable & quant table commands; though it's also possible that those commands are routing the data directly to their target, without going through a FIFO).

Post by **Akari** » March 27th, 2014, 5:33 am

nocash wrote:Fine.

What size is Unit 04?

And are you sure that it's used only for RLE encoded data, or could it be a general purpose FIFO for incoming MDEC data?
It might be also used for other incoming data (ie. from scaletable & quant table commands; though it's also possible that those commands are routing the data directly to their target, without going through a FIFO).

Unit 04 is a 16+16x64 bit buffer. And I am almost sure that this is the RLE buffer; not 100% sure, but it looks like it.

Unit 03 seems to be a general purpose FIFO for incoming MDEC data. It has output lines for the scale table matrix (unit 00) and for RLE data (unit 04). Some lines also go somewhere far, far away.

I have a small question: do the scale table, quant table and RLE data always come through DMA, or can they all be passed as command parameters? Can you write a small example in asm for me, so I can better understand those things?

Post by **nocash** » March 27th, 2014, 7:40 am

Okay, then Unit 03 seems to be the FIFO for incoming data - which should be 32 x 32bit in size.

Theoretically the FIFO output could be passed straight to the RLE decompression hardware - not quite sure why it's stored in Unit 04 before passing it to RLE decompression hardware. But well, maybe there was some reason for doing that.

And, of course, there should be array(s) for decompressed data (separate ones for Cr, Cb, and at least one for Y). Did you already spot those, too?

Yes, all MDEC data can be sent/received also without DMA, see here http://nocash.emubase.de/psx-spx.htm#mdecioports for details. For Non-DMA transfers, I have this asm code:

;use r28=1F800000h for below stuff
;------------------
nondma_mdec_init:
 mov  r1,80000000h              ;#mdec reset and disable dma
 mov  [r28+1824h],r1            ;/
 ret
;------------------
nondma_mdec_xmit:   ;in: r3=src(cmd+data), r4=src.len, r5=dst, r6=dst.len
 mov  r1,[r3]  ;src             ;#
 add  r3,4     ;src             ; send command (mdec.in)
 mov  [r28+1820h],r1            ;/
@@xmit_lop:
@@xmit_in_wait:                 ;#
  @@try_rx_lop:                 ;  ;#
   mov  r1,[r28+1824h]          ;  ;
   mov  r2,1 shl 31 ;dta.out.empty ;
   and  r2,r1   ;1=empty        ;  ;
   jnz  r2,@@skip_rx            ;  ;
   mov  r1,[r28+1820h]          ;  ; recv data (if any) (mdec.out)
   add  r6,4    ;dst.len        ;  ;
   mov  [r5],r1 ;dst            ;  ;
   add  r5,4    ;dst            ;  ;
   jmp  @@try_rx_lop            ;  ;
  @@skip_rx:                    ;  ;/
 mov  r1,[r28+1824h]            ;
 mov  r2,1 shl 30 ;dta.in.full  ; wait if mdec.in is full
 and  r2,r1   ;1=full           ; (and alongsides: recv data, if any)
 jnz  r2,@@xmit_in_wait         ;/
 mov  r1,[r3]  ;src             ;#
 add  r3,4     ;src             ; send data (mdec.in)
 mov  [r28+1820h],r1            ;
 sub  r4,4     ;len             ;/
 jnz  r4,@@xmit_lop
 ret
;------------------
EDIT: The forum editor treats backslashes as "delete following linebreak"? 
      I've replaced them by "#" to avoid that effect.
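
For readers who don't know this asm syntax, the control flow of nondma_mdec_xmit can be restated roughly as the Python sketch below. It is only a paraphrase under assumptions: read32 and write32 are hypothetical callbacks standing in for memory-mapped I/O, the addresses are r28 + 1820h and r28 + 1824h with r28 = 1F800000h as in the listing, and the status bits are the ones named in its comments (bit 31 = data-out empty, bit 30 = data-in full).

```python
MDEC_DATA = 0x1F801820         # r28 + 1820h in the listing above
MDEC_STATUS = 0x1F801824       # r28 + 1824h in the listing above
STATUS_OUT_EMPTY = 1 << 31     # "dta.out.empty" in the asm comments
STATUS_IN_FULL = 1 << 30       # "dta.in.full" in the asm comments


def nondma_mdec_xmit(read32, write32, words):
    """Send words[0] as the command, then the remaining words as data.

    read32/write32 are hypothetical I/O callbacks (not a real API).
    Returns the list of words received while sending.
    """
    received = []
    write32(MDEC_DATA, words[0])               # send command
    for word in words[1:]:
        while True:
            # Drain any output that is already available.
            while not (read32(MDEC_STATUS) & STATUS_OUT_EMPTY):
                received.append(read32(MDEC_DATA))
            # Only proceed once the input side has room.
            if not (read32(MDEC_STATUS) & STATUS_IN_FULL):
                break
        write32(MDEC_DATA, word)               # send data
    return received
```

Like the asm, this sketch does not drain the output once more after the last data word has been sent.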

Post by **Akari** » March 29th, 2014, 12:03 am

nocash wrote:Okay, then Unit 03 seems to be the FIFO for incoming data - which should be 32 x 32bit in size.

Theoretically the FIFO output could be passed straight to the RLE decompression hardware - not quite sure why it's stored in Unit 04 before passing it to RLE decompression hardware. But well, maybe there was some reason for doing that.

And, of course, there should be array(s) for decompressed data (separate ones for Cr, Cb, and at least one for Y). Did you already spot those, too?

Unit 03 is a 32x32+16 bit buffer. We don't know yet how these 32+16 rows are used.

Unit 04 is part of the RLE decompression hardware. It seems that unit 03 does not wait until RLE or some other block finishes its work. It just sends data to the appropriate part and goes on to the next task, while other sets of units and cells start decompressing, multiplying and other specific things.

I don't think there are special units for Cr, Cb, Y and others. There is just no place for them among the unknown units. We'll understand more after I finish RLE and go for the IDCT result.

By the way - NEW PROGRESS. Not that much logic though )
The bottom part is now connected to the top part. Now it's obvious that the outputs from unit 04 go to unit 05.
At the bottom part you can see delay rows of buffers and triggers. It seems they delay data until other parts of RLE decide what to do with it.

Post by **nocash** » March 29th, 2014, 6:03 am

Akari wrote:I don't think there are special units for Cr, Cb, Y and others. There is just no place for them among the unknown units.

Yes, RLE and IDCT should be same/shared for Cr, Cb, Y. I just meant that IDCT results for Cr and Cb must be memorized somewhere (in two 8x8 arrays), so the YUV-to-RGB stage can use that memorized data for merging it with Y values.

Post by **Akari** » April 1st, 2014, 1:46 am

I finished the basic data lines and started tracing the RLE calculation logic. It looks like a lot of logic elements merged into a giant spiderweb.

I don't understand the pattern yet.

Post by **Akari** » April 5th, 2014, 11:28 pm

I spent the whole week on the RLE calculations I met.
This scheme just adds one 11-bit value to another 11-bit value and optionally adds one bit. The first value is shifted by 1. The one additional bit is shifted too.
Value 1 = A >> 1
Value 2 = B
Carry = C >> 1
So the least significant bit of value 2 just goes to the result without modification.

But it works very strangely. In some cases the sum just doesn't work correctly. I rechecked all the elements 100 times but the result is still the same: when a carry from result bit 4 to result bit 5, or from result bit 8 to result bit 9, should occur, the carry is performed only if it is caused by overflow of bit 4 or 8 from both input values, or if all lesser bits (0-3 or 0-7) are filled with 1. In all other cases the carry doesn't work.
Examples:

0x07 >> 1 + 0x1F = 0x2D
0x08 >> 1 + 0x10 = 0x20
but
0x06 >> 1 + 0x1F = 0x0B

Very strange behavior, and it looks like an error. BUT if this sum is used just in some special cases then this error may never occur. Let's look where it is used =)

I move on.

Post by **Akari** » April 9th, 2014, 8:28 pm

The riddle of the previous scheme is solved! It was an ordinary inverted adder. It adds two numbers, but both numbers and the result are inverted. That's why the creators of the chip used custom logic instead of the usual full adders (those are used in the IDCT multiplication).
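
As a rough illustration of why an inverted adder looks broken when its wires are read with the wrong polarity, here is a small Python sketch. It only models the general idea (inverted inputs, inverted output). It does not model the shift or the optional carry bit of the traced circuit, and it is not meant to reproduce the example values from the earlier post. The function name and the default width of 11 bits are assumptions for illustration.

```python
def inverted_adder(a_n, b_n, width=11):
    """Adder whose inputs and output are all bitwise inverted (hypothetical model).

    a_n and b_n are the values seen on the wires, i.e. the inverted operands.
    The return value is the inverted sum, truncated to `width` bits.
    """
    mask = (1 << width) - 1
    a = ~a_n & mask            # recover the true operands
    b = ~b_n & mask
    return ~(a + b) & mask     # the result wire carries the inverted sum


if __name__ == "__main__":
    mask = (1 << 11) - 1
    a, b = 3, 0x1F
    wires_out = inverted_adder(~a & mask, ~b & mask)
    print(hex(~wires_out & mask))       # correct polarity: the true sum a + b
    print(hex(inverted_adder(a, b)))    # wrong polarity: off by one, a + b + 1
```

Read with the correct polarity the circuit is a plain adder. Read as if the wires carried true values, the same gates appear to compute x + y + 1 (modulo the width), so hand-checked sums will not match.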

By the way, I moved forward and traced a bit of the RLE control logic. The logic is tightly merged with the data. You can see a few trigger chains which pass a value down with each clock signal (two trigger chains in the upper left corner and one to the left of UNIT 4).

Post by **Akari** » April 19th, 2014, 4:51 pm

10 days and 200 cells later I have almost finished a new big calculation block. It is 3 times bigger than the previous one and works with inverted values too. I still don't know the calculation result (I think it will be a sum as well). The next goal is to finish all inputs and outputs of this scheme and simulate it in Logisim. You can see this scheme in the left part of the image. Blue squares are cells where not all input and output wires are traced.

This scheme is so big that some cells went into part 8 of the CPU.

Soon RLE will be completed :yahoo:

Post by **Akari** » April 24th, 2014, 12:33 am

The new scheme was successfully simulated. This is an inverted sum again, but this time for 16 bits.
The inputs are the new value and the previous sum result right-shifted by 2.

How this is used we will see a bit later, when I reverse the control logic.

Post by **Akari** » April 29th, 2014, 12:53 am

The main RLE datapath is finished. Now all data goes from the beginning (unit04) to the end (unit05). Only the various controls are left until all of RLE is reversed.

The last scheme does clamping from -0x7FF to +0x7FF (the result is signed 12-bit) and one more strange operation. It rounds the result toward zero to the nearest even number (except -1).
4 = 4
3 = 2
2 = 2
1 = 0
0 = 0
-1 = -1
-2 = -2
-3 = -2
-4 = -4
-5 = -4
If the scheme works in inverted logic then the result is a bit stranger )
It rounds the result toward zero to the nearest odd number (except 0)
4 = 3
3 = 3
2 = 1
1 = 1
0 = 0
-1 = -1
-2 = -1
-3 = -3
-4 = -3
-5 = -5
I don't know what it is.
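
The two tables above can be written down as a short Python sketch. The function names are made up for illustration, and this only restates the observed input/output behaviour; it says nothing about which polarity the hardware really uses. One thing the sketch makes visible: the second table is exactly what the first rule gives when both its input and its output are bitwise inverted.

```python
def clamp12(v):
    """Clamp to the range -0x7FF..+0x7FF mentioned in the post."""
    return max(-0x7FF, min(0x7FF, v))


def round_even_toward_zero(v):
    """First table: round toward zero to an even number, except -1 stays -1."""
    if v == -1:
        return -1
    return v & ~1 if v >= 0 else -((-v) & ~1)


def round_odd_toward_zero(v):
    """Second table: round toward zero to an odd number, except 0 stays 0."""
    if v == 0:
        return 0
    return (v - 1) | 1 if v > 0 else -((-v - 1) | 1)


if __name__ == "__main__":
    for v in (4, 3, 2, 1, 0, -1, -2, -3, -4, -5):
        print(v, round_even_toward_zero(v), round_odd_toward_zero(v))
    # The "inverted logic" reading: invert input, apply the even rule, invert output.
    print(all(round_odd_toward_zero(x) == ~round_even_toward_zero(~x)
              for x in range(-100, 101)))
```

So the odd-looking exceptions (-1 in one table, 0 in the other) are the same exception seen through an inversion, since bitwise NOT maps -1 to 0.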

Post by **nocash** » April 29th, 2014, 1:10 am

Cool!

Akari wrote:The last scheme does clamping from -0x7FF to +0x7FF (the result is signed 12-bit) and one more strange operation. It rounds the result toward zero to the nearest even number (except -1).

Whoops that's odd. And that might be why my software tests led me to think that RLE output would be only 11bit wide, not 12bit.

Akari wrote:If the scheme works in inverted logic then the result is a bit stranger )
It rounds the result toward zero to the nearest odd number (except 0)

Strange... what do you mean by "if it works in inversed logic"?
Does it use both "normal" and "inversed" logic... and when/where/why does it use this or that logic?

Post by **Akari** » April 29th, 2014, 1:56 am

nocash wrote:Cool!

Akari wrote:The last scheme does clamping from -0x7FF to +0x7FF (the result is signed 12-bit) and one more strange operation. It rounds the result toward zero to the nearest even number (except -1).
Whoops that's odd. And that might be why my software tests led me to think that RLE output would be only 11bit wide, not 12bit.

The RLE result is 12 bits for sure. This is how it is stored in UNIT05 and this is how it goes to the IDCT.

nocash wrote:
Akari wrote:If the scheme works in inverted logic then the result is a bit stranger )
It rounds the result toward zero to the nearest odd number (except 0)
Strange... what do you mean by "if it works in inversed logic"?
Does it use both "normal" and "inversed" logic... and when/where/why does it use this or that logic?

Input and output data can be inverted to reduce propagation delay or to simplify the calculation logic. I don't know whether data comes to this part of the scheme in normal or in inverted form (where 0 is 1 and 1 is 0).

By the way, did you see any rounding in RLE? Can you check it out?
