Regurgitating what I think I learnt

Tommy
Active PSXDEV User
Posts: 48
Joined: April 19th, 2014, 8:16 am

Post by Tommy » April 23rd, 2014, 2:35 am

I'm new. I had a quick read through some of the documentation last night; can anyone read my attempted summary of the GPU basics and confirm, deny, correct, annotate, etc?

There is 1 MB of VRAM. It forms a 1024x512 drawing area of 16-bit pixels, 15 bits of which are colour. The current screen output is some portion of that area, typically 320x240, though lower and higher resolutions are available, along with sizes more appropriate to PAL display where applicable. I can position that output within the drawing area. For most of the available resolutions that leaves me space for a double buffer.
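To sanity-check the arithmetic, here is a minimal C sketch. It assumes each pixel occupies 16 bits of storage (which is what makes 1024x512 come out at exactly 1 MB) and that the two buffers are simply placed side by side or stacked. The function names are my own, not anything from the SDK.

```c
#include <assert.h>
#include <stdint.h>

enum { VRAM_W = 1024, VRAM_H = 512, BYTES_PER_PIXEL = 2 };

/* Total VRAM size in bytes: width * height * bytes per pixel. */
uint32_t vram_bytes(void) {
    return (uint32_t)VRAM_W * VRAM_H * BYTES_PER_PIXEL;
}

/* Returns 1 if two w-by-h frame buffers fit inside VRAM,
   either side by side or stacked vertically; 0 otherwise. */
int double_buffer_fits(int w, int h) {
    assert(w > 0 && h > 0);
    int side_by_side = (2 * w <= VRAM_W) && (h <= VRAM_H);
    int stacked      = (w <= VRAM_W) && (2 * h <= VRAM_H);
    return side_by_side || stacked;
}
```

With these assumptions 320x240 fits twice with room left over for textures, while a full 640x480 frame does not fit twice, which matches "most of the available resolutions".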

The memory is not directly addressable by the CPU. Textures and colour lookup tables go into the same 1 MB. Textures are arranged in 256x256 pages, and 256x256 is the maximum texture size. Textures may be 15-bit plus 1 bit of alpha, 8-bit or 4-bit. The latter two index per-texture lookup tables.
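A rough C sketch of what indexed texel lookup amounts to, as I understand it. The packing is an assumption on my part: four 4-bit texels (or two 8-bit texels) per 16-bit VRAM word, leftmost texel in the lowest bits. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes occupied by a w-by-h texture at 4, 8 or 16 bits per texel. */
unsigned long texture_bytes(unsigned w, unsigned h, unsigned bpp) {
    assert(bpp == 4 || bpp == 8 || bpp == 16);
    return (unsigned long)w * h * bpp / 8;
}

/* Index of texel x in a row of 4-bit texels.
   Assumed packing: four texels per word, leftmost in the low nibble. */
unsigned texel4(const uint16_t *row, unsigned x) {
    return (row[x / 4] >> ((x % 4) * 4)) & 0xFu;
}

/* Index of texel x in a row of 8-bit texels.
   Assumed packing: two texels per word, leftmost in the low byte. */
unsigned texel8(const uint16_t *row, unsigned x) {
    return (row[x / 2] >> ((x % 2) * 8)) & 0xFFu;
}

/* Resolve a 4-bit texel to a 16-bit colour through a 16-entry lookup table. */
uint16_t sample4(const uint16_t *row, unsigned x, const uint16_t *clut) {
    unsigned idx = texel4(row, x);
    assert(idx < 16);
    return clut[idx];
}
```

On these numbers a 64x64 4-bit texture occupies 2048 bytes, against 131072 bytes for a 256x256 15-bit one.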

There's a small texture cache that in practice makes 256x256 textures inadvisable; 64x64 at 4 bits is a very cache-friendly compromise. Similarly, there's a tile-based cache for rendering output, so drawing is more efficient per pixel if polygons are smaller and closer to square.

Primitives are always specified directly in window coordinates. They're chained together in a linked list and that list is read by the GPU directly. A limited selection of state changes can be encoded as nodes in that list that produce no primitive. Everything is drawn in list order. There's famously no depth buffer.

An ordering table is a list of such lists intended to allow bucket sort by depth. Primitives are added to the bucket that corresponds to their depth. The GPU can traverse an ordering table without CPU intervention, either forwards or backwards. Per-frame setup costs for a backwards traversal are lower due to hardware assistance and backwards is usually more helpful because it corresponds to drawing from back to front.
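To check my understanding of the ordering table, here is a small CPU-side model in C. It is only a simulation of the idea: the node layout, the names and the use of `id == 0` to mark an empty bucket header are all invented for illustration, and the real list format is whatever the GPU documentation specifies.

```c
#include <assert.h>
#include <stddef.h>

enum { OT_LEN = 8 };  /* number of depth buckets (arbitrary for this sketch) */

typedef struct Node {
    struct Node *next;
    int id;  /* stands in for primitive data; 0 marks an empty bucket header */
} Node;

typedef struct { Node bucket[OT_LEN]; } OrderingTable;

/* Chain the empty bucket headers from farthest to nearest:
   bucket[i] points at bucket[i - 1], and bucket[0] ends the list. */
void ot_clear(OrderingTable *ot) {
    for (int i = OT_LEN - 1; i > 0; --i) {
        ot->bucket[i].next = &ot->bucket[i - 1];
        ot->bucket[i].id = 0;
    }
    ot->bucket[0].next = NULL;
    ot->bucket[0].id = 0;
}

/* Splice a primitive in directly after its bucket header: constant time. */
void ot_insert(OrderingTable *ot, Node *prim, int depth) {
    assert(depth >= 0 && depth < OT_LEN);
    prim->next = ot->bucket[depth].next;
    ot->bucket[depth].next = prim;
}

/* Follow the chain from the farthest bucket, as the GPU would,
   writing primitive ids to out in draw order. Returns the count. */
int ot_draw(const OrderingTable *ot, int *out, int max) {
    int n = 0;
    for (const Node *p = &ot->bucket[OT_LEN - 1]; p != NULL; p = p->next) {
        if (p->id != 0 && n < max) out[n++] = p->id;
    }
    return n;
}
```

Because the headers are chained from the highest index down, following the list from the last bucket draws far primitives first. Within one bucket the most recently inserted primitive is drawn first.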

As well as primitives, ordering table buckets can contain further ordering tables.

There's a mechanism for rejecting primitives that are entirely off screen without any pixel work. The documentation doesn't guarantee that it's 100% effective. Other geometry will be clipped per pixel; in the worst case the rejection won't happen until the actual write would have occurred but it may happen sooner.

Transparencies are, inevitably, read/modify/write rather than just write, so slower.
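A sketch of why blending costs more than an opaque write, modelling VRAM as a plain array of 16-bit pixels. It shows only a 50/50 average of background and foreground on three 5-bit channels; the channel layout, the rounding and the function names are my assumptions rather than verified hardware behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Average two 15-bit colours channel by channel (three 5-bit fields). */
uint16_t blend_half(uint16_t back, uint16_t front) {
    uint16_t out = 0;
    for (int shift = 0; shift <= 10; shift += 5) {
        unsigned b = (back >> shift) & 31u;
        unsigned f = (front >> shift) & 31u;
        out |= (uint16_t)(((b + f) / 2) << shift);
    }
    return out;
}

/* Opaque pixel: a single write. */
void plot_opaque(uint16_t *vram, size_t idx, uint16_t colour) {
    assert(vram != NULL);
    vram[idx] = colour;
}

/* Semi-transparent pixel: read the background, modify it, write it back. */
void plot_blend(uint16_t *vram, size_t idx, uint16_t colour) {
    assert(vram != NULL);
    uint16_t back = vram[idx];            /* read   */
    vram[idx] = blend_half(back, colour); /* modify and write */
}
```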

The important GPU calls are asynchronous and can be queued up before blocking occurs, to an extent. There are calls to wait until GPU work is complete, wait for vsync, wait for n vsyncs, etc. It looks like a traditional hsync interrupt could be arranged if you wanted classic-style sine wave distortions and the rest. Presumably that's what Jumping Jack does underwater.

The GPU can be told to cancel whatever it's doing and just stop now. You'll commonly do that to ensure you hit a certain frame rate regardless of whether you get everything drawn, e.g. because you're in interlaced mode and have only a single buffer, so at vsync you need to start the next draw immediately.

They attempted to optimise for interlaced mode by having the GPU draw only odd lines of geometry during odd fields and only even lines during even fields. It seems to lose track of which field it is on if drawing starts within vertical retrace, so the mechanism isn't entirely reliable. Nevertheless the intention is that you use interlacing as it was designed: in NTSC, to provide independent images 1/60th of a second apart and slightly offset vertically, rather than a single larger image that changes only every 1/30th of a second. The larger image is still maintained in VRAM, presumably because you may not update all of it and/or for subpixel accuracy reasons.

The fill rule is that a pixel belongs to a triangle if its centre is inside. If the centre is exactly on the boundary then it belongs if the pixel to its right is inside or if the pixel below it is inside. I've seen some inter-poly gaps in commercial titles so it's possible this mechanism may not quite have been implemented perfectly?
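Here is the fill rule as I've described it, written out in C so it can be checked. This models my reading of the documentation, not verified hardware behaviour. Coordinates are doubled internally so that pixel centres (x + 0.5, y + 0.5) stay in integers, and all names are my own.

```c
#include <assert.h>

typedef struct { int x, y; } Pt;
typedef struct { Pt a, b, c; } Tri;

/* Twice the signed area of the triangle (a, b, p). */
static long edge(long ax, long ay, long bx, long by, long px, long py) {
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax);
}

/* Classify the centre of pixel (x, y) against triangle t:
   +1 strictly inside, 0 exactly on an edge, -1 outside. */
int classify(Tri t, int x, int y) {
    long px = 2L * x + 1, py = 2L * y + 1;   /* doubled pixel centre */
    long ax = 2L * t.a.x, ay = 2L * t.a.y;
    long bx = 2L * t.b.x, by = 2L * t.b.y;
    long cx = 2L * t.c.x, cy = 2L * t.c.y;
    long area = edge(ax, ay, bx, by, cx, cy);
    assert(area != 0);                       /* degenerate triangles not handled */
    long s = area > 0 ? 1 : -1;              /* normalise the winding */
    long e0 = s * edge(ax, ay, bx, by, px, py);
    long e1 = s * edge(bx, by, cx, cy, px, py);
    long e2 = s * edge(cx, cy, ax, ay, px, py);
    if (e0 < 0 || e1 < 0 || e2 < 0) return -1;
    if (e0 > 0 && e1 > 0 && e2 > 0) return 1;
    return 0;
}

/* A pixel is drawn if its centre is inside, or if the centre is on the
   boundary and the pixel to its right or the pixel below it is inside. */
int covers(Tri t, int x, int y) {
    int c = classify(t, x, y);
    if (c > 0) return 1;
    if (c < 0) return 0;
    return classify(t, x + 1, y) > 0 || classify(t, x, y + 1) > 0;
}
```

If the rule works as intended, two triangles sharing an edge should between them draw every pixel along that edge exactly once, with no gap and no double write. Splitting a 4x4 square along its diagonal is an easy case to try.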

I saw no mention of automatic reverse face removal even though that's often something a GPU can do pretty much for free.

I've yet to read up on the GTE so could not comment on the intended way to do stuff preceding the pixel work.

Re: Regurgitating what I think I learnt

Post by Tommy » April 24th, 2014, 4:56 am

... and this is what I think I now know about the GTE, again just from the documentation:

The GTE sits directly on the CPU die as one of the MIPS coprocessors. So in contrast to the GPU it has no independent relationship with the system bus. It merely augments the processor instruction set (specific MIPS syntax aside). So usage is quite different: it's used synchronously on individual data items rather than asynchronously on batches of data.

Operations are supplied for:

Atomically applying a transformation matrix and projecting either one or three vertices. Points that would be behind the camera are nudged in front of it.

Directional lighting, which is the standard Lambert stuff: a dot product between the normal and the light vector to give intensity of incident light.

Interpolation and depth cueing, plus other normal vector operations (dot product, outer product, squaring, multiplying by a scalar with accumulation), with a specialised operation essentially for reverse face calculation.
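To make sure I follow what those operations compute, here is plain C doing the same arithmetic on the CPU. This is not GTE code: it assumes 12 fractional bits of fixed point (4096 = 1.0), a simple clamp for points behind the camera, and names I made up. The real instructions, register layout and saturation behaviour are whatever the documentation specifies.

```c
#include <assert.h>
#include <stdint.h>

enum { ONE = 4096 };  /* 1.0 in the assumed 12-bit fractional fixed point */

typedef struct { int32_t x, y, z; } Vec3;
typedef struct { int32_t m[3][3]; } Mat3;

/* Dot product of one matrix row with a vector, kept in 64 bits. */
static int64_t dot3(const int32_t *r, Vec3 v) {
    return (int64_t)r[0] * v.x + (int64_t)r[1] * v.y + (int64_t)r[2] * v.z;
}

/* Rotate v by a fixed-point matrix, then translate. */
Vec3 transform(const Mat3 *rot, Vec3 trans, Vec3 v) {
    Vec3 out;
    out.x = (int32_t)(dot3(rot->m[0], v) / ONE) + trans.x;
    out.y = (int32_t)(dot3(rot->m[1], v) / ONE) + trans.y;
    out.z = (int32_t)(dot3(rot->m[2], v) / ONE) + trans.z;
    return out;
}

/* Perspective projection with projection-plane distance h.
   Points at or behind the camera are nudged forward to z_min
   (a hypothetical clamp standing in for the hardware's behaviour). */
void project(Vec3 v, int32_t h, int32_t z_min, int32_t *sx, int32_t *sy) {
    assert(h > 0 && z_min > 0);
    int32_t z = v.z < z_min ? z_min : v.z;
    *sx = (int32_t)((int64_t)v.x * h / z);
    *sy = (int32_t)((int64_t)v.y * h / z);
}

/* Lambert term: max(0, N . L), with N and L unit vectors scaled by ONE. */
int32_t lambert(Vec3 n, Vec3 l) {
    int64_t d = ((int64_t)n.x * l.x + (int64_t)n.y * l.y + (int64_t)n.z * l.z) / ONE;
    return d < 0 ? 0 : (int32_t)d;
}

/* Twice the signed area of a projected triangle. The sign gives the
   winding, which is the basis of a reverse face test; which sign counts
   as front-facing is a convention. Assumes small screen coordinates. */
int32_t winding(int32_t x0, int32_t y0, int32_t x1, int32_t y1,
                int32_t x2, int32_t y2) {
    return (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0);
}
```

In use, each vertex would go through `transform` and `project`, the three projected points through `winding`, and only triangles with the chosen sign would be lit and added to the ordering table.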

Beyond that it sounds like the work of arranging primitives and stuffing ordering tables is all up to the CPU by whatever means you prefer? And any batch operations or higher level model-to-screen stuff that may be present in the various C libraries is implemented by those libraries and not in hardware?

Is my overall understanding of the CPU/GTE/GPU relationship accurate? I've yet to write a line of code; I'm the sort of person that reads the manuals first (and, later, often discovers I misread the manuals).
