Pipeline structure

Shift-like flow
pAVR has a pipeline with 6 stages:
[Figure: pavr_pipestruct_01.gif (the main pipeline stages s1-s6)]

Each pipeline stage is essentially an independent state machine.

Each pipeline stage receives values from the previous one, in a shift-like flow. Only the `terminal' registers hold data that is actually used; the earlier registers in a chain serve only for synchronization.
For example, this is how a particular hardware resource request flows through pipeline stages s3 and s4 until it is processed in s5:

[Figure: pavr_pipestruct_02.gif (a hardware resource request shifting through s3 and s4 until it is processed in s5)]

Exceptions to this `normal' flow are the stall and flush actions, which can independently stall, or reset to zero (force a nop into), any stage. Another exception is a chain in which several registers are used, not only the terminal one.
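The shift-like flow and the independent stall/flush controls can be illustrated with a small behavioral model. This is a hypothetical Python sketch, not the pAVR VHDL: the class name `PipeChain` and the three-register chain (s3, s4, s5) are assumptions chosen for illustration, and giving flush priority over stall is also an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional

NOP = None  # a flushed register (reset to zero) carries no request


@dataclass
class PipeChain:
    """Hypothetical model of one register chain spanning stages s3, s4, s5.

    regs[0] is the s3 register; regs[-1] is the terminal (s5) register,
    the only one whose value is actually used.
    """
    regs: List[Optional[str]] = field(default_factory=lambda: [NOP, NOP, NOP])

    def clock(self, new_value, stall, flush):
        """Advance the chain by one clock and return the terminal value.

        stall[i]: register i keeps its current value.
        flush[i]: register i is reset to NOP (assumed to win over stall).
        The controls are independent; keeping them consistent is up to the caller.
        """
        old = list(self.regs)
        for i in range(len(self.regs)):
            if flush[i]:
                self.regs[i] = NOP
            elif stall[i]:
                self.regs[i] = old[i]
            else:
                # Normal shift: take the value the previous stage held last clock.
                self.regs[i] = new_value if i == 0 else old[i - 1]
        return self.regs[-1]
```

For instance, stalling s3 while flushing s4 for one clock holds the s3 request in place and lets a single nop travel down the chain. Keeping such stall/flush combinations consistent is the SFU's job, not the chain's.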

Apart from the main pipeline stages s1-s6 above, there are several stages needed only by a few instructions (such as 16-bit arithmetic, some of the skips, and returns): s61, s51, s52, s53 and s54. While these stages are active, the main stages are stalled.

Stages s1 and s2 are common to all instructions. They are the instruction fetch stages: they bring the instruction from Program Memory (PM) into the instruction register.
During stage s3, the instruction just read from PM is decoded. That is, the decoder tells the following pipeline stages (s4, s5, s6, s61, s51, s52, s53, s54) what to do, by means of dedicated registers.

At a given moment, a pipeline stage can do one of the following actions:
Hardware resource managing
Pipeline stages can request access to hardware resources. Access to hardware resources is done via dedicated hardware resource managers (one manager per hardware resource; one VHDL process per manager).

Main hardware resources:
Only one request can be received by a given resource at a time. If a resource receives multiple requests in the same clock, its access manager asserts an error during simulation; that indicates a design bug.
The pipeline is built so that each resource is normally accessed during a fixed pipeline stage. However, exceptions can occur. For example, LPM instructions need to read PM in stage s5, and loads/stores must be able to read/write the Register File (RF) in stage s5.
Exceptions are handled at the level of the hardware resource managers.
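A hardware resource manager's checking role can be sketched as follows. This is a hypothetical Python model (the class and method names are invented for illustration): it forwards at most one request per clock and raises an error on a conflict, mirroring the simulation-time assertion described above. It accepts requests from any stage, which is one simple way to let exceptions such as an LPM reading PM in s5 be handled at the manager level.

```python
from typing import List, Optional, Tuple


class ResourceConflict(AssertionError):
    """Raised when a resource receives more than one request in a clock (a design bug)."""


class ResourceManager:
    """Hypothetical model of one hardware resource manager (e.g. PM, or an RF port)."""

    def __init__(self, name: str):
        self.name = name
        self.pending: List[Tuple[str, object]] = []

    def request(self, stage: str, payload: object) -> None:
        # Any stage may post a request; the manager does not assume a fixed stage,
        # so out-of-stage accesses (like LPM in s5) need no special path here.
        self.pending.append((stage, payload))

    def clock(self) -> Optional[Tuple[str, object]]:
        """Serve this clock's request, or None if the resource is idle."""
        reqs, self.pending = self.pending, []
        if len(reqs) > 1:
            stages = ", ".join(stage for stage, _ in reqs)
            raise ResourceConflict(f"{self.name}: simultaneous requests from {stages}")
        return reqs[0] if reqs else None
```

In a correct design the SFU arbitrates before requests reach a manager, so the exception should never fire; when it does, it points at a missing stall.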
Stall and Flush Unit
Because of these exceptions, different pipeline stages can compete for a given hardware resource, so a mechanism is needed to resolve hardware resource conflicts. The Stall and Flush Unit (SFU) implements this function by arbitrating hardware resource requests: it stalls some instructions (some pipeline stages) while allowing others to execute.

Stall handling is done through two sets of signals:
[Figure: pavr_hwres_sfu_01.gif (the two sets of stall-handling signals)]

Each instruction has an embedded stall behavior, which is decoded by the instruction decoder.
Instructions in the pipeline, in different execution phases, access the SFU exactly the same way they access any other hardware resource: through SFU access requests.
The SFU prioritizes stall/flush/branch/skip/nop requests and postpones younger instructions until older instructions free the hardware resources (including the SFU hardware resource itself). Postponing is done through the stall-flush controls, on a per-pipeline-stage basis.
The `SFU rule': when a resource conflict appears, the older instruction wins.
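The `SFU rule' can be sketched as a per-clock arbiter. The Python function below is hypothetical and assumes that a higher main-stage number holds an older instruction (s6 older than s5, and so on); the auxiliary stages s61 and s51-s54 are ignored. It only decides who wins each conflict; propagating a stall to the stages behind the loser, and the flushes that accompany it, are left out.

```python
from typing import Dict, List, Set, Tuple


def arbitrate(requests: List[Tuple[int, str]]) -> Tuple[Dict[str, int], Set[int]]:
    """Apply the SFU rule to one clock's worth of hardware resource requests.

    requests: (stage_number, resource_name) pairs; a larger stage number is
    assumed to hold an older instruction.
    Returns (grants, stalled): the winning stage for each resource, and the
    stages that lost a conflict and must be stalled this clock.
    """
    grants: Dict[str, int] = {}
    stalled: Set[int] = set()
    # Visit the oldest instruction first, so it claims its resources first.
    for stage, resource in sorted(requests, key=lambda r: r[0], reverse=True):
        if resource in grants:
            stalled.add(stage)  # a younger instruction loses the conflict
        else:
            grants[resource] = stage
    return grants, stalled
```

For example, if a load in s5 claims RF read port 1 in the same clock as a younger instruction in s4, `arbitrate([(4, "RF_rd1"), (5, "RF_rd1"), (6, "RF_wr")])` grants the port to s5 and reports s4 as stalled. The stage numbers here are illustrative only.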

Some instructions need to insert a nop before the instruction `wave front', to free hardware resources normally used by younger instructions. For example, loads must `steal' Register File read port 1 from younger instructions.
Nops are inserted by stalling certain pipe stages and flushing others (possibly the same ones).
Other instructions need a nop after the instruction wave front, so that the previous instruction can complete and free hardware resources. For example, stores must wait one clock until the previous instruction frees the Register File write port.
The two situations differ considerably in their control structure. In the second situation, the instruction must stall and flush itself, which raises additional problems. These are solved by a dedicated noping state machine in stage s4, whose only purpose is to insert at most one nop after any instruction. Inserting nops before an instruction wave front, on the other hand, is straightforward, as any instruction can stall/flush younger instructions by means of SFU requests.
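The role of the s4 noping state machine can be sketched with a one-bit state. In this hypothetical Python model, the `nop_after` flag stands in for whatever decoded control marks an instruction as needing one nop after it; `Instr`, `NopingFSM` and `run` are invented names. The point it illustrates: an instruction that stalls itself unconditionally would stall forever, so the state remembers that the single nop has already been inserted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instr:
    name: str
    nop_after: bool = False  # hypothetical decoded flag: wait one clock in s4


class NopingFSM:
    """Hypothetical sketch of the s4 noping state machine (at most one nop)."""

    def __init__(self) -> None:
        self.nop_done = False

    def clock(self, s4: Instr) -> Optional[Instr]:
        """Return what s4 passes to s5 this clock; None means a nop."""
        if s4.nop_after and not self.nop_done:
            # Stall s4 (and younger stages) and send a nop to s5, once.
            self.nop_done = True
            return None
        # The instruction leaves s4; re-arm for the next instruction.
        self.nop_done = False
        return s4


def run(program: List[Instr]) -> List[str]:
    """Feed instructions through s4 and list what reaches s5, clock by clock."""
    fsm, out, i = NopingFSM(), [], 0
    while i < len(program):
        passed = fsm.clock(program[i])
        out.append(passed.name if passed is not None else "nop")
        if passed is not None:
            i += 1  # s4 advances only when its instruction has left
    return out
```

Running `run([Instr("add"), Instr("st", True), Instr("add")])` sends add, nop, st, add to s5: the store is held in s4 for exactly one clock.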

The specific SFU requests can be found here.
Shadowing
Consider the following situation: a load instruction reads the Data Memory during pipe stage s5. Suppose that on the next clock an older instruction stalls s6, the stage during which the Data Memory output was supposed to be written into the Register File. One clock later the stall is removed and s6 requests to write the Register File, but the Data Memory output has changed during the stall, so corrupted data would be written into the Register File. With the shadow protocol, the Data Memory output is saved during the stall, and when the stall is removed the Register File is written with the saved data.

The shadow protocol
If a pipe stage is not permitted to place hardware resource requests, mark every memory-like entity in that stage as having its output `shadowed', and write its associated shadow register with the corresponding data output. Otherwise, mark it as `unshadowed'.
As long as a memory-like entity is marked `shadowed', it is read (by whatever entity needs it) from its associated shadow register, rather than directly from its data output.
To allow shadowing during multiple successive stalls, shadow a memory-like entity only if it is not already shadowed.

Basically, the condition that shadows a memory-like entity's output is `hardware resource requests are disabled during that stage'. However, there are exceptions. For example, LPM family instructions steal Program Memory access by stalling the instruction that would normally be fetched at that time. Stalling disables hardware resource requests in that pipe stage, yet LPM family instructions must be able to access the Program Memory output directly. Here, the PM must not be shadowed, even though all hardware resource requests are disabled by default during its pipe stage s2 (during which PM is normally accessed).
Fortunately, there are only a few such exceptions (holes in the shadow protocol). Overall, the shadow protocol is still a good idea, as it handles a number of registers placed in delicate areas naturally and automatically.
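The shadow protocol, including a hole for cases such as LPM reading PM directly, can be modeled per memory-like entity. This hypothetical Python sketch treats `clock()` as the clock edge that ends a cycle, so a stage that is unstalled in a given cycle still reads the shadow register during that cycle; the class name, the `hole` parameter and this timing convention are assumptions, not the pAVR source.

```python
from typing import Callable


class ShadowedOutput:
    """Hypothetical model of a memory-like entity's output plus its shadow register.

    raw_output returns the entity's current, unprotected data output
    (for example the Data Memory output bus).
    """

    def __init__(self, raw_output: Callable[[], int]):
        self.raw_output = raw_output
        self.shadowed = False
        self.shadow_reg = 0

    def clock(self, requests_enabled: bool, hole: bool = False) -> None:
        """Update the shadow state at the clock edge ending the current cycle.

        requests_enabled: whether this pipe stage may place hardware resource requests.
        hole: a protocol exception (e.g. LPM needing the direct PM output); never shadow.
        """
        if requests_enabled or hole:
            self.shadowed = False
        elif not self.shadowed:
            # Capture only on the first stalled clock, so successive stalls
            # keep the original data instead of overwriting it.
            self.shadow_reg = self.raw_output()
            self.shadowed = True

    def read(self) -> int:
        """What a consumer sees: the shadow register while shadowed, else the raw output."""
        return self.shadow_reg if self.shadowed else self.raw_output()
```

In the load scenario described earlier, calling `clock(requests_enabled=False)` at the end of the stalled cycle captures the Data Memory output; s6 then writes the Register File with `read()` in the following cycle, and the next `clock(requests_enabled=True)` returns the entity to `unshadowed'.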

Generated on Tue Dec 31 20:26:30 2002 for Pipelined AVR microcontroller by doxygen1.2.16