symptom: the NEG instruction computes the H flag with a formula other than the one given in the AVR Instruction Set (H=R3+Rd3).
Whether the bug is in the simulator or in the document remains to be seen.
Versions 3.53 and 4.04 of AVRStudio behave the same (weird) way.
Example: initially having SREG=0x01 and R10=0xD9, NEG R10 sets SREG to 0x01 instead of 0x21.
The AVRStudio formula for H seems to be R3*(not Rd3) rather than R3+Rd3.
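The example above can be checked with a small sketch that evaluates both formulas for an 8 bit NEG. The function name neg_h_flags is made up for illustration; only the two H formulas and the test value 0xD9 come from the notes above.

```python
def neg_h_flags(rd):
    """Return (result, h_doc, h_studio) for NEG applied to an 8 bit value.

    h_doc    follows the documented formula H = R3 + Rd3 (logical OR).
    h_studio follows the formula AVRStudio seems to use, H = R3 * (not Rd3).
    """
    r = (0 - rd) & 0xFF          # NEG: two's complement of Rd
    r3 = (r >> 3) & 1            # bit 3 of the result
    rd3 = (rd >> 3) & 1          # bit 3 of the original operand
    h_doc = r3 | rd3
    h_studio = r3 & (rd3 ^ 1)
    return r, h_doc, h_studio


result, h_doc, h_studio = neg_h_flags(0xD9)
print(hex(result), h_doc, h_studio)
```

For R10=0xD9 the result is 0x27, the documented formula gives H=1 (SREG=0x21 together with C) and the AVRStudio-like formula gives H=0 (SREG=0x01).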
symptom: when trying to set/reset port A pins, there is a 1 clock delay between the moment PORTA receives the bits and the moment PINA gets updated. Those events should have been simultaneous (of course, port A direction was considered already configured as output, by setting DDRA(i)=1).
28-31 July 2002
The Program Memory and Program Counter are handled in different places, even though they share much functionality. Moreover, the Program Counter has no explicit manager associated with it. This makes PM and PC quite difficult to maintain.
Reorganized PM and PC handling. Now they are handled by a common manager, the PM manager.
Every test runs smoothly so far.
27 July 2002
The Stall and Flush Unit and Shadow Manager are difficult to maintain because of too many rules and exceptions.
Reorganized the SFU so that its behavior follows only one rule, the so-called `SFU rule': older hardware resource requests have priority over younger ones.
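A minimal sketch of the SFU rule, assuming each request is identified only by the pipe stage number of the requesting instruction and that a higher stage number means an older instruction. The function sfu_step and the one-grant-per-clock simplification are assumptions made for illustration, not the actual SFU code.

```python
def sfu_step(pending):
    """Grant one request per clock: the oldest wins, the others stay pending.

    `pending` is a list of pipe stage numbers (hypothetical encoding);
    the instruction in the highest stage is the oldest one.
    Returns (granted_stage, remaining_requests).
    """
    if not pending:
        return None, []
    winner = max(pending)        # oldest instruction has priority
    rest = list(pending)
    rest.remove(winner)          # younger requests are delayed, not dropped
    return winner, rest


granted, waiting = sfu_step([5, 6])   # simultaneous requests from s5 and s6
print(granted, waiting)
```

With requests from s5 and s6 in the same clock, s6 is served first and the s5 request is kept for the next clock.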
Reorganized the Shadow Manager so that its behavior accurately implements the shadow protocol. However, a few exceptions still exist (such as LPM Program Memory handling or CPSE RF handling).
*** Modelsim 5.3 behaves strangely again.
It raises hardware manager warnings, but when the local conditions are investigated, the situation is perfectly legal. It seems that when one signal has a 0-1 transition at the same moment another has a 1-0 transition, there is a `small' (theoretically 0) amount of time during which both signals are considered 1, and that transient triggers the warning. That shouldn't happen; it seems to be a Modelsim bug.
However, trying to reproduce that behavior was unsuccessful. It only appears sometimes; the rule governing its appearance is well hidden.
For now, it's best to ignore these warnings during simulation. However, it means that those assertions don't fulfill their purpose.
25 July 2002
symptom: IJMP and EIJMP don't jump where they are supposed to, if the instruction before them modifies the Z pointer.
remedy: IJMP and EIJMP actually jump before the BPU even gets updated by the previous instruction. As they use the Register File mapped Z pointer to find the target address, they need to be held back for one clock (the Z pointer is modified in stage s5).
Just request a nop in pipe stage s4. Now IJMP and EIJMP take 4 clocks (RJMP and JMP still take 3).
symptom: loads don't work any more (!). They (sometimes) get garbage.
remedy: when correcting bug 021, the shadow protocol was applied to all devices that could use it. That was wrong. The Data Address Calculation Unit must not use the shadow protocol, because it gets RF/IOF/DM exclusivity by means of stalling, and it must be granted access to these resources, even during stalls.
When trying to read from Unified Memory, loads got data from shadow registers, not directly from the RF/IOF/DM data out.
status: corrected DACU
symptom: JMP gets corrupted if the previous instruction is a load.
remedy: JMP is a 32 bit instruction. The second word (a 16 bit constant) can get flushed when a previous instruction stalls s5.
Flushes of s2 requested in s3 and s4 are more delicate than other flushes. They can interfere with stalls requested by older instructions, and they must be stallable because older instructions might want that. If a flush of s2 is requested in s3 or s4 while older instructions require a stall, don't blindly flush s2, but rather do nothing and wait for the stall to end. Only after that acknowledge the flush.
symptom: CPSE doesn't skip the following instruction, when it should.
remedy: the skip condition was picked as `not zero flag', instead of `zero flag'.
symptom: SBIC and SBIS don't do their job.
remedy: IOF read access was simply not requested.
status: corrected the Instruction Decoder by placing an IOF request in pipe stage s5, for SBIC and SBIS
symptom: RCALL doesn't work.
the 12 bit relative offset wasn't initialized in the Instruction Decoder. Just do that (cut&paste the corresponding code line from RJMP, as the relative jump address is placed in the same bits in the instruction code).
the return address was correct for CALL but one greater than needed for RCALL. Actually, CALL and RCALL need different return addresses, as CALL has 32 bits and RCALL only 16.
Modification: now, the current instruction's PC is conditionally incremented in pipe stage s4. A new set of wires and registers was introduced so that CALL can request to increment its return address. RCALL doesn't need to do that.
status: corrected the Instruction Decoder, so that CALL requests the increment of its return address.
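The two RCALL issues above can be illustrated with a short sketch. It assumes word addressing of the Program Memory and the standard AVR encodings of RJMP (1100 kkkk kkkk kkkk) and RCALL (1101 kkkk kkkk kkkk); the function names are hypothetical and do not correspond to pAVR signals.

```python
def rel_offset12(opcode):
    """Sign-extend the 12 bit relative offset k of an RJMP/RCALL opcode word."""
    k = opcode & 0x0FFF                       # same bit positions in RJMP and RCALL
    return k - 0x1000 if k & 0x0800 else k    # bit 11 is the sign bit


def return_address(pc, is_32bit):
    """Word address of the instruction following a call placed at `pc`.

    A 16 bit call (RCALL) returns to pc+1; a 32 bit call (CALL) needs one
    more increment because of its second instruction word.
    """
    return pc + (2 if is_32bit else 1)


print(rel_offset12(0xDFFF))                 # RCALL with k = -1
print(hex(return_address(0x0100, True)))    # CALL
print(hex(return_address(0x0100, False)))   # RCALL
```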
note: all instructions seem to work.
24 July 2002
symptom: loads placed immediately after stores that modify their pointer get garbage.
remedy: loads and stores can modify their data pointer. However, the Bypass Unit must also be updated, because the pointer registers are placed in the Register File. The BPU wasn't updated.
note: the modularity of the design (separate hardware managers, small set of conventions regarding signal naming, grouping similar-function code) paid off. This bug required an intervention spread out over half a megabyte of code. The Data Address Calculation Unit and the Bypass Unit were modified, new wires and registers were defined, and some of them were renamed.
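Why a missing BPU update produces garbage can be seen in a simplified forwarding sketch. It is only an illustration under loud assumptions: the class BypassUnit below ages its entries per write instead of per clock, keeps one entry per write, and ignores the chain structure of the real BPU.

```python
class BypassUnit:
    """Hypothetical, simplified bypass: newest matching entry wins over the RF."""

    def __init__(self, depth=4):
        self.depth = depth
        self.entries = []                     # newest first: (rf_addr, value)

    def write(self, rf_addr, value):
        """Record a result that has not reached the Register File yet."""
        self.entries.insert(0, (rf_addr, value))
        del self.entries[self.depth:]         # oldest entries fall out

    def read(self, rf_addr, register_file):
        """Return the forwarded value if present, else the Register File value."""
        for addr, value in self.entries:
            if addr == rf_addr:
                return value
        return register_file[rf_addr]


rf = [0] * 32
rf[30] = 0x10                 # stale low byte of the Z pointer in the RF
bpu = BypassUnit()
bpu.write(30, 0x11)           # store with post-increment updates the pointer
print(hex(bpu.read(30, rf)))  # the following load sees the new pointer
```

If the store skips the write call, the following load reads the stale 0x10 from the Register File, which is the kind of read before write hazard described above.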
symptom: stores that modify their pointer make the following instruction unable to update the Bypass Unit. Moreover, the BPU is written with garbage.
remedy: Stores and the instruction after them can request to write the BPU simultaneously. That's because these stores make intensive use of the BPU and eat all its write resources. They write 3 bytes: 2 of them in s5 (the modified pointer) and 1 in s6 (the data to be written into the Register File). The one written in s6 can be simultaneous with the following instruction's s5 write BPU request.
To correct this bug, there are 2 options:
1. add a stall in pipe stage s5 for all stores. That is, stores will take 3 clocks.
2. increase BPU width from 2 chains to 3 chains and modify the way stores make use of the Bypass Unit (write everything that has to be written - 3 bytes - in the same pipe stage, s5). This is more attractive because stores still need only 2 clocks. However, the Bypass Unit continues to grow (from the initial depth/width of 2/2 to the present 4/3).
Option 2 was chosen.
The Unified Memory architecture favored this bug. Stores must be able to write the Register File and, consequently, write their data into the BPU along with the pointer they have modified.
symptom: LPM always returns 0.
remedy: multiple bug:
LPM stalled s2, then read the s2 status. Seeing it `busy', it gave up reading what it needed and kept pavr_pm_addr_int at its present value. The Program Memory Manager needs to be instructed to forcibly grant LPM instructions access to s2, even if it is stalled. Also, the shadow protocol must be bypassed.
LPM didn't update BPU.
pointer registers were used directly in a few hardware managers, not via the BPU. This enables subtle read before write hazards (they went unnoticed until now).
LD Rd, -X; LD Rd, X; LD Rd, X+;
LD Rd, -Y; LD Rd, Y; LD Rd, Y+; LDD Rd, Y+q;
LD Rd, -Z; LD Rd, Z; LD Rd, Z+; LDD Rd, Z+q;
ST -X, Rr; ST X, Rr; ST X+, Rr;
ST -Y, Rr; ST Y, Rr; ST Y+, Rr; STD Y+q, Rr;
ST -Z, Rr; ST Z, Rr; ST Z+, Rr; STD Z+q, Rr;
LPM; LPM Rd, Z; LPM Rd, Z+
seem to work.
23 July 2002
symptom: read before write data hazards
remedy: BLD instruction didn't update BPU.
symptom: BLD doesn't modify the target register.
remedy: while processing pavr_s5_iof_rq IOF request, the IOF Manager set IOF bit address to zero instead of pavr_s5_iof_bitaddr. Correct that.
symptom: Even though they work fine separately, POP, PUSH and MOVW one after another (in various combinations) don't.
remedy: This is a triple (!) bug:
MOVW requires a stall in s6 while POP requires a stall in s5. The two stalls are simultaneous.
The Stall and Flush Unit doesn't handle multiple stalls properly.
Modify SFU so that the oldest stall doesn't kill the younger one(s), but only delays it (them).
The SP was incremented during a stall, and after the stall the DACU received a wrong pointer (the new SP).
All hardware resources must be stallable. Presently they are not.
The instruction after MOVW, PUSH is skipped. The PM data out shadow register doesn't do its job.
The shadow registers are updated every clock. That's not right.
Update them only if they don't already hold meaningful data (check the corresponding `shadow_active' flag). Otherwise, during successive stalls they get corrupted.
This was a tough one.
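The shadow register fix can be sketched as follows. The class and method names are hypothetical, and the moment the `shadow_active' flag is cleared (on the first clock after the stall, when the saved value is consumed) is an assumption made for this sketch rather than a statement about the actual shadow protocol.

```python
class ShadowRegister:
    """Hypothetical model of one shadow register and its shadow_active flag."""

    def __init__(self):
        self.value = None
        self.shadow_active = False

    def clock(self, data_out, stalled):
        """Advance one clock and return the data the pipeline should see."""
        if stalled:
            if not self.shadow_active:
                # Capture only once; later stall clocks must not overwrite it.
                self.value = data_out
                self.shadow_active = True
            return self.value
        if self.shadow_active:
            # Stall is over: deliver the saved value, then release the shadow.
            self.shadow_active = False
            return self.value
        return data_out                       # no stall involved: pass through


pm_do = ShadowRegister()
seen = [pm_do.clock(0xAA, True),    # first stall clock: 0xAA is saved
        pm_do.clock(0xBB, True),    # second stall clock: 0xAA is kept
        pm_do.clock(0xCC, False),   # stall ended: saved 0xAA is delivered
        pm_do.clock(0xDD, False)]   # normal operation again
print([hex(v) for v in seen])
```

Updating the register on every clock, as before the fix, would replace 0xAA with 0xBB during the second stall clock and the instruction read at the start of the stall would be lost.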
symptom: the sequence
LDI R17, 0xC3
ST Z+, R17
results in storing garbage into memory.
remedy: the nop requests (placed by ST) increase the needed BPU depth by one. Thus, BPU depth must be increased from 3 to 4.
CBI, SBI, BST, BLD, MOVW, IN, OUT, PUSH, POP, LDS, STS seem to work.
22 July 2002
symptom: DEC does in fact INC
remedy: ALU operand 2 is selected as -1 in pipe stage s5, and then, the DEC-related code does out=op1-op2, which results in out=op1+1.
Just make the ALU treat INC and DEC the same way (that is, out=op1+op2).
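A small sketch of the arithmetic, assuming 8 bit two's complement operands; the names alu_add8, alu_sub8, INC_OP2 and DEC_OP2 are invented for illustration.

```python
INC_OP2 = 0x01
DEC_OP2 = 0xFF    # -1 in 8 bit two's complement, as selected in s5


def alu_add8(op1, op2):
    """out = op1 + op2, truncated to 8 bits (used for both INC and DEC)."""
    return (op1 + op2) & 0xFF


def alu_sub8(op1, op2):
    """out = op1 - op2, truncated to 8 bits (the old, wrong DEC path)."""
    return (op1 - op2) & 0xFF


print(hex(alu_add8(0x10, DEC_OP2)))   # correct DEC
print(hex(alu_sub8(0x10, DEC_OP2)))   # old behavior: op1 - (-1) = op1 + 1
```

With op1=0x10 the addition gives 0x0F, while the subtraction gives 0x11, that is, an INC.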
symptom: BPU doesn't do its job.
remedy: stupid and time-consuming bug, generated by a (too) quick cut and paste in the BPU code.
note: Modelsim PE/Plus 5.3a_p1 has a cache problem. After correcting this bug, the same (wrong) results showed up after recompiling and restarting the simulation. It was enough to close Modelsim and open the project again for things to go fine. It's not the first time Modelsim behaves this way.
symptom: Z flag is computed wrongly for ALU opcodes that need 8 bit subtraction with carry.
symptom: Z flag is computed wrongly for all ALU opcodes (!).
remedy: instead of and-ing the negated bits of output, Z output was computed by and-ing output's bits.
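The difference between the two Z computations, sketched bit by bit for an 8 bit output. The function names are made up, and the sketch covers only the plain `result is zero' case described above.

```python
def z_flag(out):
    """Correct Z: AND of the negated output bits (1 only when out == 0)."""
    z = 1
    for bit in range(8):
        z &= ((out >> bit) & 1) ^ 1
    return z


def z_flag_buggy(out):
    """Old Z: AND of the output bits themselves (1 only when out == 0xFF)."""
    z = 1
    for bit in range(8):
        z &= (out >> bit) & 1
    return z


print(z_flag(0x00), z_flag(0x27))
print(z_flag_buggy(0x00), z_flag_buggy(0xFF))
```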
symptom: read before write data hazards related to IN instruction
remedy: IN doesn't write the Bypass Unit. Do that. Nasty one, requiring new wires and registers.
note: the shadow manager was completed. Quite a lot of code, hopefully with no new bugs.
remedy: the Bypass depth was increased from 2 to 3. Design bug.
To update the documentation!
symptom: the 16 bit arithmetic instructions write only the lower byte of the result in the Register File if the next few instructions aren't nops.
remedy: 16 bit arithmetic instructions stalled s6. During stalling s6, the Bypass flushed a value that was needed later. A signal was needed that can stall the BPU. Now, the stall s6 requests also stall the BPU.
Pretty tricky design bug.
To update the documentation!
symptom: stalls needed by 16 bit arithmetic instructions induce the replacement of the instruction placed 4 clocks later by a nop
remedy: shadow registers were assigned, but never used. PM data out (and consequently, the instruction register) read a nop instead of the correct data that was read during the stall. Now the pipeline uses the shadow registers related to PM data out.
The other shadow registers (related to DM, RF, IOF and DACU data out) are still unused!
To update the documentation with shadow-related issues!
ADD, ADC, ADIW, SUB, SUBI, SBC, SBIW seem to work.
15 July 2002
remedy: reporting this bug was a bug. The Register File works fine. This bug report was generated by modifying the X register (RF addr 27:26) and expecting the RF bulk data (RF addr 0...25) to be modified, which won't happen.
remedy: DACU data out was duplicated, with 2 different names: pavr_dacu_do and pavr_s6_dacudo. pavr_dacu_do was only written, and pavr_s6_dacudo was only read. When RET tried to read the return address from DACU, it got garbage, because it read DACU data out from pavr_s6_dacudo, which was not assigned any value.
Cut out pavr_s6_dacudo. DACU data out is now unique, for both read and write (that is, pavr_dacu_do). Also, the documentation was updated.
symptom: CALL doesn't work.
remedy: in the SP Manager, pavr_s5_calldec_spwr_rq was written twice, and pavr_s52_calldec_spwr_rq wasn't written at all, because of a careless cut-and-paste. As a result, during CALL, PC's lsByte was not stored.
symptom: ALU flags are not defined.
remedy: the ALU flags input was not connected to SREG (zero-level assignment).
RET, CALL seem to work.
pAVR runs its first complete program (12 instructions).
13 July 2002
symptom: RET is a mess
remedy: during nop requests, stall must have higher priority than flush in s2. The Stall Manager (the nop request-related lines) must take care of that.
symptom: RF seems to be unable to write other registers than pointer registers.
status: NOT corrected!
symptom: RET is still a mess.
status: NOT corrected!
bugs pool: 004, 005
27 June 2002
symptom: read before write data hazards. Hmm, this kind of bug shouldn't have occurred.
remedy: LDI didn't update BPU0. Just do that.
symptom: while reading the code, something was smelling bad.
remedy: the code that computes the branch/skip conditions was not written at all.
The controller has successfully executed its first instruction (an RJMP)! However, it was the only one...
The kernel seems to be easy to debug thanks to its regular structure.
RJMP, LDI, NOP seem to work.
Generated on Tue Dec 31 20:26:31 2002 for Pipelined AVR microcontroller