
Re: Universal Processor (was Re: [oc] x86 IP Core)



> I have been thinking along those lines for quite some
> time, but have come to the conclusion that even though
> this sounds kind of nice, it is not particularly practical.
>
> You would have a CPU (e.g. OpenRisc) and a translation
> block, which means the size and power consumption are
> much higher than those of a native x86 or 68K etc.
> Additional problems occur in the latency of the code
> morphing block: when you take a branch, you would have
> to stall until the code morphing block has caught up.
>
> Several companies have taken this a step further and
> have removed the traditional instruction decode block
> from the CPU and are capable of creating custom
> instruction decode units based on your requirements
> (x86, 68K etc.).
>
> So a CPU that does that might be more interesting than
> a translation block.
>
> Other companies have also written "Just In Time" compilers
> and translators that convert binaries in software to the
> desired architecture. But that's obviously very performance
> degrading unless you are running one very, very long task
> at a time ...
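
(Just to make the software-translation point above concrete: a cartoon
of such a JIT dispatcher could look like the C sketch below. The names,
the cache size and the empty translate_block() are invented for
illustration, not taken from any real translator.)

#include <stdint.h>

#define TCACHE_SIZE 256                    /* translated-block cache entries */

typedef void (*host_block_fn)(void);       /* a translated, directly callable block */

struct tcache_entry {
    uint32_t      guest_pc;                /* start address in the guest binary */
    host_block_fn host_code;               /* generated host code for that block */
    int           valid;
};

static struct tcache_entry tcache[TCACHE_SIZE];

static void dummy_block(void) { }          /* stands in for emitted host code */

/* Placeholder back end: the real thing would read guest instructions
 * from guest_pc up to the next branch and emit equivalent host code.
 * This is the expensive part. */
static host_block_fn translate_block(uint32_t guest_pc)
{
    (void)guest_pc;
    return dummy_block;
}

/* Dispatch: the fast path is a hit in the translation cache, the slow
 * translation only runs the first time a block is seen.  That is why
 * JIT only pays off for one long-running task, where the translation
 * cost is amortised over many executions of the same blocks. */
host_block_fn lookup_or_translate(uint32_t guest_pc)
{
    struct tcache_entry *e = &tcache[(guest_pc >> 2) % TCACHE_SIZE];

    if (!e->valid || e->guest_pc != guest_pc) {   /* miss: translate now */
        e->guest_pc  = guest_pc;
        e->host_code = translate_block(guest_pc);
        e->valid     = 1;
    }
    return e->host_code;                          /* hit: just call it */
}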

As you two already mentioned, a lot of research/work has been done on 
this subject.
I was trying to use this idea for power saving. The idea is to use 
compressed code and decompress it on the fly into a small internal L0 
cache (e.g. only 32 locations). It is also possible to optimize this 
code quite easily, since the 'microcode' is very orthogonal and you can 
quickly see which units are used. (I am using 'microcode' here to mean 
the control signals for one clock of the CPU pipe.)
It is a bit tricky, however, to keep track of the addresses of the 
previously decoded instructions in this cache.
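
To sketch what I mean (C as pseudocode for the hardware; the 32-entry
size, the field names and decompress() are just illustrative
assumptions, not a finished design):

#include <stdint.h>

#define L0_ENTRIES 32                  /* e.g. 32 locations only */

struct l0_entry {
    uint32_t tag;                      /* address of the original compressed
                                          instruction; keeping these tags
                                          consistent is the tricky part */
    uint64_t ucode;                    /* one clock's worth of control
                                          signals for the CPU pipe */
    int      valid;
};

static struct l0_entry l0[L0_ENTRIES];

/* Placeholder decompressor: expands one compressed instruction into the
 * wide, orthogonal control word.  Because the word is orthogonal it is
 * easy to see which units an instruction uses, and e.g. to clock-gate
 * the ones it does not. */
static uint64_t decompress(uint32_t compressed_insn)
{
    return (uint64_t)compressed_insn;  /* real expansion omitted */
}

/* Fetch one control word for the given PC, decompressing on the fly
 * only when the L0 cache does not already hold it. */
uint64_t l0_fetch(uint32_t pc, uint32_t compressed_insn)
{
    struct l0_entry *e = &l0[(pc >> 2) % L0_ENTRIES];

    if (!e->valid || e->tag != pc) {   /* miss: decompress and fill */
        e->tag   = pc;
        e->ucode = decompress(compressed_insn);
        e->valid = 1;
    }
    return e->ucode;                   /* hit: word is already decoded */
}
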
Also, it would be really nice if the microcode were a superset of the 
instructions of all the CPUs you are trying to simulate, e.g. in the 
number of registers. Otherwise you have to use JIT.
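
By 'superset' I mean something along these lines (field names and
widths are only a guess to illustrate the point, not a real encoding):

/* Illustrative only: a control word wide enough that x86, 68K, etc.
 * front ends can all decode into the same format. */
struct ucode_word {
    unsigned alu_op    : 5;  /* superset of the ALU operations of all ISAs */
    unsigned src_a     : 5;  /* 5-bit register fields cover 32 registers,  */
    unsigned src_b     : 5;  /* so the 8 x86 GPRs and the 16 68K data/     */
    unsigned dest      : 5;  /* address registers both fit                 */
    unsigned mem_rd    : 1;  /* one bit per unit makes it obvious which    */
    unsigned mem_wr    : 1;  /* units an instruction uses, so the unused   */
    unsigned use_mul   : 1;  /* ones can be clock-gated                    */
    unsigned use_shift : 1;
    unsigned branch    : 1;
    unsigned flags_upd : 1;  /* condition codes exist in both ISAs */
};

If a guest ISA needs something that does not fit into such a word (more
registers, an addressing mode with no counterpart, ...), that is exactly
where you would have to fall back to JIT.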

If you do the decode on a cache line refill (of the L0 cache), the 
additional latency only shows up on cache misses and on mispredicted 
branches.
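To put a (completely made-up) number on it: with, say, a 95% L0 hit 
rate and a 3-cycle decompress penalty on a miss, the average extra 
latency is only 0.05 * 3 = 0.15 cycles per instruction.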

Note that (if L0 is properly sized) this should actually save power.

best regards,
Marko



--
To unsubscribe from cores mailing list please visit http://www.opencores.org/mailinglists.shtml