Re: Universal Processor (was Re: [oc] x86 IP Core)
This kind of "translation" could not be used for optimising the use of one
interface. Some interfaces can be very complex, and the size of the FSM needed
could be much bigger than a small RISC processor and its memory. Could it
be possible to adapt a small CPU (size of the register set, type of
instructions, ...) and generate the binary code at the same time from a C
program?
For the compression stuff, I had an idea, just to save a little bandwidth.
Could it be possible to compress L2 cache lines? On each L2 cache miss, the
line would be loaded and decompressed, but because it is compressed you
could save some memory bus cycles. I imagine that the size of the cache
line could be too huge for this to be interesting, though.
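A minimal sketch of that cache-line idea, with all the numbers assumed (64-byte lines, 8-byte bus beats) and an invented zero-run encoding standing in for a real hardware compressor, just to show how a compressed refill could skip bus cycles:

```python
# Hypothetical sketch: compress a 64-byte L2 cache line with a simple
# zero-run scheme and estimate how many 8-byte bus beats a refill
# could skip. Sizes and the encoding are assumptions, not a real scheme.

LINE_SIZE = 64   # bytes per cache line (assumed)
BEAT_SIZE = 8    # bytes transferred per memory bus cycle (assumed)

def compress_line(line: bytes) -> bytes:
    """Encode each run of zero bytes as a (0x00, run_length) pair;
    non-zero bytes pass through unchanged."""
    assert len(line) == LINE_SIZE
    out = bytearray()
    i = 0
    while i < len(line):
        if line[i] == 0:
            run = 1
            while i + run < len(line) and line[i + run] == 0 and run < 255:
                run += 1
            out += bytes([0, run])
            i += run
        else:
            out.append(line[i])
            i += 1
    return bytes(out)

def decompress_line(data: bytes) -> bytes:
    """Invert compress_line: expand (0x00, n) pairs back into n zeros."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == 0:
            out += bytes(data[i + 1])  # emit that many zero bytes
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)

def beats(nbytes: int) -> int:
    """Bus cycles needed to move nbytes (ceiling division)."""
    return -(-nbytes // BEAT_SIZE)

# A mostly-zero line: 2 literal bytes, 54 zeros, 8 literal bytes.
line = bytes([0x12, 0x34]) + bytes(54) + bytes([0xFF] * 8)
packed = compress_line(line)
saved = beats(LINE_SIZE) - beats(len(packed))  # bus beats skipped
```

Whether this pays off in practice depends on how compressible real cache lines are and on the decompressor's latency sitting in the refill path.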
>> I have been thinking along those lines for quite some
>> time, but have come to the conclusion that even though
>> this sounds kind of nice it is not particularly practical.
>> You would have a CPU (e.g. OpenRisc) and a translation
>> block, which means the size and power consumption are
>> much higher than those of a native x86 or 68K etc.
>> Additional problems occur in the latency of the code morphing
>> block. Consider what happens when you take a branch: you would
>> have to stall until the code morphing block can catch up.
>> Several companies have taken this a step further and
>> have removed the traditional instruction decode block
>> from the CPU, and are capable of creating custom
>> instruction decode units based on your requirements
>> (x86, 68K etc.)
>> So a CPU that does that might be more interesting than
>> a translation block.
>> Other companies have also written "Just In Time" compilers
>> and translators that convert binaries in software to the
>> desired architecture. But that's obviously very performance
>> degrading unless you are running one very, very long task
>> at a time ...
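The JIT point quoted above can be sketched with a toy two-instruction "guest" ISA (the opcodes, encoding, and cache structure here are invented for illustration): each guest block is translated into host code once and cached, so only long-running code amortises the translation cost.

```python
# Toy software binary translator. Guest opcodes are invented:
#   ("addi", n) -> acc += n
#   ("muli", n) -> acc *= n
# Each guest block is translated once into a host closure and cached,
# mirroring how JIT translators amortise their cost over long tasks.

translation_cache = {}   # guest block address -> compiled host function
translations_done = 0    # stands in for the translation overhead paid

def translate(block):
    """Compile one guest basic block into a host closure."""
    ops = []
    for op, arg in block:
        if op == "addi":
            ops.append(lambda acc, n=arg: acc + n)
        elif op == "muli":
            ops.append(lambda acc, n=arg: acc * n)
        else:
            raise ValueError(f"unknown guest opcode {op!r}")
    def run(acc):
        for f in ops:
            acc = f(acc)
        return acc
    return run

def execute(addr, block, acc):
    """Run a guest block, translating it only on first encounter."""
    global translations_done
    if addr not in translation_cache:
        translations_done += 1
        translation_cache[addr] = translate(block)
    return translation_cache[addr](acc)

block0 = [("addi", 3), ("muli", 4)]   # guest code at "address" 0
r1 = execute(0, block0, 1)            # first run: translate, then (1+3)*4
r2 = execute(0, block0, 2)            # cache hit: just (2+3)*4
```

A short-lived task would spend most of its time in `translate` rather than `run`, which is exactly the performance problem the quoted post describes.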
> Like you two already mentioned, a lot of research/work has been
> done on this subject.
> I was trying to use this idea for power saving issues. The idea is to
> use compressed code and decompress it on the fly into internal L0 cache
> (e.g. 32 locations only). It is also possible to optimize this code
> quite easily, since 'microcode' is very orthogonal and you can quickly
> see which units are used. (I am using 'microcode' here meaning control
> signals for 1 clock for the CPU pipe).
> It is a bit tricky, however, to keep the addresses of previous
> instructions in the cache here.
> Also it would be really nice if the microcode were a superset of all
> the instructions of all the CPUs you are trying to simulate, e.g. in
> the number of registers. Otherwise you have to use JIT.
> If you do decode with cache line refill (L0 cache), you only get
> additional latency on cache misses and on mispredicted branches.
> Note that (if L0 is properly sized) this should actually save power.
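The compressed-microcode scheme in the quoted post can be sketched as follows; everything here is invented for illustration (the dictionary encoding, the control-word widths, the direct-mapped organisation) except the 32-entry L0 size, which comes from the post. Only L0 misses pay the decompression cost, which is the power-saving claim in miniature:

```python
# Sketch of decompress-on-the-fly microcode into a small L0 cache.
# Program memory holds compact dictionary indices; the 32-entry,
# direct-mapped L0 holds the expanded control words. All parameters
# except the L0 size are assumptions.

L0_SIZE = 32  # expanded control words held in L0 (from the post)

# Invented dictionary of full-width control words (one per clock).
dictionary = {0: 0b0001, 1: 0b0010, 2: 0b0100, 3: 0b1000}

# Compressed code image: one small dictionary index per address.
compressed = [i % 4 for i in range(100)]

l0 = {}              # slot -> (tag, expanded control word)
decompressions = 0   # stands in for the power cost of each miss

def fetch(pc):
    """Return the expanded control word for pc, filling L0 on a miss."""
    global decompressions
    slot, tag = pc % L0_SIZE, pc // L0_SIZE
    if slot in l0 and l0[slot][0] == tag:
        return l0[slot][1]             # hit: no decompression needed
    decompressions += 1
    word = dictionary[compressed[pc]]  # "decompress" via table lookup
    l0[slot] = (tag, word)
    return word

# A tight 8-instruction loop run 10 times: after the first pass,
# every fetch hits L0 and the decompressor stays idle.
trace = list(range(8)) * 10
for pc in trace:
    fetch(pc)
```

The address-keeping trickiness the post mentions shows up here as the tag check: the L0 slot alone is not enough to know which compressed address an expanded word came from.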
> best regards,
To unsubscribe from cores mailing list please visit http://www.opencores.org/mailinglists.shtml