
Re: [openrisc] OR1200 ASIC Success probabilities.



Hi Christian,

Sorry for the delay in replying; I was travelling.

I need to have a look at the RTL regarding spr_pc_we; I will let you know.

I'm thinking it might be good to create a branch for any optimizations so we
don't pollute the main tree. I'm also planning to work on a regression server
that will run many more test cases; other folks interested in different
configurations will need it too, so a good regression infrastructure is
becoming essential.

regards,
Damjan

----- Original Message -----
From: "Christian Melki" <christian.melki@axis.com>
To: <openrisc@opencores.org>
Sent: Wednesday, June 11, 2003 12:10 AM
Subject: Re: [openrisc] OR1200 ASIC Success probabilities.


> Hi again. :)
>
> > Christian,
> >
> > First, I need to say that the OR1200 has intentionally been split into
> > many submodules for a better overview. That does not mean that more
> > modules make a slower design: you could put everything into a single
> > module and it would be just as slow as many modules. However, it is true
> > that if you synthesized the OR1200 submodule by submodule, the results
> > would not be good, as inter-module optimization would be less effective.
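> >
> > A toy illustration of the point (not OR1200 code): a combinational path
> > that crosses a module boundary cannot be merged when each module is
> > synthesized on its own, while a flattened run sees the whole logic cone:
> >
> >     // Hypothetical example: compiled module-by-module, both adders
> >     // remain; flattened, the tool can see that z == x and remove
> >     // the logic entirely.
> >     module a (input  [7:0] x, output [7:0] y);
> >       assign y = x + 8'd1;
> >     endmodule
> >     module b (input  [7:0] y, output [7:0] z);
> >       assign z = y - 8'd1;
> >     endmodule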
> >
> > It is important to point out that most of the effort so far went into
> > making sure the core works properly; only limited effort has gone into
> > speed optimizations. All speed optimizations are welcome (I'm thinking
> > of creating a branch that people could use to optimize the core for
> > speed).
> > Looking at this path, it does look strange indeed. The dbg module and
> > the sprs module up to genpc/spr_pc_we could be part of one path, and the
> > rest should be a different path. The second path, starting at
> > genpc_taken, is part of the insn fetch logic, i.e. the instruction
> > cache / IMMU abort logic. Because the OR1200 has only a 5-stage
> > pipeline, this path is a bit long: a lot of different logic is needed to
> > perform all the steps of an insn fetch (and, more importantly, if the
> > insn is aborted the fetch is stalled, which stalls the pipeline, etc.).
>
> Yes, we took a look at that. Is spr_pc_we only used by the debug unit to
> load the PC with data via mtspr/mfspr, or is it used by any other logic?
> Are there any programs that would use an SPR write to load the PC for any
> practical purpose? Or have I got everything wrong?
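>
> Something like the following is how I picture it (a minimal Verilog
> sketch; apart from spr_pc_we the signal names are invented, so this is
> not the actual OR1200 RTL):
>
>     module pc_reg (
>       input             clk, rst,
>       input             spr_pc_we,  // SPR write to the PC (debug mtspr)
>       input             if_freeze,  // fetch stage stalled
>       input      [31:0] spr_dat_i,  // SPR write data
>       input      [31:0] next_pc,    // normal sequential/branch target
>       output reg [31:0] pc
>     );
>       always @(posedge clk or posedge rst)
>         if (rst)
>           pc <= 32'h0000_0100;      // OR1k reset exception vector
>         else if (spr_pc_we)
>           pc <= spr_dat_i;          // PC loaded directly from the SPR bus
>         else if (!if_freeze)
>           pc <= next_pc;            // normal PC update
>     endmodule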
>
> > It is important to take the speed of your RAMs into account. In some
> > cases the RAMs used for the data/insn caches and for the MMUs are so
> > slow that the paths starting at those RAMs are in fact timing critical,
> > and not the path that you brought up.
> > So in this case I think it is possible to split this path and save 1ns.
> >
> > regards,
> > Damjan
>
> RAM will become an issue later. First we are looking at the madness going
> on inside the CPU. :) If I can't get it to do 200 MHz in 0.13 um without
> RAM blocks, how will it ever get better with them? I must produce a core
> that is lean and mean before even adding RAMs; that is at least how I
> look at things. The paths through the RAMs are utterly predictable (all
> signals pass through the clocked RAM block, whose timing limit is
> defined). My biggest problem right now is all the combinational paths
> that are possibly valid and possibly not; I would like to split those
> paths somehow. Declaring false paths in the synthesis constraints would
> only add ugliness, and I really don't want to do that. For example, after
> we did a temporary hack on the spr_pc_we thingy, I hit another path. Take
> a look at this one, please:
>
> The short version:
>
> in except/clk
> out except/lr_sav
>
> in genpc/binsn_addr
> out genpc/icpu_adr_o
>
> in immu/icpu_adr_i
> out immu/icpu_err_o
>
> in if/icpu_err_i
> out if/if_stall
>
> in freeze/if_stall
> out freeze/ex_freeze
>
> in except/ex_freeze
> out except/flushpipe
>
> in ctrl/flushpipe
> out ctrl/rfwb_op
>
> ..
> Now, what I don't understand is how the address passed to the IMMU can
> produce an err signal before the IMMU has even translated it.
> I do know that if the address is within the same page (i.e. no page
> cross) we really don't need to do another lookup, and that the tests for
> miss and fault check itlb_done, the hit, and the protection bits, as any
> MMU would. :) The problem is: shouldn't the lookup be done completely in
> parallel instead? Or perhaps it is and I just can't see it. The IC lookup
> should take 1 cycle and the IMMU lookup 1 cycle, and then we should
> compare the results, since we feed different parts of the address to the
> IMMU and to the IC.
> As I see it now, the IMMU lookup result is passed to the IC FSM, and only
> one cycle later is it compared against the tag in the IC tag RAM.
> Could someone please explain in a bit more detail how the MMU works? I
> seem to have confused myself trying to figure out when data arrives.. :)
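>
> What I would have expected is something like this (a rough Verilog
> sketch; itlb_ppn, ic_tag, and the other names are my guesses, not the
> actual or1200_immu_top interface):
>
>     module vipt_lookup (
>       input  [31:0]  vaddr,        // virtual fetch address
>       input          immu_en,      // translation enabled
>       input  [31:13] itlb_ppn,     // physical page number from the ITLB
>       input  [31:13] ic_tag,       // tag read from the IC tag RAM
>       input          ic_tag_valid,
>       output         ic_hit
>     );
>       // OR1k pages are 8 KB, so bits [12:0] index both RAMs untranslated
>       // while bits [31:13] go through the ITLB; both lookups can run in
>       // the same cycle, with only the tag compare afterwards.
>       wire [31:13] ppn = immu_en ? itlb_ppn : vaddr[31:13];
>       assign ic_hit = ic_tag_valid & (ic_tag == ppn);
>     endmodule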
>
> Oh, and attached are a couple of PDFs (of tgif objects) that we have
> drawn in our project. They might be quite useless to you, but I like to
> have a graphical representation handy to look at. (I haven't got the
> insta-matrix-world-from-code view down just yet. ;)
>
> best regards
> Christian M
>
