#amaranth-lang on 2022-11-08 — irc logs at libera.irclog.whitequark.org

2021-12-11 06:40 whitequark changed the topic of #amaranth-lang to: Amaranth hardware definition language · code https://github.com/amaranth-lang · logs https://libera.irclog.whitequark.org/amaranth-lang

00:50 lf has quit [Ping timeout: 248 seconds]

00:51 lf has joined #amaranth-lang

00:59 cesar has quit [Read error: Software caused connection abort]

00:59 cesar has joined #amaranth-lang

01:39 electronic_eel has quit [Read error: Software caused connection abort]

01:44 electronic_eel has joined #amaranth-lang

01:48 <_whitenotifier> [YoWASP/yosys] whitequark pushed 1 commit to develop [+0/-0/±1] https://github.com/YoWASP/yosys/compare/805a79256e0e...b0d3656d4ea1

01:48 <_whitenotifier> [YoWASP/yosys] whitequark b0d3656 - Update dependencies.

02:37 ovf has quit [Read error: Software caused connection abort]

02:37 ovf has joined #amaranth-lang

03:48 electronic_eel has quit [Ping timeout: 260 seconds]

03:48 electronic_eel has joined #amaranth-lang

03:54 Degi_ has joined #amaranth-lang

03:55 Degi has quit [Ping timeout: 252 seconds]

03:55 Degi_ is now known as Degi

04:36 gruetzkopf has quit [Read error: Software caused connection abort]

04:36 gruetzkopf has joined #amaranth-lang

05:52 jevinskie[m] has quit [Read error: Software caused connection abort]

05:53 jevinskie[m] has joined #amaranth-lang

06:26 bob_twinkles_ has quit [Read error: Software caused connection abort]

06:27 bob_twinkles has joined #amaranth-lang

06:47 richardeoin has quit [Ping timeout: 248 seconds]

06:47 richardeoin has joined #amaranth-lang

08:51 richardeoin has quit [Ping timeout: 260 seconds]

08:52 richardeoin has joined #amaranth-lang

14:23 Mrmaxmeier has quit [Quit: The Lounge - https://thelounge.chat]

14:24 Mrmaxmeier has joined #amaranth-lang

17:08 <d1b2> <gatin00b> I'm currently working on a project which takes an impractical amount of time to convert to Verilog. Not being a Python programmer, but wanting to learn more Amaranth internals, I'm looking for guidance on debugging the issue

18:09 Lord_Nightmare has quit [Quit: ZNC - http://znc.in]

18:11 Lord_Nightmare has joined #amaranth-lang

18:28 <tpw_rules> i feel like i remember that problem long ago with large memories and simulation

18:33 <d1b2> <gatin00b> Yeah, more likely due to the use of Array, but I don't really see a way around that. So to me, it's either we find a way to make Amaranth generate the RTLIL/Verilog in a practical time frame either by finding better ways to code the design or despite the way it was coded... or I have to use something else. Since I've got time and energy, I'd rather help improve the Amaranth ecosystem than drop it because it's not mature enough for my use case.

18:34 <tpw_rules> i mean how big are your Arrays?

18:34 <tpw_rules> the quirk i recall i thought was fixed though

18:35 <tpw_rules> but maybe that was just for Memory and the Array has the same underlying issue

18:35 <d1b2> <gatin00b> Array((Signal(4) for idx in range(256)))

18:35 <tpw_rules> what are you doing with that? can you not just have a 1024 bit signal and slice it up later?

18:38 <tpw_rules> iirc there's a function to do that even

18:38 <d1b2> <gatin00b> It's used as a sort of async read memory. slice accesses are dynamic(?) and this is my first Amaranth design.

18:38 <d1b2> <gatin00b> Array was what was recommended for the use case

18:41 <tpw_rules> are the docstrings put online anywhere?

18:42 <d1b2> <gatin00b> Is that question directed at me?

18:42 <tpw_rules> not directly

18:44 <tpw_rules> yeah you can use Value.word_select(x, 4) to replace Value[x]

18:44 <tpw_rules> maybe that can be a custom subclass of Value. or you can wait a bit for someone else to weigh in

18:44 <d1b2> <gatin00b> Good, 'cause I don't know what that meant

18:45 <tpw_rules> i mean if you want to learn python and amaranth's structure that would be something good to chew on

18:46 <tpw_rules> subclass Signal, override __getitem__ to call the super's word_select

18:46 <d1b2> <gatin00b> Thanks, I'll wait a bit, but I'm willing to give it a try and see if it improves the situation.

18:49 <d1b2> <gatin00b> Are there any tools that would allow me to trace the process to get some insight of what is going on and why the process is slow?

18:49 <tpw_rules> there is a basic profiler built into python itself

18:50 <tpw_rules> and iirc the problem is in the python logic, not yosys

18:51 <d1b2> <gatin00b> I would think so as the process is stuck in python land generating the rtlil, and not the yosys processing
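
The built-in profiler mentioned above is `cProfile`. A minimal sketch of how it could be pointed at a slow call is shown below; `slow_elaboration` is a hypothetical stand-in for whatever step is slow (for example the Verilog conversion), and `profile_call` is an illustrative helper, not part of Amaranth.

```python
import cProfile
import io
import pstats


def slow_elaboration(n=200):
    # Hypothetical stand-in for the slow step (e.g. converting a design to Verilog).
    total = 0
    for i in range(n):
        for j in range(n):
            total += i ^ j
    return total


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


if __name__ == "__main__":
    _, report = profile_call(slow_elaboration)
    print(report)
```

A whole script can also be profiled without code changes via `python -m cProfile -s cumulative script.py`.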

18:53 <d1b2> <gatin00b> Thanks for the help

19:22 <_whitenotifier> [YoWASP/yosys] whitequark pushed 1 commit to develop [+0/-0/±1] https://github.com/YoWASP/yosys/compare/b0d3656d4ea1...8b1bc9b30bed

19:22 <_whitenotifier> [YoWASP/yosys] whitequark 8b1bc9b - Update dependencies.

20:00 <_whitenotifier> [YoWASP/yosys] whitequark pushed 11 commits to release [+0/-0/±13] https://github.com/YoWASP/yosys/compare/32bb4208bd8e...8b1bc9b30bed

20:00 <_whitenotifier> [YoWASP/yosys] whitequark 11a2b93 - [skip ci] README: add a note on build and development platforms.

20:00 <_whitenotifier> [YoWASP/yosys] whitequark d19d108 - Update dependencies.

20:00 <_whitenotifier> [YoWASP/yosys] whitequark 1fba5d1 - Update dependencies.

20:00 <_whitenotifier> [YoWASP/yosys] ... and 8 more commits.

20:37 <d1b2> <gatin00b> And after hours of processing: amaranth._toolchain.yosys.YosysError: Could not find an acceptable Yosys binary. The `amaranth-yosys` PyPI package, if available for this platform, can be used as fallback

20:37 <d1b2> <gatin00b> That one is on me

20:38 <whitequark> that sounds like you have an exponential expansion somewhere where you're using Arrays, or where you are using part-select on LHS as a part of a complex expression

20:38 <whitequark> tpw_rules: you can't subclass Value

20:38 <whitequark> you can use ValueCastable though

20:39 <d1b2> <gatin00b> At least, now I'm sure it doesn't get stuck in an infinite loop

20:39 <d1b2> <gatin00b> I guess I am, how should I address that?

20:40 <tpw_rules> like i said i recall there being a problem in how Memory was simulated that led to this behavior. it wasn't hours though, especially for something only 1024 bits. but maybe it's being used a lot more

20:40 <whitequark> tpw_rules: gatin00b is generating Verilog, no?

20:41 <d1b2> <gatin00b> Yes

20:41 <d1b2> <gatin00b> It simulates within a sec I think

20:41 <d1b2> <gatin00b> Or not much more

20:41 <whitequark> yeah, it doesn't have anything to do with Memory then

20:41 <d1b2> <gatin00b> Which is why it's surprising it's so long when converting to verilog

20:42 <tpw_rules> ok. well wq is undoubtedly the expert here :)

20:42 <whitequark> gatin00b: first off: this is not really an Amaranth issue. you would have the same problem in Verilog if you attempted to use the same approach that Array expands down into

20:42 <whitequark> it is just that in Verilog you would have to write down what sounds like several gigabytes of code to do so, which people are usually unwilling to do

20:42 <whitequark> could you show me your code, please?

20:43 <d1b2> <gatin00b> https://github.com/ylm/nibblecpu

20:43 <d1b2> <gatin00b> I've found some other issue in Amaranth's elaboration, which I've sidestepped, but haven't committed yet.

20:44 <whitequark> aha, I see

20:44 <whitequark> in smolcpu.py, every time you use data_memory[], it expands into a 256-input multiplexer

20:45 <d1b2> <gatin00b> a) That is pretty often b) why everytime?

20:45 <whitequark> in addition, when you use several data_memory[] in the same m.d.sync statement, it effectively expands into a 16777216-input multiplexer
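
A quick arithmetic check of the figure quoted above, under the assumption (not stated in the log) that it comes from three independently indexed lookups into the 256-entry Array being combined in one statement. The helper name is hypothetical.

```python
def combined_mux_inputs(array_len, lookups):
    """Number of input combinations when `lookups` independent indexed
    reads of an `array_len`-entry Array feed a single statement."""
    return array_len ** lookups


if __name__ == "__main__":
    # One lookup is a 256-input mux; each extra independent lookup
    # multiplies the number of combinations by 256 again.
    for n in (1, 2, 3):
        print(n, combined_mux_inputs(256, n))
```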

20:47 <tpw_rules> why did you decide to use Array instead of Memory, actually? if it's just because it's async, you can have async ports on a Memory just fine. but they won't be synthesizable to a real FPGA. not that a 16777216-input mux is either

20:47 <d1b2> <gatin00b> Okay... so steer clear of Array in my use case. Then I'm very curious how the Array class should be used

20:48 <d1b2> <gatin00b> Because it's effectively not a memory, and you're right in that it is more of a 1024-bit sliced register

20:48 <d1b2> <gatin00b> That's what was recommended as an alternative to bit slicing

20:48 <whitequark> why everytime: Amaranth is a low-level language that makes no attempt to determine which branches of the `m.Switch(self.opcode)` or `m.Switch(self.operandX)` are mutually exclusive, or which indexed lookups of `data_memory[]` have the same index. so it assumes that all of the m.d.sync/m.d.comb/etc statements you have written can execute in parallel. so for every one, it emits a separate multiplexer, since they could all have different inputs

20:49 <whitequark> it's a bit like a powerful macroassembler, if you're familiar with those

20:49 <whitequark> and Array is very much like a macro

20:49 <d1b2> <gatin00b> OOOOOOOOOOOOOOOOOOOOOOOOOOOOOH

20:49 <d1b2> <gatin00b> Now it makes sense

20:50 <whitequark> I'm glad it does ^_^

20:51 <d1b2> <gatin00b> memory would really be impractical for my use case so bit slicing makes much more sense

20:51 <whitequark> you could try hoisting `data_memory[x]` for identical `x` into top-level signals (i.e. add some `m.d.comb += data_memory_x.eq(data_memory[x])`)
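
The effect of the hoisting suggestion can be sketched as a simple count, assuming one multiplexer per indexed lookup when nothing is shared and one per distinct index once identical lookups are hoisted into shared signals. This is a plain Python model, not Amaranth code, and the index names are made up for illustration.

```python
def count_muxes(statements):
    """Return (naive, hoisted) multiplexer counts.

    `statements` is a list of statements, each given as the list of index
    expressions (as strings) it uses to look up the array.
    """
    # Naive: every lookup occurrence gets its own multiplexer.
    naive = sum(len(lookups) for lookups in statements)
    # Hoisted: lookups with an identical index share one multiplexer.
    hoisted = len({index for lookups in statements for index in lookups})
    return naive, hoisted


if __name__ == "__main__":
    # Hypothetical statements; the index names are illustrative only.
    statements = [
        ["base-3", "base-2", "base-1"],
        ["base-1"],
        ["operand_x", "base-3"],
    ]
    print(count_muxes(statements))
```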

20:51 <tpw_rules> it's a memory though, right?

20:51 <d1b2> <gatin00b> It's complicated and ultimately not my design. Just my implementation.

20:52 <whitequark> and for things like `Cat(data_memory[0x10+(3*self.sp)-3],Cat(data_memory[0x10+(3*self.sp)-2],data_memory[0x10+(3*self.sp)-1]))`, try grabbing the entire three-nibble word as a whole and then operating on it as a bit vector

20:52 <d1b2> <gatin00b> Some part of that memory is used as GPRs

20:52 <whitequark> the second suggestion is actually the one that I think will make synthesis times on your design manageable

20:52 <whitequark> the first one, applied on top, will make it actually compact
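
The second suggestion (grab the three-nibble word as a whole) can be modelled in plain Python, assuming nibble `i` of the memory is packed at bits `4*i` to `4*i+3` of one wide value. The function names are hypothetical; in Amaranth the single-slice read would presumably map onto a part select such as `bit_select` on a 1024-bit signal, which has not been checked against the nibblecpu sources.

```python
NIBBLE_BITS = 4


def pack_nibbles(nibbles):
    """Pack a list of 4-bit values into one integer, nibble 0 in the low bits."""
    packed = 0
    for index, nibble in enumerate(nibbles):
        packed |= (nibble & 0xF) << (index * NIBBLE_BITS)
    return packed


def read_three_lookups(nibbles, base):
    """Three separate indexed lookups, concatenated low nibble first
    (the first argument of Cat occupies the low bits)."""
    low, mid, high = nibbles[base - 3], nibbles[base - 2], nibbles[base - 1]
    return low | (mid << 4) | (high << 8)


def read_word_slice(packed, base):
    """One variable shift plus a fixed 12-bit mask over the packed value."""
    return (packed >> ((base - 3) * NIBBLE_BITS)) & 0xFFF


if __name__ == "__main__":
    nibbles = [(i * 7 + 3) & 0xF for i in range(256)]
    packed = pack_nibbles(nibbles)
    sp = 2
    base = 0x10 + 3 * sp
    print(hex(read_three_lookups(nibbles, base)), hex(read_word_slice(packed, base)))
```

Both functions return the same 12-bit word; the difference is that the second form needs one wide selection instead of three independent ones.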

20:53 <whitequark> tpw_rules: it's an async memory with too many read ports. ultimately using an Array here carefully is not any worse than using a Memory with async read ports, but it will actually synthesize

20:53 <whitequark> it seems like it'd be pretty large when synthesized, but since this is an existing design, I'm not sure if much can be done about that

20:55 <d1b2> <gatin00b> Alright, I'll revise the code based on that and try to write some sort of blog post on Array and memory after.

20:56 <whitequark> at some point this should be the part of first-party documentation, but yeah

20:57 <d1b2> <gatin00b> Yeah, for hardware purposes, it's not a great design. The original design is meant to be emulated on a uC and this one is really a learning exercise more than anything

20:57 <whitequark> I'm not sure if it'll actually fit into the tapeout constraints, to be honest

20:57 <whitequark> from a glance at your code, you're going to have a dozen 256-wide multiplexers

20:57 <whitequark> (at least)

20:57 <d1b2> <gatin00b> Yeah, I want to bring it there, but I'll work with what I can first then refer to you how you best want to integrate into Amaranth's doc

20:58 <d1b2> <gatin00b> I know

20:58 <whitequark> oh, now that I've looked at the terms of the contest, I think I see where a misunderstanding may lie

20:59 <whitequark> you don't actually have to have this many read ports. you could write an FSM that sequences reads and implement it with a Memory with even a single read port (which is how CPUs are often implemented)
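
A behavioural sketch of the sequencing idea, assuming a memory that can serve only one read per cycle and a small state machine that spends one cycle per nibble. This is plain Python for illustration; the class and function names are hypothetical and it does not use Amaranth's Memory.

```python
class SingleReadPortMemory:
    """Model of a memory with one read port: one address per call to read()."""

    def __init__(self, contents):
        self._contents = list(contents)
        self.reads = 0

    def read(self, address):
        self.reads += 1
        return self._contents[address]


def sequenced_read(memory, addresses):
    """Tiny FSM: the state selects which address is read this cycle.

    Returns (word, cycles), with the first address landing in the low nibble.
    """
    state = 0
    word = 0
    cycles = 0
    while state < len(addresses):
        nibble = memory.read(addresses[state])  # the only read this cycle
        word |= (nibble & 0xF) << (4 * state)
        state += 1
        cycles += 1
    return word, cycles


if __name__ == "__main__":
    memory = SingleReadPortMemory(range(16))
    print(sequenced_read(memory, [13, 14, 15]))
```

The trade-off is latency: the three-nibble word takes three cycles instead of one, in exchange for a single read port.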

21:00 <tpw_rules> i don't think this is an existing design

21:00 <tpw_rules> there's nothing in the contest about clocks per instruction :)

21:01 <tpw_rules> could you just implement a PIC and submit that i wonder

21:01 <whitequark> yeah, I misunderstood what "an existing design" meant in this context

21:01 <whitequark> tpw_rules: I think a PIC might not fit either?

21:01 <d1b2> <gatin00b> Yes, but first goal was not to get a practical design for tapeout, just a working design in Amaranth. Figuring out how to make tapeout happen comes after

21:02 <whitequark> I see

21:02 <tpw_rules> have you done digital logic design before?

21:02 <d1b2> <gatin00b> Contest asked to replicate the badge. I decided to make it cycle accurate

21:02 <d1b2> <gatin00b> Plenty

21:03 <d1b2> <gatin00b> This is my first Amaranth design and will be my first ASIC design, but I've done plenty of FPGA designs

21:04 <adamgreig[m]> cycle accurate to the pic running it?

21:05 <d1b2> <gatin00b> I'm still on the fence about getting it to ASIC in the first place, but if it makes Matt happy and we get to prove another Amaranth design on silicon, then that's a good win

21:05 <d1b2> <gatin00b> My contract with AMD is going out in December, so learning Amaranth is more of skill development for a new job.

21:06 <d1b2> <gatin00b> No, cycle accurate to the badge as documented

21:06 <whitequark> oh, nice! best of luck on your endeavours

21:10 <d1b2> <gatin00b> Thanks. In the process, I hope to help improve Amaranth.

21:10 <d1b2> <gatin00b> By the way, really looking forward to your live stream

21:10 <whitequark> oh yeah! I need to set that up

21:14 <d1b2> <gatin00b> There's glimesh.tv which I think you might prefer over Twitch, for your use case, but that's really off topic

21:15 <whitequark> oh, that actually looks really handy, thank you!

21:17 <d1b2> <gatin00b> Yeah, I was reading the tweets and figured it'd be a good match, but I really don't want to derail anything.

22:42 nelgau has quit []

22:44 nelgau has joined #amaranth-lang

23:39 cr1901 has quit [Read error: Connection reset by peer]

23:46 cr1901 has joined #amaranth-lang