<josHua[m]>
1) "no, sort of": waveform debugging is a very powerful tool! having a widely versatile iLA with good debug visibility is a very powerful tool! the downside is that you need to know everywhere you want to insert iLA stuff, but you sort of have to know that about printf, too. if I had one thing I would ask for in the FOSSI world w.r.t debugging... (full message at
<whitequark[cis]>
note that the infrastructure for this is virtually identical to that of ILA
<whitequark[cis]>
so it kind of automatically presumes you also get an ILA that works at least as well
<josHua[m]>
hm I am not sure I understand. a 'many-head printf' (i.e., NOTIFY) is roughly a many-fifo-to-one mux. an ILA is a single FIFO (or ring buffer, for that matter) with some gate conditions
<whitequark[cis]>
the Glasgow ILA is more complex than that
<whitequark[cis]>
for example it also reports transactions on the AXI-Stream-like interface between the applet and the FX2
<whitequark[cis]>
(as transactions, not just infinite streams of "ready, not valid")
<whitequark[cis]>
when I say "ILA" here I just mean "displayed as waveforms"
<josHua[m]>
nod, I have not looked at the Glasgow ILA in specific. I think of, like, the structure neutral implementation neutral ILA indeed as 'grab a bunch of signals, and display them as a waveform'
<josHua[m]>
like Xilinx Chipscope is a structure neutral implementation neutral ILA
<whitequark[cis]>
are we discussing whether chipscope is a sandwich?
<josHua[m]>
(I have no idea what the axes are.)
<whitequark[cis]>
(it is definitely a sandwich btw. mmmm, sandwich)
<josHua[m]>
or if it is an ILA, at least, yes
<whitequark[cis]>
so I guess our disagreement is over triggers and collection
<whitequark[cis]>
we agree that "a bunch of signals are sampled" and then "they are displayed graphically [as a waveform diagram]"
<whitequark[cis]>
however I'm thinking of providing the ability, as base, to only sample signals if a condition is met, when designing an upcoming Amaranth ILA
<whitequark[cis]>
with m.If(running): m.d.sync += Sample(blah) or something
<josHua[m]>
so I guess the 'structure neutral, implementation rebel' ILA is "an ILA is something that feeds into any FIFO anywhere and outputs any waveform anywhere" and is like "printf() to an 80x25 console buffer in DRAM when connected to a CRT via a RAMDAC is an ILA"
<josHua[m]>
but yes, a 'gate to collect sample' is fine
notgull has quit [Ping timeout: 255 seconds]
<josHua[m]>
i.e., rather than 'always on clock, push data bus to FIFO', it is fine for 'always on clock, if programmable sample criterion is met, push data bus to FIFO'
<josHua[m]>
as well as 'always on clock, if programmable sample criterion is met, and it has been fewer than n samples after the trigger, push data bus to FIFO'
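The three capture policies described above can be modelled on the host in a few lines. This is a minimal sketch, not any existing ILA's implementation: `capture`, `sample_if`, `trigger`, `depth` and `post_trigger` are hypothetical names, and counting the trigger sample itself as the first post-trigger sample is an assumption.

```python
from collections import deque

def capture(samples, sample_if, trigger, depth, post_trigger):
    """Model a gated capture; returns the (cycle, value) pairs left in the buffer."""
    fifo = deque(maxlen=depth)   # ring buffer: the oldest entries fall off
    remaining = None             # None until the trigger has fired
    for cycle, value in enumerate(samples):
        if remaining is None and trigger(value):
            remaining = post_trigger      # start counting post-trigger samples
        if not sample_if(value):
            continue                      # gate closed: nothing pushed this clock
        if remaining is None:
            fifo.append((cycle, value))   # pre-trigger history
        elif remaining > 0:
            fifo.append((cycle, value))
            remaining -= 1
    return list(fifo)

# Sample only even values, trigger on 4, keep 2 samples from the trigger onwards.
captured = capture(range(10), lambda v: v % 2 == 0, lambda v: v == 4,
                   depth=4, post_trigger=2)
```

With `sample_if` always true and no trigger this degenerates into the plain "always on clock, push data bus to FIFO" case.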
<whitequark[cis]>
what I mean is that if you have the infra to wire all of those signals (from printfs or not) to FIFOs, generate these FIFOs on demand, handle overflows, connect them to a narrower FIFO, do some sort of compression, inject this into your existing design, etcetc you can use it equally well for prints or for waveform capture
<whitequark[cis]>
the only thing that's really different is how you display it. so if you turn the prints into text on the FPGA, which you could, and it would be useful, then you probably don't want to display waveforms while you're at it
<whitequark[cis]>
but if you turn the prints into text on the PC, you might as well draw some waveforms while you're at it
<josHua[m]>
yeah, I think that's a fine approach (and is, basically, what NOTIFY is), but I'm not sure it is a hugely useful paradigm for debugging digital logic?
<whitequark[cis]>
huh? it degenerates into chipscope if you use just one trigger
<whitequark[cis]>
how could it be not useful and chipscope be useful if the former is a strict superset of the latter
<josHua[m]>
the mode of thought is that rather than collecting a single data bus of n bits on a trigger, you may be collecting n events on a trigger. if you're triggering every single clock, and you can only push one trigger to the main fifo per clock, you sort of end up in trouble (or you need very large event-input FIFOs)
<whitequark[cis]>
consider: 1) each of those events can include an entire data bus
<whitequark[cis]>
2) if your triggers aren't fixed at build time to "one LUT configurable over JTAG and one really plain mux per channel" you don't have to continuously capture into a tiny buffer after a trigger
<whitequark[cis]>
you can just discard all of the samples you don't care about, provided the timing relationship (absolute timestamps or even just "was there more than one cycle dropped") is preserved
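Both ways of preserving the timing relationship can be sketched on the host side. This is an illustration only; `with_timestamps`, `with_gap_flag` and `keep` are hypothetical names, not part of any existing tool.

```python
def with_timestamps(samples, keep):
    """Keep only interesting samples, each tagged with its absolute cycle number."""
    return [(cycle, value) for cycle, value in enumerate(samples) if keep(value)]

def with_gap_flag(samples, keep):
    """Keep only interesting samples, each tagged with a single 'gap' bit that is
    True when one or more cycles were dropped immediately before it."""
    out = []
    dropped = False
    for value in samples:
        if keep(value):
            out.append((dropped, value))
            dropped = False
        else:
            dropped = True
    return out

bus = [5, 0, 0, 7, 8, 0, 9]          # 0 stands for "idle, nothing worth storing"
stamped = with_timestamps(bus, lambda v: v != 0)
flagged = with_gap_flag(bus, lambda v: v != 0)
```

The gap flag costs one bit per stored entry, while a timestamp costs as many bits as the cycle counter is wide.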
<whitequark[cis]>
so, yes, you might easily end up with a stupidly wide FIFO, which you probably want to be shallow. if you're targeting a xilinx platform you want it to be probably at most like 16 entries deep in order to maximize the use of a single slice per bit
<josHua[m]>
I guess mostly I am just not sold on it for debugging because I do not see why it is more useful than an appropriately-powerful single-headed ILA, and has certainly a lot more cognitive load than just "drop a printf" 🙂 but I am a grognard stuck in my ways and I would be interested to be wrong!
<whitequark[cis]>
if you know how to use an ILA and can read waveforms easily you aren't really the target audience
<josHua[m]>
I think also the sorts of problems I debug in my digital logic designs are the sorts of things where I have a fundamental assumption wrong, and my debugging bandwidth is not increased by having a dramatically more powerful debugger that can see more things at once (beyond a point), but by having more debug inputs available, and by, like, staring harder into a blank screen in hope or something
<josHua[m]>
yeah, it may be the case that this is not for me. I guess the caution in *that* case is that reading waveforms is sort of necessary for debugging digital logic. the core concept of digital logic being "everything happens at the same time, and is a circuit" is the problem that I find many software engineers running into. providing more tooling to reinforce the illusion that digital logic can be described imperatively, IMO, is sort of doing those engineers a disservice; the higher quality tooling we can provide that makes it easier to understand that everything happens in parallel, the better of a service we do these early-stage digital logic designers, and the sooner they will stop writing imperative code under the hope that it might turn into something synthesizable
redstarcomrade has joined #glasgow
redstarcomrade has quit [Changing host]
redstarcomrade has joined #glasgow
<whitequark[cis]>
actually, there's one big thing that's not covered by this
<whitequark[cis]>
well, no, first I have to remark that I don't care about legacy languages like Verilog, I care about Amaranth, and Amaranth doesn't have non-synthesizable constructs (even in simulation)
<whitequark[cis]>
so the entire problem you describe just doesn't exist
<whitequark[cis]>
the closest they'll come to it is writing a logic loop and trying to understand why it's a hard error
<josHua[m]>
sure it does. by 'non-synthesizable' I mean also 'typing in programs that map to gates, but with a critical path measured in milliseconds, or that map to more gates than have ever existed on any FPGA ever made', or something like that
<whitequark[cis]>
that's unplaceable or unroutable thank you very much
<whitequark[cis]>
I would expect an expert designer like you to know the difference :p
<josHua[m]>
I am but a novice
<whitequark[cis]>
oh, you didn't do HDL at NVIDIA?
<whitequark[cis]>
anyway, I think the difference is really important because "a program that has entirely well-defined, deterministic execution but doesn't fit onto your device" is a far cry from "a program that fits on your device but performs complete garbage unrelated to designer intent when run"
<josHua[m]>
I did architecture at nvidia, and the RTL engineer who wrote the Viva (write-only perl that generated verilog) sat one cube away from me. but I do write HDL professionally now for my clients, but they hire me because they don't have a real FPGA development team 😉
<whitequark[cis]>
anyway, the thing that's not covered by any of this discussion is that you might even know about waveforms, and use them in simulation, and use testbenches, but then you run your design on an FPGA for the very first time, and nothing works, because suddenly you have to care about IO
<josHua[m]>
(I think I do a fine job at it. but maybe my point is that we are all novices.)
<josHua[m]>
yeah, "nothing works because you have to care about IO" is a very relatable feeling
<whitequark[cis]>
and maybe you don't know that buttons bounce. so now you spend two weeks figuring out how the fuck you get an ILA to work with yosys+nextpnr and now you give up and never touch FPGAs again for two years
<josHua[m]>
yep, I am in deep deep deep agreement with this
<whitequark[cis]>
meanwhile this capability only needs to print two (2) messages over UART for one (1) button press to resolve that very common misconception
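To make the bounce example concrete, here is a small host-side model of a print attached to the rising edge of an undebounced button. The waveform in `bouncy` is made up for illustration and `rising_edges` is a hypothetical helper.

```python
def rising_edges(signal):
    """Emit one message per 0 -> 1 transition, as an edge-triggered print would."""
    prev = 0
    messages = []
    for cycle, level in enumerate(signal):
        if level and not prev:
            messages.append(f"cycle {cycle}: button pressed")
        prev = level
    return messages

# One physical press: the contact closes, bounces open once, then settles.
bouncy = [0, 0, 1, 0, 1, 1, 1, 1, 0, 0]
messages = rising_edges(bouncy)
```

Seeing two "button pressed" lines for a single press is exactly the hint a beginner needs, without ever opening a waveform viewer.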
<whitequark[cis]>
basically, if we could magically give everyone an ILA that doesn't need setup we should do that. but empirically we can't
<josHua[m]>
sure, I guess I agree with that also
<whitequark[cis]>
and to the extent that we can, it will also dramatically improve the efficiency of prints at the same time
<whitequark[cis]>
like... you just capture all of the data bits of all of the print arguments, plus one enable per print
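A rough sketch of what "all of the data bits of all of the print arguments, plus one enable per print" could look like as a single capture word, with the formatting done on the PC. The `PRINTS` table, its widths and the bit layout are assumptions made up for this example.

```python
# Hypothetical print sites: (format string, argument widths in bits).
PRINTS = [
    ("addr={:#x} data={:#x}", (16, 8)),
    ("count={}", (4,)),
]

def pack(fired):
    """fired maps a print index to its argument tuple; returns one wide word."""
    word, shift = 0, 0
    for index, (_, widths) in enumerate(PRINTS):
        args = fired.get(index)
        word |= int(args is not None) << shift    # one enable bit per print
        shift += 1
        for pos, width in enumerate(widths):
            value = args[pos] if args is not None else 0
            word |= (value & ((1 << width) - 1)) << shift
            shift += width
    return word

def unpack(word):
    """Host side: split the word back up and format the prints that fired."""
    lines = []
    for fmt, widths in PRINTS:
        enabled = word & 1
        word >>= 1
        args = []
        for width in widths:
            args.append(word & ((1 << width) - 1))
            word >>= width
        if enabled:
            lines.append(fmt.format(*args))
    return lines

lines = unpack(pack({0: (0x1234, 0xAB), 1: (9,)}))
```

Here the word is 1 + 16 + 8 + 1 + 4 = 30 bits wide, and the text only exists on the host.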
<whitequark[cis]>
is that wide? yes, but I've also seen professional designers casually throw three chipscopes each at a 64-bit AXI bus
<whitequark[cis]>
so like... clearly a few hundred bits is basically change. I think it still ran at close to a hundred MHz (on a vu19p but still)
<josHua[m]>
I guess my argument is that 'an ILA that doesn't need [much] setup that dumps events to a listener of some kind that feeds them into waveforms in gtkwave' is a much more well-constrained problem than 'being able to print arbitrarily formatted things from many sources', and if I got to choose which I would spend someone's time on, I would choose to spend their time on the first one. on the other hand, *I am not the one doing the work*, so my opinion is maybe interesting to me, but I wouldn't want to stop someone from doing the whole thing if they were gonna do it
<whitequark[cis]>
but ... formatting prints is way easier than drawing waveforms
<whitequark[cis]>
have you looked into gtkwave?
<whitequark[cis]>
have you used gtkwave?
<whitequark[cis]>
the second question is not serious, the first one is
<josHua[m]>
I have spent years resisting trash talking gtkwave because I absolutely do not want to write one myself.
<josHua[m]>
but I would imagine it is an enormous amount of GTK and C.
<whitequark[cis]>
of some of the most garbage C you'll ever see in your life
<whitequark[cis]>
I think I particularly liked the place where it had to replicate MSVC bitfield ABI on Cygwin because it committed to using that as the stable SHM ABI
<whitequark[cis]>
deeply unserious application
<josHua[m]>
I would be happy to just have a vcd dumped out and be able to point a waveform viewer of my choice at it. there are a handful of other vcd viewers that are starting to become a thing
<whitequark[cis]>
I would not be happy implementing that because VCD is also a trash-tier format
<whitequark[cis]>
how do I represent an FSM state name in VCD?
<whitequark[cis]>
or an enum value?
<whitequark[cis]>
that's fucking right, you can't.
<whitequark[cis]>
because who would ever want to do that, right?
<josHua[m]>
(there are not a handful of other vcd viewers that are finishing becoming a thing, because as soon as everyone writes a renderer, they go "well, that sure was the fun part!" and then they stop working on it)
<whitequark[cis]>
and also the whole concept of VCD where you save a twenty gigabyte full view of your simulation because it runs like dogshit and you leave it overnight only to then go and try to dissect it using your equally dogshit viewer is so 80s I think it will spontaneously manifest a disco ball if you leave it for a few weeks more to itself
<josHua[m]>
ideally the vcd or whatever contains raw data and you have an interpretation map, but yes, I indeed get the problem with existing VCD viewer tooling
<josHua[m]>
in the specific case of an ILA output, a VCD makes somewhat more sense because you just don't have that much bandwidth out of the chip to the host
<whitequark[cis]>
(yes it works with Surfer and the simulation engine that I wrote, yes it will be integrated with Amaranth)
<whitequark[cis]>
it's not about the bandwidth, it's about the utter poverty of the data representation
<whitequark[cis]>
I have a language with advanced 90s concepts such as "structs" and "enums", what do I do now? these are the things that exist on the internal buses
<josHua[m]>
at some point, there are bits in flops and on wires, and arguably that is what exists on the internal bus, and anything else is a view on that
<josHua[m]>
abstractions are useful but there are things underneath them
<josHua[m]>
he says, sweeping some electron counts and metastability under the rug
<whitequark[cis]>
josHua[m]: so why don't you use flop geographical locations as your VCD signal names?
<whitequark[cis]>
the wire names are pure abstractions too, they sure don't exist on the device
<whitequark[cis]>
or "buses" for that matter
<josHua[m]>
it wouldn't be an indefensible decision for an ILA's internal representation output
<whitequark[cis]>
wrong. it would be
<whitequark[cis]>
ILAs exist to serve the designer. if I am the designer and I put an enum inside a chip my debugging tools better fucking show me enum values so I don't have to print it out and hold it next to my waveform
<josHua[m]>
the tools can show you that, that's fine
<whitequark[cis]>
sure, and you're talking about using VCD, which directly impedes that
<whitequark[cis]>
it's not even a good representation for a bunch of bit level changes, with an overhead of at least 24x
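One way to arrive at a figure like 24x (whether this is the calculation meant here is an assumption): in VCD a scalar value change is written as the value character, the identifier code and a line break, so a single changed bit costs at least three bytes of text, before counting any `#` timestamp lines.

```python
def vcd_scalar_change(value, ident):
    """Text of one scalar value change in a VCD dump, e.g. '1!' plus a newline."""
    return f"{value}{ident}\n"

line = vcd_scalar_change(1, "!")          # shortest possible identifier code
bits_on_disk = len(line.encode("ascii")) * 8
overhead = bits_on_disk / 1               # one bit of actual information
```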
<josHua[m]>
my argument here is that 'strictly reading a bit-level debug definition and displaying to the user is wrong', not 'bit-level debug definitions should be represented in the output to the user'
<whitequark[cis]>
have more ambition. ask more of your tools. don't settle for this garbage.
<josHua[m]>
yeah, I also use 'VCD' to mean in this case 'something VCD-like' (FST2, LXT, whatever are fine)
<whitequark[cis]>
those are even worse because at least VCD has a format description (specification, if we're being generous)
<josHua[m]>
but I think my claim is that the issue is not the conceptual time-series mapping of bits, but the issue is that gtkwave and its ilk are not willing to do something more interesting with them
<josHua[m]>
I am perfectly happy to have a CXXRTL time-series bit mapping as the intermediate representation that some tool interprets on my behalf
<josHua[m]>
I use VCD a lot because pyvcd 1) sucks, but 2) exists, and so I use it as an interchange format that I convert Saleae traces to, output from my simulations, have renderers that take as input, use as input to my simulations, ...
<whitequark[cis]>
well if we leave aside the case of "the FPGA directly transmits data by UART", then prints end up in a time series database one way or another
<whitequark[cis]>
you definitely still want to implement it as a textual log if what you're hunting is transactions on AXI, those are no fun to read off waveforms
<whitequark[cis]>
(yes, you have to if your peripheral is completely broken and can't pipeline properly or hangs, but most of the time doing SoC debugging it's not that)
<josHua[m]>
yeah, that's fine. one challenge is representing that prints may very well happen in parallel, which is the kind of thing a waveform is a good representation of, but a linear stream of prints is not (i.e., a linear stream of prints connotes to the user's brain -- or, at least, to my brain -- that these things happened in an order [even if these things might have happened in parallel] -- and that the system is doing one thing or the other thing, but not both at the same time)
<josHua[m]>
yeah I agree with this also that a "wireshark log" for AXI transactions would kind of be nice.
<josHua[m]>
yeah, that's exactly the problem! 'blah' and 'bleh' "look like" they happened in an order, but they did not
<whitequark[cis]>
they sort of do happen in an order actually (the order in which the frontend has parsed your code)
<josHua[m]>
and again, this is not a problem for an advanced designer, but for a novice designer, it is reinforcing "brainshapes" of trying to do imperative programming in an inherently parallel architecture
<josHua[m]>
yes. that's the problem. the frontend enforces order where order of the underlying events does not exist
<whitequark[cis]>
these things look like they happen in an order, because they do; they don't happen in parallel; reordering them would have an observably different result and this is guaranteed by the language
<whitequark[cis]>
(Verilog does this too, but in a slightly different way)
<whitequark[cis]>
and then you want the pieces to assemble on a line nicely so you can actually read them
<whitequark[cis]>
this is why the frontend enforces order
<whitequark[cis]>
yes, the data flow graph is inherently parallel, but our coding mechanisms are part-parallel, part-sequential
<josHua[m]>
if you want to enforce that ordering on your fifo engine, I suppose, though that adds substantial complication
<whitequark[cis]>
it adds to the FIFO as Cat(print1_en, print2_en, txid, print3_en), in the most general case
<whitequark[cis]>
the interesting thing about prints is that they are roughly as cheap to capture as just some random signals you want to see on your ILA, and also designing your ILA around storing only the signals whose capture you have enabled lets you do some really deep captures if the duty cycle isn't very high
<whitequark[cis]>
which is absolutely perfect for a beginner whose problem is IO
<josHua[m]>
ok, yeah, I see, that is fine, as long as all of your prints are nearby each other. if you have this in many places across the design then you have to Cat() them all together in the same FIFO -- or you have to ensure sequential reordering in the output renderer of adjacent main-FIFO reads
<whitequark[cis]>
"nearby" lexically or in terms of placement?
<josHua[m]>
lexically
<josHua[m]>
and, I guess, also, in terms of placement, if they are not going into the same FIFO
jstein has joined #glasgow
<tpw_rules>
whitequark[cis]: is the debug server related to the VS code thing?
<Wanda[cis]>
it's the backend for that, yes
<whitequark[cis]>
though it's already wider than that (surfer can talk to it, and i've heard of other projects wanting to adopt)
<tpw_rules>
ok, so there's already non-vscode frontends
<tpw_rules>
i need to try out surfer. and cxxrtl in general
<whitequark[cis]>
turns out that designing an actually good... or even like semi-decent... protocol for talking to a time series database with waveforms is shockingly popular
<tpw_rules>
(is there any documentation anywhere or should i just start digging)
<tpw_rules>
oh, the manual help looks thorough, i'll give it a try. thanks for everything again!
<tpw_rules>
(time to tie this to cython)
<whitequark[cis]>
yeah pretty much everything on that page should still work
<whitequark[cis]>
cython... T__T
<whitequark[cis]>
there's a ctypes binding in the cxxsim branch that'll still work if you want that
<tpw_rules>
maaaaybe
<whitequark[cis]>
cython doesn't seem to give you c++ so you're limited by the c api which has basically not been extended since anyway
<whitequark[cis]>
oh no it does
<whitequark[cis]>
maybe it makes sense then
<tpw_rules>
idk i'll see, i was just trolling a bit and being sleep deprived. goodnight
<whitequark[cis]>
night!
galibert[m] has joined #glasgow
<galibert[m]>
Catherine: I've done interfacing of c++ with python if there's a need for help there. I remember it being unpleasant compared to lua, but perfectly doable
<whitequark[cis]>
um... which direction?
<galibert[m]>
both iirc
<galibert[m]>
but mostly controlling the program from python
<galibert[m]>
including exposing internal C++ classes as opaque object + methods
<galibert[m]>
but I also did some "running matplotlib from the program and grabbing the result"
<whitequark[cis]>
how did you do that? (not matplotlib)
<galibert[m]>
It's BSD, use whatever you need if you need anything
<whitequark[cis]>
ok, so this is just normal use of the Python C API (and mostly in the wrong direction)
<galibert[m]>
yes, it is, but it works :-)
<galibert[m]>
Define "wrong direction" though?
<whitequark[cis]>
well you're mostly calling Python from C++
<whitequark[cis]>
(Python C API from C++ code; they can obviously then call each other)
<whitequark[cis]>
CXXRTL isn't going to have any use of the Python C API in it so this is all irrelevant
<galibert[m]>
well, cxxrtl can present an import-able library
<galibert[m]>
that code generates a .so one can import
<galibert[m]>
but yeah, it puts all the interfacing on the c++ side
<whitequark[cis]>
it does present an import-able library
<whitequark[cis]>
with a C API
<whitequark[cis]>
unfortunately the C API (or the old testbench API) isn't expressive enough to represent any interesting trigger condition or process
<galibert[m]>
ah yeah, you'd need to bytecode it some way, urgh
<galibert[m]>
dunno if you'd need a JIT too for performance, perhaps a minimal one hooked to asmjit would suffice
<whitequark[cis]>
i think just being able to say "call me back on next clock tick when this variable is 1 with mask of 1" will go like 90% of the way there
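The "call me back on the next clock tick when this variable matches under a mask" idea can be modelled in a few lines. This is a purely hypothetical host-side sketch, not the CXXRTL C API or any planned interface; `TickTriggers`, `call_when` and `tick` are invented names.

```python
class TickTriggers:
    """Hypothetical model of one-shot masked triggers evaluated on each clock tick."""

    def __init__(self):
        self.cycle = 0
        self.pending = []    # entries: (signal name, value, mask, callback)

    def call_when(self, name, value, mask, callback):
        self.pending.append((name, value, mask, callback))

    def tick(self, signals):
        """Advance one clock; signals maps signal names to integer values."""
        self.cycle += 1
        pending, self.pending = self.pending, []
        for name, value, mask, callback in pending:
            if ((signals[name] ^ value) & mask) == 0:
                callback(self.cycle, signals[name])   # matched: fire once
            else:
                self.pending.append((name, value, mask, callback))

hits = []
triggers = TickTriggers()
triggers.call_when("status", value=1, mask=1,
                   callback=lambda cycle, v: hits.append((cycle, v)))
for status in (0b10, 0b00, 0b11, 0b01):
    triggers.tick({"status": status})
```

The condition is a single compare under a mask, which is why it reads as "technically bytecode but also not".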
<whitequark[cis]>
technically that's bytecode but also not
<galibert[m]>
yeah
<whitequark[cis]>
that and the ability to drive a clock from C++ and not Python
vegard_e[m] has joined #glasgow
<vegard_e[m]>
pybind11 was reasonably pleasant last I used it
<whitequark[cis]>
again, wrong direction
<whitequark[cis]>
I really don't want myself to be a complaint point for "the vendor did something fucked with Python headers and now cxxsim does not work"
<FireFly>
oh yeah, that's fun, I'll probably move in like 2-3 months so we'll see how that meshes with shipping :p
<sorear>
beautiful
jslcom[m] has joined #glasgow
<jslcom[m]>
Great to see the progress! While browsing the Crowd Supply site, I looked at Piotr's profile and was surprised to see that he's in Eugene, OR. That was my dad's home town. Nice place.