sorear changed the topic of #riscv to: RISC-V instruction set architecture | https://riscv.org | Logs: https://libera.irclog.whitequark.org/riscv
jedix has joined #riscv
<sorear> so I don't usually draw attention to the difference
zBeeble has quit [Ping timeout: 264 seconds]
<muurkha> like a mc68000?
<sorear> I don't see what mc68000 has to do with this conversation
<muurkha> oh, I think Smecher's numbers above are from the XQRKU060
<muurkha> it was a 32-bit architecture with a 16-bit ALU
<sorear> then yes
joev has quit [Ping timeout: 246 seconds]
joev has joined #riscv
awita has joined #riscv
awita has quit [Read error: Connection reset by peer]
<muurkha> would you want to add 8-bit or 16-bit instructions?
<muurkha> so that the programmer doesn't have to pay the 4 cycles all the time
Armand has quit [Ping timeout: 246 seconds]
<sorear> the temptation is to say "yes" but I don't think any real 680xx supported single-cycle execution of FOO.W without a 32-bit ALU
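To illustrate the 16-bit ALU point above, here is a minimal Python sketch of a 32-bit add performed as two passes through a 16-bit adder, with the carry handed from the low half to the high half. It is not a model of any real 680xx datapath or its cycle counts; the names `alu16_add` and `add32_via_alu16` are hypothetical.

```python
MASK16 = 0xFFFF

def alu16_add(a, b, carry_in=0):
    """One pass through a 16-bit adder: returns (16-bit sum, carry out)."""
    total = (a & MASK16) + (b & MASK16) + carry_in
    return total & MASK16, total >> 16

def add32_via_alu16(x, y):
    """32-bit add built from two 16-bit passes: low half first, then high half."""
    lo, carry = alu16_add(x, y)
    hi, _ = alu16_add(x >> 16, y >> 16, carry)  # carry out of bit 31 is dropped
    return (hi << 16) | lo

print(hex(add32_via_alu16(0x0001FFFF, 1)))
```

A .W-style operation would need only the first pass, which is the saving muurkha is asking about.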
joev has quit [Ping timeout: 246 seconds]
joev has joined #riscv
unsigned has quit [Quit: .]
Tenkawa has quit [Quit: Was I really ever here?]
unsigned has joined #riscv
heat has joined #riscv
TMM_ has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
TMM_ has joined #riscv
<sorear> the newest changes to Zicfilp mean that dynamic linkers will not, under any reasonable conditions, have to edit symbol addresses to skip landing pads, so I no longer have to reach out to dynamic linker people to figure out which changes to generic relocation processing are and aren't acceptable...
<dh`> oops
<dh`> er
<dh`> ping me if you need to talk to our dynamic linker people
<sorear> ...our = netbsd? been a while
<muurkha> nice!
sakman has quit [Read error: Connection reset by peer]
[Reinhilde] has quit [Quit: Bye Open Projects!]
[Reinhilde] has joined #riscv
jacklsw has joined #riscv
jacklsw has quit [Ping timeout: 260 seconds]
junaid_ has joined #riscv
paddymahoney has joined #riscv
JanC has quit [Read error: Connection reset by peer]
JanC_ has joined #riscv
JanC_ is now known as JanC
<dh`> yeah, netbsd
heat has quit [Ping timeout: 246 seconds]
<muurkha> \o/
BootLayer has joined #riscv
unsigned has quit [Quit: .]
aburgess has quit [Ping timeout: 245 seconds]
mauz has joined #riscv
junaid_ has quit [Remote host closed the connection]
mauz has quit [Quit: Leaving...]
pecastro has joined #riscv
jmdaemon has quit [Ping timeout: 245 seconds]
mauz has joined #riscv
ntwk has quit [Read error: Connection reset by peer]
ntwk has joined #riscv
mauz has quit [Ping timeout: 246 seconds]
Andre_Z has joined #riscv
aburgess has joined #riscv
vigneshr has joined #riscv
Tenkawa has joined #riscv
junaid_ has joined #riscv
heat has joined #riscv
unsigned has joined #riscv
vigneshr has quit [Quit: Connection closed for inactivity]
junaid_ has quit [Remote host closed the connection]
joev has quit [Ping timeout: 246 seconds]
joev has joined #riscv
junaid_ has joined #riscv
TMM_ has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
TMM_ has joined #riscv
heat has quit [Read error: Connection reset by peer]
heat has joined #riscv
pecastro has quit [Ping timeout: 245 seconds]
pecastro has joined #riscv
heat has quit [Remote host closed the connection]
JanC_ has joined #riscv
JanC has quit [Read error: Connection reset by peer]
JanC_ is now known as JanC
josuah has quit [Read error: Connection reset by peer]
JanC has quit [Ping timeout: 240 seconds]
JanC has joined #riscv
jmdaemon has joined #riscv
joev has quit [Ping timeout: 252 seconds]
joev has joined #riscv
JanC has quit [Ping timeout: 240 seconds]
JanC has joined #riscv
rsalveti has quit [Quit: Connection closed for inactivity]
stefanct has quit [Ping timeout: 258 seconds]
stefanct has joined #riscv
mps has quit [Quit: Lost terminal]
JanC has quit [Ping timeout: 260 seconds]
JanC has joined #riscv
mps has joined #riscv
junaid_ has quit [Remote host closed the connection]
junaid_ has joined #riscv
Armand has joined #riscv
EchelonX has joined #riscv
junaid__ has joined #riscv
junaid_ has quit [Ping timeout: 245 seconds]
BootLayer has quit [Quit: Leaving]
* cousteau` just reading the conversation from yesterday
<cousteau`> muurkha: fwiw, a Xilinx CLB LUT is a 6:1 (or 5:2) LUT. I recall reading that it is considered as equivalent to approximately 1.6 4:1 LUTs
<cousteau`> I'd estimate it's equivalent to, um, 5-ish gates (20 transistors), computing power wise
JanC has quit [Read error: Connection reset by peer]
JanC has joined #riscv
vagrantc has joined #riscv
joev has quit [Ping timeout: 246 seconds]
joev has joined #riscv
junaid__ has quit [Quit: leaving]
JanC has quit [Ping timeout: 240 seconds]
<cousteau`> (I also estimate that a 4:1 LUT is equivalent to 3-ish gates, which would fit the 1.6x factor...)
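The back-of-the-envelope arithmetic in cousteau`'s estimate can be written out as below. All three constants are the rough guesses quoted in the chat, not measured values, and the variable names are made up for this sketch.

```python
GATES_PER_LUT4 = 3          # cousteau`'s guess for a 4:1 LUT
LUT4_PER_LUT6 = 1.6         # the recalled 6:1-to-4:1 equivalence factor
TRANSISTORS_PER_GATE = 4    # the "1 gate = 4 transistors" convention

def lut6_gate_equivalent():
    """Gate-equivalents of one 6:1 LUT under the assumptions above."""
    return GATES_PER_LUT4 * LUT4_PER_LUT6

gates = lut6_gate_equivalent()
transistors = gates * TRANSISTORS_PER_GATE
print(f"{gates:.1f} gates, {transistors:.1f} transistors per 6:1 LUT")
```

That lands close to the "5-ish gates (20 transistors)" figure, which is why the 1.6x factor fits.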
junaid_ has joined #riscv
<cousteau`> and yeah, the M branch of ARM is for Microcontrollers, as in very low performance ones, so no surprise it's so tiny. RV32EC sounds like a fair comparison.
<cousteau`> I've always wanted to make a theoretical comparison of different microarchitectures, in terms of computing power (dhrystone/coremark?), area, and maximum frequency. Like, if the arch is very powerful (many DMIPS per MHz), but the gates are connected in a dumb way and the critical path is too long, it'll have a low max frequency despite the high computing power per clock, so fewer MHz
<gurki> youre vastly underestimating the cortex m stuff.
<muurkha> cousteau`: that sounds pretty plausible; is it based on your experience taking designs from Xilinx prototypes to ASIC realizations?
<gurki> the m4+ stuff is -really- sophisticated in 23.
<muurkha> gurki: tell me more!
<gurki> i have no idea why ud compare it with ec tbh
<muurkha> gurki: any insight into how much FPGA real estate a Cortex-M0 takes up?
<cousteau`> ok ok... so the M *zero* is meant to be very low performance
<muurkha> cousteau`: I don't know if I'd say *very* low performance
<muurkha> I mean it's a matter of perspective I guess
<gurki> iirc there is a public m you can integrate into xilinx (?)
JanC has joined #riscv
<muurkha> but 0.9 DMIPS/MHz with 15 32-bit GPRs and a typical fmax north of 48 MHz sounds pretty high-performance for 12000 gates to me?
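A small sketch of the arithmetic behind muurkha's figures: absolute throughput is DMIPS/MHz times clock frequency, and dividing by gate count gives a crude performance-per-area number. The inputs are the values quoted in the chat (0.9 DMIPS/MHz, 48 MHz, 12000 gates); the function name is hypothetical.

```python
def dmips(dmips_per_mhz, fmax_mhz):
    """Absolute Dhrystone throughput at a given clock."""
    return dmips_per_mhz * fmax_mhz

m0_dmips = dmips(0.9, 48)          # figures quoted for the Cortex-M0
m0_dmips_per_kgate = m0_dmips / 12  # 12000 gates = 12 kgates

print(f"{m0_dmips:.1f} DMIPS, {m0_dmips_per_kgate:.1f} DMIPS per kgate")
```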
<gurki> i dont want to make guesstimation statements and i wouldnt be allowed to post numbers.
<cousteau`> muurkha: I know Xilinx FPGAs in deep detail; been hacking with them for a long time. But the "5 gates per LUT" was a crude guesstimate. I don't have any real metric of "this design takes N gates (4N transistors) when implemented as an ASIC, and M LUTs when implemented on a Xilinx FPGA, so 1 LUT is approx N/M gates"
<muurkha> gurki: I bet your guesstimation would be a lot more reliable than mine
<muurkha> cousteau`: it seems like a pretty good guesstimate to me
<cousteau`> muurkha: well, the performance is so low that it's the lowest in the ARM family, it literally doesn't go any lower (as far as I know) for that vendor, so from that point of view, yes, the M0 is *meant* to be low performance
<muurkha> cousteau`: it's twice the DMIPS/MHz of the ARM3
<cousteau`> (I didn't mean "very low performance" as in "waugh, what a terrible piece of work ARM makes", but as in "this is what ARM gives you when your performance requirements are very low")
<gurki> please dont make generic asic gate comparisons
<gurki> im gonna cry
<muurkha> haha
<gurki> :(
<muurkha> why not?
<sorear> the M-profile is an ISA with unifying software properties, largely connected to a heavy use of memory-mapped registers instead of cp0 registers (basically csrs) and a very C-friendly interrupt model
<cousteau`> gurki: gotta measure "circuit complexity and cost" somehow...
<sorear> the top one has floating point and dual-issue, I'm not sure how the cache works so I'm not sure whether it's closer to S7 or NOEL-V
<cousteau`> gurki: did you mean "don't compare two ASIC designs in terms of gates", or "don't compare ASIC gates to FPGA LUTs"?
<muurkha> M0 and M0+ don't have cache; I forget where in the line it appears
<gurki> what kind of gate? which pdk? what level of sophistication do we have for these standard cells?
<gurki> any recent tech will offer a clusterfuck of extremely complex gates
<muurkha> what kind of complex gates?
<gurki> were not talking stuff like 2and or 2or here.
<muurkha> I'm thinking things like AOI?
<gurki> so the comparison is moot to begin with
<gurki> fpgas also have dsp blocks which are hard to translate. so what is it? some macro block? does this count as n gates?
<gurki> which effective area?
<sorear> you get rather large systematic errors if you try to use fpga cost as a proxy for asic cost or vice versa because fpgas have very cheap memory (RAM blocks are hard macros and there's a FF on ~every LUT whether you use it or not)
<gurki> it just doesnt make sense
<muurkha> cousteau`: I also think the Cortex-M0 is comparable in size to an ATTiny AVR, but a lot higher in performance due largely to the wider ALU. I mean sometimes you don't need operations of more than 8 bits but it's pretty common
<gurki> also what sorear said
<muurkha> gurki: thank you, this is very informative
<sorear> DSPs are less of an issue if you're specifically comparing riscv cores because those tend not to use more than one multiplier
<gurki> theres a nice intel paper where they port an atom core to fpga
<muurkha> I don't think the DSP blocks are super relevant to the M0, yeah
<gurki> spoiler: its a mess
<gurki> sorear: just giving an example which obv doesnt match ;)
<gurki> muurkha: np
<muurkha> sorear: for whatever reason https://github.com/gsmecher/minimax says it uses 116 FFs and 398 LUTs; does that mean it's sort of wasting another 280 DFFs?
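The subtraction behind muurkha's question, assuming exactly one flip-flop per LUT as in sorear's rough description. That ratio is an assumption for the sketch; actual devices may pair more than one FF with each LUT, which would make the unused count larger.

```python
LUTS_USED = 398   # minimax figure quoted above
FFS_USED = 116    # minimax figure quoted above
FFS_PER_LUT = 1   # assumption taken from the "a FF on ~every LUT" remark

ffs_available = LUTS_USED * FFS_PER_LUT
ffs_unused = ffs_available - FFS_USED
print(f"{ffs_unused} flip-flops sit next to used LUTs but are not used")
```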
<cousteau`> gurki: 1 gate = 4 transistors
<gurki> oh god
<muurkha> gurki: what kinds of complex gates do you mean?
<gurki> no.
<cousteau`> 1 NOT gate = 0.5 "gates", 1 AND gate = 1.5 "gates", etc
<cousteau`> it's like a "base gate"
<cousteau`> at least that's what I have in mind when comparing things in terms of "gates"
<gurki> please read up on asic design.
<gurki> seriously. no offense.
<muurkha> do I have to sign an NDA to get the reading material?
<cousteau`> ...obviously an AND gate is exactly 1 gate, not 1.5, but this measure seems to be common as an estimation of area
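A sketch of the "base gate" bookkeeping cousteau` describes: count the transistors in a static CMOS cell and divide by the four transistors of a 2-input NAND. The transistor counts are the textbook static CMOS ones; as gurki points out, real standard-cell libraries vary, so treat this as the convention only. The table and function names are hypothetical.

```python
TRANSISTORS_PER_BASE_GATE = 4  # a 2-input static CMOS NAND

# Textbook static CMOS transistor counts (AND2 = NAND2 followed by an inverter).
CELL_TRANSISTORS = {
    "NOT": 2,
    "NAND2": 4,
    "AND2": 6,
}

def gate_equivalents(cell):
    """Size of a cell expressed in 'base gates'."""
    return CELL_TRANSISTORS[cell] / TRANSISTORS_PER_BASE_GATE

for cell in CELL_TRANSISTORS:
    print(cell, gate_equivalents(cell))
```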
<muurkha> Camenzind didn't say much in his book about standard cells, see
<sorear> you can get books on VLSI design without an NDA, the ones I read were 30-40 years old but the basic principles haven't changed that much
<cousteau`> estimates like this are seen for example in https://raw.githubusercontent.com/riscv/riscv-bitmanip/master/bitmanip-draft.pdf chapter 3
<sorear> a lot more automation now, and people pretty much gave up on SSI MSI LSI VLSI ...what comes next
<muurkha> I think the stuff gurki is talking about has changed a lot
<gurki> i like "cmos vlsi design" by harrison weste
<muurkha> thanks!
<gurki> harris* sorry
<cousteau`> (btw, back in the day I got access to a real cell library and the areas were slightly off wrt what that estimate in the bitmanip document uses, but it's still a good approximation)
<muurkha> sorear: ULSI? Ku-band SI?
<cousteau`> as for different processes or different gate sizes (I don't know exactly how this works but I know that you can have "smaller" or "bigger" AND gates for the same number of inputs, depending on the timing and power properties you want), they don't really matter that much for an analysis being made at gate level
<muurkha> sure, you need bigger transistors if your fanout is larger if you're in the critical path
<muurkha> which means your gate charge is also larger
<cousteau`> yes but at that point you're just fine-tuning a lot, and losing generalizability in the analysis
<gurki> :p
<muurkha> maybe, depends on how dominant a factor it is
<muurkha> but I'm interested to hear what kinds of complex gates are in vogue nowadays
<gurki> it didnt make sense to begin with
<cousteau`> my idea was to have an analysis in terms of "the critical path is N gates, so the maximum frequency is going to be ...about 2x as much as this other design where the critical path is N/2 gates"
<gurki> but im not here to fight. bbl.
<muurkha> that might not be true tho, cousteau`
<sorear> when people do that level of analysis they call the unit "FO4" not "gates"
<cousteau`> muurkha: it is absolutely true, for a given range of "about" :)
JanC has quit [Read error: Connection reset by peer]
JanC has joined #riscv
<sorear> as in the nominal delay of an inverter with a fanout of 4 (driving four copies of itself)
<muurkha> it might depend more on things like wire capacitance and routing distance than on the number of gates
<muurkha> I mean, it might not! but it can
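Both sides of this exchange fit in one toy model: maximum frequency is the reciprocal of the critical-path delay, which is the gate count times a per-gate delay plus whatever the wires add. The 0.1 ns gate delay and 1 ns wire delay are made-up illustrative numbers, not data for any process.

```python
def fmax_mhz(n_gates, gate_delay_ns, wire_delay_ns=0.0):
    """Toy model: fmax = 1 / (gate delays + wire delay), returned in MHz."""
    return 1000.0 / (n_gates * gate_delay_ns + wire_delay_ns)

GATE_DELAY_NS = 0.1  # hypothetical per-gate delay

# Gate delay only: halving the path doubles fmax, as cousteau` says.
ratio_gates_only = fmax_mhz(10, GATE_DELAY_NS) / fmax_mhz(20, GATE_DELAY_NS)

# With a fixed 1 ns of wire delay on both paths, the gain shrinks.
ratio_with_wires = fmax_mhz(10, GATE_DELAY_NS, 1.0) / fmax_mhz(20, GATE_DELAY_NS, 1.0)

print(f"{ratio_gates_only:.2f}x without wires, {ratio_with_wires:.2f}x with wires")
```

When wire delay is small the "about 2x" estimate holds; when it dominates, gate count stops being a good predictor, which is muurkha's caveat.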
<muurkha> gurki: I appreciate the insight you've shared
<cousteau`> well, Claire Wolf just went and called them gates...
junaid_ has quit [Remote host closed the connection]
<muurkha> she's focused on Lattice FPGAs, though, isn't she?
<muurkha> or is she doing full-custom ASICs?
<dh`> clearly the proper unit is schmates
<sorear> yosys was an asic tool before anything else
<muurkha> schmand schmates consuming two schmits
<muurkha> really? presumably a gate-array asic tool rather than full-custom, right?
<muurkha> or is there some way to interface it with standard cell libraries from TSMC or whoever?
<cousteau`> what's a gate array ASIC? like a PLD?
<sorear> considering that the "sy" is for "synthesis" you can infer standard cells are involved...
<cousteau`> Xilinx also has a "synthesis" stage...
<muurkha> no, it's a cheaper ASIC where you only have to make one or two masks instead of like 30
<sorear> there's a way, but nobody is allowed to tell me
<muurkha> the idea is that the first N layers build up a sea of gates, much like an FPGA, and then the last layer or two wires them together
<sorear> ah https://yosyshq.net/yosys/files/yosys-austrochip2013.pdf covers both standard cell asics and fpga targets
<cousteau`> I see
<cousteau`> so it's some sort of "ASIC designed as if it were a CPLD, but with fixed connections"
<muurkha> yeah
<muurkha> but more like an FPGA
<muurkha> sometimes called "semicustom"
<muurkha> sorear: thanks!
<cousteau`> hm, what's the difference? I don't quite know how a CPLD is on the inside
<cousteau`> I'm guessing it's like a matrix of gates
<cousteau`> or rather, a matrix of interconnects
<cousteau`> maybe more fully connected than the FPGA style of "this is like a NoC, so if you want to go from here to there, you can't go directly, but you can go in 3 jumps"
<muurkha> yeah, a pre-FPGA PLD is a matrix of interconnects rather than of combinational logic
<muurkha> each output pin is a sum of products of input pins
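A minimal sketch of the sum-of-products structure muurkha describes: each product term ANDs together a selection of input pins (true or inverted), and the output ORs the terms. The data layout and the name `pld_output` are assumptions for illustration, not any vendor's fuse-map format.

```python
def pld_output(inputs, product_terms):
    """Evaluate one PLD output.

    inputs: dict mapping pin name -> bool
    product_terms: list of dicts mapping pin name -> required value;
                   a pin missing from a term is not connected to that AND.
    """
    return any(
        all(inputs[pin] == wanted for pin, wanted in term.items())
        for term in product_terms
    )

# XOR of pins a and b as two product terms: (a AND NOT b) OR (NOT a AND b).
XOR_TERMS = [{"a": True, "b": False}, {"a": False, "b": True}]

for a in (False, True):
    for b in (False, True):
        print(a, b, pld_output({"a": a, "b": b}, XOR_TERMS))
```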
JanC has quit [Ping timeout: 246 seconds]
<cousteau`> (and, of course, an FPGA has LUTs and some more advanced logic implemented in silicon than "just a bunch of gates")
<muurkha> and a CPLD is kind of the same thing except that some of the outputs and inputs are internal to the chip
<muurkha> NoC?
<cousteau`> muurkha: ok, that was pretty much the whole extent of my knowledge on PLDs so I'm glad I got it right :)
<cousteau`> Network on Chip
<muurkha> cousteau`: I've never used a PLD so I could be wrong too
<cousteau`> (uses the same sort of topology as a NoC at least)
<muurkha> and apparently gurki is no longer here to correct us, having exhausted his patience with our caveman approximations :)
<muurkha> I think of LUTs as being simpler than a bunch of gates, though typically you also have carry chains and routing logic
<muurkha> in an FPGA cell
<cousteau`> yep
<muurkha> sorear: this paper is great, I'd never read it! thank you
<muurkha> I don't know what a network on chip is either
<cousteau`> a Xilinx CLB (of the pre-Ultrascale era, like 7 Series or so) has: four 6:1 LUTs (that also work as 5:2 LUTs), a 4-bit carry chain, 4 or 8 FFs, and a few muxes so that you can combine all the 4 LUTs together into one single mega-LUT
<muurkha> oh! thanks for explaining that
<cousteau`> and in some CLBs, the LUT can also work as a 64-bit memory (distributed RAM) or as a 32-bit shift register
<cousteau`> why 32 and not 64? Well, because of reasons, I guess
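Why a 6:1 LUT doubles as a 64-bit memory can be seen in a small model: the LUT is just 64 one-bit cells addressed by the six inputs, so "evaluate the logic function" and "read the RAM" are the same operation. This is a conceptual sketch with a hypothetical `Lut6` class, not a description of the actual Xilinx primitive or its ports.

```python
class Lut6:
    """A 6-input LUT modeled as 64 one-bit cells packed into an integer."""

    def __init__(self, init=0):
        self.cells = init & ((1 << 64) - 1)

    @classmethod
    def from_function(cls, fn):
        """Fill the truth table from a 6-argument function returning 0 or 1."""
        cells = 0
        for addr in range(64):
            bits = [(addr >> i) & 1 for i in range(6)]
            if fn(*bits):
                cells |= 1 << addr
        return cls(cells)

    def read(self, addr):
        """Logic mode and RAM read are the same: index the cells by the inputs."""
        return (self.cells >> (addr & 63)) & 1

    def write(self, addr, bit):
        """Distributed-RAM mode: overwrite one of the 64 cells."""
        addr &= 63
        self.cells = (self.cells & ~(1 << addr)) | ((bit & 1) << addr)

# Logic mode: 6-input parity. RAM mode: store a single bit at address 42.
parity = Lut6.from_function(lambda *bits: sum(bits) % 2)
ram = Lut6()
ram.write(42, 1)
print(parity.read(0b000111), ram.read(42), ram.read(41))
```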
<muurkha> sure, that makes a lot of sense
<muurkha> it has to be able to work as a shift register anyway during boot, right?
<cousteau`> so for example you can configure the LUTs as half-adders, and combined with the carry chain you get a 4-bit full-adder
<muurkha> so exposing that functionality seems like an obvious thing to do
<muurkha> right
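A simplified model of the LUT-plus-carry-chain adder cousteau` mentions: each LUT computes the half-adder term (a XOR b), and the carry chain supplies a mux that selects the next carry plus an XOR that forms the sum bit. The structure is a sketch of the general idea and the function names are hypothetical; it is not a cycle- or pin-accurate model of the Xilinx carry primitive.

```python
def lut_propagate(a, b):
    """What the LUT is configured to compute: the half-adder sum, a XOR b."""
    return a ^ b

def carry_chain_add4(a_bits, b_bits, carry_in=0):
    """4-bit add from LUT outputs plus a carry chain. Bit lists are LSB first."""
    carry = carry_in
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        p = lut_propagate(a, b)
        sum_bits.append(p ^ carry)      # chain XOR: sum bit
        carry = carry if p else a       # chain mux: propagate, else generate/kill
    return sum_bits, carry

def to_bits(x, n=4):
    return [(x >> i) & 1 for i in range(n)]

def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))

def add4(x, y):
    """Add two 4-bit values; returns (4-bit sum, carry out)."""
    bits, carry_out = carry_chain_add4(to_bits(x), to_bits(y))
    return from_bits(bits), carry_out

print(add4(9, 8))
```

When the two input bits differ, the incoming carry passes through; when they are equal, the carry out is simply that shared bit, which is why the LUT only needs to produce the XOR.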
<cousteau`> hmm not really, I think the LUT configuration is not "shifted in" but mapped to a config memory
<muurkha> typically it's shifted in at powerup
<cousteau`> if that's what you were thinking
<muurkha> I mean it depends on the chip! and I know zilch about Xilinx
<muurkha> I had somehow gotten the wrong idea that a CLB had one LUT rather than 4, for example
<cousteau`> I think it was shifted in on older chips, but on the ones I've played with it's mapped to config memory
<muurkha> hmm, they don't shift it in bit by bit from the config memory? so you have a Flash bit physically on chip next to each RAM LUT bit to load it at powerup?
<cousteau`> like, you can just write to this config memory, one 32-bit word per clock cycle at 100 MHz, and that'll change the config of LUTs instantly
<cousteau`> ...then again, maybe there IS a 64-cycle latency after you write into this memory
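For a sense of scale, the configuration write rate cousteau` quotes (one 32-bit word per clock at 100 MHz) works out as below. The 100 KiB region size is a made-up example, not the size of any real bitstream, and the function names are hypothetical.

```python
def config_bytes_per_second(word_bits=32, clock_hz=100_000_000):
    """Peak write bandwidth of the configuration port described above."""
    return word_bits // 8 * clock_hz

def reconfig_time_us(region_bytes, bandwidth=None):
    """Time to rewrite a region of configuration memory, in microseconds."""
    if bandwidth is None:
        bandwidth = config_bytes_per_second()
    return region_bytes / bandwidth * 1e6

bandwidth = config_bytes_per_second()
example_us = reconfig_time_us(100 * 1024)  # hypothetical 100 KiB region
print(f"{bandwidth / 1e6:.0f} MB/s, {example_us:.0f} us for a 100 KiB region")
```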
Kedleston has quit [Ping timeout: 245 seconds]
<cousteau`> but in any case. Older Xilinx FPGAs had 16-bit LUTRAMs that worked as 16-bit shift regs. But newer ones have 64-bit LUTRAMs that work as 32-bit shift regs. Like, they disabled the shiftiness of the other half.
<sorear> not all FPGAs are shift registers, some of them are big SRAMs with internal taps on the feedback loops
Kedleston has joined #riscv
<cousteau`> I remember hearing that Xilinx FPGAs "are SRAM-based". I had no idea what that meant back in the day
<sorear> lattice parts have a commodity SPI flash die inside the package for hostless boot. Microchip's have actual flash transistors wired up to the LUT instead of SRAM, which is good for SEU resistance but means that you can only test ~50 designs before you have to junk the die (wear leveling, hahaha)
<cousteau`> wow that's sad
<cousteau`> back in the day I worked with a design that self-reconfigured the FPGA thousands of times per second
<muurkha> I didn't know Microchip even had FPGAs. are they tiny and cheap?
<sorear> microchip bought microsemi bought actel, the product line has existed for ages and mostly targets aerospace
<muurkha> I like the idea of self-reconfiguring the FPGA thousands of times per second, but I don't think that's viable with Lattice parts, is it?
<muurkha> oh, they're Actel FPGAs? I guess those aren't that cheap
<muurkha> conceptually the idea of context-switching between processes by reconfiguring an FPGA makes a lot of sense to me, but I don't know what the programming model looks like for that
unsigned has quit [Quit: .]
<cousteau`> ah so "microchip FPGAs" is like "Intel" or "AMD FPGAs"
<cousteau`> nobody knows what the hell are those until they remember what the original brand name was
JanC has joined #riscv
<muurkha> AMD FPGAs?
<conchuod> xilinx ;)
<conchuod> Microchip makes a SoC FPGA with hard RISC-V cores ;)
<cousteau`> yep
<cousteau`> cool!
<cousteau`> ...actually wait, I think I already knew that
<cousteau`> or at least that Microchip was into RV
<conchuod> I dunno anything about it though
* conchuod hides
meowray has quit [Remote host closed the connection]
meowray has joined #riscv
<muurkha> makes sense
<muurkha> a CPU is a waste of perfectly good LUTs, but sometimes you'd like a little bit more state-transition control logic than is comfortable to write as a statechart
<cousteau`> and that's when you get a "chip with an FPGA in it" instead of just an FPGA
Andre_Z has quit [Quit: Leaving.]
fabs has joined #riscv
Tenkawa has quit [Quit: Was I really ever here?]
peepsalot has quit [Read error: Connection reset by peer]
aredridel has quit [Server closed connection]
aredridel has joined #riscv
peepsalot has joined #riscv
scruffyfurn has quit [Server closed connection]
scruffyfurn has joined #riscv
kaaliakahn has joined #riscv
Pokey has quit [Server closed connection]
EchelonX has quit [Quit: Leaving]