sorear changed the topic of #riscv to: RISC-V instruction set architecture | https://riscv.org | Logs: https://libera.irclog.whitequark.org/riscv
cwebber`` has joined #riscv
cwebber` has quit [Ping timeout: 268 seconds]
kaph_ has joined #riscv
kaph has quit [Ping timeout: 240 seconds]
drmpeg has quit [Ping timeout: 240 seconds]
drmpeg has joined #riscv
jacklsw has joined #riscv
KombuchaKip has quit [Quit: Leaving.]
KombuchaKip has joined #riscv
cwebber`` is now known as cwebber
cwebber has quit [Changing host]
cwebber has joined #riscv
cousteau has quit [Quit: ♫ I can't forget the day I shot that network down ♫]
zjason` is now known as zjason
jacklsw has quit [Quit: Back to the real world]
jacklsw has joined #riscv
freakazoid343 has joined #riscv
freakazoid12345 has quit [Ping timeout: 245 seconds]
PyroPeter has quit [Ping timeout: 252 seconds]
freakazoid343 has quit [Ping timeout: 252 seconds]
PyroPeter has joined #riscv
riff_IRC has joined #riscv
riff_IRC has quit [Quit: PROTO-IRC v0.73a (C) 1988 NetSoft - Built on 11-13-1988 on AT&T System V]
riff-IRC has quit [Remote host closed the connection]
riff-IRC has joined #riscv
BOKALDO has joined #riscv
<Sofia> Do any processors do "dependent instruction fusing"? Ex. b := slli a imm; c := addi b imm. If yes, do they require the reuse of registers b = c to eliminate the obsolete register write or do they use multiple writebacks?
<Sofia> I'm primarily interested in current (and potential for future) RISC-V implementations. However I ask generally.
kaph_ has quit [Ping timeout: 240 seconds]
winterflaw has joined #riscv
riff-IRC has quit [Quit: PROTO-IRC v0.73a (C) 1988 NetSoft - Built on 11-13-1988 on AT&T System V]
<Xark> Sofia: Not sure how recent, but a bit of info here: https://en.wikichip.org/wiki/macro-operation_fusion
<sorear> the apollo (closed 68k clone) allegedly also does fusion for 3-address instructions and extending loads http://www.apollo-core.com/AC68080-Decoder-FUSING.pdf
<sorear> power10 has some recommended fusion pairs, there are also some in various arm impls
<sorear> in general, not architectural
wgrant has quit [Ping timeout: 260 seconds]
jacklsw has quit [Quit: Back to the real world]
jacklsw has joined #riscv
<Sofia> Xark, sorear: Thanks. Looks like the jumps and post-indexed load need to write to multiple registers.
wgrant has joined #riscv
<Sofia> And as expected, most of these macro-op fusions (the term I had forgotten) immediately hide the intermediate value with register reuse.
<Sofia> If macro-op fusion occurs, does this count as "issuing two instructions" or just one in terms of ports? Ex. if we have a 6-issue core and do mulh-mul, does this consume 1 or 2 of the 6 issue slots?
<dh`> if it happens in the decoder it'll be one
<dh`> which is kind of the point afaik
<Sofia> As expected. Okay.
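The accounting rule dh` describes (a pair fused in the decoder occupies one issue slot) can be sketched as a toy counter. This is only an illustration: `issue_slots` and the `fusible_pairs` set are hypothetical names, instructions are reduced to bare mnemonics, and no real decoder is modelled.

```python
def issue_slots(instrs, fusible_pairs):
    """Count issue slots for a list of mnemonics.

    Assumption: an adjacent pair listed in fusible_pairs is fused in the
    decoder and therefore consumes a single slot.
    """
    slots = 0
    i = 0
    while i < len(instrs):
        if i + 1 < len(instrs) and (instrs[i], instrs[i + 1]) in fusible_pairs:
            i += 2  # both instructions leave the decoder as one fused op
        else:
            i += 1
        slots += 1
    return slots
```

With `{("mulh", "mul")}` as the fusible set, the sequence `["mulh", "mul", "add"]` takes two slots instead of three.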
<Sofia> So I kinda had another detail here I wanted. Umm
<Sofia> When the decoder issues a fused operation, can the decoder still keep up to issue the full 6 instructions or is it a bottleneck?
<Sofia> Or to rephrase towards the real objective. Does it suffice to know the max-issue, number of ports, distribution of operations per port, and which operations are fused; or do we also need to model more decoder details?
<Sofia> If a pair of operations are fused, does this ever reduce the set of ports available? I.e. Port#1 supports fusing but Port#2 does not.
<sorear> you need to model more decoder details.
<Sofia> Fun.
<Sofia> Is there a list of microarchitectural choices (and implications) a core can make, which an optimizing compiler would need to know?
<sorear> No.
<Sofia> :C
<sorear> https://course.ece.cmu.edu/~ece447/s15/lib/exe/fetch.php?media=cmpwrgd.pdf i suggest reading this, then imagining everything 10x more complicated
<Sofia> Is it correct to consider jumps (modulo branch prediction) to flush and align the 6-issue per cycle to the jump target and limit out-of-order-ness to this window? Or can out-of-order see further ahead?
<Sofia> Thanks for the link.
<sorear> out-of-orderness can see further ahead than jumps, that's the entire point
<sorear> the compiler can do scheduling within a basic block
<sorear> there's an entire set of dispatch queues between frontend and issue, generally, and no alignment is maintained
<sorear> decoding can be influenced by alignment, if you have fun decode properties (like Alpha's one-cycle decode penalty per CMOV in an aligned 16-byte group of instructions)
<Sofia> Okay. I'll go with sufficiently complex as to ignore for a long time. Hehe.
jacklsw has quit [Read error: Connection reset by peer]
<la_mettrie> platformio seems to think that mtimecmp register is a label... undefined reference to `mtimecmp'
<la_mettrie> plain assembler has no problems assembling it
<Sofia> sorear: My cost function plan is to start with a coarse class of implementations ranging from ideal yet impossible to trivial. Setting the lower bound to unbounded decode and issue (dependency graph depth), and upper bound to single issue. Use this partial order to discard all expressions unconditionally worse. Then incrementally tighten the specification for each degree of freedom (ex. removing extensions, fusions, or decreasing the vector width), and discard all expressions unconditionally worse for each reduction of the specification.
<Sofia> In other words: I'd like to render the N-dimensional pareto surface of program performance to hardware implementation choices.
<Sofia> To guide hardware-software co-design or find the cheapest core to meet a user's budgets for their program.
<Sofia> I'll start with only the extensions, fusions and vector width (and other small mostly binary choices).
<dh`> there's a good chance you'll find that your relaxations aren't confluent
<Sofia> Then move on to compression-friendly register allocation, towards size and fusion-friendliness.
<Sofia> dh`: Hm?
<Sofia> Er, I meant in the tightening case that we **partition** the expressions. Not discard.
<Sofia> This should ensure confluence, no?
<Sofia> Partition as in: if we remove the B extension, we cannot use the instructions. But if we have B, then emulating them manually is obsolete.
___nick___ has joined #riscv
peeps[zen] has quit [Quit: Connection reset by peep]
peeps[zen] has joined #riscv
<Sofia> dh`: If there is any confluence issue, I will eventually find an example. Maybe find a way to fix it.
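The "discard all expressions unconditionally worse" step in Sofia's plan amounts to keeping a Pareto front over cost vectors. A minimal sketch, assuming each candidate expression is scored by a tuple of costs where lower is better (for example dependency-graph depth for the unbounded-issue bound and instruction count for the single-issue bound); the names `dominates` and `pareto_front` are made up for this illustration.

```python
def dominates(a, b):
    """True if cost vector a is no worse than b everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(candidates):
    """Keep only the candidates that no other candidate dominates.

    candidates maps a name to a tuple of costs (lower is better).
    """
    return {
        name: cost
        for name, cost in candidates.items()
        if not any(dominates(other, cost) for other in candidates.values())
    }

# Hypothetical costs: (dependency depth, instruction count)
candidates = {"a": (3, 5), "b": (2, 7), "c": (4, 6)}
front = pareto_front(candidates)  # "c" is dropped because "a" beats it on both axes
```

Candidates that are better on one axis and worse on another (here "a" and "b") both survive, which is what lets the set be partitioned later as the specification is tightened.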
cousteau has joined #riscv
___nick___ has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
___nick___ has joined #riscv
___nick___ has quit [Client Quit]
___nick___ has joined #riscv
aerkiaga has joined #riscv
ssb has quit [Ping timeout: 245 seconds]
ssb has joined #riscv
<muurkha> probably the best-known macro-op fusion is in the amd64 family, which nowadays fuses test-jump pairs
jwillikers has joined #riscv
kaph has joined #riscv
riff-IRC has joined #riscv
mahmutov has joined #riscv
Andre_H has joined #riscv
jacklsw has joined #riscv
jimbzy has quit [Remote host closed the connection]
jimbzy has joined #riscv
jimbzy has joined #riscv
jimbzy has quit [Changing host]
cwebber has quit [Ping timeout: 240 seconds]
freakazoid333 has joined #riscv
freakazoid343 has joined #riscv
freakazoid333 has quit [Ping timeout: 245 seconds]
freakazoid12345 has joined #riscv
wgrant has quit [Ping timeout: 240 seconds]
freakazoid343 has quit [Ping timeout: 252 seconds]
freakazoid333 has joined #riscv
freakazoid12345 has quit [Ping timeout: 252 seconds]
freakazoid343 has joined #riscv
freakazoid333 has quit [Ping timeout: 252 seconds]
riff-IRC has quit [Ping timeout: 240 seconds]
balrog_ has joined #riscv
Forty-Bot has quit [Ping timeout: 268 seconds]
balrog has quit [Ping timeout: 268 seconds]
freakazoid12345 has joined #riscv
freakazoid343 has quit [Ping timeout: 250 seconds]
freakazoid333 has joined #riscv
freakazoid12345 has quit [Ping timeout: 252 seconds]
Forty-Bot has joined #riscv
freakazoid343 has joined #riscv
jacklsw has quit [Read error: Connection reset by peer]
freakazoid333 has quit [Ping timeout: 250 seconds]
freakazoid12345 has joined #riscv
freakazoid343 has quit [Ping timeout: 250 seconds]
<sorear> Sofia: the scheduling models llvm and gcc use are horribly simplified, but a majority of users seem to not complain, maybe look at those for inspiration
<sorear> they really don't know about load-bearing NOPs though
kailo has quit [Ping timeout: 240 seconds]
Narrat has joined #riscv
vagrantc has joined #riscv
riff-IRC has joined #riscv
cousteau has quit [Quit: ♫ I can't forget the day I shot that network down ♫]
cousteau has joined #riscv
shicz2 has quit [Ping timeout: 256 seconds]
shicz2 has joined #riscv
freakazoid343 has joined #riscv
freakazoid12345 has quit [Ping timeout: 240 seconds]
<geist> muurkha: yeah some of the newer arm64 cores also explicitly mention macro op for cmp/branch pairs as well
shicz2 has quit [Ping timeout: 240 seconds]
shicz2 has joined #riscv
Andre_H has quit [Quit: Leaving.]
solrize has quit [Ping timeout: 256 seconds]
vagrantc has quit [Quit: leaving]
lagash has quit [Ping timeout: 252 seconds]
wgrant has joined #riscv
cwebber has joined #riscv
gdd has quit [Ping timeout: 256 seconds]
aerkiaga has quit [Remote host closed the connection]
BOKALDO has quit [Quit: Leaving]
shicz2 has quit [Ping timeout: 268 seconds]
shicz2 has joined #riscv
freakazoid12345 has joined #riscv
lagash has joined #riscv
freakazoid343 has quit [Ping timeout: 240 seconds]
jwillikers has quit [Remote host closed the connection]
___nick___ has quit [Ping timeout: 240 seconds]
mahmutov has quit [Ping timeout: 252 seconds]
solrize has joined #riscv
<dh`> sofia: sorry, fell asleep
<dh`> anyway yeah, that should be ok
<dh`> but the other thing is: at some level if that sort of thing isn't confluent it's a problem with the architecture
<dh`> it's just that such problems are very hard to detect until you step in them
peeps[zen] has quit [Read error: Connection reset by peer]
peeps[zen] has joined #riscv
freakazoid343 has joined #riscv
freakazoid12345 has quit [Ping timeout: 250 seconds]
Narrat has quit [Quit: They say a little knowledge is a dangerous thing, but it's not one half so bad as a lot of ignorance.]
riff-IRC is now known as riffmas
freakazoid12345 has joined #riscv
freakazoid343 has quit [Ping timeout: 245 seconds]
freakazoid343 has joined #riscv
freakazoid12345 has quit [Ping timeout: 240 seconds]
Ivii has joined #riscv
freakazoid12345 has joined #riscv
KombuchaKip has quit [Remote host closed the connection]
freakazoid343 has quit [Ping timeout: 245 seconds]
KombuchaKip has joined #riscv
freakazoid343 has joined #riscv
freakazoid12345 has quit [Ping timeout: 252 seconds]
mahmutov has joined #riscv
perro has quit [Ping timeout: 268 seconds]
solrize has quit [Changing host]
solrize has joined #riscv
Ivii has quit [Remote host closed the connection]
freakazoid12345 has joined #riscv
freakazoid343 has quit [Ping timeout: 240 seconds]
<jimwilson> Sofia, the SiFive U740 in Unmatched can fuse a branch around an ALU instruction into a predicated ALU instruction; this is supported in gcc for conditional move instructions. More recent cores support more fusion pairs, but I don't know what they are
<jimwilson> Ventana VT1 supports a number of fusion pairs: lui/auipc+addi/ld, add+ld, shift+shift. There was a patch posted to the FSF gcc-patches list https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584402.html but there is no Ventana hardware yet
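A pair list like the one jimwilson gives can be turned into a small fusibility check. The sketch below is hypothetical throughout: the table only copies the mnemonics named in the message, "shift+shift" is narrowed to slli+srli purely as an example, and the operand conditions (the second instruction reads the first's result and overwrites the same register) are an assumption based on the register-reuse observation earlier in the log, not on the gcc patch or any Ventana documentation.

```python
# Hypothetical pair table; "shift+shift" is represented by slli+srli only.
FUSION_PAIRS = {
    ("lui", "addi"), ("lui", "ld"),
    ("auipc", "addi"), ("auipc", "ld"),
    ("add", "ld"),
    ("slli", "srli"),
}

def can_fuse(first, second):
    """Decide whether two adjacent instructions form a fusion candidate.

    Each instruction is a tuple (mnemonic, rd, source_registers).
    """
    op1, rd1, _ = first
    op2, rd2, srcs2 = second
    return (
        (op1, op2) in FUSION_PAIRS
        and rd1 in srcs2  # the second instruction consumes the first's result
        and rd1 == rd2    # assumed: the intermediate is overwritten, so one writeback
    )
```

For example, `can_fuse(("lui", "a0", ()), ("addi", "a0", ("a0",)))` holds, while the same pair writing the addi result to `a1` does not under this assumed rule.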
freakazoid343 has joined #riscv
freakazoid12345 has quit [Ping timeout: 260 seconds]