freakazoid343 has quit [Ping timeout: 252 seconds]
Tranmi has joined #riscv
Nartim has joined #riscv
Tranmi has quit [Ping timeout: 260 seconds]
jjido has joined #riscv
jacklsw has quit [Quit: Back to the real life]
jjido has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
jamtorus has joined #riscv
jellydonut has quit [Ping timeout: 252 seconds]
jamtorus is now known as jellydonut
mahmutov has joined #riscv
Nartim has quit [Quit: Leaving]
aburgess has joined #riscv
Narrat has joined #riscv
mmohammadi9812 has joined #riscv
jjido has joined #riscv
mmohammadi9812 has quit [Killed (NickServ (GHOST command used by mohammadi9812m!~Mohammad@2.178.201.78))]
mmohammadi9812 has joined #riscv
jjido has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
devcpu has quit [Quit: leaving]
devcpu has joined #riscv
freakazoid12345 has quit [Quit: Leaving]
freakazoid333 has joined #riscv
BOKALDO has quit [Quit: Leaving]
sm2n has joined #riscv
Narrat has quit [Quit: They say a little knowledge is a dangerous thing, but it's not one half so bad as a lot of ignorance.]
<kaddkaka[m]>
When compiling riscv code with gcc, is there any way to affect the scheduling that the compiler does? (e.g. Somehow provide information about instruction latency)
<kaddkaka[m]>
Some instructions might have implementations that take several cycles to compute results, causing an in-order machine to stall/slow down if this is not considered by the compiler.
<jimwilson>
-mtune=, the default is rocket/sifive-3-series/sifive-5-series, there is support for sifive-7-series (SiFive Unmatched), on mainline there is support for thead-c906 that will appear in next year's gcc release
<jimwilson>
other vendors haven't contributed scheduler support patches for FSF gcc but may have local patches
eduardas has quit [Quit: Konversation terminated!]
<jimwilson>
you can also use -mcpu= which is equivalent to using both -mtune= and -march=, though this only supports sifive parts currently, the missing thead support looks like an oversight
<kaddkaka[m]>
Ok, thanks, :) is there a short description/comparison of the different tune targets (implementations) and hence the tuning/scheduling effects?
<jimwilson>
there is gcc source code and processor manuals, otherwise the short description is that they optimize for the specified core
<jimwilson>
slightly better description is that they schedule instructions for the core's pipeline, and do instruction selection based on latencies, e.g. is multiply by constant a multiply instruction or a series of shift and adds, that depends on the core
<jimwilson>
and multiply-by-constant code depends on the constant
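[editor's note] A minimal sketch of the multiply-by-constant choice described above, assuming the constant 10 and 32-bit unsigned operands. The function names are hypothetical and are not taken from gcc. The two functions compute the same value; which form gcc actually emits for `x * 10` depends on the -mtune= cost model and on the constant.

```c
#include <stdint.h>

/* Form 1: a plain multiply. On a core with a fast multiplier the
 * compiler may keep this as a single mul instruction. */
uint32_t mul10_multiply(uint32_t x)
{
    return x * 10u;
}

/* Form 2: the same product as shifts and an add, using 10 = 8 + 2.
 * On a core with a slow multiplier this sequence can be cheaper.
 * Unsigned arithmetic wraps modulo 2^32, so both forms agree for
 * every input. */
uint32_t mul10_shift_add(uint32_t x)
{
    return (x << 3) + (x << 1);
}
```

Compiling the first function to assembly with -S under different -mtune= values is one way to see which form a given tuning picks; the result is not guaranteed to differ between any particular pair of tunings.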
<kaddkaka[m]>
Yes of course. I guess I just wanted numbers about general instruction latency and in-order/ooo properties
<kaddkaka[m]>
But I guess another approach is to just test all tune options and see which one gives the best result 🥸😋
<jimwilson>
see the core manual or the gcc sources
<kaddkaka[m]>
Processor manuals seem like something that might be lots of information, whereas I would just prefer a condensed summary. Perhaps I need to collect and compare by myself.
<kaddkaka[m]>
Yeah, thanks
<jrtc27>
the sifive core manuals have a table in them somewhere if you just want the numbers
valentin has quit [Remote host closed the connection]
<jrtc27>
e.g. §4.3 and table 8 in the FU740 manual is probably what you're looking for
<jrtc27>
though I could have sworn there was a more detailed table somewhere with things like floating-point operations but can't for the life of me find it
<jrtc27>
ah that's in the manual for the U74 itself, not the FU740
<jrtc27>
seems like there's a lot of overlap but some useful info in the former missing from the latter when it comes to appendices
<jimwilson>
the u74 core changes every 3 months, the fu740 is fixed in time at the point where it was manufactured, the u74 manual follows the core, and the fu740 manual was forked from the u74 manual at some point in the past, so the u74 core manual may have extra info, and may have info that isn't correct for the fu740
<jimwilson>
there are a few cases where instruction latencies in the current u74 core are lower than in the fu740 because of improvements
<jrtc27>
ack, the appendix about floating-point instruction latencies must've appeared after the fu740 then
<jrtc27>
(or at least after the doc was forked...)
jwillikers has quit [Remote host closed the connection]
mmohammadi9812 has quit [Read error: Connection reset by peer]