<jrtc27>
still some jank to fix though, like why it's able to allocate memory space for the XHCI controller but not the NVMe one...
<jrtc27>
and a whole bunch of warning spew that I think is part possibly-dodgy device tree and part freebsd
<GreaseMonkey>
ok, seems i've managed to segfault gcc, that's lovely, we'll try that again to confirm it isn't the kernel falling apart as it sometimes does
<GreaseMonkey>
...it compiled successfully this time, probably the kernel falling apart
mahmutov has quit [Ping timeout: 252 seconds]
<dh`>
isn't tyrquake the recommended fork, or is that gl-only?
<GreaseMonkey>
tyrquake looks like it might have software rendering
<geist2>
the original one did at least. is it based on the same code base or some sort of recreation?
FluffyMask has quit [Quit: WeeChat 2.9]
<GreaseMonkey>
checked, it exists, it's probably based on the same codebase
<GreaseMonkey>
looking at the U74MC core manual, what is the repeat rate? is that how long one has to wait before dispatching that instruction again?
<jrtc27>
yeah that'll be an indication of how pipelined it is
<jrtc27>
which is basically "only one outstanding division or square root allowed, everything else is fully pipelined"
<jrtc27>
in this case
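<jrtc27's reading of repeat rate above can be sketched with a toy issue-timing model. This is illustrative only — the numbers below are made up, not taken from the U74MC manual: with latency L and repeat rate R, N independent instructions of one kind take roughly L + (N - 1) * R cycles.>

```python
# Toy issue-timing model (illustrative, not from the U74MC manual):
# a result is ready `latency` cycles after dispatch, and another
# instruction of the same kind can be dispatched every `repeat_rate`
# cycles.
def cycles_for_independent_ops(n, latency, repeat_rate):
    if n <= 0:
        return 0
    return latency + (n - 1) * repeat_rate

# Fully pipelined op (repeat rate 1): 10 ops overlap almost completely.
print(cycles_for_independent_ops(10, 4, 1))    # 13
# "Only one outstanding division allowed": repeat rate ~= latency,
# so back-to-back divides barely overlap at all.
print(cycles_for_independent_ops(10, 33, 33))  # 330
```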
<jrtc27>
I wonder why the lower and upper bounds for latency and repeat rates differ by non-constant amounts
<sorear>
data dependent timing?
<jrtc27>
not just that
<jrtc27>
latency is 9-36, repeat rate is 8-33
<jrtc27>
so there's more pipelining for slow cases
<jrtc27>
but only right at the beginning or end?
<jrtc27>
maybe an artifact of some long routes that needed registering to meet timing?
<sorear>
the old rocket fdiv/fsqrt was interesting, newton’s method with an internal pipelined mul…
<GreaseMonkey>
my guess is that for stuff involving division and square roots, making that a whole full pipeline might not be worth the die space
<jrtc27>
depends what you're running
<jrtc27>
HPC will care
<GreaseMonkey>
also huh, i'd've thought newton's method would have been good
<sorear>
repeat 1 dividers basically don’t exist
<jrtc27>
if I'm reading the LLVM schedule info correctly, skylake can dispatch one fdiv64 every three cycles with a latency of 18 cycles
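<Plugging the Skylake figures quoted above (latency 18, one dispatch every 3 cycles) into the same kind of back-of-envelope model shows why repeat rate, not latency, bounds throughput for independent work — a hypothetical worked example, not LLVM's actual scheduler output:>

```python
# Figures quoted from the chat for Skylake fdiv64; the arithmetic is a
# back-of-envelope model, not LLVM schedule data.
LATENCY, REPEAT = 18, 3
N = 100
# Dependent chain: each divide must wait for the previous result.
dependent_chain = N * LATENCY
# Independent divides: a new one enters the pipeline every REPEAT cycles.
independent = LATENCY + (N - 1) * REPEAT
print(dependent_chain)  # 1800
print(independent)      # 315
```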
<GreaseMonkey>
huh, messing w/ mandelbrots, the speed difference between double precision and single precision isn't that big here
jimbzy has quit [Changing host]
jimbzy has joined #riscv
<Sos>
jrtc27, sweet!
<Sos>
i need to try to boot FreeBSD on the Nezha
<Sos>
gonna be fun with fried UART
Sos has quit [Quit: Leaving]
<sorear>
GreaseMonkey: is it bad?
<GreaseMonkey>
it's not bad at all
<GreaseMonkey>
something like 1.23x as many cycles for double math compared with single math
<GreaseMonkey>
although then again i'm not doing any divisions here
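<The mandelbrot inner loop being discussed looks something like the sketch below (a generic implementation, not GreaseMonkey's actual code): the hot path is only multiplies, adds, and a compare — no divides or square roots — so it mostly stresses the pipelined mul/add units where double and single precision tend to be closer in throughput.>

```python
# Generic mandelbrot escape-time inner loop (not the code from the chat):
# note there is no division or sqrt anywhere in the hot path.
def mandel_iters(cr, ci, max_iters=256):
    zr = zi = 0.0
    for i in range(max_iters):
        zr2, zi2 = zr * zr, zi * zi
        if zr2 + zi2 > 4.0:  # test |z| > 2 via |z|^2, avoiding sqrt
            return i
        # z = z^2 + c in real/imaginary parts: just muls and adds
        zr, zi = zr2 - zi2 + cr, 2.0 * zr * zi + ci
    return max_iters

print(mandel_iters(0.0, 0.0))  # 256: the origin never escapes
print(mandel_iters(2.0, 2.0))  # 1: escapes immediately
```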
<sorear>
you made a comment on newton's method
<GreaseMonkey>
oh right
<GreaseMonkey>
i'd've thought it would have been faster than the "one bit per cycle" approach
<GreaseMonkey>
but then again it's more useful for approximated stuff i think
<sorear>
it is
<sorear>
was replaced to save area
<GreaseMonkey>
ah
<sorear>
was particularly inefficient in that regard because it had its own multiplier not shared with the fmul multiplier
<sorear>
the multiply was pipelined over several cycles so the divide unit could have multiple divides in operation simultaneously, but only if it was instantiated as part of a vector coprocessor; it was not efficiently utilized for scalar code
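<The Newton's-method divider sorear describes can be sketched in software. This is an illustrative model, not Rocket's RTL: each iteration is two multiplies and a subtract and roughly doubles the number of correct bits, which is why the multiplier dominates the area, and why a pipelined multiplier could in principle interleave several in-flight divides.>

```python
import math

# Illustrative Newton's-method reciprocal, not Rocket's actual RTL.
# Iterate x <- x * (2 - d*x), which converges quadratically to 1/d.
def reciprocal(d, iterations=5):
    assert 0.5 <= d < 1.0          # assume a normalized mantissa
    x = 2.9142 - 2.0 * d           # textbook linear initial guess
    for _ in range(iterations):
        x = x * (2.0 - d * x)      # two muls + a subtract per step
    return x

def divide(a, b):
    # Scale b into [0.5, 1); Python floats stand in for the mantissa
    # datapath a hardware unit would use.
    m, e = math.frexp(b)
    return math.ldexp(a * reciprocal(m), -e)

print(abs(divide(1.0, 3.0) - 1.0 / 3.0) < 1e-12)  # True
```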
<GreaseMonkey>
...turns out while i managed to beat the compiler for integer code, for floating point code i'm behind
<jrtc27>
which compiler, what code, and is it the kind of thing that it *should* be able to optimise?
<GreaseMonkey>
gcc, custom mandelbrot implementation to practice assembly
<GreaseMonkey>
ok i've managed to beat -O2 which seems to give the fastest code