azonenberg changed the topic of ##openfpga to: Open source tools for FPGAs, CPLDs, etc. Silicon RE, bitfile RE, synthesis, place-and-route, and JTAG are all on topic. Code: https://github.com/azonenberg/openfpga. Channel logs: https://libera.irclog.whitequark.org/##openfpga
kristianpaul has joined ##openfpga
egg|cell|egg has joined ##openfpga
Degi_ has joined ##openfpga
Degi has quit [Ping timeout: 264 seconds]
Degi_ is now known as Degi
tokomak has quit [Ping timeout: 264 seconds]
specing_ has joined ##openfpga
specing has quit [Ping timeout: 265 seconds]
indy has joined ##openfpga
emeb_mac has quit [Quit: Leaving.]
<tnt> Any clue as to what would cause nextpnr runs to be inconsistent with each other when specifying --seed?
<tnt> (same machine ... same binary ... same options ... run it twice and get different output)
<emilazy> bugs, for one :(
<tnt> yeah, unfortunately I can only reproduce this on my laptop and a build takes 40 min there so I can't realistically bisect 5 months of commits (back to the previous build I was using, which didn't exhibit the issue).
<gatecat> I think it's to do with a bad reliance on unordered_map ordering
<gatecat> (most C++ libraries use insertion ordering or something but some don't)
<gatecat> really it needs to be replaced by another structure, ideally, like the Yosys hashlib dict
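A minimal sketch of the pitfall gatecat describes, not nextpnr's actual code: std::unordered_map makes no guarantee about iteration order, so any loop whose result depends on that order can differ between standard library builds. One common workaround, assumed here purely for illustration, is to collect and sort the keys before iterating. The helper name sorted_keys and the cell names in the usage note are hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: return the keys of an unordered_map in sorted order,
// so that a loop over them visits elements in the same order regardless of
// how the standard library lays out its hash buckets.
std::vector<std::string> sorted_keys(const std::unordered_map<std::string, int> &m)
{
    std::vector<std::string> keys;
    keys.reserve(m.size());
    for (const auto &kv : m) // iteration order here is unspecified
        keys.push_back(kv.first);
    std::sort(keys.begin(), keys.end()); // impose a deterministic order
    return keys;
}
```

Two maps holding the same entries, for example {"lut_3", "ff_1", "io_2"} inserted in different orders, yield the same key sequence from sorted_keys, whereas iterating the maps directly may not.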
<tnt> mmm ... I'm using gcc 9.3.0 on both laptop/workstation and they have different behavior. (granted one is the gentoo variant, the other the ubuntu one so no clue what patches/options they ended up being built with).
<gatecat> which one is nondeterministic? I have never seen this here on Arch, so it'd be useful to be able to reproduce
<tnt> my laptop
<tnt> (gentoo)
<tnt> Trying to bisect anyway (even given long build) and first commit results in segfault, not a great start.
<tnt> if you have the same boost version (1.72) I can send you the binary
<gatecat> I'm not sure if a bisect will even help here as this is a bug that's been in there for a long time, if it's the one I'm thinking of
<gatecat> is it possible that the compiler or C++ library has changed? (not that it's a bug on that side, just a bad assumption in nextpnr)
<tnt> I never had this behavior before on my laptop. And no, same exact gcc version.
<gatecat> huh, maybe it is something else then
<tnt> The build from fpga-toolchain is consistent, but it exhibits poor QoR. 17 fails out of 32. (vs 3 for the same oss-cad-suite build, both nightly builds).
<gatecat> is that with the same netlist?
<tnt> yup, same netlist, same pcf, same options.
<tnt> scanning seed from 0 to 31
tokomak has joined ##openfpga
<tnt> From bisect result, I suspect timing: Use new engine for HeAP
<tnt> ebc2527368d920ea3c40a9ca83f73df242785044
<tnt> (hard to be 100% sure because a lot of commits don't build or just plain segfault ...)
<gatecat> OK, that's definitely something I should look into
<gatecat> can you point me to the design? I can probably find a way of triggering the QoR issue here
<tnt> Yup. preparing a package now.
<gatecat> thanks!
<tnt> Hopefully that's clear enough.
<gatecat> tyvm
<gatecat> that should be plenty to reproduce
_whitelogger has joined ##openfpga
<tnt> valgrind shows tons of "Conditional jump or move depends on uninitialised value(s)" and similar.
<tnt> (which it doesn't for fba71bd182151713455c8d1cf0abefea9cf59831 for instance)
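For context on that valgrind message: it is reported when a branch depends on memory that was never written. The log does not say which nextpnr field was affected, so the sketch below is purely illustrative. The struct DelayEstimate and the function is_critical are hypothetical names; the point is that default member initialisers give every instance a defined starting state.

```cpp
// Hypothetical timing record. Without the "= ..." initialisers, a
// default-constructed instance would hold indeterminate values, and any
// branch on them is what valgrind flags as depending on uninitialised data.
struct DelayEstimate
{
    float delay_ns = 0.0f; // defined starting value
    bool valid = false;    // explicitly "not computed yet"
};

// Hypothetical check: only trust the delay once it has been marked valid.
bool is_critical(const DelayEstimate &e, float budget_ns)
{
    return e.valid && e.delay_ns > budget_ns;
}
```

With the initialisers in place, a freshly constructed DelayEstimate is never reported as critical, and the result is the same on every run.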
_whitelogger has joined ##openfpga
Miyu has joined ##openfpga
freeemint has joined ##openfpga
hackkitten has quit [Ping timeout: 272 seconds]
<gatecat> yep, I'm looking into it now
Miyu is now known as hackkitten
egg|cell|egg has quit [Read error: Connection reset by peer]
egg|cell|egg has joined ##openfpga
egg|cell|egg has quit [Ping timeout: 264 seconds]
<tnt> gatecat: The PR fixes the valgrind errors here as well. And it does indeed seem to make the results stable/consistent.
<tnt> It also fixed the "bad" QoR AFAICT. Now on my laptop / self-built nextpnr, I get only 5 out of 63 fails and only very near misses.
<tnt> like ... Max frequency for clock 'clk': 47.99 MHz (FAIL at 48.00 MHz)
<gatecat> tnt: great, thanks for testing!
<tnt> So I guess HeAP was operating with bad timing estimates causing it to do ... bad things.
<tnt> I still seem to get different results on different machines, which is a bit strange. I'll rebuild from scratch to make sure it's not some residue from all the testing.
egg|cell|egg has joined ##openfpga
<tnt> Nope, still different results depending on the machine where I built it ... meh, I guess there must be some subtle build flag or compiler differences or something. In any case, both exhibit statistically similar QoR so it's not an issue, just intriguing.
egg|cell|egg has quit [Ping timeout: 252 seconds]
egg|cell|egg has joined ##openfpga
freeemint has quit [Ping timeout: 252 seconds]
freeemint has joined ##openfpga
<q3k> does nextpnr rely on floating point math? these can also be quite notorious reproducibility killers
<q3k> s,reproducibility,determinism,
<tnt> Yes it does.
<tnt> And that's what I'd suspect ... some SIMD optimization difference could easily explain that since a lot of those are not strictly compliant with IEEE.
<q3k> i mean, if you're going from source, then yeah, a lot of things can cause bit-to-bit differences
<q3k> compilers can choose to emit different instructions for the same high-level code, which are okay according to the C/C++ standards but differ slightly in bit-to-bit results
<mwk> yosys ended up with its own dict/set implementations, specifically to avoid C++ library differences between compilers causing different output
<q3k> and even if you ship the same binary, it can still cause bit-to-bit differences if it eg. does per-cpu-capability dispatch, ie. uses SSE2 for sqrt if present, but defaults to an x87 impl otherwise
<mwk> (with limited success, but eh)
cr1901 has left ##openfpga [##openfpga]
cr1901 has joined ##openfpga
<gatecat> initialiser ordering is the other one that's bitten us before in nondeterminism between compilers/machines
<thaytan> q3k, I remember hitting one floating point bug once that depended on whether a computation stayed in the FPU and was computed in full 80-bit precision the whole way, or was evicted to a floating point register and truncated to 64-bits mid-way
<q3k> thaytan: yep, exactly
<mwk> that's classic, though luckily mostly extinct along with 32-bit x86
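A small illustration of why instruction selection or vectorisation can change floating point results, not taken from nextpnr: floating point addition is not associative, so summing the same values in a different order can give a different answer. The function names sum_forward and sum_backward are made up for this sketch, and the expected values assume ordinary IEEE 754 double arithmetic without options such as -ffast-math.

```cpp
#include <vector>

// Sum left to right: ((v0 + v1) + v2) + ...
double sum_forward(const std::vector<double> &v)
{
    double s = 0.0;
    for (double x : v)
        s += x;
    return s;
}

// Sum right to left: the same values, associated in the opposite order.
double sum_backward(const std::vector<double> &v)
{
    double s = 0.0;
    for (auto it = v.rbegin(); it != v.rend(); ++it)
        s += *it;
    return s;
}
```

For the input {1.0, 1e100, -1e100}, sum_forward returns 0.0 because the 1.0 is absorbed when it is added to 1e100, while sum_backward returns 1.0 because the two large values cancel first. A vectorised reduction regroups additions in a similar way, which is one way two builds of the same source can diverge in the last bits.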
fibmod has quit [Ping timeout: 272 seconds]
_whitelogger has joined ##openfpga
egg|cell|egg has quit [Ping timeout: 264 seconds]
egg|cell|egg has joined ##openfpga
egg|cell|egg has quit [Read error: Connection reset by peer]
egg|cell|egg has joined ##openfpga
freeemint has quit [Ping timeout: 244 seconds]
freemint has joined ##openfpga
SmutLord_ has quit [Read error: Connection reset by peer]
SmutLord_ has joined ##openfpga
freemint has quit [Ping timeout: 272 seconds]
specing has joined ##openfpga
specing_ has quit [Ping timeout: 264 seconds]
freemint has joined ##openfpga
egg|cell|egg has quit [Ping timeout: 264 seconds]
egg|cell|egg has joined ##openfpga
egg|cell|egg has quit [Read error: Connection reset by peer]
egg|cell|egg has joined ##openfpga
egg|cell|egg has quit [Ping timeout: 272 seconds]
egg|cell|egg has joined ##openfpga
esden has quit [*.net *.split]
renze has quit [*.net *.split]
mwk has quit [*.net *.split]
Hoernchen has quit [*.net *.split]
rektide has quit [*.net *.split]
sorear has quit [*.net *.split]
pie_bnc has quit [*.net *.split]
hl has quit [*.net *.split]
keesj has quit [*.net *.split]
renze has joined ##openfpga
keesj has joined ##openfpga
sorear has joined ##openfpga
esden has joined ##openfpga
egg|cell|egg has quit [Read error: Connection reset by peer]
mwk has joined ##openfpga
pie_bnc has joined ##openfpga
hl has joined ##openfpga
Hoernchen has joined ##openfpga
egg|cell|egg has joined ##openfpga
rektide has joined ##openfpga
freemint has quit [Remote host closed the connection]
freemint has joined ##openfpga
freemint has quit [Remote host closed the connection]
freemint has joined ##openfpga
freemint has quit [Remote host closed the connection]
freemint has joined ##openfpga
tokomak has quit [Ping timeout: 272 seconds]
freemint has quit [Remote host closed the connection]
freemint has joined ##openfpga
freemint has quit [Remote host closed the connection]
freemint has joined ##openfpga
freemint has quit [Ping timeout: 244 seconds]
cr1901 has quit [Quit: Leaving.]
cr1901 has joined ##openfpga
cr1901 has quit [Read error: Connection reset by peer]
cr1901 has joined ##openfpga
SmutLord_ has quit [Quit: Leaving]
SmutLord_ has joined ##openfpga
SmutLord_ has quit [Client Quit]
specing has quit [Ping timeout: 264 seconds]
Lord_Nightmare has quit [Quit: ZNC - http://znc.in]
specing has joined ##openfpga
Lord_Nightmare has joined ##openfpga