sorear changed the topic of #riscv to: RISC-V instruction set architecture | https://riscv.org | Logs: https://libera.irclog.whitequark.org/riscv
Trifton has joined #riscv
elastic_dog has quit [Ping timeout: 248 seconds]
elastic_dog has joined #riscv
jjido has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
crabbedhaloablut has quit [Write error: Connection reset by peer]
crabbedhaloablut has joined #riscv
rodrgz has joined #riscv
rodrgz has quit [Client Quit]
rodrgz has joined #riscv
rodrgz has quit [Ping timeout: 250 seconds]
rodrgz has joined #riscv
jacklsw has joined #riscv
vagrantc has quit [Quit: leaving]
matoro has quit [Remote host closed the connection]
matoro has joined #riscv
matoro has joined #riscv
matoro has quit [Changing host]
joev has quit [Ping timeout: 268 seconds]
joev has joined #riscv
davidlt has joined #riscv
ssb has quit [Ping timeout: 265 seconds]
joev has quit [Ping timeout: 265 seconds]
joev has joined #riscv
rodrgz has quit [Quit: WeeChat 3.5]
davidlt has quit [Ping timeout: 265 seconds]
crabbedhaloablut has quit [Remote host closed the connection]
crabbedhaloablut has joined #riscv
GenTooMan has quit [Ping timeout: 264 seconds]
BootLayer has joined #riscv
radu242 has quit [Ping timeout: 265 seconds]
hrberg has joined #riscv
jack_lsw has joined #riscv
davidlt has joined #riscv
jacklsw has quit [Ping timeout: 264 seconds]
bauruine has joined #riscv
dor has joined #riscv
jack_lsw1 has joined #riscv
jack_lsw has quit [Ping timeout: 252 seconds]
FL4SHK has quit [Ping timeout: 265 seconds]
dor has quit [Ping timeout: 250 seconds]
<bjdooks> conchuod: what's the issue?
FL4SHK has joined #riscv
jack_lsw2 has joined #riscv
jack_lsw1 has quit [Ping timeout: 250 seconds]
dor has joined #riscv
<conchuod> bjdooks: a boot failure that seems to happen at about a 1/5 frequency
<conchuod> System just hangs with nothing on the uart at all
<conchuod> It's late enough in boot that I'd expect some output & I've not managed to repro it with a debugger connected
<conchuod> I've got about 5 steps left in the bisection I was doing last night so hopefully I am at least close...
jack_lsw2 has quit [Quit: Back to the real world]
jacklsw has joined #riscv
<drmpeg> When you build with clang, do you use CC=clang or LLVM=-14?
<bjdooks> I'm still stuck on 5.17 so can't really help
pecastro has joined #riscv
<conchuod> I do LLVM=1 drmpeg
<drmpeg> Ok, I'll give it a try.
<conchuod> Also, I managed to repro it with gcc. I was thinking it was a clang thing, but it was just a fluke that CI never hit it with gcc
<conchuod> bjdooks: ye nw, I think I'm nearly there...
<bjdooks> depends which CI, but some of it runs with a lot of debug features on which slow things down and can make bugs appear/disappear
<bjdooks> the uaccess fault is amplified by a lot of the test code when I last had a deep dive of riscv
<conchuod> Yeah, that's my problem if I connect a debugger or if I turn on the sanitisers etc
dor has quit [Ping timeout: 268 seconds]
<conchuod> drmpeg: I've done CC=clang stuff too, both should work for you. There is a reported bug for CC=clang with llvm15 and binutils < 2.39 if zicbom is enabled though just FYI
<drmpeg> Gotcha.
ssb has joined #riscv
jmdaemon has quit [Ping timeout: 260 seconds]
GenTooMan has joined #riscv
qwestion has quit [Ping timeout: 268 seconds]
jjido has joined #riscv
Andre_H has joined #riscv
jjido has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
TMM_ has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
TMM_ has joined #riscv
GenTooMan has quit [Ping timeout: 264 seconds]
jacklsw has quit [Ping timeout: 244 seconds]
<pbsds> pabs2: thanks!
dor has joined #riscv
jobol has joined #riscv
dor has quit [Ping timeout: 268 seconds]
zkrx has quit [Ping timeout: 268 seconds]
dor has joined #riscv
zkrx has joined #riscv
dor has quit [Remote host closed the connection]
GenTooMan has joined #riscv
pecastro has quit [Ping timeout: 244 seconds]
prabhakarlad has quit [Quit: Client closed]
prabhakarlad has joined #riscv
pecastro has joined #riscv
zkrx has quit [Ping timeout: 264 seconds]
GenTooMan has quit [Ping timeout: 244 seconds]
GenTooMan has joined #riscv
CYKS has quit [Quit: Ping timeout (120 seconds)]
CYKS has joined #riscv
jjido has joined #riscv
TMM_ has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
TMM_ has joined #riscv
Maylay has quit [Ping timeout: 268 seconds]
jjido has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
BootLayer has quit [Quit: Leaving]
Maylay has joined #riscv
ElementW has quit [Read error: Connection reset by peer]
ElementW has joined #riscv
bauruine has quit [Remote host closed the connection]
qwestion has joined #riscv
prabhakarlad has quit [Quit: Client closed]
prabhakarlad has joined #riscv
mort has left #riscv [The Lounge - https://thelounge.chat]
mps has quit [Ping timeout: 265 seconds]
Trifton has quit [Ping timeout: 252 seconds]
mps has joined #riscv
wgrant has quit [Ping timeout: 244 seconds]
dor has joined #riscv
Maylay has quit [Ping timeout: 265 seconds]
wgrant has joined #riscv
Andre_H has quit [Quit: Leaving.]
BootLayer has joined #riscv
Trifton has joined #riscv
Maylay has joined #riscv
jacklsw has joined #riscv
jack_lsw has joined #riscv
jack_lsw has quit [Client Quit]
jack_lsw has joined #riscv
jack_lsw has quit [Client Quit]
vagrantc has joined #riscv
jacklsw has quit [Ping timeout: 260 seconds]
Maylay has quit [Ping timeout: 265 seconds]
Maylay has joined #riscv
Trifton has quit [Ping timeout: 252 seconds]
jacklsw has joined #riscv
Trifton has joined #riscv
aburgess has quit [Ping timeout: 265 seconds]
aerkiaga has joined #riscv
jjido has joined #riscv
Trifton_ has joined #riscv
jjido has quit [Client Quit]
Trifton has quit [Ping timeout: 264 seconds]
prabhakarlad has quit [Quit: Client closed]
Trifton_ has quit [Read error: Connection reset by peer]
jjido has joined #riscv
rodrgz has joined #riscv
aburgess has joined #riscv
jacklsw has quit [Read error: Connection reset by peer]
rodrgz has quit [Read error: Connection reset by peer]
rodrgz has joined #riscv
<conchuod> uhh so bisect has pointed at the merge of the netdev tree this time
<conchuod> ...
<conchuod> There is 100% a change in printk behaviour between fed0d9f13266a22ce1fc9a97521ef9cdc6271a23 and 5e8379351dbde61ea383e514f0f9ecb2c047cf4e
<conchuod> I guess I may be hitting bisect struggling with merge commits /and/ a hard to repro bug
<conchuod> Hard to know if the printk thing is related though...
jmdaemon has joined #riscv
<conchuod> The only commit in my good -> bad range that touches arch/riscv is e83031564137 ("riscv: Fix ALT_THEAD_PMA's asm parameters")
<conchuod> nathanchance:
<nathanchance> conchuod: That's unfortunate... I do seem to recall there being a series of printk reverts but I cannot remember which cycle they went into (5.19 vs. 6.0). It is completely possible I botched something with e83031564137 though.
<nathanchance> Disabling CONFIG_ERRATA_THEAD_PBMT would be an easy way to confirm though
<conchuod> Yeah, the threaded printers went in for 5.19-rc1 & came out later on.
<conchuod> I think I am just in a random branch and that behaviour was side tracking me
<jrtc27> it's funny how git was invented for linux development, which is heavily merge-based, but git bisect sucks for merges
<conchuod> I have CONFIG_RISCV_ISA_SVPBMT=y but not CONFIG_ERRATA_THEAD_PBMT in my config
<jrtc27> linear branch history has its benefits...
<conchuod> It's doubly bad here Jess b/c ~1-in-5 boot failure so it's very easy to screw it up
<jrtc27> how's your memory of stats?
<muurkha> ugh, I hate those
<muurkha> not stats, flaky failures
<conchuod> memory of stats classes is not great, ability to estimate probabilities is worse
<conchuod> reverted e83031564137, it ain't that. You're off the hook nathanchance :)
<nathanchance> bisecting flakey issues is the worst experience though
<jrtc27> well log(p)/log(0.2) tells you how many times you need to run to have a p chance of all being false-negatives for a 0.2 false-negative rate
<jrtc27> oh I guess yours is 0.8 not 0.2
<conchuod> btw, your issue may be unrelated - dunno if http://lists.infradead.org/pipermail/opensbi/2022-July/003019.html is more likely to be your problem than what I am seeing
<jrtc27> ~13 runs for 0.05
<conchuod> I've been doing 11 or 12
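The arithmetic jrtc27 describes can be sketched in a few lines of Python. This is only an illustration, assuming every boot of a bad kernel hangs independently with the roughly 1-in-5 rate quoted above; the function names are made up for this sketch.

```python
import math

def miss_probability(failure_rate, boots):
    """Chance that `boots` boots in a row all succeed on a kernel that is actually bad.

    Assumes boots are independent and each one hangs with probability failure_rate.
    """
    return (1.0 - failure_rate) ** boots

def boots_needed(failure_rate, max_miss):
    """Smallest number of clean boots that brings the miss chance down to max_miss or below."""
    return math.ceil(math.log(max_miss) / math.log(1.0 - failure_rate))
```

Under those assumptions, `miss_probability(0.2, 11)` and `miss_probability(0.2, 12)` come out at about 0.086 and 0.069, 13 boots gives about 0.055, and `boots_needed(0.2, 0.05)` returns 14, the first count that gets strictly below 5%.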
<conchuod> nathanchance: "btw, your issue may be unrelated " was directed at you
<muurkha> conchuod: https://en.wikipedia.org/wiki/Fisher's_exact_test is in theory what you need to tell you how much confidence you should have that two sequences of attempted boots with different versions have different probabilities of success
<muurkha> I think
jjido has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
<conchuod> I really need to get an automated bisection setup going.
<nathanchance> conchuod: Thanks for the link. I think we are using QEMU 7.1.0 in CI so I would expect that to be fixed but not sure...
<muurkha> like, if with a known broken version you had 2 failures out of 9 boot attempts, and with a version you're trying now, you've had 0 failures out of 8, how confident should you be that this version changed the chance of a boot failure
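muurkha's example can be worked through with a small standard-library sketch of the two-sided Fisher's exact test. The function name and argument order are assumptions made for this illustration, and it uses the common convention of summing every table that is no more likely than the observed one.

```python
from math import comb

def fisher_exact_two_sided(fail_a, ok_a, fail_b, ok_b):
    """Two-sided Fisher's exact test p-value for a 2x2 table of boot outcomes.

    Group A and group B are two kernel versions; each boot either fails or is ok.
    """
    n_a = fail_a + ok_a
    n_b = fail_b + ok_b
    total_fail = fail_a + fail_b
    total = n_a + n_b

    def prob(k):
        # Hypergeometric probability that k of the failures fall in group A
        return comb(total_fail, k) * comb(total - total_fail, n_a - k) / comb(total, n_a)

    observed = prob(fail_a)
    lowest = max(0, total_fail - n_b)
    highest = min(total_fail, n_a)
    # Add up every possible table that is no more likely than the observed one
    # (the small tolerance guards against floating-point ties)
    return sum(p for p in map(prob, range(lowest, highest + 1))
               if p <= observed * (1 + 1e-9))
```

For 2 failures out of 9 boots against 0 failures out of 8, `fisher_exact_two_sided(2, 7, 0, 8)` gives 8/17, about 0.47, so those 17 boots alone are weak evidence that the failure rate changed.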
<conchuod> At this point I don't know how far back I need to bisect haha
<conchuod> I *thought* 6.0-rc1 was good, now I don't even think v5.19-rc1 is
<conchuod> I should have kept track of what the number of boots to find a failure was & do 12 no matter what etc
<muurkha> at least keep a record of all the attempted boots and what conditions you did them under
<muurkha> one of the nice things about "computer science" is that often we don't have to do any science, because science is laborious
<muurkha> flaky bugs are one of the exceptions. you have to do actual science things like keeping a lab notebook with the results of all your experiments in order to make progress
<conchuod> If I don't figure this out tonight, I'll rip my setup up this weekend so that I can automate this.
<conchuod> I have the relay etc that I need to do power cycles & then I can run expect scripts instead of doing things manually
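One way such automation could plug into `git bisect run` is sketched below. This is not conchuod's script: `boot_once` is a hypothetical callable standing in for whatever power-cycles the board and watches the UART, and the default of 14 attempts is an assumption based on a roughly 1-in-5 failure rate. The exit codes are the ones `git bisect run` documents: 0 for good, 125 for "cannot test, skip", and other values from 1 to 127 for bad.

```python
GOOD, BAD, SKIP = 0, 1, 125  # exit codes that `git bisect run` interprets

def bisect_exit_code(boot_once, attempts=14):
    """Boot the current commit up to `attempts` times and classify it.

    boot_once is a hypothetical callable: it returns True if the board
    booted, False if it hung, and None if the attempt could not be judged
    (for example the kernel failed to build or the board never powered on).
    """
    for _ in range(attempts):
        outcome = boot_once()
        if outcome is None:
            return SKIP   # tell git bisect to skip this commit
        if not outcome:
            return BAD    # a single hang is enough to mark the commit bad
    return GOOD           # every attempt booted; may still be a false "good"
```

A wrapper script would end with something like `sys.exit(bisect_exit_code(boot_once))` and be passed to `git bisect run`. Keeping the attempt count fixed for every commit also addresses the "do 12 no matter what" point above.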
prabhakarlad has joined #riscv
rodrgz has quit [Quit: WeeChat 3.5]
aburgess has quit [Ping timeout: 252 seconds]
jjido has joined #riscv
zkrx has joined #riscv
davidlt has quit [Ping timeout: 244 seconds]
gordonDrogon has quit [Ping timeout: 265 seconds]
crabbedhaloablut has quit [Quit: No Ping reply in 180 seconds.]
aburgess has joined #riscv
crabbedhaloablut has joined #riscv
gordonDrogon has joined #riscv
<bjdooks> does it happen under qemu?
BootLayer has quit [Quit: Leaving]
<conchuod> bjdooks I ran 2000 reboots in qemu and could not hit it
<conchuod> I ripped up my board "farm" and put a relay in it cos I got frustrated, currently writing an expect script to automate this
zjason` has joined #riscv
zjason has quit [Ping timeout: 265 seconds]
GenTooMan has quit [Ping timeout: 248 seconds]
aerkiaga has quit [Remote host closed the connection]
rodrgz has joined #riscv
jjido has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
GenTooMan has joined #riscv
<conchuod> Right this is so much faster now...
gordonDrogon has quit [Ping timeout: 265 seconds]
GenTooMan has quit [Ping timeout: 244 seconds]
gordonDrogon has joined #riscv
gordonDrogon has quit [Ping timeout: 246 seconds]
gordonDrogon has joined #riscv
GenTooMan has joined #riscv
jobol has quit [Remote host closed the connection]
gordonDrogon has quit [Ping timeout: 265 seconds]
GenTooMan has quit [Ping timeout: 264 seconds]
rodrgz has quit [Ping timeout: 246 seconds]
gordonDrogon has joined #riscv
rodrgz has joined #riscv
GenTooMan has joined #riscv
pecastro has quit [Ping timeout: 250 seconds]
prabhakarlad has quit [Ping timeout: 252 seconds]
Trifton has joined #riscv