<davidlt[m]>
somlo: HW issues or software? Or just things running too slowly and thus revealing issues?
<davidlt[m]>
It's actually funny. I used to drop the clock on Xeon servers to the lowest possible setting to find new bugs :)
<davidlt[m]>
Sometimes execution speed hides bugs in the code.
<davidlt[m]>
Actually, running too fast also works, but a lot less often.
jcajka has joined #fedora-riscv
<rwmjones>
davidlt[m]: oh good, it worked
<rwmjones>
it was taking a very long time to be picked up by the builders when I looked yesterday
<davidlt[m]>
rwmjones: yeah, too few builders and too many tasks. In that case Koji prefers to keep producing those SRPMs before actually building anything.
<davidlt[m]>
Annoying. The more builders you have the less likely that is the case.
<davidlt[m]>
rwmjones: are you around? could you reconfigure kojid?
<rwmjones>
davidlt[m]: sorry got to go out, can you email me the details & I can do it later
<davidlt[m]>
rwmjones: ok
<davidlt[m]>
Sample PSU arrived, preparing to test a new builder
<davidlt[m]>
rwmjones: the final production fan on Unmatched is definitely a lot worse :D
<davidlt[m]>
rwmjones: I can feel vibrations on my table from that tiny fan
<davidlt[m]>
I just refreshed my knowledge of how to reprogram the FTDI chip on it (now it has a serial number in ID)
<davidlt[m]>
which causes the irq affinity map stuff to fail, I think
<davidlt[m]>
your final crash seems to be related to timer/irq stuff, so the initial issue might be IRQ related?
<davidlt[m]>
Does your DT match the HW?
<davidlt[m]>
Check your DT for plic0 configuration and in general all refs to plic0 in various devices
<somlo>
yeah, that mask comes from the verilog (and dts) generated by Rocket. For some reason, linux complains about it on the 1-core version (was not complaining when I used 4 cores, but was still crashing :) )
<somlo>
I need to do trial-and-error to figure out what the value is that would be "just right" (be nice if linux told us by *how much* the value is too large :) )
<davidlt[m]>
Are the IRQ count and mapping the same between the 1-core and 4-core versions?
<somlo>
yeah
<davidlt[m]>
if 0xffff is IRQ number, that would be a large number :)
<davidlt[m]>
FU740 has 69 IRQs IIRC
<davidlt[m]>
So where did "hwirq 0xffff" come from?
<davidlt[m]>
That would be too large.
<somlo>
the chisel->verilog elaboration in the rocket chip sources
<davidlt[m]>
More than riscv,ndev
<somlo>
might have been a bug there, and I cut'n'pasted it. I should check if it's still that after I updated the rocket chip sources, thanks for pointing it out
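davidlt's point above (a hwirq of 0xffff would be far more than `riscv,ndev`) can be sketched as a simple range check. This is only an illustration, not the kernel's code: the function name `hwirq_in_range` is made up, it assumes PLIC sources are numbered 1..ndev with 0 reserved, and the value 69 is davidlt's from-memory figure for the FU740.

```python
def hwirq_in_range(hwirq: int, ndev: int) -> bool:
    """Return True if hwirq could be a valid PLIC source.

    Assumption: sources are numbered 1..ndev and source 0 is reserved.
    """
    return 1 <= hwirq <= ndev


# davidlt's "IIRC" figure for the FU740; not verified here.
FU740_NDEV = 69

if __name__ == "__main__":
    value = 0xFFFF  # the value from the "hwirq 0xffff" complaint
    verdict = "in range" if hwirq_in_range(value, FU740_NDEV) else "too large"
    print(f"hwirq {value:#x} ({value}) vs ndev {FU740_NDEV}: {verdict}")
```

The same check with the design's real `riscv,ndev` would show whether a value taken from the generated .dts can be an IRQ number at all.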
<davidlt[m]>
You can probably figure out a bit from looking at kernel source and adding some printk
<somlo>
interestingly enough it's not complaining when I boot my custom kernel, and I use the same value in DT
<somlo>
I remember doing the printk experiment a while back, when it *was* complaining :)
<davidlt[m]>
Do you use the same kernel config?
<davidlt[m]>
But this looks like it failed while init'ing stuff based on the DT
<somlo>
not the same as fedora (a whole lot fewer things enabled :)
<davidlt[m]>
let me get my tea and do a quick git grep on kernel
<davidlt[m]>
well, only debugging can answer that :)
<davidlt[m]>
new board has joined Koji party
esv_ has joined #fedora-riscv
esv has quit [Killed (NickServ (GHOST command used by esv_))]
<javierm>
davidlt[m]: there's no need to compile a debug kernel, I believe pr_debug() is part of the dynamic debug infra so you could just enable that using a kernel cmdline param or sysfs entries
<somlo>
javierm: thanks, I'll study that, should come in handy in the future
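A rough sketch of what javierm describes, assuming the kernel was built with CONFIG_DYNAMIC_DEBUG and debugfs is mounted at /sys/kernel/debug. The helper names are hypothetical, and `irq-sifive-plic.c` is only an example; which source file holds the relevant pr_debug() calls is an assumption.

```python
from pathlib import Path

# Standard dynamic debug control file (requires root and a mounted debugfs).
CONTROL = Path("/sys/kernel/debug/dynamic_debug/control")


def dyndbg_query(source_file: str, enable: bool = True) -> str:
    """Build a dynamic debug query that toggles pr_debug() output for one file."""
    flag = "+p" if enable else "-p"
    return f"file {source_file} {flag}"


def boot_param(source_file: str) -> str:
    """Build the matching kernel command-line parameter for built-in code."""
    return f'dyndbg="{dyndbg_query(source_file)}"'


def apply_query(query: str, control: Path = CONTROL) -> None:
    """Write the query to the control file at runtime (not called here)."""
    control.write_text(query + "\n")


if __name__ == "__main__":
    example = "irq-sifive-plic.c"  # example file name only
    print(dyndbg_query(example))
    print(boot_param(example))
```

The boot parameter form is the useful one for messages printed early in boot, before anything can be written to the control file.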
<somlo>
davidlt[m]: I specifically remember narrowing that down in the past (using printk's :) ) to 0x3F (instead of the default `65535` suggested by the sample .dts automatically generated by chisel)
<somlo>
I'm trying again with that value in .dts (and so far I got no complaints from the kernel during that phase of booting, it's still working its way through the later stages, we'll see how it shakes out)
<somlo>
but ultimately I think I'll have to remember all the chisel I didn't quite comprehend in the first place and figure out why the rocket generator suggests that wrong 0xFFFF (65535) value in the first place, and maybe suggest a fix
<somlo>
side note: generating a matching .dts for the LiteX bitstream is still a semi-manual process, at least when using Rocket as the cpu
<davidlt[m]>
Yeah
<davidlt[m]>
IIRC the generated DTS is not upstream quality for Linux
<somlo>
and fully automating that would strongly benefit from being able to trust the sample .dts produced during chisel's elaboration of rocket sources (and then "pasting" in a bunch of extra device information from LiteX -- mmio register addresses, irq numbers, etc)
<somlo>
which is obviously not the case at the moment :)
<somlo>
maybe there's a bug in the LiteUART driver -- I used to be able to use the sbi console (ecall-ing into machine mode and having the hypervisor -- BBL -- do the console work on my behalf)
<somlo>
wondering if that's something OpenSBI also supports...
<davidlt[m]>
this time you didn't have irq thing during boot
<davidlt[m]>
apart from the final crash :)
<somlo>
yeah, I fixed the offending mask in DT
guerby_ is now known as guerby
<davidlt[m]>
Good, but did it fail the same way?
<davidlt[m]>
I don't recall the 1st log.
<somlo>
same-ish -- most of my crashes are some interrupt thing, most of the time related to the uart
davidlt has quit [Ping timeout: 252 seconds]
jcajka has quit [Quit: Leaving]
davidlt has joined #fedora-riscv
<davidlt[m]>
nirik rwmjones djdelorie : I sent an email about kojid reconfiguration
<davidlt[m]>
Feel free to kill any running job. There are none right now. Not sure if Miro will send something for the night.
<nirik>
changes made here.
<djdelorie>
davidlt[m]: the plugin is giving me errors: AttributeError: 'NoneType' object has no attribute 'origin'
<djdelorie>
do I need to install it?
<davidlt[m]>
could you check that the syntax is ok?
<davidlt[m]>
I had this before, but it was wrong syntax in kojid.conf
<davidlt[m]>
Has to be: plugins = rpmautospec_builder
<davidlt[m]>
disk image already has everything installed
<djdelorie>
yeah, the file had quotes around it
<davidlt[m]>
that's an old typo :)
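Why the quotes matter can be illustrated with Python's `configparser`, which keeps quotes as part of the value. This is a sketch under assumptions: it treats kojid.conf as an INI-style file with a `[kojid]` section and does not reproduce how kojid actually loads plugins; `read_plugins` is a made-up helper.

```python
import configparser


def read_plugins(conf_text: str) -> str:
    """Return the raw 'plugins' value from a kojid.conf-style snippet."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    return parser.get("kojid", "plugins")


GOOD = "[kojid]\nplugins = rpmautospec_builder\n"
BAD = '[kojid]\nplugins = "rpmautospec_builder"\n'

if __name__ == "__main__":
    print(repr(read_plugins(GOOD)))  # plain plugin name
    print(repr(read_plugins(BAD)))   # quotes are kept as part of the value
```

With the quoted form, anything that looks the plugin up by name would be searching for a name that includes the quote characters.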
<djdelorie>
rebooted, waiting to see if the timeout works