dgilmore changed the topic of #fedora-riscv to: Fedora on RISC-V https://fedoraproject.org/wiki/Architectures/RISC-V || Logs: https://libera.irclog.whitequark.org/fedora-riscv || Alt Arch discussions are welcome in #fedora-alt-arches
jednorozec has quit [Read error: Connection reset by peer]
jednorozec_ has joined #fedora-riscv
jednorozec_ is now known as jednorozec
davidlt has joined #fedora-riscv
<davidlt> test
<conchuod> Did you find your Unmatched issue, davidlt?
<conchuod> And is it the same as the one Samuel reported?
<davidlt> Dammit. Now I don't know where to reply :) Matrix/IRC :)
<davidlt> No. I don't know why it's not booting. I have another two kernels building for today's testing.
<davidlt> Which issue was reported by Samuel?
<davidlt> It looks like this: https://paste.centos.org/view/598f9f02
<davidlt> on RC4 it doesn't even panic, just nothing happens.
<davidlt> A few things I tried so far: (1) moving NR_CPUS from 512 -> 32 (back to old default), (2) attempted to disable IRQ_STACKS, but failed, (3) disable vector support (just in case, but it shouldn't cause this).
<davidlt> There was also something strange: IRQ_STACKS depends on CONFIG_EXPERT, but that wasn't enabled. I got that enabled in kernel-ark to see if it has any effect.
<davidlt> There are a couple of IRQ-related patchsets on the linux-riscv mailing list too. I might try those later too.
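[Editor's note] The three experiments davidlt lists above can be reproduced with the kernel tree's `scripts/config` helper. This is a hedged sketch, not his exact procedure; the symbol name RISCV_ISA_V for vector support is an assumption about the config in question.

```shell
# Sketch: the three config experiments from the chat, applied with the
# kernel's scripts/config helper (run from a configured kernel source tree).
./scripts/config --set-val NR_CPUS 32   # (1) back to the old default of 32
./scripts/config --enable EXPERT        # (2) needed before IRQ_STACKS becomes editable
./scripts/config --disable IRQ_STACKS
./scripts/config --disable RISCV_ISA_V  # (3) drop vector support, just in case
make olddefconfig                       # re-resolve dependencies
```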
<conchuod> Hm, I figure that's unrelated to his issues with has_fpu()
<davidlt> Yeah
<davidlt> I could give you my defconfig if you want to try locally.
<davidlt> Well, maybe after I test the next two kernels.
<davidlt> The next kernel will be ready in ~3 hours, another one in ~6 hours.
<davidlt> conchuod, I also found that OpenSBI v1.3-40-gc2e6027 + v6.4.7 doesn't work on Unmatched (didn't test QEMU yet).
<davidlt> I pinged Anup already about this.
<conchuod> You need 1.3.1
<davidlt> This is newer.
<davidlt> Basically with the OpenSBI master branch you cannot bring secondary cores online.
<conchuod> Oh dear
<davidlt> v1.3.1 works fine.
<conchuod> 1.3.0 wouldn't have booted either
<davidlt> Yeah, I did some catching up on emails after I came back :)
<conchuod> It's as if most of that stuff doesn't get tested outside of qemu...
<davidlt> I typically run OpenSBI master as it used to be stable for years.
<conchuod> I mostly don't update opensbi, too much hassle on most of my stuff
<davidlt> Well, Debian and Fedora depend on the Unmatched for now :)
<davidlt> Nothing really changes here until SOPHGO stuff is available (and upstreamed).
<davidlt> Or some new board shows up (Horse Creek?)
<conchuod> I'll believe horse creek is happening when they ship
<davidlt> Yup, until hardware is in my hand it doesn't really exist.
<davidlt> Upstream support is also a highly important bit. Hardware with no upstream software is just wasted money in most cases.
<davidlt> It seems that StarFive keeps pushing forward.
<davidlt> I saw a tweet/post where someone mentioned that a good board + availability + upstream support == a popular board.
<davidlt> I really hope that JH8100-series will push the bar higher.
<conchuod> Hopefully the 8100 stuff reuses the same IP :)
<davidlt> That would be smart.
<davidlt> I'm a bit more worried about the cores themselves. Those are in-house this time.
<davidlt> I noticed your tests, is anyone allowed to add their own tests under "<username>/test"?
<conchuod> davidlt: I don't know if anyone can, and to be honest I hope they can't.
<davidlt> :)
<conchuod> Could spam the thing with random crap if it was possible.
<conchuod> I am a maintainer on patchwork, I don't know if I could push those checks otherwise.
<conchuod> davidlt: Is there something you want that is missing checks wise?
<conchuod> Bjorn and I have been trying to move to a different type of infra (using gitlab CI rather than based on NIPA) so that it can make use of more than just the one x86 build machine it is currently running on.
<conchuod> When that happens, plan is to add actual boot tests etc.
<davidlt> I was thinking it would be cool to boot distro(s) defconfig kernels + stress-ng + selftests. Get some real-life sanity checks.
<conchuod> The selftests take far too long to run
<conchuod> IMO, that kind of thing is better run against linux-next.
<conchuod> Not against every single patch
<davidlt> stress-ng is relatively fast (I typically do 2-4 hours run for a quick test).
<conchuod> 2-4 hours for every single patch is not happening.
<davidlt> If it cannot pass that it's typically a bad thing.
<davidlt> Actually vforkmany from stress-ng kills the kernel today.
<conchuod> I think we averaged something like 1000 patches per month this year sent to the linux-riscv list.
<conchuod> So multiply that by whatever time a new test takes to run ;)
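[Editor's note] The multiplication conchuod suggests works out roughly as follows, a back-of-the-envelope sketch using only the figures mentioned in the chat:

```shell
# ~1000 patches/month, 2-4 hours of testing per patch (figures from the chat)
patches=1000
low=$((patches * 2)); high=$((patches * 4))
echo "machine-hours per month: $low-$high"
# A month is only ~730 wall-clock hours, so even the low end needs
# several machines doing nothing but running these tests.
echo "machines needed at the low end: $(( (low + 729) / 730 ))"
```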
<davidlt> It still sounds like this lacks proper funding from vendors.
<conchuod> Oh totally.
<davidlt> I just keep seeing those large farms from Linaro in front of my eyes.
<davidlt> I have free machines to spin another test kernel. I am pulling in IRQ related patches mostly.
<davidlt> This one sounds a bit more risky.
<conchuod> But I still think things like stressng should be run against next or similar, not on every single patch sent to the list.
<davidlt> Fedora kernels are bumped almost daily (but I am not rebasing things daily in RISCV land).
<davidlt> I am trying to boot v6.5-rc to attempt a PR to upstream Fedora kernel-ark (unless I lose interest again).
<conchuod> In case I am not explaining things properly, the patchwork stuff applies things one patch at a time & runs the tests on each patch individually. 2-4 hours for a test means 20-40 hours for a 10 patch series.
<davidlt> Well, somehow v6.5 kernels take me 20-22 hours. I cannot afford testing a single patch right now :)
<conchuod> We are hoping (or were hoping?) that RISE would be able to get some build machines sorted. That's the motivation for moving to gitlab. But I'm not investing the time in moving things until someone tells me if it is happening, because even though I run that stuff, I am not in the loop :)
<conchuod> > It still sounds like this lacks proper funding from vendors.
<conchuod> 🤡
<davidlt> It's probably too early for anything RISE involved. I bet it takes months (more?) before you start seeing fruits of that work.
<davidlt> What's your opinion on Samuel's patch "irqchip/sifive-plic: Avoid clearing the per-hart enable bits"?
<davidlt> I pulled his other patch, Guo's CONFIG_FRAME_POINTER fixes, and the IRQ restore patch too.
<conchuod> I have not even looked at it. Trying to tread water in terms of stuff to do.
<davidlt> Ah, let's skip it for now.
<conchuod> I need to look at the starfive pcie stuff too
<davidlt> That would be nice.
<davidlt> and a new kernel is cooking, something to test tomorrow.
<conchuod> davidlt: linux next is not booting, so there goes my morning.
<davidlt> conchuod, sounds like fun ;)
jcajka has joined #fedora-riscv
masami has joined #fedora-riscv
<davidlt> The 1st kernel arrived for testing (most likely will fail)!
masami has quit [Quit: Leaving]
zsun has joined #fedora-riscv
<davidlt> and 2nd one is almost ready for testing
zsun has quit [Quit: Leaving.]
<davidlt> good news and bad news
<davidlt> with kernel-ark CONFIG_EXPERT set the kernel seems to be alive for a lot longer
<davidlt> sadly still fails the same
<davidlt> the next kernel will have a bunch of not-yet-merged/reviewed IRQ-related patches in
<conchuod> I had a look at samuel's irq patch today, or at least gave it a run out, and it seemed okay to me.
<davidlt> I didn't incl. that one, but pulled 4 others related to IRQ
jcajka has quit [Quit: Leaving]
<conchuod> Did you pick up the IRQ STACK stuff?
<davidlt> Yes
<davidlt> I bet that's it, but I couldn't easily disable it in kernel-ark
<davidlt> Something was forcing it =y
<davidlt> it depends on CONFIG_EXPERT, but somehow in the final config CONFIG_EXPERT was disabled.
<davidlt> Not sure how that could happen if CONFIG_IRQ_STACKS depends on CONFIG_EXPERT.
<conchuod> what is a "kernel-ark"?
<conchuod> Yah, it is default Y, but you can't user-edit it unless CONFIG_EXPERT is set.
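[Editor's note] The behaviour conchuod describes matches a common Kconfig pattern: `default y` with a prompt that only appears under EXPERT. A sketch of what the riscv IRQ_STACKS entry presumably looks like, reconstructed from the conversation rather than quoted from the tree:

```kconfig
config IRQ_STACKS
	bool "Independent irq & softirq stacks" if EXPERT
	default y
```

With EXPERT unset the prompt is hidden, so the symbol silently keeps its default of y, which would explain why it could not be disabled from kernel-ark.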
<davidlt> Always Ready Kernel. It's a common place for all Fedora, ELN, RHEL, etc. kernels.
<davidlt> I had CONFIG_IRQ_STACKS=y, and CONFIG_EXPERT is not set in the final config for the 1st kernel IIRC.
<davidlt> conchuod, here is what I get now with CONFIG_EXPERT enabled: https://paste.centos.org/view/31467f1a
<conchuod> Does it die differently each time?
<davidlt> Which is way better (as I get some output), but the issue is the same as with rc3.
<davidlt> No, seems to be the same and stable right now.
<davidlt> I will hit reset one more time.
<conchuod> I assume bisection is a shite with this ARK thing that takes hours?
<davidlt> Days
<davidlt> bisection with kernel-ark would be a nightmare.
<davidlt> Technically I don't know right now if that would be possible.
<davidlt> Maybe (?)
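[Editor's note] Automated bisection is mechanical once a scriptable pass/fail test exists; the blocker here is purely the 20+ hour build cycle. A toy demonstration of `git bisect run`, where the repo, the commits, and the check script are all invented for illustration:

```shell
# Build a throwaway repo where commits 4 and 5 introduce a "bug",
# then let git bisect find the first bad commit automatically.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email ci@example.com
git config user.name ci
for i in 1 2 3 4 5; do
    echo "$i" > state
    git add state
    git commit -qm "commit $i"
done
# The stand-in "boot test": succeeds (exit 0) only while state is below 4.
printf '%s\n' '#!/bin/sh' 'test "$(cat state)" -lt 4' > check.sh
chmod +x check.sh
git bisect start HEAD HEAD~4 >/dev/null   # bad = tip, good = first commit
git bisect run ./check.sh >/dev/null
echo "first bad: $(git show -s --format=%s refs/bisect/bad)"
```

With a real kernel, check.sh would be a build-plus-boot script; at 20 hours per step even a short bisect over a handful of steps runs to days, as davidlt says.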
<davidlt> Rebooting the board gives me exact same kernel crash.
<Esmil> davidlt: sorry, my scrollback isn't long enough. by any chance do you disable CONFIG_FRAME_POINTER and hit this: https://lore.kernel.org/linux-riscv/20230716001506.3506041-1-guoren@kernel.org/
<davidlt> FRAME_POINTER=y in the config, but I did pull in the patches related to this for the next build
<Esmil> ah, cool
<davidlt> Sadly it's ~14 hours away
<davidlt> Oh, this time I get no output from the kernel.
<davidlt> Died way too early.
<conchuod> 6.5 seems like a bit of a delightful mess that I have somehow avoided.
<davidlt> Yeah, this will waste a lot of time.
<davidlt> Well, at least this time I jumped into testing way earlier than usual :)
<davidlt> I just started building another test kernel with more patches piled on top
droidrage has joined #fedora-riscv
somlo has quit [Remote host closed the connection]
somlo has joined #fedora-riscv
davidlt has quit [Ping timeout: 245 seconds]
pbsds has quit [Ping timeout: 260 seconds]
pbsds has joined #fedora-riscv
marinaro has joined #fedora-riscv