<dlan>
drewfustini: ffffaf8000000000 is the first address of the direct mapping, so it should correspond to the start of DDR (the first physical address)? check the page table for the details if possible?
<Tenkawa>
Got my Star64 in hand now... time to start taking a look
<palmer>
drewfustini, dlan, conchuod: can you guys post on LKML if you're getting a concrete failure here? There was a post on sw-dev too, looks like there's some undocumented memory layout issues we've hit
<drewfustini>
I was moving from 6.2 to 6.4-rc1 when I noticed it. It stops the boot. I since tried 6.3 and it works okay.
<drewfustini>
I need to bisect to see where the problem started between 6.3..6.4-rc1
<drewfustini>
*I have since tried 6.3 [..]
<conchuod>
6.3.0 is fine drew?
<drewfustini>
Yes, it works okay.
meta-coder has quit [Ping timeout: 256 seconds]
MaxGanzII_ has joined #riscv
<drewfustini>
One caveat is that this SoC has errata where TVAL is not sign extended correctly. I had been observing this as an oops on invalid virtual address for badaddr. The "fix" was to sign extend bad addr in do_page_fault (a todo is to correctly use alternatives in the future). This has been working okay in 6.0, 6.1, 6.2, 6.3. One difference in 6.4-rc1 is that the name of the page fault function changed.
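A minimal sketch of the sign-extension idea drewfustini describes, not the actual patch. It assumes Sv48 (48 significant virtual-address bits), which is consistent with the ffffaf8000000000 direct-mapping address mentioned earlier; `VA_BITS` and `sign_extend_va` are hypothetical names chosen for illustration.

```c
#include <stdint.h>

/* Assumption: Sv48, so bit 47 is the sign bit of a virtual address. */
#define VA_BITS 48

/*
 * Hypothetical helper: copy bit (VA_BITS - 1) into every higher bit,
 * discarding whatever the hardware put there.
 */
static inline uint64_t sign_extend_va(uint64_t addr)
{
    const uint64_t sign = 1ULL << (VA_BITS - 1);
    const uint64_t low = addr & ((1ULL << VA_BITS) - 1);

    /* (low ^ sign) - sign sign-extends using only unsigned arithmetic. */
    return (low ^ sign) - sign;
}
```

For example, a TVAL reported as 0x0000af8000000000 would come back as 0xffffaf8000000000, while a user-space address with bit 47 clear is returned unchanged.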
<dh`>
how do people ship hardware with such basic blunders? did they not boot linux even once on any test of their cpu ever?
<drewfustini>
It is a pretty trivial change so I am not leaning towards this sign-extend hack being the problem
<drewfustini>
It is not shipping :)
<drewfustini>
It's an internal project.
<drewfustini>
I mention it because I noticed the problem with 6.4-rc1
<dh`>
ah, so you are the lucky guy to run that test
<dh`>
:-)
<drewfustini>
yeah :)
<drewfustini>
I still need to bisect but I wanted to mention the caveat that I do have this "hacky" patch on top.
<jrtc27>
the U74 has that erratum
<jrtc27>
but for some unspecified subset of the time
<jrtc27>
ah no it is specified
<jrtc27>
for instruction faults
<drewfustini>
Yes, the SiFive erratum has a similar fix. I tried for a while to get a similar alternative to work like they did but had no luck. Most likely I just don't understand C macro stuff well enough
<jrtc27>
which you don't notice because normally you don't take such faults
<jrtc27>
but how this isn't tested by architectural tests...
<jrtc27>
(I know how, because riscv has crap testing)
<drewfustini>
Anyways, I punted until later as this simple hack is good enough for internal use for now
<jrtc27>
well, the trouble is, that's not a good fix
<jrtc27>
because then it means using a zero-extended address doesn't fault like it should
<jrtc27>
and potentially you end up in an infinite trap loop if it originated in the kernel
<drewfustini>
I think the SiFive one is CIP 453: ./arch/riscv/errata/sifive/errata_cip_453.S
<jrtc27>
(kernel accesses zero-extended address, kernel page fault handler says looks good to me, try again, fault reoccurs, repeat)
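The concern jrtc27 raises can be illustrated with a small model. This is a sketch under assumptions: it assumes Sv48 and models the erratum as the hardware reporting TVAL with the upper bits cleared; `reported_tval`, `handler_view`, and `sign_extend_va` are hypothetical names, not kernel functions.

```c
#include <stdint.h>

/* Assumption: Sv48, so bit 47 is the sign bit of a virtual address. */
#define VA_BITS 48
#define VA_MASK ((1ULL << VA_BITS) - 1)

/* Hypothetical helper: replicate bit (VA_BITS - 1) into the upper bits. */
static uint64_t sign_extend_va(uint64_t addr)
{
    const uint64_t sign = 1ULL << (VA_BITS - 1);

    return ((addr & VA_MASK) ^ sign) - sign;
}

/* Assumed model of the erratum: TVAL arrives with the upper bits cleared. */
static uint64_t reported_tval(uint64_t accessed)
{
    return accessed & VA_MASK;
}

/* What a fault handler sees if it blindly sign-extends the reported TVAL. */
static uint64_t handler_view(uint64_t accessed)
{
    return sign_extend_va(reported_tval(accessed));
}
```

Under this model, an access to the valid kernel address 0xffffaf8000001000 and an access to the non-canonical, zero-extended address 0x0000af8000001000 both reach the handler as 0xffffaf8000001000. The handler cannot tell them apart, so the bad access is treated as resolvable and can fault again.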
<drewfustini>
I tried to do something similar for the TVAL sign extend issue that I have but I couldn't get it to work. I'm just carrying this arch/riscv/mm/fault.c hack for now.
<jrtc27>
you can only get away with it for instruction fetch because mepc gives you the sign of PC
<jrtc27>
(sidenote: it drives me nuts that sifive still ask for personal data to download that file...)
<jrtc27>
(but it's just there in the javascript to scrape the url of and direct link to...)
<palmer>
drewfustini: if it's an internal erratum then we can't do much about it upstream. Do you know if it's the same one as the sw-dev post?
<palmer>
jrtc27: if you just Google for the PDF title then the first result skips the SiFive pages and goes straight to the CDN
<jrtc27>
we have a link on the freebsd wiki
<drewfustini>
The issue I saw was different. It gets into the linux boot. Mounts the rootfs from eMMC. And then I got the fatal "Oops - store (or AMO) access fault [#1]"
<palmer>
jrtc27: they sent someone a cease and desist for it, but they backed off after the Google bit
<palmer>
drewfustini: does it reproduce on something public? if so, can you post it on LKML?
<drewfustini>
I am doing some other things on the system right now, but later I want to bisect to understand where the problem starts between 6.3 and 6.4-rc1.
<jrtc27>
morons
<jrtc27>
how to piss off the people supporting your hardware in one swift move
<palmer>
drewfustini: thanks. I'm worried there's something lurking here. It could be a proper Linux bug, as we've got a lot in 6.4
<drewfustini>
Good question... I will try it on some of the SBCs I have
<palmer>
drewfustini: sweet, thanks. I've got most of them bouncing around somewhere, but I only use QEMU... ;)
<Tenkawa>
I'm curious when i can start to try to integrate 6.4-rc into my testing
<drewfustini>
I'm also interested to find out where the regression for my internal system started. There is a service processor that does all the heavy lifting (clocks, resets, etc), so I've always been able to run upstream Linux with just that one patch to sign extend TVAL.
<Tenkawa>
Many areas still look much thinner than the modified 6.2
<Tenkawa>
This Star64 builder is.... odd
<drewfustini>
Without that TVAL sign extend patch, upstream Linux up to and including 6.3 still works, but copy_process will sometimes fail when badaddr gets the top bits cleared by the hardware bug, which results in an invalid virtual address.
<palmer>
Tenkawa: what do you mean by thinner?
<Tenkawa>
palmer: I compared the dts I have for the work Esmil had been doing on 6.2 and it's much larger than the current one in 6.4-rc for the VF2
<Tenkawa>
the nodes look very... sparse
<Tenkawa>
on 6.4-rc
Stat_headcrabed has quit [Quit: Stat_headcrabed]
<palmer>
Tenkawa: OK, that makes sense. I'm not really following the various SOC downstreams, though, so I'm probably the least likely to know ;)
<Tenkawa>
Yeah I am working on the VF2 and the Star64
<Tenkawa>
Just got the Star64 today
<Tenkawa>
the build wants yocto for this though ... that's going to have to go....
<palmer>
looks like they're both jh7110? there's some patches in the queue for that, IIUC there's no errata (the jh7100 is blocked on the DMA stuff)
<Tenkawa>
Yeah.. its a nice 7110 (all built on the board too)
<drewfustini>
jrtc27: thanks for the insights. re-reading what you wrote, I see that you mean this simple "fix" to sign extend tval could cause other, potentially worse effects.
paddymahoney has quit [Ping timeout: 265 seconds]
MaxGanzII_ has quit [Ping timeout: 240 seconds]
MaxGanzII_ has joined #riscv
___nick___ has joined #riscv
billchenchina- has quit [Ping timeout: 264 seconds]
<conchuod>
Tenkawa: Much of the jh7110 stuff is hung up on the clock drivers being applied. There's a bunch of stuff sitting around waiting for that.
<conchuod>
6.4-rc1 is usable, as long as your definition of that is "boot an initramfs & access w/ uart"
wingsorc has quit [Quit: Leaving]
aredridel has quit [Read error: Connection reset by peer]