ChanServ changed the topic of #armlinux to: ARM kernel talk [Upstream kernel, find your vendor forums for questions about their kernels] | https://libera.irclog.whitequark.org/armlinux
Tokamak has quit [Read error: Connection reset by peer]
Tokamak has joined #armlinux
narmstrong has quit [Read error: Connection reset by peer]
<arnd>
nice catch ardb! I wonder how it normally terminates the recursion, maybe the driver runs into an error condition after too many attempts to register a child device and unwinds from there
<arnd>
I haven't looked at the code here, but I've seen similar traces from drivers that register a platform_device child and set child->of_node=parent->of_node before registering
<arnd>
which then makes the driver core call into the same driver again
<ardb>
yeah that is what it looks like to me
<tmlind>
hmm
<tmlind>
narmstrong: any ideas where the above might be coming from?
<arnd>
or maybe here it's the PLATFORM_DEVID_AUTO bit that sets the device name rather than the of_node
<arnd>
no, that makes no sense. Instead I suspect it's robher's cf081d009c44 ("usb: musb: Set the DT node on the child device") that caused a regression
<arnd>
so it would crash without vmap-stack as well, we just get a more readable stacktrace this way
<ardb>
i would assume so yes
<arnd>
the way that some usb host drivers insert devices to model generic vs soc-specific bits just doesn't work too well with our usual driver model, it keeps causing problems
<arnd>
of_node_reused from 2c1ea6abde88 ("platform: set of_node in platform_device_register_full()") was apparently meant to avoid the recursion, but fails to do the right thing here
<arnd>
do we have a CI log from a machine with sunxi-musb?
nsaenz has quit [Remote host closed the connection]
<tmlind>
pinephone would have that
<tmlind>
not seeing issues with the musb 2430 glue at least with commit 2c1ea6abde88
alpernebbi has quit [Ping timeout: 240 seconds]
<tmlind>
sorry i mean with commit cf081d009c44
alpernebbi has joined #armlinux
System_Error has quit [Ping timeout: 276 seconds]
System_Error has joined #armlinux
tlwoerner has quit [Ping timeout: 256 seconds]
tlwoerner has joined #armlinux
apritzel_ has joined #armlinux
System_Error has quit [Ping timeout: 276 seconds]
System_Error has joined #armlinux
System_Error has quit [Ping timeout: 276 seconds]
System_Error has joined #armlinux
<tmlind>
but that was with next-20220107, next-20220115 only boots for some of my machines, just hangs with no errors
Pali has joined #armlinux
System_Error has quit [Ping timeout: 276 seconds]
headless has joined #armlinux
headless has quit [Quit: Konversation terminated!]
jlinton has quit [Quit: Client closed]
<robher>
chipidea usb has also regressed with the same change.
<robher>
of_node_reused has no effect other than with pinctrl.
<robher>
Is the problem that setting the of_node causes a match on the parent driver instead of match by driver name? If so, I think we should be able to check of_node_reused in the DT matching function and not match when set.
<robher>
arnd, tmlind: ^^^
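A rough illustration of the matching question raised above, written as a toy userspace model in C rather than kernel code. Every `toy_` name, the compatible string and the `honor_reused` switch are hypothetical; the only thing meant to mirror the platform bus is the ordering, where the DT match is tried before the fallback to matching by name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins, NOT the kernel's struct device_node / device / driver. */
struct toy_node {
    const char *compatible;
};

struct toy_device {
    const char *name;               /* used for the name-based fallback */
    const struct toy_node *of_node; /* may be shared with the parent */
    bool of_node_reused;            /* set when of_node was copied from the parent */
};

struct toy_driver {
    const char *name;
    const char *of_compatible;      /* NULL if the driver has no DT match table */
};

/* DT-style match; optionally refuses devices whose node is marked as reused. */
bool toy_of_match(const struct toy_driver *drv, const struct toy_device *dev,
                  bool honor_reused)
{
    if (drv->of_compatible == NULL || dev->of_node == NULL)
        return false;
    if (honor_reused && dev->of_node_reused)
        return false;
    return strcmp(drv->of_compatible, dev->of_node->compatible) == 0;
}

/* Bus-level match: DT match first, then fall back to comparing names. */
bool toy_match(const struct toy_driver *drv, const struct toy_device *dev,
               bool honor_reused)
{
    if (toy_of_match(drv, dev, honor_reused))
        return true;
    return strcmp(drv->name, dev->name) == 0;
}
```

In this model a child that shares its parent's node binds to the parent's glue driver again when `honor_reused` is false, which is the recursion pattern described earlier in the log. With `honor_reused` set to true the same child only matches a driver by name.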
apritzel_ is now known as apritzel
<marex>
arnd: hi, maybe you can give me a hint ... I've got this PCI IP, if I readl() from config space and the link is down, I get an Imprecise External Abort, so I cannot "fix it up" in a hook and restart the exact instruction which triggered it, the instruction pointer is a few instructions down the line in the fault handler hook
<marex>
arnd: is there any way I can force it into "precise" abort, so I would get the right instruction address to restart ? or is this imprecise abort due to speculation and thus unfixable ?
<marex>
s@precise@synchronous@
<ardb>
marex: there are other SOCs with the same issue
<ardb>
marex: does the read return the correct value in this case? (all 1 bits)
<marex>
ardb: nope
<marex>
ardb: it returns zeroes
<ardb>
this is a rather severe integration issue, and the only way to paper over it is to only expose the host bridge if the link is up, and pray it doesn't go down
<marex>
ardb: is there a way to block the bridge from ever dropping into L1 link state ?
<marex>
ardb: I didn't find any way to do it
<marex>
ardb: that might really be my only way out now
<marex>
(besides somehow turning the abort into synchronous one and fixing the return value up in the hook)
<marex>
I can detect the link is in L1 state before doing the config space access in the kernel, but what if someone uses pci-utils setpci in userspace ...
<arnd>
marex: have you tried doing the read access using an inline asm with the load plus a barrier, with a fixup handler on both? If you are lucky, the barrier instruction would reliably trigger the fault
<marex>
arnd: I tried a few barriers, yes, none of them triggered the fault though
<marex>
arnd: have you got a barrier instruction in mind ? dsb or dmb I guess ?
<arnd>
marex: I'm not an expert on barriers, I was thinking isb though, which should flush the pipeline
<marex>
arnd: I had isb() there already, but, lemme double-check that
<arnd>
or alternatively something that has a dependency on the data value
<marex>
arnd: I think that dependency is what triggered the abort in my case indeed
mcoquelin has quit [Ping timeout: 250 seconds]
mcoquelin has joined #armlinux
headless has joined #armlinux
<marex>
arnd: that isb in hand-rolled assembler might just be it. it's gonna look great in driver code too :)
<marex>
arnd: thank you
<arnd>
marex: anything's better than just disabling the exceptions as some pci host drivers do
<marex>
arnd: pci ... sigh ...
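The "check the link before touching config space" idea from the discussion above can be sketched as a toy userspace model in C. The struct, the `toy_` names and the 16-dword array are hypothetical stand-ins, not a real host driver. The all-ones return value follows the "all 1 bits" convention mentioned above for reads that cannot complete.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value conventionally returned when a config read cannot complete. */
#define TOY_PCI_ERROR_RESPONSE 0xffffffffu
#define TOY_CFG_DWORDS 16u

/* Hypothetical model of a host bridge; link_up would come from a status register. */
struct toy_host_bridge {
    bool link_up;
    uint32_t cfg_space[TOY_CFG_DWORDS];
};

/*
 * Guarded read: never touch config space while the link is down, and
 * report all ones instead, so the caller sees the usual "no device" value.
 */
uint32_t toy_cfg_read32(const struct toy_host_bridge *hb, unsigned int dword)
{
    if (!hb->link_up || dword >= TOY_CFG_DWORDS)
        return TOY_PCI_ERROR_RESPONSE;
    return hb->cfg_space[dword];
}
```

On real hardware a check like this is inherently racy, since the link can drop between the test and the access, which is why the fixup-handler approach is still being discussed.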
amitk has joined #armlinux
System_Error has joined #armlinux
headless has quit [Quit: Konversation terminated!]
amitk has quit [Ping timeout: 256 seconds]
amitk has joined #armlinux
Tokamak has joined #armlinux
amitk has quit [Ping timeout: 250 seconds]
Tokamak has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
mripard has quit [Read error: Connection reset by peer]