ChanServ changed the topic of #armlinux to: ARM kernel talk [Upstream kernel, find your vendor forums for questions about their kernels] | https://libera.irclog.whitequark.org/armlinux
Nact has quit [Read error: Connection reset by peer]
cbeznea has joined #armlinux
macromorgan has quit [Read error: Connection reset by peer]
macromorgan has joined #armlinux
milkylainen has quit [Ping timeout: 240 seconds]
iivanov has joined #armlinux
djrscally has joined #armlinux
mraynal has joined #armlinux
frieder has joined #armlinux
frieder has quit [Remote host closed the connection]
frieder has joined #armlinux
nsaenz has joined #armlinux
sszy has joined #armlinux
matthias_bgg has joined #armlinux
milkylainen has joined #armlinux
headless has joined #armlinux
alpernebbi has quit [Ping timeout: 272 seconds]
alexels has joined #armlinux
alpernebbi has joined #armlinux
prabhakarlad has joined #armlinux
<geertu>
mraynal: But the actual DMA controller register block is not part of the System Controller register block, right?
Pali has joined #armlinux
macromorgan_ has joined #armlinux
macromorgan_ is now known as macromorgan
macromorgan has quit [Killed (copper.libera.chat (Nickname regained by services))]
<geertu>
mraynal: So it's about the CFG_DMAMUX register
<geertu>
The only user for that register is indeed the DMAC driver.
<geertu>
Just export a function to access that register? We have similar things in e.g. include/linux/soc/renesas/rcar-rst.h
jlinton has quit [Ping timeout: 256 seconds]
<mraynal>
geertu: yes, the DMAMUX register is the one
<mraynal>
geertu: I was about to add a syscon compatible, is there a reason for not doing so in the first place?
<geertu>
mraynal: Personally, I don't like syscon compatibles ;-)
<mraynal>
geertu: why? :)
<geertu>
mraynal: It allows more access than you may want to export
<mraynal>
that's true
<geertu>
In addition, what if the RZ/N1 DMAC block is reused on a new SoC, where the DMAMUX register is located at a different offset in the SYSC block?
<mraynal>
well in this case instead of having the provider knowing where the register is, it is the caller's responsibility I would say
<mraynal>
I mean, syscon or not you'll have to handle the move either way
<mripard>
mraynal: from an abstraction pov, it shouldn't really be the consumer that has the knowledge of how the provider is laid out though
<mraynal>
mripard: o/
<mraynal>
mripard: agreed, but here we're talking about a DMA controller which tries to access one of 'its' registers that is located in a syscon, maybe the word 'consumer' above was a bit too much
<mraynal>
it's quite common to have platform data structures defining how the registers are laid out depending eg. on the compatible
<mraynal>
having this logic in the clock driver does not make more sense to me
<geertu>
mraynal: the clock driver also does power management
<mripard>
I mean, I don't really know your use case, so maybe it does make sense, but in general syscon is mostly used to punch a hole into all the nice abstractions we have
<mripard>
and it's sad, really :)
<geertu>
mripard: exactly (about the punching)
ravan has quit [Ping timeout: 272 seconds]
prabhakarlad has quit [Quit: Client closed]
headless has quit [Quit: Konversation terminated!]
Amit_T has joined #armlinux
matthias_bgg has quit [Ping timeout: 272 seconds]
Misotauros has quit [Ping timeout: 252 seconds]
monstr has joined #armlinux
headless has joined #armlinux
headless has quit [Quit: Konversation terminated!]
torez has joined #armlinux
elastic_dog has quit [Ping timeout: 240 seconds]
jlinton has joined #armlinux
matthias_bgg has joined #armlinux
jlinton has quit [Ping timeout: 256 seconds]
<ajb-lina->
I'm trying to track down why some QEMU test cases are so slow and I can track it to some guest PCs triggering invalidation of PCs - but the addresses don't show up in kallsyms
<ajb-lina->
Could this be bits of the EFI code?
<ajb-lina->
head /proc/kallsyms
<ajb-lina->
ffff800010000000 T _text
<ajb-lina->
tail -n 1 /proc/kallsyms
<ajb-lina->
ffff800008b00000 T loop_register_transfer [loop]
<ajb-lina->
pc in question 0xffff800011032b5c
<ajb-lina->
hmm kallsyms isn't sorted?
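A minimal sketch of how the lookup above could be done offline: parse kallsyms-style lines, sort them by address, and resolve a PC to the nearest symbol at or below it. The sample text is just the two lines quoted in the log; the helper names `parse_kallsyms` and `resolve` are made up for illustration. Sorting first matters because, as far as I know, /proc/kallsyms lists core kernel symbols before module symbols, so the file as a whole is not in address order.

```python
import bisect

# The two lines quoted in the log; a real run would read /proc/kallsyms instead.
SAMPLE = """\
ffff800010000000 T _text
ffff800008b00000 T loop_register_transfer [loop]
"""

def parse_kallsyms(text):
    """Return (address, name) pairs sorted by address (hypothetical helper)."""
    syms = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name = fields[2]
        if len(fields) > 3:
            name += " " + fields[3]  # keep the "[module]" suffix if present
        syms.append((int(fields[0], 16), name))
    syms.sort()
    return syms

def resolve(syms, pc):
    """Return 'symbol+0xoffset' for the nearest symbol at or below pc."""
    addrs = [addr for addr, _ in syms]
    i = bisect.bisect_right(addrs, pc) - 1
    if i < 0:
        return None  # pc is below every known symbol
    addr, name = syms[i]
    return f"{name}+{pc - addr:#x}"

syms = parse_kallsyms(SAMPLE)
print(resolve(syms, 0xffff800011032b5c))
```

With only two symbols loaded the result is just an offset from `_text`; feeding in the full file would narrow it to a function, provided the addresses are not hidden from the reading user.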
* ajb-lina-
suspects EFI runtime mappings
<ardb>
ajb-lina-: EFI runtime mappings are in the user portion of the VA space
<ardb>
so anything with 0xffff in the top u16 is definitely not EFI runtime stuff
<ajb-lina->
ardb: aside from eBPF does the kernel generate any code in its VA space? Or could it be a module?
<ajb-lina->
modules seem to go from 0xffff800008b00000 - 0xffff800008e60000 if /proc/modules is to be believed
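The two checks discussed so far (top 16 bits, and the module range read from /proc/modules) can be written down as a quick sketch. The range bounds are the ones quoted in the log for this particular boot; the function names are hypothetical and the addresses are assumed to be 64-bit.

```python
# Module range as reported in the log for this boot (not a general constant).
MODULES_START = 0xffff800008b00000
MODULES_END = 0xffff800008e60000

def top_u16(addr):
    """Return the top 16 bits of a 64-bit virtual address."""
    return (addr >> 48) & 0xffff

def is_kernel_half(addr):
    """True if the top u16 is 0xffff, i.e. not the user portion of the VA space."""
    return top_u16(addr) == 0xffff

def in_module_range(addr):
    """True if addr falls inside the module range quoted above."""
    return MODULES_START <= addr < MODULES_END

pc = 0xffff800011032b5c  # the PC in question
print(is_kernel_half(pc), in_module_range(pc))
```

For the PC in question this says kernel half but outside the module range, which is consistent with it being neither EFI runtime code nor a module.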
<ardb>
ajb-lina-: it depends on which kernel version
<ajb-lina->
currently testing with Alpine's 5.15.4-0-lts
<ajb-lina->
gdbserver can see the code and put breakpoints in it - but obviously without decent debug symbols it's hard to figure out what's going on
cengiz_io has quit [Ping timeout: 245 seconds]
cengiz_io has joined #armlinux
<ardb>
ajb-lina-: check /proc/vmallocinfo?
<ardb>
or /sys/kernel/debug/kernel_page_tables if it exists
<ajb-lina->
ardb: vmallocinfo only goes up to 0x00000000ff6646f3
<ardb>
ajb-lina-: those addresses are scrambled i think
<ajb-lina->
ardb: I mean I figure /proc/kallsyms would be contiguous even if it didn't have all symbols in it.. it's just symbols that are exported for modules right?
<ardb>
ajb-lina-: no it has everything
<ardb>
except for stuff that gets inlined etc of course
<ajb-lina->
ardb: so I'm confused - so vmallocinfo shows the memory is allocated for the kernel but it doesn't contain code from the vmlinux?
<ardb>
it does
<ardb>
maybe try nokaslr?
<ajb-lina->
ardb: I'll see if I can interrupt grub
<ajb-lina->
ardb: top 5 ^ 0xffff80001057bf10 is by far the highest offender
* ardb
has no clue what he is looking at
<ardb>
ajb-lina-: SMC 'detection' sounds like it is based on some heuristic but I assume QEMU can decide whether an SMC is issued pretty definitively, no?
<ajb-lina->
arnd: yes - if we generate code in a particular page we mark it as such to trigger QEMU's slow path if the page is ever written to
alexels has quit [Quit: WeeChat 3.4]
<ajb-lina->
(notdirty_write and !cpu_physical_memory_get_dirty_flag(ram_addr, DIRTY_MEMORY_CODE) in QEMU's cputlb code)
<ardb>
and how is this related to SMC?
alexels has joined #armlinux
<ajb-lina->
ardb: I guess not SMC in the classic sense - but potentially invalidating existing translations because you've changed code in the page
<ajb-lina->
ardb: we don't usually have executable code in the stack frame right?
<ardb>
ajb-lina-: not sure what you mean by 'stack frame'
<ardb>
but we never execute code from the stack
<ajb-lina->
ardb: or the heap?
<ardb>
kernel code is rarely modified
<ajb-lina->
ardb: it's probably not kernel space being changed but userspace pages containing code
<ardb>
ajb-lina-: heap is a bit vague, but generally, only code pages are executable, and those all live in the vmalloc area
<ajb-lina->
interestingly most of the invalidations from __arch_copy_to_user occur during boot up
jlinton has quit [Quit: Client closed]
<ajb-lina->
the current address triggering tb invalidations is
<ajb-lina->
ardb: ^ curious - this tells me 2 things, a) sometimes there are no TBs to invalidate so we must have missed clearing a flag somewhere and b) when we do invalidate a TB it's for a kernel routine
<ajb-lina->
and even stranger the code being invalidated is for kmem_cache_alloc
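A toy model of the behaviour described above, to make hypothesis (a) concrete. This is not QEMU's implementation: the class, its fields and the 4 KiB page size are all assumptions for illustration. Pages that hold translated blocks (TBs) are flagged so that writes take a slow path; the slow path drops the TBs in that page. If the flag is never cleared afterwards, every later write to the page still takes the slow path but finds nothing to invalidate.

```python
class ToyCodeTracker:
    """Toy model of per-page 'contains translated code' tracking (not QEMU code)."""

    PAGE = 4096  # assumed page size

    def __init__(self, clear_flag_when_empty=True):
        self.tbs = {}            # page number -> set of TB start addresses
        self.code_pages = set()  # pages whose writes take the slow path
        self.clear_flag_when_empty = clear_flag_when_empty
        self.slow_writes = 0     # writes that hit the slow path
        self.invalidated = 0     # TBs actually thrown away

    def translate(self, pc):
        """Record a TB starting at pc and flag its page."""
        page = pc // self.PAGE
        self.tbs.setdefault(page, set()).add(pc)
        self.code_pages.add(page)

    def write(self, addr):
        """Model a guest write; flagged pages go through the slow path."""
        page = addr // self.PAGE
        if page not in self.code_pages:
            return  # fast path, nothing to do
        self.slow_writes += 1
        self.invalidated += len(self.tbs.pop(page, set()))
        if self.clear_flag_when_empty:
            self.code_pages.discard(page)

def run(clear_flag):
    """Translate one TB, then write three times to the same page."""
    tracker = ToyCodeTracker(clear_flag_when_empty=clear_flag)
    tracker.translate(0x1000)  # hypothetical address
    for offset in (0x10, 0x20, 0x30):
        tracker.write(0x1000 + offset)
    return tracker.slow_writes, tracker.invalidated

print(run(True), run(False))
```

In the model, clearing the flag gives one slow write and one invalidation; leaving it set gives three slow writes for the same single invalidation, which is the "no TBs to invalidate" pattern described in (a).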
alpernebbi has joined #armlinux
<ajb-lina->
there is certainly a bug (or two) here. I just don't know if it's all QEMU or something has gone very wrong with the kernel
Misotauros has quit [Ping timeout: 256 seconds]
jlinton has joined #armlinux
<ardb>
ajb-lina-: without knowing why QEMU decides to perform invalidation, it is hard to reason about that
<mrutland>
ajb-lina-: are those logs for the attempt to execute or the attempt to write?
<mrutland>
because if it's happening at a deterministic addr, if you could find the write, it would be the smoking gun
<mrutland>
Generally I'd be surprised if we were writing to kernel text mappings since our Stage-1 maps those read-only anyhow
<mrutland>
... and so those writes should be limited to boot-time patching / alternatives, static_keys, and kprobes
<mrutland>
are we perhaps patching a boot-time alternative once, but qemu forgets it has done the invalidate, and so *every* subsequent attempt to execute that page results in an invalidate?
<mrutland>
... that could explain why __arch_copy_to_user was triggering this, since we boot-time patch that depending on PAN, etc
<mrutland>
... or at least we used to in older kernels, and I assume this is an older kernel since you said it's a test case
alpernebbi has quit [Ping timeout: 256 seconds]
<ajb-lina->
mrutland: attempts to write (triggering QEMU's invalidation of those TBs in the page)
alexels has quit [Quit: WeeChat 2.8]
<ajb-lina->
actually it's not strncpy, it's copy_to_kernel_nofault
<ajb-lina->
starts at ffff8000102926c4
<ardb>
ajb-lina-: but it is not an attempt to write to that code address, right?
<ardb>
i.e., the hot code path is not being modified, it is being executed
<ajb-lina->
0xffff800010292780 (copy_to_kernel_nofault) is the PC of the code triggering the invalidation, occasionally the tb that gets invalidated starts at 0xffff80001033184c (kmem_cache_alloc)
<ajb-lina->
it makes no sense
<ajb-lina->
although I put a breakpoint at copy_to_kernel_nofault and
<ajb-lina->
let me find one that actually invalidates a TB
<mrutland>
actually, can you bt here?
<ajb-lina->
#0 0xffff800010292788 in ?? ()
<ajb-lina->
#1 0xffff800011661000 in ?? ()
<ajb-lina->
not much really - no debug symbols or fp
<mrutland>
(to see why are we doing a copy_to_kernel_nofault)
<mrutland>
Generally, copy_to_kernel_nofault implies we're patching code, which sort-of implies the right thing is happening here, but I don't know what's going on at a high-level that causes us to call copy_to_kernel_nofault
<ajb-lina->
mrutland: ahh ok - let me dig further but I've just been called to dinner
<ajb-lina->
mrutland: bbs
<mrutland>
I suspect this is kprobes, which is self-modifying-code
<ajb-lina->
ffff800010ae4df0 t aarch64_insn_patch_text_cb
<ajb-lina->
actually probably
<ajb-lina->
ffff800010ae4c30 t __aarch64_insn_write
<ajb-lina->
via arch_jump_label_transform
<ajb-lina->
hand decoding bt from $x30 values is tiresome
monstr has quit [Remote host closed the connection]
<ajb-lina->
could this be kasan?
<ardb>
ajb-lina-: kasan does not rely on code patching, so that seems unlikely
<ajb-lina->
but patching the kernel for every executable run seems odd
<ardb>
ajb-lina-: yes that is unexpected
<ardb>
ajb-lina-: can you rebuild that kernel from source so you have a vmlinux to give to gdb?
<ardb>
ah hold on
<ardb>
"via arch_jump_label_transform"
<ajb-lina->
ardb: sadly my hand built kernels don't exhibit the same wild TB invalidation as the distro ones.. I even built the alpine kernel and ran directly and it didn't
<ardb>
so this is a static key being toggled
<ardb>
maybe this is alpine value add?
<ajb-lina->
it seems a fairly lightly patched kernel