#armlinux on 2022-02-16 — irc logs at libera.irclog.whitequark.org

2021-05-27 16:22 ChanServ changed the topic of #armlinux to: ARM kernel talk [Upstream kernel, find your vendor forums for questions about their kernels] | https://libera.irclog.whitequark.org/armlinux

00:01 Grimler has joined #armlinux

00:15 djrscally has quit [Ping timeout: 240 seconds]

00:45 prabhakarlad has quit [Ping timeout: 256 seconds]

01:32 mraynal has quit [Quit: WeeChat 3.0]

01:37 XV9 has quit [Quit: Textual IRC Client: www.textualapp.com]

01:44 Pali has quit [Ping timeout: 240 seconds]

02:46 Nact has joined #armlinux

03:51 amitk has joined #armlinux

03:58 macromorgan has joined #armlinux

04:30 Nact has quit [Read error: Connection reset by peer]

05:49 cbeznea has joined #armlinux

06:06 macromorgan has quit [Read error: Connection reset by peer]

06:07 macromorgan has joined #armlinux

06:11 milkylainen has quit [Ping timeout: 240 seconds]

07:26 iivanov has joined #armlinux

07:39 djrscally has joined #armlinux

07:44 mraynal has joined #armlinux

07:46 frieder has joined #armlinux

07:46 frieder has quit [Remote host closed the connection]

07:46 frieder has joined #armlinux

07:50 nsaenz has joined #armlinux

08:37 sszy has joined #armlinux

08:38 matthias_bgg has joined #armlinux

08:41 milkylainen has joined #armlinux

08:42 headless has joined #armlinux

08:46 alpernebbi has quit [Ping timeout: 272 seconds]

08:49 alexels has joined #armlinux

08:50 alpernebbi has joined #armlinux

08:59 prabhakarlad has joined #armlinux

09:07 <geertu> mraynal: But the actual DMA controller register block is not part of the System Controller register block, right?

09:08 Pali has joined #armlinux

09:08 macromorgan_ has joined #armlinux

09:08 macromorgan_ is now known as macromorgan

09:08 macromorgan has quit [Killed (copper.libera.chat (Nickname regained by services))]

09:09 <geertu> mraynal: So it's about the CFG_DMAMUX register

09:12 <geertu> The only user for that register is indeed the DMAC driver.

09:13 <geertu> Just export a function to access that register? We have similar things in e.g. include/linux/soc/renesas/rcar-rst.h

09:13 jlinton has quit [Ping timeout: 256 seconds]

09:14 <mraynal> geertu: yes, the DMAMUX register is the one

09:14 <mraynal> geertu: I was about to add a syscon compatible, a reason for not doing so in the first place?

09:20 <geertu> mraynal: Personally, I don't like syscon compatibles ;-)

09:21 <mraynal> geertu: why? :)

09:24 <geertu> mraynal: It allows more access than you may want to export

09:25 <mraynal> that's true

09:25 <geertu> In addition, what if the RZ/N1 DMAC block is reused on a new SoC, where the DMAMUX register is located at a different offset in the SYSC block?

09:26 <mraynal> well in this case instead of having the provider knowing where the register is, it is the caller's responsibility I would say

09:26 <mraynal> I mean, syscon or not you'll have to handle the move either way

09:30 <mripard> mraynal: from an abstraction pov, it shouldn't really be the consumer that has the knowledge of how the provider is laid out though

09:30 <mraynal> mripard: o/

09:32 <mraynal> mripard: agreed, but here we're talking about a DMA controller which tries to access one of 'its' registers that is located in a syscon, maybe the word 'consumer' above was a bit too much

09:33 <mraynal> it's quite common to have platform data structures defining how the registers are laid out depending eg. on the compatible

09:34 <mraynal> having this logic in the clock driver does not make more sense to me

09:35 <geertu> mraynal: the clock driver also does power management

09:36 <mripard> I mean, I don't really know your use case, so maybe it does make sense, but in general syscon is mostly used to punch a hole into all the nice abstractions we have

09:36 <mripard> and it's sad, really :)

09:36 <geertu> mripard: exactly (about the punching)

10:02 ravan has quit [Ping timeout: 272 seconds]

10:39 prabhakarlad has quit [Quit: Client closed]

10:47 headless has quit [Quit: Konversation terminated!]

11:36 Amit_T has joined #armlinux

11:38 matthias_bgg has quit [Ping timeout: 272 seconds]

12:40 Misotauros has quit [Ping timeout: 252 seconds]

13:05 monstr has joined #armlinux

13:14 headless has joined #armlinux

14:15 headless has quit [Quit: Konversation terminated!]

14:16 torez has joined #armlinux

14:38 elastic_dog has quit [Ping timeout: 240 seconds]

14:43 jlinton has joined #armlinux

14:52 matthias_bgg has joined #armlinux

15:16 jlinton has quit [Ping timeout: 256 seconds]

15:25 <ajb-lina-> I'm trying to track down why some QEMU test cases are so slow and I can track it to some guest pc's triggering invalidation of PCs - but the addresses don't show up in kallsyms

15:26 <ajb-lina-> Could this be bits of the EFI code?

15:27 <ajb-lina-> head /proc/kallsyms

15:27 <ajb-lina-> ffff800010000000 T _text

15:27 <ajb-lina-> tail -n 1 /proc/kallsyms

15:27 <ajb-lina-> ffff800008b00000 T loop_register_transfer [loop]

15:28 <ajb-lina-> pc in question 0xffff800011032b5c

15:28 <ajb-lina-> hmm kallsyms isn't sorted?

15:37 * ajb-lina- suspects EFI runtime mappings

15:47 <ardb> ajb-lina-: EFI runtime mappings are in the user portion of the VA space

15:48 <ardb> so anything with 0xffff in the top u16 is definitely not EFI runtime stuff

15:49 <ajb-lina-> ardb: aside from eBPF does the kernel generate any code in its VA space? Or could it be a module?

15:51 <ajb-lina-> modules seem to go from 0xffff800008b00000 - 0xffff800008e60000 if /proc/modules is to be believed

15:51 <ardb> ajb-lina-: it depends on which kernel version

15:52 <ajb-lina-> currently testing with Alpine's 5.15.4-0-lts

15:52 <ajb-lina-> gdbserver can see the code and put breakpoints in it - but obviously without decent debug symbols it's hard to figure out what's going on

15:53 cengiz_io has quit [Ping timeout: 245 seconds]

15:53 cengiz_io has joined #armlinux

15:54 <ardb> ajb-lina-: check /proc/vmallocinfo?

15:54 <ardb> or /sys/kernel/debug/kernel_page_tables if it exists

15:56 <ajb-lina-> ardb: vmallocinfo only goes up to 0x00000000ff6646f3

15:57 <ardb> ajb-lina-: those addresses are scrambled i think

15:58 <ardb> echo 1 > /proc/sys/kernel/kptr_restrict

15:59 <ajb-lina-> 0xffff800000000000 - 0xffff800020000000/0xfffffbffeffc6000

15:59 <ajb-lina-> might be in there

16:01 <ajb-lina-> localhost:/# cat /proc/vmallocinfo | grep 0xffff8000110

16:01 <ajb-lina-> 0xffff800010f70000-0xffff800011060000 983040 paging_init+0x29c/0x944 phys=0x0000000068970000 vmap

16:01 <ajb-lina-> 0xffff800011060000-0xffff8000114a0000 4456448 paging_init+0x308/0x944 phys=0x0000000068a60000 vmap
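
The range check being done by eye above can be sketched in a few lines. This is a minimal sketch, assuming the `/proc/vmallocinfo` line layout shown in the two lines quoted above (addresses only appear unscrambled once kptr_restrict allows it); `find_vmalloc_area` is a hypothetical helper name, not an existing tool.

```python
import re

# The two /proc/vmallocinfo lines quoted in the log.
VMALLOCINFO = """\
0xffff800010f70000-0xffff800011060000 983040 paging_init+0x29c/0x944 phys=0x0000000068970000 vmap
0xffff800011060000-0xffff8000114a0000 4456448 paging_init+0x308/0x944 phys=0x0000000068a60000 vmap
"""

# start-end, size in bytes, caller (first four fields of each line).
RANGE_RE = re.compile(r"^0x([0-9a-f]+)-0x([0-9a-f]+)\s+(\d+)\s+(\S+)")


def find_vmalloc_area(text, addr):
    """Return (start, end, caller) for the area containing addr, or None.

    Hypothetical helper: ranges are treated as half-open [start, end).
    """
    for line in text.splitlines():
        m = RANGE_RE.match(line)
        if not m:
            continue  # skip lines that do not look like an area entry
        start, end = int(m.group(1), 16), int(m.group(2), 16)
        if start <= addr < end:
            return start, end, m.group(4)
    return None


# The hot PC from earlier in the log.
area = find_vmalloc_area(VMALLOCINFO, 0xFFFF800011032B5C)
if area:
    print(f"{area[0]:#x}-{area[1]:#x} mapped by {area[2]}")
```

On a live system the text would come from reading `/proc/vmallocinfo` as root rather than from the embedded sample.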

16:01 <ardb> ok so it is inside the core kernel

16:03 <ajb-lina-> ardb: I mean I figure /proc/kallsyms would be contiguous even if it didn't have all symbols in it.. it's just symbols that are exported for modules right?

16:03 <ardb> ajb-lina-: no it has everything

16:04 <ardb> except for stuff that gets inlined etc of course

16:05 <ajb-lina-> ardb: so I'm confused - so vmallocinfo shows the memory is allocated for the kernel but it doesn't contain code from the vmlinux?

16:05 <ardb> it does

16:05 <ardb> maybe try nokaslr?

16:07 <ajb-lina-> ardb: I'll see if I can interrupt grub

16:13 <ajb-lina-> hot pc is now 0xffff80001057bf10

16:13 <ajb-lina-> 0xffff800010010000-0xffff800010b00000 11468800 paging_init+0x208/0x944 phys=0x0000000067a10000 vmap

16:14 jlinton has joined #armlinux

16:14 <ajb-lina-> cat /proc/kallsyms | grep 0xffff800010 - nothing

16:14 <ajb-lina-> cat /proc/modules | grep 0xffff800010 - nothing

16:15 <ardb> kallsyms doesn't have the leading 0x IIRC

16:15 <ajb-lina-> doh!

16:16 <ajb-lina-> cat /proc/kallsyms | grep ffff80001057b

16:16 <ajb-lina-> ffff80001057bd80 T __arch_copy_to_user

16:16 <ajb-lina-> ffff80001057bfa0 T csum_ipv6_magic
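
Since kallsyms is not sorted and its addresses carry no leading 0x, grepping by prefix is fragile. A minimal sketch of a nearest-preceding-symbol lookup follows, using only lines quoted in this log as sample data; `load_symbols` and `resolve` are hypothetical helper names, and the result is only a guess at the enclosing function if the real symbol is missing from the table.

```python
import bisect

# Sample lines quoted in the log (deliberately not in address order).
KALLSYMS = """\
ffff80001057bd80 T __arch_copy_to_user
ffff80001057bfa0 T csum_ipv6_magic
ffff800010000000 T _text
ffff800008b00000 T loop_register_transfer [loop]
"""


def load_symbols(text):
    """Parse 'addr type name [module]' lines into a sorted list of (addr, name)."""
    syms = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        # Addresses are bare hex, with no 0x prefix.
        syms.append((int(fields[0], 16), fields[2]))
    syms.sort()
    return syms


def resolve(syms, pc):
    """Return 'symbol+0xoffset' for the closest symbol at or below pc, or None."""
    addrs = [addr for addr, _ in syms]
    i = bisect.bisect_right(addrs, pc) - 1
    if i < 0:
        return None
    addr, name = syms[i]
    return f"{name}+{pc - addr:#x}"


syms = load_symbols(KALLSYMS)
print(resolve(syms, 0xFFFF80001057BF10))
```

With the sample data the hot PC lands between the two symbols found by the grep above, so it is attributed to the lower one.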

16:16 prabhakarlad has joined #armlinux

16:18 <ajb-lina-> well I guess __arch_copy_to_user - but I would expect that usually to be scribbling over code pages?

16:19 <ardb> ?

16:19 <ardb> why would it be doing that?

16:19 Misotauros has joined #armlinux

16:20 <ajb-lina-> ardb: the slowdown is because QEMU's SMC detection is triggering, causing it to flush translations (a lot)

16:20 <ajb-lina-> TB invalidate count 550447

16:20 <ardb> maybe some spectre/meltdown mitigation triggering?

16:20 <ajb-lina-> compared to maybe 7000 on my Debian bullseye test image

16:21 <ardb> ajb-lina-: i'd assume you can trace the source of the SMC call no?

16:21 <ajb-lina-> ardb: that was an early theory - I enabled KAISER in my test kernel but couldn't replicate those high numbers

16:22 <ajb-lina-> http://ix.io/3PLU

16:22 <ajb-lina-> ardb: top 5 ^ 0xffff80001057bf10 is by far the highest offender

16:22 * ardb has no clue what he is looking at

16:26 <ardb> ajb-lina-: SMC 'detection' sounds like it is based on some heuristic but I assume QEMU can decide whether an SMC is issued pretty definitively, no?

16:27 <ajb-lina-> ardb: yes - if we generate code in a particular page we mark it as such to trigger QEMU's slow path if the page is ever written to

16:28 alexels has quit [Quit: WeeChat 3.4]

16:28 <ajb-lina-> (notdirty_write and !cpu_physical_memory_get_dirty_flag(ram_addr, DIRTY_MEMORY_CODE) in QEMU's cputlb code)

16:29 <ardb> and how is this related to SMC?

16:29 alexels has joined #armlinux

16:30 <ajb-lina-> ardb: I guess not SMC in the classic sense - but potentially invalidating existing translations because you've changed code in the page
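
The mechanism described here (pages that hold translated code are tracked, and a write to such a page takes a slow path that discards the translations) can be illustrated with a toy model. This is only a sketch of the idea, not QEMU's actual data structures or code; `ToyTranslator`, its methods, and the 4 KiB page size are all assumptions made for illustration.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages


class ToyTranslator:
    """Toy model: remembers which physical pages contain translated blocks."""

    def __init__(self):
        self.tbs_by_page = {}   # page number -> set of TB start addresses
        self.invalidations = 0  # number of TBs thrown away so far

    def translate(self, phys_pc):
        """Record that a translated block starting at phys_pc now exists."""
        page = phys_pc >> PAGE_SHIFT
        self.tbs_by_page.setdefault(page, set()).add(phys_pc)

    def write(self, phys_addr):
        """Model a guest store. Returns True if it hit the slow path,
        i.e. the page held translated code that had to be invalidated."""
        page = phys_addr >> PAGE_SHIFT
        tbs = self.tbs_by_page.pop(page, None)
        if tbs is None:
            return False  # fast path: no code known in this page
        self.invalidations += len(tbs)
        return True


t = ToyTranslator()
t.translate(0x27D31000)
t.translate(0x27D31040)
print(t.write(0x27D31148), t.invalidations)
```

In this model a second write to the same page is cheap again until new code is translated from it, which is the behaviour one would hope for when data and code merely share a page.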

16:30 <ajb-lina-> ardb: we don't usually have executable code in the stack frame right?

16:32 <ardb> ajb-lina-: not sure what you mean by 'stack frame'

16:32 <ardb> but we never execute code from the stack

16:32 <ajb-lina-> ardb: or the heap?

16:32 <ardb> kernel code is rarely modified

16:33 <ajb-lina-> ardb: it's probably not kernel space being changed but userspace pages containing code

16:33 <ardb> ajb-lina-: heap is a bit vague, but generally, only code pages are executable, and those all live in the vmalloc area

16:34 <ajb-lina-> interestingly most of the invalidations from __arch_copy_to_user occur during boot up

16:34 jlinton has quit [Quit: Client closed]

16:34 <ajb-lina-> the current address triggering tb invalidations is

16:34 <ajb-lina-> localhost:~# cat /proc/kallsyms | grep ffff8000102927

16:34 <ajb-lina-> ffff8000102927e4 T strncpy_from_kernel_nofault

16:35 <ajb-lina-> I guess I should tweak my QEMU trace to show the affected page address

16:36 <ajb-lina-> tb_invalidate_phys_page_fast page:0x27d32fbc/4 pc:0xffff800010292778

16:36 <ajb-lina-> tb_invalidate_phys_page_fast page:0x27d333d8/4 pc:0xffff800010292778

16:36 <ajb-lina-> I think those are physical addresses

16:36 headless has joined #armlinux

16:38 sudeepholla has quit [Ping timeout: 250 seconds]

16:43 <ajb-lina-> ardb: is there any way to figure out what virtual userspace addresses will be using those physical pages?

16:45 <ardb> but the flushing is related to code translations, right?

16:45 <ardb> and the hotspot is in the kernel code?

16:45 <ajb-lina-> heh a single cat /proc/self/maps triggers about 40 pages

16:45 <ajb-lina-> ardb: the kernel pc is what triggered the flush

16:46 <ajb-lina-> (well close - it's actually the pc of the start of the tb that triggered the flush)

16:47 <ajb-lina-> it seems to repeat several times

16:47 <ajb-lina-> http://ix.io/3PM9

16:48 <ardb> ajb-lina-: so *why* does it get invalidated each time? can you log that as well?

16:51 sudeepholla has joined #armlinux

17:06 <ajb-lina-> ardb: sure - I shall add some tracepoints to QEMU - it is possible we are triggering a QEMU bug

17:25 frieder has quit [Remote host closed the connection]

17:28 <ajb-lina-> http://ix.io/3PMp

17:29 alpernebbi has quit [Ping timeout: 240 seconds]

17:29 <ajb-lina-> ardb: ^ curious - this tells me 2 things, a) sometimes there are no TBs to invalidate so we must have missed clearing a flag somewhere and b) when we do invalidate a TB it's for a kernel routine

17:32 <ajb-lina-> http://ix.io/3PMr

17:32 <ajb-lina-> and even stranger the code being invalidated is for kmem_cache_alloc

17:33 alpernebbi has joined #armlinux

17:34 <ajb-lina-> there is certainly a bug (or two) here. I just don't know if it's all QEMU or something has gone very wrong with the kernel

17:35 Misotauros has quit [Ping timeout: 256 seconds]

17:37 jlinton has joined #armlinux

17:41 <ardb> ajb-lina-: without knowing why QEMU decides to perform invalidation, it is hard to reason about that

17:42 <mrutland> ajb-lina-: are those logs for the attempt to execute or the attempt to write?

17:43 <mrutland> because if it's happening at a deterministic addr, if you could find the write, it would be the smoking gun

17:43 <mrutland> Generally I'd be surprised if we were writing to kernel text mappings since our Stage-1 maps those read-only anyhow

17:44 <mrutland> ... and so those writes should be limited to boot-time patching / alternatives, static_keys, and kprobes

17:45 <mrutland> are we perhaps patching a boot-time alternative once, but qemu forgets it has done the invalidate, and so *every* subsequent attempt to execute that page results in an invalidate?

17:46 <mrutland> ... that could explain why __arch_copy_to_user was triggering this, since we boot-time patch that depending on PAN, etc

17:47 <mrutland> ... or at least we used to in older kernels, and I assume this is an older kernel since you said it's a test case

17:53 alpernebbi has quit [Ping timeout: 256 seconds]

17:56 <ajb-lina-> mrutland: attempts to write (triggering QEMUs invalidation of those TBs in the page)

17:57 alexels has quit [Quit: WeeChat 2.8]

17:57 <ajb-lina-> actually it's not strncpy, it's copy_to_kernel_nofault

17:57 <ajb-lina-> starts at ffff8000102926c4

17:58 <ardb> ajb-lina-: but it is not an attempt to write to that code address, right?

17:58 <ardb> i.e., the hot code path is not being modified, it is being executed

18:01 <ajb-lina-> 0xffff800010292780 (copy_to_kernel_nofault) is the PC of the code triggering the invalidation, occasionally the tb that gets invalidated starts at 0xffff80001033184c (kmem_cache_alloc)

18:01 <ajb-lina-> it makes no sense

18:01 <ajb-lina-> although I put a breakpoint at copy_to_kernel_nofault and

18:02 <ajb-lina-> http://ix.io/3PMB

18:03 <mrutland> what's the insn at 0xffff800010292780 ?

18:03 <ajb-lina-> 0xffff800010292780: str w1, [x2]

18:04 <mrutland> can you step to there *then* print w1 and x2? they get altered between the bit in the paste and 0xffff800010292780

18:04 <ajb-lina-> so single stepping while watching my trace point

18:04 <ajb-lina-> => 0xffff800010292780: str w1, [x2]

18:04 <ajb-lina-> (gdb) p/x $x2

18:04 <ajb-lina-> $4 = 0xfffffbfffdbfe148

18:04 <ajb-lina-> triggered

18:05 <ajb-lina-> tb_invalidate_phys_page_fast page:0x27d31148/4 pc:0xffff800010292780

18:05 <ajb-lina-> let me find one that actually invalidates a TB

18:05 <mrutland> actually, can you bt here?

18:05 <ajb-lina-> #0 0xffff800010292788 in ?? ()

18:05 <ajb-lina-> #1 0xffff800011661000 in ?? ()

18:05 <ajb-lina-> not much really - no debug symbols or fp

18:05 <mrutland> (to see why are we doing a copy_to_kernel_nofault)

18:07 <mrutland> Generally, copy_to_kernel_nofault implies we're patching code, which sort-of implies the right thing is happening here, but I don't know what's going on at a high-level that causes us to call copy_to_kernel_nofault

18:08 <ajb-lina-> mrutland: ahh ok - let me dig further but I've just been called to dinner

18:08 <ajb-lina-> mrutland: bbs

18:08 <mrutland> I suspect this is kprobes, which is self-modifying-code

18:08 alpernebbi has joined #armlinux

18:10 mort has quit [Quit: The Lounge - https://thelounge.chat]

18:11 mort has joined #armlinux

18:21 Misotauros has joined #armlinux

18:25 sszy has quit [Ping timeout: 240 seconds]

18:27 <ajb-lina-> mrutland: why would that fire every time though?

18:29 <ajb-lina-> mrutland: but yes stepping through the ret gets to

18:30 <ajb-lina-> 0xffff800010ae4d20

18:30 <ajb-lina-> localhost:~# cat /proc/kallsyms | grep ffff800010ae4d

18:30 <ajb-lina-> ffff800010ae4df0 t aarch64_insn_patch_text_cb

18:30 <ajb-lina-> actually probably

18:30 <ajb-lina-> ffff800010ae4c30 t __aarch64_insn_write

18:34 <ajb-lina-> via arch_jump_label_transform

18:35 <ajb-lina-> hand decoding bt from $x30 values is tiresome

18:40 monstr has quit [Remote host closed the connection]

18:42 <ajb-lina-> could this be kasan?

18:43 <ardb> ajb-lina-: kasan does not rely on code patching, so that seems unlikely

18:45 <ajb-lina-> but patching the kernel for every executable run seems odd

18:46 <ardb> ajb-lina-: yes that is unexpected

18:46 <ardb> ajb-lina-: can you rebuild that kernel from source so you have a vmlinux to give to gdb?

18:47 <ardb> ah hold on

18:47 <ardb> "via arch_jump_label_transform"

18:47 <ajb-lina-> ardb: sadly my hand built kernels don't exhibit the same wild TB invalidation as the distro ones.. I even built the alpine kernel and ran directly and it didn't

18:47 <ardb> so this is a static key being toggled

18:48 <ardb> maybe this is alpine value add?

18:49 <ajb-lina-> it seems a fairly lightly patched kernel

18:49 <ajb-lina-> https://git.alpinelinux.org/aports/tree/main/linux-lts

18:49 <ajb-lina-> ardb: so what are static keys for, is this runtime-guided perf tweaking or something?

18:51 <ardb> ajb-lina-: global boolean variables that are modified so rarely [typically] that it pays to patch the code directly instead of using if/else

18:52 <ajb-lina-> ardb: is there a kernel flag that turns this on/off?

18:52 <ardb> CONFIG_JUMP_LABEL

18:53 <ardb> but for obvious reasons, this is a compile time option only :-)

18:54 <ajb-lina-> ardb: hmm odd I have CONFIG_JUMP_LABEL=y in my "good" kernel - so maybe this is a subtle interaction with something else?

18:55 <ardb> ajb-lina-: jump labels are always enabled

18:55 <ardb> the question is which code that uses a jump label gets invoked here, and not on the "good" kernel

18:55 <ardb> hence the question regarding alpine value add

18:57 * ajb-lina- continues working up the call chain

18:57 <ajb-lina-> __jump_label_update

18:58 <ajb-lina-> jump_label_update

18:59 <ajb-lina-> static_key_disable_cpuslocked

18:59 <ajb-lina-> or

18:59 <ajb-lina-> static_key_enable_cpuslocked

19:01 <ajb-lina-> toggle_allocation_gate

19:05 * ajb-lina- builds a kernel with CONFIG_KFENCE_STATIC_KEYS to check

19:13 <ajb-lina-> ok well kfence.sample_interval=0 stops the invalidations every time I do cat /proc/self/maps
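
A back-of-envelope sketch of why the patching recurs at all. It assumes, without confirmation from this log, that toggle_allocation_gate switches its static key on and then off once per sample interval, and that the interval is 100 ms when not overridden; `patch_sites` is a hypothetical count of branch sites rewritten per toggle, and the function name is made up for illustration.

```python
def patch_writes_per_second(sample_interval_ms, patch_sites=1):
    """Estimate kernel text patch writes per second caused by a static key
    that is enabled and disabled once per sample interval.

    Assumptions (not taken from the log): two toggles per interval and
    patch_sites instruction writes per toggle. An interval of 0 is treated
    as sampling disabled, so no patching at all.
    """
    if sample_interval_ms == 0:
        return 0.0
    toggles_per_interval = 2  # enable, then disable
    return toggles_per_interval * patch_sites * 1000.0 / sample_interval_ms


for interval in (100, 500, 0):
    print(interval, patch_writes_per_second(interval))
```

Each such write lands in a page full of translated kernel code, so under an emulator that invalidates on writes to code pages even a modest rate can add up, which would be consistent with sample_interval=0 making the invalidations stop.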

19:13 <ajb-lina-> so now I just need to work out what is happening during boot

19:14 <ajb-lina-> 163968 0xffff800011032b5c

19:15 <ajb-lina-> ffff8000110327d8 T memmap_init_range

19:19 <ajb-lina-> that at least makes sense - but I would hope most of those didn't trigger tb flush because there isn't any code in them

19:33 russ has quit [Ping timeout: 256 seconds]

19:34 russ has joined #armlinux

19:36 Amit_T has quit [Ping timeout: 272 seconds]

19:58 russ has quit [Read error: Connection reset by peer]

19:58 amitk has quit [Ping timeout: 252 seconds]

20:14 russ has joined #armlinux

20:46 djrscally has quit [Quit: Konversation terminated!]

20:49 jlinton has quit [Ping timeout: 256 seconds]

21:17 headless has quit [Quit: Konversation terminated!]

21:23 System_Error has quit [Read error: Connection reset by peer]

21:25 djrscally has joined #armlinux

22:02 abelvesa has quit [Quit: leaving]

22:03 abelvesa has joined #armlinux

22:25 wolfshappen has quit [Ping timeout: 256 seconds]

22:44 torez has quit [Quit: torez]

22:46 XV8 has joined #armlinux

23:10 iivanov has quit [Remote host closed the connection]

23:27 matthias_bgg has quit [Ping timeout: 256 seconds]

23:35 olofj has quit [Ping timeout: 252 seconds]

23:35 pjw has quit [Read error: Connection reset by peer]

23:35 dianders has quit [Read error: Connection reset by peer]

23:35 arnd has quit [Read error: Connection reset by peer]

23:35 broonie has quit [Read error: Connection reset by peer]

23:35 ccaione has quit [Read error: Connection reset by peer]

23:35 unixsmurf has quit [Read error: Connection reset by peer]

23:35 narmstrong has quit [Read error: Connection reset by peer]

23:35 jamestperk has quit [Read error: Connection reset by peer]

23:35 maennich has quit [Read error: Connection reset by peer]

23:35 roxell has quit [Read error: Connection reset by peer]

23:35 mturquette has quit [Read error: Connection reset by peer]

23:35 drewfustini has quit [Read error: Connection reset by peer]

23:35 netonaut_ has quit [Read error: Connection reset by peer]

23:35 robclark has quit [Read error: Connection reset by peer]

23:35 zx2c4 has quit [Read error: Connection reset by peer]

23:35 robher has quit [Read error: Connection reset by peer]

23:36 ccaione has joined #armlinux

23:36 pjw has joined #armlinux

23:36 broonie has joined #armlinux

23:36 unixsmurf has joined #armlinux

23:36 jamestperk has joined #armlinux

23:36 mturquette has joined #armlinux

23:36 roxell has joined #armlinux

23:36 arnd has joined #armlinux

23:36 robclark has joined #armlinux

23:36 robher has joined #armlinux

23:36 dianders has joined #armlinux

23:36 olofj has joined #armlinux

23:36 maennich has joined #armlinux

23:36 drewfustini has joined #armlinux

23:36 narmstrong has joined #armlinux

23:36 netonaut_ has joined #armlinux

23:37 zx2c4 has joined #armlinux

23:46 XV8 has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

23:49 djrscally has quit [Quit: Konversation terminated!]