ChanServ changed the topic of #armlinux to: ARM kernel talk [Upstream kernel, find your vendor forums for questions about their kernels] | https://libera.irclog.whitequark.org/armlinux
prabhakarlad has quit [Quit: Client closed]
XV8 has quit [Quit: Textual IRC Client: www.textualapp.com]
mrlemke has quit [Quit: Bye.]
Tokamak has quit [Ping timeout: 256 seconds]
Tokamak has joined #armlinux
nsaenz has quit [Ping timeout: 268 seconds]
apritzel has quit [Ping timeout: 256 seconds]
<palmer> so I find myself puzzled about the arm64 memory model again. This time it's about store->fetch vs store->load ordering, which I can't really find any documentation on. I wrote it up a bit here <https://github.com/palmer-dabbelt/jdk/commit/bad600b90295ac6ba5cf35fd5a0bd6d178b7e7cf>, but I'm not even quite sure that makes sense...
Tokamak has quit [Ping timeout: 240 seconds]
<HdkR> The weak memory model is great. I love it. It never causes me problems ever. <wince/>
<palmer> hey, at least you guys have one -- we've just got a "hopefully it works" on the fetch side of things ;)
Tokamak has joined #armlinux
<HdkR> Is that the riscv memory model? :D
<palmer> ya
<HdkR> That's spooky AF.
<HdkR> riscv has an atomic extension though doesn't it?
<palmer> ya, so the load/store side of things seems pretty solid. it's the fetch side of things where we really don't have much, that's how I got roped into looking at java ;)
<HdkR> oh! Instruction fetch. Right
<palmer> I'm not really a memory model person though, hence why I'm asking here... ;)
<HdkR> It also hurts my brain. Just make everything atomic is my response :P
Pali has quit [Ping timeout: 256 seconds]
ravan has joined #armlinux
apritzel has joined #armlinux
ravan has quit [Read error: Connection reset by peer]
ravan has joined #armlinux
guillaume_g has joined #armlinux
nsaenz has joined #armlinux
prabhakarlad has joined #armlinux
Pali has joined #armlinux
guillaume_g has quit [Quit: Konversation terminated!]
guillaume_g has joined #armlinux
ravan has quit [Remote host closed the connection]
ravan has joined #armlinux
headless has joined #armlinux
guillaume_g has quit [Quit: Konversation terminated!]
russ has quit [Ping timeout: 240 seconds]
headless has quit [Ping timeout: 240 seconds]
headless has joined #armlinux
russ has joined #armlinux
headless has quit [Quit: Konversation terminated!]
sicelo has quit [Quit: Bye!]
sicelo_ is now known as sicelo
rockosov has joined #armlinux
ravan has quit [Remote host closed the connection]
ravan has joined #armlinux
russ has quit [Ping timeout: 245 seconds]
russ has joined #armlinux
<ndesaulniers> @palmer: talk to Will Deacon?
<ardb> palmer: this has more to do with cache maintenance than with weakly ordered memory
<ardb> you need to clean the D-side to PoU + DSB/ISB and then invalidate the I-cache
<ardb> (unless you have a recent core that doesn't require this)
<palmer> I wasn't sure if Will was on vacation, so I figured just talking here would be good enough -- this isn't really an emergency or anything
russ has quit [Ping timeout: 240 seconds]
<palmer> ardb: IIUC the issue is that the D-side ordering is just to the inner shared domain, not the point of unification
<palmer> I was talking to one of the Power guys though and he says my example is written pretty poorly, so I'll go try and re-write it
<palmer> I think it made sense after I explained it, but not 100% on that
<ardb> inner shareable and PoU are orthogonal
<ardb> PoU is a level in the cache hierarchy
<ardb> inner shareable is a shareability domain
<ardb> you can ignore the latter in this case as it is all inner shareable
<palmer> OK, ya, that's what I thought
<palmer> the reason I'm bringing up inner shareable is because the code is doing a "dmb ish", and I think it needs to do a "dmb nsh"
<palmer> but I don't really know the arm64 terminology, so I'm never sure how this works
<ardb> no, dmb nsh is not broadcast, so it does nothing here
<palmer> OK, then now I'm super confused
<ardb> the concern is that the literal value may not be visible to other CPUs after they fetch the new instructions?
<palmer> my concern is that they may fetch the instruction (and thus branch to the load of the new target address) without seeing the store to the new target address
<palmer> not sure if that's the same thing as you're saying
<ardb> ah ok
russ has joined #armlinux
<palmer> so there's one thread doing "store to function pointer", "dmb ish", "store to code buffer, to retarget the jump"
<palmer> and if the executing thread sees that code buffer update before the function pointer is there, then it'll go somewhere else (and unless I'm missing something that makes sure that somewhere else is safe, that's bad)
<ardb> ok
<palmer> IIUC there are some implicit orderings enforced on the executing thread, so no explicit fence is necessary
<palmer> (something about "instruction pull" vs "instruction fetch", haven't really quite sorted that out yet)
<palmer> anyway, I think it's probably sanest to try and write this all up
<palmer> I think I can do a better job than what's there
<ardb> so the cache maintenance is necessary in general but does not affect the issue at hand
<palmer> yes
<palmer> or I guess, the cache maintenance is necessary to make sure that new jump is taken, so for when you clean up the code buffer
<ardb> the question is whether a CPU that does not have the instructions in any of its caches can fetch them and observe the old function pointer but the new instruction
<palmer> yes
<palmer> you can end up at the old function or the new function, just not in the middle ;)
<ardb> yes
<palmer> and I guess that's a wrinkle I need to look at, because if the trampoline is pre-generated to point to the old address (even though it's not executed), then this may be safe
<palmer> didn't see that was the case, but it kind of goes off somewhere and I didn't look closely so I should do that again
<palmer> (that came up in the context of RISC-V, where we have no way to enforce this ordering, as a workaround)
<ardb> i think DMB ISH + ISB should be sufficient here, if i read the ARM ARM correctly
<palmer> OK
<palmer> in this case it's "store; dmb ish; store; isb"
<palmer> you're saying "store; dmb ish; isb; store"?
<ardb> the latter
<palmer> yes, so I think that second one would work as well
<palmer> but IMO the first one isn't sufficient
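The writer-side ordering discussed above ("store to function pointer", barrier, "store to code buffer") can be sketched as a data-only analogue in C11 atomics. This is a minimal sketch under loud assumptions: `target_slot`, `patched`, `retarget` and `execute` are hypothetical names, the patched branch is modelled as an ordinary atomic flag rather than a real instruction, and the release fence only roughly stands in for "dmb ish". C11 says nothing about instruction fetch, so the acquire fence on the reader side is exactly the part that the real executing thread does not obviously have.

```c
#include <stdatomic.h>

typedef int (*target_fn)(void);

static int old_target(void) { return 1; }
static int new_target(void) { return 2; }

/* Assumption: stands in for the literal that holds the call target. */
static _Atomic(target_fn) target_slot = old_target;

/* Assumption: stands in for the patchable branch in the code buffer.
 * 0 = original branch, 1 = branch to the code that loads target_slot. */
static atomic_int patched = 0;

/* Patching thread: publish the pointer first, then retarget the jump. */
static void retarget(target_fn fn)
{
    atomic_store_explicit(&target_slot, fn, memory_order_relaxed);
    /* Roughly the role "dmb ish" plays between the two stores. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&patched, 1, memory_order_relaxed);
}

/* Executing thread: in the real code the first load is an instruction
 * fetch, and there is no explicit fence before the pointer load. */
static int execute(void)
{
    if (atomic_load_explicit(&patched, memory_order_relaxed) == 0)
        return old_target();
    atomic_thread_fence(memory_order_acquire);
    target_fn fn = atomic_load_explicit(&target_slot, memory_order_relaxed);
    return fn();
}
```

With both fences in place, a reader that observes `patched == 1` is guaranteed to observe the new `target_slot`. Whether the hardware gives an equivalent guarantee when the first observation is an instruction fetch is the question being asked in the channel.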
<ardb> and dsb+isb when doing the cache maintenance
<ardb> ; Coherency example for data and instruction accesses within the same Inner Shareable domain.
<ardb> ; Enter this code with <Wt> containing a new 32-bit instruction,
<ardb> ; to be held in Cacheable space at a location pointed to by Xn.
<ardb> STR Wt, [Xn]
<ardb> DC CVAU, Xn ; Clean data cache by VA to point of unification (PoU)
<ardb> DSB ISH ; Ensure visibility of the data cleaned from cache
<ardb> IC IVAU, Xn ; Invalidate instruction cache by VA to PoU
<ardb> DSB ISH ; Ensure completion of the invalidations
<ardb> ISB ; Synchronize the fetched instruction stream
<ardb> (from the ARM ARM)
<ardb> but i guess you got that part covered
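From C, the sequence quoted above is usually reached through the GCC/Clang builtin `__builtin___clear_cache` rather than hand-written assembly. The sketch below assumes a hypothetical helper named `patch_insn` that writes one 32-bit instruction and then asks the compiler runtime to make it fetchable. To my understanding, on AArch64 the runtime implements the builtin with a DC CVAU / DSB / IC IVAU / DSB / ISB sequence like the one quoted, while on targets with coherent instruction caches it may expand to nothing. It covers only the cache maintenance step, not the store-ordering question discussed earlier.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: store one instruction word at `site`, then make
 * it visible to instruction fetch on the calling CPU's view of memory. */
static void patch_insn(uint32_t *site, uint32_t insn)
{
    /* A64 instructions are 4 bytes and must be 4-byte aligned. */
    assert(((uintptr_t)site & 3u) == 0);

    /* Corresponds to "STR Wt, [Xn]". The volatile access keeps the
     * compiler from treating the store as dead. */
    *(volatile uint32_t *)site = insn;

    /* GCC/Clang builtin covering the byte range [begin, end). */
    __builtin___clear_cache((char *)site, (char *)(site + 1));
}
```

A caller would pass a pointer into a writable and executable mapping. The ISB at the end of the quoted sequence only synchronizes the CPU that executes it, so other CPUs that may already have fetched the old instruction need their own context synchronization.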
<palmer> ya, so IMO the cleanup phase is just a different question
<ardb> right
<palmer> I didn't find anything broken there, at least in the arm port (trying to do it on RISC-V gets super tricky ;))
<ardb> i'm sure :-)
<palmer> or I guess, trying to do it without the big "IPI to every core and flush the icache" is tricky
<palmer> what we have now is simple, but it's garbage ;)
<palmer> anyway, thanks
<palmer> I'll try to write it up so it makes a bit more sense, and then see if I missed something
<palmer> our Oracle CLA isn't in yet, so there's no rush ;)
rockosov has quit [Quit: Connection closed for inactivity]
russ has quit [Ping timeout: 256 seconds]
Nact has joined #armlinux
elastic_dog has quit [Quit: elastic_dog]
elastic_dog has joined #armlinux
russ has joined #armlinux