<geist>
though it being a university project it doesn't have to be particularly balanced or whatnot. they may have just tossed lots of stuff at the wall to see what works
cs_0x6373 has quit [Changing host]
cs_0x6373 has joined #riscv
aburgess has quit [Remote host closed the connection]
aburgess has joined #riscv
jwillikers has joined #riscv
aburgess has quit [Remote host closed the connection]
aburgess has joined #riscv
aburgess has quit [Remote host closed the connection]
tgamblin has joined #riscv
jwillikers has quit [Remote host closed the connection]
mahmutov_ has joined #riscv
devcpu has quit [Quit: leaving]
jwillikers has joined #riscv
mahmutov has quit [Ping timeout: 252 seconds]
devcpu has joined #riscv
mahmutov_ has quit [Quit: WeeChat 3.1]
mahmutov has joined #riscv
chripo_ has joined #riscv
chripo_ has quit [Client Quit]
chripo_ has joined #riscv
chripo_ has quit [Client Quit]
gioyik has joined #riscv
josuah has joined #riscv
<josuah>
hello! I am new to assembly and ISA and try to understand synchronisation through lr.w sc.w [...]
<dh`>
yes...?
<josuah>
I understand running sc.w will succeed only if the lr.w just before did succeed [...]
<dh`>
are you typing [...] or is your client printing that because it's broken?
<josuah>
but it seems like there is the race that is yet to be solved with that
<dh`>
lr always succeeds
<dh`>
the idea with sc is that it only succeeds if nothing else has touched the same memory
<josuah>
as lr.w and sc.w are store and load instructions with special magic that tells if it was atomic
<josuah>
thanks dh`, and yes, [...] was typed out by me
<dh`>
ok just checking
<dh`>
nowadays we have people connecting from all kinds of busted gateways
<winterflaw>
what's the basic atomic mechanism? cache-line locking, or ERG?
<dh`>
up to the implementation afaik
<josuah>
oh, probably why I could not get a hint from the spec (still harsh for me though)
mahmutov has quit [Ping timeout: 240 seconds]
Tranmi has quit [Quit: Leaving]
<dh`>
if you want to understand how it works, it's equivalent to compare-and-swap
<josuah>
so it requires some kind of atomic operation underneath... is there any example?
<dh`>
are you interested in it from the software or hardware side?
<dh`>
for software, it _is_ an atomic operation
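Editor's sketch of the software view described above, assuming a C11 toolchain with `<stdatomic.h>`. The helper name `fetch_max` is hypothetical. On RISC-V targets with the A extension, compilers typically lower the weak compare-exchange to an lr.w/sc.w retry loop, but that is a property of the compiler and not something this snippet guarantees.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: atomically raise *p to at least `candidate`.
 * Returns the value that was observed before any update. */
static uint32_t fetch_max(_Atomic uint32_t *p, uint32_t candidate) {
    assert(p);
    uint32_t old = atomic_load(p);
    /* If the compare-exchange fails (someone else changed *p, or the weak
     * form failed spuriously), `old` is refreshed with the current value
     * and the loop retries. This retry is the software-visible shape of
     * "sc failed, go back to lr". */
    while (old < candidate &&
           !atomic_compare_exchange_weak(p, &old, candidate)) {
        /* retry */
    }
    return old;
}
```

The weak form is allowed to fail spuriously, which is why it sits in a loop; that matches a store-conditional that may fail even when no other hart wrote the location.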
<josuah>
maybe "cache-line locking, or ERG?" by winterflaw was some
<josuah>
dh`: hardware I would say
<dh`>
for hardware, cache-line locking is one way to make it go
<dh`>
lr marks the cache line reserved, if anything else grabs it it clears that mark, sc checks if it's still there
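Editor's sketch of the mark/clear/check scheme just described, written as a toy single-threaded model and not as real hardware or a real API. All names (`reservation_t`, `model_lr`, `model_remote_store`, `model_sc`) and the 64-byte line size are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LINE_BYTES 64u /* assumed cache line size, illustration only */

/* One hart's reservation: a valid bit plus the reserved line index. */
typedef struct {
    bool      valid;
    uintptr_t line;
} reservation_t;

static uintptr_t line_of(const void *p) { return (uintptr_t)p / LINE_BYTES; }

/* lr: always succeeds; loads the value and marks the line reserved. */
static uint32_t model_lr(reservation_t *r, const uint32_t *addr) {
    assert(r && addr);
    r->valid = true;
    r->line  = line_of(addr);
    return *addr;
}

/* A store by some other agent: clears the mark if it hits the reserved line. */
static void model_remote_store(reservation_t *r, uint32_t *addr, uint32_t v) {
    if (r->valid && r->line == line_of(addr))
        r->valid = false;
    *addr = v;
}

/* sc: stores only if the mark is still there. Returns 0 on success and
 * 1 on failure (following the RISC-V convention of zero meaning success).
 * The reservation is consumed either way. */
static int model_sc(reservation_t *r, uint32_t *addr, uint32_t v) {
    bool ok = r->valid && r->line == line_of(addr);
    r->valid = false;
    if (ok)
        *addr = v;
    return ok ? 0 : 1;
}
```

In this model an lr followed directly by sc succeeds, while an lr, then a `model_remote_store` to the same line, then sc fails and leaves the remote value in place.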
<winterflaw>
Intel uses that - I may be wrong, but I think ERG scales better
<winterflaw>
the problem with ERG is if you want performance you need to put pointers you care about in their own ERG
<winterflaw>
on ARM ERG is 8 bytes to 2048 bytes
<winterflaw>
and you can only find out how big if you're in supervisor mode (which is bloody stupid, far as I see it)
<josuah>
dh`: and to avoid race conditions, sc uses one of the available mechanisms to avoid TOCTOU?
<winterflaw>
a lock-free btree element state structure easily has three pointers you care about
<winterflaw>
6 KB per element is a lot of overhead
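Editor's sketch of where the 6 KB figure comes from, assuming the worst-case 2048-byte ERG mentioned above and a C11 compiler that accepts alignments that large. The struct and its field names are hypothetical.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define ERG_BYTES 2048 /* assumed worst-case reservation granule */

/* Hypothetical element state with three pointers, each placed in its own
 * granule so that a store to one cannot clear a reservation on another. */
struct element_state {
    alignas(ERG_BYTES) void *next;
    alignas(ERG_BYTES) void *left;
    alignas(ERG_BYTES) void *right;
};

/* Three granules of 2048 bytes: 6144 bytes for 3 pointers of payload. */
static_assert(sizeof(struct element_state) == 3 * ERG_BYTES,
              "each pointer occupies a full granule");
```

With a 64-byte granule the same layout would cost 192 bytes per element, which is why knowing the real ERG at build time would matter.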
<dh`>
josuah: the mechanism I described is sufficient for that
<winterflaw>
actually you can empirically determine ERG on ARM, from user-mode, which is what I do
<winterflaw>
but that's something the user has to do
<winterflaw>
I can't use it during compile
<dh`>
there are restrictions on what code is allowed to do between lr and sc that make simple implementations feasible
<dh`>
(and if you try to relax those restrictions you are quickly in hardware transactional memory territory and your life will become terrible)
<winterflaw>
Intel disabled those recently
<winterflaw>
unfixable from a security POV
<winterflaw>
no more lock elision...
<josuah>
good note to take dh`
<winterflaw>
(actually I may be mixing up lock elision / transactional - it's been a while)
<josuah>
Looking up cache-line locking and ERG will tell me more about how locking works. Many thanks dh` and winterflaw! This, I did not learn from that book...
gioyik_ has joined #riscv
gioyik has quit [Remote host closed the connection]
<winterflaw>
look at the Intel man page for cmpxchg to find out how it works
<winterflaw>
only Intel and I think Itanium use that method
<winterflaw>
on ARM CAS has been implemented by users using load/store with ERG
<winterflaw>
historically
<winterflaw>
maybe it's just a wrapper for that
<jrtc27>
if it's using reservations it's not CAS
<winterflaw>
I could be completely wrong, but I think CAS is independent of the underlying mechanism
<winterflaw>
it's a higher-level concept
<Finde>
there is hardware CAS
pecastro has quit [Ping timeout: 252 seconds]
<winterflaw>
ERG is hardware CAS, too
<winterflaw>
it's hardware which lets you perform CAS
<winterflaw>
so is cache-line locking
<winterflaw>
if you have CAS, you have to have some mechanism to provide atomicity or detect failure of atomicity, so if you say "hardware CAS" you mean there's hardware which is doing this -
<winterflaw>
that's fine, it has to be so -
<winterflaw>
historically it's been ERG
<winterflaw>
I'd be surprised if it changed - ERG works fine and I think is better than cache-line locking
<Finde>
and when I say there is hardware CAS, I mean there is hardware which does CAS and only CAS, no emulation on top of other mechanisms
<solrize>
i have no idea what that is though
Andre_H has quit [Quit: Leaving.]
<jrtc27>
it's arm's name for the size of what riscv calls the "reservation set"
freakazoid333 has quit [Read error: Connection reset by peer]
<jrtc27>
it's just a generalisation of cache line reservation
<jrtc27>
and in practice is always going to be equal to your cache line size because your implementation is just going to be cache line reservation
freakazoid333 has joined #riscv
<winterflaw>
hmm
<winterflaw>
very interesting thought
<winterflaw>
that claim looks on the face of it to be plausible
<winterflaw>
all the processor families I have info for
<winterflaw>
do indeed all seem to have ERG == cache line
<winterflaw>
for ARM I've used the worst case, 2048; they're an exception
<jrtc27>
(arm is more permissive than riscv in what you can do: it allows loads and stores in between the ldrex/strex, provided they're all within a 2048 byte range, which allows you to implement the exclusives monitor as just cache line reservation so long as you're careful about eviction)
<jrtc27>
that upper limit is only there so software has something to rely on
<jrtc27>
if you make it unbounded then there's no way for software to avoid it
<winterflaw>
well, I do absolute minimum between ldrex/strex, for the obvious reasons
<jrtc27>
2048 is so stupidly high that it should never matter, but smaller than a page
jwillikers has quit [Remote host closed the connection]
<winterflaw>
I used to work at ARM actually =-)
<winterflaw>
more than twenty years ago though
<josuah>
when writing a userland program, you do not know in advance the actual execution sequence of the multiple processes (say for a single-core system)
<josuah>
but at the scale of CPU implementation, it might be possible to have purely sequential things out of which it is much easier to organise locking
<josuah>
I must change how I think about computers in order to understand some lower-level concepts, that is also why I like looking at lower level. :P
valentin has quit [Read error: Connection reset by peer]
<sorear>
berkeley rocket has ECC caches
<sorear>
so *all* writes to the cache are actually read-modify-write operations on cache lines, to update the ECC
<sorear>
it was simple to extend that into an AMO system
<dh`>
if you can do additional loads and stores, it's no longer equivalent to compare-and-swap but can express a few more things
<dh`>
whether those things are useful without being able to do two atomic ops at once, idk.
<dh`>
what I don't understand is why, if the atomicity unit is in fact a cache line, this isn't exposed to software, because it would make a lot of lockfree stuff a lot easier and possibly faster
<sorear>
double-width CAS would be somewhat useful at least for being able to implement languages that assume all-the-world-is-x86-or-arm
<dh`>
if you can update both pointers of a doubly linked list at once, atomic list ops become *much* easier
<sorear>
the coherency protocol works with cache lines but in order to guarantee forward progress you can (normally) evict between any two instructions, possibly two instructions executed out of order
<dh`>
right, but being able to do additional accesses to the same cache line should not affect the progress guarantee (until you do ridiculously many such that you can never finish the sequence without getting an interrupt or something)
<sorear>
you'd also need some kind of mechanism to perform multiple writes without worrying about an interrupt in the middle
<sorear>
preferably without exposing store buffer differences to sw
<jrtc27>
bear in mind riscv is designed to operate in a world without cache lines too
<jrtc27>
ie the TCM microcontroller case
<dh`>
you just clear the reservation if you get an interrupt
<sorear>
that handles the front end of the sequence but not the back end
<dh`>
I suppose the problem is that you might need to revert the writes
<dh`>
and that is firmly HTM territory
<sorear>
if the user code looks like load; load; store; store and an interrupt or exceptional condition exists between the stores
<dh`>
right
<dh`>
I guess you could just discard the cache line
<dh`>
if you flush it at lr time, but maybe that would suck
<winterflaw>
double-width CAS lets you implement certain lock-free data structures *without* needing safe memory reclamation
<winterflaw>
but you pretty much want SMR anyway, always, so... bit of a shrug
<sorear>
that's more or less my opinion on it
<winterflaw>
itanium has a nice double width CAS implementation though
<winterflaw>
it compares *one* word, but swaps *two*
<winterflaw>
which is actually all you need for counter/pointer pairs
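Editor's sketch of a counter/pointer pair, assuming C11 atomics. To stay portable it is a single-width stand-in: a 32-bit counter and a 32-bit index packed into one 64-bit word, so the CAS here compares both halves (unlike the compare-one, swap-two form described above). All names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Pack {counter, index} into one word; the index stands in for a pointer. */
static uint64_t pack(uint32_t counter, uint32_t index) {
    return ((uint64_t)counter << 32) | index;
}
static uint32_t index_of(uint64_t w)   { return (uint32_t)w; }
static uint32_t counter_of(uint64_t w) { return (uint32_t)(w >> 32); }

/* Try to swing the head from the previously read value `seen` to
 * `new_index`, bumping the counter. Fails if the word changed since it
 * was read, even if the index has since returned to the same value. */
static bool swing_head(_Atomic uint64_t *head, uint64_t seen,
                       uint32_t new_index) {
    assert(head);
    uint64_t desired = pack(counter_of(seen) + 1u, new_index);
    return atomic_compare_exchange_strong(head, &seen, desired);
}
```

If the head goes from index 1 to 2 and back to 1, a thread still holding the original read is rejected because the counter has moved on, which is the ABA case the pair exists to catch.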
<winterflaw>
sleep time
<winterflaw>
goodnight
winterflaw has quit [Quit: Leaving]
<dh`>
I expect there are cases where reading several words and writing one atomically would be useful
<dh`>
idk, I've never spent a lot of effort on lock-free structures because there's usually plenty of work to do making basic locked structures correct