sorear changed the topic of #riscv to: RISC-V instruction set architecture | https://riscv.org | Logs: https://libera.irclog.whitequark.org/riscv
littlebo1eep has quit [Ping timeout: 240 seconds]
littlebo1eep has joined #riscv
littlebobeep has quit [Ping timeout: 240 seconds]
pecastro has quit [Ping timeout: 246 seconds]
lemoniter has quit [Ping timeout: 244 seconds]
dh` has quit [Quit: brb]
sobkas has quit [Quit: sobkas]
dh` has joined #riscv
dh` has quit [Ping timeout: 255 seconds]
jacklsw has joined #riscv
jack_lsw has joined #riscv
jacklsw has quit [Ping timeout: 240 seconds]
aerkiaga has quit [Remote host closed the connection]
jack_lsw1 has joined #riscv
jack_lsw has quit [Ping timeout: 240 seconds]
seninha has quit [Remote host closed the connection]
littlebo1eep has quit [Ping timeout: 240 seconds]
motherfsck has quit [Ping timeout: 240 seconds]
dh` has joined #riscv
dh` has quit [Client Quit]
yeirr has joined #riscv
dh` has joined #riscv
jack_lsw1 has quit [Quit: Back to the real world]
mlaga97 has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
mlaga97 has joined #riscv
littlebobeep has joined #riscv
motherfsck has joined #riscv
<muurkha> dh`: heh, I was wondering what happened with brb
<muurkha> dh`: well, you were warning me more about the issues of moribund transactions looping forever or violating memory safety because they had seen things no mortal transaction must ever see. you might have brought up the risks of deadlock in this situation but I'm not sure
marcj has quit [Ping timeout: 246 seconds]
marcj has joined #riscv
haise01 has joined #riscv
BootLayer has joined #riscv
littlebobeep has quit [Ping timeout: 240 seconds]
jjido has joined #riscv
jacklsw has joined #riscv
lemoniter has joined #riscv
lemoniter has quit [Remote host closed the connection]
jack_lsw has joined #riscv
jacklsw has quit [Ping timeout: 255 seconds]
jack_lsw1 has joined #riscv
jack_lsw has quit [Ping timeout: 244 seconds]
strlst has joined #riscv
jjido has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
aerkiaga has joined #riscv
yeirr has quit [Ping timeout: 255 seconds]
jack_lsw2 has joined #riscv
jack_lsw1 has quit [Ping timeout: 240 seconds]
yeirr has joined #riscv
jack_lsw2 has quit [Quit: Back to the real world]
jmdaemon has quit [Ping timeout: 244 seconds]
jjido has joined #riscv
yeirr has quit [Ping timeout: 255 seconds]
jjido has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
pecastro has joined #riscv
littlebobeep has joined #riscv
littlebobeep has quit [Ping timeout: 240 seconds]
indy has quit [Quit: ZNC 1.8.2 - https://znc.in]
indy has joined #riscv
littlebobeep has joined #riscv
indy_ has joined #riscv
indy has quit [Ping timeout: 258 seconds]
indy_ has quit [Ping timeout: 240 seconds]
indy has joined #riscv
indy has quit [Ping timeout: 240 seconds]
indy has joined #riscv
indy has quit [Ping timeout: 246 seconds]
indy has joined #riscv
BootLayer has quit [Quit: Leaving]
theruran has quit [Quit: Connection closed for inactivity]
raym has quit [Ping timeout: 255 seconds]
raym has joined #riscv
seninha has joined #riscv
bauruine has joined #riscv
TMM_ has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
TMM_ has joined #riscv
jjido has joined #riscv
littlebobeep has quit [Ping timeout: 240 seconds]
BootLayer has joined #riscv
jjido has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
yeirr has joined #riscv
ldevulder has joined #riscv
sobkas has joined #riscv
eroux has quit [Read error: Connection reset by peer]
eroux has joined #riscv
eroux has quit [Client Quit]
nvmd has joined #riscv
gorgonical has joined #riscv
<gorgonical> A few questions: before an MMU is enabled, is there anything else other than a PMP that would prevent access to any memory?
<gorgonical> In Linux's head.S (but should be applicable elsewhere), trampoline page tables are loaded before the real kernel page tables. The only thing that happens between these two loads is setting up a trap vector holding pen and re-loading the global pointer. Anyone got any insights on this?
<gorgonical> Actually, the only thing that happens between them is re-loading the global pointer
vagrantc has joined #riscv
bgamari_ is now known as bgamari
bauruine has quit [Remote host closed the connection]
adjtm has joined #riscv
<palmer> gorgonical: PMAs could also prevent no-translation accesses (ie, M-mode or S-mode with the MMU off)
vagrantc has quit [Ping timeout: 260 seconds]
ldevulder has quit [Quit: Leaving]
littlebobeep has joined #riscv
vagrantc has joined #riscv
<gorgonical> Why would I need to reload the global pointer? There's a line here that makes very little sense to me
<gorgonical> We load the global pointer in _start_kernel. Then when we're here, after changing to trampoline page tables, we load it again?
littlebobeep has quit [Ping timeout: 240 seconds]
yeirr has quit [Ping timeout: 260 seconds]
yeirr has joined #riscv
littlebobeep has joined #riscv
littlebobeep has quit [Ping timeout: 240 seconds]
jjido has joined #riscv
nvmd has quit [Quit: WeeChat 3.5]
yeirr has quit [Quit: yeirr]
littlebobeep has joined #riscv
<palmer> yes, I see your email ;)
<palmer> so I wrote something, but have now managed to convince myself that head.S is just a broken mess
<muurkha> heh
<muurkha> but it's a broken mess that happens to work on existing hardware, right?
<muurkha> at least usually?
<palmer> it's more of an issue with undocumented interfaces between the kernel and firmware
<palmer> we all test against each other (ie, I test OpenSBI and Atish/Anup/Alistair test Linux) so it works, but the edge cases that OpenSBI isn't triggering look to have bit rotted in Linux
littlebobeep has quit [Ping timeout: 240 seconds]
JTL is now known as JLT
JLT is now known as JTL
littlebobeep has joined #riscv
vagrantc has quit [Quit: leaving]
seninha has quit [Quit: Leaving]
<muurkha> ugh
<strlst> can I ask dumb questions? what is really the purpose of head.S? to me it looks like it initializes page tables, registers and the FPU (if there is one)
<strlst> also, what is head.S called by? does a bootloader jump into it?
<strlst> just trying to learn
<la_mettrie> there are no dumb questions here (...only questions that reveal you're dumb)
jmdaemon has joined #riscv
<strlst> muurkha: hi there, thanks again for sending me your "my evolution as a programmer" post; there's many interesting things on the mailing list
<courmisch> it's the kernel entrypoint for the boot CPU
<courmisch> so yeah, it gets called by the previous stage of the boot process
<strlst> okay, makes sense :)
<strlst> good that my understanding is not too far off
<strlst> thank you courmisch
BootLayer has quit [Quit: Leaving]
mahmutov has joined #riscv
<gorgonical> palmer: thanks very much for the quick reply, that clears up a lot of things. I think the bugs you alluded to are a little above my head, at least for right now. I hadn't been considering that the bootloader may have set up paging, so the divergent code paths you mentioned seemed superfluous to me
jjido has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
<muurkha> strlst: I'm glad you're enjoying it!
Xark has quit [Ping timeout: 255 seconds]
Xark has joined #riscv
jjido has joined #riscv
seninha has joined #riscv
<muurkha> dh`: do you know of anyone else who's tried to use software transactional memory to guarantee real-time bounds on response latency?
<muurkha> I've found a paper from 02010 by Schoeberl, Brandner, and Vitek on *hardware* TM
<dh`> no
<dh`> but I haven't looked at any of the literature in ~15 years because it became clear that it was unrelated to the things I was doing and also chasing off in unhelpful directions
<dh`> the occasional thing that's come by me anyway in the meantime has done nothing to alter that perception :-/
<conchuod> heh palmer - they are pretty fast w/ the accs :)
<dh`> (and i haven't had time to look at my project since ~2007)
<conchuod> I really need to figure out the screen name -> real name mappings here
<palmer> conchuod: you're talking about the k.o sysadmins giving accounts?
<conchuod> ye since I figured it'd take a while but you said they were quick
<palmer> gorgonical: well, looks like the bootloader folks don't actually do that in practice because that code in Linux has bit-rotted, so maybe they are superfluous now ;)
<palmer> conchuod: ya, Konstantin is pretty great
<palmer> there's a ton of magic in the kernel.org git repos too, it's mostly invisible to users
<conchuod> You mean the shared object stuff so that it reduces the bandwidth requirements for kernel trees?
<palmer> also there's some security stuff
<palmer> and IIRC some stuff related to autobuilders not having to poll, though that might just be email-based
<muurkha> dh`: oh interesting, what were the unhelpful directions, and what were you interested in doing?
<dh`> the unhelpful direction as I think I've ranted about before is trying to get the implementation to figure out the locking granularity for you
<dh`> and I was interested in large-scale transactions, like whole system calls or beyond
<dh`> and pretty much all the TM work is about trying to replace spinlocks with microtransactions
<dh`> or was; but as I said, I haven't seen any evidence this has changed
<conchuod> palmer: if there is some "autobuilders don't have to poll" I should figure out what the story is there - if even just to get my own CI to trigger
drmpeg has quit [Ping timeout: 250 seconds]
<palmer> IIUC there's at least some shim that will turn every merged patch into an email, and some folks have autobuilders that trigger that way
littlebobeep has quit [Ping timeout: 240 seconds]
dh` has quit [Quit: brb]
dh` has joined #riscv
drmpeg has joined #riscv
<muurkha> dh`: aha, I see
<muurkha> how would you go about figuring out mechanically what the lock or transaction granularity needed to be? I guess you'd at a minimum need a formal statement of the invariants that the concurrency control was supposed to preserve, and a proof that a candidate granularity did in fact preserve those properties under all possible interleavings?
seninha has quit [Ping timeout: 240 seconds]
<muurkha> conchuod: in general if the autobuilder's correctness depends on eventually noticing a commit, it has to poll
<dh`> in the transactional memory world, you don't, it magically falls out of the implementation
<muurkha> well, it doesn't really
<dh`> (and this is the chief reason the implementations suck, AFAICT)
<dh`> in general when writing concurrent code, it's a design choice
<muurkha> you can still have a transaction that is too big and thus gets retried when it shouldn't be
<muurkha> or break up a big transaction into two or more smaller transactions, introducing a bug
<muurkha> conchuod: I mean if you're doing some kind of push notification from your version control server, you could conceivably declare that it's the version control server's responsibility if it fails to eventually push a notification to the autobuilder
<dh`> well
<dh`> what needs to be in a transaction is a different question
<muurkha> is it? isn't it the same question?
<dh`> no!
<muurkha> I mean, why isn't it?
<dh`> maybe nobody understands this and that's why they're all off in the weeds
<muurkha> conchuod: which means that if you ever remove the autobuilder you need to edit the configuration of the push thingy on the version control system to keep it from retrying the autobuilder every hour for the next ten years
<muurkha> well I certainly don't!
<dh`> anyway consider a generic RDBMS
<muurkha> I can't speak for everyone else
littlebobeep has joined #riscv
<dh`> well, you're not expected to necessarily, faculty leading research programs ought to though
<dh`> anyway consider a generic RDBMS
<dh`> you have tables, and you have a lock for each table because there are things you need to lock the whole table for
<muurkha> conceivably, yeah
<muurkha> I mean you don't have to do it that way, that's not totally generic
<dh`> and you probably also have a lock on each row because locking whole tables all the time gets you mysql levels of performance
<muurkha> but it's common
<dh`> when you go to do a transaction, you touch many rows in many tables
<muurkha> okay
<dh`> and in a conventional system you'll lock them as you go
<muurkha> sadly I don't have any faculty to lead my own research program, so if I'm missing something basic I don't have anyone to set me straight
<dh`> and hold the locks until you either commit or abort
<dh`> (well, that's one of the things freenode's for)
<muurkha> sure
<dh`> and the locking infrastructure has a deadlock detector, so if you and someone else try to touch the same things in a different order one of you will get "oops, try again"
<muurkha> right
<dh`> the purpose of the transaction system and the locking system that's part of it is to guarantee consistency
<muurkha> and hopefully whatever application code is banging on the database from without is able to retry correctly
<dh`> but the question of what distinct locks you have is orthogonal
<dh`> that is, whatever locks you have it's the job of the transaction system to let you acquire them as you need them and tell you when you're hosed
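A minimal sketch of the RDBMS scheme dh` describes: transactions acquire locks as they touch rows, hold them until commit or abort, and a wait-for-graph walk tells one transaction "oops, try again" when two of them lock the same things in a different order. All class and method names here are invented for illustration.

```python
# Sketch: lock manager with wait-for-graph deadlock detection.
# Illustrative only; real DBMS lock managers are far more elaborate.

class DeadlockError(Exception):
    pass

class LockManager:
    def __init__(self):
        self.owner = {}      # lock name -> transaction holding it
        self.waits_for = {}  # transaction -> transaction it is blocked on

    def acquire(self, txn, lock):
        holder = self.owner.get(lock)
        if holder is None or holder == txn:
            self.owner[lock] = txn
            return
        # Record the wait edge, then walk the wait-for graph for a cycle.
        self.waits_for[txn] = holder
        node = holder
        while node in self.waits_for:
            node = self.waits_for[node]
            if node == txn:          # txn transitively waits on itself
                del self.waits_for[txn]
                raise DeadlockError(f"{txn} must abort and retry")
        # A real system would now block txn; this sketch just records the wait.

    def release_all(self, txn):
        """Called at commit or abort: drop every lock txn holds."""
        for lock in [k for k, v in self.owner.items() if v == txn]:
            del self.owner[lock]
        self.waits_for.pop(txn, None)
```

With T1 holding rowA and waiting on rowB while T2 holds rowB, T2's attempt on rowA closes the cycle and gets the "try again".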
<muurkha> oh! you mean the granularity of *data* covered by the lock, not the *code* covered by the lock
<dh`> oh
<dh`> yes
<dh`> sorry that wasn't clear, that changes ... everything I guess :-/
<muurkha> right, so if you just have one giant lock (or, in an STM system, one giant root pointer to the entire universe's state) everything is correct but slow
<muurkha> because you have no concurrency
<dh`> right
<muurkha> if you refine the granularity to tables you get less contention and more concurrency but you also spend more time acquiring and releasing locks
<muurkha> and you can now deadlock
<dh`> yes, and same for rows
<muurkha> and if you refine it further to individual pages the same, and individual rows or even fields the same
<dh`> you might lock groups of rows instead
<muurkha> right
<dh`> or decide whether to lock individual rows depending on the traffic to that table and its size and other considerations
<dh`> in most concurrent systems you make those decisions statically, but databases sometimes get frisky about it
<muurkha> in an STM I think that manifests as how much stuff you put in mutable transactional variables versus how much stuff you put in immutable things they point at
<muurkha> if you implement an RDBMS in an STM, you could reasonably have one FP-persistent immutable red-black tree for each table, say, with a transactional variable pointing at its root, corresponding to the lock-per-table scenario
<muurkha> (except that now you have the option of being lock-free)
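A sketch of the lock-free option muurkha mentions: an immutable (persistent) tree per table, with one mutable transactional variable holding the root. A writer builds a new version sharing untouched subtrees, then publishes it with a compare-and-set, retrying if the root moved underneath it. A plain BST stands in for the red-black tree to keep this short; all names are illustrative, and the lock inside TVar only simulates an atomic CAS.

```python
import threading

class Node:
    __slots__ = ("key", "val", "left", "right")
    def __init__(self, key, val, left=None, right=None):
        self.key, self.val, self.left, self.right = key, val, left, right

def insert(node, key, val):
    """Return a new tree sharing untouched subtrees with the old one."""
    if node is None:
        return Node(key, val)
    if key < node.key:
        return Node(node.key, node.val, insert(node.left, key, val), node.right)
    if key > node.key:
        return Node(node.key, node.val, node.left, insert(node.right, key, val))
    return Node(key, val, node.left, node.right)

def lookup(node, key):
    while node is not None:
        if key == node.key:
            return node.val
        node = node.left if key < node.key else node.right
    return None

class TVar:
    """Mutable root pointer with compare-and-set (lock simulates the CAS)."""
    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()
    def read(self):
        return self._value
    def cas(self, expected, new):
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False

def txn_insert(root_var, key, val):
    while True:                       # optimistic retry loop
        old = root_var.read()
        new = insert(old, key, val)   # pure; no locks held while building
        if root_var.cas(old, new):
            return
```

Readers never block: they snapshot the root once and traverse an immutable version, which is the lock-per-table granularity without the lock.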
<dh`> possibly
<dh`> but there's an implicit assumption in there: each variable is locked independently
<muurkha> well, or maybe not locked
littlebobeep has quit [Ping timeout: 240 seconds]
<muurkha> but sure, you could have a transactional memory that tracks writes at memory page granularity or something
<muurkha> like the card-marking write barrier in Self
<dh`> yeah but that's also wrong
<muurkha> but assuming that each variable is tracked independently, I'm not sure exactly how you would realize row-level locking
<dh`> that gives you "false sharing" and there's a ton of glop written about that
<muurkha> yeah, you'd have false sharing
<muurkha> but that might be okay if it trades off against something else you want
<dh`> tracking each variable independently and expecting the system to figure out the groupings is the bug
<muurkha> like a bound on transaction memory use, say
<muurkha> a goofy thing I was thinking about was tracking the read and write sets as Bloom filters. it doesn't save you from making shadow copies or undo-log entries for the things you write to
<dh`> whereas if you explicitly define locks and groups of things jointly protected by the same lock
strlst has quit [Quit: Lost terminal]
<dh`> then you move back into the real world, and the performance becomes reasonable
<dh`> whether or not they're actual _locks_ per se
<dh`> the units of access
seninha has joined #riscv
<dh`> that is how I read all this anyway
<muurkha> but I was thinking that maybe you could fairly rapidly verify that two transactions didn't conflict with bitwise operations on the Bloom filters
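The Bloom-filter idea above can be sketched as: each transaction summarizes its read and write sets as small bitmasks, and two transactions can only conflict if write/read or write/write masks intersect, which is a couple of bitwise ANDs. False positives force needless retries; false negatives are impossible. Filter width and hash choice here are arbitrary.

```python
import hashlib

BITS = 256  # filter width; wider means fewer false conflicts

def bloom_add(mask, addr, k=3):
    """Set k hash-derived bits of the mask for one address."""
    for i in range(k):
        h = hashlib.blake2b(f"{addr}:{i}".encode(), digest_size=4).digest()
        mask |= 1 << (int.from_bytes(h, "big") % BITS)
    return mask

class TxnSummary:
    def __init__(self):
        self.reads = 0
        self.writes = 0
    def on_read(self, addr):
        self.reads = bloom_add(self.reads, addr)
    def on_write(self, addr):
        self.writes = bloom_add(self.writes, addr)

def may_conflict(a, b):
    """Bitwise tests only: read-write, write-read, and write-write overlap."""
    return bool((a.writes & b.reads) or (b.writes & a.reads) or (a.writes & b.writes))
```

Note this only detects conflicts; as the chat says, it doesn't remove the need for shadow copies or undo-log entries for the writes themselves.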
<muurkha> hmm, have you looked at Fraser's FSTM?
<dh`> that is what the systems I've worked on do and they've all been performant
<dh`> no
<muurkha> instead of tracking reads and writes at word granularity it tracks them at block granularity
<muurkha> so you ask the STM to allocate you a transactional block of, say, 128 bytes, and it gives you a raw pointer and a handle
<dh`> it needs to be under programmer control
<muurkha> then you can store the handle in your data structures
<muurkha> in the future when you want to read or write the block you call open_for_read or open_for_write with the handle
<muurkha> and the STM gives you back some pointer (initially, the same one, but not after another write to it has been committed)
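A sketch of that block-handle API, modeled loosely on the description above (open_for_read and open_for_write are the names from the chat; the rest is invented for illustration). A handle names a block; open_for_write hands the transaction a private shadow copy, and commit publishes the shadows only if every block the transaction opened is still at the version it first saw.

```python
class STM:
    def __init__(self):
        self.blocks = {}   # handle -> (version, bytearray)
        self.next = 0

    def alloc(self, size):
        """Allocate a transactional block; returns its handle."""
        h = self.next
        self.next += 1
        self.blocks[h] = (0, bytearray(size))
        return h

class Txn:
    def __init__(self, stm):
        self.stm = stm
        self.read_vers = {}   # handle -> version first seen
        self.shadows = {}     # handle -> private writable copy

    def open_for_read(self, h):
        if h in self.shadows:
            return self.shadows[h]
        ver, data = self.stm.blocks[h]
        self.read_vers.setdefault(h, ver)
        return data           # caller must not mutate a read-only block

    def open_for_write(self, h):
        if h not in self.shadows:
            ver, data = self.stm.blocks[h]
            self.read_vers.setdefault(h, ver)   # keep earliest version seen
            self.shadows[h] = bytearray(data)   # shadow copy
        return self.shadows[h]

    def commit(self):
        # Validate: every block we opened is still at the version we saw.
        for h, ver in self.read_vers.items():
            if self.stm.blocks[h][0] != ver:
                return False   # conflict: caller retries the transaction
        for h, shadow in self.shadows.items():
            ver, _ = self.stm.blocks[h]
            self.stm.blocks[h] = (ver + 1, shadow)
        return True
```

This sketch validates at commit time only; the real FSTM's contribution (lock-free publication via CAS and recursive helping) is exactly the part elided here.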
<dh`> oh right and anything multiversion has to be done very carefully in other ways to stay performant
<muurkha> what do you mean?
<dh`> copying memory is expensive
<muurkha> yes
<dh`> extra indirections are also expensive in cache misses
<muurkha> yes
<dh`> and multiversion means buying into a lot of both
<muurkha> at least for mutable data, yeah
littlebobeep has joined #riscv
<muurkha> Fraser's FSTM doesn't do the extra indirections
<dh`> especially with pointer-based data structures like you have in low-level code that needs to be performant
<muurkha> I still don't understand how it works
<muurkha> I think the answer is that he magically solves the problem where you want to write something that somebody else is reading with "recursive helping"
<muurkha> I've looked at the implementation and I still don't understand it, but I haven't, like, stepped through it in GDB yet
<muurkha> but in his dissertation he claimed that it gave new speed records for lock-free variants of several important data structures
<muurkha> red-black trees, skip lists, I forget what else
<dh`> with or without parallel scaling?
<muurkha> he was totally focused on parallel scaling, which isn't what I'm interested in
<muurkha> I'm interested in guaranteed responsiveness
<muurkha> on a uniprocessor!
<dh`> if you go to like 256-way it's not that hard to beat traditional schemes because the cache lines all melt down
<muurkha> heh
<muurkha> I think probably at those scales RCU will beat anything CAS-based
<dh`> which is more because spinlocks on top of cache consistency protocols is fundamentally stupid than because there's anything wrong with locking per se
<muurkha> (Fraser's FSTM is made out of CASes)
<muurkha> but Fraser was writing this stuff in 02004 when RCU was pretty new
<dh`> I also don't care about being officially lock-free, people use that as a purity totem
<muurkha> heh
<muurkha> well, for throughput and scaling, it *is* just a purity totem
<muurkha> but what I'm interested in is guaranteed responsiveness, where it very much is not
<muurkha> WCET, baby
<dh`> well, sorta
<dh`> liveness is a system-level property
<dh`> it is not necessarily true that the best way to pursue it is by building the system entirely out of subunits that themselves exhibit guaranteed liveness
<dh`> the obvious example being a distributed system that remains live even though half its components are failing all over the place
<conchuod> muurkha: ye ultimately my CI runs every week or so anyway, but having it run on push would be nice anyway
<dh`> it is difficult to rule out the existence of deadlocks in a complex system built out of locks
<muurkha> conchuod: makes sense, so it's more a handy optimization
<dh`> it's considerably more difficult though to rule out the existence of livelock states or weird dynamic behavior under load
<muurkha> dh`: yeah, I don't need everything in the system to be lock-free, just everything in the paths that have to be responsive
<dh`> and using lock-free structures doesn't help you at all with any of that
<muurkha> I would like it because I think it will make thing simpler
<dh`> (arguably it makes it worse)
<muurkha> hmm, I think a system that suffers livelock doesn't qualify as "lock-free"
<conchuod> A manual trigger requires being on the corporate network, so on push would be handy ;)
<dh`> perhaps not
<muurkha> weird dynamic behavior seems unavoidable in any case
<dh`> but you definitely can't rule out livelock states just by using lock-free subcomponents
<muurkha> agreed!
<dh`> if you specifically care about response latency, the reason the traditional approach is to make everything simple and dumb and slow is that it becomes deterministic that way, and that's more important
<muurkha> well, I don't want *everything* to be simple and dumb and possibly slow, I want the things that need to be responsive to be that way
<muurkha> I want them to be able to coexist with things that aren't simple and dumb without those other things potentially blocking them
<dh`> lock-free structures are not simple :-)
<muurkha> their *implementation* isn't
<muurkha> but an STM is very simple indeed to *use*
<dh`> yes, but it's the implementation that matters for latency
<dh`> hence stuff like disabling ram caches
<dh`> those are extremely simple to use!
<muurkha> I have good luck there
<muurkha> my processor doesn't have one in the first place!
<dh`> anyway I should get back to work
<muurkha> thanks for the chat!
<dh`> yw :-)
adjtm has quit [Read error: Connection reset by peer]
adjtm_ has joined #riscv
mahmutov has quit [Ping timeout: 240 seconds]
littlebobeep has quit [Ping timeout: 240 seconds]
adjtm_ has quit [Read error: Connection reset by peer]
adjtm_ has joined #riscv
jjido has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
haise01 has quit [Ping timeout: 255 seconds]
aerkiaga has quit [Remote host closed the connection]
pecastro has quit [Ping timeout: 246 seconds]