sorear changed the topic of #riscv to: RISC-V instruction set architecture | https://riscv.org | Logs: https://libera.irclog.whitequark.org/riscv
<Esmil> conchuod: i'm not fussed, but isn't the procedure for dt patches normally to wait for Rob's ack?
<conchuod> For dts? No
<conchuod> krzysztof will often look at them, Rob rarely does
<muurkha> is my explanation of trap handling in https://news.ycombinator.com/item?id=33301376 correct?
<muurkha> I haven't actually implemented RISC-V trap handling so I know I'm kind of treading on thin ice there
<muurkha> and unfortunately the RISC-V Privileged Architecture manual is not nearly as nicely organized as the user-mode ISA manual; there's no section that says "Here's how trap handling works" or even "the registers affected by the ECALL instruction are the following"
<dh`> the conventional way to do it is to leave the kernel stack address in sscratch while running in userland, and null when in the kernel
<dh`> then on trap entry you exchange sp with sscratch; if it's null, exchange back; then push all the registers on the stack, and then you can do real work
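A minimal sketch of the entry sequence dh` describes, assuming RV64 S-mode and GNU C top-level asm; the 34-slot frame layout, the label names, and the trap_dispatch entry point are illustrative assumptions, not something from the discussion:

```c
/* invariant: sscratch = kernel stack top while in U-mode, 0 while in S-mode */
void trap_dispatch(void *tf);            /* hypothetical C-side handler */

__asm__(
    ".align 2\n"
    "trap_entry:\n"
    "  csrrw  sp, sscratch, sp\n"        /* atomically swap sp <-> sscratch      */
    "  bnez   sp, 1f\n"                  /* sp != 0: we trapped from U-mode      */
    "  csrrw  sp, sscratch, sp\n"        /* sp == 0: from S-mode, swap back      */
    "1:\n"
    "  addi   sp, sp, -272\n"            /* 34 slots of 8 bytes for a trap frame */
    "  sd     ra, 1*8(sp)\n"             /* save the GPRs; x0 needs no slot      */
    "  sd     t0, 5*8(sp)\n"
    /* ... the remaining registers are stored the same way ... */
    "  csrr   t0, sscratch\n"            /* pre-trap sp if we came from U-mode   */
    "  sd     t0, 2*8(sp)\n"             /* (from S-mode it is sp+272 instead)   */
    "  csrw   sscratch, zero\n"          /* in the kernel now: null sscratch;
                                            interrupts must stay off until here  */
    "  csrr   t0, sepc\n"
    "  sd     t0, 0*8(sp)\n"
    "  mv     a0, sp\n"
    "  call   trap_dispatch\n"
);
```

csrrw reads the old CSR value and writes the new one in a single instruction, which is why the swap needs no third scratch register.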
<muurkha> oh right because sscratch is preserved between trap handler invocations
<muurkha> how do you exchange sp without using a third register as scratch space?
<dh`> taking a trap disables interrupts and there's an assumption in most environments that it's ok for faults in the trap handler to be nastily lethal
<dh`> that's why the control register access instructions are so complex
<muurkha> hmm, sounds like I need to review Zicsr
<muurkha> to whom should I credit this explanation? dh`, dholland, David Holland, David A. Holland, or "a helpful person"?
<dh`> it is not much headroom and it's not really an optimal arrangement
<dh`> credit jrtc27
<muurkha> I was thinking that sometimes you want recursive interrupt handling
<muurkha> haha
<dh`> yeah, but you don't turn interrupts back on until you've finished saving state
<muurkha> right
<muurkha> but she has been silent! is she using your terminal?
<dh`> otherwise it's unrecoverable
<dh`> shyeah
<dh`> no, she explained that technique to me a couple years ago after I failed to figure it out for myself
<muurkha> it seems like it would be adequate to save sscratch, sepc, scause, stval, and sip?
<muurkha> well, if you don't mind, I'll explain, "dh` explained the following technique to me, as explained to him by jrtc27"
<dh`> if you really want to
<dh`> anyway, no, you have to save the entire register file on interrupt or you lose the prior machine state
<muurkha> well, you do have to save and restore the entire register file eventually, or at least whatever registers you're going to clobber
<dh`> well, in theory you can avoid saving the callee-save registers because your own code should preserve them, but nobody does that
<muurkha> but what I meant was that I think you can re-enable interrupts before that
<muurkha> because if an interrupt comes in before you've saved registers like x8, that interrupt handler is responsible for saving and restoring x8, and then your own handler resumes execution and saves them
<dh`> ah
<dh`> possibly, but you can't save the control registers without first saving at least one general-purpose register
<dh`> and it's probably better to save a bunch of general-purpose registers and then pull the control registers all at once, unless you know for certain that you're only executing on an in-order cpu
<dh`> also if you are relying on sscratch being null to know when you came from the kernel, you can't turn on interrupts until you null sscratch
<dh`> anyway in general in a kernel you want to examine the trap code before you do much of anything else, and it's much nicer to do that in C code, so the normal thing is to just push everything and call into C code, which then figures out what to do.
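The C side dh` is describing might look roughly like this; the frame layout matches the asm sketch above and the handler names are made up, but the scause facts (interrupt flag in the top bit, ECALL from U-mode = cause 8) are per the privileged spec:

```c
#include <stdint.h>

struct trap_frame {
    uint64_t sepc;   /* slot 0: pre-trap pc                    */
    uint64_t ra;     /* slot 1                                 */
    uint64_t sp;     /* slot 2: pre-trap sp (valid for U-mode) */
    /* ... remaining GPRs ... */
};

#define SCAUSE_INT   (1ULL << 63)        /* set for asynchronous traps */
#define CAUSE_UECALL 8                   /* ECALL from U-mode          */

/* hypothetical handlers, defined elsewhere */
void handle_irq(uint64_t irq, struct trap_frame *tf);
void handle_syscall(struct trap_frame *tf);
void handle_fault(uint64_t cause, struct trap_frame *tf);

static inline uint64_t read_scause(void)
{
    uint64_t x;
    __asm__ volatile("csrr %0, scause" : "=r"(x));
    return x;
}

/* "push everything and call into C code, which then figures out what to do" */
void trap_dispatch(struct trap_frame *tf)
{
    uint64_t cause = read_scause();

    if (cause & SCAUSE_INT)
        handle_irq(cause & ~SCAUSE_INT, tf);
    else if (cause == CAUSE_UECALL)
        handle_syscall(tf);
    else
        handle_fault(cause, tf);     /* page fault, illegal instruction, ... */
}
```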
<dh`> for small embedded device firmware there are different considerations (which is I assume how arm trap handling came to be so weird)
<muurkha> yeah, I was thinking of small embedded devices where, as I understand it, minimizing worst-case interrupt latency is sometimes a key consideration
<dh`> yes
<dh`> in that case everything is different
<muurkha> does swapping x2 and sscratch back if sscratch was null imply that you're saving the registers on the user's stack?
<dh`> no, the idea is to always dump the trapframe on the kernel stack
<dh`> in order to do that, you need to (a) know if you came from userland or not and (b) if you did, locate the kernel stack
<muurkha> that seems more desirable yeah
<dh`> on a multi-cpu machine you need to be able to find the kernel stack without any easy means of knowing which cpu you are
<dh`> and without any headroom to speak of to work too
<dh`> hence sscratch
<muurkha> why do you need to know if you came from userland or not? because if you came from userland the kernel stack pointer is in memory, while if you came from kernel-land it's in x2 (and may not be updated in memory)?
<dh`> if you came from userland, you need to fetch the starting kernel stack from somewhere and then dump registers into it
<dh`> if not, sp already has the stack you want to use
<dh`> and it's incorrect to reload it from the top because that destroys the existing state
<dh`> so yeah
<dh`> on machines with adverse layouts there are ways to cope but it gets expensive
<muurkha> hmm, I forget how you find out what hart you're running on
<dh`> not sure you do, particularly in s-mode
<dh`> tends to be a mainboard function rather than a cpu one
<dh`> also even when there's a way it tends to not be suitable for trap handling
<muurkha> but I guess on a single-hart machine you would overwrite the 0 you swapped from sscratch into x2 with a constant address for the stack pointer (either lui;addi or auipc;addi) and then a load to get the value of the stack pointer from memory
<dh`> I have not really thought about the specific way you might do it on a uniprocessor riscv
<muurkha> so how do you cope on machines with adverse layouts?
<dh`> you still need to know if you came from userland and there isn't enough headroom to figure that by looking at the status register, so you're still using sscratch to indicate
<dh`> at which point putting anything other than the kernel stack address in there is pretty silly
<dh`> typically you make a per-cpu memory mapping and spill enough there to make room to work
<muurkha> aah, I see
<dh`> on arm you don't need the per-cpu mapping but you pretty much do need a per-cpu spill area
<muurkha> yeah, I don't know enough about arm to know where you might stash its address there
<dh`> arm trap handling is very weird
<dh`> not recommended.
<muurkha> heh
<dh`> there's a different cpu mode for each trap type
<muurkha> well, I'll probably have to be doing it pretty soon, but fortunately on a uniprocessor
<dh`> and some of the registers are bank-switched depending on that mode
<muurkha> and it's a Cortex-M, so I'm not sure if it even has different CPU modes
<muurkha> yeah, bank-switching registers is a nice cheap way to do trap handling
<muurkha> I think the Z80 was the first time I saw that
<dh`> it does if it has traps, which it must
<dh`> it is not nice the way arm did it, and not cheap if you swap the whole register file
<muurkha> yeah, how do you save the registers that are bank-switched out if you decide you need to do a context switch on return?
<muurkha> you need some backdoor
<dh`> basically there's a different return address register for each mode and that's where the pre-trap program counter lands, so you have to save that before switching out of trap mode
<dh`> depending on the version there may be ways to access the other modes' registers
<dh`> I have no idea what model they had in mind when designing it; it isn't consistent with any normal kernel proceedings
<muurkha> if you control the whole system you could reserve one of the ordinary registers (x3, say) to get a little more headroom for trap handling
<dh`> most processors you can tell if you read the docs enough that there's a way you're supposed to do things
<muurkha> (on RISC-V, I mean)
<dh`> mips does that
<dh`> arm has only 16 registers and they burn one slot by mapping the PC
<dh`> so that's not a realistic option there
<muurkha> yeah
<muurkha> it's also not useful for conventional OSes, which load arbitrary machine code and run it in user mode, relying on the CPU to protect the kernel from user-mode code
<dh`> it is if it's part of the ABI
<dh`> also, if you're an OS you at least notionally control the compiler and can rearrange its register handling
<muurkha> well, one question is using x3 as an extra scratch register in trap handlers
<muurkha> which means user-mode code isn't entitled to assume its value is ever preserved
<muurkha> something different is your interrupt handlers relying on your background-execution code to never clobber x3, even temporarily, which the ABI does sort of strongly suggest
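For concreteness, the reserved-x3 idea can be spelled with the GNU C global-register-variable extension; struct cpu and curcpu are hypothetical names, and every object file in the system would have to be built with the register fixed (e.g. clang's -ffixed-x3; this is a sketch, not an endorsement of the scheme):

```c
/* pin x3 (the gp register) as a kernel-owned per-CPU pointer; this gives
 * up gp-relative linker relaxation, which kernels often disable anyway.
 * struct cpu and curcpu are illustrative names. */
struct cpu;                                 /* per-CPU state, defined elsewhere  */
register struct cpu *curcpu __asm__("gp");  /* compiler never allocates x3 again */

static inline struct cpu *mycpu(void)       /* no memory access, so trap-safe */
{
    return curcpu;
}
```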
<muurkha> I think that at the point you depend on the compiler for security you're out of "conventional OS" territory and you don't really need separate user and kernel mode at all
<dh`> clobbering user registers isn't a security issue
<muurkha> which I think is a very interesting and under-explored design space!
<dh`> unless you leak kernel information
<muurkha> right, but "user" code clobbering "kernel" registers is
<dh`> sure
<dh`> using x3 as a _scratch_ register in trap handling so it's clobbered by every trap is one thing
<muurkha> yeah. but, like, storing the "kernel" stack pointer in x3
<dh`> relying on user code not to change it won't do
<dh`> I used to think single-user systems that don't need protection were an interesting design space
<dh`> then browsers happened
<geist> it's a choice riscv made (though i think this stuff follows pretty closely what MIPS was already doing). there are no banked registers, especially banked SP, that some other arches do to solve this problem
<dh`> it is not the same as mips
<geist> i do wish riscv had just a few more exception entry points, like separate ones for traps from U mode vs S mode, which would also solve it
<dh`> yeah
<geist> the sscratch trickery to figure out which mode you were from is pretty grody
<dh`> every extra config register is more hypervisor overhead and I'm sure that's the reason
<geist> yah and it's tuned for allowing there to be *one* entry vector for everything
<geist> which is a noble goal, but a little unwieldy. but so it goes, only a handful of people really ever need to care about this :)
<dh`> I've ranted at length before about how the riscv privileged stuff is all wrong
<muurkha> dh`: I'm not talking about systems that don't need protection; I'm talking about systems that implement the protection in a compiler rather than in hardware
<muurkha> oh really? where can I read your rant?
<dh`> you've probably seen most of it here
<dh`> "implement the protection in a compiler" -> "has no protection" :-)
<muurkha> you mean, just now?
<geist> iirc, microblaze has a similar model but i think the hardware simply blows away the 'k0' register. ie, r15 or whatnot is simply overwritten by exceptions, so it's hardware-forced that user code had better not touch that reg
<dh`> no, I mean over the last couple years
<muurkha> that's a bummer, because I haven't kept logs :(
<geist> but i think it's cause they're even simpler than riscv. it puts what would be in the epc or whatnot in a GP reg
<muurkha> hmm, I think you can implement working protection in a compiler, at least for some architectures (not for Intel's amd64, obviously)
<dh`> for any architecture
<dh`> why not amd64?
<geist> but the nice thing is you already have a GP reg with the infos you need in it to do whatever logic at the top of your exception
<dh`> the problem is that compilers are too complex to be correct so working doesn't imply secure
<muurkha> it would be feasible for amd64 in the abstract, but there have been a lot of timing leaks in recent years due to speculative execution
<dh`> unless you redo compcert
<geist> another problem is that kernels are, in general, user-runtime agnostic. user space may agree to play along and not trash a reg, but a kernel cannot and shall not ever trust that user space follows the rules, or runs this or that runtime
<muurkha> and the mitigations for spectre make context switches a lot more expensive
<dh`> oh, in that sense you're out of luck, you will never avoid timing leaks with a compiler
<dh`> or maybe you will but that's a 5-10 year research program in secure compilation
<geist> ie, kernels shall always treat user code as hostile
<muurkha> I think it's feasible to avoid timing leaks with a compiler, but not the kind of compilers you're thinking about ;)
<muurkha> and only on hardware whose timing is pretty well nailed down
<dh`> there is no such hardware you'd want to run a general-purpose os on
<dh`> geist: it's reasonable to have a world where only binaries from the system compiler are allowed, and there are various possible ways to enforce that
<muurkha> well, it certainly excludes anything like Linux, yeah
<geist> one could build some sort of tightly coupled appliance etc where you do that
<dh`> pff, you could do it with unix
<geist> OTOH those sorts of systems are also in general not as susceptible to meltdown/spectre stuff because they aren't running arbitrary code
<muurkha> you could do some kind of Unix
<dh`> (require that all binaries come from the system compiler)
<geist> yeah i mean you *could* do that. i just wouldn't want to use it
<muurkha> you could do some kind of Unix while providing protection between processes with only compilation
<dh`> without a lot of work it would basically require making the compiler setugid and there would be a million leaks, but it's at least possible in principle
<geist> but yeah, totally
<muurkha> what I'm thinking is a very simple compiler which compiles from a pretty low-level abstract machine to the physical instruction set
<muurkha> so your compiler for C or Lua or whatever would target the abstract machine, and wouldn't be in the TCB
<dh`> it is hard to have a safe abstract machine that you can target with C
<dh`> oh also someone should have mentioned CHERI by now
<muurkha> well, webassembly kind of did it
<muurkha> yeah, I've been thinking about CHERI a lot
<dh`> I don't think you can write a fully compliant C compiler that targets webassembly
<muurkha> why not?
<dh`> because there are things you're allowed to do in C that are inherently unsafe
<muurkha> being compliant C doesn't mean that those unsafe things do anything useful though
<dh`> but in particular, you can inspect the representation of any data element
<muurkha> webassembly puts all your data elements in a "linear memory" made of bytes
<muurkha> so a single "module" (think "Unix binary") is running on top of a large byte array
<dh`> actually, I take it back. it's absolutely possible, it's just likely prohibitively expensive
<muurkha> current implementations of C on Webassembly are reasonably efficient though
<dh`> they probably aren't fully compliant
<dh`> and that's probably fine, too, but there will be things that don't work
<muurkha> and they don't keep you from, for example, reading an int as 4 or 8 bytes
<muurkha> I admit I haven't looked at how llvm handles tricky cases, but it certainly intends to be ANSI-compliant
<dh`> what happens if you send a pointer through a pipe?
<dh`> you can make pointers be indexes into the linear memory and mask them at every use
<dh`> but that gets really expensive
<dh`> maybe that expense no longer matters; mostly performance no longer matters
<muurkha> yes, pointers are indexes into the linear memory
<muurkha> it's not usually that expensive because, though that implies a base-address addition and bounds check on every pointer dereference, you can usually hoist those out of inner loops
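Roughly what that lowering amounts to, as a sketch: a power-of-two linear memory with masking (real engines differ, and 64-bit ones often reserve guard pages instead of masking at all):

```c
#include <stdint.h>
#include <string.h>

static uint8_t *linear_mem;           /* base of the module's linear memory    */
#define MEM_MASK ((1u << 26) - 1)     /* illustrative 64 MiB power-of-two arena */

/* a wasm "pointer" is just an index; each access pays a base add plus a
 * mask (or bounds check) unless the compiler can hoist it out of a loop */
static inline uint32_t load_u32(uint32_t ptr)
{
    uint32_t v;
    memcpy(&v, linear_mem + (ptr & MEM_MASK), sizeof v);
    return v;
}
```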
<dh`> more like "sometimes", loop analysis is Hard
<muurkha> but wasm also has another big advantage, that its virtual machine has local variables that aren't stored in memory
<dh`> and one of the problems with compiling is that the language gives enough latitude that you have to be very conservative
<muurkha> and, as I understand it, you can compile scalar C local variables into wasm local variables unless you take their addresses
<dh`> that's basically the same as registerization
<muurkha> yes
<dh`> just you get 256 or 65536 or whatever registers
<muurkha> right, and you don't have to save and restore them on call and return
<dh`> not explicitly at least
<muurkha> right, if the wasm compiler allocates registers to them, it has to save and restore them!
<muurkha> but it might just leave them in memory and index off a frame pointer
<dh`> er. s/one of the problems with compiling/one of the problems with compiling C/ above
<dh`> anyway
<muurkha> yeah, that's what I thought you meant :)
<muurkha> anyway I think the main design goal for wasm was for llvm to be able to compile any compliant C++ to it and get reasonable performance
<muurkha> so for example three years ago https://www.usenix.org/system/files/atc19-jangda.pdf found slowdowns of 45% for Firefox and 55% for Chrome for the SPEC CPU benchmarks
<dh`> on this topic, have a look at the rocksalt paper if you haven't yet
<muurkha> rocksalt paper?
<dh`> from... PLDI 2012
<dh`> verified runtime for google native client
<muurkha> aha, thanks!
<muurkha> yeah, native client is where wasm kind of came from
<dh`> right
<muurkha> I think the typical wasm slowdown is lower now than it was three years ago
<dh`> 50% is a lot even today
<muurkha> and I don't think that's based on breaking ANSI C or C++ compliance
<muurkha> I don't know, is it?
<dh`> I would describe 50% as "not performant" but then, I come from an era where we cared about 5%
<dh`> which people definitely do not today
<muurkha> people typically accept 2000% slowdowns by writing single-threaded C or 100% slowdowns by running their applications on the CPU instead of the GPU
<dh`> that is not the same
<muurkha> sometimes on top of an additional 500% slowdown by writing their code in JS or 4000% by writing it in Python
<muurkha> isn't it?
<dh`> python is, not spending six months to parallelize something is a bit different
<muurkha> I guess there probably aren't a lot of situations where compiling your code to wasm helps you to parallelize it across cores. more the opposite actually
<dh`> I mean, in some sense it's a continuum, people choose to write in Python because they think writing in a more performant environment will take much more work
<dh`> (and sometimes they're right)
<dh`> wasm is not concurrent at all iirc
<muurkha> but there might be other abstract machines implemented in a wasm-like fashion that *would* help you parallelize across cores
<dh`> unlikely
<dh`> parallelization is hard
<muurkha> well, in particular, if you can enforce STM with a JIT-compiling virtual machine inside your TCB, there are a substantial number of programs that would become easier to parallelize than using threads-and-locks approaches
<dh`> no?
<muurkha> and I feel that a JIT compiler from a simple virtual machine is a much more appealing place to implement transactional memory than in hardware
<dh`> I mean, apart from STM itself in general not being performant
<muurkha> Keir Fraser's dissertation seemed to show some pretty decent performance numbers for his STM implementation
<muurkha> faster than the non-STM state of the art in many cases, in fact
<dh`> I haven't looked in a long time but it used to be that STM slowdowns were measured in orders of magnitude or at least N times slower, not percentages
<muurkha> this was before the STM work you were doing, I think, so maybe I'm just wrong?
<dh`> idk
<dh`> I am a long way from up to date on that topic
<dh`> and the work I was doing was very non-mainstream
<muurkha> pp. 91-100
<dh`> yeah, looks like 3-4x slowdown for STM
<muurkha> hmm, yeah, it does
<muurkha> it was his MCAS implementations that were actually faster than the SOTA, not his STM implementations
<dh`> which is not surprising
<muurkha> well, it depends on the task
<dh`> the scaling result in figure 6.3a is interesting but it's also a stupid task
<muurkha> for skip lists (figure 6.1) his FSTM was about 2.5-4 times slower than CAS-based, MCAS-based, or per-node locks
<muurkha> well, let's say about 2.5× slower. it was only Herlihy STM that was 4×
<dh`> anyway, if you are choosing platforms and aggressive scaling is a consideration, you want to be writing in a channel-based language
<muurkha> for binary search trees (figure 6.2) he didn't do an STM implementation, and for 6.3 (red-black trees) he got much better performance with STM than with locks, but didn't try an MCAS-based implementation
<dh`> and forget about locking shared memory
<dh`> red-black trees are pretty much optimized for contention, you can't really conclude much from that
<muurkha> heh
<muurkha> well, his STM seems to have handled the contention better than his lock-based implementation, but I agree that that's not super compelling
<dh`> cache contention is absolutely awful so anything that doesn't do that will win, no matter how slow it is
<muurkha> in absolute terms it doesn't look too bad though
<muurkha> he was getting like 12 microseconds per update or read from a large skip list on 1.2 GHz UltraSPARCs, up to 4 processors or so, so maybe 14000 clock cycles
<dh`> also all of that is obsoleted by rcu-style techniques where reads are free
<dh`> if you only contend for writes, in a _large_ skip list or tree you'll ~never contend
<dh`> at least if accesses are reasonably distributed around the tree
<muurkha> yeah, RCU is a pretty big win there, and it doesn't have the impedance mismatches with things like I/O and locks that transactions do
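The pattern dh` is pointing at, sketched with the Linux RCU API names (rcu_read_lock(), rcu_assign_pointer(), synchronize_rcu() are the real kernel API; struct foo and the update-side locking are illustrative):

```c
#include <linux/rcupdate.h>
#include <linux/slab.h>

struct foo { int val; };
static struct foo __rcu *global_foo;

int read_val(void)                        /* readers pay ~nothing: no atomics,
                                             no shared cache-line writes */
{
    int v;
    rcu_read_lock();
    v = rcu_dereference(global_foo)->val;
    rcu_read_unlock();
    return v;
}

void update_val(int newval)               /* caller holds the update-side lock */
{
    struct foo *nf  = kmalloc(sizeof(*nf), GFP_KERNEL);
    struct foo *old = rcu_dereference_protected(global_foo, 1);

    if (!nf)
        return;                           /* error handling elided */
    nf->val = newval;
    rcu_assign_pointer(global_foo, nf);   /* publish the new version */
    synchronize_rcu();                    /* wait out pre-existing readers */
    kfree(old);
}
```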
<muurkha> but I don't have experience programming with RCU and I have the impression that it's extremely hairy?
<dh`> neither do I, because patents are evil, but yes and no
<dh`> we have been using a related technique in netbsd and the interface is not much different from a lock
<muurkha> that sounds extremely damning ;)
<dh`> but that's not multiversion
<dh`> anything multiversion gets horribly messy unless you have language-level support
<dh`> and if you do, it tends to get expensive because part of what you trade off for not having to see the gears turning is control over allocation
<muurkha> yeah, I don't really understand how you're supposed to do allocation in Fraser's FSTM
<dh`> my opinion for the moment is that if you have something that really demands massive parallelism, you should write it using messages so you can run it on a shared-nothing cluster supercomputer
<dh`> and if you don't, it's not worth worrying about, and more than 32-64 cores on the desktop is a solution in search of a problem
<dh`> and also, there are very few real user workloads that can actually make use of more than a handful of cores
<muurkha> 64 cores is already kind of a lot though
<dh`> and most of _those_ are limited by RAM rather than the core count
<dh`> like make -j
<muurkha> and maybe the reason real user workloads can't actually make use of more than a handful of cores is that they're written in paradigms where parallelizing things takes six months and requires dealing with interfaces that are not much different from a lock :)
<dh`> <dh`> anyway, if you are choosing platforms and aggressive scaling is a consideration, you want to be writing in a channel-based language
<dh`> writing a kernel in a channel-based language is one of those things that's been on my "sometime" project list for years now
<dh`> (held up partly by the lack of a suitable language implementation)
<muurkha> it sounds like fun
<muurkha> you don't like golang or erlang?
<dh`> anyway we have drifted way, way offtopic
<dh`> (though I enjoy these kinds of discussions)
<dh`> go is not suitable for kernels (has a nontrivial runtime and a GC)
<dh`> and erlang has ... issues
<muurkha> true
<muurkha> erlang also has a nontrivial runtime and a GC
<muurkha> and unlike Golang you can't avoid generating garbage all the time
<dh`> for me it fails before that on being dynamically typed
<muurkha> yeah, that's not really my favorite attribute for kernels ;)
<dh`> it should be possible to avoid needing heap allocation at all (except for threads themselves) in a channel language
<dh`> without most of the contortions this requires in an ordinary language
<muurkha> yeah, Erlang is the opposite extreme from that
<muurkha> because everything is immutable and there is no linearity in the type system (because it doesn't have a nontrivial static type system)
<dh`> because anytime you need to have a persistent thing, you just make it a thread
<muurkha> right
<dh`> every vnode is its own thread
<dh`> some real questions about how the kernel provides threads that are written in this, but I'm sure it can be done
<dh`> however, the last thing I need is another kernel project...
<muurkha> heh
<muurkha> basically the extremist Actors approach
<muurkha> if there's *no* allocation you can just point your frame pointer at the vnode or whatever; the point at which providing threads becomes complicated is when you need to dynamically allocate stacks for them
<dh`> right, but if _all_ you're allocating is threads that's a considerable simplification
<dh`> of course, it probably won't work, since there are such things as strings
<muurkha> Hewitt doesn't shrink from creating an Actor for each string, and if their length is immutable string-append can allocate a new string, but at that point you start really wishing for a global GC
<muurkha> going back a couple of stack levels, I think there's probably a space of systems where you can get high levels of concurrency pretty easily with memory transactions, and where the cost of the transaction mechanism won't be the system performance bottleneck. but that's just an intuition, not a running system
<muurkha> and transactions potentially give you a principled, safe way to handle failures other than concurrent update conflicts, but I'm probably preaching to the choir here on that count :)
<muurkha> heap allocation in a kernel seems like much less of a non-starter if you have a way to handle failure that doesn't involve either panicking the kernel or blindly forging ahead
<muurkha> Erlang also has that!
<dh`> that is really a matter of engineering; there are very few places in a kernel where failures are really unrecoverable, it's just that it can be a giant pain
<dh`> (and re failure and transactions, yes)
<muurkha> yeah
<tusko> What companies are worth a look in the risc-v space, particularly in processor design?