#riscv on 2021-09-10 — irc logs at libera.irclog.whitequark.org

2021-08-01 01:31 sorear changed the topic of #riscv to: RISC-V instruction set architecture | https://riscv.org | Logs: https://libera.irclog.whitequark.org/riscv

00:00 hendursaga has quit [Quit: hendursaga]

00:01 hendursaga has joined #riscv

00:15 riff-IRC has quit [Read error: Connection reset by peer]

00:18 riff-IRC has joined #riscv

00:43 frost has joined #riscv

00:46 jwillikers has quit [Remote host closed the connection]

01:12 ___nick___ has quit [Ping timeout: 265 seconds]

01:19 jonasbits has quit [Ping timeout: 240 seconds]

01:35 tgamblin has quit [Remote host closed the connection]

01:36 tgamblin has joined #riscv

01:41 tgamblin has quit [Remote host closed the connection]

01:47 tgamblin has joined #riscv

01:48 tgamblin has quit [Remote host closed the connection]

01:50 tgamblin has joined #riscv

01:51 tgamblin has quit [Remote host closed the connection]

02:01 tgamblin has joined #riscv

02:04 tgamblin has quit [Remote host closed the connection]

02:05 jacklsw has joined #riscv

02:06 tgamblin has joined #riscv

02:07 tgamblin has quit [Remote host closed the connection]

02:10 tgamblin has joined #riscv

02:11 tgamblin has quit [Remote host closed the connection]

02:13 tgamblin has joined #riscv

02:15 tgamblin has quit [Remote host closed the connection]

02:19 tgamblin has joined #riscv

02:21 tgamblin has quit [Remote host closed the connection]

02:25 tgamblin has joined #riscv

02:27 tgamblin has quit [Remote host closed the connection]

02:29 tgamblin has joined #riscv

02:30 tgamblin has quit [Remote host closed the connection]

02:38 tgamblin has joined #riscv

02:39 tgamblin has quit [Remote host closed the connection]

02:46 tgamblin has joined #riscv

02:47 tgamblin has quit [Remote host closed the connection]

02:50 tgamblin has joined #riscv

02:51 tgamblin has quit [Remote host closed the connection]

02:56 tgamblin has joined #riscv

03:22 TMM_ has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]

03:22 TMM_ has joined #riscv

03:23 tgamblin has quit [Remote host closed the connection]

03:25 tgamblin has joined #riscv

03:26 tgamblin has quit [Remote host closed the connection]

03:29 tgamblin has joined #riscv

03:30 tgamblin has quit [Remote host closed the connection]

03:34 tgamblin has joined #riscv

03:37 tgamblin has quit [Remote host closed the connection]

03:49 tgamblin has joined #riscv

03:51 tgamblin has quit [Remote host closed the connection]

03:53 tgamblin has joined #riscv

03:54 tgamblin has quit [Remote host closed the connection]

04:00 tgamblin has joined #riscv

04:02 tgamblin has quit [Remote host closed the connection]

04:05 jamtorus has joined #riscv

04:07 jellydonut has quit [Ping timeout: 252 seconds]

04:09 compscipunk has quit [Quit: WeeChat 3.2.1]

04:11 tgamblin has joined #riscv

04:13 tgamblin has quit [Remote host closed the connection]

04:15 mahmutov has quit [Ping timeout: 252 seconds]

04:15 tgamblin has joined #riscv

04:16 tgamblin has quit [Remote host closed the connection]

04:18 bgamari has quit [Ping timeout: 265 seconds]

04:21 tgamblin has joined #riscv

04:22 tgamblin has quit [Remote host closed the connection]

04:23 freakazoid343 has joined #riscv

04:25 tgamblin has joined #riscv

04:26 freakazoid333 has quit [Ping timeout: 245 seconds]

04:26 tgamblin has quit [Remote host closed the connection]

04:28 tgamblin has joined #riscv

04:30 tgamblin has quit [Remote host closed the connection]

04:32 tgamblin has joined #riscv

04:33 bgamari has joined #riscv

04:33 tgamblin has quit [Remote host closed the connection]

04:38 tgamblin has joined #riscv

04:39 tgamblin has quit [Remote host closed the connection]

04:42 tgamblin has joined #riscv

04:50 BOKALDO has joined #riscv

05:00 riff-IRC has quit [Remote host closed the connection]

05:01 riff-IRC has joined #riscv

05:01 tgamblin has quit [Remote host closed the connection]

05:05 tgamblin has joined #riscv

05:06 tgamblin has quit [Remote host closed the connection]

05:09 tgamblin has joined #riscv

05:10 tgamblin has quit [Remote host closed the connection]

05:10 <kaddkaka[m]> Which of the tune targets does u74 correspond to? `‘rocket’, ‘sifive-3-series’, ‘sifive-5-series’, ‘sifive-7-series’, ‘size’`

05:14 jonasbits has joined #riscv

05:14 <pierce> Afaik 7 series

05:14 <pierce> U74

05:19 tgamblin has joined #riscv

05:21 tgamblin has quit [Remote host closed the connection]

05:22 Doraemon has joined #riscv

05:24 NeoCron has quit [Ping timeout: 260 seconds]

05:26 tgamblin has joined #riscv

05:27 tgamblin has quit [Remote host closed the connection]

05:32 tgamblin has joined #riscv

05:33 tgamblin has quit [Remote host closed the connection]

05:35 tgamblin has joined #riscv

05:36 tgamblin has quit [Remote host closed the connection]

05:37 <kaddkaka[m]> Thanks, U7 is dual-issue in-order, U5 and E3 are both single-issue in-order. SiFive has a lot of cores (https://www.sifive.com/documentation), so I guess many either don't have a dedicated tune target or share one with a similar enough processor implementation

05:39 tgamblin has joined #riscv

05:40 tgamblin has quit [Remote host closed the connection]

05:46 tgamblin has joined #riscv

05:47 tgamblin has quit [Remote host closed the connection]

05:50 tgamblin has joined #riscv

05:51 tgamblin has quit [Remote host closed the connection]

05:54 tgamblin has joined #riscv

05:55 tgamblin has quit [Remote host closed the connection]

05:58 tgamblin has joined #riscv

06:00 tgamblin has quit [Remote host closed the connection]

06:02 tgamblin has joined #riscv

06:08 tgamblin has quit [Remote host closed the connection]

06:11 tgamblin has joined #riscv

06:16 jamtorus is now known as jellydonut

06:20 tgamblin has quit [Remote host closed the connection]

06:23 tgamblin has joined #riscv

06:24 tgamblin has quit [Remote host closed the connection]

06:26 <kaddkaka[m]> Is there any documentation for the rocket riscv impl? I can only find a tutorial from 2015

06:31 tgamblin has joined #riscv

06:32 tgamblin has quit [Remote host closed the connection]

06:37 tgamblin has joined #riscv

06:39 tgamblin has quit [Remote host closed the connection]

06:43 tgamblin has joined #riscv

06:45 tgamblin has quit [Remote host closed the connection]

06:46 tgamblin has joined #riscv

06:49 tgamblin has quit [Remote host closed the connection]

06:50 jjido has joined #riscv

06:51 tgamblin has joined #riscv

06:54 tgamblin has quit [Remote host closed the connection]

06:57 tgamblin has joined #riscv

06:58 tgamblin has quit [Remote host closed the connection]

07:00 tgamblin has joined #riscv

07:18 pecastro has joined #riscv

07:26 tgamblin has quit [Remote host closed the connection]

07:28 tgamblin has joined #riscv

07:30 tgamblin has quit [Remote host closed the connection]

07:32 tgamblin has joined #riscv

07:33 tgamblin has quit [Remote host closed the connection]

07:38 jjido has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]

07:41 tgamblin has joined #riscv

07:42 tgamblin has quit [Remote host closed the connection]

07:47 tgamblin has joined #riscv

07:48 tgamblin has quit [Remote host closed the connection]

07:57 tgamblin has joined #riscv

08:00 tgamblin has quit [Remote host closed the connection]

08:02 tgamblin has joined #riscv

08:04 tgamblin has quit [Remote host closed the connection]

08:06 hendursa1 has joined #riscv

08:06 tgamblin has joined #riscv

08:06 eduardas has joined #riscv

08:07 tgamblin has quit [Remote host closed the connection]

08:09 hendursaga has quit [Ping timeout: 276 seconds]

08:13 tgamblin has joined #riscv

08:16 tgamblin has quit [Remote host closed the connection]

08:23 tgamblin has joined #riscv

08:24 tgamblin has quit [Remote host closed the connection]

08:26 tgamblin has joined #riscv

08:29 tgamblin has quit [Remote host closed the connection]

08:30 valentin has joined #riscv

08:31 smartin has joined #riscv

08:32 tgamblin has joined #riscv

08:33 tgamblin has quit [Remote host closed the connection]

08:35 tgamblin has joined #riscv

08:37 tgamblin has quit [Remote host closed the connection]

08:39 tgamblin has joined #riscv

08:40 tgamblin has quit [Remote host closed the connection]

08:45 BOKALDO has quit [Quit: Leaving]

08:46 jacklsw has quit [Quit: Back to the real world]

08:48 tgamblin has joined #riscv

08:49 tgamblin has quit [Remote host closed the connection]

08:51 tgamblin has joined #riscv

08:52 tgamblin has quit [Remote host closed the connection]

08:55 tgamblin has joined #riscv

08:56 tgamblin has quit [Remote host closed the connection]

09:01 tgamblin has joined #riscv

09:02 tgamblin has quit [Remote host closed the connection]

09:12 tgamblin has joined #riscv

09:13 jacklsw has joined #riscv

09:13 tgamblin has quit [Remote host closed the connection]

09:15 tgamblin has joined #riscv

09:17 tgamblin has quit [Remote host closed the connection]

09:23 tgamblin has joined #riscv

09:24 tgamblin has quit [Remote host closed the connection]

09:29 tgamblin has joined #riscv

09:30 tgamblin has quit [Remote host closed the connection]

09:35 tgamblin has joined #riscv

09:36 tgamblin has quit [Remote host closed the connection]

09:39 tgamblin has joined #riscv

09:40 tgamblin has quit [Remote host closed the connection]

09:42 tgamblin has joined #riscv

09:46 smartin has quit [Quit: smartin]

09:50 tgamblin has quit [Remote host closed the connection]

09:55 tgamblin has joined #riscv

09:58 tgamblin has quit [Remote host closed the connection]

10:01 tgamblin has joined #riscv

10:02 tgamblin has quit [Remote host closed the connection]

10:07 tgamblin has joined #riscv

10:08 tgamblin has quit [Remote host closed the connection]

10:10 tgamblin has joined #riscv

10:14 tgamblin has quit [Remote host closed the connection]

10:16 tgamblin has joined #riscv

10:18 tgamblin has quit [Remote host closed the connection]

10:22 tgamblin has joined #riscv

10:25 tgamblin has quit [Remote host closed the connection]

10:27 tgamblin has joined #riscv

10:33 tgamblin has quit [Remote host closed the connection]

10:36 tgamblin has joined #riscv

10:37 tgamblin has quit [Remote host closed the connection]

10:39 tgamblin has joined #riscv

10:42 tgamblin has quit [Remote host closed the connection]

10:43 jwillikers has joined #riscv

10:44 tgamblin has joined #riscv

10:46 tgamblin has quit [Remote host closed the connection]

10:50 tgamblin has joined #riscv

10:52 tgamblin has quit [Remote host closed the connection]

10:59 tgamblin has joined #riscv

11:00 tgamblin has quit [Remote host closed the connection]

11:03 tgamblin has joined #riscv

11:04 tgamblin has quit [Remote host closed the connection]

11:10 tgamblin has joined #riscv

11:11 tgamblin has quit [Remote host closed the connection]

11:14 tgamblin has joined #riscv

11:15 tgamblin has quit [Remote host closed the connection]

11:19 Guest9190 has joined #riscv

11:20 tgamblin has joined #riscv

11:21 Guest9190 has quit [Client Quit]

11:21 dolonbus has joined #riscv

11:21 dolonbus has quit [Remote host closed the connection]

11:22 tgamblin has quit [Remote host closed the connection]

11:22 dolonbus has joined #riscv

11:26 dolonbus has quit [Remote host closed the connection]

11:32 tgamblin has joined #riscv

11:33 tgamblin has quit [Remote host closed the connection]

11:40 tgamblin has joined #riscv

11:41 tgamblin has quit [Remote host closed the connection]

11:43 jjido has joined #riscv

11:53 <jrtc27> xypron: I gave a whole load of feedback on gnu-efi patches at https://gitlab.com/freedesktop-sdk/freedesktop-sdk/-/merge_requests/4836 without realising they've since been upstreamed

11:53 <jrtc27> most of those look like they still apply to the patches you upstreamed

11:54 <jrtc27> also https://gitlab.com/freedesktop-sdk/freedesktop-sdk/-/issues/1227#note_674058473 gives you a way to make EFI_SUBSYSTEM work

12:00 jjido has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]

12:03 riff_IRC has joined #riscv

12:03 riff-IRC has quit [Killed (NickServ (GHOST command used by riff_IRC))]

12:03 riff_IRC is now known as riff-IRC

12:26 <xypron> @jrtc27: https://github.com/rhboot/gnu-efi/pull/3 contains similar patches. I will look at how your comments apply there.

12:26 aburgess_ has joined #riscv

12:27 <jrtc27> ack

12:27 <jrtc27> my "most" was perhaps overstating things, just so happened the few I sampled at first applied, but a fair few do look like they've been resolved

12:28 <xypron> @jrtc27: unfortunately every project seems to be maintaining its own version of gnu-efi instead of maintaining a good upstream.

12:28 <jrtc27> :(

12:28 aburgess has quit [Ping timeout: 260 seconds]

12:36 tgamblin has joined #riscv

12:44 frost has quit [Quit: Connection closed]

13:34 BOKALDO has joined #riscv

13:38 BOKALDO has quit [Client Quit]

13:44 pehaef has joined #riscv

13:45 hendursa1 has quit [Quit: hendursa1]

13:45 hendursaga has joined #riscv

13:58 wolfshappen has quit [Ping timeout: 252 seconds]

13:58 wolfshappen has joined #riscv

13:58 BOKALDO has joined #riscv

15:14 aburgess_ is now known as aburgess

16:47 pehaef has quit [Quit: leaving]

17:01 jjido has joined #riscv

17:06 jjido has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]

17:20 mahmutov has joined #riscv

17:26 vagrantc has joined #riscv

17:48 jacklsw has quit [Read error: Connection reset by peer]

17:48 pjw has quit [Read error: Connection reset by peer]

17:48 sorear has quit [Ping timeout: 252 seconds]

17:48 adomas has quit [Ping timeout: 240 seconds]

17:48 geist has quit [Read error: Connection reset by peer]

17:48 mobius has quit [Read error: Connection reset by peer]

17:49 rsalveti has quit [Ping timeout: 252 seconds]

17:49 NishanthMenon_ has quit [Read error: Connection reset by peer]

17:49 rsalveti has joined #riscv

17:51 geist has joined #riscv

17:51 pjw has joined #riscv

17:51 sorear has joined #riscv

17:51 mobius has joined #riscv

17:52 NishanthMenon_ has joined #riscv

17:52 valentin has quit [Quit: Leaving]

17:52 adomas has joined #riscv

17:53 elastic_dog has quit [Quit: elastic_dog]

17:54 jjido has joined #riscv

17:54 elastic_dog has joined #riscv

18:02 jjido has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]

18:05 jjido has joined #riscv

18:22 cwebber has quit [Ping timeout: 245 seconds]

18:28 tgamblin has quit [Quit: Leaving]

18:47 <palmer1> is kito in here?

18:49 <jimwilson> palmer1, don't think so, and it is way too early for him

18:49 <palmer1> ya, makes sense

18:49 <palmer1> do you know if he was going to submit that plumbers talk he was talking about yesterday?

18:49 <palmer1> about the probing extensions to turn on ifuncs?

18:50 <jimwilson> I don't know

18:51 <palmer1> OK

18:51 <palmer1> I'll email him

18:53 jjido has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]

19:08 BOKALDO has quit [Quit: Leaving]

19:18 eduardas has quit [Quit: Konversation terminated!]

19:54 jjido has joined #riscv

19:58 pehaef has joined #riscv

19:59 pehaef has quit [Client Quit]

20:06 cwebber has joined #riscv

20:07 <jrtc27> don't know the content of it but would be good to make sure it's not linux-specific so we can have a standard means to do this across all OSes

20:08 <meowray> another instance of gcc atomic mix-and-match libcalls vs open coding: https://github.com/tikv/jemallocator/pull/14#issuecomment-917179231

20:09 <jrtc27> technically yes in practice no

20:10 <jrtc27> I thought philipp had patches to rewrite gcc's riscv atomics though

20:10 <jrtc27> did those ever get finished off?

20:12 <sorear> mix and match won't cause problems for subword. it WILL cause problems for double-word if that's ever added to gcc but not libatomic

20:12 <jrtc27> it can cause problems because whether or not you use libatomic is ABI

20:12 <jrtc27> since a legal libatomic implementation is to take a spinlock not to do a hardware atomic

20:12 <jrtc27> which then doesn't synchronise with a real atomic (in this case, masked)

20:13 <jrtc27> in practice the libatomic implementation does use the equivalent instructions

20:13 <jrtc27> but it technically doesn't have to

20:15 <meowray> aarch64 without lse can use lib call for __atomic_exchange_1 as well. is that benign?

20:20 <jrtc27> I've never seen that

20:22 <meowray> compile my example with aarch64-linux-gnu or clang --target=aarch64-linux-gnu. you'll get __aarch64_swp1_acq_rel (default is -moutline-atomics for libgcc>=9.3.1 and compiler-rt) unless -march=armv8-a+lse

20:22 <jrtc27> that's not __atomic_exchange_1, that's outlined atomics

20:22 <jrtc27> which is specifically "do a real hardware atomic"

20:23 <jrtc27> which will be either an LSE AMO or an older LDX/STX thing

20:23 <jrtc27> but importantly those still compose fine

20:24 <jrtc27> though I do find it hard to believe that the overhead of a function call for every atomic doesn't outweigh the cost of an LDX/STX loop...

20:24 <meowray> for riscv libgcc, how is __atomic_exchange_1 mixed with open coding benign? doesn't __atomic_exchange_1 use mutex?

20:24 <jrtc27> (yes there are also contention and progress issues, but... most atomics are not contended)

20:25 <jrtc27> I assume the implementation is just what the open-coded version would be

20:25 <xentrac> is it a tail call with no extra movs to set up the arguments? that's what the github thread seems to suggest

20:25 <xentrac> that sounds pretty cheap

20:26 <xentrac> (if there's no issues with contention and progress anyway)

20:26 <jrtc27> you still have to juggle arguments to be in argument registers

20:26 <jrtc27> and in a leaf function you now are no longer a leaf

20:27 <xentrac> tail calls don't preserve leafness?

20:27 <jrtc27> it's only a tail call if it's in a tail position

20:27 <xentrac> true (and you would normally expect it to not be, though the github example seems to have been?)

20:27 <jrtc27> well yes github examples are deliberately minimal

20:28 <xentrac> so perhaps it's not good for me to imagine that the real-world case would have a similar performance cost?

20:29 <sorear> meowray: it doesn't use the mutex array, it's just a cas loop
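
A rough sketch of the idea sorear describes: a one-byte exchange built as a CAS loop on the containing 32-bit word, which is why it composes with open-coded masked atomics on the same word. This is a single-threaded Python model, not the actual libatomic or libgcc code; `compare_and_swap`, `exchange_byte`, the list-of-words memory, and the little-endian byte layout are all assumptions made for illustration.

```python
def compare_and_swap(mem, idx, expected, new):
    """Stand-in for a hardware word CAS (e.g. an LR/SC pair).

    Hypothetical helper: it is only 'atomic' here because the model is
    single-threaded.
    """
    if mem[idx] == expected:
        mem[idx] = new
        return True
    return False


def exchange_byte(mem, byte_addr, new_byte):
    """Swap one byte inside a 32-bit word using a masked CAS loop."""
    idx = byte_addr // 4                 # which 32-bit word holds the byte
    shift = (byte_addr % 4) * 8          # little-endian byte position (assumed)
    mask = 0xFF << shift
    while True:
        old = mem[idx]
        new = (old & ~mask & 0xFFFFFFFF) | ((new_byte & 0xFF) << shift)
        if compare_and_swap(mem, idx, old, new):
            return (old >> shift) & 0xFF  # previous value of the byte


mem = [0x11223344]
previous = exchange_byte(mem, 1, 0xAB)
print(hex(previous), hex(mem[0]))
```

A lock-based fallback would instead take a mutex around a plain load and store, which is the case jrtc27 points out would not synchronise with a real hardware atomic on the same word.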

20:29 <xentrac> so it does guarantee progress

20:30 <meowray> sorear: libatomic/exch_n.c uses libatomic/cas_n.c. ok, i find no mutex array. compiler-rt's impl uses mutex array...

20:31 <jrtc27> riscv/atomic.c in libgcc

20:31 <jrtc27> oh but that's the __sync_foo

20:31 <sorear> i would strongly argue compiler-rt is in the wrong here, since it can't be mixed with libgcc and the intent is obvious in the manuals

20:32 <jrtc27> you can't mix and match compiler-rt and libgcc

20:32 <jrtc27> they conflict wrt symbols

20:32 <jrtc27> that's like linking two libc's

20:33 <meowray> compiler-rt is definitely bad regarding atomics

20:33 <meowray> it's simple, though. reading libatomic/libgcc impl on atomics seems challenging

20:37 zjason` has joined #riscv

20:38 <jrtc27> wow it looks like libatomic really uses pthread_mutex_lock for the fallback case when there is no atomic of that size to use

20:39 <jrtc27> that seems a bit heavyweight

20:39 zjason has quit [Ping timeout: 260 seconds]

20:39 <xentrac> is libatomic pervasively coupled to pthreads or is that limited to a few places?

20:39 <jrtc27> there's an abstraction layer

20:40 <sorear> you can't just use a spinlock because spinlocks on a uniprocessor are bad times

20:40 <jrtc27> that's just the posix implementation of the locking

20:40 <xentrac> that's not so bad then, I guess

20:40 <sorear> even pthread mutexes are Not Great because of priority inversion...

20:40 <xentrac> though, as you say, heavyweight

20:40 <jrtc27> the FreeBSD-specialised implementation of locks in compiler-rt uses its umtx

20:41 <sorear> pthread_mutex is just a couple of CASes in the uncontended case, although it goes through a fair amount of code to do so

20:41 <xentrac> sorear: presumably the lock in question is only held within the atomic swap, avoiding the priority inversion problem, no?

20:41 <jrtc27> no futex equivalent for Linux though

20:41 <jrtc27> yeah exactly

20:41 <jrtc27> it still has a fast path, but it's not as fast as a spinlock in the uncontended case

20:42 <sorear> xentrac: low priority thread starts to do an atomic swap, gets preempted in the middle by a high priority thread that wants to do an atomic on the same lock bucket

20:42 <sorear> really wish pthreads didn't try to multiplex so many lock types on a single set of functions, much of the overhead is dispatching between implementations

20:43 <xentrac> sorear: hmm, so then the high priority thread blocks, and if there's a medium-priority thread that has a lot of work to do, the low-priority thread may not get scheduled for a long time. my mistake!

20:44 <xentrac> somehow I had thought that you needed two locks to get priority inversion. thanks!

20:45 <xentrac> (a long time or never, if you're using strict priorities, as you would in the cases where this matters most)

20:45 jjido has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]

20:55 <xentrac> in more spartan news, stikonas has reduced the hex0 RV64 seed down to 392 bytes: https://github.com/oriansj/stage0-posix/pull/43/files

21:01 <jrtc27> running on what though?

21:01 <xentrac> Linux

21:02 <xentrac> stage0-posix uses exit, execve, fork, waitpid, brk, open, close, read, write, lseek, chmod for up to M2-Planet+mescc-tools (excluding Kaem)

21:03 <xentrac> although in that case you can see the seed is invoking openat instead of open

21:04 <xentrac> it's using the Linux system call interface

21:04 <xentrac> no pthreads though :)

21:04 <jrtc27> how do you trust that linux kernel?

21:04 <jrtc27> especially if it's a riscv one?

21:05 <xentrac> it would be better to have a very simple bootstrapping kernel, but nobody has written one yet

21:05 <jrtc27> beyond a certain point I just don't see what the use of all this is, unless you go right back to hardware you built running some bare-metal code you hand assembled and input into the machine

21:06 <xentrac> yes, but getting there will take time

21:06 <jrtc27> sure, but you wouldn't use the hex0 for that, you'd skip a load of those stages

21:06 <jrtc27> because whatever runs those needs a far more complex environment than those do

21:07 <xentrac> once you have a hand-assembled kernel sufficient to run hex0, you can use it to compile the later stages

21:07 <jrtc27> but the point is currently you need one per architecture

21:07 <jrtc27> which isn't practical

21:07 <jrtc27> if you just have a single more featureful one you can skip all the hex0 stuff

21:07 <xentrac> no, you only need one architecture

21:08 <xentrac> because once you have one architecture bootstrapped in a trustworthy way, you can cross-compile the others and verify that they're bit-identical

21:08 <jrtc27> then why are there x86, arm, aarch64 and riscv64 hex0's?

21:08 <jrtc27> all but one of those are a waste of time

21:08 <xentrac> yes, but we don't know which one yet

21:08 <jrtc27> and that's my point, it's investing effort now that you know is going to be thrown away

21:09 <jrtc27> if it were me I'd work up not down

21:09 <jrtc27> to avoid that

21:09 <jrtc27> with a tiny amount of down just to get beyond the point of some kind of trusting trust attack being practically feasible

21:09 <xentrac> my guess is that someone will build an inspectable RV64 CPU out of individual transistors long before anyone builds an inspectable aarch64 or amd64 CPU :)

21:10 <xentrac> but oriansj was not very sanguine about risc-v at all once he saw the headaches involved in the instruction encoding

21:10 <jrtc27> I mean it's a mess for humans but it should mean fewer transistors...

21:11 <xentrac> right!

21:11 <xentrac> he's more enthusiastic about the Knight, a TTL design from the 01970s, which I think is about 20 times more complicated to implement than RV64

21:11 <jrtc27> but likely harder to reason that it's correct

21:12 <xentrac> oh? how would you design an ISA to be as easy as possible to reason that it's correct, without giving up the possibility of running things like a C compiler on a POSIX implementation?

21:12 <jrtc27> as in just that the jumbled bits are harder to look at and say "yes that's correct"

21:12 <jrtc27> laying down a big block of bit slicing is easier

21:12 <jrtc27> even if it's larder

21:12 <jrtc27> *ger

21:13 <xentrac> ah, so, like, 64-bit instructions, or 128-bit like Kay and Nguyen's Chifir?

21:13 <xentrac> (also if you have *two* architectures that are fully bootstrapped, then a Karger–Thompson attack would have to have compromised *both* of them, so it's not *completely* a waste of time)

21:13 <jrtc27> 32 bits is enough for a simple CPU

21:14 <jrtc27> only gets too small when you want to cram all kinds of features in

21:14 <xentrac> Chifir avoids needing multiple types of linker relocations ;)

21:14 <jrtc27> my point is just an un-jumbled riscv, whilst less efficient in terms of hardware resources, is likely easier to understand the implementation of

21:14 <xentrac> aaaah, I see! that's a great idea that I hadn't thought of at all

21:15 <xentrac> thank you!

21:16 <jrtc27> having said that, if you're not doing C (why would you for this) there's not all that much jumbling

21:16 <xentrac> stage0 gets to C fairly quickly

21:16 <jrtc27> just JAL and branches

21:16 <jrtc27> I meant C the RISC-V extension, not the language

21:16 <xentrac> oh sorry

21:16 <xentrac> naturally

21:17 <xentrac> you can see hex0 uses addi a lot

21:17 <xentrac> well, hopefully you can. M1 assembly is not very readable

21:19 <xentrac> but yeah, M1 uses $ for J jumbling, @ for B jumbling, and ! for I jumbling
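
For reference, a small sketch of the three immediate layouts being called "jumbling" here, following the base RISC-V I-, B- and J-type formats. The function names are hypothetical, and this only places the immediate bits; it says nothing about how M1 itself implements the `!`, `@` and `$` prefixes.

```python
def encode_i(imm):
    """I-type: imm[11:0] goes straight into instruction bits 31:20."""
    return (imm & 0xFFF) << 20


def encode_b(imm):
    """B-type: imm[12|10:5] in bits 31:25, imm[4:1|11] in bits 11:7."""
    imm &= 0x1FFF
    return (((imm >> 12) & 0x1) << 31
            | ((imm >> 5) & 0x3F) << 25
            | ((imm >> 1) & 0xF) << 8
            | ((imm >> 11) & 0x1) << 7)


def encode_j(imm):
    """J-type: imm[20|10:1|11|19:12] in bits 31:12."""
    imm &= 0x1FFFFF
    return (((imm >> 20) & 0x1) << 31
            | ((imm >> 1) & 0x3FF) << 21
            | ((imm >> 11) & 0x1) << 20
            | ((imm >> 12) & 0xFF) << 12)


# OR the immediate field with the opcode (0x6F is JAL with rd = x0).
print(hex(encode_j(8) | 0x6F))
```

Only the I-type immediate is contiguous; B and J scatter the bits, which is the part that is awkward to do by hand.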

21:21 <xentrac> thoughts about the minimal practical ALU repertoire for this kind of bootstrapping?

21:28 <jrtc27> hmm

21:29 <jrtc27> well you could ditch the immediate instructions other than addi and/or ori (since you need one of those to load constants in the first place)

21:29 <jrtc27> don't need jal if you have jalr

21:29 <jrtc27> well, I guess you pick jal or auipc

21:29 jjido has joined #riscv

21:30 <jrtc27> probably keep just blt and beq or similar

21:30 elastic_dog has quit [Ping timeout: 245 seconds]

21:30 <jrtc27> could ditch one of those too but starts to get annoying to code for

21:30 <jrtc27> slt[u] not needed if you have branches

21:30 <jrtc27> don't need sub if you have add and xor

21:31 <jrtc27> you probably want all three types of shifts

21:31 <xentrac> yeah, you could ditch beq but it would be a pain

21:31 <jrtc27> (not having the sign-extending ones is painful, and zero-extending ones are easy if you have sign-extending)

21:32 <xentrac> you could probably drop non-immediate add

21:32 <jrtc27> obviously fences and exception-y things you can do without unless relevant

21:32 <jrtc27> you'd do it that way round?

21:32 <jrtc27> I'd drop the immediate add instead

21:32 <jrtc27> more general

21:32 <xentrac> it's easier to make add out of sub than vice versa

21:32 <jrtc27> oh if you keep sub, sure

21:32 <jrtc27> I'd ditch sub and addi and just use add for everything

21:33 <jrtc27> or no

21:33 <jrtc27> I'd keep addi solely for lui+addi being li
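
A minimal sketch of why addi is handy here, assuming RV32: a 32-bit constant splits into a 20-bit upper part for lui and a sign-extended 12-bit lower part for addi, with the upper part bumped by one when the lower part comes out negative. `split_li` and `run_li` are hypothetical names for illustration only.

```python
def split_li(value):
    """Split a 32-bit constant into (hi20, lo12) for a lui + addi pair."""
    value &= 0xFFFFFFFF
    lo = value & 0xFFF
    if lo >= 0x800:          # addi sign-extends its 12-bit immediate
        lo -= 0x1000
    hi = ((value - lo) >> 12) & 0xFFFFF   # compensates for a negative lo
    return hi, lo


def run_li(hi, lo):
    """Model of 'lui rd, hi; addi rd, rd, lo' on a 32-bit register."""
    return ((hi << 12) + lo) & 0xFFFFFFFF


hi, lo = split_li(0x12345FFF)
print(hex(hi), lo, hex(run_li(hi, lo)))
```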

21:33 <xentrac> yeah. and addi is subi after all

21:33 <xentrac> to use add for non-immediate sub you need some way to negate

21:34 elastic_dog has joined #riscv

21:34 <jrtc27> so then it's just add or sub

21:34 <jrtc27> and I don't see why you'd pick sub over add

21:34 <xentrac> well, a += b; is b = 0 - b; a -= b;

21:34 <xentrac> but a -= b; in terms of addition requires some way to negate b. I guess you could xor with an immediate -1?

21:35 elastic_dog has quit [Client Quit]

21:35 <xentrac> which you loaded with li

21:35 elastic_dog has joined #riscv

21:35 <jrtc27> yes + and - are symmetric, you just do the opposite with -

21:35 <xentrac> which would be less of a pain in rv32

21:35 <jrtc27> it's entirely equivalent

21:35 <jrtc27> and yeah

21:35 <jrtc27> xor with -1 and add 1

21:35 <xentrac> it's not symmetric in the sense that b = 0 + b; a += b; does not perform a -= b;

21:36 <jrtc27> it is if you view your `b = 0 - b` as a unary negation of b

21:36 <jrtc27> which is what you actually do

21:36 <xentrac> yeah, but you can do b = 0 - b; with sub and r0; you don't need unary negation

21:36 <xentrac> you don't need r0 either since you can do z = z - z;
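
The two directions being compared can be sketched like this, modelling 64-bit registers with masking in Python. All names are hypothetical; `add`, `sub` and `xor` stand in for the hardware instructions, and the other two functions show what has to be emulated when one of add/sub is dropped.

```python
MASK = (1 << 64) - 1   # model RV64 registers as 64-bit unsigned values


def add(a, b):
    return (a + b) & MASK


def sub(a, b):
    return (a - b) & MASK


def xor(a, b):
    return (a ^ b) & MASK


def sub_using_add(a, b):
    """a - b with only add and xor: negate b as (b xor -1) + 1."""
    neg_b = add(xor(b, MASK), 1)   # MASK is -1 in two's complement
    return add(a, neg_b)


def add_using_sub(a, b, z=12345):
    """a + b with only sub: make zero as z - z, then negate b as 0 - b."""
    zero = sub(z, z)               # no dedicated zero register needed
    neg_b = sub(zero, b)
    return sub(a, neg_b)


print(sub_using_add(5, 7) == sub(5, 7), add_using_sub(5, 7))
```

With sub as the primitive, negation needs no constant and no xor, which is the asymmetry xentrac is pointing at.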

21:37 <jrtc27> oh I see, yes, that's true, sub gets you negation for fre

21:37 <jrtc27> *e

21:37 <xentrac> but sooner or later you're going to want xor and some nonlinear bitwise operation like and or or

21:38 <xentrac> so maybe add vs. sub is just a question of taste, I have an ugh field around 'xor with -1 and add 1' that may not be actually justifiable

21:39 <jrtc27> I think it just depends on the relative frequency of operations

21:39 <jrtc27> I would naively assume add is more frequent and thus warrants making crappily emulating sub worth it

21:39 <xentrac> IIRC Chifir only supplies NAND for bitwise operations :)

21:39 <xentrac> yeah, you're probably right about that

21:39 <jrtc27> then again, could just support both in hardware, it's pretty trivial to make an adder into a subtracter...

21:40 <xentrac> I think Wirth-the-RISC omits bitwise operations entirely, just like Pascal

21:40 <xentrac> hmm, that sounds like an exercise I should try

21:40 <jrtc27> I can imagine that not going so well for things like writing assemblers...

21:41 <xentrac> well, you *can* do bit shifts with division (which both Chifir and Wirth-the-RISC supply, which seems like a terrible idea to me)

21:42 <xentrac> and multiplication, of course, which you can do with addition

21:42 <xentrac> and you *can* do LSB tests with modulo, which likewise
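
A sketch of what that looks like, assuming unsigned values and a 32-bit word. The helper names are hypothetical, and the bitwise AND at the end is added only to show how much work a missing bitwise operator turns into.

```python
def pow2(n):
    """2**n built from repeated addition only."""
    p = 1
    for _ in range(n):
        p = p + p
    return p


def shift_left(x, n, width=32):
    """Left shift as multiplication, truncated to the word width."""
    return (x * pow2(n)) % pow2(width)


def shift_right(x, n):
    """Logical right shift as integer division (unsigned x assumed)."""
    return x // pow2(n)


def lsb(x):
    """Least significant bit test via modulo."""
    return x % 2


def bitwise_and(a, b, width=32):
    """AND rebuilt bit by bit from division and modulo."""
    result, weight = 0, 1
    for _ in range(width):
        if a % 2 == 1 and b % 2 == 1:
            result += weight
        a, b, weight = a // 2, b // 2, weight + weight
    return result


print(shift_left(3, 4), shift_right(0xF0, 4), lsb(7), bitwise_and(12, 10))
```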

21:43 <jrtc27> "we don't give you bitwise operators but we give you multiplication, division and modulo" is an interesting take for a minimal design

21:43 <xentrac> IKR?

21:44 <xentrac> original MIX also did that, for an arguably slightly less ridiculous reason in the context of 01965

21:44 jjido has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]

21:44 <xentrac> Wirth-the-RISC also has floating-point, as did, for example, Zuse's Z3

21:45 <jrtc27> o.O

21:45 <xentrac> I remembered wrong about it, though; it has bitwise and, or, xor, and abjunction

21:50 <xentrac> speaking of minimal sets of operations, intuitively it seems like abjunction ought to be more expressive than NAND or NOR (if you have constants!) since it's non-commutative, but so far I haven't found that circuits are systematically simpler in abjunction gates than in NAND or NOR gates, although of course there are individual circuits that are (abjunction is one gate with abjunction gates, 3 gates with NAND)
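
The gate counts mentioned can be checked with a small truth-table sketch, taking abjunction to mean `a AND NOT b`. The function names are hypothetical, and the three-gate NAND built from abjunction gates plus the constant 1 is just one construction, not claimed to be minimal.

```python
from itertools import product


def nand(a, b):
    return 1 - (a & b)


def abj(a, b):
    """Abjunction (material nonimplication): a AND NOT b."""
    return a & (1 - b)


def abj_from_nand(a, b):
    """Abjunction out of three NAND gates."""
    not_b = nand(b, b)
    t = nand(a, not_b)       # NOT (a AND NOT b)
    return nand(t, t)        # invert again


def nand_from_abj(a, b):
    """NAND out of three abjunction gates, using the constant 1."""
    not_b = abj(1, b)
    a_and_b = abj(a, not_b)  # a AND NOT (NOT b)
    return abj(1, a_and_b)


for a, b in product((0, 1), repeat=2):
    print(a, b, abj(a, b), abj_from_nand(a, b), nand(a, b), nand_from_abj(a, b))
```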

21:53 <xentrac> stikonas points out that it's probably simpler to run cut-down versions of hex0, hex1, and hex2 as subroutines that read and write RAM than it is to implement a kernel in hex that is sufficient to run them

21:54 <xentrac> and if you do that you can write the kernel in assembly instead of hex, which would be a big improvement

21:55 stikonas has joined #riscv

21:56 <xentrac> I think doing floating-point, multiplication, and division in hardware would probably be fatal to hand-auditability

21:59 <stikonas> well, floating-point multiplication and division are not required for early bootstrapping software (assembler can be written with just integer operations)

22:00 <xentrac> agreed. that was in the context of 21:44 < xentrac> Wirth-the-RISC also has floating-point, as did, for example, Zuse's Z3

22:01 <xentrac> but I meant integer multiplication and division, which are also not needed for compilers and assemblers until they're fairly sophisticated, at which point it's easy enough to supply them as subroutines as we did on, for example, the 8080 or 6502

22:01 <xentrac> or RV32I :)

22:01 TMM_ has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]

22:02 TMM_ has joined #riscv

22:05 <dh`> surely you mean "08080" and "06502"

22:05 * dh` hides

22:06 <jrtc27> back when I was an undergrad, and in my first year, our intro to assembly lab was to implement division in RV32I assembly (that we'd later use to write pong)

22:06 <jrtc27> though that was before the big re-encoding of RISC-V

22:11 <stikonas> most of the work for software division is I guess writing some high level prototype (e.g. in C)
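
As an illustration of the kind of high-level prototype stikonas mentions, here is a minimal sketch of unsigned shift-and-subtract division in C. It uses only shifts, comparisons, and subtraction, all of which RV32I provides. The name `udiv32` and its signature are hypothetical, and the divisor is assumed to be nonzero. It is not the routine anyone in the channel actually wrote.

```c
#include <stdint.h>

/* Hypothetical helper: restoring (shift-and-subtract) division.
 * Returns n / d and stores n % d in *rem. Assumes d != 0. */
uint32_t udiv32(uint32_t n, uint32_t d, uint32_t *rem) {
    uint32_t q = 0;
    uint32_t r = 0;
    for (int i = 31; i >= 0; i--) {
        /* Bring down the next dividend bit, most significant first.
         * r cannot overflow: before the last step it holds at most 31 bits. */
        r = (r << 1) | ((n >> i) & 1u);
        if (r >= d) {
            r -= d;                  /* the divisor fits: subtract it */
            q |= (uint32_t)1 << i;   /* and set this quotient bit */
        }
    }
    *rem = r;
    return q;
}
```

Translating the loop body to assembly is then mostly mechanical: one shift, one OR, one compare-and-branch, one subtract per bit.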

22:12 gioyik has joined #riscv

22:13 <xentrac> dh`: haha

22:13 <xentrac> jrtc27: oh heh, I didn't realize you were younger than I am

22:14 <jrtc27> I know, with how cynical I can be you're forgiven for thinking I'm old :P

22:14 <xentrac> I'm reminded of Shannon's story about Tukey

22:14 <xentrac> uh, Hamming's

22:14 * stikonas never had any assembly classes...

22:15 <xentrac> (not that I'm comparable to Hamming or Shannon)

22:15 <xentrac> > One day about three or four years after I joined, I discovered that John Tukey was slightly younger than I was. John was a genius and I clearly was not. Well I went storming into Bode's office and said, ``How can anybody my age know as much as John Tukey does?'' He leaned back in his chair, put his hands behind his head, grinned slightly, and said, ``You would be surprised Hamming, how much you would

22:15 <xentrac> know if you worked as hard as he did that many years.''

22:16 <xentrac> was it a good intro to assembly? I'm surprised they didn't give you an easier problem for the first lab...

22:16 <jrtc27> dunno, I already knew assembly

22:17 <jrtc27> I think people mostly managed to get it done though

22:19 <jrtc27> wrote pong for a soft-core attached to an LCD screen and various controller inputs later in the course (in C though), that was a bit of fun

22:19 <xentrac> nice!

22:19 <jrtc27> main challenge was the pitiful amount of memory the core exposed...

22:19 <jrtc27> loads of BRAMs on that chip but only like 4K or 8K instruction memory available

22:20 <xentrac> a lot more than the original Pong machine had! and the Atari 2600 potentially had 64KiB of instruction memory but only 1024 bits of RAM

22:20 <xentrac> I guess RV32I is less dense than 6502

22:20 <jrtc27> yeah

22:21 <jrtc27> also I'm sure writing in assembly would make it denser than the emitted C

22:21 <jrtc27> given it was probably a rather less mature GCC than exists today and dutifully following the ABI

22:24 <xentrac> and especially a rather less mature RISC-V backend

22:24 <jrtc27> yeah that's what I meant

22:24 <xentrac> ah

22:25 <jrtc27> it wasn't *that* long ago, x86 and arm were mature

22:25 <jrtc27> just not this weirdo out-of-tree port

22:26 <xentrac> GCC has gotten substantially better in the last ten years

22:27 <jrtc27> this was late 2015 (and actually my second year, not first year, bleh, memory)

22:27 <jrtc27> that or early 2016, one of those terms

22:27 <xentrac> sounds more reasonable for a second-year class, by then all the CS majors would have learned to program a little even if they didn't program before going to college

22:28 <xentrac> one of my favorite microbenchmarks is stupid fibonacci (fib(n) { return n < 1 ? 1 : fib(n-1) + fib(n-2); }) because you can write it in one line of code; it exercises integer arithmetic, recursion, and control flow; and it's easily adaptable to a wide range of implementation speeds by giving it different arguments

22:28 <xentrac> but modern GCC has totally obsoleted this

22:29 <xentrac> used to be, a simple Forth would beat GCC at this on i386 or amd64, because it's *so* recursion-heavy

22:30 jwillikers has quit [Remote host closed the connection]

22:31 <xentrac> in March I tried it on a reasonably modern GCC (though maybe not post-02015), and that (well, with n < 2 and ANSI declarations!) compiled to 169 instructions
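
For reference, the variant described here (base case `n < 2`, ANSI declarations) would presumably look like the sketch below. The choice of `int` for the argument and result is an assumption. With this base case, fib(0) = fib(1) = 1.

```c
/* "Stupid fibonacci": deliberately naive, doubly recursive. */
int fib(int n) { return n < 2 ? 1 : fib(n - 1) + fib(n - 2); }
```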

22:31 <xentrac> I think GCC inlined it into itself 13 times

22:31 <jrtc27> 185 lines on godbolt

22:31 <jrtc27> what in gods name is that

22:32 <xentrac> that was what I said!

22:32 <jrtc27> presumably there's a fib benchmark out there that calls it for n < 13 or whatever...

22:33 <xentrac> maybe, but maybe it's just standard optimizations

22:33 <xentrac> it makes it a lot faster for large n

22:33 <xentrac> like about 40 times faster

22:34 <xentrac> and by "large" of course I mean "in the 30s or 40s"

22:34 <jrtc27> I mean, if it wants to make it faster, why doesn't it just notice that fib(n) is pure and thus do CSE on the inlined recursive version and some form of induction...

22:34 <xentrac> I suspect it might be doing that, yeah

22:35 <xentrac> because it's not plausible that it could be doing all the subtractions and additions and get 40 times faster

22:35 <xentrac> though I admit I haven't grokked the astounding assembly program GCC produces these days

22:37 <xentrac> I mean the naive assembly compilation doesn't spend 96% of its time in argument passing and prologues and epilogues

22:42 <jrtc27> clang doesn't do a whole lot, but does notice it can turn it into fib(n-1) + fib(n-3) + ... + fib(3/2) + fib(1/0) and that fib(0) = fib(1) = 1 so it turns that into a + 1 instead

22:42 <jrtc27> still explosive but a bit less so
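
A minimal sketch of the transformation jrtc27 describes, written back out as C: the second recursive call becomes a loop that steps n down by 2, and the base case becomes a final `+ 1`. This is a reading of the description above, not Clang's actual output, and `fib_loop` is a made-up name. The naive `fib` is included only for comparison.

```c
/* Naive version, for comparison. */
int fib(int n) { return n < 2 ? 1 : fib(n - 1) + fib(n - 2); }

/* Same function with the fib(n - 2) call turned into a loop:
 * fib(n) = fib(n-1) + fib(n-3) + ... + 1 */
int fib_loop(int n) {
    int acc = 0;
    while (n >= 2) {
        acc += fib_loop(n - 1);  /* the one remaining recursive call */
        n -= 2;                  /* stands in for the fib(n - 2) call */
    }
    return acc + 1;              /* base case: fib(0) = fib(1) = 1 */
}
```

One call per loop iteration remains, so the running time is still exponential in n.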

22:43 <jrtc27> as has GCC I think in the mess of it all?

22:43 <jrtc27> there's ultimately a tight inner loop that has a sub 2 in there

22:43 <jrtc27> though even that bit's an utter mess

22:43 <xentrac> heh

22:44 <jrtc27> 7 stack spills and 7 stack reloads in the loop

22:45 <jrtc27> so yeah I imagine for small n GCC might fare better but as you ramp up it's going to tank compared to Clang

22:46 <xentrac> ramp up how far?

22:48 <jrtc27> that would require me to work out what the other 80% of the code does...

22:48 <xentrac> haha

22:48 <xentrac> or you could just run int

22:48 <xentrac> *it

22:50 <jrtc27> yeah I should...

22:50 <xentrac> if it's, say, inlining things four levels deep, then it has a tree of 31 calls which CSE can simplify down to maybe 13

22:50 <xentrac> which might be more important than the stack spills in the loop

22:51 rpb has joined #riscv

22:51 gioyik_ has joined #riscv

22:55 gioyik has quit [Ping timeout: 276 seconds]

22:58 elastic_dog has quit [Ping timeout: 245 seconds]

22:59 <dh`> if it's really doing enough CSE to bust it out of being exponential it should be easy to tell by running it

23:04 <xentrac> it isn't, but it's doing something that knocks a quite hefty constant factor off the exponential

23:10 winterflaw has quit [Ping timeout: 276 seconds]

23:12 <jrtc27> hm, interesting, clang is about 80% slower for larger numbers

23:12 <jrtc27> so maybe the sheer number of calls to the smaller numbers makes it better in the long run

23:17 <xentrac> hmm, I hadn't thought about that! you're right, even if it doesn't do CSE on the internal nodes of the call tree, trimming down the leaf nodes would help a lot

23:18 <xentrac> the call tree for F₃₈ has F₃₈ leaf nodes, after all, and cutting that down to F₃₄ would speed it up a lot
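
The leaf-count claim can be checked with a small instrumented version. The sketch below (the `counts` struct and `fib_counted` are hypothetical names) tallies every call and every base-case call. With fib(0) = fib(1) = 1, each leaf contributes exactly 1 to the result, so the number of leaves equals fib(n) and the total number of calls is 2*fib(n) - 1.

```c
/* Hypothetical counters for the call tree. */
typedef struct {
    long calls;   /* every invocation */
    long leaves;  /* invocations that hit the base case */
} counts;

int fib_counted(int n, counts *c) {
    c->calls++;
    if (n < 2) {
        c->leaves++;
        return 1;
    }
    return fib_counted(n - 1, c) + fib_counted(n - 2, c);
}
```

Since consecutive Fibonacci numbers grow by roughly the golden ratio, trimming the leaves from F₃₈ to F₃₄ would cut them by a factor of roughly 6.9.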

23:19 TMM_ is now known as TMM

23:19 TMM is now known as TMM_

23:20 elastic_dog has joined #riscv

23:21 <xentrac> anyway, that level of optimization makes it useless for my intended purpose, which is to quickly get a crude first-order feel for how fast or slow a language is. bash, for example, takes 2.6 seconds to run fib 15

23:21 <xentrac> as fib() { if [[ $1 -lt 2 ]]; then echo 1; else echo $(($(fib $(($1 - 1))) + $(fib $(($1 - 2))) )); fi; }

23:23 <xentrac> which makes it about F₃₃/F₁₅ times slower than CPython

23:23 <xentrac> about 6000

23:25 <xentrac> it's not a very precise benchmark but it's usually good to within an order of magnitude. but not with modern GCC

23:32 <dh`> it seems plausible for a compiler to unroll it, do CSE, and thereby make it nonexponential

23:33 <dh`> but I haven't looked at it that closely, maybe there's a good reason that doesn't work

23:34 <xentrac> well, you can't unroll it to infinite depth

23:35 <xentrac> you have to stop inlining at some point

23:35 elastic_dog has quit [Ping timeout: 260 seconds]

23:35 <xentrac> and at that point you have, say, calls to fib(n-8), fib(n-9), fib(n-10),...

23:36 <xentrac> as long as you have more than one such call it's still exponential, and inlining doesn't decrease the number of such calls; it increases them

23:42 elastic_dog has joined #riscv

23:47 pecastro has quit [Ping timeout: 265 seconds]

23:53 gioyik has joined #riscv

23:55 <jrtc27> https://paste.debian.net/hidden/6aa655d5/ FWIW

23:55 <jrtc27> same growth rate, just a constant factor between them

23:56 gioyik_ has quit [Ping timeout: 276 seconds]