sorear changed the topic of #riscv to: RISC-V instruction set architecture | https://riscv.org | Logs: https://libera.irclog.whitequark.org/riscv | Backup if libera.chat and freenode fall over: irc.oftc.net
jporquet has joined #riscv
<jporquet> Hi all!
<jporquet> I'm a bit confused by the multilib feature in gcc, I was wondering if someone could clarify it
<jporquet> according to https://github.com/gcc-mirror/gcc/blob/master/gcc/config/riscv/t-linux-multilib, it looks like the few multilibs that are actually generated are rv{32,64}imac+rv{32,64}imacfd
<jporquet> and then for all the other possible combination (e.g. rv32imafd == rv32g), these libs are reused
<jporquet> the problem is that if libgcc is compiled using a rv32imac configuration, then it won't work when running gcc with -march=rv32g since the application will be linked against code that contains compressed instructions
<jporquet> in other words, if my cpu doesn't support the C extension, I can't make sure that all my code is compiled in rv32g
<jporquet> am I missing something?
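A rough sketch of the concern raised above: a prebuilt library is only safe on a CPU whose extension set covers everything the library was compiled for. This assumes single-letter extensions only, treats G as shorthand for IMAFD, and ignores the ABI (soft-float vs. hard-float); `parse_arch` and `lib_usable_on` are made-up names for illustration, not anything GCC provides.

```python
def parse_arch(arch):
    """Split an arch string such as 'rv32imac' into (xlen, set of extension letters)."""
    arch = arch.lower()
    xlen = int(arch[2:4])  # assumes the string starts with "rv32" or "rv64"
    exts = set()
    for letter in arch[4:]:
        # Assumption: G is treated as IMAFD, as stated earlier in the log.
        exts |= set("imafd") if letter == "g" else {letter}
    return xlen, exts


def lib_usable_on(lib_arch, cpu_arch):
    """True if code built for lib_arch uses only what cpu_arch implements."""
    lib_xlen, lib_exts = parse_arch(lib_arch)
    cpu_xlen, cpu_exts = parse_arch(cpu_arch)
    return lib_xlen == cpu_xlen and lib_exts <= cpu_exts
```

With these definitions, `lib_usable_on("rv32imac", "rv32g")` is false because of the C extension, while `lib_usable_on("rv32imac", "rv32gc")` is true.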
choozy has quit [Remote host closed the connection]
<jrtc27> the output is valid and works, it's just not helpful for your specific use case
vagrantc has quit [Ping timeout: 250 seconds]
<jrtc27> I agree though that some of those mappings seem rather strange
<jrtc27> but at the same time, imac and imafdc are by far the most common sets of extensions
vagrantc has joined #riscv
<jrtc27> it's strongly recommended that you support one of those
<xentrac> does the c extension usually make code run faster?
<jrtc27> it reduces icache pressure
<xentrac> right
<jrtc27> and, if you have virtual memory, itlb pressure
<xentrac> (I mean obviously that's a question about implementations, not architectures, so I'm asking about your experience with popular implementation techniques)
<jporquet> what's weird is that the G meta-extension (IMAFD) was marketed as the default extension, but it seems like with the default configuration, GCC is compiled for GC only
<pabs3> do many RISC-V CPU designs not support the C extension?
<xentrac> not many, surely some
<jrtc27> at the time G was defined it was not clear what the state of C would be
<jporquet> if you have a CPU that doesn't support the C extension, you can't use vanilla GCC
<jrtc27> distributions, implementations and embedded OS'es all settled on C being assumed in the end
<jrtc27> sure you can
<jrtc27> just make sure you have the right multilib
<jrtc27> I assume
<jporquet> all the generated multilib use the C extension
<jporquet> variable `MULTILIB_REQUIRED`
<jrtc27> yes but if you build an rv32imafd libgcc and put it in the right place will it not be picked up?
<jrtc27> and if not, well, -L is your friend I guess
<jporquet> sure, but as said previously, it still means that vanilla GCC does not strictly support RV{32,64}G even though it's supported to be the default set of extensions :|
<jporquet> *supposed to be
<jrtc27> "support" is not a well-defined term here
<jporquet> what do you mean?
<jrtc27> GCC the compiler supports everything
<jrtc27> it ships with some pre-built libgcc's for convenience
<jrtc27> those happen to not include a configuration that you support
<jrtc27> but that's libgcc the runtime, not GCC the compiler
<jrtc27> which is optional
<jporquet> gotcha
<jporquet> I'll rephrase by saying: I'm surprised that no libgcc is pre-compiled to be strictly compatible with rv32/64g since the G extension is supposed to be the default hardware-wise
<jrtc27> the second part of your statement is false
<jrtc27> and it's not surprising because it would be a waste of space when the linux community has collectively agreed on GC as the base ISA
<jrtc27> with IMAC if you want to do soft-float
<jrtc27> but on a linux-capable system the complexity of implementing C is insignificant
<jporquet> I hear you but I don't think my statement is false
<jporquet> there's a whole chapter in the specs about the G ISA
<jrtc27> G means general-purpose not default
<jporquet> hmmm
<jrtc27> because C doesn't add any new *functionality*
<jporquet> I'm not a native speaker so I won't argue further, but I find it confusing nonetheless
<jporquet> anyway, thanks for your insight, really appreciate it
<jporquet> it definitely clarifies my confusion
<jrtc27> although the repo's since been reworked (and the branch renamed to main)
jporquet has quit [Quit: Client closed]
<xentrac> ah, they renamed from master to main?
<xentrac> that link still seems to work tho
<jrtc27> yeah, just means the file's old
<xentrac> maybe they should put a note at the top of the github page, above the file, about the branch renaming
<xentrac> oh well. I'm not going to go pester github about it
<jrtc27> if you rename the branch in GitHub though I think they do various bits of redirection
<jrtc27> so might be that they side-stepped that
<xentrac> oh, could be
Sos has quit [Quit: Leaving]
peepsalot has quit [Ping timeout: 268 seconds]
peepsalot has joined #riscv
aquijoule_ has joined #riscv
richbridger has quit [Ping timeout: 265 seconds]
davidlt has joined #riscv
FluffyMask has quit [Quit: WeeChat 2.9]
vagrantc has quit [Ping timeout: 272 seconds]
dionysos has quit [Ping timeout: 252 seconds]
riff_IRC has quit [Quit: PROTO-IRC v0.73a (C) 1988 NetSoft - Built on 11-13-1988 on AT&T System V]
hendursaga has joined #riscv
davidlt has quit [Ping timeout: 265 seconds]
dmang has quit [Ping timeout: 258 seconds]
dmang has joined #riscv
gector has joined #riscv
dmang has quit [Ping timeout: 272 seconds]
hendursaga has quit [Ping timeout: 244 seconds]
dmang has joined #riscv
gector has quit [Ping timeout: 258 seconds]
hendursaga has joined #riscv
jeancf_ has joined #riscv
jeancf_ has quit [Quit: Konversation terminated!]
helium-3 has joined #riscv
jeancf_ has joined #riscv
jeancf_ has quit [Client Quit]
jeancf_ has joined #riscv
helium-3 is now known as dionysos
jeancf_ has quit [Ping timeout: 265 seconds]
abelvesa_ has joined #riscv
abelvesa has quit [Ping timeout: 252 seconds]
jeancf_ has joined #riscv
jeancf_ has quit [Ping timeout: 272 seconds]
TMM_ has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
TMM_ has joined #riscv
frost has joined #riscv
choozy has joined #riscv
mahmutov has quit [Ping timeout: 250 seconds]
choozy has quit [Remote host closed the connection]
zjason` is now known as zjason
mahmutov has joined #riscv
FluffyMask has joined #riscv
gector has joined #riscv
mhorne has quit [Ping timeout: 252 seconds]
gector has quit [Ping timeout: 272 seconds]
Andre_H has joined #riscv
frost has quit [Quit: Connection closed]
gector has joined #riscv
mhorne has joined #riscv
vagrantc has joined #riscv
vagrantc has quit [Quit: leaving]
gector has quit [Ping timeout: 258 seconds]
gector has joined #riscv
riff-IRC has joined #riscv
TMM_ has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
TMM_ has joined #riscv
gector has quit [Ping timeout: 244 seconds]
gector has joined #riscv
smaeul has quit [Remote host closed the connection]
smaeul has joined #riscv
riff_IRC has joined #riscv
smaeul has quit [Quit: SIGPWR]
riff-IRC has quit [Ping timeout: 244 seconds]
<leah2> what kind of speeds do you get on nvme disks with the unmatched?
gector has quit [Ping timeout: 265 seconds]
<xentrac> unmatched speeds?
<leah2> i only seem to get 120mb/s which sounds a bit slow?
<xentrac> sorry, I was just joking. it does sound slow!
Sos has joined #riscv
<jrtc27> well how fast is memcpy for you?
<geist> also may only be a pcie 1x link?
<jrtc27> that should give you an upper bound
<jrtc27> I think it's 2x for the NVMe
<jrtc27> or maybe it was 4x
<geist> ah yeah that should be good. even with 1x pcie 2.0 that'd be 500MB/sec
<geist> always have to go back to the table to see
<leah2> jrtc27: what's a good benchmark for that?
<jrtc27> yeah x4 for NVMe, x1 for the M.2 E, x2 for the xHCI
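The 500MB/sec figure for a single PCIe 2.0 lane follows from the line rate and the 8b/10b encoding overhead. A small sketch of that arithmetic for the first three PCIe generations; the log does not say which generation the slot runs at, and protocol overhead above the line encoding is ignored, so these are upper bounds only. The function name is hypothetical.

```python
# generation -> (transfers per second per lane, payload bits, total bits on the wire)
PCIE_GENERATIONS = {
    1: (2.5e9, 8, 10),     # 2.5 GT/s, 8b/10b encoding
    2: (5.0e9, 8, 10),     # 5 GT/s, 8b/10b encoding
    3: (8.0e9, 128, 130),  # 8 GT/s, 128b/130b encoding
}


def pcie_mb_per_s(generation, lanes):
    """Raw link bandwidth in MB/s (10^6 bytes), before packet overhead."""
    rate, payload_bits, total_bits = PCIE_GENERATIONS[generation]
    bits_per_s = rate * payload_bits / total_bits * lanes
    return bits_per_s / 8 / 1e6
```

For example, `pcie_mb_per_s(2, 1)` gives 500 MB/s, matching the number quoted above, and an x4 link scales that linearly.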
<jrtc27> leah2: dd if=/dev/zero of=/dev/null (with appropriate bs and count) does more than a memcpy but is closer to real I/O
<leah2> gives me 1GB/s which sounds realistic
<leah2> [ 3.386049] nvme nvme0: 4/0/0 default/read/poll queues
<geist> yah 'pv' is a nice app for a quickie benchmark: 'pv /dev/zero > /dev/null'
<leah2> yeah i used pv :)
<xentrac> ha cool
<jrtc27> too scared of dd? :P
<xentrac> pv tells you the answer before it exits
<geist> it gives you a running display which is nice
<jrtc27> so does dd if you killall -INFO it
<jrtc27> or, if you're on FreeBSD, just press ^T
<geist> also helpful for quickie benchmarks like 'pv /dev/zero | md5sum'
<xentrac> oh neat, I didn't know that about dd
<xentrac> I miss ^T from VMS
<xentrac> (because I'm not running BSD, of course. self-inflicted injury)
<geist> pv has a kinda neat thing that it also uses splice() fairly aggressively on linux, so sometimes depending on what you're piping from and to, it avoids a copy
<geist> so pv /dev/zero > /dev/null *probably* just splices between those two fds
<geist> OTOH, that may also skew your particular benchmark here, depending on if linux decides to short circuit that internally
<leah2> i'll try fio later
<leah2> but machine is building rust atm
<xentrac> thanks jrtc27!
<geist> oh that's slow no matter what arch you're on!
<jrtc27> macOS also has ^T
<jrtc27> it's really just Linux that sucks here
<geist> yah that's why i dont get too invested in particular flavors of dd
<geist> it's like tar, you're always finding that your version doesn't have this or that
<jrtc27> it's the OS not supporting ^T for SIGINFO, not a property of the dd
<geist> sure, but same result
<xentrac> Linux doesn't support ^Y either, which I also used to miss a lot
riff_IRC is now known as riff-IRC
<xentrac> although I don't actually know if ^Y is useful with ssh
<geist> hmm, what is ^Y supposed to do?
<xentrac> dsusp
<geist> and yeah VMS DCL and whatnot is pretty neat
<xentrac> it sends SIGTSTP like ^Z, but not to the tty pgrp, but rather to whatever process tries to read the poisoned ^Y
<geist> the job control is wonky if you came from unix, but it's pretty powerful once you grok it
<jrtc27> (also I meant -USR1 for Linux, not -INFO, because Linux doesn't have SIGINFO, except on alpha because Linux's ABI is a mess)
<xentrac> so in particular you could rsh hercules, start some stuff on hercules, ~^Y, rsh hephaestus, and do stuff on hephaestus, while still seeing the output from whatever you were doing on hercules
<xentrac> because only the outbound rsh process got suspended, not the inbound half
FluffyMask has quit [Quit: WeeChat 2.9]
<xentrac> ssh doesn't do the forking into two unidirectional processes thing that rsh did, so I don't think it would work
<xentrac> I never learned to use DCL job control. I liked DCL a lot but didn't understand it much. but I was just a kid
<xentrac> BSD does have ^Y. not sure about MacOS
<jrtc27> ^Y is the opposite of ^U for me
<jimwilson_> nvme speed is discussed in this forum thread, with iflag=direct and bs=1024 you should get close to 2GB/s, https://forums.sifive.com/t/ssd-performance/4850/3
<geist> ah actually more specifically bs=1024k
<geist> that's kinda expected, IMO. far fewer syscalls than the probable default of 512
<geist> (1MB instead of 512)
<xentrac> yeah, Emacs ^Y is "yank" (paste "killed" text), and bash and zsh copied that. I think tcsh too
<xentrac> amusingly enough Emacs doesn't have convenient keys for either Unix's ^U (originally @) or ^W (which I think was added in BSD, post-printing-teletype)
<xentrac> I end up using alt-← for ^W in Emacs (which also works in bash's readline)
<xentrac> 2GB/s sounds significantly different from 0.12GB/s. does that help, leah2?
gector has joined #riscv
<leah2> so i think something is wrong here
<leah2> READ: bw=75.7MiB/s (79.4MB/s), 75.7MiB/s-75.7MiB/s (79.4MB/s-79.4MB/s), io=3070MiB (3219MB), run=40559-40559msec
<leah2> WRITE: bw=25.3MiB/s (26.5MB/s), 25.3MiB/s-25.3MiB/s (26.5MB/s-26.5MB/s), io=1026MiB (1076MB), run=40559-40559msec
<leah2> mixed rw test. it just adds up to 100mb/s
<leah2> split:
<leah2> READ: bw=117MiB/s (123MB/s), 117MiB/s-117MiB/s (123MB/s-123MB/s), io=4096MiB (4295MB), run=34945-34945msec
<leah2> WRITE: bw=78.4MiB/s (82.2MB/s), 78.4MiB/s-78.4MiB/s (82.2MB/s-82.2MB/s), io=4096MiB (4295MB), run=52238-52238msec
<xentrac> sounds like a 100MiB/s bottleneck somewhere, yeah
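The bandwidth figures in the fio summary lines can be reproduced from the `io=` and `run=` fields, which is a quick way to sanity-check them. A minimal sketch using the numbers pasted above; `mib_per_s` is a made-up helper name.

```python
def mib_per_s(io_mib, runtime_ms):
    """Average throughput in MiB/s from fio's io= (MiB) and run= (msec) fields."""
    return io_mib / (runtime_ms / 1000)


# Mixed read/write run: both directions share the same 40559 ms.
mixed_read = mib_per_s(3070, 40559)
mixed_write = mib_per_s(1026, 40559)

# Split runs: 4096 MiB each.
split_read = mib_per_s(4096, 34945)
split_write = mib_per_s(4096, 52238)
```

The mixed read and write rates computed this way add up to roughly 100 MiB/s, which is the sum pointed out above.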
<leah2> hmm
<leah2> dd if=/dev/nvme0n1p1 of=/dev/null bs=1024k status=progress iflag=direct iflag=fullblock gives me 1.6GB/s tho
<leah2> ok, note that the fio tests random access
<geist> the block size of the access probably matters a lot here
<geist> the bs=1024k is doing a syscall per 1MB
<geist> vs whatever block size the other tests are
<geist> i bet if you start lowering that bs to 256k then 64k etc you'll see the speed roll off
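A small sketch of the block-size effect described above: reading the same file with smaller blocks needs proportionally more `read` calls. This is only an illustration, not a disk benchmark: it goes through the page cache (no equivalent of `iflag=direct`), and the function name and the demo file are assumptions.

```python
import os
import tempfile
import time


def read_all(path, block_size):
    """Read a whole file with os.read() calls of block_size bytes.

    Returns (total_bytes, read_calls, seconds). The call count includes
    the final read that returns no data at end of file.
    """
    fd = os.open(path, os.O_RDONLY)
    total = 0
    calls = 0
    start = time.perf_counter()
    try:
        while True:
            chunk = os.read(fd, block_size)
            calls += 1
            if not chunk:
                break
            total += len(chunk)
    finally:
        os.close(fd)
    return total, calls, time.perf_counter() - start


if __name__ == "__main__":
    # Demo on a 4 MiB scratch file; sizes mirror the ones mentioned in the log.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"\0" * (4 << 20))
    try:
        for bs in (512, 64 << 10, 256 << 10, 1 << 20):
            total, calls, seconds = read_all(f.name, bs)
            print(f"bs={bs:>8}: {calls:>5} read calls, {seconds:.4f} s")
    finally:
        os.unlink(f.name)
```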
<dh`> ^U in emacs is ^A^K, which is deeply wired into emacs users' fingers to the extent that running screen means your windows disappear without realizing what happened
<dh`> (^A is screen's attention key and ^K means kill this screen)
<xentrac> more recent versions of screen have rebound that to ^Aky
<xentrac> because it used to be a real annoyance
<dh`> I solved that problem by not using screen
<dh`> I still don't understand why screen never fixed their shit so you could use a function key as the attention key
<dh`> it has to be a single byte, not an escape sequence
<xentrac> my cousin rebound F1 to ^A and ^A to ^Aa in his xterm, problem solved
<sorear> hah
<xentrac> not all his xterms, only the ones he launches to connect to screen sessions
<xentrac> using a function key as the attention key in screen itself requires some kind of timeout mechanism to decide when to just pass the initial ^[ on to vi rather than waiting for the rest of the function key escape sequence
<dh`> sure
<dh`> but every other damn program does that, why can't screen?
<xentrac> well, most don't; instead they just don't use ^[ by itself
<dh`> lots do, including vi
<xentrac> vim and irssi do the timeout thing, vi doesn't last I checked
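A minimal sketch of the timeout heuristic being discussed, assuming input arrives as (timestamp, byte) pairs and recognising only the `Esc [ A` sequence for the up arrow. The 50 ms threshold and the function name are made up for illustration; real programs differ in both.

```python
ESC = 0x1B
UP_TAIL = (0x5B, 0x41)  # '[' and 'A', the rest of the up-arrow sequence


def decode_keys(events, timeout=0.05):
    """Turn (timestamp_seconds, byte) pairs into key names.

    ESC followed quickly enough by '[' 'A' is reported as "UP";
    otherwise ESC is passed through as a key of its own.
    """
    keys = []
    i = 0
    while i < len(events):
        t, byte = events[i]
        if byte == ESC and i + 2 < len(events):
            (t1, b1), (t2, b2) = events[i + 1], events[i + 2]
            if (b1, b2) == UP_TAIL and t1 - t <= timeout and t2 - t1 <= timeout:
                keys.append("UP")
                i += 3
                continue
        keys.append("ESC" if byte == ESC else chr(byte))
        i += 1
    return keys


# An arrow key sent by the terminal: three bytes back to back.
arrow = [(0.000, 0x1B), (0.001, 0x5B), (0.002, 0x41)]
# The same bytes typed by hand, with human-scale gaps.
typed = [(0.0, 0x1B), (0.4, 0x5B), (0.8, 0x41)]
# An arrow key whose first byte was separated from the rest by network delay.
delayed_arrow = [(0.0, 0x1B), (0.2, 0x5B), (0.201, 0x41)]
```

The delayed case decodes the same way as the hand-typed one, which is the kind of wrong guess that latency and jitter produce.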
<dh`> I mean, this is stupid and should have been fixed at the os level 40 years ago
<dh`> it has to, there is no other way to use esc as a keystroke
<xentrac> it's profoundly annoying in irssi because in irssi network latency makes it guess wrong pretty often
<xentrac> sure there is, you can support esc and not support function keys
gector has quit [Ping timeout: 268 seconds]
<xentrac> as for ^A^K, by default I think of a two-chord sequence as being profoundly different from a one-chord sequence, but I guess that's mostly for things that repeat, like ^W^W^W. and repeating ^U doesn't make sense
<dh`> that's not a viable proposition for programs that actually do any kind of input editing
<xentrac> not sure what you mean by input editing but vi spent decades supporting esc and not supporting function keys
<xentrac> I don't know if the OS really needs to be involved with solving this. you just need a protocol for sending streams of keystrokes and maybe other events like touchmove events that doesn't have this kind of parsing ambiguity
<dh`> the OS needs to be involved with this because it's the OS that sends the input stream
<dh`> anyway yeah true, archaic vi doesn't support anything that has an escape prefix
<xentrac> maybe in a virtual console, but I'm typing this in gnome-terminal
<dh`> which gets its input from a pty
<xentrac> well no, it sends my keystrokes to a pty
<xentrac> it gets the keystrokes as input over its socket to the X server
<dh`> yes, and the pty munges them because that's what unix ttys do
<xentrac> (it does get input from a pty but that input isn't keystrokes, it's screen contents)
<xentrac> a little, right now the pty is in raw mode because I'm running ssh on it
<dh`> anyway the standards for what you read from ptys as input keystrokes are an OS thing
<xentrac> potentially? I mean historically Unix treats them as a private matter between the terminal and the application
<dh`> which is why the situation remains broken, because nobody has the authority to fix it
<dh`> yes and no, the mapping between input sequences and any kind of useful keystroke concept beyond ascii sits in libcurses
<xentrac> sure, if you consider curses part of the operating system. and, hey, not only curses but also terminfo are in posix
<xentrac> my design for Wercam sends the keystrokes over a seqpacket socket rather than a pty and represents key events with packets containing "key up %d %n" or "key down %d x", where %d is a numeric scan code from the USB HID standard and x is a URL-encoded string which represents its default UTF-8 value
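A hedged sketch of what packets in the format described above might look like, with one event per packet so there is nothing to parse across packet boundaries. Only the "key up/down, scan code, URL-encoded string" shape comes from the description; the function names, the ASCII framing, and the error handling are assumptions, not Wercam's actual code.

```python
from urllib.parse import quote, unquote


def encode_key_event(direction, scan_code, text):
    """Build one packet: b'key <up|down> <scan code> <url-encoded text>'."""
    if direction not in ("up", "down"):
        raise ValueError("direction must be 'up' or 'down'")
    # safe='' so that spaces and slashes in the text are percent-encoded too.
    return f"key {direction} {scan_code} {quote(text, safe='')}".encode("ascii")


def decode_key_event(packet):
    """Parse one packet back into (direction, scan_code, text)."""
    word, direction, code, text = packet.decode("ascii").split(" ")
    if word != "key" or direction not in ("up", "down"):
        raise ValueError("not a key event packet")
    return direction, int(code), unquote(text)
```

For example, `encode_key_event("down", 4, "a")` yields `b"key down 4 a"`, where 4 is the USB HID usage ID for the A key.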
<dh`> curses is definitely part of the operating system
<xentrac> it's just a library. it doesn't have anything to do with securely multiplexing resources
<dh`> next you're going to say /bin/sh isn't part of the operating system
<xentrac> agreed, it's not
<xentrac> although there are lots of historical operating systems where the shell *was* part of the operating system
<xentrac> it's just a semantic argument, though
<dh`> this is a silly thing to argue about, but there is historical practice
<xentrac> I recognize that the sense I'm using "operating system" in is a bit old-fashioned
<xentrac> so in the broader sense of "functionality shared between many or all applications" certainly key event encoding is part of the operating system
<xentrac> oh I see I wrote "key up %d %n" where I meant "key up %d x"
<dh`> anyway this is all bust in unix because it was stuffed into curses 40+ years ago when doing it properly would have been unacceptably expensive, and nobody since has taken the trouble to make it work nicely
<xentrac> at the time it *wasn't* a matter of the OS either
<xentrac> because your VT100 was what was encoding your keystrokes into a bytestream, with the ambiguity already baked in
<xentrac> and the OS didn't have any control over that
<dh`> yes, but instead of interpreting it in a driver like it should have been done, it was passed straight to applications
<dh`> just like printers in msdos
<xentrac> you mean, it should have been done in the kernel instead of in a library?
<xentrac> that would help in a few cases but it doesn't help the underlying ambiguity problem
<dh`> no but it isolates the problem so it can be fixed rather than making it part of the standard os/application interface
<xentrac> now you've switched back to using *my* definition of operating system, it seems ;)
<dh`> no, because if everything used the input interpreter in curses there wouldn't be a problem
<xentrac> all the ridiculous complexity of curses and termcap and later terminfo was an attempt to work around the inability to reprogram the commonly used terminals to do more convenient things
<dh`> but they don't
<dh`> for various reasons
<xentrac> curses inherently can't tell the difference between me pressing ↑ and typing Esc [ A
<xentrac> on a VT100 or on a modern emulator of it like gnome-terminal
<dh`> neither can anything the way the world is structured
<xentrac> so there would still be a problem
<dh`> nardly
<dh`> er
<dh`> hardly
<dh`> because if this had been fixed properly when it should have been, today typing esc [ A would cause curses to feed you ESCAPE LEFT-BRACKET A
<xentrac> yeah, that would be nice. and actually curses does guess and get that right most of the time
<dh`> whereas pushing the up arrow would cause you to receive UP
<dh`> today the escape sequences are generated in your keyboard driver in order to be ambiguous for curses to try to cope with
<xentrac> but at low baud rates and particularly with packet loss and retransmission and jitter, it doesn't work reliably enough
<dh`> well yes
<xentrac> which of course leads to application writers trying to fix it
<xentrac> and since it can't be fixed they just end up making different tradeoffs
<dh`> and you end up with more and more layers of hack pasted on because nobody can be arsed to fix it properly
<xentrac> which is why every once in a while I'll send a message in irssi beginning with U preceded by a different message I'd decided not to send
<xentrac> because the lack of delay around my ^U caused by TCP retransmission made irssi decide that I was pasting text (another thing not initially contemplated)
<xentrac> or there will be literally a [[A or something in there
jotweh has quit [Ping timeout: 258 seconds]
<xentrac> I think the basic reason we're still dealing with unfixable workarounds for 50-year-old protocol design errors like this is that nobody's come up with a protocol that's a Pareto improvement
<dh`> nobody's seriously tried
<dh`> how much work would it be to add a tty mode that produces useful input symbols?
<dh`> might take a whole weekend.
jotweh has joined #riscv
<xentrac> you don't need to modify the kernel; you just need to change the protocol that applications like IRC clients and text editors speak to the "terminal emulator"
<xentrac> I wrote a bit about the problem a couple of months ago in https://news.ycombinator.com/item?id=26815196
<dh`> yes you do, because the input characters ultimately come from the kernel
<xentrac> to a great extent what has happened is that DHTML has displaced VT100 emulation, for better or worse
<xentrac> what comes ultimately from the kernel in this case are USB HID input events, not characters
<dh`> when you're in X
<xentrac> Wayland too
<xentrac> it's true that if you're on a virtual console it's the kernel that does it
<xentrac> the X server (in my case) transforms those into XKeyEvents, and then gnome-terminal (via GTK) does the translation to ASCII
<xentrac> if you have an application that's running on a virtual console, though, it doesn't have to suffer through the kernel's lossy transformation to ASCII; it can read the key events from (on Linux) /dev/input/*
<xentrac> anyway, defining an unambiguous representation for <k><e><y><s><r><BS><t><r><o><k><e><s> isn't the hard part; it's changing all the applications to use it
<dh`> yes, but the raw console interface isn't standard
<dh`> anyway there are only about a dozen things to patch to get a large amount of traction (just readline, curses, vim, and emacs will go a long way)
<xentrac> I compiled a longer list at the link above
<GreaseMonkey> if i'm reading this correctly, what you'd want to do here is roll your own termcap config
<GreaseMonkey> see how far you get with that
<GreaseMonkey> ...also i'm curious as to if anyone's working on to-riscv dynamic recompilers, currently i'm having a go at writing a backend for dosbox-staging
<xentrac> no, that doesn't address the problem at all
<sorear> there's a qemu tcg backend
<sorear> haven't tried to use it
<GreaseMonkey> a fun part is as part of the process of writing one you end up with gold like this:
<GreaseMonkey> => 0x0000003fb1ffb3cc: lbu a0,0(zero) # 0x0
<GreaseMonkey> (i accidentally fed in a pointer to what expected a register)
<xentrac> heh
<GreaseMonkey> i really do need to confirm if i'm actually using a dynarec when using qemu
<GreaseMonkey> i mean, qemu works, but don't expect performance miracles
<jrtc27> yeah I've done that in LLVM before and got immediates and registers confused, though -verify-machineinstrs is a godsend for finding that kind of thing
<GreaseMonkey> i'm also quite impressed with dosbox's admittedly slightly wonky core_dynrec core, backends are about 1000 lines each
<GreaseMonkey> could be made smaller almost trivially, but still a pretty good effort
<sorear> if you didn't build it with --enable-tcg-interpreter and you don't have KVM loaded (and you're not in an environment where HVF/etc is applicable), you can establish by exclusion that you're using a dynrec
<GreaseMonkey> alright, probably got a dynarec then
<jrtc27> I think it's safe to say that GreaseMonkey isn't running RISC-V binaries on HVF :P
<jrtc27> Xen is technically kinda a thing
<GreaseMonkey> the only "hypervisor" i've got here is OpenSBI
<GreaseMonkey> speaking of which, the unaligned memory access code is in dire need of some optimisation
<GreaseMonkey> a 64-bit load is done as 8 individual "temporarily give ourselves U-mode privileges and suppress traps" byte loads
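A sketch of what "a 64-bit load as 8 byte loads" amounts to, assuming little-endian memory modelled as a `bytes` object. This only illustrates the data path; it is not the firmware's code, and the privilege switching and trap suppression per byte are not modelled.

```python
def load_u64_bytewise(memory, address):
    """Assemble a little-endian 64-bit value from eight single-byte loads."""
    value = 0
    for i in range(8):
        # One byte load per iteration; byte i lands in bits 8*i .. 8*i+7.
        value |= memory[address + i] << (8 * i)
    return value
```

Since no individual access is wider than a byte, the address can have any alignment, at the cost of eight loads instead of one.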
<jrtc27> opensbi is not a hypervisor...
<GreaseMonkey> yeah, hence the "quotes"
<jrtc27> yeah, uh, don't do unaligned accesses
<jrtc27> just because it works doesn't mean it's fast and a good idea
<jrtc27> really they should've just been banned like sparc did
<jrtc27> but then you break crappy software and hurt adoption of your new architecture
<GreaseMonkey> yeah my intention is to avoid them in my dynarec backend
<GreaseMonkey> and also nowadays there's plenty of code that runs on ARM that had to go via cores which didn't support unaligned accesses
<GreaseMonkey> GCC handles those things decently
<jrtc27> arm has supported unaligned accesses for ages
<jrtc27> the weirdo rotation on unaligned accesses is in the past
<GreaseMonkey> except it's not, because the Cortex-M0+ exists
<GreaseMonkey> it's in an alternate timeline, but it's not merely in the past
<jrtc27> no, there it faults
<jrtc27> which is the correct behaviour
<jrtc27> I'm talking about the pre-armv4(?) behaviour where loading 4 bytes from address 0x1003 would load 4 bytes from address 0x1000 and rotate it by 24 bits
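The rotate behaviour described above can be sketched as follows, assuming little-endian memory modelled as a `bytes` object whose index 0 corresponds to address 0x1000. The function name is made up, and which architecture versions did this is left to the discussion that follows.

```python
def legacy_rotating_load(memory, address, base=0x1000):
    """Model a 32-bit load that ignores the low two address bits
    and rotates the aligned word right by 8 * (address & 3) bits."""
    aligned = address & ~3
    offset = aligned - base
    word = int.from_bytes(memory[offset:offset + 4], "little")
    rotation = 8 * (address & 3)
    if rotation == 0:
        return word
    return ((word >> rotation) | (word << (32 - rotation))) & 0xFFFFFFFF


# Bytes 0x00 0x11 0x22 0x33 at 0x1000..0x1003.
memory = bytes([0x00, 0x11, 0x22, 0x33])
result = legacy_rotating_load(memory, 0x1003)
```

Here `result` is 0x22110033: the byte at 0x1003 ends up in the low byte, but the upper bytes come from 0x1000..0x1002 rather than from 0x1004 onwards.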
<geist> yah pre-armv5 indeed
<jrtc27> pre-armv6 apparently, that's newer than I thought...
<geist> they added an ability to generate a fault in v5 i think, and then in v6 they started to add the ability to just deal with unaligned, etc
<geist> and v7 made it the default
<jrtc27> ah that sounds more like the right timeline
<jrtc27> the old behaviour actually made sense from a hardware perspective though :P
<dh`> I blame the mips lwl/lwr patent rubbish
<geist> oh yeah? there was a period where it was dangerous to implement it?
<dh`> yes
<geist> ah that's interesting. probably mostly impacting other load/store architectures more than CISC ones?
<dh`> idk
<geist> never heard of it, but totally not surprised
<dh`> I didn't hear about it until years after
<jrtc27> alpha's equivalent sucked
<geist> just not allowing you to do unaligned at all and only on a 64bit boundary?
<jrtc27> but was probably done that way so there were no data dependencies for the loads
<geist> (EV4 at least)
<GreaseMonkey> ...oh right
<GreaseMonkey> if i understand correctly, ARMv7-M goes for the 32-bit -> 16+16 or 8+16+8 approach for unaligned accesses
<jrtc27> also lwl/lwr only worked for 32-bit values, the 16-bit version was more verbose
<GreaseMonkey> ah yes, 16-bit, the curse of many a 32-bit or 64-bit RISC machine
<dh`> because so much code does 16-bit accesses :-)
<dh`> though there was more when mips was invented
<jrtc27> lots of short's in the TCP/IP stack
<GreaseMonkey> riscv64 gives you 32-bit sign extends as ADDIW rd, rs, 0
<GreaseMonkey> and 16-bit sign extends as two shifts
<geist> alpha EV4 is hilarious to watch the codegen for string routines. it's always doing 64bit load/stores and a bunch of shifting and masking
<GreaseMonkey> although 8-bit sign extends are also two shifts but at least the zero extends are one op
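The shift-based extensions mentioned above can be modelled on 64-bit register values like this. It is a sketch of the arithmetic only, assuming RV64 and registers represented as unsigned Python integers; the helper names are made up.

```python
MASK64 = (1 << 64) - 1


def sll(x, n):
    """Shift left logical on a 64-bit register value."""
    return (x << n) & MASK64


def sra(x, n):
    """Shift right arithmetic on a 64-bit register value."""
    signed = x - (1 << 64) if x >> 63 else x
    return (signed >> n) & MASK64


def sext_h(x):
    """16-bit sign extend as two shifts: slli by 48, then srai by 48."""
    return sra(sll(x, 48), 48)


def sext_b(x):
    """8-bit sign extend as two shifts: slli by 56, then srai by 56."""
    return sra(sll(x, 56), 56)


def sext_w(x):
    """32-bit sign extend; the result ADDIW rd, rs, 0 produces."""
    return sra(sll(x, 32), 32)


def zext_b(x):
    """8-bit zero extend is a single AND with 0xff."""
    return x & 0xFF
```

For instance, `sext_h(0x8000)` gives 0xFFFFFFFFFFFF8000, while `sext_h(0x7FFF)` leaves the value unchanged.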
<jrtc27> bitmanip adds single-instruction zext.[hw] and sext.[bh]
<jrtc27> all in Zbb, and ext.[hw] are in Zbp
<jrtc27> *zext.[hw]
<jrtc27> (and if you don't have the relevant extension, they're implemented as pseudoinstructions that expand to the shifts in the assembler)
<GreaseMonkey> what i've seen of bitmanip is a nice mix of impressive and weird
* sorear still unsold on the "single bit" and "shift ones" instructions, does anyone else have or use those
Andre_H has quit [Quit: Leaving.]