#riscv on 2021-06-23 — irc logs at libera.irclog.whitequark.org

2021-05-20 20:58 sorear changed the topic of #riscv to: RISC-V instruction set architecture | https://riscv.org | Logs: https://libera.irclog.whitequark.org/riscv | Backup if libera.chat and freenode fall over: irc.oftc.net

00:01 <jrtc27> yeah no

00:01 <jrtc27> H was held up because people felt it was dependent on AIA

00:02 <jrtc27> or, really, they wanted to make sure that AIA wouldn't cause issues with a premature H ratification

00:02 <jrtc27> but, that's silly, H is pretty simple

00:02 <xentrac> I think probably someone is confusing the H extension with H mode

00:02 <xentrac> again

00:03 <sorear> let's not ascribe to malice that which can be explained by the universal slowness of bureaucracies

00:04 <jrtc27> exactly

00:04 crabbedhaloablut has quit [Ping timeout: 244 seconds]

00:04 <jrtc27> save the tin foil hats for another day

00:04 crabbedhaloablut has joined #riscv

00:06 <xentrac> do we really want bureaucracies on the critical path for technical progress then?

00:06 <jrtc27> yes, because the alternative is a clusterfuck of a mess where there's zero standardisation

00:07 <jrtc27> and it's not really bureaucracy in this case, it's just that many involved are volunteers so progress is slow

00:07 <xentrac> that sounds implausible

00:08 <jrtc27> who decides what the standard extensions are if there's zero bureaucracy?

00:08 <xentrac> (that the only alternative is zero standardization, I mean, not the volunteers)

00:08 <xentrac> there have been a lot of alternatives tried for that in the past. some have worked better than others

00:09 <xentrac> right now I'm reading through the PDF spec, which came out of Adobe's bureaucracy and then miraculously got pushed through as ISO 32000, and it's an amazing clusterfuck of a mess

00:10 <jrtc27> that's the complete opposite

00:10 <xentrac> did you know that in PDF strings, a line break represents 0x0a, regardless of whether the PDF file encodes it as 0x0a, 0x0d, or 0x0d 0x0a, while in annotation text, you have to separate paragraphs with 0x0d characters?
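A minimal sketch of the literal-string rule xentrac describes, assuming the ISO 32000-1 reading: an unescaped CR, LF, or CR LF inside a literal string is read as a single 0x0A byte, and a backslash immediately followed by an end-of-line marker is a line continuation that contributes nothing. The function name is hypothetical, and other escape sequences are deliberately left undecoded.

```cpp
#include <cstddef>
#include <string>

// Normalizes end-of-line markers in the raw bytes of a PDF literal string
// (the text between the outer parentheses). Sketch only: escapes other than
// backslash-EOL (\n, \(, \ddd, ...) are copied through for a later pass.
std::string normalize_literal_eols(const std::string& raw) {
    std::string out;
    out.reserve(raw.size());
    for (std::size_t i = 0; i < raw.size(); ++i) {
        const char c = raw[i];
        if (c == '\\' && i + 1 < raw.size()) {
            const char next = raw[i + 1];
            if (next == '\r' || next == '\n') {
                // Line continuation: drop the backslash and the whole EOL marker.
                ++i;
                if (next == '\r' && i + 1 < raw.size() && raw[i + 1] == '\n') ++i;
                continue;
            }
            // Any other escape: keep both bytes untouched.
            out += c;
            out += next;
            ++i;
            continue;
        }
        if (c == '\r') {
            out += '\n';  // CR alone or CR LF both become one 0x0A
            if (i + 1 < raw.size() && raw[i + 1] == '\n') ++i;
        } else {
            out += c;  // includes a bare LF, which is already 0x0A
        }
    }
    return out;
}
```

The annotation-text convention (paragraphs separated by 0x0D) would need the opposite mapping, which is exactly the inconsistency being complained about.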

00:10 <jrtc27> developed behind closed doors then shoved into the open and declared a standard

00:10 <xentrac> yup. and it's the epitome of bureaucracy

00:10 <jrtc27> your argument seems very confused

00:11 <jrtc27> your example is entirely different to what you think the issue with riscv is

00:11 <jrtc27> bureaucracy can encompass many things

00:11 <jrtc27> and be involved in many different ways

00:11 <xentrac> sorry, allow me to attempt to state my position more clearly; it's hardly surprising that the above is confusing!

00:16 <xentrac> sorear ascribes the slow development of the H extension to the universal slowness of bureaucracies. plausibly this is correct. slow development of technical progress is a drawback, because it hurts everyone everywhere; they benefit only once the progress is actually realized and available. you argue that the bureaucracy involved in the slow development is worth the drawback, because "the alternative is

00:17 <xentrac> a clusterfuck of a mess where there's zero standardisation". but in fact standardization without bureaucracy or development slowdowns or a clusterfuck of a mess is possible, and the available evidence suggests that adding more bureaucracy tends to increase clusterfucks of a mess, not decrease them, the PDF standard being today's exemplary shit sandwich.

00:18 <xentrac> you offered a second hypothesis: progress is slow because many involved are volunteers. many certainly are, but it's not clear that this is a situation that tends to slow progress

00:20 <xentrac> I mean, that's also the case with, for example, 3-D printers, and progress there is enormously more rapid than before the volunteers showed up

00:20 <xentrac> from my point of view, the crucial question is whether the situation enables the volunteers or other workers to build on one another's progress, or instead causes them to get in one another's way

00:21 <sorear> "progress" is a nasty term, can mean whatever people want it to

00:21 <xentrac> there's a whole ideological question there, it's true

00:22 <sorear> and there's nothing stopping you from building on others' work, the drafts exist today

00:22 <xentrac> maybe not; certainly you can tape out silicon implementing the draft extension

00:23 <sorear> if you want something that will be supported in X0 years with no further action on your part, then yes, you have a schelling point problem with a large number of stakeholders and it will take a while for that to reach equilibrium

00:24 <xentrac> yup

00:34 pierce has joined #riscv

00:37 <sorear> i could say some things about the real problem being trying to make an ISA be all things to all people

00:39 pierce has quit [Ping timeout: 256 seconds]

00:39 <xentrac> it's a real problem, yeah

00:39 <jrtc27> that's primarily because the embedded world is very different to everything else

00:40 <jrtc27> I wouldn't be opposed to the idea of two privileged specs, one embedded and one application

00:40 <jrtc27> but that ship has saild

00:40 <jrtc27> *ed

00:40 <xentrac> interesting, what's the relevant difference? paging?

00:41 <jrtc27> (and that's kind of what the fast interrupt group is trying to do, but you just can't do it the way they're trying to do and have it fit properly into the existing privileged ISA)

00:41 <jrtc27> paging is already optional

00:41 <xentrac> it is, yes

00:42 <xentrac> what's the friction with the existing privileged ISA and fast interrupts?

00:44 <jrtc27> https://github.com/riscv/riscv-fast-interrupt/issues/111 is the primary set of issues I have with it

00:44 <jrtc27> but the basic problem is that it's taking Arm-M's interrupt design and trying to bolt it onto RISC-V which looks more like Arm-A

00:45 <jrtc27> my primary objection is the fact that there are cases where xEPC ends up holding *data pointers* not *code pointers*

00:45 <jrtc27> which is complete nonsense

00:46 <jrtc27> a data pointer is never a program counter

00:46 <jrtc27> and thus should never be the exception program counter

00:47 <jrtc27> but, really, I still don't get why they have all this special handling for resumable interrupt handling

00:47 <jrtc27> because restarting entirely should function identically...

00:48 <jrtc27> the only real justification is possible hardware simplicity due to being able to break things up slightly more

00:49 <jrtc27> but... that's a poor one when it leads to such an insane architecture

00:49 <xentrac> agreed, a data pointer is never a program counter

00:50 <jrtc27> (specifically, if there is an access fault in loading the interrupt handler address from the handler table, both xepc and xtval get set to the address of the entry in the table...)
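A simplified, hypothetical model of the behaviour jrtc27 objects to: the hart loads the handler address from xtvt + (XLEN/8) × id, and if that load faults, both xepc and xtval receive the address of the table entry (a data address) rather than the PC of the interrupted instruction. RV32 is assumed, and the names VectorResult, ReadWord, and vector_via_table are illustrative only.

```cpp
#include <cstdint>
#include <functional>
#include <optional>

constexpr std::uint32_t kXlenBytes = 4;  // assumption: RV32

// Returns the word at `addr`, or nullopt to model an access fault.
using ReadWord = std::function<std::optional<std::uint32_t>(std::uint32_t addr)>;

struct VectorResult {
    bool faulted = false;
    std::uint32_t next_pc = 0;  // handler address on success
    std::uint32_t xepc = 0;     // only set on fault in this model
    std::uint32_t xtval = 0;    // only set on fault in this model
};

VectorResult vector_via_table(std::uint32_t xtvt, std::uint32_t id, const ReadWord& read) {
    const std::uint32_t entry = xtvt + kXlenBytes * id;
    VectorResult r;
    if (auto handler = read(entry)) {
        r.next_pc = *handler;  // jump to the loaded handler address
    } else {
        // The criticised behaviour: a data pointer ends up in xepc too.
        r.faulted = true;
        r.xepc = entry;
        r.xtval = entry;
    }
    return r;
}
```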

00:50 <jrtc27> (which is nuts)

00:50 <xentrac> shades of VAX

00:51 <jrtc27> (previously it was just xepc, which I pointed out was a real WAT for software to handle because access fault handlers will look at xtval already, so their "solution" was to just write it to both rather than fix the insanity)

00:52 <xentrac> krste is claiming "I don't believe in general we can avoid resuming table load (versus restart of original interrupted instruction)." but I don't understand his argument

00:53 <jrtc27> the arguments are (a) possible slight microarchitectural simplicity (b) edge-triggered interrupts, but if you lose them in that case *then your core is already broken because if you temporarily masked interrupts you would have lost that one*

00:54 <jrtc27> (i.e. the interrupt controller MUST latch edge-triggered interrupts internally and not view them as completed until it has successfully vectored to the handler)

00:54 <jrtc27> (which is a requirement *anyway* and renders (b) completely invalid an argument)
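A minimal sketch of the latching requirement stated above: an edge-triggered interrupt stays pending in the controller until the hart has successfully vectored to its handler. Masking, or a fault while loading the handler address, leaves the pending bit set, so the edge is not lost and vectoring can simply be restarted. The class and method names are hypothetical.

```cpp
#include <bitset>
#include <cstddef>

class EdgeLatch {
public:
    static constexpr std::size_t kLines = 32;

    void edge(std::size_t line) { pending_.set(line); }  // latch the edge
    void set_masked(std::size_t line, bool m) { masked_.set(line, m); }
    bool pending(std::size_t line) const { return pending_.test(line); }

    // Attempts to take the interrupt; `handler_load_ok` models whether the
    // handler address could be fetched. Returns true if the handler was entered.
    bool try_vector(std::size_t line, bool handler_load_ok) {
        if (!pending_.test(line) || masked_.test(line)) return false;
        if (!handler_load_ok) return false;  // fault: stay latched, retry later
        pending_.reset(line);                // clear only after vectoring succeeds
        return true;
    }

private:
    std::bitset<kLines> pending_;
    std::bitset<kLines> masked_;
};
```

The same pending bit covers both the masked case and the faulting-table-load case, which is the point being made: no extra resumable state is needed to avoid losing the edge.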

00:54 <xentrac> hmm, you don't have to implement interrupt masking in such a way as to fail to latch edge-triggered interrupts

00:55 <jrtc27> right, but the same thing that makes interrupt masking not lose interrupts is the same thing that makes trapping on loading from the handler table not lose interrupts

00:55 <xentrac> but I guess you could implement it in such a way as to clear the latch too early in a case like this

00:55 <xentrac> right

00:55 <jrtc27> yeah

00:55 <jrtc27> which is just crappy hardware

00:55 <jrtc27> not a spec issue

00:55 <jrtc27> well, it's a spec issue in that such hardware should be non-compliant

00:56 <xentrac> it sounds like it but i'm not confident enough to be sure whether you or krste is right

00:56 <xentrac> since you both know a lot more about hardware design than i do

00:56 <jrtc27> it's quite possible there is another reason that he *hasn't* said

00:56 <jrtc27> but what he *has* said does not change my mind

00:56 <jrtc27> and I will never be convinced that a data pointer should end up in xepc

00:56 <jrtc27> ever

00:56 <xentrac> agreed :)

00:56 <jrtc27> doesn't mean *my* proposal is necessarily the right one though

00:57 <jrtc27> if there are things I'm missing

00:59 <xentrac> it does make sense that fast interrupts would be critically important for real-time stuff in a way that they just aren't for a cellphone or laptop

01:00 <xentrac> the worst-case latency vs. average-case throughput tradeoff pervads systems design

01:00 <xentrac> *e

01:06 <sorear> it's not a data pointer

01:06 <sorear> it's an instruction in a special ISA mode where jumps have a 32-bit immediate

01:09 <sorear> and it's not reasonable to roll back the interrupt after a subsequent instruction page fault because the instruction has already been vectored

01:26 Sos has quit [Quit: Leaving]

02:07 <xentrac> haha

02:07 <xentrac> that's a novel concept

02:08 <xentrac> my ISA doesn't have variable-length instructions! it just has a bunch of mode switches that only ever stay on for a single instruction!

02:09 <sorear> there's only one reasonable way to do this. are you going to create an entire separate redundant set of paths for loading a word from memory, trapping if it's not accessible, and using it to redirect fetch, or are you going to use the one that's already there?

02:10 <xentrac> couldn't you just prohibit the OS from paging out the page tables its real-time interrupts need to find their interrupt handlers? maybe I'm misunderstanding the scenario

02:10 <xentrac> I feel like a page fault in an interrupt handler in response to a time-critical IRQ line is going to be bad news no matter what

02:11 <sorear> congratulations! you just introduced a hypervisor vulnerability

02:16 <xentrac> hmm, so you're running your real-time tasks under a real-time OS that's running in U-mode under an S-mode hypervisor?

02:17 <sorear> you need to not be able to break a hypervisor by doing weird things under it

02:17 <xentrac> but the hypervisor is letting the RTOS code set up page tables that it will then use to run interrupt handlers? I suppose the hypervisor has to translate both the page tables and the interrupt vectors before handing them off to the hardware, right?

02:17 <sorear> "why are you trying to run real-time tasks" is not a valid excuse for deadlocking or otherwise violating security properties

02:17 <xentrac> oh, agreed!

02:18 <xentrac> I mean I feel like if the hypervisor is letting the RTOS set up its own interrupt vectors that point to unmapped memory, the RTOS can probably get up to other mischief as well by pointing those interrupt vectors at random bits of hypervisor code

02:19 <sorear> why is there hypervisor code in the guest address space?

02:19 <xentrac> why would there be?

02:19 <sorear> you just claimed there was

02:19 <xentrac> I did?

02:20 <sorear> > pointing those interrupt vectors at random bits of hypervisor code

02:20 <sorear> you can't do that unless it's in the relevant address space

02:20 <xentrac> presumably the interrupts in question aren't going to be handled in U-mode after context-switching to an RTOS task address space, right?

02:20 <sorear> every part of this is wrong

02:20 <xentrac> hardly surprising!

02:20 <xentrac> what's the truth?

02:21 <sorear> let's start from scratch

02:21 <xentrac> okay!

02:25 <sorear> you have a core which implements the vectoring modes described in the base spec. you want to remove the 21-bit restriction on handler addresses that emerges from the use of normal jumps. the easy way to do this is to add 1 flop to your fetch/decode unit "we're fetching a handler address", then when you fetch a 32-bit word with that bit set, treat the whole thing as a jump
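Where the "21-bit restriction" comes from, as a sketch: in the base vectored mode each table slot is an ordinary instruction, typically `jal x0, handler`, and JAL carries a 21-bit signed, 2-byte-aligned offset, so the handler must lie within roughly ±1 MiB of its slot. encode_jal is an illustrative helper, not an assembler API.

```cpp
#include <cstdint>
#include <optional>

// Encodes `jal rd, offset`, or returns nullopt if the offset is out of range.
std::optional<std::uint32_t> encode_jal(unsigned rd, std::int32_t offset) {
    const std::int32_t kMin = -(1 << 20);      // -1 MiB
    const std::int32_t kMax = (1 << 20) - 2;   // +1 MiB - 2
    if (rd > 31 || offset < kMin || offset > kMax || (offset & 1)) return std::nullopt;
    const std::uint32_t imm = static_cast<std::uint32_t>(offset);
    return ((imm >> 20) & 0x1u) << 31     // imm[20]
         | ((imm >> 1) & 0x3FFu) << 21    // imm[10:1]
         | ((imm >> 11) & 0x1u) << 20     // imm[11]
         | ((imm >> 12) & 0xFFu) << 12    // imm[19:12]
         | (rd & 0x1Fu) << 7              // rd
         | 0x6Fu;                         // JAL opcode
}
```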

02:25 <xentrac> okay

02:27 <sorear> there's a much harder way with no real advantages where when the interrupt arrives you stop executing instructions, inject a command into the data memory system to fetch a vector, then use a dedicated state machine to vector to that when it arrives... now why would you do this on cores intended to compete with M4

02:27 <xentrac> we're talking about physical interrupts here, right? where some external hardware needs attention either very rarely or with very low latency?

02:28 <xentrac> couldn't you point the normal jumps at a table of two-instruction trampolines somewhere in the low 2 MiB of RAM?

02:28 <xentrac> I mean that costs you an extra pipeline flush and two more instructions; is that the cost we're trying to avoid?
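A sketch of the immediates such a trampoline would need, assuming it is `lui t0, hi; jalr x0, lo(t0)` on RV32 (the exact sequence is an assumption, not something stated above). JALR sign-extends its 12-bit immediate, so hi is rounded up whenever bit 11 of the address is set. The trampoline also clobbers t0, so a real handler would first need a free scratch register; that detail is not modelled.

```cpp
#include <cstdint>

struct HiLo {
    std::uint32_t hi20;  // value for LUI (upper 20 bits)
    std::int32_t lo12;   // signed 12-bit offset for JALR
};

// Splits an absolute address so that (hi20 << 12) + sign_extend(lo12) == addr.
HiLo split_hi_lo(std::uint32_t addr) {
    HiLo r;
    r.hi20 = ((addr + 0x800u) >> 12) & 0xFFFFFu;  // round up if bit 11 is set
    r.lo12 = static_cast<std::int32_t>(addr - (r.hi20 << 12));
    return r;
}
```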

02:29 <sorear> i think so, but I don't want to get into an argument about whether saving 10 cycles on an interrupt in 2021 is actually useful

02:29 <xentrac> it's definitely useful in some cases

02:30 ewdwasright has quit [Ping timeout: 265 seconds]

02:30 <xentrac> when we're handling these interrupts in the hypervisor scenario, the hypervisor might not be running the RTOS at all, right? it might be running some best-effort task, like Linux or something

02:31 gector has joined #riscv

02:31 <xentrac> so the current page table wouldn't even be the RTOS guest's page table (or any of them if it has more than one)?

02:31 <sorear> no-one said anything about actually running this stuff under hypervisors productively

02:32 <xentrac> well, if it's *unproductively*, then the hypervisor can emulate all the interrupt handling in software because it doesn't have to be fast, right?

02:33 <sorear> the simplest possible thing to do is to treat it like any other instruction fetch, which is what the issue does

02:33 <sorear> everything you've proposed complicates things

02:33 <xentrac> you mean jrtc27?

02:33 <jrtc27> if you're loading your handler address through your icache then you need to be veeeeeeeery careful about coherence

02:34 <sorear> no, I mean you, you keep saying bizarre things about hypervisors

02:34 <jrtc27> but yes I agree it makes sense to load it around the fetch unit not your load store unit

02:34 <jrtc27> but that's a microarchitectural implementation detail

02:34 <jrtc27> it shouldn't pollute the spec

02:34 <xentrac> the only thing I proposed was that it isn't worth worrying too much about what happens when your fast interrupt handler needs to get paged in from disk

02:34 <xentrac> I was trying to understand the concern you raised about hypervisor vulnerabilities

02:35 <xentrac> i'm asking these questions because I've never built a system like this and so I think it's very likely that I'm missing the forest for the trees

02:36 <xentrac> it sounds like you were talking about a case where the RTOS is running under a hypervisor, but not "productively", which I interpreted as either it doesn't have access to real hardware or it doesn't need to meet real-time deadlines; is that what you meant?

02:37 <sorear> there's no RTOS here. there's a hypervisor, and someone trying to break out of the hypervisor by entering states that you say "isn't worth worrying about what happens"

02:37 <sorear> anyway, we've already spent more time on this than CLIC will ever save

02:38 <xentrac> well, but they're states that the hypervisor has to prevent the guest from entering anyway, aren't they?

02:38 <sorear> I'm not interested in continuing this.

02:38 <sorear> &

02:38 <xentrac> okay. well, thank you for your explanation so far!

02:39 <xentrac> sorry I was so dense that I couldn't understand what you were trying to explain to me :(

02:39 <sorear> not a fault situation, I'm not upset and am not looking for an apology

02:40 <xentrac> okay!

02:42 <sorear> i'm just saying (1) it's important to specify _some_ behavior or set of behaviors in every possible situation so that security can be exhaustively analyzed (2) there's an obvious way to implement the functionality (3) the obvious implementation does a specific thing in off-nominal cases, which is good enough to specify

02:42 <sorear> if that helps, great, if not, can try again tomorrow, I'm exhausted for now

02:44 <sorear> jrtc27: i'm probably going to get around to auditing this approximately never but there does need to be a fence.i requirement yes

02:44 <jrtc27> which is.. awful :P

02:44 <sorear> vector table is going to be in ROM in relevant cases

02:45 <jrtc27> (your proposal of specifying it as a separate execution mode where every instruction is XLEN bytes and is a jump to that absolute address is an... interesting way of making everything "fit")

02:45 <xentrac> yeah, I agree with (1) and I think your argument for (2) is plausible

02:45 <xentrac> not sure about (3)

02:46 <sorear> sorry for my impatience

02:47 * jrtc27 has had fun discovering a hole in our C++ spatial safety implementation :)

02:47 <dh`> is it really faster to make a special-case table fetch like this vs. doing it as the first instruction of the trap handler?

02:47 <jrtc27> std::make_shared allocates the control block inline with the data being pointed to

02:47 <dh`> granted the latter requires jiggering all the bits so you can do it that way, which riscv doesn't have but mips did at one point

02:48 <sorear> i just don't think CLIC is especially useful, it only saves a handful of cycles over the baseline arch, it's a far less complete solution for "write executives in C" than arm-M has, and if you really cared about cycle-precise event handling in 2021 you'd be using some kind of efpga, which wasn't an option back when cortex-M was new

02:49 <jrtc27> yet it's implemented in the SiFive E whatever core

02:49 <sorear> yes, because they need to make a table of arm features and sifive features, doesn't matter if it's actually equivalently useful

02:49 <jrtc27> I agree it's really not very useful, but it exists, so I want to at least try and make it not awful

02:49 <jrtc27> not that I'll ever have to care about it

02:50 <jrtc27> CHERI-RISC-V can just stick with a sane CLINT

02:50 <jrtc27> and whatever AIA eventually ends up being

02:51 <sorear> anyway tell me more about shared_ptr

02:51 <jrtc27> and maybe by the time that ratification happens people will be starting to take CHERI seriously within RISC-V

02:51 <jrtc27> oh

02:51 <jrtc27> shared_ptr itself is fine

02:51 <jrtc27> but it has two jobs: store the pointer and track ref counts, both shared (strong) and weak

02:52 <jrtc27> if you do shared_ptr(new Foo) then you get a control block and a pointer to your data

02:52 <jrtc27> if you do make_shared<Foo>() then the control block and Foo are combined into one allocation

02:52 <jrtc27> so oops the bounds of the capability include the control block
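One way to see the difference is to count calls to the global allocation function. On mainstream standard libraries (libstdc++, libc++), `shared_ptr<Foo>(new Foo)` costs two allocations (Foo, then a separate control block), while `make_shared<Foo>()` uses one allocation holding both; the standard only recommends the single allocation. That single allocation is why a capability bounded to it also covers the ref counts. The helper and global names are illustrative.

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>

static std::size_t g_allocs = 0;

// Replaced global allocation functions that count allocations.
void* operator new(std::size_t n) {
    ++g_allocs;
    if (void* p = std::malloc(n ? n : 1)) return p;
    throw std::bad_alloc();
}
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

struct Foo { int payload[4] = {}; };

// Global sink so the allocations escape and cannot be optimised away.
static std::shared_ptr<Foo> g_sink;

std::size_t allocs_for_shared_ptr_new() {
    const std::size_t before = g_allocs;
    g_sink = std::shared_ptr<Foo>(new Foo);  // Foo + separate control block
    const std::size_t n = g_allocs - before;
    g_sink.reset();
    return n;
}

std::size_t allocs_for_make_shared() {
    const std::size_t before = g_allocs;
    g_sink = std::make_shared<Foo>();  // control block and Foo together
    const std::size_t n = g_allocs - before;
    g_sink.reset();
    return n;
}
```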

02:55 <sorear> is that a problem? is shared_ptr enforcing any kind of access boundary? people could implement their own shared_ptr without your mitigation

02:55 <jrtc27> it means you can corrupt ref counts

02:55 <jrtc27> and sure, they can, but they can pick up the pieces if they do that

02:55 <jrtc27> same as implementing their own custom allocator in general

02:56 <sorear> corrupting ref counts sounds like a temporal safety problem

02:56 gector has quit [Ping timeout: 252 seconds]

02:56 <jrtc27> yeah, my guess is this one is *probably* fine in practice if you've already got our heap temporal safety turned on

02:56 <jrtc27> either you bump the counts up too high and cause resource leaks

02:57 <jrtc27> or you decrease them and cause early free, which will either be safe due to quarantine or, post-revocation, deterministically trap

02:57 <jrtc27> which isn't great, but it's fail-safe, so long as DoS isn't a concern

02:58 <jrtc27> and, well, fail-stop is the way we roll

02:58 <jrtc27> and the only thing you really can do at that point

02:58 <sorear> if you're relying on caps for something, and you don't have temporal safety, I think you've already lost because you can hold on to the "payload" cap after the ref count hits zero

02:58 <jrtc27> agreed

02:58 <jrtc27> well, depends on your threat model

02:59 <xentrac> sorear: no apology necessary, you have no obligation to explain yourself to me. i'm very interested in the topic of how to avoid having hypervisor vulnerabilities, and if you have insights I can learn from, I'd be delighted, but you don't owe me any of your time

02:59 <jrtc27> but it's certainly important for stopping a lot of memory safety vulnerabilities from being exploitable

03:00 <jrtc27> anyway, I hear birds chirping, must mean it's time for me to sleep

03:01 <dh`> here at this time of year they start up at like 3am, it's crazy

03:02 <dh`> not that this invalidates the conclusion

03:03 <xentrac> not sure about the efpga thing. the cheapest microcontroller is 3¢, the cheapest 32-bit microcontroller is something like 140¢, and the cheapest FPGA is more like 180¢

03:04 <xentrac> interestingly the 12¢ version of the 3¢ microcontroller has a totally different answer to interrupt latency problems

03:05 <sorear> mouser has the ice40lp384 for 120¢ @ 1000 and that's a couple times bigger than what I have in mind

03:06 <xentrac> oh cool, that's cheaper than digi-key (@100)

03:06 <jrtc27> (also, it's interesting to note that this "optimisation" does have noticeable side-effects: weak_ptr's have to keep the control block alive after all shared_ptr's are gone, which means shared_ptr(new Foo) can delete the pointer and thus free the memory for Foo, but make_shared() can't free the storage without also freeing the control block, so can only run the destructor for Foo, keeping the memory still around until all weak references disappear)
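The side effect can be observed by counting calls to the global deallocation functions. In this sketch (names illustrative, behaviour as on mainstream standard libraries), dropping the last strong reference frees Foo's storage immediately with `shared_ptr<Foo>(new Foo)`, but with `make_shared` only the destructor runs at that point; the combined block is freed when the last `weak_ptr` goes away.

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>

static std::size_t g_frees = 0;

void* operator new(std::size_t n) {
    if (void* p = std::malloc(n ? n : 1)) return p;
    throw std::bad_alloc();
}
void operator delete(void* p) noexcept { if (p) ++g_frees; std::free(p); }
void operator delete(void* p, std::size_t) noexcept { if (p) ++g_frees; std::free(p); }

static bool g_destroyed = false;
struct Foo { ~Foo() { g_destroyed = true; } };

static Foo* volatile g_escape = nullptr;  // keeps the allocation observable

struct Frees { std::size_t after_strong_reset, after_weak_reset; bool destroyed_early; };

// Frees counted when the last shared_ptr drops, then when the last weak_ptr drops.
Frees measure(bool use_make_shared) {
    g_destroyed = false;
    std::shared_ptr<Foo> sp = use_make_shared ? std::make_shared<Foo>()
                                              : std::shared_ptr<Foo>(new Foo);
    g_escape = sp.get();
    std::weak_ptr<Foo> wp = sp;
    const std::size_t start = g_frees;
    sp.reset();                       // destructor runs here in both cases
    const std::size_t after_strong = g_frees - start;
    const bool destroyed = g_destroyed;
    wp.reset();                       // control block released here
    const std::size_t after_weak = g_frees - start - after_strong;
    return {after_strong, after_weak, destroyed};
}
```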

03:06 <xentrac> although maybe that's because they don't have the lp384 in stock, just the ul640

03:07 <xentrac> does mouser also have cheaper stm32 clones than that?

03:08 <sorear> but at this level you're mostly paying for the package, not the circuit

03:08 <sorear> didn't look

03:08 <xentrac> mostly but an avr is still 40¢

03:09 <xentrac> the 12¢ padauk chips use round-robin hardware multithreading with the idea that you can dedicate one of the threads to busy-waiting on your I/O when necessary, so you have worst-case response of 125ns with a 16MHz clock
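Rough worked arithmetic for the 125 ns figure, under assumptions not stated in the log: strict round-robin issue, so with N hardware threads each thread issues one instruction every N cycles, and the worst case is one full pass of the polling loop. At 16 MHz one cycle is 62.5 ns, so 125 ns is exactly two cycles; that fits either two threads with a one-instruction check or one thread with a two-instruction loop. The function name is hypothetical.

```cpp
#include <cstdint>

// Worst-case ns before a polling thread samples a changed input again.
constexpr std::uint64_t worst_case_poll_ns(std::uint64_t clock_hz,
                                           std::uint64_t threads,
                                           std::uint64_t loop_insns) {
    return threads * loop_insns * 1'000'000'000ULL / clock_hz;
}

static_assert(worst_case_poll_ns(16'000'000, 2, 1) == 125, "2 threads, 1-insn check");
static_assert(worst_case_poll_ns(16'000'000, 1, 2) == 125, "1 thread, 2-insn loop");
```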

03:09 <xentrac> similar to the propeller or the ga144

03:10 <xentrac> hard to beat that with interrupt response. but it's also hard to represent as a checkmark in an arm vs. brand-x comparison table

03:10 <sorear> i feel like that approach makes more sense than trying to do anything with multi-level interrupts

03:11 <xentrac> me too

03:11 <xentrac> jrtc27: it's dismaying that weak references can retain the control block indefinitely, particularly if it's part of the same allocation that contains Foo, which could thus be arbitrarily large

03:12 <xentrac> it's not really a new approach i guess, it's how the cdc 6600 did i/o too

03:14 <xentrac> with the costs I was just thinking that maybe risc-v microcontrollers will continue to be cheaper than fpgas for a significant amount of time

03:16 gector has joined #riscv

03:17 <xentrac> (or start to be, I guess; not sure how much a GD32VF103 is but it's probably not <120¢)

03:18 davidlt has joined #riscv

03:35 JSharp is now known as JSharp_

03:35 JSharp_ is now known as JSharp__

03:35 JSharp__ is now known as jaesharp

03:35 jaesharp is now known as jaesharp_

03:35 jaesharp_ is now known as jaesharp__

03:36 gector has quit [Ping timeout: 252 seconds]

03:37 jaesharp__ is now known as JSharp

03:38 JSharp is now known as Rachel

03:38 Rachel is now known as Rachel_

03:38 Rachel_ is now known as Rachel__

03:38 Rachel__ is now known as JSharp

04:03 frost has joined #riscv

04:20 davidlt has quit [Ping timeout: 268 seconds]

04:25 <GreaseMonkey> hmm, is there an SVD for the FU740?

04:36 gector has joined #riscv

04:37 radu242 has quit [Quit: The Lounge - https://thelounge.chat]

04:39 radu242 has joined #riscv

04:47 <GreaseMonkey> also is it possible for one to debug via the JTAG port from boot mode 0000?

04:48 <GreaseMonkey> ...or is that pointless and i'm somehow supposed to do it via the cable...

04:50 <GreaseMonkey> oh right, seems i'm supposed to use it via the cable

04:51 FluffyMask has quit [Quit: WeeChat 2.9]

04:58 gector has quit [Ping timeout: 252 seconds]

05:05 <xentrac> hmm, I'd forgotten about https://www.crowdsupply.com/sutajio-kosagi/precursor

05:47 zjason has quit [Remote host closed the connection]

05:48 zjason has joined #riscv

05:52 indy has joined #riscv

06:09 <solrize> xentrac, https://www.adafruit.com/product/5041

06:34 smartin has joined #riscv

06:57 indy has quit [Ping timeout: 265 seconds]

07:03 indy has joined #riscv

07:17 leah2 has quit [Ping timeout: 240 seconds]

07:17 davidlt has joined #riscv

07:20 leah2 has joined #riscv

07:32 crabbedhaloablut has quit [Remote host closed the connection]

07:32 crabbedhaloablut has joined #riscv

07:49 valentin has joined #riscv

07:54 rvalles has quit [Ping timeout: 265 seconds]

08:11 hendursa1 has joined #riscv

08:11 hendursaga has quit [Ping timeout: 244 seconds]

08:15 chrysh has joined #riscv

08:16 chrysh has quit [Client Quit]

08:39 jotweh has joined #riscv

09:02 Sos has joined #riscv

09:05 TMM_ has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]

09:05 TMM_ has joined #riscv

09:22 solrize has quit [Ping timeout: 240 seconds]

09:34 solrize has joined #riscv

09:35 solrize has quit [Changing host]

09:35 solrize has joined #riscv

09:39 crabbedhaloablut has quit [Remote host closed the connection]

09:39 crabbedhaloablut has joined #riscv

12:00 wingsorc has joined #riscv

12:14 riff_IRC is now known as riff-IRC

12:32 choozy has joined #riscv

12:37 frost has quit [Quit: Connection closed]

12:50 rvalles has joined #riscv

12:55 rvalles has quit [Client Quit]

12:56 rvalles has joined #riscv

13:08 Sos has quit [Quit: Leaving]

13:08 Sos has joined #riscv

13:15 TMM_ has quit [Ping timeout: 252 seconds]

13:16 TMM_ has joined #riscv

13:31 devcpu has quit [Quit: leaving]

14:00 choozy has quit [Remote host closed the connection]

14:16 choozy has joined #riscv

14:21 choozy has quit [Remote host closed the connection]

14:26 rvalles has quit [Read error: Connection reset by peer]

14:27 rvalles has joined #riscv

14:53 Andre_H has joined #riscv

15:00 Andre_H has quit [Ping timeout: 258 seconds]

15:02 devcpu has joined #riscv

15:29 Andre_H has joined #riscv

15:39 <xentrac> solrize: I bet that US$1 price is just as fake as the US$5 price for the Raspberry Pi Zero

15:39 <xentrac> hmm, maybe not? https://www.adafruit.com/product/5042 is a 10-pack for US$10

15:40 <xentrac> and adafruit historically aren't the cheapest

15:47 vagrantc has joined #riscv

16:06 choozy has joined #riscv

16:10 alexfanqi has quit [Ping timeout: 252 seconds]

16:12 alexfanqi has joined #riscv

16:19 FluffyMask has joined #riscv

16:25 choozy has quit [Ping timeout: 252 seconds]

16:38 gector has joined #riscv

16:44 gector has quit [Ping timeout: 250 seconds]

17:08 gector has joined #riscv

17:13 gector has quit [Ping timeout: 250 seconds]

17:57 davidlt has quit [Ping timeout: 258 seconds]

18:02 <solrize> xentrac, it's been all over the place that those chips are going to be $1 each

18:02 <solrize> in small quantity, reel quantities should be less

18:03 <solrize> remember that sparkfun (when they have stock) lets you order up to 100 picos at a time

18:05 <solrize> note that an esp32-s2 is also around $1

18:05 elastic_dog has quit [Ping timeout: 240 seconds]

18:06 <solrize> on the other hand, the rp2040 has no program flash on chip, so you have to use a (cheap) spi flash like the pico does

18:20 elastic_dog has joined #riscv

18:30 TMM_ has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]

18:30 TMM_ has joined #riscv

18:33 <xentrac> solrize: this is a good point. also btw the RP2040 has a totally different way to do low-latency response things called "pioasm" that doesn't require fast interrupt response and is also cheaper than an FPGA, or would be if you could buy it separately (is there anything similar out there as a standalone chip?)

18:33 <xentrac> I haven't actually tried pioasm so I don't know if it's as nifty as it sounds

18:35 mahmutov has joined #riscv

18:36 <xentrac> ugh, editing fail

18:39 <xentrac> two PIO blocks each containing four state machines, each with four 32-bit registers, that share a 32-instruction program memory; each state machine is connected to the AHB-Lite bus through a 4-deep FIFO each way, and then the state machines can twiddle and read I/O pins

18:41 <xentrac> it looks like if you wanted to program a quadrature encoder or a PDM generator or SDIO or something it would be a lot easier to do with this pioasm coprocessor than with a CPLD and maybe even easier than with a general-purpose computer

18:44 choozy has joined #riscv

18:47 wingsorc has quit [Quit: Leaving]

18:55 chronon has joined #riscv

19:13 <xentrac> (they can also tug on the main processor's IRQs, of course)

19:15 <xentrac> (hmm, PDM might be beyond its capacity...)

19:20 Sos has quit [Ping timeout: 250 seconds]

19:46 Sos has joined #riscv

20:24 <solrize> if you look at an rp2040 die shot, it is almost all ram arrays, so no reason they couldn't have made the pio's much more powerful or added realtime risc-v cores or whatever. maybe some future chip will do that. the beaglebone has realtime coprocessors (PRU's) for stuff like that. There are two of them each with 32-bit registers and 8k of ram partly shared with the host cpu

20:24 <solrize> https://www.seeedstudio.com/Sipeed-Tang-Nano-FPGA-board-powered-by-GW1N-1-FPGA-p-4304.html as alternative this looks neat

20:24 <solrize> i think the chip is also in the $1 range

20:34 <xentrac> sweet :

20:34 <xentrac> :)

20:36 <xentrac> I think the only ALUish operations in pioasm are bit shifts, decrements, and zero tests, but I'm not done with the datasheet yet

20:36 <xentrac> so I don't think you can do, say, a digital differential analyzer in it

20:37 <xentrac> but you can drive a lot of common wire protocols in programs that are only two to six instructions long, at the chip's full clock speed and in lockstep between the state machines

20:37 <xentrac> oh I guess there's an equality test too

20:39 <xentrac> I think this is the first ISA I've seen designed after 01980 with an XEQ instruction (but called OUT EXEC)
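A sketch of how short these programs are, encoding two PIO instructions per the RP2040 datasheet layout as understood here: bits 15:13 are the major opcode (SET is 111, OUT is 011), 12:8 the delay/side-set field, 7:5 the destination, 4:0 the data or bit count. OUT with destination EXEC is the XEQ-like instruction. Helper and enum names are illustrative; the wrap points of a program are configured outside instruction memory.

```cpp
#include <cstdint>

enum class SetDest : std::uint16_t { Pins = 0, X = 1, Y = 2, PinDirs = 4 };
enum class OutDest : std::uint16_t { Pins = 0, X = 1, Y = 2, Null = 3,
                                     PinDirs = 4, Pc = 5, Isr = 6, Exec = 7 };

constexpr std::uint16_t encode_set(SetDest dest, unsigned data, unsigned delay = 0) {
    return static_cast<std::uint16_t>(0xE000u | (delay & 0x1Fu) << 8 |
                                      static_cast<unsigned>(dest) << 5 | (data & 0x1Fu));
}

// `out exec, n` shifts n bits out of the OSR and executes them as an instruction.
// A bit count of 32 is encoded as 0.
constexpr std::uint16_t encode_out(OutDest dest, unsigned bit_count, unsigned delay = 0) {
    return static_cast<std::uint16_t>(0x6000u | (delay & 0x1Fu) << 8 |
                                      static_cast<unsigned>(dest) << 5 | (bit_count & 0x1Fu));
}

// Two-instruction square wave: wrapped around both, the pin toggles every
// system clock cycle (period of two cycles).
constexpr std::uint16_t kSquareWave[] = {
    encode_set(SetDest::Pins, 1),
    encode_set(SetDest::Pins, 0),
};
```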

20:47 <jrtc27> "Besides making people’s eyes bulge"

20:47 <jrtc27> heh, indeed

20:47 <jrtc27> out exec and mov exec are not normal things...

20:48 <xentrac> haha

20:48 <xentrac> I guess nowadays single-stepping through ROM for debugging is usually handled with an 8086-style trace flag or something?

20:49 <xentrac> I think that was one of the original uses for XEQ

20:55 <jrtc27> yes, if an ISA is to be taken seriously it should have hardware single-step and breakpoint functionality

20:55 <jrtc27> RISC-V only partially fulfils that

20:56 <jrtc27> in that its hardware debug support only exists for bare-metal debugging, there's nothing yet that operating systems can use

21:02 <xentrac> that's gonna be a problem if you're running a user task from ROM, which I guess is a thing you might want to do

21:03 <xentrac> these days RAM is faster, right? so if you have enough RAM to copy the code into, it won't make the user task run *slower*, and then you can single-step and breakpoint by dropping little turds into its instruction stream and fence.i'ing them

21:04 <dh`> yes but execute-in-place is desirable

21:05 <xentrac> yeah, I'm just saying, it sounds like a pain in the ass to implement in software, but it sounds like it's at least not impossible?

21:05 <jrtc27> you can do it in software with a tiny little buffer

21:05 <jrtc27> you just have to be careful about things that read pc (ie jalr and auipc, plus potentially exceptions)
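When an instruction is copied into a scratch buffer and executed there, anything that reads `pc` sees the buffer's address instead of the original one. A stepper therefore has to emulate those instructions rather than run them out of line. The minimal sketch below classifies a 32-bit RV32I instruction by its 7-bit major opcode. Besides `auipc` and `jalr`, it also flags `jal` and conditional branches, because their targets are computed from `pc` too. That extension goes beyond what was said above. It does not handle compressed instructions or exceptions, and the function name is hypothetical.

```python
from typing import Optional

# Major opcodes (bits [6:0]) of RV32I instructions whose behaviour depends on pc.
PC_RELATIVE_OPCODES = {
    0b0010111: "auipc",   # rd = pc + (imm << 12)
    0b1101111: "jal",     # rd = pc + 4; pc += imm
    0b1100111: "jalr",    # rd = pc + 4; pc = (rs1 + imm) & ~1
    0b1100011: "branch",  # beq/bne/blt/bge/bltu/bgeu: pc += imm if taken
}

def needs_emulation(insn: int) -> Optional[str]:
    """Return the pc-dependent class of insn, or None if it can run in a scratch buffer."""
    if (insn & 0b11) != 0b11:
        raise ValueError("compressed (16-bit) instruction: not handled in this sketch")
    return PC_RELATIVE_OPCODES.get(insn & 0x7F)
```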

21:06 <xentrac> and you also have to fence.i your buffer updates, right?

21:06 <xentrac> I guess all those fence.is could get pretty expensive if you're single-stepping under program control instead of interactively

21:07 <jrtc27> fence.i is going to be lost in the noise

21:07 <xentrac> cool, I was thinking of the ridiculous cost of full cache flushes on old MIPS

21:08 <xentrac> naturally enough none of the risc-v specs tell you how fast things are

21:09 <xentrac> so I thought modern RISC-V implementations might run into the same kind of swamp with heavily self-modifying code like that

21:10 <xentrac> you extend the tiny-little-buffer/be-careful approach a little further and before you know it you're writing qemu

21:10 <dh`> well

21:11 <jrtc27> depends on the core...

21:11 <dh`> single-stepping one instruction at a time so the debugger can step one line

21:11 <dh`> is usually pretty slow

21:11 <dh`> but mostly you don't notice

21:12 <xentrac> yeah, the case where I noticed that kind of thing recently was when I tried GDB's reverse-debugging via record-and-replay thing a couple of months ago

21:13 <xentrac> in theory it's magic: you run the program until you reproduce the bug, locate the problem in memory, set a watchpoint, and execute backwards to see how that memory got that way

21:15 <xentrac> and, yeah, instruction-wise record-and-replay via ptrace isn't ever gonna be fast, right? you pay a few hundred nanoseconds in context switch overhead for every instruction

21:15 <xentrac> so maybe your 15-millisecond program will take 15 seconds to run. that's totally fine!

21:16 <xentrac> but GDB never ceases to find new ways to disappoint me, because actually it took 20 minutes, which is not fine at all
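The gap between the expected and observed slowdown can be checked roughly. The calculation assumes the program natively retires on the order of 10^9 instructions per second, which is an assumed figure and not one given in the log. Under that assumption, the 15-second estimate works out to about 1 µs per traced instruction. The observed 20 minutes works out to about 80 µs per instruction.

```latex
\begin{aligned}
N &\approx 15\,\text{ms} \times 10^{9}\,\text{instr/s} = 1.5\times 10^{7}\ \text{instructions}\\
\frac{15\,\text{s}}{N} &\approx 10^{-6}\,\text{s} = 1\,\mu\text{s per instruction}\\
\frac{20\,\text{min}}{N} &= \frac{1200\,\text{s}}{1.5\times 10^{7}} = 8\times 10^{-5}\,\text{s} = 80\,\mu\text{s per instruction}
\end{aligned}
```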

21:16 <dh`> heh

21:16 <xentrac> also, 4 gigs of RAM, which is okay

21:16 <xentrac> but wouldn't always be

21:18 <xentrac> If I had to prioritize, hardware watchpoint support seems a lot more important than hardware breakpoint support

21:19 <xentrac> (when your memory is virtual, anyway. if you have to XIP then not having hardware breakpoints will make baby xentrac cry)

21:19 <xentrac> just because software watchpoint support usually has the same kind of slowdown as record-and-replay

21:31 <dh`> right

21:31 <jrtc27> rr gets iffy for lr/sc

21:32 <jrtc27> or any kind of interesting concurrency tbh

21:32 <dh`> I was talking to someone about that a while back and iirc I brought their concerns here and nobody was very interested :-)

21:32 <solrize> gdb supports hw watchpoints but i didn't know it had reverse execution at all

21:32 <dh`> my recollection is that I tried it and found it didn't actually work

21:33 <dh`> but that was some time back

21:34 <jrtc27> it's easier on x86 where you don't have lr/sc and have a relatively strong memory model

21:34 <jrtc27> or, perhaps, not easier, but you can get away with things more as you're less likely to hit issues

21:35 <jrtc27> good luck doing rr on a concurrent process on alpha

21:38 <xentrac> just to be clear, I wasn't using rr-project, which has a GDB interface and reportedly works a lot better; I was using GDB's internal record-and-replay functionality

21:38 <xentrac> not sure whether you meant rr-project by "rr", or the generic functionality it provides

21:39 <xentrac> I agree that concurrency is a huge problem, even without lr/sc, and I think that's where most of the effort goes

21:39 <xentrac> dh`: I had to fight with it a lot to get it to work

21:40 choozy has quit [Remote host closed the connection]

21:40 <xentrac> GLIBC_TUNABLES=glibc.cpu.hwcaps=-AVX2_Usable,-AVX_Fast_Unaligned_Load gdb myprogram

21:40 <dh`> in a sense one of the reasons record/replay is interesting is specifically to capture particular concurrent executions

21:40 <solrize> as this is #riscv i wonder if it's possible to make a special riscv cpu in an fpga, that does the trace recording automatically, making an undo record for each instruction as it runs

21:40 mahmutov has quit [Ping timeout: 258 seconds]

21:41 <dh`> probably

21:41 <xentrac> then set args; start; set record full insn-number-max 2000000; record; c

21:41 <dh`> the question is where you stream the undo log to

21:41 <solrize> dram

21:41 <xentrac> ideally, a SAN

21:41 <solrize> or if it's only 200000 and it's a big fpga then maybe on chip ram blocks

21:42 <xentrac> (perhaps obviously the above gdb line was on amd64)

21:43 <xentrac> IIRC GDB also doesn't support streaming the undo log out to disk

21:43 <jrtc27> we don't have it on our riscv cores currently, but our mips core could log a trace of every instruction to a circular buffer that you could trigger on various conditions and then dump out over jtag

21:44 <xentrac> nice! including data read and written, or just the instructions?

21:44 <solrize> yeah that would work, especially if the trace output has enough data to reverse the insn

21:44 <jrtc27> yes, it'd include register and memory writes
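A trace that records each instruction's register and memory writes, together with the values they overwrote, is enough to step backwards. The minimal Python sketch below assumes a toy machine whose state is held in two dicts. The trace lives in a fixed-capacity circular buffer (`collections.deque` with `maxlen`), echoing the circular trace buffer mentioned above. All names are hypothetical, and this is not the MIPS core's actual trace format.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class TraceEntry:
    pc: int
    reg_writes: list[tuple[int, int]] = field(default_factory=list)  # (reg, old value)
    mem_writes: list[tuple[int, int]] = field(default_factory=list)  # (addr, old value)

class TracingMachine:
    """Toy machine that logs overwritten values so execution can be undone."""

    def __init__(self, capacity: int):
        self.regs: dict[int, int] = {}
        self.mem: dict[int, int] = {}
        self.trace = deque(maxlen=capacity)  # oldest entries fall off when full

    def begin(self, pc: int) -> None:
        self.trace.append(TraceEntry(pc))

    def write_reg(self, reg: int, value: int) -> None:
        self.trace[-1].reg_writes.append((reg, self.regs.get(reg, 0)))
        self.regs[reg] = value

    def write_mem(self, addr: int, value: int) -> None:
        self.trace[-1].mem_writes.append((addr, self.mem.get(addr, 0)))
        self.mem[addr] = value

    def step_back(self) -> int:
        """Undo the most recent traced instruction and return its pc."""
        entry = self.trace.pop()
        for reg, old in reversed(entry.reg_writes):
            self.regs[reg] = old
        for addr, old in reversed(entry.mem_writes):
            self.mem[addr] = old
        return entry.pc
```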

21:44 <xentrac> fabulous

21:44 <xentrac> one of gdb's record backends is an Intel feature that records only branches

21:44 <jrtc27> more used to find hardware issues than software issues though :D

21:46 <xentrac> I guess if you were designing a CPU to maximize replayability (rather than, say, efficiency) you'd default to XCHG rather than MOV

21:46 <xentrac> so the occasional OVERWRITE or MUL instruction would be logged, but the XCHGs and ADDs and SUBs wouldn't
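Some instructions can be reversed from the current state alone. XCHG is its own inverse, and ADD is undone by SUB, so neither needs a log entry. A plain overwrite or a MUL can destroy information, so only those log the old value. The minimal Python sketch below models this on a toy register file and assumes 32-bit wraparound arithmetic. It also treats `add`/`sub` with the same register on both sides as destructive, since x += x and x -= x cannot be undone by subtraction. The opcode names and helpers are hypothetical.

```python
MASK = 0xFFFFFFFF  # assume 32-bit registers with wraparound

def is_destructive(op: str, a: int, b: int) -> bool:
    # add/sub of a register with itself (x += x, x -= x) is not invertible either
    return op in ("mov", "mul") or (op in ("add", "sub") and a == b)

def execute(regs: list[int], op: str, a: int, b: int, log: list[tuple[int, int]]) -> None:
    """Run one instruction, logging (register, old value) only when it is destructive."""
    if is_destructive(op, a, b):
        log.append((a, regs[a]))
    if op == "xchg":
        regs[a], regs[b] = regs[b], regs[a]
    elif op == "add":
        regs[a] = (regs[a] + regs[b]) & MASK
    elif op == "sub":
        regs[a] = (regs[a] - regs[b]) & MASK
    elif op == "mov":
        regs[a] = regs[b]
    elif op == "mul":
        regs[a] = (regs[a] * regs[b]) & MASK
    else:
        raise ValueError(f"unknown op {op!r}")

def undo(regs: list[int], op: str, a: int, b: int, log: list[tuple[int, int]]) -> None:
    """Reverse one instruction, consulting the log only for destructive ones."""
    if is_destructive(op, a, b):
        reg, old = log.pop()
        regs[reg] = old
    elif op == "xchg":
        regs[a], regs[b] = regs[b], regs[a]   # self-inverse
    elif op == "add":
        regs[a] = (regs[a] - regs[b]) & MASK  # exact inverse of add
    elif op == "sub":
        regs[a] = (regs[a] + regs[b]) & MASK  # exact inverse of sub
```

Undoing a program means calling `undo` on its instructions in reverse order. In the test below, only the `mov` and `mul` leave entries in the log.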

21:47 <xentrac> unless you were trying to find hardware problems, of course

21:58 valentin has quit [Remote host closed the connection]

22:08 Andre_H has quit [Ping timeout: 250 seconds]

22:22 <xentrac> heh, there's a section in the datasheet about how to perform an addition with pioasm

22:22 <xentrac> of the rp2040

22:23 <xentrac> "A full 32-bit addition takes only around one minute at 125 MHz. The program pulls two numbers from the TX FIFO and pushes their sum to the RX FIFO, which is perfect for use either with the system DMA, or directly by the processor."
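The joke works because pioasm has no add instruction, only decrement-and-branch. One way to build addition from that is the two's-complement identity x + y = ~(~x - y): invert x, decrement it y times, then invert the result. In the worst case (y = 2^32 - 1), the loop runs about 2^32 times. The timing estimate assumes roughly two instructions per iteration at one instruction per cycle. Under that assumption, 2^33 cycles at 125 MHz comes to about 68.7 s, which matches "around one minute". The Python model below sketches the idea. The function names and cost parameters are hypothetical, and this is not the datasheet's PIO program itself.

```python
MASK32 = 0xFFFFFFFF

def pio_style_add(x: int, y: int) -> int:
    """Add two 32-bit values using only bitwise inversion and decrement-by-one."""
    x = ~x & MASK32               # x := ~x
    while y:                      # repeat y times, decrementing only
        y -= 1
        x = (x - 1) & MASK32      # x := x - 1, wrapping like a 32-bit register
    return ~x & MASK32            # ~(~x - y) == x + y (mod 2**32)

def worst_case_seconds(clock_hz: float = 125e6, insns_per_iteration: int = 2) -> float:
    """Assumed cost model: one instruction per cycle, 2**32 - 1 loop iterations."""
    return (2**32 - 1) * insns_per_iteration / clock_hz
```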

22:26 <xentrac> and yet you can do PWM in 7 instructions, I²C in 19 instructions (with DMA, and even clock stretching!), and the WS2812 protocol in 4 instructions
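WS2812 fits in so few instructions because every bit has the same shape: a high pulse followed by a low period. A "1" bit uses a long high pulse and a "0" bit uses a short one, so the state machine only needs to shift out one bit and choose between two pulse lengths. The Python sketch below produces the resulting waveform for one LED. It assumes 24 bits sent in G, R, B order, MSB first, and a 2/5/3-cycle split of a 10-cycle bit period. That split is taken from the Pico SDK's ws2812 example as best recalled, so treat the exact numbers as an assumption. The function name is hypothetical.

```python
T1, T2, T3 = 2, 5, 3  # assumed cycles: shared high, data-dependent, trailing low

def ws2812_waveform(g: int, r: int, b: int) -> list[tuple[int, int]]:
    """Return (level, cycles) pulses for one LED's 24 bits, sent G, R, B, MSB first."""
    word = (g << 16) | (r << 8) | b
    pulses = []
    for i in range(23, -1, -1):
        if (word >> i) & 1:
            pulses += [(1, T1 + T2), (0, T3)]   # "1": long high pulse
        else:
            pulses += [(1, T1), (0, T2 + T3)]   # "0": short high pulse
    return pulses
```

With 10 cycles per bit at an 800 kHz bit rate, one LED's 24 bits take 24 × 1.25 µs = 30 µs.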

22:29 <xentrac> and apparently there's an example in the SDK book that uses pioasm to get a 125Msps logic analyzer, piping the data into RAM via DMA

22:34 <dh`> heh, only one minute

22:34 <dh`> that also sounds like it'd be a fun widget to muck about with formal verification for

23:45 smartin has quit [Quit: smartin]

23:47 riff-IRC has quit [Ping timeout: 265 seconds]

23:50 riff-IRC has joined #riscv