sorear changed the topic of #riscv to: RISC-V instruction set architecture | https://riscv.org | Logs: https://libera.irclog.whitequark.org/riscv
crest has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
crest has joined #riscv
djdelorie has quit [Remote host closed the connection]
cousteau has quit [Quit: ♫ I can't forget the day I shot that network down ♫]
shicz2 has quit [Ping timeout: 256 seconds]
shicz2 has joined #riscv
theruran has quit [Quit: Connection closed for inactivity]
mahmutov has joined #riscv
<Sofia> jimwilson: Do you work on the GCC compiler?
<jimwilson> Sofia, yes, for 34 years now
<Sofia> Cool. Could you elaborate on the difficulty with that stack pointer NOP?
<Sofia> The compiler I'm designing works quite differently at its core, so I have a different perspective at best.
<Sofia> Notably I have not yet reached register allocation or spilling to the stack. The land of unbounded registers and partitioned memory is nice. ;)
<jimwilson> we optimize, register allocate and spill, then optimize some more, sometimes we find an optimization after reg alloc which makes it possible to remove a temp but the stack slot has already been allocated
<Sofia> jimwilson: So just an instance of the phase ordering problem.
<jimwilson> remove a stack slot, and in general you need to redo frame generation which we don't have support for
* Sofia nods
<jimwilson> there are also issues about pointers into the stack which might escape or alias, and then you need to prove that you can delete that one stack slot without breaking anything
<jimwilson> changing stack slots might change address arithmetic for stack slots in inconvenient ways
<jimwilson> though in the simplest case where all stack slots get removed we really should find a way to fix that, I don't think it affects real code very often, but it makes simple testcases like yours look bad for gcc
<jimwilson> the cat is hungry, I need to go
<Sofia> Fair enough. TTYL. o/
EchelonX has quit [Quit: Leaving]
xxuull has joined #riscv
xxuull has left #riscv [#riscv]
cwebber has joined #riscv
mahmutov has quit [Ping timeout: 256 seconds]
vagrantc has quit [Quit: leaving]
freakazoid333 has joined #riscv
PyroPeter has quit [Ping timeout: 260 seconds]
jacklsw has joined #riscv
PyroPeter has joined #riscv
theruran has joined #riscv
BOKALDO has joined #riscv
shicz2 has quit [Ping timeout: 240 seconds]
shicz2 has joined #riscv
winterflaw has quit [Remote host closed the connection]
riff-IRC has quit [Remote host closed the connection]
riff-IRC has joined #riscv
djdelorie has joined #riscv
bauruine has joined #riscv
winterflaw has joined #riscv
<Sofia> Huh. Why does fmv.d.x have 6-cycle latency? fmv.x.d is as I expected with single-cycle latency. Is writing to a floating point register more expensive / also writing to a CSR or something? https://sifive.cdn.prismic.io/sifive/cde638d4-1346-4bc2-9724-17e6acf0edd0_u74mc_core_complex_manual_21G2.pdf
cousteau has joined #riscv
<Sofia> Likewise fmv.s.x has 2-cycle latency
<Sofia> Another core. single-precision moves are 1-cycle in both directions. The double precisions are N/A.. despite this being a 64-bit core? https://sifive.cdn.prismic.io/sifive/55ca6931-2d41-4603-82f2-5af618f86180_s76mc_core_complex_manual_21G2.pdf
<cousteau> where does it mention the latencies?
* cousteau just checked to see if the core did support the D extension (it does)
<Sofia> On both documents search for "FPU Instruction Latency"
<cousteau> yeah found the table
<cousteau> I was going to argue it may be because converting a 32-bit float into a 64-bit double and vice versa was problematic, but I see that the same N/A happens in either f32<->s64 and f64<->s64
<cousteau> not f32<->s64 and f64<->s32, which would have matched my hypothesis
<cousteau> oh wow, RV64GCB, nice!
<Sofia> I suspect this is pipeline implementation obscurities. If it writes to a floating point register, then schedule it with the rest of them. Maybe the integer write isn't in the FPU exec units?
<Sofia> cousteau: Don't get too excited, these cores were well before B's ratification. There are MANY cores claiming B and V with <1 versions.
<cousteau> ah, ouch...
<cousteau> was going to look up if it supported crypto stuff
<Sofia> So it might just contain all the instructions you care about.. and might have different opcodes.. but is sufficient to play with if you have the right toolchain.
<cousteau> 5.2.2 I-Type Integer Instructions [...] The immediate is always sign-extended to 32-bits before being used in an arithmetic operation.
<cousteau> this was either copy-pasted from a manual/standard targeting RV32 or a very weird implementation
<Sofia> I want to see the "core complex manuals" for the new P550 and P650, actually new enough to use the ratified specs.
<Sofia> At least the E24 implements a subset of atomics.
<cousteau> "new enough to use" != "likely to use" though...
<cousteau> MSVC is new enough to support C99 and well...
<Sofia> They claim B and V for the P550, which might even be too early. IIRC
<Sofia> No V for the P550 it seems, just the p270 which I think is too old for the ratified spec.
<cousteau> I guess p550 goes after p270, and they removed an extension... suspicious...
<cousteau> as versions progress, they improve them by removing the extensions they learn they don't support :)
<Sofia> More annoyingly the docs for any of these 3 are paywalled.
<Sofia> Or otherwise hidden in the devkits
<cousteau> (then again, maybe 2-70 means "version 2, powerfulness 70", so 6-50 is actually meant to be more modern but less powerful)
<Sofia> I think I've seen a sifive version number document....
<Sofia> P500 up to 4 cores. P600 up to 16 cores. https://www.sifive.com/cores/performance
<Sofia> P500 3-issue. P600 3-4 issue.
<cousteau> (it was embedded in the link you pasted)
<Sofia> Hmm
* Sofia nods
<Sofia> Nice catch
<Sofia> No ^F for that
<Sofia> Regarding copy-paste of these manuals, I've seen more obvious mistakes obviously due to copy-paste but cannot recall what they were.
<cousteau> re: "no ctrl-f", I suspect images are non-normative anyway
<Sofia> non-normative.. inconsistent?
<cousteau> as in, "it's just an example and not an official statement"
<cousteau> like, in the C standard, examples are informative only, not normative
<Sofia> Uhh.
<Sofia> Either way, B has been stripped down so much that I'd trust its inclusion :P
<cousteau> on a completely unrelated note, and now that I'm reading about immediate instructions... I propose to refer to a 4096-byte block (the span of load/store instruction offsets) as "image"
<cousteau> because it's worth 1024 words
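The 4096-byte figure follows from the 12-bit signed immediate that RISC-V loads and stores use as an offset. A minimal sketch of that arithmetic, assuming 4-byte words; the helper name `sign_extend` is made up for illustration:

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    mask = (1 << bits) - 1
    value &= mask
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


IMM_BITS = 12                            # width of the load/store offset field
LOWEST = sign_extend(0x800, IMM_BITS)    # most negative reachable offset
HIGHEST = sign_extend(0x7FF, IMM_BITS)   # most positive reachable offset
SPAN_BYTES = HIGHEST - LOWEST + 1        # bytes reachable from one base register
SPAN_WORDS = SPAN_BYTES // 4             # assuming 32-bit (4-byte) words

if __name__ == "__main__":
    print(LOWEST, HIGHEST, SPAN_BYTES, SPAN_WORDS)
```

The span is centred on the base register, so it reaches 2048 bytes below it and 2047 bytes above it.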
<Sofia> I'd rather more detailed images
<Sofia> Or 1024 meaningful words :D
<cousteau> and yeah, B was REALLY simplified; I remember reading this huge document with thousands of interesting features, and the bittersweet feeling when I found out they had plagiarized the "generalized bit reverse" instruction I had invented not long ago (only, they "plagiarized" it long before I came up with it, but still)
<Sofia> Wait what
* cousteau is still angry that they removed the grev instruction
<Sofia> Page 32
<Sofia> The FP register file is more embedded?
<cousteau> wait what?
<cousteau> yeah that's weird
<cousteau> it's just one cycle delayed though. Read happens one cycle later, writeback is delayed one cycle.
<cousteau> they painted it inside the pipeline but it should probably be considered outside of it
<cousteau> but I guess a rectangle was easier to draw than a Tetris Z-shape
* Sofia shrugs
<Sofia> Just a weird diagram really
<cousteau> indeed, but I'm used to that...
* cousteau still doesn't understand pipelines
<Sofia> :)
<cousteau> (processor pipelines, that is. I understand the concept of a pipeline, but not quite how they work in processors, in detail.)
* Sofia points at rocket chip
<cousteau> that has a 3- or 5-stage pipeline
<Sofia> Are Sifive's cores based on rocket chip?
<cousteau> Observe how I managed to produce a piece of factual information without really understanding its implications
<cousteau> I think they are
<cousteau> SiFive is largely based on rocket-chip, and in fact has contributed some parts for it
<Sofia> At least based on this document, they use Chisel but not exclusively. https://riscv.org/exchange/cores-socs/
<cousteau> like, if you generate a design using Chipyard, the FU540-C000 manual is priceless for documentation of stuff such as the DTIM or the UART peripheral
<Sofia> For the cores I linked docs for, those are all verilog apparently.
<cousteau> I managed to write a ZSBL in assembly that allowed the device to load software via XMODEM through the UART
<cousteau> Sofia: probably verilog blackboxes
<cousteau> within the Chisel
<Sofia> I mean it specifically states the sifive cores as verilog and not chisel, while others mention chisel.
<cousteau> are you familiar with Chipyard? I found it very interesting and helpful (especially the documentation)
<Sofia> Seen the name but let's go with no
<cousteau> chipyard is basically a collection of rocket-related projects (rocket-chip, rocket-core, boom, ariane, and lots of peripherals such as hwacha and a sha3 thingy)
* Sofia nods
<cousteau> and a bunch of tools for generating a complete rocket- or boom-based SoC
<cousteau> and a nice documentation
<cousteau> I think everything the SiFive cores do can be implemented using Chipyard, and they probably designed the chip simply stitching together Chisel stuff in a similar way; only they don't provide the files publicly for replication
<cousteau> s/cores/processors
<Sofia> I really don't like how closed sifive is :/
<Sofia> Even if they do open some of it up eventually.
<cousteau> hm, why does the doc mention the FCVT.S.LU instruction? There's no LU in RV64, right? Or is that only for load/store instructions?
<cousteau> yeah, it's a pity that it's so closed, but they probably want to make some money off of it before releasing it to the world
<cousteau> re: s/d <-> l[u], maybe the reason is related to the fact that 64-bit integers have more precision than 53-bit precision floating point numbers can store
<cousteau> "If the rounded result is not representable in the destination format, it is clipped to the nearest value and the invalid flag is set." -- guess this explains the existence of LU
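Both points above (a 64-bit integer can hold more precision than a 53-bit significand, and out-of-range conversions are clipped with the invalid flag set) can be sketched on a host machine. This is only a model under stated assumptions: the function name `fcvt_lu_d` is borrowed from the instruction mnemonic, rounding is fixed to round-toward-zero, and the "invalid flag" is just a returned boolean rather than a real `fflags` bit.

```python
import math

U64_MAX = 2**64 - 1


def fcvt_lu_d(x: float):
    """Model of a double -> unsigned 64-bit conversion with clipping.

    Returns (result, invalid). Assumes round-toward-zero; the real
    instruction takes its rounding mode from the encoding or from frm.
    """
    if math.isnan(x):
        return U64_MAX, True          # NaN is reported as the largest value
    if math.isinf(x):
        return (U64_MAX if x > 0 else 0), True
    rounded = math.trunc(x)           # round toward zero
    if rounded < 0:
        return 0, True                # below the unsigned range: clip to 0
    if rounded > U64_MAX:
        return U64_MAX, True          # above the unsigned range: clip to max
    return rounded, False


if __name__ == "__main__":
    # 2**53 + 1 needs 54 significant bits, so it cannot survive a trip
    # through a double: the nearest representable value is 2**53.
    print(int(float(2**53 + 1)) == 2**53)
    print(fcvt_lu_d(-1.0), fcvt_lu_d(2.0**64), fcvt_lu_d(3.7))
```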
<Sofia> Anyway, I was looking at the fmv instruction because I want to use it to stash cold integers instead of resorting to the stack and we have so many floating point registers... and I'm already in the context of full-program global optimization.
<Sofia> So... I don't actually care about "using" the value within 6 cycles anyway, and the repeat rate permits my use. With the reverse direction being the desired 1-cycle. It suffices for my use case.
<Sofia> Just unexpected to see as high as 6 for a register move...
<Sofia> Unlike fcvt, fmv is bit preserving so I don't need to worry about it corrupting the values.
aerkiaga has joined #riscv
<cousteau> FMV might not just be a plain register move. Floats are likely not stored in IEEE754 format internally
<cousteau> I think they use this weird Berkeley format, which is some form of "verbose IEEE754 format"
<Sofia> My understanding is any arbitrary 32-bit integer moved into a single-precision floating point register and back will preserve its bits. Likewise for 64-bit integers into a double-precision register and back.
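The round-trip property being described can be illustrated with a host-side model. This is a sketch, not RISC-V code: the function names `fmv_d_x` and `fmv_x_d` are borrowed from the mnemonics, and Python's `struct` module stands in for the register move by reinterpreting 64 bits as a double and back without doing any arithmetic on the value.

```python
import struct

MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def fmv_d_x(x_bits: int) -> float:
    """Model: reinterpret a 64-bit integer pattern as a double (X -> F)."""
    return struct.unpack("<d", struct.pack("<Q", x_bits & MASK64))[0]


def fmv_x_d(f_value: float) -> int:
    """Model: reinterpret a double as its 64-bit integer pattern (F -> X)."""
    return struct.unpack("<Q", struct.pack("<d", f_value))[0]


def round_trips(x_bits: int) -> bool:
    """True if stashing the integer in an FP register and reading it back is lossless."""
    return fmv_x_d(fmv_d_x(x_bits)) == x_bits


SAMPLES = [
    0x0000_0000_0000_0000,  # +0.0
    0x8000_0000_0000_0000,  # -0.0
    0x0000_0000_0000_0001,  # smallest subnormal
    0xDEAD_BEEF_CAFE_F00D,  # arbitrary integer data
    0x7FF8_0000_0000_0123,  # quiet NaN with a non-canonical payload
    0xFFFF_FFFF_FFFF_FFFF,  # all ones (a negative quiet NaN)
]

if __name__ == "__main__":
    for bits in SAMPLES:
        print(f"{bits:#018x} round-trips: {round_trips(bits)}")
```

The samples stick to quiet NaNs because how a host treats signaling NaNs when copying doubles is a property of this model's platform, not of the ISA text quoted in the discussion.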
<cousteau> basically, instead of 52 explicit + 1 implicit mantissa bits, it uses 53 explicit mantissa bits, so it's possible that there are floats stored in a format that is not directly compatible with IEEE754 and need to be adjusted. Also it uses a different representation for NaNs and infinities, I think...
<cousteau> Sofia: yes, functionally you don't care how floats are internally stored and you can assume they are stored in IEEE754 format. But the implementation may differ.
<Sofia> > FMV.X.D and FMV.D.X do not modify the bits being transferred; in particular, the payloads of
<Sofia> non-canonical NaNs are preserved.
<Sofia> From those manuals
<Sofia> From Zfinx. > The Zfinx extension adds all of the instructions that the F extension adds, except for the transfer instructions FLW, FSW, FMV.W.X, FMV.X.W, C.FLW[SP], and C.FSW[SP].
<Sofia> From the F spec.
<Sofia> > Instructions are provided to move bit patterns between the floating-point and integer registers.
<cousteau> as far as I know, IEEE754 uses a notation like +1.0101000000000000000000 × 2^+00000101
<Sofia> FMV.X.W moves the single-precision value in floating-point register rs1 represented in IEEE 754-
<Sofia> 2008 encoding to the lower 32 bits of integer register rd. The bits are not modified in the transfer,
<Sofia> and in particular, the payloads of non-canonical NaNs are preserved. For RV64, the higher 32 bits
<Sofia> of the destination register are filled with copies of the floating-point number’s sign bit.
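The RV64 rule in the quoted passage (low 32 bits copied unchanged, upper 32 bits filled with copies of the sign bit) can be modelled on plain integers. A small sketch; the function name `fmv_x_w_rv64` is made up for illustration and it operates on bit patterns, not on real registers:

```python
def fmv_x_w_rv64(f_bits: int) -> int:
    """Model of FMV.X.W on RV64, working on raw bit patterns.

    The low 32 bits are copied unmodified; the upper 32 bits of the
    64-bit result are filled with copies of bit 31 (the float's sign bit).
    """
    low = f_bits & 0xFFFF_FFFF
    sign = low >> 31
    upper = 0xFFFF_FFFF if sign else 0x0000_0000
    return (upper << 32) | low


if __name__ == "__main__":
    for pattern in (0x3F80_0000, 0xBF80_0000, 0x7FC0_0001):
        print(f"{pattern:#010x} -> {fmv_x_w_rv64(pattern):#018x}")
```

Whatever lands in the upper half, the low 32 bits (including any NaN payload) come through untouched, which is the part that matters for stashing 32-bit values.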
<Sofia> IIUC even if there is an encoding expansion internally, it does not corrupt the bits with any rounding or other behavior -- unless you invoke any floating point operations against it.
<Sofia> Even then, reading would be fine. It'd just produce garbage (maybe a NaN) and the result if written elsewhere, won't corrupt the register I care about.
<cousteau> yeah, I think Zfinx stores the numbers in IEEE754 format
<cousteau> but without Zfinx they might be stored in any format, such as that obscure Berkeley format I can't recall now
<Sofia> Even so, it should still preserve the bits.
<cousteau> sure, but conversion might not be direct
<Sofia> Critically we don't have any conversion. Only movement without mutation.
<Sofia> It'd need an arithmetic operation to apply rounding or any other communication between the bits.
<cousteau> for instance, converting from X (ieee754) to F (Berkeley) is trivial, since Berkeley is a superset of IEEE754, so you can convert directly (with some minor bit adjustments)
<cousteau> but the other way around, Berkeley to IEEE754, might require adjustment, since not all Berkeley representations are directly convertible to IEEE754 -- Berkeley is a superset as I said
<cousteau> (you said it was 1 cycle from X to F but 6 cycles from F to X, right? Or did I predict it the wrong way?)
<Sofia> 1 cycle from F to X. 6 cycle X to F.
<cousteau> ah, excellent, my brilliant reasoning had a 50% chance of being wrong and it was wrong :D
<Sofia> If X->F->X via FMV ever corrupts, then I cannot use F to stash X. If this is implementation dependent, then I can allow the optimization when I know the core behaves under the semantic I care about.
<cousteau> (how do I always end up putting the butter on the wrong side of the toast?)
<Sofia> This asymmetry is why I don't care about the 6-cycle latency. I won't be stashing registers if I want their value so soon.
<cousteau> my understanding is that X->F->X should be bit-preserving (as long as X is initially a valid representation)
<cousteau> (...then again I think all representations are valid, even if possibly a quiet/signaling NaN)
<Sofia> cousteau: Just butter a hot plate and toast the bread on the butter-coated plate.
<cousteau> I'm now intrigued as to why it takes 6 cycles to go from X to F, but it is so fast from F to X
<Sofia> This is why I was looking at the floating point writeback.
<Sofia> As that isn't quite the same as integer write back.
<Sofia> Either way, that 6-cycle latency doesn't make sense to me at all for a mere register move (and perhaps reformatting while preserving the bits)
<cousteau> yeah, far as I know, converting from IEEE754 to the format used by Rocket internally is pretty much a direct bit mapping, and the other way around is an adjustment (normalization) followed by a bit mapping
<Sofia> cousteau: So that corrupts the X result?
<Sofia> cousteau: Do you have a core you can test it against?
<cousteau> basically you have to convert something like 0.0010101 × 2^0001000 to 1.0101000 × 2^0000101 (ensure that it starts with "1.")
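The normalization step in that example can be written out as a shift loop. This is only a sketch of the arithmetic in the message above, not of what any particular core does internally; the function name `normalize` and the 8-bit significand width (1 integer bit plus 7 fraction bits) are assumptions chosen to match the digits in the example.

```python
def normalize(significand: int, exponent: int, width: int = 8):
    """Shift a fixed-point significand left until its top bit is 1.

    `significand` holds `width` bits: 1 integer bit followed by
    `width - 1` fraction bits. Each left shift doubles the significand,
    so the exponent is decremented to keep the value unchanged.
    """
    if significand == 0:
        raise ValueError("zero has no normalized form")
    if significand >> width:
        raise ValueError("significand does not fit in the given width")
    top_bit = 1 << (width - 1)
    while not significand & top_bit:
        significand <<= 1
        exponent -= 1
    return significand, exponent


if __name__ == "__main__":
    # 0.0010101 x 2^8 becomes 1.0101000 x 2^5
    sig, exp = normalize(0b0_0010101, 8)
    print(f"{sig:08b} x 2^{exp}")
```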
<cousteau> Sofia: no no, X->F->X should round-trip, at least if that X came from a valid F
<Sofia> The spec says at least NaN preserving, so doesn't this imply all integers are valid?
<cousteau> if all NaN, quiet or signaling, are preserved, then yes
<cousteau> and if +0 and -0 are honored
<Sofia> Which the spec requires
<Sofia> No?
<cousteau> I'd have to look it up but I guess it does
<Sofia> I pasted the relevant spec above ^
<cousteau> (If you say so)
<Sofia> "From the F spec"
<cousteau> yeah, I would think that you can write an arbitrary 64-bit X to F and back to X and it would be guaranteed to round-trip
<Sofia> My understanding of Zfinx is that flw, fsw, fmv are all obsolete because the ordinary X load/store suffice and the movement is trivially "done" already.
<Sofia> IIUC Zfinx doesn't prevent the alternative format you've described. Any additional bits would just be hidden from the integer land.
<Sofia> cousteau: I guess the alternative format promotes more optimizations? Either way, the spec defines the semantics for all the instructions so everything observable should be identical.
<Sofia> Unless you activate a CSR flag or something to allow divergent results.
<Sofia> I sure hope there aren't "undefined behavior" in the RISC-V spec.. <_<
<Sofia> aren't/isn't
<cousteau> the additional bits wouldn't be allowed in Zfinx, or if they are, every time you make an integer operation on a register containing a non-canonical float it would need to be converted on the fly
<cousteau> well, there's unspecified behavior
<Sofia> Underspecified..?
<cousteau> The architecture fully describes what implementations must do and any constraints on what they
<cousteau> may do. In cases where the architecture intentionally does not constrain implementations, the term
<cousteau> unspecified is explicitly used.
<cousteau> ("UNSPECIFIED" written in small caps)
<cousteau> (section 1.7 of the vol.I)
<cousteau> The behavior upon decoding a reserved instruction is UNSPECIFIED.
<cousteau> ...and that's the only use of UNSPECIFIED in small caps in the whole doc. Maybe Vol.II has more.
PyroPeter has left #riscv [WeeChat 3.1]
<Sofia> Oh
<Sofia> There are more such instances of unspecified.
<Sofia> I approve of the sparse use of the term. :D
Sofia has quit [Remote host closed the connection]
Sofia has joined #riscv
jjido has joined #riscv
Andre_H has joined #riscv
<cousteau> Sofia: I only found 9 instances of the word "unspecified" in the document as of v20191213: 6 in the definition of the term, 1 in the paragraph I quoted (re: reserved instructions), and 2 related to "returning codes for unspecified errors" (but those 2 instances of "unspecified" are in lowercase)
<cousteau> (of those 6, 1 is in the subsection title, 1 in the table of contents, and 4 in the actual body)
<cousteau> maybe newer versions of the standard have more instances of "UNSPECIFIED"
<cousteau> ah, and Vol.II has 10 more :)
<Sofia> I have the latest versions of both volumes as of the latest "release" 11 days ago.
bauruine has quit [Remote host closed the connection]
<cousteau> I was reading the "stable" version as per the riscv.org site, not the draft ones
<Sofia> cousteau: Yeah, I'm kinda waiting eagerly for the ratified merges from https://wiki.riscv.org/display/TECH/Recently+Ratified+Extensions
cwebber has quit [Remote host closed the connection]
zjason` is now known as zjason
AEtherC0r3 has quit [Quit: http://quassel-irc.org - Chat comfortably. Anywhere.]
AEtherC0r3 has joined #riscv
aerkiaga has quit [Remote host closed the connection]
BOKALDO has quit [Quit: Leaving]
perro has joined #riscv
bauruine has joined #riscv
freakazoid333 has quit [Read error: Connection reset by peer]
freakazoid333 has joined #riscv
freakazoid343 has joined #riscv
freakazoid333 has quit [Ping timeout: 250 seconds]
gioyik has joined #riscv
mahmutov has joined #riscv
BOKALDO has joined #riscv
freakazoid12345 has joined #riscv
freakazoid343 has quit [Ping timeout: 260 seconds]
freakazoid343 has joined #riscv
freakazoid12345 has quit [Ping timeout: 250 seconds]
riff-IRC has quit [Read error: Connection reset by peer]
riff-IRC has joined #riscv
vagrantc has joined #riscv
jacklsw has quit [Read error: Connection reset by peer]
<jimwilson> Sofia, the sifive essential 3 and 5 cores were based on rocket-chip, but were dropped when the essential 6 core was announced, no current sifive product is based on rocket-chip
<jimwilson> p2 and p5 are ip-cores, updated every 3 months, they will be updated to standard versions of B, V, etc before anyone builds a soc from them
<cousteau> ah, so I guess FU540 was rocket-based
<jimwilson> p2 and p5 are different cores with different pipelines, anything designed for p2 will not magically work for p5, so V was not dropped when moving from p2 to p5, V just hasn't been implemented for p5 yet
<cousteau> I liked my theory more
<jimwilson> yes fu540 is based on rocket-chip
<jimwilson> in the p and x series, the second number is log2 (-1?) the alu size, so anything larger than 5 has V support, that includes p270 and x280
<cousteau> log2 size makes sense
<cousteau> and yeah, as I suspected, letter is family, first number is version (within the family), second is size/capabilities, third is probably minor version (so e.g. p271 would be an improved/newer p270)
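The naming scheme as described in this exchange can be captured in a few lines. This is a sketch built only on the claims made in the chat (letter is family, then series, size, and a guessed minor version; in the p and x series a size digit above 5 implies V support); it is not an official SiFive decoding, and the names `CoreName` and `parse_core_name` are hypothetical.

```python
import re
from typing import NamedTuple


class CoreName(NamedTuple):
    family: str   # letter, e.g. "p" or "x"
    series: int   # first digit
    size: int     # second digit, described in the chat as roughly log2 of the ALU size
    minor: int    # third digit, guessed in the chat to be a minor version

    @property
    def has_vector(self) -> bool:
        # Per the chat: in the p and x series, a size digit above 5 implies V.
        return self.family in ("p", "x") and self.size > 5


def parse_core_name(name: str) -> CoreName:
    """Split a name such as 'p270' into its letter and three digits."""
    match = re.fullmatch(r"([a-z])(\d)(\d)(\d)", name.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized core name: {name!r}")
    family, series, size, minor = match.groups()
    return CoreName(family, int(series), int(size), int(minor))


if __name__ == "__main__":
    for name in ("p270", "x280", "p550"):
        core = parse_core_name(name)
        print(name, core, "V" if core.has_vector else "no V")
```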
winterflaw has quit [Remote host closed the connection]
<jimwilson> p2 is same core as u7, p5 is what we called u8 a few years ago but there is no u8 anymore, p2 and x2 are same core
winterflaw has joined #riscv
<jimwilson> the manuals are built by including a lot of small files into one large manual, so we can share stuff between manuals, I guess there is one spot for immediate extensions that is correct for rv32 but not rv64
freakazoid12345 has joined #riscv
bauruine has quit [Ping timeout: 268 seconds]
freakazoid343 has quit [Ping timeout: 252 seconds]
bauruine has joined #riscv
freakazoid343 has joined #riscv
<cousteau> y'all need to use XLEN more often :)
<cousteau> (or as a variable)
freakazoid12345 has quit [Ping timeout: 260 seconds]
gioyik has quit [Ping timeout: 276 seconds]
gioyik has joined #riscv
TMM_ has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
TMM_ has joined #riscv
gioyik has quit [Ping timeout: 276 seconds]
freakazoid12345 has joined #riscv
gioyik has joined #riscv
freakazoid343 has quit [Ping timeout: 250 seconds]
gioyik has quit [Ping timeout: 276 seconds]
bauruine has quit [Ping timeout: 260 seconds]
bauruine has joined #riscv
freakazoid343 has joined #riscv
BOKALDO has quit [Quit: Leaving]
gioyik has joined #riscv
freakazoid12345 has quit [Ping timeout: 240 seconds]
jjido has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
tgamblin has quit [Ping timeout: 240 seconds]
freakazoid12345 has joined #riscv
tgamblin has joined #riscv
freakazoid343 has quit [Ping timeout: 240 seconds]
gioyik has quit [Ping timeout: 276 seconds]
gioyik has joined #riscv
<jimwilson> I think the 6 cycles for fmv.d.x is a typo, this doesn't match gcc, maybe copied from the fcvt table?, and the N/A in the s76 docs for fcvt.l.d is more rv32/rv64 sharing problems
<cousteau> that'd make sense
freakazoid12345 has quit [Read error: Connection reset by peer]
freakazoid12345 has joined #riscv
Andre_H has quit [Quit: Leaving.]
ahs3 has quit [Ping timeout: 256 seconds]
raghavgururajan has joined #riscv
freakazoid12345 has quit [Read error: Connection reset by peer]
freakazoid12345 has joined #riscv
ahs3 has joined #riscv
mahmutov has quit [Ping timeout: 260 seconds]
freakazoid343 has joined #riscv
freakazoid12345 has quit [Ping timeout: 260 seconds]
jwillikers has joined #riscv
jjido has joined #riscv
freakazoid12345 has joined #riscv
freakazoid343 has quit [Ping timeout: 240 seconds]
jwillikers has quit [Ping timeout: 252 seconds]
EchelonX has joined #riscv
jwillikers has joined #riscv
bauruine has quit [Remote host closed the connection]
freakazoid343 has joined #riscv
freakazoid343 has quit [Read error: Connection reset by peer]
freakazoid343 has joined #riscv
freakazoid12345 has quit [Ping timeout: 240 seconds]
jjido has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
<drmpeg> jimwilson: No longer at SiFive?
motherfsck has quit [Ping timeout: 252 seconds]
freakazoid12345 has joined #riscv
freakazoid343 has quit [Ping timeout: 240 seconds]
freakazoid343 has joined #riscv