<sorear>
RV32 capabilities aligned mod 4 with a length between 256 and 508 can be encoded in two redundant ways (T8=1 EF=1 or EF=0 E=0)?
mlw has quit [Ping timeout: 252 seconds]
mlw has joined #riscv
<jrtc27>
if EF=1 then E includes T8?
<jrtc27>
need to think some more tomorrow
<jrtc27>
yeah ok, right
<jrtc27>
T[EW / 2 - 1:0] = TE
<jrtc27>
B[EW / 2 - 1:0] = BE
<jrtc27>
if you try and make TE and BE big enough to make E=0, then the low bits of at least one of T and B are non-zero
<jrtc27>
if you try and make TE and BE 0 to make it match the EF=0 case you're trying to redundantly encode, then E will be > 0 and your LMSB gets shifted up
<jrtc27>
once this is in Sail (I don't know why it isn't to be honest, it's not that hard...) it can easily be model checked to ensure there aren't two non-malformed capabilities that mean the same thing
<sorear>
I'm adding a note to my feedback to treat E=0 EF=0 XLEN=32 as malformed bounds, E=-1 EF=0 already is and I'm just moving the compare by one
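A minimal sketch of the kind of exhaustive check discussed above, assuming a toy 5-bit format with a 2-bit exponent and a 3-bit mantissa. This is NOT the CHERI bounds encoding; `find_redundant`, `FIELDS`, and the decoders are hypothetical names used only for illustration. It shows how a brute-force search finds two well-formed encodings with the same meaning, and how declaring some encodings malformed can remove the redundancy.

```python
from collections import defaultdict
from itertools import product

def find_redundant(fields, decode, is_malformed):
    """Enumerate every field combination and return the decoded meanings
    that have more than one well-formed encoding."""
    seen = defaultdict(list)
    for enc in product(*(range(1 << width) for width in fields)):
        if not is_malformed(enc):
            seen[decode(enc)].append(enc)
    return {meaning: encs for meaning, encs in seen.items() if len(encs) > 1}

# Toy format (an assumption, not the real encoding): (exponent bits, mantissa bits).
FIELDS = (2, 3)

def decode_no_hidden_bit(enc):
    # length = mantissa shifted by exponent; many encodings collide
    e, m = enc
    return m << e

def decode_hidden_bit(enc):
    # e == 0 is the "subnormal" range; otherwise an implicit leading one is added
    e, m = enc
    return m if e == 0 else (8 | m) << (e - 1)

def never_malformed(enc):
    return False

def unnormalised_is_malformed(enc):
    # Treat e > 0 with a clear top mantissa bit as malformed
    e, m = enc
    return e > 0 and m < 4

redundant = find_redundant(FIELDS, decode_no_hidden_bit, never_malformed)
print(sorted(redundant))  # meanings that have more than one encoding
```

With `unnormalised_is_malformed` passed as the predicate, or with `decode_hidden_bit` as the decoder, the same search over this toy format finds no collisions.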
mlw has quit [Ping timeout: 264 seconds]
<sorear>
(I've finally figured out what the encoding and representability is supposed to be, it's written far more complicated than it is)
<jrtc27>
E=0 IE=1 in normal CHERI Concentrate is a normal thing to have
mlw has joined #riscv
<jrtc27>
not sure if the T8 situation changes that
<jrtc27>
but my instinct is it shouldn't
<sorear>
T8 doubles the "subnormal" range from [0,255) to [0,511) completely covering and AFAICT eliminating the need for E=0 normals which are [256,511)
* muurkha
is in the subnormal range
<jrtc27>
oh hm this is the EF=1 case which means E is always 0
<jrtc27>
maybe you're right then
heat_ has quit [Remote host closed the connection]
<sorear>
hypothesis: the post-Moore chasing of basis points on cost efficiency and energy efficiency will eventually lead to computer systems where a majority of memory is undervolted to the point of being noticeably unreliable; software will need to adapt to this situation and not store capabilities or pointers in unreliable memory
heat_ has joined #riscv
<muurkha>
sorear: that's an interesting idea, but wouldn't it be more sensible to use ECC?
<sorear>
operationally: "cacheable but not tagged" is a PMA combination that is meaningful and likely to become and remain common, not just in weird transitional and CXL setups
<jrtc27>
that would be a fundamental paradigm shift
<muurkha>
it's already the case that a majority of memory is noticeably unreliable if you overheat it or rowhammer it
<muurkha>
and sandbox escapes with heat lamps and hair dryers have been demonstrated
<jrtc27>
if you have memory that gets corrupted all bets are off today, you don't need capabilities for that
<sorear>
normal ECC (36/72-bit Hamming code) isn't strong enough to be useful under adversarial conditions and the codes that are are noticeably expensive
<muurkha>
(in the current context of sandboxes mostly being used against users, this is fortunate for, though I know jrtc27 really doesn't like me talking about this, fundamental human rights)
<sorear>
it's not clear whether the physical access game is a win for the attacker or the defender. biology has been playing it possibly since before there were cells with no clear consensus
<muurkha>
agreed
<muurkha>
hmm, I realize I don't actually know how cheap you can make stronger ECC. but I think it's plausibly easier if you can slap the ECC on top of some kind of larger block transfer, like a 16-byte cache line or 2048-byte page, which is what NAND Flash chips routinely do already
<sorear>
it's mainly a function of block size and latency
<muurkha>
but how much latency does hardware Reed-Solomon decoding necessarily impose on a memory read?
<muurkha>
I mean, plausibly you don't want to do something like Gallager codes which can take a variable amount of time to decode
<sorear>
remember that most reads are of recently written data (compared to a NAND chip that's been sitting unpowered on a shelf for a year, anyway) with no bitflips, so you mostly only need to _optimise_ the zero bit error case
<sorear>
all widely used binary codes are linear, so you can check if the codeword is valid with a limited number of XOR gates
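A small sketch of the validity check described above, using the Hamming(7,4) code as a stand-in (an assumption chosen for brevity; the 72-bit codes mentioned earlier work the same way with a larger matrix). Each syndrome bit is an XOR of a fixed subset of the codeword bits, and an all-zero syndrome means the word is a valid codeword. The names `H`, `syndrome`, `is_valid`, and `encode` are illustrative only.

```python
# Parity-check matrix for Hamming(7,4). Column i (0-based) is the binary
# representation of i + 1, least significant bit in the first row.
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def syndrome(word):
    """Compute H * word over GF(2): each output bit is an XOR of selected input bits."""
    out = []
    for row in H:
        s = 0
        for h, b in zip(row, word):
            s ^= h & b
        out.append(s)
    return out

def is_valid(word):
    """A word is a codeword exactly when every syndrome bit is zero."""
    return not any(syndrome(word))

def encode(data):
    """Place 4 data bits at positions 3, 5, 6, 7 and parity bits at positions 1, 2, 4."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]

codeword = encode([1, 0, 1, 1])
print(is_valid(codeword))  # the fast path: no error, nothing more to do

corrupted = list(codeword)
corrupted[4] ^= 1  # flip the bit at 1-based position 5
s = syndrome(corrupted)
print(s[0] + 2 * s[1] + 4 * s[2])  # 1-based position of the single flipped bit
```

Only the zero-syndrome test sits on the common path; locating and correcting the error can be slower, since it only runs when the check fails.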
<muurkha>
I wonder if you could do the ECC during DRAM refresh
mlw has quit [Ping timeout: 256 seconds]
<sorear>
LDPC is problematic for a security feature because there's no rigorous theory, there's extensive experimental data but that can't observe events with probability below 2^-64 or so
<sorear>
periodic scrubbing is a normal feature of ECC systems. combining it with LPDDR won't work because your codewords are spread out over multiple chips, and refresh cycles take place inside each chip with the pin drivers off to conserve power, I think modern DDR works the same way
<sorear>
I vaguely recall GDDR storing cache lines in a single chip each, but that's a very different latency/throughput/power tradeoff
<muurkha>
DRAM has to "scrub" through all its pages every few milliseconds to refresh it by delivering it from the array to its internal sense amplifiers
<muurkha>
which re-up the charge on the capacitors
<muurkha>
in order to remain reliable at the rated temperature range (though as the cold boot attack showed, they remain disturbingly reliable for orders of magnitude longer than that at low temperatures)
<sorear>
many chips these days have internal ECC to optimise the retention/BER tradeoff, and you can scrub that, but "ECC" normally refers to end-to-end ECC which needs to be split between chips so that a dead chip doesn't cause data loss
<muurkha>
in the case of this chip, if I'm reading this right, the rated refresh interval is 7.8 milliseconds
<muurkha>
at 0.75 nanoseconds per clock cycle, if you were doing a refresh every clock cycle, you'd refresh the whole chip in 390μs, which is a lot less than 7.8 milliseconds
<muurkha>
so I think you could quite plausibly design in an ECC circuit which detects bit errors during refresh and rewrites the corrected data to the page
<muurkha>
and it could operate over an entire two-kilobyte page
<muurkha>
you wouldn't want to do this off-chip because it would require wiring the 16384 outputs of the sense amplifier lines off-chip to the ECC circuitry
<muurkha>
does that make sense? I think I'll stick this in pavnotes2. how should I call you there, sorear?
<sorear>
you are literally describing how DDR5 works, I don't know how it's exposed in the spec but "in-chip ECC, not the same as ECC on the wire" is a builtin feature
<muurkha>
wow, I had no idea, thanks for telling me :)
<muurkha>
I guess I won't write it in pavnotes2 anyway
<sorear>
(sorear) sure
<muurkha>
*in that case
<muurkha>
but in that case wouldn't we not have to worry about making memory noticeably unreliable because of undervolting?
<muurkha>
we can always just use more ECC
<sorear>
most bus protocols these days support critical-word-first - if I read miss word 5 of an 8-word cache line, the data comes back from RAM in order 5,6,7,0,1,2,3,4 or 5,4,7,6,1,0,3,2 and stays in that order all the way to the load/store unit, saving a few cycles
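The two return orders quoted above can be reproduced with a short sketch: the first is a wrapping increment from the critical word, the second XORs the beat index into the critical word's index. The function names `wrap_order` and `interleaved_order` are hypothetical, and which order a given bus actually uses is not something this sketch claims.

```python
def wrap_order(critical, line_words=8):
    """Start at the critical word and increment, wrapping at the end of the line."""
    return [(critical + beat) % line_words for beat in range(line_words)]

def interleaved_order(critical, line_words=8):
    """XOR the beat number into the critical word index.
    Assumes line_words is a power of two so every index stays in range."""
    return [critical ^ beat for beat in range(line_words)]

print(wrap_order(5))         # [5, 6, 7, 0, 1, 2, 3, 4]
print(interleaved_order(5))  # [5, 4, 7, 6, 1, 0, 3, 2]
```

In both orders the missed word arrives on the first beat, which is where the saved cycles come from.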
<muurkha>
nice
<muurkha>
I should get to work on some actual hardware rather than fantasy inaccessible hardware (I'm a long way from being able to fab DRAM)
<sorear>
if you're doing ECC or MAC over the entire cache line, or longer, you need to either force the entire line to wait in the memory controller, or have some mechanism to tell the core "that word I gave you ten cycles ago wasn't actually valid, please pipeline replay" which is not a feature that exists in AMBA/ACE or TileLink, although IF/HT might have it
<muurkha>
oh, I wasn't suggesting doing it at read time
<muurkha>
I was suggesting doing it at refresh time
<muurkha>
every 7.8 milliseconds or whatever
heat_ has quit [Remote host closed the connection]
mlw has joined #riscv
heat_ has joined #riscv
<muurkha>
it's still possible for a bit to flip undetected in the 7.8 milliseconds since the last refresh, but it would almost surely have to be a single bit, no? which the 72-bit Hamming code will have no trouble correcting, once Intel stops using that as price discrimination
<muurkha>
(and even if Intel does, you can still do it in the RAM chip)
heat_ has quit [Ping timeout: 264 seconds]
BootLayer has quit [Quit: Leaving]
alexghiti has joined #riscv
notgull has quit [Ping timeout: 264 seconds]
<sorear>
jrtc27: i'm going to finish reading the draft, do an edit pass, remove things that are already reported and make one or more issues tomorrow or friday but if you want an early look https://gist.github.com/sorear/f248aef96641a010c5d2eee848e600e9
<jrtc27>
on "mepcc need never hold a sealed capability." specifically: yes it absolutely does; morello screwed this up and didn't allow it, which means cheribsd has some gross workarounds to emulate unsealing sentries in celr_el1
<muurkha>
oops
<jrtc27>
there are various cases where privileged software gets a function pointer from userspace, which is a sentry
<jrtc27>
thread creation, set_context and signal handlers all need to mess with that on morello
<jrtc27>
(specifically called out in 3.9 Sealed Entry Capabilities of ISAv9 because Arm screwed this up / based Morello on an earlier CHERI-MIPS that also made this mistake before we realised and then fixed it)
<jrtc27>
and re zcheri_legacy, no, ddc is handled like any other user-accessible register that affects S-mode in S-mode's trap handler
<jrtc27>
it just saves it and switches context as needed
<jrtc27>
(in purecap)
<jrtc27>
since the S-mode OS is capability-aware
<jrtc27>
(and if it's not then it shouldn't have enabled capability use even for itself in the first place, so there is no CHERI)
<jrtc27>
other things I agree with, need more time to think about, or disagree with, but those need a longer response than can be given here and now
Kyuvi has joined #riscv
zBeeble42 has joined #riscv
zBeeble has quit [Ping timeout: 240 seconds]
Kyuvi has quit [Ping timeout: 250 seconds]
MaxGanzII_ has joined #riscv
markh has quit [Remote host closed the connection]
shamoe has quit [Quit: Connection closed for inactivity]
ZipCPU has quit [Ping timeout: 260 seconds]
ZipCPU has joined #riscv
Stat_headcrabed has joined #riscv
Stat_headcrabed has quit [Client Quit]
Stat_headcrabed has joined #riscv
davidlt has joined #riscv
heat_ has joined #riscv
davidlt has quit [Remote host closed the connection]
davidlt has joined #riscv
ldevulder has joined #riscv
jobol has joined #riscv
Stat_headcrabed has quit [Ping timeout: 246 seconds]
markh has joined #riscv
danilogondolfo has joined #riscv
Stat_headcrabed has joined #riscv
Stat_headcrabed has quit [Client Quit]
jacklsw has quit [Ping timeout: 264 seconds]
Andre_Z has joined #riscv
davidlt has quit [Ping timeout: 264 seconds]
pecastro has joined #riscv
Andre_Z has quit [Quit: Leaving.]
<sorear>
jrtc27: (sentries) i did not realize that sentries were the intended ABI for function pointers, partly since I'm intentionally reviewing this mostly without reference to ISAv9, partly since cjalr accepts both, partly since there's no CADDISEAL but materializing a function pointer is rare enough that it doesn't matter if it takes three instructions. OK.
crossdev has joined #riscv
<sorear>
jrtc27: (zcheri_legacy) ah, but our caps-naive S-mode OS _didn't_ enable capability use for itself, it's running with menvcfg.CME=0 ... which isn't enough to prevent U-mode access to ddc, only changing UXL does that
davidlt has joined #riscv
<sorear>
challenge: what's the minimum number of representability checkers you need to add to a simple pipelined Zcheri+MSU, other than the obvious one in the execution stage to handle CINCOFFSET, load/store offsets, etc.? depressingly high
czy has quit [Remote host closed the connection]
czy has joined #riscv
notgull has joined #riscv
ezulian has quit [Quit: ezulian]
ezulian has joined #riscv
psydroid has joined #riscv
MaxGanzII_ has quit [Ping timeout: 240 seconds]
alexghiti has quit [Ping timeout: 256 seconds]
notgull has quit [Ping timeout: 264 seconds]
anonpreet has joined #riscv
anonpreet has quit [Remote host closed the connection]
anonpreet has joined #riscv
KREYREN__ has quit [Remote host closed the connection]
KREYREN__ has joined #riscv
Stat_headcrabed has joined #riscv
ntwk has joined #riscv
Stat_headcrabed has quit [Client Quit]
Stat_headcrabed has joined #riscv
Stat_headcrabed has quit [Client Quit]
Stat_headcrabed has joined #riscv
Stat_headcrabed has quit [Client Quit]
anonpreet has quit [Remote host closed the connection]
maxinux has quit [Quit: Brb]
jmdaemon has quit [Ping timeout: 256 seconds]
shamoe has joined #riscv
<sorear>
review complete, gist updated, filing issues now
MaxGanzII_ has joined #riscv
MaxGanzII_ has quit [Remote host closed the connection]
MaxGanzII_ has joined #riscv
MaxGanzII_ has quit [Remote host closed the connection]
hightower2 has quit [Ping timeout: 276 seconds]
ntwk has quit [Read error: Connection reset by peer]