<widlarizerEmilJT>
<mei[m]> "there's also JP2GMD, which is..." <- Sorry to be only contributing to off topic rn. I'm monitoring this chat and plan on playing with the code base soon. Just wanted to add there's (or was?) also six stars, space, three stars for "jebac PiS"
<mei[m]>
<widlarizerEmilJT> "Sorry to be only contributing to..." <- not related to the pope in any way though, is it?
<Wanda[cis]>
less relevant now that these assholes aren't in power, but yeah
<whitequark[cis]>
<mei[m]> "Catherine: to expand on this, it..." <- the intended invariant is "for every input there is always a case that matches", yes
<_whitenotifier-4>
[prjunnamed/prjunnamed] meithecatte 7ef998c - decision: Add more comments, 2
<_whitenotifier-4>
[prjunnamed/prjunnamed] meithecatte 7434c43 - decision: Use Rc::ptr_eq (NFC)
<whitequark[cis]>
<mei[m]> "oh and this comment doesn't..." <- uh, yes, you're right. i don't recall offhand exactly what was the issue there but it's worth looking at the commit that originally introduced the comment because it was a very specific inefficiency
<whitequark[cis]>
i think it was "unreachable in one of the branches of the decision tree" or something like that, but got decontextualized while i was writing it
<mei[m]>
hm, I think we need the invariant to be "there's a row of all X's" for it to stay valid after assume
<mei[m]>
wait, no, brain is dumb and stupid today
<whitequark[cis]>
i think the row of all X's is only ever removed if the match is provably full
<whitequark[cis]>
at which point it should be fine?
<mei[m]>
whitequark[cis]: you don't seem to ever even try to detect that
<whitequark[cis]>
it does happen to get detected if you have only one column
<whitequark[cis]>
and you eventually always have only one column due to how it's recursively examined
<whitequark[cis]>
since every decision tree branch eventually has two options it is a big deal to be able to have that since otherwise it blows up the size by a factor of 2 (if you do it naively)
<mei[m]>
whitequark[cis]: huh? which part does that?
<whitequark[cis]>
does normalize not do it any more?
<whitequark[cis]>
oh, you're right actually
<whitequark[cis]>
a previous version of the code did it but i removed it in favor of just using normalize since it was duplicating the whole logic
<whitequark[cis]>
hm. i wonder if you can construct a testcase for this that blows up in the SMT verifier
<mei[m]>
hm, I actually think that "there's a matching pattern for all outputs" is only possible to satisfy with a fallback case at the very end, due to the possibility of having X as input
<whitequark[cis]>
yes
<mei[m]>
i don't think you're using the incorrect formulation of the invariant anywhere
<whitequark[cis]>
iirc the way X-prop works for match cells is basically just whatever lets me kill as many rows and columns as possible, but I'm not sure it's actually optimal
<mei[m]>
i did reverse it from the smt code at one point further up in the chat
<mei[m]>
...i should probably write it down somewhere more permanent
<whitequark[cis]>
i think the way it was in my head is uh
<whitequark[cis]>
"an X in the input never matches either a 0 or 1 in the mask because this way we can kill as many branches of the decision tree as possible, but it inevitably has to match X in the mask"
<whitequark[cis]>
and the rest of the SMT rules are just something i wrote because it seemed like it makes sense and wouldn't be incorrect
<whitequark[cis]>
(Wanda told me to express it as a combination of logic cell semantics and I did just that, there was not much more thought put into it)
<whitequark[cis]>
so I think treating the SMT rules as a source of truth may not be ideal
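The matching rule quoted above, and the reason a trailing fallback row is needed once X can appear in the input, can be sketched as follows. This is a minimal illustration, not the prjunnamed implementation: `Trit`, `trit_matches`, `row_matches` and `first_match` are hypothetical names, and only the stated rule is modeled (an X in the input never matches 0 or 1 in the mask, but does match X in the mask).

```rust
/// One cell of an input or of a pattern row (hypothetical type for this sketch).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Trit {
    Zero,
    One,
    X,
}

/// Does a single input trit match a single mask trit?
fn trit_matches(input: Trit, mask: Trit) -> bool {
    match (input, mask) {
        // X in the mask matches anything, including an X input.
        (_, Trit::X) => true,
        // X in the input never matches a 0 or 1 in the mask.
        (Trit::X, _) => false,
        // Both sides are defined: they must be equal.
        (i, m) => i == m,
    }
}

/// A row matches when every column matches.
fn row_matches(input: &[Trit], row: &[Trit]) -> bool {
    input.len() == row.len()
        && input.iter().zip(row).all(|(&i, &m)| trit_matches(i, m))
}

/// Index of the first matching row, if any.
fn first_match(input: &[Trit], rows: &[Vec<Trit>]) -> Option<usize> {
    rows.iter().position(|row| row_matches(input, row))
}
```

With a single column and the rows `0` and `1`, every defined input matches some row, but an X input matches neither; only adding an all-X row at the end makes the match total under this rule.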
<whitequark[cis]>
<whitequark[cis]> "uh, yes, you're right. i don't..." <- so looking at this part again
<whitequark[cis]>
column 1 is being chosen reducing the match matrix to 0/1/X, and then nothing ever actually removes the X
<whitequark[cis]>
not sure about the "two rows" part
<whitequark[cis]>
yeah this is incredibly confusing to me too
<whitequark[cis]>
but there should be a testcase added with the condition i think
<whitequark[cis]>
i think the comment is just wrong
<whitequark[cis]>
mei: oh, can i ask you to do something while you're at it? because the decision pass was written across the point in time where we changed the bit order of constants, it has the confusing `h.pat` helper with the wrong bit order
<whitequark[cis]>
it should be replaced with Const::lit with the argument being a string in the reverse order
<_whitenotifier-4>
[prjunnamed/prjunnamed] meithecatte 1cb87e4 - decision: Add explanation to other tests without assertions
<whitequark[cis]>
i would strongly prefer commits like 35d922c to be folded into commits like 78e43d9
<whitequark[cis]>
basically, i want the history to be readable and not include too much noise; this greatly helps with investigations of the "why the fuck is it even like this" years later, of which i do lots in all sorts of codebases
<whitequark[cis]>
i do not have a strong opinion on whether semi-unrelated changes like 1cb87e4 should or should not be folded into others; i personally often do that but it's not a big deal either way
<mei[m]>
whitequark[cis]: (it would've been a drive-by fix on the other instance of that comment but I do agree that that would probably still be a bit better)
<whitequark[cis]>
whitequark[cis]: (i do it for the same reasoning as previous; it keeps the history easier to read, in my view. whether this is actually the case is arguable so i just leave it to the committer's taste)
<whitequark[cis]>
mei[m]: yeah i'm fine with those
<whitequark[cis]>
some of your documentation fixes include a bunch of disparate drive-by changes that are roughly grouped under the same umbrella and i'm happy with that
<whitequark[cis]>
as long as it reasonably resembles a single unit someone would want to think about i think it's good (it will not surprise you that i do not adhere to a lot of rigid rules here either)
<whitequark[cis]>
currently fighting with github actions to get langref published
<mei[m]>
what do you think about starting a benchmark collection? I'd do it but I have no familiarity with how to actually get other tools to emit UIR
<mei[m]>
(I have some ideas on how to emit better decision trees that I'd like to try out)
<whitequark[cis]>
in amaranth you do it by `from amaranth.back import unnamed; unnamed.convert(component)`
<whitequark[cis]>
(while using our fork)
<whitequark[cis]>
i think for benchmarking decision trees specifically it might be useful to construct match matrices directly, perhaps? that seems like it would involve the fewest steps between the testcase and the code that processes it
<whitequark[cis]>
there is undeniable utility in starting from the frontend for more complex cases but i think taking out the text parsing and the frontend's decisions (if applicable) is actually a benefit here, not a drawback
<whitequark[cis]>
(match matrices or match cells, at your discretion, whichever make more sense for the use case)
<mei[m]>
i mean, yeah, but this is not "i know a specific case that would work better", but "i wonder how this heuristic would fare on logic that occurs in the wild"
<whitequark[cis]>
hm. the way i usually do it is i look at logic that occurs in the wild, search for inefficiencies, extract the cases and then improve them one by one, but i can also think of the utility of using a bunch of different cases (let's say different levels of utilization of input space; 30% 50% 70% 90% 100%) with various assign chains to see the dynamics of a heuristic across these points
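One way to get synthetic inputs at the utilization levels mentioned above is a small generator like the sketch below. Everything here is an assumption made for illustration: `synthetic_rows` is a hypothetical helper, rows are plain strings of `0`/`1`/`X` rather than real match matrices, the bit order (most significant bit first) is arbitrary, and taking the first N input values is a deliberately simplistic choice that does not model assign chains.

```rust
/// Build a synthetic pattern list over `width` input bits in which roughly
/// `percent`% of the 2^width fully defined inputs get a row of their own,
/// followed by one all-X fallback row.
///
/// Rows are strings of '0'/'1'/'X', most significant bit first (an arbitrary
/// choice for this sketch).
fn synthetic_rows(width: u32, percent: u32) -> Vec<String> {
    assert!(width > 0 && width < 16 && percent <= 100);
    let total = 1u32 << width;
    // Number of covered inputs, rounded up.
    let used = (total * percent + 99) / 100;
    let mut rows: Vec<String> = (0..used)
        .map(|value| format!("{:0w$b}", value, w = width as usize))
        .collect();
    // Fallback row so that every input, including X, matches something.
    rows.push("X".repeat(width as usize));
    rows
}
```

Calling it for 30, 50, 70, 90 and 100 percent at a fixed width gives a series of inputs whose only difference is how much of the input space is covered.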
<whitequark[cis]>
it's true that fully synthetic examples fail to exercise the heuristics in interesting ways, but i'm also not sure how useful it is to use aggregate statistics over a bunch of (inevitably biased) examples in a benchmark set
<whitequark[cis]>
(it's good for writing papers for sure, but beyond that?)
<whitequark[cis]>
or, let me rephrase that
<whitequark[cis]>
i think it would be useful to have a repository of real UIR we can test our flow on to see if there are any major improvements or regressions but as a separate (git) repository that would be an aid to the developer rather than integrated into the workflow the way our existing testcases are
<whitequark[cis]>
if that is what you're after the answer is yes, we should just have it
<mei[m]>
yes, that's what I have in mind
<whitequark[cis]>
oh okay, i misunderstood at first then
<whitequark[cis]>
Wanda: should we kill the legacy branches? they could be turned into tags if needed, but the github PRs are going to preserve the diff for a dead branch regardless
<whitequark[cis]>
(either way I think the value at this point is very limited)
<mei[m]>
<whitequark[cis]> "can you do git tag in it?" <- nope. I cloned the repo from prjunnamed/amaranth, looks like that fork doesn't have any tags?
<whitequark[cis]>
oh. yeah. lemme fix that
<whitequark[cis]>
why the hell does github not move tags over on fork...
<mei[m]>
thoughts on generating the API docs with --document-private-items?
<whitequark[cis]>
unsure. i think the API docs are already plenty confusing
<Wanda[cis]>
not really sure
<Wanda[cis]>
we don't really have "public" api in the first place
<Wanda[cis]>
will we?
<whitequark[cis]>
we kind of do though
<whitequark[cis]>
in that e.g. the internals of the netlist crate are actively concealing themselves so that they would not be relied on too much
<whitequark[cis]>
i think it's fine to introduce the barrier of "you have to open the source code" to see private items because they're... private
<whitequark[cis]>
that or lots of #[doc(hidden)]
<Wanda[cis]>
good point
<mei[m]>
otoh currently we have top-level comments on files whose contents are only public via reexport, so the docs aren't visible even though they are relevant
<whitequark[cis]>
well you made them top-level doc comments
<whitequark[cis]>
i was confused why, given the obvious visibility issues
<whitequark[cis]>
i think it's perfectly fine to have a long comment that isn't a doc-comment and isn't ever rendered in cargo doc
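The visibility issue under discussion can be reproduced in a few lines. This is a sketch with made-up names (`decision_impl`, `build_tree`, `internal_helper`), not code from the repository; it relies only on standard rustdoc behavior: private modules get no page unless `--document-private-items` is passed, items re-exported from a private module are documented at the re-export, and `#[doc(hidden)]` keeps a public item out of the rendered output.

```rust
// Private module: rustdoc does not generate a page for it by default.
mod decision_impl {
    //! Module-level doc comment on a private module. Without
    //! `--document-private-items` this text is not rendered anywhere,
    //! even though `build_tree` below is public via the re-export.

    // A plain (non-doc) comment: never rendered, only visible in the source.

    /// Item-level docs travel with the item and are shown at the re-export.
    pub fn build_tree() -> u32 {
        42
    }
}

// The only public path to `build_tree`.
pub use decision_impl::build_tree;

/// Public so that other crates can call it, but kept out of the rendered docs.
#[doc(hidden)]
pub fn internal_helper() -> u32 {
    build_tree() + 1
}
```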
<whitequark[cis]>
generally speaking i do not believe in the "shove all of the docstrings into an html file in a random ass order" school of writing documentation because it sucks
<whitequark[cis]>
i think documentation needs to be curated, so that whatever surfaces in the html is something that is intentionally placed there because it has meaning within the surrounding context
<whitequark[cis]>
(this is how amaranth docs are written)
<whitequark[cis]>
rustdoc doesn't really make that possible but it needs to be approximated at least
<whitequark[cis]>
I think we should do s/A synchronous/Synchronous/g
<whitequark[cis]>
grammar be damned
<Wanda[cis]>
sorry
<Wanda[cis]>
mm
<Wanda[cis]>
yes
<Wanda[cis]>
or pick other terms?
<Wanda[cis]>
combinational? registered?
<whitequark[cis]>
combinational/registered seem ok to me
<whitequark[cis]>
hi Staf!
Chips4MakersakaS has joined #prjunnamed
<Chips4MakersakaS>
If I try to add catircservices.org to the list of servers in element it says there is no access to the public room list. Is this on purpose ?
<whitequark[cis]>
I have no idea what adding it to the list of servers in Element (is this Element X?) would achieve, or where that UI element even is
<whitequark[cis]>
however I can say that catircservices.org is, well, that: my IRC services. it's not a general purpose Matrix server
<Wanda[cis]>
should it be codepoints? utf-8 bytes?
<Wanda[cis]>
... utf-16 units?
<whitequark[cis]>
very specifically codepoints
<whitequark[cis]>
absolutely not bytes
<whitequark[cis]>
almost no matter what character set you're using, you can do a (load-bearing mostly) unique and reversible transformation to Unicode
<Wanda[cis]>
feels like it needs rationale
<whitequark[cis]>
yes, there are cases which are fucked, but given that we aren't going to bundle ICU with prjunnamed you're screwed anyway
<whitequark[cis]>
Wanda[cis]: "it's unambiguous, fast in common case of UTF-8, and accommodating unusual cases if needed"
<whitequark[cis]>
which is like the default rationale for anything so I didn't put it into the text
<whitequark[cis]>
absolutely nobody is interested in counting UTF-16 code units except for JavaScript developers who will then put a single emoji in and it will break everything
<whitequark[cis]>
and doing it when all you have is UTF-8 involves an unacceptable amount of useless ceremony
<galibert[m]>
UTF-16 is really, really annoying
<whitequark[cis]>
UTF-8 bytes is basically as good as UTF-8 code points in terms of ambiguity, except it completely breaks the ability of e.g. VS Code to open a link with a column number in it from the terminal or something (or other similar uses) when non-latin characters are used
<whitequark[cis]>
so it's strictly inferior
<whitequark[cis]>
I don't actually know how VS Code or other editors count columns when astral plane characters are involved because nobody documents this shit, but nothing I know counts bytes
<jix>
I've always assumed that whatever uses utf-16 units as column numbers mostly does so by accident, and now it's too late to change it, rather than by design
<whitequark[cis]>
yeah...
<galibert[m]>
CJK tends to be double width, which adds fun to the fun
<galibert[m]>
when fixed width that is, of course
<whitequark[cis]>
also, nothing I know uses extended grapheme clusters
<whitequark[cis]>
I believe that the only reasonable interpretation of column numbers in a world with combining and double-width characters is to let most tools count code points and then let whatever is doing the presentation (diagnostics system or code editor) convert it into terminal columns or editor ranges or whatever
<galibert[m]>
agreed
<galibert[m]>
I count code points too
<galibert[m]>
bytes is too iffy
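To make the difference between the three candidate units concrete, here is a small sketch that computes the column of a byte offset in each of them using only the Rust standard library. `Columns` and `columns_at` are hypothetical names for this illustration; columns are 0-based.

```rust
/// Column of a position within a single line, in three different units.
#[derive(Debug, PartialEq, Eq)]
struct Columns {
    bytes: usize,
    code_points: usize,
    utf16_units: usize,
}

/// Compute the 0-based column of `byte_offset` within `line`.
///
/// Returns `None` if the offset is past the end of the line or falls in the
/// middle of a code point (`str::get` checks both).
fn columns_at(line: &str, byte_offset: usize) -> Option<Columns> {
    let prefix = line.get(..byte_offset)?;
    Some(Columns {
        bytes: prefix.len(),
        code_points: prefix.chars().count(),
        utf16_units: prefix.encode_utf16().count(),
    })
}
```

For the line `héllo 😀 x`, the `x` is at column 12 in bytes, 8 in code points and 9 in UTF-16 code units: `é` takes two bytes, and the emoji takes four bytes or two UTF-16 units while being a single code point.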
<whitequark[cis]>
the VS Code API docs feature something that is worse than useless:
<whitequark[cis]>
galibert[m]: bytes is like *fine* (it's convenient for certain applications and with UTF-8 you can easily find out whether you're in the middle of a code point or not) but it just doesn't work well with the ad-hoc "yeah just add 1 to this and print it to the terminal" use of source locations
<galibert[m]>
I'd bet it's a utf-16 character count, forgetting surrogates exist
<jix>
(I'm assuming that happened and that this means vscode can work with any of those if you tell it what you're using, I might be wrong with both here)
<galibert[m]>
microsoft got unlucky on that one, they did their new, unicode-aware api while unicode was being created... and then people realized soon after that 65536 code points weren't going to cut it
<whitequark[cis]>
jix: I think that only applies to language servers, I've no idea what extensions use but it doesn't get negotiated like that
<jix>
so the previous version of the LSP spec does use utf-16 units only, so I at least guessed that right
<whitequark[cis]>
mei: any ideas about improving source location transformation for `match` cells?
<whitequark[cis]>
right now they're technically accurate but not very useful
<whitequark[cis]>
i think maybe it's worth transferring the source metadata for the decision tree branches that actually influence a specific output bit, or something like that, but no idea how useful this is
<_whitenotifier-4>
[prjunnamed/amaranth] whitequark a0a5cab - back.unnamed: fix syntax of `match` cell with empty value.
<mei[m]>
<whitequark[cis]> "it's fascinating to me just..." <- oh! so it's like that for you too! from what i saw from the quality of documentation in your projects I just assumed that you find it much easier than I do
<mei[m]>
<whitequark[cis]> "mei: any ideas about improving..." <- what's the current granularity of the input? one srcloc per cell?