<discocaml>
<cod1r> i mean it's probably just me having a skill issue but i wish there was some kind of default that was better
user_ has quit [Remote host closed the connection]
<discocaml>
<cod1r> like if it can infer the type based on usage and things being passed into the function surely better things can happen but idk. i've never written an optimizer b4
<dh`>
should you be curious the search terms you want are "unboxing" and "devirtualization" :-)
agentcasey has quit [Read error: Connection reset by peer]
<discocaml>
<doraent> How does this happen ? I feel like the only part the compiler has to do is infer the types, and the algorithm does not seem that slow if this is something like W. Is the type inference algorithm slower than what I think, or am I missing a part of the process ?
<discocaml>
<octachron> This is the performance of the compiled executable: a non-polymorphic float function can be more optimized than a boxed polymorphic function.
<discocaml>
<doraent> ok
<discocaml>
<contificate> > does not seem that slow if this is something like W
<discocaml>
<contificate> lol, if this is referring to algo W, I have bad news for you
<discocaml>
<doraent> understood, thanks
<discocaml>
<doraent> why ?
<discocaml>
<contificate> algorithm W is effectively defined by its reliance on a pure way of doing substitutions, which tends to mean reifying, composing, and applying them everywhere
<discocaml>
<contificate> it's not exactly prohibitively expensive, but it is bad
<discocaml>
<contificate> real implementations, such as the core of OCaml's typechecker, are more akin to algorithm J, whereby unifications are done as part of making types induce a union-find forest
<discocaml>
<contificate> all that said, you can still easily feed pathological input to the algorithms, regardless of the choice here
<discocaml>
<contificate> but W is only reserved for Haskellers, not real implementations
<discocaml>
<contificate> anyway, that point is unrelated to the discussion, I just always have to come down hard on any mention (or, worse, recommendation) of W
<discocaml>
<contificate> algo W is for formalising in proof assistants
Haudegen has quit [Quit: Bin weg.]
bartholin has joined #ocaml
myrkraverk has joined #ocaml
Haudegen has joined #ocaml
myrkraverk_ has quit [Ping timeout: 248 seconds]
<dh`>
contificate: is there a good summary of that somewhere? I don't know as much about it as I probably ought
wbooze has quit [Quit: Leaving]
<discocaml>
<contificate> summary of which part
<discocaml>
<contificate> frankly, it is a kind of historical error that people learn of algorithm W at all
<discocaml>
<contificate> Milner's own paper presents it for proof purposes and then present algorithm J as the simpler alternative
<discocaml>
<contificate> simpler, less error-prone, and more efficient
<discocaml>
<contificate> my theory is that Haskellers have popularised W because it allows them to play with toys a bit more
<discocaml>
<contificate> there's a faux mathematical quality to representing substitutions, composing them, and ineffeciently applying them to the entire type environment
<discocaml>
<contificate> it's hard to talk about performance of HM inference too formally, because, on paper, the worst case bounds are not good at all - but, whenever someone says "practically linear", they mean: (1) not pathological input like exponentially large product types being generalised, (2) algorithm J with union-find using ranks and path compresion + not scanning env for generalisation (Remy levles)
<discocaml>
<contificate> it's hard to talk about performance of HM inference too formally, because, on paper, the worst case bounds are not good at all - but, whenever someone says "practically linear", they mean: (1) not pathological input like exponentially large product types being generalised, (2) algorithm J with union-find using ranks and path compresion + not scanning env for generalisation (Remy levels)
dstein64- has joined #ocaml
dstein64 has quit [Ping timeout: 244 seconds]
dstein64- is now known as dstein64
<discocaml>
<octachron> "Practically linear" is probably better explained as quasi-linear in the size of types (seen as trees and not digraphs?) + the hypothesis that type sizes are bounded in practically programs (rather than doubly exponential in the size of the code source)
<dh`>
I found someone's lecture notes on it
<dh`>
I don't think I've ever seen the original paper
<discocaml>
<contificate> original paper is very readable
<discocaml>
<contificate> 1977 and he makes brief allusion to the technique of introducing a coercion/cast operation to do things like implicit integer widening etc. 🙏
<dh`>
thanks
<dh`>
hmm
<dh`>
naively, I would have thought it's better to apply substitutions immediately because then you don't have to do that work over again
<dh`>
but I see none of this involves updating terms, or terms that carry types
<discocaml>
<contificate> doing it immediately and pervasively is how algorithm J works
<discocaml>
<contificate> algorithm W uses a unification algorithm those output is actually a unifying substitution, so it has to propagate the information around, which kind of annoyingly emulates destructive/ambient state in the simple J impl
<dh`>
the last time I wrote one of these from scratch was quite a while ago, but I remember being careful to make sure that the substitution environment always contained the best available interpretation for each a (and not a chain of substitutions a1 -> a2 -> a3 -> t)
<dh`>
which is I guess approximately the same thing
<dh`>
also I have never understood why these things don't sometimes generate circular substitutions if you don't take the trouble to prevent it
Serpent7776 has quit [Ping timeout: 252 seconds]
<dh`>
(the last one I wrote maintained an invariant that if the substitution environment contains ai -> aj, j < i
<dh`>
)
<discocaml>
<contificate> well, certain inputs would produce circular substitutions
Anarchos has joined #ocaml
<discocaml>
<contificate> in the sense that you need the occurs check to avoid infinitely sized types
<dh`>
the occurs check isn't sufficient to avoid a case where S contains a1 -> a2 and a2 -> a1
<Anarchos>
If the ocaml testsuite fails for 4 tests, may i still send a PR to reactivate the native compilation for a platform ?
<dh`>
and that is exactly what one would expect to get in a naive implementation typechecking assignments x := y and y := x
<discocaml>
<contificate> well it is, because it traverses the induced relation from the rewriting of types vars to be links to other types
<discocaml>
<contificate> avoiding cycle creation is what the occurs check is
<dh`>
that's a different kind of cycle
<discocaml>
<contificate> what is the problem of doing `x := y` and `y := x`, I will assume you wrote this as pseudocode for assignment, and not OCaml refs where it's obviously wrong
<octachron>
Anarchos, yes it is better to send a PR asking to re-enable native compilation on Haiku rather than struggling silently for years.
<dh`>
I guess if all you have is x := y and y := x, and you generate tyvars a and b for x and y, you first get the substitution a -> b and then the second assignment just unifies b with itself
<discocaml>
<contificate> well, you unify representatives
<dh`>
(yes, that's just a generic assignment)
<Anarchos>
octachron thanks, cause i am totally unable to find the root causes of those failed tests
<discocaml>
<contificate> there's really no problem with those assignments
<discocaml>
<contificate> if they're physically equal type terms, unification succeeds by doing nothing
<discocaml>
<contificate> you don't create a cycle to link one unresolves tvar to itself, it's manifestly equal to itself
<dh`>
well sure
<dh`>
the question is whether you can get two halves of the cycle separately from separate but interrelated subterms
<dh`>
and then end up pasting them together
<dh`>
it's not clear that you can, but it's also not obvious that you can't, especially as you start using more exotic constructs in your syntax
<discocaml>
<contificate> the cycle just never gets formed though
<dh`>
algorithm J clearly can't do this because you thread a single substitution through everything
<discocaml>
<contificate> that's what the paper says but it's not exactly accurate
<dh`>
and never compose substitution results
<discocaml>
<contificate> you actually just rewrite types on the fly
<discocaml>
<barconstruction> I learned about the occurs check for the first time the other day while writing a logic program. Prolog is "unsound" and so one can construct "terms" that contain cycles, similar to cyclic lists in OCaml constructed with let rec, but depending on the application it may be desirable to forbid cyclic terms and so there is an occurs check during unification.
<discocaml>
<barconstruction>
<discocaml>
<barconstruction> This kind of sucks because my implementation of a red black tree doesn't appear to support insertion in time bounded by the depth of the tree. It seems that the occurs check is traversing the entire tree to see if unification is sound.
<discocaml>
<contificate> so each type var `'a` is actually boxed so it can be rewritten from being "unresolved" to become a link to another type, so `unify('a, int)` literally will update all type terms holding a ref to `'a` to contain a link to `int`
<discocaml>
<contificate> that's how it's done concretely
<discocaml>
<barconstruction> (This is in Elpi, one of the tactic languages for Rocq)
<dh`>
but it's much less obvious with algorithm W
<discocaml>
<contificate> because this is effectively a bunch of equality constraints among subterms, you probably could blast STLC to a simple SMT solver for equality - which, as you may guess, is implemented using union-find
<discocaml>
<contificate> algorithm W is just a joke
<discocaml>
<contificate> the idea of actually manually composing substitutions and getting it all right is just crazy
<discocaml>
<contificate> indeed, Prolog engines (like WAM) omit the occurs check by default
<discocaml>
<contificate> but many impls let you turn it on I'm sure
<discocaml>
<contificate> my understanding of Prolog programming is that it can devolve into manually pruning things with cut
<discocaml>
<contificate> into a kind of strange declarative imperative programming task
<discocaml>
<barconstruction> I don't really understand cuts yet because I don't understand exactly where choice points arise.
<discocaml>
<barconstruction> Elpi has cuts and I was able to get some performance gains from using them but the runtime of insert is definitely still growing faster than the depth of the tree. Disabling the occurs check makes it 10x faster
<discocaml>
<eval.apply> if you don't want occurs check bog you down, you do it like the ocaml typechecker where all subroutines that need to avoid loops incrementally check for cycles, and then do full occurs check once at the end
<discocaml>
<contificate> better yet, elide it entirely 💯 I make no mistakes
<discocaml>
<eval.apply> good thing too, because that would create a cycle
<pgiarrusso>
dh`: IIRC, to avoid the a->b and b->a cycle, you can apply the substitution you have built to new types before adding them to the substitution
<pgiarrusso>
TAPL's textbook unification (just looked up) instead applies the substitutions you're building to remaining constraints, which is a similar idea (not 100% sure my recollection is also correct, even if it seems so)
Tuplanolla has joined #ocaml
agentcasey has quit [Quit: ZNC 1.10.x-git-38-e8c4cda0 - https://znc.in]
troydm has quit [Ping timeout: 260 seconds]
<discocaml>
<contificate> it just doesn't happen in the algo J case with union-find
<discocaml>
<contificate> it's like.. suppose you induce a link from `ty(a)` to `ty(b)`, then each has been unified with the other and must be compatible with any context which the other finds itself, so it's like.. if one or either is a type variable, you will chase up to that and conclude that the representatives are physically equal anyway
agentcasey has joined #ocaml
pi3ce has joined #ocaml
ygrek has quit [Remote host closed the connection]
euphores has quit [Quit: Leaving.]
sroso has joined #ocaml
euphores has joined #ocaml
Anarchos has quit [Quit: Vision[]: i've been blurred!]
<discocaml>
<eval.apply> `find` is applying the substitution in that case
<discocaml>
<contificate> well sure, but it's just a detail that doesn't come up in the scenario cited, so I wasn't really following what the problem with the dual assignments example was
Serpent7776 has joined #ocaml
<discocaml>
<contificate> the state of the union-find forest is that all links are represented by their representative, so the substitution is done when the link is made, merely chasing it is just ensuring correctness
Anarchos has joined #ocaml
Haudegen has quit [Quit: No Ping reply in 180 seconds.]
Anarchos has quit [Quit: Vision[]: i've been blurred!]
eilvelia has quit [Ping timeout: 260 seconds]
eilvelia has joined #ocaml
agentcasey_ has joined #ocaml
agentcasey has quit [Ping timeout: 268 seconds]
remexre has quit [Ping timeout: 245 seconds]
remexre has joined #ocaml
Tuplanolla has quit [Quit: Leaving.]
ygrek has quit [Remote host closed the connection]
ygrek has joined #ocaml
<discocaml>
<octachron> Hm, it has been a long time since I have witnessed the OOM-killer killing the right process ... now I have to wonder if the typechecker running out-of-memory means that there is a genuine performance bug in the typechecker or not.