joast has quit [Ping timeout: 240 seconds]
joast has joined #jruby
<headius> Ok so it is simple to repro anyway
<headius> something about this jnr-ffi update is causing very peculiar failures
<headius> this is not making any sense
<headius> I go back and forth between the commits and sometimes the jnr update fails and sometimes it does not
<basshelal[m]> Is this the JNR-FFI build failing or JRuby?
<headius> JRuby... the JNR update only fixed that test scope thing and then a test for our embedding API started failing out of nowhere
<headius> I have no explanation
<headius> this should have been a trivial update
<headius> enebo: can you try running `mvn package -Ptest` on this branch? https://github.com/jruby/jruby/pull/6811
<headius> it doesn't fail for me locally but it is apparently failing on both 8 and 11 on GHA
<enebo[m]> headius: ok
<enebo[m]> going to run master first just to make sure I can tell what the difference may be
<headius> all that has changed on this branch is updating jnr-ffi and up, with the jnr-ffi change only being to move junit from compile scope to test scope in the maven build
<headius> I will double check that but there's no reason it should affect this... the failures in GHA look like our JSR223 scripting engine is not getting registered so the tests get NPE trying to use it
<headius> I have no leads
<enebo[m]> ok master is green for me locally. If I ran into any issue I would have run it anyways
<enebo[m]> For master I noticed this:
<enebo[m]> [INFO] -------------------------------------------------------... (full message at https://libera.ems.host/_matrix/media/r0/download/libera.chat/f85727ad4c98c4ce593f129848d9954697ec7a57)
<enebo[m]> I only happened to notice this because on your branch a lot of tests are running with highlighting which did not happen before
<enebo[m]> err
<enebo[m]> I see that in this branch run too...I guess default-test is nothing
<enebo[m]> headius: it ran to completion
<enebo[m]> headius: but my oddity still stands...master took 42s to run and your branch took 5 minutes because test runs
<headius> hmm
<headius> yeah that is the default tests for lib or something
<headius> there are none
<headius> Tests run: 437, Failures: 0, Errors: 0, Skipped: 0
<headius> master should not complete those tests in 42s
<headius> is it possible they stopped running on master somehow and this is a regression?
<headius> but it is green for you and me on branch
<headius> do you see those 437 tests run further up the log?
<enebo[m]> on branch but not on master
<headius> by jove you're right
<headius> master is not running tests with this command
<headius> so this may be a valid regression that was not caught because tests stopped running
<headius> lunch break and then I will try to figure this out
_whitelogger has joined #jruby
<headius> hmm
<headius> a bad commit some time back flipped this to a single test... wondering if my commit somehow didn't take properly
<headius> you know anything about this?
<headius> enebo: I think this might be the root cause, mostly because of this junit/surefire conflict: https://github.com/jruby/jruby/pull/6788
<headius> ahorek: I did not notice until now that the test runs in that PR didn't actually run the tests
<headius> the same ones that started failing in my JNR update
<headius> I've created https://github.com/jruby/jruby/pull/6814 to revert that and see what happens
<headius> ok that did not fix it
<headius> may need to go back further
<headius> or figure out this junit conflict (the article linked just fixes it by using an older surefire)
<headius> maybe this one is the culprit, since it brought us to surefire 3: https://github.com/jruby/jruby/pull/6296
<ahorek[m]> headius: feel free to revert it if it helps. But there wasn't any dependency change between jnr-ffi-2.2.5 vs jnr-ffi-2.2.6?
<headius> a PR by basshelal updated junit for jnr-ffi and accidentally made it "compile" scope, so we had to fix that... and then this issue showed up
<headius> but even before that it seems surefire stopped running these tests
<headius> something about the jnr changes may have exposed the problem but I don't have a good explanation yet
<headius> I will try the latest milestone of surefire 3, and if that doesn't work I'll try the last 2.x release
<ahorek[m]> aha, ok
<headius> ok nevermind, M5 is the latest released
<headius> 2.22.2 did not help... I'm going to bisect
<ahorek[m]> https://github.com/jruby/jruby/blob/master/pom.rb#L201 there seems to be a conflict
<headius> I noticed that... unsure if it is the problem
<ahorek[m]> probably not
<headius> I'm not sure why that version is specified separately to begin with
<headius> ugh this is just resolving to the last time I updated JNR
<headius> basshelal: seems like that junit5 update caused way more problems than either of us expected!
<headius> I'm going to try to add the junit5 jupiter engine to JRuby's build and see where we stand then
<headius> ok I think I have master running tests again without the jnr change and it also seems to have engine problems
<basshelal[m]> <headius> "basshelal: seems like that junit..." <- damn, that's upsetting, the benefits are big
<basshelal[m]> If it's failing with JRuby too using JUnit5 then we might be doing something wrong with JUnit5 ?
<headius> The change I have locally enables the legacy engine and that seems to get tests running again
<headius> So the junit 5 dependency in compiled scope broke the tests and prevented them from running
<headius> And during that time we may have regressed on this engine test
<headius> The updated jnr-ffi removes the compile dependency and the tests start running and failing
<basshelal[m]> Oh wow ok that kind of makes sense
<basshelal[m]> So we can't use the new engine because?
<basshelal[m]> So we have JUnit5 but on an older engine
<headius> so there are a couple issues
<headius> if you pull in junit5 and have junit4 tests, they will not be detected without the legacy engine
<headius> the jnr compile dep thing caused master to stop running our junit4 tests
<basshelal[m]> WHAT?
<basshelal[m]> So if I'm running JUnit5 but my dependencies are using 4 it won't work?
<basshelal[m]> I'm a little confused
<headius> if you pull in junit5 it will only autodetect junit5+ tests
<headius> if I'm reading these posts right
<headius> so in our case we ended up with junit bumping up to 5 to resolve the dependency conflict, but all our tests are 4
<headius> so the second issue then is that while we were not running these tests, something regressed in the JRubyEngineTest related stuff
<headius> when I pulled in the fixed jnr-ffi, that started failing and confused the heck out of me
<headius> I guess this at least shows our junit tests are important!
<headius> I think at this point I'm going to merge the jnr update to master and we will see if that test continues to fail in CI (it does not fail for either enebo or myself locally)
<basshelal[m]> I'm still a little confused but ok, do that and tell me how it goes. Not much I can do to help other than maybe just undo and go back to JUnit4? But I'm still very curious to know about this
<basshelal[m]> keep me posted
<headius> FYI this is the patch I did on master (with the junit5 compile dependency issue): https://gist.github.com/headius/c97a7556bc98fb3e7932faae23895480
<headius> that at least lets the junit4 tests run on master
<headius> the jnr update will fix the junit5 compile dependency but might start showing the regressed test... if it does I will just deal with it
<headius> the junit version update there probably does nothing because it will end up choosing junit5 anyway
<headius> there = my patch
<headius> it is confusing 🤪
<basshelal[m]> If I'm getting this correctly, the only long term fix would be to convert all of JRuby's test to JUnit5 and use the new engine assuming there's no other JUnit4 dependencies
<basshelal[m]> But it's all test scope so it shouldn't matter :/
<headius> we could also do it a test at a time if we run both the legacy and the jupiter engines
<headius> that is possible too
<headius> they will do separate runs but it should detect both junit4 and junit5 tests
<headius> I merged the JNR update... we'll see how it goes in CI
<basshelal[m]> ok let's see
<headius> enebo: fun stuff but I think I untangled the knot
<headius> if this fails on master then there's something odd about our JSR-223 engine very recently
<headius> or some interaction with JDK updates on GHA and Travis
<headius> same tests failed on master after merge... so something seems to have broken
<enebo[m]> headius: should I try it now on master?
<headius> yeah try it again with updated master
<headius> it still does not fail for me locally
<headius> the failures look like our engine is not getting registered correctly
<headius> ok wat, it is failing now that I merged
<headius> well at least I have something in hand I can investigate
<headius> I sure hope it fails for you too
<enebo[m]> [ERROR] org.jruby.embed.jsr223.JRubyEngineTest.testClearVariables Time elapsed: 0.18 s <<< ERROR!... (full message at https://libera.ems.host/_matrix/media/r0/download/libera.chat/83014ef89d00a1cbcc737a5b8ced01734cacab3e)
<enebo[m]> why is this not indenting...I am tabbing
<enebo[m]> 9 errors
<headius> 9?
<headius> I have 6
<enebo[m]> [ERROR] Tests run: 437, Failures: 0, Errors: 9, Skipped: 0
<headius> what in the what
<headius> gist the failures for me somewhere
<headius> that's all I have
<headius> some of these are a null instance, some are NPE when it tries to eval
<headius> they make no sense
<enebo[m]> lol
<enebo[m]> so 3 scriptingcontainewrtest errors
<headius> yeah I have no idea about those
<headius> what Java version did you run on?
<enebo[m]> openjdk version "1.8.0_242"
<enebo[m]> well it is an internal error whilte calling eval
<headius> do you have any local changes?
<headius> yeah
<enebo[m]> It seems to be masking where it is happening
<enebo[m]> This is clean master
<headius> ScriptError does that
<headius> unfortunately
<enebo[m]> no cause field?
<headius> I'm not sure, but it may be surefire not printing the cause properly
<enebo[m]> maybe we need debug/trace on or something
<headius> ahorek: 😳
<headius> did we change this recently?
<enebo[m]> epic
<headius> that one has appeared and disappeared before
<headius> I have no explanation for it either
<enebo[m]> so this fix will just make ARGV if it is not already on stopself?
<enebo[m]> but isn't ::ARGV always there? Maybe something is deleting it after a run?
<headius> yeah I want to know why this works
<enebo[m]> well ignore my description
<headius> most of what we updated in the last week was just Ruby dependencies and the CFG fixes
<enebo[m]> I just think this is weird
<headius> and the jnr juggling but that should be unrelated and minimal
<enebo[m]> getVariable is only called once containsVariable returns true
<headius> I'm testing if did_you_mean update caused something odd
<headius> that is the only updated gem that loads at boot
<enebo[m]> then the variable is accessed. This code is correct but the implementation is broken somehow or there is concurrent testing or something like?
<headius> if not that, could be stdlib update, or CFG work
<headius> did_you_mean is off the hook
<enebo[m]> CFG changes will only change Ruby code and really only in a case where a constant value is propagated to create dead code
<headius> enebo: yeah but could it break some initialization that makes the engines fail?
<headius> like could it eliminate an ARGV = something
<enebo[m]> but ARGV is a ruby thing so I don't think removing code would remove ARGV
<ahorek[m]> org.jruby.embed.internal.BiVariableMap{ARGV=[Ljava.lang.String;@6fc6f68f}
<ahorek[m]> vars.getVariable(ARGV) => null
<headius> trying stdlib revert now
<headius> nope
<ahorek[m]> it's just a workaround, but it's worth more investigation
<headius> I'm going to spin a revert PR that backs off your changes enebo
<enebo[m]> ok
<enebo[m]> I don't want to be the one who says it can't be that because it definitely could be but I am confused ARGV is set every container setup by Java code
<enebo[m]> The 223 code in these classes seems to be settingh a new ARGV in Java
<headius> yeah it doesn't make sense to me either
<enebo[m]> what is stranger to me is it is true for containsVar to make it in there so it really think ARGV is in there
<enebo[m]> Even ahorek hashrocket shows it things it is in there
<enebo[m]> the get itself is just returning null at that point
<headius> it will be interesting to see what is causing this b
<headius> because I have zero theories
<enebo[m]> It has to be that epic kares commit doesn't it?
<headius> the one that was in the variable dir ahorek linked to?
<enebo[m]> "has to" is another famous last phrase :)
<headius> which epic commit do you mean? I don't think there's any recent epic kares commits
<enebo[m]> headius: ok wait...when did this break?
<enebo[m]> I thought we don't know because of another commit which turned off testing
<headius> it broke sometime after the 1st
<headius> the jnr update on the 1st disabled the tests
<headius> before that this was running and passing
<enebo[m]> oh ok I thought that was going to be earlier
<headius> unfortunately since then we landed a bunch of library updates and your IR work
<enebo[m]> then kares may be off the hook :)
<headius> MAY
<headius> we can always blame kares if nothing else
<enebo[m]> I am just seeing if he wakes up or not
<enebo[m]> I thought that was a recent commit too :)
<headius> bad news enebo
<enebo[m]> I do always think there is a possibility dead code could kill live code by mistake but I find this baffling
<headius> revert is passing
<enebo[m]> It annoys me these tests pass through the debugger
<enebo[m]> I will trun the entire file
<enebo[m]> yay...repro'd
<headius> yeehaw
<headius> enebo: should I merge the revert and then you can give it another go, or just leave it and you'll fix it on master?
<enebo[m]> give me 5 before I decide that
<headius> ok
<enebo[m]> catch (Exception e) .... examine NPE stacktrace...literally the one given
<enebo[m]> as the cause of the ScriptException
<enebo[m]> well LocaloptimizationPass never replaces any branches with new instrs so that is mysterious
<enebo[m]> When it fails: if ( var != null && var.isReceiverIdentical(receiver) ) {
<enebo[m]> isReceiverIdentical is not identical
<enebo[m]> it uses == in Java to compare but it begs how "main" would be different.
<enebo[m]> yeah two mains...hmm
<enebo[m]> Without testing this I think I see why it is broken but it makes more questions :)
<enebo[m]> I removed static field Nil.NIL in the operand and made it a per runtime field
<enebo[m]> I will just verify that is the problem quickly
<enebo[m]> Lesson learned...do not try and read the revert diff
<enebo[m]> but I made a classic mistake
<headius> classic enebo
<enebo[m]> I wanted to see Nil and thought it had no runtime reference within it...but I forgot it caches...so static fields makes multiple runtimes get confused
<enebo[m]> So to fix this I can revert how nil is used but now I need to pass something into simplifyBranch so it can find manager to get nil
<enebo[m]> My 3 additional errors still seem to be here but I can see it is something to do with not finding 'date'
<enebo[m]> I am willing to bet I have some environmental issue there for those 3
<enebo[m]> I should have a fix for this in a few minutes. I will just push it. It is just making it not static and getting an object there that simplify can access nil with
<headius> ohh I see
<headius> you made the instance field static, not the other way around
<enebo[m]> yeah I was reading the revert PR :)
<enebo[m]> believe me I was confused...I thought what the hell I made it an instance field...it should have made this better :)
<headius> haha
<headius> ok
<headius> well simple fix then
<headius> huzzah
<headius> enebo: I'm done for today... JNR stuff has landed and all that's left is the JI issue, I think
<headius> this release will be the first one with jruby-base in addition to jruby-core so that might be an adventure
<headius> Assuming the JI issue is not fixed before tomorrow, I'll start looking into that
<headius> ttfn
<enebo[m]> headius: ok cya