#jruby on 2021-09-08 — irc logs at libera.irclog.whitequark.org

2021-06-15 18:25 ChanServ changed the topic of #jruby to: Get 9.1.19.0! http://jruby.org/ | http://wiki.jruby.org | http://logs.jruby.org/jruby/ | http://bugs.jruby.org | Paste at http://gist.github.com

08:38 joast has quit [Ping timeout: 240 seconds]

08:45 joast has joined #jruby

12:24 <headius> Ok so it is simple to repro anyway

13:13 <headius> something about this jnr-ffi update is causing very peculiar failures

13:42 <headius> this is not making any sense

13:42 <headius> I go back and forth between the commits and sometimes the jnr update fails and sometimes it does not

13:45 <basshelal[m]> Is this the JNR-FFI build failing or JRuby?

13:46 <headius> JRuby... the JNR update only fixed that test scope thing and then a test for our embedding API started failing out of nowhere

13:46 <headius> I have no explanation

13:46 <headius> this should have been a trivial update

13:58 <headius> enebo: can you try running `mvn package -Ptest` on this branch? https://github.com/jruby/jruby/pull/6811

13:59 <headius> it doesn't fail for me locally but it is apparently failing on both 8 and 11 on GHA

14:11 <enebo[m]> headius: ok

14:12 <enebo[m]> going to run master first just to make sure I can tell what the difference may be

14:12 <headius> all that has changed on this branch is updating jnr-ffi and up, with the jnr-ffi change only being to move junit from compile scope to test scope in the maven build

14:13 <headius> I will double check that but there's no reason it should affect this... the failures in GHA look like our JSR223 scripting engine is not getting registered so the tests get NPE trying to use it

14:13 <headius> I have no leads

14:15 <enebo[m]> ok master is green for me locally. If I ran into any issue I would have run it anyways

14:19 <enebo[m]> For master I noticed this:

14:19 <enebo[m]> [INFO] -------------------------------------------------------... (full message at https://libera.ems.host/_matrix/media/r0/download/libera.chat/f85727ad4c98c4ce593f129848d9954697ec7a57)

14:20 <enebo[m]> I only happened to notice this because on your branch a lot of tests are running with highlighting which did not happen before

14:22 <enebo[m]> err

14:22 <enebo[m]> I see that in this branch run too...I guess default-test is nothing

14:22 <enebo[m]> headius: it ran to completion

14:23 <enebo[m]> headius: but my oddity still stands...master took 42s to run and your branch took 5 minutes because the tests run

14:24 <headius> hmm

14:25 <headius> yeah that is the default tests for lib or something

14:25 <headius> there are none

14:25 <headius> Tests run: 437, Failures: 0, Errors: 0, Skipped: 0

14:25 <headius> master should not complete those tests in 42s

14:26 <headius> is it possible they stopped running on master somehow and this is a regression?

14:26 <headius> but it is green for you and me on branch

14:26 <headius> do you see those 437 tests run further up the log?

14:28 <enebo[m]> on branch but not on master

15:25 <headius> by jove you're right

15:25 <headius> master is not running tests with this command

15:25 <headius> so this may be a valid regression that was not caught because tests stopped running

16:15 <headius> lunch break and then I will try to figure this out

16:47 _whitelogger has joined #jruby

17:34 <headius> hmm

17:36 <headius> https://github.com/jruby/jruby/commit/da7883573a294d12e9461ec611d7171b432d8c7e

17:36 <headius> a bad commit some time back flipped this to a single test... wondering if my commit somehow didn't take properly

17:56 <headius> basshelal: https://dzone.com/articles/why-your-junit-5-tests-are-not-running-under-maven

17:56 <headius> you know anything about this?

17:58 <headius> enebo: I think this might be the root cause, mostly because of this junit/surefire conflict: https://github.com/jruby/jruby/pull/6788

18:01 <headius> ahorek: I did not notice until now that the test runs in that PR didn't actually run the tests

18:01 <headius> the same ones that started failing in my JNR update

18:02 <headius> I've created https://github.com/jruby/jruby/pull/6814 to revert that and see what happens

18:05 <headius> ok that did not fix it

18:05 <headius> may need to go back further

18:05 <headius> or figure out this junit conflict (the article linked just fixes it by using an older surefire)

18:07 <headius> maybe this one is the culprit, since it brought us to surefire 3: https://github.com/jruby/jruby/pull/6296

18:08 <ahorek[m]> headius: feel free to revert it if it helps. But there wasn't any dependency change between jnr-ffi-2.2.5 vs jnr-ffi-2.2.6?

18:08 <headius> a PR by basshelal updated junit for jnr-ffi and accidentally made it "compile" scope, so we had to fix that... and then this issue showed up

18:09 <headius> but even before that it seems surefire stopped running these tests

18:09 <headius> something about the jnr changes may have exposed the problem but I don't have a good explanation yet

18:10 <headius> I will try the latest milestone of surefire 3, and if that doesn't work I'll try the last 2.x release

18:11 <ahorek[m]> aha, ok

18:12 <headius> ok nevermind, M5 is the latest released

18:19 <headius> 2.22.2 did not help... I'm going to bisect

18:19 <ahorek[m]> https://github.com/jruby/jruby/blob/master/pom.rb#L201 there seems to be a conflict

18:19 <headius> I noticed that... unsure if it is the problem

18:21 <ahorek[m]> probably not

18:21 <headius> I'm not sure why that version is specified separately to begin with

18:32 <headius> ugh this is just resolving to the last time I updated JNR

18:37 <headius> basshelal: seems like that junit5 update caused way more problems than either of us expected!

18:37 <headius> I'm going to try to add the junit5 jupiter engine to JRuby's build and see where we stand then

19:00 <headius> ok I think I have master running tests again without the jnr change and it also seems to have engine problems

19:06 <basshelal[m]> headius: damn, that's upsetting, the benefits are big

19:07 <basshelal[m]> If it's failing with JRuby too using JUnit5 then we might be doing something wrong with JUnit5 ?

19:25 <headius> The change I have locally enables the legacy engine and that seems to get tests running again

19:26 <headius> So the junit 5 dependency in compile scope broke the tests and prevented them from running

19:26 <headius> And during that time we may have regressed on this engine test

19:27 <headius> The updated jnr-ffi removes the compile dependency and the tests start running and failing

19:53 <basshelal[m]> Oh wow ok that kind of makes sense

19:54 <basshelal[m]> So we can't use the new engine because?

19:54 <basshelal[m]> So we have JUnit5 but on an older engine

19:55 <headius> so there are a couple issues

19:56 <headius> if you pull in junit5 and have junit4 tests, they will not be detected without the legacy engine

19:56 <headius> the jnr compile dep thing caused master to stop running our junit4 tests

19:58 <basshelal[m]> WHAT?

19:58 <basshelal[m]> So if I'm running JUnit5 but my dependencies are using 4 it won't work?

19:58 <basshelal[m]> I'm a little confused

19:59 <headius> if you pull in junit5 it will only autodetect junit5+ tests

19:59 <headius> if I'm reading these posts right

19:59 <headius> so in our case we ended up with junit bumping up to 5 to resolve the dependency conflict, but all our tests are 4

20:00 <headius> so the second issue then is that while we were not running these tests, something regressed in the JRubyEngineTest related stuff

20:00 <headius> when I pulled in the fixed jnr-ffi, that started failing and confused the heck out of me

20:01 <headius> I guess this at least shows our junit tests are important!

20:01 <headius> I think at this point I'm going to merge the jnr update to master and we will see if that test continues to fail in CI (it does not fail for either enebo or myself locally)

20:02 <basshelal[m]> I'm still a little confused but ok, do that and tell me how it goes. Not much I can do to help other than maybe just undo and go back to JUnit4? But I'm still very curious to know about this

20:02 <basshelal[m]> keep me posted

20:03 <headius> FYI this is the patch I did on master (with the junit5 compile dependency issue): https://gist.github.com/headius/c97a7556bc98fb3e7932faae23895480

20:03 <headius> that at least lets the junit4 tests run on master

20:03 <headius> the jnr update will fix the junit5 compile dependency but might start showing the regressed test... if it does I will just deal with it

20:04 <headius> the junit version update there probably does nothing because it will end up choosing junit5 anyway

20:04 <headius> there = my patch

20:04 <headius> it is confusing 🤪

20:07 <basshelal[m]> If I'm getting this correctly, the only long term fix would be to convert all of JRuby's tests to JUnit5 and use the new engine assuming there's no other JUnit4 dependencies

20:07 <basshelal[m]> But it's all test scope so it shouldn't matter :/

20:07 <headius> we could also do it a test at a time if we run both the legacy and the jupiter engines

20:07 <headius> that is possible too

20:07 <headius> they will do separate runs but it should detect both junit4 and junit5 tests

20:08 <headius> I merged the JNR update... we'll see how it goes in CI

20:08 <basshelal[m]> ok let's see

20:10 <headius> enebo: fun stuff but I think I untangled the knot

20:10 <headius> if this fails on master then there's something odd about our JSR-223 engine very recently

20:10 <headius> or some interaction with JDK updates on GHA and Travis

20:15 <headius> same tests failed on master after merge... so something seems to have broken

20:16 <enebo[m]> headius: should I try it now on master?

20:16 <headius> yeah try it again with updated master

20:16 <headius> it still does not fail for me locally

20:17 <headius> the failures look like our engine is not getting registered correctly

20:19 <headius> ok wat, it is failing now that I merged

20:19 <headius> well at least I have something in hand I can investigate

20:21 <headius> I sure hope it fails for you too

20:23 <enebo[m]> [ERROR] org.jruby.embed.jsr223.JRubyEngineTest.testClearVariables Time elapsed: 0.18 s <<< ERROR!... (full message at https://libera.ems.host/_matrix/media/r0/download/libera.chat/83014ef89d00a1cbcc737a5b8ced01734cacab3e)

20:23 <enebo[m]> why is this not indenting...I am tabbing

20:23 <enebo[m]> 9 errors

20:23 <headius> 9?

20:23 <headius> I have 6

20:23 <enebo[m]> [ERROR] Tests run: 437, Failures: 0, Errors: 9, Skipped: 0

20:23 <headius> what in the what

20:24 <headius> gist the failures for me somewhere

20:24 * headius sent a code block: https://libera.ems.host/_matrix/media/r0/download/libera.chat/f44a42972b498196ad6e1682da91a83f084a908f

20:24 <headius> that's all I have

20:24 <headius> some of these are a null instance, some are NPE when it tries to eval

20:24 <enebo[m]> [INFO] Results:... (full message at https://libera.ems.host/_matrix/media/r0/download/libera.chat/617c73f19de24b9ad6b4c636ef059e45da2a1876)

20:24 <headius> they make no sense

20:24 <enebo[m]> lol

20:24 <enebo[m]> so 3 scriptingcontainertest errors

20:24 <headius> yeah I have no idea about those

20:25 <headius> what Java version did you run on?

20:26 <enebo[m]> openjdk version "1.8.0_242"

20:26 <enebo[m]> well it is an internal error while calling eval

20:26 <headius> do you have any local changes?

20:26 <headius> yeah

20:26 <enebo[m]> It seems to be masking where it is happening

20:26 <enebo[m]> This is clean master

20:26 <headius> ScriptError does that

20:26 <headius> unfortunately

20:26 <enebo[m]> no cause field?

20:27 <ahorek[m]> https://github.com/jruby/jruby/pull/6815

20:27 <headius> I'm not sure, but it may be surefire not printing the cause properly

20:27 <enebo[m]> maybe we need debug/trace on or something

20:27 <headius> ahorek: 😳

20:27 <headius> did we change this recently?

20:28 <ahorek[m]> https://github.com/jruby/jruby/commits/master/core/src/main/java/org/jruby/embed/variable

20:29 <ahorek[m]> but travis died on... (full message at https://libera.ems.host/_matrix/media/r0/download/libera.chat/aed103804603fc207f309e0d49233839df9a7da5)

20:29 <enebo[m]> epic

20:30 <ahorek[m]> but travis died on :-(... (full message at https://libera.ems.host/_matrix/media/r0/download/libera.chat/65dc8e200d264638f816fbdd3c07af2eaddad87d)

20:30 <headius> that one has appeared and disappeared before

20:30 <headius> I have no explanation for it either

20:31 <enebo[m]> so this fix will just make ARGV if it is not already on topself?

20:31 <enebo[m]> but isn't ::ARGV always there? Maybe something is deleting it after a run?

20:32 <headius> yeah I want to know why this works

20:32 <enebo[m]> well ignore my description

20:32 <headius> most of what we updated in the last week was just Ruby dependencies and the CFG fixes

20:33 <enebo[m]> I just think this is weird

20:33 <headius> and the jnr juggling but that should be unrelated and minimal

20:33 <enebo[m]> getVariable is only called once containsVariable returns true

20:34 <headius> I'm testing if did_you_mean update caused something odd

20:34 <headius> that is the only updated gem that loads at boot

20:34 <enebo[m]> then the variable is accessed. This code is correct but the implementation is broken somehow or there is concurrent testing or something like that?

20:34 <headius> if not that, could be stdlib update, or CFG work

20:35 <headius> did_you_mean is off the hook

20:35 <enebo[m]> CFG changes will only change Ruby code and really only in a case where a constant value is propagated to create dead code

20:36 <headius> enebo: yeah but could it break some initialization that makes the engines fail?

20:36 <headius> like could it eliminate an ARGV = something

20:37 <enebo[m]> but ARGV is a ruby thing so I don't think removing code would remove ARGV

20:37 <ahorek[m]> org.jruby.embed.internal.BiVariableMap{ARGV=[Ljava.lang.String;@6fc6f68f}

20:37 <ahorek[m]> vars.getVariable(ARGV) => null

20:37 <headius> trying stdlib revert now

20:38 <headius> nope

20:39 <ahorek[m]> it's just a workaround, but it's worth more investigation

20:39 <headius> I'm going to spin a revert PR that backs off your changes enebo

20:40 <enebo[m]> ok

20:40 <enebo[m]> I don't want to be the one who says it can't be that because it definitely could be but I am confused ARGV is set every container setup by Java code

20:41 <enebo[m]> The 223 code in these classes seems to be setting a new ARGV in Java

20:41 <headius> yeah it doesn't make sense to me either

20:41 <enebo[m]> what is stranger to me is containsVar has to be true to make it in there so it really thinks ARGV is in there

20:41 <headius> https://github.com/jruby/jruby/pull/6816/checks?check_run_id=3549315717

20:42 <enebo[m]> Even ahorek's hashrocket output shows it thinks it is in there

20:42 <enebo[m]> the get itself is just returning null at that point

20:42 <headius> it will be interesting to see what is causing this

20:42 <headius> because I have zero theories

20:43 <enebo[m]> It has to be that epic kares commit doesn't it?

20:43 <headius> the one that was in the variable dir ahorek linked to?

20:43 <enebo[m]> "has to" is another famous last phrase :)

20:43 <headius> which epic commit do you mean? I don't think there's any recent epic kares commits

20:44 <enebo[m]> headius: ok wait...when did this break?

20:44 <enebo[m]> I thought we don't know because of another commit which turned off testing

20:45 <headius> it broke sometime after the 1st

20:45 <headius> the jnr update on the 1st disabled the tests

20:45 <headius> before that this was running and passing

20:45 <enebo[m]> oh ok I thought that was going to be earlier

20:45 <headius> unfortunately since then we landed a bunch of library updates and your IR work

20:45 <enebo[m]> then kares may be off the hook :)

20:45 <headius> MAY

20:46 <headius> we can always blame kares if nothing else

20:46 <enebo[m]> I am just seeing if he wakes up or not

20:46 <enebo[m]> I thought that was a recent commit too :)

20:48 <headius> bad news enebo

20:48 <enebo[m]> I do always think there is a possibility dead code could kill live code by mistake but I find this baffling

20:48 <headius> revert is passing

20:48 <enebo[m]> It annoys me these tests pass through the debugger

20:49 <enebo[m]> I will run the entire file

20:49 <enebo[m]> yay...repro'd

20:49 <headius> yeehaw

20:50 <headius> enebo: should I merge the revert and then you can give it another go, or just leave it and you'll fix it on master?

20:50 <enebo[m]> give me 5 before I decide that

20:50 <headius> ok

20:52 <enebo[m]> catch (Exception e) .... examine NPE stacktrace...literally the one given

20:52 <enebo[m]> as the cause of the ScriptException
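
[editor's note] A minimal Java sketch of the debugging step enebo describes: catch the exception and walk its cause chain to the NPE underneath. The class name RootCauseDemo and the wrapped NPE are stand-ins invented for illustration, not code from JRuby or its tests; only javax.script.ScriptException and Throwable.getCause() are real JDK API.

```java
import javax.script.ScriptException;

class RootCauseDemo {
    // Follow getCause() until the chain ends (or points back at itself).
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Stand-in failure: an NPE wrapped by the scripting layer.
        ScriptException wrapped =
                new ScriptException(new NullPointerException("receiver was null"));

        try {
            throw wrapped;
        } catch (Exception e) {
            Throwable root = rootCause(e);
            // Report the innermost exception rather than the wrapper.
            System.out.println(root.getClass().getName() + ": " + root.getMessage());
            root.printStackTrace();
        }
    }
}
```

If the test report only shows the wrapper, printing the root cause this way is one option for seeing where the NPE actually originates.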

20:56 <enebo[m]> well LocalOptimizationPass never replaces any branches with new instrs so that is mysterious

21:01 <enebo[m]> When it fails: if ( var != null && var.isReceiverIdentical(receiver) ) {

21:02 <enebo[m]> isReceiverIdentical is not identical

21:02 <enebo[m]> it uses == in Java to compare but it begs the question of how "main" would be different.

21:03 <enebo[m]> yeah two mains...hmm
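
[editor's note] A rough Java sketch of how a map can report that it contains ARGV while the lookup still returns null, which matches the symptom above. It assumes the containment check goes by name and the getter additionally compares receivers with ==. Var, VarMap and their methods are hypothetical and are not the real BiVariableMap code.

```java
import java.util.HashMap;
import java.util.Map;

class ReceiverLookupDemo {
    // Hypothetical variable bound to the receiver it was defined on.
    static final class Var {
        final Object receiver;
        final Object value;

        Var(Object receiver, Object value) {
            this.receiver = receiver;
            this.value = value;
        }

        // Identity comparison, as described in the chat.
        boolean isReceiverIdentical(Object other) {
            return receiver == other;
        }
    }

    // Hypothetical map: contains() checks the name only, get() also checks the receiver.
    static final class VarMap {
        private final Map<String, Var> vars = new HashMap<>();

        void put(Object receiver, String name, Object value) {
            vars.put(name, new Var(receiver, value));
        }

        boolean containsName(String name) {
            return vars.containsKey(name);
        }

        Object get(Object receiver, String name) {
            Var var = vars.get(name);
            if (var != null && var.isReceiverIdentical(receiver)) {
                return var.value;
            }
            return null;
        }
    }

    public static void main(String[] args) {
        Object mainA = new Object(); // "main" object of one runtime
        Object mainB = new Object(); // "main" object of another runtime

        VarMap map = new VarMap();
        map.put(mainA, "ARGV", new String[0]);

        System.out.println(map.containsName("ARGV"));      // true
        System.out.println(map.get(mainA, "ARGV") != null); // true
        System.out.println(map.get(mainB, "ARGV") != null); // false: two mains
    }
}
```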

21:06 <enebo[m]> Without testing this I think I see why it is broken but it makes more questions :)

21:07 <enebo[m]> I removed static field Nil.NIL in the operand and made it a per runtime field

21:07 <enebo[m]> I will just verify that is the problem quickly

21:12 <enebo[m]> Lesson learned...do not try and read the revert diff

21:13 <enebo[m]> but I made a classic mistake

21:13 <headius> classic enebo

21:13 <enebo[m]> I wanted to see Nil and thought it had no runtime reference within it...but I forgot it caches...so static fields make multiple runtimes get confused
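
[editor's note] A small Java sketch of the mistake described here, assuming a statically held operand that caches the nil object of whichever runtime touches it first. FakeRuntime, StaticNilOperand and PerRuntimeNilOperand are made-up names for illustration and do not mirror the actual JRuby IR classes.

```java
class StaticNilCacheDemo {
    // Hypothetical runtime; each instance owns a distinct nil object.
    static final class FakeRuntime {
        final Object nil = new Object();
    }

    // Broken variant: one static cache shared by every runtime in the JVM.
    static final class StaticNilOperand {
        private static Object cached;

        static Object nilFor(FakeRuntime runtime) {
            if (cached == null) {
                cached = runtime.nil; // first runtime wins
            }
            return cached;
        }
    }

    // Fixed variant: always resolve nil through the runtime passed in.
    static final class PerRuntimeNilOperand {
        static Object nilFor(FakeRuntime runtime) {
            return runtime.nil;
        }
    }

    public static void main(String[] args) {
        FakeRuntime first = new FakeRuntime();
        FakeRuntime second = new FakeRuntime();

        // Both runtimes get the same cached object from the static variant.
        Object a = StaticNilOperand.nilFor(first);
        Object b = StaticNilOperand.nilFor(second);
        System.out.println(a == b);          // true
        System.out.println(b == second.nil); // false: second runtime sees a foreign nil

        // The per-runtime variant returns each runtime's own nil.
        System.out.println(PerRuntimeNilOperand.nilFor(second) == second.nil); // true
    }
}
```

With a single runtime the static variant behaves fine, which would fit a failure that only shows up when a whole test file creates several containers.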

21:15 <enebo[m]> So to fix this I can revert how nil is used but now I need to pass something into simplifyBranch so it can find manager to get nil

21:15 <enebo[m]> My 3 additional errors still seem to be here but I can see it is something to do with not finding 'date'

21:15 <enebo[m]> I am willing to bet I have some environmental issue there for those 3

21:15 <enebo[m]> I should have a fix for this in a few minutes. I will just push it. It is just making it not static and getting an object there that simplify can access nil with

21:16 <headius> ohh I see

21:16 <headius> you made the instance field static, not the other way around

21:16 <enebo[m]> yeah I was reading the revert PR :)

21:17 <enebo[m]> believe me I was confused...I thought what the hell I made it an instance field...it should have made this better :)

21:17 <headius> haha

21:17 <headius> ok

21:17 <headius> well simple fix then

21:17 <headius> huzzah

21:32 <headius> enebo: I'm done for today... JNR stuff has landed and all that's left is the JI issue, I think

21:33 <headius> this release will be the first one with jruby-base in addition to jruby-core so that might be an adventure

21:33 <headius> Assuming the JI issue is not fixed before tomorrow, I'll start looking into that

21:33 <headius> ttfn

21:35 <enebo[m]> headius: ok cya