<mattpatt[m]> @headius: switching it to base off 9.3 fixed the mri:int failures. I added the Rubyspec spec:ruby:fast task and now it's failing weirdly. Bumping to v2 of the setup-java action (which will do Maven caching for you) has broken everything.
<mattpatt[m]> So, reasonable progress then... 🥳
<mattpatt[m]> oooh, not the v2 setup-java action, just attempting to 'unset' gem-related env vars.
<mattpatt[m]> If someone could take a quick look at the failed `spec:ruby:fast` job in https://github.com/fidothe/jruby/actions/runs/1448727436, I'd be interested if these failures are expected or (particularly the cancelled one) something seen before on Travis.
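For context, the setup-java change mentioned above is just a workflow-step tweak; a minimal sketch, assuming a Temurin JDK (the distribution and version here are illustrative, not necessarily what the jruby workflow uses):

```yaml
# Illustrative GHA workflow step; the real jruby workflow may differ.
- uses: actions/setup-java@v2
  with:
    distribution: 'temurin'   # v2 requires an explicit distribution (assumed here)
    java-version: '8'
    cache: 'maven'            # v2's built-in dependency caching for Maven
```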
<enebo[m]> mattpatt: It is pretty likely the 5F,1E is ok. I do see some failures locally in my FC env involving IPv6
<edipofederle[m]> <headius> "edipo.federle: are you still..." <- headius: hi, yes, I plan to finish it this weekend, is that ok?
<enebo[m]> edipo.federle: it is fine
<mattpatt[m]> @enebo the Etc.getlogin failure is in a spec with Travis-specific code so I'm not mega surprised about that
<mattpatt[m]> enebo: the one that worries me is the 'cancelled' JDK 8 spec-ruby-fast job, because it cancelled itself
<mattpatt[m]> not sure if it did because the JDK 11 one failed
<mattpatt[m]> or because of something I caused
<mattpatt[m]> it seems to happen just after the Etc.getlogin failure
<enebo[m]> mattpatt: oh it cancelled itself...I thought you did somehow
<enebo[m]> I don't know that much about GHA and the times I have used it my stuff was all green to begin with
<enebo[m]> 5:46 for Java 8 and a tiny bit less on Java 11 for those sections, so it seemed like 8 was either done without output or very close
<mattpatt[m]> enebo: I think the main problem we'll have is that Travis' runners were very full-fat Linux machines, and it looks like the GHA ones are much more stripped down, so there'll be a lot of places where implicit dependencies will bite us because they've vanished
<enebo[m]> but do you think it is possible we went over some resource limit?
<enebo[m]> that cancelled job was running like 20s longer than the other one
<mattpatt[m]> the machines have 7GB RAM, and the auto-kill timeouts are measured in hours
<enebo[m]> heh ok
<mattpatt[m]> it's weird
<enebo[m]> yeah so the other theory is that perhaps one job failing led to cancelling the other? I have not seen that with the other GHA things we are running
<enebo[m]> Although it is possible the ones we fail on fail after the others all finish
<enebo[m]> I can refire your PR run right? Let's just re-run and see if we get the same result
<mattpatt[m]> the other jobs explicitly list `fail-fast: false` in their strategy section
<enebo[m]> the theory that one killed the other still makes the most sense to me since they are in their own matrix (says the guy who knows almost nothing about GHA)
<enebo[m]> Another test would be to not put those two jobs in the same matrix and see if they both then complete
<mattpatt[m]> Almost certainly unrelated: The Travis setup also had redis-server running. What needs that? A quick search in the code for redis turns nothing up
<enebo[m]> haha
<mattpatt[m]> There were some issues related to socket handling for redis, but I couldn't connect the dots
<enebo[m]> My only substantial experience with GHA was setting up 3 OS builds of the jruby-launcher rust port
<enebo[m]> I found myself cloning lots of crap in GHA recipeland until it all worked
<enebo[m]> so is that just because we pick an existing image?
<enebo[m]> ruby-build perhaps needs it for other things so they just include it
<enebo[m]> err I guess we don't use ruby-build although I suppose that makes sense since we are a java project
<enebo[m]> No redis in default image which I think this page lists what should be in it
<mattpatt[m]> This was in Travis, not GHA - it's just listed as a service to install and run at the bottom of .travis.yml
<enebo[m]> oh
<enebo[m]> ok. That might have been for -Ptest, where I think we used to run (or one job may still have run) some app server and integration tests
<enebo[m]> Actually let me see what phase it was. mkristian used to run lots of integration with stuff
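For reference, enabling a service like this on Travis is a short stanza in .travis.yml (illustrative fragment only, not the whole file):

```yaml
# Illustrative .travis.yml fragment; Travis starts the listed service on the build VM.
services:
  - redis-server
```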
<mattpatt[m]> aha, got the cancelled job thing:
<mattpatt[m]> jobs.<job_id>.strategy.fail-fast
<mattpatt[m]> "When set to true, GitHub cancels all in-progress jobs if any matrix job fails. Default: true"
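A minimal sketch of the fix implied here, assuming the two spec jobs share one matrix (job name, JDK versions, and steps are illustrative; the real build/spec commands are elided):

```yaml
# Illustrative matrix job; names are assumptions, real build steps are elided.
jobs:
  spec-ruby-fast:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false            # default is true: one failing matrix job cancels the rest
      matrix:
        java-version: ['8', '11']
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-java@v2
        with:
          distribution: 'temurin'
          java-version: ${{ matrix.java-version }}
          cache: 'maven'
      - run: jruby -S rake spec:ruby:fast   # placeholder for the real build + spec commands
```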
<enebo[m]> nice!
<enebo[m]> mattpatt: so if that is out of the way I guess I can try and figure out why we fail some of these tests. I get a few IPv6 errors locally so I guess I even have a starting point
<enebo[m]> fwiw I think we can probably just tag these out for now since it has been this way for at least a couple of years
<enebo[m]> Nothing new is broken
<enebo[m]> the getlogin error is new but that may just be a bad test assuming the env will act a particular way
<mattpatt[m]> If you like, I can move all the jobs over and then we can triage expected vs unexpected failures and take it from there
<mattpatt[m]> my main worry was if new and exciting things were failing
<mattpatt[m]> which would suggest bigger problems with the differences between the Travis and GHA stacks
<enebo[m]> yeah so far I think travis vs GHA may just be some env differences and we will have to triage those
<mattpatt[m]> There's a Travis job called 'MRI core jit' that runs `jruby -S rake test:mri:core:fullint`, and 'MRI core jit jdk11' that runs `jruby -S rake test:mri:core:jit`. Is the 'MRI core jit' job just badly named?
<mattpatt[m]> Or is it badly named and in need of a JDK 8 job that runs `jruby -S rake test:mri:core:jit` too?
<enebo[m]> mattpatt: it is just misnamed: `:fullint => ["-X-C", "-Xjit.threshold=0", "-Xjit.background=false"]`
<enebo[m]> `-X-C` means interpreted and `threshold=0` means it will go from startup interpreter to full interpreter the first time it is called
<enebo[m]> HAHA this is pretty esoteric. The label is just wrong. "MRI core full interp" would be a better name
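If both Travis jobs come over to GHA, a hedged sketch of clearer names wired to the same rake targets (job names, JDK pairings, and setup are assumptions; checkout/JDK setup steps are elided):

```yaml
# Illustrative jobs only; the real workflow layout may differ.
jobs:
  mri-core-full-interp:           # the job Travis mislabels "MRI core jit"
    runs-on: ubuntu-latest
    steps:
      - run: jruby -S rake test:mri:core:fullint   # -X-C + jit.threshold=0: full interpreter only
  mri-core-jit-jdk11:
    runs-on: ubuntu-latest
    steps:
      - run: jruby -S rake test:mri:core:jit
```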
<enebo[m]> mattpatt: thanks for helping out with this
<mattpatt[m]> enebo: wish it luck, i'm off to bed :-) https://github.com/fidothe/jruby/actions/runs/1450673803
<enebo[m]> wowzers!
<enebo[m]> do org accounts have limits per month?
<mattpatt[m]> lots of scope for refactoring once the footgun errors are removed
<enebo[m]> mattpatt: It will be cool to see this run
<mattpatt[m]> sorry, bad pasteboard
<mattpatt[m]> TL;DR, probably not for a public repo
<headius> nice... FWIW this is the only way to get parallel execution, even though it bloats up the list of checks a ton
<headius> I wish you could get parallel jobs without adding a check entry