sagax has joined #jruby
<MattWelke[m]> I'm having some trouble dockerizing a test JRuby app. I'm going for the modern approach where you have multi-stage builds: you use the first stage to build your runtime and your app, then copy just what's needed to run your app into the final stage.
<MattWelke[m]> This is the app, which I've got working when I run it on my machine (using sdkman! to get Java and rbenv to get JRuby): https://github.com/mattwelke/jruby-javalin-example
<MattWelke[m]> The error I get when I try to run my image as a container is:... (full message at https://libera.ems.host/_matrix/media/r0/download/libera.chat/ef08f6eb9df65f7533c0f37d7cfd688ae66586cb)
<MattWelke[m]> I'm not surprised by this error because I'm not copying in the jars that jbundle downloaded for me. I can't figure out where jbundler put them though in my `build-ruby` image. On my machine, jbundler puts jars into `$HOME/.m2`, like its docs say it would. But I'm not using Maven in my builder image, as far as I know. Either way, the builder image has no `HOME` env var. I tried adding some `ls` commands in my Dockerfile to try to look around for the jars as the build executes, but had no luck.
<MattWelke[m]> I'm using Java 17 for one of the builders but Java 11 to fetch my dependencies. This is intentional. I noticed when using JRuby on my local machine that jbundler appears not to support anything newer than Java 11. But I want to use Java 17 to run my app for performance reasons. My understanding is that a newer JVM can run class files compiled for older Java versions, since the JVM is backwards compatible with older class files.
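(For orientation, a minimal sketch of the multi-stage layout being described, assuming the official jruby Docker images and jbundler's `jbundle` CLI. The image tags, paths, and `app.rb` entry point are assumptions, not the contents of the linked repo's Dockerfile.)
```
# build stage: resolve gems and jars (tags/versions are assumptions)
FROM jruby:9.3-jdk11 AS build
WORKDIR /app
COPY Gemfile Gemfile.lock Jarfile Jarfile.lock ./
RUN gem install bundler -v 1.17.3 && gem install jbundler && \
    bundle install && jbundle install
COPY . .

# runtime stage: jbundler resolves jars into the local Maven repository,
# so the final image needs ~/.m2 (or vendored jars) alongside the app
FROM jruby:9.3-jdk17
WORKDIR /app
COPY --from=build /app /app
COPY --from=build /root/.m2 /root/.m2
CMD ["jruby", "app.rb"]
```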
<MattWelke[m]> A few things that might look odd, which I'm open to feedback on. I might be missing important things.
<MattWelke[m]> I'm using bundler version 1.17.3 instead of the most recent version. This is also intentional because I noticed when trying to use the most recent version that jbundler didn't work. I commented about this on https://github.com/mkristian/jbundler/issues/90.
<MattWelke[m]> I'm copying in the Gemfile and Jarfile (and their lock files) from the build context into the runtime image. That's because I noticed a different error when I tried to run the app before doing this. I forgot to write down that error, but I googled around and learned that jbundler needs to be around at runtime in order to set up the classpath for the running app.
rcrews[m] has joined #jruby
<kares[m]> Hey Matt, believe JBundler is effectively dead by now and has been replaced by jar-dependencies (part of JRuby).
<kares[m]> jar-dependencies either loads jars from the Maven repo, or you can vendor the jars with your app/gem.
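(For the record, a rough sketch of those two modes following jar-dependencies' README conventions; the gem name and the guava coordinates are just examples, and `Jars::Installer.vendor_jars!` is the vendoring entry point as documented there.)
```
# my_gem.gemspec -- declare the jar; jar-dependencies resolves it from Maven
Gem::Specification.new do |s|
  s.name    = 'my_gem'
  s.version = '0.1.0'
  s.requirements << "jar 'com.google.guava:guava', '31.0.1-jre'"
  s.add_runtime_dependency 'jar-dependencies'
end

# Rakefile -- vendor the resolved jars into the gem/app for shipping
task :vendor_jars do
  require 'jars/installer'
  Jars::Installer.vendor_jars!
end

# at runtime, jar-dependencies' require_jar puts a declared jar on the classpath:
#   require_jar 'com.google.guava', 'guava', '31.0.1-jre'
```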
<MattWelke[m]> Thanks! I found both jbundler and jar-dependencies when looking around. I couldn't find a clear answer about what each one was for but I saw jar-dependencies mentioned in jbundler's repo, so I concluded that the relationship between them was that jbundler worked at a higher level and somehow leveraged jar-dependencies.
<MattWelke[m]> I'll look more into jar-dependencies next time I hack on this.
<MattWelke[m]> Just checked again. I had it backwards. jbundler was mentioned in the jar-dependencies readme. But the reason I directed my attention to jbundler instead was because the jar-dependencies readme begins by talking about gemspecs and bundling gems. That's not what I was looking for. I was looking for something to help me manage the Java dependencies my app has. In my little test app, that's Javalin.
<MattWelke[m]> But the jbundler readme jumps right into talking about Jarfile and Jarfile.lock. It immediately felt to me like it was a tool meant to track and download jars you're interested in, like how one would use Maven or Gradle in JVM projects. That's what drew me to it.
<kares[m]> Honestly I haven't used JBundler that much but there was a way to lock down jars - would have expected a way to vendor them as well, again I am not sure and would need to play with things a bit. But I am sure jar-dependencies has the vendoring functionality.
drbobbeaty has quit [Ping timeout: 246 seconds]
drbobbeaty has joined #jruby
<boc_tothefuture[> headius: are you the right person for my icon question above? I did search again and couldn't find anything.
<enebo[m]> boc_tothefuture: You can use the logo. It was made for our use when we worked at Engine Yard, and it was made specifically so as not to run afoul of the use of Java's Duke
<boc_tothefuture[> Thanks!
richbridger has quit [Remote host closed the connection]
richbridger has joined #jruby
<headius> Good morning!
<headius> back in the saddle
<kares[m]> Good afternoon!
<headius> boc_tothefuture: hopefully you also found jruby/collateral which has the vector originals for the various logos and logotypes
<headius> anewbhav: obviously startup time is a known issue but there are some ways to mitigate... maybe you can describe your setup a bit for us?
<headius> We still support "drip" which is a JVM preloader that has helped some folks. You might also look at "thein" (I think that's the name) which is a JRuby-compatible preloader for Rails
<headius> Matt Welke: The maven+jruby situation is kinda all over the place and we are seeking to take over the related gems so we can get that straightened out. That whole ecosystem was maintained by someone who is no longer active in JRuby, so it is a bit confusing and undermaintained right now
<headius> related gems/artifacts... there's a combination of Ruby and Java libraries that support JRuby+Maven (several of which we use to build and test JRuby itself)
<boc_tothefuture[> @headius: I hadn't.. but that is perfect.
<headius> enebo: 9.3.2
<enebo[m]> yeah
<enebo[m]> we cannot consider it until we get GHA worked out but that is pretty close
<headius> I think we were close two weeks ago, what is outstanding now?
<headius> ah that is a good call
<headius> I will look at mattpatt PR today
<enebo[m]> I also just opened an issue with strftime perf improvements
<headius> noice
<enebo[m]> I would like to merge it but not until we have GHA
<enebo[m]> strftime for the patterns a logger uses is 3.3x faster
<enebo[m]> but I imagine that will generally translate to most format strings
<enebo[m]> It is a fairly complicated format so perhaps it gets faster the longer it is but it is all pretty linear
<enebo[m]> Once we get a callsite cache then this can be made faster again too
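(The branch itself isn't shown here, but a measurement along those lines could look like this with benchmark-ips; the timestamp pattern is an assumption, not the one from the PR.)
```
require 'benchmark/ips'   # gem install benchmark-ips

t = Time.now
Benchmark.ips do |x|
  # a typical logging timestamp pattern
  x.report('strftime') { t.strftime('%Y-%m-%d %H:%M:%S.%L %z') }
end
```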
<headius> yeah for sure
<headius> looking at GHA PR and my first thought is about all these jobs
<enebo[m]> HAHAH
<enebo[m]> yeah it is a lot when you see it in this UI
<headius> at least they scroll now (for a while they just made the "checks" box gigantic) but there's sure a lot of them
lopex[m] has joined #jruby
<lopex[m]> heh I guess you already know about this https://openjdk.java.net/jeps/8277131
<enebo[m]> some of these can be grouped, like all the spec/ruby ones, but they need to toggle the group to not cancel on first fail
<headius> lopex: I did not know about the JEP but I knew this was coming
<headius> this is a fairly natural progression from coroutines
<lopex[m]> so internal green scheduler ?
<enebo[m]> That seems to just be a config. For me though I am ok with refining this and just getting something showing reasonable results first
<headius> if we could get some time to play with experimental stuff we could have true fibers within a week
<lopex[m]> M:N like BEAM ?
<headius> I believe it's more like a fiber scheduling system based on the coroutine work in loom
<lopex[m]> so cooperative..
<headius> yeah mostly that I guess... I am behind on it but they have taken care to make coroutines live well with blocking calls and locks and such so I guess there could be scheduler intervention at those points too
<headius> if we're lucky they are basically implementing the new Ruby 3 scheduler model and we'll be able to just plug it in
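(For context, the Ruby 3 model being referred to: a thread installs a scheduler object and Ruby calls its hooks instead of blocking. A minimal sketch of the interface, with hook names from Ruby 3.0's Fiber scheduler documentation; the class name is purely hypothetical and the hook bodies are elided.)
```
class LoomBackedScheduler
  def io_wait(io, events, timeout); end    # called where IO would block
  def kernel_sleep(duration = nil); end    # Kernel#sleep lands here
  def block(blocker, timeout = nil); end   # mutex/queue waits
  def unblock(blocker, fiber); end         # wake a blocked fiber
  def close; end                           # drain remaining work at thread exit
end

Fiber.set_scheduler(LoomBackedScheduler.new)
Fiber.schedule { puts "runs as a non-blocking fiber under the scheduler" }
```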
<headius> we have also been tossing around thoughts about what JDK level to go to when we drop 8
<enebo[m]> "The scheduler for virtual threads is a work stealing ForkJoinPool, that works in first-on-first-out (async) mode, and with parallelism set to the number of available processors."
<headius> clearly it won't be 9, but if we do it next year could it be 17?
<lopex[m]> 17 is LTS I guess ?
<enebo[m]> Feels like this will mostly just allow a lot of people to stop handrolling this
<headius> enebo: yeah so probably work-stealing at blocking-call points due to coroutine work
<headius> it already will work steal on subjob boundaries but this would expand it into fiber yields etc
<enebo[m]> perhaps some magic on seeing blocking beneath java?
<headius> they already have hooks to know when you are going to do a blocking IO or lock so at that point they can deschedule the current coroutine and run another one for a while
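(From JRuby that could eventually look something like this through Java integration. This is a sketch assuming a Loom-enabled JDK; the factory method name comes from the Loom previews and could still change.)
```
# each submitted task runs on a virtual thread; blocking calls like sleep
# unmount it so the carrier (platform) thread can run other tasks
executor = java.util.concurrent.Executors.newVirtualThreadPerTaskExecutor

10_000.times do |i|
  executor.submit { sleep 0.1; puts i }
end
executor.shutdown
```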
<enebo[m]> yeah
<enebo[m]> The evolution of Java has been interesting to watch
<lopex[m]> and I guess it will affect jit as well right ? since similar things arise like biasing etc
<enebo[m]> It started where nothing could ever change to build up its reliability across a large number of vendors
<enebo[m]> Then when externally pressured they realized they needed to improve things quicker
<enebo[m]> Now the firehose is wide open
<headius> it is funny and sad that I/we told them how useful this would be over a decade ago at JVMLS and I was shot down by none other than Doug Lea who tried to tell me that thread switching wasn't that bad and nobody needs more than 1000 concurrent jobs
<headius> thanks Doug
<enebo[m]> yeah rear guard was defeated by the time they switched release strides
<enebo[m]> Honestly adding new stuff under the covers is a much simpler sell than externally visible features
<enebo[m]> but people have been wishing for externally visible features more and they are providing them
<headius> we could see some interesting use cases if we got our fibers hooked up to Loom
<enebo[m]> This is sort of the shark deciding it had to start swimming, but the difference between this and Ruby is that people actually wanted these features for over a decade
<headius> yeah and were told they didn't really want them
<enebo[m]> yeah from google employees :) /ducks
<headius> so back to GHA...
<headius> this is promising already... mri:core passes along with most of the smaller suites
<enebo[m]> getlogin is not working on the image so I think we tag it
<headius> I will look into the failing suites and work with mattpatt to tweak them, or else we merge and just fix on branch
<enebo[m]> we have 6 other failures in spec/ruby which I actually see on fedora core as well but they are ipv6 or socket constants. We should probably tag all of these since they were never working
<enebo[m]> spec:ji has an issue with no 'touch' (which seems simple to work around) but there might be another issue in there since a bunch of stuff is giving unhelpful error messages. I will look; maybe it is not having javac or something
<headius> JI specs failing is a bit surprising
<headius> aha
<enebo[m]> it could all just be touch but that is the easy error
<enebo[m]> errors above those errors are mysterious so I figured something we compile is not happening which leads to weird errors...it is just a guess though
<headius> how can touch not exist
<enebo[m]> on a stripped down linux instance?
<enebo[m]> I find it funny it is not there but it is pretty easy to see how it happens
<headius> well are these really stripped down? it is a basic POSIX utility
<headius> it would be like not having ls
<enebo[m]> Is it POSIX?
<headius> it is
<enebo[m]> Even if it wasn't it is GNU coreutils
<headius> yeah something else might be causing this
<headius> weird path or something
<headius> /usr/local/bin/touch or something
<headius> but that would still be really weird
<enebo[m]> yeah I thought of that but why would touch not be in the same place as cp and rm (unless those fail too)
<headius> TestSocketAddrinfo#test_addrinfo_new_inet [/home/runner/work/jruby/jruby/test/mri/socket/test_addrinfo.rb:90]:
<headius> <[46102, "127.0.0.2"]> expected but was
<headius> <[46102, "127.0.0.1"]>.
<headius> this is clearly a GHA ifconfig issue
<enebo[m]> ah yeah I forgot about that
<headius> that's the only one failing in mri:stdlib
<enebo[m]> yeah that is a bad MRI test
<enebo[m]> err no it isnt
<enebo[m]> ai = Addrinfo.new(["AF_INET", 46102, "localhost.localdomain", "127.0.0.2"])
<enebo[m]> assert_equal([46102, "127.0.0.2"], Socket.unpack_sockaddr_in(ai))
<headius> oh hmm
<enebo[m]> I thought it would be based on how much they hard-code 127.0.0.1 here
<headius> why would that run differently here
<enebo[m]> are we reverse looking up something based on localhost.localdomain?
<headius> that seems likely
<headius> well or at least a good possibility
<headius> but I have no 127.0.0.2 locally and it works
<enebo[m]> return RubyArray.newArray(runtime, addrinfo.ip_port(context),
<enebo[m]> addrinfo.ip_address(context));
<headius> hah
<enebo[m]> ip_address is the thing
<headius> yeah it goes back to the address and asks for its address
<headius> which may resolve back to 127.0.0.1 in this env
<enebo[m]> yeah and this dips into Java impl of this
<enebo[m]> heh it is possible AddrInfo changes this before that point in initialize too
<headius> looks like it from the JDK source
<headius> I don't see anything weird in the "get" methods
<headius> this passes for you on fedora?
<enebo[m]> I get no failures locally with test:mri
<headius> test:mri:stdlib is a subset, but I think included in test:mri
<enebo[m]> I get the 6 we see in spec/ruby but those I always thought were some env issue with ipv6
<enebo[m]> Let me try running locally
<enebo[m]> I thought that was the last part of doing spec:mri
<headius> task mri: ['test:mri:core:int', 'test:mri:extra:int', 'test:mri:stdlib:int']
<headius> so yeah it should run
<enebo[m]> yeah I just ran through these on my new_strftime branch before landing since I will not get CI
<headius> reading through the code nothing jumps out so it would help to be able to step through this and see where the address goes wrong
<enebo[m]> GHA is basically just a container thing right?
<enebo[m]> Can we run them locally?
<headius> ubuntu 20.04
<headius> probably
<enebo[m]> but it has to be "some" ubuntu 20.04 image since it does not seem to have touch or getlogin
<headius> I have 20.04 running on docker locally, will try to repro
<headius> I always forget to install jdk headless
<headius> of course it works fine
<headius> I will try mri:stdlib run on this container and hope it fails
<headius> huh that one didn't fail but the ipv6 version did
<headius> TestSocketAddrinfo#test_addrinfo_new_inet6 [/jruby/test/mri/socket/test_addrinfo.rb:520]:
<headius> <[42304, "::1"]> expected but was
<headius> <[42304, "127.0.0.1"]>.
<headius> so something along that path is re-resolving this back to 127.0.0.1
<headius> I can repro at command line
<headius> I think I see it
<headius> if nodename is given it prefers that for getting the InetAddress
<headius> # bin/jruby -rsocket -e 'p Socket.unpack_sockaddr_in(Addrinfo.new(["AF_INET6", 42304]))'
<headius> ConcurrencyError: Detected invalid array contents due to unsynchronized modifications with concurrent users
<headius> woah
<headius> not bounds-checking the incoming array at all
<enebo[m]> heh
<headius> I may have a fix
<headius> basically when it has both a host and an address it needs to use the InetAddress.getByAddress form
<headius> so it gets the requested hostname with the requested address rather than looking up the hostname
<headius> it was basically ignoring the address if you give a hostname in this form
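(The shape of that fix, sketched via JRuby's Java integration; this illustrates the JDK API difference being described, not the actual patch.)
```
# getByName on a literal IP just parses it, no lookup happens
numeric = java.net.InetAddress.getByName("127.0.0.2")

# getByAddress keeps the caller's hostname AND the caller's address,
# instead of resolving the hostname and discarding the address
addr = java.net.InetAddress.getByAddress("localhost.localdomain", numeric.getAddress)

addr.getHostAddress   # => "127.0.0.2" -- what the test expects back
```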
<headius> I can't repro the ip4 version but that should work
<headius> the rubyspec failures almost look like it is laying out the sockaddr wrong for current platform
<headius> BasicSocket#recvmsg_nonblock using IPv6 using a connected but not bound socket raises Errno::ENOTCONN ERROR
<headius> Expected Errno::ENOTCONN
<headius> but got: NotImplementedError (recvmsg_nonblock is not implemented)
<headius> that is pretty weird
<enebo[m]> headius: looks good to me
<headius> I'm not thrilled about creating another InetAddress to get the address bytes but not sure what logic to use to turn an arbitrary string into appropriate ip4 or ip6 bytes
<headius> it should see this as an IP and not try to resolve so it should be cheap enough
<headius> ```Etc.getlogin.should == `logname`.chomp```
<headius> the related failure in rubyspec probably indicates `logname` is not available either
<enebo[m]> but getlogin was also an issue as well, right?
<headius> hmm this does check if logname can be run
<enebo[m]> Just to show that PR
<headius> nice
<enebo[m]> HAHAHA so logname does exist but getlogin doesn't
<headius> our etc should be using ffi though
<enebo[m]> and touch is not in PATH or there
<headius> 1) Dir globs (Dir.glob and Dir.[]) respects jar content filesystem changes
<headius> Failure/Error: `zip -d #{jar_path} glob_target/bar.txt`
<headius> Errno::ENOENT:
<headius> No such file or directory - zip
<headius> another one
<headius> installing zip and touch would fix JI specs
<enebo[m]> we can write a ruby touch in a few lines
<headius> makes no sense for it not to be there so I still suspect something else is up
<enebo[m]> I would say the same of zip but that I guess depends on what we are doing with zip
<enebo[m]> sure
<headius> looks like this is using it to manipulate jars
<headius> could just be a jar command
<headius> jar uf
<headius> how can touch not be there 😡
<enebo[m]> HAHAHA logname is there
<headius> yeah I am still stumped on that one... mspec has poor language for failures but I think what is failing may actually be the getlogin call
<enebo[m]> I thought it was getlogin returning nil
<headius> hmm or not?
<headius> Etc.getlogin.should == `logname`.chomp
<headius> error is Expected "runner" == ""
<headius> I assume the order is the same so getlogin returned "runner" and logname returned "" or just failed outright
<headius> in the log above
<headius> aha
<headius> ```logname: no login name```
<headius> how
<enebo[m]> I was going to say spec/mspec is better than test/unit in that it makes expected vs actual simpler to write, but I cannot read that error string
<headius> The "logname: no login name" error is due to logname expecting a tty session which ssh normally does not provide and has nothing to do with your script. The -t option forces a tty so logname completes successfully
<headius> rando comment on some linux forum
<enebo[m]> ah yeah
<headius> this would need a patch in rubyspec
<enebo[m]> This happens for other things too
<enebo[m]> Interesting to think no one else is running this spec on GHA
<headius> travis probably set up a pty for their runs and gha doesn't
<enebo[m]> how are they running it?
<headius> hmm good question
<enebo[m]> Or maybe they are running an instance which does set up a pty
<enebo[m]> I would argue the PR would make sense regardless
<enebo[m]> it can still pass without tty/pty
<headius> I don't see anything odd in the rubyspec GHA
<enebo[m]> and they don't tag
<headius> the spec even has special logic for env TRAVIS
<enebo[m]> for MRI
<headius> this spec is bad
<headius> it also has logic that checks if Etc.getlogin returns nil, and if it does, it makes sure that Etc.getlogin returns ENV['USER']
<headius> which will probably never pass
<headius> I have patches for zip and logname -t
<headius> I merged the Addrinfo thing, will push this other stuff shortly
<headius> hmm great, logname -t is a gnuism
<headius> enebo: FileUtils also provides touch so I'll just use that
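(i.e. roughly this in place of shelling out, with the path standing in for the spec's own:)
```
require 'fileutils'
FileUtils.touch('glob_target/bar.txt')   # stdlib equivalent of `touch`
```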
MateuszFryc[m] has joined #jruby
<MateuszFryc[m]> Hi all, I am wondering if some of you could shed some light on an issue I am struggling with: I migrated from JRuby 1.7.2 to JRuby 9.3.1.0. I am sending queries written in Ruby to a Java server, and since the migration the JVM has crashed twice in the last two days with a fatal error during compilation of a method from JRuby. As far as I understand, the crash occurs during OSR compilation by the C2 compiler, with OSR_BCI equal to 408,
<MateuszFryc[m]> and what is peculiar is that each time it crashed, OSR_BCI indicated the 408th byte.
<MateuszFryc[m]> I tried to reproduce this issue, but without a success so far.
<MateuszFryc[m]> Could you please advise how to approach such a problem?
<MateuszFryc[m]> I attach small excerpt from hs_err file: https://pastebin.com/2Ytxw43s
<headius> Congrats on finding a jvm bug
<MateuszFryc[m]> # JRE version: OpenJDK Runtime Environment (17.0+35) (build 17+35-2724)
<MateuszFryc[m]> So it has not much in common with JRuby (the fact that the crash occurs on compilation of parser_magic_comment), I suppose.
<MateuszFryc[m]> because it is peculiar that the crash showed up on compiling this particular method twice. However, other compilations of this method where OSR_BCI was different than 408 finished successfully.
<headius> Usual procedure here is filing an issue with openjdk directly, which I can help with
<headius> There's really nothing we can do in jruby other than provide workarounds like not compiling this particular method
<headius> Any crash in the jvm compiler is entirely a jvm bug and they are usually pretty quick to jump on it
<MateuszFryc[m]> ok, I understand.
<headius> You could try disabling osr
<headius> Sometimes disabling the tiered compiler can help as well
<MateuszFryc[m]> well I have never submitted any bug to openjdk, so it would probably be helpful if you could direct me somehow.
<headius> Via JVM flags
<MateuszFryc[m]> Well, but as I understand it, this kind of configuration can decrease the overall performance of my app.
<headius> You need to be an open jdk member to directly file bugs but you could also post this to the public hotspot compiler list and have the same effect
<headius> Disabling osr is unlikely to affect performance since most methods will compile normally. Osr just allows it to replace code that's running on the stack, like a long loop
<headius> If it can't osr then it will just replace the method for future calls
<headius> If the bug is actually in how it is compiling this method then we'll need to find another option
<MateuszFryc[m]> yeah, I read some articles about this technique. Some long-running loops could underperform then, if I understand correctly.
<headius> Only if they never exit. If they are hot and eventually return then future calls should use the optimized version
<MateuszFryc[m]> But of course it would be worth trying, and checking experimentally whether it has a visible impact on my server.
<MateuszFryc[m]> If the issue is only with compilation of this particular method parser_magic_comment, then most likely I will be able to work around it by excluding this single method from compilation... I guess.
<headius> And if this is failing in C2 using tiered compilation, the worst case is that the method is only compiled by C1 and should still be plenty fast
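(Concretely, the knobs being suggested look roughly like this; JRuby forwards `-J` options to the JVM, and the class/method in the exclude line is a placeholder, since the crash excerpt doesn't name the Java class that owns parser_magic_comment.)
```
jruby -J-XX:-UseOnStackReplacement app.rb    # skip OSR compiles entirely
jruby -J-XX:-TieredCompilation app.rb        # or -J-XX:TieredStopAtLevel=1 for C1 only
jruby '-J-XX:CompileCommand=exclude,some/Class.someMethod' app.rb   # never JIT one method
```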
<headius> Anyway that would be my first recommendation for you to try, and we should raise this up to openjdk folks one way or another
<headius> Especially since this is in a 17 release
<headius> Clearly it would help if we could reproduce but if you know exactly what method is failing that might be enough
<MateuszFryc[m]> Well I tried to reproduce it, but without success; I didn't come across the same compilation (same method (parser_magic_comment) / compilation type OSR / phase 4 / OSR_BCI 408)
<MateuszFryc[m]> I have two hs_err files; perhaps they would be handy?
<headius> For sure
<headius> I'm on mobile for a few minutes but I will have a look at them in a bit
<MateuszFryc[m]> Perhaps a stupid question, because the program which crashes doesn't belong to me. Is there some sensitive information in such an hs_err file which I wouldn't like to leak out? Should I sanitize it somehow, can you advise?
<headius> There definitely could be environment specific information, like file paths. Hard to say what would really be sensitive
<headius> It's just text so have a look at it and see if anything looks risky
<headius> I know they try to keep sensitive information out of there so mostly file paths and call stacks you can't really avoid
<MateuszFryc[m]> what about binary data? from registers etc?
<mattpatt[m]> @headius: will be around for a bit once kid is down if you want to look at GHA stuff
<headius> mattpatt: yeah cool, I will push some of these other fixes and have you rebase or merge and see how we're doing
<headius> I am leaning towards merging before it is green just to make it easier to test small fixes and to get the green jobs running
<MattWelke[m]> Hey headius just wanted to say thanks for being so active here and answering questions, including my own. I'm not using JRuby at work right now so it's mostly a weekend thing for me because I find the project interesting. I'll read up on your answers next time I'm tinkering.
<MattWelke[m]> P.s. hope you're feeling better. Saw the Twitter messages.
<headius> Yeah just a bit tired at this point
<headius> enebo: this login spec also has a fallback to the 'id' command, which is also posix.2 and may not have the same tty issues
<headius> Matt Welke: let us know if you get to the point of running something larger on JRuby, we're happy to help you bootstrap
<enebo[m]> headius: I wonder why id is not used
<headius> yeah not sure
<headius> it could also read LOGNAME env but there are clearly a bunch of possible fallbacks here and no clear way to choose the right one
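(For illustration, the kind of fallback chain being weighed here; a sketch, not the spec's actual code.)
```
require 'etc'

login = Etc.getlogin ||              # nil without a controlling terminal
        ENV['LOGNAME'] ||            # POSIX environment fallbacks
        ENV['USER'] ||
        `id -un 2>/dev/null`.chomp   # `id` is POSIX and needs no tty
```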
<MateuszFryc[m]> this is the hs_err log file from one crash.
<MateuszFryc[m]> if you could help me @headius to send it to the openjdk team, I would be grateful :)
<headius> yup, that's definitely a hotspot bug
<MateuszFryc[m]> the question is also whether they will be able to handle it without a way to reproduce it.
<headius> SEGV no less
<headius> Mateusz Fryc: I have yet to dump a crash on them they couldn't figure out
<MattWelke[m]> <headius> "Matt Welke: let us know if you..." <- Yeah for sure. I doubt that'd happen at my work. We have many languages in our stack already, none of which are Ruby. But we do use the JVM a lot. Some things I want to explore out of curiosity though are writing Apache Beam pipelines and ML stuff in Ruby.
<headius> feel free to post there what you've told us here + hs_err dump
<headius> drop my name if you like and I will monitor the list
<headius> I'd open an issue directly but I'm not sure about procedure for crashes like this
<MattWelke[m]> In the Beam ecosystem, right now, the Java stuff is better than the Python stuff (and those are the only two langs officially supported). Lots of Beam users hate having to code in Java. Static typing can slow them down when writing data pipelines. Hence me being curious what it'd be like using Ruby.
<MateuszFryc[m]> ok, thank you. I will do it this week, now I go to supper ;] thank you for your help and time.
<headius> Matt Welke: yeah that sounds like a great use case
<headius> Mateusz Fryc: if you get a chance open a JRuby issue too, so we can track discussion there
<headius> it's not a bug we can fix but we like to have a record of it in our tracker
<MateuszFryc[m]> Ok.
<headius> enebo: without knowing why one would use `logname` instead of `id` this might be workable
<headius> I will open an issue on ruby/spec for discussion
<mattpatt[m]> headius: think I'm out of the woods and kid is asleep, I'll be about for an hour or so before I turn into a zombie
<enebo[m]> heh that is way too much logic for this too. If it is that hard to determine a username then perhaps it should be its own gem
<enebo[m]> not that we want a spec library to depend on a gem
<enebo[m]> I guess env edges are always where you see this sort of logic though
<headius> we could just tag this for now and untag when it is fixed
<headius> I just don't know what the "best" way to get login user is
<headius> BTW I have not been able to find any standard image for the GHA env
<headius> it may exist but naive searches did not find it
<mattpatt[m]> headius: doubt there's a standard image accessible outside GHA
<mattpatt[m]> if they're anything like Travis there's a lot of extra stuff baked in
<mattpatt[m]> and it'll change regularly
<mattpatt[m]> My assumption is that the main difference between Travis and GHA as far as our tests are concerned is that it seems like Travis ran stuff in a login shell and GHA is using a non-interactive shell
<headius> yeah seems like that
<mattpatt[m]> (Kinds of errors here smell like login/interactive vs non-interactive)
<headius> travis did publish a docker image you could run that would be identical to their env, but that was years ago
<headius> mattpatt: jruby-9.3 branch has all my current fixes: `jar` instead of `zip`, FileUtils.touch instead of `touch`, and an Addrinfo.new fix
<mattpatt[m]> rebasing now
<headius> that should get test:mri:stdlib and spec:ji green at least
<headius> enebo: looks like logname on BSDish does not have the terminal requirement
<MateuszFryc[m]> so basically, I should post an email to hotspot-compiler-dev@openjdk.java.net and attach the hs_err file? I thought there would be some site where I could upload it. Correct me if I am wrong ;)
<headius> Mateusz Fryc: yes do that but put hs_err in a gist or something and link to it
<MateuszFryc[m]> ok
<enebo[m]> bleh
<MateuszFryc[m]> thank you once again.
<headius> Mateusz Fryc: of course you can try other JDK versions and maybe it will go away... this method has been in JRuby for a long time and this is the first time it has crashed the JVM
<MattWelke[m]> <headius> "BTW I have not been able to find..." <- Is this helpful? https://github.com/actions/virtual-environments
<MattWelke[m]> GHA runs workflows in VMs by default. You have to opt in to using a Docker container to run the workflow in. Their docs on networking for backing services talk about that, because it has implications for whether you use `localhost` or a DNS name derived from part of the workflow YAML to connect to your backing service.
<MattWelke[m]> But there they describe the environment present in the VM your workflow runs in.
<headius> mattpatt: these are all really close to green as it is
<headius> Matt Welke: that's useful information about the env at least
<headius> the two main types of things we are seeing failing are terminal-related and socket-related
<MattWelke[m]> Ah. That's all I've ever been able to find on the environments sorry.
<headius> no worries
<mattpatt[m]> Matt Welke: that's super useful
<headius> we had to do this dart-throwing to get other suites running in GHA so it's fine
<enebo[m]> mattpatt: I think you should just squash and we can merge and then solve these one by one on jruby-9.3
<headius> agree
<MattWelke[m]> You can check out https://docs.github.com/en/actions/using-containerized-services/about-service-containers though for examples on running your workflow in the runner (a VM) vs. in a container running on the runner.
<mattpatt[m]> okay, I'll do that. It needs to be two commits because of the whole reusable-workflow thing requiring SHAs.
<mattpatt[m]> Once we're green we can look at getting the snapshot deploy task working.
<enebo[m]> mattpatt: cool
<mattpatt[m]> headius enebo: Updated. sha ref means the whole thing will error until it's merged
<mattpatt[m]> merging it should magically make it work provided the SHAs of the commits don't change
<mattpatt[m]> if that's a problem, then we could do one merge for the reusable workflow file, which is harmless by itself, and then do a merge where the PR checks will actually run the actions
<mattpatt[m]> tbh, that's probably better
<mattpatt[m]> lmk
<enebo[m]> mattpatt: merged.
<mattpatt[m]> enebo: welp, the CI workflow is running now, so the SHAs work now :-)
<MateuszFryc[m]> headius: I sent an email with a description of the problem to the hotspot email group. Hopefully someone will have a chance to take a look at it; unfortunately I don't know how to track the discussion when it starts. So, when it reaches you, please open the issue in JRuby yourself.
<headius> ok
<mattpatt[m]> headius: first complete run of GHA CI on the 9.3 branch is done. Do you want to pick a file to go over? Not sure what you've already been looking at
<headius> nice let's have a look
<headius> odd failure on JI specs
<headius> this is one of the files I changed wrt `touch` and `zip`
<headius> mattpatt: I don't have a process in mind... just looking at these failures and trying to suss out why they fail on GHA but not travis
<mattpatt[m]> i'm not sure what's what
<mattpatt[m]> the last Travis build I can find is from at least a year ago
<headius> well your job may be done, but if you look at these and see something obvious we can patch it
<headius> most will require a bit of digging to figure out what's different in this environment
<headius> e
<mattpatt[m]> the JI specs at least all fail on the same spec
<headius> ugh yeah it fails locally too... i'll figure out what I didn't fix
<headius> ```
- `zip -d #{jar_path} glob_target/bar.txt`
+ `jar uf #{jar_path} glob_target/bar.txt`
```
<headius> bleh
<headius> oh duh
<headius> -d delete
<headius> maybe that's why this used zip
<mattpatt[m]> hrm
<headius> I would rather not have to install zip to do this but jar doesn't support deleting a file
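(One way to avoid the external tool entirely would be the JDK's zip filesystem provider via Java integration. A sketch, with `jar_path` standing in for the spec's variable:)
```
uri = java.net.URI.create("jar:file:#{File.expand_path(jar_path)}")
fs  = java.nio.file.FileSystems.newFileSystem(uri, java.util.HashMap.new)
begin
  # deleting an entry is what `zip -d` did and what `jar uf` cannot do
  java.nio.file.Files.deleteIfExists(fs.getPath("glob_target/bar.txt"))
ensure
  fs.close
end
```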
<mattpatt[m]> they claim zip is installed
<headius> there must be some pathing issue affecting that and `touch`
<mattpatt[m]> maybe it's a $PATH thing?
<mattpatt[m]> gimme a sec. just finding windows. seems my mechanical keyboard woke the kid
<mattpatt[m]> had to relocate further away
<headius> hah, mine told me last spring that his fellow remote learners complained about the noise
<headius> I do type hard and fast
<mattpatt[m]> no obvious path manipulation in the workflow file itself
<headius> I don't know that we do anything odd with path in the JI run either
<mattpatt[m]> gonna put a test workflow up and see about env info
<mattpatt[m]> touch and zip, right?
<headius> yeah that's the two I patched in the JI specs
<headius> touch change seems fine because it just uses a Ruby util for it but the zip one is not working
<headius> we can just use zip if we can sort out the pathing on GHA
<headius> mattpatt: other thing that might be a clue is how ruby/spec runs in GHA itself: https://github.com/ruby/spec/blob/master/.github/workflows/ci.yml
<headius> may be something configured differently that allows stuff to work better
<headius> I don't see anything obvious though
<mattpatt[m]> well that was unhelpful
<headius> Looks like it should work
<mattpatt[m]> did that from inside Rake so it was being invoked with jruby -S
<mattpatt[m]> PATH looks sane, which finds zip
<mattpatt[m]> so that looks good
<mattpatt[m]> so problem must be deeper in
<headius> I need a little lie down so handing off to enebo
<mattpatt[m]> headius enebo: if you use `/usr/bin/zip -d` instead of `zip -d` it passes spec:ji
<mattpatt[m]> yuck
<enebo[m]> hmm
<enebo[m]> mattpatt: we could `which zip` and, when none is found, just hard-code the path with a comment that GHA requires it
<enebo[m]> This is not really ideal but it would be good to move past this
<enebo[m]> if PATH is really the path in your output at that point, the only issue I can think of is that there is a file called zip higher up which is not marked with an execute bit
<mattpatt[m]> enebo: i'm trying again with only `zip -d`
<mattpatt[m]> if that fails i'm going to dump PATH into the output from the spec and see if that's different to the $PATH from the root of the Rakefile
<enebo[m]> Your Rakefile experiment looking for a zip in each PATH dir might be worth it too
<enebo[m]> although a changed PATH seems likely
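(That experiment is only a few lines if it's dropped into the spec or the Rakefile; a diagnostic sketch:)
```
# report every `zip` candidate on PATH and whether it is actually executable
ENV['PATH'].split(File::PATH_SEPARATOR).each do |dir|
  candidate = File.join(dir, 'zip')
  puts "#{candidate} executable=#{File.executable?(candidate)}" if File.exist?(candidate)
end
```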
<mattpatt[m]> Oh wow: Failure/Error: puts `which zip`
<mattpatt[m]> okay, I gotta sleep. enebo: I'll poke at this again in the morning.
<enebo[m]> ok. Thanks for the efforts!
aquijoule_ has joined #jruby
richbridger has quit [Ping timeout: 256 seconds]