kalenp[m] has joined #jruby
<kalenp[m]> Good morning JRuby friends. Kalen from Looker here. We're in the process of upgrading from 9.2 to 9.3 and are investigating a potential memory issue. Looking to see if you might have some tips for investigation so we can confirm if it's an issue on our end or yours.
<kalenp[m]> The situation is that our CI job is getting OOMs after the upgrade which we weren't previously. Created a gist with the errors that we're seeing https://gist.github.com/kalenp/f64d408f8f1a1f5668f9b03f26219fe8
<kalenp[m]> So first an error about running out of heap space, then the CodeHeap warnings. The second might just be a side effect of the first, but maybe somebody with more understanding of those systems can tell whether it is.
<kalenp[m]> Are there any known issues or changes between 9.2 and 9.3 that would lead to increased memory usage?
<enebo[m]> kalenp: there shouldn't be a difference but a lot of code has changed over time
<enebo[m]> kalenp: I imagine 9.4 is too big a jump for your app? I only ask because there is a lot more effort on 9.4 in the last year or two
<enebo[m]> A heap dump earlier and later may expose what is growing enough to get a clue
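(A minimal sketch of that two-dump approach, assuming a JDK with jcmd on the PATH and <pid> standing in for the test process id:
    jcmd <pid> GC.heap_dump /tmp/early.hprof
    # ... let the suite run for a while ...
    jcmd <pid> GC.heap_dump /tmp/late.hprof
    # older tooling equivalent: jmap -dump:live,format=b,file=/tmp/early.hprof <pid>
Comparing the class histograms of the two dumps in a heap analyzer such as Eclipse MAT usually shows which classes are growing.)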
<kalenp[m]> yeah, we're not quite ready to do the 9.4 jump. even 9.3 required updating quite a few gems and addressing bugs, so we're taking it stepwise
<enebo[m]> yeah makes sense
<enebo[m]> Since OOME is first I don't think we can really trust what is reported later
<enebo[m]> 9.2 -> 9.3 is largely just incremental changes but so many moving parts
<kalenp[m]> this might be a general jvm question, but I added some logging for Total and Free memory during the run and don't see it approaching 4GB at any point. are there other rough metrics I could/should be watching before trying to take a full heap dump?
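(For reference, logging of that sort from inside a JRuby process is a couple of lines against java.lang.Runtime; a minimal sketch, assuming plain JRuby with no extra gems:
    rt = java.lang.Runtime.runtime          # JRuby exposes getRuntime as .runtime
    mb = 1024 * 1024
    total = rt.total_memory / mb            # heap currently reserved
    free  = rt.free_memory / mb             # unused space within that heap
    max   = rt.max_memory / mb              # the -Xmx ceiling
    warn "heap MB: used=#{total - free} total=#{total} max=#{max}"
)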
<enebo[m]> when you say that, do you mean it is sawtoothing up and down near the 4GB limit but collections knock it back down low, or that it never gets close to 4GB?
<enebo[m]> JVM tends to be really lazy generally so I expect it to use a lot of that heap to do less collections
<kalenp[m]> Total is never reported above 2GB. Threw in a GC to get more stable numbers, because it was sawtoothing a lot and so it was hard to see what was actually live.
<enebo[m]> headius: you around?
<enebo[m]> We have had some issues over time where we leak in a special way where it is non-heap but it is uncommon
<enebo[m]> This would be a weird test but you could run with JIT disabled and see if you see the problem
<kalenp[m]> Looker is good at hitting those sort of bugs historically :)
<enebo[m]> non-heap memory issues would be us doing something wrong like re-making the same method over and over but not losing the reference to the old one. so you would see zillions of classloaders (and the non-heap side growing with them)
<enebo[m]> If -X-C (or --dev) does not show the problem then something in the JIT is getting tripped up, but since the interpreter is slower it will probably take more time to figure out whether it is actually ok
<enebo[m]> If you are using -Xcompile.invokedynamic that would be a case of more generated code. I do not expect that is buggy but just thinking out loud
<kalenp[m]> already in slow repro land. tests take 30 minutes to start failing, plus build times in CI. so trying to get a few different ideas going in parallel
<enebo[m]> hooking up with visualvm may show you something on one of the pages
<enebo[m]> I am on a new laptop and have nothing helpful setup yet beyond my IDE :)
<enebo[m]> I think some of the mbeans will show how many methods have been JITTed (in Java). JRuby has some beans showing how much we JIT in Ruby. It should show how many classloaders are there
<enebo[m]> for JIT we use one classloader per JITted method so it won't be a small number
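(A few ways to watch that from the outside; the -X flags assume JRuby 9.x property names, and `jruby --properties` lists the current set:
    jstat -class <pid> 5s                 # loaded/unloaded class counts over time; one JITted Ruby method is roughly one class+classloader
    jruby -Xjit.logging=true ...          # log each Ruby method as it is JITted to bytecode
    jruby -Xmanagement.enabled=true ...   # expose JRuby's MBeans over JMX so jconsole/visualvm can see them
)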
<kalenp[m]> ok, so some things to try: kick off a run with -X-C (is that an argument to jruby directly, or a java argument?). run it normal and attach visualvm to look for classloaders or other unexpected outliers being allocated
<enebo[m]> we also have a very long tail for JITTing since we use method call count as the metric (I think 50 calls by default)
<enebo[m]> -X-C is a call to JRuby not Java
<enebo[m]> -Djruby.compile.mode=OFF is what it does
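(Concretely, any of these should be equivalent ways to run without the bytecode JIT; the rspec/bundler invocations are only placeholders for however the suite is normally started:
    jruby -X-C -S rspec spec
    JRUBY_OPTS="-X-C" bundle exec rspec             # when jruby is not invoked directly
    jruby -J-Djruby.compile.mode=OFF -S rspec spec  # the underlying system property
    jruby --dev -S rspec spec                       # --dev also turns the JIT off, among other startup-oriented settings
)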
<kalenp[m]> cool. I can go get those things started and see if I get some more data. I'll keep this open in case Headius jumps in with more ideas. thanks for the tips!
<enebo[m]> interestingly we still compile methods at IR level using OFF but it does not generate java bytecode it just makes more complicated IR
<enebo[m]> -Djruby.jit.threshold=-1 will disable doing that but I doubt it matters in this case
<enebo[m]> kalenp: if you see it happen at a consistent enough point in time and it is a function of JIT then changing threshold to let's say 100 from default of 50 would cause the OOME to report later
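(That is, something along these lines, per the defaults mentioned above:
    jruby -Xjit.threshold=100 ...    # JIT after 100 calls instead of the default 50
    jruby -Xjit.threshold=-1 ...     # skip the full IR/bytecode build entirely, per the note above
)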
<enebo[m]> but we don't know it is JIT.
<enebo[m]> The other angle would be native exceptions but then I think it would be untracked memory and you would not see OOME but you would hit some setrlimit process size thing
<enebo[m]> or I think so anyways
<kalenp[m]> oh, another thing which I noticed is that this is our coverage CI job, so it's running with --debug to get coverage data. for our non-coverage tests, we're not seeing OOME, but it's also broken into smaller slices. 9.2 worked even for coverage, but it's another piece of the matrix here
<headius> hey hey
<headius> non-heap leak you say?
<enebo[m]> hmm coverage
<headius> OOM is only heap so that's just a leak or it's using more memory than available heap
<enebo[m]> I would say if it was coverage specifically then it would just be a heap problem but --debug does more than just enable coverage
<headius> the CodeHeap stuff may be related but I would not expect that unless the OOM is a symptom of the host system not having enough memory
<enebo[m]> so they do not see much while watching it run but something ends up using a lot of heap
<headius> yeah this OOM says heap space so I'd expect a normal sort of leak on the heap for that
<headius> it would look different if it were failure to allocate more native memory or something
<enebo[m]> I fixed a gnarly issue on 9.4 with coverage (which I could backport to 9.3). It is at least the third time I have tweaked around calls needing proper line number separate from profiling/coverage line numbers
<headius> so I'd say a heap dump is the next step I'd take
<enebo[m]> it is not this problem but a coincidence
<enebo[m]> yeah if OOME can only be from heap issue then definitely
<enebo[m]> non-heap is so rare I did not even realize that
<kalenp[m]> looks like we're running with -J-XX:+HeapDumpOnOutOfMemoryError, but we're not actually saving those. working with our CI team to get those saved
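(If it helps, the dump location can be pointed at a directory the CI system already archives; both are standard HotSpot options passed through with -J, and the path below is just a placeholder:
    -J-XX:+HeapDumpOnOutOfMemoryError -J-XX:HeapDumpPath=/path/to/ci/artifacts
)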
<headius> coverage has improved over time to track more/better data so I would not be surprised if it's using more memory, but I wouldn't expect it to be using 100s of MB more
<headius> kalenp: ahh yeah nice
<headius> you can also just use visualvm or jconsole to get a dump when it's obvious a process is on its way to OOM
<enebo[m]> it will be the filename in every line as a separate char[]
<headius> enebo: OOM is used for lots of things but the one kalenp posted explicitly has the heap error
<headius> other types of OOM will say failure to allocate memory or thread or whatever
<headius> or stack...I think you can see OOM for stack size exceeded
<enebo[m]> I guess it literally says ran out of heap so lol
<enebo[m]> I am hoping, for calls at least, I never have to deal with this again
<headius> CodeHeap could be related if we are jitting too much stuff or hanging on to transient jitted methods that should go away
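(A couple of standard HotSpot knobs for looking at that side, if it does turn out to be CodeHeap pressure rather than a heap leak:
    jcmd <pid> Compiler.codecache               # current code cache usage (newer JDKs)
    jruby -J-XX:+PrintCodeCache ...             # print code cache stats when the JVM exits
    jruby -J-XX:ReservedCodeCacheSize=512m ...  # raise the cap if the cache is genuinely just full
)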
<enebo[m]> this did make me ponder what IR would look like if we baked line into instr like we do for AST nodes
<enebo[m]> It has a big advantage for interp by not having those instrs but a number of disadvantages too
<headius> you mean not emitting LineNumber instrs>
<headius> ?
<enebo[m]> I suppose though for JIT since all instrs are in-order that is reasonably simple
<enebo[m]> yeah
<enebo[m]> for IRbuilding it is complicated because operand building is not in order but addinstr is in order
<headius> for JIT it would make little difference... I just have a "current line" value and whenever I see a Line Number I update that and emit the line number bytecode stuff if it changed
<enebo[m]> so I ended up comparing lastLineNum or whatever field and whackiness
<enebo[m]> It adds 4 bytes to each instr but then we lose the line num instrs
<enebo[m]> and it is possible there are a number of instrs which do not need line as a field but that is complicated
<headius> I'm going to review that launcher PR from mrnoname so we can merge it
<enebo[m]> yeah I just want it in sooner than later
<headius> oh so the other order of business for me this week is to wrap up the mavengem stuff
<headius> I still need to get the "bundler API" endpoints working, which will require some exploration
<enebo[m]> sounds like you are running all tests now?
<headius> I think we should move this under org.jruby groupID since the old one is org.torquebox.mojo and TB is defunct now anyway
<enebo[m]> yes
<enebo[m]> all the things should be in the org at this point
<headius> I am "running" all tests, but the ones related to bundler API or still dependent on the dependencies?gems=foo,bar multiple result API are failing
<headius> other than that everything works up the stack... the failing features are not used by mavengem itself
<enebo[m]> I sort of feel like unless we get enough community mass around something we should just put it in this org
<headius> I could leave them in place in that library and disable tests in a pinch
<headius> nobody will be using the new group so they would not see any breakage... but of course it will stop working for those APIs anyway on Aug 8
<enebo[m]> ah yeah but as far as you know nothing is using the bundler api stuff?
<headius> nothing I know of right now
<headius> it might be used by the maven-tools gem based on this rubygems-tools Java library
<enebo[m]> heh...I suppose you could wait for the shoe to drop or just make it work
<headius> this stack is large... it makes me realize how much integration work was going on around TB at the time
<enebo[m]> I have been wondering about how much magic is lost in tb after ben stopped working on it
<enebo[m]> He was optimizing hard on the techempower benchmarks
<headius> yeah it was good work
<headius> enebo: waiting on one review question for launcher fixes but otherwise it seems fine
<headius> I have mrnoname on chat now
<headius> enebo: org.jruby or org.jruby.maven or what for groupID?
<headius> TB isolated this stuff under org.torquebox.mojo
<enebo[m]> ok
<headius> Ok what
<enebo[m]> ok
<enebo[m]> lol when you said you had mrnoname on chat I thought he was coming on here then forgot about it
<enebo[m]> but he wasn't so it didn't matter
<headius> So org.jruby.maven maybe
<headius> This shouldn't be a top level artifact in org.jruby
<enebo[m]> sure. that makes sense
<headius> heh
<headius> seems like the only sub-group we have used is org.jruby.extras
<headius> I don't think anything in here is used anymore: https://repo1.maven.org/maven2/org/jruby/extras/
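(For illustration only: the relocation being discussed would look something like org.torquebox.mojo:mavengem-wagon -> org.jruby.maven:mavengem-wagon in consumers' coordinates; the artifactId here is just an example, not a confirmed list of what moves.)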