#jruby on 2023-02-27 — irc logs at libera.irclog.whitequark.org

2023-02-07 16:51 ChanServ changed the topic of #jruby to: Get 9.4.1.0! http://jruby.org/ | http://wiki.jruby.org | http://logs.jruby.org/jruby/ | http://bugs.jruby.org | Paste at http://gist.github.com

14:34 <byteit101[m]> hmm, Process::Status is a weird class. I think I can redefine it in subspawn yet leave it intact from both a CRuby and JRuby POV when calling Process.waitpid

14:46 <enebo[m]> ahorek: I am taking today off but it would be nice to figure this joni issue out

14:47 <enebo[m]> That revert got rid of some huge pathological issue which seemed to be causing a lot of extra searching

14:48 <enebo[m]> lopex: if you have any time it would be nice to talk to you about how we can better integrate reading/detecting broken encoded strings in joni (I am not on today though)

14:52 <lopex[m]> enebo: https://github.com/jruby/jcodings/issues/26

14:52 <lopex[m]> and https://github.com/jruby/jruby/wiki/Encodings-in-JRuby

14:53 <enebo[m]> lopex: thanks. I will look more tomorrow. It sounds like work :)

14:54 <lopex[m]> yeah, it's a mess

14:55 <lopex[m]> the problem is that mri does quite a bit of additional work for already validated strings

14:56 <lopex[m]> not to mention the plethora of those functions

14:56 <enebo[m]> lopex: but we are seemingly not doing enough for unvalidated ones to not do extra for validated ones. Does that sound right?

14:57 <lopex[m]> yeah, like returning -1 for broken characters, mri falls back to 1 just to make any advancement in those routines

14:57 <lopex[m]> that sunday search was about that case too afaik

14:58 <enebo[m]> so without diving into this today: if we do not have BROKEN as cr, things work fine and do not do extra work; otherwise perhaps we see a single byte as length 1, which it shouldn't be

14:58 <enebo[m]> oh I see

14:58 <enebo[m]> that is weird

14:58 <lopex[m]> yeah, we'll enter infinite loops in some cases

14:59 <enebo[m]> so they get -1 but the method will go 'whelp let's pretend it was 1 valid byte and keep searching'

14:59 <lopex[m]> but first we have to decide what semantics we are gonna use on what code layer

14:59 <lopex[m]> yeah

15:00 <lopex[m]> https://github.com/jruby/jruby/wiki/Encodings-in-JRuby

15:00 <lopex[m]> onigenc_mbclen_approximate

15:00 <lopex[m]> return 1; at the end

15:00 <enebo[m]> I can imagine dealing with internet data means broken encodings so perhaps this is needed but what a weird semantic

15:01 <lopex[m]> so, there's valid length, missing, and 1

15:01 <lopex[m]> being returned
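
The three outcomes described above (a valid length, a "missing bytes" result, and a fallback of 1) can be sketched roughly as follows. This is a simplified illustration for UTF-8 only, not the jcodings or Oniguruma code: the class name, method names, the `INVALID` sentinel, and the way missing bytes are encoded are all assumptions made for the example, and the check ignores overlong forms and surrogates. In this sketch a truncated character consumes the remaining bytes, which is one possible choice among several.

```java
// Hypothetical sketch; names and return-value encoding are assumptions, not the jcodings API.
final class ApproxLengthSketch {
    static final int INVALID = -1; // assumed sentinel for a broken character

    // Strict length: > 0 for a well-formed character, INVALID for a broken one,
    // and -(1 + missing) when the character is cut off by the end of the buffer.
    static int preciseLength(byte[] b, int p, int end) {
        int lead = b[p] & 0xFF;
        int need;
        if (lead < 0x80) return 1;
        else if (lead >= 0xC2 && lead <= 0xDF) need = 2;
        else if (lead >= 0xE0 && lead <= 0xEF) need = 3;
        else if (lead >= 0xF0 && lead <= 0xF4) need = 4;
        else return INVALID;
        for (int i = 1; i < need; i++) {
            if (p + i >= end) return -(1 + (need - i));    // truncated
            if ((b[p + i] & 0xC0) != 0x80) return INVALID; // bad continuation byte
        }
        return need;
    }

    // Lenient length: always positive, so a scanning loop always advances.
    static int approximateLength(byte[] b, int p, int end) {
        int n = preciseLength(b, p, end);
        if (n > 0) return n;             // valid character
        if (n < INVALID) return end - p; // truncated: consume what is left
        return 1;                        // broken: pretend it was one byte
    }

    // A loop using preciseLength directly would step backwards on INVALID.
    static int countChars(byte[] b) {
        int p = 0, count = 0;
        while (p < b.length) {
            p += approximateLength(b, p, b.length);
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] mixed = { 'a', (byte) 0xFF, (byte) 0xC3, (byte) 0xA9 };
        System.out.println(preciseLength(mixed, 1, mixed.length));
        System.out.println(countChars(mixed));
    }
}
```

Note that in this sketch a single ASCII byte and a broken byte both come back as 1 from the lenient method, which is the ambiguity raised in the conversation.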

15:01 <enebo[m]> so perhaps this may end up being passing CR into joni somewhere and ternary check on broken

15:01 <enebo[m]> valid length is also 1 right? so single byte ascii and broken are both 1

15:01 <lopex[m]> the weird semantics come from the fact that there's a mismatch between mri and oni

15:02 <lopex[m]> and different usages - like parsing, processing etc

15:02 <lopex[m]> in the old days oniguruma didn't have any validation

15:02 <enebo[m]> lopex: yeah I am sure there are differences I just meant with the decision to support regexping on broken data

15:03 <enebo[m]> ok !validation explains why it may not be so consistent

15:03 <lopex[m]> yeah, passing at least "validated" would help a lot

15:03 <enebo[m]> they added it over time probably to address reported issues

15:04 <enebo[m]> having a boolean branch per char would be a cost but it would be less cost than doing the ambiguous one all the time

15:04 <lopex[m]> yeah, when porting oni, there was only char* and not char*, int p, int e on all of those functions

15:04 <lopex[m]> yeah, but it will pay as you won't have costlier length functions

15:04 <enebo[m]> plus most strings are not broken so it would likely still opt ok most of the time

15:05 <lopex[m]> for no broken we already do fine

15:05 <enebo[m]> bifurcation of broken and clean interpreters would be too much right?

15:05 <enebo[m]> oh ok I am confused now

15:06 <enebo[m]> I thought processing broken was the issue and we were not giving 1 but -1

15:06 <lopex[m]> yes

15:06 <enebo[m]> I assumed clean strings are fine with what we have

15:06 <lopex[m]> yes

15:06 <enebo[m]> ok so then it is about adding the cost of handling broken strings without hurting perf of ok ones right?

15:07 <lopex[m]> right
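
One way to picture the "pass a validated flag" idea is a single branch that picks between a cheap lead-byte lookup and a careful check. The sketch below assumes UTF-8 and invents every name in it (`trustedLength`, `checkedLength`, the `validated` parameter); it is not how joni or jcodings are structured, only an illustration of keeping the clean-string path cheap while still advancing over broken input.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; the validated flag and both method names are assumptions.
final class ValidatedFastPathSketch {
    // Length implied by the lead byte alone. Only safe when the string is known to be valid.
    static int trustedLength(int lead) {
        if (lead < 0x80) return 1;
        if (lead < 0xE0) return 2;
        if (lead < 0xF0) return 3;
        return 4;
    }

    // Careful path: verify the character and fall back to 1 on anything broken.
    static int checkedLength(byte[] b, int p, int end) {
        int lead = b[p] & 0xFF;
        if (lead < 0x80) return 1;
        if (lead < 0xC2 || lead > 0xF4) return 1; // not a legal lead byte
        int n = trustedLength(lead);
        if (p + n > end) return 1;                // cut off by end of buffer
        for (int i = 1; i < n; i++) {
            if ((b[p + i] & 0xC0) != 0x80) return 1; // bad continuation byte
        }
        return n;
    }

    // One boolean test per character decides which length routine runs.
    static int countChars(byte[] b, boolean validated) {
        int p = 0, count = 0;
        while (p < b.length) {
            p += validated ? trustedLength(b[p] & 0xFF) : checkedLength(b, p, b.length);
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] clean = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        byte[] broken = { 'a', (byte) 0xE3, 'b' };
        System.out.println(countChars(clean, true));
        System.out.println(countChars(broken, false));
    }
}
```

Since the flag does not change during a scan, the branch should be highly predictable, which fits the point made above that most strings are not broken.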

15:07 <enebo[m]> ok

15:07 <enebo[m]> If you have some time tomorrow I may be asking some more questions but I am repressing the urge to look at code

15:08 <enebo[m]> I missed a company holiday last week and decided today would replace that

15:08 <enebo[m]> err I guess two weeks ago

15:08 <lopex[m]> also, you could take a look at https://github.com/jruby/jcodings/blob/master/src/org/jcodings/MultiByteEncoding.java#L57

15:08 <lopex[m]> since all those are being used for unicode

15:08 <lopex[m]> maybe they could be improved

15:09 <enebo[m]> cool

15:10 <enebo[m]> always another thing. Seeing that I also wonder how much of these methods could be made static

15:10 <enebo[m]> I know at some level this is polymorphic

15:10 <lopex[m]> they use instance tables

15:10 <enebo[m]> but some of the methods which call methods could have those methods static which would force inlining quicker

15:11 <enebo[m]> Like the method you referenced calls lengthForTwoUptoFour

15:11 <enebo[m]> if that was static it would inline but I also imagine that method is used by subclasses directly

15:12 <lopex[m]> TransZero is instance

15:13 <enebo[m]> sure I am just referring to the methods acting on that instance I guess

15:13 <enebo[m]> as they are all protected instance methods

15:13 <lopex[m]> then yeah

15:14 <enebo[m]> anything which calls anything else in there could possibly speed up inlining decision (but perhaps not actually change overall perf)
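
The static-helper idea can be illustrated with a small sketch: the table stays per instance, but the helper that reads it is static and takes the table as an argument, so it cannot be overridden and the call target is fixed. The class and method names here are made up for the example and do not correspond to the jcodings classes, and whether this changes measured performance is an open question, as noted above.

```java
// Hypothetical sketch; not the jcodings class layout.
final class StaticHelperSketch {
    // Per-instance table, standing in for the instance tables mentioned above.
    private final int[] leadTable = new int[256];

    StaticHelperSketch() {
        // Simplified UTF-8 lead-byte lengths; no validation is attempted here.
        for (int i = 0; i < 256; i++) {
            leadTable[i] = i < 0xC0 ? 1 : i < 0xE0 ? 2 : i < 0xF0 ? 3 : 4;
        }
    }

    // Instance entry point: the only place that touches instance state.
    int length(byte[] bytes, int p) {
        return lookup(leadTable, bytes[p] & 0xFF);
    }

    // Static helper: no receiver and no overriding, so there is exactly one target to inline.
    static int lookup(int[] table, int lead) {
        return table[lead];
    }

    public static void main(String[] args) {
        StaticHelperSketch enc = new StaticHelperSketch();
        byte[] bytes = { 'a', (byte) 0xC3, (byte) 0xA9 };
        System.out.println(enc.length(bytes, 0));
        System.out.println(enc.length(bytes, 1));
    }
}
```

A helper declared `protected static` would still be callable from subclasses, so direct use by subclasses does not by itself rule this out.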

15:14 <enebo[m]> and who knows

15:14 <enebo[m]> funny to think jcodings for most people ends up being US-ASCII, UTF-8, and 8BIT(BINARY)

15:15 <enebo[m]> most of the time we do not even use the ascii paths in jcodings for fast path

15:15 <enebo[m]> so it is probably 99% UTF-8

15:16 <enebo[m]> The crazy thing to do is make pure non-inherited all static paths for UTF-8 in jcodings and joni

15:16 <enebo[m]> anyways pie in the sky and possibly not worth it

15:17 <enebo[m]> I just noticed on the still yet to be landed String split branch I made I got big gains from that technique

15:17 <enebo[m]> (well for US-ASCII hot paths)

15:18 <enebo[m]> ok I am going to KVM away for rest of the day

15:18 <enebo[m]> windows machine is my escape from social media and OSS stuff

16:24 <headius> I won't be working today but perhaps tonight and definitely tomorrow. I'd like to at least get us settled for 9.4.2 even if that has to happen next week

16:52 genpaku has quit [Read error: Connection reset by peer]

16:56 genpaku has joined #jruby

19:31 rcrews[m] has joined #jruby

19:42 <rcrews[m]> I just saw the note about 9.4.2 and the request to hear about blockers. I've been struggling with a new error since upgrading to 9.4.(1). My issue seems to be specific to using the AWS S3 gem, but it is causing problems with code that has not had problems before. Not sure of the etiquette here, so here goes. I regularly get this error, which I have never gotten before: ERROR: org.jruby.embed.EvalFailedException: (RequestTimeout) Your

19:42 <rcrews[m]> socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed. ... I'm not getting this with other gems, and I've tried adjusting the AWS connection timeout values several times, up from defaults of 5-60 seconds to 5-10 minutes. Nothing helps. My network is fine for all other uses. It's just the combination of JRuby with the aws-sdk-s3 gem. Is this something that sounds like it

19:42 <rcrews[m]> might be a problem that could be caused by JRuby or solved by a different JRuby configuration setting?

21:36 <headius> rcrews: and you say this was a new issue in 9.4.1? You should definitely open an issue with as much info as you can, and if it's possible to give us a reproduction that will be a huge help

22:00 <rcrews[m]> Thank you. I will see what I can come up with. However, accessing AWS content requires AWS credentials so I'm not sure what kind of test case I can provide. I was hoping someone here might be able to think about what throws/raises org.jruby.embed.EvalFailedException and consider what might have changed recently. In any case, I will report back when I arrive at a work-around. Thanks again

23:02 <headius> That exception usually wraps a more specific one. Put your info into a bug and we may be able to turn on some verbose or debug logging.

23:02 <headius> It's kind of a catch-all exception when using the JRuby scripting APIs from Java
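
Since the exception is described as a wrapper, one generic way to see what is underneath is to walk the standard `Throwable.getCause()` chain on the Java side. The sketch below uses a plain `RuntimeException` as a stand-in for the wrapper so that it runs without JRuby on the classpath; the class name, the `rootCause` helper, and the example messages are all invented for illustration, and it assumes the wrapper actually carries its cause.

```java
import java.net.SocketTimeoutException;

// Hypothetical sketch; RuntimeException stands in for the wrapping exception.
final class RootCauseSketch {
    // Follow getCause() until there is nothing further down the chain.
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Made-up exceptions purely to exercise the helper.
        Exception inner = new SocketTimeoutException("example inner failure");
        RuntimeException wrapper = new RuntimeException("example wrapper", inner);

        Throwable root = rootCause(wrapper);
        System.out.println(root.getClass().getName() + ": " + root.getMessage());
    }
}
```

Logging the class and message of the innermost cause, along with the full stack trace, is the kind of detail that would be useful to attach to a bug report.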

23:03 <headius> If you have some freedom to iterate we could even provide a debug build with more logging and figure out what's going on