<byteit101[m]> hmm, Process::Status is a weird class. I think I can redefine it in subspawn yet leave it intact from both a CRuby and JRuby POV when calling Process.waitpid
<enebo[m]> ahorek: I am taking today off but it would be nice to figure this joni issue out
<enebo[m]> That revert got rid of some huge pathological issue which seemed to be causing a lot of extra searching
<enebo[m]> lopex: if you have any time it would be nice to talk to you about how we can better integrate reading/detecting broken encoded strings in joni (I am not on today though)
<enebo[m]> lopex: thanks. I will look more tomorrow. It sounds like work :)
<lopex[m]> yeah, it's a mess
<lopex[m]> the problem is that mri does quite a bit of additional work for already validated strings
<lopex[m]> not to mention the plethora of those functions
<enebo[m]> lopex: but we are seemingly not doing enough for unvalidated ones to not do extra for validated ones. Does that sound right?
<lopex[m]> yeah, like returning -1 for broken characters, mri falls back to 1 just to make any advancement in those routines
<lopex[m]> that sunday search was about that case too afaik
<enebo[m]> so without diving into this today: if we do not have BROKEN as cr, things work fine and do not do extra work; otherwise perhaps we see a single byte as length 1 which shouldn't be
<enebo[m]> oh I see
<enebo[m]> that is weird
<lopex[m]> yeah, we'll enter infinite loops in some cases
<enebo[m]> so they get -1 but the method will go 'whelp let's pretend it was 1 valid byte and keep searching'
<lopex[m]> but first we have to decide what semantics we are gonna use on which code layer
<lopex[m]> yeah
<lopex[m]> onigenc_mbclen_approximate
<lopex[m]> return 1; at the end
<enebo[m]> I can imagine dealing with internet data means broken encodings so perhaps this is needed but what a weird semantic
<lopex[m]> so, there's valid length, missing, and 1
<lopex[m]> being returned
<enebo[m]> so perhaps this may end up being passing CR into joni somewhere and ternary check on broken
<enebo[m]> valid length is also 1 right? so single byte ascii and broken are both 1
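To make the "-1 versus fall back to 1" semantics above concrete, here is a minimal sketch in Java. It is not jcodings, joni, or MRI code: the class and method names are hypothetical, the UTF-8 check is simplified (it does not reject overlong or surrogate sequences), and it folds the "missing" (truncated) case into -1 instead of reporting it separately.

```java
/** Hypothetical sketch; not the jcodings or Onigmo implementation. */
final class MbcLenSketch {
    /** Strict length: bytes in the UTF-8 character at p, or -1 if it is broken or truncated. */
    static int strictLength(byte[] bytes, int p, int end) {
        if (p >= end) return -1;
        int b = bytes[p] & 0xFF;
        int len;
        if (b < 0x80) len = 1;                      // ASCII
        else if (b >= 0xC2 && b <= 0xDF) len = 2;
        else if (b >= 0xE0 && b <= 0xEF) len = 3;
        else if (b >= 0xF0 && b <= 0xF4) len = 4;
        else return -1;                             // stray continuation or invalid lead byte
        if (p + len > end) return -1;               // truncated character
        for (int i = 1; i < len; i++) {
            if ((bytes[p + i] & 0xC0) != 0x80) return -1; // not a continuation byte
        }
        return len;
    }

    /** Approximate length: same as strict, but a broken character counts as 1 byte. */
    static int approximateLength(byte[] bytes, int p, int end) {
        int len = strictLength(bytes, p, end);
        return len > 0 ? len : 1;
    }

    /** Walks the whole array; always terminates because every step advances by at least 1. */
    static int countSteps(byte[] bytes) {
        int steps = 0;
        for (int p = 0; p < bytes.length; p += approximateLength(bytes, p, bytes.length)) {
            steps++;
        }
        return steps;
    }
}
```

A scan loop that added the strict result directly to its position would move backwards on a broken byte, which is one way to end up in the infinite loops mentioned above. The approximate variant avoids that, at the price of making a broken byte indistinguishable from a single ASCII byte.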
<lopex[m]> the weird semantics come from the fact that there's a mismatch between mri and oni
<lopex[m]> and different usages - like parsing, processing etc
<lopex[m]> in the old days oniguruma didn't have any validation
<enebo[m]> lopex: yeah I am sure there are differences I just meant with the decision to support regexping on broken data
<enebo[m]> ok !validation explains why it may not be so consistent
<lopex[m]> yeah, passing at least "validated" would help a lot
<enebo[m]> they added it over time probably to address reported issues
<enebo[m]> having a boolean branch per char would be a cost but it would be less cost than doing the ambiguous one all the time
<lopex[m]> yeah, when porting oni, there was only char* and not char*, int p, int e on all of those functions
<lopex[m]> yeah, but it will pay off as you won't have costlier length functions
<enebo[m]> plus most strings are not broken so it would likely still opt ok most of the time
<lopex[m]> for no broken we already do fine
<enebo[m]> bifurcation of broken and clean interpreters would be too much right?
<enebo[m]> oh ok I am confused now
<enebo[m]> I thought processing broken was the issue and we were not giving 1 but -1
<lopex[m]> yes
<enebo[m]> I assumed clean strings are fine with what we have
<lopex[m]> yes
<enebo[m]> ok so then it is about adding the cost of handling broken strings without hurting perf of ok ones right?
<lopex[m]> right
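The idea of passing a code range (or at least a "validated" flag) into the matcher and branching on it could look roughly like the sketch below. Everything here is an assumption for illustration: the `CodeRange` enum, the method names, and the simplified UTF-8 checks are hypothetical and are not the jcodings or joni API.

```java
/** Hypothetical sketch of branching on a code range; not the joni/jcodings API. */
final class CodeRangeBranchSketch {
    enum CodeRange { SEVEN_BIT, VALID, BROKEN }

    /** Cheap path: trusts the input and looks only at the lead byte. */
    static int fastLength(byte lead) {
        int b = lead & 0xFF;
        if (b < 0x80) return 1;
        if (b < 0xE0) return 2;
        if (b < 0xF0) return 3;
        return 4;
    }

    /** Careful path: verifies the sequence and falls back to 1 on anything broken. */
    static int carefulLength(byte[] bytes, int p, int end) {
        int b = bytes[p] & 0xFF;
        int len = b < 0x80 ? 1
                : (b >= 0xC2 && b <= 0xDF) ? 2
                : (b >= 0xE0 && b <= 0xEF) ? 3
                : (b >= 0xF0 && b <= 0xF4) ? 4
                : 0;
        if (len == 0 || p + len > end) return 1;          // invalid lead byte or truncated
        for (int i = 1; i < len; i++) {
            if ((bytes[p + i] & 0xC0) != 0x80) return 1;  // bad continuation byte
        }
        return len;
    }

    /** Counts characters, paying for validation only when the string is known to be broken. */
    static int countChars(byte[] bytes, CodeRange cr) {
        boolean broken = cr == CodeRange.BROKEN;          // decided once per string
        int count = 0;
        int p = 0;
        int end = bytes.length;
        while (p < end) {
            p += broken ? carefulLength(bytes, p, end) : fastLength(bytes[p]);
            count++;
        }
        return count;
    }
}
```

The flag is computed once per string, so the per-character cost on clean input is a single well-predicted branch, while broken input takes the slower path that always advances.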
<enebo[m]> ok
<enebo[m]> If you have some time tomorrow I may be asking some more questions but I am repressing the urge to look at code
<enebo[m]> I missed a company holiday last week and decided today would replace that
<enebo[m]> err I guess two weeks ago
<lopex[m]> since all those are being used for unicode
<lopex[m]> maybe they could be improved
<enebo[m]> cool
<enebo[m]> always another thing. Seeing that I also wonder how many of these methods could be made static
<enebo[m]> I know at some level this is polymorphic
<lopex[m]> they use instance tables
<enebo[m]> but some of the methods which call methods could have those methods static which would force inlining quicker
<enebo[m]> Like the method you referenced calls lengthForTwoUptoFour
<enebo[m]> if that was static it would inline but I also imagine that method is used by subclasses directly
<lopex[m]> TransZero is instance
<enebo[m]> sure I am just referring to the methods acting on that instance I guess
<enebo[m]> as they are all protected instance methods
<lopex[m]> then yeah
<enebo[m]> anything which calls anything else in there could possibly speed up the inlining decision (but perhaps not actually change overall perf)
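A small sketch of the static-versus-instance distinction being discussed, under stated assumptions: the class and method names are hypothetical and this is not jcodings code. A protected instance helper can be overridden, so the JIT has to reason about the receiver type before inlining it; a static helper has a fixed call target. Whether that changes measured performance depends on the JIT, which often inlines monomorphic virtual calls as well.

```java
/** Hypothetical encoding class; names are illustrative, not jcodings code. */
class InliningSketch {
    // Per-instance lookup table, in the spirit of "they use instance tables".
    private final int[] leadLengths = new int[256];

    InliningSketch() {
        for (int b = 0; b < 256; b++) {
            leadLengths[b] = staticLeadLength(b);
        }
    }

    // Protected instance helper: overridable, so the call is virtual.
    protected int instanceLeadLength(int b) {
        return leadLengths[b & 0xFF];
    }

    // Static helper: no receiver and no overriding, so the target is known at link time.
    static int staticLeadLength(int b) {
        b &= 0xFF;
        if (b < 0xC0) return 1; // ASCII, or a stray continuation byte treated as 1
        if (b < 0xE0) return 2;
        if (b < 0xF0) return 3;
        return 4;
    }

    int lengthViaInstance(byte[] bytes, int p) {
        return instanceLeadLength(bytes[p]);
    }

    int lengthViaStatic(byte[] bytes, int p) {
        return staticLeadLength(bytes[p]);
    }
}
```

Both entry points return the same value; the only difference is how the helper is dispatched.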
<enebo[m]> and who knows
<enebo[m]> funny to think jcodings for most people ends up being US-ASCII, UTF-8, and 8BIT(BINARY)
<enebo[m]> most of the time we do not even use the ascii paths in jcodings for fast path
<enebo[m]> so it is probably 99% UTF-8
<enebo[m]> The crazy thing to do is make pure non-inherited all static paths for UTF-8 in jcodings and joni
<enebo[m]> anyways pie in the sky and possibly not worth it
<enebo[m]> I just noticed on the still yet to be landed String split branch I made I got big gains from that technique
<enebo[m]> (well for US-ASCII hot paths)
<enebo[m]> ok I am going to KVM away for rest of the day
<enebo[m]> windows machine is my escape from social media and OSS stuff
<headius> I won't be working today but perhaps tonight and definitely tomorrow. I'd like to at least get us settled for 9.4.2 even if that has to happen next week
genpaku has quit [Read error: Connection reset by peer]
genpaku has joined #jruby
rcrews[m] has joined #jruby
<rcrews[m]> I just saw the note about 9.4.2 and the request to hear about blockers. I've been struggling with a new error since upgrading to 9.4.(1). My issue seems to be specific to using the AWS S3 gem, but it is causing problems with code that has not had problems before. Not sure of the etiquette here, so here goes. I regularly get this error, which I have never gotten before: ERROR: org.jruby.embed.EvalFailedException: (RequestTimeout) Your
<rcrews[m]> socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed. ... I'm not getting this with other gems, and I've tried adjusting the AWS connection timeout values several times, up from defaults of 5-60 seconds to 5-10 minutes. Nothing helps. My network is fine for all other uses. It's just the combination of JRuby with the aws-sdk-s3 gem. Is this something that sounds like it
<rcrews[m]> might be a problem that could be caused by JRuby or solved by a different JRuby configuration setting?
<headius> rcrews: and you say this was a new issue in 9.4.1? You should definitely open an issue with as much info as you can, and if it's possible to give us a reproduction that will be a huge help
<rcrews[m]> Thank you. I will see what I can come up with. However, accessing AWS content requires AWS credentials so I'm not sure what kind of test case I can provide. I was hoping someone here might be able to think about what throws/raises org.jruby.embed.EvalFailedException and consider what might have changed recently. In any case, I will report back when I arrive at a work-around. Thanks again
<headius> That exception usually wraps a more specific one. Put your info into a bug and we may be able to turn on some verbose or debug logging.
<headius> It's kind of a catch-all exception when using the JRuby scripting APIs from Java
<headius> If you have some freedom to iterate we could even provide a debug build with more logging and figure out what's going on
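Since the wrapper exception usually hides a more specific one, walking the cause chain is a quick way to gather detail for a bug report. The sketch below is a generic illustration using only `Throwable.getCause()`; the class and method names are hypothetical, and a plain `RuntimeException` stands in for `org.jruby.embed.EvalFailedException` so the example has no JRuby dependency.

```java
/** Hypothetical helper for inspecting a wrapped exception; uses only java.lang APIs. */
final class RootCauseSketch {
    /** Follows getCause() until there is nothing further to unwrap. */
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    /** Renders the chain from outermost to innermost, e.g. for a log line or bug report. */
    static String describeChain(Throwable t) {
        StringBuilder sb = new StringBuilder();
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (sb.length() > 0) sb.append(" <- ");
            sb.append(c.getClass().getSimpleName()).append(": ").append(c.getMessage());
        }
        return sb.toString();
    }
}
```

In embedding code, the catch block around the scripting call would pass the caught exception to `describeChain` (or log the full stack trace) so that the underlying exception type and message end up in the issue.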