<headius>
looks like they are rolling an update out incrementally because most jobs work
<enebo[m]>
ah so I remembered this backward
<enebo[m]>
it was only regexp in 2.6 and string was accepted later
<headius>
ahh right
<enebo[m]>
I expected to see a failure for this but I am surprised it is a single spec
<headius>
yeah low risk there
<enebo[m]>
this csv parser is some pretty big and sophisticated looping methods
<enebo[m]>
a lot of code and a bunch of nexts
<headius>
fun
<headius>
lots of cyclomatic complexity
<enebo[m]>
I am not putting it down but I imagine these methods are pretty large in IR
<enebo[m]>
The other random thought reading this is whether we are setting CR on substrings
<enebo[m]>
You mentioned transcoding but us marking stuff as VALID vs 7BIT would have a huge difference
<headius>
could audit cr propagation
<enebo[m]>
we heavily rely on generic encoding methods in a lot of our string methods but we can definitely improve things lookinf more greedily for 7bit up front can end up much faster
<enebo[m]>
Split on ascii string with "," ended up being like 80% quicker
<enebo[m]>
which might mean something is not optimizing well in the string helpers we use but I think it largely because it is just some dead simple loop
<headius>
there's probably many rounds of CR optimizations missing from cruby
<enebo[m]>
yeah I would not be surprised if they added stuff and we never noticed. I do not really follow their development very closely.
<enebo[m]>
They just tend to end up a reference when I want to understand behavior
<enebo[m]>
100% of all lines delivered to get_lines is CR 0
<headius>
hah
<headius>
well that's not helpful
<enebo[m]>
yeah I guess I need to figure out where it came from and whether it reasonably should have one
<enebo[m]>
we make so many micromistakes around sharing code
<enebo[m]>
some callers call through a method which looks up a separator but then calls another method which has to have same logic because other things call it directly instead
<enebo[m]>
in case of each line stuff it means checking opts twice and checking separator arg twice
<headius>
yeah bad refactoring probably
<enebo[m]>
really small overhead obviously
<enebo[m]>
the main split method is 130ish lines with several call outs to other ways (enumerate) in a few places
<enebo[m]>
I definitely feel we get here because we tend to port some methods closely
<enebo[m]>
interestingly this method does not even care about CR or even if the string is a valid string
<enebo[m]>
It just seems to take it at its word but after the recent mail gem issue with joni I wonder how much code is literally forgiving on broken data
<headius>
hmm I booched something in chr fix
<headius>
centos installing on my other machine
<enebo[m]>
lunch
<headius>
oops, >> vs >>>
<headius>
I can never remember the right one
<enebo[m]>
yeah bit math sux
subbu has joined #jruby
<enebo[m]>
IO.gets has a fast path which will set CR but slow path does not seem to have that logic
<enebo[m]>
Testing to see if it actually is hitting that path or not
<enebo[m]>
OpenFile.getlineFast which I thought was setting cr is not providing any way for it to be anything other than 0
<enebo[m]>
I suspect cr was int* in MRI
<enebo[m]>
So slow path of getline has no cr logic and fast path is looking at a value which I think is always 0
<enebo[m]>
We pass all the things so I am guessing cr is merely for optimization and not for correctness in these methods
<headius>
ok so I finally got this tested on centos
<enebo[m]>
yeah cr is passed &cr
<headius>
the builds I did in CI against an earlier debian are still picking up newer glibc so it did not help
<headius>
building on centos directly does produce binaries that work, but it's not reproducible unless I can fix it in CI
<enebo[m]>
yuck
<enebo[m]>
So I am really curious will fixing cr calculation on IO speed things up
<headius>
it is frustrating that glibc does not have better support for building against a new version and still working on older ones
<headius>
so I think we punt this and I'll try to come up with something else
<headius>
I will open an issue against 9.4.3 so it doesn't get lost
<enebo[m]>
I imagine in some methods it will just say "oh no cr I better walk it once"
<enebo[m]>
but in each_line it just says "you are utf-8 great...utf-8 walking"
<headius>
yeah if it is walking a lot of strings it doesn't need to that would be overhead
<headius>
I did see some calls to CR calculation stuff in profile but not up very high
<enebo[m]>
so each_line is not doing that second walk but other methods are likely scanning or at least using CR for opts
<enebo[m]>
but IO is not making a CR other than 0 so it just opts out all IO created data from playing
<enebo[m]>
It might only be getline although I suspect that is a very common way of working with data
<enebo[m]>
This may yield something
<enebo[m]>
headius: so OpenFile can have a cr field which is only used for temporary cr calcs or each of these methods can make int[] cr = new int[] { 0 };
<enebo[m]>
using a field prevents the box but it may be confusing
<enebo[m]>
part of me feels like JVM should properly be able to escape single value arrays as out params but I guess it is not that simple :)
<headius>
turns on color for backtraces and ir.print if running on a tty
<enebo[m]>
nice
<headius>
This Maven bug is getting annoying now. Who knows how long until GitHub updates to a fixed version, if the fix has even been released yet
<headius>
I guess we just need to force all jobs to use a known working version for now
subbu has quit [Quit: Leaving]
<enebo[m]>
interesting. you did something clever to avoid out param for cr but it is not hooked up to anything
<enebo[m]>
you know pos will be int and you increate an int using a long which loses second half of the long
<enebo[m]>
but second half is where you were stuffing CR data
<enebo[m]>
so I will reaudit the methods in question but I think I can make smaller changes to keep CR properly updated
<enebo[m]>
Although looking at this method I don't think it does anything other than figuring out CR (which is fine) but we are not setting it so it is just dead work. For fun I will delete this call and see if this is faster
<headius>
Ha I must have gotten distracted
<enebo[m]>
who knows. A lot of this is really old too
<enebo[m]>
It is possible it was hooked up at some point
<headius>
fwiw JVM should be able to eliminate temoprary boxes if they inline and don't escape but it's not guaranteed of course
<headius>
JVM as in Hotspot
<enebo[m]>
Is a precondition that the place it escapes has been inlined right?
<enebo[m]>
in theory my bench uses index on each line and if 7bit it should just byte index
<enebo[m]>
I should probably do aref perhaps to just make it as dramatic as possible
<headius>
yes must inline
<headius>
and can't be used as part of any branching structure, unless they have improved that
<headius>
we might get better escape analysis with a single-field carrier object also since arrays have bounds checks and things
<enebo[m]>
this is just bit mathing it into the return result
<enebo[m]>
using long instead of int
subbu has joined #jruby
<headius>
yeah
<headius>
when we can do that it's obviously ideal
<headius>
I just did it for the chr fix
<headius>
enebo: I will manually merge that irb fix and avoid his pom.xml juggling commits