<puritylake[m]>
Man been run off my feet this week
<puritylake[m]>
Barely any time to do anything but I am free now
<enebo[m]>
puritylake: howdy
<enebo[m]>
so I nearly have the parser part of sprintf where all formats are correctly parsed
<puritylake[m]>
How are you?
<enebo[m]>
I decided to try and solidify that part so converting to the new format should be a little less frustrating
<enebo[m]>
puritylake: doing well.
<enebo[m]>
The new sprintf stuff ported actually passes some specs the old one does not
<enebo[m]>
puritylake: I can do a summary of what is involved and why this is an interesting project
<puritylake[m]>
Sure, sounds good
<enebo[m]>
sprintf itself is a huge piece of esoteric craziness
<enebo[m]>
there is also a layer of stuff Ruby added (I think involving %{name} to extract fields out of hashes)
<enebo[m]>
Our implementation and in fact most implementations seem to be written into a massive switch statement in a loop
<enebo[m]>
We do not handle %a or %A and at one point a couple of us looked at adding them
<enebo[m]>
if you look at the code (org.jruby.util.Sprintf) then you will see that a number of format specifiers in printf, like %d or %b, share the same chunk of code
<enebo[m]>
in that chunk of code are lots of if statements
<enebo[m]>
Some are obviously only for a particular format (like %u) but others are unclear unless you spend a lot of time reading some fairly complicated code
<enebo[m]>
So motivation #1 was to untangle these sections into more straightforward separate methods even if it means some additional duplication
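A minimal sketch of that untangling idea, assuming only two conversions and a plain long argument. The class and method names (SplitFormatSketch, formatDecimal, formatChar) are hypothetical and are not the actual JRuby code; flags, width, and precision are left out on purpose.

```java
// Hypothetical sketch: one small method per conversion instead of one shared
// chunk full of per-format if statements. Not the real org.jruby.util.Sprintf.
final class SplitFormatSketch {
    // The dispatch stays tiny; each conversion owns its own rules.
    static String format(char conversion, long value) {
        switch (conversion) {
            case 'd': return formatDecimal(value);
            case 'c': return formatChar(value);
            default:
                throw new IllegalArgumentException("unsupported conversion: %" + conversion);
        }
    }

    // %d: plain decimal text, no flags or width handled here.
    static String formatDecimal(long value) {
        return Long.toString(value);
    }

    // %c: treat the argument as a Unicode code point.
    static String formatChar(long value) {
        return new String(Character.toChars((int) value));
    }
}
```

The cost is some duplication between the methods, but each one can be read without knowing the rules of the others.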
<enebo[m]>
The second tangle of the printf implementation is that the format string (e.g. "%0.2d") is processed every time you call printf, and that happens in the same code which is building up the result string
<enebo[m]>
These are two activities which have been stirred together
<enebo[m]>
This not only makes understanding what is happening more difficult, it also limits our ability to eliminate work
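To illustrate keeping the two activities apart, here is a hedged sketch with a parse phase and a separate render phase. It is a toy under stated assumptions: it only understands literal text, %d, %s, and an optional minimum width, and the names TwoPhaseSketch, Token, parse, and render are invented for this example rather than taken from SprintfParser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical two-phase sketch: parse the format into tokens first, then
// build the output from those tokens. Flags, precision and %% are not handled.
final class TwoPhaseSketch {
    // One parsed piece: literal text (conversion == 0) or a conversion.
    static final class Token {
        final String literal;
        final char conversion;
        final int width;

        Token(String literal, char conversion, int width) {
            this.literal = literal;
            this.conversion = conversion;
            this.width = width;
        }
    }

    // Phase 1: needs no arguments, so its result could be reused later.
    static List<Token> parse(String format) {
        List<Token> tokens = new ArrayList<>();
        StringBuilder literal = new StringBuilder();
        int i = 0;
        while (i < format.length()) {
            char c = format.charAt(i++);
            if (c != '%') {
                literal.append(c);
                continue;
            }
            if (literal.length() > 0) {
                tokens.add(new Token(literal.toString(), (char) 0, 0));
                literal.setLength(0);
            }
            int width = 0;
            while (i < format.length() && Character.isDigit(format.charAt(i))) {
                width = width * 10 + (format.charAt(i++) - '0');
            }
            if (i >= format.length()) {
                throw new IllegalArgumentException("incomplete format specifier");
            }
            char conversion = format.charAt(i++);
            if (conversion != 'd' && conversion != 's') {
                throw new IllegalArgumentException("unsupported conversion: %" + conversion);
            }
            tokens.add(new Token(null, conversion, width));
        }
        if (literal.length() > 0) {
            tokens.add(new Token(literal.toString(), (char) 0, 0));
        }
        return tokens;
    }

    // Phase 2: walk the tokens and consume arguments in order.
    static String render(List<Token> tokens, Object... args) {
        StringBuilder out = new StringBuilder();
        int argIndex = 0;
        for (Token token : tokens) {
            if (token.conversion == 0) {
                out.append(token.literal);
                continue;
            }
            if (argIndex >= args.length) {
                throw new IllegalArgumentException("too few arguments");
            }
            Object arg = args[argIndex++];
            String text = token.conversion == 'd'
                    ? Long.toString(((Number) arg).longValue())
                    : String.valueOf(arg);
            // Right-justify with spaces up to the minimum width.
            for (int pad = text.length(); pad < token.width; pad++) {
                out.append(' ');
            }
            out.append(text);
        }
        return out.toString();
    }
}
```

With this split, TwoPhaseSketch.render(TwoPhaseSketch.parse("n=%3d %s!"), 7, "ok") builds its output from the token list alone and never looks at the format string again.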
<enebo[m]>
If at a sprintf call site (place we call it) we have a literal string (e.g. "%2d") then we can just parse that format string once and save it
<enebo[m]>
Then each time we revisit that particular sprintf we just use what we have already parsed
<enebo[m]>
(at this point in time saving this off is out of scope until this is working and debugged)
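A rough sketch of what "parse once and save it" could look like for a call site with a literal format string. Everything here is an assumption for illustration: SprintfCallSiteSketch is a made-up class, the stand-in parser just splits before each '%', and the lazy field is not thread-safe. How JRuby would actually cache this is not described in the conversation.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical call-site sketch: the literal format is parsed on first use
// and the parsed form is reused on every later call.
final class SprintfCallSiteSketch {
    private final String formatLiteral;
    private List<String> parsed;   // null until the first call
    int parseCount;                // exposed only to show parsing happens once

    SprintfCallSiteSketch(String formatLiteral) {
        this.formatLiteral = formatLiteral;
    }

    // Returns the saved tokens, parsing only if nothing is saved yet.
    List<String> tokens() {
        if (parsed == null) {
            parsed = parse(formatLiteral);
        }
        return parsed;
    }

    // Stand-in for a real parser: splits the format before each '%'.
    private List<String> parse(String format) {
        parseCount++;
        return Collections.unmodifiableList(Arrays.asList(format.split("(?=%)")));
    }
}
```

Calling tokens() twice on the same site returns the same list object and leaves parseCount at 1.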
<enebo[m]>
So this is the basic problem description.
<enebo[m]>
Let me know if you have any comments or questions
<puritylake[m]>
Makes sense to me, I assume a trip to the Ruby docs might be in order to make sure everything is up to mRuby?
<enebo[m]>
well it would not hurt to learn sprintf beyond the simple stuff most of us will do
<puritylake[m]>
I have used sprintf in the past, albeit in C but I am no expert on it yet
<puritylake[m]>
* on it... yet
<enebo[m]>
ok
<enebo[m]>
For example '%*1$.*3$*2d' is a valid specifier :)
<enebo[m]>
I have never actually seen anyone use that
<enebo[m]>
but I have written the new parser already so it will generate a FormatToken which contains the data
<puritylake[m]>
Pretty sure I heard a saying "imagine an obscure minuscule part of a code base, you can't remove it cause someone somewhere uses it" lol
<enebo[m]>
I should say we have two test suites we can run to make sure we are passing all we passed with the old implementation
<enebo[m]>
So let's talk a bit about the process of what you will work on
<enebo[m]>
There is new code in SprintfParser and old code in Sprintf
<enebo[m]>
In the new code I have converted %duibBoxX already
<enebo[m]>
What I did was look for the letter I want to convert like %e and then in Sprintf I look for case 'e':
<enebo[m]>
That will end up being a pretty big blob of code for eEgG
<enebo[m]>
Actually I should not use the most complicated one as an example
<enebo[m]>
%c is probably a better starting one
<enebo[m]>
more or less you will make a method in SprintfParser called format_c() like format_idu() but the body of format_c will come from that section of the switch statement in Sprintf
<enebo[m]>
It won't just compile as you will need to make some smallish changes to make it work with the new FormatToken object
<enebo[m]>
but I have two methods already converted so you can look at what changes I made when I moved the code over
<enebo[m]>
At the moment the only way to turn on the new system is to set the env variable SPRINTF=anything
<enebo[m]>
If it is a format specifier that is not supported it will just fall back to the old system
<enebo[m]>
but the nice thing about this is that you can test with SPRINTF set and without to compare the outputs
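The toggle plus fallback can be sketched roughly like this. It is an assumption-heavy illustration: SprintfToggleSketch and choosePath are hypothetical names, the check that "any value" means "variable is present" is a guess at the behaviour, and the supported list simply repeats the conversions mentioned above as already converted.

```java
// Hypothetical sketch of the opt-in switch: use the new path only when the
// SPRINTF environment variable is set and the conversion has been ported.
final class SprintfToggleSketch {
    // Conversions described in the chat as already converted.
    private static final String SUPPORTED = "duibBoxX";

    // Assumption: any value at all turns the new system on.
    static boolean newPathEnabled(String envValue) {
        return envValue != null;
    }

    // Returns "new" or "old" so both outputs can be compared side by side.
    static String choosePath(String envValue, char conversion) {
        if (newPathEnabled(envValue) && SUPPORTED.indexOf(conversion) >= 0) {
            return "new";
        }
        return "old"; // unsupported conversions fall back to the old system
    }
}
```

In real use the first argument would come from System.getenv("SPRINTF"); it is a parameter here so the sketch can be exercised without touching the environment.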
<enebo[m]>
That gist above is how to run both of the tests for printf code that we have (MRI internal test suite and the ruby/spec project's test suite)
<enebo[m]>
One exciting thing is that I have fixed a few problems while working on the parser, things we have never supported, and I have little fear that I was unexpectedly breaking something else
<enebo[m]>
This was why I mentioned that we attempted to add %a/%A. We failed because of the complexity of weaving it into that big switch
<puritylake[m]>
I have that branch set as current at the moment
<enebo[m]>
cool
<enebo[m]>
And I just sort of picked this out as a fun and actually a pretty important thing to work on
<enebo[m]>
If this is not fun or you are frustrated then you can say so and we can try and figure something else out
<puritylake[m]>
I should be fine, if I get frustrated I'll stop for maybe a day and work on some personal stuff and come back at it the next day
<enebo[m]>
Also I spent the last few days trying to make sure we passed all tests with what I ported over so my memory of this is good atm
<enebo[m]>
So likely any question you have I will be pretty familiar with the code
<enebo[m]>
If you have never written a recursive descent parser then looking at the Lexer (hahah well sometimes the line between a lexer and a parser is fuzzy)
<enebo[m]>
Hopefully we will not need a lot more changes there but it is I think quite a bit simpler to understand than the old loop
<puritylake[m]>
I wrote a lisp-like language for my final college project which was earlier this year, I should be able to figure it out albeit mine wasn't very complex lol
<enebo[m]>
The indexed (unnumbered) parsing is the only icky bit
<puritylake[m]>
Had written a parser combinator for it in Swift but had to change to C# and couldn't figure out its generics
<puritylake[m]>
Well not in time
<enebo[m]>
heh...well not as cool as parser combinators but it is a workhorse
<enebo[m]>
descent parsers are easy to write anyways
<puritylake[m]>
My problem with writing for something more complex than Lisp is I am unsure how to structure and iterate over the AST
<enebo[m]>
yeah in other languages moving over the data requires some explicit code to allow it
<enebo[m]>
If you look at IRBuilder we walk through our AST or you can look at tool/ast which does it a little differently
<enebo[m]>
in JRuby 1.7 we did use an AST interpreter. That is surprisingly simple in that you just make each node type have an interpret() method
<enebo[m]>
you could decouple that if you wanted but it worked well enough
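For the AST interpreter idea, a minimal sketch in which each node type carries its own interpret() method. The node classes here (LiteralSketch, AddSketch, MulSketch) are invented for a tiny arithmetic language and are not JRuby 1.7's node classes.

```java
// Hypothetical AST-interpreter sketch: every node type knows how to evaluate
// itself, so walking the tree is just a chain of interpret() calls.
interface NodeSketch {
    long interpret();
}

final class LiteralSketch implements NodeSketch {
    private final long value;

    LiteralSketch(long value) {
        this.value = value;
    }

    @Override
    public long interpret() {
        return value;
    }
}

final class AddSketch implements NodeSketch {
    private final NodeSketch left;
    private final NodeSketch right;

    AddSketch(NodeSketch left, NodeSketch right) {
        this.left = left;
        this.right = right;
    }

    @Override
    public long interpret() {
        return left.interpret() + right.interpret();
    }
}

final class MulSketch implements NodeSketch {
    private final NodeSketch left;
    private final NodeSketch right;

    MulSketch(NodeSketch left, NodeSketch right) {
        this.left = left;
        this.right = right;
    }

    @Override
    public long interpret() {
        return left.interpret() * right.interpret();
    }
}
```

Building new AddSketch(new LiteralSketch(2), new MulSketch(new LiteralSketch(3), new LiteralSketch(4))) and calling interpret() evaluates 2 + 3 * 4 by recursion. Decoupling would mean moving that logic out of the nodes, for example into a visitor.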
<puritylake[m]>
I've had half baked ideas to write a language in it itself, a la pypy
<puritylake[m]>
Although I think technically pypy is written in RPython
<enebo[m]>
yeah it is
<enebo[m]>
it is close enough to be metacircular
<enebo[m]>
or I will give it that :)
<enebo[m]>
A lot of people are into the idea of self-hosting a language impl. I guess I am as well but it is not clear-cut that it is always a good idea
<puritylake[m]>
Everything comes with advantages and disadvantages
<puritylake[m]>
If there was a perfect solution we'd all be using it
<enebo[m]>
yeah exactly
<puritylake[m]>
Old sprintf is a chunky file
<enebo[m]>
yeah. The new one will still be pretty chunky
<enebo[m]>
just a bit more separation
<puritylake[m]>
Ya, new one kinda decouples the process
<puritylake[m]>
Liking the look of it so far
<enebo[m]>
It is possible once the first phase of conversion is done some other types like Arg can change
<enebo[m]>
We have a requirement of passing in arguments as an Array or primitive array, even in cases where we only pass in a single value
<enebo[m]>
The extra creation of a data structure is a tiny performance cut
<enebo[m]>
but this is why this activity is valuable: the more we simplify this, the easier it will be to make other changes
<enebo[m]>
This is currently broken. I will try and fix it now and then we should be green with new for both test suites
<puritylake[m]>
Are there meant to be no failing tests in the jruby/spec test suite?
<enebo[m]>
nope
<enebo[m]>
err no sprintf tests or are you seeing something else?
<enebo[m]>
2E?
<enebo[m]>
2 errors with too few arguments?
<puritylake[m]>
No just zero failures on the specs, the first command you have in the gist
<puritylake[m]>
I cleaned the files before rebuilding on the new branch
<enebo[m]>
the idea was that everything in both command lines should not have anything failing or erroring
<enebo[m]>
but there are 2F for the second one involving upto
<enebo[m]>
I just fixed that locally but I think it hit a different error so a little more debugging and both should be green
<puritylake[m]>
Ah cool, just making sure
<enebo[m]>
upto for "00" calls sprintf %.*d internally and the new parser was going off the rails
<puritylake[m]>
Thought I might have had some failing tests to make go green as I go
<enebo[m]>
yeah I was hoping to give you as complete a parser as I can so it is just making sure you move over code without having to debug it
<enebo[m]>
but I will push a fix once I have it and then on my machine I will be green for both of those command lines
<puritylake[m]>
Cool, should I hold off til then?
<enebo[m]>
naw. This problem will not bite you per se
<enebo[m]>
and I will have it fixed in next 20 minutes so I doubt you will hit it before I fix it
<enebo[m]>
moving the code over will take some time
<puritylake[m]>
Cool, heading for a shower anyway then gonna settle in for the night, hopefully get something done tonight or at least gain some more knowledge of how I can make things work
<puritylake[m]>
It's one thing being told how to do a problem but actually looking at the code is another
<enebo[m]>
oh yeah this will take some time to start to grok it
<enebo[m]>
so many independent variables
<puritylake[m]>
I'll update you on my progress as I feel necessary