teepee changed the topic of #openscad to: OpenSCAD - The Programmers Solid 3D CAD Modeller | This channel is logged! | Website: http://www.openscad.org/ | FAQ: https://goo.gl/pcT7y3 | Request features / report bugs: https://goo.gl/lj0JRI | Tutorial: https://bit.ly/37P6z0B | Books: https://bit.ly/3xlLcQq | FOSDEM 2020: https://bit.ly/35xZGy6 | Logs: https://bit.ly/32MfbH5
<JordanBrown[m]> [expletive deleted]
<JordanBrown[m]> There's nothing wrong with the startup processing on an MSYS2 program (with respect to non-ASCII characters).
<JordanBrown[m]> The problem is in the MSYS2 shell.
<JordanBrown[m]> when I run a simple "dump the args" program with a 🦃 as an argument, it gets botched.
<JordanBrown[m]> but when I run it under the debugger (where it doesn't get shell processing), it's correct.
<JordanBrown[m]> Of course part of this wild goose chase was that I was thinking in terms of a typical "UNIX program under Windows" environment, where the program itself simulates shell processing on its Windows command line.
<JordanBrown[m]> But MSYS2 is more like a real UNIX environment, where the shell does that processing.
<JordanBrown[m]> But argh.
<JordanBrown[m]> Well that's ... interesting. When I run OpenSCAD under the debugger and give it 🦃.scad as an argument, what OpenSCAD sees is those bytes interpreted as 8859-1. Which, I guess, is unsurprising since that's the one place left that's using fromLocal8Bit.
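The 8859-1 misreading described in the message above can be reproduced outside OpenSCAD. A minimal Python sketch, assuming the argument arrives as UTF-8 bytes; `misread_as_latin1` and `code_points` are hypothetical helper names, not anything from OpenSCAD or Qt:

```python
def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as ISO 8859-1."""
    utf8_bytes = text.encode("utf-8")
    # ISO 8859-1 maps every byte 0x00-0xFF to the code point of the same
    # value, so this decode never fails; it yields one wrong character
    # per UTF-8 byte instead.
    return utf8_bytes.decode("latin-1")


def code_points(text: str) -> list:
    """Render each character as U+XXXX so invisible characters show up."""
    return [f"U+{ord(ch):04X}" for ch in text]
```

Under that assumption the four UTF-8 bytes of 🦃 (F0 9F A6 83) come back as four separate characters, while plain ASCII passes through unchanged, which is why this kind of bug tends to stay hidden until a non-ASCII file name shows up.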
<JordanBrown[m]> need to go back to real work...
J1A8499 has joined #openscad
J1A84 has quit [Ping timeout: 252 seconds]
<InPhase> JordanBrown[m]: How about cmd.exe?
<InPhase> JordanBrown[m]: Think like the masses. :)
ur5us has quit [Ping timeout: 244 seconds]
ur5us has joined #openscad
ur5us has quit [Remote host closed the connection]
ur5us has joined #openscad
tachoknight has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
teepee has quit [Remote host closed the connection]
teepee has joined #openscad
<JordanBrown[m]> msys2 build has problems running from cmd.exe last time I tried. I didn't try hard :-)
<JordanBrown[m]> yeah, can't find a couple of DLLs.
<JordanBrown[m]> So I tried running it from the directory where those DLLs are, and it's working, and I'm back to seeing bad behavior. Need to look more.
<JordanBrown[m]> Maybe running it inside gdb is somehow bypassing problematic parts of startup?
<JordanBrown[m]> That would be wrong, but not shocking - running a MSYS2 program from another MSYS2 program might want to yield more "pure" UNIX semantics.
<peepsalot> aren't there preferences for the encoding of the terminal itself?
<peepsalot> also the choice of terminal has no bearing on loading DLLs or not. that's entirely down to the working directory
<JordanBrown[m]> For cmd.exe? No, I expect that there's no encoding option. Windows generally doesn't let you do that kind of thing.
<JordanBrown[m]> And while in some sense the choice of terminal has no bearing on loading DLLs, the choice of *environment* certainly can. An MSYS2 shell sets up environment variables that a cmd.exe does not, and it is entirely plausible that MSYS2 programs spawning MSYS2 programs use private interfaces to move argument lists.
<JordanBrown[m]> With the same current directory (that isn't /mingw64/bin), openscad runs in an MSYS2 environment but not in a cmd.exe environment. The difference is, I assume, in the setting of the PATH environment variable.
<peepsalot> ok, i just quickly checked on my windows machine, and cmd.exe doesn't have an option to change the code page, but it does display it under preferences. it says code page 437 for me
<JordanBrown[m]> I didn't see that, but I see it now.
<peepsalot> iirc there are some system calls to set the code page appropriately, but not sure how that works for command line arguments
<JordanBrown[m]> It kind of makes sense that cmd.exe would use CP437; that's the traditional code page dating back to DOS days.
<peepsalot> JordanBrown[m]: did you see the old PR 1296 that InPhase linked the other day?
<JordanBrown[m]> No, I didn't. Thanks.
ur5us has quit [Ping timeout: 244 seconds]
<peepsalot> that PR was mainly about implementing a library called "nowide", which is specifically to work around windows' crazy wide-character commands
<peepsalot> nowide actually eventually got added to boost
<peepsalot> but anyways some of the various fixes there were not necessarily dependent on nowide either
<peepsalot> i'm still not sure if nowide boost is old enough to be on all our supported platforms yet. also i don't know if it's largely obsolete under win 10 and up now
<peepsalot> "Since the May 2019 update Windows 10 does support UTF-8 for narrow strings via a manifest file. So setting "UTF-8" as the active code page would allow using the narrow API without any other changes with UTF-8 encoded strings. See the documentation for details."
<JordanBrown[m]> It needs to "just work" with default settings.
<JordanBrown[m]> Being able to select non-default settings and have it work is interesting but not a solution.
<peepsalot> the code page setting can be changed programmatically, and that is the only solution, as far as I can see. CP 437 will never be compatible with your turkey emoji
<JordanBrown[m]> BTW, regarding Windows' Unicode support as "crazy" ignores the fact that they started shipping a product about the same year that UTF-8 was proposed.
<JordanBrown[m]> Cmd.exe doesn't display a turkey correctly, but if I type "notepad <turkey>.txt", it does the right thing.
<JordanBrown[m]> and if I type "echo turkey > 🦃.txt" it does the right thing.
<JordanBrown[m]> It doesn't *display* right, either when echoing or in a directory listing, but it does create a file that Explorer shows with the correct name.
<JordanBrown[m]> And you can open it again from cmd.exe with "notepad 🦃.txt".
<JordanBrown[m]> I need to find a character that's in UCS-2 but not in CP437.
<JordanBrown[m]> Having to make wide-ness visible that pervasively seems repugnant.
<JordanBrown[m]> The program should be UTF-8 or QString everywhere internally, except at the perimeter.
<JordanBrown[m]> That might well require wrapping cerr << and cout <<, but it should be done in a different way.
<peepsalot> it's not making wideness visible, it's doing the opposite. that's why it's called nowide
<JordanBrown[m]> Is it scattering the word "wide" throughout the program?
<peepsalot> no, it's scattering "nowide" :P
<peepsalot> anyways, sorry you don't like the name of the library that is specifically designed to address the problems you are running into.
<JordanBrown[m]> Don't really care about that... what I care about is that it's scattered throughout the program.
<JordanBrown[m]> But it does bother me, because I have not yet found any wide characters anywhere in the program. (Except as an implementation detail inside QString where they aren't visible.)
<JordanBrown[m]> It does look like the cmd.exe console is CP437 for at least some purposes. I wrote an OpenSCAD program that echoes a turkey and an ä, and ran it under cmd.exe, and they both came out as UTF-8-to-CP437 mojibake.
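The "UTF-8-to-CP437 mojibake" mentioned above can be modeled in a few lines of Python. This is a sketch that assumes the program writes UTF-8 bytes and the console renders each byte using Python's `cp437` codec table; `utf8_shown_as_cp437` is a hypothetical helper name:

```python
def utf8_shown_as_cp437(text: str) -> str:
    """Model a CP437 console displaying bytes that were written as UTF-8."""
    utf8_bytes = text.encode("utf-8")
    # Python's cp437 codec assigns a character to every byte value, so each
    # UTF-8 byte turns into one (unrelated) CP437 glyph.
    return utf8_bytes.decode("cp437")
```

With this model, ä (two UTF-8 bytes) becomes two glyphs and 🦃 (four UTF-8 bytes) becomes four, which matches the general shape of the garbage described in the log.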
<peepsalot> well if you only want to fix the command line arguments, you would just need to change that 1 line basically (or look how the library does it). it just wouldn't be able to echo utf-8 etc to terminal without swapping cout etc.
<JordanBrown[m]> I would prefer to fix all of it. We'll see.
<peepsalot> dunno if it loaded for you, but the line of changed code I linked to was in openscad.cc: int main(int argc, char **argv) { nowide::args a(argc, argv); // Fix arguments - make them UTF-8
<JordanBrown[m]> But does that break it for starting from MSYS2, or for starting from Explorer?
<JordanBrown[m]> As best I can tell right now, the argv coming into main is already corrupted.
<JordanBrown[m]> Now, this is for MSYS2, which isn't the production Windows build. That may matter. But it should be made to work for both.
<peepsalot> The class uses GetCommandLineW(), CommandLineToArgvW() and GetEnvironmentStringsW() in order to obtain Unicode-encoded values. It does not relate to actual values of argc, argv and env under Windows.
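Why reading the wide command line helps can be illustrated with a rough Python model. It is not the Windows API and not nowide's implementation: it only contrasts an argument squeezed through a narrow code page with one carried as UTF-8. The `cp1252` default and both function names are assumptions made for illustration, and real Windows conversions may differ in their replacement details:

```python
def via_narrow_code_page(arg: str, code_page: str = "cp1252") -> str:
    """Model an argument passing through a narrow (8-bit) code page."""
    # Characters the code page cannot represent are replaced with "?",
    # so the original file name is no longer recoverable.
    narrow = arg.encode(code_page, errors="replace")
    return narrow.decode(code_page)


def via_utf8(arg: str) -> str:
    """Model an argument converted from wide characters straight to UTF-8."""
    # UTF-8 can represent every Unicode code point, so nothing is lost.
    return arg.encode("utf-8").decode("utf-8")
```

In this model ä.scad survives the narrow path only because cp1252 happens to contain ä, while 🦃.scad survives only the UTF-8 path.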
<JordanBrown[m]> Ah. That would help.
<JordanBrown[m]> Still, my OCD wants to know what's really going on.
<JordanBrown[m]> Hmm. cmd.exe properly echoes and displays ¾, but that's not in CP437.
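Finding a test character that is "in UCS-2 but not in CP437" can be checked mechanically. A small Python sketch, assuming Python's `cp437` codec matches the console's table and treating UCS-2 as the Basic Multilingual Plane minus the surrogate range; both helper names are hypothetical:

```python
def in_ucs2(ch: str) -> bool:
    """True if the single character ch fits in one 16-bit code unit."""
    cp = ord(ch)
    return cp <= 0xFFFF and not (0xD800 <= cp <= 0xDFFF)


def in_cp437(ch: str) -> bool:
    """True if the single character ch has an encoding in CP437."""
    try:
        ch.encode("cp437")
        return True
    except UnicodeEncodeError:
        return False
```

By this check ¾ qualifies as such a test character, ä and ½ do not (they are in CP437), and 🦃 is outside UCS-2 entirely.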
<peepsalot> well, all the to/fromLocal8bit you mentioned before should probably be to/fromStdString
<JordanBrown[m]> For some but not all cases, yes.
<JordanBrown[m]> Internally, where we can require UTF-8 everywhere, yes.
<JordanBrown[m]> At the perimeters where we might need some other encoding, no.
<peepsalot> such as?
<JordanBrown[m]> Which perimeters, or which encodings?
<JordanBrown[m]> CP437, 8859-1, whatever the environment wants.
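The "one encoding internally, the environment's encoding only at the perimeter" idea from this exchange can be sketched in Python. This is an illustration under stated assumptions, not OpenSCAD code: the internal representation is Python `str`, the environment encoding is passed in explicitly, and both function names are hypothetical:

```python
def from_environment(raw: bytes, env_encoding: str) -> str:
    """Inbound perimeter: decode environment bytes into internal text."""
    # errors="replace" keeps undecodable input from aborting the program.
    return raw.decode(env_encoding, errors="replace")


def to_environment(text: str, env_encoding: str) -> bytes:
    """Outbound perimeter: encode internal text for the environment."""
    # Characters the environment cannot show degrade to "?" rather than
    # turning into multi-byte mojibake.
    return text.encode(env_encoding, errors="replace")
```

Everything between those two calls works on a single string type, so the choice of CP437, 8859-1 or UTF-8 is made in exactly two places.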
<JordanBrown[m]> In the particular case of cmd.exe it looks ... odd. OpenSCAD's output is UTF8-to-CP437 mojibake, but the command line is capable of accepting, and "dir" is capable of correctly displaying, non-CP437 characters.
<peepsalot> sorry man, it's frustrating that you seem to be immediately dismissing all the work that kintel and I put into that PR which iirc fixed most of the encoding issues, because you think it's ugly or something
<peepsalot> maybe there is some way to replace std::cout with nowide that doesn't require touching all the files, but I don't know what that would be
<JordanBrown[m]> Touching all the files, yes. But replace those invocations with abstractions.
<JordanBrown[m]> And I want to understand the problem before I layer another piece on top of it.
<JordanBrown[m]> All too often - and without looking deeper I don't know here - people layer more stuff on top, when the right answer is to use the existing layers correctly.
<JordanBrown[m]> I've fixed a number of other character set problems so far, and in all of them the right answer was to use existing tools correctly, not to add something new on top of them.
<JordanBrown[m]> Almost all of them got noticeably simpler, and worked better.
<JordanBrown[m]> I suspect that the console is capable of accepting either CP437 characters or UCS-2 / UTF-16LE characters, probably depending on the system call used.
<peepsalot> the point of libraries is that someone has already done all the legwork of how to use those "existing tools correctly". if you really want to understand every nitty gritty detail then why not look at the nowide source?
<peepsalot> especially for a compatibility layer like that. because no one wants to write platform specific code in their cross platform app
qeed_ has joined #openscad
<JordanBrown[m]> Well, first, it won't tell me which component in the stack is corrupting the arguments. But also I don't want to add in another 20+ files if they aren't truly necessary.
qeed has quit [Ping timeout: 264 seconds]
<JordanBrown[m]> We already have one cross-platform compatibility layer - Qt. One of the hardest things I was looking at the other day was a place where three different schemes for representing strings all collided. I want to get *rid* of that kind of complexity, not add to it.
<JordanBrown[m]> And maybe I will be able to, and maybe not.
<JordanBrown[m]> But I'm doing this for fun, so I don't have to justify the time I spend on it...
<peepsalot> oh. well i didn't know you were having *fun* chasing text encoding errors. not my cup of tea but to each their own
<JordanBrown[m]> It's a puzzle.
<peepsalot> btw you might try playing with "chcp 65001" in command line to see how it behaves (and "chcp 437" to set it back, i think)
<JordanBrown[m]> Yes, that makes it better. Still not right. Based on a very quick test I think they may be squeezing their UTF-8 through UCS-2.
<JordanBrown[m]> ä made it through. 🦃 did not.
<JordanBrown[m]> and yes, chcp 437 takes it back, and chcp alone reports.
<JordanBrown[m]> So why didn't this PR from long ago ever get merged?
<peepsalot> i don't remember exactly. i think we had been waiting to get it into boost, as it had been approved for a couple years or something before it officially was added to boost
<peepsalot> the PR as it sits has an old "nowide standalone" with all its files being added to the repo, as opposed to requiring it as a dependency
<peepsalot> and as far as why notepad works, I suspect that under the hood maybe it does similar calls mentioned by nowide::args docs. GetCommandLineW() and CommandLineToArgvW()
<JordanBrown[m]> Yes, very likely.
califax has quit [Ping timeout: 258 seconds]
<JordanBrown[m]> So the mingw64 startup functions should be doing that too.
<JordanBrown[m]> (Or something.)
GNUmoon2 has quit [Ping timeout: 258 seconds]
teepee has quit [Ping timeout: 258 seconds]
<JordanBrown[m]> I should probably also find out how to do the cross-compile that the production builds do. I don't particularly like doing development in a Linux VM, but I'll need to test it.
califax has joined #openscad
<JordanBrown[m]> I suspect that cmd.exe is in some sense supporting two encodings: if you're a 437-friendly program you get 437, and if you're a UCS-2-friendly program you get UCS-2. (Or maybe UTF-16LE.)
<JordanBrown[m]> There's some evidence that it's only UCS-2, not UTF-16LE.
teepee has joined #openscad
<JordanBrown[m]> Based primarily on the fact that it appears to think that 🦃 is two characters, not one.
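The "two characters, not one" observation is what a surrogate pair looks like to software that counts 16-bit units. A short Python sketch that lists the UTF-16 code units of a string; `utf16_code_units` is a hypothetical helper name:

```python
import struct


def utf16_code_units(text: str) -> list:
    """Return the UTF-16 code units of text as four-digit hex strings."""
    data = text.encode("utf-16-le")
    # Each code unit is one little-endian unsigned 16-bit value.
    return [f"{unit:04X}" for (unit,) in struct.iter_unpack("<H", data)]
```

ä (U+00E4) is a single code unit, while 🦃 (U+1F983) needs a high and a low surrogate. Code that treats each 16-bit unit as a complete character, as strict UCS-2 handling would, therefore sees two characters for the turkey.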
GNUmoon2 has joined #openscad
<JordanBrown[m]> (BTW, if you wonder why my test case is 🦃, it's because it amuses me and is more polite than 🚽 or 💩. There's also 🖖 but I don't like the color.)
<JordanBrown[m]> And I want something that's way out there, that will only work properly if the Unicode support is really working end to end.
<JordanBrown[m]> ä is more realistic, but could be going through any number of translations and still come out the other end OK.
teepee has quit [Remote host closed the connection]
<JordanBrown[m]> Anyhow, as much as I'm having fun (and I am), I need to prep for a trip tomorrow. Later, thanks for the help and the chat.
teepee has joined #openscad
GNUmoon2 has quit [Remote host closed the connection]
GNUmoon2 has joined #openscad
GNUmoon2 has quit [Remote host closed the connection]
GNUmoon2 has joined #openscad
GNUmoon2 has quit [Remote host closed the connection]
GNUmoon2 has joined #openscad
J1A8499 has quit [Ping timeout: 252 seconds]
GNUmoon2 has quit [Remote host closed the connection]
GNUmoon2 has joined #openscad
teepee_ has joined #openscad
teepee has quit [Ping timeout: 258 seconds]
teepee_ is now known as teepee
J1A84 has joined #openscad
teepee has quit [Remote host closed the connection]
teepee has joined #openscad
TheAssass1n has quit [Remote host closed the connection]
hrberg has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
hrberg has joined #openscad
TheAssassin has joined #openscad
TheAssassin has quit [Remote host closed the connection]
TheAssassin has joined #openscad
qeed has joined #openscad
qeed_ has quit [Ping timeout: 252 seconds]
teepee has quit [Remote host closed the connection]
teepee has joined #openscad
little_blossom has quit [Quit: little_blossom]
little_blossom has joined #openscad
p3ck has joined #openscad
fling has quit [Quit: ZNC 1.8.2+deb2+b1 - https://znc.in]
fling has joined #openscad
teepee has quit [Ping timeout: 258 seconds]
teepee has joined #openscad
ur5us has joined #openscad
ur5us has quit [Ping timeout: 246 seconds]
teepee has quit [Quit: bye...]
teepee has joined #openscad
teepee has quit [Remote host closed the connection]
teepee has joined #openscad
ur5us has joined #openscad
rue_mohr has joined #openscad
<rue_mohr> what version supports path_extrude?
ur5us has quit [Ping timeout: 264 seconds]
<teepee> no idea, that's not a built-in
<rue_mohr> oh no
<rue_mohr> hopefully it's pulled one day soon
<teepee> pulled from where?
<teepee> there's libraries doing that, but that could serve as inspiration, not directly pulled as code
<rue_mohr> I noticed that and had hoped it was long enough ago to have been incorporated
<rue_mohr> one day openscad will have all the features of povray :)
<rue_mohr> and the next day a new piece of software will come along with no features, and everyone will flock to it because it's something new
<teepee> it's a library, so it should work with current version
<rue_mohr> ah, ok
fling has quit [Ping timeout: 258 seconds]
teepee_ has joined #openscad
fling has joined #openscad
teepee has quit [Ping timeout: 258 seconds]
teepee_ is now known as teepee
Guest17 has joined #openscad
Guest17 has quit [Client Quit]
TheAssassin has quit [Remote host closed the connection]
TheAssassin has joined #openscad