#openscad on 2021-10-07 — irc logs at libera.irclog.whitequark.org

00:09 <InPhase> peepsalot: Yeah, extra space consumption might be it, but that quantity sure isn't bad. Essentially, the operative question I'm poking at is: If it works better that way, why isn't the standard one working that way? The license is such that it could be integrated as-is. But it hasn't been.

00:10 <InPhase> I could buy the answer being that minimal space consumption is an expectation of the standard one.

00:18 LordOfBikes has quit [Ping timeout: 252 seconds]

00:31 LordOfBikes has joined #openscad

00:35 <peepsalot> InPhase: devil's advocate question: If glibc's implementation is so well optimized, why do so many replacements exist, and why are they used in large projects?

00:40 ur5us_ has quit [Ping timeout: 245 seconds]

00:44 <peepsalot> mimalloc project is also still relatively new, looks like only about 2yrs since inception

00:45 <peepsalot> and i wouldn't underestimate social factors like "greybeards hate microsoft" coming into play either

00:46 <peepsalot> the technically superior solution doesn't always end up as "standard" in general

00:55 <dalias> if a custom malloc implementation makes a 35% *overall* speed improvement, fixing whatever has you performing 10 million mallocs per second would give you a 200% speed improvement

00:56 <dalias> i don't get what the hype over mimalloc is. the only novel stuff in it is for nonstandard api where you explicitly pick a heap context you want to allocate in

00:56 <dalias> otherwise it's a fairly standard design

00:57 luissen has quit [Quit: exited unexpectedly with error code -1]

01:00 luissen has joined #openscad

01:03 <InPhase> peepsalot: From what I gather most replacement allocators do what's called slot allocation. I'm not sure if they can do slot allocation without type hints.

01:04 <InPhase> peepsalot: i.e., if you're going to have a bunch of 24 byte std::variants, you first make an array of those, and start assigning out of it, then mark unused and reassign from that pile after delete/new.
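
A minimal sketch of the slot allocation idea InPhase describes, modelled in Python rather than as a real allocator. The class name `SlotPool` and its methods are hypothetical and only illustrate the "preallocate an array, hand out slots, reuse freed slots" pattern; no particular library works exactly this way.

```python
class SlotPool:
    """Fixed-size slot pool: hands out indices into one preallocated array."""

    def __init__(self, capacity):
        self.slots = [None] * capacity                 # the preallocated array
        self.free = list(range(capacity - 1, -1, -1))  # stack of unused slot indices

    def new(self, value):
        if not self.free:
            raise MemoryError("pool exhausted")
        index = self.free.pop()      # most recently freed slot is reused first
        self.slots[index] = value
        return index

    def delete(self, index):
        self.slots[index] = None     # mark the slot unused
        self.free.append(index)      # put it back on the pile for reassignment


pool = SlotPool(4)
a = pool.new("variant A")
b = pool.new("variant B")
pool.delete(a)
c = pool.new("variant C")            # lands in the slot that `a` just vacated
```

Because every slot has the same size, neither `new` nor `delete` has to search for a fitting block, which is where the speed of this approach comes from.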

01:05 <peepsalot> dalias: problem is the way libGMP works by having every value on the heap, multiply that by 3 coordinates * #of vertices etc.

01:05 <dalias> libgmp is a really bad choice for bignum :-p

01:05 <InPhase> I guess you could blindly do it by malloc size, but it seems like variable-size allocations would really push that into a space inefficiency problem.

01:06 <dalias> inphase, how so?

01:08 <InPhase> dalias: If you allocate 2000 strings from size 1 to 2000, and your slot allocator preallocates 512 elements for each, suddenly you have 512*1000*2000 bytes, or 1GB, for what could have fit into 2MB.

01:08 <dalias> ah yes

01:09 <InPhase> dalias: Obviously parameterizable with different tradeoffs, but as an example.
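
The arithmetic in the example above can be checked directly. This is only a worked calculation of the numbers quoted in the chat (2000 strings of sizes 1 to 2000, one 512-slot pool per distinct size); the variable names are made up for illustration.

```python
SLOTS_PER_POOL = 512
sizes = range(1, 2001)                  # one string of each size from 1 to 2000 bytes

payload = sum(sizes)                    # bytes actually requested
exact = SLOTS_PER_POOL * sum(sizes)     # one 512-slot pool per distinct size
estimate = 512 * 1000 * 2000            # the round-number version: average size 1000

print(payload)    # 2001000     (about 2 MB)
print(exact)      # 1024512000  (about 1 GB)
print(estimate)   # 1024000000
```

The round-number estimate uses an average string size of 1000 bytes, which is why it lands within a fraction of a percent of the exact sum.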

01:09 <dalias> this is why all the performance-oriented allocators have horrible memory usage

01:09 <dalias> which then leads to bad performance because of swapping :-p

01:09 <InPhase> But if you have type hints, then you don't treat an array of chars as if it's a fixed size thing.

01:09 <dalias> well you kinda do just because otherwise you make bad fragmentation

01:10 <InPhase> A well balanced slot allocator would slot fixed types, and use a variable heap area for array type allocations.

01:10 <dalias> speaking from a standpoint of having done this, but focusing on hardening, fragmentation, and memory usage -- not speed

01:15 <InPhase> Hahah. Found this while looking for mimalloc reviews. "On beta versions of Windows 95, SimCity wasn’t working in testing. Microsoft tracked down the bug and added specific code to Windows 95 that looks for SimCity. If it finds SimCity running, it runs the memory allocator in a special mode that doesn’t free memory right away."

01:24 <InPhase> peepsalot: Found technical details here, reviewing now: https://www.microsoft.com/en-us/research/uploads/prod/2019/06/mimalloc-tr-v1.pdf

01:25 <InPhase> Key info: Size-classes are used.

01:26 <InPhase> Its design is also focused on cache locality of operations.

01:26 <InPhase> Which has the side benefit that the allocated memory for active areas of execution ends up more cache local.

01:30 <peepsalot> InPhase: yeah i saw the paper, but didn't read through it yet. i'm just checking out some of the other alternatives right now

01:30 <InPhase> peepsalot: Okay, done reviewing. I definitely approve of this as a performance boost option. I think it does not have the C++-game-allocator performance you'd expect from a type aware slot allocator, but I don't think we have an option of using such with CGAL anyway. But with size-classing it does seem to make a smart balance between the variable size issue I raised above, wasting a little space and

01:30 <InPhase> spreading things out a little, but overall clumping things much better into cache-hot less fragmented areas.
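
A rough sketch of how size classes soften the variable-size problem raised earlier. It assumes plain power-of-two classes with an 8-byte minimum, which is an illustrative simplification and not mimalloc's actual class layout; `size_class` is a hypothetical helper.

```python
def size_class(nbytes, minimum=8):
    """Round a request up to the next power of two, at least `minimum` bytes."""
    if nbytes < 1:
        raise ValueError("allocation size must be positive")
    return max(minimum, 1 << (nbytes - 1).bit_length())


sizes = range(1, 2001)                          # the 2000-string example again
requested = sum(sizes)                          # bytes asked for
reserved = sum(size_class(n) for n in sizes)    # bytes handed out after rounding

print(size_class(24), size_class(2000))
print(requested, reserved, reserved / requested)
```

With power-of-two rounding no single block is more than about twice the requested size, so the total stays within the same order of magnitude as the payload instead of growing by a factor of hundreds.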

01:31 <InPhase> I would not be surprised if some of the memory consumptions exceed your initial example, but I think a little extra is okay.

01:40 gunnbr has joined #openscad

01:49 gunnbr has quit [Ping timeout: 265 seconds]

01:50 gunnbr has joined #openscad

01:51 snaked has quit [Quit: Leaving]

01:51 snaked has joined #openscad

01:52 gunnbr_ has joined #openscad

01:55 ur5us_ has joined #openscad

01:55 gunnbr has quit [Ping timeout: 265 seconds]

02:23 arebil has joined #openscad

02:33 ur5us_ has quit [Quit: Leaving]

02:33 ur5us has joined #openscad

02:41 <gunnbr_> status?

02:41 <othx> Gthx.NET version 2.08 2021-08-14: OK; Up for 1 day, 13 hours, 4 minutes, 6 seconds; mood: pretty good.

02:41 <gunnbr_> othx: Take five!

02:41 othx has quit [Remote host closed the connection]

02:44 othx has joined #openscad

02:44 <gunnbr_> status?

02:44 <othx> Gthx.NET version 2.08 2021-08-14: OK; Up for 22 seconds; mood: pretty good.

03:01 arebil has quit [Quit: My keyboard has gone to sleep. ZZZzzz…]

03:07 <ccox> peeps - that's kind of what I was going to try. But I have to spend more time reading and learning the code and how OpenSCAD uses CGAL.

03:08 <ccox> peepsalot: I think there is a larger opportunity for speed increases there, but I won't know until I dig into it more.

03:08 <ccox> peepsalot: which is a shame, because the OSes already have a suballocator for small sized objects, but Mac, Win, and Linux all still waste a lot of time in them.

03:10 <ccox> peepsalot: BUT, changing allocators needs a lot of testing. Many times they have dark corner cases with horrible performance.

03:11 <ccox> InPhase: the OS suballocator doesn't know which pointer belongs to which sub block without doing a hefty lookup. An inline suballocator can know that ALL pointers belong to the same size block, and skip a lot of work.

03:12 <ccox> Also, inline suballocators can skip VM locks and a lot of other cruft that some OSes hit in the allocator (or at least hit them N/BLOCK times less often)

03:14 ferdna has joined #openscad

03:14 <ccox> peepsalot: glibc does not have a suballocator that I know of, but the major OSes do. The OS vendors optimize it, but for more common cases (Word, Excel, Chrome, Call of Duty, etc.).

03:16 <ccox> peepsalot: you left out the overhead of each pointer having information buried somewhere by the OS. Allocating 12 bytes usually means you really allocated 20 or more. Never forget overhead!

03:23 <ccox> Fortunately, I've done this kind of investigation and tuning before on well used desktop applications.

03:26 <peepsalot> ccox: what do you mean I left out? in my explanation of libGMP? i was just addressing the number of calls to malloc, but yeah pointer overhead can be an issue too

03:27 <ccox> peepsalot: sorry, was responding to each comment as I read them (trying to catch up after a not-great day)

03:32 othx has quit [Remote host closed the connection]

03:51 othx has joined #openscad

03:56 gunnbr_ has quit [Ping timeout: 265 seconds]

04:26 ferdna has quit [Quit: Leaving]

04:35 ur5us has quit [Ping timeout: 245 seconds]

04:44 arebil has joined #openscad

05:12 linext_ has joined #openscad

05:15 linext has quit [Ping timeout: 264 seconds]

05:23 gunnbr has joined #openscad

05:33 othx has quit [Remote host closed the connection]

05:35 othx has joined #openscad

05:36 othx has quit [Remote host closed the connection]

05:44 othx has joined #openscad

05:56 InPhase has quit [Ping timeout: 260 seconds]

06:01 InPhase has joined #openscad

06:15 gunnbr has quit [Ping timeout: 265 seconds]

06:43 TheAssassin has quit [Remote host closed the connection]

06:43 TheAssassin has joined #openscad

06:54 <gbruno> [github] thehans opened issue #3930 (Use an optimized malloc replacement for better CGAL performance). https://github.com/openscad/openscad/issues/3930

07:57 linext__ has joined #openscad

08:00 linext_ has quit [Ping timeout: 245 seconds]

08:38 arebil has quit [Quit: My keyboard has gone to sleep. ZZZzzz…]

08:40 arebil has joined #openscad

08:48 <Scopeuk> whilst there may be "more gains to be had" by going more specific if something as simple as that general approach can get a significant win that's quite exciting

08:48 <Scopeuk> I suppose it would be interesting to see if it holds up properly across the multi thread rendering branches as well

09:01 mhroncok has joined #openscad

09:09 lastrodamo has joined #openscad

09:25 arebil has quit [Quit: My keyboard has gone to sleep. ZZZzzz…]

11:33 arebil has joined #openscad

12:59 teepee has quit [Ping timeout: 276 seconds]

13:18 teepee has joined #openscad

13:47 arebil has quit [Quit: My keyboard has gone to sleep. ZZZzzz…]

14:32 <teepee> peepsalot: more visualization... https://twitter.com/ralight/status/1446052723594604544 :)

14:57 <Scopeuk> that seems fun and also a touch unnecessary

14:57 arebil has joined #openscad

14:58 <teepee> haha, yes, I'd agree on both points :)

15:19 peeps[zen] has joined #openscad

15:20 peepsalot has quit [Ping timeout: 245 seconds]

15:26 TheAssassin has quit [Remote host closed the connection]

15:26 TheAssassin has joined #openscad

15:43 peepsalot has joined #openscad

15:43 peeps[zen] has quit [Ping timeout: 245 seconds]

16:03 luissen has quit [Ping timeout: 245 seconds]

16:06 luissen has joined #openscad

16:08 arebil has quit [Quit: My keyboard has gone to sleep. ZZZzzz…]

16:19 wed has quit [Ping timeout: 260 seconds]

16:19 rogeliodh has joined #openscad

16:20 hisacro has quit [Ping timeout: 264 seconds]

16:21 wed has joined #openscad

16:21 NoGare[m] has quit [Ping timeout: 260 seconds]

16:22 rogeliodh3 has quit [Ping timeout: 250 seconds]

16:22 gbruno has quit [Ping timeout: 264 seconds]

16:25 gbruno has joined #openscad

16:34 hisacro has joined #openscad

16:36 NoGare[m] has joined #openscad

17:01 arebil has joined #openscad

18:02 <gbruno> [github] GilesBathgate closed issue #630 (User Space Primitives). https://github.com/openscad/openscad/issues/630

18:04 arebil has quit [Quit: My keyboard has gone to sleep. ZZZzzz…]

18:07 mhroncok has quit [Quit: Leaving.]

20:11 ur5us has joined #openscad

21:05 lastrodamo has quit [Quit: Leaving]

21:09 <peepsalot> teepee: looks pretty :) but yeah doesn't seem to convey any extra information. looks like the z values are all constant?

21:09 <peepsalot> z increments i mean

21:11 <peepsalot> was there ever an issue made about creating a benchmark suite? i could have sworn there was one but can't find it. maybe i'm just thinking of all the times I considered creating an issue but didn't get around to it...

21:23 josephl has quit [Ping timeout: 250 seconds]

21:24 raboof has quit [Ping timeout: 260 seconds]

21:26 raboof has joined #openscad

21:31 josephl has joined #openscad

22:40 <InPhase> peepsalot: I remember many discussions, but don't recall (or see) an issue about it.

22:50 <gbruno> [github] rcolyer closed issue #2464 (Segfault seemingly results from concavity in polyhedron). https://github.com/openscad/openscad/issues/2464

22:56 <InPhase> peepsalot: I believe my thinking about this before was that it would be meritorious to try to make the running of all tests log the runtimes, and then we simply need a routine (a little python script probably) to do a comparison of the runtime logs with some reference values for this, and use some fancy heuristic to decide what's different enough to raise a red flag. But of course it's machine

22:56 <InPhase> specific, so these are local non-repository reference values.

23:00 <InPhase> peepsalot: The analysis script should look for individual things that get slower, and also do some appropriate stats on the whole thing. If you want to team up on it, you could work on the code and logic for aggregating the test data into a log, and thinking about what extra tests if any would be needed to be more explicit about benchmarking certain things, and I could work on the timing analysis logic

23:00 <InPhase> (I do a lot of that type of thing for work).

23:02 <InPhase> peepsalot: I'm imagining an analysis script which takes as an input a list of filenames of timing logs, and so you pass in the timing logs from whichever runs you want to compare. Then it does some smart things (to be thought of) to tell you what's changing.

23:03 ali1234 has quit [Remote host closed the connection]

23:03 <peepsalot> i found an old text file where I started writing up the issue, i'm trying to finalize it now. ctest already does report a time for each test, which is a start. the thing is that our existing test suite is geared towards quickly testing for as many possible bugs as it can in a short time frame. i think we need more long-running tests, tuned to a general time frame (say 1-10s each?)

23:03 <InPhase> peepsalot: Which means one output file per benchmark run, perhaps test name and time per line, or some similar sensible format.

23:04 <InPhase> Yeah, a few seconds is typically adequate for most benchmarking purposes. But we'll also need a few longer ones tossed in for the things that do not scale linearly.

23:05 ali1234 has joined #openscad

23:05 <peepsalot> to facilitate reasonable timing across various hardware, the tests should be designed with a complexity parameter passed on command line, which scales the processing time roughly linearly

23:05 <InPhase> For analysis I might completely throw out all tests under some threshold value. At some point you're just testing the file system there.
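
One way the analysis script discussed above might look, as a sketch only. It assumes the "test name and time per line" log format floated in the chat; the function names, the 0.5 s threshold, the 1.2x slowdown ratio, and the test names in the sample data are all invented for illustration, and a real heuristic would likely be more careful statistically.

```python
def parse_timing_log(text):
    """Parse lines of the form 'test_name seconds' into a {name: seconds} dict."""
    timings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                             # skip blanks and comments
        name, seconds = line.rsplit(None, 1)     # split on the last whitespace run
        timings[name] = float(seconds)
    return timings


def find_regressions(reference, candidate, min_seconds=0.5, ratio=1.2):
    """Return (name, old, new) for tests slower than `ratio` times the reference."""
    flagged = []
    for name, ref_time in reference.items():
        new_time = candidate.get(name)
        if new_time is None or ref_time < min_seconds:
            continue                             # missing, or too short to be meaningful
        if new_time > ref_time * ratio:
            flagged.append((name, ref_time, new_time))
    # Worst relative slowdown first.
    return sorted(flagged, key=lambda item: item[2] / item[1], reverse=True)


reference_log = "union_spheres 2.0\nminkowski_small 4.0\necho_test 0.01\n"
candidate_log = "union_spheres 2.1\nminkowski_small 6.0\necho_test 0.05\n"

regressions = find_regressions(parse_timing_log(reference_log),
                               parse_timing_log(candidate_log))
print(regressions)
```

In this sample only `minkowski_small` is flagged: `union_spheres` is within the allowed ratio, and `echo_test` falls under the threshold, so its fivefold "slowdown" is ignored as noise.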

23:06 <peepsalot> for example, if the test is unioning 2 very smooth spheres, then $fn should be proportional to sqrt(complexity)

23:07 <InPhase> A complexity analysis would be pretty nice. We'd need something different for a file format than test per line then, or we would need magic labels in the test name for grouping.

23:07 <peepsalot> ...since sphere faces are proportional to $fn^2
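
The sqrt relationship can be sketched as follows. The helper name `fn_for_complexity` and the base value of 16 are assumptions for illustration; the only thing taken from the chat is that the face count of a sphere grows roughly with $fn^2, so $fn should grow with the square root of the complexity parameter.

```python
import math


def fn_for_complexity(complexity, base_fn=16):
    """Pick $fn so the sphere face count (~ $fn^2) scales linearly with complexity."""
    return max(3, round(base_fn * math.sqrt(complexity)))


for complexity in (1, 4, 16):
    fn = fn_for_complexity(complexity)
    # fn * fn stands in for the face count up to a constant factor
    print(complexity, fn, fn * fn / complexity)
```

The last column stays constant, which is the point: quadrupling the complexity parameter doubles $fn and roughly quadruples the number of faces. The resulting value could then be handed to the test model, for instance through a command-line variable override.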

23:16 <InPhase> I can pretty easily pop out some plots too if we're going to do complexity scales. Maybe even some fits to expectation.

23:16 <InPhase> Expectation would need to be special cased, but we can do that.

23:25 ur5us has quit [Ping timeout: 252 seconds]

23:27 <peepsalot> maybe some miscommunication. i mean, i wouldn't particularly want to compare tests *across* different hardware, complexity scaling of tests would just be to keep runtimes in check, limiting CI time etc

23:28 <peepsalot> i would view results as only really comparable when coming from the same machine, so comparing 2 different builds of openscad or with different features enabled, etc.

23:29 <InPhase> Yeah, I assumed you meant that.

23:30 <InPhase> I meant expectation in terms of the scaling factors.

23:30 <InPhase> We'd want to know if that changed. :)
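
A sketch of what checking the scaling factor could mean in practice: fit a straight line to log(runtime) against log(complexity) and watch the slope. The function name and the sample timings are invented, and this plain least-squares fit is just one simple choice, not something the chat settles on.

```python
import math


def scaling_exponent(complexities, runtimes):
    """Least-squares slope of log(runtime) versus log(complexity)."""
    xs = [math.log(c) for c in complexities]
    ys = [math.log(t) for t in runtimes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Made-up timings: one test that scales linearly, one that scales quadratically.
linear = scaling_exponent([1, 2, 4, 8], [0.5, 1.0, 2.0, 4.0])
quadratic = scaling_exponent([1, 2, 4, 8], [0.5, 2.0, 8.0, 32.0])
print(linear, quadratic)
```

A slope near 1 means runtime grows linearly with the complexity parameter; if a build moves a test from about 1 to about 2, that is a change in scaling behaviour rather than a constant-factor slowdown.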

23:31 <InPhase> In general I think this will not be useful on the CI systems we use. Benchmarking will need to be done manually by developers, because it cannot be done reliably on shared systems with variable load.

23:32 <InPhase> If we wanted to automate it in any way, we'd need a dedicated or unloaded system automating the tests.

23:34 ferdna has joined #openscad

23:39 <peepsalot> hmm, true