<InPhase>
peepsalot: Yeah, extra space consumption might be it, but that quantity sure isn't bad. Essentially, the operative question I'm poking at is: If it works better that way, why isn't the standard one working that way? The license is such that it could be integrated as-is. But it hasn't been.
<InPhase>
I could buy the answer being that minimal space consumption is an expectation of the standard one.
<peepsalot>
InPhase: devil's advocate question: If glibc's implementation is so well optimized, why do so many replacements exist, and are used in large projects?
<peepsalot>
mimalloc project is also still relatively new, looks like only about 2yrs since inception
<peepsalot>
and i wouldn't underestimate social factors like "greybeards hate microsoft" coming into play either
<peepsalot>
the technically superior solution doesn't always end up as "standard" in general
<dalias>
if a custom malloc implementation makes a 35% *overall* speed improvement, fixing whatever has you performing 10 million mallocs per second would give you a 200% speed improvement
<dalias>
i don't get what the hype over mimalloc is. the only novel stuff in it is for nonstandard api where you explicitly pick a heap context you want to allocate in
<dalias>
otherwise it's a fairly standard design
<InPhase>
peepsalot: From what I gather most replacement allocators do what's called slot allocation. I'm not sure if they can do slot allocation without type hints.
<InPhase>
peepsalot: i.e., if you're going to have a bunch of 24 byte std::variants, you first make an array of those, and start assigning out of it, then mark unused and reassign from that pile after delete/new.
<peepsalot>
dalias: the problem is the way libGMP works, putting every value on the heap; multiply that by 3 coordinates * number of vertices, etc.
<dalias>
libgmp is a really bad choice for bignum :-p
<InPhase>
I guess you could do it blindly by malloc size, but it seems like variable-size allocation would turn that into a real space-inefficiency issue.
<dalias>
inphase, how so?
<InPhase>
dalias: If you allocate 2000 strings from size 1 to 2000, and your slot allocator preallocates 512 elements for each, suddenly you have 512*1000*2000 bytes, or 1GB, for what could have fit into 2MB.
<dalias>
ah yes
<InPhase>
dalias: Obviously parameterizable with different tradeoffs, but as an example.
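InPhase's arithmetic above can be checked with a quick sketch (the numbers are the hypothetical ones from the example: 2000 distinct string sizes, 512 preallocated slots per size class):

```python
# Sketch of the waste arithmetic from InPhase's example: a naive slot
# allocator that preallocates 512 slots for every distinct request size.
SLOTS_PER_CLASS = 512

def naive_slot_waste(sizes):
    """Bytes reserved if every distinct size gets its own 512-slot pool."""
    return sum(size * SLOTS_PER_CLASS for size in set(sizes))

def actual_need(sizes):
    """Bytes actually requested."""
    return sum(sizes)

sizes = list(range(1, 2001))          # 2000 strings, sizes 1..2000 bytes
reserved = naive_slot_waste(sizes)    # 512 * (1 + 2 + ... + 2000) ~ 1 GB
needed = actual_need(sizes)           # (1 + ... + 2000) ~ 2 MB
print(reserved, needed)               # 1024512000 2001000
```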
<dalias>
this is why all the performance-oriented allocators have horrible memory usage
<dalias>
which then leads to bad performance because of swapping :-p
<InPhase>
But if you have type hints, then you don't treat an array of chars as if it's a fixed size thing.
<dalias>
well you kinda do just because otherwise you make bad fragmentation
<InPhase>
A well balanced slot allocator would slot fixed types, and use a variable heap area for array type allocations.
<dalias>
speaking from a standpoint of having done this, but focusing on hardening, fragmentation, and memory usage -- not speed
<InPhase>
Hahah. Found this while looking for mimalloc reviews. "On beta versions of Windows 95, SimCity wasn’t working in testing. Microsoft tracked down the bug and added specific code to Windows 95 that looks for SimCity. If it finds SimCity running, it runs the memory allocator in a special mode that doesn’t free memory right away."
<InPhase>
Its design is also focused on cache locality of operations.
<InPhase>
Which has the side benefit that the allocated memory for active areas of execution ends up more cache local.
<peepsalot>
InPhase: yeah i saw the paper, but didn't read through it yet. i'm just checking out some of the other alternatives right now
<InPhase>
peepsalot: Okay, done reviewing. I definitely approve of this as a performance-boost option. I think it does not have the C++-game-allocator performance you'd expect from a type-aware slot allocator, but I don't think we have the option of using such with CGAL anyway. But with size-classing it does seem to strike a smart balance with the variable-size issue I raised above: wasting a little space and spreading things out a little, but overall clumping things much better into cache-hot, less fragmented areas.
<InPhase>
I would not be surprised if memory consumption in some cases exceeds your initial example, but I think a little extra is okay.
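The size-classing tradeoff InPhase describes can be sketched generically (a minimal sketch; the class spacing below is a common scheme, not mimalloc's actual table):

```python
# Generic size-class rounding: within each power-of-two range, use a few
# evenly spaced classes, so per-allocation waste is bounded (here < 25%)
# while the number of distinct pools stays small.
def size_class(size, classes_per_doubling=4):
    """Round `size` up to the next class boundary."""
    if size <= 8:
        return 8
    power = (size - 1).bit_length() - 1        # floor(log2(size - 1))
    lo = 1 << power                            # start of this power-of-two range
    step = max(1, lo // classes_per_doubling)  # class spacing within the range
    return -(-size // step) * step             # round up to a multiple of step

print(size_class(9), size_class(100), size_class(4096))   # 10 112 4096
```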
<gunnbr_>
status?
<othx>
Gthx.NET version 2.08 2021-08-14: OK; Up for 1 day, 13 hours, 4 minutes, 6 seconds; mood: pretty good.
<gunnbr_>
othx: Take five!
othx has quit [Remote host closed the connection]
othx has joined #openscad
<gunnbr_>
status?
<othx>
Gthx.NET version 2.08 2021-08-14: OK; Up for 22 seconds; mood: pretty good.
<ccox>
peeps - that's kind of what I was going to try. But I have to spend more time reading and learning the code and how OpenSCAD uses CGAL.
<ccox>
peepsalot: I think there is a larger opportunity for speed increases there, but I won't know until I dig into it more.
<ccox>
peepsalot: which is a shame, because the OSes already have a suballocator for small-sized objects, but Mac, Win, and Linux all still waste a lot of time in them.
<ccox>
peepsalot: BUT, changing allocators needs a lot of testing. Many times they have dark corner cases with horrible performance.
<ccox>
InPhase: the OS suballocator doesn't know which pointer belongs to which sub block without doing a hefty lookup. An inline suballocator can know that ALL pointers belong to the same size block, and skip a lot of work.
<ccox>
Also, inline suballocators can skip VM locks and a lot of other cruft that some OSes hit in the allocator (or at least hit them N/BLOCK times less often)
<ccox>
peepsalot: glibc does not have a suballocator that I know of, but the major OSes do. The OS vendors optimize it, but for more common cases (Word, Excel, Chrome, Call of Duty, etc.).
<ccox>
peepsalot: you left out the overhead of each pointer having information buried somewhere by the OS. Allocating 12 bytes usually means you really allocated 20 or more. Never forget overhead!
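ccox's overhead point can be put in rough numbers (the 8-byte header and 16-byte alignment below are illustrative assumptions; real allocators vary):

```python
# Rough model of per-allocation overhead: a hidden bookkeeping header plus
# alignment padding means a small request consumes noticeably more than it
# asked for.
HEADER = 8   # assumed bookkeeping bytes per allocation (varies by allocator)
ALIGN = 16   # assumed alignment granularity (varies by platform)

def real_cost(requested):
    """Bytes actually consumed by one allocation, under the model above."""
    total = requested + HEADER
    return (total + ALIGN - 1) // ALIGN * ALIGN  # round up to alignment

print(real_cost(12))   # 32 -- a 12-byte request costs 32 under this model
```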
<ccox>
Fortunately, I've done this kind of investigation and tuning before on well used desktop applications.
<peepsalot>
ccox: what do you mean I left out? in my explanation of libGMP? i was just addressing the number of calls to malloc, but yeah pointer overhead can be an issue too
<ccox>
peepsalot: sorry, was responding to each comment as I read them (trying to catch up after a not-great day)
<Scopeuk>
whilst there may be "more gains to be had" by going more specific, if something as simple as that general approach can get a significant win, that's quite exciting
<Scopeuk>
I suppose it would be interesting to see if it holds up properly across the multi thread rendering branches as well
<peepsalot>
teepee: looks pretty :) but yeah doesn't seem to convey any extra information. looks like the z values are all constant?
<peepsalot>
z increments i mean
<peepsalot>
was there ever an issue made about creating a benchmark suite? i could have sworn there was one but can't find it. maybe i'm just thinking of all the times I considered creating an issue but didn't get around to it...
<InPhase>
peepsalot: I remember many discussions, but don't recall (or see) an issue about it.
<InPhase>
peepsalot: I believe my thinking about this before was that it would be worthwhile to make the running of all tests log the runtimes; then we simply need a routine (a little python script probably) to compare the runtime logs against some reference values, using some fancy heuristic to decide what's different enough to raise a red flag. But of course it's machine-specific, so these are local, non-repository reference values.
<InPhase>
peepsalot: The analysis script should look for individual things that get slower, and also do some appropriate stats on the whole run. If you want to team up on it, you could work on the code and logic for aggregating the test data into a log, and think about what extra tests, if any, would be needed to benchmark certain things more explicitly; I could work on the timing analysis logic (I do a lot of that type of thing for work).
<InPhase>
peepsalot: I'm imagining an analysis script which takes as an input a list of filenames of timing logs, and so you pass in the timing logs from whichever runs you want to compare. Then it does some smart things (to be thought of) to tell you what's changing.
<peepsalot>
i found an old text file where I started writing up the issue, i'm trying to finalize it now. ctest already does report a time for each test, which is a start. the thing is that our existing test suite is geared towards quickly testing as many potential bugs as possible in a short time frame. i think we need more, longer-running tests, tuned to a general time frame (say 1-10s each?)
<InPhase>
peepsalot: Which means one output file per benchmark run, perhaps test name and time per line, or some similar sensible format.
<InPhase>
Yeah, a few seconds is typically adequate for most benchmarking purposes. But we'll also need a few longer ones tossed in for the things that do not scale linearly.
<peepsalot>
to facilitate reasonable timing across various hardware, the tests should be designed with a complexity parameter passed on command line, which scales the processing time roughly linearly
<InPhase>
For analysis I might completely throw out all tests under some threshold value. At some point you're just testing the file system there.
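A sketch of the comparison script InPhase describes (the log format and thresholds are assumptions: one "test_name seconds" pair per line, and a simple relative-change cutoff standing in for the fancier heuristics):

```python
# Compare two benchmark timing logs and flag tests that got notably slower.
# Assumed log format: one "test_name seconds" pair per line.
MIN_TIME = 0.1    # ignore tests faster than this (mostly filesystem noise)
SLOWDOWN = 1.10   # flag anything more than 10% slower than the reference

def load_log(path):
    """Parse a timing log into {test_name: seconds}."""
    times = {}
    with open(path) as f:
        for line in f:
            name, secs = line.split()
            times[name] = float(secs)
    return times

def compare(reference, current):
    """Return (name, ref_time, cur_time) for each flagged regression."""
    flagged = []
    for name, ref_t in reference.items():
        cur_t = current.get(name)
        if cur_t is None or ref_t < MIN_TIME:
            continue
        if cur_t / ref_t > SLOWDOWN:
            flagged.append((name, ref_t, cur_t))
    return flagged

# usage sketch: compare(load_log("ref.log"), load_log("new.log"))
```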
<peepsalot>
for example, if the test is unioning 2 very smooth spheres, then $fn should be proportional to sqrt(complexity)
<InPhase>
A complexity analysis would be pretty nice. We'd need something different for a file format than test per line then, or we would need magic labels in the test name for grouping.
<peepsalot>
...since sphere faces are proportional to $fn^2
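peepsalot's scaling rule as a sketch (a Python stand-in for what the test harness would compute; the base constant and the facet-count formula are rough illustrative approximations):

```python
# Complexity-scaled test parameters: a sphere's facet count grows roughly
# like $fn^2, so choosing fn ~ sqrt(complexity) keeps the workload (and
# thus, roughly, the runtime) linear in the complexity parameter.
import math

def fn_for_complexity(complexity, base_fn=8):
    """Pick $fn so that facet count scales ~linearly with `complexity`."""
    return max(3, round(base_fn * math.sqrt(complexity)))

def approx_facets(fn):
    """Rough facet count of an OpenSCAD sphere: ~ $fn * ($fn/2) quads."""
    return fn * (fn // 2)

for c in (1, 4, 16):
    fn = fn_for_complexity(c)
    print(c, fn, approx_facets(fn))   # facets grow ~linearly with c
```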
<InPhase>
I can pretty easily pop out some plots too if we're going to do complexity scales. Maybe even some fits to expectation.
<InPhase>
Expectation would need to be special cased, but we can do that.
<peepsalot>
maybe some miscommunication. i mean, i wouldn't particularly want to compare tests *across* different hardware, complexity scaling of tests would just be to keep runtimes in check, limiting CI time etc
<peepsalot>
i would view results as only really comparable when coming from the same machine, so comparing 2 different builds of openscad or with different features enabled, etc.
<InPhase>
Yeah, I assumed you meant that.
<InPhase>
I meant expectation in terms of the scaling factors.
<InPhase>
We'd want to know if that changed. :)
<InPhase>
In general I think this will not be useful on the CI systems we use. Benchmarking will need to be done manually by developers, because it can't run reliably on shared systems with variable load.
<InPhase>
If we wanted to automate it in any way, we'd need a dedicated or unloaded system automating the tests.