teepee changed the topic of #openscad to: OpenSCAD - The Programmers Solid 3D CAD Modeller | This channel is logged! | Website: http://www.openscad.org/ | FAQ: https://goo.gl/pcT7y3 | Request features / report bugs: https://goo.gl/lj0JRI | Tutorial: https://bit.ly/37P6z0B | Books: https://bit.ly/3xlLcQq | FOSDEM 2020: https://bit.ly/35xZGy6 | Logs: https://bit.ly/32MfbH5
<JordanBrown[m]> I thought that most good malloc implementations these days managed caches of small allocations, so that they are very fast.
<JordanBrown[m]> However, in reviewing the documentation on mimalloc, it seems that it is not one of those...
guso78 has quit [Quit: Client closed]
teepee has quit [Ping timeout: 255 seconds]
teepee has joined #openscad
<InPhase> JordanBrown[m]: There's not much they can do to improve it really. A little bit, but not a lot. A cache miss itself is easily on the same scale. It'll be tricky to keep your whole disorganized bucket of 8 byte slots and the equally sized ring buffer of free 8 byte slots in cache. This is going to be a much more spread out pile of memory than the stack.
<InPhase> SBO will win over this by a lot. Even if you end up with your SBO objects on the heap in a vector or something, they are still sequential and cacheable.
<InPhase> The key performance requirement is active data locality.
<InPhase> If on the other hand you try to keep the active 8 byte allocations local with the allocator, then you have a complicated search algorithm for free space, and you're back to high overhead for short allocate/deallocate cycles.
<InPhase> Data locality really needs to be planned into the program flow, or result somewhat automatically from the program flow, to work well.
<InPhase> An easy allocation optimization example: You're working with 3D vectors, so try to handle all 3 at once, and allocate as a single chunk. The allocation for 3D vectors would then have 1/3rd of the overhead if you can keep the same general calculation requirement while doing this.
<InPhase> But that requires breaking apart the abstraction.
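The one-chunk idea above can be sketched conceptually in Python: an `array` gives a single contiguous buffer for all three components, where a list of separately boxed floats gives three independent heap objects with no locality guarantee. This is an illustrative stand-in, not OpenSCAD or GMP code.

```python
from array import array

# Three components boxed separately: each float is its own heap object,
# reachable only through a pointer, with no locality guarantee.
boxed = [float(v) for v in (1.0, 2.0, 3.0)]

# The same 3D vector as one chunk: a single contiguous 24-byte buffer,
# one allocation instead of three.
chunk = array('d', (1.0, 2.0, 3.0))

addr, count = chunk.buffer_info()
assert count == 3 and chunk.itemsize == 8   # three 8-byte slots...
assert len(bytes(chunk)) == 24              # ...back to back in one buffer
assert list(chunk) == boxed                 # same values, one allocation
```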
ur5us has joined #openscad
kintel has joined #openscad
Guest5498 is now known as Sauvin
kintel has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
LordOfBikes has quit [Ping timeout: 256 seconds]
SamantazFox has quit [Ping timeout: 246 seconds]
SamantazFox has joined #openscad
SamantazFox has quit [Killed (NickServ (GHOST command used by SamantazFox_))]
SamantazFox_ has joined #openscad
LordOfBikes has joined #openscad
ur5us has quit [Ping timeout: 260 seconds]
Teslamax has quit [Read error: Connection reset by peer]
Teslamax has joined #openscad
<peepsalot> InPhase: the data structure is documented here btw in case you were curious: https://gmplib.org/manual/Integer-Internals
<peepsalot> just a struct with 3 members
<JordanBrown[m]> InPhase: Sorry, the performance hit that I was concerned about was the 5 billion allocations of 8 bytes each. With a cache (not in the RAM sense) of 8-byte allocations, a malloc could normally cost as little as a few dozen instructions, and a free similarly cheap.
<InPhase> JordanBrown[m]: Yes. I meant that if you did that, and you could, the implementation would necessarily over the execution of a complicated program end up with those allocated 8 bytes being spread almost randomly over a very wide section of memory reserved for these 8 byte allocations.
<InPhase> JordanBrown[m]: So if you have a vector of pointers to 8 byte chunks spread randomly all over a gigabyte of RAM, this is going to massively underperform a vector of SBO'd 8 byte data, even if it's SBO'd into 64 byte blocks.
<InPhase> And the performance difference is going to be similar to, or maybe even worse, than just using a classic allocator.
<JordanBrown[m]> Yes… not allocating is better than allocating. But of course it requires the cooperation of the caller. All I’m saying is that large numbers of small allocations need not be very expensive.
<InPhase> The standard Linux allocator makes an effort at locality. It clumps regions of memory by size ranges, and has a spot for small stuff and a spot for large stuff, and then it tries to reuse the most recently deallocated small stuff pointer if it can fit a new allocation into the space following it.
<JordanBrown[m]> This is the one I’m used to: https://en.wikipedia.org/wiki/Libumem
<JordanBrown[m]> I’m not familiar with the details - not my specialty - but it was written by some pretty smart people.
<InPhase> Ah yes. Looking at that reminds me of one of the other core issues I completely forgot about.
<JordanBrown[m]> This one is better but requires SPARC hardware features: https://docs.oracle.com/cd/E86824_01/html/E54772/libadimalloc-3lib.html
<InPhase> I clicked through to the Oracle docs, which point out that many of the features break in threaded programs.
<peepsalot> i think on x86_64, sizeof(mpz_t) is 16: 4+8+4, so that would leave 48 bytes of cacheline, or 24 bytes of each numerator and denominator
<InPhase> Locking to allocate memory is part of the issue. You basically have to preallocate memory chunks for each thread to get around the locking.
<JordanBrown[m]> Hmm I will look. We certainly use it and it’s relatives in very highly threaded programs.
<InPhase> And then you cannot pass data between threads if you do that.
<JordanBrown[m]> Oops it’s
<InPhase> I was looking there.
<InPhase> There are the different call options.
<JordanBrown[m]> Argh DYAC its
<InPhase> And this conversation in fact started with a discussion of multithreaded CGAL. :)
<JordanBrown[m]> Where are you seeing a MT limitation?
<InPhase> "have no support for concurrency"
<JordanBrown[m]> Um that is a comparison with other libraries. Libumem is the one that does have MT support.
<JordanBrown[m]> Could be written more clearly.
<JordanBrown[m]> Solaris has an appalling number of implementations of malloc().
<InPhase> Oh, I did not understand that. I thought it was multiple calls the library provided.
<JordanBrown[m]> I am basically certain that that statement is intended to compare the libc version of malloc with the libumem functions, including libumem’s malloc.
<JordanBrown[m]> Like I said, we use it in very large MT programs.
<JordanBrown[m]> Notably, libumem is a user space port of the allocator used in the kernel.
<JordanBrown[m]> I don’t know offhand whether it shares sources, but I think it might.
<InPhase> I don't know what 3C and 3MALLOC are supposed to stand for.
<JordanBrown[m]> Manual page sections.
<InPhase> Although I haven't seen a non-thread-safe standard malloc in a very long time.
<InPhase> Outside of non-threading systems, that is.
snaked has quit [Ping timeout: 256 seconds]
<JordanBrown[m]> Yeah, I am kind of surprised by that. But libumem is enough better than the libc one that I think the only reason anybody uses the libc one is that they haven’t remembered to -lumem.
<InPhase> The standard Linux allocator does in fact waste space sometimes, under the premise that runtime and cache locality are more favorable aims than optimal packing.
<JordanBrown[m]> Hmm the libc malloc says that it is MT safe. Maybe the comparison is intended to describe that libumem is optimized for MT; I believe it tries to avoid needing locks most of the time, or at least to spread the lock usage across multiple locks to avoid contention.
<InPhase> They have an "mtmalloc" listed there, which is perhaps closer to the standard malloc everybody else has been using for a while? :)
epony has joined #openscad
<JordanBrown[m]> Reading that doc, I think mtmalloc is a malloc derivative that does better in MT environments, perhaps by reducing lock contention or by avoiding cache contention.
<InPhase> I don't know the platform though. I haven't touched a Solaris system in 21 years. I actually thought they stopped releasing it, but wikipedia says it's still active!
<JordanBrown[m]> Still pays for my mortgage.
<InPhase> One of my college-years jobs was for the CS department, to spruce up the new Linux lab so that people would migrate over there and stop using the Sun lab.
<JordanBrown[m]> https://github.com/illumos/illumos-gate/blob/9af60fb077770c1324d6f7501d6081f02c1d36c1/usr/src/lib/libc/port/gen/malloc.c#L163 is the standard malloc, at least as of ten years ago or so. Nobody has been working on it so my bet is it is identical to current.
<InPhase> So I setup a little system to standardize the configurations and package deployments across all the Linux systems, and dust started growing on the Sun machines.
<JordanBrown[m]> Note the big global lock around malloc.
<InPhase> Yeah, it's pretty mandatory to have that lock.
<JordanBrown[m]> Libumem is in some fashion - again, not my specialty - much cleverer. When they say “concurrent allocations”, they mean that two threads can be doing allocations at the same time.
<JordanBrown[m]> It is not merely MT safe; it is MT hot.
<peepsalot> JordanBrown[m]: by the way adding mimalloc resulted in something like 40-80% speedup of nef operations, so it's not like mimalloc is slow by any means
<peepsalot> i compared it against every other malloc replacement option I could find and get running at the time, and it was fastest (with reasonably efficient mem usage)
<InPhase> peepsalot: Ah, I see mimalloc describes having a lot of separately managed memory blocks so that cross-thread allocations and frees don't block on small stuff.
<InPhase> You would need thousands of threads trying to allocate at once to get significant collisions.
<peepsalot> yeah, it does some kinda thread local allocations as i understand
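The per-thread-cache idea being described here (mimalloc's per-thread heaps, and libumem's per-CPU magazines mentioned earlier) can be sketched as a toy slot pool: the hot alloc/free path touches only a thread-local free list, and the shared lock is taken only on refill. This is a conceptual illustration only, not how either allocator is actually implemented.

```python
import threading

class PerThreadPool:
    """Toy slot allocator: the common alloc/free path touches only a
    thread-local free list; the shared lock is taken only to refill."""
    REFILL = 32

    def __init__(self, nslots=1024):
        self._shared = list(range(nslots))   # stand-in for a page of free slots
        self._lock = threading.Lock()        # guards only the shared pool
        self._tls = threading.local()

    def alloc(self):
        cache = getattr(self._tls, "free", None)
        if not cache:                        # rare path: refill under the lock
            with self._lock:
                take = min(self.REFILL, len(self._shared))
                self._tls.free = cache = [self._shared.pop() for _ in range(take)]
        return cache.pop()                   # common path: no lock at all

    def free(self, slot):
        if not hasattr(self._tls, "free"):
            self._tls.free = []
        self._tls.free.append(slot)          # common path: no lock at all

pool = PerThreadPool()
s = pool.alloc()
pool.free(s)
assert pool.alloc() == s                     # freed slot is reused immediately
```

Passing data between threads is exactly where this gets hard, as noted above: a slot freed on a thread other than its allocator has to flow back somehow, which is where mimalloc's deferred cross-thread free lists come in.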
<InPhase> JordanBrown[m]: From a quick skim, libumem sounds similar but with some sort of elaborate tree structure.
<JordanBrown[m]> I have never really tried to understand it, but I know it caches allocated blocks by size.
<peepsalot> if gmp weren't limited to C, for example if it had accessor functions instead of PODs, then I think a lot of the struct overhead for SBO could be avoided
<peepsalot> and I made a mistake earlier, even if it's 16 bytes for mpz_t, we would need two of them for a rational, which would leave only 32 bytes for the limb data, or 16 bytes for each numerator/denominator (not 24)
<peepsalot> i mean, i'm not sure how crucial the whole cacheline thing is, it could be bigger still. (eg I think one of the tricks fmt library does is actually SSO of ~500 bytes)
<peepsalot> but would also potentially lead to very poor utilization, extra high memory usage for storing geometries
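The corrected arithmetic above, written out (a 64-byte cacheline is assumed; the 16-byte `sizeof(mpz_t)` figure is peepsalot's x86_64 estimate for the three-member struct):

```python
CACHELINE = 64                 # typical x86_64 cacheline, in bytes
SIZEOF_MPZ = 16                # struct of two 4-byte ints + one 8-byte limb pointer

rational_headers = 2 * SIZEOF_MPZ          # numerator + denominator structs
inline_budget = CACHELINE - rational_headers
assert rational_headers == 32
assert inline_budget == 32                 # bytes left for inline limb data
assert inline_budget // 2 == 16            # per component: 16 bytes = two 64-bit limbs
```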
<peepsalot> i wish someone would make a good c++ multiprecision library already
<peepsalot> i think the memory management aspect could be done a lot smarter, at a low level
use-value has quit [Quit: use-value]
snaked has joined #openscad
Teslamax has quit [Read error: Connection reset by peer]
Teslamax has joined #openscad
Teslamax has quit [Read error: Connection reset by peer]
Teslamax has joined #openscad
ur5us has joined #openscad
guso78 has joined #openscad
GNUmoon has quit [Remote host closed the connection]
GNUmoon has joined #openscad
use-value has joined #openscad
use-value has quit [Client Quit]
teepee_ has joined #openscad
teepee has quit [Ping timeout: 255 seconds]
teepee_ is now known as teepee
use-value has joined #openscad
castaway has joined #openscad
xxpolitf has joined #openscad
SamantazFox has joined #openscad
ur5us has quit [Ping timeout: 256 seconds]
SamantazFox_ has quit [Ping timeout: 252 seconds]
GNUmoon has quit [Remote host closed the connection]
GNUmoon has joined #openscad
<greenbigfrog> printing the threaded parts at 0.15 layer height made them work so much better. but I just realized, if everything else works with magnets here, why not make the lid use magnets as well
<guso78> do you use the magnets with the printer or with the thread ?
be7b5 has joined #openscad
be7b5 has quit [Client Quit]
guso7884 has joined #openscad
guso7884 has quit [Ping timeout: 260 seconds]
<greenbigfrog> guso78: not sure I follow.
ccox has joined #openscad
ccox_ has quit [Ping timeout: 256 seconds]
<guso78> greenbigfrog, where do you use the magnet ? which lid ?
<guso78> in "python-openscad" i could actually store two nodes (one additional negative node, which has negative volume by default and can be created by an "invert" function). the final output function will calculate the difference between the positive and the negative node before output.
<guso78> another feature which i am investigating is to attach a python dictionary to each openscad object, accessible with the [] operator, to store arbitrary data together with each node. one idea is to store named coordinates of a cube, a cylinder ... to access them later for relative positioning.
<greenbigfrog> oh, sorry, https://github.com/greenbigfrog/ultimate-ptfe-coupling was getting help on the threads the last days, hence I shared here.
J23 has joined #openscad
snaked has quit [Quit: Leaving]
aiyion has quit [Ping timeout: 255 seconds]
TheCoffeMaker has quit [Ping timeout: 252 seconds]
TheCoffeMaker has joined #openscad
aiyion has joined #openscad
use-value has quit [Remote host closed the connection]
use-value has joined #openscad
qeed has joined #openscad
<gbruno> [github] gsohler synchronize pull request #4498 (OpenSCAD with a Python Engine) https://github.com/openscad/openscad/pull/4498
<guso78> next could be "+" for union and "*" for intersection IMHO
<guso78> "-" for difference
J23 has quit [Quit: Client closed]
J23 has joined #openscad
J23 has quit [Client Quit]
J23 has joined #openscad
J23 has quit [Client Quit]
J23 has joined #openscad
<guso78> output(cube() * sphere() )
<gbruno> [github] gsohler synchronize pull request #4498 (OpenSCAD with a Python Engine) https://github.com/openscad/openscad/pull/4498
guso78 has quit [Quit: Client closed]
xxpolitf has quit [Quit: leaving]
J23 has quit [Quit: Client closed]
J23 has joined #openscad
guso78 has joined #openscad
lauraaa has joined #openscad
<lauraaa> what's up guys, its been a while
<lauraaa> how are you doing?
use-value has quit [Remote host closed the connection]
use-value has joined #openscad
lauraaa has quit [Quit: Client closed]
kintel has joined #openscad
lauraaa has joined #openscad
teepee_ has joined #openscad
teepee has quit [Ping timeout: 255 seconds]
teepee_ is now known as teepee
<JordanBrown[m]> Rather than adding dictionary functionality to your geometry object, note that the caller can always put a geometry object in whatever structure *it* wants to - in a dictionary, next to a dictionary, et cetera.
<JordanBrown[m]> By way of analogy to boolean arithmetic, the obvious operator for intersection is &.
L29Ah has left #openscad [#openscad]
lauraaa has quit [Quit: Client closed]
epony has quit [Quit: QUIT]
J23 has quit [Quit: Client closed]
J23 has joined #openscad
<J23> lauraaa  .. wasn't that the  braille project ?
kintel has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
guso7812 has joined #openscad
<guso7812> jordan, you are right, & is better :)
<guso7812> there are two worlds ... + -, *
<guso7812> and binary: | for union, & for intersection, but what is difference? (and "not" does not have an easy operator)
ur5us has joined #openscad
L29Ah has joined #openscad
L29Ah has left #openscad [#openscad]
Lagopus has quit [Read error: Connection reset by peer]
<JordanBrown[m]> In the Boolean arithmetic world, in C, "not" is "~". While it's conceptually meaningful in geometry - ~cube() is the entire universe *except* the cube - OpenSCAD doesn't currently support that concept. I don't know whether CGAL does.
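The operator mapping being discussed can be sketched with a hypothetical Python node class; the names (`Node`, `cube`, `sphere`) are illustrative only and not the actual python-openscad API from PR #4498. It overloads the Boolean-arithmetic reading: `|` union, `&` intersection, `-` difference, `~` complement.

```python
class Node:
    """Hypothetical CSG node; operators follow the Boolean-arithmetic
    reading: | union, & intersection, - difference, ~ complement."""
    def __init__(self, op, *children):
        self.op, self.children = op, children

    def __or__(self, other):  return Node("union", self, other)
    def __and__(self, other): return Node("intersection", self, other)
    def __sub__(self, other): return Node("difference", self, other)
    def __invert__(self):     return Node("complement", self)

    def __repr__(self):
        if not self.children:
            return self.op
        return f"{self.op}({', '.join(map(repr, self.children))})"

cube, sphere = Node("cube"), Node("sphere")
assert repr(cube | sphere) == "union(cube, sphere)"
assert repr(cube & sphere) == "intersection(cube, sphere)"
assert repr(cube - sphere) == "difference(cube, sphere)"
# a - b can equally be lowered as a & ~b
assert repr(cube & ~sphere) == "intersection(cube, complement(sphere))"
```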
<JordanBrown[m]> https://xkcd.com/2403/
<InPhase> There is at least one extra semantic issue that I identified in that issue post.
<InPhase> I had to look it up to remember what it was.
<InPhase> In retrospect, syntactic children() was probably a design flaw. It sure has caused a lot of downstream inconveniences.
L29Ah has joined #openscad
L29Ah has quit [Read error: Connection reset by peer]
L29Ah has joined #openscad
guso7812 has quit [Ping timeout: 260 seconds]
ur5us has quit [Ping timeout: 255 seconds]
ur5us has joined #openscad
kintel has joined #openscad
<kintel> Considering OpenSCAD was pretty much a proof of concept which got popular before we had time to clean it up, it has done impressively well :)
<kintel> If we can get a multi-language design working nicely, on top of the Node API, a future opportunity could be to design and implement a new OpenSCAD language without killing backwards compatibility (and perhaps mix and match libraries). The Python exploration could be an important crystallization point for building out that infrastructure
J23 has quit [Quit: Client closed]
J23 has joined #openscad
<linext> i thought of a good customizer
<linext> laptop and desktop RAM holder for spare sticks of RAM
<linext> input would be number of pieces for each type
guso78 has quit [Quit: Client closed]
Lagopus has joined #openscad
Teslamax has quit [Read error: Connection reset by peer]
Teslamax has joined #openscad
Lagopus has quit [Read error: Connection reset by peer]
J23 has quit [Quit: Client closed]
J23 has joined #openscad
ur5us has quit [Ping timeout: 256 seconds]
<JordanBrown[m]> InPhase: note that Boolean “not” is not negation. The result is the complement of the input; is it the result of subtracting the input from the universe.
kintel has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
<JordanBrown[m]> Oops is it => it is
<JordanBrown[m]> Union(a, not(a)) is the universe, not the empty set.
<JordanBrown[m]> Difference(a, b) is intersection(a, not(b))
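Both identities check out on a finite stand-in universe; Python frozensets are a convenient conceptual model (points are just labels here, nothing geometric):

```python
# Finite stand-in universe for the Boolean identities above.
U = frozenset(range(8))
a = frozenset({1, 2, 3})
b = frozenset({3, 4})

def complement(s):
    """'not': subtract from the universe, i.e. the complement, not negation."""
    return U - s

assert (a | complement(a)) == U           # union(a, not(a)) is the universe
assert (a - b) == (a & complement(b))     # difference(a,b) == intersection(a, not(b))
```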
la1yv has quit [Ping timeout: 260 seconds]
la1yv has joined #openscad
gunnbr has joined #openscad