teepee changed the topic of #openscad to: OpenSCAD - The Programmers Solid 3D CAD Modeller | This channel is logged! | Website: http://www.openscad.org/ | FAQ: https://goo.gl/pcT7y3 | Request features / report bugs: https://goo.gl/lj0JRI | Tutorial: https://bit.ly/37P6z0B | Books: https://bit.ly/3xlLcQq | FOSDEM 2020: https://bit.ly/35xZGy6 | Logs: https://bit.ly/32MfbH5
<JordanBrown[m]> I thought that most good malloc implementations these days managed caches of small allocations, so that they are very fast.
<JordanBrown[m]> However, in reviewing the documentation on mimalloc, it seems that it is not one of those...
guso78 has quit [Quit: Client closed]
teepee has quit [Ping timeout: 255 seconds]
teepee has joined #openscad
<InPhase> JordanBrown[m]: There's not much they can do to improve it really. A little bit, but not a lot. A cache miss itself is easily on the same scale. It'll be tricky to keep your whole disorganized bucket of 8 byte slots and the equally sized ring buffer of free 8 byte slots in cache. This is going to be a much more spread out pile of memory than the stack.
<InPhase> SBO will win over this by a lot. Even if you end up with your SBO objects on the heap in a vector or something, they are still sequential and cacheable.
<InPhase> The key performance requirement is active data locality.
<InPhase> If on the other hand you try to keep the active 8 byte allocations local with the allocator, then you have a complicated search algorithm for free space, and you're back to high overhead for short allocate/deallocate cycles.
<InPhase> Data locality really needs to be planned into the program flow, or result somewhat automatically from the program flow, to work well.
<InPhase> An easy allocation optimization example: You're working with 3D vectors, so try to handle all 3 at once, and allocate as a single chunk. The allocation for 3D vectors would then have 1/3rd of the overhead if you can keep the same general calculation requirement while doing this.
<InPhase> But that requires breaking apart the abstraction.
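The one-chunk idea above can be sketched conceptually in Python: an `array` gives a single contiguous buffer for all three components, where a list of separately boxed floats gives three independent heap objects with no locality guarantee. This is an illustrative stand-in, not OpenSCAD or GMP code.

```python
from array import array

# Three components boxed separately: each float is its own heap object,
# reachable only through a pointer, with no locality guarantee.
boxed = [float(v) for v in (1.0, 2.0, 3.0)]

# The same 3D vector as one chunk: a single contiguous 24-byte buffer,
# one allocation instead of three.
chunk = array('d', (1.0, 2.0, 3.0))

addr, count = chunk.buffer_info()
assert count == 3 and chunk.itemsize == 8   # three 8-byte slots...
assert len(bytes(chunk)) == 24              # ...back to back in one buffer
assert list(chunk) == boxed                 # same values, one allocation
```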
ur5us has joined #openscad
kintel has joined #openscad
Guest5498 is now known as Sauvin
kintel has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
LordOfBikes has quit [Ping timeout: 256 seconds]
SamantazFox has quit [Ping timeout: 246 seconds]
SamantazFox has joined #openscad
SamantazFox has quit [Killed (NickServ (GHOST command used by SamantazFox_))]
SamantazFox_ has joined #openscad
LordOfBikes has joined #openscad
ur5us has quit [Ping timeout: 260 seconds]
Teslamax has quit [Read error: Connection reset by peer]
Teslamax has joined #openscad
<peepsalot> InPhase: the data structure is documented here btw in case you were curious: https://gmplib.org/manual/Integer-Internals
<peepsalot> just a struct with 3 members
<JordanBrown[m]> InPhase: Sorry, the performance hit that I was concerned about was the 5 billion allocations of 8 bytes each. With a cache (not in the RAM sense) of 8-byte allocations, a malloc could normally cost as little as a few dozen instructions, and a free similarly cheap.
<InPhase> JordanBrown[m]: Yes. I meant that if you did that, and you could, the implementation would necessarily over the execution of a complicated program end up with those allocated 8 bytes being spread almost randomly over a very wide section of memory reserved for these 8 byte allocations.
<InPhase> JordanBrown[m]: So if you have a vector of pointers to 8 byte chunks spread randomly all over a gigabyte of RAM, this is going to massively underperform a vector of SBO'd 8 byte data, even if it's SBO'd into 64 byte blocks.
<InPhase> And the performance difference is going to be similar to, or maybe even worse, than just using a classic allocator.
<JordanBrown[m]> Yes… not allocating is better than allocating. But of course it requires the cooperation of the caller. All I’m saying is that large numbers of small allocations need not be very expensive.
<InPhase> The standard Linux allocator makes an effort at locality. It clumps regions of memory by size ranges, and has a spot for small stuff and a spot for large stuff, and then it tries to reuse the most recently deallocated small stuff pointer if it can fit a new allocation into the space following it.
<JordanBrown[m]> This is the one I’m used to: https://en.wikipedia.org/wiki/Libumem
<JordanBrown[m]> I’m not familiar with the details - not my specialty - but it was written by some pretty smart people.
<InPhase> Ah yes. Looking at that reminds me of one of the other core issues I completely forgot about.
<JordanBrown[m]> This one is better but requires SPARC hardware features: https://docs.oracle.com/cd/E86824_01/html/E54772/libadimalloc-3lib.html
<InPhase> I clicked through to the Oracle docs, which point out that many of the features break in threaded programs.
<peepsalot> i think on x86_64, sizeof(mpz_t) is 16: 4+8+4, so that would leave 48 bytes of cacheline, or 24 bytes of each numerator and denominator
<InPhase> Locking to allocate memory is part of the issue. You basically have to preallocate memory chunks for each thread to get around the locking.
<JordanBrown[m]> Hmm I will look. We certainly use it and it’s relatives in very highly threaded programs.
<InPhase> And then you cannot pass data between threads if you do that.
<JordanBrown[m]> Oops it’s
<InPhase> I was looking there.
<InPhase> There are the different call options.
<JordanBrown[m]> Argh DYAC its
<InPhase> And this conversation in fact started with a discussion of multithreaded CGAL. :)
<JordanBrown[m]> Where are you seeing a MT limitation?
<InPhase> "have no support for concurrency"
<JordanBrown[m]> Um that is a comparison with other libraries. Libumem is the one that does have MT support.
<JordanBrown[m]> Could be written more clearly.
<JordanBrown[m]> Solaris has an appalling number of implementations of malloc().
<InPhase> Oh, I did not understand that. I thought it was multiple calls the library provided.
<JordanBrown[m]> I am basically certain that that statement is intended to compare the libc version of malloc with the libumem functions, including libumem’s malloc.
<JordanBrown[m]> Like I said, we use it in very large MT programs.
<JordanBrown[m]> Notably, libumem is a user space port of the allocator used in the kernel.
<JordanBrown[m]> I don’t know offhand whether it shares sources, but I think it might.
<InPhase> I don't know what 3C and 3MALLOC are supposed to stand for.
<JordanBrown[m]> Manual page sections.
<InPhase> Although I haven't seen a non-thread-safe standard malloc in a very long time.
<InPhase> Outside of non-threading systems, that is.
snaked has quit [Ping timeout: 256 seconds]
<JordanBrown[m]> Yeah, I am kind of surprised by that. But libumem is enough better than the libc one that I think the only reason anybody uses the libc one is that they haven’t remembered to -lumem.
<InPhase> The standard Linux allocator does in fact waste space sometimes, under the premise that runtime and cache locality are more favorable aims than optimal packing.
<JordanBrown[m]> Hmm the libc malloc says that it is MT safe. Maybe the comparison is intended to describe that libumem is optimized for MT; I believe it tries to avoid needing locks most of the time, or at least to spread the lock usage across multiple locks to avoid contention.
<InPhase> They have an "mtmalloc" listed there, which is perhaps closer to the standard malloc everybody else has been using for a while? :)
epony has joined #openscad
<JordanBrown[m]> Reading that doc, I think mtmalloc is a malloc derivative that does better in MT environments, perhaps by reducing lock contention or by avoiding cache contention.
<InPhase> I don't know the platform though. I haven't touched a Solaris system in 21 years. I actually thought they stopped releasing it, but wikipedia says it's still active!
<JordanBrown[m]> Still pays for my mortgage.
<InPhase> One of my college-years jobs was for the CS department, to spruce up the new Linux lab so that people would migrate over there and stop using the Sun lab.
<JordanBrown[m]> https://github.com/illumos/illumos-gate/blob/9af60fb077770c1324d6f7501d6081f02c1d36c1/usr/src/lib/libc/port/gen/malloc.c#L163 is the standard malloc, at least as of ten years ago or so. Nobody has been working on it so my bet is it is identical to current.
<InPhase> So I setup a little system to standardize the configurations and package deployments across all the Linux systems, and dust started growing on the Sun machines.
<JordanBrown[m]> Note the big global lock around malloc.
<InPhase> Yeah, it's pretty mandatory to have that lock.
<JordanBrown[m]> Libumem is in some fashion - again, not my specialty - much cleverer. When they say “concurrent allocations”, they mean that two threads can be doing allocations at the same time.
<JordanBrown[m]> It is not merely MT safe; it is MT hot.
<peepsalot> JordanBrown[m]: by the way adding mimalloc resulted in something like 40-80% speedup of nef operations, so it's not like mimalloc is slow by any means
<peepsalot> i compared it against every other malloc replacement option I could find and get running at the time, and it was fastest (with reasonably efficient mem usage)
<InPhase> peepsalot: Ah, I see mimalloc describes having a lot of separately managed memory blocks so that cross-thread allocations and frees don't block on small stuff.
<InPhase> You would need thousands of threads trying to allocate at once to get significant collisions.
<peepsalot> yeah, it does some kinda thread local allocations as i understand
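The per-thread-cache idea being described here (mimalloc's per-thread heaps, and libumem's per-CPU magazines mentioned earlier) can be sketched as a toy slot pool: the hot alloc/free path touches only a thread-local free list, and the shared lock is taken only on refill. This is a conceptual illustration only, not how either allocator is actually implemented.

```python
import threading

class PerThreadPool:
    """Toy slot allocator: the common alloc/free path touches only a
    thread-local free list; the shared lock is taken only to refill."""
    REFILL = 32

    def __init__(self, nslots=1024):
        self._shared = list(range(nslots))   # stand-in for a page of free slots
        self._lock = threading.Lock()        # guards only the shared pool
        self._tls = threading.local()

    def alloc(self):
        cache = getattr(self._tls, "free", None)
        if not cache:                        # rare path: refill under the lock
            with self._lock:
                take = min(self.REFILL, len(self._shared))
                self._tls.free = cache = [self._shared.pop() for _ in range(take)]
        return cache.pop()                   # common path: no lock at all

    def free(self, slot):
        if not hasattr(self._tls, "free"):
            self._tls.free = []
        self._tls.free.append(slot)          # common path: no lock at all

pool = PerThreadPool()
s = pool.alloc()
pool.free(s)
assert pool.alloc() == s                     # freed slot is reused immediately
```

Passing data between threads is exactly where this gets hard, as noted above: a slot freed on a thread other than its allocator has to flow back somehow, which is where mimalloc's deferred cross-thread free lists come in.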
<InPhase> JordanBrown[m]: From a quick skim, libumem sounds similar but with some sort of elaborate tree structure.
<JordanBrown[m]> I have never really tried to understand it, but I know it caches allocated blocks by size.
<peepsalot> if gmp weren't limited to C, for example if it had accessor functions instead of PODs, then I think a lot of the struct overhead for SBO could be avoided
<peepsalot> and I made a mistake earlier, even if it's 16 bytes for mpz_t, we would need two of them for a rational, which would leave only 32 bytes for the limb data, or 16 bytes for each numerator/denominator (not 24)
<peepsalot> i mean, i'm not sure how crucial the whole cacheline thing is, it could be bigger still. (eg I think one of the tricks fmt library does is actually SSO of ~500 bytes)
<peepsalot> but would also potentially lead to very poor utilization, extra high memory usage for storing geometries
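The corrected arithmetic above, written out (a 64-byte cacheline is assumed; the 16-byte `sizeof(mpz_t)` figure is peepsalot's x86_64 estimate for the three-member struct):

```python
CACHELINE = 64                 # typical x86_64 cacheline, in bytes
SIZEOF_MPZ = 16                # struct of two 4-byte ints + one 8-byte limb pointer

rational_headers = 2 * SIZEOF_MPZ          # numerator + denominator structs
inline_budget = CACHELINE - rational_headers
assert rational_headers == 32
assert inline_budget == 32                 # bytes left for inline limb data
assert inline_budget // 2 == 16            # per component: 16 bytes = two 64-bit limbs
```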
<peepsalot> i wish someone would make a good c++ multiprecision library already
<peepsalot> i think the memory management aspect could be done a lot smarter, at a low level
use-value has quit [Quit: use-value]
snaked has joined #openscad
Teslamax has quit [Read error: Connection reset by peer]
Teslamax has joined #openscad
Teslamax has quit [Read error: Connection reset by peer]
Teslamax has joined #openscad
ur5us has joined #openscad
guso78 has joined #openscad
GNUmoon has quit [Remote host closed the connection]
GNUmoon has joined #openscad
use-value has joined #openscad
use-value has quit [Client Quit]
teepee_ has joined #openscad
teepee has quit [Ping timeout: 255 seconds]
teepee_ is now known as teepee
use-value has joined #openscad
castaway has joined #openscad
xxpolitf has joined #openscad
SamantazFox has joined #openscad
ur5us has quit [Ping timeout: 256 seconds]
SamantazFox_ has quit [Ping timeout: 252 seconds]
GNUmoon has quit [Remote host closed the connection]
GNUmoon has joined #openscad
<greenbigfrog> printing the threaded parts at 0.15 layer height made them work so much better. but I just realized, if everything else works with magnets here, why not make the lid use magnets as well
<guso78> do you use the magnets with the printer or with the thread ?
be7b5 has joined #openscad
be7b5 has quit [Client Quit]
guso7884 has joined #openscad
guso7884 has quit [Ping timeout: 260 seconds]
<greenbigfrog> guso78: not sure I follow.
ccox has joined #openscad
ccox_ has quit [Ping timeout: 256 seconds]
<guso78> greenbigfrog, where do you use the magnet ? which lid ?
<guso78> in "python-openscad" i could actually store two nodes (one additional negative node, which has negative volume by default and can be created by an "invert" function). the final output function will calculate the difference between the positive and the negative node before output.
<guso78> another feature which i am investigating is to attach a python dictionary to each openscad object, accessible with the [] operator, to store arbitrary data together with each node. one idea is to store named coordinates of a cube, a cylinder ... to access them later for relative positioning.
<greenbigfrog> oh, sorry, https://github.com/greenbigfrog/ultimate-ptfe-coupling was getting help on the threads the last days, hence I shared here.
J23 has joined #openscad
snaked has quit [Quit: Leaving]
aiyion has quit [Ping timeout: 255 seconds]
TheCoffeMaker has quit [Ping timeout: 252 seconds]
TheCoffeMaker has joined #openscad
aiyion has joined #openscad
use-value has quit [Remote host closed the connection]
use-value has joined #openscad
qeed has joined #openscad
<gbruno> [github] gsohler synchronize pull request #4498 (OpenSCAD with a Python Engine) https://github.com/openscad/openscad/pull/4498
<guso78> next could be "+" for union and "*" for intersection IMHO
<guso78> "-" for difference
J23 has quit [Quit: Client closed]
J23 has joined #openscad
J23 has quit [Client Quit]
J23 has joined #openscad
J23 has quit [Client Quit]
J23 has joined #openscad
<guso78> output(cube() * sphere() )
<gbruno> [github] gsohler synchronize pull request #4498 (OpenSCAD with a Python Engine) https://github.com/openscad/openscad/pull/4498
guso78 has quit [Quit: Client closed]
xxpolitf has quit [Quit: leaving]
J23 has quit [Quit: Client closed]
J23 has joined #openscad
guso78 has joined #openscad
lauraaa has joined #openscad
<lauraaa> what's up guys, its been a while
<lauraaa> how are you doing?
use-value has quit [Remote host closed the connection]
use-value has joined #openscad
lauraaa has quit [Quit: Client closed]
kintel has joined #openscad
lauraaa has joined #openscad
teepee_ has joined #openscad
teepee has quit [Ping timeout: 255 seconds]
teepee_ is now known as teepee
<JordanBrown[m]> Rather than adding dictionary functionality to your geometry object, note that the caller can always put a geometry object in whatever structure *it* wants to - in a dictionary, next to a dictionary, et cetera.
<JordanBrown[m]> By way of analogy to boolean arithmetic, the obvious operator for intersection is &.
L29Ah has left #openscad [#openscad]
lauraaa has quit [Quit: Client closed]
epony has quit [Quit: QUIT]
J23 has quit [Quit: Client closed]
J23 has joined #openscad
<J23> lauraaa  .. wasn't that the  braille project ?
kintel has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
guso7812 has joined #openscad
<guso7812> jordan, you are right, & is better :)
<guso7812> there are two worlds ... + -, *
<guso7812> and binary: | for union, & for intersection, but what is difference? (and "not" does not have an easy operator)
ur5us has joined #openscad
L29Ah has joined #openscad
L29Ah has left #openscad [#openscad]
Lagopus has quit [Read error: Connection reset by peer]
<JordanBrown[m]> In the Boolean arithmetic world, in C, "not" is "~". While it's conceptually meaningful in geometry - ~cube() is the entire universe *except* the cube - OpenSCAD doesn't currently support that concept. I don't know whether CGAL does.
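The operator mapping being discussed can be sketched with a hypothetical Python node class; the names (`Node`, `cube`, `sphere`) are illustrative only and not the actual python-openscad API from PR #4498. It overloads the Boolean-arithmetic reading: `|` union, `&` intersection, `-` difference, `~` complement.

```python
class Node:
    """Hypothetical CSG node; operators follow the Boolean-arithmetic
    reading: | union, & intersection, - difference, ~ complement."""
    def __init__(self, op, *children):
        self.op, self.children = op, children

    def __or__(self, other):  return Node("union", self, other)
    def __and__(self, other): return Node("intersection", self, other)
    def __sub__(self, other): return Node("difference", self, other)
    def __invert__(self):     return Node("complement", self)

    def __repr__(self):
        if not self.children:
            return self.op
        return f"{self.op}({', '.join(map(repr, self.children))})"

cube, sphere = Node("cube"), Node("sphere")
assert repr(cube | sphere) == "union(cube, sphere)"
assert repr(cube & sphere) == "intersection(cube, sphere)"
assert repr(cube - sphere) == "difference(cube, sphere)"
# a - b can equally be lowered as a & ~b
assert repr(cube & ~sphere) == "intersection(cube, complement(sphere))"
```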
<JordanBrown[m]> https://xkcd.com/2403/
<InPhase> There is at least one extra semantic issue that I identified in that issue post.
<InPhase> I had to look it up to remember what it was.
<InPhase> In retrospect, syntactic children() was probably a design flaw. It sure has caused a lot of downstream inconveniences.
L29Ah has joined #openscad
L29Ah has quit [Read error: Connection reset by peer]
L29Ah has joined #openscad
guso7812 has quit [Ping timeout: 260 seconds]
ur5us has quit [Ping timeout: 255 seconds]
ur5us has joined #openscad
kintel has joined #openscad
<kintel> Considering OpenSCAD was pretty much a proof of concept which got popular before we had time to clean it up, it has done impressively well :)
<kintel> If we can get a multi-language design working nicely, on top of the Node API, a future opportunity could be to design and implement a new OpenSCAD language without killing backwards compatibility (and perhaps mix and match libraries). The Python exploration could be an important crystallization point for building out that infrastructure
J23 has quit [Quit: Client closed]
J23 has joined #openscad
<linext> i thought of a good customizer
<linext> laptop and desktop RAM holder for spare sticks of RAM
<linext> input would be number of pieces for each type
guso78 has quit [Quit: Client closed]
Lagopus has joined #openscad
Teslamax has quit [Read error: Connection reset by peer]
Teslamax has joined #openscad
Lagopus has quit [Read error: Connection reset by peer]
J23 has quit [Quit: Client closed]
J23 has joined #openscad
ur5us has quit [Ping timeout: 256 seconds]
<JordanBrown[m]> InPhase: note that Boolean “not” is not negation. The result is the complement of the input; is it the result of subtracting the input from the universe.
kintel has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
<JordanBrown[m]> Oops is it => it is
<JordanBrown[m]> Union(a, not(a)) is the universe, not the empty set.
<JordanBrown[m]> Difference(a, b) is intersection(a, not(b))
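Both identities check out on a finite stand-in universe; Python frozensets are a convenient conceptual model (points are just labels here, nothing geometric):

```python
# Finite stand-in universe for the Boolean identities above.
U = frozenset(range(8))
a = frozenset({1, 2, 3})
b = frozenset({3, 4})

def complement(s):
    """'not': subtract from the universe, i.e. the complement, not negation."""
    return U - s

assert (a | complement(a)) == U           # union(a, not(a)) is the universe
assert (a - b) == (a & complement(b))     # difference(a,b) == intersection(a, not(b))
```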
la1yv has quit [Ping timeout: 260 seconds]
la1yv has joined #openscad
gunnbr has joined #openscad