<recordgroovy>
Hi, all. Does anyone have experience extending iolib's streams? I want a stream that buffers data and file descriptors together, and then `finish-output` would sendmsg them off in unison.
<beach>
SR-71: Are you doing this for fun? I mean insertion sort has quadratic time complexity.
<beach>
SR-71: Then, your SLICE function adds another linear factor it seems.
<SR-71>
I'm trying to write cleaner CL code.
<beach>
SR-71: Even though it is horribly inefficient?
<SR-71>
I know
<beach>
No, I mean, are you trying to write code that is clean yet still inefficient?
<beach>
It is important in order to determine what advice to give.
<SR-71>
Well, I was using insertion sort on a linked list.
<SR-71>
As an exercise.
<beach>
OK, so then let's start with the simple stuff. (+ <mumble> 1) is better expressed as (1+ <mumble>).
<beach>
It is an example of a general rule in programming where you always want to use the most specific construct that will do the trick.
<beach>
Then, it is a horrible idea to use recursion on a linear structure like a list.
* SR-71
nods.
<beach>
And SLICE does what the built-in SUBSEQ does.
<beach>
So you can remove SLICE.
<beach>
Then, there is no reason to stick a newline after (COND. It just wastes vertical space if the clauses will fit on their respective lines anyway.
<SR-71>
right
<beach>
(= <mumble> 0) is better expressed as (zerop <mumble>)
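Putting those style points together on a made-up snippet (N, ACC, and WALK are hypothetical names, not SR-71's code):

    ;; Before: generic constructs, and a newline wasted after (COND.
    (cond
      ((= n 0) acc)
      (t (walk (+ n 1) acc)))

    ;; After: the most specific constructs, clauses on their own lines.
    (cond ((zerop n) acc)
          (t (walk (1+ n) acc)))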
<beach>
I am not sure what the code is doing, but insertion sort is implemented as a loop over the elements of a list, starting with an empty resulting list. In each iteration, you remove the first element of the original list and insert it in the resulting list in the right place. There is no reason to think about what numeric position an element has.
<beach>
That last one looks like it has the right algorithm, but it violates some style rules.
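A minimal sketch of the loop described above (illustrative only; not any of the pastes under discussion). INSERT reuses the cons cells of the growing result, so consing stays linear even though the time is quadratic:

    (defun insert (element sorted)
      "Destructively insert ELEMENT into the sorted list SORTED.
    Allocates exactly one fresh cons; existing cells are reused."
      (if (or (null sorted) (<= element (first sorted)))
          (cons element sorted)
          (loop for tail on sorted
                while (and (rest tail) (< (second tail) element))
                finally (setf (rest tail) (cons element (rest tail)))
                        (return sorted))))

    (defun insertion-sort (list)
      "Loop over LIST, inserting each element into an initially empty result.
    Quadratic time, as insertion sort always is, but only linear consing."
      (let ((result '()))
        (dolist (element list result)
          (setf result (insert element result)))))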
<SR-71>
I think it'll take quite a while before I would be able to translate procedural implementations into cool lisp code.
<beach>
I think the problem here is the definition of "cool". Insertion sort is already bad with its quadratic performance. So it is already not a "cool" sorting algorithm. Then it looks like your implementation adds another quadratic factor, making it O(n^4), which is totally not "cool". The second link you showed is probably cubic, because it splits the list and then uses APPEND, and that is also not "cool" in my book.
<SR-71>
By cool, I meant efficient and elegant.
<SR-71>
For which, I had been translating classic algorithms into lisp.
<beach>
Then anything beyond quadratic is not "cool".
<SR-71>
lol
<beach>
Would you like me to try to write one for you?
<SR-71>
Cool
<beach>
I am a bit busy today, so I'll work for a few minutes and if I can't do it, it shall have to wait.
<SR-71>
Nice
<beach>
Nah, it shall have to wait. I need to start my chores. Maybe someone else can pitch in.
<beach>
The thing is that it's a bit messy to reuse the CONS cells, which is what you want to do in order to avoid consing.
<SR-71>
Yes
<SR-71>
I'm working on my own *horrible* version of insertion sort.
<SR-71>
Let's see how it goes.
<beach>
SR-71: Try running some benchmarks with lists of length, say 10, 100, and 1000, and see how the running time evolves.
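With the INSERTION-SORT sketch above, the standard TIME macro is enough for a rough version of that benchmark (the sizes and the SBCL SORT baseline are just examples):

    (dolist (n '(10 100 1000 10000))
      (let ((data (loop repeat n collect (random 1000000))))
        (format t "~&n = ~D~%" n)
        ;; COPY-LIST so every run gets fresh, unsorted input.
        (time (insertion-sort (copy-list data)))
        (time (sort (copy-list data) #'<))))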
<SR-71>
I'll be comparing my implementations of Heap sort, merge sort, quick sort ... etc.
<hayley>
If you want "clean", writing the functional/list-comprehension style "quick sort" algorithm is usually interesting.
<hayley>
I wrote a parallel and very monomorphized merge-sort that blew SBCL's SORT out of the water for vectors of small integers. A vectorised sort algorithm would also be interesting, though performance will be in constant factors and not your O().
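A minimal sketch of that functional "quick sort": clean to read, but it conses two fresh lists per recursion level and degrades to quadratic time on already-sorted input:

    (defun quicksort (list)
      "Sort LIST of reals by partitioning around the first element."
      (if (null list)
          '()
          (let ((pivot (first list))
                (rest (rest list)))
            (append
             (quicksort (remove-if-not (lambda (x) (< x pivot)) rest))
             (list pivot)
             (quicksort (remove-if (lambda (x) (< x pivot)) rest))))))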
<SR-71>
Btw, here is my insertion sort implementation.
<hayley>
I bet that measurement is going to be noisy; it's nice to use e.g. the-cost-of-nothing to measure timing. (Anyone interested in taking more statistics? Minimum/maximum? Recording a whole distribution?)
<jackdaniel>
hayley: Shinmera's dissect calculates these things afair
<jackdaniel>
or was it trivial-benchmark? rather the latter
<hayley>
dissect gives backtraces, I think. Probably the latter.
<Shinmera>
trivial-benchmark is in dire need of a rewrite
<Shinmera>
but it does have the stats capture and calculation parts somewhat well separated, so would be easy to nick those parts for a better library :)
<beach>
SR-71: https://dpaste.com/FRPSWYQCB is quadratic, so that's optimal in some ways. But you also do a quadratic number of allocations of CONS cells which is not so great. And of course, both INSERTION-SORT and INSERT are recursive.
<hayley>
Henry Baker also has a paper on a "linear" quicksort, which doesn't take linear time, but doesn't cons (compare to "linear logic").
<beach>
SR-71: But, like I said the other day, no constant-time optimization can fix a quadratic algorithm. Mine takes a bit more than 30 seconds (on my not-so-fast computer) for 10000 elements. That's 75 billion processor cycles.
<beach>
SR-71: An algorithm with O(n log n) complexity should take a few million cycles or so.
<beach>
SR-71: SBCL SORT takes around 50ms and 100 million processor cycles.
<seok->
beach: nope, didn't get information on how GC works
<beach>
Ah. OK, it's a matter of determining what objects are live.
<beach>
So the registers of the processor, including the stack pointer, are considered the "roots", i.e., the objects that the application can reference immediately.
<beach>
Then, the GC follows the objects referenced by the roots, and so on recursively, until it has "traced" every reachable (i.e., "live") object.
<beach>
The remaining objects are then garbage.
<beach>
That's mainly it for the basic idea.
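A sketch of that tracing idea over a modeled heap. OBJECT-MARKED-P, its SETF function, and OBJECT-REFERENCES are hypothetical accessors; a real collector works on raw words, not objects like these:

    (defun mark-live-objects (roots)
      "Mark everything reachable from ROOTS; whatever stays unmarked is garbage."
      (let ((worklist (copy-list roots)))
        (loop until (null worklist)
              do (let ((object (pop worklist)))
                   (unless (object-marked-p object)
                     (setf (object-marked-p object) t)
                     ;; Queue the objects this one references; the explicit
                     ;; worklist replaces recursion and bounds stack usage.
                     (dolist (child (object-references object))
                       (push child worklist)))))))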
<Inline>
immediately dead objects, temporarily/ephemeral dead objects and dead ones for sure...
<beach>
Er, what?
<Inline>
does it mark things eventually alive ?
<beach>
I don't know what that means. It marks every object that the application could possibly reference, but it can't know whether the application will actually do that, because that would be undecidable.
<Inline>
ah
<beach>
seok-: Does that make sense to you?
<beach>
seok-: If not, feel free to ask questions.
<seok->
So do implementations usually have this recursion running on a separate thread every once in a while?
<pjb>
seok-: often it's when you try to allocate memory and there's no more in the heap. But there are parallel GC that run in separate threads,
<pjb>
seok-: and incremental GC that run a little bit each time you allocate memory.
<seok->
woah, gc only runs when there's no more memory?
<beach>
seok-: Some implementations use a single thread, "stop the world", so that the application pauses when the GC runs.
<beach>
seok-: Garbage collectors are notoriously difficult to write and test, so it is not something that is changed lightly. Plus, the garbage collector is not an independent pluggable module. The rest of the system must collaborate with it.
<beach>
seok-: So if you change the garbage collector, it is likely that large parts of the rest of the system must be modified as well.
<seok->
Ugh
<beach>
Indeed.
<seok->
Is that one of the reasons why new CL implementations are being made? - might as well start over?
<beach>
That is one reason I started SICL, yes. But not the only one.
<seok->
I see
<beach>
Also, it is not just about new GC algorithms being invented. It also has to do with processors changing.
<beach>
It used to be the case that there was very little cache memory, and that a memory operation took roughly the same time as a register operation.
<beach>
So copying GC algorithms were invented, because they compact memory nicely.
<beach>
But they are now pretty bad when it comes to memory accesses and cache performance.
<seok->
Ah, I've never thought about cache size increases in modern CPUs changing algorithm efficiency
<beach>
I see. Well, that's a very important aspect. The entire RAM model used in asymptotic complexity assumes that memory operations cost the same as register operations.
<seok->
So it would seem this would apply to not only GC algorithms but various data structure manipulation algorithms too?
<beach>
Exactly!
<seok->
If the current implementations are not up to date with these algorithms, then I can see how a new one written again would be of benefit
<seok->
It would make everything faster
<beach>
seok-: So to relate to your previous question, I invented a new algorithm for generic dispatch, because the one used by PCL is based on hashing with a table kept in memory. My algorithm uses cheaper register comparisons.
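The contrast as a toy (the stamp values and METHOD-FOR-CLASS-* functions are made up; the real algorithms are far more involved):

    ;; Hash-based dispatch in the spirit of PCL: every call consults
    ;; a table kept in memory.
    (defvar *dispatch-table* (make-hash-table))  ; class stamp -> function

    (defun dispatch/hash (stamp argument)
      (funcall (gethash stamp *dispatch-table*) argument))

    ;; Comparison-based dispatch: the stamp is compared against constants
    ;; in the instruction stream, so no table is touched.
    (defun dispatch/compare (stamp argument)
      (if (< stamp 17)
          (if (< stamp 9)
              (method-for-class-a argument)    ; stamps below 9
              (method-for-class-b argument))   ; stamps 9 to 16
          (method-for-class-c argument)))      ; stamps 17 and up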
<seok->
PCL? Practical Common Lisp?
<beach>
Portable Common Loops, a portable CLOS implementation used by many current Common Lisp systems.
<hayley>
beach, seok-: I modified SBCL enough to allow for using a non-moving collector.
<beach>
Another thing that has become important is that conditional branches are expensive if the branch prediction logic of the processor makes the wrong prediction.
<seok->
1992, sheesh we've used that for a while
<beach>
hayley: Good to know.
<seok->
I've heard of this optimization - it is better to write assembly with bitwise operations than conditionals whenever possible
<beach>
seok-: In the past, processor pipelines were short, so taking a branch cost just a cycle or two. Now, if the branch is incorrectly predicted, a deep pipeline may have to be abandoned and recomputed, which might take dozens of cycles.
<beach>
seok-: These are just examples of how processor evolution might change what is the optimal technique to use, so existing Common Lisp implementations may become bad even though they don't change.
<seok->
I see, that was very informative
<seok->
Thank you
<beach>
Pleasure.
<hayley>
seok-: Implementing a concurrent GC is hard. And it's even harder if you need to move concurrently.
<hayley>
But I've been discussing whether we can do a sort of incremental compaction, where we pick a part of the heap to compact in a stop-the-world pause; the part is small enough that copying and fixing up pointers (determined by a concurrent marking pass beforehand) takes no longer than some user-specified target duration.
<beach>
So in fact, just as every good programmer needs to know about compiler design and processor technology, every good programmer needs to know about techniques for memory management. Or else, applications written by that programmer may not be very good in terms of performance.
<hayley>
pjb: I am implementing a parallel GC for SBCL (eventually, I just finished implementing an algorithm that's easier to parallelise). How's that?
<hayley>
I would tell Topre35g that indeed ACL uses a parallel GC.
<pjb>
hayley: great!
<Topre35g>
How does SBCL GC compare to LW / ACL / CCL / ECL?
<jackdaniel>
ecl uses the boehm garbage collector (it can work in either conservative or precise mode); historically it had a homegrown gc
<hayley>
I would assume the ACL GC is the most developed; ECL uses Boehm-Demers-Weiser, and SBCL and CCL use their own moving, single-threaded stop-the-world collectors (though please say if I'm wrong about CCL).
<Topre35g>
Would concurrency help avoid stopping the world?
<hayley>
It would, but implementing a concurrent GC is hard. (Pedantically, many concurrent GCs do stop the world, but for a shorter period of time.)
<Topre35g>
I see
<Topre35g>
What about the venerable JVM - any GC ideas from there that can be used?
<hayley>
Many.
<Topre35g>
Or is it all closed-source and hidden from the world at large
<jcowan>
the bdw gc has been developed since the late 1980s; I would be very slow to say that it can easily be beaten except in specialized circumstances (e.g. a language that has to run in very limited amounts of code space). In particular, it's nearly portable and it's, contra what beach was saying, quite pluggable even if you want precise collection
<jcowan>
also, beach: I think that 1+ and zerop and similar things, which were written in an attempt to optimize common operations in the era of very dumb compilers, are nothing more than a personal choice nowadays: any even slightly reasonable compiler should inline them in the same way. In particular, I find (1- x) very confusing and I never use it: it reads too much like 1 - x whereas it means x - 1.
<beach>
I totally disagree. It is a way to show the intentions to the person reading the code as soon as possible, so that that person does not need to remember to look for more arguments.
<beach>
If I lead someone to think that it had to do with performance, I must have omitted some information.
<beach>
But I won't enter into this discussion, because of what I have said recently about the effect they have on me.
<jcowan>
I recognize the force of both your arguments; my reference was historical only
<hayley>
I've only read the papers on BDW, which might not reflect the current version of BDW. But address-order allocation (and compacting as a backup) are useful for improving mutator locality, and using memory protection for a write barrier isn't ideal (as SBCL runs a bit faster with the write barrier done in software).
<hayley>
On the latter, SBCL just uses card marks to store old->new pointers, but I wonder if one of those "refinement" techniques would be useful, to store a set of addresses rather than cards, when old->new pointers are sparse on cards.
<jackdaniel>
in that case perhaps FIRST, REST and LIST* would better reflect intention of operating on the list than CAR, CDR and CONS
<jcowan>
jackdaniel: Exactly
<jcowan>
And yet.
<Nilby>
One could then make the logical argument that 0-, 3-, and 4- should also exist and that cddadr is very clear, but it fails average human psychological reality.
<jcowan>
I'm glad someone else pointed that out
<jcowan>
my first Lisp implementation, when handed an undefined identifier in operator position, ran a little interpreter on the name which succeeded if it was of the form c[ad]+r
<jcowan>
so you could write not merely cddadr but caddadddaaadaaaaaddddaaaaaaaddddddddddddddr
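A sketch of that little interpreter as ordinary functions (here applied explicitly, rather than hooked into undefined-operator handling):

    (defun cxr-name-p (name)
      "True if NAME (a symbol or string) matches C[AD]+R."
      (let ((s (string name)))
        (and (> (length s) 2)
             (char-equal (char s 0) #\C)
             (char-equal (char s (1- (length s))) #\R)
             (loop for i from 1 below (1- (length s))
                   always (member (char s i) '(#\A #\D)
                                  :test #'char-equal)))))

    (defun apply-cxr (name object)
      "Apply a name like CADDADR to OBJECT, rightmost letter first."
      (assert (cxr-name-p name))
      (let ((s (string name)))
        (loop for i from (- (length s) 2) downto 1
              do (setf object (if (char-equal (char s i) #\A)
                                  (car object)
                                  (cdr object))))
        object))

    ;; (apply-cxr 'caddr '(1 2 3 4)) => 3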
<Nilby>
jcowan: cool. now one could make macros that might let you choose any two unicode characters to use as a cons locator
<pjb>
jcowan: I doubt 1+ zerop etc were written to optimize anything. They were written to be used with HOF, to avoid having to copy-and-paste (lambda (x) (+ x 1)) etc, everywhere.
<jcowan>
to generalize that, we could have a rule that !x is read as (! x), where ! is any emoji
<jcowan>
pjb: you may be right at that
<jcowan>
though if you are ever copying and pasting in any Lisp you are failing to abstract
<jcowan>
that however is not an argument for using them in operator position
<pjb>
jcowan: indeed, and this is why I'm not so insistent, personally, on using 1+ 1- or even zerop, etc in those cases. (= x 0) is as good, (- x 1) is better.
<pjb>
Notably when you have a series of subtractions: (list (- a 0) (- b 1) (- c 2)) vs. (list a (1- b) (- c 2)) ; yuck
<pjb>
All compilers will generate the same code for those two expressions anyways.
<hayley>
If I was a Lisp compiler I would """simply""" generate a vectorised subtraction of a b c - 0 1 2 and then CDR-code the list produced. So I assume SBCL has the same optimisations?
<hayley>
(:
<jcowan>
I think Interlisp and maybe Genera are the only surviving cdr-coding systems.
<hayley>
There's a paper by Henry Baker from 1991 which argues that we should start CDR-coding on stock hardware, to save on memory traffic. Any day now.
<jcowan>
hmm
<hayley>
"Although CDR-coding is currently out of fashion due to not being required by Tak and 8Queens, the inexorable pressure to reduce off-chip bandwidth will force its revival."
<jcowan>
Kernel has both mutable and immutable lists, and I pointed out that a vector can be understood as a list whose cars are mutable and whose cdrs are immutable, which means that you don't have to expose specialized vector or sequence operations
<_death>
how does (cons x (cdr y)) look w/ a vector
<jcowan>
the result isn't a vector any more, that's all
<jcowan>
"vector" being simply a name for "list with guarnateed O(1) get/set behavior"
<White_Flame>
also I wonder what software cdr code tagging would look like, as it would have to be there in addition to the type tags
<seok->
vector is a list?
<jcowan>
you need 2 more bits for the 4 cases of: cdr in the next word, pointer to cdr in the next word, cdr is nil, this is a pointer-to-cdr (to keep the gc happy)
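Those four cases as a toy decoder, with the heap modeled as a vector of (tag . payload) conses; no implementation lays memory out this way, it just makes the cases concrete:

    (defun decode-cdr (heap index)
      "Return the heap index of the cdr of the cons at INDEX, or NIL at list end."
      (let* ((word (aref heap index))
             (tag (car word)))
        (ecase tag
          (0 (1+ index))                    ; cdr lives in the next word
          (1 (cdr (aref heap (1+ index))))  ; next word points to the cdr
          (2 nil)                           ; cdr is NIL: the list ends here
          (3 (cdr word)))))                 ; word is itself a pointer-to-cdr (GC)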
<White_Flame>
right, and I think that would generally blow out the current scheme of using allocation granularity in the address to hide the tags
<jcowan>
there are some spare bits in the middle of the word if you are both NaN-boxing and low type tagging
<jcowan>
you only need 48 bits for an address and you have a 52-bit mantissa to play with
<White_Flame>
but that's only 256TB of addressable memory!
<jcowan>
which is all you get at present anyhow
<White_Flame>
I thought hardware went past 48 bit in the last cpu generation or two
<jcowan>
you could use the top 2 bits of the middle tag and then you could handle up to 1 EB
<hayley>
Well. I was kidding about the CDR coding part. But the way SBCL allocates lists could allow for vectorised setting of elements (if we had a scatter instruction, which might only be an AVX-512 feature).
<jcowan>
I think that even the latest x86-64 and AArch64 systems support only 48 bits of virtual memory, though they can go up to 52 bits of physical memory. Extending virtual memory would require either yet another level of paging tables or increasing the page size
<hayley>
The allocation code allocates space for the entire list at once, which could allow for avoiding some pointer chasing (and I had to fix it to make sure it writes all the bits in my allocation bitmap).
<varjag>
is clocc basically abandoned?
<varjag>
or does it live on under some different name(s)
<pjb>
varjag: IIRC, there was a period when some of it was scavenged into some other library. Perhaps alexandria? But in case of doubt, just check clocc and see if there's anything useful in it.
<varjag>
pjb: yeah, am wondering if i should repackage the union-find bit