#picolisp on 2023-03-02 — irc logs at libera.irclog.whitequark.org

2021-05-27 09:06 beneroth changed the topic of #picolisp to: PicoLisp language | The scalpel of software development | Channel Log: https://libera.irclog.whitequark.org/picolisp | Check www.picolisp.com for more information

00:00 seninha has joined #picolisp

01:18 chexum has quit [Remote host closed the connection]

01:18 chexum has joined #picolisp

01:28 seninha has quit [Quit: Leaving]

02:01 seninha has joined #picolisp

02:43 seninha has quit [Remote host closed the connection]

03:33 chexum has quit [Remote host closed the connection]

03:33 chexum has joined #picolisp

03:41 razzy has joined #picolisp

05:52 razzy has quit [Ping timeout: 252 seconds]

07:35 razzy has joined #picolisp

08:44 chexum_ has joined #picolisp

08:47 chexum has quit [Ping timeout: 255 seconds]

10:45 razzy has quit [Ping timeout: 256 seconds]

10:48 razzy has joined #picolisp

11:59 seninha has joined #picolisp

12:00 seninha has quit [Remote host closed the connection]

12:00 seninha has joined #picolisp

13:56 seninha has quit [Quit: Leaving]

14:31 <fbytez> Are there any options for limiting the number of bytes a call like `(line T)` would read?

14:32 <fbytez> usecase: reading untrustable data from socket up to a delimiter, abused by no delimiter being found.

14:56 <abu[m]> Yes, 'line' takes arguments for that

14:57 <abu[m]> But I think calling 'char' is more flexible

14:58 <abu[m]> (make (do 7 (link (char))))

14:59 <abu[m]> What I said about 'line' is not correct. It always reads a full line

15:00 <abu[m]> (the arguments are just about how the pieces of the line are grouped together)

15:02 <abu[m]> And: Reading till a delimiter is best done with 'till'

15:12 <fbytez> Right, as you say, `(char)` looks the most fitting as `(till)` doesn't have a way to limit it.

15:13 <abu[m]> Right. 'till' only stops at some char or eof.

15:13 <fbytez> Are calls to `(char)` backed by a buffer or more like calling `read(stdin, &ch, 1)` ?

15:14 <abu[m]> For such kind of parsing you might look at @lib/http.l or @lib/xm.l

15:14 <fbytez> OK, thanks.

15:47 <beneroth> fbytez, (rd 'cnt) -> num. "When called with a cnt argument (second form), that number of raw bytes (in big endian format if cnt is positive, otherwise little endian) is read as a single number."

15:48 <beneroth> (rd) without argument, or with a symbol as argument, reads picolisp binary format.

15:48 <fbytez> "... read as a single number". Not sure that would be useful.

15:48 <fbytez> I think I see where you're heading.

15:49 <beneroth> depends on the case. I used it before for parsing binary formats, which had fixed lengths of 4 byte chunks

15:50 <abu[m]> You can also read individual bytes with (rd 1)

15:50 <abu[m]> (make (do 10 (link (rd 1]

15:50 <beneroth> (echo) also takes a 'cnt argument, and does not parse the content except when you give it symbols to look for (as UTF-8 strings).

15:50 <beneroth> (till) stops on NULL byte. echo doesn't.

15:51 <beneroth> but generally NULL byte is considered invalid input (as in most protocols)

15:53 <beneroth> if you read binary stuff, then usually either 1) you need to process it, usually in small chunks, and then decide how to proceed. so (rd) is usable. 2) or you only handle parts of the stream, and other parts you just relay somewhere without looking into it, so (echo) is usable for that.

15:54 <beneroth> "it depends", as always

15:54 <abu[m]> indeed ☺

15:54 <beneroth> fbytez, do you have a specific use case? or just probing what options there would be?

15:55 <fbytez> Sort of both...

15:56 <fbytez> The example in mind is as I described: reading from a server socket.

15:56 <beneroth> well what kind of protocol?

15:57 <beneroth> whats the other end of the server socket, a specific client software? a custom written software implemented in another stack? or another picolisp program (then you should just use picolisp binary protocol and maybe wrap TLS around it)

15:57 <fbytez> For instance, how HTTP headers are separated from the body by "\r\n\r\n".

15:58 <beneroth> HTTP is textual. so read chars.

15:58 <beneroth> you can do it the easy way, which is not really so secure against DoS, and use only (till) and (line)

15:59 <fbytez> Yes, well, what I asked in the beginning was options for limiting how much data is read at a time.

15:59 <beneroth> like in the normal webserver implementation in http.l

15:59 <fbytez> They are both examples of what I would not use.

15:59 <beneroth> or you do read piece by piece, probably using (state) to implement an FSM. that's how I did it at several occasions.

15:59 <fbytez> What would be nice is `(line)` with a byte limit.

16:00 <beneroth> well.. I do not disagree. I would also wish for a way to limit (char) on number of bytes tbh :P

16:01 <fbytez> Isn't `(char)` only reading a single byte anyway?

16:01 <beneroth> you can do it with reading as single bytes first, and then use (input) (in pil21) to turn it into chars..

16:01 <beneroth> (char) is reading one UTF-8 char. so can be 1-4 bytes.

16:01 <beneroth> which you still can handle and check afterwards, if its a valid char. of course.

16:02 <beneroth> the DoS risk is a sender sending a partial multi-byte char and then stopping.

16:02 <fbytez> Oh, right, not so keen on that then; still OK for the most part, though.

16:02 <beneroth> so you want to wrap (abort) with timeout around it
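
A minimal sketch of the timeout idea, assuming input arrives on the current input channel. `abort` gives up on its body after the given number of seconds and returns NIL. The names `timedChars`, `Sec` and `Max` are hypothetical and not from the discussion.

```picolisp
# Read at most Max chars, but give up after Sec seconds.
# Returns the list of chars, or NIL if the timeout hit first.
(de timedChars (Sec Max)
   (let C NIL
      (abort Sec
         (make
            (do Max
               (NIL (setq C (char)))  # Stop at EOF
               (link C) ) ) ) ) )
```

`abort` uses `alarm` internally, so this should not be combined with other code that sets its own alarm.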

16:02 <abu[m]> It is a simple loop with a max check

16:02 <beneroth> what do you mean, abu[m] ?

16:03 <abu[m]> I mean (line) with a max is just a loop around (char)

16:04 <abu[m]> But what to do when the max is reached? Abort the whole transaction?

16:04 <beneroth> it's not a loop around char

16:04 <beneroth> the max would be bytes, not chars

16:05 <beneroth> yes, abort transaction

16:05 <abu[m]> Reading bytes is not helpful for an UTF-8 stream

16:05 <beneroth> no, but to ensure that you keep in Content-Length limits

16:06 <beneroth> if the sender states they will send 100 bytes, then you must not read more than 100 bytes, even though you want to read it as UTF-8 chars
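
One way to stay within a declared length is to read raw bytes with `(rd 1)`, as mentioned earlier in the log, and decode them afterwards. This is only a sketch: `readRaw` and `Len` are hypothetical names, and turning the bytes into UTF-8 characters is not shown.

```picolisp
# Read up to Len raw bytes from the current input channel.
# Returns a list of numbers (0..255), shorter than Len on EOF.
(de readRaw (Len)
   (let B NIL
      (make
         (do Len
            (NIL (setq B (rd 1)))  # NIL means EOF; a zero byte is not NIL
            (link B) ) ) ) )
```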

16:06 <abu[m]> OK, but this is not necessary in Pil, as there is no fixed buffer that might overflow

16:06 <abu[m]> Just read chars till a limit, then abort

16:07 <abu[m]> The length in bytes can be checked with 'size'

16:07 <abu[m]> (sum size ListOfChars)

16:07 <beneroth> yeah but then you can only detect it after you read too much

16:08 <abu[m]> yes, but that's no problem. It is max 4 times the desired size

16:08 <beneroth> in nearly all cases, it's right, you just need to be able to detect the invalid content and then you will stop the whole connection anyway

16:08 <abu[m]> ... as you abort anyway

16:08 <abu[m]> you can also keep track of the size during 'make'

16:09 <beneroth> it might be problematic if only one message/request is invalid but you want to keep the connection open for multiple messages

16:09 <abu[m]> (sum size (made))

16:09 <abu[m]> If you abort you cannot continue well, as the position in the stream is bad

16:09 <beneroth> yes, because of this issue

16:10 <beneroth> if we can say "read one line/read one char, or maximum N bytes" then this issue would not occur

16:10 <beneroth> probably manageable with the new (input) (output) functions. first read binary, then read as chars. I haven't tried them yet.

16:11 <abu[m]> Why not? The rest of the line is still in the stream.

16:11 <abu[m]> I think 'input' does not help

16:11 <abu[m]> it is a logical problem

16:11 <abu[m]> If you abort at some position, you cannot continue anyway

16:11 <beneroth> not with that message, but maybe with the connection

16:12 <abu[m]> You need to close the connection

16:12 <abu[m]> or find the start of the next transaction?

16:12 <beneroth> yes, finding the start of the next transaction.

16:12 <beneroth> which might be a magic byte, but could also be size limited.

16:13 <abu[m]> I think this is not a real world problem

16:13 <abu[m]> if one transaction is bad, it is all bad

16:13 <abu[m]> better close

16:13 <beneroth> e.g. a protocol which first sends a binary header declaring a size of the payload, then the payload, then next header. the payload can contain same bytes/chars as a header. so you need to read by size.

16:14 <abu[m]> Why should it fail? TCP guarantees physical correctness. So the sender must be bad.

16:14 <beneroth> yes, the fault is with the sender. but maybe just with a single message/transaction. not the whole stream.

16:15 <abu[m]> I cannot conceive such a situation

16:15 <abu[m]> but ok

16:15 <beneroth> e.g. when the sender is a proxy getting input from multiple connections or files on a disk, and pipelines it into a single stream to your application?

16:15 <beneroth> granted, kinda specific :)

16:16 <abu[m]> yeah

16:16 <beneroth> or e.g. reading a broken disk

16:16 <beneroth> purposely, because you try to restore files from it.

16:17 <abu[m]> But here we are supposed to have a socket stream if I understood correctly

16:17 <beneroth> doesn't make a difference really for picolisp

16:20 <beneroth> abu[m], (input) and (output) run within the same process, not like pipe, right?

16:20 <fbytez> (make (output (link @@) (in m3u (echo 5))] -- Something like this might work.

16:20 <beneroth> abu[m], just handling the stdin/stdout within pil differently, right?

16:21 <abu[m]> mom, phone

16:21 <fbytez> Obviously, extra handling in `output` and "m3u" and "5" would be whatever.

16:22 <beneroth> yeah

16:23 <beneroth> sprinkle some (state) FSM around it and check every char/byte you can as early as possible for plausibility and you got it secured.

16:24 <beneroth> for HTTP.. well its all text except for chunked payloads in the body

16:24 <beneroth> the very first char must be either "G"et or "P"ost or the first letter of another supported HTTP method

16:25 <beneroth> HTTP headers are case-insensitive keys with values, sometimes values also have a standardized format.

16:25 <beneroth> then ^M^J^M^J and then the body

16:26 <beneroth> the most annoying part is enctype=multipart/form-data

16:26 <beneroth> otherwise HTTP is quite easy :)

16:26 <abu[m]> Grr, got a parcel from Hermes which is not for me

16:27 <beneroth> free gifts

16:27 <beneroth> something useful?

16:27 <abu[m]> I called Hermes and tried to explain, but they insist it was correctly delivered ☺

16:27 <abu[m]> I don't want to open

16:27 <abu[m]> It is from private to private

16:27 <beneroth> I always have a hard time receiving deliveries. nowadays they don't say when they will come, just on day X, but not around which time. and they always happen to come perfectly on lunch time.

16:28 <abu[m]> T

16:28 <beneroth> abu[m], well, then you only have one possible way

16:28 <beneroth> you need to build an X-ray scanner

16:28 <abu[m]> ok, I lost this thread

16:28 <beneroth> :D

16:28 <beneroth> no worries

16:28 <abu[m]> oh, yeah, cool

16:28 <beneroth> more questions, fbytez ?

16:31 <fbytez> I guess I'm just wondering about the buffering underneath. I'll probably just have to do some testing. It might be simpler to just call out to C.

16:31 <abu[m]> Faster, but probably not simpler

16:32 <abu[m]> C is good only if it is absolutely time-critical

16:32 <fbytez> Pretty darn simple just asking linux for a number of bytes.

16:33 <abu[m]> yes, so not C but perhaps (in '(dd ...

16:34 chexum_ has quit [Ping timeout: 255 seconds]

16:34 chexum has joined #picolisp

16:34 <abu[m]> I think the main problem is thinking in bytes while reading chars

16:35 <fbytez> Yeah, I'm beginning to think high level languages don't really suit me.

16:36 <abu[m]> In C we have the same problem

16:36 <abu[m]> Counting bytes is not in sync with characters

16:37 <fbytez> I don't see that as an issue. If I can handle the bytes, I can handle the characters / encoding.

16:37 <abu[m]> yes, of course, but it is tedious

16:38 <fbytez> It's also tedious fighting against abstractions. Swings and roundabouts.

16:38 <abu[m]> True, low-level is often much better

16:39 <abu[m]> it is too easy to over-abstract everything. See Java.

16:39 <fbytez> Indeed. It's the only reason I ever encountered C and assembly: trying to get underneath.

16:43 <beneroth> maybe try out to split your tasks in small C binaries, and orchestrate them with picolisp?

16:43 <fbytez> It's a pleasure talking with both of you, by the way. Thanks.

16:43 <beneroth> you're welcome :)

16:44 <abu[m]> Let me post a simple loop

16:44 <abu[m]> http://pb1n.de/?fb17e6

16:45 <abu[m]> This reads max 7 chars or until EOF or "e" is encountered

16:45 <abu[m]> Instead of a count, you could also accumulate the 'size's and check it as a byte count

16:46 <abu[m]> So the overrun would be maximally 3 bytes (if a 4-byte char appears right at the limit)

16:48 <abu[m]> http://pb1n.de/?21f1ac
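
The pastes are not reproduced in this log. The following is a sketch of the kind of loop described, not abu's actual code: it reads chars until EOF, a delimiter, or a byte budget is reached, accumulating `size` per char as suggested. The names `readBytes`, `Max` and `Delim` are hypothetical.

```picolisp
# Read chars from the current input channel until EOF, until
# Delim is seen, or until at least Max bytes were consumed.
# The delimiter is consumed but not returned.
(de readBytes (Max Delim)
   (let (C NIL  N 0)
      (make
         (loop
            (T (>= N Max))            # Byte budget used up
            (NIL (setq C (char)))     # EOF
            (T (= C Delim))           # Delimiter found
            (link C)
            (inc 'N (size C)) ) ) ) ) # Count UTF-8 bytes, not chars
```

Because the check happens per char, the result can exceed `Max` by up to 3 bytes when a 4-byte char starts just below the limit. The sketch does not tell the caller which condition ended the read; a real handler would need that to decide whether to close the connection.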

17:00 <beneroth> right. nice code.

17:20 razzy has quit [Ping timeout: 252 seconds]

17:22 razzy has joined #picolisp

17:28 <beneroth> abu[m], I've found notes of a previous similar discussion in august 2020-08-26 between you and aw- https://freenode.irclog.whitequark.org/picolisp/2020-08-26

17:28 <beneroth> (was still freenode back then)

17:29 <beneroth> the pastes are gone.. I've noted the following down, I guess that was the last paste...

17:31 <beneroth> http://pb1n.de/?204b79

17:31 <abu[m]> Gone with the wind. Good that we now use tankfeeder's pastes

17:31 <beneroth> pb1n.de is from tankfeeder?

17:32 <abu[m]> yes, he wrote and hosts it

17:32 <beneroth> nice

17:32 <abu[m]> (written in PicoLisp I think)

17:33 <beneroth> aw- made first a version using a pipe, similar to how I did it sometimes I think

17:33 <beneroth> and your optimized version looks at the UTF-8 encoding I think

17:34 chexum has quit [Remote host closed the connection]

17:34 chexum has joined #picolisp

17:35 <abu[m]> Interesting. I don't remember at all ;)

17:36 <abu[m]> Dementia praecox

17:41 seninha has joined #picolisp

17:42 <beneroth> I have random long-term memory. but I also didn't remember, I just found my notes :P

17:43 <abu[m]> 👍

18:04 razzy has quit [Ping timeout: 256 seconds]