beneroth changed the topic of #picolisp to: PicoLisp language | The scalpel of software development | Channel Log: https://libera.irclog.whitequark.org/picolisp | Check www.picolisp.com for more information
seninha has joined #picolisp
chexum has quit [Remote host closed the connection]
chexum has joined #picolisp
seninha has quit [Quit: Leaving]
seninha has joined #picolisp
seninha has quit [Remote host closed the connection]
chexum has quit [Remote host closed the connection]
chexum has joined #picolisp
razzy has joined #picolisp
razzy has quit [Ping timeout: 252 seconds]
razzy has joined #picolisp
chexum_ has joined #picolisp
chexum has quit [Ping timeout: 255 seconds]
razzy has quit [Ping timeout: 256 seconds]
razzy has joined #picolisp
seninha has joined #picolisp
seninha has quit [Remote host closed the connection]
seninha has joined #picolisp
seninha has quit [Quit: Leaving]
<fbytez> Are there any options for limiting the number of bytes a call like `(line T)` would read?
<fbytez> usecase: reading untrustable data from socket up to a delimiter, abused by no delimiter being found.
<abu[m]> Yes, 'line' takes arguments for that
<abu[m]> But I think calling 'char' is more flexible
<abu[m]> (make (do 7 (link (char))))
<abu[m]> What I said about 'line' is not correct. It always reads a full line
<abu[m]> (the arguments are just about how the pieces of the line are grouped together)
<abu[m]> And: Reading till a delimiter is best done with 'till'
<fbytez> Right, as you say, `(char)` looks the most fitting as `(till)` doesn't have a way to limit it.
<abu[m]> Right. 'till' only stops at some char or eof.
<fbytez> Are calls to `(char)` backed by a buffer or more like calling `read(stdin, &ch, 1)` ?
<abu[m]> For such kind of parsing you might look at @lib/http.l or @lib/xm.l
<fbytez> OK, thanks.
<beneroth> fbytez, (rd 'cnt) -> num. "When called with a cnt argument (second form), that number of raw bytes (in big endian format if cnt is positive, otherwise little endian) is read as a single number."
<beneroth> (rd) without argument, or with a symbol as argument, reads picolisp binary format.
<fbytez> "... read as a single number". Not sure that would be useful.
<fbytez> I think I see where you're heading.
<beneroth> depends on the case. I used it before for parsing binary formats, which had fixed lengths of 4 byte chunks
<abu[m]> You can also read individual bytes with (rd 1)
<abu[m]> (make (do 10 (link (rd 1]
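[editor's sketch] The `(rd 1)` loop above can be extended into a byte-limited "read up to a delimiter", which is what fbytez originally asked for. This is only a minimal sketch: the helper name `readBytes` and its parameters `Max` and `Delim` are made up for illustration and are not part of PicoLisp or of anything posted in the channel.

```picolisp
# Hypothetical helper (not a built-in): read raw bytes from the current
# input channel until the byte value 'Delim' is seen, EOF is reached,
# or 'Max' bytes have been consumed. Returns the bytes as a list of
# numbers; the delimiter itself is consumed but not returned.
(de readBytes (Max Delim)
   (let B NIL
      (make
         (do Max
            (NIL (setq B (rd 1)))   # NIL from 'rd' means end of file
            (T (= Delim B))         # stop at the delimiter byte
            (link B) ) ) ) )
```

For example, `(in Sock (readBytes 4096 10))` would read at most 4096 bytes up to a linefeed, with `Sock` standing for whatever descriptor the application holds. The caller cannot tell from the result alone whether the delimiter was found or the limit was hit, so a real version would probably return a flag as well.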
<beneroth> (echo) also takes a 'cnt argument, and does not parse the content except when you give it symbols to look for (as UTF-8 strings).
<beneroth> (till) stops on NULL byte. echo doesn't.
<beneroth> but generally NULL byte is considered invalid input (as in most protocols)
<beneroth> if you read binary stuff, then usually either 1) you need to process it, usually in small chunks, and then decide how to proceed. so (rd) is usable. 2) or you only handle parts of the stream, and other parts you just relay somewhere without looking into it, so (echo) is usable for that.
<beneroth> "it depends", as always
<abu[m]> indeed ☺
<beneroth> fbytez, do you have a specific use case? or just probing what options there would be?
<fbytez> Sort of both...
<fbytez> The example in mind is as I described: reading from a server socket.
<beneroth> well what kind of protocol?
<beneroth> whats the other end of the server socket, a specific client software? a custom written software implemented in another stack? or another picolisp program (then you should just use picolisp binary protocol and maybe wrap TLS around it)
<fbytez> For instance, how HTTP headers are separated from the body by "\r\n\r\n".
<beneroth> HTTP is textual. so read chars.
<beneroth> you can do it the easy way, which is not really secure against DoS, and use only (till) and (line)
<fbytez> Yes, well, what I asked in the beginning was options for limiting how much data is read at a time.
<beneroth> like in the normal webserver implementation in http.l
<fbytez> They are both examples of what I would not use.
<beneroth> or you do read piece by piece, probably using (state) to implement an FSM. that's how I did it at several occasions.
<fbytez> What would be nice is `(line)` with a byte limit.
<beneroth> well.. I do not disagree. I would also wish for a way to limit (char) on number of bytes tbh :P
<fbytez> Isn't `(char)` only reading a single byte anyway?
<beneroth> you can do it with reading as single bytes first, and then use (input) (in pil21) to turn it into chars..
<beneroth> (char) is reading one UTF-8 char. so can be 1-4 bytes.
<beneroth> which you still can handle and check afterwards, if its a valid char. of course.
<beneroth> the DoS risk is a sender sending a partial multi-byte char and then stopping.
<fbytez> Oh, right, not so keen on that then; still OK for the most part, though.
<beneroth> so you want to wrap (abort) with timeout around it
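[editor's sketch] Wrapping a read in `abort`, as suggested here, might look like the following. `abort` is a PicoLisp built-in that returns NIL when its body takes longer than the given number of seconds; the wrapper name `lineOrTimeout` is hypothetical.

```picolisp
# Hypothetical wrapper: read one line as a packed string, but give up
# and return NIL if it has not completed within 'Sec' seconds.
(de lineOrTimeout (Sec)
   (abort Sec (line T)) )
```

This only bounds the time spent waiting. A sender that keeps delivering data quickly without ever sending a newline is not stopped by it, so it complements a size limit rather than replacing one.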
<abu[m]> It is a simple loop with a max check
<beneroth> what do you mean, abu[m] ?
<abu[m]> I mean (line) with a max is just a loop around (char)
<abu[m]> But what to do when the max is reached? Abort the whole transaction?
<beneroth> it's not a loop around char
<beneroth> the max would be bytes, not chars
<beneroth> yes, abort transaction
<abu[m]> Reading bytes is not helpful for an UTF-8 stream
<beneroth> no, but to ensure that you keep in Content-Length limits
<beneroth> if the sender states they will send 100 bytes, then you must not read more than 100 bytes, even though you want to read it as UTF-8 chars
<abu[m]> OK, but this is not necessary in Pil, as there is no fixed buffer that might overflow
<abu[m]> Just read chars till a limit, then abort
<abu[m]> The length in bytes can be checked with 'size'
<abu[m]> (sum size ListOfChars)
<beneroth> yeah but then you can only detect it after you read too much
<abu[m]> yes, but that's no problem. It is max 4 times the desired size
<beneroth> in nearly all cases that's right, you just need to be able to detect the invalid content and then you will stop the whole connection anyway
<abu[m]> ... as you abort anyway
<abu[m]> you can also keep track of the size during 'make'
<beneroth> it might be problematic if only one message/request is invalid but you want to keep the connection open for multiple messages
<abu[m]> (sum size (made))
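[editor's sketch] One way to read the "keep track of the size during 'make'" idea is shown below. Instead of re-summing `(made)` on every step it keeps a running byte counter, which is a small variation on what was posted. The function name `readCharsMaxBytes` is invented for this example.

```picolisp
# Hypothetical helper: read UTF-8 characters until at least 'MaxBytes'
# bytes have been consumed or EOF is reached. 'size' gives the number
# of bytes in a character's UTF-8 encoding, so the overrun is at most
# 3 bytes (a 4-byte character starting just below the limit).
(de readCharsMaxBytes (MaxBytes)
   (let (N 0  C NIL)
      (make
         (loop
            (T (>= N MaxBytes))       # byte budget used up
            (NIL (setq C (char)))     # end of file
            (link C)
            (inc 'N (size C)) ) ) ) )
```

The budget check comes before the read so that no further `char` call is made once the limit is reached.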
<abu[m]> If you abort you cannot continue well, as the position in the stream is bad
<beneroth> yes, because of this issue
<beneroth> if we can say "read one line/read one char, or maximum N bytes" then this issue would not occur
<beneroth> probably manageable with the new (input) (output) functions. first read binary, then read as chars. I haven't tried them yet.
<abu[m]> Why not? The rest of the line is still in the stream.
<abu[m]> I think 'input' does not help
<abu[m]> it is a logical problem
<abu[m]> If you abort at some position, you cannot continue anyway
<beneroth> not with that message, but maybe with the connection
<abu[m]> You need to close the connection
<abu[m]> or find the start of the next transaction?
<beneroth> yes, finding the start of the next transaction.
<beneroth> which might be a magic byte, but could also be size limited.
<abu[m]> I think this is not a real world problem
<abu[m]> if one transaction is bad, it is all bad
<abu[m]> better close
<beneroth> e.g. a protocol which first sends a binary header declaring a size of the payload, then the payload, then next header. the payload can contain same bytes/chars as a header. so you need to read by size.
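[editor's sketch] A size-prefixed framing of the kind described here could be read roughly as follows. The header layout (a single 4-byte big-endian length) is purely an assumption for the example, as are the names `readFrame` and `MaxLen`. Only `rd`, `make`, `link` and `do` are actual PicoLisp functions.

```picolisp
# Hypothetical frame reader. Assumed wire format: a 4-byte big-endian
# payload length, followed by that many payload bytes.
# Returns the payload as a list of byte values, or NIL if the header is
# missing or the declared length exceeds 'MaxLen'.
(de readFrame (MaxLen)
   (let (Len (rd 4)  B NIL)
      (and
         Len                          # NIL: EOF before the header
         (>= MaxLen Len)              # refuse oversized payloads up front
         (make
            (do Len
               (NIL (setq B (rd 1)))  # sender stopped early
               (link B) ) ) ) ) )
```

Because exactly `Len` bytes are consumed, the stream stays positioned at the next header even when the payload happens to contain header-like bytes. An empty payload and a rejected frame both yield NIL in this sketch, and a truncated payload yields a short list, so real code would need to tell these cases apart.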
<abu[m]> Why should it fail? TCP guaratees physical correctness. So the sender must be bad.
<abu[m]> *guarantees
<beneroth> yes, the fault is with the sender. but maybe just with a single message/transaction. not the whole stream.
<abu[m]> I cannot conceive such a situation
<abu[m]> but ok
<beneroth> e.g. when the sender is a proxy getting input from multiple connections or files on a disk, and pipelines it into a single stream to your application?
<beneroth> granted, kinda specific :)
<abu[m]> yeah
<beneroth> or e.g. reading a broken disk
<beneroth> purposely, because you try to restore files from it.
<abu[m]> But here we are supposed to have a socket stream if I understood correctly
<beneroth> doesn't make a difference really for picolisp
<beneroth> abu[m], (input) and (output) run within the same process, not like pipe, right?
<fbytez> (make (output (link @@) (in m3u (echo 5))] -- Something like this might work.
<beneroth> abu[m], just handing the stdin/stdout within pil differently, right?
<abu[m]> mom, phone
<fbytez> Obviously, extra handling in `output` and "m3u" and "5" would be whatever.
<beneroth> yeah
<beneroth> sprinkle some (state) FSM around it and check every char/byte you can as early as possible for plausibility and you got it secured.
<beneroth> for HTTP.. well its all text except for chunked payloads in the body
<beneroth> the very first char must be either "G"et or "P"ost or the first letter of another supported HTTP method
<beneroth> HTTP headers are case-insensitive keys with values, sometimes values also have a standardized format.
<beneroth> then ^M^J^M^J and then the body
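[editor's sketch] Reading a header block up to the blank line, with an upper bound on the number of characters, might be sketched as below. It uses a plain loop rather than the `(state)` FSM mentioned above, and `readHeader` and `Max` are hypothetical names.

```picolisp
# Hypothetical helper: read characters until CR LF CR LF has been seen,
# EOF is reached, or 'Max' characters have been read. Returns the list
# of characters read so far, including the terminator if it was found.
(de readHeader (Max)
   (let (End (chop "^M^J^M^J")  C NIL)
      (make
         (do Max
            (NIL (setq C (char)))           # end of file
            (link C)
            (T (= End (tail 4 (made)))) ) ) ) )  # header block complete
```

The caller would still check whether the result really ends in `^M^J^M^J` and drop the connection otherwise. Calling `tail` on every step rescans the list, which is acceptable for small limits but not tuned for speed.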
<beneroth> the most annoying part is enctype=multipart/form-data
<beneroth> otherwise HTTP is quite easy :)
<abu[m]> Grr, got a parcel from Hermes which is not for me
<beneroth> free gifts
<beneroth> something useful?
<abu[m]> I called Hermes and tried to explain, but they insist it was correctly delivered ☺
<abu[m]> I don't want to open
<abu[m]> It is from private to private
<beneroth> I always have a hard time receiving deliveries. nowadays they don't say when they will come, just on day X, but not around which time. and they always happen to come perfectly on lunch time.
<abu[m]> T
<beneroth> abu[m], well, then you only have one possible way
<beneroth> you need to build an X-ray scanner
<abu[m]> ok, I lost this thread
<beneroth> :D
<beneroth> no worries
<abu[m]> oh, yeah, cool
<beneroth> more questions, fbytez ?
<fbytez> I guess I'm just wondering about the buffering underneath. I'll probably just have to do some testing. It might be simpler to just call out to C.
<abu[m]> Faster, but probably not simpler
<abu[m]> C is good only if it is absolutely time-critical
<fbytez> Pretty darn simple just asking linux for a number of bytes.
<abu[m]> yes, so not C but perhaps (in '(dd ...
chexum_ has quit [Ping timeout: 255 seconds]
chexum has joined #picolisp
<abu[m]> I think the main problem is thinking in bytes while reading chars
<fbytez> Yeah, I'm beginning to think high level languages don't really suit me.
<abu[m]> In C we have the same problem
<abu[m]> Counting bytes is not in sync with characters
<fbytez> I don't see that as an issue. If I can handle the bytes, I can handle the characters / encoding.
<abu[m]> yes, of course, but it is tedious
<fbytez> It's also tedious fighting against abstractions. Swings and roundabouts.
<abu[m]> True, low-level is often much better
<abu[m]> it is too easy to over-abstract everything. See Java.
<fbytez> Indeed. It's the only reason I ever encountered C and assembly: trying to get underneath.
<beneroth> maybe try out to split your tasks in small C binaries, and orchestrate them with picolisp?
<fbytez> It's a pleasure talking with both of you, by the way. Thanks.
<beneroth> you're welcome :)
<abu[m]> Let me post a simple loop
<abu[m]> This reads max 7 chars or until EOF or "e" is encountered
<abu[m]> Instead of a count, you could also accumulate the 'size's and check it as a byte count
<abu[m]> So the overrun would be maximally 3 bytes (if a 4-byte char appears right at the limit)
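[editor's sketch] The paste abu[m] refers to is not preserved in this log. A loop matching his description (at most 7 characters, stopping at EOF or at "e") could look like the one below. It is a reconstruction under that description, not the original code, and `readUpTo` is a made-up name.

```picolisp
# Hypothetical reconstruction, not the original paste: read at most
# 'Max' characters, stopping early at EOF or when 'Delim' is read.
# The delimiter is consumed but not included in the result.
(de readUpTo (Max Delim)
   (let C NIL
      (make
         (do Max
            (NIL (setq C (char)))   # end of file
            (T (= Delim C))         # delimiter encountered
            (link C) ) ) ) )
```

Called as `(readUpTo 7 "e")` it behaves as described in the channel: on the input `abcdefghij` it returns the characters a, b, c and d.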
<beneroth> right. nice code.
razzy has quit [Ping timeout: 252 seconds]
razzy has joined #picolisp
<beneroth> abu[m], I've found notes of a previous similar discussion in august 2020-08-26 between you and aw- https://freenode.irclog.whitequark.org/picolisp/2020-08-26
<beneroth> (was still freenode back then)
<beneroth> the pastes are gone.. I've noted the following down, I guess that was the last paste...
<abu[m]> Gone with the wind. Good that we now use tankfeeder's pastes
<beneroth> pb1n.de is from tankfeeder?
<abu[m]> yes, he wrote and hosts it
<beneroth> nice
<abu[m]> (written in PicoLisp I think)
<beneroth> aw- made first a version using a pipe, similar to how I did it sometimes I think
<beneroth> and your optimized version looks at the UTF-8 encoding I think
chexum has quit [Remote host closed the connection]
chexum has joined #picolisp
<abu[m]> Interesting. I don't remember at all ;)
<abu[m]> Dementia praecox
seninha has joined #picolisp
<beneroth> I have random long-term memory. but I also didn't remember, I just found my notes :P
<abu[m]> 👍
razzy has quit [Ping timeout: 256 seconds]