#pypy on 2022-06-30 — irc logs at libera.irclog.whitequark.org

2022-04-07 20:04 cfbolz changed the topic of #pypy to: #pypy PyPy, the flexible snake https://pypy.org | IRC logs: https://quodlibet.duckdns.org/irc/pypy/latest.log.html#irc-end and https://libera.irclog.whitequark.org/pypy | Matti: I made a bit of progress, the tests now only segfault towards the end

02:20 jcea has quit [Quit: jcea]

06:51 otisolsen70 has joined #pypy

07:13 otisolsen70_ has joined #pypy

07:16 otisolsen70 has quit [Ping timeout: 255 seconds]

07:33 otisolsen70_ has quit [Ping timeout: 264 seconds]

07:41 <mjacob> arigato: hi! i was looking into porting the revdb frontend to python 3. i saw that you started it, but left out the encoding-related stuff. as it turns out, that part is already buggy on python 2.

07:42 <mjacob> arigato: the CMD_PRINT handler in the backend passes the expression to the compiler, which expects latin-1 on default and utf-8 on py3.8. however, the frontend (only running on python 2) passes the expression as raw bytes. if the backend is PyPy2, it only works correctly if the terminal is latin-1; if it's PyPy3, it only works correctly if the terminal is utf-8.

07:45 <mjacob> arigato: i see two approaches to fixing it: either we convert it in the frontend, the encoding depending on whether the backend is PyPy2 or PyPy3; or we always pass UTF-8 to the backend, potentially converting it there.

07:47 <mjacob> arigato: the former approach means we have to know in the frontend what the backend expects (not sure whether this is possible at all).

07:47 <mjacob> arigato: the latter approach gets us a more consistent interface

07:47 <mjacob> arigato: what do you think?

10:44 Atque has quit [Remote host closed the connection]

10:45 Atque has joined #pypy

11:45 <cfbolz> mjacob: I vote for utf-8 always

12:03 reneeontheweb has joined #pypy

12:09 mattip has joined #pypy

13:00 <arigato> yes, probably a sane choice

13:01 <arigato> latin-1 is bogus anyway, because it can't encode anything outside latin-1
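
The "always pass UTF-8" option discussed above could look roughly like this on the frontend side. This is only a sketch: `to_backend_utf8` and its `terminal_encoding` parameter are hypothetical names for illustration, not part of the actual revdb frontend, and the code is written for Python 3.

```python
def to_backend_utf8(raw, terminal_encoding):
    """Re-encode bytes read from the terminal as UTF-8 for the backend.

    Hypothetical helper: the real frontend would have to find out the
    terminal encoding itself (for example from the locale).
    """
    text = raw.decode(terminal_encoding)
    return text.encode("utf-8")


# In a latin-1 terminal, "é" arrives as the single byte 0xE9.
latin1_input = b"print('\xe9')"
utf8_for_backend = to_backend_utf8(latin1_input, "latin-1")

# Input that is already UTF-8 passes through unchanged.
utf8_input = b"print('\xc3\xa9')"
unchanged = to_backend_utf8(utf8_input, "utf-8")
```

With this shape the backend sees the same bytes regardless of the terminal, so only the backend needs to know whether its compiler wants latin-1 or UTF-8.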

13:09 otisolsen70 has joined #pypy

13:40 jcea has joined #pypy

13:55 otisolsen70 has quit [Quit: Leaving]

15:49 jcea has quit [Ping timeout: 268 seconds]

17:21 <exarkun> Do I remember correctly that some PyPy developers were not satisfied that CPython's hash randomization changes are sufficient to protect against DoS attacks if attacker-specified keys are put into dictionaries? If so, is anything about this written up?

17:28 <mjacob> exarkun: if i remember correctly, cpython had two attempts at fixing it: first, only the randomization change was introduced, slowing the attack by a small constant factor. second, a proper cryptographic hash function was used, solving that problem. and if i remember correctly, pypy never implemented only the first step.

17:29 <Hodgestar> Unrelated, but I have to complain somewhere: Arg! concurrent.futures! Arg! :)

17:30 <mjacob> exarkun: see the first point in https://doc.pypy.org/en/latest/cpython_differences.html#miscellaneous

17:36 <exarkun> mjacob: Thank you!

17:56 tsraoien has joined #pypy

17:59 xcm has joined #pypy

18:00 xcm_ has quit [Remote host closed the connection]

18:01 tsraoien has quit [Quit: WeeChat 3.5]

18:02 tsraoien has joined #pypy

18:05 tsraoien has quit [Client Quit]

18:05 tsraoien has joined #pypy

18:07 <mjacob> there's another encoding issue with revdb (if both the frontend and backend are python 2):

18:07 <mjacob> the backend overrides stdout / stderr and displayhook to send it back to the frontend.

18:08 <mjacob> if the backend is a python 2 interpreter, they can deal with both bytes and unicode.

18:08 <mjacob> currently, the backend encodes any unicode as utf-8. if the frontend terminal has another encoding, this would result in nonsense.

18:10 <mjacob> i'm not sure how this should be solved. of course one important question is if we want to support the case where the frontend and backend stdout encoding is different.

18:12 reneeontheweb has quit [Quit: Client closed]

18:43 marvin has quit [Remote host closed the connection]

18:43 lazka has quit [Quit: bye]

18:44 marvin_ has joined #pypy

18:44 lazka has joined #pypy

19:00 tsraoien has quit [Quit: WeeChat 3.5]

19:03 tsraoien has joined #pypy

20:02 <arigato> ideally, the backend should work as closely to the corresponding non-revdb pypy

20:02 <arigato> as possible

20:02 <arigato> that seems to mean writing things in whatever encoding some environment variables specify

20:03 <arigato> but then, ideally too, it should work if we replay inside a different terminal

20:04 <arigato> so the pure python wrapper used when replaying should do some decoding/re-encoding

20:04 <arigato> I guess it should do it in a mode where unencodable things don't cause a crash, too
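
A minimal sketch of the decode/re-encode step described above, assuming Python 3 and assuming the recorded encoding is already known to the wrapper. The function name `reencode_for_terminal` is hypothetical. The error handlers are chosen so that neither undecodable input nor unencodable output raises an exception.

```python
def reencode_for_terminal(data, recorded_encoding, terminal_encoding):
    """Convert recorded output bytes to the replay terminal's encoding."""
    # Undecodable bytes become U+FFFD instead of raising.
    text = data.decode(recorded_encoding, errors="replace")
    # Characters the terminal cannot represent become \xNN / \uNNNN escapes.
    return text.encode(terminal_encoding, errors="backslashreplace")


recorded = "caf\u00e9 \u2603".encode("utf-8")

# Replaying in an ASCII-only terminal: both non-ASCII characters are escaped.
out_ascii = reencode_for_terminal(recorded, "utf-8", "ascii")

# Replaying in a latin-1 terminal: "é" survives, the snowman is escaped.
out_latin1 = reencode_for_terminal(recorded, "utf-8", "latin-1")
```

Here `out_ascii` is `b"caf\\xe9 \\u2603"` and `out_latin1` is `b"caf\xe9 \\u2603"`.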

20:20 <mjacob> arigato: with "whatever encoding some environment variables", do you mean those in the frontend or backend?

20:24 <arigato> I mean, depending on the version of pypy, a normal pypy will use environment variables to decide which encoding the stdout should have, right?

20:26 <mjacob> yes

20:28 <arigato> so ideally, revdb-pypy in recording mode should do the same

20:29 <arigato> and also write somehow which encoding was chosen, or which value the environment variables had, something like that---into the log file

20:29 <mjacob> that should be the case, unless you special-cased it for revdb

20:30 <arigato> OK, then it's missing writing which encoding was chosen / which value the environment variable had

20:30 <mjacob> writing to?

20:30 <arigato> writing to the log file

20:32 <arigato> then when you later run "revdb.py logfile", revdb.py knows which encoding to expect, and can decode/re-encode to the actual encoding of the terminal used at that point (or just decode to unicode, and write as unicode, and leave Python to do the right thing)

20:32 <mjacob> wouldn't it be sufficient to run python code `sys.stdout.encoding` in replay mode to get what it was in recording mode?

20:33 <mjacob> ah, i understand; the frontend doesn't currently have a way to catch it

20:33 <arigato> maybe, but that would assume it was a Python interpreter, which the revdb.py currently doesn't assume

20:33 <arigato> yes, it would need to be written in the log file in a way that revdb.py can parse

20:34 <arigato> in some headers or something

20:34 <mjacob> why not add a command to query it? (i don't have a preference, just asking)

20:35 <arigato> yes, it can work too

20:35 <arigato> I don't exactly remember the protocol

20:36 <arigato> maybe the revdb-pypy can emit the encoding as a special command when it has figured it out

20:36 <arigato> and when replaying it re-reads that special command from the log file

20:36 <arigato> and then communicate it somehow to revdb.py

20:37 <arigato> but I'm not sure how it works any more, so simply reserving a field in the header of the logfile for it sounds simpler
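
To illustrate the "reserve a field in the header" idea, here is a sketch of a fixed-size header carrying the stdout encoding name. The layout, the `MAGIC` value and the function names are all invented for this example; the real revdb log format is not described in this conversation and may look completely different.

```python
import struct

# Hypothetical layout: 4-byte magic, 16-bit version, 32-byte encoding name
# padded with NUL bytes. Little-endian, no alignment padding.
MAGIC = b"RLOG"
HEADER = struct.Struct("<4sH32s")


def pack_header(version, stdout_encoding):
    """Build the header written once at the start of recording."""
    name = stdout_encoding.encode("ascii")
    if len(name) > 32:
        raise ValueError("encoding name too long for the reserved field")
    return HEADER.pack(MAGIC, version, name)


def unpack_header(blob):
    """Read the header back when replaying; returns (version, encoding)."""
    magic, version, name = HEADER.unpack_from(blob)
    if magic != MAGIC:
        raise ValueError("not a log file in the assumed format")
    return version, name.rstrip(b"\0").decode("ascii")


log_start = pack_header(1, "utf-8") + b"...rest of the log..."
version, recorded_encoding = unpack_header(log_start)
```

A fixed-size field keeps parsing trivial for a frontend that does not assume the recorded program was a Python interpreter, at the cost of only holding a single encoding for the whole run.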

20:38 <mjacob> the stdout encoding could change (e.g. `sys.stdout.reconfigure(encoding='ascii')`)

20:38 <mjacob> is that too obscure to support?

20:38 shimst3r_ has left #pypy [#pypy]

20:38 <arigato> uh, no idea
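
The mid-run encoding change mentioned above can be reproduced with a plain `io.TextIOWrapper` (Python 3.7 or later, where `reconfigure` exists). This sketch uses an in-memory buffer in place of the real stdout, and passes `errors` explicitly because `reconfigure` resets it to `'strict'` when only `encoding` is given.

```python
import io

raw = io.BytesIO()  # stands in for the underlying stdout byte stream
out = io.TextIOWrapper(raw, encoding="utf-8", errors="backslashreplace")

out.write("\u00e9")  # written as UTF-8: b"\xc3\xa9"
out.reconfigure(encoding="ascii", errors="backslashreplace")
out.write("\u00e9")  # now unencodable, so written as the escape \xe9
out.flush()

mixed_output = raw.getvalue()
```

The resulting byte stream mixes two encodings, which is why a single encoding value recorded once per log file cannot describe it exactly.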

20:39 shimst3r has joined #pypy

20:45 tsraoien has quit [Ping timeout: 268 seconds]

20:52 <mjacob> also, you can write bytes directly (sys.stdout.write('foo') on python 2, sys.stdout.buffer.write(b'foo') on python 3)

20:53 <mjacob> an interpreter could not have the notion of unicode or stdout encoding at all

20:55 <mjacob> maybe it's better to print the raw bytes at the frontend; and possibly implement "show escaped bytes" later for the case when the encodings differ
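
A sketch of the two display modes suggested above, assuming a Python 3 frontend. Both function names are hypothetical. `write_raw` passes the backend's bytes through untouched; `escape_bytes` is one possible reading of "show escaped bytes", relying on the `backslashreplace` error handler for decoding (available since Python 3.5).

```python
import io


def write_raw(stream, data):
    """Pass the backend's output bytes through without interpreting them."""
    stream.write(data)


def escape_bytes(data):
    """Render bytes as ASCII text, showing non-ASCII bytes as \\xNN."""
    return data.decode("ascii", errors="backslashreplace")


backend_output = b"caf\xc3\xa9\n"

# In a real frontend the stream would be sys.stdout.buffer.
buf = io.BytesIO()
write_raw(buf, backend_output)

escaped = escape_bytes(backend_output)
```

Here `escaped` is the string `caf\xc3\xa9` followed by a newline, which stays readable even when the terminal's encoding differs from the one used while recording.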

21:23 tsraoien has joined #pypy