cfbolz changed the topic of #pypy to: #pypy PyPy, the flexible snake https://pypy.org | IRC logs: https://quodlibet.duckdns.org/irc/pypy/latest.log.html#irc-end and https://libera.irclog.whitequark.org/pypy | Matti: I made a bit of progress, the tests now only segfault towards the end
jcea has quit [Quit: jcea]
otisolsen70 has joined #pypy
otisolsen70_ has joined #pypy
otisolsen70 has quit [Ping timeout: 255 seconds]
otisolsen70_ has quit [Ping timeout: 264 seconds]
<mjacob> arigato: hi! i was looking into porting the revdb frontend to python 3. i saw that you started it, but left out the encoding-related stuff. as it turns out, that part is already buggy on python 2.
<mjacob> arigato: the CMD_PRINT handler in the backend passes the expression to the compiler, which expects latin-1 on default and utf-8 on py3.8. however, the frontend (only running on python 2) passes the expression as raw bytes. if the backend is PyPy2, it only works correctly if the terminal is latin-1; if it's PyPy3, it only works correctly if the terminal is utf-8.
<mjacob> arigato: i see two approaches to fixing it: either we convert it in the frontend, with the encoding depending on whether the backend is PyPy2 or PyPy3; or we always pass UTF-8 to the backend, potentially converting it there.
<mjacob> arigato: the former approach means we have to know in the frontend what the backend expects (not sure whether this is possible at all).
<mjacob> arigato: the latter approach gets us a more consistent interface
<mjacob> arigato: what do you think?
Atque has quit [Remote host closed the connection]
Atque has joined #pypy
<cfbolz> mjacob: I vote for utf-8 always
reneeontheweb has joined #pypy
mattip has joined #pypy
<arigato> yes, probably a sane choice
<arigato> latin-1 is bogus anyway, because it can't encode anything outside latin-1
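A minimal sketch of the "always pass UTF-8" option agreed on above, assuming the frontend knows the encoding of its own terminal. The helper name `expression_to_utf8` and its parameters are hypothetical and are not part of revdb; the backend side (converting from UTF-8 to whatever its compiler expects) is not shown.

```python
def expression_to_utf8(raw, terminal_encoding):
    """Re-encode an expression typed at the terminal as UTF-8 bytes.

    `raw` is the byte string read from the terminal and
    `terminal_encoding` is the encoding the terminal uses; both names
    are assumptions made for this sketch.
    """
    # Decode with the terminal's encoding first, so the text is
    # independent of where it was typed.
    text = raw.decode(terminal_encoding)
    # Always hand UTF-8 to the backend, whatever the terminal used.
    return text.encode("utf-8")


if __name__ == "__main__":
    # The same expression typed in a latin-1 and in a UTF-8 terminal
    # ends up as identical bytes on the wire.
    from_latin1 = expression_to_utf8(b"'caf\xe9'", "latin-1")
    from_utf8 = expression_to_utf8(b"'caf\xc3\xa9'", "utf-8")
    print(from_latin1 == from_utf8)
```

With this shape the frontend never needs to know whether the backend is PyPy2 or PyPy3; only the backend decides how to feed the UTF-8 bytes to its compiler.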
otisolsen70 has joined #pypy
jcea has joined #pypy
otisolsen70 has quit [Quit: Leaving]
jcea has quit [Ping timeout: 268 seconds]
<exarkun> Do I remember correctly that some PyPy developers were not satisfied that CPython's hash randomization changes are sufficient to protect against DoS attacks if attacker-specified keys are put into dictionaries? If so, is anything about this written up?
<mjacob> exarkun: if i remember correctly, cpython had two attempts at fixing it: first, only the randomization change was introduced, slowing the attack by a small constant factor. second, a proper cryptographic hash function was used, solving that problem. and if i remember correctly, pypy never implemented only the first step.
<Hodgestar> Unrelated, but I have to complain somewhere: Arg! concurrent.futures! Arg! :)
<exarkun> mjacob: Thank you!
tsraoien has joined #pypy
xcm has joined #pypy
xcm_ has quit [Remote host closed the connection]
tsraoien has quit [Quit: WeeChat 3.5]
tsraoien has joined #pypy
tsraoien has quit [Client Quit]
tsraoien has joined #pypy
<mjacob> there's another encoding issue with revdb (if both the frontend and backend are python 2):
<mjacob> the backend overrides stdout / stderr and displayhook to send it back to the frontend.
<mjacob> if the backend is a python 2 interpreter, they can handle both bytes and unicode.
<mjacob> currently, the backend encodes any unicode as utf-8. if the frontend terminal has another encoding, this would result in nonsense.
<mjacob> i'm not sure how this should be solved. of course one important question is if we want to support the case where the frontend and backend stdout encodings are different.
reneeontheweb has quit [Quit: Client closed]
marvin has quit [Remote host closed the connection]
lazka has quit [Quit: bye]
marvin_ has joined #pypy
lazka has joined #pypy
tsraoien has quit [Quit: WeeChat 3.5]
tsraoien has joined #pypy
<arigato> ideally, the backend should work as closely to the corresponding non-revdb pypy
<arigato> as possible
<arigato> that seems to mean writing things in whatever encoding some environment variables specify
<arigato> but then, ideally too, it should work if we replay inside a different terminal
<arigato> so the pure python wrapper used when replaying should do some decoding/re-encoding
<arigato> I guess it should do it in a mode where unencodable things don't cause a crash, too
<mjacob> arigato: with "whatever encoding some environment variables specify", do you mean those in the frontend or backend?
<arigato> I mean, depending on the version of pypy, a normal pypy will use environment variables to decide which encoding the stdout should have, right?
<mjacob> yes
<arigato> so ideally, revdb-pypy in recording mode should do the same
<arigato> and also write somehow which encoding was chosen, or which value the environment variables had, something like that---into the log file
<mjacob> that should be the case, unless you special-cased it for revdb
<arigato> OK, then it's missing writing which encoding was chosen / which value the environment variable had
<mjacob> writing to?
<arigato> writing to the log file
<arigato> then when you later run "revdb.py logfile", revdb.py knows which encoding to expect, and can decode/re-encode to the actual encoding of the terminal used at that point (or just decode to unicode, and write as unicode, and leave Python to do the right thing)
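A rough sketch of the decode/re-encode step described above, assuming the recorded stdout encoding has somehow been read back from the log file. The function name `reencode_output` and both encoding parameters are hypothetical; how the encoding is actually stored in the log is not decided in this discussion. The error handlers are chosen so that undecodable or unencodable data does not raise.

```python
def reencode_output(data, recorded_encoding, terminal_encoding):
    """Convert recorded output bytes to the replay terminal's encoding.

    `recorded_encoding` stands for the encoding noted at recording time
    and `terminal_encoding` for the encoding of the terminal used when
    replaying; both are assumptions of this sketch.
    """
    # Bytes that are invalid in the recorded encoding become U+FFFD
    # instead of raising UnicodeDecodeError.
    text = data.decode(recorded_encoding, errors="replace")
    # Characters the replay terminal cannot represent are written as
    # backslash escapes instead of raising UnicodeEncodeError.
    return text.encode(terminal_encoding, errors="backslashreplace")


if __name__ == "__main__":
    recorded = u"caf\xe9 \u20ac".encode("utf-8")
    print(reencode_output(recorded, "utf-8", "latin-1"))
    print(reencode_output(recorded, "utf-8", "ascii"))
```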
<mjacob> wouldn't it be sufficient to run python code `sys.stdout.encoding` in replay mode to get what it was in recording mode?
<mjacob> ah, i understand; the frontend doesn't currently have a way to catch it
<arigato> maybe, but that would assume it was a Python interpreter, which the revdb.py currently doesn't assume
<arigato> yes, it would need to be written in the log file in a way that revdb.py can parse
<arigato> in some headers or something
<mjacob> why not add a command to query it? (i don't have a preference, just asking)
<arigato> yes, it can work too
<arigato> I don't exactly remember the protocol
<arigato> maybe the revdb-pypy can emit the encoding as a special command when it has figured it out
<arigato> and when replaying it re-reads that special command from the log file
<arigato> and then communicates it somehow to revdb.py
<arigato> but I'm not sure how it works any more, so simply reserving a field in the header of the logfile for it sounds simpler
<mjacob> the stdout encoding could change (e.g. `sys.stdout.reconfigure(encoding='ascii')`)
<mjacob> is that too obscure to support?
shimst3r_ has left #pypy [#pypy]
<arigato> uh, no idea
shimst3r has joined #pypy
tsraoien has quit [Ping timeout: 268 seconds]
<mjacob> also, you can write bytes directly (sys.stdout.write('foo') on python 2, sys.stdout.buffer.write(b'foo') on python 3)
<mjacob> an interpreter could not have the notion of unicode or stdout encoding at all
<mjacob> maybe it's better to print the raw bytes at the frontend; and possibly implement "show escaped bytes" later for the case when the encodings differ
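One possible shape for the "show escaped bytes" idea mentioned above, as a sketch only. The function name `escape_bytes` and the exact escaping rules (printable ASCII, tab and newline kept, everything else shown as `\xNN`) are assumptions for illustration, not an existing revdb feature.

```python
def escape_bytes(data):
    """Render raw output bytes as ASCII text with \\xNN escapes.

    Printable ASCII, tab and newline are kept as they are; a literal
    backslash is doubled so the result stays unambiguous; every other
    byte is shown as a hexadecimal escape.
    """
    pieces = []
    for byte in bytearray(data):
        if byte == 0x5C:
            pieces.append("\\\\")
        elif 0x20 <= byte < 0x7F or byte in (0x09, 0x0A):
            pieces.append(chr(byte))
        else:
            pieces.append("\\x%02x" % byte)
    return "".join(pieces)


if __name__ == "__main__":
    # UTF-8 encoded output shown in a terminal of unknown encoding.
    print(escape_bytes(b"caf\xc3\xa9"))
```

This avoids any assumption about the backend's notion of unicode: the frontend only ever looks at bytes.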
tsraoien has joined #pypy