<antocuni>
ronan: the fact that you don't have a ctx in tp_traverse is on purpose: for example in PyPy it is called by the GC at "random" points and you can't call arbitrary Python code from there
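To illustrate the constraint antocuni describes, here is a minimal sketch of a traverse function with no context argument. The types `SketchField`, `sketch_visitproc` and `SketchArray` are hypothetical stand-ins, not the real HPy API. They are only loosely modelled on the shape of HPy's traverse slot: a visit callback plus an opaque argument, and no HPyContext. Without a context, the function has no way to make API calls, so all it can do is report where its fields live.

```c
/* Hypothetical stand-ins for illustration only; these are NOT the real HPy
 * types. A field is modelled as an opaque slot that the GC knows how to read. */
typedef struct { void *_p; } SketchField;

/* Visitor supplied by the GC: it receives a pointer to each field slot and an
 * opaque argument, but no context object. */
typedef int (*sketch_visitproc)(SketchField *field, void *arg);

/* Hypothetical ndarray-like object holding two references. */
typedef struct {
    SketchField dtype;   /* reference to the dtype object */
    SketchField base;    /* reference to a base array (may be empty) */
} SketchArray;

/* Report every field to the visitor and stop at the first non-zero result.
 * There is no context to pass to API calls, so nothing else can happen here. */
static int sketch_array_traverse(SketchArray *self, sketch_visitproc visit, void *arg)
{
    int rc;
    if ((rc = visit(&self->dtype, arg)) != 0) return rc;
    if ((rc = visit(&self->base, arg)) != 0) return rc;
    return 0;
}
```

ronan's problem follows directly: to know where the object slots are, this function would have to read the dtype's contents, and the dtype is itself only reachable through a field.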
<antocuni>
but I agree that this doesn't solve your concrete problem
<ronan>
antocuni: yes, I remember the reasoning, but I don't see a way to meet both constraints
<antocuni>
I assume that the dtype is stored as an HPyField?
<ronan>
yes
<antocuni>
can the dtype be an arbitrary object, or is it always of a specific type? Maybe instead of an HPyField we could store a "struct dtype *" or something like that?
<antocuni>
or you could "cache" the needed info inside the ndarray struct itself, although this is suboptimal because it wastes memory
<ronan>
hmm, I'm not sure; there are plans for user-created dtypes, but I think for now it's always a PyArray_Descr
<ronan>
caching is difficult because you can change the dtype
<ronan>
in any case, the "struct dtype *" idea doesn't really work, because it contains PyObjects
<ronan>
and the needed info is most of what's in the dtype, including some PyObjects
<antocuni>
I'm confused. For tp_traverse, you just need to know whether and where there are HPyFields inside the array, don't you?
<antocuni>
this must be somewhat "fixed" and cannot change much dynamically
<antocuni>
like, if I have a float64 ndarray, I cannot change the dtype to something that contains HPyField/PyObject*, because that's just nonsense
<ronan>
the problem is with dtypes containing objects
<ronan>
and particularly structured dtypes
<ronan>
they can have PyObject* at mostly arbitrary, possibly unaligned, offsets
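Unaligned object slots matter for whoever reads them during traversal: casting `buf + off` to `void **` and dereferencing it is undefined behaviour in C when `off` is not suitably aligned. A minimal sketch of the usual portable workaround, assuming `void *` as a stand-in for `PyObject *`, copies the bytes with `memcpy` instead. The helper names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Read a pointer stored at an arbitrary, possibly unaligned, byte offset.
 * memcpy avoids the undefined behaviour of dereferencing a misaligned
 * (void **) pointer. void * stands in for PyObject * here. */
static void *load_unaligned_ptr(const unsigned char *buf, size_t off)
{
    void *p;
    memcpy(&p, buf + off, sizeof p);
    return p;
}

/* Store a pointer at an arbitrary, possibly unaligned, byte offset. */
static void store_unaligned_ptr(unsigned char *buf, size_t off, void *p)
{
    memcpy(buf + off, &p, sizeof p);
}
```

Compilers typically lower such a fixed-size `memcpy` to a single load or store on platforms that allow unaligned access, so the portability costs little.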
<antocuni>
ok, but for a given structured dtype you can compute the list of offsets that contain a PyObject* once and for all
<antocuni>
e.g. at creation time
<antocuni>
or when the dtype is overwritten, if that is really a supported use case (I think it shouldn't be, but maybe that's another topic)
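antocuni's suggestion of computing the object offsets once, at dtype creation time, can be sketched as a recursive walk that flattens nested structured fields into one table of byte offsets per item. `SketchDtype` and `SketchFieldDesc` are a hypothetical, heavily simplified description of a structured dtype, not NumPy's `PyArray_Descr`; subarray fields, among other things, are ignored.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified structured-dtype description (not PyArray_Descr).
 * A field is plain data, an object pointer, or a nested structured type. */
typedef enum { FIELD_PLAIN, FIELD_OBJECT, FIELD_STRUCT } FieldKind;

typedef struct SketchDtype SketchDtype;

typedef struct {
    FieldKind kind;
    size_t offset;            /* byte offset within the enclosing item */
    const SketchDtype *sub;   /* only used when kind == FIELD_STRUCT */
} SketchFieldDesc;

struct SketchDtype {
    size_t itemsize;
    size_t nfields;
    const SketchFieldDesc *fields;
};

/* Count the object slots of one item (out == NULL), or also write their
 * absolute offsets into out, flattening nested structs. */
static size_t collect_object_offsets(const SketchDtype *dt, size_t base, size_t *out)
{
    size_t n = 0;
    for (size_t i = 0; i < dt->nfields; i++) {
        const SketchFieldDesc *f = &dt->fields[i];
        size_t off = base + f->offset;
        if (f->kind == FIELD_OBJECT) {
            if (out) out[n] = off;
            n++;
        } else if (f->kind == FIELD_STRUCT) {
            size_t k = collect_object_offsets(f->sub, off, out ? out + n : NULL);
            n += k;
        }
    }
    return n;
}

/* Build the offset table once (e.g. at dtype creation). Caller frees it.
 * Returns NULL on allocation failure. */
static size_t *object_offsets_for(const SketchDtype *dt, size_t *count)
{
    *count = collect_object_offsets(dt, 0, NULL);
    size_t *offs = malloc((*count ? *count : 1) * sizeof *offs);
    if (offs) collect_object_offsets(dt, 0, offs);
    return offs;
}
```

The table depends only on the dtype, so its size is proportional to the number of object fields in one item rather than to the number of elements in the array.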
<ronan>
so it looks like you can't change the dtype from Python if it contains objects, but who knows what happens in the internals?
<antocuni>
I don't know, but I don't think it can have any reasonable semantics
<antocuni>
especially if you have a moving GC
<ronan>
I guess it is technically possible to store a list of offsets, but that would double the size of the array in the worst case
<ronan>
I agree that storing objects is a mess, but it needs to be supported for compatibility
<antocuni>
no, I mean that changing the dtype from "no objects" to "yes objects" or vice versa cannot be supported (and probably shouldn't be)
<ronan>
I'm pretty sure there are ways to do it, at least in the "yes" to "no" direction, such that you get garbage but don't crash
<ronan>
though we probably don't have to care
<antocuni>
yes, we should ask the numpy devs whether there is any actually useful use case for that
<antocuni>
mattip: ^^^ ?
<ronan>
anyway, it seems possible to store a list of offsets on the dtype, but it would still be nice to be able to access the dtype instead of copying the list to the array
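If the offset table lives on the dtype, the array's traversal needs only the table, the item size and its data buffer. This is a minimal sketch under loud assumptions: the table is stored in non-moving `malloc` memory owned by the dtype, it outlives every array that points to it, and the array keeps a plain C pointer to it rather than an HPyField. `SketchDtypeLayout`, `SketchNdarray` and `sketch_ndarray_traverse` are hypothetical names, and `void *` stands in for `PyObject *`. Note that this is the "copy/share the list" variant ronan would prefer to avoid, not access to the dtype object itself.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical: the dtype owns a precomputed table of object offsets, and
 * the array holds a plain pointer to it (assumed non-moving and long-lived). */
typedef struct {
    size_t itemsize;
    size_t nobj;
    const size_t *obj_offsets;   /* byte offsets of object slots in one item */
} SketchDtypeLayout;

typedef struct {
    const SketchDtypeLayout *layout;
    unsigned char *data;
    size_t nitems;
} SketchNdarray;

/* Visitor receiving each non-NULL object pointer found in the data. */
typedef int (*sketch_objvisit)(void *obj, void *arg);

/* Walk every item and report each non-NULL object slot. No context and no
 * call into Python is needed: only the offsets, itemsize and raw bytes. */
static int sketch_ndarray_traverse(const SketchNdarray *a, sketch_objvisit visit, void *arg)
{
    const SketchDtypeLayout *L = a->layout;
    for (size_t i = 0; i < a->nitems; i++) {
        const unsigned char *item = a->data + i * L->itemsize;
        for (size_t k = 0; k < L->nobj; k++) {
            void *obj;
            /* slots may be unaligned, so copy instead of dereferencing */
            memcpy(&obj, item + L->obj_offsets[k], sizeof obj);
            if (obj != NULL) {
                int rc = visit(obj, arg);
                if (rc != 0) return rc;
            }
        }
    }
    return 0;
}
```

The open question from the discussion remains: with a moving GC, the visitor would normally need the address of each slot, not just its value, so that it can update the pointer. This sketch only covers a non-moving reachability scan.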
<ronan>
in principle, I think it should be OK for the PyPy GC to just walk the object graph
<ronan>
random idea: can we say that a handle is an HPyField without an owner? And if so, how far can we take that idea in the API or the implementation?
<antocuni>
"the PyPy GC to just walk the object graph": what do you mean exactly? How is the GC supposed to know how to walk the graph, apart from calling tp_traverse?
<antocuni>
re random idea: I don't understand what you mean, can you provide an example?
<ronan>
I guess my phrasing was confusing: I mean that, in principle, it shouldn't be a problem for the GC if tp_traverse does it, i.e. it should be possible to implement _hf2py on PyPy so that it's safe to call from tp_traverse