<antocuni> ronan: the fact that you don't have a ctx in tp_traverse is on purpose: for example in PyPy it is called by the GC at "random" points and you can't call arbitrary Python code from there
<antocuni> but I agree that this doesn't solve your concrete problem
<ronan> antocuni: yes, I remember the reasoning, but I don't see a way to meet both constraints
<antocuni> I assume that the dtype is stored as an HPyField?
<ronan> yes
<antocuni> can the dtype be an arbitrary object, or is it always of a specific type? Maybe instead of HPyField we could store a "struct dtype *" or something like that?
<antocuni> or you could "cache" the needed info inside the ndarray struct itself, although this is suboptimal because you waste unnecessary memory
<ronan> hmm, I'm not sure, there are plans for user-created dtypes, but I think for now it's always a PyArray_Descr
<ronan> caching is difficult because you can change the dtype
<ronan> in any case, the "struct dtype *" idea doesn't really work, because it contains PyObjects
<ronan> and the needed info is most of what's in the dtype, including some PyObjects
<antocuni> I'm confused. For tp_traverse, you just need to know whether and where there are HPyFields inside the array, don't you?
<antocuni> this must be somewhat "fixed" and cannot change too much dynamically
<antocuni> like, if I have a float64 ndarray, I cannot change the dtype to something which contains HPyField/PyObject* because that's just nonsense
<ronan> the problem is with dtypes containing objects
<ronan> and particularly structured dtypes
<ronan> they can have PyObject* at mostly random, possibly unaligned, places
<antocuni> ok, but for a given structured dtype you can get the list of offsets which contain PyObject* once and for all
<antocuni> e.g. at creation time
<antocuni> or when the dtype is overwritten, if this is really a supported use case (I think it shouldn't be supported, but maybe it's another topic)
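The idea of collecting the object offsets once per dtype can be sketched as follows. This is only an illustration under stated assumptions: `Field` and `object_offsets` are hypothetical names, not part of NumPy or HPy, and the dtype is modelled as a plain list of field descriptions rather than a real PyArray_Descr.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Field:
    """Hypothetical description of one field of a structured dtype."""
    offset: int                      # byte offset inside the enclosing record
    is_object: bool = False          # True if the field holds an object reference
    subfields: List["Field"] = field(default_factory=list)  # nested record, if any


def object_offsets(fields, base=0):
    """Return the sorted byte offsets, relative to the record start,
    at which an object reference is stored."""
    offsets = []
    for f in fields:
        if f.subfields:
            # Nested record: recurse, shifting by the position of the sub-record.
            offsets.extend(object_offsets(f.subfields, base + f.offset))
        elif f.is_object:
            offsets.append(base + f.offset)
    return sorted(offsets)


# Example record: a 1-byte field at 0, an (unaligned) object at 1,
# and a nested record at 9 holding a plain field at 0 and an object at 8.
record = [
    Field(offset=0),
    Field(offset=1, is_object=True),
    Field(offset=9, subfields=[Field(offset=0), Field(offset=8, is_object=True)]),
]

# Computed once, e.g. when the dtype is created.
OBJECT_OFFSETS = object_offsets(record)
```

In this model the list depends only on the dtype layout, so it would only need to be recomputed if the dtype itself were replaced.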
<ronan> so it looks like you can't change the dtype from Python if it contains objects, but who knows what happens in the internals?
<antocuni> I don't know, but I don't think it can have any reasonable semantics
<antocuni> especially if you have a moving GC
<ronan> I guess it is technically possible to store a list of offsets, but that would double the size of the array in the worst case
<ronan> I agree that storing objects is a mess, but it needs to be supported for compatibility
<antocuni> no, I mean that changing the dtype from "no objects" to "yes objects" or vice versa cannot be supported (and probably shouldn't)
<ronan> I'm pretty sure there are ways to do it, at least in the "yes" to "no" direction, such that you get garbage but don't crash
<ronan> though we probably don't have to care
<antocuni> yes, we should ask the numpy devs whether there is any actual useful use case for that or not
<antocuni> mattip: ^^^ ?
<ronan> anyway, it seems possible to store a list of offsets on the dtype, but it would still be nice to be able to access the dtype instead of copying the list to the array
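For illustration, here is a rough sketch of what a traversal needs once a per-record offset list is available: the item size, the number of records, and the offsets. The function name `visit_slots` and its parameters are hypothetical, and the sketch assumes a contiguous buffer of fixed-size records; it models only the address arithmetic, not an actual tp_traverse.

```python
def visit_slots(itemsize, count, offsets):
    """Yield the byte position of every object slot in a contiguous
    buffer holding `count` records of `itemsize` bytes each."""
    for i in range(count):
        base = i * itemsize          # start of record i
        for off in offsets:          # per-record offsets, shared by all records
            yield base + off


# Three 25-byte records, each with object references at offsets 1 and 17.
slots = list(visit_slots(itemsize=25, count=3, offsets=[1, 17]))
```

In this sketch the offset list has one entry per object field in a record, not one per array element, which is why keeping it on the dtype and reading it from there is attractive.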
<ronan> in principle, I think it should be OK for the PyPy GC to just walk the object graph
<ronan> random idea: can we say that a handle is an HPyField without an owner? And if so, how far can we take that idea in the API or the implementation?
<antocuni> "the PyPy GC to just walk the object graph": what do you mean exactly? How is the GC supposed to know how to walk the graph, apart from calling tp_traverse?
<antocuni> re random idea: I don't understand what you mean, can you provide an example?
<ronan> I guess my phrasing was confusing: I mean that, in principle, it shouldn't be a problem for the GC if tp_traverse does it, i.e. that it should be possible to implement _hf2py on pypy so that it's safe to call it from tp_traverse