Circular references + detaching objects in Python

TomSvilans · 21 June 2022 10:16

Hi!
I am trying to represent a graph-like data structure in Python with Speckle schemas and am running into a recursion problem. I’ve gone over all the documentation I can find without much luck.

I have a pair of classes that reference each other. Defining the schema in Python works fine (following the Brep example in the standard Objects kit, Foo.update_forward_refs()), so I am able to create and use these objects. I have set all the reference members as detachable, as in the examples (class Element(Base, speckle_type=... + "Element", detachable={"foo", "bar"}):).

When it comes to serialization, however, I get RecursionError: maximum recursion depth exceeded while calling a Python object, which suggests that the serializer goes off chasing those references.

Modifying the basic Python example also gives the same result, so it doesn’t seem to be a product of my schemas:

from specklepy.objects import Base
from specklepy.serialization.base_object_serializer import BaseObjectSerializer

detached_base = Base()
detached_base.name = "a detached base"

base_obj = Base()
base_obj.name = "my base"
base_obj["@nested"] = detached_base

# Spiderman meme -----------------------
detached_base["@reference"] = base_obj
# --------------------------------------------

serializer = BaseObjectSerializer()
hash, obj_dict = serializer.traverse_base(base_obj)

hash, serialized = serializer.write_json(base_obj)
deserialized = serializer.read_json(serialized)

Is there a part to the serialization that I am missing to be able to properly serialize cross-referenced objects? Apologies if it is right there, staring me in the face… Thanks!

izzylys · 21 June 2022 11:12

heya @TomSvilans !

so this is part bug and part design

the design: self-referential loops in core (py as well as c#) are meant to be ignored on serialisation (see #2 in this list here).

if you want the parent to be injected into nested child elements, you’ll need to write your classes and handle this on set as the Brep object does

github.com/specklesystems/specklepy/blob/782f70fb49ae975e3621ff22aa4ad208ae145bbd/specklepy/objects/geometry.py#L643-L649

    def _inject_self_into_children(self, children: Optional[List[Base]]) -> List[Base]:
        if children is None:
            return children

        for child in children:
            child._Brep = self
        return children

I believe we settled on this because serialisation happens from the inside out: the deepest nested objects get serialised first and the root object is serialised last. since an object’s id is a hash of its contents (including the ids of its detached children), we don’t have the id of the root until the very end, which means it’s not possible to nest its id as a reference in deeper child objects.
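A toy model of that inside-out ordering, to make the chicken-and-egg problem concrete. This is not specklepy's serializer: `Node` and `serialize` are hypothetical, and md5 over a sorted JSON payload is only an illustrative stand-in for Speckle's hashing. Because a parent's payload contains its children's ids, the parent can only be hashed after every child is done, so no child could ever embed the parent's id.

```python
import hashlib
import json
from typing import Dict, List, Optional


class Node:
    """Hypothetical stand-in for a Base: a name plus detached children."""

    def __init__(self, name: str, children: Optional[List["Node"]] = None) -> None:
        self.name = name
        self.children = children or []


def serialize(node: Node, store: Dict[str, dict], order: List[str]) -> str:
    """Serialize children first, then hash the parent payload that lists their ids."""
    child_ids = [serialize(child, store, order) for child in node.children]
    payload = {"name": node.name, "children": child_ids}
    obj_id = hashlib.md5(json.dumps(payload, sort_keys=True).encode()).hexdigest()
    store[obj_id] = payload
    order.append(node.name)  # record when each object is finished
    return obj_id


root = Node("root", [Node("mid", [Node("leaf")])])
store: Dict[str, dict] = {}
order: List[str] = []
root_id = serialize(root, store, order)
print(order)  # ['leaf', 'mid', 'root']: the root's id is the last one known
```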

the bug: this should not throw an error but should instead be ignored as it is in sharp! i’ll pop in a fix for this soon hehe
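The intended behaviour (skip the loop instead of recursing until a RecursionError) can be sketched with a toy serializer that remembers which objects are still being serialized and drops any reference back to one of them. Again this is a hypothetical illustration, not the actual specklepy fix; `Node`, `refs` and `serialize` are made-up names and the md5 id is illustrative.

```python
import hashlib
import json
from typing import Dict, List, Optional, Set


class Node:
    """Hypothetical object with a name and references to other nodes."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.refs: List["Node"] = []


def serialize(node: Node, store: Dict[str, dict], stack: Optional[Set[int]] = None) -> str:
    """Skip references to objects that are still on the traversal stack.

    `stack` holds the Python ids of the objects currently being serialized;
    a reference to one of them closes a loop, so it is left out of the payload.
    """
    stack = set() if stack is None else stack
    stack.add(id(node))
    ref_ids = [serialize(ref, store, stack) for ref in node.refs if id(ref) not in stack]
    stack.discard(id(node))
    payload = {"name": node.name, "refs": ref_ids}
    obj_id = hashlib.md5(json.dumps(payload, sort_keys=True).encode()).hexdigest()
    store[obj_id] = payload
    return obj_id


base_obj = Node("my base")
detached = Node("a detached base")
base_obj.refs.append(detached)
detached.refs.append(base_obj)  # the loop from the question

store: Dict[str, dict] = {}
root_id = serialize(base_obj, store)  # completes instead of raising
print(len(store))  # 2
```

The back-reference from `detached` to `base_obj` is simply absent from the output, which is also why it has to be re-created after receiving (eg by injection, as above).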

TomSvilans · 21 June 2022 12:01

Thanks @izzylys
I was wondering what the _inject... methods were! I’ll give it a shot and see what happens.

I noticed that it serializes/“works” when I set serialize_ignore, but then I don’t get any data back out afterwards

izzylys · 21 June 2022 12:14

no prob dude! yep, serialize_ignore does exactly that - ignores the property on serialization (which is what should happen in this case). you should add any self-referential props to be ignored when defining your classes - again the Brep class and all its little bits are a prime example of this
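Why nothing comes back for an ignored prop: it is never written, so the receiver has nothing to rebuild it from. A plain-Python toy of that behaviour (the `_serialize_ignore` attribute, `to_dict` and `from_dict` are hypothetical and only mimic the idea, not specklepy's implementation):

```python
import json
from typing import Any, Dict, Optional


class JointPart:
    """Hypothetical class whose back-references are excluded from the payload."""

    _serialize_ignore = {"element", "joint"}

    def __init__(self, name: str, element: Optional[str] = None, joint: Optional[str] = None) -> None:
        self.name = name
        self.element = element
        self.joint = joint

    def to_dict(self) -> Dict[str, Any]:
        # drop ignored props before writing
        return {k: v for k, v in vars(self).items() if k not in self._serialize_ignore}

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "JointPart":
        return cls(**data)


part = JointPart("jp1", element="E1", joint="J1")
received = JointPart.from_dict(json.loads(json.dumps(part.to_dict())))
print(received.name, received.element, received.joint)  # jp1 None None
```

So after receiving, the ignored links have to be restored by code (eg a parent injecting itself into its children).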

lemme know how you get on!

TomSvilans · 21 June 2022 13:58

Alright, it serializes… but then there is the problem with deserialization:

I have 3 classes: Element, Joint, and JointPart
Element has a list of JointPart objects
Joint also has a list of JointPart objects
JointPart has a reference to one Element and one Joint each. It acts as an interface between elements and joints (hyper-edge).
I’ve implemented _inject_self_into_children for Element and Joint and added the element and joint properties of JointPart to serialize_ignore.

Sending this off works, and receiving it works, but now when I crawl the returned Base object, each JointPart object turns up in 2 places:

When looping through each Element, I get the JointPart objects with a correct element property (referring to the Element) but None for the joint property.
When looping through each Joint, I get the JointPart objects with a correct joint property (referring to the Joint) but None for the element property.
While both JointPart objects share the same id, they fail an identity assertion, which makes it tricky to extract one “true” JointPart object from this.

I’ve attached the object schema and 2 scripts that push and pull data, in case that is helpful

Does this call for a custom deserializer or something like that? Are there any Python docs for this? Even C# docs could help

glulamb_speckle.zip (2.9 KB)

izzylys · 21 June 2022 17:14

looking at the code, I think this is the expected behaviour.

EDIT: tbd i am checking something in sharp atm but maybe i am wrong and there is something we can do in py to handle this

when you’ve received the objects back, your Part looks something like this

  {
    "id": "f9ac9962aa45c8e7a9b2e24a82536d18",
    "speckle_type": "GluLamb.JointPart",
    "totalChildrenCount": 0,
    "name": "jp1",
    "units": "m",
    "applicationId": null
  }

it has no knowledge of any Joints or Elements.

what _inject_self_into_children is doing is adding a reference to the parent onto each child when the children are requested. so eg when getting the Parts on a Joint, each Part will have that Joint injected into it. however, each Part in a Joint has no awareness of any Elements.

same with each Part in Element - each Part gets the parent Element injected into it, but has no awareness of any Joints.

this is because on deserialization you’re creating objects from strings, so even though you conceptually have the same Part inside Joint and Element, they do not point to the same thing in memory: mutating the Part in Joint will not also mutate the Part in Element. this is also why they fail the identity check - they are not the same object in memory. i don’t know if there’s currently a way around this on the serialisation side.
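A small stand-alone illustration of that effect: reading the same JSON twice builds two independent objects that share an id but nothing else. `Part` and `from_json` are hypothetical; `Part` defines no `__eq__`, so `==` would fall back to identity as well.

```python
import json
from typing import Any, Dict


class Part:
    """Hypothetical minimal stand-in for a received JointPart."""

    def __init__(self, obj_id: str, name: str) -> None:
        self.id = obj_id
        self.name = name


def from_json(text: str) -> Part:
    data: Dict[str, Any] = json.loads(text)
    return Part(data["id"], data["name"])  # a brand-new object every call


payload = json.dumps({"id": "f9ac9962aa45c8e7a9b2e24a82536d18", "name": "jp1"})
part_in_joint = from_json(payload)    # reached via the Joint
part_in_element = from_json(payload)  # reached via the Element

print(part_in_joint.id == part_in_element.id)  # True: same id
print(part_in_joint is part_in_element)        # False: two separate objects
part_in_joint.name = "renamed"
print(part_in_element.name)                    # jp1: the change is not shared
```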

I think it might be more helpful to take a step back and understand what it is you want to do and determine if this really is the optimal data structure.

maybe you need some additional helper methods on your objects to construct one of these object types, rather than trying to rely only on serialisation to get what you want?

or maybe one of these object types doesn’t actually need to be an object and can just be a tag / property on the others?

TomSvilans · 21 June 2022 19:53

Righto. I think to completely reconstruct the objects, I’d need to aggregate the two Part objects together, probably by holding the “one, true” Part in a dictionary with the ID as the key or something. Then replace the instances of the fragmented Part objects with the true one.
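A sketch of that aggregation pass, assuming each received fragment has only one of its two links set and that fragments of the same part share an `id`. `JointPart` and `merge_fragments` are hypothetical, and the links are plain strings here just to keep it short.

```python
from typing import Dict, List, Optional


class JointPart:
    """Hypothetical received part; each fragment knows only one side of the link."""

    def __init__(self, obj_id: str, element: Optional[str] = None, joint: Optional[str] = None) -> None:
        self.id = obj_id
        self.element = element
        self.joint = joint


def merge_fragments(groups: List[List[JointPart]]) -> Dict[str, JointPart]:
    """Keep one canonical part per id, fill its missing links from the other
    fragments, and replace every fragment in the lists with the canonical one."""
    canonical: Dict[str, JointPart] = {}
    for parts in groups:
        for i, frag in enumerate(parts):
            true_part = canonical.setdefault(frag.id, frag)
            if true_part is frag:
                continue
            if true_part.element is None:
                true_part.element = frag.element
            if true_part.joint is None:
                true_part.joint = frag.joint
            parts[i] = true_part  # swap the fragment out in place
    return canonical


element_parts = [JointPart("abc", element="E1")]  # as found under an Element
joint_parts = [JointPart("abc", joint="J1")]      # as found under a Joint
table = merge_fragments([element_parts, joint_parts])
print(table["abc"].element, table["abc"].joint)  # E1 J1
```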

You’re right, there probably is a better way to structure the data for serialization / transport. But in that respect this is exactly an attempt to see where the tricky stuff is for serializing such a data structure… In native code it is a very convenient structure: basically just bidirectional references between joints and elements, so it stays quite flexible… at the cost of the difficulty for serialization.

Another way might be to construct IDs for each object independently of their children - or otherwise give each object a unique tag (or a Reference object?) - serialize them independently, then deserialize them independently, and make the connections / pointers to the actual objects in a second pass.
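That two-pass idea could look roughly like this, assuming each object gets a self-generated tag (a uuid here) that does not depend on its contents. Every object is written flat with links stored as tags, and on load the objects are created first and wired up second. `Node`, `dump` and `load` are hypothetical names.

```python
import json
import uuid
from typing import Any, Dict, List


class Node:
    """Hypothetical graph object with a content-independent tag and links."""

    def __init__(self, kind: str, name: str) -> None:
        self.kind = kind  # eg "Element", "Joint", "JointPart"
        self.name = name
        self.tag = uuid.uuid4().hex
        self.links: List["Node"] = []


def dump(nodes: List[Node]) -> str:
    # pass 1: write every node flat; links become tags, so there is no nesting
    return json.dumps([
        {"kind": n.kind, "name": n.name, "tag": n.tag, "links": [m.tag for m in n.links]}
        for n in nodes
    ])


def load(text: str) -> Dict[str, Node]:
    records: List[Dict[str, Any]] = json.loads(text)
    by_tag: Dict[str, Node] = {}
    for rec in records:  # pass 2a: create each object exactly once
        node = Node(rec["kind"], rec["name"])
        node.tag = rec["tag"]
        by_tag[node.tag] = node
    for rec in records:  # pass 2b: turn tags back into real pointers
        by_tag[rec["tag"]].links = [by_tag[t] for t in rec["links"]]
    return by_tag


e, j, p = Node("Element", "e1"), Node("Joint", "j1"), Node("JointPart", "jp1")
e.links.append(p)
j.links.append(p)
p.links += [e, j]  # bidirectional, cycles included
graph = load(dump([e, j, p]))
print(graph[e.tag].links[0] is graph[j.tag].links[0])  # True: one shared part
```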

I’ll give it a crack. Thanks for the detailed feedback!

izzylys · 22 June 2022 09:53

ok, SO! update for ya - I was wrong: we actually did make a change in Core a few months ago to hold deserialized objects in memory, so that two disconnected objs with the same ID still return the same instance. this change was not propagated to specklepy (oops!) and I will add it to my to-dos - however it will not be the quickest fix, as I’ll need to rework a couple of other things - namely Brep deserialisation (some of the brep components are likely to have the same id while belonging to different breps, which can cause some nasty bugs since these carry references to their parent breps!)
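The core idea is an identity map: remember every object built during a receive, keyed by id, and hand back the cached instance whenever that id shows up again. A minimal sketch of that idea only (`Obj` and `CachingDeserializer` are hypothetical, not the C# or specklepy code):

```python
import json
from typing import Any, Dict


class Obj:
    """Hypothetical deserialized object."""

    def __init__(self, obj_id: str, data: Dict[str, Any]) -> None:
        self.id = obj_id
        self.data = data


class CachingDeserializer:
    """Build an object the first time its id is seen; reuse it afterwards."""

    def __init__(self) -> None:
        self._cache: Dict[str, Obj] = {}

    def read(self, text: str) -> Obj:
        data = json.loads(text)
        cached = self._cache.get(data["id"])
        if cached is not None:
            return cached
        obj = Obj(data["id"], data)
        self._cache[obj.id] = obj
        return obj


payload = json.dumps({"id": "f9ac9962aa45c8e7a9b2e24a82536d18", "name": "jp1"})
reader = CachingDeserializer()
a = reader.read(payload)
b = reader.read(payload)
print(a is b)  # True
```

This also shows the Brep concern: if two breps contain components with the same id, both get the same cached instance, so whichever brep injects itself last overwrites the other's parent back-reference.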

regarding what you can do right at the mo, my first instinct (as it seems the easiest solution to me lol) would be to add a unique tag eg maybe when setting a joint on a part, also add the name of the joint in the setter to a serializable prop. then you could eg have a helper method that goes through a list of elements and constructs and returns a list of joints by name each containing its list of parts.
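A sketch of the tagging idea under a few assumptions: the part keeps a serializable `joint_name` in sync whenever its runtime-only `joint` is set, and after receiving, a helper walks the elements and rebuilds the joints by name. All class and helper names are hypothetical; in a real schema `joint` would also need to be excluded from serialization (eg via serialize_ignore).

```python
from typing import Dict, List, Optional


class Joint:
    def __init__(self, name: str) -> None:
        self.name = name
        self.parts: List["JointPart"] = []


class JointPart:
    """Hypothetical part: `_joint` is runtime-only, `joint_name` is the tag that gets sent."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.joint_name: Optional[str] = None
        self._joint: Optional[Joint] = None

    @property
    def joint(self) -> Optional[Joint]:
        return self._joint

    @joint.setter
    def joint(self, value: Optional[Joint]) -> None:
        self._joint = value
        self.joint_name = value.name if value is not None else None  # keep the tag in sync


class Element:
    def __init__(self, name: str, parts: Optional[List[JointPart]] = None) -> None:
        self.name = name
        self.parts = parts if parts is not None else []


def joints_from_elements(elements: List[Element]) -> List[Joint]:
    """Rebuild joints after receiving by grouping parts under their joint_name."""
    joints: Dict[str, Joint] = {}
    for element in elements:
        for part in element.parts:
            if part.joint_name is None:
                continue
            joint = joints.get(part.joint_name)
            if joint is None:
                joint = joints[part.joint_name] = Joint(part.joint_name)
            joint.parts.append(part)
            part.joint = joint  # restore the runtime link
    return list(joints.values())


# simulate received parts: the tag survived, the link did not
p1, p2 = JointPart("jp1"), JointPart("jp2")
p1.joint_name = p2.joint_name = "j1"
joints = joints_from_elements([Element("e1", [p1]), Element("e2", [p2])])
print([(j.name, [p.name for p in j.parts]) for j in joints])  # [('j1', ['jp1', 'jp2'])]
```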

btw I think it would help to add __repr__ methods to the classes to help you better visualise what’s inside them. i’ve made some edits to the classes and the create method that will let you see the properties just by printing (zip attached).

also remember that if you don’t have an __init__ method, you shouldn’t set class attributes with mutable defaults (eg empty lists) because you’ll run the risk of mutating a class attribute rather than the intended instance attribute (see here). either only use None as the default and initialise it in the __init__ method or always re-set the val on a property setter.
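The pitfall in a nutshell, with hypothetical class names: a list assigned at class level is a single object shared by every instance, while a list created in `__init__` (with None as the default) belongs to one instance.

```python
from typing import List, Optional


class BuggyElement:
    joints: List[str] = []  # one list object shared by all instances


class Element:
    def __init__(self, joints: Optional[List[str]] = None) -> None:
        # None as the default, then a fresh list per instance
        self.joints = joints if joints is not None else []


a, b = BuggyElement(), BuggyElement()
a.joints.append("j1")
print(b.joints)  # ['j1']: b sees a's change because it is the same list

c, d = Element(), Element()
c.joints.append("j1")
print(d.joints)  # []
```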

glulamb_speckle_izzy.zip (29.7 KB)

TomSvilans · 22 June 2022 13:30

Thanks, Izzy! That sounds great, and obviously no rush from my side on the specklepy changes, it’s not mission-critical at all

The tagging idea sounds like a good method, and I was thinking about something like this or self-generated IDs to tell what is what afterwards anyway. I’ll give it a whirl, and perhaps also see if I can implement these schemas in .NET and see if the serialization there is any different.

I was wondering about the __repr__ methods… I implemented them on a couple of the classes, but was also wondering if they in any way affect the hashing of an object? Or is it purely for display purposes?

I did run into the problem of modifying a class list instead of an instance one. It led to a lot of confusion, as different elements’ joints properties were showing as equal, and adding something to one list added it to all the lists (obviously, since it was the same list).

izzylys · 22 June 2022 14:13

sure thing - let me know how it goes! and i’ll update ya when I am able to implement the serialization update in specklepy! and yes, as of the update to BaseObjectSerializerV2 in C#, you will get the expected behaviour you’re looking for if you implement this in .NET

__repr__ is a special python “dunder” method that any class can implement to override the string representation of its objects (eg what you see when you print the object). any class (speckle or not!) can implement this and it won’t affect any speckle stuff at all.

it’s really useful for cases like this where you want to pull up the important attributes. I implemented them on each of your classes to display the properties we’re talking about in a condensed way so you don’t need to do any manual traversal and printing to debug this.
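A minimal example of a condensed `__repr__` on a plain Python class (the class and its fields are hypothetical). It only builds a display string, so it leaves the object's data untouched; `print` uses it because the default `__str__` falls back to `__repr__`, and containers such as lists show their items with it too.

```python
from typing import Optional


class JointPart:
    def __init__(self, name: str, element: Optional[str] = None, joint: Optional[str] = None) -> None:
        self.name = name
        self.element = element
        self.joint = joint

    def __repr__(self) -> str:
        # display only: pick out the attributes worth seeing while debugging
        return f"JointPart(name={self.name!r}, element={self.element!r}, joint={self.joint!r})"


print(JointPart("jp1", element="e1"))
# JointPart(name='jp1', element='e1', joint=None)
print([JointPart("jp2")])
# [JointPart(name='jp2', element=None, joint=None)]
```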

hehe yes I noticed this issue in your object model which is why I brought it up! it’s a pretty common mistake in python hehe. I fixed this as well in the previously attached files to avoid this issue