GraphQL Queries - Speckle Object Structure Vibe Check

cdriesler.iv · 8 July 2023 19:17

I’ve been playing with the GraphQL API all afternoon and am in love. I’d like to say what I’m planning to do out loud in order to make sure I’m understanding the system’s constraints right and see if you experts would go about it in the same way.

TLDR: How might I structure deeply nested data with dynamic keys if I want to be able to perform queries by values at multiple depths?

Existing Conditions and Problems

I am producing dozens (hundreds!) of Grasshopper data trees per script execution. By default, they’re organized as a set of nested dictionaries that look something like this:

{
  [nodeInstanceId: string]: { // grasshopper component id
    [portInstanceId: string]: { // grasshopper parameter id
      [branchPath: string]: { // grasshopper data tree path like "{0;0;1}"
        type: string // grasshopper goo type like "curve"
        description: string // value summary like "Trimmed Surface"
        geometry: unknown // result from `ConvertToSpeckle()`
      }[]
    }
  }
}

This is a comfortable way for me to group and then access the data throughout the app. Incredibly conveniently, writing it as one object to my speckle stream is enough to load it into the speckle viewer and see all of values in all those deeply-nested geometry properties.

Even while testing locally, I’m immediately running into size and performance issues, though. In the current “solve” loop, I:

Send down a grasshopper document as json
Convert it into a gh document with rhino compute
Solve the document and format the values like above
Write the result to a speckle stream
Return the result as json, so the client can use it and the speckle viewer can load the objects

So, there’s a few immediate bottlenecks:

(1) I will very quickly hit assorted limits for JSON response size if I keep trying to serialize all those speckle objects.
(2) I am “downloading” the solution once via the response, and then redundantly again when calling viewer.loadObject().

I want to be smarter about using Speckle’s capabilities, though. It feels wrong to send the stream over JSON when I have such great tools available for querying what’s in the stream after a commit.

Requirements

At the moment, all I need to do is:

(1) Provide the speckle viewer an object id to load from the most recent commit.
(2) Provide the client the type and description values of a result. Not the speckle geometry.

The problems I’m facing above are a result of doing the simplest available thing. Which has been good. But it’s redundant and not suitable for anything more than the smallest and simplest of scripts.

Alternatives Attempted

I could add an endpoint that reads the whole stream and strips out the JSON data in C# or javascript land, but then I have to load the entire stream every request. This could do some caching, but do we ever really want to deal with caching? At the very least it seems premature and is definitely a slow solution.

I also tried to flag the geometry properties as [JsonIgnore]. This successfully chopped the response payload down to size, but it also prevented the values from being written to the stream. If there is a way to conditionally [JsonIgnore] in a way that plays nice with speckle serialization, I would definitely reach for that first!

Solution

I’d like to make heavy use of the query and select and orderBy arguments when querying a stream object’s children. It should simplify my life, reduce the size of my payloads, and allow myself to defer loading.

Assume, to address requirement 1, I only return the stream object id once a solution has been committed. This is enough to begin loading the geometry in the viewer. It’s also enough to begin to construct a GraphQL query for some subset of solution values. The client knows every node/port instance id in play.

If the solution object is one massive item formatted like I described above, then it appears that I can’t do a complex transformation of the result to omit those deeply nested geometry values or fetch specific properties. Every id is dynamic, and it appears that the query and select operations can only work with top-level properties of the given object.

So, it seems like I need to restructure the shape of my solutions to make them more speckle-y.

All of that context is for my only real direct question: does this set of structures make sense?

// Top level container. The first object we load or query by id.
class DocumentSolutionData : Base {

  public string Id;

  [DetachProperty]
  // Flat list of solution data per-port (param)
  public List<PortSolutionData> PortSolutionData;

}

class PortSolutionData : Base {
  
  public string NodeInstanceId;

  public string PortInstanceId;

  [DetachProperty]
  // Solution values, first grouped by gh branch ("{0;0;1}")
  public List<DataTreeBranch> SolutionDataTree;

}

class DataTreeBranch : Base {

  // Assert order, since path strings are not always fetched alphabetically
  public int BranchOrder;

  public int BranchPath;

  [DetachProperty]
  // Individual values within the data tree branch
  public List<DataTreeValue> DataTreeValues;

}

class DataTreeValue : Base {

  // Needed by client
  public string Type;

  // Needed by client
  public string Description;
  
  // NOT needed by client. v beeg
  public Base Geometry;

}

The docs tend to refer to Detach as an operation for sharing references to the same object and reducing duplicate writes. But, for me, it appears to be the way to open up the ability to query lists of children objects. If I’ve understood the structure and the API correctly, this would allow me to do queries like:

stream(streamId)
  object(solutionId) # DocumentSolutionData
    children(query) # PortSolutionData: optionally query by node or port id, or get many at once
      objects
        children(orderBy) # DataTreeBranch: preserve branch order
          objects
            children(select: ["Type", "Description"]) # DataTreeValue: get subset of values
              objects {
                data # The thing I actually want
              }

The access to queries and cursors on the port, branch, and value list levels appears to be super powerful here. As solution sets grow, I can paginate at the appropriate level. Or, if I know exactly the values I want, I can fetch it directly instead of loading the whole solution.

The reason I’m asking for a vibe check is because I came across two things that lowered my confidence a bit that I was understanding things correctly:

Nothing ever seemed to talk about Detach as a tool for allowing complex queries.
When testing, I found myself having to do a lot of speckle_type = Type.Expected.Here with C#-style namespace-y types. I’d see values I didn’t expect to see at that “depth” of the object.
The docs said that any usage of query and orderBy is expensive, but it was hard to tell if it would be more expensive than trying to load a full stream all the time.

Thanks in advance for any guidance. That turned into a bit of an essay. Most important thing for me to say is that I’m loving Speckle even more the deeper I dive into things. Everything seems possible and I can’t wait to show off the kickflips this thing can do now that I’m using your tech.

jonathon · 8 July 2023 19:57

Given a “node” at any particular depth of the data tree, are you looking to query by criteria that individual node but retrieve its entire path, or is the node itself enough? the same question for any arbitrary set of nodes fulfilling said criteria.
Detaching is described in principle in terms of the reusability of data because that is the most general concept to grasp. At the core, you are right; any detached object is separately serialized and, therefore, can be individually addressed and … searched for and retrieved.
TBD, which is the most expensive, you’ll still need to commit the whole unless you have something incredibly smart to handle keeping track and not zombifying singleton objects.

By the time SpeckleCon comes around, we’ll have you singing.

vwb · 9 July 2023 08:36

Hey Chuck! I’ve stumbled on a similar place before. In my case, I stopped looking too much to the objectId, which in my mental model became something that Speckle owns it to manage immutability and etc, and I started to make more use of the applicationId, which is something that my application can own to control objects across multiple commits. You might have more flexibility going this route.

cdriesler.iv · 9 July 2023 11:38

To start, I expect to only need the node’s own object’s info asap (node id, port id, info about results but not the results). I can defer loading the rest of the tree below it (the branches of actual solution values) and will be able to ask for those “by name” when the given node/port needs to see them.
Roger, ty for clarifying!
Definitely not in a place to try being clever yet, but looking forward to seeing how this goes. Will probably be back once I’m grappling with more Real Data.

Absolutely can’t wait to fanboy talk about this stuff at SpeckleCon. Hopeful I can start putting it in front of people again much sooner. (and ty!)

cdriesler.iv · 9 July 2023 11:39

I saw that coming through and didn’t realize that’s what it was! Thank you, will start thinking about things this way and give it a try.

dimitrie · 9 July 2023 20:45

I’d really need to understand better the end-user behaviour you’re looking to facilitate to properly give you sane answers! For example, it’s much faster to load the whole tree in the viewer (or sub parts of a tree, as defined by detachability) than to load individual small objects.

Server side queries are fast via the GQL api, but if you’re looking for fast raw access to data and you’re doing 1000s of them, they become slow in aggregate because of the overhead incurred. Another example, FE1 uses gql queries for each “unfold” click in the object explorer - leaner memory wise, fetches the data when needed, less dom elements etc. So if those queries are user interaction driven, it’s perfect! If not, and you end doing many of them… you might want to have the whole thing in memory, where it will be faster.

I think so. We’re happy to help along the way. One small vibe check from my end: DataTreeValues of DataTreeBranch, and DataTreeBranch itself look similarly to our collection classes. If you’re interested in compatibility with the mainline frontend2 viewer, ping. Not sure it is (or should be even) on the agenda.

It fundamentally is that: it basically tells Speckle that that specific object becomes a database row, and this opens up things to the magic of db-bound queries in the case of the speckle server. I probably wrote the docs with that angle because at the time was getting “but do you diff” questions. @jonathon said it better.

This is a bit weird. We do produce by side effects extra values (like chunks), but i’d be keen to know more.

cdriesler.iv · 10 July 2023 19:40

I’d really need to understand better the end-user behaviour you’re looking to facilitate to properly give you sane answers!

Totally fair. At the moment, I’m looking as the display of three particular things:

I want to see a 3D model of the results!
I need to keep wire style up to date with the latest data tree structure (single, list, tree) asap. This does not require loading all of the values, but I at least need “metadata” about the tree.
When a user hovers over a param, I want to show the values preview. This can be deferred until that interaction.

The FE1 unfold is a good example to measure against. I’ll draw my battle lines based on where the interactions are and where the always-and-a-lot lines are. Exactly the kind of feedback I was after, ty.

If you’re interested in compatibility with the mainline frontend2 viewer, ping.

This is the first I’m hearing about a new viewer. The current one is solving all my problems beautifully but you might hear from me about this later. I also haven’t come across the Collections class yet, will see what that’s about.

It fundamentally is that: it basically tells Speckle that that specific object becomes a database row, and this opens up things to the magic of db-bound queries in the case of the speckle server.

Understood! To be clear I didn’t mean anything as a critique of the docs. It just seemed like I might have missed a better way to what I was after.

This is a bit weird. We do produce by side effects extra values (like chunks), but i’d be keen to know more.

If I see it again I’ll be back with more specific examples.

Mountains of thanks for the guidance here.

cdriesler.iv · 13 July 2023 16:24

@dimitrie I’m back with more info about “unexpected values” found in children: it appears to just be children-of-children.

So, say I have an object with a detached property List<T>, and then each of those children have a detached property List<T>. The children gql query on the top-level object appears to be returning everything all the way down the tree that it can find. Does that track? I had expected (for no reason, really) that children could only return that first “level” of detached values.

If it was critical, it looks like I could limit things to the expected type with a query like speckle_type = Name.Of.Type, but this smells like I may be using the concept incorrectly? In my case, I will always get data of the same (known) shape at each depth, but, if I understand correctly, this isn’t something speckle enforces?

dimitrie · 13 July 2023 16:53

Ah, i see - it’s actually by design that I’m returning the full “all the way” down list of children. That’s because, on deserialisation in things like .NET, you can’t leave fields hanging (e.g., you cannot partially deserialize an object - you need its full tree there).

But you’re in luck. Have you checked the depth parameter of the query? In theory, if you pass in depth=1 we will stop earlier.

Also, do note, this is something i’d also like to deprecate

I do not guarantee that the depth is correctly respected in every case as it’s the duty of the data producer to not take shortcuts here. For example, in the case of ifc imports, I am taking shortcuts and the depth of every child is something bogus (i don’t remember, but maybe 0?).

If you’re using Core, you should be fine though.

cdriesler.iv · 13 July 2023 17:30

Stellar stuff. Makes good sense, and depth did it (totally missed it). Thank you! Won’t lean on it too hard, though, given that it’s on its way out. Feeling much more caught up.

cdriesler.iv · 23 July 2023 22:19

Where things ended up, for the curious: C# flavor / ts flavor

Really impressed by the flexibility here and looking forward to putting it through its paces. One query for the “top-level” metadata I need, then one query for specific values arbitrarily deep in the tree. That I can paginate. luv u [DetachProperty]

dimitrie · 24 July 2023 08:37

We should put this on our next set of swag stickers/t-shirts/etc.

cdriesler.iv · 24 July 2023 18:20