I’ve been playing with the GraphQL API all afternoon and am in love. I’d like to talk through my plan out loud, to make sure I’m understanding the system’s constraints right and to see if you experts would go about it the same way.
TLDR: How might I structure deeply nested data with dynamic keys if I want to be able to perform queries by values at multiple depths?
Existing Conditions and Problems
I am producing dozens (hundreds!) of Grasshopper data trees per script execution. By default, they’re organized as a set of nested dictionaries that look something like this:
{
  [nodeInstanceId: string]: {   // grasshopper component id
    [portInstanceId: string]: { // grasshopper parameter id
      [branchPath: string]: {   // grasshopper data tree path like "{0;0;1}"
        type: string            // grasshopper goo type like "curve"
        description: string     // value summary like "Trimmed Surface"
        geometry: unknown       // result from `ConvertToSpeckle()`
      }[]
    }
  }
}
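For concreteness, here is a minimal instance of that shape in TypeScript. Every id and value below is invented purely for illustration:

```typescript
// A hypothetical one-node, one-port, one-branch solution in the nested shape.
// All ids and values are made up for illustration.
type DataTreeValue = { type: string; description: string; geometry: unknown };
type NestedSolution = Record<string, Record<string, Record<string, DataTreeValue[]>>>;

const solution: NestedSolution = {
  "node-123": {      // grasshopper component id
    "port-abc": {    // grasshopper parameter id
      "{0;0;1}": [   // grasshopper data tree path
        { type: "curve", description: "Trimmed Surface", geometry: null },
      ],
    },
  },
};

// Accessing a value is a straight dictionary walk:
const first = solution["node-123"]["port-abc"]["{0;0;1}"][0];
```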
This is a comfortable way for me to group and then access the data throughout the app. Incredibly conveniently, writing it as one object to my speckle stream is enough to load it into the speckle viewer and see all of the values in all those deeply nested `geometry` properties.
Even while testing locally, I’m immediately running into size and performance issues, though. In the current “solve” loop, I:
- Send down a grasshopper document as json
- Convert it into a gh document with rhino compute
- Solve the document and format the values like above
- Write the result to a speckle stream
- Return the result as json, so the client can use it and the speckle viewer can load the objects
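As a sketch, that loop looks something like the following. Every function here is a made-up, synchronous stand-in for the real rhino compute and speckle calls, just to show where the payload problem lives:

```typescript
// Hypothetical stand-ins; the real calls go to rhino compute and speckle.
type Solution = Record<string, unknown>;

const toGhDocument = (json: string) => ({ json });                 // convert json -> gh document
const solveDocument = (doc: { json: string }): Solution => ({});   // solve + format as the nested dict
const writeToStream = (solution: Solution): string => "commit-id"; // write to a speckle stream

function solve(documentJson: string) {
  const nestedSolution = solveDocument(toGhDocument(documentJson));
  const commitId = writeToStream(nestedSolution);
  // Returning the solution whole is the bottleneck: the entire result,
  // geometry included, gets serialized into the JSON response.
  return { commitId, solution: nestedSolution };
}
```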
So, there’s a few immediate bottlenecks:
(1) I will very quickly hit assorted limits for JSON response size if I keep trying to serialize all those speckle objects.
(2) I am “downloading” the solution once via the response, and then redundantly again when calling `viewer.loadObject()`.
I want to be smarter about using Speckle’s capabilities, though. It feels wrong to send the stream over JSON when I have such great tools available for querying what’s in the stream after a commit.
Requirements
At the moment, all I need to do is:
(1) Provide the speckle viewer an object id to load from the most recent commit.
(2) Provide the client the `type` and `description` values of a result, not the speckle `geometry`.
The problems I’m facing above are a result of doing the simplest available thing. Which has been good. But it’s redundant and not suitable for anything more than the smallest and simplest of scripts.
Alternatives Attempted
I could add an endpoint that reads the whole stream and strips out the JSON data in C# or javascript land, but then I have to load the entire stream on every request. I could add some caching, but do we ever really want to deal with caching? At the very least it seems premature, and it’s definitely a slow solution.
I also tried flagging the `geometry` properties as `[JsonIgnore]`. This successfully chopped the response payload down to size, but it also prevented the values from being written to the stream. If there is a way to conditionally `[JsonIgnore]` in a way that plays nice with speckle serialization, I would definitely reach for that first!
Solution
I’d like to make heavy use of the `query`, `select`, and `orderBy` arguments when querying a stream object’s `children`. It should simplify my life, reduce the size of my payloads, and allow me to defer loading.
Assume, to address requirement 1, I only return the stream object id once a solution has been committed. This is enough to begin loading the geometry in the viewer. It’s also enough to begin to construct a GraphQL query for some subset of solution values. The client knows every node/port instance id in play.
If the solution object is one massive item formatted like I described above, then it appears that I can’t do a complex transformation of the result to omit those deeply nested `geometry` values or fetch specific properties. Every id is dynamic, and it appears that the `query` and `select` operations can only work with top-level properties of the given object.
So, it seems like I need to restructure the shape of my solutions to make them more speckle-y.
All of that context is for my only real direct question: does this set of structures make sense?
// Top level container. The first object we load or query by id.
class DocumentSolutionData : Base {
    public string Id;

    // Flat list of solution data per-port (param)
    [DetachProperty]
    public List&lt;PortSolutionData&gt; PortSolutionData;
}

class PortSolutionData : Base {
    public string NodeInstanceId;
    public string PortInstanceId;

    // Solution values, first grouped by gh branch ("{0;0;1}")
    [DetachProperty]
    public List&lt;DataTreeBranch&gt; SolutionDataTree;
}

class DataTreeBranch : Base {
    // Assert order, since path strings are not always fetched alphabetically
    public int BranchOrder;
    public string BranchPath; // path string like "{0;0;1}"

    // Individual values within the data tree branch
    [DetachProperty]
    public List&lt;DataTreeValue&gt; DataTreeValues;
}

class DataTreeValue : Base {
    // Needed by client
    public string Type;
    // Needed by client
    public string Description;
    // NOT needed by client. v beeg
    public Base Geometry;
}
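To sanity-check the restructuring, here’s a TypeScript sketch of flattening the nested dictionary from earlier into this per-port shape. The type names mirror the C# classes above; the flattening logic itself is my assumption about how they’d be populated:

```typescript
// Plain-object mirror of the C# classes above, plus a flattening pass.
type NestedValue = { type: string; description: string; geometry: unknown };
type NestedSolution = Record<string, Record<string, Record<string, NestedValue[]>>>;

type DataTreeValue = { Type: string; Description: string; Geometry: unknown };
type DataTreeBranch = { BranchOrder: number; BranchPath: string; DataTreeValues: DataTreeValue[] };
type PortSolutionData = { NodeInstanceId: string; PortInstanceId: string; SolutionDataTree: DataTreeBranch[] };

function flatten(nested: NestedSolution): PortSolutionData[] {
  const ports: PortSolutionData[] = [];
  for (const [nodeId, portsById] of Object.entries(nested)) {
    for (const [portId, branches] of Object.entries(portsById)) {
      ports.push({
        NodeInstanceId: nodeId,
        PortInstanceId: portId,
        SolutionDataTree: Object.entries(branches).map(([path, values], i) => ({
          BranchOrder: i, // record encounter order explicitly
          BranchPath: path,
          DataTreeValues: values.map(v => ({
            Type: v.type,
            Description: v.description,
            Geometry: v.geometry,
          })),
        })),
      });
    }
  }
  return ports;
}
```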
The docs tend to refer to `Detach` as an operation for sharing references to the same object and reducing duplicate writes. But, for me, it appears to be the way to open up the ability to query lists of children objects. If I’ve understood the structure and the API correctly, this would allow me to do queries like:
stream(streamId) {
  object(solutionId) {                                  # DocumentSolutionData
    children(query) {                                   # PortSolutionData: optionally query by node or port id, or get many at once
      objects {
        children(orderBy) {                             # DataTreeBranch: preserve branch order
          objects {
            children(select: ["Type", "Description"]) { # DataTreeValue: get subset of values
              objects {
                data                                    # The thing I actually want
              }
            }
          }
        }
      }
    }
  }
}
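On the client, I’d expect to fire that as a plain GraphQL POST. A sketch of the first hop (the `{ field, operator, value }` argument shape and the field names are my assumptions about the server schema, not verified against it):

```typescript
// Builds the children query for one port's values. Argument shapes and
// field names here are assumptions; check them against the server schema.
function buildSolutionQuery(streamId: string, solutionId: string, nodeInstanceId: string): string {
  return `query {
    stream(id: "${streamId}") {
      object(id: "${solutionId}") {
        children(query: [{ field: "NodeInstanceId", operator: "=", value: "${nodeInstanceId}" }]) {
          objects { data }
        }
      }
    }
  }`;
}

// Sending it would be a plain POST to the server's /graphql endpoint, e.g.:
// fetch(`${serverUrl}/graphql`, {
//   method: "POST",
//   headers: { "Content-Type": "application/json", Authorization: `Bearer ${token}` },
//   body: JSON.stringify({ query: buildSolutionQuery(streamId, solutionId, nodeId) }),
// });

const q = buildSolutionQuery("stream-1", "obj-1", "node-123");
```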
The access to queries and cursors on the port, branch, and value list levels appears to be super powerful here. As solution sets grow, I can paginate at the appropriate level. Or, if I know exactly the values I want, I can fetch them directly instead of loading the whole solution.
The reason I’m asking for a vibe check is because I came across two things that lowered my confidence a bit that I was understanding things correctly:
- Nothing ever seemed to talk about `Detach` as a tool for allowing complex queries.
- When testing, I found myself having to do a lot of `speckle_type = Type.Expected.Here` with C#-style namespace-y types. I’d see values I didn’t expect to see at that “depth” of the object.
- The docs said that any usage of `query` and `orderBy` is expensive, but it was hard to tell if it would be more expensive than trying to load a full stream all the time.
Thanks in advance for any guidance. That turned into a bit of an essay. Most important thing for me to say is that I’m loving Speckle even more the deeper I dive into things. Everything seems possible and I can’t wait to show off the kickflips this thing can do now that I’m using your tech.