Let’s Talk Null: Because Sometimes Nothing Really Matters

jonathon · 26 August 2024 14:33

As we gear up for the next generation of connectors, we’re rethinking how we handle data—and one key aspect of that is how we deal with null values.

Speckle is rapidly becoming the go-to AEC data hub, and with that growth comes an ever-increasing volume of data being sent and received. While storing all that data is no problem, we all know that no one wants to wait hours while their data transfers—especially when a chunk of that data might be of questionable value.

At Speckle, we’re walking a tightrope between being purists and taking a more opinionated stance. The purist approach says, “Send everything, because there might be a needle in that haystack for someone.” The opinionated view says, “Let’s just send the needles we think matter.”

Here’s where we need your @Insiders input:

Who and What Should We Prioritise?
- Which data types are essential to your workflows? Should we focus on specific types or use cases where completeness is crucial?
Is an Incomplete Data Picture Acceptable?
- If cutting out some data could save time, is that a fair trade-off? Where do you draw the line between speed and completeness?
Workflows That Rely on Nulls:
- Some workflows—especially in web apps or data visualisations—depend on null values to keep things like table structures or schemas intact. Which of your workflows need these nulls, and how critical are they?
Sending Low-Value Data:
- Take the readOnly status of a Revit parameter—if it’s not used in your analysis, should we even bother sending it? What other low-value data can we skip to save time and space?

Why Does This Matter?

Some eye-opening stats from our recent analyses:

Example 1: A recent 450MB Revit model with comprehensive parameter data results in a 8GB sends. By excluding non-essential null values and redundant parameters, we reduced the size much closer to the original size, a 60-85% reduction that translates to significantly faster transfer times.
Example 2: In a complex project send, null values (including metadata noise) constituted 35% of the total data payload. Filtering out these unnecessary nulls reduced the data transfer time from 1 hour to 15 minutes.
Counter Example 3: Web application developers have reported that maintaining null values for specific fields ensured consistent data schemas for validation but admitted that many nulls were superfluous, leading to bloated datasets and slower app performance without cleansing on receive.

Let’s Get This Right—Together

We’re committed to building not just functional but smart connectors—prioritising what’s truly important without bogging down your processes with unnecessary data. But to do that, we need your insights.

What’s your view on the future of nulls in Speckle? How should we balance the need for completeness with the practicalities of data size and transmission time?

Share your thoughts below—your feedback will help shape the future of how we handle data in Speckle.

jonathon · 26 August 2024 14:45

jonathon · 26 August 2024 14:47

@fmeijer @samberger @Eleron96 @dimitrie

ashdotio · 26 August 2024 15:15

Hi there,

What I’m thinking is that having data schemas for different purposes could make sense. We already have practices that require specific parameters to achieve their goals—those could fall under this kind of optimization. It seems like there could be also both generic and customized approaches, with some being more defined and others more flexible. I haven’t really fleshed this out, though. It’s kind of like I’m trying to piece together a puzzle but only have a few pieces right now. There might be some pitfalls in viewing it this way, but at the same time, it feels like it’s part of the foundation for how ORMs work fast(?) nowadays.

Just some thoughts—still working it out!

dimitrie · 26 August 2024 15:30

From where i’m looking, there are valid workflows that require sending nulls across, eg:

BIM: eg, revit > excel > edit params in exel > send to revit > update params in revit)
BIM-adjacent: grasshopper driven ones - memory is vague here, input needed!
others?

There are also endless reasons why not to send null params/values across, as described above.

Where I’m slowly landing is that supporting those valid workflows is probably done in a better way than blindly sending null params out. For us to understand that, we’ll need more input from y’all, especially around specific workflows. The more specific you can get, the more speckle points we’ll send your way

(do not ask me what speckle points are, i’ve just made them up!)

teocomi · 26 August 2024 15:39

Just like our connectors allow you to selectively decide what geometry/elements to publish, they could also let you determine what properties to include.

Several people asked about excluding confidential properties in the past.

Although I don’t know how this could be done, in a simple and straightforward way that doesn’t look like IFC madness…

haitheredavid · 26 August 2024 16:23

Woof, this is a tough one! But if it gets me a few more speckle points, I’ll throw some ideas out there.

My first thought is around where this type of interaction would happen? From the developer side I could see a clear benefit of controlling schemas like you would with graphql.

type SpecklePoint {
     Recipient: String!
}

But realistically, I imagine the most important interaction for this type of logic would be from the user as they should know which data is necessary to ship around. I imagine there is a default set of parameters that are always pushed with certain objects but then the users can select certain properties ,set their nullability, then use that list as a filter to grab and convert those props. That does sound like a few extra steps and level of complexity that most users don’t care to know about. Just thinking about the first example, which really hits home, AEC expects to push everything in their models with very few having a slight interests into what makes a heavy model. I would imagine that most users would rather have speed over completeness, or at least to start.

My second thought is how a user will need to be informed on something being null? For example, I remember we had a script jumping between architects and landscape teams and were file types of rhino/revit/autocad. Landscape needed to have the reference point for the entry way from the architects and the architects needed the survey points to verify the orientation or some shit like that. For the script to work we created a point at 0,0,0 while the input was null, and once they passed it in it just worked . With that lil hack we bypassed informing the user on the null input and let the script run, which ended up in moments of confusion when the user didn’t realize something was missing.

zoomer · 26 August 2024 17:59

I am just a normal Mac, CAD, BIM and 3D user.
So very likely I have no real clue about the Nulls you are discussing here.

But for me this sounds like a good idea.
Once one of my CAD/BIM Apps get finally a Connector, I would
exchange between my CAD/BIM and Blender. Later maybe to
Unreal or Twinmotion. Or from client’s RVTs to Blender.
Interested in Geometry, Materials, Hierarchies, Saved Views
and such things.

So if my Connector had options to skip certain data types like
Block Annotations or other things that I don’t think to need or
don’t even understand - and my Models seem still complete for
my purposes - but I can save bandwidth for Speckle this way -
I would do so.

samberger · 26 August 2024 22:29

Yes, interesting topic!

Before I might come up with a more sophisticated opinion, I already wanted to add some thought to this topic…

If there ever will be a choice on which parameters to upload and which to spare, I probably would prefer to NOT do that in the connector, but rather as a seetting of the model/branch/stream (online). The admins/BIM managers/project leaders probably have the best overview of what kind of data of the model will be subsequently consumed through apps or Speckle Automate. The (different) colleagues doing the actual upload might not. Of course, this also depends on the number of people and hierarchies and departments involved.
… This might even be the same case for what categories to upload??? But I think this one deserves indeed more flexibility.
… Maybe upload everything, but decide “locally” what to download?
reverse direction: Sending null or nothing might not change the amount of content, but is about the data structure of the model… This gets especially relevant if you subsequently want to modify the model, which I assume will be a more and more relevant usecase for Speckle? For instance in our Parameters App, we need to know which parameters exist in the model/element and which not. This again, depend on the source software: Revit does have a structure / rules about which parameters apply to which elements/categories, and importing parameters back to the model will not be able to assign those parameters (and that’s good). Rhino on the other hand is completely free, so here I wouldn’t really care which (empty) parameter is originally assigned to the model. If I need it, I create it.
What does null even mean? Another thing I ran into with our Parameter App: Existing but empty parameters are currently uploaded as null. So, sending data back, I assumed that “sending null” in return means “clear value”, but actually this parameter is skipped and remains unchanged in the model. Beside the fact, that apparently the Revit API is not able to reset a parameter once it had a value, this shows: sending a parameter with value null is not the same as not sending it at all. So leaving defined, but empty parameters we will loose some information.
… The last one might be irrelevant for 90% of the intended workflow, but also means you get stuck with your workflow in 10% of the cases / or on the last 10% of the workflow. That’s actually my problem with many BIM apps, that you can do something, but in the end there is always something, which doesn’t let you go the entire way and you have to switch software. So I wouldn’t like that approach.

Speed doesn’t replace completeness, if it prevents you from reaching your goal at all? Does it?

So actually… I might have my opinion now: Either the users can decide for their use case on the “global” branch level (but as @teocomi said, this might be hard to define and manage), or the full model should be uploaded, because (I think) nobody would ever be able to decide which data might be “relevant” for the user and which not.

But, depending on the source application, you still might be able to reduce the sent data. I’m not so sure about the other softwares, but for Revit you could define the parameters for the element categories (mirroring the rules Revit has), and assigning parameter values only where they exist. For Rhino you can just define and assign where they exist, because there are no rules about which parameters apply to which category/layer etc. … BUT it would make it more complex to work with and every Speckle model would be even more different and more source application specific.
Maybe have a list of parameter definitions (there will always be a lot of repetition) and each element just has: “parameterID” : { “value”: value, “definitionID”: UUID } ?
Maybe just leave any parameter object property (isTypeParameter, value, applicationUnit, applicationId, isShared, isReadonly, totalChildrenCount) which is 0, null or false? Besides that totalChildrenCount, applicationId is always 0/null and applicationInternalName is always the parameter objects key.

Not sure how much the last two would reduce the overall MBs but there is definitely some redundant data.

I’m sorry it’s too late to re-read my own post, so I hope I didn’t spam too much.

chris.welch · 26 August 2024 22:52

Potentially some of this could be solved by uploading the schema for each data table once and then only uploading data that differs from that schema.

So

table: { 
   "name": {"type":"string","default":""}
   "age": {"type":"int","default": -999} //as an example of a "weird AEC default"
   "hobby": {"type":"string","default":"", "optional":true}
}

with (name: "Gavin", age: -999) becoming {"name":"Gavin"}

But you could even just rehydrate the entire table from the schema on pull with a bit of client side logic, so the table can still be reconstituted.

chris.welch · 26 August 2024 23:07

Rethinking instancing in general might also be a part of this puzzle, as blocks, families, datatables, instances are all the same pot - I posted elsewhere on the forum about the amount of time spent diffing an array of instances ( Many long calls to /diff/ endpoint? - Help / Developers - Speckle Community).

Potentially, allowing an object on the server to be a list you can index into (server.url.com/projects/projectId/models/individualObjectId/key) , so you could keep the benefit of isolating individual objects, while minimising the pushing/pulling/diffing?

JoostGevaert · 27 August 2024 02:47

When I first read Jonathon’s post, this was my first thought.

Being able to filter / select what data you want to receive in Rhino/Grasshopper would’ve been handy in many cases.

rklaschka · 27 August 2024 07:22

At the risk of sounding like a heretic, this sounds rather like ICF model view definitions which arguably are one of the things that made IFC more reliable for interop rather than just data exchange.

Perhaps a lesson one could take from that is that the “all the data just in case” model can become so complex and unwieldy that its nature is archival rather that interoperable. Maybe that’s naive “meat-think” from me in the age of intelligent machines.

FWIW I think everyone of my uses of speckle including what may at first glance appear to be send everything dependant could be paired back substantially.

rklaschka · 27 August 2024 07:51

Couple of thoughts from me here:

Send all is a very easy choice for a contributor, are there some alternative very easy versions like send all geometry only? Or some workflow specific options, eg. if I am sending Revit to Blender to do some rendering send a traceable ID and the material information needed only. If a summary of what was sent and what wasn’t was recorded at time of sending this starts to sound like a more detailed suitability record for a commit too which would be a very valuable thing.

Another thought is that smaller companies, perhaps with what you might call lower data hygiene standards probably send a great deal of crap that was just in the template that came out of the box. Working in measured survey there is also a great deal of stuff that simply results from commercial BIM formats having all of the attributes that might be relevant which aren’t ever touched. Maybe in that sense a sort of discipline specific commit could be something to consider too. It could be a starting point from which to customise, or just be used out of the box to leave out attribute data that isn’t within a disciplines remit. the result for the sender is a sort of tailored easy button.

My third thought, something I have mentioned to @jonathon is that I am looking at creating geometry directly in speckle from digitised coordinates. Taking a wall created this way as an example, it has almost no metadata (does that mean all the required fields are null?) and is really just a place holder for a wall in Revit, ArchiCAD, Rhino etc. What does that look like when it appears for the first time on Revit or any other application? Is there are sort of “fill all the defaults” - Perhaps this is a reverse of the null issue. I’d rather a door created this way had a fire rating of null in a receiving application for reasons I hope are obvious.

vwb · 27 August 2024 12:47

This is a difficult one. But here’s an idea. Every now and then, I get warned by my phone, that my storage capacity is close to be reached and that I should delete things.

Most of the time, I ignore it and let them handle it. I paid a lot for this phone, so I hope it can handle storage for me. Now… Sometimes I go on a trip, so I want to download a movie, or a season of a TV Show. So, I will address the issue myself, and I know nothing about null values or storage.

But if you show me a big chunk of my storage is being occupied by memes. I will surely toggle that off.

I believe one key design consideration you guys should take as a product is: architects and engineers know nothing about storage (they will make several copies of half gb files on a drive in a blink of an eye) and nothing about nulls (maybe if you are BIM oriented).

Look to be transparent, and don’t make it nerdy. The first step is to show where the sends are going nuts.

haitheredavid · 27 August 2024 14:58

This guy knows the pain of explaining why your revit model is massive