What was the argument for Postgres?

Hey folks at the Speckle team! :slight_smile:

I am wondering, mainly out of curiosity and trying to challenge our own architectural decisions: Why did you choose SQL / PostgreSQL as the basis of data management on the server?

  1. The Speckle objects themselves are mainly nested JSONs, right?
    • Using MongoDB, objects could be queried a lot more flexibly and to arbitrary depth.
    • I saw a while ago in the GraphQL query explorer that there is the ability to query the data on a commit, which evaluates some query against the JSON field in Postgres (which is not database-native for SQL, right?).
  2. Relationships between data are fairly simple:
    • i.e. commits relate to each other and contain objects
    • git, after all, uses an object-based storage implementation as well

Looking forward to hearing your arguments :slight_smile:

And as always,
Keep up the great work of making AEC a bit more nerdy :slight_smile:


Hey @maxmodugen!

This is an interesting topic and I think we’re all happy to start some convo around it. Mostly @dimitrie can give you the OG reasons.

But here are some pro PostgreSQL points:

  • While the Speckle base objects are primarily structured as nested models, our object model abstraction kit and (de)composition APIs add a lot of extra optimizations geared towards 3D data communication.
  • Detached attributes are stored as separate objects to make all of our de-duplication logic work.
  • Together, these mean that most real-world project data would not fit the single-document model or size constraint (16 MB iirc) that MongoDB has, and we would need our own custom logic on top to be able to query data in a meaningful way.
  • The JSON(b) support of PostgreSQL is actually a lot more powerful than most ppl give it credit for.
  • Thanks to detaching, we can also do some quite parallelized object data streaming from the server to clients instead of reading and serving 1 big document.
  • As an extra benefit, our clients can keep a local cache of objects using SQLite in the same structure as the server does, so we can leverage that to reduce the amount of data we need to constantly transfer over the wire.
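To make the detaching and de-duplication points concrete, here is a hedged sketch using SQLite as a stand-in for the server's Postgres store. The table, column, and property names are all made up for illustration; the real Speckle schema will differ.

```python
import hashlib
import json
import sqlite3

# Illustrative content-addressed object table (made-up schema), using SQLite
# in place of PostgreSQL so the example is self-contained.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE objects (id TEXT PRIMARY KEY, data TEXT NOT NULL)")

def save_object(obj: dict) -> str:
    """Store an object keyed by the hash of its canonical JSON form.
    Saving the same object twice is a no-op: that is the de-duplication."""
    payload = json.dumps(obj, sort_keys=True)
    obj_id = hashlib.sha256(payload.encode()).hexdigest()
    db.execute("INSERT OR IGNORE INTO objects VALUES (?, ?)", (obj_id, payload))
    return obj_id

# A "detached" child is stored as its own row; the parent keeps a reference.
wall_id = save_object({"type": "Wall", "height": 3.0})
save_object({"type": "Level", "elements": [{"referencedId": wall_id}]})
save_object({"type": "Wall", "height": 3.0})  # duplicate: adds no new row

count, = db.execute("SELECT COUNT(*) FROM objects").fetchone()
print(count)  # 2 rows, not 3

# The JSON can still be queried in-database. Postgres would use jsonb
# operators like data->>'type'; SQLite's json_extract is a rough equivalent.
rows = db.execute(
    "SELECT id FROM objects WHERE json_extract(data, '$.type') = 'Wall'"
).fetchall()
print(len(rows))  # 1
```

The same hash-keyed layout is what makes a client-side SQLite cache work: a client can skip downloading any object whose id it already has.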

I’m sure there are more pros, and there are many cons to this approach too. I’m keen to hear them from you :slight_smile:


Hi @gjedlicska ,

thanks for this neat summary!
I did not have the decomposition in mind, but it makes a lot of sense. The other points are quite convincing as well.

Don’t have anything just yet that could “disprove” your point or enhance the discussion :slight_smile:

Best,
Max

Thanks @gjedlicska - I honestly don’t have much to add besides some colour/context!

2.0 was started with performance optimisations in mind, as well as more freedom to structure data arbitrarily. The main 1.0 limitation was that everything needed to be a flat list of objects - hence the 2.0 decomposable tree approach.

I chose Postgres for the Speckle Server because it’s fast, stable, and generally an amazing piece of reliable tech that’s been around for ages. All cloud providers have it as a vanilla DBaaS offering, so self-hosting your own instance shouldn’t be a problem or more costly than it needs to be, even at large scales. It’s a shark, not a dinosaur :sunglasses:

I do want to emphasise one thing though - PostgreSQL is the db of choice for the Speckle Server. Transports can read from and write to any other DB or medium.
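As a rough illustration of that transport idea, here is a minimal sketch in Python. The interface and names here are hypothetical, invented for this example, not Speckle's actual API: the point is only that once reads and writes go through a small interface, the Postgres store becomes one backend among many.

```python
from abc import ABC, abstractmethod
from typing import Optional

class Transport(ABC):
    """Hypothetical minimal transport interface: anything that can
    save and fetch serialized objects by id qualifies as a backend."""

    @abstractmethod
    def save_object(self, obj_id: str, payload: str) -> None: ...

    @abstractmethod
    def get_object(self, obj_id: str) -> Optional[str]: ...

class MemoryTransport(Transport):
    """Keeps objects in a dict: handy for tests or as a write-through cache."""

    def __init__(self) -> None:
        self.objects: dict[str, str] = {}

    def save_object(self, obj_id: str, payload: str) -> None:
        self.objects[obj_id] = payload

    def get_object(self, obj_id: str) -> Optional[str]:
        return self.objects.get(obj_id)

def copy_object(obj_id: str, source: Transport, target: Transport) -> None:
    """Moving data between any two backends is the same one-liner,
    whether they are backed by memory, disk, or a database."""
    payload = source.get_object(obj_id)
    if payload is not None:
        target.save_object(obj_id, payload)

local, remote = MemoryTransport(), MemoryTransport()
local.save_object("abc123", '{"type": "Wall"}')
copy_object("abc123", local, remote)
print(remote.get_object("abc123"))  # {"type": "Wall"}
```

A database-backed implementation would subclass the same interface, so calling code never knows which medium it is talking to.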

PS: I’m also a big (aspiring) fan of https://boringtechnology.club/


I’ve been debating whether to go with MongoDB vs PostgreSQL for a BIM app we are working on. It sounds like the document constraints were a serious limiting factor and it’s better to go with PostgreSQL and then just store any unstructured data in a JSON field.

The issue I was having is that there is so much unique data for each object (especially since we are writing product spec data, which changes for each type of product) that there really is no effective way to store it in a traditional SQL schema.

this fucking rocks. i died a bit at:
"I hate Donald Rumsfeld, and I hope he fries. But he’s associated with the following, which is thoroughly relevant to our subject. So I feel like I have to acknowledge his demonic presence long enough to distance myself from him"


Yeah, the internet is such a lovely place sometimes :sweat_smile:


Curious if working with DAGs might change this assumption at all. That was one of the reasons we were looking at MongoDB, since it was easier to create graph-like data structures. PostgreSQL does have the ltree extension, which we were looking into. We are working on a Rhino plugin that is going to organize CAD data into a hierarchy, similar to the DOM for HTML, or to some of the part assembly relationships you see in SolidWorks or Catia. Every object would have a parent or child. CAD data is surprisingly unstructured. Any experience working with DAGs or graph-like structures?
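For reference, one common relational approach to this kind of parent/child hierarchy is a plain adjacency list queried with a recursive CTE: PostgreSQL supports `WITH RECURSIVE` out of the box, and ltree covers materialized-path queries. A minimal sketch using SQLite as a stand-in (the schema and names are made up):

```python
import sqlite3

# Made-up schema: each element row points at its parent; NULL = root.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE elements (
        id        INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        parent_id INTEGER REFERENCES elements(id)
    )
""")
db.executemany(
    "INSERT INTO elements VALUES (?, ?, ?)",
    [
        (1, "Assembly", None),
        (2, "SubAssembly", 1),
        (3, "Part A", 2),
        (4, "Part B", 2),
        (5, "Fastener", 1),
    ],
)

# Walk the whole tree from the root, tracking each row's depth.
# The same WITH RECURSIVE query works verbatim in PostgreSQL.
rows = db.execute("""
    WITH RECURSIVE subtree(id, name, depth) AS (
        SELECT id, name, 0 FROM elements WHERE parent_id IS NULL
        UNION ALL
        SELECT e.id, e.name, s.depth + 1
        FROM elements e JOIN subtree s ON e.parent_id = s.id
    )
    SELECT name, depth FROM subtree ORDER BY depth, name
""").fetchall()
for name, depth in rows:
    print("  " * depth + name)
```

For a true DAG (a part reused under several parents), the parent pointer becomes a separate many-to-many edge table, but the recursive CTE pattern stays the same.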

Each domain abstraction has its needs, so perhaps something that gives you a more graph-like abstraction is something you’re going to enjoy. Not sure if relevant or accurate, but @chuck might have more intel on this from working on nodepen!


I think one particularly fun idea here is overlapping DAG representations of the same data.

“CAD data is surprisingly unstructured” => “Let’s give it hierarchy” => “Hierarchy to/for who?”

As for “how would I represent a DAG” - nodepen is still v simple and represents nodes as a flat list, with references in each node to its source nodes by id.
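That flat-list-with-references shape can be sketched as follows; the node names and the sort helper are hypothetical, not nodepen's actual data model. A depth-first topological sort recovers an evaluation order from the references alone:

```python
# Hypothetical flat-list DAG: each node is stored once and lists the ids of
# its upstream source nodes. No nesting, no parent pointers.
nodes = {
    "slider": {"sources": []},
    "circle": {"sources": ["slider"]},
    "extrude": {"sources": ["circle", "slider"]},
}

def evaluation_order(nodes: dict) -> list[str]:
    """Depth-first topological sort: every node comes after its sources."""
    order: list[str] = []
    seen: set[str] = set()

    def visit(node_id: str) -> None:
        if node_id in seen:
            return
        seen.add(node_id)
        for src in nodes[node_id]["sources"]:
            visit(src)  # ensure all upstream nodes are placed first
        order.append(node_id)

    for node_id in nodes:
        visit(node_id)
    return order

print(evaluation_order(nodes))  # ['slider', 'circle', 'extrude']
```

A nice property of this representation is that it serializes trivially (it is already flat JSON) while still encoding the full graph.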
