Data Replication

We have dev, test and production instances provisioned in Azure. We also have a development community that is building custom applications. Our objective is to let production run as a normal design collaboration instance while its data is copied to a custom development instance. This would let custom developers work with production data without affecting production users. We are looking for direction on the best way to achieve this. Our setup uses Azure Kubernetes with a Postgres Flexible Server backend. Can we just replicate the Postgres database instance? Thank you for your help.

Hi @shiangoli

Short answer:
Copying databases to a different environment (a server with a different domain) is not a use-case we explicitly consider and would require some testing by your team to ensure it works. Even if it does work for the current Speckle server version, we couldn’t guarantee that future versions of Speckle server would not introduce breaking changes to this use case.

Longer answer:
There are a few things to consider before deciding to clone production data into a development environment:

  • are there security, commercial non-disclosure agreements, or contractual obligations which might apply should you expose production data to a different (development) system with different levels of control & expectations about the data?
  • presumably there will be a different set of users, i.e. developers, with different permissions, e.g. admin rights. Is it ok to give them access to data they would not normally have on the production system?
  • could users on the production system upload confidential or commercially sensitive data without realising that it is also exposed to different users on the development system?
  • could development work affect security controls, with a risk that they are accidentally dropped or weakened, exposing production data or risking a data breach?

If the risks of having production data in a development environment are something your organisation can accommodate, there are some further technical considerations:

  • will the data in the development server be amended and deviate from the production server? This might include new users, or changes to user permissions. Or it might be different uploaded files or comments, as well as 3D data.
  • does any new/amended data in the development server have to be retained? (i.e. is it not feasible to delete the development server data periodically?)
  • will there be database migrations or similar changes to the underlying database schema in the production server?

If yes to any of the above, then a periodic database replication might not be suitable and a different approach may be required.
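If the answer to all three is no, a periodic one-way refresh could be as simple as a dump and restore. Below is a minimal sketch of that idea, not a tested procedure for Speckle server: the connection strings and the dump file name are placeholders, and the function only builds the `pg_dump` / `pg_restore` command lines so you can review them before running anything. It only covers what lives in Postgres.

```python
import shlex
from typing import List, Tuple


def build_refresh_commands(
    prod_dsn: str,
    dev_dsn: str,
    dump_path: str = "prod_snapshot.dump",  # placeholder file name
) -> Tuple[List[str], List[str]]:
    """Build (dump, restore) commands for a one-way prod -> dev refresh.

    Nothing is executed here; pass each list to subprocess.run(..., check=True)
    once you are happy with it.
    """
    dump_cmd = [
        "pg_dump",
        "--format=custom",   # archive format that pg_restore can read
        "--no-owner",        # do not carry production role ownership over
        "--no-privileges",   # do not carry production GRANTs over
        f"--file={dump_path}",
        f"--dbname={prod_dsn}",
    ]
    restore_cmd = [
        "pg_restore",
        "--clean",           # drop existing dev objects before recreating them
        "--if-exists",       # avoid errors when an object is not there yet
        "--no-owner",
        "--no-privileges",
        f"--dbname={dev_dsn}",
        dump_path,
    ]
    return dump_cmd, restore_cmd


if __name__ == "__main__":
    # Placeholder connection strings; supply credentials via PGPASSWORD or .pgpass.
    dump_cmd, restore_cmd = build_refresh_commands(
        "postgresql://prod-host/speckle",
        "postgresql://dev-host/speckle",
    )
    print(shlex.join(dump_cmd))
    print(shlex.join(restore_cmd))
```

Because `--clean` drops and recreates the development objects, anything changed on the development server since the last refresh is lost, which is why the retention question above matters.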

Hope this helps,

Iain

I imagine the scenario would be that a particular project needs to be replicated from the production database (along with the associated user access information?). So the question then becomes how to correctly select that data from the production environment and create a publication for replication.

GPT thinks that bi-directional replication is not possible natively in Azure Postgres, so it would require some more investigation of options IF you were to allow changes made in dev to migrate back to production. It might be simplest to set an expectation that this isn’t to be done.
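For the one-way case, here is a rough sketch of what "select that data and create a publication" could look like with native PostgreSQL logical replication. Treat everything specific as an assumption: the table and column names (`projects`, `project_objects`, `project_id`) are hypothetical and not taken from the Speckle schema, row filters (`WHERE (...)`) need PostgreSQL 15 or later, and the publisher needs `wal_level = logical`. The helper only generates the SQL text so it can be reviewed first.

```python
from typing import Dict


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Quote a string literal (assumes standard_conforming_strings = on)."""
    return "'" + value.replace("'", "''") + "'"


def publication_sql(pub_name: str, project_id: str, table_filters: Dict[str, str]) -> str:
    """Build CREATE PUBLICATION with one row filter per table.

    table_filters maps a table name to the column holding the project id.
    Both are hypothetical here; check them against the real schema.
    """
    parts = [
        f"{quote_ident(table)} WHERE ({quote_ident(column)} = {quote_literal(project_id)})"
        for table, column in table_filters.items()
    ]
    return f"CREATE PUBLICATION {quote_ident(pub_name)} FOR TABLE " + ", ".join(parts) + ";"


def subscription_sql(sub_name: str, pub_name: str, prod_conninfo: str) -> str:
    """Build the CREATE SUBSCRIPTION statement to run on the development database."""
    return (
        f"CREATE SUBSCRIPTION {quote_ident(sub_name)} "
        f"CONNECTION {quote_literal(prod_conninfo)} "
        f"PUBLICATION {quote_ident(pub_name)};"
    )


if __name__ == "__main__":
    # Hypothetical tables/columns and a placeholder project id.
    print(publication_sql(
        "dev_copy",
        "abc123",
        {"projects": "id", "project_objects": "project_id"},
    ))
    print(subscription_sql("dev_sub", "dev_copy", "host=prod-host dbname=speckle"))
```

Two caveats worth knowing before trying this: logical replication does not copy schema changes, so the tables must already exist on the development side with a matching structure, and if the publication carries updates or deletes, the columns used in a row filter have to be part of the table's replica identity.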

From our recent experience, replication is not something most cloud providers offer on their managed solutions. You’d be much better off with a vanilla managed postgres; but we’re biased towards being close to the metal usually :sweat_smile:

There might be a process solution you could consider: if your dev team needs read-only access, they can be added to production projects as such, so they won’t be able to cause any damage but will have access to production data. In this scenario you won’t need to duplicate deployments, provided you provision the production server to be reliable enough.

Another option is running an ETL without the T into the custom development instance, using the APIs as the integration point. The scope would be limited to what the custom development actually requires, assuming this bounded way of copying the data is sufficient for the custom work.

This is where I’d usually start: checking if a snapshot of the latest data is enough for your needs, rather than doing a full, faithful DB replication. With the API approach, you can scope it more tightly to what’s actually needed for the custom work. You’ll have a better sense of the exact requirements, so it should give you more flexibility without the complexity of full replication.
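To make the scoped "extract and load, no transform" idea concrete, here is a minimal sketch. It deliberately does not call any real Speckle API: `fetch` and `push` are hypothetical callables that you would implement with whichever client or SDK you use, and the ids are placeholders. The point is only the shape: an explicit allow-list of what gets copied, and records passed through unchanged.

```python
from typing import Any, Callable, Dict, Iterable


def copy_scoped(
    allowed_ids: Iterable[str],
    fetch: Callable[[str], Dict[str, Any]],       # hypothetical: read one record from production
    push: Callable[[str, Dict[str, Any]], None],  # hypothetical: write one record to development
) -> int:
    """Copy only the allow-listed records from production to development.

    Records are loaded as-is (no transform step). Returns the number copied.
    """
    copied = 0
    for record_id in allowed_ids:
        record = fetch(record_id)  # extract from production
        push(record_id, record)    # load into development unchanged
        copied += 1
    return copied


if __name__ == "__main__":
    # In-memory stand-ins for the two servers, purely for illustration.
    production = {
        "p1": {"name": "Tower"},
        "p2": {"name": "Bridge"},
        "p3": {"name": "Confidential"},
    }
    development: Dict[str, Dict[str, Any]] = {}

    count = copy_scoped(
        ["p1", "p2"],  # only what the custom work needs
        fetch=production.__getitem__,
        push=development.__setitem__,
    )
    print(count, sorted(development))
```

Keeping the allow-list explicit also helps with the earlier data-exposure questions, since nothing reaches the development instance unless someone has deliberately put it in scope.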

Let us know how it goes!
