3 Services in crash loop in Azure Kubernetes Service

Hello community!

  • Objective: Host an in-network Speckle Server on Azure Kubernetes Service

  • Issue: I’m using the latest Helm chart from the repository, with minor Kustomize modifications to mount secrets from Key Vault into the deployments that need them. Overall the deployment works, but the File Import, Preview and Webhook services are all failing their liveness probes because /tmp/last_successful_query does not exist.

  • Logs:

Events:
Type     Reason     Age                    From     Message
----     ------     ----                   ----     -------
  Warning  BackOff    14m (x10883 over 4d)   kubelet  Back-off restarting failed container main in pod speckle-webhook-service-6b5886f76d-k78hc_speckle(8d5d2959-13c5-4235-97ca-c2d531e1a7bb)
  Normal   Pulled     9m49s (x901 over 4d)   kubelet  Container image "speckle/speckle-webhook-service:2.20.5" already present on machine
  Warning  Unhealthy  3m58s (x2707 over 4d)  kubelet  Liveness probe failed: node:internal/fs/utils:356
    throw err;
    ^

Error: ENOENT: no such file or directory, open '/tmp/last_successful_query'
    at Object.openSync (node:fs:596:3)
    at Object.readFileSync (node:fs:464:35)
    at [eval]:1:41
    at runScriptInThisContext (node:internal/vm:143:10)
    at node:internal/process/execution:100:14
    at [eval]-wrapper:6:24
    at runScript (node:internal/process/execution:83:62)
    at evalScript (node:internal/process/execution:114:10)
    at node:internal/main/eval_string:30:3 {
  errno: -2,
  syscall: 'open',
  code: 'ENOENT',
  path: '/tmp/last_successful_query'
}

Node.js v18.20.3

Hi @wmacbmcd

Glad to hear that the critical components are working on Azure for you.

Are you able to view the first log messages produced by one of the containers? This may provide a better understanding of what is occurring.

A couple of things to look for:
Are the file import service and the other failing services able to write to the /tmp/last_successful_query path within the running container?
Are they able to connect to the Postgres database?

Iain

Hi @iainsproat,

Thanks for the advice! You were correct: the logs indicated that the services could not connect to the Postgres database. The secret was being read properly, but it turned out that I just needed to patch the PGSSLMODE env variable, set to require like the server has, into those three deployments.
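
For anyone hitting the same thing, here is a minimal sketch of what such a Kustomize patch could look like. It is not my exact file. The container name main comes from the kubelet events above; the resource file name and the File Import and Preview deployment names are assumptions (only speckle-webhook-service appears in my logs), so adjust them to whatever your rendered chart actually contains.

```yaml
# kustomization.yaml (sketch, names partly assumed)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  # Hypothetical file: the Helm chart output rendered to plain manifests.
  - speckle-rendered.yaml

patches:
  - target:
      kind: Deployment
      # Regex over deployment names. Only speckle-webhook-service is confirmed
      # by the events above; the other two names are assumed.
      name: speckle-(fileimport|preview|webhook)-service
    patch: |-
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        # Placeholder: the target selector above decides what gets patched.
        name: not-important
      spec:
        template:
          spec:
            containers:
              # Strategic merge matches containers by name.
              - name: main
                env:
                  # Use the same value the server deployment has.
                  # libpq's standard name for this mode is "require".
                  - name: PGSSLMODE
                    value: require
```

Using a strategic merge patch here, rather than a JSON patch, means the env entry is merged by name, so it does not depend on the container's position or on the env list already existing.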

This appears to have resolved my issue!

Thanks again!
Will

Glad that solved it.

If there are any changes you would suggest to the Helm Chart / Kubernetes manifests to better support Azure or external secret providers, please do let me know.

Iain

Definitely! We use the Secrets Store CSI Driver and add a SecretProviderClass template that pulls in external secrets and syncs them as k8s secrets. To actually sync the secrets, you need to add a volumeMount and volume to one deployment, but once the secrets are created, any deployment can use them.
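
Roughly, the setup looks like the sketch below. This is illustrative rather than our actual template: the SecretProviderClass name, the Key Vault object name, the k8s secret name and key, the mount path, and the managed-identity settings are all placeholders or assumptions, and the second document is only a fragment of a deployment's pod spec.

```yaml
# SecretProviderClass for the Azure Key Vault provider (sketch).
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: speckle-keyvault            # hypothetical name
  namespace: speckle
spec:
  provider: azure
  # secretObjects tells the driver to mirror mounted objects into a k8s Secret.
  secretObjects:
    - secretName: speckle-external-secrets   # hypothetical k8s Secret name
      type: Opaque
      data:
        - objectName: postgres-url           # must match an objectName below
          key: postgres_url                  # hypothetical key in the Secret
  parameters:
    # Identity settings are assumptions; use whatever your cluster is set up for.
    usePodIdentity: "false"
    useVMManagedIdentity: "true"
    userAssignedIdentityID: "<managed-identity-client-id>"
    keyvaultName: "<key-vault-name>"
    tenantId: "<tenant-id>"
    objects: |
      array:
        - |
          objectName: postgres-url           # hypothetical Key Vault secret name
          objectType: secret
---
# Fragment of ONE deployment's pod spec. Mounting the CSI volume is what
# triggers the sync; other deployments can then reference the Secret normally.
spec:
  template:
    spec:
      containers:
        - name: main
          volumeMounts:
            - name: secrets-store
              mountPath: /mnt/secrets-store   # assumed path
              readOnly: true
      volumes:
        - name: secrets-store
          csi:
            driver: secrets-store.csi.k8s.io
            readOnly: true
            volumeAttributes:
              secretProviderClass: speckle-keyvault
```

Note that the synced Secret only exists while at least one pod mounts the volume, which is why one deployment has to carry the volume and volumeMount.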

Let me know if this makes sense or if you want any more info!

Will