Robots.txt for self-hosted Speckle server

Hi,

we have a self-hosted Speckle Server deployed via the Docker image.
Since our server must be publicly available and TLS-encrypted, we have a DNS entry pointing to it. However, the server’s frontend is not intended to be used by anybody outside of our company.

But we have now noticed that Google has started crawling our Speckle Server and its frontend endpoints. Since we don’t want the server to show up in Google or any other search engine when someone searches for our company name, we would like to know what you think is the easiest way to make the Speckle Server serve a robots.txt file. Regularly asking Google (and other search engines) to remove the server from their index, or putting a proxy server in front of it, would of course also be possible solutions, but we are looking for something less maintenance-intensive that can be done on the server itself.

Thanks for your help!

Best regards
Sven

Hi @vsx-sieber - thanks for the input. I think we could make this easier and we have a ticket in our backlog to address part of your requirements (providing a robots.txt), but for now it requires amending the nginx configuration.

This post on Server Fault describes a way to serve a robots.txt that disallows all crawlers directly from nginx: How to set robots.txt globally in nginx for all virtual hosts - Server Fault

The nginx configuration used by the Docker Compose deployment of Speckle Server can be found at https://github.com/specklesystems/speckle-server/blob/92d9dbd94805582ba6f0ddff88f5ace9ec60e9f1/utils/docker-compose-ingress/nginx/templates/nginx.conf.template . After modifying this file, you will need to build the Docker image yourself, host it in your own image repository, and update the Docker Compose file to point to your custom image.
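
As a rough sketch of the approach from that Server Fault post, a block along the following lines could be added inside the relevant server block of the template. Where exactly it belongs in the Speckle template is an assumption here, so check the file before editing:

```nginx
# Sketch only: place inside the server { ... } block that serves the frontend.
# The exact position within the Speckle template is an assumption.
location = /robots.txt {
    # Serve the response as plain text
    default_type text/plain;
    # Return a fixed body that asks all crawlers not to crawl any path
    return 200 "User-agent: *\nDisallow: /\n";
}
```

The exact-match location (the "=" modifier) means only requests for /robots.txt are answered this way; all other paths continue to be handled by the existing configuration. Note that robots.txt only asks well-behaved crawlers to stay away; it does not restrict access to the server.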

Iain

Ok, thanks for the quick response and the instructions.

I had already suspected that this would be the only solution for now, but I hoped there might be an as yet undocumented config parameter. Now that you have confirmed this is in your backlog, we may just wait for it to be officially included in your deployment image.

For now, we will just manually remove the domain of our Speckle server from the Google, Bing, etc. indexes.