Uploads Max File Size - why is that value chosen?

I’m wondering if there is a reason file uploads are set to a max of 50 MB? Is there an issue with bigger files? @tlmn is trying to send a 100 MB string with the QGIS connector.

https://github.com/specklesystems/speckle-server/blob/5b90e8783141663961bd4e1419b614e7a77b751f/packages/server/modules/core/rest/upload.js#L14


Not really, but it trickles down through several layers. For one, a large upload can keep a connection busy, thus restricting other users from getting their requests through. This can be felt on a low-powered server deployment where you don’t have many instances running. It can also cause spikes in memory usage, which again would require more powerful infra provisions for the server, and some extra hacks around Node’s max memory limit. This can also be felt on the clients, as they would eat up more RAM too, but that’s usually less noticeable on desktops. In serverless environments it can be a PITA - we’d ideally load small chunks of data at a time, process them, and dump them out of memory. For the server side of things, @cristi has more info (and can correct me), as he benchmarked our production server and we fine-tuned various parts of the ecosystem based on those system behaviours.

To the problem at hand (sorry for the long segue), I would really treat that 100 MB string as we do a point cloud’s coordinate array. As we don’t chunk strings by default, this can be resolved via some object-model space tricks, like we do for large arrays in meshes, etc. In .NET we’re relying on OnDeserialized hacks and hiding away the real values. I’m not sure how we’d do that in Python (which, if I got it right, is where @tlmn is hacking).
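
Something like this in plain Python, maybe - just a minimal sketch of the “hide the real value” idea; the class, attribute names and the ~1 MB chunk size are illustrative, not how specklepy actually does it:

```python
# Plain-Python sketch of the "hide the real value" trick described above -
# class name, attribute names and the ~1 MB chunk size are illustrative
# assumptions, not specklepy's actual mechanism.

class BigString:
    CHUNK = 1_000_000  # characters per piece, roughly 1 MB of text

    def __init__(self, value: str = ""):
        self.chunks: list[str] = []  # this is what would get serialised
        self.value = value

    @property
    def value(self) -> str:
        # reassemble the real string on demand (the OnDeserialized-style hack)
        return "".join(self.chunks)

    @value.setter
    def value(self, text: str) -> None:
        # only ever store the chunked representation
        self.chunks = [
            text[i : i + self.CHUNK] for i in range(0, len(text), self.CHUNK)
        ]


# usage: the big string never lives on the object as a single serialised value
holder = BigString("x" * 5_000_000)
assert holder.value == "x" * 5_000_000
```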


Thanks @dimitrie, very detailed answer!

In summary, you can up the limit, but it’s not recommended. Uploading large things through APIs is a bit of an anti-pattern.

@tlmn can we split it up?


Hey @dimitrie, @peter.grainger,

thx, it does work now.
I had to decrease the size of the chunks - they were way too big… :wink:

Although I have the feeling the maximum size of a chunk is 10485760 bytes (10 MB)?!

Would it be recommended to use that size? Or should I rather use smaller chunks?

@cristi / @gjedlicska - correct me if I’m wrong - it’s probably the default nginx config. We probably have two guards, so if you bump nginx to infinite, we’re going to draw the line in Node at 50 MB :slight_smile: @tlmn, you’re probably fine leaving your chunk size somewhere under 10 MB!
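
For illustration (the numbers below just mirror the limits mentioned in this thread, not values read from any actual deployment config), a client-side guard might look like:

```python
# Illustrative guard only - the numbers mirror the limits mentioned in this
# thread (nginx's ~10 MB default request body cap, the 50 MB line in node)
# and are assumptions, not values read from an actual deployment config.
NGINX_BODY_LIMIT = 10 * 1024 * 1024    # 10485760 bytes, as @tlmn observed
NODE_UPLOAD_LIMIT = 50 * 1024 * 1024   # the guard in upload.js
TARGET_CHUNK_SIZE = 1 * 1024 * 1024    # stay comfortably under both

def check_payload(payload: bytes) -> None:
    """Fail fast on the client instead of letting the proxy reject the request."""
    if len(payload) > NGINX_BODY_LIMIT:
        raise ValueError(
            f"payload is {len(payload)} bytes, over the ~10 MB request limit; "
            "split it into smaller chunks before uploading"
        )
```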


@tlmn I think even splitting into 1 MB objects is actually better, but 10 MB is the limit.

I’ll try to explain the reasoning for the 10 MB object limit:

One delicate issue is the database driver prefetch count: when streaming multiple objects to users (like when getting a large stream), the server prefetches a configurable number of objects from the database so they’re available in Node’s memory, ready to pass to the connected client.

If the client has a fast connection and is getting a lot of small objects (most real-world streams), we want to prefetch many objects so the receive operation is as fast as possible.
If the client has a slow connection and is getting large objects, they will temporarily occupy the server’s memory until the client receives them.

So we actually saw server memory stay very low most of the time, but with some GB-level spikes when a client was receiving a stream with artificially large objects.
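
To put rough numbers on that (the prefetch count below is a made-up figure, not our actual setting), the memory held at any moment is roughly the prefetch count times the object size:

```python
# Back-of-the-envelope only: the prefetch count is a hypothetical figure to
# show how object size drives the memory spikes while a slow client receives.
prefetch_count = 1000  # hypothetical number of objects buffered per download

for object_size_mb in (0.01, 1, 10, 100):
    held_mb = prefetch_count * object_size_mb
    print(f"{object_size_mb:>6} MB objects -> ~{held_mb:,.0f} MB held server-side")
```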

So to solve it we had the following options:

  1. Leave no limit on object size. But given that the server has to hold some individual objects in memory while a client is receiving, this would mean unpredictable memory usage and possibly out-of-memory crashes in small environments.

  2. Prefetch a small number of objects when a client is receiving (optimizing for the rare case of large objects). This would help with memory usage, but it heavily impacts the most common use case, when a client with a good connection downloads a lot of small objects.

  3. Set a limit on object size. This is what we did, in order to control the server’s memory usage so that it’s stable when working with any stream data, while staying as fast as possible when downloading data.

We found the ideal performance is with objects of at most 1 MB, but up to 10 MB still gave acceptable performance and memory usage, so we set the limit at 10 MB.

So clients can and should just chunk large values - either directly if the value is a list, or by transforming it into a list if it’s a string or some binary data - so that they work with multiple smaller objects.
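
For the Python side, if I remember specklepy right (treat the exact class keyword arguments as an assumption on my part), something along these lines would let the serialiser do the chunking once you’ve turned the string into a list:

```python
from specklepy.objects import Base

# Assumption: if I recall specklepy correctly, attributes can be marked
# detachable/chunkable via class keyword arguments, so the serialiser splits
# the list into separate chunk objects. Names and sizes are illustrative,
# not a confirmed API.
class BigTextHolder(
    Base,
    detachable={"text_chunks"},
    chunkable={"text_chunks": 1000},  # up to 1000 list items per chunk object
):
    text_chunks: list = None


holder = BigTextHolder()
big_string = "x" * 5_000_000
# turn the string into ~0.5 MB pieces so the serialiser can chunk the list
holder.text_chunks = [
    big_string[i : i + 500_000] for i in range(0, len(big_string), 500_000)
]
```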
