Does the HTTP/WebDAV spec allow this client-server dialog?
- client: I want to PUT data to /user1/foo.mkv which has this hash sum: HASH
- server: OK, PUT was successful, you don't need to send the data since I already know the data with this hash sum.
Note: This PUT is an initial upload. It is not an update.
If this is possible, much faster file syncing could be implemented.
Use case: The WebDAV server hosts a directory for each user. The favorite video foo.mkv gets uploaded by several users. In this example, the video is already stored at /user2/myfoo.mkv. The second and all following uploads don't need to send any data, since the server already knows the content. This would greatly reduce network load.
Preconditions:
- Client and server would need to agree on the hash algorithm beforehand.
- The server needs to store the hash-value of already known files.
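To make the two preconditions concrete, here is a minimal Python sketch of the server-side bookkeeping, assuming both sides have agreed on SHA-256. The class DedupStore and its methods announce() and upload() are hypothetical names for illustration; they are not part of HTTP or WebDAV.

```python
import hashlib
from typing import Dict


class DedupStore:
    """Hypothetical content-addressed store: every path points at a blob keyed by its hash."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}  # hex SHA-256 digest -> content
        self.paths: Dict[str, str] = {}    # resource path -> hex SHA-256 digest

    def announce(self, path: str, digest: str) -> bool:
        """Client announces a PUT by hash only. True means no data needs to be sent."""
        if digest in self.blobs:
            self.paths[path] = digest  # reuse the already stored bytes for the new path
            return True
        return False

    def upload(self, path: str, digest: str, data: bytes) -> None:
        """Called only when announce() returned False: the client sends the bytes."""
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError("data does not match the announced hash")
        self.blobs[digest] = data
        self.paths[path] = digest

    def get(self, path: str) -> bytes:
        return self.blobs[self.paths[path]]
```

As the security section points out, announce() on its own hands the content to anyone who merely knows the hash.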
It would be easy to implement this with a custom client and server, but that's not what I want.
My question: Is there an RFC or other standard that allows such a dialog?
If there is no standard yet, how should I proceed to make this dream come true?
Security consideration
With the above dialog, anyone could access the content of known hashes. Example: an evil client knows that there is a file with the hash sum 1234567..., so he could perform the two steps above and then use a GET to download the data.
A way around this is to extend the dialog:
- client: I want to PUT data which has this hash sum: HASH
- server: OK, PUT would be successful, but to be sure that you have the data, please send me the bytes N up to M. I need this to be sure you have the hash-sum and the data.
- client: Bytes N up to M of the data are: abcde...
- server: OK, your bytes match mine. I trust you. Upload successful, you don't need to send the data any more.
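The extended dialog could look roughly like this in Python. This is a sketch under assumptions: the helper names are hypothetical, the range is half-open (start inclusive, end exclusive) rather than the question's "N up to M", and the 64-byte span is an arbitrary choice. The server picks the range with the secrets module, so a client that only knows the hash cannot predict which bytes it will be asked for.

```python
import hmac
import secrets
from typing import Tuple


def make_challenge(size: int, span: int = 64) -> Tuple[int, int]:
    """Server: pick a random half-open byte range [start, end) inside the content."""
    span = min(span, size)
    start = secrets.randbelow(size - span + 1)
    return start, start + span


def answer_challenge(data: bytes, start: int, end: int) -> bytes:
    """Client: return the requested bytes from its local copy."""
    return data[start:end]


def verify_answer(stored: bytes, start: int, end: int, answer: bytes) -> bool:
    """Server: compare the client's bytes with its own copy in constant time."""
    return hmac.compare_digest(stored[start:end], answer)
```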
How to get this done?
Since there seems to be no spec yet, this part of the question remains:
How should I proceed to make this dream come true?
From what you described, it seems like ETags should be used.
It was specifically designed to associate a tag (usually an MD5 hash, but can be anything) with a resource's content (and/or location) so you can later tell whether the resource has changed or not.
PUT requests support ETags, which are commonly used with the If-Match header for optimistic concurrency control. However, your use case is slightly different: you are trying to prevent a PUT of content the server already has, whereas the If-Match header only allows the PUT if the resource's current content still matches the ETag you send.
In your case, you can instead use the If-None-Match header.
WebDAV also supports ETags, though how they are used may depend on the implementation.
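For reference, here is a minimal Python sketch of how a server might evaluate If-None-Match on a PUT, following RFC 7232 (section 3.2): "*" fails when the target URL already has a representation, and a list of ETags fails when any of them matches the target's current ETag under weak comparison. When the condition fails on a PUT, the server answers 412 Precondition Failed. The MD5-based ETag and the function names are assumptions for illustration.

```python
import hashlib
import re
from typing import Optional

# An entity-tag is an optional weak prefix W/ followed by a quoted opaque string.
_ETAG_RE = re.compile(r'(W/)?"([^"]*)"')


def make_etag(data: bytes) -> str:
    """Strong ETag built from the MD5 digest of the content (one common choice)."""
    return '"' + hashlib.md5(data).hexdigest() + '"'


def if_none_match_passes(header: str, current_etag: Optional[str]) -> bool:
    """Evaluate If-None-Match for the target resource only.

    current_etag is None when nothing exists at the target URL yet.
    False means the precondition failed: answer the PUT with 412 Precondition Failed.
    """
    if current_etag is None:
        return True  # nothing stored at this URL, so nothing can match
    if header.strip() == "*":
        return False  # "*" matches any existing representation
    current = _ETAG_RE.fullmatch(current_etag.strip())
    listed = {m.group(2) for m in _ETAG_RE.finditer(header)}
    # Weak comparison: the W/ prefix is ignored, only the opaque tags are compared.
    return current is None or current.group(2) not in listed
```

Note that the comparison only looks at what is stored at the target URL. A PUT to /user1/foo.mkv never sees the copy at /user2/myfoo.mkv, so the standard header alone does not cover the cross-user case.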
If you are implementing your own client, I would do something like this:
- Calculate the ETag of the file and send it in the If-None-Match header of the PUT request.
UPDATE
From your updated question, it now seems clear that when a PUT request is received, you want to check ALL resources on the server for the absence of the same content before the request is accepted. That means also checking resources which are in a different location than what was specified as the destination to the PUT request.
AFAIK, there's no existing spec to specifically handle this case. However, the ETag mechanism (and the HTTP protocol) was designed to be generic and flexible enough to handle many cases and this is one of them.
Of course, this just means you can't take advantage of standard HTTP server logic -- you'd need to custom code both the client and server side.
Assumptions
Before I get into possible implementations, there are some assumptions that need to be made.
Possible implementations
These are ordered from simplest to most complex; move on to a more complex one only if the simpler one doesn't work for you.
Possible implementation 1
This assumes your server implementation allows you to read the request headers and respond before the entire request is received.
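A minimal sketch of the server side for this approach, using Python's http.server and assuming a single in-memory index of ETags (KNOWN_ETAGS) that covers every stored resource. The 412 status code and all names are assumptions. BaseHTTPRequestHandler calls do_PUT once the headers are parsed, with the body still unread in rfile, so the handler can answer before reading it. Because unread body bytes would otherwise be parsed as the next request, the early answer also closes the connection; a client that has already pushed its whole body may then see a connection reset instead of the 412.

```python
import hashlib
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

KNOWN_ETAGS = set()  # hypothetical index: ETags of all content stored anywhere on the server
LOCK = threading.Lock()


def make_etag(data: bytes) -> str:
    return '"' + hashlib.md5(data).hexdigest() + '"'


class EarlyRejectHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"

    def do_PUT(self) -> None:
        # Only the headers have been parsed here; the body is still unread in self.rfile.
        etag = self.headers.get("If-None-Match", "").strip()
        with LOCK:
            known = etag in KNOWN_ETAGS
        if known:
            # Answer without reading the body and close the connection, so body
            # bytes still arriving are not parsed as a new request.
            self.send_response(412)
            self.send_header("Content-Length", "0")
            self.send_header("Connection", "close")
            self.end_headers()
            return
        data = self.rfile.read(int(self.headers.get("Content-Length", "0")))
        with LOCK:
            KNOWN_ETAGS.add(make_etag(data))
        self.send_response(201)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format: str, *args) -> None:
        pass  # keep the sketch quiet
```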
- The client sends the PUT request with If-None-Match containing the ETag and continues sending the body normally.
Possible implementation 2
This is slightly more complex, but better adheres to the HTTP spec. Also, this MIGHT work if your server architecture doesn't allow you to read the headers before the entire request is received.
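A sketch of the server side, again with Python's http.server and a hypothetical in-memory ETag index. For HTTP/1.1 requests carrying Expect: 100-continue, BaseHTTPRequestHandler calls handle_expect_100() after parsing the headers and before reading the body; its default implementation sends 100 Continue. Overriding it lets the server send 412 instead when the ETag is already known. Python's http.client does not wait for 100 Continue, so a small raw-socket client, put_expecting_continue() (a hypothetical helper), shows the client side.

```python
import hashlib
import socket
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

KNOWN_ETAGS = set()  # hypothetical index of the ETags of all stored content
LOCK = threading.Lock()


def make_etag(data: bytes) -> str:
    return '"' + hashlib.md5(data).hexdigest() + '"'


class ExpectContinueHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # handle_expect_100 is only consulted for HTTP/1.1

    def handle_expect_100(self) -> bool:
        # Headers are parsed, the client is waiting before it sends the body.
        etag = self.headers.get("If-None-Match", "").strip()
        with LOCK:
            known = etag in KNOWN_ETAGS
        if self.command == "PUT" and known:
            self.send_response(412)
            self.send_header("Content-Length", "0")
            self.send_header("Connection", "close")
            self.end_headers()
            return False  # final status already sent; do_PUT is not called
        return super().handle_expect_100()  # sends "100 Continue"

    def do_PUT(self) -> None:
        data = self.rfile.read(int(self.headers.get("Content-Length", "0")))
        with LOCK:
            KNOWN_ETAGS.add(make_etag(data))
        self.send_response(201)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format: str, *args) -> None:
        pass  # keep the sketch quiet


def put_expecting_continue(port: int, path: str, data: bytes) -> int:
    """Send only the headers, then the body if (and only if) the server says 100."""
    head = (f"PUT {path} HTTP/1.1\r\nHost: 127.0.0.1\r\n"
            f"If-None-Match: {make_etag(data)}\r\nExpect: 100-continue\r\n"
            f"Content-Length: {len(data)}\r\nConnection: close\r\n\r\n")
    with socket.create_connection(("127.0.0.1", port)) as s, s.makefile("rb") as f:
        s.sendall(head.encode("ascii"))
        status = int(f.readline().split()[1])
        while f.readline() not in (b"\r\n", b""):  # skip this response's headers
            pass
        if status != 100:
            return status  # server answered before any body byte was sent
        s.sendall(data)
        return int(f.readline().split()[1])
```

In this sketch, a client that does not send Expect: 100-continue skips the check and always uploads.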
- The client sends the PUT request with If-None-Match containing the ETag and an Expect: 100-continue header. The request body is NOT yet sent at this point.
Possible implementation 3
This implementation probably requires the most work but should be broadly compatible with all major libraries / architectures. There's a small risk of another client uploading a file with the same contents in between the two requests though.
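A sketch of this two-request flow in Python, assuming an in-memory ETag index on the server and percent-encoding the quoted ETag so it fits in a URL path. The /check-etag/ prefix comes from this answer; the 200/404 convention, CheckEtagHandler and sync_upload() are assumptions. The comment in sync_upload() marks where the race mentioned above can happen.

```python
import hashlib
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import quote, unquote

KNOWN_ETAGS = set()  # hypothetical index of the ETags of all stored content
LOCK = threading.Lock()
PREFIX = "/check-etag/"


def make_etag(data: bytes) -> str:
    return '"' + hashlib.md5(data).hexdigest() + '"'


class CheckEtagHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"

    def _reply(self, code: int) -> None:
        self.send_response(code)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self) -> None:
        # GET /check-etag/<etag>: 200 if some resource has this ETag, else 404.
        if not self.path.startswith(PREFIX):
            self._reply(404)
            return
        etag = unquote(self.path[len(PREFIX):])
        with LOCK:
            known = etag in KNOWN_ETAGS
        self._reply(200 if known else 404)

    def do_PUT(self) -> None:
        data = self.rfile.read(int(self.headers.get("Content-Length", "0")))
        with LOCK:
            KNOWN_ETAGS.add(make_etag(data))
        self._reply(201)

    def log_message(self, format: str, *args) -> None:
        pass  # keep the sketch quiet


def sync_upload(port: int, path: str, data: bytes) -> bool:
    """Return True if the body was sent, False if the server already had the content."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    try:
        conn.request("GET", PREFIX + quote(make_etag(data), safe=""))
        check = conn.getresponse()
        check.read()
        if check.status == 200:
            return False  # content already on the server: skip the transfer
        # Another client may upload the same content between these two requests.
        conn.request("PUT", path, body=data)
        resp = conn.getresponse()
        resp.read()
        if resp.status != 201:
            raise RuntimeError(f"PUT failed with status {resp.status}")
        return True
    finally:
        conn.close()
```

When the check succeeds, this sketch only skips the transfer; recording the new path against the existing content would need a further custom request.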
- The client sends a request to /check-etag/<etag>, where <etag> is the ETag. This checks whether the ETag already exists at the server.
- The server's handler for /check-etag/* checks whether a resource with that ETag already exists.
Considerations
Although the implementation is up to you, here are some points to consider:
Notes
Also, DO NOT close the connection from the server side without sending a status code, as the client will most likely retry the request.