Wednesday, July 13, 2005

Almost done with HTTP

PyFileServer

Completed:
+ HTTP functions: OPTIONS, HEAD, GET, POST (a modified GET), PUT, DELETE, TRACE (not supported)
+ Partial ranges for GET (single range only, which is what most servers and browsers use for download resume)
+ Conditional GETs, PUTs and DELETEs
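For reference, single-range handling comes down to parsing a header like "Range: bytes=500-999" into an inclusive start and end, then replying with a matching Content-Range. Below is a minimal Python sketch of that idea, not the actual PyFileServer code. The function names are hypothetical, and the sketch accepts only one byte-range-spec, falling back to a full response for anything else.

```python
import re

# One byte-range-spec only: "bytes=500-999", "bytes=500-" or "bytes=-500".
_SINGLE_RANGE = re.compile(r"^bytes=(\d*)-(\d*)$")


def parse_single_range(header, size):
    """Map a Range header onto a resource of `size` bytes (hypothetical helper).

    Returns (start, end) with `end` inclusive, or None when the header should
    be ignored and the full entity sent with 200 (absent, malformed, or a
    multi-range request). Raises ValueError when the range cannot be
    satisfied, which would map to 416 Requested Range Not Satisfiable.
    """
    if header is None:
        return None
    m = _SINGLE_RANGE.match(header.strip())
    if m is None:
        return None  # multi-range or malformed: send the whole entity
    first, last = m.groups()
    if first == "" and last == "":
        return None
    if first == "":
        # Suffix range: the final N bytes of the resource.
        start = max(size - int(last), 0)
        end = size - 1
    else:
        start = int(first)
        if last:
            end = int(last)
            if end < start:
                return None  # syntactically invalid spec: ignore the header
        else:
            end = size - 1
        end = min(end, size - 1)
    if start >= size:
        raise ValueError("unsatisfiable range")
    return start, end


def content_range(start, end, size):
    """Value for the Content-Range header of a 206 response."""
    return "bytes %d-%d/%d" % (start, end, size)
```

Returning None for multi-range requests is a legitimate shortcut: HTTP/1.1 allows a server to ignore the Range header and send the full body with a 200.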

Working on:
+ GZip support: I realized that the GzipMiddleware does not work well with partial ranges, since Content-Range refers to the compressed data. Back to the drawing board with regard to GZip support. I also need to know the compressed file size in advance for the Range headers, while avoiding any buffering of the whole file (compressed or not).
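One way to get the compressed size without holding the file in memory is to run it through zlib once, counting the output bytes and discarding them. A rough sketch, with a hypothetical function name. It assumes the same compression settings are used when the body is actually sent (otherwise the size will not match), and it costs an extra compression pass, so in practice the result would probably be cached per file and modification time.

```python
import io
import zlib


def gzipped_length(fileobj, chunk_size=64 * 1024, level=6):
    """Return the size in bytes of the gzip stream for fileobj's contents.

    Reads the file in chunks and throws the compressed output away, so
    memory use is bounded by chunk_size rather than by the file size.
    """
    # wbits=31 (16 + 15) tells zlib to emit a gzip header and trailer.
    comp = zlib.compressobj(level, zlib.DEFLATED, 31)
    total = 0
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        total += len(comp.compress(chunk))
    total += len(comp.flush())
    return total


if __name__ == "__main__":
    sample = io.BytesIO(b"some text to compress " * 10000)
    print(gzipped_length(sample))
```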

+ Authentication: this is the start of the WebDAV work, since WebDAV requires compulsory authentication. Still looking it up.

On the shelf:
+ MD5, encryption. Partial and Conditional support for PUT. Content-Encodings for PUT

CVS and SVN repositories updated

2 comments:

Ian Bicking said...

It wouldn't make sense if the Range headers applied to the gzipped content. Since it's optional for the server to gzip the content (the client merely indicates that it is capable of receiving gzip with the Accept-Encoding header), the client may or may not receive a gzipped response. So if the Range header applied to the gzipped content, you'd get two different ranges depending on what the server decided to do, and that wouldn't make sense.

Gzipping is just an *encoding*; that is, it's applied to the body, but doesn't affect the content of the "real" body, just the bytes that are sent across the wire. So any headers that apply to the real body (like Range) apply before gzipping.
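A quick Python illustration of the ambiguity described above: the same Range request picks out different bytes, from entities of different total length, depending on whether the server chose to gzip. The body is made up for the example, and gzip.compress(..., mtime=0) assumes Python 3.8 or later.

```python
import gzip

# A compressible text body, standing in for a file on the server.
body = b"".join(b"line %d: the quick brown fox\n" % i for i in range(500))

# What the server sends if it decides not to gzip (identity) ...
identity_entity = body
# ... and if it decides to gzip. mtime=0 keeps the output reproducible.
gzip_entity = gzip.compress(body, mtime=0)

# The same request, "Range: bytes=0-99", selects different bytes and
# refers to entities of different total length.
identity_slice = identity_entity[0:100]
gzip_slice = gzip_entity[0:100]

print(len(identity_entity), len(gzip_entity))
print(identity_slice == gzip_slice)
```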

Well... at least, I think so. Some things, like Content-Length, seem like they should apply after gzipping. The HTTP/1.1 spec doesn't really seem to offer any answers either. Maybe Apache's mod_gzip would shed some light on how this should work.

cwho said...

That was what I had thought initially, but I searched the internet (googling was very tedious, since "HTTP" matched everything possible), and the results seemed to indicate that Content-Encoding is applied before ranges are specified. Two examples below:

http://www.and.org/texts/server-http
Under "Range and Accept-Encoding headers": The Range specifies a content range within the encoded entity.

http://www.squid-cache.org/mail-archive/squid-dev/200308/0145.html
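If the range is taken within the encoded entity, as these sources describe, a 206 response would look roughly like the sketch below (a hypothetical helper, assuming Python 3.8 or later for gzip.compress(..., mtime=0)). It compresses the whole body in memory for simplicity, which is exactly the buffering the post wants to avoid, so it only illustrates the header arithmetic. Fixing mtime matters: the gzip header embeds a timestamp, and a resumed download only reassembles correctly if every request sees byte-identical encoded output.

```python
import gzip


def gzip_range_response(body, start, end):
    """Build (headers, payload) for a 206 response whose byte range is
    taken from the gzip-encoded entity, not from the original bytes.

    Hypothetical helper: compresses the whole body in memory.
    """
    # mtime=0 makes the encoded entity identical on every request.
    encoded = gzip.compress(body, mtime=0)
    total = len(encoded)
    if start >= total:
        raise ValueError("unsatisfiable range")
    end = min(end, total - 1)
    payload = encoded[start:end + 1]
    headers = {
        "Content-Encoding": "gzip",
        # Positions and total length both refer to the compressed bytes.
        "Content-Range": "bytes %d-%d/%d" % (start, end, total),
        "Content-Length": str(len(payload)),
    }
    return headers, payload
```

A client that fetches bytes 0-9 and then "bytes=10-" and concatenates the two payloads ends up with the complete gzip stream, which it can then decompress.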


One way to get around the problem would be to modify the ETag header to include the gzip identifier (does this make sense?), e.g. ETag = gzip-<> vs. ETag = <>, ensuring that the client will not mix and match zipped and unzipped content.
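A sketch of the ETag idea, with hypothetical function names: derive a distinct strong ETag for the gzip representation, and honour a Range request carrying If-Range only when the client's tag matches the representation being sent now. The base tag is assumed to come from whatever the server already uses (for example size and modification time).

```python
def make_etag(base, encoding=None):
    """Return a quoted strong ETag that differs per content-coding.

    `base` is assumed to be the server's existing tag for the file.
    """
    if encoding:
        return '"%s-%s"' % (encoding, base)
    return '"%s"' % base


def range_applies(if_range, current_etag):
    """Decide whether to honour Range given an If-Range header value.

    Only the entity-tag form of If-Range is handled here; a date value
    never matches, so the server falls back to a full 200 response,
    which is always safe.
    """
    if if_range is None:
        return True
    return if_range.strip() == current_etag
```

If the client resumes with the tag of the unzipped representation while the server is now sending gzip, the tags differ and the client gets the whole entity instead of a mismatched range.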

Another issue raised in all the searching was that most servers enable gzipping only for specific file types (text), since gzip encoding combined with ranges can break some application data streams. I will add a configuration item for it.
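The configuration item could look something like this: a list of MIME type prefixes eligible for gzip, checked together with the client's Accept-Encoding header. GZIP_MIME_PREFIXES and the function names are hypothetical; the types come from Python's standard mimetypes module.

```python
import mimetypes

# Hypothetical configuration item: MIME type prefixes eligible for gzip.
GZIP_MIME_PREFIXES = ("text/",)


def accepts_gzip(accept_encoding):
    """True if an Accept-Encoding header value allows gzip (quality > 0)."""
    if not accept_encoding:
        return False
    qualities = {}
    for item in accept_encoding.split(","):
        parts = [p.strip() for p in item.split(";")]
        coding = parts[0].lower()
        if not coding:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        qualities[coding] = q
    # An explicit gzip entry takes precedence over the "*" wildcard.
    for coding in ("gzip", "x-gzip", "*"):
        if coding in qualities:
            return qualities[coding] > 0
    return False


def should_gzip(path, accept_encoding, prefixes=GZIP_MIME_PREFIXES):
    """Gzip only configured file types, and only if the client accepts it."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith(prefixes):
        return False
    return accepts_gzip(accept_encoding)
```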