Re: [Haskell-cafe] PROPOSAL: Web application interface

From: Michael Snoyman
Date: Sat, 23 Jan 2010 21:31:47 +0200
Just as an update, I've made the following changes to my WAI git repo (http://github.com/snoyberg/wai):

* I removed the RequestBody(Class) bits, and replaced them with "IO (Maybe ByteString)". This is a good example of tradeoffs versus the enumerator approach (see below).
* This might just be bikeshedding, but I renamed RequestMethod to Method to make names slightly shorter and more consistent.
* I implemented Mark's suggestions of adding support for arbitrary request methods and information on HTTP version.
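A minimal sketch of what consuming an "IO (Maybe ByteString)" request body looks like from the application side, assuming that Nothing marks the end of the body. The RequestBody alias and the fromChunks and consumeBody helpers are hypothetical illustrations, not code from the repository.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.IORef (newIORef, readIORef, writeIORef)

-- Hypothetical alias for the pull-style request body: each call yields the
-- next chunk, and Nothing signals that the body is exhausted.
type RequestBody = IO (Maybe B.ByteString)

-- Test helper: serve a fixed list of chunks through a pull source.
fromChunks :: [B.ByteString] -> IO RequestBody
fromChunks chunks = do
  ref <- newIORef chunks
  return $ do
    remaining <- readIORef ref
    case remaining of
      []       -> return Nothing
      (c : cs) -> writeIORef ref cs >> return (Just c)

-- Pull chunks until Nothing and concatenate them. This buffers the whole
-- body in memory, so it only suits small bodies.
consumeBody :: RequestBody -> IO B.ByteString
consumeBody next = go []
  where
    go acc = do
      mchunk <- next
      case mchunk of
        Nothing    -> return (B.concat (reverse acc))
        Just chunk -> go (chunk : acc)
```

An application that only needs a prefix of the body can simply stop calling the action; nothing forces it to read to the end.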

I've been having some off-list discussions about WAI, and have a few issues to bring up. The first is relatively simple: what do we do about consuming the entire request body? Do we leave that as a task to the application, or should the server ensure that the entire request body is consumed?
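One way a server could guarantee full consumption, sketched under the assumption that the body is the "IO (Maybe ByteString)" action from the change list: once the application returns, the server keeps pulling and discarding chunks. On a persistent (keep-alive) connection, unread body bytes would otherwise be misread as the start of the next request, so a server has to either drain them or close the connection. The names drainBody and runAndDrain are hypothetical.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.IORef (newIORef, readIORef, writeIORef)

-- Hypothetical pull-style body: Nothing marks the end.
type RequestBody = IO (Maybe B.ByteString)

-- Test helper: serve a fixed list of chunks through a pull source.
fromChunks :: [B.ByteString] -> IO RequestBody
fromChunks chunks = do
  ref <- newIORef chunks
  return $ do
    remaining <- readIORef ref
    case remaining of
      []       -> return Nothing
      (c : cs) -> writeIORef ref cs >> return (Just c)

-- Pull and discard whatever is left, returning the number of bytes dropped.
drainBody :: RequestBody -> IO Int
drainBody next = go 0
  where
    go n = do
      mchunk <- next
      case mchunk of
        Nothing    -> return n
        Just chunk -> go (n + B.length chunk)

-- Server-side wrapper: run the application, then drain the leftovers so the
-- connection is positioned at the next request.
runAndDrain :: (RequestBody -> IO a) -> RequestBody -> IO (a, Int)
runAndDrain app body = do
  result   <- app body
  leftover <- drainBody body
  return (result, leftover)
```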

Next, I have made the ResponseBodyClass typeclass specifically with the goal of allowing optimizations for lazy bytestrings and sending files. The former seems far-fetched; the latter provides the ability to use a sendfile system call instead of copying the file data into memory. However, in the presence of gzip encoding, how useful is this optimization?
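A sketch of the kind of typeclass described above. The names ResponseBody, asFile, FileBody, Strategy and chooseStrategy are hypothetical and are not the actual ResponseBodyClass methods. It also shows why gzip gets in the way: compression has to run over the bytes in user space, so a server that compresses the response cannot hand the file directly to sendfile and has to fall back to streaming.

```haskell
import qualified Data.ByteString.Char8 as B
import qualified Data.ByteString.Lazy as L
import Data.IORef (modifyIORef, newIORef, readIORef)

-- Every body can be streamed through a chunk writer; file-backed bodies can
-- additionally expose their path so a server may use sendfile.
class ResponseBody a where
  sendByteString :: a -> (B.ByteString -> IO ()) -> IO ()
  asFile :: a -> Maybe FilePath
  asFile _ = Nothing

instance ResponseBody B.ByteString where
  sendByteString bs send = send bs

instance ResponseBody L.ByteString where
  -- Hand over the existing strict chunks one at a time.
  sendByteString lbs send = mapM_ send (L.toChunks lbs)

newtype FileBody = FileBody FilePath

instance ResponseBody FileBody where
  -- Fallback path: read the file into memory (the copy sendfile avoids).
  sendByteString (FileBody path) send = B.readFile path >>= send
  asFile (FileBody path) = Just path

data Strategy = SendFile FilePath | Stream
  deriving (Eq, Show)

-- Use sendfile only when the body is a file and the response is not gzipped.
chooseStrategy :: ResponseBody a => Bool -> a -> Strategy
chooseStrategy gzip body =
  case asFile body of
    Just path | not gzip -> SendFile path
    _                    -> Stream

-- Collect the chunks a body streams, in order (handy for testing).
collectChunks :: ResponseBody a => a -> IO [B.ByteString]
collectChunks body = do
  ref <- newIORef []
  sendByteString body (\c -> modifyIORef ref (c :))
  reverse <$> readIORef ref
```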

Finally, there is a lot of discussion going on right now about enumerators. The question is whether the WAI protocol should use them. There are two places where they could replace the current offering: request body and response body.

In my opinion, there is no major difference between the Hyena definition of an enumerator and the current response body sendByteString method. The former provides two extra features: an accumulating parameter that is passed around, and a way to indicate early termination. However, the accumulating parameter seems unnecessary to me in general, and when it is needed we can accomplish the same result with MVars. Early termination seems like something that would be unusual in the response context, and could be handled with exceptions.
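To make the comparison concrete, here is a sketch with an enumerator type modelled loosely on Hyena's. The convention that the iteratee returns Left to stop and Right to continue is an assumption of this sketch and may not match Hyena's exact definition; listEnumerator, toSend and countBytes are hypothetical. Dropping the accumulator and never stopping early turns the enumerator into the callback style, and an MVar recovers the accumulator when it is needed.

```haskell
{-# LANGUAGE RankNTypes #-}
import qualified Data.ByteString.Char8 as B
import Control.Concurrent.MVar (modifyMVar_, newMVar, readMVar)

-- Iteratee gets the accumulator and a chunk; Left stops, Right continues.
type Enumerator = forall a. (a -> B.ByteString -> IO (Either a a)) -> a -> IO a

-- Callback style: the body pushes each chunk into a writer.
type SendByteString = (B.ByteString -> IO ()) -> IO ()

-- Enumerate a fixed list of chunks, honouring early termination.
listEnumerator :: [B.ByteString] -> Enumerator
listEnumerator chunks iter = go chunks
  where
    go [] acc = return acc
    go (c : cs) acc = do
      r <- iter acc c
      case r of
        Left done  -> return done
        Right acc' -> go cs acc'

-- Ignore the accumulator and never stop: this is the callback style.
toSend :: Enumerator -> SendByteString
toSend enum send = enum (\() chunk -> send chunk >> return (Right ())) ()

-- Recover an accumulating result with an MVar instead of a parameter.
countBytes :: SendByteString -> IO Int
countBytes body = do
  total <- newMVar 0
  body (\chunk -> modifyMVar_ total (return . (+ B.length chunk)))
  readMVar total
```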

For the request body, there is a significant difference. However, I think that the current approach (called imperative elsewhere) is more in line with how most people would expect to program. At the same time, I believe there is no performance issue going either way, and am open to community input.
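For the request body, the difference is in who drives the loop. A sketch, assuming an enumerator in which the iteratee returns Left to stop and Right to continue (a hypothetical convention, loosely after Hyena's): a pull source can be adapted to a push-style enumerator with a simple loop, and early termination leaves the rest of the body unread in the source. The reverse adaptation, turning an enumerator into a pull action, is harder because the enumerator owns the loop; it typically needs a separate thread or a continuation-based design. The names pullToEnumerator and fromChunks are hypothetical.

```haskell
{-# LANGUAGE RankNTypes #-}
import qualified Data.ByteString.Char8 as B
import Data.IORef (newIORef, readIORef, writeIORef)

type Enumerator = forall a. (a -> B.ByteString -> IO (Either a a)) -> a -> IO a

-- Imperative (pull) style: the application asks for the next chunk.
type RequestBody = IO (Maybe B.ByteString)

-- Test helper: serve a fixed list of chunks through a pull source.
fromChunks :: [B.ByteString] -> IO RequestBody
fromChunks chunks = do
  ref <- newIORef chunks
  return $ do
    remaining <- readIORef ref
    case remaining of
      []       -> return Nothing
      (c : cs) -> writeIORef ref cs >> return (Just c)

-- Drive the iteratee from the pull source until the source is exhausted
-- or the iteratee asks to stop; unread chunks stay in the source.
pullToEnumerator :: RequestBody -> Enumerator
pullToEnumerator next iter = go
  where
    go acc = do
      mchunk <- next
      case mchunk of
        Nothing    -> return acc
        Just chunk -> iter acc chunk >>= either return go
```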

Michael

On Mon, Jan 18, 2010 at 1:48 PM, Michael Snoyman <[email protected]> wrote:
Mark, thanks for the response, it's very well thought out. Let me state two things first to explain some of my design decisions.

Firstly, I'm shooting for lowest-common-denominator here. Right now, I see that as the intersection between the CGI backend and a standalone server backend; I think anything contained in both of those will be contained in all other backends. If anyone has a contrary example, I'd be happy to see it.

Secondly, the WAI is *not* designed to be "user friendly." It's designed to be efficient and portable. People looking for a user-friendly way to write applications should be using some kind of frontend, either a framework, or something like hack-frontend-monadcgi.

That said, let's address your specific comments.


On Mon, Jan 18, 2010 at 8:54 AM, Mark Lentczner <[email protected]> wrote:
I like this project! Thanks for resurrecting it!

Some thoughts:

Methods in HTTP are extensible. The type RequestMethod should probably have a "catch-all" constructor:
   | Method B.ByteString
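A sketch of how the catch-all constructor could sit alongside the standard methods. Apart from Mark's Method B.ByteString, the constructor names are illustrative, and parseMethod and renderMethod are hypothetical helpers. Since HTTP method names are case-sensitive, unknown or differently cased names fall through to the catch-all.

```haskell
import qualified Data.ByteString.Char8 as B

-- Standard methods as nullary constructors, plus the catch-all.
data Method
  = GET | HEAD | POST | PUT | DELETE | TRACE | OPTIONS | CONNECT
  | Method B.ByteString
  deriving (Eq, Show)

-- No case folding: "get" is a different (extension) method from "GET".
parseMethod :: B.ByteString -> Method
parseMethod bs = case B.unpack bs of
  "GET"     -> GET
  "HEAD"    -> HEAD
  "POST"    -> POST
  "PUT"     -> PUT
  "DELETE"  -> DELETE
  "TRACE"   -> TRACE
  "OPTIONS" -> OPTIONS
  "CONNECT" -> CONNECT
  _         -> Method bs

renderMethod :: Method -> B.ByteString
renderMethod (Method bs) = bs
renderMethod m           = B.pack (show m)  -- constructor name is the method name
```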

Seems logical to me.

Other systems (the WAI proposal on the Wiki, Hack, etc.) have broken the path into two parts: scriptName and pathInfo. While I'm not particularly fond of those names, they do break the path into "traversed" and "non-traversed" portions of the URL. This is very useful for achieving "location independence" of one's code. While this API is trying to stay agnostic to the web framework, some degree of traversal is pretty universal, and I think it would benefit from being in here.

Going to the standalone vs CGI example: in a CGI script, scriptName is a well defined variable. However, it has absolutely no meaning to a standalone handler. I think we're just feeding rubbish into the system. I'm also not certain how one could *use* scriptName in any meaningful manner, outside of trying to reconstruct a URL (more on this topic below).

The fields serverPort, serverName, and urlScheme are typically only used by an application to "reconstruct" URLs for inclusion in the response. This is a constant source of bugs in many web sites. It is also a problem in creating modular web frameworks, since the application can't be unaware of its context (unless the server interprets and rewrites HTML and other content on the fly, which isn't realistic). Perhaps a better solution would be to pass a "URL generating" function in the Request and hide all this. Of course, web frameworks *could* use these data to dispatch on "virtual host"-like configurations. Though perhaps that is the province of the server side of this API? I don't have a concrete proposal here, just a gut feeling that including these breaks some of the encapsulation we'd like to achieve for Applications.

I think it's impossible to ever reconstruct a URL for a CGI application. I've tried it; once you start dealing with mod_rewrite, anything could happen. Given that I think we should encourage users to make pretty URLs via mod_rewrite, I oppose inserting such a function. When I need this kind of information (many of my web apps do), I've put it in a configuration file.
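A minimal sketch of the configuration-file approach, assuming a hypothetical Config record whose approot field holds the site's public root as visitors see it (that is, after any mod_rewrite). The record and renderUrl are illustrative names; path pieces are assumed to be URL-safe already, so no percent-encoding is done.

```haskell
-- Hypothetical application configuration, loaded from a file at startup.
data Config = Config { approot :: String }

-- Join the configured root and the path pieces, avoiding a doubled slash.
renderUrl :: Config -> [String] -> String
renderUrl cfg pieces = stripSlash (approot cfg) ++ concatMap ('/' :) pieces
  where
    stripSlash s
      | not (null s) && last s == '/' = init s
      | otherwise                     = s
```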

However, I don't think it's a good idea to hide information that is universal to all webapps. urlScheme in particular seems very important to me; for example, when serving an app over HTTPS you may want to use a secure static-file server as well. Frankly, I don't have a use case for serverName and serverPort that doesn't involve reconstructing URLs, but my gut feeling is that it's better to leave them in the protocol in case one turns up.

The HTTP version information seems to have been dropped from Request. Alas, this is often needed when deciding what response headers to generate. I'm in favor of a simple data type for this:
   data HttpVersion = Http09 | Http10 | Http11
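As an illustration of why the version matters for response headers, here is a sketch built on Mark's type (with deriving clauses added). Framing, chooseFraming and keepAliveByDefault are hypothetical: chunked transfer coding only exists in HTTP/1.1, an HTTP/1.0 response of unknown length has to be delimited by closing the connection, an HTTP/0.9 response carries no headers at all, and only HTTP/1.1 connections are persistent by default.

```haskell
data HttpVersion = Http09 | Http10 | Http11
  deriving (Eq, Ord, Show)

-- How the end of the response body is signalled to the client.
data Framing = ContentLength Int | Chunked | CloseDelimited
  deriving (Eq, Show)

-- Pick framing from the version and the (possibly unknown) body length.
chooseFraming :: HttpVersion -> Maybe Int -> Framing
chooseFraming Http09 _          = CloseDelimited  -- no headers in HTTP/0.9
chooseFraming _      (Just len) = ContentLength len
chooseFraming v      Nothing
  | v >= Http11 = Chunked
  | otherwise   = CloseDelimited

-- HTTP/1.1 connections stay open unless "Connection: close" is sent.
keepAliveByDefault :: HttpVersion -> Bool
keepAliveByDefault v = v >= Http11
```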

I had not thought of that at all, and I like it. However, do we want to hard-code in all possible HTTP versions? In theory, there could be more standards in the future. Plus, isn't Google currently working on a more efficient approach to HTTP that would affect this?

Using ByteString for all the non-body values I find awkward. Take headers, for example. The header names are going to come from a list of about 50 well-known ones. It seems a shame that applications will be littered with expressions like:

   [(B.pack "Content-Type", B.pack "text/html;charset=UTF-8")]

Seems to me that it would be highly beneficial to include a module, say Network.WAI.Header, that defined these things:

   [(Hdr.contentType, Hdr.mimeTextHtmlUtf8)]
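A sketch of such a constants module, written as plain top-level bindings so it stands alone; in a real package the bindings would live in their own module (Mark's Network.WAI.Header, imported qualified as Hdr). Only a few headers are shown, and the binding names are illustrative.

```haskell
import qualified Data.ByteString.Char8 as B

-- Header names as shared ByteString constants.
contentType, contentLength, location :: B.ByteString
contentType   = B.pack "Content-Type"
contentLength = B.pack "Content-Length"
location      = B.pack "Location"

-- A common header value.
mimeTextHtmlUtf8 :: B.ByteString
mimeTextHtmlUtf8 = B.pack "text/html;charset=UTF-8"

-- What application code would then write instead of inline B.pack calls.
htmlHeaders :: [(B.ByteString, B.ByteString)]
htmlHeaders = [(contentType, mimeTextHtmlUtf8)]
```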

This approach would make WAI much more top-heavy and prone to becoming out of date. I don't oppose having this module in a separate package, but I want to keep WAI itself as light as possible.

Further, since non-fixed headers will be built up out of many little String bits, I'd just as soon have the packing and unpacking be done by the server side of this API, and let the applications deal with Strings for these little snippets in both the Request and the Response.

As I stated at the beginning of this response, there should be a framework or frontend sitting between WAI and the application. And given that the actual data on the wire will be represented as a stream of bytes, I'd rather stick with that.

For header names, in particular, it might be beneficial (and faster) to treat them like RequestMethod and make them a data type with nullary constructors for all 47 defined headers, and one ExtensionHeader String constructor.
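A sketch of the nullary-constructor approach with only five of the standard headers, keeping Mark's ExtensionHeader String constructor. Because header field names are case-insensitive, parsing lowercases before matching, while ExtensionHeader keeps the original spelling. HeaderName, parseHeaderName and renderHeaderName are hypothetical names.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.Char (toLower)

data HeaderName
  = Accept | ContentLength | ContentType | Host | UserAgent
  | ExtensionHeader String
  deriving (Eq, Show)

-- Match known names case-insensitively; keep unknown names verbatim.
parseHeaderName :: B.ByteString -> HeaderName
parseHeaderName raw = case map toLower (B.unpack raw) of
  "accept"         -> Accept
  "content-length" -> ContentLength
  "content-type"   -> ContentType
  "host"           -> Host
  "user-agent"     -> UserAgent
  _                -> ExtensionHeader (B.unpack raw)

-- Render known names in their conventional capitalisation.
renderHeaderName :: HeaderName -> B.ByteString
renderHeaderName h = B.pack $ case h of
  Accept            -> "Accept"
  ContentLength     -> "Content-Length"
  ContentType       -> "Content-Type"
  Host              -> "Host"
  UserAgent         -> "User-Agent"
  ExtensionHeader s -> s
```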

Same comment about top-heaviness.

Finally, note that HTTP/1.1 actually does define the character encoding of these parts of the protocol quite precisely. It is a bit hard to find in the spec, but the request line, status line and headers are all transmitted in ISO-8859-1 (with some restrictions), with characters outside that set encoded as per RFC 2047 (MIME Message Header Extensions). Mind you, I believe that most web servers *don't* do the RFC 2047 decoding, and either a) pass the strings through as ISO-8859-1 strings, or b) decode them to native Unicode strings.

Thanks for that information, I was unaware. However, I think it still makes sense to keep WAI as low-level as possible, which would mean a sequence of bytes.
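Keeping WAI byte-oriented still leaves a frontend free to decode. A sketch, assuming the frontend wants plain Strings: ISO-8859-1 maps each byte to the code point with the same value, which is exactly what Data.ByteString.Char8.unpack does. Encoding back is only lossless for characters below 256, and RFC 2047 encoded-words are not attempted. The names decodeLatin1 and encodeLatin1 are hypothetical.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.Char (ord)

-- Byte n becomes the Char with code point n, i.e. ISO-8859-1 decoding.
decodeLatin1 :: B.ByteString -> String
decodeLatin1 = B.unpack

-- Refuse characters outside ISO-8859-1 instead of letting B.pack truncate them.
encodeLatin1 :: String -> Maybe B.ByteString
encodeLatin1 s
  | all ((< 256) . ord) s = Just (B.pack s)
  | otherwise             = Nothing
```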

Michael

_______________________________________________
Haskell-Cafe mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/haskell-cafe