Mixing Web components with PyWSGI ::-- ZoomQuiet [2006-08-24 04:49:56]
1. Mix and match Web components with Python WSGI
Learn about the Python standard for building Web applications with maximum flexibility
- developerWorks
Uche Ogbuji (uche@ogbuji.net), Principal Consultant, Fourthought, Inc.
22 Aug 2006
- Learn to create and reuse components in your Web server using Python. The Python community created the Web Server Gateway Interface (WSGI), a standard for creating Python Web components that work across servers and frameworks. It provides a way to develop Web applications that take advantage of the many strengths of different Web tools. This article introduces WSGI and shows how to develop components that contribute to well-designed Web applications.
The main reason for the success of the Web is its flexibility. You find almost as many ways to design, develop, and deploy Web sites and applications as there are developers. With a huge wealth of choices, a Web developer often chooses a unique combination of Web design tools, page style, content language, Web server, middleware, and DBMS technology, using different implementation languages and accessory toolkits. To make all of these elements work together to offer maximum flexibility, Web functionality should be provided through components as much as possible. These components should perform a limited number of focused tasks competently and work well with each other. This is easy to say, but in practice it's very difficult to achieve because of the many different approaches to Web technology.
The best hope to keep your sanity is the growth of standards for Web component interoperability. Some of these important standards are already developed, and the most successful Web development platforms have them as their backbone. Prominent examples include the Java servlet API and the Ruby on Rails framework. Some languages long popular for Web programming are only recently being given the same level of componentization and have learned from the experience of preceding Web framework component standards. One example is the Zend Framework for PHP (see Resources). Another is Web Server Gateway Interface (WSGI) for Python.
Many people have complained that the popular Python programming language has too many Web frameworks, from well-known entrants such as Zope to under-the-radar frameworks such as SkunkWeb. Some have argued that this diversity can be a good thing, as long as there is some underpinning standardization. Python and Web expert Phillip J. Eby took on the task of such standardization. He authored Python Enhancement Proposal (PEP) 333, which defines WSGI.
The goal of WSGI is to allow for greater interoperability between Python frameworks. WSGI's success brings about an ecosystem of plug-in components you can use with your favorite frameworks to gain maximum flexibility. In this article, I'll introduce WSGI, and focus on its use as a reusable Web component architecture. In all discussions and sample code, I'll assume that you're using Python 2.4 or a more recent version.
1.1. The basic architecture of WSGI
WSGI was developed under fairly strict constraints, the most important of which was the need for a reasonable amount of backward compatibility with the Web frameworks preceding it. This constraint means WSGI unfortunately isn't as neat and transparent as Python developers are used to. Usually the only developers who have to deal directly with WSGI are those who build frameworks and reusable components. Most regular Web developers will pick a framework for its ease of use and be insulated from WSGI details.
If you want to develop reusable Web components, you have to understand WSGI, and the first thing you need to understand about it is how Web applications are structured in the WSGI world view. Figure 1 illustrates this structure.
Figure 1. Illustration of how HTTP request-response passes through the WSGI stack
The WSGI stack
The Web server, also called the gateway, is very low-level code for basic communication with the request client (usually the user's browser). The application layer handles the higher-level details that interpret requests from the user and prepare response content. The application interface to WSGI itself is usually just the more basic layer of an even higher level of application framework providing friendly facilities for common Web patterns such as Ajax techniques or content template systems. Above the server or gateway layer lies WSGI middleware. This important layer comprises components that can be shared across server and application implementations. Common Web features such as user sessions, error handling, and authentication can be implemented as WSGI middleware.
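To make the application layer of this picture concrete, here is a minimal sketch of a WSGI application and a stand-in for the server that calls it. The names hello_app and call_directly are illustrative and do not appear in the article's listings. The sketch is written for Python 3, where the updated specification (PEP 3333) requires body blocks to be bytes rather than the plain strings used in the listings.

```python
def hello_app(environ, start_response):
    """A minimal WSGI application (hypothetical example)."""
    body = b"Hello from the application layer\n"
    headers = [
        ("Content-Type", "text/plain; charset=UTF-8"),
        ("Content-Length", str(len(body))),
    ]
    # The application announces status and headers through the callable
    # handed to it by the layer below (a server or a middleware).
    start_response("200 OK", headers)
    # The return value is an iterable of response body blocks.
    return [body]


def call_directly(app, path="/"):
    """Play the server's role for one request and collect the result."""
    captured = {}

    def fake_start_response(status, response_headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = response_headers

    # A real server supplies many more environment keys than this.
    environ = {"REQUEST_METHOD": "GET", "PATH_INFO": path}
    body = b"".join(app(environ, fake_start_response))
    return captured["status"], captured["headers"], body


status, headers, body = call_directly(hello_app)
print(status)
print(body)
```

Middleware sits between these two roles: it is called like hello_app and calls the wrapped application the way call_directly does.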
1.2. Code in the middle
WSGI middleware is the most natural layer for reusable components. WSGI middleware looks like an application to the lower layers, and like a server to the higher layers. It watches the state of requests, responses, and the WSGI environment in order to add some particular features. Unfortunately, the WSGI specification offers a very poor middleware example, and many of the other examples you can find are too simplistic to give you a feel for how to quickly write your own middleware. I'll give you a feel for the process WSGI middleware undertakes with the following broad outline. It ignores matters that most WSGI middleware authors won't need to worry about. In Python, where I use the word function, I mean any callable object.
- Set-up phase. This phase occurs once each time the Web server starts up. The server accepts an instance of the middleware, which wraps the application function.
- Handling a client request. This phase occurs each time the Web server receives a request:
  - Server calls the middleware function with the environment and server.start_response parameters.
  - Middleware processes the environment and calls the application callable, passing on the environment and a wrapped function middleware.start_response.
  - The application executes; first it prepares the response headers, then it calls middleware.start_response.
  - Middleware processes response headers and calls server.start_response.
  - Server passes control back to the middleware and then back to the application, which starts yielding response body blocks (as strings).
  - For each response body block, middleware makes any modifications and passes on some corresponding string to the server.
  - Once all blocks from the application have been processed, middleware returns control to the server, finished for the current request.
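The outline above can be sketched as a small skeleton. TaggingMiddleware, demo_app, and run_once are hypothetical names chosen for illustration, and the header and upper-casing are stand-ins for whatever real work a middleware would do. The sketch is written for Python 3 (bytes body blocks) and ignores the write() callable and close() handling that a complete middleware would need to consider.

```python
class TaggingMiddleware(object):
    """Hypothetical middleware: adds a header and upper-cases the body."""

    def __init__(self, app):
        # Set-up phase: runs once, wrapping the application callable.
        self.wrapped_app = app

    def __call__(self, environ, server_start_response):
        # The server calls the middleware with the environment and its
        # own start_response.
        def middleware_start_response(status, headers, exc_info=None):
            # The application has prepared its headers; process them and
            # hand them on to the server's start_response.
            headers = list(headers) + [("X-Tagged-By", "sketch")]
            return server_start_response(status, headers, exc_info)

        # Call the application with the environment and the wrapped
        # start_response, then handle each body block as it is yielded.
        for block in self.wrapped_app(environ, middleware_start_response):
            yield block.upper()
        # Falling off the end returns control to the server.


def demo_app(environ, start_response):
    # The application prepares headers, then calls the start_response it
    # was given, which here is the middleware's wrapper.
    start_response("200 OK", [("Content-Type", "text/plain")])
    yield b"first block\n"
    yield b"second block\n"


def run_once(app):
    """Play the server's role for a single request."""
    seen = {}

    def server_start_response(status, headers, exc_info=None):
        seen["status"], seen["headers"] = status, headers

    blocks = list(app({"REQUEST_METHOD": "GET"}, server_start_response))
    return seen["status"], seen["headers"], blocks


status, headers, blocks = run_once(TaggingMiddleware(demo_app))
print(headers)
print(blocks)
```

Because the middleware looks like an application from below and like a server from above, TaggingMiddleware(TaggingMiddleware(demo_app)) would also be a valid chain.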
1.3. A bold step toward XHTML
Many component technologies are rather complex, so the best examples for instruction are simple throwaway toys. This isn't the case with WSGI, and, in fact, I'll present a very practical example. Many developers prefer to serve XHTML Web pages because XML technologies are easier to manage than "tag soup" HTML, and emerging Web trends favor sites that are easier for automatons to read. The problem is that not all Web browsers support XHTML properly. Listing 1 (safexhtml.py) is a WSGI middleware module that checks incoming requests to see if the browser supports XHTML and, if not, translates any XHTML responses to plain HTML. You can use such a module so all of your main application code produces XHTML and the middleware takes care of any needed translation to HTML. Review Listing 1 carefully and try to combine it with the general outline of WSGI middleware execution from the previous section. I've provided enough comments so you can identify the different stages in the code.
Listing 1 (safexhtml.py). WSGI middleware to translate XHTML to HTML for browsers unable to handle it
import cStringIO
from xml import sax
from Ft.Xml import CreateInputSource
from Ft.Xml.Sax import SaxPrinter
from Ft.Xml.Lib.HtmlPrinter import HtmlPrinter

XHTML_IMT = "application/xhtml+xml"
HTML_CONTENT_TYPE = 'text/html; charset=UTF-8'

class safexhtml(object):
    """
    Middleware that checks for XHTML capability in the client and translates
    XHTML to HTML if the client can't handle it
    """
    def __init__(self, app):
        #Set-up phase
        self.wrapped_app = app
        return

    def __call__(self, environ, start_response):
        #Handling a client request phase.
        #Called for each client request routed through this middleware

        #Does the client specifically say it supports XHTML?
        #Note saying it accepts */* or application/* will not be enough
        xhtml_ok = XHTML_IMT in environ.get('HTTP_ACCEPT', '')

        #Specialized start_response function for this middleware
        def start_response_wrapper(status, response_headers, exc_info=None):
            #Assume response is not XHTML; do not activate transformation
            environ['safexhtml.active'] = False
            #Check for response content type to see whether it is XHTML
            #That needs to be transformed
            for name, value in response_headers:
                #content-type value is a media type, defined as
                #media-type = type "/" subtype *( ";" parameter )
                if ( name.lower() == 'content-type'
                     and value.split(';')[0] == XHTML_IMT ):
                    #Strip content-length if present (needs to be
                    #recalculated by server)
                    #Also strip content-type, which will be replaced below
                    response_headers = [ (name, value)
                        for name, value in response_headers
                            if ( name.lower()
                                 not in ['content-length', 'content-type'])
                    ]
                    #Put in the updated content type
                    response_headers.append(('content-type', HTML_CONTENT_TYPE))
                    #Response is XHTML, so activate transformation
                    environ['safexhtml.active'] = True
                    break

            #We ignore the return value from start_response
            start_response(status, response_headers, exc_info)
            #Replace any write() callable with a dummy that gives an error
            #The idea is to refuse support for apps that use write()
            def dummy_write(data):
                raise RuntimeError(
                    'safexhtml does not support the deprecated '
                    'write() callable in WSGI clients')
            return dummy_write

        if xhtml_ok:
            #The client can handle XHTML, so nothing for this middleware to do
            #Notice that the original start_response function is passed
            #On, not this middleware's start_response_wrapper
            for data in self.wrapped_app(environ, start_response):
                yield data
        else:
            response_blocks = []  #Gather output strings for concatenation
            for data in self.wrapped_app(environ, start_response_wrapper):
                if environ['safexhtml.active']:
                    response_blocks.append(data)
                    yield '' #Obey buffering rules for WSGI
                else:
                    yield data

            if environ['safexhtml.active']:
                #Need to convert response from XHTML to HTML
                xhtmlstr = ''.join(response_blocks) #First concatenate response

                #Now use 4Suite to transform XHTML to HTML
                htmlstr = cStringIO.StringIO()  #Will hold the HTML result
                parser = sax.make_parser(['Ft.Xml.Sax'])
                handler = SaxPrinter(HtmlPrinter(htmlstr, 'UTF-8'))
                parser.setContentHandler(handler)
                #Don't load the XHTML DTDs from the Internet
                parser.setFeature(sax.handler.feature_external_pes, False)
                parser.parse(CreateInputSource(xhtmlstr))
                yield htmlstr.getvalue()
        return
The class safexhtml is the full middleware implementation. Each instance is a callable object because the class defines the special __call__ method. You pass an instance of the class to the server, passing the application you are wrapping to the initializer, __init__. The wrapped application might also be another middleware instance if you are chaining safexhtml to other middleware. When the middleware is invoked as a result of a request to the server, it first checks the Accept header sent by the client to see whether it includes the official XHTML media type. If so (the xhtml_ok flag), it's safe to send XHTML and the middleware doesn't do anything meaningful for that request.
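The Accept check is easy to try in isolation. The following sketch pulls the test from Listing 1 into a helper; the function name client_accepts_xhtml and the sample Accept values are assumptions made for illustration.

```python
XHTML_IMT = "application/xhtml+xml"


def client_accepts_xhtml(environ):
    """Return True if the Accept header names the XHTML media type.

    This mirrors the substring test in Listing 1; it is a sketch, not a
    full parser for Accept header syntax.
    """
    # WSGI exposes the request's Accept header as HTTP_ACCEPT.
    return XHTML_IMT in environ.get("HTTP_ACCEPT", "")


# Hypothetical request environments for illustration
xhtml_client = {"HTTP_ACCEPT": "application/xhtml+xml,text/html;q=0.9"}
html_only_client = {"HTTP_ACCEPT": "text/html,*/*;q=0.8"}
no_header_client = {}

print(client_accepts_xhtml(xhtml_client))      # True
print(client_accepts_xhtml(html_only_client))  # False: */* is not enough
print(client_accepts_xhtml(no_header_client))  # False
```

Because this is a plain substring test, a header such as application/xhtml+xml;q=0 would still count as support. Honoring quality values would need a real Accept header parser.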
When the client can't handle XHTML, the class defines the specialized nested function start_response_wrapper, whose job is to check the response headers from the application to see whether the response is XHTML. If so, the response needs to be translated to plain HTML, a fact flagged as safexhtml.active in the environment. One reason to use the environment for this flag is that it takes care of scoping issues in communicating the flag back to the rest of the middleware code. Remember that start_response_wrapper is called asynchronously at a time the application chooses, and it can be tricky to manage the needed state in the middleware.
Another reason to use the environment is to communicate down the WSGI stack that the content has been modified. If the response body needs to be translated, start_response_wrapper not only sets the safexhtml.active flag but also changes the response media type to text/html and removes any Content-Length header. The translation will almost certainly change the length of the response body, so the length has to be recalculated downstream, probably by the server.
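The header handling can be illustrated on its own. The sketch below restates the logic of start_response_wrapper as a standalone function; the name rewrite_headers_for_html and its (headers, active) return value are assumptions for illustration, and the added strip() call is a small deviation from Listing 1 that tolerates whitespace before a semicolon.

```python
XHTML_IMT = "application/xhtml+xml"
HTML_CONTENT_TYPE = "text/html; charset=UTF-8"


def rewrite_headers_for_html(response_headers):
    """Return (headers, active) for a list of (name, value) pairs.

    active is True when the response was declared as XHTML, in which case
    Content-Type is replaced and Content-Length is dropped.
    """
    for name, value in response_headers:
        # The media type is everything before the first ";" parameter.
        media_type = value.split(";")[0].strip()
        if name.lower() == "content-type" and media_type == XHTML_IMT:
            kept = [
                (n, v) for n, v in response_headers
                if n.lower() not in ("content-length", "content-type")
            ]
            kept.append(("Content-Type", HTML_CONTENT_TYPE))
            return kept, True
    # Not XHTML: leave the headers untouched.
    return list(response_headers), False


# Hypothetical application headers for illustration
headers, active = rewrite_headers_for_html([
    ("Content-Type", "application/xhtml+xml; charset=UTF-8"),
    ("Content-Length", "120"),
    ("Cache-Control", "no-cache"),
])
print(active)
print(headers)
```

Returning the flag alongside the headers is only one option. Listing 1 stores it in the environment instead, so that the rest of the middleware and any layers that share that environment can see it.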
Once the application starts sending the response body, the middleware, if translation is needed, gathers the data into the response_blocks list. The application might send the response in chunks, but, for simplicity, the middleware runs the translation only against the complete XHTML input. WSGI rules, however, stipulate that middleware must pass something on to the server every time the application yields a block. It's okay to pass on an empty string, and that's what the middleware does. Once the application is finished, the middleware stitches together the response body, runs it through the translation code, and yields the entire output in one last string.
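The buffering pattern is independent of the XHTML translation itself. Here is a minimal sketch of it, with a trivial stand-in transformation in place of the 4Suite code; buffer_then_transform and shout are hypothetical names, and the sketch uses Python 3 bytes for body blocks.

```python
def buffer_then_transform(app_blocks, transform):
    """Yield one empty block per input block, then the transformed whole."""
    gathered = []
    for block in app_blocks:
        gathered.append(block)
        # Pass something on for every block the application yields.
        yield b""
    # The application is finished: transform the complete body at once.
    yield transform(b"".join(gathered))


def shout(body):
    """Stand-in for the real translation step (hypothetical)."""
    return body.upper()


# Blocks as an application might yield them, split at an arbitrary point
chunks = [b"<p>Moved to ", b"vlib.org.</p>"]
print(list(buffer_then_transform(chunks, shout)))
# [b'', b'', b'<P>MOVED TO VLIB.ORG.</P>']
```

The cost of this simplicity is that the whole response is held in memory before anything useful reaches the client, which is acceptable for small pages but worth keeping in mind for large ones.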
Listing 2 (wsgireftest.py) is server code to test the middleware. It uses wsgiref, which includes a very simple WSGI server. The module will be included in the Python 2.5 standard library.
Listing 2 (wsgireftest.py). Server code for testing Listing 1
import sys
from wsgiref.simple_server import make_server
from safexhtml import safexhtml

XHTML = open('test.xhtml').read()
XHTML_IMT = "application/xhtml+xml"
HTML_IMT = "text/html"
PORT = 8000

def app(environ, start_response):
    print "using IMT", app.current_imt
    start_response('200 OK', [('Content-Type', app.current_imt)])
    #Swap the IMT used for response (alternate between XHTML_IMT and HTML_IMT)
    app.current_imt, app.other_imt = app.other_imt, app.current_imt
    return [XHTML]

app.current_imt=XHTML_IMT
app.other_imt=HTML_IMT

httpd = make_server('', PORT, safexhtml(app))
print 'Starting up HTTP server on port %i...'%PORT

# Respond to requests until process is killed
httpd.serve_forever()
Listing 2 reads a simple XHTML file, given in Listing 3 (test.xhtml), and serves it up with alternating media types. It uses the standard XHTML media type for the first request, the HTML media type for the second, back to XHTML for the third, and so on. This exercises the middleware's capability to leave a response alone if it isn't flagged as XHTML.
Listing 3 (test.xhtml). Simple XHTML file used by the sample server in Listing 2
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
    "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" >
  <head>
    <title>Virtual Library</title>
  </head>
  <body>
    <p>Moved to <a href="http://vlib.org/">vlib.org</a>.</p>
  </body>
</html>
You should be able to see the effect of this middleware if you run Listing 2 and view the page first in an XHTML-aware browser like Firefox and then in an XHTML-challenged browser like Microsoft Internet Explorer. Make the request twice in a row in each browser to see the effect of the response media type on the operation of the middleware. Use View Source to see the resulting response body and the Page Info feature to see the reported response media type. You can also test the example using the command-line HTTP tool cURL: curl -H 'Accept: application/xhtml+xml,text/html' http://localhost:8000/ to simulate an XHTML-savvy browser, and curl -H 'Accept: text/html' http://localhost:8000/ to simulate the opposite case. If you want to see the response headers, use the -D <filename> option and inspect the named file after each cURL invocation.
1.4. Wrap-up
You've now learned about Python's WSGI and how to use it to implement a middleware service that you can plug into any WSGI server and application chain. You could easily chain this article's example middleware with middleware for caching or debugging. These all become components that let you quickly add well-tested features into your project regardless of what WSGI implementations you choose.
WSGI is a fairly young specification, but compatible servers, middleware, and utilities are emerging rapidly to completely revamp the Python Web frameworks landscape. The next time you have a major Web project to develop in Python, be sure to adopt WSGI by using existing WSGI components, and perhaps creating your own either for private use or for contribution back to your fellow Web developers.