Python Hypertext Preprocessor

Overview

PyHP lets you embed the Python programming language in HTML pages, much as PHP does. It uses Python 2.0 to parse and execute the elements embedded in a page.

Why another one?

I wanted a small, fast hypertext preprocessor built on the world's greatest scripting language, Python. Another program, PMZ, already did this, but I quickly found what I believed to be shortcomings in it:
Namespace pollution
My main problem with PMZ is that it does not execute its programs in a separate namespace. If you define a variable named 'fields' in the global namespace, PMZ fails, because the interpreter itself depends on a variable of that name. I did not find this acceptable, and Python 2.0's InteractiveInterpreter class made it easy to avoid the problem in PyHP.
Speed
PyHP was designed for use with Apache's mod_python. It runs just fine as a standalone program, but under mod_python it is accelerated by automatically caching imported modules and by supporting manual caching of variables.
Header specification
PMZ has a fixed set of headers that cannot be modified by a PMZ script. PyHP allows script writers to specify what headers are attached to their documents, even to do so in the middle of the HTML file.
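
The namespace point above can be illustrated with a minimal sketch. This is not PyHP's actual source; it only shows how the standard library's code.InteractiveInterpreter keeps a page's variables in their own dictionary. The names 'fields' (as host state) and 'page_namespace' are hypothetical, and the example is written for a current Python 3 interpreter rather than Python 2.0.

```python
import code

# Hypothetical stand-in for a name the host program itself depends on.
fields = {"internal": "interpreter state"}

# The page's code runs against its own dictionary, not the host's globals.
page_namespace = {}
interpreter = code.InteractiveInterpreter(locals=page_namespace)

# Compile the page's code section and execute it inside page_namespace.
page_code = compile("fields = 'set by the page'\n", "<page>", "exec")
interpreter.runcode(page_code)

print(fields)                    # the host's value is untouched
print(page_namespace["fields"])  # the page's value lives separately
```

Because the page only ever sees 'page_namespace', it can use any variable name without disturbing the program that runs it.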



Language Sections

Code sections

Code sections occur between '<% ' and '%>'. (Note that the opening '<%' must not be followed by an equal sign or a hyphen.) Their contents are passed to the Python parser as ordinary Python code.

Statement sections

Sections between '<%=' and '%>' are statement sections. A statement section contains exactly one Python expression, that is, a statement that yields a value. The string representation of that value (as given by Python's str() call) is printed.
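
As a rough sketch of what a statement section amounts to (an assumption about the mechanism, not PyHP's real code), the contents can be evaluated in the page's namespace and the str() of the result written to the output stream. The helper name 'emit_statement' is hypothetical, and the example targets a current Python 3 interpreter.

```python
import io
import sys

def emit_statement(expression, namespace, out=sys.stdout):
    """Evaluate one expression and write the str() of its value to out."""
    value = eval(expression.strip(), namespace)
    out.write(str(value))

# Variables defined by earlier code sections would live in this dictionary.
namespace = {"testvar": "foo"}

# Capture the output in memory so it can be inspected.
buffer = io.StringIO()
emit_statement(" testvar ", namespace, buffer)
emit_statement(" 6 * 7 ", namespace, buffer)
print(buffer.getvalue())
```

Anything that is not an expression, such as an assignment, raises a SyntaxError under eval(), which matches the rule that a statement section must produce a value.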

Comment sections

Sections between '<%--' and '%>' are ignored completely and do not appear in the generated HTML.

Text sections

Anything outside the above sections is copied onto the page unchanged.
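
The four section types can be told apart with a short scanner. The following is a simplified sketch written for a current Python 3 interpreter, not PyHP's actual parser; the names 'SECTION', 'KINDS' and 'split_sections' are hypothetical.

```python
import re

# '<%' optionally followed by '--' (comment) or '=' (statement), then the
# shortest run of characters up to the next '%>'.
SECTION = re.compile(r"<%(--|=)?(.*?)%>", re.DOTALL)
KINDS = {None: "code", "=": "statement", "--": "comment"}

def split_sections(page):
    """Return (kind, content) pairs in document order."""
    sections = []
    position = 0
    for match in SECTION.finditer(page):
        if match.start() > position:
            # Everything between two markers is a text section.
            sections.append(("text", page[position:match.start()]))
        sections.append((KINDS[match.group(1)], match.group(2)))
        position = match.end()
    if position < len(page):
        sections.append(("text", page[position:]))
    return sections

page = "<P><%-- hidden --%><% x = 1 %>Value: <%= x %></P>"
for kind, content in split_sections(page):
    print(kind, repr(content))
```

This sketch ends a section at the first '%>' it sees, so a '%>' inside a Python string literal would cut the section short; a real parser would need to handle that case.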


Special Methods

There are only two objects in your program's namespace that are not in the normal Python namespace: The builtin module 'sys' which is needed since statement sections write directly to sys.stdout; and a special object 'pyhp' which is used to manipulate items and add headers.

pyhp.addheader(string)

takes an HTTP header string (of the form 'Key: value') and adds it to the header section. This can be done anywhere in the page.

pyhp.clearall()

clears everything written so far, both to stdout and to the headers, as if no code had been executed and no text had been processed. This is useful for generating error pages.
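
Both addheader and clearall can work mid-page only if output is held back until the page has finished. The sketch below models that idea with a hypothetical 'PageBuffer' class; it is an assumption about the approach, not PyHP's implementation. The default Content-type line mirrors the sample output later in this document, and the code is written for a current Python 3 interpreter.

```python
class PageBuffer:
    """Hypothetical stand-in for the 'pyhp' object; not PyHP's real code."""

    def __init__(self):
        self.headers = []
        self.body = []

    def addheader(self, header):
        # Headers are collected separately, so they can be added at any time.
        self.headers.append(str(header))

    def write(self, text):
        self.body.append(text)

    def clearall(self):
        # Forget everything produced so far, headers and body alike.
        self.headers = []
        self.body = []

    def render(self):
        # Headers first, then a blank line, then the buffered body.
        lines = self.headers + ["Content-type: text/html", ""]
        return "\n".join(lines) + "\n" + "".join(self.body)

page = PageBuffer()
page.write("<P>Half-finished page")
page.addheader("X-Example: 1")

# Something failed: discard the partial page and emit an error page instead.
page.clearall()
page.addheader("Status: 500 Internal Server Error")
page.write("<P>Something went wrong.")
print(page.render())
```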

pyhp.include(file)

includes file (a file name, not a file object) as a PyHP file. This takes place in a fresh namespace and does not inherit variables from the current namespace. Text and headers generated by the included file are inserted at the present place in the document body and headers respectively. This procedure is at least an order of magnitude slower than a Python 'import' statement, so it should be used sparingly.

pyhp.staticvars

Python dictionary object for permanent storage in mod_python. If the 'PythonOption AllowStatic' directive is set in Apache, this dictionary is placed in permanent storage and keeps its data between runs of the code. This does take memory and thus should be used only if you know what you are doing (but it can offer a significant speed gain if you do). (Note: if PyHP is run as a standalone program, or the AllowStatic option is not set, this variable is an empty dictionary, so the same code remains portable.)
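
A typical use of such a dictionary is to compute something expensive once and reuse it on later requests. The sketch below runs outside PyHP, so it uses a hypothetical 'FakePyhp' stand-in for the real 'pyhp' object; 'load_table' and 'handle_request' are likewise illustrative names, and the code is written for a current Python 3 interpreter.

```python
class FakePyhp:
    """Hypothetical stand-in so the example runs outside PyHP."""
    staticvars = {}

pyhp = FakePyhp()

def load_table():
    # Placeholder for slow work such as reading a large file.
    load_table.calls += 1
    return {"answer": 42}

load_table.calls = 0  # counts how often the slow work actually ran

def handle_request():
    # Only do the slow work if an earlier request has not done it already.
    if "table" not in pyhp.staticvars:
        pyhp.staticvars["table"] = load_table()
    return pyhp.staticvars["table"]["answer"]

handle_request()
handle_request()
print(load_table.calls)
```

When the dictionary does not persist (standalone mode, or AllowStatic not set), the membership test simply fails on every run and the value is recomputed, so the same code works in both settings.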

pyhp._req

Request object for the present request. Allows modification of Apache's state. Only available in mod_python and should only be used by mod_python experts.


Download

Download the latest version.
You will also need to download mod_python if you wish to run the accelerated version.

Install

After downloading the program there are two options for installation. For the standalone version you simply make your scripts call the binary. On Unix this is done by starting your scripts with "#!/path/to/pyhp".

For the accelerated version you must first install Apache and mod_python, then, as root, install pyhp_module.py in /usr/lib/python2.0/site-packages (or wherever your Python site-packages directory is). After doing this, add the following to your Apache configuration file (or to a .htaccess file in a directory where the AllowOverride FileInfo directive is set):

AddHandler python-program .pyhp
PythonHandler pyhp_module
PythonOption AllowStatic True
The last line is only required if you wish to allow the use of the static variable dictionary (pyhp.staticvars). This can be used to optimize code, but it could contribute to memory bloat in the server program if users abuse it.


Sample code

The following is an example page, followed by the output it would give if the CGI variable 'testvar' were set to 'foo'.

Code

#!/usr/bin/pyhp
<%-- The first line is ignored if it begins with a '#' character 
This section is also ignored because it occurs inside comment brackets
--%>

<HTML>
<BODY>
<P> This section is printed as part of the body.

<% 

import cgi
form = cgi.FieldStorage() # note that calls to the OS environment still work

if form.has_key('testvar'):
 testvar = form['testvar'].value
else:
 testvar = 'Not Defined'

%>

<P>
If you passed in a variable named 'testvar' in your CGI it will be 
printed here: <%= testvar %>

<% 

#Note that using the internal pyhp method you can modify the headers in
#mid-page.

from Cookie import Cookie

c = Cookie()
c['cvar'] = 'This is a cookie value'
pyhp.addheader(c)

%>

</BODY>
</HTML>



Output

Set-Cookie:  cvar="This is a cookie value";
Content-type: text/html




<HTML>
<BODY>
<P> This section is printed as part of the body.



<P>
If you passed in a variable named 'testvar' in your CGI it will be 
printed here: foo



</BODY>
</HTML>