2013-04-10

Password protect a PDF

The other day, I wrote a CherryPy application that adds a password to a PDF, and ran it behind Apache2 with the mod_wsgi module.  The password itself is added by the pdftk console tool.

The entire CherryPy application is in one file, main.py, shown below.

The index() method presents the UI to the web browser, asking the user to select the PDF and enter a password.

The upload() method shells out to the console and calls pdftk to add the password, then returns the password-protected file to the browser so that the user can save the PDF to local storage (e.g. the user's hard disk).
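
The shell-out step can be sketched in isolation.  This is a minimal sketch, not the application's own code: the helper names pdftk_command and add_password are hypothetical, and it assumes pdftk is on the PATH.  It passes pdftk an argument list rather than a shell string, so a password containing spaces or shell metacharacters reaches pdftk unchanged, and it works inside a private temporary directory.

```python
import os
import shutil
import subprocess
import tempfile


def pdftk_command(src, dst, password):
    """Build the pdftk argument list (hypothetical helper); no shell involved."""
    return ["pdftk", src, "output", dst, "user_pw", password]


def add_password(pdf_bytes, password):
    """Return a password-protected copy of pdf_bytes (needs pdftk on PATH)."""
    workdir = tempfile.mkdtemp()  # private directory holding both files
    try:
        src = os.path.join(workdir, "in.pdf")
        dst = os.path.join(workdir, "out.pdf")
        with open(src, "wb") as f:
            f.write(pdf_bytes)
        # check_call raises CalledProcessError if pdftk exits non-zero
        subprocess.check_call(pdftk_command(src, dst, password))
        with open(dst, "rb") as f:
            return f.read()
    finally:
        # remove the directory and whatever pdftk left in it
        shutil.rmtree(workdir, ignore_errors=True)
```

Because the password travels as a single list element, no quoting or escaping is needed.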

The last part of the file hooks it up with Apache2/mod_wsgi.  If main.py is run from the command line, the CherryPy built-in HTTP server starts; otherwise, it assumes it is running behind Apache2 and creates a WSGI application.

# main.py
# requires cherrypy
# requires pdftk (command line tool)
# dkf 130401 creation

import os
import tempfile

import cherrypy

class PdfPass(object):
    def index(self):
        return """
        <html><body>
            <h2>Add password to PDF</h2>
            <form action="upload" method="post" enctype="multipart/form-data">
            <table>
            <tr>
            <td>Select PDF:</td>
            <td><input type="file" name="pdf" size="60"/></td>
            </tr>
            <tr>
            <td>Password:</td>
            <td><input type="password" name="pass1" value="" size="20" maxlength="40"/></td>
            </tr>
            <tr>
            <td>Password again:</td>
            <td><input type="password" name="pass2" value="" size="20" maxlength="40"/></td>
            </tr>
            <tr>
            <td colspan="2"><input type="submit" value="Add password"/></td>
            </tr>
            </table>
            </form>
        </body></html>
        """
    index.exposed = True

    def upload(self, pdf, pass1, pass2):
        if not pass1:
            return "Password cannot be empty!<br/>Please go back and correct."

        if pass1 != pass2:
            return "Passwords do not match!<br/>Please go back and correct."

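        # read in the user uploaded pdf; mkdtemp() creates a private
        # directory, avoiding the race condition in tempfile.mktemp()
        import subprocess
        workdir = tempfile.mkdtemp()
        temp1 = os.path.join(workdir, "in.pdf")
        temp2 = os.path.join(workdir, "out.pdf")
        try:
            with open(temp1, "wb") as f:
                f.write(pdf.file.read())

            # call pdftk to add password to pdf; an argument list (no shell)
            # keeps the password from being interpreted as shell code
            status = subprocess.call(
                ['pdftk', temp1, 'output', temp2, 'user_pw', pass1])
            if status != 0:
                return "pdftk could not process the file.<br/>Please go back and try another PDF."
            with open(temp2, "rb") as f:
                data = f.read()
        finally:
            # clean up temp files
            for path in (temp1, temp2):
                if os.path.exists(path):
                    os.remove(path)
            os.rmdir(workdir)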

        # deliver the password protected pdf to the user
        cherrypy.response.headers['Content-Type'] = "application/pdf"
        cherrypy.response.headers['Content-Disposition'] = 'attachment; filename="%s"' % pdf.filename
        return data
    upload.exposed = True

if __name__ == '__main__':
    cherrypy.quickstart(PdfPass())
else:
    # cherrypy.config.update({'environment': 'embedded'})
    application = cherrypy.Application(PdfPass(), script_name=None, config=None)
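
A Content-Disposition value separates the disposition type from its parameters with a semicolon, and the file name here comes from the browser, so it is worth cleaning before it goes into a header.  The following is a minimal sketch under assumptions: the helper name content_disposition and the fallback name download.pdf are hypothetical and not part of main.py.

```python
import os
import re


def content_disposition(filename):
    """Build an attachment header value from a client-supplied file name."""
    # keep only the last path component; some browsers send a full path
    name = os.path.basename(filename.replace("\\", "/"))
    # drop characters that would break the quoted string or the header line
    name = re.sub(r'[\r\n"\\]', "", name)
    # hypothetical fallback when nothing usable is left
    if not name:
        name = "download.pdf"
    return 'attachment; filename="%s"' % name
```

In upload(), the result would be assigned to cherrypy.response.headers['Content-Disposition'].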


On the Apache side (Apache2 on Debian 6), modify /etc/apache2/sites-available/default by adding the following lines inside the <VirtualHost *:80> block.  The CherryPy application will then be available at http://localhost/<url>.

# must run wsgi in daemon mode otherwise content-disposition may not work
WSGIDaemonProcess cherrypy processes=2 threads=15 display-name=%{GROUP}
WSGIProcessGroup cherrypy
WSGIScriptAlias /<url> /full/path/to/main.py
<Directory /full/path/to>
    Order allow,deny
    allow from all
</Directory>


-End-