Building a PDF Splitter Application

Introduction

I recently needed to take a couple of pages out of a PDF and save them to a new
PDF. This is a fairly simple task but every time I do it, it takes some time
to figure out the right command line parameters to make it work. In addition, my
co-workers wanted similar functionality and since they are not comfortable on the
command line, I wanted to build a small graphical front end for this task.

One solution is to use Gooey, which is a really good option that I cover
in my prior article. However, I wanted to try out another library
and decided to give appJar a try. This article will walk through an example
of using appJar to create a GUI that allows a user to select a PDF, strip
out one or more pages and save it to a new file. This approach is simple, useful
and shows how to integrate a GUI into other python applications you create.

The State of GUIs in Python

One of the most common questions on the python subreddit is something
along the lines of “What GUI should I use?” There is no shortage of options but
there’s a pretty steep learning curve for many of them. In addition, some work
to varying degrees on different platforms and many have been dormant for quite a while.
It is not an easy question to answer.

From a high level, the big GUI categories are:

  • Qt
  • wxPython (wxWidgets)
  • Tkinter
  • Custom libraries (Kivy, Toga, etc)
  • Web technology based solutions (HTML, Chrome-based, etc)

In addition to this ecosystem, there are several types of wrapper and helper
apps to make development simpler. For example, Gooey is a nice way to leverage
argparse to build a wxPython GUI for free. I have had a lot of success using this
approach to enable end users to interact with my python scripts. I highly recommend
it, especially since wxPython will now work on python 3.

The downside to Gooey is that there is limited ability to construct an application
outside of the “Gooey way.” I wanted to see what else was out there that met the
following requirements:

  • Is simple to use for a quick and dirty script
  • Provides more interaction options than a typical command line approach
  • Works and looks decent on Windows
  • Is easy to install
  • Is actively maintained
  • Works on python 3
  • Runs quickly
  • Cross-platform on Linux is a plus

It turns out that appJar fits my criteria pretty well.

What is appJar

appJar was developed by an educator, who wanted a simpler GUI creation process for
his students. The application provides a wrapper around Tkinter (which ships by default with python)
and takes away a lot of the challenging boilerplate of creating an application.

The application is under active development. In fact, a new release was made
as I pulled this article together. The documentation is extensive and has
pretty good examples. It only took me a couple of hours of playing around with the
code to get a useful application up and running. I suspect I will use this
final application on a frequent basis when I need to pull select pages out of
a pdf document. I may also expand it to allow concatenation of multiple documents
into a new one.

Before I go much further, I want to address Tkinter. I know that Tkinter has a really
bad reputation for not looking very modern. However, the newer ttk themes do look
much better and I think that the final app looks pretty decent on Windows. On linux,
it’s not a work of art, but it does work. At the end of the day, this blog is
about helping you create solutions that are quick and powerful and get the job done.
If you want a really polished GUI that looks native on your OS, you may need to
investigate some of the more full featured options. If you want something that
works and can be built quickly, then appJar is worth considering.

In order to give you a sense of how it looks, here is the final app running
on Windows:

[Image: Vista Pic, the final app running on Windows]

It’s pretty good looking in my opinion.

Solving the Problem

The goal of this program is to make it quick and easy to take a subset of pages
out of a PDF file and save it into a new file. There are many programs that
can do this in Windows but I have found that many of the “free” ones have ads or
other bloated components. The command line works but sometimes a GUI is much simpler –
especially when navigating lots of file paths or trying to explain to less technical users.

In order to do the actual PDF manipulation, I’m using the PyPDF2 library. The python pdf toolkit
ecosystem is kind of confusing but this library seems to have been around
a long time and more recently has seen an uptick of activity on github. The
other nice aspect is that PyPDF2 is covered in Automate The Boring Stuff so
there is a body of additional examples out there.

Here’s the start of a simple script that has a hardcoded input, output and page range.

from PyPDF2 import PdfFileWriter, PdfFileReader

infile = "Input.pdf"
outfile = "Output.pdf"

page_range = "1-2,6"

Next, we instantiate the PdfFileWriter and PdfFileReader objects
and create the actual Output.pdf file:

output = PdfFileWriter()
input_pdf = PdfFileReader(open(infile, "rb"))
output_file = open(outfile, "wb")

The most complicated aspect of the code is splitting up the page_range into
a sequential python list of pages to extract. Stack Overflow to the rescue!

page_ranges = (x.split("-") for x in page_range.split(","))
range_list = [i for r in page_ranges for i in range(int(r[0]), int(r[-1]) + 1)]
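
To see what those two lines produce, here is a minimal sketch that wraps the same
logic in a function. The name parse_page_range is my own for illustration and is
not part of PyPDF2 or appJar.

```python
def parse_page_range(page_range):
    """Expand a string such as "1-2,6" into a list of 1-based page numbers."""
    pages = []
    for part in page_range.split(","):
        bounds = part.split("-")
        # A single page such as "6" gives bounds == ["6"], so the first and
        # last items are the same and the range covers exactly one page.
        start, end = int(bounds[0]), int(bounds[-1])
        pages.extend(range(start, end + 1))
    return pages


print(parse_page_range("1-2,6"))  # [1, 2, 6]
```

The numbers are still 1-based at this point, which is why the copy step subtracts 1
before asking for a page.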

The final step is to copy the page from the input and save to the output:

for p in range_list:
    # Subtract 1 to deal with 0 index
    output.addPage(input_pdf.getPage(p - 1))
output.write(output_file)
# Close the file so the PDF is fully written to disk
output_file.close()

That is all pretty simple and is yet another example of how powerful python can be
when it comes to solving real world problems. The challenge is that this approach
is not very useful when you want to let other people interact with it.

Building the appJar GUI

Now we can walk through integrating that code snippet into a GUI that will:

  • Allow the user to select a PDF file using a standard file explorer GUI
  • Select an output directory and file name
  • Type in a custom range to extract pages
  • Have some error checking to make sure users enter the right information

The first step is to install appJar with pip install appjar.

The actual coding starts with importing all the components we need:

from appJar import gui
from PyPDF2 import PdfFileWriter, PdfFileReader
from pathlib import Path

Next, we can build up the basic GUI app:

# Create the GUI Window
app = gui("PDF Splitter", useTtk=True)
app.setTtkTheme("default")
app.setSize(500, 200)

The first 3 lines set up the basic structure of the app. I have decided to set
useTtk=True because the app looks a little better when this is enabled.
The downside is that ttk support is still in beta, but for this simple app
it works well for me.

I also chose to set the theme to default in this article. On a Windows system,
I set it to ‘vista’ which looks better in my opinion.

If you want to see all the themes available on a system, use app.getTtkThemes()
and experiment with those values. Here is a summary of how the different themes look
on Windows and Ubuntu.

[Image: Example Themes]

Some of the distinctions are subtle so feel free to experiment and see what you prefer.
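
One way to avoid hard-coding a theme per machine is to pick the first preferred theme
that the system actually offers. This is only a sketch: pick_theme and its preference
order are my own assumptions, and in the app the list of available themes would come
from app.getTtkThemes().

```python
def pick_theme(available, preferred=("vista", "clam", "default")):
    """Return the first preferred theme that the system offers."""
    for name in preferred:
        if name in available:
            return name
    # Fall back to whatever the system lists first, if anything
    return available[0] if available else None


# A hypothetical list of themes, standing in for app.getTtkThemes()
print(pick_theme(["clam", "alt", "default", "classic"]))  # clam
```

The result could then be passed to app.setTtkTheme() in place of the fixed "default" string.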

The next step is to add the labels and data entry widgets:

# Add the interactive components
app.addLabel("Choose Source PDF File")
app.addFileEntry("Input_File")

app.addLabel("Select Output Directory")
app.addDirectoryEntry("Output_Directory")

app.addLabel("Output file name")
app.addEntry("Output_name")

app.addLabel("Page Ranges: 1,3,4-10")
app.addEntry("Page_Ranges")

For this application, I chose to explicitly call out the Label, then the Entry.
appJar also supports a combined widget called LabelEntry which puts everything
on one line. In my experience, the choice comes down to aesthetics so play around
with the options and see which ones look good in your application.

The most important thing to remember at this point is that the text enclosed
in the Entry variables will be used to get the actual value entered.

The next step is to add the buttons. This code will add a “Process” and “Quit”
button. When either button is pressed, it will call the press function:

# link the buttons to the function called press
app.addButtons(["Process", "Quit"], press)

Finally, make the application go:

# start the GUI
app.go()

This basic structure accomplishes most of the GUI work. Now, the program needs to read
in any input, validate it and execute the PDF splitting (similar to the example above).
The first function we need to define is press. This function will be called when
either of the buttons is pressed.

def press(button):
    if button == "Process":
        src_file = app.getEntry("Input_File")
        dest_dir = app.getEntry("Output_Directory")
        page_range = app.getEntry("Page_Ranges")
        out_file = app.getEntry("Output_name")
        errors, error_msg = validate_inputs(src_file, dest_dir, page_range, out_file)
        if errors:
            app.errorBox("Error", "\n".join(error_msg), parent=None)
        else:
            split_pages(src_file, page_range, Path(dest_dir, out_file))
    else:
        app.stop()

This function takes one parameter, button, which will be either “Process”
or “Quit”. If the user selects Quit, then app.stop() will shut down the app.

If the Process button is clicked, then the input values are retrieved using
app.getEntry(). Each value is stored and then validated by calling the
validate_inputs function. If there are errors, we can display them in
a popup box using app.errorBox. If there are no errors, we can split
the file up using split_pages.
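
The call to Path(dest_dir, out_file) simply joins the directory and the file name, so
a name typed without an extension is saved without one. As a possible refinement, the
sketch below adds a .pdf suffix when it is missing. The helper name build_output_path
is hypothetical and not part of the original app.

```python
from pathlib import Path


def build_output_path(dest_dir, file_name):
    """Join the directory and file name, adding a .pdf suffix if it is missing."""
    out_path = Path(dest_dir, file_name)
    if out_path.suffix.lower() != ".pdf":
        # Append rather than replace, so "report.v2" becomes "report.v2.pdf"
        out_path = out_path.with_name(out_path.name + ".pdf")
    return out_path


print(build_output_path("reports", "pages").name)      # pages.pdf
print(build_output_path("reports", "pages.PDF").name)  # pages.PDF
```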

Let’s look at the validate_inputs function.

def validate_inputs(input_file, output_dir, range, file_name):
    errors = False
    error_msgs = []

    # Make sure a PDF is selected
    if Path(input_file).suffix.upper() != ".PDF":
        errors = True
        error_msgs.append("Please select a PDF input file")

    # Make sure a range is selected
    if len(range) < 1:
        errors = True
        error_msgs.append("Please enter a valid page range")

    # Check for a valid directory
    if not(Path(output_dir)).exists():
        errors = True
        error_msgs.append("Please Select a valid output directory")

    # Check for a file name
    if len(file_name) < 1:
        errors = True
        error_msgs.append("Please enter a file name")

    return(errors, error_msgs)

This function executes a couple of checks to make sure there is data in the fields
and that it is valid. I do not claim this will stop all errors but it does give
you an idea of how to check everything and how to collect errors in a list.
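
The range check above only confirms that something was typed. If you want to go one
step further, a stricter check could confirm that the text really looks like
1,3,4-10 before it reaches the splitting code. This is a sketch under my own
assumptions: is_valid_page_range and PAGE_RANGE_PATTERN are illustrative names, and
treating a backwards range such as 5-2 as invalid is a design choice rather than
something the original app does.

```python
import re

# One or more comma-separated items, each a page number or "start-end"
PAGE_RANGE_PATTERN = re.compile(r"\s*\d+\s*(-\s*\d+\s*)?(,\s*\d+\s*(-\s*\d+\s*)?)*")


def is_valid_page_range(text):
    """Return True if text looks like "1,3,4-10" and every range runs forward."""
    if not PAGE_RANGE_PATTERN.fullmatch(text):
        return False
    for part in text.split(","):
        bounds = [int(b) for b in part.split("-")]
        # Page numbers are 1-based, and a range such as "5-2" would be empty
        if bounds[0] < 1 or bounds[-1] < bounds[0]:
            return False
    return True


print(is_valid_page_range("1,3,4-10"))  # True
print(is_valid_page_range("4-"))        # False
print(is_valid_page_range("5-2"))       # False
```

A failed check could append a message to error_msgs in the same way as the other tests
in validate_inputs.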

Now that all the data is collected and validated, we can call the split function
to process the input file and create an output file with a subset of the data.

def split_pages(input_file, page_range, out_file):
    output = PdfFileWriter()
    input_pdf = PdfFileReader(open(input_file, "rb"))
    output_file = open(out_file, "wb")

    # https://stackoverflow.com/questions/5704931/parse-string-of-integer-sets-with-intervals-to-list
    page_ranges = (x.split("-") for x in page_range.split(","))
    range_list = [i for r in page_ranges for i in range(int(r[0]), int(r[-1]) + 1)]

    for p in range_list:
        # Need to subtract 1 because pages are 0 indexed
        try:
            output.addPage(input_pdf.getPage(p - 1))
        except IndexError:
            # Alert the user and stop adding pages
            app.infoBox("Info", "Range exceeded number of pages in input.\nFile will still be saved.")
            break
    output.write(output_file)
    # Close the file so the PDF is fully written before the user is notified
    output_file.close()

    if(app.questionBox("File Save", "Output PDF saved. Do you want to quit?")):
        app.stop()

This function introduces a couple of additional appJar concepts. First,
app.infoBox is used to let the user know when they enter a range that includes more
pages than are in the document. I have made the decision to just process through the end of the file
and let the user know.
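
An alternative to catching the IndexError is to compare the requested pages against
the page count before copying anything. The sketch below assumes you already know the
number of pages (in PyPDF2 of this vintage I believe the reader's getNumPages() method
provides it). The function name split_in_range is hypothetical.

```python
def split_in_range(pages, num_pages):
    """Separate 1-based page numbers into those that exist and those that do not."""
    valid = [p for p in pages if 1 <= p <= num_pages]
    skipped = [p for p in pages if not 1 <= p <= num_pages]
    return valid, skipped


valid, skipped = split_in_range([1, 2, 6, 12], num_pages=10)
print(valid)    # [1, 2, 6]
print(skipped)  # [12]
```

Checking up front also catches a page number of 0, which would otherwise turn into
index -1 after the subtraction and may quietly select the last page instead of raising
an error.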

Once that file is saved, the program uses app.questionBox to ask the user
if they want to quit. If so, then we use app.stop() to gracefully exit.

The Complete Code

All of the code will be stored on github but here is the final solution:

from appJar import gui
from PyPDF2 import PdfFileWriter, PdfFileReader
from pathlib import Path

# Define all the functions needed to process the files


def split_pages(input_file, page_range, out_file):
    """ Take a pdf file and copy a range of pages into a new pdf file

    Args:
        input_file: The source PDF file
        page_range: A string containing a range of pages to copy: 1-3,4
        out_file: File name for the destination PDF
    """
    output = PdfFileWriter()
    input_pdf = PdfFileReader(open(input_file, "rb"))
    output_file = open(out_file, "wb")

    # https://stackoverflow.com/questions/5704931/parse-string-of-integer-sets-with-intervals-to-list
    page_ranges = (x.split("-") for x in page_range.split(","))
    range_list = [i for r in page_ranges for i in range(int(r[0]), int(r[-1]) + 1)]

    for p in range_list:
        # Need to subtract 1 because pages are 0 indexed
        try:
            output.addPage(input_pdf.getPage(p - 1))
        except IndexError:
            # Alert the user and stop adding pages
            app.infoBox("Info", "Range exceeded number of pages in input.\nFile will still be saved.")
            break
    output.write(output_file)
    # Close the file so the PDF is fully written before the user is notified
    output_file.close()

    if(app.questionBox("File Save", "Output PDF saved. Do you want to quit?")):
        app.stop()


def validate_inputs(input_file, output_dir, range, file_name):
    """ Verify that the input values provided by the user are valid

    Args:
        input_file: The source PDF file
        output_dir: Directory to store the completed file
        range: A string containing a range of pages to copy: 1-3,4
        file_name: Output name for the resulting PDF

    Returns:
        True if error and False otherwise
        List of error messages
    """
    errors = False
    error_msgs = []

    # Make sure a PDF is selected
    if Path(input_file).suffix.upper() != ".PDF":
        errors = True
        error_msgs.append("Please select a PDF input file")

    # Make sure a range is selected
    if len(range) < 1:
        errors = True
        error_msgs.append("Please enter a valid page range")

    # Check for a valid directory
    if not(Path(output_dir)).exists():
        errors = True
        error_msgs.append("Please Select a valid output directory")

    # Check for a file name
    if len(file_name) < 1:
        errors = True
        error_msgs.append("Please enter a file name")

    return(errors, error_msgs)


def press(button):
    """ Process a button press

    Args:
        button: The name of the button. Either Process or Quit
    """
    if button == "Process":
        src_file = app.getEntry("Input_File")
        dest_dir = app.getEntry("Output_Directory")
        page_range = app.getEntry("Page_Ranges")
        out_file = app.getEntry("Output_name")
        errors, error_msg = validate_inputs(src_file, dest_dir, page_range, out_file)
        if errors:
            app.errorBox("Error", "\n".join(error_msg), parent=None)
        else:
            split_pages(src_file, page_range, Path(dest_dir, out_file))
    else:
        app.stop()

# Create the GUI Window
app = gui("PDF Splitter", useTtk=True)
app.setTtkTheme("default")
app.setSize(500, 200)

# Add the interactive components
app.addLabel("Choose Source PDF File")
app.addFileEntry("Input_File")

app.addLabel("Select Output Directory")
app.addDirectoryEntry("Output_Directory")

app.addLabel("Output file name")
app.addEntry("Output_name")

app.addLabel("Page Ranges: 1,3,4-10")
app.addEntry("Page_Ranges")

# link the buttons to the function called press
app.addButtons(["Process", "Quit"], press)

# start the GUI
app.go()

Summary

Experienced python users are not afraid of using the command
line to control their applications. However, there are many times when it is
useful to have a simple GUI on the front end of the application. In the python
world, there are many options for creating a GUI. This article has shown that it
is relatively simple to create a GUI using appJar that will run on multiple systems
and provide an intuitive way for users to interact with a python program. In
addition, appJar has many other features that can be incorporated in more complex applications.

I hope this example has given you some ideas that you can use for your own apps.
I also think this particular app is handy and hope a few people might find it
useful as well. It should also serve as a good starting point for other
PDF manipulation tools.


Source From: pbpython.com.
Original article title: Building a PDF Splitter Application.
This full article can be read at: Building a PDF Splitter Application.
