PDFBox Splitter for Creating Multiple PDFs from One PDF Document
To split a PDF document into multiple PDF files in Java, use the org.apache.pdfbox.multipdf.Splitter class from Apache PDFBox. The split() method accepts a loaded PDDocument and returns a List<PDDocument>, where each item is one output PDF.
In this tutorial, we shall learn how to split a PDF into separate page files, split a PDF at a fixed page interval, and split only a selected page range. The examples use PDFBox 2.x style loading with PDDocument.load(file). A PDFBox 3.x loading note is included after the main examples.
Useful PDFBox references for this topic are the PDFBox Splitter Javadocs, the PDFBox 2.0 command-line tools, and the PDFBox 3.0 migration guide.
PDFBox Dependency and Output Folder Setup for Splitting PDFs
If you are using Maven, add the PDFBox dependency to your project. The existing examples below are written for PDFBox 2.x APIs.
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>2.0.36</version>
</dependency>
The sample Java programs save the split PDFs under /home/tk/pdfs/. Create this folder before running the examples, or change the output path to a folder available on your system.
mkdir -p /home/tk/pdfs
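The folder can also be created from Java before any save() call. The following is a minimal sketch that uses only the JDK. The class name OutputFolder and the method ensureFolder are illustrative names and are not part of PDFBox, and the demo path under the system temp directory is a stand-in for /home/tk/pdfs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of PDFBox): makes sure the output folder exists.
final class OutputFolder {

    // Creates the folder and any missing parent folders.
    // Does nothing if the folder already exists.
    static Path ensureFolder(String folder) throws IOException {
        return Files.createDirectories(Paths.get(folder));
    }

    public static void main(String[] args) throws IOException {
        // Stand-in path for the demo; replace it with /home/tk/pdfs or your own folder.
        String demoFolder = Paths.get(System.getProperty("java.io.tmpdir"), "pdfs").toString();
        Path created = ensureFolder(demoFolder);
        System.out.println("Output folder ready: " + created);
    }
}
```

Calling such a helper once at the start of a split program means the later save() calls do not depend on a manual setup step.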
How PDFBox Splitter Decides the PDF Page Groups
The Splitter class gives you three important controls for common PDF split operations.
- splitter.split(document) splits the loaded source PDF and returns the generated PDF documents.
- splitter.setSplitAtPage(n) sets the number of pages in each output PDF. The default value is 1, so every page becomes a separate PDF.
- splitter.setStartPage(n) and splitter.setEndPage(n) limit the split operation to a 1-based page range.
For example, if the input PDF has 5 pages and you call setSplitAtPage(2), PDFBox creates 3 output files: pages 1-2, pages 3-4, and page 5.
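The grouping rule in this example can be modelled without PDFBox. The sketch below is only an illustration of the arithmetic. PageGroups and groups are hypothetical names, and the code does not call the Splitter class.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model of the grouping rule; it does not call PDFBox.
final class PageGroups {

    // Returns the 1-based page ranges (for example "1-2") for a given group size.
    static List<String> groups(int pageCount, int pagesPerGroup) {
        if (pagesPerGroup < 1) {
            throw new IllegalArgumentException("pagesPerGroup must be at least 1");
        }
        List<String> result = new ArrayList<>();
        for (int first = 1; first <= pageCount; first += pagesPerGroup) {
            // The last group may be shorter when the pages do not divide evenly.
            int last = Math.min(first + pagesPerGroup - 1, pageCount);
            result.add(first == last ? String.valueOf(first) : first + "-" + last);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(groups(5, 2)); // prints [1-2, 3-4, 5]
    }
}
```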
- Split each page in PDF document to different PDF
- Split PDF at a specified interval
- Split only selected PDF pages
- PDFBox 3.x loading change for split examples
- Split PDFs from command line with PDFBox
PDFBox Example 1 – Split Every PDF Page into a Separate PDF File
In this example, we take a PDF with multiple pages and split it into multiple PDFs, where each resulting PDF document contains exactly one page from the source document.
SplitPDFExample.java
import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdmodel.PDDocument;
import java.io.File;
import java.io.IOException;
import java.util.List;

public class SplitPDFExample {
    public static void main(String[] args) throws IOException {
        File file = new File("/home/tk/sample_pdf.pdf");
        // load the source PDF; try-with-resources closes it when we are done
        try (PDDocument document = PDDocument.load(file)) {
            // a Splitter with default settings puts every page in its own document
            Splitter splitter = new Splitter();
            List<PDDocument> pages = splitter.split(document);
            // save each split document under a numbered file name
            int i = 0;
            for (PDDocument page : pages) {
                // close each split document as soon as it has been saved
                try (PDDocument pd = page) {
                    String outputPath = "/home/tk/pdfs/sample_part_" + ++i + ".pdf";
                    pd.save(outputPath);
                    System.out.println("Saved " + outputPath);
                }
            }
        }
        System.out.println("Provided PDF has been split into multiple.");
    }
}
Output
Saved /home/tk/pdfs/sample_part_1.pdf
Saved /home/tk/pdfs/sample_part_2.pdf
Saved /home/tk/pdfs/sample_part_3.pdf
Saved /home/tk/pdfs/sample_part_4.pdf
Saved /home/tk/pdfs/sample_part_5.pdf
Saved /home/tk/pdfs/sample_part_6.pdf
Provided PDF has been split into multiple.
The file names are generated with a counter: sample_part_1.pdf, sample_part_2.pdf, and so on. In production code, close every split PDDocument after saving it so that file handles and memory are released.
PDFBox Example 2 – Split PDF Pages in Groups with setSplitAtPage()
In this example, we split a PDF document into multiple PDF documents at a fixed page interval.
In the following program, splitter.setSplitAtPage(2) creates output PDFs with two pages each, except the last output PDF when the source document has an odd number of pages. For a six-page source PDF, this produces three output files.
SplitPDFAtPageExample.java
import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdmodel.PDDocument;
import java.io.File;
import java.io.IOException;
import java.util.List;

public class SplitPDFAtPageExample {
    public static void main(String[] args) throws IOException {
        File file = new File("/home/tk/sample_pdf.pdf");
        // load the source PDF; try-with-resources closes it when we are done
        try (PDDocument document = PDDocument.load(file)) {
            Splitter splitter = new Splitter();
            // put two pages in each output document
            splitter.setSplitAtPage(2);
            List<PDDocument> parts = splitter.split(document);
            // save each split document under a numbered file name
            int i = 0;
            for (PDDocument part : parts) {
                // close each split document as soon as it has been saved
                try (PDDocument pd = part) {
                    String outputPath = "/home/tk/pdfs/sample_part_" + ++i + ".pdf";
                    pd.save(outputPath);
                    System.out.println("Saved " + outputPath);
                }
            }
        }
    }
}
Output
Saved /home/tk/pdfs/sample_part_1.pdf
Saved /home/tk/pdfs/sample_part_2.pdf
Saved /home/tk/pdfs/sample_part_3.pdf
By default, splitAtPage is set to 1. Pass a value greater than zero. Passing 0 or a negative value is rejected: setSplitAtPage() throws an IllegalArgumentException.
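To predict how many files a given split size produces, divide the page count by the split size and round up. The sketch below shows that arithmetic with plain Java. SplitCount and expectedParts are hypothetical names, and the exception thrown for a non-positive split size is this sketch's own choice.

```java
// Illustrative helper (not part of PDFBox): predicts the number of output files.
final class SplitCount {

    // Returns ceil(pageCount / pagesPerPart) using integer arithmetic.
    static int expectedParts(int pageCount, int pagesPerPart) {
        if (pagesPerPart < 1) {
            throw new IllegalArgumentException("pagesPerPart must be greater than zero");
        }
        if (pageCount < 0) {
            throw new IllegalArgumentException("pageCount must not be negative");
        }
        return (pageCount + pagesPerPart - 1) / pagesPerPart;
    }

    public static void main(String[] args) {
        System.out.println(expectedParts(6, 2)); // prints 3
        System.out.println(expectedParts(5, 2)); // prints 3
    }
}
```

A check like this can be useful in tests: if the number of documents returned by the split differs from the expected count, the split size or page range is probably not what you intended.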
PDFBox Example 3 – Split Only a Selected PDF Page Range
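Use setStartPage() and setEndPage() when you do not want to split the entire source PDF. The start and end page values are 1-based. The following example takes pages 3 to 8 from the source PDF and creates output PDFs with two pages each. Pages 3 to 8 cover six pages, so the output shown below has three files; this assumes a source PDF with at least eight pages.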
SplitPDFPageRangeExample.java
import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdmodel.PDDocument;
import java.io.File;
import java.io.IOException;
import java.util.List;

public class SplitPDFPageRangeExample {
    public static void main(String[] args) throws IOException {
        File file = new File("/home/tk/sample_pdf.pdf");
        try (PDDocument document = PDDocument.load(file)) {
            Splitter splitter = new Splitter();
            splitter.setStartPage(3);
            splitter.setEndPage(8);
            splitter.setSplitAtPage(2);
            List<PDDocument> parts = splitter.split(document);
            int partNumber = 1;
            for (PDDocument part : parts) {
                try (PDDocument output = part) {
                    String outputPath = "/home/tk/pdfs/range_part_" + partNumber + ".pdf";
                    output.save(outputPath);
                    System.out.println("Saved " + outputPath);
                    partNumber++;
                }
            }
        }
    }
}
Output
Saved /home/tk/pdfs/range_part_1.pdf
Saved /home/tk/pdfs/range_part_2.pdf
Saved /home/tk/pdfs/range_part_3.pdf
This is useful when you need only a chapter, invoice range, report section, or selected pages from a larger PDF. Always make sure that the end page is not greater than the number of pages in the source PDF.
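One way to guard the page range is to compare it with the page count before configuring the Splitter. The sketch below uses only the JDK. PageRange and clamp are hypothetical names, and the page count is passed in as a plain int; in a PDFBox program it would typically come from document.getNumberOfPages().

```java
// Illustrative helper (not part of PDFBox): checks a 1-based page range.
final class PageRange {

    // Returns {start, end} limited to pages that exist in the document.
    // Throws if the range is malformed or starts after the last page.
    static int[] clamp(int startPage, int endPage, int pageCount) {
        if (startPage < 1 || endPage < startPage) {
            throw new IllegalArgumentException("invalid range " + startPage + "-" + endPage);
        }
        if (startPage > pageCount) {
            throw new IllegalArgumentException("start page " + startPage
                    + " is beyond the last page " + pageCount);
        }
        return new int[] { startPage, Math.min(endPage, pageCount) };
    }

    public static void main(String[] args) {
        int[] range = clamp(3, 8, 6);
        System.out.println(range[0] + "-" + range[1]); // prints 3-6
    }
}
```

The two returned values can then be passed to setStartPage() and setEndPage(), so the configured range always refers to pages that exist in the source PDF.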
PDFBox 3.x Loading Change for Split PDF Java Examples
In PDFBox 3.x, PDF loading moved from PDDocument.load(...) to the org.apache.pdfbox.Loader class. If you are using PDFBox 3.x, replace the loading line in the examples with Loader.loadPDF(file) and import org.apache.pdfbox.Loader.
import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import java.io.File;

File file = new File("/home/tk/sample_pdf.pdf");
try (PDDocument document = Loader.loadPDF(file)) {
    // Use Splitter here
}
The Splitter usage remains the same for the examples shown here; the main difference is how the source PDF is loaded.
Split PDFs from Command Line with PDFBox PDFSplit
If you only need to split a PDF file and do not need custom Java logic, PDFBox also provides the PDFSplit command-line tool through the standalone PDFBox app JAR.
java -jar pdfbox-app-2.0.36.jar PDFSplit -split 2 -outputPrefix /home/tk/pdfs/sample_part /home/tk/sample_pdf.pdf
The -split 2 option means each generated PDF should contain two pages. You can also use -startPage and -endPage with PDFSplit when you want to split only part of the source document.
PDFBox Split PDF Troubleshooting Notes
- Output folder missing: create the destination folder before calling save(); otherwise save() fails with an IOException (typically a FileNotFoundException).
- Encrypted PDF: load the PDF with the correct password first. Splitter works on a successfully loaded PDDocument.
- Large PDF files: close the source document and each split document after saving. For very large PDFs, test memory usage with realistic input files.
- Unexpected number of output files: check setSplitAtPage(), setStartPage(), and setEndPage(). Start and end page values are 1-based.
- PDFBox 3.x compile error for PDDocument.load: use Loader.loadPDF(file) instead.
PDFBox Split PDF FAQs
How do I split a PDF into multiple PDFs using PDFBox?
Load the source file into a PDDocument, create a Splitter, call splitter.split(document), and save each returned PDDocument as a separate PDF file.
How can I split every page of a PDF into a separate file in Java?
Do not set a custom split interval. The default splitAtPage value is 1, so PDFBox creates one output document for each page in the source PDF.
How do I split a PDF every 2 pages with Apache PDFBox?
Call splitter.setSplitAtPage(2) before calling splitter.split(document). Each output PDF will contain two pages, except the last one if the remaining page count is less than two.
Can PDFBox split only pages 5 to 10 of a PDF?
Yes. Use splitter.setStartPage(5) and splitter.setEndPage(10). You may also set setSplitAtPage() if you want the selected range divided into smaller output PDFs.
Why does PDDocument.load(file) not compile in PDFBox 3.x?
PDFBox 3.x removed the old PDDocument.load(...) loading methods. Use Loader.loadPDF(file) from org.apache.pdfbox.Loader when working with PDFBox 3.x.
QA Checklist for PDFBox Split PDF Java Tutorial
- Confirm that the tutorial clearly distinguishes PDFBox 2.x loading from PDFBox 3.x loading.
- Verify that every new code block uses a PrismJS-compatible language class or the output class.
- Check that setSplitAtPage() is explained as the number of pages per output PDF, not a single page number.
- Check that setStartPage() and setEndPage() are described as 1-based page range controls.
- Ensure that the output folder exists before running the examples.
- Confirm that split PDDocument objects are closed in any production-ready example.
PDFBox Split PDF Tutorial Summary
In this PDFBox Tutorial, we have learnt to split a PDF document into multiple PDFs using the Splitter class. We covered splitting every page into a separate file, splitting by a fixed number of pages, splitting a selected page range, using the PDFBox 3.x loader, and running a PDF split from the command line.
TutorialKart.com