Get Coordinates, Location, and Displayed Size of Images in PDF Using PDFBox
In this tutorial, we shall learn how to get the coordinates (location) and size of the images on every page of a PDF using PDFStreamEngine. The example reads every page in a PDF, detects image drawing operations, and prints the image name, X/Y position, raw pixel size, and displayed size in PDF user space units.
This approach is useful when you need to inspect where images are placed in a PDF, compare the embedded image size with the displayed size, or build a PDF analysis tool that reports image positions page by page.
How PDFBox Finds Image Coordinates Inside a PDF Page
The class org.apache.pdfbox.contentstream.PDFStreamEngine processes the operators of a PDF content stream and provides callback methods that subclasses can override.
To get the coordinates and size of images in a PDF, we shall extend PDFStreamEngine and override its processOperator(Operator operator, List<COSBase> operands) method.
COSBase is the base class of the low-level objects in a PDF document. The operands of each operator are passed to processOperator() as COSBase instances.
For each operator in a page's content stream, PDFStreamEngine.processPage(page) calls processOperator(). Inside that method, we check whether the operator paints an XObject and, if that XObject is an image, read its properties such as its (X, Y) coordinates and size.
In PDF content streams, images are commonly painted with the Do operator. In the example below, PDFBox checks the object referenced by that operator. If the object is a PDImageXObject, we read the image dimensions and the current transformation matrix. If the object is a PDFormXObject, the code calls showForm(form) so that images nested inside form XObjects are also processed.
PDFBox Dependency for the Image Location Example
The complete Java example below follows the Apache PDFBox 2.x API style. If you are creating a Maven project, include PDFBox in your project dependencies and use a version that matches your application.
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>2.0.30</version>
</dependency>
If you use PDFBox 3.x, review the PDFBox migration notes because document loading and a few supporting APIs differ from older 2.x examples. For PDFBox 3.x, PDF loading is typically done through Loader.loadPDF(…).
import org.apache.pdfbox.Loader;
PDDocument document = Loader.loadPDF(new File(fileName));
Steps to Get Coordinates and Size of Images in PDF
The following is a step-by-step process to get the coordinates (location) and size of images in a PDF.
1. Extend PDFStreamEngine for PDF Image Processing
Create a Java class that extends PDFStreamEngine.
public class GetImageLocationsAndSize extends PDFStreamEngine
The custom class becomes a content stream processor. It receives PDF drawing operations while each page is processed.
2. Call processPage() for Every PDF Page
For each page in the PDF document, call the method processPage(page).
for( PDPage page : document.getPages() ) {
    pageNum++;
    printer.processPage(page);
}
This is where PDFBox walks through the page content stream and calls processOperator() for every operator it finds. Operators registered with addOperator() are then executed by the base class.
3. Override processOperator() to Intercept Image Drawing
For each operator in the page's content stream, processPage() calls processOperator(). We shall override processOperator().
@Override
protected void processOperator( Operator operator, List<COSBase> operands) throws IOException {
    . . .
}
The image detection logic is placed inside this method. The code checks whether the current operator name is Do, which is used to draw external objects such as images and form XObjects.
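To see why the code reads the image name from the first operand, it can help to look at how a content stream is laid out: operands come first and the operator follows them. The sketch below is a toy, PDFBox-free illustration. The class name DoOperatorScan and the sample stream are hypothetical, and it only understands whitespace-separated numbers, names, and operators. Real content streams also contain strings, arrays, dictionaries, and inline images, which is why the tutorial relies on PDFStreamEngine instead.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy illustration only: this is NOT how PDFBox parses content streams. */
final class DoOperatorScan {

    /** Returns the XObject names painted by "Do" in a simplified content stream. */
    static List<String> paintedXObjects(String contentStream) {
        List<String> painted = new ArrayList<>();
        List<String> operands = new ArrayList<>();
        for (String token : contentStream.trim().split("\\s+")) {
            if (isOperand(token)) {
                operands.add(token); // operands are collected BEFORE their operator
                continue;
            }
            // Any other token is treated as an operator that consumes the operands.
            if ("Do".equals(token) && !operands.isEmpty()) {
                painted.add(operands.get(0).substring(1)); // strip the leading '/'
            }
            operands.clear();
        }
        return painted;
    }

    /** In this simplified model an operand is a name (/X0) or a number. */
    private static boolean isOperand(String token) {
        if (token.startsWith("/")) {
            return true;
        }
        try {
            Double.parseDouble(token);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical stream: save state, set the matrix, paint /X0, restore state.
        String stream = "q 214.69952 0 0 87.58139 36.506977 695.3907 cm /X0 Do Q";
        System.out.println(paintedXObjects(stream)); // prints [X0]
    }
}
```

In the real example, operands.get(0) plays the same role: it is the name operand that precedes Do.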
4. Check Whether the PDF XObject Is an Image
Look up the XObject named by the operand of Do and check whether it is an image object.
if( xobject instanceof PDImageXObject) {
    . . .
}
When the XObject is a PDImageXObject, PDFBox can return the embedded image’s raw pixel width and height. The displayed width and height are taken from the current transformation matrix, not from the raw image pixels.
5. Print PDF Image X/Y Location and Displayed Size
If the object is an image object, print the location and size of the image.
The X and Y values printed by this example come from ctmNew.getTranslateX() and ctmNew.getTranslateY(). The displayed size comes from ctmNew.getScalingFactorX() and ctmNew.getScalingFactorY(). These values are in PDF user space units, not pixels.
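As a rough illustration of what these matrix values mean: a PDF image is painted into a 1 × 1 unit square, and the current transformation matrix [a b c d e f] maps that square onto the page. The sketch below uses plain arrays rather than the PDFBox Matrix class. The class ImagePlacement and its methods are hypothetical helpers, and the matrix assumes that image X0 from the sample output is drawn without rotation or skew. PDFBox's own getScalingFactorX() and getScalingFactorY() may differ in detail, for example in how they treat flipped images.

```java
/** Sketch: where the unit square of an image lands under a CTM [a b c d e f]. */
final class ImagePlacement {

    /** Maps image-space (x, y) to user space: x' = a*x + c*y + e, y' = b*x + d*y + f. */
    static double[] transform(double[] ctm, double x, double y) {
        return new double[] {
            ctm[0] * x + ctm[2] * y + ctm[4],
            ctm[1] * x + ctm[3] * y + ctm[5]
        };
    }

    /** Length of the image's bottom edge in user space, i.e. its displayed width. */
    static double displayedWidth(double[] ctm) {
        return Math.hypot(ctm[0], ctm[1]);
    }

    /** Length of the image's left edge in user space, i.e. its displayed height. */
    static double displayedHeight(double[] ctm) {
        return Math.hypot(ctm[2], ctm[3]);
    }

    public static void main(String[] args) {
        // Assumed CTM for image X0 on page 1 of the sample output (no rotation or skew).
        double[] ctm = {214.69952, 0, 0, 87.58139, 36.506977, 695.3907};
        double[] bottomLeft = transform(ctm, 0, 0); // equals the translation (e, f)
        double[] topRight = transform(ctm, 1, 1);
        System.out.printf("bottom-left = (%.3f, %.3f)%n", bottomLeft[0], bottomLeft[1]);
        System.out.printf("top-right   = (%.3f, %.3f)%n", topRight[0], topRight[1]);
        System.out.printf("size        = %.3f x %.3f%n",
                displayedWidth(ctm), displayedHeight(ctm));
    }
}
```

Using the edge lengths rather than the a and d entries alone keeps the size meaningful when the image is rotated, because a rotation moves part of the scale into b and c.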
Example 1 – Get location and size of images in PDF
In this example, we will take a PDF containing images, and get the position/location and size of the image.
GetImageLocationsAndSize.java
import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.form.PDFormXObject;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;
import org.apache.pdfbox.util.Matrix;
import org.apache.pdfbox.contentstream.operator.DrawObject;
import org.apache.pdfbox.contentstream.operator.Operator;
import org.apache.pdfbox.contentstream.PDFStreamEngine;
import java.io.File;
import java.io.IOException;
import java.util.List;
import org.apache.pdfbox.contentstream.operator.state.Concatenate;
import org.apache.pdfbox.contentstream.operator.state.Restore;
import org.apache.pdfbox.contentstream.operator.state.Save;
import org.apache.pdfbox.contentstream.operator.state.SetGraphicsStateParameters;
import org.apache.pdfbox.contentstream.operator.state.SetMatrix;
/**
 * This is an example on how to get the x/y coordinates of image location and size of image.
 */
public class GetImageLocationsAndSize extends PDFStreamEngine
{
    /**
     * Registers the operators needed to track the graphics state while a page is processed.
     *
     * @throws IOException If there is an error initializing the stream engine.
     */
    public GetImageLocationsAndSize() throws IOException
    {
        // preparing PDFStreamEngine
        addOperator(new Concatenate());
        addOperator(new DrawObject());
        addOperator(new SetGraphicsStateParameters());
        addOperator(new Save());
        addOperator(new Restore());
        addOperator(new SetMatrix());
    }

    /**
     * @throws IOException If there is an error parsing the document.
     */
    public static void main( String[] args ) throws IOException
    {
        PDDocument document = null;
        String fileName = "apache.pdf";
        try
        {
            document = PDDocument.load( new File(fileName) );
            GetImageLocationsAndSize printer = new GetImageLocationsAndSize();
            int pageNum = 0;
            for( PDPage page : document.getPages() )
            {
                pageNum++;
                System.out.println( "\n\nProcessing page: " + pageNum +"\n---------------------------------");
                printer.processPage(page);
            }
        }
        finally
        {
            if( document != null )
            {
                document.close();
            }
        }
    }

    /**
     * @param operator The operation to perform.
     * @param operands The list of arguments.
     *
     * @throws IOException If there is an error processing the operation.
     */
    @Override
    protected void processOperator( Operator operator, List<COSBase> operands) throws IOException
    {
        String operation = operator.getName();
        if( "Do".equals(operation) )
        {
            COSName objectName = (COSName) operands.get( 0 );
            // get the PDF object
            PDXObject xobject = getResources().getXObject( objectName );
            // check if the object is an image object
            if( xobject instanceof PDImageXObject)
            {
                PDImageXObject image = (PDImageXObject)xobject;
                int imageWidth = image.getWidth();
                int imageHeight = image.getHeight();
                System.out.println("\nImage [" + objectName.getName() + "]");
                Matrix ctmNew = getGraphicsState().getCurrentTransformationMatrix();
                float imageXScale = ctmNew.getScalingFactorX();
                float imageYScale = ctmNew.getScalingFactorY();
                // position of image in the pdf in terms of user space units
                System.out.println("position in PDF = " + ctmNew.getTranslateX() + ", " + ctmNew.getTranslateY() + " in user space units");
                // raw size in pixels
                System.out.println("raw image size = " + imageWidth + ", " + imageHeight + " in pixels");
                // displayed size in user space units
                System.out.println("displayed size = " + imageXScale + ", " + imageYScale + " in user space units");
            }
            else if(xobject instanceof PDFormXObject)
            {
                // process the form's content stream so that nested images are reported too
                PDFormXObject form = (PDFormXObject)xobject;
                showForm(form);
            }
        }
        else
        {
            // let the base class run the registered operators (cm, q, Q, gs, Tm)
            super.processOperator( operator, operands);
        }
    }
}
Output
Processing page: 1
---------------------------------
Image [X0]
position in PDF = 36.506977, 695.3907 in user space units
raw image size = 429, 175 in pixels
displayed size = 214.69952, 87.58139 in user space units
Image [X1]
position in PDF = 36.506977, 617.8186 in user space units
raw image size = 300, 300 in pixels
displayed size = 75.06976, 75.06976 in user space units
Image [X2]
position in PDF = 36.506977, 138.37305 in user space units
raw image size = 600, 383 in pixels
displayed size = 496.96182, 317.29486 in user space units
Processing page: 2
---------------------------------
Image [X0]
position in PDF = 36.506977, 495.70514 in user space units
raw image size = 600, 383 in pixels
displayed size = 496.96182, 317.29486 in user space units
Image [X1]
position in PDF = 245.20093, 307.53027 in user space units
raw image size = 212, 146 in pixels
displayed size = 106.0986, 73.0679 in user space units
Processing page: 3
---------------------------------
Processing page: 4
---------------------------------
Download the sample PDF document apache.pdf if you would like to use the same file. Otherwise, set fileName in the Java program to the path of your own PDF file.
How to Read the PDFBox Image Coordinate Output
The line position in PDF gives the image placement in user space units. For an unrotated page with the default coordinate system, the origin is at the bottom-left of the page. Therefore, a higher Y value usually means the image is closer to the top of the page.
The line raw image size gives the embedded image dimensions in pixels. The line displayed size gives the size at which the image is drawn on the PDF page. These two values do not have to match because a PDF can scale an image while drawing it.
Raw Size vs Displayed Size
The size at which an image is displayed in the PDF can differ from the actual size of the original (raw) image.
| Value printed by the program | Meaning |
| --- | --- |
| raw image size | Actual embedded image width and height in pixels, returned by image.getWidth() and image.getHeight(). |
| displayed size | Width and height used when drawing the image on the PDF page, taken from the current transformation matrix. |
| position in PDF | X and Y translation values that indicate where the image is placed in PDF user space. |
For example, a 600 × 383 pixel image may be displayed as 496.96182 × 317.29486 user space units. This means the image is scaled when it is painted on the page.
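One practical use of comparing the two sizes is to estimate the effective resolution at which an image is shown. The sketch below is a minimal calculation that assumes the default user space scale of 72 units per inch. The class name EffectiveResolution is hypothetical and is not part of PDFBox.

```java
/** Sketch: effective resolution of an image from its raw and displayed size. */
final class EffectiveResolution {

    /** Assumes the default user space, where 72 units equal one inch. */
    static final double UNITS_PER_INCH = 72.0;

    /** Pixels per inch along one axis: raw pixels divided by displayed inches. */
    static double dpi(int rawPixels, double displayedUnits) {
        return rawPixels / (displayedUnits / UNITS_PER_INCH);
    }

    public static void main(String[] args) {
        // Image X2 on page 1 of the sample output: 600 x 383 pixels.
        double horizontal = dpi(600, 496.96182); // roughly 86.9 dpi
        double vertical = dpi(383, 317.29486);   // roughly 86.9 dpi
        System.out.println("horizontal dpi = " + horizontal);
        System.out.println("vertical dpi   = " + vertical);
    }
}
```

Similar values on both axes suggest the image keeps its aspect ratio, and a low value can indicate that the image may look soft when printed.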
(X,Y) location of image in PDF
The (X, Y) location that we get from PDFBox is the bottom-left corner of the image, provided that the transformation matrix does not rotate or flip the image.
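Screen and image tools usually measure from the top-left corner instead, so the Y value often has to be flipped before it can be compared with them. The following is a minimal sketch of that conversion. The class TopLeftConverter is a hypothetical helper, and the page height of 792 units (US Letter) is an assumption. In a real program the height could be read from page.getMediaBox().getHeight(). The sketch also assumes an unrotated page whose media box starts at (0, 0).

```java
/** Sketch: convert a bottom-left-origin image position to a top-left-origin one. */
final class TopLeftConverter {

    /** Distance from the top edge of the page to the top edge of the image. */
    static float topLeftY(float pageHeight, float imageY, float displayedHeight) {
        return pageHeight - (imageY + displayedHeight);
    }

    public static void main(String[] args) {
        float pageHeight = 792f;         // assumed US Letter page height in user space units
        float imageY = 695.3907f;        // Y of image X0 on page 1 in the sample output
        float displayedHeight = 87.58139f;
        // X is unchanged; only Y is measured from the opposite edge.
        System.out.println("top-left Y = " + topLeftY(pageHeight, imageY, displayedHeight));
    }
}
```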

PDF Coordinate Units, Page Rotation, and Crop Box Notes
PDF coordinates are not the same as screen pixels. PDFBox reports the image position and displayed size in PDF user space units. By default, one user space unit is 1/72 inch (one point), although a page can declare a different scale through its UserUnit entry. Page rotation, crop boxes, media boxes, and transformations can also change how the coordinates appear when viewed in a PDF reader.
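To relate the printed values to physical or on-screen sizes, the units can be converted with simple arithmetic. The sketch below assumes the common case of 72 user space units per inch. The class UserSpaceUnits and its method names are illustrative and are not part of PDFBox.

```java
/** Sketch: convert PDF user space units, assuming 72 units per inch. */
final class UserSpaceUnits {

    static final double UNITS_PER_INCH = 72.0;
    static final double MM_PER_INCH = 25.4;

    static double toInches(double units) {
        return units / UNITS_PER_INCH;
    }

    static double toMillimetres(double units) {
        return toInches(units) * MM_PER_INCH;
    }

    /** Pixels occupied when the page is rendered at the given resolution. */
    static double toPixels(double units, double renderDpi) {
        return toInches(units) * renderDpi;
    }

    public static void main(String[] args) {
        double width = 214.69952; // displayed width of image X0 in the sample output
        System.out.println("inches      = " + toInches(width));
        System.out.println("millimetres = " + toMillimetres(width));
        System.out.println("px @ 96 dpi = " + toPixels(width, 96));
    }
}
```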
If the coordinates look different from what you see on screen, check the page rotation and page boxes. A PDF viewer may display the page after applying rotation or crop settings, while the content stream values are based on the PDF’s internal coordinate system and transformation matrices.
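The effect of page rotation on a single point can be sketched as follows. This is a simplified model rather than PDFBox code. It assumes that the rotation is a clockwise multiple of 90 degrees (the kind of value returned by page.getRotation()), that the media box starts at (0, 0), and that both the input and the result use a bottom-left origin. RotatedPagePoint is a hypothetical helper name, and the 612 × 792 page size is an assumed US Letter page.

```java
import java.util.Arrays;

/** Sketch: where a user space point appears once a viewer applies page rotation. */
final class RotatedPagePoint {

    /**
     * @param pageWidth  width of the unrotated page in user space units
     * @param pageHeight height of the unrotated page in user space units
     * @param rotation   clockwise page rotation in degrees (a multiple of 90)
     * @return the point in the rotated view, measured from its bottom-left corner
     */
    static double[] toViewer(double x, double y,
                             double pageWidth, double pageHeight, int rotation) {
        switch (((rotation % 360) + 360) % 360) {
            case 0:
                return new double[] {x, y};
            case 90:
                return new double[] {y, pageWidth - x};
            case 180:
                return new double[] {pageWidth - x, pageHeight - y};
            case 270:
                return new double[] {pageHeight - y, x};
            default:
                throw new IllegalArgumentException("rotation must be a multiple of 90");
        }
    }

    public static void main(String[] args) {
        // Assumed US Letter page (612 x 792) rotated by 90 degrees.
        double[] viewed = toViewer(36.5, 695.4, 612, 792, 90);
        System.out.println(Arrays.toString(viewed));
    }
}
```

Note that after a rotation the transformed point is no longer the bottom-left corner of the image as seen in the viewer, so all four corners should be mapped if a bounding box is needed.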
Why Images Inside Form XObjects Need showForm(form)
Some PDFs do not draw images directly on the page. Instead, they place images inside form XObjects. The example handles this case with the following branch:
else if(xobject instanceof PDFormXObject)
{
    PDFormXObject form = (PDFormXObject)xobject;
    showForm(form);
}
This tells PDFBox to process the form’s content stream as well. Without this step, the program may miss images that are nested inside forms, templates, or reusable page elements.
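The position reported for a nested image depends on both the matrix set inside the form and the matrix that was active when the form was painted. The sketch below shows that concatenation with plain arrays in PDF order [a b c d e f]. All values and the class name MatrixConcat are hypothetical, and the form's own Matrix entry (identity by default) is left out for brevity. In the tutorial code, this bookkeeping is handled by PDFBox while it processes the form.

```java
import java.util.Arrays;

/** Sketch: concatenating two PDF matrices given as [a b c d e f]. */
final class MatrixConcat {

    /** Returns first x second, i.e. 'first' is applied to a point before 'second'. */
    static double[] multiply(double[] first, double[] second) {
        return new double[] {
            first[0] * second[0] + first[1] * second[2],
            first[0] * second[1] + first[1] * second[3],
            first[2] * second[0] + first[3] * second[2],
            first[2] * second[1] + first[3] * second[3],
            first[4] * second[0] + first[5] * second[2] + second[4],
            first[4] * second[1] + first[5] * second[3] + second[5]
        };
    }

    public static void main(String[] args) {
        // Hypothetical CTM in effect when the form is painted: half scale, moved to (100, 200).
        double[] outer = {0.5, 0, 0, 0.5, 100, 200};
        // Hypothetical matrix set inside the form just before the image is painted.
        double[] inner = {200, 0, 0, 100, 20, 40};
        double[] effective = multiply(inner, outer);
        System.out.println(Arrays.toString(effective));
        // prints [100.0, 0.0, 0.0, 50.0, 110.0, 220.0]
    }
}
```

Here the image would be reported at (110, 220) with a displayed size of 100 × 50, even though the form's content stream places it at (20, 40) with a size of 200 × 100.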
Common Issues When PDFBox Image Coordinates Look Wrong
- Comparing PDF units with pixels: Raw image size is in pixels, but displayed size and position are in PDF user space units.
- Ignoring page rotation: A rotated page can make the visual position in a viewer differ from the internal coordinate values.
- Missing nested images: Images inside PDFormXObject objects require processing the form content stream.
- Using only top-level resources: Walking only the page resources can find image objects, but it does not reliably tell where each image is drawn.
- Expecting top-left coordinates: Many PDF coordinate examples use a bottom-left origin, while screen and image tools often use a top-left origin.
QA Checklist for PDFBox Image Location Tutorial Review
- Confirm that the Java code is tested with a PDFBox 2.x dependency, or update loading syntax for PDFBox 3.x before compiling.
- Verify that the output reports both raw image pixel size and displayed PDF user space size.
- Test a PDF containing images inside a form XObject to confirm that showForm(form) is needed.
- Check one PDF with page rotation to explain any difference between viewer coordinates and printed coordinates.
- Confirm that the tutorial does not describe image coordinates as screen pixels.
Useful PDFBox References for Image Coordinates and PDF Processing
For additional API details, refer to the Apache PDFBox project, the PDFBox example class PrintImageLocations.java, and the PDFBox 3.0 migration guide if you are adapting the example for PDFBox 3.x.
FAQs on Getting Image Coordinates in PDF Using PDFBox
How do I get X and Y coordinates of an image in a PDF using PDFBox?
Extend PDFStreamEngine, process each page with processPage(page), intercept the Do operator in processOperator(), check whether the referenced XObject is a PDImageXObject, and read the current transformation matrix using getGraphicsState().getCurrentTransformationMatrix(). The translation values give the image position in PDF user space units.
How do I check the image size in a PDF with PDFBox?
Use image.getWidth() and image.getHeight() to get the raw embedded image size in pixels. Use the current transformation matrix scaling values to get the displayed width and height on the PDF page.
Why is the displayed image size different from the raw image size?
A PDF can draw an image at a different scale from its original pixel dimensions. The raw image size describes the embedded image file, while the displayed size describes how large the image is painted on the PDF page.
Why are PDFBox image coordinates not matching the position in my PDF viewer?
The PDF viewer may apply page rotation, crop box settings, zoom, or screen coordinate conversion. PDFBox reports values from the PDF content stream and transformation matrix, usually in PDF user space units.
Can PDFBox find images that are inside form XObjects?
Yes. When the XObject is a PDFormXObject, call showForm(form) so that PDFBox processes the form’s content stream. This helps detect images nested inside reusable form objects.
Conclusion
In this Apache PDFBox Tutorial, we have learnt how to get the coordinates (location) and size of images in a PDF document, and what the X and Y coordinates mean for an image in a PDF.
TutorialKart.com