Editing Existing PDF Files in Java

news2024/11/23 11:17:42

1. Overview

In this article, we’ll see how to edit the content of an existing PDF file in Java. First, we’ll just add new content. Then, we’ll focus on removing or replacing some pre-existing content.

2. Adding the iText7 Dependency

We’ll use the iText7 library to add content to the PDF file. Later on, we’ll use the pdfSweep add-on to remove or replace content.

Note that iText is licensed under AGPL, which might limit the distribution of a commercial application: iText License Model.

First, let’s add these dependencies to our pom.xml:

<dependency>
    <groupId>com.itextpdf</groupId>
    <artifactId>itext7-core</artifactId>
    <version>7.2.3</version>
    <type>pom</type>
</dependency>
<dependency>
    <groupId>com.itextpdf</groupId>
    <artifactId>cleanup</artifactId>
    <version>3.0.1</version>
</dependency>

3. File Handling

Let’s understand the steps for handling our PDF with iText7:

  • First, we open a PdfReader to read the content of the source file. This throws an IOException if an error occurs at any time while reading the file.
  • Then, we open a PdfWriter to the destination file. iText creates the file if it doesn’t exist yet; if it can’t be created or opened for writing, a FileNotFoundException is thrown.
  • After that, we’ll open a PdfDocument which uses our PdfReader and PdfWriter.
  • Finally, closing the PdfDocument closes both the underlying PdfReader and PdfWriter.

Let’s write a main() method that runs the whole process. For the sake of simplicity, we’ll just let any exception propagate:

public static void main(String[] args) throws IOException {
    PdfReader reader = new PdfReader("src/main/resources/baeldung.pdf");
    PdfWriter writer = new PdfWriter("src/main/resources/baeldung-modified.pdf");
    PdfDocument pdfDocument = new PdfDocument(reader, writer);
    addContentToDocument(pdfDocument);
    pdfDocument.close();
}
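
Before opening the reader and writer, it can help to check the two paths up front, so that problems surface with a clear message. The sketch below uses only java.nio; the PdfPaths class and its prepare() method are hypothetical helpers, not part of iText. The check that source and destination differ rests on the assumption that writing to a file while it's still being read would corrupt it:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical pre-flight helper; it uses only java.nio and knows nothing about iText.
class PdfPaths {

    // Fails early if the source is unreadable or equals the destination,
    // and creates the destination's parent directory if it's missing.
    static void prepare(Path source, Path destination) throws IOException {
        if (!Files.isReadable(source)) {
            throw new IOException("Cannot read source file: " + source);
        }
        Path src = source.toAbsolutePath().normalize();
        Path dst = destination.toAbsolutePath().normalize();
        if (src.equals(dst)) {
            throw new IOException("Source and destination must be different files");
        }
        Path parent = dst.getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
    }

    public static void main(String[] args) throws IOException {
        // Demonstration with temporary files standing in for the real PDFs
        Path dir = Files.createTempDirectory("pdf-demo");
        Path source = Files.createFile(dir.resolve("baeldung.pdf"));
        Path destination = dir.resolve("out").resolve("baeldung-modified.pdf");
        prepare(source, destination);
        System.out.println("Ready to write to " + destination);
    }
}
```

In the main() method above, we'd call such a helper with the two resource paths right before creating the PdfReader and PdfWriter.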

In the following section, we’ll complete the addContentToDocument() method step by step in order to fill our PDF with new content. The source document is a PDF file that only contains the text “Hello Baeldung” in the top left corner. The destination file will be created by the program.

4. Adding Content to the File

We’ll now add various types of content to the file.

4.1. Adding a Form

We’ll start by adding a form to the file. Our form will be very simple and contain a single field called name.

Furthermore, we need to tell iText where to place the field. In this case, we’ll put it at the following point: (35,400). The coordinates (0,0) refer to the bottom left of the document. Lastly, we’ll set the dimension of the field to 100×30:

PdfFormField personal = PdfFormField.createEmptyField(pdfDocument);
personal.setFieldName("information");
PdfTextFormField name = PdfFormField.createText(pdfDocument, new Rectangle(35, 400, 100, 30), "name", "");
personal.addKid(name);
PdfAcroForm.getAcroForm(pdfDocument, true)
    .addField(personal, pdfDocument.getFirstPage());

Additionally, we’ve explicitly told iText to add the form to the first page of the document.
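
Since positions are measured from the bottom left, a layout that was designed with a top-left origin has to be flipped before it's passed to a Rectangle. Here's a minimal sketch of that conversion in plain Java; the PdfCoordinates class, its method, and the A4 page height of 842 points are assumptions made for illustration and aren't part of iText:

```java
// Hypothetical helper, not part of iText: converts a top-left based position
// into the bottom-left based y coordinate used for PDF rectangles.
class PdfCoordinates {

    // Height of an A4 page in points (1 point = 1/72 inch); assumed for this example.
    static final float A4_HEIGHT = 842f;

    // Returns the y coordinate of the box's lower edge, measured from the bottom of the page.
    static float toPdfY(float pageHeight, float distanceFromTop, float boxHeight) {
        return pageHeight - distanceFromTop - boxHeight;
    }

    public static void main(String[] args) {
        // A 30-point-high field whose top edge sits 412 points below the top of an A4 page
        float y = toPdfY(A4_HEIGHT, 412f, 30f);
        System.out.println("Lower edge of the field: " + y);
    }
}
```

With these numbers, the lower edge lands at y = 400, which is the value we passed to the Rectangle of the name field.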

4.2. Adding a New Page

Let’s now have a look at how we can add a new page to the document. We’ll use the addNewPage() method.

This method can accept the index of the added page if we want to specify it. For instance, we can add a new page at the beginning of the document:

pdfDocument.addNewPage(1);
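
Page numbers are 1-based, so inserting at index 1 pushes every existing page back by one. The following plain-Java model only mimics that numbering with a list; the PageOrderDemo class is made up for illustration and doesn't touch iText:

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of 1-based page insertion; it mimics the numbering only.
class PageOrderDemo {

    // Inserts a page label at a 1-based position, shifting later pages back by one.
    static void insertPage(List<String> pages, int oneBasedIndex, String label) {
        pages.add(oneBasedIndex - 1, label);
    }

    public static void main(String[] args) {
        List<String> pages = new ArrayList<>();
        pages.add("page with form");
        insertPage(pages, 1, "new blank page");
        // The page holding the form is now the second entry
        System.out.println(pages);
    }
}
```

This is why the form we added to the first page ends up on the second page once the new page is inserted.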

4.3. Adding an Annotation

We now want to add an annotation to the document. Concretely, an annotation looks like a square speech bubble.

We’ll add it above the form that’s now located on the second page of the document. Consequently, we’ll place it at the coordinates (40,435). Additionally, we’ll give it a simple name and content. These will only show up when hovering over the annotation:

PdfAnnotation ann = new PdfTextAnnotation(new Rectangle(40, 435, 0, 0))
    .setTitle(new PdfString("name"))
    .setContents("Your name");
pdfDocument.getPage(2)
    .addAnnotation(ann);

Here’s how the middle of our second page now looks:

4.4. Adding an Image

From now on, we’ll add layout elements to the page. In order to do this, we won’t be able to manipulate the PdfDocument directly anymore. We’ll rather create a Document from it and work with that. Moreover, we’ll need to close the Document in the end. Closing a Document automatically closes the base PdfDocument. So we could remove the part where we closed the PdfDocument earlier:

Document document = new Document(pdfDocument);
// add layout elements
document.close();

Now, to add the image, we’ll need to load it from its location. We’ll do this using the create() method of the ImageDataFactory class. This throws a MalformedURLException if the passed file URL can’t be parsed. In this example, we’ll use an image of Baeldung’s logo placed in the resources directory:

ImageData imageData = ImageDataFactory.create("src/main/resources/baeldung.png");

The next step will be to set the image’s properties in the file. We’ll set its size to 550×100. We’ll put it on the first page of our PDF, at the (10,50) coordinates. Let’s see the code to add the image:

Image image = new Image(imageData).scaleAbsolute(550,100)
    .setFixedPosition(1, 10, 50);
document.add(image);

The image is automatically rescaled to the given size. So here’s how it looks in the document:
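
Because scaleAbsolute() sets the width and height independently, the picture's proportions can change. If we'd rather keep them, we can compute the largest size that still fits the target box. Here's a sketch of that arithmetic in plain Java; the ImageFit class and the 1100×400 source size are made up for illustration:

```java
// Hypothetical helper showing the arithmetic behind aspect-ratio-preserving scaling.
class ImageFit {

    // Returns {width, height} of the largest size that fits the box and keeps the original proportions.
    static float[] fitInside(float imageWidth, float imageHeight, float boxWidth, float boxHeight) {
        float scale = Math.min(boxWidth / imageWidth, boxHeight / imageHeight);
        return new float[] { imageWidth * scale, imageHeight * scale };
    }

    public static void main(String[] args) {
        // An assumed 1100x400 image placed into the 550x100 box used in this article
        float[] size = fitInside(1100f, 400f, 550f, 100f);
        System.out.println(size[0] + " x " + size[1]);
    }
}
```

The iText Image class also has a scaleToFit() method for this purpose, so in practice we wouldn't need to do the computation by hand.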

4.5. Adding a Paragraph

The iText library provides several tools for adding text to the file. Font properties can be set on the individual Text pieces or directly on the Paragraph element.

For instance, let’s add the following sentence on top of the first page: This is a demo from Baeldung tutorials. We’ll set the font size of the beginning of this sentence to 16 and the global font size of Paragraph to 8:

Text title = new Text("This is a demo").setFontSize(16);
Text author = new Text("Baeldung tutorials.");
Paragraph p = new Paragraph().setFontSize(8)
    .add(title)
    .add(" from ")
    .add(author);
document.add(p);

4.6. Adding a Table

Last but not least, we can also add a table to the file. For example, we’ll define a table with two columns, each with a header cell and one regular cell. We won’t specify any position, so the table flows to the top of the document, right after the Paragraph we just added:

Table table = new Table(UnitValue.createPercentArray(2));
table.addHeaderCell("#");
table.addHeaderCell("company");
table.addCell("name");
table.addCell("baeldung");
document.add(table);

Let’s see the beginning of the first page of the document now:

Putting all the snippets together, here’s the complete class:

package org.example.pdf;

import com.itextpdf.forms.PdfAcroForm;
import com.itextpdf.forms.fields.PdfFormField;
import com.itextpdf.forms.fields.PdfTextFormField;
import com.itextpdf.io.image.ImageData;
import com.itextpdf.io.image.ImageDataFactory;
import com.itextpdf.kernel.geom.Rectangle;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.PdfString;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.kernel.pdf.annot.PdfAnnotation;
import com.itextpdf.kernel.pdf.annot.PdfTextAnnotation;
import com.itextpdf.layout.Document;
import com.itextpdf.layout.element.Image;
import com.itextpdf.layout.element.Paragraph;
import com.itextpdf.layout.element.Table;
import com.itextpdf.layout.element.Text;
import com.itextpdf.layout.properties.UnitValue;

import java.io.File;
import java.io.IOException;
import java.net.MalformedURLException;

/**
 * @author: guanglai.zhou
 * @date: 2023/12/13 13:38
 */
public class EditPdfMain {

    public static void main(String[] args) throws IOException {
        File inFile = getFileFromResources("baeldung.pdf");
        PdfReader reader = new PdfReader(inFile);
        // The output file doesn't exist yet, so it can't be looked up as a resource;
        // we create it next to the input file instead
        File outFile = new File(inFile.getParentFile(), "baeldung-modified.pdf");
        PdfWriter writer = new PdfWriter(outFile);
        PdfDocument pdfDocument = new PdfDocument(reader, writer);
        // addContentToDocument() closes the Document, which also closes pdfDocument
        addContentToDocument(pdfDocument);
        System.out.println(inFile.getPath());
        System.out.println(outFile.getPath());
    }

    /**
     * Looks up a file from src/main/resources on the classpath.
     */
    public static File getFileFromResources(String fileName) {
        String path = EditPdfMain.class.getResource("/" + fileName).getPath();
        return new File(path);
    }

    private static void addContentToDocument(PdfDocument pdfDocument) throws MalformedURLException {
        // 4.1. Add a form to the PDF document
        PdfFormField personal = PdfFormField.createEmptyField(pdfDocument);
        personal.setFieldName("information");
        // The coordinates (0,0) refer to the bottom left of the page
        PdfTextFormField name = PdfFormField.createText(pdfDocument,
                new Rectangle(35, 400, 100, 30), "name", "");
        personal.addKid(name);
        // Explicitly add the form to the first page
        PdfAcroForm.getAcroForm(pdfDocument, true)
                .addField(personal, pdfDocument.getFirstPage());

        // 4.2. Add a new page; index 1 puts it at the beginning of the document
        pdfDocument.addNewPage(1);

        // 4.3. Add an annotation above the form added in step 4.1
        PdfAnnotation ann = new PdfTextAnnotation(new Rectangle(40, 435, 0, 0))
                .setTitle(new PdfString("name"))
                .setContents("Your name");
        pdfDocument.getPage(2).addAnnotation(ann);

        // From here on we add layout elements, so we wrap the PdfDocument in a Document.
        // Closing the Document automatically closes the underlying PdfDocument.
        Document document = new Document(pdfDocument);

        // 4.4. Add an image, scaled to 550x100
        final File file = getFileFromResources("baeldung.png");
        ImageData imageData = ImageDataFactory.create(file.toURI().toURL());
        Image image = new Image(imageData).scaleAbsolute(550, 100)
                // Place it on the first page at coordinates (10,50)
                .setFixedPosition(1, 10, 50);
        document.add(image);

        // 4.5. Add a paragraph built from text pieces
        Text title = new Text("This is a demo").setFontSize(16);
        Text author = new Text("Baeldung tutorials.");
        Paragraph p = new Paragraph().setFontSize(8)
                .add(title)
                .add(" from ")
                .add(author);
        document.add(p);

        // 4.6. Add a table
        Table table = new Table(UnitValue.createPercentArray(2));
        table.addHeaderCell("#");
        table.addHeaderCell("company");
        table.addCell("name");
        table.addCell("baeldung");
        document.add(table);

        // close the document
        // this automatically closes the pdfDocument, which then closes automatically the pdfReader and pdfWriter
        document.close();
    }

}

5. Removing Content From the File

Let’s now see how we can remove content from the PDF file. To keep things simple, we’ll write another main() method.

Our source PDF file will be the baeldung-modified.pdf file and the destination will be a new baeldung-cleaned.pdf file. We’ll work directly on the PdfDocument object. From now on, we’ll use iText’s pdfSweep add-on.

5.1. Removing Text From the File

To remove a given text from the file, we’ll need to define a cleanup strategy. In this example, the strategy will simply be to find all text matching Baeldung. The last step is to call the autoSweepCleanUp() static method of PdfCleaner. This method will create a custom PdfCleanUpTool which will throw an IOException if any error happens during file handling:

CompositeCleanupStrategy strategy = new CompositeCleanupStrategy();
strategy.add(new RegexBasedCleanupStrategy("Baeldung"));
PdfCleaner.autoSweepCleanUp(pdfDocument, strategy);

As we can see, the occurrences of the word Baeldung in the source file are overlaid with a black rectangle in the result file. This behavior is suitable, for instance, for data anonymization:
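
To get a feeling for what a pattern will hit before running the cleanup, we can try it on plain strings first. The sketch below uses only java.util.regex, under the assumption that the cleanup strategy applies ordinary Java regex semantics to the extracted text; the RegexPreview class and the sample sentence are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for trying out a pattern on plain text; it doesn't involve pdfSweep.
class RegexPreview {

    // Collects every match of the regex in the given text, in order of appearance.
    static List<String> findMatches(String regex, String text) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "Hello Baeldung. This is a demo from Baeldung tutorials. name baeldung";
        // Case-sensitive: the lowercase occurrence isn't matched
        System.out.println(findMatches("Baeldung", text));
        // The (?i) flag makes the pattern case-insensitive
        System.out.println(findMatches("(?i)baeldung", text));
    }
}
```

Under that assumption, the lowercase baeldung in our table cell wouldn't be matched by the pattern Baeldung, whereas a pattern starting with (?i) would match it as well.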

5.2. Removing Other Content From the File

Unfortunately, it’s very difficult to detect any non-text content in the file. However, pdfSweep offers the possibility to erase the content of a portion of the file. Thus, if we know where the content we want to remove is located, we’ll be able to take advantage of this possibility.

As an example, we’ll erase the content of the rectangle of size 100×35 located at (35,400) on the second page. This means we’ll get rid of all the content of the form and the annotation. Furthermore, we’ll erase the rectangle of size 90×70 located at (10,50) of the first page. This basically removes the B from Baeldung’s logo. Using the PdfCleanUpTool class, here’s the code to do all that:

List<PdfCleanUpLocation> cleanUpLocations = Arrays.asList(
    new PdfCleanUpLocation(1, new Rectangle(10, 50, 90, 70)),
    new PdfCleanUpLocation(2, new Rectangle(35, 400, 100, 35)));
PdfCleanUpTool cleaner = new PdfCleanUpTool(pdfDocument, cleanUpLocations, new CleanUpProperties());
cleaner.cleanUp();
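
Since this approach depends entirely on picking the right coordinates, it's worth checking the geometry before running the cleanup. Here's a purely geometric sketch in plain Java; the Region class is a hypothetical stand-in for a PDF rectangle, and it makes no claim about how pdfSweep treats content lying exactly on an edge:

```java
// Hypothetical stand-in for a PDF rectangle: lower-left corner plus width and height.
class Region {
    final float x;
    final float y;
    final float width;
    final float height;

    Region(float x, float y, float width, float height) {
        this.x = x;
        this.y = y;
        this.width = width;
        this.height = height;
    }

    // True when the other region lies entirely inside this one (edges included).
    boolean contains(Region other) {
        return other.x >= x
            && other.y >= y
            && other.x + other.width <= x + width
            && other.y + other.height <= y + height;
    }

    public static void main(String[] args) {
        Region cleanUp = new Region(35, 400, 100, 35);
        Region formField = new Region(35, 400, 100, 30);
        Region annotation = new Region(40, 435, 0, 0);
        System.out.println("Covers the form field: " + cleanUp.contains(formField));
        System.out.println("Covers the annotation anchor: " + cleanUp.contains(annotation));
    }
}
```

Both checks succeed for the values used above, although the annotation's anchor point sits right on the top edge of the cleanup area.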

We can now see the following image in baeldung-cleaned.pdf:

6. Replacing Content in the File

In this section, we’ll do the same work as earlier, except that we’ll replace the former text with a new text instead of only erasing it.

For more clarity, we’ll use a new main() method again. Our source file will be the baeldung-modified.pdf file. Our destination file will be a new baeldung-fixed.pdf file.

Earlier we saw that the removed text was overlaid with a black background. However, this color is configurable. As we know the background of the text is white in our file, we’ll force the overlay to be white. The beginning of the process will be similar to what we did earlier, except that we’ll search for the text Baeldung tutorials.

However, after calling autoSweepCleanUp(), we’ll query the strategy to get the location of the removed text. We’ll then instantiate a PdfCanvas which will contain the replacement text HIDDEN. Additionally, we’ll remove the top margin to align it a bit better with the original text, since the default alignment isn’t great. Let’s look at the resulting code:

CompositeCleanupStrategy strategy = new CompositeCleanupStrategy();
strategy.add(new RegexBasedCleanupStrategy("Baeldung tutorials")
    .setRedactionColor(ColorConstants.WHITE));
PdfCleaner.autoSweepCleanUp(pdfDocument, strategy);

for (IPdfTextLocation location : strategy.getResultantLocations()) {
    PdfPage page = pdfDocument.getPage(location.getPageNumber() + 1);
    PdfCanvas pdfCanvas = new PdfCanvas(page.newContentStreamAfter(), page.getResources(), page.getDocument());
    Canvas canvas = new Canvas(pdfCanvas, location.getRectangle());
    canvas.add(new Paragraph("HIDDEN").setFontSize(8).setMarginTop(0f));
}

And we can have a look at the file:

7. Conclusion

In this tutorial, we’ve seen how to edit the content of a PDF file. We’ve seen that we can add new content, remove existing content, and even replace text in the original file with a new one.

As always, the code for this article can be found over on GitHub.
