Links stop working after merging PDFs with PdfSharpCore


I have the following code, which merges two PDFs using PdfSharpCore. The PDFs contain a couple of buttons with links. My problem is that after merging the PDF files, the links no longer work.

var pdfDocument = new PdfDocument();

var doc1 = PdfReader.Open(stream1, PdfDocumentOpenMode.Import);
var doc2 = PdfReader.Open(stream2, PdfDocumentOpenMode.Import);

foreach (var page in doc1.Pages)
{
    pdfDocument.AddPage(page);
}

foreach (var page in doc2.Pages)
{
    pdfDocument.AddPage(page);
}

var mergedPdf = new MemoryStream();

pdfDocument.Save(mergedPdf);
    

There are 2 answers

K J

This currently appears to be an open issue in PdfSharpCore, dating from before Nov 17, 2022:

https://github.com/ststeiger/PdfSharpCore/issues/307

The issue states the problem as: "When combining 1 or more PDF documents containing hyperlinks into a new PDF document the links are not clickable in the new document. Only links containing https:// are still clickable."

The same question also comes up in an older PDFsharp forum thread:

https://forum.pdfsharp.net/viewtopic.php?f=2&t=4300&hilit=hyperlink

The answer given there applies to many applications that merge without restructuring hyperlinks (which is not always an easy task):

The destination page of the hyperlinks must be updated when combining the documents.

The underlying issue is generic: links are specific to the structure of the document they were created in. For example, you mention buttons but show no code for how they are used, so let us say they go to page 3 in each source document. After a merge, at best both buttons would go to page 3 of the new document, not to the original page 3 of either source. So simpler merge tools discard them. A good library may redirect the second button to the new page that was page 3 of the second document.

There are workarounds, such as exporting the links to a text editor (or correcting them some other way), adjusting them for the new page order, and then re-importing them into the new PDF.

For bookmarks there is PDFtk. It should also preserve hyperlinks when used to merge, though they may need tweaking.

There is an interesting comparison table here: https://tex.stackexchange.com/questions/497624/merging-multiple-pdf-files-without-breaking-hyperlinks

+-----------+---------------------------------------------+-----+------+------+------+
|  Software |                 Command                     | url | ref. | link | file |
+-----------+---------------------------------------------+-----+------+------+------+
|  convert  | convert a.pdf b.pdf tot.pdf                 | ✗   | ✗    | ✗    | ✗    |
|  pdfjam   | pdfjam a.pdf b.pdf -o tot.pdf               | ✗   | ✗    | ✗    | ✗    |
|  gs       | gs -sDEVICE=pdfwrite -o=tot.pdf a.pdf b.pdf | ✗   | ✗    | ✗    | ✗    |
|  pdfunite | pdfunite a.pdf b.pdf tot.pdf                | ✓   | ✗    | ✗    | ✓    |
|  pdftk    | pdftk a.pdf b.pdf cat output tot.pdf        | ✓   | ✓    | ✓    | ✓    |
|  pdfsam   | (it's a gui)                                | ✓   | ✓    | ✓   | ✓    |
| sejda.com | (it's a website)                            | ✓   | ✓    | ✓   | ✓    |
+-----------+---------------------------------------------+-----+------+------+------+

With Ghostscript (gs), some links may be preserved:

If the NoView flag is set and -dPrinted=false is used, then the annotation will be dropped by Ghostscript [otherwise, with -dPrinted=true, some links may be preserved]

Mugurel

Apparently the AddPage method accepts a second parameter called annotationCopying, with a default value of ShallowCopy, which means the links are lost.

So the fix is to pass a second argument to the AddPage method:

pdfDocument.AddPage(page, AnnotationCopyingType.DeepCopy);
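
For completeness, here is a minimal sketch of the question's merge loop with that change applied. It has not been tested against a specific PdfSharpCore version. The PdfMerger class and Merge method are hypothetical names used only for illustration, and the sketch assumes that AnnotationCopyingType is reachable through the PdfSharpCore.Pdf namespace (adjust the using directives if your version places it elsewhere).

```csharp
using System.IO;
using PdfSharpCore.Pdf;      // PdfDocument; assumed to also expose AnnotationCopyingType
using PdfSharpCore.Pdf.IO;   // PdfReader, PdfDocumentOpenMode

public static class PdfMerger   // hypothetical helper, not part of PdfSharpCore
{
    // Merges the given PDF streams, in order, into a single in-memory PDF.
    public static MemoryStream Merge(params Stream[] sources)
    {
        var merged = new PdfDocument();

        foreach (var source in sources)
        {
            // Import mode, as in the question, so pages can be added to another document.
            var doc = PdfReader.Open(source, PdfDocumentOpenMode.Import);

            foreach (var page in doc.Pages)
            {
                // DeepCopy asks the library to copy the page's annotations
                // (including link annotations) along with the page.
                merged.AddPage(page, AnnotationCopyingType.DeepCopy);
            }
        }

        var output = new MemoryStream();
        merged.Save(output);
        output.Position = 0;   // rewind so the caller can read from the start
        return output;
    }
}
```

Usage would look like `var mergedPdf = PdfMerger.Merge(stream1, stream2);`. Links that jump to a page inside the same document may still need checking after the merge, for the page-renumbering reasons described in the other answer.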