[Bug]: Additional encoding or decoding of 'file' parameter of viewer.html breaks certain URLs to PDF files #20137

@NicoMajor

Dear pdf.js team, dear @calixteman,

the following code fragment in web/app.js causes issues with certain URLs to PDF files (explanation and examples below).

https://github.com/mozilla/pdf.js/blob/e5922f2e72703d8739357574106f3b37cb976987/web/app.js#L743C5-L747C8

// in "async run" in "app.js"
  try {
    file = new URL(decodeURIComponent(file)).href;
  } catch {
    file = encodeURIComponent(file).replaceAll("%2F", "/");
  }

If I understand correctly, this was introduced recently with the intention of supporting the # character in PDF file names.

Problem 1 - relative URLs with query string parameters
Let's assume the following relative URL to a PDF file:
/filemanager/handler.ashx?pdfFileId=1&compression=1
It is provided to viewer.html like this:
http://localhost/pdfjs/viewer.html?file=%2Ffilemanager%2Fhandler.ashx%3FpdfFileId%3D1%26compression%3D1
Because the URL is relative and new URL() is called without a base URL, the constructor throws inside the try block, so the line in the catch block applies. At this point the variable file contains the correct URL to the PDF file:
/filemanager/handler.ashx?pdfFileId=1&compression=1
However, encodeURIComponent then encodes the whole string and only the slash characters are restored, so the URL is broken because the other significant characters (?, = and &) remain encoded:
/filemanager/handler.ashx%3FpdfFileId%3D1%26compression%3D1 (broken)
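The behavior can be reproduced outside the viewer with a short sketch. The wrapper name normalizeLikeAppJs is hypothetical and only mirrors the try/catch quoted above, and URLSearchParams is an assumed stand-in for the viewer's own query-string parsing:

```javascript
// Hypothetical wrapper that mirrors the try/catch quoted from web/app.js.
function normalizeLikeAppJs(file) {
  try {
    return new URL(decodeURIComponent(file)).href;
  } catch {
    return encodeURIComponent(file).replaceAll("%2F", "/");
  }
}

// Assumption: the viewer has already decoded the "file" parameter once
// while parsing its own query string; URLSearchParams does the same here.
const relativeViewerUrl = new URL(
  "http://localhost/pdfjs/viewer.html?file=%2Ffilemanager%2Fhandler.ashx%3FpdfFileId%3D1%26compression%3D1"
);
const relativeFile = relativeViewerUrl.searchParams.get("file");
console.log(relativeFile);
// /filemanager/handler.ashx?pdfFileId=1&compression=1

// new URL() throws for a relative URL without a base, so the catch branch runs.
const relativeResult = normalizeLikeAppJs(relativeFile);
console.log(relativeResult);
// /filemanager/handler.ashx%3FpdfFileId%3D1%26compression%3D1 (broken)
```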

Problem 2 - absolute URLs that themselves contain another layer of parameters with URI-encoded values
Let's assume the following absolute URL to a PDF file. In this example there is a server-side handler that expects a kind of internal path to a PDF file. Since this parameter's value contains a "/", it needs to be properly encoded:
http://localhost/filemanager/handler.ashx?pdfFilePath=project%2Ffoo.pdf
It is provided to viewer.html like this:
http://localhost/pdfjs/viewer.html?file=http%3A%2F%2Flocalhost%2Ffilemanager%2Fhandler.ashx%3FpdfFilePath%3Dproject%252Ffoo.pdf
This PDF URL can be processed successfully in the try-block, so the code in there applies. Initially the variable file contains the correct URL to retrieve the PDF file:
http://localhost/filemanager/handler.ashx?pdfFilePath=project%2Ffoo.pdf
However, decodeURIComponent then decodes the value of the pdfFilePath parameter a second time:
http://localhost/filemanager/handler.ashx?pdfFilePath=project/foo.pdf
It might still work, because browsers and servers try to fix things for us, but technically it is not correct to have the unencoded "/" there.
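A short sketch of the double decoding, again assuming that URLSearchParams stands in for the viewer's own query-string parsing (the variable names are illustrative only):

```javascript
// Assumption: the viewer decodes the "file" parameter once while parsing
// its own query string; URLSearchParams does the same here.
const absoluteViewerUrl = new URL(
  "http://localhost/pdfjs/viewer.html?file=http%3A%2F%2Flocalhost%2Ffilemanager%2Fhandler.ashx%3FpdfFilePath%3Dproject%252Ffoo.pdf"
);
const absoluteFile = absoluteViewerUrl.searchParams.get("file");
console.log(absoluteFile);
// http://localhost/filemanager/handler.ashx?pdfFilePath=project%2Ffoo.pdf

// The try branch of the quoted code decodes a second time,
// which also unescapes the inner parameter value.
const twiceDecoded = new URL(decodeURIComponent(absoluteFile)).href;
console.log(twiceDecoded);
// http://localhost/filemanager/handler.ashx?pdfFilePath=project/foo.pdf

// Without the extra decodeURIComponent the inner encoding survives.
const onceDecoded = new URL(absoluteFile).href;
console.log(onceDecoded);
// http://localhost/filemanager/handler.ashx?pdfFilePath=project%2Ffoo.pdf
```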

Final Thoughts

In my experience decoding and encoding of values should only happen at the very moments of parsing (decoding) or constructing (encoding) URLs, HTML, etc. Doing this "somewhere in the middle" usually is a very tricky thing and causes trouble most of the time.
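As an illustration of "decode only while parsing", here is a minimal sketch, not the pdf.js implementation and not a proposed patch. The function name resolveFileParam is hypothetical, it assumes the viewer URL is available as a string, and it does not attempt to handle the # in file names case that the quoted code was apparently written for:

```javascript
// Hypothetical helper: decode the "file" parameter exactly once (while
// parsing the viewer URL) and resolve it without further encoding/decoding.
function resolveFileParam(viewerHref) {
  const viewerUrl = new URL(viewerHref);
  const file = viewerUrl.searchParams.get("file"); // the single decoding step
  if (file === null) {
    return null;
  }
  // Relative values are resolved against the viewer URL; absolute ones are kept.
  return new URL(file, viewerUrl).href;
}

const resolvedRelative = resolveFileParam(
  "http://localhost/pdfjs/viewer.html?file=%2Ffilemanager%2Fhandler.ashx%3FpdfFileId%3D1%26compression%3D1"
);
console.log(resolvedRelative);
// http://localhost/filemanager/handler.ashx?pdfFileId=1&compression=1

const resolvedAbsolute = resolveFileParam(
  "http://localhost/pdfjs/viewer.html?file=http%3A%2F%2Flocalhost%2Ffilemanager%2Fhandler.ashx%3FpdfFilePath%3Dproject%252Ffoo.pdf"
);
console.log(resolvedAbsolute);
// http://localhost/filemanager/handler.ashx?pdfFilePath=project%2Ffoo.pdf
```

With this approach both example URLs from Problem 1 and Problem 2 reach the server with their query strings intact.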

Thank you for pdf.js and your effort!
