Merged
89 changes: 58 additions & 31 deletions Doc/library/base64.rst
@@ -16,8 +16,10 @@
This module provides functions for encoding binary data to printable
ASCII characters and decoding such encodings back to binary data.
This includes the :ref:`encodings specified in <base64-rfc-4648>`
:rfc:`4648` (Base64, Base32 and Base16)
and the non-standard :ref:`Base85 encodings <base64-base-85>`.
:rfc:`4648` (Base64, Base32 and Base16), the :ref:`Base85 encoding
<base64-base-85>` specified in `PDF 2.0
<https://pdfa.org/resource/iso-32000-2/>`_, and non-standard variants
of Base85 used elsewhere.

There are two interfaces provided by this module. The modern interface
supports encoding :term:`bytes-like objects <bytes-like object>` to ASCII
@@ -284,19 +286,28 @@ POST request.
Base85 Encodings
-----------------

Base85 encoding is not formally specified but rather a de facto standard,
thus different systems perform the encoding differently.
Base85 encoding is a family of algorithms which represent four bytes
using five ASCII characters. Originally implemented in the Unix
``btoa(1)`` utility, a version of it was later adopted by Adobe in the
PostScript language and is standardized in PDF 2.0 (ISO 32000-2).
This version, in both its ``btoa`` and PDF variants, is implemented by
:func:`a85encode`.
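
The four-bytes-to-five-characters arithmetic can be sketched by hand. The helper below is hypothetical (``encode_group`` is not part of the module) and assumes the Ascii85 character set, which maps digit values 0 to 84 onto the characters ``!`` through ``u``. It ignores the compact forms for all-zero and all-space groups.

```python
import base64


def encode_group(group: bytes) -> bytes:
    """Encode exactly four bytes as five Ascii85 characters (illustrative helper)."""
    if len(group) != 4:
        raise ValueError("expected exactly 4 bytes")
    word = int.from_bytes(group, "big")  # treat the group as a 32-bit big-endian integer
    digits = []
    for _ in range(5):  # extract five base-85 digits, least significant first
        word, digit = divmod(word, 85)
        digits.append(digit)
    # Digit values 0..84 map to the characters '!' (33) through 'u' (117).
    return bytes(33 + d for d in reversed(digits))


# Five base-85 digits suffice because 85**5 is at least 2**32.
assert 85**5 >= 2**32

manual = encode_group(b"Man ")
library = base64.a85encode(b"Man ")
print(manual, library)
```

For a group that is not all zeros or all spaces, the hand-rolled result should match what :func:`a85encode` produces.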

The :func:`a85encode` and :func:`b85encode` functions in this module are two implementations of
the de facto standard. You should call the function with the Base85
implementation used by the software you intend to work with.
A separate version, using a different output character set, was
defined as an April Fool's joke in :rfc:`1924` but is now used by Git
and other software. This version is implemented by :func:`b85encode`.

The two functions present in this module differ in how they handle the following:
Finally, a third version, using yet another output character set
designed for safe inclusion in programming language strings, is
defined by ZeroMQ and implemented here by :func:`z85encode`.

* Whether to include enclosing ``<~`` and ``~>`` markers
* Whether to include newline characters
* The set of ASCII characters used for encoding
* Handling of null bytes
The functions present in this module differ in how they handle the following:

* Whether to include and expect enclosing ``<~`` and ``~>`` markers.
* Whether to fold the input into multiple lines.
* The set of ASCII characters used for encoding.
* Compact encodings of sequences of spaces and null bytes.
* The encoding of zero-padding bytes applied to the input.

Refer to the documentation of the individual functions for more information.
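
As a rough illustration of the differences listed above, the sketch below encodes the same input with :func:`a85encode` and :func:`b85encode`. The sample data is an assumption chosen to show the compact null-group form; it uses only parameters available since Python 3.4.

```python
import base64

data = b"\x00\x00\x00\x00" + b"hi"  # one all-zero group plus a 2-byte tail

ascii85 = base64.a85encode(data)             # btoa/PDF-style character set
framed = base64.a85encode(data, adobe=True)  # same, enclosed in <~ and ~>
git_style = base64.b85encode(data)           # RFC 1924 character set

# Ascii85 shortens the all-zero group to a single 'z';
# the RFC 1924 variant spells it out as five '0' characters.
print(ascii85, framed, git_style)

# Each variant round-trips through its own decoder.
assert base64.a85decode(ascii85) == data
assert base64.a85decode(framed, adobe=True) == data
assert base64.b85decode(git_style) == data
```

The outputs are not interchangeable, so the decoder must match the encoder that produced the data.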

@@ -307,18 +318,22 @@ Refer to the documentation of the individual functions for more information.

*foldspaces* is an optional flag that uses the special short sequence 'y'
instead of 4 consecutive spaces (ASCII 0x20) as supported by 'btoa'. This
feature is not supported by the "standard" Ascii85 encoding.
feature is not supported by the standard encoding used in PDF.

If *wrapcol* is non-zero, insert a newline (``b'\n'``) character
after at most every *wrapcol* characters.
If *wrapcol* is zero (default), do not insert any newlines.

If *pad* is true, the input is padded with ``b'\0'`` so its length is a
multiple of 4 bytes before encoding.
Note that the ``btoa`` implementation always pads.
*pad* controls whether zero-padding applied to the end of the input
is fully retained in the output encoding, as done by ``btoa``,
producing an exact multiple of 5 bytes of output. This is not part
of the standard encoding used in PDF, as it does not preserve the
length of the data.

*adobe* controls whether the encoded byte sequence is framed with ``<~``
and ``~>``, which is used by the Adobe implementation.
*adobe* controls whether the encoded byte sequence is framed with
``<~`` and ``~>``, as in a PostScript base-85 string literal. Note
that while ASCII85Decode streams in PDF documents *must* be
terminated with ``~>``, they *must not* use a leading ``<~``.

.. versionadded:: 3.4
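
A minimal sketch of how *foldspaces*, *pad* and *adobe* affect the output, assuming a two-byte input for the padding cases. The variable names are illustrative only.

```python
import base64

# foldspaces: four consecutive spaces collapse to the single character 'y'.
folded = base64.a85encode(b"    ", foldspaces=True)

# pad: a partial 2-byte group normally yields 3 characters; with pad=True
# the zero-padded group is kept whole, giving 5 characters.
trimmed = base64.a85encode(b"ab")
padded = base64.a85encode(b"ab", pad=True)

# Decoding the padded form returns the padding bytes as data,
# so the original length is not preserved.
restored = base64.a85decode(padded)

# adobe: frame the output with <~ and ~>.
framed = base64.a85encode(b"ab", adobe=True)

print(folded, trimmed, padded, restored, framed)
```

Because ``pad=True`` changes the decoded length, it is best reserved for consumers that expect ``btoa``-style output.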

@@ -330,10 +345,12 @@ Refer to the documentation of the individual functions for more information.

*foldspaces* is a flag that specifies whether the 'y' short sequence
should be accepted as shorthand for 4 consecutive spaces (ASCII 0x20).
This feature is not supported by the "standard" Ascii85 encoding.
This feature is not supported by the standard Ascii85 encoding used in
PDF and PostScript.

*adobe* controls whether the input sequence is in Adobe Ascii85 format
(i.e. is framed with <~ and ~>).
*adobe* controls whether the ``<~`` and ``~>`` markers are
present. While the leading ``<~`` is not required, the input must
end with ``~>``, or a :exc:`ValueError` is raised.

*ignorechars* should be a :term:`bytes-like object` containing characters
to ignore from the input.
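
The marker handling described for *adobe* can be sketched as follows. The example catches :exc:`ValueError`, which also covers :exc:`binascii.Error` because the latter is a subclass of it.

```python
import base64

framed = base64.a85encode(b"data", adobe=True)  # b'<~' + encoded data + b'~>'

# Both markers present.
with_both = base64.a85decode(framed, adobe=True)

# The leading <~ is optional: dropping it still decodes.
without_leading = base64.a85decode(framed[2:], adobe=True)

# The trailing ~> is required: dropping it raises an error.
try:
    base64.a85decode(framed[:-2], adobe=True)
except ValueError:
    missing_trailer_rejected = True
else:
    missing_trailer_rejected = False

print(with_both, without_leading, missing_trailer_rejected)
```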
@@ -356,8 +373,11 @@ Refer to the documentation of the individual functions for more information.
Encode the :term:`bytes-like object` *b* using base85 (as used in e.g.
git-style binary diffs) and return the encoded :class:`bytes`.

If *pad* is true, the input is padded with ``b'\0'`` so its length is a
multiple of 4 bytes before encoding.
The input is padded with ``b'\0'`` so its length is a multiple of 4
bytes before encoding. If *pad* is true, all the resulting
characters are retained in the output, which will always be a
multiple of 5 bytes, and thus the length of the data may not be
preserved on decoding.
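
A short sketch of the length behaviour described above, assuming a two-byte input:

```python
import base64

short = base64.b85encode(b"ab")             # partial group: 3 characters
padded = base64.b85encode(b"ab", pad=True)  # full group: 5 characters

# Without pad, decoding restores the original two bytes.
# With pad, the two zero-padding bytes come back as part of the data.
print(base64.b85decode(short), base64.b85decode(padded))
```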

If *wrapcol* is non-zero, insert a newline (``b'\n'``) character
after at most every *wrapcol* characters.
@@ -372,8 +392,7 @@ Refer to the documentation of the individual functions for more information.
.. function:: b85decode(b, *, ignorechars=b'', canonical=False)

Decode the base85-encoded :term:`bytes-like object` or ASCII string *b* and
return the decoded :class:`bytes`. Padding is implicitly removed, if
necessary.
return the decoded :class:`bytes`.

*ignorechars* should be a :term:`bytes-like object` containing characters
to ignore from the input.
@@ -392,11 +411,12 @@ Refer to the documentation of the individual functions for more information.
.. function:: z85encode(s, pad=False, *, wrapcol=0)

Encode the :term:`bytes-like object` *s* using Z85 (as used in ZeroMQ)
and return the encoded :class:`bytes`. See `Z85 specification
<https://rfc.zeromq.org/spec/32/>`_ for more information.
and return the encoded :class:`bytes`.

If *pad* is true, the input is padded with ``b'\0'`` so its length is a
multiple of 4 bytes before encoding.
The input is padded with ``b'\0'`` so its length is a multiple of 4
bytes before encoding. If *pad* is true, all the resulting
characters are retained in the output, which will always be a
multiple of 5 bytes, as required by the ZeroMQ standard.
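
As a hedged illustration, the test frame given in the ZeroMQ Z85 specification can be encoded like this. The ``hasattr`` guard is an assumption for interpreters that predate :func:`z85encode`, and the input is already a multiple of 4 bytes, so *pad* is not needed.

```python
import base64

# The 8-byte test frame from the ZeroMQ Z85 specification.
frame = bytes([0x86, 0x4F, 0xD2, 0x6F, 0xB5, 0x59, 0xF7, 0x5B])

if hasattr(base64, "z85encode"):
    encoded = base64.z85encode(frame)    # the specification expects b'HelloWorld'
    decoded = base64.z85decode(encoded)  # round-trips back to the frame
else:
    # Older interpreters do not provide the Z85 functions.
    encoded = decoded = None

print(encoded, decoded)
```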

If *wrapcol* is non-zero, insert a newline (``b'\n'``) character
after at most every *wrapcol* characters.
@@ -414,8 +434,7 @@ Refer to the documentation of the individual functions for more information.
.. function:: z85decode(s, *, ignorechars=b'', canonical=False)

Decode the Z85-encoded :term:`bytes-like object` or ASCII string *s* and
return the decoded :class:`bytes`. See `Z85 specification
<https://rfc.zeromq.org/spec/32/>`_ for more information.
return the decoded :class:`bytes`.

*ignorechars* should be a :term:`bytes-like object` containing characters
to ignore from the input.
@@ -499,3 +518,11 @@ recommended to review the security section for any code deployed to production.
Section 5.2, "Base64 Content-Transfer-Encoding," provides the definition of the
base64 encoding.

`ISO 32000-2 Portable document format - Part 2: PDF 2.0 <https://pdfa.org/resource/iso-32000-2/>`_
Section 7.4.3, "ASCII85Decode Filter," provides the definition
of the Ascii85 encoding used in PDF and PostScript, including
the output character set and the details of data length preservation
using zero-padding and partial output groups.

`ZeroMQ RFC 32/Z85 <https://rfc.zeromq.org/spec/32/>`_
The "Formal Specification" section provides the character set used in Z85.
27 changes: 18 additions & 9 deletions Doc/library/binascii.rst
@@ -133,8 +133,11 @@ The :mod:`!binascii` module defines the following functions:
should be accepted as shorthand for 4 consecutive spaces (ASCII 0x20).
This feature is not supported by the "standard" Ascii85 encoding.

*adobe* controls whether the input sequence is in Adobe Ascii85 format
(i.e. is framed with <~ and ~>).
*adobe* controls whether the encoded byte sequence is framed with
``<~`` and ``~>``, as in a PostScript base-85 string literal. If
*adobe* is true, a leading ``<~`` is optionally accepted, while a
trailing ``~>`` is *required*, and :exc:`binascii.Error` is raised
if it is not found.

*ignorechars* should be a :term:`bytes-like object` containing characters
to ignore from the input.
@@ -164,12 +167,16 @@ The :mod:`!binascii` module defines the following functions:
after at most every *wrapcol* characters.
If *wrapcol* is zero (default), do not insert any newlines.

If *pad* is true, the input is padded with ``b'\0'`` so its length is a
multiple of 4 bytes before encoding.
Note that the ``btoa`` implementation always pads.
If *pad* is true, the zero-padding applied to the end of the input
is fully retained in the output encoding, as done by ``btoa``,
producing an exact multiple of 5 bytes of output. This is not part
of the standard encoding used in PDF, as it does not preserve the
length of the data.

*adobe* controls whether the encoded byte sequence is framed with ``<~``
and ``~>``, which is used by the Adobe implementation.
*adobe* controls whether the encoded byte sequence is framed with
``<~`` and ``~>``, as in a PostScript base-85 string literal. Note
that while ASCII85Decode streams in PDF documents *must* be
terminated with ``~>``, they *must not* use a leading ``<~``.

.. versionadded:: 3.15

@@ -213,8 +220,10 @@ The :mod:`!binascii` module defines the following functions:
after at most every *wrapcol* characters.
If *wrapcol* is zero (default), do not insert any newlines.

If *pad* is true, the input is padded with ``b'\0'`` so its length is a
multiple of 4 bytes before encoding.
If *pad* is true, the zero-padding applied to the end of the input
is retained in the output, whose length will always be a multiple
of 5 bytes, and thus the length of the data may not be preserved on
decoding.

.. versionadded:: 3.15

38 changes: 24 additions & 14 deletions Lib/base64.py
@@ -315,16 +315,20 @@ def a85encode(b, *, foldspaces=False, wrapcol=0, pad=False, adobe=False):

foldspaces is an optional flag that uses the special short sequence 'y'
instead of 4 consecutive spaces (ASCII 0x20) as supported by 'btoa'. This
feature is not supported by the "standard" Adobe encoding.
feature is not supported by the standard encoding used in PDF.

If wrapcol is non-zero, insert a newline (b'\\n') character after at most
every wrapcol characters.

pad controls whether the input is padded to a multiple of 4 before
encoding. Note that the btoa implementation always pads.
pad controls whether zero-padding applied to the end of the input
is fully retained in the output encoding, as done by btoa,
producing an exact multiple of 5 bytes of output.

adobe controls whether the encoded byte sequence is framed with <~
and ~>, as in a PostScript base-85 string literal. Note that
while ASCII85Decode streams in PDF documents must be terminated
with ~>, they must not use a leading <~.

adobe controls whether the encoded byte sequence is framed with <~ and ~>,
which is used by the Adobe implementation.
"""
return binascii.b2a_ascii85(b, foldspaces=foldspaces,
adobe=adobe, wrapcol=wrapcol, pad=pad)
@@ -333,12 +337,14 @@ def a85decode(b, *, foldspaces=False, adobe=False, ignorechars=b' \t\n\r\v',
canonical=False):
"""Decode the Ascii85 encoded bytes-like object or ASCII string b.

foldspaces is a flag that specifies whether the 'y' short sequence should be
accepted as shorthand for 4 consecutive spaces (ASCII 0x20). This feature is
not supported by the "standard" Adobe encoding.
foldspaces is a flag that specifies whether the 'y' short sequence
should be accepted as shorthand for 4 consecutive spaces (ASCII
0x20). This feature is not supported by the standard Ascii85
encoding used in PDF and PostScript.

adobe controls whether the input sequence is in Adobe Ascii85 format (i.e.
is framed with <~ and ~>).
adobe controls whether the <~ and ~> markers are present. While
the leading <~ is not required, the input must end with ~>, or a
ValueError is raised.

ignorechars should be a byte string containing characters to ignore from the
input. This should only contain whitespace characters, and by default
@@ -358,8 +364,10 @@ def b85encode(b, pad=False, *, wrapcol=0):
If wrapcol is non-zero, insert a newline (b'\\n') character after at most
every wrapcol characters.

If pad is true, the input is padded with b'\\0' so its length is a multiple of
4 bytes before encoding.
The input is padded with b'\\0' so its length is a multiple of 4
bytes before encoding. If pad is true, all the resulting
characters are retained in the output, which will always be a
multiple of 5 bytes.
"""
return binascii.b2a_base85(b, wrapcol=wrapcol, pad=pad)

@@ -379,8 +387,10 @@ def z85encode(s, pad=False, *, wrapcol=0):
If wrapcol is non-zero, insert a newline (b'\\n') character after at most
every wrapcol characters.

If pad is true, the input is padded with b'\\0' so its length is a multiple of
4 bytes before encoding.
The input is padded with b'\\0' so its length is a multiple of 4
bytes before encoding. If pad is true, all the resulting
characters are retained in the output, which will always be a
multiple of 5 bytes, as required by the ZeroMQ standard.
"""
return binascii.b2a_base85(s, wrapcol=wrapcol, pad=pad,
alphabet=binascii.Z85_ALPHABET)
13 changes: 7 additions & 6 deletions Modules/binascii.c
@@ -1057,7 +1057,8 @@ binascii.a2b_ascii85
foldspaces: bool = False
Allow 'y' as a short form encoding four spaces.
adobe: bool = False
Expect data to be wrapped in '<~' and '~>' as in Adobe Ascii85.
Expect data to be terminated with '~>' as in Adobe Ascii85, and
optionally accept leading '<~'.
ignorechars: Py_buffer = b''
A byte string containing characters to ignore from the input.
canonical: bool = False
@@ -1069,7 +1070,7 @@ Decode Ascii85 data.
static PyObject *
binascii_a2b_ascii85_impl(PyObject *module, Py_buffer *data, int foldspaces,
int adobe, Py_buffer *ignorechars, int canonical)
/*[clinic end generated code: output=09b35f1eac531357 input=dd050604ed30199e]*/
/*[clinic end generated code: output=09b35f1eac531357 input=08eab2e53c62f1a8]*/
{
const unsigned char *ascii_data = data->buf;
Py_ssize_t ascii_len = data->len;
@@ -1264,7 +1265,7 @@ binascii.b2a_ascii85
wrapcol: size_t = 0
Split result into lines of provided width.
pad: bool = False
Pad input to a multiple of 4 before encoding.
Retain zero-padding bytes at end of output.
adobe: bool = False
Wrap result in '<~' and '~>' as in Adobe Ascii85.

@@ -1274,7 +1275,7 @@ Ascii85-encode data.
static PyObject *
binascii_b2a_ascii85_impl(PyObject *module, Py_buffer *data, int foldspaces,
size_t wrapcol, int pad, int adobe)
/*[clinic end generated code: output=5ce8fdee843073f4 input=791da754508c7d17]*/
/*[clinic end generated code: output=5ce8fdee843073f4 input=a77e31d63517bf19]*/
{
const unsigned char *bin_data = data->buf;
Py_ssize_t bin_len = data->len;
@@ -1539,7 +1540,7 @@ binascii.b2a_base85
/
*
pad: bool = False
Pad input to a multiple of 4 before encoding.
Retain zero-padding bytes at end of output.
wrapcol: size_t = 0
alphabet: Py_buffer(c_default="{NULL, NULL}") = BASE85_ALPHABET

@@ -1549,7 +1550,7 @@ Base85-code line of data.
static PyObject *
binascii_b2a_base85_impl(PyObject *module, Py_buffer *data, int pad,
size_t wrapcol, Py_buffer *alphabet)
/*[clinic end generated code: output=98b962ed52c776a4 input=1b20b0bd6572691b]*/
/*[clinic end generated code: output=98b962ed52c776a4 input=54886d05128d41a8]*/
{
const unsigned char *bin_data = data->buf;
Py_ssize_t bin_len = data->len;
9 changes: 5 additions & 4 deletions Modules/clinic/binascii.c.h

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.