construct: add adapter Utf8Adapter to safely interpret utf8 text

Uninitialized files, file records, or fields in a file record or file
usually contain a string of 0xff bytes. This becomes a problem when the
content is normally encoded/decoded as utf8 by the construct parser:
the parser throws an exception when it tries to decode the 0xff string
as utf8. This is an especially serious problem in pySim-trace, where an
exception stops the parser.

Let's fix this by interpreting a string of 0xff as an empty string.

Related: OS#6094
Change-Id: Id114096ccb8b7ff8fcc91e1ef3002526afa09cb7
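
The behaviour described in the commit message can be reproduced without construct. The following is only a minimal sketch: decode_utf8_or_empty is a hypothetical standalone helper that mirrors the check in Utf8Adapter._decode, not part of pySim.

```python
import codecs


def decode_utf8_or_empty(raw: bytes) -> str:
    """Hypothetical helper mirroring the logic of Utf8Adapter._decode."""
    # A value made up solely of 0xff bytes is treated as uninitialized
    # and mapped to an empty string instead of being decoded.
    if raw == b'\xff' * len(raw):
        return ""
    return codecs.decode(raw, "utf-8")


# Without the check, decoding fails: 0xff is never a valid byte in UTF-8.
try:
    codecs.decode(b'\xff\xff\xff\xff', "utf-8")
    raised = False
except UnicodeDecodeError:
    raised = True

padded = decode_utf8_or_empty(b'\xff\xff\xff\xff')
text = decode_utf8_or_empty("Grüße".encode("utf-8"))
```

Note that only values consisting entirely of 0xff are handled this way; a value that mixes valid text with trailing 0xff bytes would still raise UnicodeDecodeError in this sketch.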
Philipp Maier
2023-07-26 17:01:37 +02:00
parent fec721fcb1
commit 791f80a44f
3 changed files with 25 additions and 10 deletions


@@ -6,6 +6,7 @@ from construct.core import evaluate, BitwisableString
from construct.lib import integertypes
from pySim.utils import b2h, h2b, swap_nibbles
import gsm0338
import codecs
"""Utility code related to the integration of the 'construct' declarative parser."""
@@ -34,6 +35,18 @@ class HexAdapter(Adapter):
    def _encode(self, obj, context, path):
        return h2b(obj)

class Utf8Adapter(Adapter):
    """convert a bytes() type that contains utf8 encoded text to human readable text."""

    def _decode(self, obj, context, path):
        # In case the string contains only 0xff bytes we interpret it as an empty string
        if obj == b'\xff' * len(obj):
            return ""
        return codecs.decode(obj, "utf-8")

    def _encode(self, obj, context, path):
        return codecs.encode(obj, "utf-8")

class BcdAdapter(Adapter):
    """convert a bytes() type to a string of BCD nibbles."""