add 0x10000 to get a codepoint. That 20-bit number now appears in
this form:
11101101 1010abcd 10efghij 11101101 1011klmn 10opqrst
The CESU8_RE above matches byte sequences of this form. Then we need
to extract the bits and assemble a codepoint number from them.
"""
if len(input) < 6:
    if final:
        # We found 0xed near the end of the stream, and there aren't
        # six bytes to decode. Delegate to the superclass method
        # to handle this error.
        return sup(input, errors, final)
    else:
        # We found 0xed, the stream isn't over yet, and we don't know
        # enough of the following bytes to decode anything, so consume
        # zero bytes and wait.
        return '', 0
else:
    if CESU8_RE.match(input):
        # If this is a CESU-8 sequence, do some math to pull out
        # the intended 20-bit value, and consume six bytes.
        bytenums = bytes_to_ints(input[:6])
        codepoint = (
            ((bytenums[1] & 0x0f) << 16) +
            ((bytenums[2] & 0x3f) << 10) +
            ((bytenums[4] & 0x0f) << 6) +
            (bytenums[5] & 0x3f) +
            0x10000
        )
        return unichr(codepoint), 6
    else:
        # This looked like a CESU-8 sequence, but it wasn't one.
        # 0xed indicates the start of a three-byte sequence, so give
        # three bytes to the superclass, so it can either decode them
        # as a surrogate codepoint (on Python 2) or handle the error
        # (on Python 3).
        return sup(input[:3], errors, False)
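
The bit arithmetic above can be checked in isolation with a round trip. The following is a minimal standalone sketch for Python 3, not part of the original module: `encode_cesu8_pair`, `decode_cesu8_pair`, and `CESU8_PATTERN` are hypothetical names introduced here, and the regular expression is an assumption about what `CESU8_RE` matches, since its definition is not shown in this excerpt.

```python
import re

# Assumed pattern for the six-byte form described in the docstring:
# 11101101 1010abcd 10efghij 11101101 1011klmn 10opqrst
# (The real CESU8_RE is defined elsewhere and may differ.)
CESU8_PATTERN = re.compile(
    b'\xed[\xa0-\xaf][\x80-\xbf]\xed[\xb0-\xbf][\x80-\xbf]'
)


def encode_cesu8_pair(codepoint):
    """Encode a codepoint above U+FFFF as a six-byte CESU-8 sequence."""
    value = codepoint - 0x10000        # the 20-bit number
    high = 0xD800 + (value >> 10)      # high surrogate: top 10 bits
    low = 0xDC00 + (value & 0x3FF)     # low surrogate: bottom 10 bits
    out = bytearray()
    for surrogate in (high, low):
        # Each surrogate is written as an ordinary three-byte UTF-8 sequence.
        out.append(0xE0 | (surrogate >> 12))           # always 0xed here
        out.append(0x80 | ((surrogate >> 6) & 0x3F))
        out.append(0x80 | (surrogate & 0x3F))
    return bytes(out)


def decode_cesu8_pair(data):
    """Reassemble the codepoint using the same masks and shifts as above."""
    if not CESU8_PATTERN.match(data):
        raise ValueError('not a CESU-8 surrogate pair')
    b = data[:6]                       # indexing bytes yields ints on Python 3
    codepoint = (
        ((b[1] & 0x0f) << 16) +        # bits abcd
        ((b[2] & 0x3f) << 10) +        # bits efghij
        ((b[4] & 0x0f) << 6) +         # bits klmn
        (b[5] & 0x3f) +                # bits opqrst
        0x10000
    )
    return chr(codepoint)
```

For example, `encode_cesu8_pair(0x1F600)` gives `b'\xed\xa0\xbd\xed\xb8\x80'`, and passing those bytes to `decode_cesu8_pair` returns the single character U+1F600. The sketch raises `ValueError` on a non-matching input only to stay self-contained; the method above instead hands three bytes to the superclass decoder.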