Astral Plane characters in Erlang JSON/RFC4627 implementation

By: on November 16, 2007

Sam Ruby examines support for astral-plane characters in various JSON implementations. His post prompted me to check my Erlang implementation of RFC 4627. I found that astral-plane characters encoded directly as UTF-8, UTF-16, or UTF-32 worked properly, but the RFC 4627-mandated surrogate-pair “\uXXXX” escapes broke. A few minutes of hacking later:

Eshell V5.5.5  (abort with ^G)
1> {ok, Utf8Encoded, []} =
        rfc4627:decode("\"\\u007a\\u6c34\\ud834\\udd1e\"").
{ok,<<122,230,176,180,240,157,132,158>>,[]}
2> xmerl_ucs:from_utf8(Utf8Encoded).
[122,27700,119070]
3> rfc4627:encode(Utf8Encoded).
[34,122,230,176,180,240,157,132,158,34]
4> 

Much better.
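
For readers unfamiliar with surrogate pairs, the arithmetic that turns the two escapes \ud834 and \udd1e into the single code point 119070 can be sketched as follows. This is an illustrative module only, not the code from erlang-rfc4627; the module name surrogate_sketch and the function names combine/2 and split/1 are hypothetical.

```erlang
%% Illustrative sketch only; not taken from erlang-rfc4627.
%% Module and function names are hypothetical.
-module(surrogate_sketch).
-export([combine/2, split/1]).

%% Combine a UTF-16 high/low surrogate pair into one code point.
%% The guards reject anything that is not a well-formed pair.
combine(High, Low) when High >= 16#D800, High =< 16#DBFF,
                        Low >= 16#DC00, Low =< 16#DFFF ->
    16#10000 + ((High - 16#D800) bsl 10) + (Low - 16#DC00).

%% Split an astral-plane code point back into its surrogate pair,
%% as an encoder would need to do before emitting \uXXXX escapes.
split(CodePoint) when CodePoint >= 16#10000, CodePoint =< 16#10FFFF ->
    Offset = CodePoint - 16#10000,
    {16#D800 + (Offset bsr 10), 16#DC00 + (Offset band 16#3FF)}.
```

Calling surrogate_sketch:combine(16#D834, 16#DD1E) yields 119070, the last element of the list returned by xmerl_ucs:from_utf8/1 in the session above, and surrogate_sketch:split(119070) gives back the original pair.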

You can get the updated code from github.com/tonyg/erlang-rfc4627.


2 Comments

  1. kunthar says:

    root@testbed2:~/packages/erlang-rfc4627# erl
    Erlang R13B (erts-5.7.1) [source] [64-bit] [smp:3:3] [rq:3] [async-threads:0] [hipe] [kernel-poll:false]

    Eshell V5.7.1 (abort with ^G)
    1> {ok, Utf8Encoded, []} =
    1> rfc4627:decode(””u007au6c34ud834udd1e””).
    * 2: illegal character
    1>

  2. tonyg says:

    By contrast,

    ~/dev/erlang-rfc4627$ erl -pa ebin
    Erlang R13B01 (erts-5.7.2) [source] [smp:2:2] [rq:2] [async-threads:0] [kernel-poll:false]
    
    Eshell V5.7.2  (abort with ^G)
    1> rfc4627:decode("\"\\u007a\\u6c34\\ud834\\udd1e\"").
    {ok,<<122,230,176,180,240,157,132,158>>,[]}
    2> 
    

    I can’t reproduce the problem. Perhaps it’s a cut-and-paste error?
