<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.0.12-alpha" -->
<rss version="2.0" 
	xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
	<title>Comments on: How should JSON strings be represented in Erlang?</title>
	<link>http://www.lshift.net/blog/2007/09/13/how-should-json-strings-be-represented-in-erlang</link>
	<description>What happens at LShift</description>
	<pubDate>Sat, 22 Nov 2008 00:04:05 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.0.12-alpha</generator>

	<item>
		<title>by: tonyg</title>
		<link>http://www.lshift.net/blog/2007/09/13/how-should-json-strings-be-represented-in-erlang#comment-67309</link>
		<pubDate>Wed, 03 Oct 2007 15:13:19 +0000</pubDate>
		<guid>http://www.lshift.net/blog/2007/09/13/how-should-json-strings-be-represented-in-erlang#comment-67309</guid>
					<description>&lt;p&gt;Thanks all for your comments. I've gone with the middle option - I'm writing a post now announcing the new version of the code.&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>Thanks all for your comments. I&#8217;ve gone with the middle option - I&#8217;m writing a post now announcing the new version of the code.</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: dda</title>
		<link>http://www.lshift.net/blog/2007/09/13/how-should-json-strings-be-represented-in-erlang#comment-66802</link>
		<pubDate>Tue, 25 Sep 2007 08:31:57 +0000</pubDate>
		<guid>http://www.lshift.net/blog/2007/09/13/how-should-json-strings-be-represented-in-erlang#comment-66802</guid>
					<description>&lt;p&gt;&lt;i&gt;I think dda’s suggested representation is a good idea (it would work best if it were adopted as the “official” representation of encoded strings in Erlang).&lt;/i&gt;&lt;/p&gt;

&lt;p&gt;I tend to agree :-) This is why I really think I should get my act together and clean up and publish whatever I have. I think the beauty in that scheme is that it doesn't force you to store and manipulate your strings in one encoding -- or even to &lt;em&gt;care&lt;/em&gt; about the encoding.&lt;/p&gt;

&lt;p&gt;Remember the discussion, for those who were interested and cared to follow it, on the Ruby list where people clamored for Unicode, and Matz promised an all-encompassing scheme [since Japanese tend &lt;em&gt;not to&lt;/em&gt; use UTF]? My background is in East-Asian languages, so my focus in mb was on CJK and their encodings [way too many Asian pages NOT in Unicode], although I did add a bunch of Latin* and Windows codepage encodings.&lt;/p&gt;

&lt;p&gt;The first few times I spoke of string manipulations on Erlang-related forums -- the list and the IRC chat room mostly -- I drew mostly the equivalent of blank stares. I guess most users, especially long-time ones, have zero need for anything but ASCII. Erlang shows its roots, and clearly, as far as TEXT is concerned, it sucks. Whether the Erlang team fixes that will depend on the users making them aware of their needs, and probably on our help. Honestly, I don't want an mb module, I want a new string type and BIFs and code added to the erlang: module.&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p><i>I think dda’s suggested representation is a good idea (it would work best if it were adopted as the “official” representation of encoded strings in Erlang).</i></p>
<p>I tend to agree :-) This is why I really think I should get my act together and clean up and publish whatever I have. I think the beauty in that scheme is that it doesn&#8217;t force you to store and manipulate your strings in one encoding &#8212; or even to <em>care</em> about the encoding.</p>
<p>Remember the discussion, for those who were interested and cared to follow it, on the Ruby list where people clamored for Unicode, and Matz promised an all-encompassing scheme [since Japanese tend <em>not to</em> use UTF]? My background is in East-Asian languages, so my focus in mb was on CJK and their encodings [way too many Asian pages NOT in Unicode], although I did add a bunch of Latin* and Windows codepage encodings.</p>
<p>The first few times I spoke of string manipulations on Erlang-related forums &#8212; the list and the IRC chat room mostly &#8212; I drew mostly the equivalent of blank stares. I guess most users, especially long-time ones, have zero need for anything but ASCII. Erlang shows its roots, and clearly, as far as TEXT is concerned, it sucks. Whether the Erlang team fixes that will depend on the users making them aware of their needs, and probably on our help. Honestly, I don&#8217;t want an mb module, I want a new string type and BIFs and code added to the erlang: module.</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: Jim Larson</title>
		<link>http://www.lshift.net/blog/2007/09/13/how-should-json-strings-be-represented-in-erlang#comment-66794</link>
		<pubDate>Tue, 25 Sep 2007 07:39:41 +0000</pubDate>
		<guid>http://www.lshift.net/blog/2007/09/13/how-should-json-strings-be-represented-in-erlang#comment-66794</guid>
					<description>&lt;p&gt;Bruce,&lt;/p&gt;

&lt;p&gt;Check out &lt;a href="http://www.erlang.org/ml-archive/erlang-questions/200511/msg00193.html" rel="nofollow"&gt;http://www.erlang.org/ml-archive/erlang-questions/200511/msg00193.html&lt;/a&gt;.&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>Bruce,</p>
<p>Check out <a href="http://www.erlang.org/ml-archive/erlang-questions/200511/msg00193.html" rel="nofollow">http://www.erlang.org/ml-archive/erlang-questions/200511/msg00193.html</a>.</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: Bruce</title>
		<link>http://www.lshift.net/blog/2007/09/13/how-should-json-strings-be-represented-in-erlang#comment-66792</link>
		<pubDate>Tue, 25 Sep 2007 07:17:31 +0000</pubDate>
		<guid>http://www.lshift.net/blog/2007/09/13/how-should-json-strings-be-represented-in-erlang#comment-66792</guid>
					<description>&lt;p&gt;David: I think you meant: http://www.erlang.org/pipermail/erlang-questions/2007-September/thread.html&lt;/p&gt;

&lt;p&gt;(note change to pipermail in October '06 :-).&lt;/p&gt;

&lt;p&gt;Jim: I can't see any replies on-list. Care to summarise?&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>David: I think you meant: http://www.erlang.org/pipermail/erlang-questions/2007-September/thread.html</p>
<p>(note change to pipermail in October &#8217;06 :-).</p>
<p>Jim: I can&#8217;t see any replies on-list. Care to summarise?</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: Jim Larson</title>
		<link>http://www.lshift.net/blog/2007/09/13/how-should-json-strings-be-represented-in-erlang#comment-66790</link>
		<pubDate>Tue, 25 Sep 2007 06:05:40 +0000</pubDate>
		<guid>http://www.lshift.net/blog/2007/09/13/how-should-json-strings-be-represented-in-erlang#comment-66790</guid>
					<description>&lt;p&gt;As Joe Armstrong mentions in his reply to me, the standard formatting of a binary is pretty opaque, while a list of codepoints displays as a string as long as it maps to the Latin-1 range.&lt;/p&gt;

&lt;p&gt;One additional feature to consider is the selective mapping of JSON object member name strings to Erlang atoms. This lets objects look a little more natural as Erlang terms:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;    {obj, [{alpha, 0.123}, {renormalized, false}]}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;(Ah, if only Erlang had a native dictionary type.)&lt;/p&gt;

&lt;p&gt;If you're worried about arbitrary JSON data exhausting the atom table, you can first try list_to_existing_atom/1 (since the member names your code is prepared to use will be pre-loaded, right?), falling back to the conventional string representation if the conversion fails.&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>As Joe Armstrong mentions in his reply to me, the standard formatting of a binary is pretty opaque, while a list of codepoints displays as a string as long as it maps to the Latin-1 range.</p>
<p>One additional feature to consider is the selective mapping of JSON object member name strings to Erlang atoms. This lets objects look a little more natural as Erlang terms:</p>
<pre><code>    {obj, [{alpha, 0.123}, {renormalized, false}]}
</code></pre>
<p>(Ah, if only Erlang had a native dictionary type.)</p>
<p>If you&#8217;re worried about arbitrary JSON data exhausting the atom table, you can first try list_to_existing_atom/1 (since the member names your code is prepared to use will be pre-loaded, right?), falling back to the conventional string representation if the conversion fails.</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: David Hopwood</title>
		<link>http://www.lshift.net/blog/2007/09/13/how-should-json-strings-be-represented-in-erlang#comment-66781</link>
		<pubDate>Tue, 25 Sep 2007 00:29:51 +0000</pubDate>
		<guid>http://www.lshift.net/blog/2007/09/13/how-should-json-strings-be-represented-in-erlang#comment-66781</guid>
					<description>&lt;p&gt;I think dda's suggested representation is a good idea (it would work best if it were adopted as the "official" representation of &lt;b&gt;encoded&lt;/b&gt; strings in Erlang).&lt;/p&gt;

&lt;p&gt;&lt;i&gt;Cons: Codec needs to perform possibly-redundant Unicode encoding/decoding steps to ensure that the binaries hold UTF8 even if, say, UTF32 were the format to be used on the wire&lt;/i&gt;&lt;/p&gt;

&lt;p&gt;UTF-32 is, to a first approximation, never used on the wire. In principle the argument stands for UTF-16, but UTF-8 is significantly more commonly used in protocols.&lt;/p&gt;

&lt;p&gt;Incidentally, there is a thread about this post on the erlang-questions list, subject "strings, json, and what happens now" (not in the archive yet, but it will be &lt;a href="http://www.erlang.org/ml-archive/erlang-questions/200609/threads.html" rel="nofollow"&gt;here&lt;/a&gt;).&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>I think dda&#8217;s suggested representation is a good idea (it would work best if it were adopted as the &#8220;official&#8221; representation of <b>encoded</b> strings in Erlang).</p>
<p><i>Cons: Codec needs to perform possibly-redundant Unicode encoding/decoding steps to ensure that the binaries hold UTF8 even if, say, UTF32 were the format to be used on the wire</i></p>
<p>UTF-32 is, to a first approximation, never used on the wire. In principle the argument stands for UTF-16, but UTF-8 is significantly more commonly used in protocols.</p>
<p>Incidentally, there is a thread about this post on the erlang-questions list, subject &#8220;strings, json, and what happens now&#8221; (not in the archive yet, but it will be <a href="http://www.erlang.org/ml-archive/erlang-questions/200609/threads.html" rel="nofollow">here</a>).</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: dda</title>
		<link>http://www.lshift.net/blog/2007/09/13/how-should-json-strings-be-represented-in-erlang#comment-66763</link>
		<pubDate>Mon, 24 Sep 2007 16:17:54 +0000</pubDate>
		<guid>http://www.lshift.net/blog/2007/09/13/how-should-json-strings-be-represented-in-erlang#comment-66763</guid>
					<description>&lt;p&gt;I have worked on and off on a library, called mb for multi-byte, creating a new string type and making manipulating non-ASCII strings easy[~ier], including conversion between encodings, encoding-safe common string manipulation methods [left, mid, right, reverse, replace, etc]. It has been on hold for a while, and should probably be made public so that others can reuse whatever useful code there is.&lt;/p&gt;

&lt;p&gt;The new string type is a tuple, {Encoding::atom(), String::binary()}, and i/o methods fold and unfold the data to and from the tuple. A little tedious, but seems to work so far. &lt;a href="http://www.sungnyemun.org/wordpress/wp-content/KanjiTest.html" rel="nofollow"&gt;This File&lt;/a&gt; was produced by mb's test framework.&lt;/p&gt;

&lt;p&gt;The advantage I saw in going this route is that the data stays as it is in reality, and mb strings can accept many encodings transparently. The encoding-safe manipulation functions [e.g., getNextChar() retrieves codepoints, not bytes] make sure I don't botch the original.&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>I have worked on and off on a library, called mb for multi-byte, creating a new string type and making manipulating non-ASCII strings easy[~ier], including conversion between encodings, encoding-safe common string manipulation methods [left, mid, right, reverse, replace, etc]. It has been on hold for a while, and should probably be made public so that others can reuse whatever useful code there is.</p>
<p>The new string type is a tuple, {Encoding::atom(), String::binary()}, and i/o methods fold and unfold the data to and from the tuple. A little tedious, but seems to work so far. <a href="http://www.sungnyemun.org/wordpress/wp-content/KanjiTest.html" rel="nofollow">This File</a> was produced by mb&#8217;s test framework.</p>
<p>The advantage I saw in going this route is that the data stays as it is in reality, and mb strings can accept many encodings transparently. The encoding-safe manipulation functions [e.g., getNextChar() retrieves codepoints, not bytes] make sure I don&#8217;t botch the original.</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: Daniel Lyons</title>
		<link>http://www.lshift.net/blog/2007/09/13/how-should-json-strings-be-represented-in-erlang#comment-65822</link>
		<pubDate>Fri, 14 Sep 2007 08:05:45 +0000</pubDate>
		<guid>http://www.lshift.net/blog/2007/09/13/how-should-json-strings-be-represented-in-erlang#comment-65822</guid>
					<description>&lt;p&gt;It would be interesting to see what the &lt;a href="http://couchdb.org/CouchDB/CouchDBWeb.nsf/Home?OpenForm" rel="nofollow"&gt;CouchDB&lt;/a&gt; guys are doing, since they are storing JSON data in Erlang (Mnesia, I believe) but presenting to the world a JSON over REST interface. Perhaps their &lt;a href="http://www.couchdbwiki.com/index.php?title=Getting_Started_with_Erlang" rel="nofollow"&gt;Getting Started&lt;/a&gt; document will show you everything you need to know; now I think I'd like to see someone store and retrieve a list of ASCII-range integers just to see if it handles it correctly.&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>It would be interesting to see what the <a href="http://couchdb.org/CouchDB/CouchDBWeb.nsf/Home?OpenForm" rel="nofollow">CouchDB</a> guys are doing, since they are storing JSON data in Erlang (Mnesia, I believe) but presenting to the world a JSON over REST interface. Perhaps their <a href="http://www.couchdbwiki.com/index.php?title=Getting_Started_with_Erlang" rel="nofollow">Getting Started</a> document will show you everything you need to know; now I think I&#8217;d like to see someone store and retrieve a list of ASCII-range integers just to see if it handles it correctly.</p>
]]></content:encoded>
				</item>
</channel>
</rss>
