From: Christophe Rhodes <csr21@cantab.net>
Date: Fri, 3 Oct 2014 17:03:52 +0000 (+0100)
Subject: fix multibyte decoding protocol bug
X-Git-Url: http://christophe.rhodes.io/gitweb/?a=commitdiff_plain;h=a1e8d6a615cbc31ffe84b17cc07943310917f8fd;p=swankr.git

fix multibyte decoding protocol bug

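finally!  Messages containing multibyte characters used to get the
slime connection out of sync; readChunk now reads the length-prefixed
payload as bytes (readChar(..., useBytes=TRUE)) rather than as
characters.  This fix only works if the locale is a UTF-8 one, mind you.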
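A minimal sketch of the mismatch, assuming the 6-digit hex header
counts UTF-8 bytes (which is how the patched readChunk now treats it)
and a UTF-8 session.  encodePacket and readPayload are hypothetical
helpers written for illustration; they are not swankr functions.

```r
## Hypothetical helper (not part of swankr): frame a payload with a
## 6-digit hex length header that counts UTF-8 *bytes*, which is what
## the patched readChunk (readChar with useBytes=TRUE) assumes.
encodePacket <- function(payload) {
  bytes <- charToRaw(enc2utf8(payload))
  c(charToRaw(sprintf("%06x", length(bytes))), bytes)
}

## Hypothetical mirror of readPacket/readChunk, minus the sexp parsing
## and the end-of-file handling.
readPayload <- function(con) {
  len <- strtoi(readChar(con, 6, useBytes = TRUE), base = 16)
  readChar(con, len, useBytes = TRUE)
}

pound <- "1:\u00a3"
nchar(pound, type = "chars")   # 3
nchar(pound, type = "bytes")   # 4: U+00A3 is two bytes in UTF-8

## Two packets back to back, as they might arrive on the swank socket.
con <- rawConnection(c(encodePacket(pound), encodePacket("2")))
first <- readPayload(con)      # consumes exactly the 4 advertised bytes
second <- readPayload(con)     # so the second header is still aligned
close(con)

## Reading 4 *characters* instead (the old readChar call, in a UTF-8
## locale) would also swallow the "0" that starts the second header,
## desynchronising every subsequent read.
```

With useBytes=TRUE each read consumes exactly the number of bytes the
header advertises, so the stream stays in step regardless of how many
characters those bytes decode to.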
---

diff --git a/BUGS.org b/BUGS.org
index 5ec0d11..3da8de2 100644
--- a/BUGS.org
+++ b/BUGS.org
@@ -15,7 +15,7 @@
   with a reference to the corresponding source.  Unfortunately, emacs
   only passes the buffer position in bytes (or maybe characters),
   whereas R's srcrefs work with lines and columns.
-* OPEN #4 multibyte characters corrupt slime connection              :NORMAL:
+* RESOLVED #4 multibyte characters corrupt slime connection          :NORMAL:
   Not in all circumstances (e.g. ="£"= is OK) but =1:£= fails in
   slime-net-read-or-lose.
 * RESOLVED #5 respect visibility of evaluated results        :WISHLIST:FIXED:
@@ -61,7 +61,7 @@
 * OPEN #16 ESS configuration                                          :MINOR:
   sorting out the function regexp at least, but generally reducing
   dependence might be good.
-* OPEN #17 encoding / external-format confusion                      :NORMAL:
+* RESOLVED #17 encoding / external-format confusion                  :NORMAL:
   We declare ourselves capable of handling utf-8-unix encoding, but
   whether we actually do anything close to being correct is unclear.
   (Almost certainly not; I suspect we naïvely use nchar() in places).
diff --git a/swank.R b/swank.R
index 57ead0a..901f1c1 100644
--- a/swank.R
+++ b/swank.R
@@ -141,19 +141,27 @@ readPacket <- function(io) {
   header <- readChunk(io, 6)
   len <- strtoi(header, base=16)
   payload <- readChunk(io, len)
-  readSexpFromString(payload)
+  sexp <- readSexpFromString(payload)
+  sexp
 }
 
 readChunk <- function(io, len) {
-  buffer <- readChar(io, len)
+  buffer <- readChar(io, len, useBytes=TRUE)
   if(length(buffer) == 0) {
     condition <- simpleCondition("End of file on io")
     class(condition) <- c("endOfFile", class(condition))
     signalCondition(condition)
   }
-  if(nchar(buffer) != len) {
-    stop("short read in readChunk")
-  }
+  ## FIXME: with the useBytes argument to readChar, it is normal for
+  ## the returned buffer to contain fewer characters than bytes read,
+  ## given the possibility of multibyte characters.  However, that
+  ## means we can't detect a genuine short read at all (though
+  ## empirically the readChar call blocks rather than returning
+  ## early).
+  ##
+  ## if(nchar(buffer) != len) {
+  ##   stop("short read in readChunk")
+  ## }
   buffer
 }