From: Christophe Rhodes
Date: Fri, 3 Oct 2014 17:03:52 +0000 (+0100)
Subject: fix multibyte decoding protocol bug
X-Git-Url: http://christophe.rhodes.io/gitweb/?a=commitdiff_plain;h=a1e8d6a615cbc31ffe84b17cc07943310917f8fd;p=swankr.git

fix multibyte decoding protocol bug

finally!  The slime connection got out of sync otherwise.  This fix only
works if the locale is a UTF-8 one, mind you.
---

diff --git a/BUGS.org b/BUGS.org
index 5ec0d11..3da8de2 100644
--- a/BUGS.org
+++ b/BUGS.org
@@ -15,7 +15,7 @@
 with a reference to the corresponding source.
 Unfortunately, emacs only passes the buffer position in bytes (or
 maybe characters), whereas R's srcrefs work with lines and columns.
-* OPEN #4 multibyte characters corrupt slime connection :NORMAL:
+* RESOLVED #4 multibyte characters corrupt slime connection :NORMAL:
 Not in all circumstances (e.g. ="£"= is OK) but =1:£= fails in
 slime-net-read-or-lose.
 * RESOLVED #5 respect visibility of evaluated results :WISHLIST:FIXED:
@@ -61,7 +61,7 @@
 * OPEN #16 ESS configuration :MINOR:
 sorting out the function regexp at least, but generally reducing
 dependence might be good.
-* OPEN #17 encoding / external-format confusion :NORMAL:
+* RESOLVED #17 encoding / external-format confusion :NORMAL:
 We declare ourselves capable of handling utf-8-unix encoding, but
 whether we actually do anything close to being correct is unclear.
 (Almost certainly not; I suspect we naïvely use nchar() in places).
diff --git a/swank.R b/swank.R
index 57ead0a..901f1c1 100644
--- a/swank.R
+++ b/swank.R
@@ -141,19 +141,27 @@ readPacket <- function(io) {
   header <- readChunk(io, 6)
   len <- strtoi(header, base=16)
   payload <- readChunk(io, len)
-  readSexpFromString(payload)
+  sexp <- readSexpFromString(payload)
+  sexp
 }
 
 readChunk <- function(io, len) {
-  buffer <- readChar(io, len)
+  buffer <- readChar(io, len, useBytes=TRUE)
   if(length(buffer) == 0) {
     condition <- simpleCondition("End of file on io")
     class(condition) <- c("endOfFile", class(condition))
     signalCondition(condition)
   }
-  if(nchar(buffer) != len) {
-    stop("short read in readChunk")
-  }
+  ## FIXME: with the useBytes argument to readChar, it is normal for
+  ## the buffer returned to be fewer character than bytes were read,
+  ## given the possibility of multibyte characters.  However, that
+  ## means we can’t detect at all the case where there is actually a
+  ## short read (though empirically the readChar call blocks rather
+  ## than returning early).
+  ##
+  ## if(nchar(buffer) != len) {
+  ##   stop("short read in readChunk")
+  ## }
   buffer
 }
 
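
The reason for passing useBytes=TRUE can be seen in a small self-contained sketch. It assumes, from readPacket in the patch, that each packet is a 6-hex-digit header followed by the payload, and, as this fix implies, that the header counts bytes rather than characters. The payload text and the names payload, frame, chunk and leftover are hypothetical, and a rawConnection stands in for the socket.

```r
## Sketch only, not swankr code: the frame layout is inferred from readPacket,
## and every name below is made up for illustration.
payload <- "(x \"1:\u00a3\")"             # the pound sign is two bytes in UTF-8

nbytes <- nchar(payload, type = "bytes")  # what the header is assumed to carry
nchars <- nchar(payload, type = "chars")  # one fewer than nbytes for this payload

## Build a frame: 6 hex digits of byte length, then the payload bytes.
header <- sprintf("%06x", nbytes)
frame <- c(charToRaw(header), charToRaw(payload))

## Read it back the way the patched readChunk does, counting bytes.
io <- rawConnection(frame, open = "rb")
len <- strtoi(readChar(io, 6, useBytes = TRUE), base = 16)
chunk <- readChar(io, len, useBytes = TRUE)

## Nothing should be left over once exactly `len` bytes have been consumed.
leftover <- readBin(io, "raw", n = 1)
close(io)
```

With useBytes=FALSE in a UTF-8 locale, the same readChar call would ask for len characters, one more than this payload holds. On a socket that would presumably consume part of the next packet's header, which matches the out-of-sync connection described in the commit message. It also shows why the old nchar(buffer) != len check had to go: a character count no longer equals the byte count once the payload contains a multibyte character.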