There's scope for plenty of improvements to SBCL's support for Unicode:

  • make it easy to update the Unicode data to a new version. Currently this requires thought, sometimes a non-trivial amount of it, to get the character database and its associated magic constants right.
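  • consider normalizing symbol names on creation, so that canonically equivalent names (e.g. "é" written as the precomposed U+00E9 versus "e" followed by U+0301 COMBINING ACUTE ACCENT) intern to the same symbol. (This may not be conformant when the name is given as a non-normalized string; perhaps we should normalize strings too?)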
  • speaking of normalization, it could be optimized using the Quick_Check properties (NFC_QC and its NFD/NFKC/NFKD counterparts). These would first need to be added to the character database, and could then be used to skip full normalization for strings that are already known to be normalized.
  • we should also expose the information in the UCD to users, so that they can do their own Unicode-aware processing. (cl-unicode provides some of this, but when I last looked, substantial parts were missing.)
  • we don't currently support any kind of language-aware case-insensitive comparison, nor any collation, which is a bit of a shame. (Does it make sense to think about supporting bidi in format and/or pprint-logical-block?)
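To illustrate the Quick_Check optimization mentioned above, here is a minimal sketch of the quick-check loop described in UAX #15, written in Python for brevity rather than in SBCL's own code. It assumes a hypothetical three-entry excerpt of NFC_QC standing in for the property that would have to be added to the character database. Anything outside ASCII and that excerpt is conservatively treated as MAYBE, so the sketch stays correct and simply falls back to full normalization more often than a complete table would.

```python
import unicodedata

# Hypothetical excerpt of the NFC_QC property (DerivedNormalizationProps.txt).
# Only NO and MAYBE entries need storing; a real table would cover every code point.
NFC_QC_EXCERPT = {
    0x0301: "MAYBE",  # COMBINING ACUTE ACCENT: may compose with a preceding starter
    0x0340: "NO",     # COMBINING GRAVE TONE MARK: singleton decomposition to U+0300
    0x212B: "NO",     # ANGSTROM SIGN: normalizes to U+00C5
}

def nfc_qc(ch):
    """Look up NFC_QC for one character (sketch: partial table, conservative default)."""
    cp = ord(ch)
    if cp < 0x80:
        return "YES"  # every ASCII character is NFC_QC=Yes
    # Unknown code points are treated as MAYBE, which forces the slow path.
    return NFC_QC_EXCERPT.get(cp, "MAYBE")

def nfc_quick_check(s):
    """Return "YES", "NO" or "MAYBE", following the UAX #15 quick-check loop."""
    result = "YES"
    last_ccc = 0
    for ch in s:
        ccc = unicodedata.combining(ch)  # canonical combining class
        # Non-starters out of canonical order: the string cannot be normalized.
        if last_ccc > ccc and ccc != 0:
            return "NO"
        qc = nfc_qc(ch)
        if qc == "NO":
            return "NO"
        if qc == "MAYBE":
            result = "MAYBE"
        last_ccc = ccc
    return result

def to_nfc(s):
    """Normalize to NFC, skipping the full algorithm when quick check says YES."""
    if nfc_quick_check(s) == "YES":
        return s
    return unicodedata.normalize("NFC", s)
```

For example, `nfc_quick_check("hello")` answers YES without any normalization work, while `"e\u0301"` yields MAYBE and is passed to the full algorithm, producing `"\u00e9"`. The design point is that NO and YES are definitive answers, and only MAYBE requires the expensive path.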