Unicode, Charsets, Strings, and Binaries

TALK LEVEL: BEGINNER / INTERMEDIATE / ADVANCED

Writing global software means our programs need to handle the world's human languages, but making programs work correctly with languages from outside Western Europe is at best a confusing affair. UTF-8, Latin-1, Unicode?

What do these terms mean and how are they related to one another?

And what does Erlang do?

This talk demystifies the terminology around character encoding, explains how to retrofit your Erlang program for Unicode using Datometry HyperQ as a case study, and offers some best practices to help you break the one-byte-per-character assumption.
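As a small illustration of why the one-byte-per-character assumption fails, the sketch below compares the number of code points in a string with the number of bytes it occupies in two encodings. It is written in Python purely for brevity and is not taken from the talk or from HyperQ; the helper name `sizes` and the sample text are assumptions made for this example.

```python
def sizes(text):
    """Return (code points, UTF-8 bytes, Latin-1 bytes) for text.

    Hypothetical helper for illustration only. Raises UnicodeEncodeError
    if text contains a character that Latin-1 cannot represent.
    """
    return (
        len(text),                    # number of Unicode code points
        len(text.encode("utf-8")),    # bytes when encoded as UTF-8
        len(text.encode("latin-1")),  # bytes when encoded as Latin-1
    )


def misread_utf8_as_latin1(text):
    """Encode as UTF-8, then wrongly decode the bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


if __name__ == "__main__":
    sample = "naïve café"
    print(sizes(sample))                   # (10, 12, 10)
    print(misread_utf8_as_latin1(sample))  # naÃ¯ve cafÃ©
```

The same ten code points take ten bytes in Latin-1 but twelve in UTF-8, and reading the UTF-8 bytes with the wrong encoding silently produces garbage rather than an error.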

THIS TALK IN THREE WORDS

Character sets

Character encoding

Clarity

OBJECTIVES

  • Demystify terminology around character sets and character encodings.
  • Provide best practices to avoid common pitfalls.

TARGET AUDIENCE

Erlang developers working on internationalised software.