
UTF-8, ranging over strings, and the strings package.

UTF-8, Bytes, and Runes

We touched on strings in chapter 3. Here's the full picture.

What a string actually is

A Go string is a read-only sequence of bytes. By convention these bytes hold UTF-8 encoded text, but the language does not enforce that: a string can contain arbitrary bytes, including sequences that are not valid UTF-8.

Most of the time you can ignore the distinction and just treat strings as text. You only need to think about bytes vs runes when you're:

  • counting characters,
  • slicing into the middle of a string, or
  • processing one character at a time.

Index vs range

Indexing returns a byte:

s := "café"
fmt.Println(s[0])   // 99 (the byte 'c')
fmt.Println(s[3])   // 195 (0xC3), the first of the two bytes UTF-8 uses to encode 'é'

for _, r := range s decodes UTF-8 and gives you one rune (Unicode code point) per iteration. The index discarded here with _ is the byte offset where that rune starts.
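A minimal sketch contrasting the two loops on the same string, reusing the "café" value from the indexing example:

```go
package main

import "fmt"

func main() {
	s := "café"

	// Byte loop: five iterations, and 'é' shows up as two separate bytes.
	for i := 0; i < len(s); i++ {
		fmt.Printf("byte %d: %#x\n", i, s[i])
	}

	// Range loop: four iterations; i is the byte offset where each rune starts.
	for i, r := range s {
		fmt.Printf("rune at %d: %c (%U)\n", i, r, r)
	}
}
```

The byte loop prints 0xc3 and 0xa9 as its last two values, while the range loop reports a single rune U+00E9 at offset 3.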


Converting between bytes, runes, and strings

s := "hello"
bs := []byte(s)        // []byte{'h','e','l','l','o'}
rs := []rune("héllo")  // []rune{'h','é','l','l','o'}  — 5 runes
s2 := string(bs)       // back to string
s3 := string(rs)

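These conversions are O(n): in general each one allocates new storage and copies the data, although the compiler can skip the copy in a few special cases (for example, when string(bs) is used directly as a map key).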

Useful packages for strings

| Package | What for |
| ---------------- | ----------------------------------------------- |
| strings | Search, split, replace, case (see next lesson). |
| strconv | Parsing / formatting numbers. |
| unicode | Categorize runes (IsDigit, IsLetter, …). |
| unicode/utf8 | Count or decode UTF-8. |
| unicode/utf16 | UTF-16, mostly for interop. |

Strings are immutable

s := "hello"
s[0] = 'H'   // compile error — strings cannot be assigned to by index

To "modify" a string, convert it to []byte or []rune, mutate the slice, and convert back. This always allocates a fresh string — the old one is unchanged.
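The convert, mutate, convert back pattern might look like this. replaceFirstRune is a hypothetical helper for illustration; it goes through []rune so that a multi-byte first character is replaced as a whole.

```go
package main

import "fmt"

// replaceFirstRune returns a copy of s whose first rune is replaced by r.
// (Hypothetical helper for illustration.)
func replaceFirstRune(s string, r rune) string {
	rs := []rune(s) // copy of s, decoded into code points
	if len(rs) == 0 {
		return s
	}
	rs[0] = r
	return string(rs) // encoded back into a new string
}

func main() {
	s := "hello"
	bs := []byte(s) // bs is a copy; writing to it cannot affect s
	bs[0] = 'H'
	fmt.Println(string(bs)) // Hello
	fmt.Println(s)          // hello

	fmt.Println(replaceFirstRune("élan", 'E')) // Elan
}
```

Using []byte is enough for ASCII-only edits; []rune is the safer choice when the position you change may hold a multi-byte character.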

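Strings are also comparable with ==, !=, <, <=, > and >=. Comparison is byte-by-byte, which for valid UTF-8 is the same as ordering by Unicode code point. That matches alphabetical order for ASCII letters of the same case, but not dictionary order in general: "Z" sorts before "a", and "z" sorts before "é".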

Why is `for _, r := range s` preferred over `for i := 0; i < len(s); i++ { use(s[i]) }` for non-ASCII text?