Vivak kumar - 6 months ago
Ruby Question

Bytes vs codepoints in Ruby

What is the difference between the Ruby string methods codepoints and bytes?

'abcd'.bytes
=> [97, 98, 99, 100]

'abcd'.codepoints
=> [97, 98, 99, 100]


bytes returns the individual bytes of the string, regardless of how many bytes each character occupies, whereas codepoints returns its Unicode codepoints.

s = '日本語'
s.bytes # => [230, 151, 165, 230, 156, 172, 232, 170, 158]
s.codepoints # => [26085, 26412, 35486]
s.chars # => ["日", "本", "語"]

I see where your confusion arises from. Ruby now uses UTF-8 as its default encoding, and UTF-8 was specifically designed so that its first 128 codepoints (0-127) are encoded exactly as they are in ASCII. ASCII is an encoding with one-byte characters, so in the examples in your question bytes and codepoints happen to return the same values.
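To make this concrete, here is a minimal sketch (the variable names are just for illustration, and it assumes the strings are UTF-8 encoded). For an ASCII-only string the two methods agree, but adding a single non-ASCII character is enough to make them diverge:

```ruby
ascii = 'abcd'
mixed = "abcd\u00E9"   # "abcdé"; U+00E9 takes two bytes in UTF-8

# ASCII-only: every character is one byte, so both arrays are identical.
ascii_same = ascii.bytes == ascii.codepoints   # true

# One non-ASCII character: one codepoint, but two bytes.
mixed_bytes      = mixed.bytes        # [97, 98, 99, 100, 195, 169]
mixed_codepoints = mixed.codepoints   # [97, 98, 99, 100, 233]
mixed_same       = mixed_bytes == mixed_codepoints   # false
```

String#ascii_only? tells you up front whether a given string falls into the first case.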

So, if you need to break a string into characters, use either chars or codepoints (whichever is appropriate for your use case). Use bytes only when you treat the string as an opaque binary blob, not as text.
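A short sketch of why this matters, reusing the string from the example above (variable names are illustrative, and UTF-8 encoding is assumed). Character-level methods keep the text intact, while slicing by bytes can cut a character in half:

```ruby
s = '日本語'

char_count = s.length     # 3 characters
byte_count = s.bytesize   # 9 bytes, 3 per character in UTF-8

first_char = s.chars.first   # "日"

# Taking the first 2 bytes splits the 3-byte character "日",
# so the result is not valid UTF-8 text.
broken = s.byteslice(0, 2)
broken_valid = broken.valid_encoding?   # false

# Codepoints can be turned back into the original string.
rebuilt = s.codepoints.pack('U*')
round_trip = rebuilt == s   # true
```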