
Packing and unpacking bytes

Posted on Jul 22 • Originally published at blog.Shalvah.me

Goal: A brief exploration of what it means to "pack" and "unpack" bytes.

I've come across Ruby's Array#pack and String#unpack methods, but never had the time to dive into them. While researching another article, I came across this question and decided to stop and explore it.

I can't define "packing" precisely, but I've gathered that it's a term for representing a series of bytes as a string. And depending on how you do it, you can even do this in fewer bytes than the original. Unpacking is the reverse: recovering the original information.

Let's try an example based on the Stack Overflow question. I have a bunch of bytes, i.e. values between 0 (00000000) and 255 (11111111). Suppose I take two at random, maybe 126 and 2.

I could represent them in a string by using the JS hexadecimal escape sequence, "\x7e\x02". However, this isn't what I want, as this string has two characters. JavaScript strings are UTF-16 [note 1], so this string has 4 bytes, which is more than the original.

This string has two characters of two bytes each: 00 7e and 00 02. I want to pack the bytes so the string has only one character, 7e 02. Here's how: shift the first byte left by 8 bits, combine it with the second using a bitwise OR, and turn the resulting number (0x7e02) into a character. This is a bit of bit arithmetic (haha).

So there it is. I started with two bytes, and was able to fit them into a 2-byte character [note 2].

How about unpacking? Some more bitwise magic: shift the character's code right by 8 bits to get the first byte, and mask off the lower 8 bits to get the second.

Cool, cool.

I also found out you can do this packing natively with the TextDecoder API! [note 3]

However, unpacking with TextEncoder gives wrong results for this use case, since it only supports UTF-8: it returns the UTF-8 bytes of the character rather than the two bytes I packed.

Speaking of UTF-8, it's time to try that.
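Stepping back to the UTF-16 case for a moment, the manual packing and unpacking described above might look something like this in JavaScript. This is a sketch under the assumptions in the text (two byte values, one UTF-16 code unit), not the original listing; the names packBytes and unpackBytes are hypothetical.

```javascript
// Pack two bytes (0-255 each) into a single UTF-16 code unit.
// packBytes and unpackBytes are hypothetical names used for illustration.
function packBytes(first, second) {
  // Move the first byte into the upper 8 bits, then OR in the second byte.
  const codeUnit = (first << 8) | second;
  return String.fromCharCode(codeUnit);
}

// Recover the two original bytes from a one-character string.
function unpackBytes(str) {
  const codeUnit = str.charCodeAt(0);
  // The upper 8 bits hold the first byte, the lower 8 bits the second.
  return [codeUnit >> 8, codeUnit & 0xff];
}

const escaped = "\x7e\x02"; // two code units: 0x007e and 0x0002
const packedPair = packBytes(126, 2); // one code unit: 0x7e02

console.log(escaped.length); // 2
console.log(packedPair.length); // 1
console.log(packedPair.charCodeAt(0).toString(16)); // "7e02"
console.log(unpackBytes(packedPair)); // [ 126, 2 ]
```

Because both inputs are below 256, the combined value always fits in one 16-bit code unit, which is why the packed string has a length of 1.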
But I'm changing some things for this version.

Packing in Ruby is pretty similar.

The output string here is a single byte, "\xD2", which is simply the original 0D and 02 bytes packed together 😀 (each value fits in four bits, so the two can share one byte). Unfortunately, it's not a valid printable character, so printing it shows �, but it's there.

As mentioned earlier, Ruby has inbuilt pack and unpack methods, but they can only map byte to byte, so I couldn't use them for this example.

But they work with the original UTF-16 example.

It may not look like it, but the packed version here ("~\x02") is exactly the same as my manually packed JavaScript version. It contains the exact two bytes, 7E 02. The difference is the encoding; in Ruby, this string is UTF-8, so it's rendered differently. But I can change the encoding and see for myself!

Why would you want to pack, though? I'm thinking, perhaps in a constrained environment like gaming over the Internet. If there is a limited number of possible buttons a player can press (say 12), instead of transmitting each button press as one byte, I could, for example, encode each button in 4 bits (enough for 16 values) and pack two presses into each byte.

In this case, packing serves as a form of compression, to send less data over the network and improve the gaming experience (less data to download, so responses can be faster).

I also found this question, from a user who wanted to send a UUID as binary data. This is a valid use, since UUIDs are often rendered as strings, but they're actually a sequence of 16 bytes. Sending them as a string would take 36 bytes, so packing is useful here. You could also do this for other "binary-but-look-like-strings" data, like SHA-512 hashes for instance.

Let me know if you can think of any other uses.

1. The ECMAScript spec says: "When a String contains actual textual data, each element is considered to be a single UTF-16 code unit." So JS strings are UTF-16. However, many modern Web APIs, like Blob and TextEncoder, and even older Node.js ones like Buffer, assume (or accept only) UTF-8. My guess is that they expect the string to come from the outside world (reading a file, an API response, etc.), in which case it's most likely UTF-8.

2. The only reliable way I found to get the byte length of a native JS string (UTF-16) is Buffer.from(string, 'utf16le').byteLength. Commonly suggested ways I found include TextEncoder and Blob, but they always assume UTF-8.

3. For this to work as expected, I had to specify UTF-16 Big Endian (utf-16be) as the encoding. UTF-16 because I want 2 bytes per character, and big-endian because I want the most significant byte first, as I did in the custom packer.
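Notes 2 and 3 can be checked with a short sketch. It assumes a Node.js runtime, since Buffer is Node-specific, and a build with ICU (the default for official releases), since TextDecoder's utf-16be support depends on it. The variable names are only illustrative.

```javascript
// Assumes Node.js: Buffer is a Node global, and TextDecoder's utf-16be
// support relies on Node being built with ICU (the default for official builds).
const twoByteChar = String.fromCharCode((126 << 8) | 2); // one code unit, 0x7e02

// Note 2: byte length of the same string measured as UTF-16 and as UTF-8.
const utf16Bytes = Buffer.from(twoByteChar, "utf16le").byteLength;
const utf8Bytes = new TextEncoder().encode(twoByteChar).byteLength;
console.log(utf16Bytes); // 2
console.log(utf8Bytes); // 3, because TextEncoder always produces UTF-8

// Note 3: packing natively by decoding the raw bytes as big-endian UTF-16,
// so 0x7e becomes the high byte and 0x02 the low byte of the code unit.
const decoded = new TextDecoder("utf-16be").decode(new Uint8Array([126, 2]));
console.log(decoded === twoByteChar); // true
```

The 3-byte UTF-8 result is also why TextEncoder cannot be used to unpack here: it yields the UTF-8 encoding of the character, not the two bytes that went in.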



This post first appeared on VedVyas Articles, please read the original post: here
