Why does Node.js convert BOM character to 0xFE 0xFF?

Problem

I have been using Node's fs.readFileSync(), passing "utf8" as the encoding, to read input. When the file starts with a UTF-8 BOM (0xEF 0xBB 0xBF), the result contains 0xFE 0xFF instead, which is the UTF-16 (big-endian) form of the BOM.

Why does it do this? Why not keep the original UTF-8 sequence for the BOM?

Problem courtesy of: Rick Eyre

Solution

The BOM is the character U+FEFF, and 0xEF 0xBB 0xBF is its UTF-8 representation. By reading with an encoding of utf8, you are asking Node to decode that UTF-8. Once it is decoded, it is meaningless to talk about a "byte sequence": you have a string of characters, the first of which is U+FEFF. The 0xFE 0xFF you see is that character's code point value (0xFEFF), not bytes taken from the file.
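To make this concrete, here is a minimal sketch that decodes a UTF-8 BOM from an in-memory Buffer rather than a real file. The sample bytes and the stripBom helper are hypothetical names made up for illustration, not part of Node's API; the sketch assumes that decoding with 'utf8' leaves the BOM in the resulting string, which is the behavior the question describes.

```javascript
// Sample input: the UTF-8 BOM (EF BB BF) followed by the ASCII text "hi".
const bytes = Buffer.from([0xef, 0xbb, 0xbf, 0x68, 0x69]);

// Decoding the bytes as UTF-8 produces a string of characters.
// The three BOM bytes become the single character U+FEFF.
const text = bytes.toString('utf8');
const firstCodePoint = text.charCodeAt(0);

// Hypothetical helper: drop a leading U+FEFF if the string has one.
function stripBom(s) {
  return s.charCodeAt(0) === 0xfeff ? s.slice(1) : s;
}

const stripped = stripBom(text);

// Encoding the string back to UTF-8 reproduces the original bytes,
// so nothing was lost or "converted" by decoding.
const roundTrip = Buffer.from(text, 'utf8');

console.log(firstCodePoint.toString(16)); // feff
console.log(stripped);                    // hi
console.log(roundTrip.equals(bytes));     // true
```

If you need the original bytes, read the file without an encoding argument so you get a Buffer; if you only want the text, stripping the leading U+FEFF after decoding is usually enough.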

Solution courtesy of: hobbs

This post first appeared on Node.js Recipes; please read the original post: here