Correct SHA256 implementation with UTF-8 characters

Marcus

I'm running into issues comparing SHA256 hashes generated by different languages/functions.

For example, SHA256("í") returns either:

f3df1f9c358ae8eceb8fce7c00614288d113ad55315f4ebb909774a7daadfc84

-or-

127035a8ff26256ea0541b5add6dcc3ecdaeea603e606f84e0fd63492fbab2c5

Which of the above hashes is correct for a string of one character, and what's the correct way of handling UTF-8 strings?

deceze

Which of the above hashes is correct for a string of one character

There is no single "correct" answer. A hash function operates on bytes, not on "characters", and exactly which bytes are hashed depends on the encoding of the string.

"í" in Windows-1252 is byte ED, which hashes as:

f3df1f9c358ae8eceb8fce7c00614288d113ad55315f4ebb909774a7daadfc84

"í" in UTF-8 is bytes C3 AD, which hashes as:

127035a8ff26256ea0541b5add6dcc3ecdaeea603e606f84e0fd63492fbab2c5

"í" in UTF-16LE is bytes ED 00, which hashes as:

430e2ca27910b5ee6e0ec56a12b81325c763376cb8e25a60362dce9444424f95

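How exactly that works in a given programming language depends on how the language represents strings and which encoding it uses when turning them into bytes. To get the same hash everywhere, encode the string explicitly with the same encoding (UTF-8, for instance) before hashing.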
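As an illustration, here is a minimal Python sketch of the three cases above. The helper name `sha256_hex` is made up for this example. It encodes the string explicitly and then hashes the resulting bytes with the standard `hashlib` module. The character is written as the escape `\u00ed` so that the encoding of the source file itself does not affect the result.

```python
import hashlib


def sha256_hex(text: str, encoding: str) -> str:
    """Encode text with the given encoding, then hash the resulting bytes.

    Hypothetical helper for illustration; the point is that the encoding
    is chosen explicitly instead of being left to a platform default.
    """
    data = text.encode(encoding)
    return hashlib.sha256(data).hexdigest()


char = "\u00ed"  # the character "í" (U+00ED)

# Same character, three encodings, three different byte sequences,
# and therefore three different digests.
for encoding in ("cp1252", "utf-8", "utf-16-le"):
    raw = char.encode(encoding)
    print(encoding, raw.hex(), sha256_hex(char, encoding))
```

The printed byte sequences should be `ed`, `c3ad` and `ed00`, matching the cases listed above, and each digest can be compared against the corresponding value quoted there. If two systems disagree on a hash, printing the raw bytes like this is usually the quickest way to see which encoding each side is actually using.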
