Should source code be saved in UTF-8 format?

JARC :

How important is it to save your source code in UTF-8 format?

Eclipse on Windows uses the CP1252 character encoding by default. Saving in CP1252 means a file can end up containing bytes that are not valid UTF-8, and I have seen this happen when text is copied and pasted from a Word document into a comment.
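
To illustrate the copy-and-paste problem: a minimal sketch, assuming the pasted text contains Word-style curly quotes, which CP1252 stores as the single bytes 0x93 and 0x94. The class name `Utf8Check` and the helper `isValidUtf8` are hypothetical, made up for this example. A strict UTF-8 decoder rejects those bytes, while the same quotes saved as UTF-8 (E2 80 9C and E2 80 9D) pass.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8Check {
    // Returns true only if the bytes form a well-formed UTF-8 sequence.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A comment with curly quotes as CP1252 stores them (0x93 and 0x94).
        byte[] cp1252Comment = {'/', '/', ' ', (byte) 0x93, 'h', 'i', (byte) 0x94};
        // The same comment with the quotes encoded as UTF-8.
        byte[] utf8Comment = {'/', '/', ' ',
                (byte) 0xE2, (byte) 0x80, (byte) 0x9C, 'h', 'i',
                (byte) 0xE2, (byte) 0x80, (byte) 0x9D};

        System.out.println(isValidUtf8(cp1252Comment)); // false
        System.out.println(isValidUtf8(utf8Comment));   // true
    }
}
```

A check like this could be run over source files before a build to spot files that were saved in the wrong encoding.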

The reason I ask is that, out of habit, I set up the Maven source encoding as UTF-8, and recently the build has caught a few unmappable-character errors.

(update) Please add any reasons for doing so, and explain why. Are there some common gotchas that should be known?

(update) What is your goal? To find the best practice, so that when I am asked why we should use UTF-8 I have a good answer. Right now I don't.

McDowell :

What is your goal? Balance your needs against the pros and cons of this choice.

UTF-8 Pros

  • allows use of all character literals without \uHHHH escaping
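
For comparison, here is roughly what the escaped alternative looks like: a small sketch that rewrites any non-ASCII character as a backslash-u escape so the text can live in a pure ASCII source file. The class `UnicodeEscaper` and the method `toAsciiEscapes` are hypothetical names for this example, not part of any library.

```java
class UnicodeEscaper {
    // Replaces every char above 7-bit ASCII with a backslash-u escape
    // (four hex digits). Supplementary characters come out as two
    // escapes, one per UTF-16 surrogate.
    static String toAsciiEscapes(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 128) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The input literal is itself written with an escape so that this
        // file stays pure ASCII.
        System.out.println(toAsciiEscapes("caf\u00e9"));
    }
}
```

With UTF-8 source files the literal can be typed directly instead, which is the readability gain this bullet refers to.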

UTF-8 Cons

  • using non-ASCII character literals without \uHHHH increases risk of character corruption
    • font and keyboard issues can arise
    • need to document and enforce use of UTF-8 in all tools (editors, compilers, build scripts, diff tools)
  • beware the byte order mark
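
On the byte order mark: the UTF-8 BOM is the three bytes EF BB BF at the start of a file. Some editors write it and some tools may treat it as a stray character. A minimal sketch for detecting and stripping it follows; `BomCheck` and its methods are hypothetical names for this example.

```java
import java.util.Arrays;

class BomCheck {
    // True if the data begins with the UTF-8 byte order mark EF BB BF.
    static boolean startsWithUtf8Bom(byte[] bytes) {
        return bytes.length >= 3
                && (bytes[0] & 0xFF) == 0xEF
                && (bytes[1] & 0xFF) == 0xBB
                && (bytes[2] & 0xFF) == 0xBF;
    }

    // Returns a copy without the leading BOM, or the input itself if none is present.
    static byte[] stripUtf8Bom(byte[] bytes) {
        return startsWithUtf8Bom(bytes)
                ? Arrays.copyOfRange(bytes, 3, bytes.length)
                : bytes;
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'o', 'k'};
        System.out.println(startsWithUtf8Bom(withBom));   // true
        System.out.println(stripUtf8Bom(withBom).length); // 2
    }
}
```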

ASCII Pros

  • character/byte mappings are shared by a wide range of encodings
    • makes source files very portable
    • often obviates the need for specifying encoding meta-data (since the files would be identical if they were re-encoded as UTF-8, Windows-1252, ISO 8859-1 and most things short of UTF-16 and/or EBCDIC)
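
The portability claim can be checked directly. This sketch encodes the same text under several charsets and compares the bytes; it uses only the charsets in `java.nio.charset.StandardCharsets` (so Windows-1252 and EBCDIC are not covered here), and the class name `AsciiPortability` is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class AsciiPortability {
    // True if the text encodes to identical bytes under both charsets.
    static boolean sameBytes(String text, Charset a, Charset b) {
        return Arrays.equals(text.getBytes(a), text.getBytes(b));
    }

    public static void main(String[] args) {
        String ascii = "int x = 42; // plain ASCII";
        String accented = "caf\u00e9"; // ends with e-acute, written as an escape

        System.out.println(sameBytes(ascii, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));    // true
        System.out.println(sameBytes(ascii, StandardCharsets.UTF_8, StandardCharsets.US_ASCII));      // true
        System.out.println(sameBytes(ascii, StandardCharsets.UTF_8, StandardCharsets.UTF_16));        // false
        System.out.println(sameBytes(accented, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1)); // false
    }
}
```

As soon as one non-ASCII character appears, the encodings diverge and the file's encoding has to be declared somewhere.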

ASCII Cons

  • limited character set
  • this isn't the 1960s

Note: ASCII is 7-bit, not "extended" and not to be confused with Windows-1252, ISO 8859-1, or anything else.
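
Since 7-bit ASCII means every byte is below 0x80, a file can be tested for it with a simple scan. The sketch below returns the offset of the first offending byte; `AsciiScan` and `firstNonAscii` are hypothetical names for this example.

```java
class AsciiScan {
    // Returns the index of the first byte outside 7-bit ASCII, or -1 if there is none.
    static int firstNonAscii(byte[] bytes) {
        for (int i = 0; i < bytes.length; i++) {
            if ((bytes[i] & 0x80) != 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] clean = {'a', 'b', 'c'};
        byte[] latin1 = {'c', 'a', 'f', (byte) 0xE9}; // e-acute as a single high byte
        System.out.println(firstNonAscii(clean));  // -1
        System.out.println(firstNonAscii(latin1)); // 3
    }
}
```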
