Welcome to Anh Luân's website: How to export plain text to UTF-8

Here I explain how to determine a text file's encoding by looking at its first bytes.
Suppose the file contains the text "Hello". Depending on the encoding, its bytes are:

48 65 6C 6C 6F: This is the traditional ANSI encoding (no BOM).
48 00 65 00 6C 00 6C 00 6F 00: This is the Unicode (UTF-16 little-endian) encoding with no BOM.
FF FE 48 00 65 00 6C 00 6C 00 6F 00: This is the Unicode (UTF-16 little-endian) encoding with BOM. The BOM (FF FE) serves two purposes: first, it tags the file as a Unicode document; second, the order in which the two bytes appear indicates that the file is little-endian.
00 48 00 65 00 6C 00 6C 00 6F: This is the Unicode (UTF-16 big-endian) encoding with no BOM. Notepad does not support this encoding.
FE FF 00 48 00 65 00 6C 00 6C 00 6F: This is the Unicode (UTF-16 big-endian) encoding with BOM. Notice that this BOM is in the opposite order from the little-endian BOM.
EF BB BF 48 65 6C 6C 6F: This is UTF-8 encoding with BOM. The first three bytes (EF BB BF) are the UTF-8 encoding of the BOM.
2B 2F 76 38 2D 48 65 6C 6C 6F: This is UTF-7 encoding. The first five bytes ("+/v8-") are the UTF-7 encoding of the BOM.
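The list above can be turned into a check on the first bytes of a file. Below is a minimal Java sketch of one way to do it; the class name `BomSniffer` and the method `detect` are hypothetical. It only recognizes the BOM forms listed above. A file without a BOM (ANSI, or UTF-16 without BOM) returns null and would need content-based guessing instead, and because UTF-32 is not checked, a UTF-32 little-endian BOM (FF FE 00 00) would be reported as UTF-16LE.

```java
public class BomSniffer {
    // Guesses the encoding from a leading BOM.
    // Returns null when none of the BOMs listed above is present.
    public static String detect(byte[] b) {
        // Java bytes are signed, so mask with 0xFF before comparing with hex values.
        int b0 = b.length > 0 ? b[0] & 0xFF : -1;
        int b1 = b.length > 1 ? b[1] & 0xFF : -1;
        int b2 = b.length > 2 ? b[2] & 0xFF : -1;
        int b3 = b.length > 3 ? b[3] & 0xFF : -1;

        if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) return "UTF-8";
        if (b0 == 0xFF && b1 == 0xFE) return "UTF-16LE";  // UTF-32LE is not checked
        if (b0 == 0xFE && b1 == 0xFF) return "UTF-16BE";
        // UTF-7 BOM: "+/v" followed by one of '8', '9', '+', '/'
        if (b0 == 0x2B && b1 == 0x2F && b2 == 0x76
                && (b3 == 0x38 || b3 == 0x39 || b3 == 0x2B || b3 == 0x2F)) {
            return "UTF-7";
        }
        return null; // no BOM: ANSI or UTF-16 without BOM, cannot tell from the BOM alone
    }

    public static void main(String[] args) {
        // "Hello" in UTF-8 with BOM, as listed above
        byte[] sample = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 0x48, 0x65, 0x6C, 0x6C, 0x6F};
        System.out.println(detect(sample)); // prints UTF-8
    }
}
```

The returned strings are valid Java charset names except "UTF-7", for which `StandardCharsets` has no constant, so a caller would have to handle that case separately.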

Here is a test program that prints the first bytes of a file next to the BOM byte values:
---------------------
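import java.io.FileInputStream;
import java.io.IOException;

public class BomTest {
    public static void main(String[] args) throws IOException {
        // Read up to the first three bytes of the file (long enough for the UTF-8 BOM)
        byte[] arr = new byte[3];
        try (FileInputStream fileStream = new FileInputStream("d:\\4.txt")) {
            int n = fileStream.read(arr); // number of bytes actually read, or -1 at end of file
            for (int i = 0; i < n; i++) {
                System.out.println(arr[i]);
            }
        }

        // Java's byte type is signed, so the BOM bytes print as negative numbers (0xEF -> -17)
        System.out.println("...................");
        System.out.println("utf-8: " + (byte) 0xEF + " - " + (byte) 0xBB + " - " + (byte) 0xBF); // EF BB BF
        System.out.println("big-endian: " + (byte) 0xFE + " - " + (byte) 0xFF); // FE FF
        System.out.println("little-endian: " + (byte) 0xFF + " - " + (byte) 0xFE); // FF FE
    }
}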
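Once the source encoding is known, exporting the text as UTF-8 comes down to decoding the bytes with that charset and encoding the result again as UTF-8. The following is a minimal sketch, assuming the source charset has already been determined (for example from the BOM); `Utf8Exporter` and `toUtf8` are hypothetical names. Java's UTF-16LE, UTF-16BE and UTF-8 decoders keep a leading BOM as the character U+FEFF, so the sketch removes it explicitly before re-encoding. For an ANSI file the source charset would be the Windows code page of the machine that wrote it, such as `Charset.forName("windows-1252")`; which code page applies is an assumption you have to make.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf8Exporter {
    // Decodes input with the given source charset and re-encodes it as UTF-8.
    // A leading BOM (U+FEFF) in the decoded text is dropped; writeBom controls
    // whether the UTF-8 BOM (EF BB BF) is placed in front of the output.
    public static byte[] toUtf8(byte[] input, Charset source, boolean writeBom) {
        String text = new String(input, source);
        if (text.startsWith("\uFEFF")) {
            text = text.substring(1);
        }
        byte[] body = text.getBytes(StandardCharsets.UTF_8);
        if (!writeBom) {
            return body;
        }
        byte[] out = new byte[body.length + 3];
        out[0] = (byte) 0xEF;
        out[1] = (byte) 0xBB;
        out[2] = (byte) 0xBF;
        System.arraycopy(body, 0, out, 3, body.length);
        return out;
    }

    public static void main(String[] args) {
        // "Hello" in UTF-16 little-endian with BOM, as listed earlier
        byte[] utf16le = {(byte) 0xFF, (byte) 0xFE, 0x48, 0, 0x65, 0, 0x6C, 0, 0x6C, 0, 0x6F, 0};
        byte[] utf8 = toUtf8(utf16le, StandardCharsets.UTF_16LE, true);
        StringBuilder hex = new StringBuilder();
        for (byte b : utf8) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println(hex.toString().trim()); // prints EF BB BF 48 65 6C 6C 6F
    }
}
```

To convert a file on disk, the input could come from `Files.readAllBytes(Paths.get("d:\\4.txt"))` and the result could be written with `Files.write(...)` to an output path of your choosing. Whether to write the UTF-8 BOM is a design choice: Notepad uses it to recognize UTF-8, while some other tools treat it as stray characters.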

May 24, 2011
