Tuesday, April 8, 2014

Write a file from Java with Encoding "UTF-8 Without BOM"

The goal is to write a file in different encodings (ANSI, UTF-8 with BOM, UTF-8 without BOM):

The code I will be referring to throughout this post is below:
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.Date;

public class WriteFileWithEncoding {
    public static void main(String[] args) {
        // Example to write a file into the file system
        // Charset windows1252 = Charset.forName("windows-1252");
        String filePath = "D:\\temp\\" + new Date().getTime() + ".txt";
        // try-with-resources closes the writer and the underlying stream,
        // even when an exception is thrown (the old catch block could hit a null writer)
        try (OutputStreamWriter osw = new OutputStreamWriter(
                new FileOutputStream(filePath, false), StandardCharsets.UTF_8)) {
            osw.write("Sample");
            osw.write("\uFEFF"); // encoded in UTF-8 as the bytes EF BB BF
            osw.write("File");
            System.out.println("Success");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
Scenario 1: (You want to write a file with "ANSI" encoding)


Writing simple plain text saves the file in what Notepad++ reports as ANSI, even though the OutputStreamWriter is created with the character set "UTF-8". This is because plain ASCII characters are encoded to exactly the same bytes in UTF-8 and in ANSI, so nothing in the file marks it as UTF-8. Until you write at least one non-ASCII character, the file will continue to be detected as ANSI.
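To make the point above concrete, here is a minimal sketch that compares the bytes produced by UTF-8 and by windows-1252 (the code page usually meant by "ANSI" on Western Windows systems; that mapping is an assumption here). The class and method names are hypothetical and only for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class AsciiBytesDemo {
    // True when the text encodes to identical bytes in UTF-8 and windows-1252.
    static boolean sameBytes(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] ansi = text.getBytes(Charset.forName("windows-1252"));
        return Arrays.equals(utf8, ansi);
    }

    public static void main(String[] args) {
        // Pure ASCII: both encodings give the same bytes, so an editor cannot tell them apart.
        System.out.println(sameBytes("SampleFile")); // prints true
        // A non-ASCII character (é) is one byte in windows-1252 but two bytes in UTF-8.
        System.out.println(sameBytes("Caf\u00E9"));  // prints false
    }
}
```

Because the ASCII-only file is byte-for-byte identical in both encodings, the label an editor shows for it is a guess rather than a property stored in the file.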


But if this is not your requirement, and you have simple plain text that must be written to a file detected as UTF-8, read on.

Scenario 2: UTF-8 encoding comes in two variants: 1. with BOM and 2. without BOM

To write a file with a BOM, write the \uFEFF character first, before any other text. The encoder then emits the bytes EF BB BF at the very start of the file, which makes it UTF-8 with BOM.
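A minimal sketch of writing the BOM first might look like the following. It writes to a temporary file instead of D:\temp so it runs anywhere; the class name, the helper names and the temp-file choice are all assumptions made for illustration, not part of the original code.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class BomFirstDemo {
    // Writes text as UTF-8, preceded by U+FEFF so the file starts with EF BB BF.
    static void writeWithBom(Path file, String text) throws IOException {
        try (Writer w = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            w.write('\uFEFF'); // the BOM must be the very first character written
            w.write(text);
        }
    }

    // True when the bytes begin with the UTF-8 BOM (EF BB BF).
    static boolean startsWithBom(byte[] b) {
        return b.length >= 3
                && (b[0] & 0xFF) == 0xEF
                && (b[1] & 0xFF) == 0xBB
                && (b[2] & 0xFF) == 0xBF;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("bom-demo", ".txt"); // hypothetical location
        writeWithBom(file, "SampleFile");
        System.out.println(startsWithBom(Files.readAllBytes(file))); // prints true
        Files.delete(file);
    }
}
```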

The result of writing the BOM first is shown below.
If this is still not your requirement, and you have simple plain text but want the file encoding to be "UTF-8 without BOM", please refer to Scenario 3.

Scenario 3: (Write file using UTF-8 without BOM).

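To get "UTF-8 without BOM", write the "\uFEFF" character after some plain text rather than at the start of the file. The character still puts non-ASCII bytes (EF BB BF) into the file, so Notepad++ detects it as UTF-8, but because those bytes are not at the very beginning they are not treated as a BOM. Note that the character stays in the file content as an invisible zero-width no-break space.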
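The difference between the three scenarios is easiest to see in the raw bytes. The sketch below, with a hypothetical class and helper name, prints the UTF-8 bytes of the same text with no "\uFEFF", with it at the start, and with it in the middle.

```java
import java.nio.charset.StandardCharsets;

class BomPositionDemo {
    // Returns the UTF-8 bytes of the text as space-separated uppercase hex.
    static String hex(String text) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Scenario 1: plain ASCII, no marker bytes at all
        System.out.println(hex("SampleFile"));
        // Scenario 2: EF BB BF at the start, i.e. a BOM
        System.out.println(hex("\uFEFFSampleFile"));
        // Scenario 3: EF BB BF after "Sample", i.e. not a BOM
        System.out.println(hex("Sample\uFEFFFile"));
    }
}
```

Only in the second case do the bytes EF BB BF open the file. In the third case they sit between "Sample" and "File", which is why an editor sees valid UTF-8 content with no BOM.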

The result is shown below.
Note: The encoding names used in this post are the ones reported by Notepad++.





6 comments:

  1. Hi Kiran,
    It helps, but one question.
    Will it work on an existing file? Or do you know how to convert an existing file to UTF-8 without BOM?

  2. Hi Vinu,

    Yes, it will work on an existing file. All you need to do is open the file with a FileOutputStream in append mode, write the "\uFEFF" character, and save the file. You will then have your previous file as it was, but with the extra character "\uFEFF" at the end, which makes the file "UTF-8 without BOM".

    I hope I addressed your query.

  3. When a file is generated without BOM with UTF-8 encoding, Notepad++ shows it as "ANSI as UTF-8". Is this correct? But when a file is generated with BOM and UTF-8, Notepad++ shows UTF-8 as the encoding of the file. Can you explain this?
