There are multiple ways to convert a byte array to a String in Java, but the
most straightforward is to use the String constructor that accepts a byte
array, i.e. new String(byte[]). The key thing to remember is character
encoding. Since bytes are binary data but a String is character data, it's very
important to know the original character encoding of the text from which the
byte array was created. If you use a different character encoding, you will not
get the original String back. For example, if you have read the byte array from
a file that was encoded in "ISO-8859-1" and you do not provide any character
encoding while converting the byte array to a String with the new String(byte[])
constructor, then it's not guaranteed that you will get the same text back.
Why? Because new String(byte[]) uses the platform's default encoding (e.g. that
of the Linux machine where your JVM is running), which could be different from
"ISO-8859-1". If it's different, you may see some garbage characters or
even different characters that change the meaning of the text completely. I am
not saying this after reading a few books; I faced this issue in one of my
projects, where we were reading data from a database that contained some French
characters. In the absence of any specified encoding, our platform defaulted to
something that could not convert all those special characters properly; I don't
remember the exact encoding. That issue was solved by providing "UTF-8" as the
character encoding while converting the byte array to a String. Yes, there is
another overloaded constructor in the String class that accepts a character
encoding, i.e. new String(byte[], "character encoding").
BTW, if you are new to the world of character encoding and don't understand what UTF-8 or UTF-16 is, I recommend reading my article on the difference between UTF-8, UTF-16 and UTF-32 encoding. It not only explains the difference but also gives you a basic idea of character encoding. Another article I recommend is about how Java deals with the default character encoding. Since many classes that convert between bytes and characters cache the character encoding, it's important to learn how to provide the proper encoding at the JVM level. If this interests you, see the full article.
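To make the point about mismatched encodings concrete, here is a minimal sketch of what happens when ISO-8859-1 bytes are decoded with the wrong charset. The class name WrongCharsetDemo, the helper decode() and the sample word are my own illustrations, not part of any library; the sketch uses the String constructor overload that takes a Charset object rather than a charset name.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WrongCharsetDemo {

    // Hypothetical helper: decodes bytes with an explicitly chosen charset
    // instead of relying on the platform default.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "café" encoded as ISO-8859-1: the é becomes the single byte 0xE9
        byte[] latin1Bytes = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);

        // Same charset on both sides: the original text comes back
        String right = decode(latin1Bytes, StandardCharsets.ISO_8859_1);

        // Different charset: 0xE9 on its own is not a complete UTF-8
        // sequence, so the last character is not decoded as é
        String wrong = decode(latin1Bytes, StandardCharsets.UTF_8);

        System.out.println("Decoded as ISO-8859-1 : " + right);
        System.out.println("Decoded as UTF-8      : " + wrong);
        System.out.println("Round trip intact?    : " + right.equals(wrong));
    }
}
```

The ASCII letters survive either way; only the accented character is damaged, which is why this kind of bug often goes unnoticed until non-English text shows up.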
How to convert byte array to String in Java
Everything is 0s and 1s in the computer world, yet we are able to see different
things, e.g. text, images, music files etc. The key to converting a byte array
to a String is character encoding. In simple words, byte values are numeric
values and a character encoding is a map that provides a character for a
particular byte value or sequence of byte values. For example, in most
character encoding schemes, e.g. UTF-8, if the value of a byte is 65 the
character is A, and for 66 it's B. Since ASCII characters, which include
digits, letters and some special characters, are very popular, they have the
same values in most encoding schemes. But that's not true for every byte value;
for example, -10 is interpreted differently in UTF-8 and Windows-1252. Now
someone may object that, since a byte has 8 bits, it can only represent 256
distinct values, which is far too few given how many languages there are in the
world. That's why we have multi-byte character encoding schemes, which can
represent many more characters. Why do we need to convert bytes to a String?
One real-world example is displaying Base64-encoded data as text; a similar
task is converting a byte array to a hex String, as shown in that tutorial.
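The byte values mentioned above can be checked with a short sketch. ByteValueDemo and decodeOne() are names I made up for illustration, and the sketch assumes your JDK ships the windows-1252 charset, which is not one of the charsets every Java platform is required to support.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ByteValueDemo {

    // Hypothetical helper: decodes one byte value with the given charset.
    static String decodeOne(int value, Charset charset) {
        return new String(new byte[] { (byte) value }, charset);
    }

    public static void main(String[] args) {
        // Assumption: windows-1252 is available on this JDK
        Charset windows1252 = Charset.forName("windows-1252");

        // 65 and 66 are ASCII values, so both charsets agree on A and B
        System.out.println("65 -> " + decodeOne(65, StandardCharsets.UTF_8)
                + " / " + decodeOne(65, windows1252));
        System.out.println("66 -> " + decodeOne(66, StandardCharsets.UTF_8)
                + " / " + decodeOne(66, windows1252));

        // -10 (0xF6) is ö in windows-1252 but is not valid UTF-8 by itself
        System.out.println("-10 -> " + decodeOne(-10, StandardCharsets.UTF_8)
                + " / " + decodeOne(-10, windows1252));

        // A multi-byte scheme spends several bytes on a single character
        byte[] euro = "\u20ac".getBytes(StandardCharsets.UTF_8);
        System.out.println("Euro sign in UTF-8 uses " + euro.length + " bytes");
    }
}
```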
Java Byte Array to String Example
Now that we know a little bit of theory about how to convert a byte array to a
String, let's see a working example. To keep the example simple, I have created
a byte array in the program itself and then converted it into a String using
different character encoding names: Cp1252, which is the default character
encoding in Eclipse on Windows; Windows-1252, which is another name for the
same Windows encoding; and UTF-8, which is the de facto standard character
encoding on the web. If you run this program and look at the output, you will
notice that most of the characters are the same in all three results; they are
ASCII characters (upper-case letters here), but the special character at the
end is rendered differently. This is where using an incorrect character
encoding can create trouble. The rest of the example is pretty straightforward,
as we already have a byte array and we are just using the overloaded String
constructor that also accepts an encoding. For a more complex example, where we
read content from an XML file, see this tutorial. There are also printable and
non-printable characters in ASCII, which may be handled differently by
different character encodings.
import java.io.UnsupportedEncodingException;

public class ByteArrayToStringDemo {

    public static void main(String[] args) throws UnsupportedEncodingException {
        // eight ASCII bytes ("CAFEBABE") followed by one non-ASCII byte, -20 (0xEC)
        byte[] random = new byte[] { 67, 65, 70, 69, 66, 65, 66, 69, -20 };

        // decode the same bytes under three charset names
        String utf = new String(random, "UTF-8");
        String cp1252 = new String(random, "Cp1252");
        String windows1252 = new String(random, "Windows-1252");

        System.out.println("String created from byte array in UTF-8 encoding : " + utf);
        System.out.println("byte array to String in Cp1252 encoding : " + cp1252);
        System.out.println("byte array to String in Windows-1252 encoding : " + windows1252);
    }
}
Output :
String created from byte array in UTF-8 encoding : CAFEBABE?
byte array to String in Cp1252 encoding : CAFEBABEì
byte array to String in Windows-1252 encoding : CAFEBABEì
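Notice that the UTF-8 conversion above did not fail; the byte it could not decode was silently replaced. If you would rather get an error than a substituted character, one option is a CharsetDecoder configured to report problems. The following is only a sketch under that assumption: StrictDecodeDemo and decodeStrict() are hypothetical names of mine, while the decoder calls come from java.nio.charset. It also uses the StandardCharsets constants, which avoid the checked UnsupportedEncodingException.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictDecodeDemo {

    // Hypothetical helper: decodes strictly, throwing an exception instead
    // of substituting a replacement character for undecodable bytes.
    static String decodeStrict(byte[] bytes, Charset charset)
            throws CharacterCodingException {
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static void main(String[] args) {
        // Same byte array as in the example above
        byte[] random = { 67, 65, 70, 69, 66, 65, 66, 69, -20 };

        // Lenient conversion: never throws, substitutes what it cannot decode
        System.out.println("Lenient : " + new String(random, StandardCharsets.UTF_8));

        // Strict conversion: the trailing byte is not valid UTF-8, so this throws
        try {
            System.out.println("Strict  : " + decodeStrict(random, StandardCharsets.UTF_8));
        } catch (CharacterCodingException e) {
            System.out.println("Strict  : not valid UTF-8 (" + e + ")");
        }
    }
}
```

Strict decoding is more verbose, but it turns a wrong encoding guess into an error you can see rather than into quietly corrupted text.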
That's all about how to convert a byte array to a String in Java. Always provide a character encoding while converting bytes to characters, and it should be the same encoding that was used for the original text. If you don't know it, UTF-8 is a good default, but don't rely on the platform's default character encoding, because it is subject to change and might not be UTF-8 (it is UTF-8 by default only from Java 18 onwards). A better option is to set the character encoding for your application at the JVM level, so that you have complete control over how a byte array gets converted to a String.
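If you want to see which encoding your own JVM falls back to, a quick check like the one below can help. It is a small sketch: DefaultCharsetDemo and platformDefault() are illustrative names of mine, while Charset.defaultCharset() and the file.encoding system property are the standard ways to inspect the value.

```java
import java.nio.charset.Charset;

public class DefaultCharsetDemo {

    // Hypothetical helper: returns the charset that new String(byte[])
    // falls back to when no encoding is given.
    static Charset platformDefault() {
        return Charset.defaultCharset();
    }

    public static void main(String[] args) {
        System.out.println("Default charset        : " + platformDefault());

        // The system property usually mirrors the default, but treat it as
        // informational only; it can be absent or differ on some runtimes
        System.out.println("file.encoding property : "
                + System.getProperty("file.encoding"));
    }
}
```

Starting the JVM with -Dfile.encoding=UTF-8 is the commonly used way to pin this value, though how that property is honored varies between Java versions, so check the output on your own runtime.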