Java - String

char

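A char is a 16-bit unsigned integer holding one UTF-16 code unit. A single char can represent any Unicode code point in the Basic Multilingual Plane; code points above U+FFFF are stored as a surrogate pair of two chars. The default value is the null character (\u0000).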

Range: 0x0000 - 0xFFFF
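
As a minimal sketch of what the 16-bit range implies, the example below compares the number of chars with the number of code points. The class and method names (CharCodePointDemo, codeUnits, codePoints) are made up for illustration.

```java
class CharCodePointDemo {
    // Number of UTF-16 code units (chars) in the string.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String bmp = "A"; // U+0041, inside the Basic Multilingual Plane
        String supplementary = new String(Character.toChars(0x1F600)); // U+1F600, outside it

        System.out.println(codeUnits(bmp) + " " + codePoints(bmp));                     // 1 1
        System.out.println(codeUnits(supplementary) + " " + codePoints(supplementary)); // 2 1
        // The first char of the pair is a high surrogate, not a full character.
        System.out.println(Character.isHighSurrogate(supplementary.charAt(0)));         // true
    }
}
```

So length() and charAt() count code units; use the codePoint methods when characters outside the BMP may appear.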

Why String is Immutable (and Final)

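Security: the class loading mechanism takes class names as String parameters; immutability prevents a name from being maliciously altered after it has been checked.
Performance/caching: immutable strings can be shared through the string pool, and the hash code can be cached after it is first computed.
Thread safety: immutable strings can be shared between multiple threads without synchronization.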

Encoding

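Internal encoding: the API exposes a String as a sequence of UTF-16 code units; this cannot be changed.
Default encoding: getBytes() with no argument uses the default charset, which is UTF-8 by default since Java 18 but platform-dependent on older versions, so it may not be UTF-8. Pass a charset explicitly when the bytes must be portable. http://docs.oracle.com/javase/tutorial/i18n/text/string.html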
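
A minimal sketch of avoiding the default encoding by naming the charset explicitly. The helper names (toUtf8, fromUtf8) are assumptions for illustration; StandardCharsets is part of the JDK.

```java
import java.nio.charset.StandardCharsets;

class ExplicitCharsetDemo {
    // Encodes with UTF-8 regardless of the platform default.
    static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decodes UTF-8 bytes back into a String.
    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "h\u00e9llo"; // "héllo", written with an escape to avoid source-encoding issues
        byte[] bytes = toUtf8(text);
        System.out.println(bytes.length);                 // 6: the é takes two bytes in UTF-8
        System.out.println(fromUtf8(bytes).equals(text)); // true
    }
}
```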

CharSequence vs String vs StringBuilder vs StringBuffer

String: immutable; StringBuilder/StringBuffer: mutable.
StringBuffer: thread-safe (its methods are declared "synchronized").
StringBuilder: not thread-safe, but faster.
CharSequence: an interface. String, StringBuffer and StringBuilder all implement CharSequence.
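
Because all three classes implement CharSequence, a method that only reads characters can accept any of them. The sketch below uses a hypothetical helper named count to show this.

```java
class CharSequenceDemo {
    // Counts occurrences of a character in any CharSequence implementation.
    static int count(CharSequence text, char target) {
        int n = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == target) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        StringBuilder builder = new StringBuilder("ban");
        builder.append("ana"); // mutates the builder in place

        System.out.println(count("banana", 'a'));                     // 3 (String)
        System.out.println(count(builder, 'a'));                      // 3 (StringBuilder)
        System.out.println(count(new StringBuffer("banana"), 'a'));   // 3 (StringBuffer)
    }
}
```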

Format

String.format

width 12, right align

String.format("%12s", "my-text")

width 12, left align

String.format("%-12s", "my-text")

width 12 (11 for the number plus 1 for the % sign), precision 2, floating-point number followed by a literal %

String.format("%11.2f%%", rate * 100)
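
The three format strings above can be tried together as follows. This is a sketch: the method names are illustrative, and Locale.ROOT is an addition (not in the original snippets) so that the decimal separator does not depend on the default locale.

```java
import java.util.Locale;

class FormatDemo {
    // Right-aligns text in a field of the given width.
    static String rightAlign(String text, int width) {
        return String.format("%" + width + "s", text);
    }

    // Left-aligns text in a field of the given width.
    static String leftAlign(String text, int width) {
        return String.format("%-" + width + "s", text);
    }

    // Formats a fraction such as 0.125 as a percentage, 12 characters wide in total.
    static String percent(double rate) {
        return String.format(Locale.ROOT, "%11.2f%%", rate * 100);
    }

    public static void main(String[] args) {
        System.out.println("[" + rightAlign("my-text", 12) + "]"); // [     my-text]
        System.out.println("[" + leftAlign("my-text", 12) + "]");  // [my-text     ]
        System.out.println("[" + percent(0.125) + "]");            // [      12.50%]
    }
}
```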

DecimalFormat

Import DecimalFormat first:

jshell> import java.text.DecimalFormat

Note that the fraction .789 is rounded to .8 because the pattern allows only one fraction digit:

jshell> new DecimalFormat("###,###.#").format(123456.789)
$1 ==> "123,456.8"

This is handy for formatting a number as a dollar amount:

jshell> new DecimalFormat("$###,###.##").format(123456.789)
$2 ==> "$123,456.79"
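
The rounding seen above follows DecimalFormat's default rounding mode, HALF_EVEN. The sketch below makes the mode explicit; the helper name format and the fixed Locale.US symbols are assumptions added so the output does not vary with the default locale.

```java
import java.math.RoundingMode;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

class DecimalFormatDemo {
    // Formats a value with the given pattern and rounding mode, using US symbols.
    static String format(String pattern, double value, RoundingMode mode) {
        DecimalFormat df = new DecimalFormat(pattern, DecimalFormatSymbols.getInstance(Locale.US));
        df.setRoundingMode(mode);
        return df.format(value);
    }

    public static void main(String[] args) {
        System.out.println(format("###,###.#", 123456.789, RoundingMode.HALF_EVEN)); // 123,456.8
        // 0.25 is exactly halfway between 0.2 and 0.3, so the mode decides the result.
        System.out.println(format("0.0", 0.25, RoundingMode.HALF_EVEN)); // 0.2
        System.out.println(format("0.0", 0.25, RoundingMode.HALF_UP));   // 0.3
    }
}
```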

Substring

jshell> String s = "{asdf}";
s ==> "{asdf}"

jshell> s.substring(1, s.length() - 1)
$1 ==> "asdf"

Parse ArrayList toString() result

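ArrayList's toString() generates a string like [0, 1, 2]. To parse it, strip the brackets and split on the separators. StringUtils here is from Apache Commons Lang; passing ", " treats both the comma and the space as delimiter characters, so the parts carry no leading spaces. This works only when the elements themselves contain no commas or spaces.

String trimmed = rawString.substring(1, rawString.length() - 1);
String[] parts = StringUtils.split(trimmed, ", ");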
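
If Apache Commons Lang is not available, the same idea can be sketched with the JDK alone. The class and method names (ListToStringParser, parse) are hypothetical, and the sketch assumes the list held integers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ListToStringParser {
    // Parses the output of List<Integer>.toString(), e.g. "[0, 1, 2]".
    static List<Integer> parse(String raw) {
        // Drop the surrounding brackets.
        String trimmed = raw.substring(1, raw.length() - 1).trim();
        List<Integer> result = new ArrayList<>();
        if (trimmed.isEmpty()) {
            return result; // "[]" is an empty list
        }
        for (String part : trimmed.split(",")) {
            result.add(Integer.parseInt(part.trim())); // trim the space after each comma
        }
        return result;
    }

    public static void main(String[] args) {
        String raw = Arrays.asList(0, 1, 2).toString();
        System.out.println(raw);        // [0, 1, 2]
        System.out.println(parse(raw)); // [0, 1, 2]
    }
}
```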

`new String()` vs String Literal

Compare:

String s = new String("foo");
String s = "foo";

new String(): always creates a new object on the heap, which costs extra time and memory.
String literal: stored once in the string pool; every occurrence of the same literal refers to the same object.

See the `==` vs `.equals()` section for examples.

`==` vs `.equals()`

== only checks whether both references point to the same object.
.equals() or .equalsIgnoreCase(): checks whether the string contents are the same.

jshell> String a = new String("foo");
a ==> "foo"

jshell> String b = new String("foo");
b ==> "foo"

jshell> a == b
$3 ==> false

jshell> a.equals(b)
$4 ==> true

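Identical string literals do compare equal with ==, because literals are interned and share one object. This does not hold for strings created with new (see a == c), so prefer .equals():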

jshell> String c = "foo"
c ==> "foo"

jshell> String d = "foo"
d ==> "foo"

jshell> c == d
$7 ==> true

jshell> c.equals(d)
$8 ==> true

jshell> a == c
$9 ==> false

jshell> a.equals(c)
$10 ==> true

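One more example: a compile-time constant expression such as "f" + "oo" is folded into a single literal, so it is interned as well: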

jshell> String e = "f" + "oo";
e ==> "foo"

jshell> c == e
$12 ==> true
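
By contrast, a string concatenated at run time is a new object. The sketch below (class and method names are illustrative) shows that == fails for it until intern() returns the pooled instance.

```java
class StringIdentityDemo {
    // Concatenates at run time: the arguments are not compile-time constants here.
    static String join(String left, String right) {
        return left + right;
    }

    public static void main(String[] args) {
        String literal = "foo";
        String runtime = join("f", "oo");

        System.out.println(literal == runtime);          // false: different objects
        System.out.println(literal.equals(runtime));     // true: same content
        System.out.println(literal == runtime.intern()); // true: intern() returns the pooled string
    }
}
```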

String Split

Splitting on "|" does not work, because split() takes a regular expression and | is the alternation operator:

jshell> String raw = "1|2|3|4";
raw ==> "1|2|3|4"

jshell> raw.split("|")
$14 ==> String[7] { "1", "|", "2", "|", "3", "|", "4" }

The | must be escaped:

jshell> raw.split("\\|")
$15 ==> String[4] { "1", "2", "3", "4" }

Or put it in a character class, where | is not interpreted as alternation:

jshell> "1|2|3|4".split("[|]")
$1 ==> String[4] { "1", "2", "3", "4" }

Trailing empty strings are removed from the result, while leading ones are kept:

jshell> "/".split("/")
$16 ==> String[0] {  }

jshell> "/a".split("/")
$17 ==> String[2] { "", "a" }

jshell> "//a/".split("/")
$18 ==> String[3] { "", "", "a" }

jshell> "//a".split("/")
$19 ==> String[3] { "", "", "a" }

jshell> "///".split("/")
$20 ==> String[0] {  }
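
When the trailing empty strings matter, split(regex, limit) with a negative limit keeps them, and Pattern.quote() is another way to treat a delimiter literally. The helper name splitKeepTrailing below is made up for this sketch.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitDemo {
    // Splits on a literal delimiter and keeps trailing empty strings.
    static String[] splitKeepTrailing(String s, String literalDelimiter) {
        // Pattern.quote() escapes regex metacharacters; a negative limit disables trimming.
        return s.split(Pattern.quote(literalDelimiter), -1);
    }

    public static void main(String[] args) {
        System.out.println("//a/".split("/").length);                         // 3
        System.out.println(splitKeepTrailing("//a/", "/").length);            // 4
        System.out.println(Arrays.toString(splitKeepTrailing("1|2|3|4", "|"))); // [1, 2, 3, 4]
    }
}
```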

Replace: String.replace() vs String.replaceAll()

Compare the 4 replace methods:

String replace(char oldChar, char newChar)
String replace(CharSequence target, CharSequence replacement)
String replaceAll(String regex, String replacement)
String replaceFirst(String regex, String replacement)

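The difference is regular expression support: replace() treats its arguments as plain text, while replaceAll() and replaceFirst() treat the first argument as a regular expression. Despite its name, replace() also replaces every occurrence, not just the first.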
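
A small sketch of how the difference shows up with a regex metacharacter such as the dot. The class and method names are illustrative only.

```java
class ReplaceDemo {
    // Plain-text replacement of every "." with "-".
    static String plain(String s) {
        return s.replace(".", "-");
    }

    // Regex replacement with an unescaped ".", which matches any character.
    static String regexUnescaped(String s) {
        return s.replaceAll(".", "-");
    }

    // Regex replacement with the dot escaped, so only literal dots match.
    static String regexEscaped(String s) {
        return s.replaceAll("\\.", "-");
    }

    // Regex replacement of the first literal dot only.
    static String firstOnly(String s) {
        return s.replaceFirst("\\.", "-");
    }

    public static void main(String[] args) {
        System.out.println(plain("a.b.c"));          // a-b-c
        System.out.println(regexUnescaped("a.b.c")); // -----
        System.out.println(regexEscaped("a.b.c"));   // a-b-c
        System.out.println(firstOnly("a.b.c"));      // a-b.c
    }
}
```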

String vs StringBuffer vs CharArray

String: cannot be changed once created. StringBuffer: can be changed.

CharArray

A String is immutable, so convert it to a char array when the characters need to be modified in place.

String s = "hello";

// String to CharArray
char[] c = s.toCharArray();

// CharArray to String
String s2 = new String(c);

Convert StringBuffer to CharArray

StringBuffer strBuf = new StringBuffer("hello");

// StringBuffer to String to CharArray
char[] c = strBuf.toString().toCharArray();

// CharArray to String to StringBuffer
StringBuffer strBuf2 = new StringBuffer(new String(c));
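
As an illustration of why the char array conversion is useful, the sketch below reverses a string by swapping characters in place. The method name reverse is hypothetical, and the swap assumes the text contains no surrogate pairs; StringBuilder.reverse() is the ready-made alternative.

```java
class CharArrayDemo {
    // Reverses a string by swapping chars in a mutable copy.
    static String reverse(String s) {
        char[] chars = s.toCharArray(); // a copy; the original String is untouched
        for (int i = 0, j = chars.length - 1; i < j; i++, j--) {
            char tmp = chars[i];
            chars[i] = chars[j];
            chars[j] = tmp;
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        String original = "hello";
        System.out.println(reverse(original)); // olleh
        System.out.println(original);          // hello
    }
}
```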

Bytes

Convert String to Bytes:

jshell> "abcd".getBytes()
$1 ==> byte[4] { 97, 98, 99, 100 }

Or use Charset:

jshell> import java.nio.charset.Charset;

jshell> Charset.forName("UTF-8").encode("abcd").array()
$2 ==> byte[4] { 97, 98, 99, 100 }

On this machine the default Charset is UTF-8 (it is UTF-8 by default since Java 18, and platform-dependent on older versions):

jshell> Charset.defaultCharset()
$3 ==> UTF-8

Now try UTF-16. Note that the plain "UTF-16" charset (without BE/LE) prepends a BOM (Byte Order Mark) when encoding, in this case -2, -1, i.e. FE FF:

jshell> "abcd".getBytes("UTF-16")
$4 ==> byte[10] { -2, -1, 0, 97, 0, 98, 0, 99, 0, 100 }

With BE or LE there is no BOM:

jshell> "abcd".getBytes("UTF-16BE")
$49 ==> byte[8] { 0, 97, 0, 98, 0, 99, 0, 100 }

jshell> "abcd".getBytes("UTF-16LE")
$50 ==> byte[8] { 97, 0, 98, 0, 99, 0, 100, 0 }

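However, the two approaches can return arrays of different lengths even with the same encoding. ByteBuffer.array() returns the whole backing array, including unused capacity past the buffer's limit, which shows up as trailing 0s in the second result: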

jshell> "你好".getBytes("UTF-8")
$5 ==> byte[6] { -28, -67, -96, -27, -91, -67 }

jshell> Charset.forName("UTF-8").encode("你好").array()
$6 ==> byte[11] { -28, -67, -96, -27, -91, -67, 0, 0, 0, 0, 0 }
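
One way to avoid the trailing zeros, sketched under the assumption that a plain byte[] is wanted, is to copy only the bytes between the buffer's position and limit. The helper name encode is made up for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ByteBufferBytesDemo {
    // Encodes to UTF-8 and returns only the bytes that were actually written.
    static byte[] encode(String s) {
        ByteBuffer buffer = StandardCharsets.UTF_8.encode(s);
        byte[] out = new byte[buffer.remaining()]; // remaining() = limit - position
        buffer.get(out);                           // copies exactly those bytes
        return out;
    }

    public static void main(String[] args) {
        String text = "\u4f60\u597d"; // "你好", written with escapes
        byte[] viaBuffer = encode(text);
        byte[] viaGetBytes = text.getBytes(StandardCharsets.UTF_8);

        System.out.println(viaBuffer.length);                      // 6
        System.out.println(Arrays.equals(viaBuffer, viaGetBytes)); // true
    }
}
```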