Java - String
char
A 16-bit unsigned integer holding one UTF-16 code unit. A single char can represent any code point in the Basic Multilingual Plane; code points above U+FFFF need two chars (a surrogate pair). The default value is the null character (\u0000).
Range: 0x0000 - 0xFFFF
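To make the code unit vs code point distinction concrete, here is a minimal sketch. The class and method names (CodeUnitDemo, codeUnits, codePoints) are made up for illustration; length() and codePointCount() are standard String methods.

```java
public class CodeUnitDemo {
    // Number of UTF-16 code units, which is what length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the whole string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String bmp = "A";                 // U+0041, inside the BMP
        String emoji = "\uD83D\uDE00";    // U+1F600, outside the BMP (surrogate pair)
        System.out.println(codeUnits(bmp) + " / " + codePoints(bmp));     // 1 / 1
        System.out.println(codeUnits(emoji) + " / " + codePoints(emoji)); // 2 / 1
        System.out.println(Character.isHighSurrogate(emoji.charAt(0)));   // true
    }
}
```

So charAt() and length() count chars, not characters as a reader sees them.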
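Why String is Immutable (Final)
- Security: the Java class loading mechanism works on class names passed as parameters; Strings are immutable so those names cannot be maliciously changed after they are passed in
- Performance: immutable Strings can be cached (literals are shared in the string pool, and the hash code is computed once and reused)
- Thread-safety: a String can be shared between multiple threads without synchronization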
Encoding
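- internal encoding: UTF-16 as seen through the API (char, length(), charAt()); cannot be changed
- default encoding: getBytes() without arguments uses the platform's "default charset", which may not be UTF-8 (JDK 18 and later default to UTF-8). See http://docs.oracle.com/javase/tutorial/i18n/text/string.html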
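One way to avoid depending on the default charset is to always pass one explicitly. A minimal sketch, with hypothetical helper names (utf8, fromUtf8); StandardCharsets and the two overloads used are part of the JDK:

```java
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetDemo {
    // Encode with an explicit charset instead of the platform default.
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decode with the same explicit charset.
    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "\u4f60\u597d"; // two CJK characters, 3 bytes each in UTF-8
        byte[] bytes = utf8(text);
        System.out.println(bytes.length);                 // 6
        System.out.println(fromUtf8(bytes).equals(text)); // true
    }
}
```

Unlike getBytes(String charsetName), the Charset overloads do not throw a checked exception.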
CharSequence vs String vs StringBuilder vs StringBuffer
- String: immutable; StringBuilder/StringBuffer: modifiable
- StringBuffer: thread safe (its public methods are declared as "synchronized")
- StringBuilder: not thread safe, better performance
- CharSequence: an interface. String, StringBuffer and StringBuilder all implement CharSequence.
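Because all three classes implement CharSequence, a method that only reads characters can accept any of them. A small sketch; countDigits is a hypothetical helper, not a JDK method:

```java
public class CharSequenceDemo {
    // Accepts String, StringBuilder, StringBuffer, or any other CharSequence.
    static int countDigits(CharSequence text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (Character.isDigit(text.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countDigits("a1b2"));                    // 2
        System.out.println(countDigits(new StringBuilder("123")));  // 3
        System.out.println(countDigits(new StringBuffer("none")));  // 0
    }
}
```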
Format
String.format
width 12, right align
String.format("%12s", "my-text")
width 12, left align
String.format("%-12s", "my-text")
width 12 (11 for the number plus the % sign), precision 2, floating-point number followed by a literal %
String.format("%11.2f%%", rate * 100)
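The three formats above can be checked with a short program. This is a sketch; the method names are illustrative, and Locale.ROOT is passed for the number so the decimal separator does not depend on the machine's locale (an addition not in the examples above).

```java
import java.util.Locale;

public class FormatDemo {
    // Width 12, right aligned (padded on the left).
    static String right(String s) {
        return String.format("%12s", s);
    }

    // Width 12, left aligned (padded on the right).
    static String left(String s) {
        return String.format("%-12s", s);
    }

    // Width 11 with 2 decimals, followed by a literal percent sign.
    static String percent(double rate) {
        return String.format(Locale.ROOT, "%11.2f%%", rate * 100);
    }

    public static void main(String[] args) {
        System.out.println("[" + right("my-text") + "]"); // [     my-text]
        System.out.println("[" + left("my-text") + "]");  // [my-text     ]
        System.out.println("[" + percent(0.5) + "]");     // [      50.00%]
    }
}
```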
DecimalFormat
Import DecimalFormat first:
jshell> import java.text.DecimalFormat
The pattern has a single fraction digit, so 0.789 is rounded to 0.8:
jshell> new DecimalFormat("###,###.#").format(123456.789)
$1 ==> "123,456.8"
This is handy for formatting a number as dollars:
jshell> new DecimalFormat("$###,###.##").format(123456.789)
$2 ==> "$123,456.79"
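The rounding and the separators both depend on settings that can be made explicit. The sketch below is an illustration under two assumptions that go beyond the examples above: it pins the symbols to Locale.US and sets the rounding mode by hand (DecimalFormat rounds HALF_EVEN by default). The class and method names are made up.

```java
import java.math.RoundingMode;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

public class DecimalFormatDemo {
    // Format with fixed US symbols and an explicit rounding mode.
    static String format(String pattern, double value, RoundingMode mode) {
        DecimalFormat df = new DecimalFormat(pattern, DecimalFormatSymbols.getInstance(Locale.US));
        df.setRoundingMode(mode);
        return df.format(value);
    }

    public static void main(String[] args) {
        System.out.println(format("###,###.#", 123456.789, RoundingMode.HALF_EVEN)); // 123,456.8
        System.out.println(format("0.0", 2.25, RoundingMode.HALF_EVEN));             // 2.2
        System.out.println(format("0.0", 2.25, RoundingMode.HALF_UP));               // 2.3
    }
}
```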
SubString
jshell> String s = "{asdf}";
s ==> "{asdf}"
jshell> s.substring(1, s.length() - 1)
$1 ==> "asdf"
Parse ArrayList toString() result
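ArrayList's toString() generates a string like [0, 1, 2]. To parse it, strip the brackets and split on the separator. StringUtils here is presumably the Apache Commons Lang class. Note that the elements are separated by ", ", so splitting on "," alone leaves a leading space on every part after the first: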
String trimmed = rawString.substring(1, rawString.length() - 1);
String[] parts = StringUtils.split(trimmed, ",");
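The same idea can be sketched with the JDK alone. This assumes the elements themselves never contain ", " (true for numbers); ListStringParser and parse are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ListStringParser {
    // Turns "[0, 1, 2]" back into a list of element strings.
    static List<String> parse(String rawString) {
        // Drop the surrounding brackets.
        String trimmed = rawString.substring(1, rawString.length() - 1);
        if (trimmed.isEmpty()) {
            return new ArrayList<>(); // "[]" is the empty list
        }
        // List.toString() joins elements with ", " (comma plus space).
        return new ArrayList<>(Arrays.asList(trimmed.split(", ")));
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(0, 1, 2);
        System.out.println(parse(numbers.toString())); // [0, 1, 2]
        System.out.println(parse("[]").size());        // 0
    }
}
```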
new String() vs String Literal
Compare:
String s = new String("foo");
String s = "foo";
- new String(): creates a new object on the heap every time; costs time and memory.
- String literal: created only once, in the string constant pool, and reused.
See the "== vs .equals()" section for examples.
== vs .equals()
- ==: only checks whether both references point to the same object
- .equals() or .equalsIgnoreCase(): checks whether the string contents are the same
jshell> String a = new String("foo");
a ==> "foo"
jshell> String b = new String("foo");
b ==> "foo"
jshell> a == b
$3 ==> false
jshell> a.equals(b)
$4 ==> true
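However, String literals with the same content are interned to a single object, so == returns true for them (do not rely on this; use .equals() to compare content):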
jshell> String c = "foo"
c ==> "foo"
jshell> String d = "foo"
d ==> "foo"
jshell> c == d
$7 ==> true
jshell> c.equals(d)
$8 ==> true
jshell> a == c
$9 ==> false
jshell> a.equals(c)
$10 ==> true
Another example: a compile-time constant expression such as "f" + "oo" is folded into the same pooled literal:
jshell> String e = "f" + "oo";
e ==> "foo"
jshell> c == e
$12 ==> true
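The pool can also be reached explicitly with intern(), and concatenation done at runtime does not use the pool. A minimal sketch; the class and method names are illustrative only:

```java
public class InternDemo {
    // Reference comparison, wrapped in a method for readability.
    static boolean sameObject(String x, String y) {
        return x == y;
    }

    // Concatenation involving a non-final variable happens at runtime
    // and produces a new object instead of the pooled literal.
    static boolean runtimeConcatIsPooled() {
        String part = "f";
        String joined = part + "oo";
        return joined == "foo";
    }

    public static void main(String[] args) {
        String heap = new String("foo");
        System.out.println(sameObject(heap, "foo"));          // false
        System.out.println(sameObject(heap.intern(), "foo")); // true
        System.out.println(runtimeConcatIsPooled());          // false
    }
}
```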
String Split
This does not work, because split() takes a regular expression and | is the alternation operator:
jshell> String raw = "1|2|3|4";
raw ==> "1|2|3|4"
jshell> raw.split("|")
$14 ==> String[7] { "1", "|", "2", "|", "3", "|", "4" }
It needs proper escapes:
jshell> raw.split("\\|")
$15 ==> String[4] { "1", "2", "3", "4" }
or use a character class so | is not interpreted as "or":
jshell> "1|2|3|4".split("[|]")
$1 ==> String[4] { "1", "2", "3", "4" }
Trailing empty strings are removed from the result (leading ones are kept):
jshell> "/".split("/")
$16 ==> String[0] { }
jshell> "/a".split("/")
$17 ==> String[2] { "", "a" }
jshell> "//a/".split("/")
$18 ==> String[3] { "", "", "a" }
jshell> "//a".split("/")
$19 ==> String[3] { "", "", "a" }
jshell> "///".split("/")
$20 ==> String[0] { }
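If the trailing empty strings matter, the two-argument split(regex, limit) with a negative limit keeps them, and Pattern.quote() escapes the delimiter without hand-written backslashes. A sketch with a hypothetical helper name:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitDemo {
    // Splits on a literal delimiter and keeps trailing empty strings.
    static String[] splitKeepTrailing(String s, String delimiter) {
        // Pattern.quote makes the delimiter literal; limit -1 disables trimming.
        return s.split(Pattern.quote(delimiter), -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitKeepTrailing("1|2|3|4", "|"))); // [1, 2, 3, 4]
        System.out.println(splitKeepTrailing("//a/", "/").length);              // 4
        System.out.println(splitKeepTrailing("/", "/").length);                 // 2
    }
}
```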
Replace: String.replace() vs String.replaceAll()
Compare the 4 replace functions:
String replace(char oldChar, char newChar)
String replace(CharSequence target, CharSequence replacement)
String replaceAll(String regex, String replacement)
String replaceFirst(String regex, String replacement)
The difference is regex: replace() only replaces plain text, while replaceAll() and replaceFirst() take a regular expression. Despite the names, replace() also replaces every occurrence, not just the first.
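The difference shows up as soon as the target contains a regex metacharacter such as the dot. A short sketch (class and method names are made up for illustration):

```java
public class ReplaceDemo {
    // Plain-text replacement of every "." in the input.
    static String plain(String s) {
        return s.replace(".", "-");
    }

    // Regex replacement: an unescaped "." matches any character.
    static String regexAll(String s) {
        return s.replaceAll(".", "-");
    }

    // Regex replacement of only the first literal dot.
    static String regexFirst(String s) {
        return s.replaceFirst("\\.", "-");
    }

    public static void main(String[] args) {
        System.out.println(plain("a.b.c"));      // a-b-c
        System.out.println(regexAll("a.b.c"));   // -----
        System.out.println(regexFirst("a.b.c")); // a-b.c
    }
}
```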
String vs StringBuffer vs CharArray
String: cannot change once defined. StringBuffer: can change.
CharArray
In Java a String is immutable. Convert a String to a char array (char[]) if the characters need to be modified in place.
String s = new String("hello");
// String to CharArray
char[] c = s.toCharArray();
// CharArray to String
String s2 = new String(c);
Convert StringBuffer to CharArray
StringBuffer strBuf = new StringBuffer("hello");
// StringBuffer to String to CharArray
char[] c = strBuf.toString().toCharArray();
// CharArray to String to StringBuffer
StringBuffer strBuf2 = new StringBuffer(new String(c));
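The intermediate String can be skipped if desired. The sketch below is an alternative, not a replacement for the conversion above; it uses StringBuffer.getChars() and append(char[]), both standard methods, wrapped in hypothetical helpers.

```java
import java.util.Arrays;

public class BufferToCharsDemo {
    // Copies the buffer's characters straight into a new char[].
    static char[] toChars(StringBuffer buf) {
        char[] out = new char[buf.length()];
        buf.getChars(0, buf.length(), out, 0);
        return out;
    }

    // Builds a StringBuffer directly from a char[].
    static StringBuffer toBuffer(char[] chars) {
        return new StringBuffer(chars.length).append(chars);
    }

    public static void main(String[] args) {
        char[] c = toChars(new StringBuffer("hello"));
        System.out.println(Arrays.toString(c)); // [h, e, l, l, o]
        System.out.println(toBuffer(c));        // hello
    }
}
```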
Bytes
Convert String to Bytes:
jshell> "abcd".getBytes()
$1 ==> byte[4] { 97, 98, 99, 100 }
Or use Charset:
jshell> import java.nio.charset.Charset;
jshell> Charset.forName("UTF-8").encode("abcd").array()
$2 ==> byte[4] { 97, 98, 99, 100 }
The default Charset is platform-dependent; on this machine it is UTF-8:
jshell> Charset.defaultCharset()
$3 ==> UTF-8
Try UTF-16. Note that UTF-16 without BE/LE prepends a BOM (Byte Order Mark), in this case -2, -1, i.e. FE FF:
jshell> "abcd".getBytes("UTF-16")
$4 ==> byte[10] { -2, -1, 0, 97, 0, 98, 0, 99, 0, 100 }
With BE or LE there's no BOM:
jshell> "abcd".getBytes("UTF-16BE")
$49 ==> byte[8] { 0, 97, 0, 98, 0, 99, 0, 100 }
jshell> "abcd".getBytes("UTF-16LE")
$50 ==> byte[8] { 97, 0, 98, 0, 99, 0, 100, 0 }
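Going the other way, the UTF-16 decoder reads the BOM to pick the byte order and falls back to big-endian when there is none. A small sketch; decodeUtf16 is a hypothetical helper:

```java
import java.nio.charset.StandardCharsets;

public class BomDemo {
    // Decodes UTF-16 bytes, honouring a leading BOM if one is present.
    static String decodeUtf16(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        byte[] withBom = "abcd".getBytes(StandardCharsets.UTF_16);
        System.out.println(withBom.length);       // 10 (2-byte BOM + 4 * 2 bytes)
        System.out.println(decodeUtf16(withBom)); // abcd

        byte[] littleEndianWithBom = {-1, -2, 97, 0}; // FF FE, then 'a' in LE order
        System.out.println(decodeUtf16(littleEndianWithBom)); // a
    }
}
```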
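However, the two approaches can return arrays of different lengths even with the same encoding. encode() returns a ByteBuffer, and array() exposes its whole backing array, including unused capacity past the buffer's limit, hence the trailing 0s in the second result: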
jshell> "你好".getBytes("UTF-8")
$5 ==> byte[6] { -28, -67, -96, -27, -91, -67 }
jshell> Charset.forName("UTF-8").encode("你好").array()
$6 ==> byte[11] { -28, -67, -96, -27, -91, -67, 0, 0, 0, 0, 0 }
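To get only the encoded bytes out of the ByteBuffer, copy remaining() bytes instead of calling array(). A minimal sketch, with encodeExact as a hypothetical helper name:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodeDemo {
    // Returns exactly the encoded bytes, without the buffer's spare capacity.
    static byte[] encodeExact(String s) {
        ByteBuffer buf = StandardCharsets.UTF_8.encode(s);
        byte[] out = new byte[buf.remaining()]; // bytes between position and limit
        buf.get(out);
        return out;
    }

    public static void main(String[] args) {
        String text = "\u4f60\u597d";
        byte[] exact = encodeExact(text);
        System.out.println(exact.length); // 6
        System.out.println(Arrays.equals(exact, text.getBytes(StandardCharsets.UTF_8))); // true
    }
}
```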