Programming Languages - String Interning
By Language
C++
C++ has no built-in string interning: std::string is mutable, so each string object owns its own copy of the character data.
Java
The JVM maintains a pool of strings so that equal strings can share one object instead of being created repeatedly.
In Java, calling the intern() method on a string returns the canonical representation of that string object. The pool is maintained privately by the String class.
When the intern() method is invoked, it checks whether the pool already contains a string equal to this String object, as determined by equals(). If so, the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to it is returned. It follows that for any two strings s and t, s.intern() == t.intern() is true if and only if s.equals(t) is true.
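The lookup-or-insert behavior described above can be sketched as a toy model. Python is used here purely for illustration; the InternPool class and its intern method are hypothetical names, and this is not how the JVM actually implements its pool.

```python
class InternPool:
    """Toy model of an intern pool (hypothetical; not the JVM's implementation)."""

    def __init__(self):
        # Maps each string value to the one canonical object stored for it.
        self._pool = {}

    def intern(self, s):
        # Return the pooled object if an equal string already exists;
        # otherwise add s to the pool and return s itself.
        return self._pool.setdefault(s, s)


pool = InternPool()

# Build the strings at runtime so that equal values are distinct objects.
s = "".join(["fo", "o"])
t = "".join(["f", "oo"])
u = "".join(["ba", "r"])

same_value = pool.intern(s) is pool.intern(t)       # equal values share one object
different_value = pool.intern(s) is pool.intern(u)  # unequal values do not

print(s == t, same_value, different_value)
```

In this model, two interned references are identical exactly when the original strings compare equal, which mirrors the s.intern() == t.intern() property.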
When a String is created with new String("foo"), up to two objects are involved: the literal "foo" in the string pool (created only once, when the literal is first used) and a new, distinct object in the heap. The reference returned by new always points to the heap object, not to the pooled one.
Since Java 7, the string pool is located in the normal heap.
jshell> String s1 = new String("foo")
s1 ==> "foo"
jshell> String s2 = "foo"
s2 ==> "foo"
jshell> s1 == s2
$3 ==> false
jshell> s1.intern() == s2
$4 ==> true
Python
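In CPython, string literals that look like identifiers (letters, digits, and underscores only) are interned, that is, stored as a single object, so the is operator returns True: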
>>> S1 = 'spam'
>>> S2 = 'spam'
>>> S1 == S2, S1 is S2
(True, True)
Another example, with a longer string:
>>> S1 = 'alonglongstring'
>>> S2 = 'alonglongstring'
>>> S1 == S2, S1 is S2
(True, True)
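A string literal containing spaces is not interned automatically here, so the is operator returns False (this is a CPython implementation detail and may vary between versions):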
>>> S1 = 'a longer string'
>>> S2 = 'a longer string'
>>> S1 == S2, S1 is S2
(True, False)
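Strings that are not interned automatically can be interned explicitly with sys.intern() from the standard library. The following is a minimal sketch; the strings are built at runtime with str.join so that the result does not depend on how the interpreter treats literals, and the variable names are only illustrative.

```python
import sys

# Build two equal strings at runtime instead of using literals.
parts = ["a longer", " string"]
s1 = "".join(parts)
s2 = "".join(parts)

equal_before = s1 == s2  # the values are equal

# sys.intern returns the canonical object for the given string value.
i1 = sys.intern(s1)
i2 = sys.intern(s2)
identical_after = i1 is i2  # both names now refer to the same object

print(equal_before, identical_after)
```

After interning, identity comparison with is can stand in for equality comparison, but only between strings that have all been passed through sys.intern().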
Go
Go strings are immutable, so multiple strings can share the same underlying data. However, Go only deduplicates string constants at compile time; strings built at runtime are not interned.
package main

import (
	"fmt"
	"strconv"
	"unsafe"
)

// StringHeader mirrors the internal representation of the built-in string:
// Data is a pointer to the string data and Len is the length of the string.
type StringHeader struct {
	Data uintptr
	Len  int
}

// stringptr returns the address of the string data.
func stringptr(s string) uintptr {
	return (*StringHeader)(unsafe.Pointer(&s)).Data
}

func main() {
	// Constant expressions are evaluated at compile time and share their data.
	s1 := "12"
	s2 := "1" + "2"
	fmt.Println(stringptr(s1) == stringptr(s2)) // true

	// But strings generated at runtime are not interned.
	s3 := strconv.Itoa(12)
	fmt.Println(stringptr(s1) == stringptr(s3)) // false
}