Programming Languages - String Interning

By Language

C++

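C++ has no built-in string interning: std::string is mutable and each std::string object owns its own copy of the characters, so equal strings are stored separately.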

Java

The JVM maintains a pool of strings so that equal strings can share one object instead of new String objects being created over and over.

In Java, calling the intern() method on a String returns the canonical representation of that string. The pool of canonical strings is maintained privately by the String class.

When intern() is called, it checks whether the pool already contains a string equal to this String object, as determined by equals(). If it does, the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to it is returned. It follows that for any two strings s and t, s.intern() == t.intern() is true if and only if s.equals(t) is true.
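The lookup described above can be sketched in a few lines. This is a toy model written in Python for brevity, not the JVM's actual implementation; the InternPool class and its intern method are hypothetical names chosen for illustration.

```python
class InternPool:
    """Toy string pool: maps each distinct string value to one canonical object."""

    def __init__(self):
        self._pool = {}

    def intern(self, s: str) -> str:
        # If an equal string is already pooled, return that canonical object;
        # otherwise store s itself as the canonical object and return it.
        return self._pool.setdefault(s, s)


pool = InternPool()
s = "".join(["fo", "o"])  # built at run time
t = "".join(["f", "oo"])  # equal value, built separately

print(s == t)                            # equal values
print(pool.intern(s) is pool.intern(t))  # same canonical object
```

Because the pool is keyed by string value, two strings map to the same canonical object exactly when they compare equal, which mirrors the s.intern() == t.intern() property stated above.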

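When a String is created with new String("foo"), two objects are involved: the literal "foo" in the string constant pool (created once, when the literal is first loaded) and a new, separate object in the heap area. The reference returned by new always points to the heap object, not to the pooled one.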

Since Java 7, the string pool is located in the normal heap rather than in the permanent generation.

jshell> String s1 = new String("foo")
s1 ==> "foo"

jshell> String s2 = "foo"
s2 ==> "foo"

jshell> s1 == s2
$3 ==> false

jshell> s1.intern() == s2
$4 ==> true

Python

CPython interns short, identifier-like string literals, that is, it stores them as a single shared object, so is returns True:

>>> S1 = 'spam'
>>> S2 = 'spam'
>>> S1 == S2, S1 is S2
(True, True)

Another example, with a longer string:

>>> S1 = 'alonglongstring'
>>> S2 = 'alonglongstring'
>>> S1 == S2, S1 is S2
(True, True)

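A string containing a space is not identifier-like, so CPython does not intern it automatically, and in the interactive interpreter is returns False. This is an implementation detail: the result can differ between CPython versions or when the same lines run as a script.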

>>> S1 = 'a longer string'
>>> S2 = 'a longer string'
>>> S1 == S2, S1 is S2
(True, False)
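Strings that are not interned automatically can be interned explicitly with sys.intern() from the standard library. The following is a minimal sketch; the strings are built at run time with join so that the compiler cannot merge them as constants, and the variable names are only illustrative.

```python
import sys

# Build two equal strings at run time so they start out as separate objects.
words = ["a", "longer", "string"]
s1 = " ".join(words)
s2 = " ".join(words)

print(s1 == s2)  # equal values

# sys.intern returns the canonical object for the given string value.
i1 = sys.intern(s1)
i2 = sys.intern(s2)
print(i1 is i2)  # True: both names refer to the single interned object
```

Explicit interning is mostly useful when many equal strings are kept alive or compared frequently, for example as dictionary keys.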

Go

Go strings are immutable, so multiple strings can share the same underlying data. However, Go only deduplicates strings that are known at compile time (constants); strings created dynamically at run time are not interned.

package main

import (
    "fmt"
    "strconv"
    "unsafe"
)

// The built-in string is represented internally as a structure containing two fields:
// Data is a pointer to the string data and Len is the length of the string in bytes.
// This local type mirrors reflect.StringHeader.
type StringHeader struct {
    Data uintptr
    Len  int
}

// stringptr returns the address of the string data.
func stringptr(s string) uintptr {
    return (*StringHeader)(unsafe.Pointer(&s)).Data
}

func main() {
    // Constant expressions are folded at compile time, so both strings share the same data.
    s1 := "12"
    s2 := "1" + "2"
    fmt.Println(stringptr(s1) == stringptr(s2)) // true

    // But strings generated at runtime are not interned.
    s3 := strconv.Itoa(12)
    fmt.Println(stringptr(s1) == stringptr(s3)) // false
}