Skip to content

Latest commit

 

History

History
372 lines (302 loc) · 14.1 KB

File metadata and controls

372 lines (302 loc) · 14.1 KB

Java switch-case internals

Switch-case statements are internally implemented with either tableswitch or lookupswitch bytecode instructions. Both instructions function by popping the stack for an integer value, and selecting a jump offset associated with the popped value.

The tableswitch instruction

The tableswitch instruction is a variable length instruction used for selecting and executing a jump based on a jump-table defined over a sorted, continuous list of possible integer values. Tool, such as javap, might represent the tableswitch instruction textually in ways similar to the following example.

tableswitch   { // 3 to 5
        3: 50
        4: 39
        5: 28
    default: 61
}

The instruction described above will pop the stack for an integer value and for values ranging from 3 to 5 perform a jump to the noted program address. For values outside the defined 3 to 5 range, the instruction will preform a jump to the address noted under the default label. Note, that the actual bytecode instruction uses relative offset values rather than the absolute addresses reported by javap.

The bytecode structure of the tableswitch instruction is as follows:

Data type Description

ubyte8

tableswitch opcode (0xAA)

-

Padding of (0-3) bytes so that the start of the default jump offset field falls on a address, which is a multiple of 4.

uint32

Default jump offset

uint32

Low value

uint32

High value

uint32

Jump offset 1, associated with value low

…​

…​

uint32

Jump offset n, associated with value high (low + n - 1)

The operations performed by the execution of the switchtable instruction can be illustrated with the following pseudo-code.

int value = pop();
if (value < lowValue || value > highValue) {
    pc += defaultOffset;
} else {
    pc += jumpOffset[value - lowValue];
}

The pop() expression in the above code represents the instruction popping the stack for an integer value and the pc variable stands for the program counter register, storing the address of the next instruction to be executed by the JVM.

The lookupswitch instruction

The lookupswitch instruction is used performing branching based on a list of key-offset pairs. This instruction is not as fast as tableswitch instruction, but allows branching on a list of individual values, rather than continuous ranges of values. This branching instruction essentially trades computational efficiency for space efficiency in cases where the branching must be performed on a sparse non-continuous set of values.

A sample lookupswitch as printed by javap might look something like the following:

lookupswitch { // 3
        0: 36
        1: 47
        500: 58
    default: 69
}

This instruction will pop the stack for an integer value, then will search it’s list of key-jump offset pairs for a matching value. The list of pairs are sorted by key to allow a better-than-linear searching.

If a matching key is found in the list, a jump will be performed using the associated jump offset. I case no matching key is found, the instruction will instead perform a jump based on the default jump offset of the instruction.

Note, that the actual bytecode instruction uses relative offset values rather than the absolute addresses reported by javap.

The bytecode structure of the lookupswitch instruction is as follows:

Data type Description

ubyte8

lookupswitch opcode (0xAB)

-

Padding of (0-3) bytes so that the start of the default jump offset field falls on a address, which is a multiple of 4.

uint32

Default jump offset

uint32

Number of key-offset pairs

uint32

Value of key 1

uint32

Jump offset for key 1

…​

…​

uint32

Value of key n

uint32

Jump offset for key n

Choosing the right switch instruction

When compiling a given switch statement the java compiler has to make a decision between either emitting a tableswitch or a lookupswitch instruction. This decision is based on the time/space complexity costs associated with each instruction.

The compiler will stick with the tableswitch instruction as long as the space and time costs for doing so are lower or equal to the cost of using lookupswitch.

The table used for calculating space/time costs for the two instruction types is the following:

Opcode Space cost Time cost

tableswitch

4 + (hi - lo + 1)

3

lookupswitch

3 + 2 * nlabels

nlabels

The hi and lo parameters are the highest and lowest value switch keys used in the switch statement and nlabels is the number of switch keys.

The overall space/time cost of using a specific instruction is calculated by using the formula: spaceCost + timeCost * 3.

Note
An interesting property the cost formula governing code compilation is that switch statements with a less than 3 labels and no gaps between the keys will compile as lookupswitch instructions. This is because, for such few labels, the calculated costs of lookupswitch, are still lower than those of the tableswitch alternative. The performance benefits of tableswitch only start to outperform lookupswitch as the number of labels reaches 3.

Using tableswitch over a non-continuous range of keys

Choosing between tableswitch and lookupswitch based on space/time cost analysis can lead to situations where a tableswitch instruction needs to operate over a non-continuous range of key values. The compiler will resolve this problem by filling in the gaps between the sparse keys with jumps to the default label of the switch statement.

See the following lookupswitch instruction over the sparse value set of 0, 1, 3 and 5 as an example.

lookupswitch   { // 4
        0: 40
        1: 51
        3: 62
        5: 73
    default: 84
}

By inserting values 2 and 4 as jumps to the default offset (84), the instruction can be converted into a semantically equivalent tableswitch instruction:

tableswitch   { // 0 to 5
    0: 40
    1: 51
    2: 84   // dummy case
    3: 62
    4: 84   // dummy case
    5: 73
    default: 84
}

Using a tableswitch instead of a lookupswitch in cases such as these is a decision to trade the space efficiency provided by sparse keys for the constant time lookup of a table-based approach.

Implementing java switches over byte, short, char, and int values

Java switches over numeric values are amongst the java statements where the compiled form of the statement resemble very closely that of the original source code.

Consider the following sample code as an example:

switch (a) {
    case 0:
        System.out.println("zero");
        break;
    case 1:
        System.out.println("one");
        break;
    case 2:
        System.out.println("two");
        break;
    default:
        System.out.println("other");
}

The code compiled from the above source is the following:

0: iload_0
1: tableswitch   { // 0 to 2
            0: 28
            1: 39
            2: 50
        default: 61
    }
28: ...                 // System.out.println("zero");
36: goto          69    // break;
39: ...                 // System.out.println("one");
47: goto          69    // break;
50: ...                 // System.out.println("two");
58: goto          69    // break;
61: ...                 // System.out.println("other");
69: return

Probably the first noticeable feature of the compiled code is, that case labels were removed from what used to be the switch body and are sorted and integrated into switch’s tableswitch/lookupswitch instruction.

Unlike the keys themselves, compiler will preserves the ordering each case block, inserting a goto instruction in place of every break statement. Each such goto instruction is set up to redirect the flow of execution to the fist statement after the switch construct. Any case blocks omitting the break statement will likewise be missing their respective goto instructions, allowing the execution to "fall through" into the next case label.

In the absence of a default label, the compiled tableswitch/lookupswitch instruction will have it’s default jump offset set to the offset of the first statement following the switch construct.

Switching over long values is not supported, unless the value is manually downcast to int. This limitation is most likely imposed because the tableswitch and lookupswitch structures store ranges and key values as unsigned 32-bit integers.

Implementing java switches over String values

The first Java version to support switching over Strings values was JDK 7. Interestingly, this support was introduced with the help of some additional compiler trickery as none of underlying tableswitch and lookupswitch instructions actually work on anything other than numeric data.

See the following switch as an example:

Original code
switch (a) {
    case "aaa":
        System.out.println("aaa");
        break;
    case "bbb":
        System.out.println("bbb");
        break;
    case "ccc":
        System.out.println("ccc");
        break;
    default:
        System.out.println("other");
}

Switch statements, like the one used in the above code are compiled as a series of two consecutive switch statements. The first switch branches over the hash codes of each string label, mapping them to a unique 0-base index, or in the case of missing values, the -1 magic constant. Possible hash collisions are guarded against by performing additional equality checks against the label values.

Compiled code, mapping string labels to id’s
byte $var2 = -1;
switch(a.hashCode()) {
    case 96321:
        if (a.equals("aaa")) {
            $var2  = 0;
        }
        break;
    case 97314:
        if (a.equals("bbb")) {
            $var2  = 1;
        }
        break;
    case 98307:
        if (a.equals("ccc")) {
            $var2  = 2;
        }
}

The second half of the switch pair is a compiled version of the source code original, modified to switch over the numeric labels resolved by the previous switch.

Compiled code, switching over mapped string id’s
switch($var2) {
    case 0:
        System.out.println("aaa");
        break;
    case 1:
        System.out.println("bbb");
        break;
    case 2:
        System.out.println("ccc");
        break;
    default:
        System.out.println("other");
}

Switching over string labels with non-unique hash codes

Hash collisions between distinct string labels require additional equality checks to be performed against possible string values. The following example is intended to showcase this behavior by using the string labels "Ea" and "FB", that, despite having distinct values, share a single hash code.

Original code
switch (a) {
    case "aaa":
        System.out.println("aaa");
        break;
    case "FB":
        System.out.println("FB");
        break;
    case "Ea":
        System.out.println("Ea");
        break;
}

The code generated for mapping the above labels into unique identifiers will contain equality checks for "Ea" and "FB" under the same hash label:

Compiled code, mapping non-unique hashes to id’s
byte $var2 = -1;
switch(a.hashCode()) {
    case 2236:
        if (a.equals("Ea")) {
            $var2 = 2;
        } else if (a.equals("FB")) {
            $var2 = 1;
        }
        break;
    case 96321:
        if (a.equals("aaa")) {
            $var2 = 0;
        }
}

Implementing java switches over Enum values

Switching over Enum values present a similar problem to using Strings: internally switch statements can only operate on numeric data. To solve this problem, switch statements have to rely on the numeric ordinal value of an Enum object, provided by the int ordinal() method. Unfortunately, the ordinal values are only part of the runtime state of enum constant objects and, as such, are not available during the compilation process. To overcome this limitation, the Java compiler needs to generate additional lookup structures in the form of static inner classes that, when loaded, can bootstrap themselves with the ordinal values available at runtime. To demonstrate this, consider the following Enum class and switch statement as an example.

Enum class
public enum SomeEnum {
    ONE, TWO, THREE
}
Original switch
switch (a) {
    case ONE:
        System.out.println("one");
        break;
    case TWO:
        System.out.println("two");
        break;
    case THREE:
        System.out.println("three");
        break;
    default:
        System.out.println("other");
}

Assuming that, the above switch statement is used in a class named "Outer", the generated helper class would look something like the following:

Generated helper class with enum lookup table
static class Outer$1 {

    static final int[] $SwitchMap$SomeEnum;

    static {
        $SwitchMap$SomeEnum = new int[SomeEnum.values().length];
        $SwitchMap$SomeEnum[SomeEnum.ONE.ordinal()] = 1;
        $SwitchMap$SomeEnum[SomeEnum.TWO.ordinal()] = 2;
        $SwitchMap$SomeEnum[SomeEnum.THREE.ordinal()] = 3;
    }
}

For each Enum used in switch statements of a class a single static final int[] field is generated to act as the lookup table of the type. The table is populated by the static constructor of the helper class to map Enum constant ordinals to compiler-assigned numeric case identifiers. Enum constants not used in the switch statement are conveniently assigned the 0 case identifier and thus require no explicit initialization. With the help of the generated lookup table, the original switch can be compiled into as switch over numeric case labels.

Compiled switch, branching over numeric values
switch(Outer$1.$SwitchMap$SomeEnum[a.ordinal()]) {
    case 1:
        System.out.println("one");
        break;
    case 2:
        System.out.println("two");
        break;
    case 3:
        System.out.println("three");
        break;
    default:
        System.out.println("other");
}