Why the stdlib probably already does the best thing - an array concatenation driven learning

Here’s what I was solving: Concatenation of Arrays

Three solutions: 1. One was returning input + input array. 2. The second was to allocate an intermediate array of size 2n and iterate through the input by index, putting each element at the correct indices. 3. Finally, using copyInto to make system call for moving memory blocks.

All of these use O(n) space for holding the result, O(n) for the iteration (eventually under the hood as well, I thought). I did not prefer the first approach because it felt like cheating and I also did not know if it’s performant enough - I was rather sure that it wouldn’t be.

Submitted each. Got surprised!

The simpler solution was far more performant than my loop, and marginally better than using the system call - it had a runtime of 82 milliseconds and was faster than 91% of all submissions versus 93 milliseconds and beat 52% for loop. Here’s why:

With iteration approach, the system moves each element into the new array one at a time. This is highly inefficient as it wastes CPU resources in trips; it might get worse with an increase in input size due to cache misses and round-trips to system memory.

The approach with copyInto is much more interesting. It internally creates a new array of required size calls into system.arraycopy native function (a JVM intrinsic). It moves the required memory chunk directly at the correct place instead of iterating over individual items. Till now, it is quite evident on why it is more performant. However, why is the first approach the simplest and marginally more performant than this? The answer lies in the number of instructions that need to be run by the system - looking at the bytecode, the stack is populated twice with the input array and the system copy is executed in first approach. Compare this to using copyInto, where an array is created then loaded into memory and then the input array is pushed into the correct places.

Looking at the bytecode explains the difference in executable code and explains the (marginal) gains.

input + input:

public final getConcatenation([I)[I
L0
    ALOAD 1
    LDC "nums"
    INVOKESTATIC kotlin/jvm/internal/Intrinsics.checkNotNullParameter (Ljava/lang/Object;Ljava/lang/String;)V
L1
    LINENUMBER 17 L1
    ALOAD 1
    ALOAD 1
    INVOKESTATIC kotlin/collections/ArraysKt.plus ([I[I)[I
    ARETURN
L2
    LOCALVARIABLE this Larrays/ConcatenationOfArray; L0 L2 0
    LOCALVARIABLE nums [I L0 L2 1
    MAXSTACK = 2
    MAXLOCALS = 2
}

copyInto:

public final getConcatenation([I)[I
L0
    ALOAD 1
    LDC "nums"
    INVOKESTATIC kotlin/jvm/internal/Intrinsics.checkNotNullParameter (Ljava/lang/Object;Ljava/lang/String;)V
L1
    LINENUMBER 19 L1
    ALOAD 1
    ARRAYLENGTH
    ICONST_2
    IMUL
    NEWARRAY T_INT
    ASTORE 2
L2
    LINENUMBER 26 L2
    ALOAD 1
    ALOAD 2
    ICONST_0
    ICONST_0
    ICONST_0
    BIPUSH 14
    ACONST_NULL
    INVOKESTATIC kotlin/collections/ArraysKt.copyInto$default ([I[IIIIILjava/lang/Object;)[I
    POP
L3
    LINENUMBER 27 L3
    ALOAD 1
    ALOAD 2
    ALOAD 1
    ARRAYLENGTH
    ICONST_0
    ICONST_0
    BIPUSH 12
    ACONST_NULL
    INVOKESTATIC kotlin/collections/ArraysKt.copyInto$default ([I[IIIIILjava/lang/Object;)[I
    POP
L4
    LINENUMBER 29 L4
    ALOAD 2
    ARETURN
L5
    LOCALVARIABLE result [I L2 L5 2
    LOCALVARIABLE this Larrays/ConcatenationOfArray; L0 L5 0
    LOCALVARIABLE nums [I L0 L5 1
    MAXSTACK = 7
    MAXLOCALS = 3
}

July 1, 2026 ∙ neetcode til