无忧传媒

The Novel Cipher in Kimsuky APT's Malware

Written by Tom Dean

Research uncovers cyber threat actor's encryption method

Kimsuky (also known as VelvetChollima, BlackBanshee, and Thallium) is a North Korean threat actor that has been active since at least 2012. This advanced persistent threat consistently targets South Korean organizations. Its malware arsenal includes modules designed to collect Hangul Word Processor (HWP) documents tied to the Hancom Office software bundle widely used in South Korea.

This threat actor often targets government and defense agencies, research institutes, and non-governmental organizations (NGOs), as well as individuals writing on topics such as North Korean nuclear issues. In early 2019, Kimsuky also conducted campaigns targeting U.S. think tanks. The Cybersecurity and Infrastructure Security Agency (CISA), the FBI, and the U.S. Cyber Command Cyber National Mission Force (CNMF) issued a joint advisory about Kimsuky in 2020.

Our research sheds new light on the encryption mechanism found in various Kimsuky malware samples, including those named Gold Dragon and Ghost419 (see Figure 7 at the end of the blog for representative sample hashes). Open-source reporting does not identify or describe the encryption mechanism. As members of the 无忧传媒 Adversary Pursuit cell, we believe that this encryption mechanism, which we call the "Kimsuky cipher," was invented by the malware authors. The mechanism does not match any publicly known cipher. This blog provides a full description, along with a reference implementation.

Researchers from the 无忧传媒 Adversary Pursuit cell discovered the Kimsuky cipher while compiling reporting on tactics, techniques, and procedures (TTPs) used by the Kimsuky group. We reverse-engineered the cipher from binary samples. Our team has found that Kimsuky typically uses this cipher in conjunction with hard-coded keys. Re-implementing the cipher allows researchers to decrypt Kimsuky network traffic in real time, providing insight into the actor's TTPs, real-time threat intelligence, and incident response assistance.

Cipher Overview

The Kimsuky cipher uses a 128-bit key and a 32-bit initialization vector (IV), and functions as a stream cipher. First, a key expansion step sets up the internal state of the cipher. The state is then advanced repeatedly, outputting one keystream bit per step, and the keystream is masked over the plaintext.
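
As a minimal sketch of what "masking" means for a stream cipher, the snippet below combines a keystream with data by XOR. XOR is the usual construction and is assumed here. The function name `xor_mask` and the sample keystream bytes are hypothetical; a real keystream would come from the cipher state described in the following sections.

```python
def xor_mask(data: bytes, keystream: bytes) -> bytes:
    """XOR each data byte with the keystream byte at the same offset."""
    if len(keystream) < len(data):
        raise ValueError("keystream is shorter than the data")
    return bytes(d ^ k for d, k in zip(data, keystream))


# Placeholder keystream for illustration only (not produced by the cipher).
demo_keystream = bytes([0x5A, 0xC3, 0x0F, 0x99, 0x21])
ciphertext = xor_mask(b"hello", demo_keystream)
recovered = xor_mask(ciphertext, demo_keystream)
```

Because XOR is its own inverse, applying the same keystream a second time recovers the original bytes, so one routine serves for both encryption and decryption.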

The cipher consists of three shift registers and a lookup table, which functions as an S-box. Two of the three shift registers are entirely linear (linear-feedback shift registers, or LFSRs). The third shift register functions linearly during key expansion but has a non-linear update step during cipher operation. The S-box is initially filled with an alternating sequence of 0's and 1's. At each step, two entries of the S-box are swapped. The indices of these entries are selected from a subset of the bits of one of the shift registers.

Two open-source ciphers, TRIVIUM and RC4, share many design similarities with the Kimsuky cipher. TRIVIUM consists of three non-linear feedback shift registers whose outputs are combined to form the keystream. However, TRIVIUM lacks any form of S-box or permutation table. The S-box used in the Kimsuky cipher, in turn, is highly similar to the one used in RC4, which swaps entries at each step in a similar fashion. Both TRIVIUM and RC4 likely influenced the design of the Kimsuky cipher.

Key Expansion

The key and IV are first placed into a 288-bit vector that also contains a run of 128 zeros. This vector is depicted in Figure 1, and will be denoted as V.
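
A minimal sketch of assembling this vector follows. It assumes the [key | 128 zeros | IV] ordering and the least-significant-bit-first expansion of each byte used by the reference implementation in Figure 8. The helper names `to_bits` and `build_fill_vector` and the sample key and IV are illustrative only.

```python
def to_bits(data: bytes) -> list[int]:
    """Expand bytes into bits, least significant bit of each byte first."""
    return [(byte >> i) & 1 for byte in data for i in range(8)]


def build_fill_vector(key: bytes, iv: bytes) -> list[int]:
    """Return the 288-bit vector: key bits, then 128 zero bits, then IV bits."""
    if len(key) != 16 or len(iv) != 4:
        raise ValueError("expected a 16-byte key and a 4-byte IV")
    return to_bits(key) + [0] * 128 + to_bits(iv)


# Sample values chosen only to make the bit positions easy to inspect.
V = build_fill_vector(bytes(range(16)), b"\x01\x00\x00\x80")
```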

Figure 1: The 288-bit vector used to initially fill the three shift registers
Figure 2: The components that make up the internal state of the cipher, their sizes, and variable names used to reference them

The three LFSRs are initialized to zero and then filled from this vector. The LFSRs, their register sizes, and the variable names used to refer to their internal state are listed in Figure 2. The registers are filled from the vector V by iteratively placing a single bit into each LFSR in turn, i.e., a[0]=V[0], b[0]=V[1], c[0]=V[2], a[1]=V[3], etc. The indices of the shift registers in which the bits are placed are taken modulo the shift register size. This affects LFSR3: each register receives 96 bits, but LFSR3 holds only 29, so its initial key bits are overwritten by later bits of V. As a result, 42 bits of the 128-bit key have no effect on the final output of the cipher.
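
The 42-bit figure can be checked with a short simulation of the fill. The sketch below tracks only which index of the 288-bit vector ends up in each register position, assuming the round-robin fill and the register sizes of 204, 203, and 29 given in the reference implementation. The names used are illustrative.

```python
REGISTER_SIZES = (204, 203, 29)   # LFSR1, LFSR2, LFSR3
KEY_BITS, VECTOR_BITS = 128, 288


def final_sources() -> list[list]:
    """For each register position, record which vector index ends up there."""
    registers = [[None] * size for size in REGISTER_SIZES]
    for i in range(VECTOR_BITS):
        reg = i % 3                            # bits are dealt out round-robin
        pos = (i // 3) % REGISTER_SIZES[reg]   # index wraps at the register size
        registers[reg][pos] = i                # later writes overwrite earlier ones
    return registers


surviving = {i for reg in final_sources() for i in reg if i is not None}
lost_key_bits = [i for i in range(KEY_BITS) if i not in surviving]
```

Under these assumptions, every overwritten key bit is one that was dealt to LFSR3 (vector indices 2, 5, 8, and so on), which is where the count of 42 comes from.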

Figure 3: A depiction of the key expansion process. Three LFSRs are filled from the Key and IV and are each stepped out 10,000 cycles after filling. The S-box is initially filled with an alternating sequence of 0's and 1's

After the initial fill, each of the three LFSRs is stepped 10,000 times. The shift registers, also depicted in Figure 3, are defined by the following polynomials over GF(2):

LFSR1: a^204 + a^203 + a^194 + a^131 + 1
LFSR2: b^203 + b^197 + b^185 + b^68 + 1
LFSR3: c^29 + c^27 + 1
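
To connect these polynomials to code, the sketch below converts each exponent list into register tap positions and performs one linear step. The mapping (tap index equals degree minus exponent) and the shift direction are inferred from the reference implementation in Figure 8. The function names are illustrative.

```python
def taps_from_polynomial(degree: int, exponents: list[int]) -> list[int]:
    """Map the non-constant terms of the polynomial to register tap indices.

    The leading term x^degree yields tap 0; the constant term corresponds
    to the newly inserted bit and therefore has no tap of its own.
    """
    return [degree - e for e in [degree] + exponents]


def lfsr_step(register: bytearray, taps: list[int]) -> None:
    """Shift the register by one position and append the XOR of the taps."""
    feedback = 0
    for t in taps:
        feedback ^= register[t]
    register[0:-1] = register[1:]
    register[-1] = feedback


LFSR1_TAPS = taps_from_polynomial(204, [203, 194, 131])
LFSR2_TAPS = taps_from_polynomial(203, [197, 185, 68])
LFSR3_TAPS = taps_from_polynomial(29, [27])

# Toy register: a single set bit at position 0, stepped once.
demo = bytearray(29)
demo[0] = 1
lfsr_step(demo, LFSR3_TAPS)
```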

The 128-bit S-box is simply filled with the recurring pattern "0 1 0 1 0 1 ...".

Keystream Generation

Figure 4: The keystream generation. A single bit is selected out of the S-box based on an index generated out of the registers of LFSR2. This bit is XORed with a register from LFSR2 and a register from LFSR3 to form the keystream

The process for retrieving a bit of keystream during the normal operation of the cipher is depicted in Figure 4. Seven bits are selected out of LFSR2 and used as an index into the S-box. Specifically, the index is formed from the following bits, listed from least significant to most significant: b[40], b[33], b[29], b[20], b[17], b[13], b[10]. The S-box entry at this position is then XORed with the bits b[202] and c[28], from LFSRs 2 and 3, respectively, to form the keystream output.
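
The bit selection can be sketched as follows. The register contents here are a toy state chosen for illustration, not values from a real sample, and the helper names `gather_index` and `keystream_bit` are hypothetical.

```python
INDEX_BITS_LFSR2 = (40, 33, 29, 20, 17, 13, 10)  # least significant first


def gather_index(register, positions) -> int:
    """Pack the selected register bits into an integer, first position = LSB."""
    index = 0
    for shift, pos in enumerate(positions):
        index |= (register[pos] & 1) << shift
    return index


def keystream_bit(sbox, lfsr2, lfsr3) -> int:
    """S-box entry selected by LFSR2, XORed with b[202] and c[28]."""
    return sbox[gather_index(lfsr2, INDEX_BITS_LFSR2)] ^ lfsr2[202] ^ lfsr3[28]


# Toy state: only b[40] and b[10] are set, so the index is 1 + 64 = 65.
sbox = bytearray(i % 2 for i in range(128))
lfsr2 = bytearray(203)
lfsr2[40] = lfsr2[10] = 1
lfsr3 = bytearray(29)
index = gather_index(lfsr2, INDEX_BITS_LFSR2)
bit = keystream_bit(sbox, lfsr2, lfsr3)
```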

S-Box Update Step

Figure 5: The S-box update process. Two values from the S-box are swapped. The indices for these values are generated from registers in LFSR1

After each bit of output, the state of the cipher is advanced. The first step in the process, depicted in Figure 5, is to swap two entries in the S-box. The indices of these two entries are selected using bits from LFSR1. The first index is selected using the following bits, listed from least significant to most significant: a[26], a[22], a[19], a[10], a[9], a[7], a[2]. The second index is selected using the bits a[126], a[123], a[118], a[114], a[109], a[103], a[100].
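
A minimal sketch of the swap step, using a toy LFSR1 state rather than real register contents. The helper names `pack` and `swap_step` are hypothetical, and the second index is assumed to use the same least-significant-first ordering as the first, as in the reference implementation.

```python
INDEX1_BITS = (26, 22, 19, 10, 9, 7, 2)            # least significant first
INDEX2_BITS = (126, 123, 118, 114, 109, 103, 100)  # least significant first


def pack(register, positions) -> int:
    """Combine the selected register bits into a 7-bit index."""
    return sum((register[p] & 1) << k for k, p in enumerate(positions))


def swap_step(sbox: bytearray, lfsr1) -> tuple[int, int]:
    """Swap the two S-box entries addressed by LFSR1 and return the indices."""
    i, j = pack(lfsr1, INDEX1_BITS), pack(lfsr1, INDEX2_BITS)
    sbox[i], sbox[j] = sbox[j], sbox[i]
    return i, j


sbox = bytearray(i % 2 for i in range(128))
lfsr1 = bytearray(204)
lfsr1[26] = 1      # first index becomes 1
lfsr1[103] = 1     # second index becomes 32
swapped = swap_step(sbox, lfsr1)
```

Since the step only ever swaps entries, the S-box always holds exactly 64 zeros and 64 ones; only their positions change.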

Shift-Register Update

Following the S-box update, the three shift registers are each stepped once between keystream outputs. LFSRs 1 and 3 operate entirely linearly, following the process described for key expansion. Shift register 2 has a non-linear component that takes input from shift register 1. Note that this input is computed before LFSR1 is advanced. This step is depicted in Figure 6. In the equation that follows, ∧ denotes a binary AND and ⊕ denotes a binary XOR. The non-linear input from LFSR1 can be described by the equation (a[20] ∧ a[30]) ⊕ (a[55] ∧ (a[20] ⊕ a[30])). This value is XORed into the linear feedback of LFSR2.
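
Written with explicit operators, the non-linear input computed by the reference implementation is (a[20] AND a[30]) XOR (a[55] AND (a[20] XOR a[30])). The sketch below checks by exhaustive truth table that this expression equals the three-input majority function. That observation is added here for illustration and is not a claim made in the original analysis; the function names are hypothetical.

```python
from itertools import product


def nonlinear_input(a20: int, a30: int, a55: int) -> int:
    """(a[20] AND a[30]) XOR (a[55] AND (a[20] XOR a[30]))."""
    return (a20 & a30) ^ (a55 & (a20 ^ a30))


def majority(x: int, y: int, z: int) -> int:
    """Return 1 when at least two of the three input bits are set."""
    return 1 if x + y + z >= 2 else 0


# Evaluate the expression for all eight input combinations.
truth_table = {bits: nonlinear_input(*bits) for bits in product((0, 1), repeat=3)}
```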

Figure 6: The non-linear update step for LFSR2 used during cipher operation. A non-linear combination of bits from LFSR1 is fed into the input of (the now non-linear) LFSR2
fa0b1aa8ff08df4ad254fde218d0b7a8e776d2cda27c4310af338a7b022b6559
a620434b2efc48ec46b5a618e269936cef984ad98cb33d5a656ae1a9eef7362f
fe7327bf67e37cb1a581868010034a4009c298ea73977deed4bb0fa781dace0f
22585b1bc8a43130c2cb4ab03ed7cf2eae20a6364caecbefa31945b7a34f28ff
25a1f1294a51ea92403605b93f7b808b489206d87561ca04cb7b6d3fdc98fc7e
c54837d0b856205bd4ae01887aae9178f55f16e0e1a1e1ff59bd18dbc8a3dd82
639cdfab319af1f9d064ac08f03f990a4a0702ccc07b47538751ce6e5214c95b
891913a89896932dc04caae096e46ebcf8ffbb7c55fdfe7fc5f272ed9354a76c
bfb8d13fcb64e3d09de2850b47d64492dbfc7bba58766546c1511f1fa59a64c9
4ff2a67b094bcc56df1aec016191465be4e7de348360fd307d1929dc9cbab39f
b1e28bc8720303326946ec69d8ad6c90b572e177d562bbe769abaf1aad3d9e1a
8f2cbc93b7cd5cdc54e1670105c3da682bae0b70bc6bc4b0c0c18ab5c40be9c4

Figure 7: Samples using this cipher (SHA256 hashes)

def bytes_to_bits(in_bytes):
    """Expand bytes into a bytes object of 0/1 values, least significant bit first."""
    out = b''
    for b in in_bytes:
        for i in range(8):
            t = (b >> i) & 1
            out += t.to_bytes(1, 'little')
    return out


def bits_to_bytes(in_bits):
    """Pack a sequence of 0/1 values (least significant bit first) back into bytes."""
    if len(in_bits) % 8 != 0:
        raise ValueError('number of bits must be a multiple of 8')

    out = b''
    for i in range(len(in_bits) // 8):
        byte = 0
        for j in range(8):
            byte += in_bits[8 * i + j] << j
        out += byte.to_bytes(1, 'little')

    return out

class kimsuky_cipher:
    def __init__(self, key, iv):
        '''
        This is the key expansion algorithm. First fill a buffer with
        [ 128-bit key | 128-bits of zero | 32-bits of IV ]
        '''
        # 128-bit key and 32-bit IV, as described in the text
        if len(key) != 16 or len(iv) != 4:
            raise ValueError('expected a 16-byte key and a 4-byte IV')

        # LFSR registers and S-box, one byte per bit. These are created per
        # instance so that separate cipher objects do not share state.
        self.r1 = bytearray(204)
        self.r2 = bytearray(203)
        self.r3 = bytearray(29)
        self.sbox = bytearray(128)

        key = bytes_to_bits(key)
        iv = bytes_to_bits(iv)

        t = key + bytes(128) + iv

        # bit 0 -> lfsr1 bit 0
        # bit 1 -> lfsr2 bit 0
        # bit 2 -> lfsr3 bit 0
        # bit 3 -> lfsr1 bit 1
        # etc.
        for i, b in enumerate(t):
            j = i // 3
            if i % 3 == 0:
                self.r1[j % 204] = b
            elif i % 3 == 1:
                self.r2[j % 203] = b
            else:
                self.r3[j % 29] = b

        # S-box initialized as 0 1 0 1 ...
        for i in range(128):
            self.sbox[i] = i % 2

        # run out each LFSR 10000x
        for i in range(10000):
            b = self.r1[0] ^ self.r1[1] ^ self.r1[10] ^ self.r1[73]
            self.r1[0:-1] = self.r1[1:]
            self.r1[-1] = b

            b = self.r2[0] ^ self.r2[6] ^ self.r2[18] ^ self.r2[135]
            self.r2[0:-1] = self.r2[1:]
            self.r2[-1] = b

            b = self.r3[0] ^ self.r3[2]
            self.r3[0:-1] = self.r3[1:]
            self.r3[-1] = b

    def keystream(self, n_bytes):
        '''Return n_bytes of keystream, generated one bit at a time.'''
        n_bits = 8 * n_bytes
        out_bits = bytearray(n_bits)
        for i in range(n_bits):
            out_bits[i] = self.next_bit()
        return bits_to_bytes(out_bits)

    def next_bit(self):
        # build a 7-bit index from r2 and use it to pick an entry out of the sbox
        index_from_r2 = (self.r2[40]) | (self.r2[33] << 1) | (self.r2[29] << 2) | (self.r2[20] << 3)
        index_from_r2 |= (self.r2[17] << 4) | (self.r2[13] << 5) | (self.r2[10] << 6)

        s_bit = self.sbox[index_from_r2]

        # combine with output from r2 and r3 to make the keystream bit
        mask_bit = s_bit ^ self.r3[28] ^ self.r2[202]

        self.__advance_state()

        return mask_bit

    def __advance_state(self):

        # build two 7-bit indices from r1 and use them to select two sbox locations
        b1 = (self.r1[26]) | (self.r1[22] << 1) | (self.r1[19] << 2) | (self.r1[10] << 3)
        b1 |= (self.r1[9] << 4) | (self.r1[7] << 5) | (self.r1[2] << 6)

        b2 = (self.r1[126]) | (self.r1[123] << 1) | (self.r1[118] << 2) | (self.r1[114] << 3)
        b2 |= (self.r1[109] << 4) | (self.r1[103] << 5) | (self.r1[100] << 6)

        # swap the two locations
        self.sbox[b1], self.sbox[b2] = self.sbox[b2], self.sbox[b1]

        # lfsr 2 has a non-linear input from r1 (computed before r1 is stepped)
        b = self.r1[20] ^ self.r1[30]
        b &= self.r1[55]
        b ^= self.r1[20] & self.r1[30]
        b ^= self.r2[0] ^ self.r2[6] ^ self.r2[18] ^ self.r2[135]
        self.r2[0:-1] = self.r2[1:]
        self.r2[-1] = b

        # step out lfsr 1 and 3 as normal
        b = self.r1[0] ^ self.r1[1] ^ self.r1[10] ^ self.r1[73]
        self.r1[0:-1] = self.r1[1:]
        self.r1[-1] = b

        b = self.r3[0] ^ self.r3[2]
        self.r3[0:-1] = self.r3[1:]
        self.r3[-1] = b

Figure 8: Python implementation of cipher

This blog series is brought to you by 无忧传媒 DarkLabs. Our DarkLabs is an elite team of security researchers, penetration testers, reverse engineers, network analysts, and data scientists, dedicated to stopping cyber attacks before they occur.

This article is for informational purposes only, its content may be based on employees' independent research, and does not represent the position or opinion of 无忧传媒. Furthermore, 无忧传媒 disclaims all warranties in the article's content, does not recommend/endorse any third-party products referenced therein, and any reliance and use of the article is at the reader's sole discretion and risk.
