FCSC 2021 - Trappy Skippy

Posted on 03 May 2021, 18:00 in WriteUps

France CyberSecurity Challenge (FCSC) 2021

FCSC logo

The french national cybersecurity agency (ANSSI) has organised a CTF called the France CyberSecurity Challenge (FCSC for short) which goal is to select the team that will represent France during the European CyberSecurity Challenge on the 28th of September, 2021.

Description of Trappy Skippy

Trappy Skippy was a challenge in the crypto category. Its description was:

Skippy description

This translates to "Skippy the great guru challenges you to decrypt his flag". This is a reference to a skit by Les Inconnus. This doesn't have any impact on the challenge itself.

We are given two files:

  • A Python script, skippy.py;
  • The flag to decrypt, flag.txt.enc;
  • And a plaintext-ciphertext pair, skippy.txt, skippy.txt.enc.

Here is the content of skippy.py, that I will explain in summary just below:

`skippy.py` source code
import os

class Skippy:
    Sbox = [ 
        0x81, 0x3f, 0xab, 0x3d, 0xa4, 0xb4, 0x31, 0x9e,
        0xba, 0xee, 0x90, 0xec, 0x9f, 0x50, 0x85, 0x62,
        0xb8, 0xde, 0xa2, 0xf4, 0x08, 0x78, 0x0a, 0xc5,
        0xb3, 0x15, 0xa9, 0x27, 0x96, 0xac, 0x33, 0x11,
        0xa0, 0xdc, 0x05, 0x7b, 0xaf, 0xdd, 0xad, 0x7a,
        0x14, 0x9a, 0xb1, 0xaa, 0x29, 0x3b, 0x03, 0x23,
        0x99, 0x82, 0x3c, 0x98, 0x2b, 0x13, 0xa6, 0x21,
        0x2d, 0x63, 0x88, 0x51, 0x10, 0xc7, 0x9d, 0xf5,
        0x4c, 0x58, 0xf3, 0xd5, 0x65, 0x06, 0xc0, 0x91,
        0x5c, 0xd0, 0x76, 0xfa, 0xe6, 0x1a, 0xfc, 0x2a,
        0x5e, 0x6f, 0xd3, 0xf8, 0x6b, 0x97, 0x59, 0x18,
        0xf1, 0x68, 0xc3, 0x6a, 0xe8, 0x2e, 0x4d, 0x1c,
        0xd1, 0x5f, 0x44, 0x47, 0xcc, 0x00, 0xe4, 0xbf,
        0x7e, 0xcf, 0xe9, 0xcd, 0x7f, 0x04, 0x55, 0x89,
        0x7c, 0x40, 0xdb, 0x5a, 0x7d, 0x34, 0x67, 0x2c,
        0xe3, 0x5d, 0x46, 0xe2, 0xfe, 0x02, 0x69, 0x32,
        0x52, 0x87, 0xf7, 0x20, 0x79, 0x1d, 0x4b, 0x1f,
        0x09, 0x8e, 0x39, 0x1b, 0xa8, 0x0e, 0x0f, 0x0c,
        0xae, 0xbe, 0x0b, 0x19, 0x0d, 0x83, 0xb0, 0x26,
        0x48, 0x22, 0xef, 0x12, 0x49, 0x37, 0xf6, 0x92,
        0xb6, 0x94, 0x86, 0x01, 0x25, 0x3e, 0x17, 0x24,
        0xed, 0xb7, 0x60, 0xb5, 0x61, 0x35, 0xc6, 0x2f,
        0xdf, 0x3a, 0x4a, 0x38, 0x53, 0x8a, 0xc4, 0x07,
        0x84, 0xbc, 0x9c, 0x8c, 0xb2, 0x16, 0x80, 0x9b,
        0xbd, 0x71, 0x30, 0x73, 0xca, 0xe1, 0x75, 0x74,
        0xbb, 0xf0, 0x36, 0xea, 0xfd, 0x4e, 0xff, 0x64,
        0x8b, 0x4f, 0x1e, 0xc2, 0x70, 0x56, 0x72, 0x54,
        0xa5, 0xd6, 0xa7, 0xd4, 0x6d, 0xc9, 0x45, 0xf9,
        0xa3, 0xd8, 0xa1, 0xda, 0xd7, 0xeb, 0xe7, 0xd9,
        0x28, 0x41, 0x8d, 0x5b, 0xe0, 0xfb, 0xc8, 0x6e,
        0x95, 0xce, 0x8f, 0x43, 0xd2, 0xcb, 0x77, 0x6c,
        0xb9, 0xf2, 0x93, 0x57, 0xe5, 0xc1, 0x42, 0x66,
    ]

    def __init__(self, key):
        self._key = key + key[:2]

    def _g(self, k, w):
        g1 = w >> 8
        g2 = w & 0xff
        g3 = self.Sbox[g2 ^ self._key[(4 * k + 0) % 10]] ^ g1
        g4 = self.Sbox[g3 ^ self._key[(4 * k + 1) % 10]] ^ g2
        g5 = self.Sbox[g4 ^ self._key[(4 * k + 2) % 10]] ^ g3
        g6 = self.Sbox[g5 ^ self._key[(4 * k + 3) % 10]] ^ g4
        return (g5 << 8) ^ g6

    def _skip(self, b):
        w1 = (b[0] << 8) ^ b[1]
        w2 = (b[2] << 8) ^ b[3]
        w3 = (b[4] << 8) ^ b[5]
        w4 = (b[6] << 8) ^ b[7]

        k = 0
        for t in range(2):
            for i in range(8):
                gw1 = self._g(k, w1)
                w1, w2, w3, w4 = gw1 ^ w4 ^ (k + 1), gw1, w2, w3
                k += 1
            for i in range(8):
                gw1 = self._g(k, w1)
                w1, w2, w3, w4 = w4, gw1, w1 ^ w2 ^ (k + 1), w3
                k += 1

        return bytes([
            w1 >> 8, w1 & 0xff,
            w2 >> 8, w2 & 0xff,
            w3 >> 8, w3 & 0xff,
            w4 >> 8, w4 & 0xff,
        ])

    def encrypt(self, m):
        if len(m) % 8:
            m += b"\x00" * (8 - len(m) % 8)
        return b"".join(self._skip(m[i:i+8]) for i in range(0, len(m), 8))

k = os.urandom(8)
gourou = Skippy(k)

m = open("skippy.txt", "rb").read()
c = gourou.encrypt(m)
open("skippy.txt.enc", "wb").write(c)

m = open("flag.txt", "rb").read()
c = gourou.encrypt(m)
open("flag.txt.enc", "wb").write(c)

First, the script generates a 64-bit key and initialize a Skippy object with it. Then it encrypts the file skippy.txt and does the same for the file flag.txt. Let's look a little closer to the components of Skippy.

The underlying blockcipher, _skip, operates on 64-bit block. To encrypt a message of arbitrary length, it first pads with zero and then uses ECB mode.

_skip is working on a state made of four 16-bit words (for a total 64-bit state, as said before). Two "big rounds" of transformations are applied to the state. Each "big round" is made from two times eight smaller rounds. In these small rounds a non-linear transformation, _g, on a single word is applied; then a linear permutation (that can be seen as a binary 4x4 matrix) is applied; finally a round constant (here, k+1 with k a counter incremented each small round) is xored. There are two types of small rounds that only differ on their linear permutation.

The key is only involved during the computation of _g. In fact, _g uses an underlying 8-bit Sbox implemented as a table lookup and expands it to a 16-bit Sbox using a Feistel-like structure. Here is a the diagram for the computation of _g:

16-bit keyed feistel-like Sbox

The bytes \((K_{i}, K_{i+1}, K_{i+2}, K_{i+3})\) are four consecutive bytes (wrapping around) of the key expanded to a length of ten bytes. The key expansion from the initial eight to ten bytes is done by taking the initial key and concatenating its first two bytes to itself.

To summarize, Skippy is a Substitution-Permutation Network (SPN) with a partial non-linear layer (_g is not applied to the whole state). Moreover, the 16-bit Sbox is constructed from a 4-round feistel and the key addition is done during the application of the Sbox. The non-linear layer looks like the one in Robin, a block cipher from Grosso, Leurent, Standaert and Varici. Skippy was maybe also inspired by other block cipher that uses a partial non-linear layer like LowMC.

Edit: Skippy was in fact Skipjack, a block cipher made by the NSA and publicly released in 1998. The Sbox was changed and the key shortened to 64 bits.

Finally, skippy.txt is 14-block long and, together with skippy.txt.enc, they allow us to know 14 plaintext-ciphertext pairs.

Finding the flaw

The design of Skippy seems to be very close to existing "well known" designs: if there is a generic attack against this construction, maybe it is detailed in a paper somewhere and we just have to find and apply it. But here, it's not the case, so are we really expected to beat the international community of cryptographers? In fact, if there is one thing to remember about cryptography is that the devil is in the details. With a small change (like for example, a different lookup-table Sbox), a design can fall apart really quickly.

In our case, even if there are known attacks against the "partial non-linear layer" approach, those are often differential or algebraic attacks and are requiring a lot more data than just 14 plaintext-ciphertext pairs. In fact, this turns out to be the biggest hint for me: there really aren't a lot of attacks that work with so few data.

One of those is the meet-in-the-middle attack. This family of attacks consists in guessing only part of the key such that it is possible to compute part of the full encryption up to, say, the \(r\)-th round over the total \(n\) rounds. Then, we store the result for every partial-key guess and we compute the decryption of \(n-r\) rounds by guessing the remaining part of the key. When there is a collision (a decrypted "middle-state" that is already is our precomputed set of encrypted states), we have a candidate for our full key by assembling the two part key.

To detect if Skippy is vulnerable to this attack, I first needed to modify the encryption algorithm to allow doing only one of the two big rounds.

Meet-in-the-middle attacks require to independantly guess two "parts" of the key. The easy case is when some bits of the key are not used in the first half of the encryption/decryption. To find those bits, I used the following script:

Finding useless bits of the key
import os
from Crypto.Util.strxor import strxor
for b in range(64):
    i = b // 8
    j = b % 8
    mod = [0]*i + [1 << j] + [0]*(7-i)

    res = b'\0'*8
    for l in range(300):
        k0 = os.urandom(8)
        k1 = strxor(k0, bytes(mod)) # New key, only differs on bit b
        g0 = Skippy(k0)
        g1 = Skippy(k1)

        m = os.urandom(8)
        c0 = g0.encrypt(m, half=True)
        c1 = g1.encrypt(m, half=True)

        # res will be 1 on every bits that changes between c0 and c1
        res = bytes([a|b for a, b in zip(strxor(c0, c1), res)])

    if res != b'\xff'*8:
        print(bytes(mod).hex())
        print(res.hex())
        print()

The result is interesting:

Result of the previous script
0200000000000000
bfbfbfbfbfbfbfbf

0002000000000000
bfbfbfbfbfbfbfbf

0000020000000000
bfbfbfbfbfbfbfbf

0000000200000000
bfbfbfbfbfbfbfbf

0000000002000000
bfbfbfbfbfbfbfbf

0000000000020000
bfbfbfbfbfbfbfbf

0000000000000200
bfbfbfbfbfbfbfbf

0000000000000002
bfbfbfbfbfbfbfbf

When two keys are different on only the second least-significant-bit (LSB) of any byte, then the resulting "half-encrypted" ciphertext are always the same on the second most-significant-bit (MSB) of every byte. We will call these bits of the ciphertext \(b = 6 + 8i, 0 < i < 8\), the attack bits.

What is even more interesting is that this property is in fact also true for the fully encrypted (two big rounds) ciphertext. Thus, from now on, we can leave the meet-in-the-attack idea behind and focus on this behaviour.

Finding all equivalent keys

Thanks to the previous experiment, we have found pairs of keys that are equivalent for the attack bits of the ciphertext. This is very unusual for a "good" block cipher and should already be a big red flag. Unfortunately, it is not yet sufficient to have a powerful enough attack. What we would like is to find even bigger sets of equivalent keys. The first thing to try, is to find out if two keys that differ on any subset of those eight bits are also equivalent. The answer is yes:

Finding more equivalent keys
[...]
0000020202000000
bfbfbfbfbfbfbfbf

0200020202000000
bfbfbfbfbfbfbfbf

0002020202000000
bfbfbfbfbfbfbfbf

0202020202000000
bfbfbfbfbfbfbfbf
[...]

This tells us that a key is equivalent to \(2^{8} - 1\) other keys. To increase the size of those sets even more, I tried to find other differences on a single byte that produces the same result:

Finding even more equivalent keys
import os
from Crypto.Util.strxor import strxor
for b in range(256):
    mod = [b] + 7*[0]

    res = b'\0'*8
    for l in range(300):
        k0 = os.urandom(8)
        k1 = strxor(k0, bytes(mod)) # New key, only differs by b on the first byte
        g0 = Skippy(k0)
        g1 = Skippy(k1)

        m = os.urandom(8)
        c0 = g0.encrypt(m, half=False)
        c1 = g1.encrypt(m, half=False)

        # res will be 1 on every bits that changes between c0 and c1
        res = bytes([a|b for a, b in zip(strxor(c0, c1), res)])

    if res != b'\xff'*8:
        print(bytes(mod).hex())
        print(res.hex())
        print()

This leads to the following result:

Result of the previous script
0000000000000000
0000000000000000

0200000000000000
bfbfbfbfbfbfbfbf

1800000000000000
bfbfbfbfbfbfbfbf

1a00000000000000
bfbfbfbfbfbfbfbf

2800000000000000
bfbfbfbfbfbfbfbf

2a00000000000000
bfbfbfbfbfbfbfbf

3000000000000000
bfbfbfbfbfbfbfbf

3200000000000000
bfbfbfbfbfbfbfbf

8d00000000000000
bfbfbfbfbfbfbfbf

8f00000000000000
bfbfbfbfbfbfbfbf

9500000000000000
bfbfbfbfbfbfbfbf

9700000000000000
bfbfbfbfbfbfbfbf

a500000000000000
bfbfbfbfbfbfbfbf

a700000000000000
bfbfbfbfbfbfbfbf

bd00000000000000
bfbfbfbfbfbfbfbf

bf00000000000000
bfbfbfbfbfbfbfbf

By doing this, we retrieve \(16 = 2^{4}\) constants (\(\mathcal{S}_0 = \{2, 24, 26, 40, 42, 48, 50, 141, 142, 149, 151, 165, 167, 189, 191\}\) such that any two keys which difference on each byte is in this set of constants are equivalent.

By defining \(\mathcal{S}_1 = \{0, 1, 4, 5, 8, 9, 12, 13, 64, 65, 68, 69, 72, 73, 76, 77\}\), each equivalence set can be seen as an affine space \(\mathcal{K}_u = u \oplus \mathcal{S}_0^8\) where \(u \in \mathcal{S}_1^8\) will be called the representative of the set.

This leads to \((\#\mathcal{S}_1)^8 = 2^{32}\) sets of \((\#\mathcal{S}_0)^8 = 2^{32}\) keys each. This partition is sufficient to attack Skippy efficiently and we will explain how in the next section.

Attacking Skippy

The attack will be conducted in two phases:

  • First, we find via bruteforce the representative \(u_k\) such that the key used to encrypt skippy.txt is in \(\mathcal{K}_u\);
  • Then, we find the right key \(k\) among the equivalence set \(u \oplus \mathcal{S}_0\), again using bruteforce.

Phase 1 - Finding the right equivalence set

We will go over all \(u \in \mathcal{S}_1^8\) and encrypt skippy.txt with it. For each block, the probability that the value of the eight attack bits are all equal to the one in the corresponding block in skippy.txt.enc is \(2^{-8}\). Since there are 14 blocks, the overall probability of a collision "by chance" is thus \(\left(2^{-8}\right)^{14} = 2^{-112}\). But we know that if \(u = u_k\) (that is, we have found the representative of the right equivalence set of keys) this property will hold for every block.

Phase 2 - Finding the right key inside the equivalence set

Knowing \(u_k\) allows us to bruteforce the key among the \(2^{32}\) keys in \(\mathcal{K}_{u_k}\). We go over every \(k' \in \mathcal{K}_{u_k}\), that is every \(k' = u_k \oplus v\) for \(v \in \mathcal{S}_0^8\). With this key \(k'\), we encrypt skippy.txt and check if the result is equal to skippy.txt.enc. If it is, then with overwhelmingly high probability, the right key \(k' = k\) is found.

Effciently implementing the attack

Python was found to be too slow to do the bruteforce needed. To go faster, I reimplemented Skippy in C.

C source code of Skippy
uint8_t Sbox[256] = { 
        0x81, 0x3f, 0xab, 0x3d, 0xa4, 0xb4, 0x31, 0x9e,
        0xba, 0xee, 0x90, 0xec, 0x9f, 0x50, 0x85, 0x62,
        0xb8, 0xde, 0xa2, 0xf4, 0x08, 0x78, 0x0a, 0xc5,
        0xb3, 0x15, 0xa9, 0x27, 0x96, 0xac, 0x33, 0x11,
        0xa0, 0xdc, 0x05, 0x7b, 0xaf, 0xdd, 0xad, 0x7a,
        0x14, 0x9a, 0xb1, 0xaa, 0x29, 0x3b, 0x03, 0x23,
        0x99, 0x82, 0x3c, 0x98, 0x2b, 0x13, 0xa6, 0x21,
        0x2d, 0x63, 0x88, 0x51, 0x10, 0xc7, 0x9d, 0xf5,
        0x4c, 0x58, 0xf3, 0xd5, 0x65, 0x06, 0xc0, 0x91,
        0x5c, 0xd0, 0x76, 0xfa, 0xe6, 0x1a, 0xfc, 0x2a,
        0x5e, 0x6f, 0xd3, 0xf8, 0x6b, 0x97, 0x59, 0x18,
        0xf1, 0x68, 0xc3, 0x6a, 0xe8, 0x2e, 0x4d, 0x1c,
        0xd1, 0x5f, 0x44, 0x47, 0xcc, 0x00, 0xe4, 0xbf,
        0x7e, 0xcf, 0xe9, 0xcd, 0x7f, 0x04, 0x55, 0x89,
        0x7c, 0x40, 0xdb, 0x5a, 0x7d, 0x34, 0x67, 0x2c,
        0xe3, 0x5d, 0x46, 0xe2, 0xfe, 0x02, 0x69, 0x32,
        0x52, 0x87, 0xf7, 0x20, 0x79, 0x1d, 0x4b, 0x1f,
        0x09, 0x8e, 0x39, 0x1b, 0xa8, 0x0e, 0x0f, 0x0c,
        0xae, 0xbe, 0x0b, 0x19, 0x0d, 0x83, 0xb0, 0x26,
        0x48, 0x22, 0xef, 0x12, 0x49, 0x37, 0xf6, 0x92,
        0xb6, 0x94, 0x86, 0x01, 0x25, 0x3e, 0x17, 0x24,
        0xed, 0xb7, 0x60, 0xb5, 0x61, 0x35, 0xc6, 0x2f,
        0xdf, 0x3a, 0x4a, 0x38, 0x53, 0x8a, 0xc4, 0x07,
        0x84, 0xbc, 0x9c, 0x8c, 0xb2, 0x16, 0x80, 0x9b,
        0xbd, 0x71, 0x30, 0x73, 0xca, 0xe1, 0x75, 0x74,
        0xbb, 0xf0, 0x36, 0xea, 0xfd, 0x4e, 0xff, 0x64,
        0x8b, 0x4f, 0x1e, 0xc2, 0x70, 0x56, 0x72, 0x54,
        0xa5, 0xd6, 0xa7, 0xd4, 0x6d, 0xc9, 0x45, 0xf9,
        0xa3, 0xd8, 0xa1, 0xda, 0xd7, 0xeb, 0xe7, 0xd9,
        0x28, 0x41, 0x8d, 0x5b, 0xe0, 0xfb, 0xc8, 0x6e,
        0x95, 0xce, 0x8f, 0x43, 0xd2, 0xcb, 0x77, 0x6c,
        0xb9, 0xf2, 0x93, 0x57, 0xe5, 0xc1, 0x42, 0x66,
};

uint16_t g(uint8_t key[10], uint8_t k, uint16_t state)
{
        uint16_t g1 = state >> 8;
        uint16_t g2 = state & 0xff;
        uint16_t g3 = Sbox[g2 ^ key[(4 * k + 0) % 10]] ^ g1;
        uint16_t g4 = Sbox[g3 ^ key[(4 * k + 1) % 10]] ^ g2;
        uint16_t g5 = Sbox[g4 ^ key[(4 * k + 2) % 10]] ^ g3;
        uint16_t g6 = Sbox[g5 ^ key[(4 * k + 3) % 10]] ^ g4;
        return (g5 << 8) ^ g6;
}

uint64_t skip(uint64_t state, uint8_t key[10])
{
        uint16_t w1 = state >> 48;
        uint16_t w2 = (state >> 32) & 0xFFFF;
        uint16_t w3 = (state >> 16) & 0xFFFF;
        uint16_t w4 = state & 0xFFFF;
        uint16_t gw1;
        uint16_t tmp;

        uint8_t k = 0;
        for (int nr = 0; nr < 2; nr++) {
            for (int i = 0; i < 8; i++) {
                gw1 = g(key, k, w1);
                w1 = gw1 ^ w4 ^ (k + 1);
                w4 = w3;
                w3 = w2;
                w2 = gw1;
                k += 1;
            }
            for (int i = 0; i < 8; i++) {
                gw1 = g(key, k, w1);
                tmp = w3;
                w3 = w1 ^ w2 ^ (k+1);
                w2 = gw1;
                w1 = w4;
                w4 = tmp;
                k += 1;
            }
        }

        return ((uint64_t)w1) << 48 | ((uint64_t)w2) << 32 |
        ((uint64_t)w3) << 16 | w4;
}

For both phases, we need to iterate over a cartesian product (\(\mathcal{S}_0^8\) or \(\mathcal{S}_1^8\)). To do so efficiently, I used a Gray code. This way, going from an element to the next only cost a single xor plus the cost of incrementing the gray counter and computing the position of the next element to xor.

The following function is used to determine which position to increment, given the Gray counter c and the array mods of the radix for each coordinate, and n the number of coordinates:

C source code of Skippy
int next_increment(uint64_t *c, uint64_t *mods, int n)
{
    int k = 0;

    (*c)++;

    for (k = 0; k < n; k++) {
        if (*c % mods[k])
            break;
    }

    if (k == n)
        return -1;

    return k;

}

Decrypting the flag

When we have finally retrieved the key \(k\), the only thing to do is to decrypt the file flag.txt.enc block by block with it.

Here is the full C program computing going over around 3500000 elements in \(\mathcal{S}_{.}^8\) per second, allowing to compute the full key in a maximum of about 40 minutes.

C source code of the solution
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>

uint64_t M[14] = {
    0x546f757420626965,
    0x6e20717565207475,
    0x2064c3a97469656e,
    0x732065737420756e,
    0x20736f7563692071,
    0x7569207465207265,
    0x7469656e742c0a65,
    0x7420536b69707079,
    0x20657374206cc3a0,
    0x20706f7572206e6f,
    0x757320656e6c6576,
    0x657220746f757420,
    0x6e6f7320736f7563,
    0x697320210a000000,
};
uint64_t C[14] = {
    0x16cb13703966240d,
    0xac804cec44935a34,
    0x242a9828196d0149,
    0xc8eb8da1790bb39b,
    0xa5b7d48ee88edd73,
    0x2c7778c33fbc1d05,
    0x730d99ac64f1902c,
    0x4527a953d018a9b9,
    0xed58891a323369c7,
    0x08922a540bbfe2b3,
    0x8aef9a16072973d2,
    0x50843cc8f5bd078b,
    0x4050ee792c397926,
    0xc5defa50efde6993,
};

uint8_t Sbox[256] = { 
        0x81, 0x3f, 0xab, 0x3d, 0xa4, 0xb4, 0x31, 0x9e,
        0xba, 0xee, 0x90, 0xec, 0x9f, 0x50, 0x85, 0x62,
        0xb8, 0xde, 0xa2, 0xf4, 0x08, 0x78, 0x0a, 0xc5,
        0xb3, 0x15, 0xa9, 0x27, 0x96, 0xac, 0x33, 0x11,
        0xa0, 0xdc, 0x05, 0x7b, 0xaf, 0xdd, 0xad, 0x7a,
        0x14, 0x9a, 0xb1, 0xaa, 0x29, 0x3b, 0x03, 0x23,
        0x99, 0x82, 0x3c, 0x98, 0x2b, 0x13, 0xa6, 0x21,
        0x2d, 0x63, 0x88, 0x51, 0x10, 0xc7, 0x9d, 0xf5,
        0x4c, 0x58, 0xf3, 0xd5, 0x65, 0x06, 0xc0, 0x91,
        0x5c, 0xd0, 0x76, 0xfa, 0xe6, 0x1a, 0xfc, 0x2a,
        0x5e, 0x6f, 0xd3, 0xf8, 0x6b, 0x97, 0x59, 0x18,
        0xf1, 0x68, 0xc3, 0x6a, 0xe8, 0x2e, 0x4d, 0x1c,
        0xd1, 0x5f, 0x44, 0x47, 0xcc, 0x00, 0xe4, 0xbf,
        0x7e, 0xcf, 0xe9, 0xcd, 0x7f, 0x04, 0x55, 0x89,
        0x7c, 0x40, 0xdb, 0x5a, 0x7d, 0x34, 0x67, 0x2c,
        0xe3, 0x5d, 0x46, 0xe2, 0xfe, 0x02, 0x69, 0x32,
        0x52, 0x87, 0xf7, 0x20, 0x79, 0x1d, 0x4b, 0x1f,
        0x09, 0x8e, 0x39, 0x1b, 0xa8, 0x0e, 0x0f, 0x0c,
        0xae, 0xbe, 0x0b, 0x19, 0x0d, 0x83, 0xb0, 0x26,
        0x48, 0x22, 0xef, 0x12, 0x49, 0x37, 0xf6, 0x92,
        0xb6, 0x94, 0x86, 0x01, 0x25, 0x3e, 0x17, 0x24,
        0xed, 0xb7, 0x60, 0xb5, 0x61, 0x35, 0xc6, 0x2f,
        0xdf, 0x3a, 0x4a, 0x38, 0x53, 0x8a, 0xc4, 0x07,
        0x84, 0xbc, 0x9c, 0x8c, 0xb2, 0x16, 0x80, 0x9b,
        0xbd, 0x71, 0x30, 0x73, 0xca, 0xe1, 0x75, 0x74,
        0xbb, 0xf0, 0x36, 0xea, 0xfd, 0x4e, 0xff, 0x64,
        0x8b, 0x4f, 0x1e, 0xc2, 0x70, 0x56, 0x72, 0x54,
        0xa5, 0xd6, 0xa7, 0xd4, 0x6d, 0xc9, 0x45, 0xf9,
        0xa3, 0xd8, 0xa1, 0xda, 0xd7, 0xeb, 0xe7, 0xd9,
        0x28, 0x41, 0x8d, 0x5b, 0xe0, 0xfb, 0xc8, 0x6e,
        0x95, 0xce, 0x8f, 0x43, 0xd2, 0xcb, 0x77, 0x6c,
        0xb9, 0xf2, 0x93, 0x57, 0xe5, 0xc1, 0x42, 0x66,
};


int next_increment(uint64_t *c, uint64_t *mods, int n)
{
    int k = 0;

    (*c)++;

    for (k = 0; k < n; k++) {
        if (*c % mods[k])
            break;
    }

    if (k == n)
        return -1;

    return k;

}


uint16_t g(uint8_t key[10], uint8_t k, uint16_t state)
{
        uint16_t g1 = state >> 8;
        uint16_t g2 = state & 0xff;
        uint16_t g3 = Sbox[g2 ^ key[(4 * k + 0) % 10]] ^ g1;
        uint16_t g4 = Sbox[g3 ^ key[(4 * k + 1) % 10]] ^ g2;
        uint16_t g5 = Sbox[g4 ^ key[(4 * k + 2) % 10]] ^ g3;
        uint16_t g6 = Sbox[g5 ^ key[(4 * k + 3) % 10]] ^ g4;
        return (g5 << 8) ^ g6;
}


uint16_t g_inv(uint8_t key[10], uint8_t k, uint16_t state)
{
        uint16_t g6 = state & 0xff;
        uint16_t g5 = state >> 8;
        uint16_t g4 = g6 ^ Sbox[g5 ^ key[(4 * k + 3) % 10]];
        uint16_t g3 = g5 ^ Sbox[g4 ^ key[(4 * k + 2) % 10]];
        uint16_t g2 = g4 ^ Sbox[g3 ^ key[(4 * k + 1) % 10]];
        uint16_t g1 = g3 ^ Sbox[g2 ^ key[(4 * k + 0) % 10]];
        return (g1 << 8) ^ g2;
}


uint64_t skip(uint64_t state, uint8_t key[10])
{
        uint16_t w1 = state >> 48;
        uint16_t w2 = (state >> 32) & 0xFFFF;
        uint16_t w3 = (state >> 16) & 0xFFFF;
        uint16_t w4 = state & 0xFFFF;
        uint16_t gw1;
        uint16_t tmp;

        uint8_t k = 0;
        for (int nr = 0; nr < 2; nr++) {
            for (int i = 0; i < 8; i++) {
                gw1 = g(key, k, w1);
                w1 = gw1 ^ w4 ^ (k + 1);
                w4 = w3;
                w3 = w2;
                w2 = gw1;
                k += 1;
            }
            for (int i = 0; i < 8; i++) {
                gw1 = g(key, k, w1);
                tmp = w3;
                w3 = w1 ^ w2 ^ (k+1);
                w2 = gw1;
                w1 = w4;
                w4 = tmp;
                k += 1;
            }
        }

        return ((uint64_t)w1) << 48 | ((uint64_t)w2) << 32 |
            ((uint64_t)w3) << 16 | w4;
}


uint64_t inv_skip(uint64_t state, uint8_t key[10])
{
        uint16_t w1 = state >> 48;
        uint16_t w2 = (state >> 32) & 0xFFFF;
        uint16_t w3 = (state >> 16) & 0xFFFF;
        uint16_t w4 = state & 0xFFFF;
        uint16_t gw1;
        uint16_t tmp;
        uint16_t w1w2;

        uint8_t k = 32;
        for (int nr = 0; nr < 2; nr++) {
            for (int i = 0; i < 8; i++) {
                k -= 1;
                tmp = w3;
                w3 = w4;
                w4 = w1;
                gw1 = w2;
                w1w2 = tmp ^ (k + 1);
                w1 = g_inv(key, k, gw1);
                w2 = w1w2 ^ w1;
            }
            for (int i = 0; i < 8; i++) {
                k -= 1;
                tmp = w1;
                gw1 = w2;
                w2 = w3;
                w3 = w4;
                w4 = tmp ^ gw1 ^ (k + 1);
                w1 = g_inv(key, k, gw1);
            }
        }

        return ((uint64_t)w1) << 48 | ((uint64_t)w2) << 32 |
            ((uint64_t)w3) << 16 | w4;
}


void print_state(uint64_t state)
{
    for (int i = 7; i > -1; i--) {
        printf("%x", ((uint8_t *)&state)[i]);
    }
    printf("\n");
}


int bruteforce(uint8_t full_key[10], uint8_t key_space[16], int phase)
{



    uint64_t masked_C[14];
    for (int i = 0; i < 14; i++) {
        masked_C[i] = C[i] & 0x4040404040404040;
    }

    uint64_t count = 0;
    uint64_t mods[8] = {};
    uint64_t tmp = 16;
    // Initialize mods array for Gray code
    mods[0] = 16;
    for (int i = 1; i < 8; i++) {
        mods[i] = mods[i - 1]*16;
    }

    int gray_state[8] = {};

    uint64_t new_c, new_m;
    uint64_t mask = 0x4040404040404040;
    uint64_t n = 0;

    uint8_t full_key_ref[10] = {};
    for (int i = 0; i < 10; i++) {
        full_key_ref[i] = full_key[i];
    }



    for (int k = 0; k != -1; k = next_increment(&count, mods, 8)) {
        n += 1;
        if (n % (1 << 22) == 0) {
            printf("\rn: %lu", n);
        }

        if (phase == 1) {
            int i;
            for (i = 0; i < 14; i++) {
                new_c = skip(M[i], full_key);
                if ((new_c & mask) != masked_C[i]) {
                    break;
                }
            }
            // Equal each time
            if (i == 14) {
                printf("\r[Phase 1] Found after %lu it.\n", n);
                for (int j = 0; j < 10; j++) {
                    printf("%d ", full_key[j]);
                }
                printf("\n");
                return 0;
            }
        } else if (phase == 2) {
            if (skip(M[0], full_key) == C[0]) {
                printf("\r[Phase 2] Found after %lu it.!\n", n);
                for (int j = 0; j < 10; j++) {
                    printf("%d ", full_key[j]);
                }
                printf("\n");
                return 0;
            }
        } else {
            printf("Error, wrong phase number: %d\n", phase);
            return -1;
        }

        // Update key
        gray_state[k] = (gray_state[k] + 1)%16;
        if (phase == 1) {
            full_key[k] = key_space[gray_state[k]];
        } else {
            full_key[k] = full_key_ref[k] ^ key_space[gray_state[k]];
        }
        // Don't forget the last two bytes
        if (k == 0 || k == 1) full_key[8 + k] = full_key[k];
    }

    return -1;
}


void main()
{
    setvbuf(stdout, NULL, _IONBF, 0);


    uint8_t key_space0[16] = {0, 1, 4, 5, 8, 9, 12, 13, 64, 65, 68,
        69, 72, 73, 76, 77};
    uint8_t key_space1[16] = {0, 2, 24, 26, 40, 42, 48, 50, 141, 142,
        149, 151, 165, 167, 189, 191};

    uint8_t full_key[10] = {};

    bruteforce(full_key, key_space0, 1);
    bruteforce(full_key, key_space1, 2);


    uint64_t flag_enc[9] = {
        0xeb830947b84dcba9,
        0x6ff801b9ca178728,
        0xf6bf6dfde47562f7,
        0xf6d7573f85889e32,
        0xc398eabf0c9b97f5,
        0x87d19a12f2f65cf2,
        0xe5f412f780940217,
        0x4b28a79831751bad,
        0xdcceadd8acadbfa5,
    };
    uint64_t flag[9] = {};

    for (int i = 0; i < 9; i++) {
        flag[8-i] = inv_skip(flag_enc[i], full_key);
    }

    for (int i = 0; i < 9*8; i++) {
        printf("%c", ((char *)flag)[9*8-1-i]);
    }


    return;
}