Archive for September, 2018

esthlos-v Part 2: cl-keccak

Saturday, September 15th, 2018

Get the latest vpatch here:

This post introduces Keccak into the refactored esthlos-v. The inner workings of Keccak have already been well explained, including operations at the bit-level, so I'm not going to beat a dead horse. Instead, let's hit the broad strokes of how the thing works, pointing out quirks of implementation.

Oh, and please do point out all the inefficiencies and such that you see. I'm aware of a few optimizations which are begging to be made, but I'm sure there are more in plain right.

First things first. The available knobs:

;; the number of bits in a byte, arch-specific
(defconstant +bits-in-byte+ 8)

;; the keccak L parameter, takes a value in {1,2,3,4,5,6}
(defconstant +keccak_L+ 6)

;; the number of bits absorbed into the sponge on each pass
(defconstant +bitrate+ 1344)

;; the desired number of output bits
(defconstant +output-bits+ 512)

My guess is that you probably shouldn't touch +bits-in-byte+. The Keccak-specific guts of the program theoretically will work on alien, non-octet-based arch, but the I/O will break. Everything else should be self-explanatory: keep +keccak_L+ a valid value, ensure that the +bitrate+ is less than the Keccak width, and keep +output-bits+ divisible by 16 (for the hex output).

Now for the internals. Upon being called, the first action is to look up the input file and create a corresponding bit vector. Immediately we have a challenge of interpretation: The file stored on disk is byte-addressed, so there's no meaning to "the first bit of the the byte". All we can ask for is a the first byte of the file, and we must make a decision of how to store that byte as a vector of bits to carry out the Keccak operations. According to the team Keccak website, the convention is to represent each byte as little-endian vector of eight bits, left-concatenating the vectors as we read off the bytes of the file. You might expect such crucial information, which very much does impact the results of the hash, to be standardized by the standard. Well, I guess you might expect a lot of things. But I'm using little-endian representation, and below you can see how:

(defun integer-to-bit-vector (n)
  (labels ((bit-array-iter (n array)
             (if (zerop n)
                 (multiple-value-bind (q r)
                     (floor n 2)
                   (bit-array-iter q
                                   (append array (list r)))))))
    (bit-pad-right (bit-array-iter n '()) 8)))

The eight-bit-long vectors are concatenated together in a simple and (hopefully) efficient loop:

(defun bit-vector-concatenate-uniform-vector (bit-vector-vector member-size)
  (let ((rtn (make-sequence 'simple-bit-vector
                            (* member-size (length bit-vector-vector)))))
    (dotimes (i (length bit-vector-vector))
      (replace rtn
               (aref bit-vector-vector i)
               :start1 (* i member-size)
               :end1 (* (1+ i) member-size)))

And the input to the concatenation procedure derives from a higher procedure, taking in a path to the file in question:

(defun file-to-bit-vector (filepath)
  (with-open-file (f filepath :direction :input :element-type 'bit)
     (map 'vector
          (let ((s (make-sequence 'list (file-length f))))
            (read-sequence s f)

I'm acutely aware that my Common Lisp skills are mediocre, and my optimization abilities are worse, so please leave feedback if I'm missing obvious improvements.1

After constructing a bit vector to represent the file, we go through the usual Keccak mumbo jumbo as explained elsewhere. At the highest level, this part of the program looks like so:

(defun keccak-sponge (input-bit-vector bitrate output-bits)
  (keccak-squeeze (keccak-absorb input-bit-vector

In the end, it's that simple; absorb and squeeze. Make sure to set the bitrate to be less than the width, where the width can be calculated as

(* 5 5 (expt 2 +keccak_L+))

Now for output. All we do is take the bit vector returned by the sponge, and convert it to big-endian hexadecimal:

(defun keccak-hash-file (filepath bitrate output-bits)
  (bit-vector-to-hex (keccak-sponge (file-to-bit-vector filepath)

The conversion takes place one octet at a time, and padding the hexadecimal so there are two digits without exception. These two-digit hexadecimal numerals are then left-concatenated, creating the usual sloppy big-endian mess2:

(defun bit-vector-to-hex (bv)
  (apply #'concatenate
         (mapcar (lambda (n)
                   (let ((s (write-to-string n :base 16)))
                     (if (= (length s) 2)
                         (concatenate 'string "0" s))))
                 (mapcar #'bit-vector-to-integer
                         (bit-chunk bv 8)))))

Then the concatenated hexadecimal is downcased, output, and we're finished!

(defun main ()
  (let ((args #+sbcl (cdr sb-ext:*posix-argv*)
              #+ccl (cdr ccl:*command-line-argument-list*)))
    (princ (string-downcase (keccak-hash-file (first args)

So how do we know it works? Well, check the tests directory and read the README:

To run the tests:
1. Run make
2. Run the resulting executable

Apologies to the neets and jwzs out there, for making you do something. These tests are the same as those used by Diana, including both tests of the permutations and tests of the sponge. Pipe the output of the executable to a file, or watch it lazily scroll by. If you don't see any "fail", you're good to go.

So we know how it works, but how well? Here are some benchmarks:

| size (KiB) | time (seconds) | rate (KiB/s) |
|          1 |          0.051 |         19.6 |
|         10 |          0.255 |         39.2 |
|        100 |          2.351 |         42.5 |
|       1000 |          23.52 |         42.5 |
|      10000 |          246.4 |        40.58 |

The trend between input size and resulting time of computation appears linear across the orders of magnitude. Though based on a rate of 40 KiB/s, large files will take some time to hash.

Anyway, that's all for now. More on esthlos-v to follow.

  1. Additionally, I am aware that the current approach requires the entire file be loaded into memory at once, whereas the mathematics behind Keccak do not require having the entire file available. However, the Keccak specification itself does require the entire file be in memory to carry out the procedure as specified; a detail which, I'd hazard a guess, most implementations fail to conform to. []
  2. Isn't computing elegant? []