Get the latest vpatch here:

This post introduces Keccak into the refactored esthlos-v. The inner workings of Keccak have already been well explained, including operations at the bit-level, so I'm not going to beat a dead horse. Instead, let's hit the broad strokes of how the thing works, pointing out quirks of implementation.

Oh, and please do point out all the inefficiencies and such that you see. I'm aware of a few optimizations which are begging to be made, but I'm sure there are more in plain right.

First things first. The available knobs:

```
;; the number of bits in a byte, arch-specific
(defconstant +bits-in-byte+ 8)
;; the keccak L parameter, takes a value in {1,2,3,4,5,6}
(defconstant +keccak_L+ 6)
;; the number of bits absorbed into the sponge on each pass
(defconstant +bitrate+ 1344)
;; the desired number of output bits
(defconstant +output-bits+ 512)
```

My guess is that you probably shouldn't touch `+bits-in-byte+`

. The
Keccak-specific guts of the program *theoretically* will work on
alien, non-octet-based arch, but the I/O will break. Everything else
should be self-explanatory: keep `+keccak_L+`

a valid value, ensure
that the `+bitrate+`

is less than the Keccak width, and
keep `+output-bits+`

divisible by 16 (for the hex output).

Now for the internals.
Upon being called, the first action is to look up the input file and
create a corresponding
bit vector.
Immediately we have a challenge of interpretation:
The file stored on disk is byte-addressed, so there's no meaning to
"the first bit of the the byte". All we can ask for is a the first *byte* of
the file, and we must make a decision of how to store that byte as a
vector of *bits* to carry out the Keccak operations. According to the
team Keccak website, the
convention is to represent each byte as little-endian vector of eight bits,
left-concatenating the vectors as we read off the bytes of the file. You might
expect such crucial information, which very much *does* impact the results
of the hash, to be standardized by the standard. Well, I guess you might
expect a lot of things. But
**I'm using little-endian representation**, and below you can see how:

```
(defun integer-to-bit-vector (n)
(labels ((bit-array-iter (n array)
(if (zerop n)
array
(multiple-value-bind (q r)
(floor n 2)
(bit-array-iter q
(append array (list r)))))))
(bit-pad-right (bit-array-iter n '()) 8)))
```

The eight-bit-long vectors are concatenated together in a simple and (hopefully) efficient loop:

```
(defun bit-vector-concatenate-uniform-vector (bit-vector-vector member-size)
(let ((rtn (make-sequence 'simple-bit-vector
(* member-size (length bit-vector-vector)))))
(dotimes (i (length bit-vector-vector))
(replace rtn
(aref bit-vector-vector i)
:start1 (* i member-size)
:end1 (* (1+ i) member-size)))
rtn))
```

And the input to the concatenation procedure derives from a higher procedure, taking in a path to the file in question:

```
(defun file-to-bit-vector (filepath)
(with-open-file (f filepath :direction :input :element-type 'bit)
(bit-vector-concatenate-uniform-vector
(map 'vector
#'integer-to-bit-vector
(let ((s (make-sequence 'list (file-length f))))
(read-sequence s f)
s))
+bits-in-byte+)))
```

I'm acutely aware that my Common Lisp skills are mediocre, and my optimization
abilities are worse, so please leave feedback if I'm missing obvious
improvements.^{1}

After constructing a bit vector to represent the file, we go through the usual Keccak mumbo jumbo as explained elsewhere. At the highest level, this part of the program looks like so:

```
(defun keccak-sponge (input-bit-vector bitrate output-bits)
(keccak-squeeze (keccak-absorb input-bit-vector
bitrate)
bitrate
output-bits))
```

In the end, it's that simple; absorb and squeeze. Make sure to set the
bitrate to be *less than the width*, where the width can be calculated as

```
(* 5 5 (expt 2 +keccak_L+))
```

Now for output. All we do is take the bit vector returned by the sponge, and convert it to big-endian hexadecimal:

```
(defun keccak-hash-file (filepath bitrate output-bits)
(bit-vector-to-hex (keccak-sponge (file-to-bit-vector filepath)
bitrate
output-bits)))
```

The conversion takes place
one octet at a time, and padding the
hexadecimal so there are two digits without exception. These two-digit
hexadecimal numerals are then left-concatenated, creating the usual
sloppy big-endian mess^{2}:

```
(defun bit-vector-to-hex (bv)
(apply #'concatenate
'string
(mapcar (lambda (n)
(let ((s (write-to-string n :base 16)))
(if (= (length s) 2)
s
(concatenate 'string "0" s))))
(mapcar #'bit-vector-to-integer
(bit-chunk bv 8)))))
```

Then the concatenated hexadecimal is downcased, output, and we're finished!

```
(defun main ()
(let ((args #+sbcl (cdr sb-ext:*posix-argv*)
#+ccl (cdr ccl:*command-line-argument-list*)))
(princ (string-downcase (keccak-hash-file (first args)
+bitrate+
+output-bits+)))))
```

So how do we know it works? Well, check the `tests`

directory and
read the `README`

:

```
To run the tests:
1. Run make
2. Run the resulting executable
```

Apologies to the neets and jwzs out there, for making you do something. These tests are the same as those used by Diana, including both tests of the permutations and tests of the sponge. Pipe the output of the executable to a file, or watch it lazily scroll by. If you don't see any "fail", you're good to go.

So we know how it works, but how well? Here are some benchmarks:

```
| size (KiB) | time (seconds) | rate (KiB/s) |
|------------+----------------+--------------|
| 1 | 0.051 | 19.6 |
| 10 | 0.255 | 39.2 |
| 100 | 2.351 | 42.5 |
| 1000 | 23.52 | 42.5 |
| 10000 | 246.4 | 40.58 |
```

The trend between input size and resulting time of computation appears linear across the orders of magnitude. Though based on a rate of 40 KiB/s, large files will take some time to hash.

Anyway, that's all for now. More on esthlos-v to follow.

- Additionally, I am aware that the current approach requires
the entire file be loaded into memory at once, whereas the
*mathematics*behind Keccak do not require having the entire file available.*However*, the Keccak specification itself*does*require the entire file be in memory to carry out the procedure as specified; a detail which, I'd hazard a guess, most implementations fail to conform to. [↩] - Isn't computing elegant? [↩]