A Brief Overview of Eucrypt's MPI Library

Posted 2018-04-29

Introduction

The Eucrypt MPI library, painfully removed from GnuPG 1.4.10, remains a mess in the traditional manner of C programs. This post is not, by any means, an exhaustive walk through the code. Rather, this post tries to convey the essence of the library, so that the curious reader can rummage through the code with a moderate sense of direction. Code excerpts are given to show what procedures depend on what, not necessarily to explain how it works.

Topics which this guide does not cover, include:

  • Secure memory: secmem.c
  • Error handling: error.c
  • Debugging (set by defining M_DEBUG)
  • Logging: logger.c
  • Compile-time configuration: config.h
  • Architecture-specific autoconfiguration: (,code "types.h")
  • Bytewise access/repalcement for MPIs: mpi-scan.c
  • I/O (which is insanely overcomplicated: see the iobuf_struct struct): iobuf.c, iobuf.h, ttyio.h, mpicoder.c
  • Macros for compiler inlining of operations: mpi-inline.h
Admittedly, this is the bulk of the library. But here the focus is on the arithmetic operations, and what is needed to carry those out.

Structure of an MPI

An MPI is defined as typedef struct gcry_mpi *MPI, where

struct gcry_mpi {
	int alloced;    /* array size (# of allocated limbs) */
	int nlimbs;     /* number of valid limbs */
	unsigned int nbits; /* the real number of valid bits (info only) */
	int sign;		/* indicates a negative number */
	unsigned flags; /* bit 0: array must be allocated in secure memory space */
			/* bit 1: not used */
			/* bit 2: the limb is a pointer to some xmalloced data */
	mpi_limb_t *d;  /* array with the limbs */
};
mpi-internal.h

To sumarize, an MPI is an array of limbs, where a limb is some kind of builtin number type:

#if BYTES_PER_MPI_LIMB == SIZEOF_UNSIGNED_INT
  typedef unsigned int mpi_limb_t;
  typedef   signed int mpi_limb_signed_t;
#elif BYTES_PER_MPI_LIMB == SIZEOF_UNSIGNED_LONG
  typedef unsigned long int mpi_limb_t;
  typedef   signed long int mpi_limb_signed_t;
#elif BYTES_PER_MPI_LIMB == SIZEOF_UNSIGNED_LONG_LONG
  typedef unsigned long long int mpi_limb_t;
  typedef   signed long long int mpi_limb_signed_t;
#elif BYTES_PER_MPI_LIMB == SIZEOF_UNSIGNED_SHORT
  typedef unsigned short int mpi_limb_t;
  typedef   signed short int mpi_limb_signed_t;
#else
#error BYTES_PER_MPI_LIMB does not match any C type
#endif
#define BITS_PER_MPI_LIMB    (8*BYTES_PER_MPI_LIMB)
mpi-internal.h

The user can set (before compillation) the underlying type of a limb by modifying mpi-asm-defs.h:

/* This file defines some basic constants for the MPI machinery.  We
 * need to define the types on a per-CPU basis, so it is done with
 * this file here.  */
#define BYTES_PER_MPI_LIMB  (SIZEOF_UNSIGNED_LONG)
mpi-asm-defs.h

A revealing example is the procedure for copying limbs of an MPI, which simply reassignes the limb pointers in the target MPI to the limb pointers in the source MPI:

/* Copy N limbs from S to D.  */
#define MPN_COPY( d, s, n) 
	do {				
	mpi_size_t _i;			
	for( _i = 0; _i < (n); _i++ )	
		(d)[_i] = (s)[_i];		
	} while(0)

Memory Management of MPIs

From the GPG FAQ (yes, I am intentionally preserving their encoding):

GnuPG tries to lock memory so that no other process can see it and so that the memory will not be written to swap. If for some reason it’s not able to do this (for instance, certain platforms don’t support this kind of memory locking), GnuPG will warn you that it’s using insecure memory.

Since this kind of security is flawed, I am skipping its implementation. So this article will not cover these items:

 *  a) functions to provide memory from a secure memory.
 *  b) by looking at the requested allocation size we
 *     can reuse memory very quickly (e.g. MPI storage)
 *     (really needed?)
 *  c) memory usage reporting if compiled with M_DEBUG
 *  d) memory checking if compiled with M_GUARD
memory.c

On to the code. The primary procedure for memory allocation is below:

void *
FNAMEXM(alloc)( size_t n FNAMEPRT )
{
	char *p;

#ifdef M_GUARD
	if(!n)
	  out_of_core(n,0); /* should never happen */
	if( !(p = malloc( n + EXTRA_ALIGN+5 )) )
	out_of_core(n,0);
	store_len(p,n,0);
	used_memory += n;
	p[4+EXTRA_ALIGN+n] = MAGIC_END_BYTE;
	return p+EXTRA_ALIGN+4;
#else
	/* mallocing zero bytes is undefined by ISO-C, so we better make
	   sure that it won't happen */
	if (!n)
	  n = 1;
	if( !(p = malloc( n )) )
	out_of_core(n,0);
	return p;
#endif
}
memory.c

(Note how the name of the procedure is determined by the macro FNAMEXM, which prefixes the procedure name with m_debug when M_DEBUG is defined. Also, if M_GUARD is defined, then the procedure takes a const char*. In our case, we ignore these options.) The procedure checks if the allocation succeeded, calling out_of_core if the allocation fails:

static void
out_of_core(size_t n, int secure)
{
	log_error ("out of %s memory while allocating %u bytesn",
			   secure? "secure":"" ,(unsigned)n );
	if (secure) {
		/*secmem_dump_stats ();*/
		log_info ("(this may be caused by too many secret keys used "
				  "simultaneously or due to excessive large key sizes)n");
	}
#if defined(M_GUARD) && defined(__riscos__)
	abort();
#endif
	exit (2);
}
memory.c

There is also a trymalloc procedure, which doesn't terminate the process on failure:

void *
FNAMEX(trymalloc)(size_t n FNAMEPRT)
{
#ifdef M_GUARD
	char *p;

	if (!n)
	  n = 1;
	p = malloc (n + EXTRA_ALIGN+5);
	if (!p)
	  return NULL;
	store_len(p,n,0);
	used_memory += n;
	p[4+EXTRA_ALIGN+n] = MAGIC_END_BYTE;
	return p+EXTRA_ALIGN+4;
#else
	/* Mallocing zero bytes is undefined by ISO-C, so we better make
	   sure that it won't happen.  */
	return malloc (n? n: 1);
#endif
}

As I'm sure the reader can guess, there is a procedure to re-allocate memory:

void *
FNAMEX(realloc)( void *a, size_t n FNAMEPRT )
{
	void *b;

#ifdef M_GUARD
	if( a ) {
#error "--enable-m-guard does not currently work"
		unsigned char *p = a;
		size_t len = m_size(a);

		if( len >= n ) /* we don't shrink for now */
			return a;
		if( p[-1] == MAGIC_SEC_BYTE )
			b = FNAME(alloc_secure_clear)(n FNAMEARG);
		else
			b = FNAME(alloc_clear)(n FNAMEARG);
		FNAME(check)(NULL FNAMEARG);
		memcpy(b, a, len );
		FNAME(free)(p FNAMEARG);
	}
	else
		b = FNAME(alloc)(n FNAMEARG);
#else
	if( m_is_secure(a) ) {
	if( !(b = secmexrealloc( a, n )) )
		out_of_core(n,1);
	}
	else {
	if( !(b = realloc( a, n )) )
		out_of_core(n,0);
	}
#endif

	return b;
}
memory.c

And there is a procedure to free allocated memory:

void
FNAMEX(free)( void *a FNAMEPRT )
{
	byte *p = a;

	if( !p )
	return;
#ifdef M_DEBUG
	free_entry(p-EXTRA_ALIGN-4, info);
#elif defined M_GUARD
	m_check(p);
	if( m_is_secure(a) )
	secmem_free(p-EXTRA_ALIGN-4);
	else {
	used_memory -= m_size(a);
	free(p-EXTRA_ALIGN-4);
	}
#else
	if( m_is_secure(a) )
	secmem_free(p);
	else
	free(p);
#endif
}
memory.c

The above procedures are for arbitrary memory management. Specific procedures exist for the memory management of MPIs. To allocate an MPI of nlimbs many limbs, use the following procedure:

MPI
#ifdef M_DEBUG
mpi_debug_alloc( unsigned nlimbs, const char *info )
#else
mpi_alloc( unsigned nlimbs )
#endif
{
    MPI a;

    if( DBG_MEMORY )
	log_debug("mpi_alloc(%u)n", nlimbs*BITS_PER_MPI_LIMB );
#ifdef M_DEBUG
    a = m_debug_alloc( sizeof *a, info );
    a->d = nlimbs? mpi_debug_alloc_limb_space( nlimbs, 0, info ) : NULL;
#else
    a = xmalloc( sizeof *a );
    a->d = nlimbs? mpi_alloc_limb_space( nlimbs, 0 ) : NULL;
#endif
    a->alloced = nlimbs;
    a->nlimbs = 0;
    a->sign = 0;
    a->flags = 0;
    a->nbits = 0;
    return a;
}
mpiutil.c

Another procedure takes in an MPI, and spits out an alloced copy of the input:

MPI
#ifdef M_DEBUG
mpi_debug_copy( MPI a, const char *info )
#else
mpi_copy( MPI a )
#endif
{
    int i;
    MPI b;

    if( a && (a->flags & 4) ) {
	void *p = m_is_secure(a->d)? xmalloc_secure( a->nbits )
				   : xmalloc( a->nbits );
	memcpy( p, a->d, a->nbits );
	b = mpi_set_opaque( NULL, p, a->nbits );
    }
    else if( a ) {
#ifdef M_DEBUG
	b = mpi_is_secure(a)? mpi_debug_alloc_secure( a->nlimbs, info )
			    : mpi_debug_alloc( a->nlimbs, info );
#else
	b = mpi_is_secure(a)? mpi_alloc_secure( a->nlimbs )
			    : mpi_alloc( a->nlimbs );
#endif
	b->nlimbs = a->nlimbs;
	b->sign = a->sign;
	b->flags  = a->flags;
	b->nbits = a->nbits;
	for(i=0; i < b->nlimbs; i++ )
	    b->d[i] = a->d[i];
    }
    else
	b = NULL;
    return b;
}
mpiutil.c

Yet another procedure resizes an MPI (but only to increase the size):

void
#ifdef M_DEBUG
mpi_debug_resize( MPI a, unsigned nlimbs, const char *info )
#else
mpi_resize( MPI a, unsigned nlimbs )
#endif
{
    if( nlimbs <= a->alloced )
	return; /* no need to do it */
    /* Note: a->secure is not used - instead the realloc functions
     * take care of it. Maybe we should drop a->secure completely
     * and rely on a mpi_is_secure function, which would be
     * a wrapper around m_is_secure
     */
#ifdef M_DEBUG
    if( a->d )
	a->d = m_debug_realloc(a->d, nlimbs * sizeof(mpi_limb_t), info );
    else
	a->d = m_debug_alloc_clear( nlimbs * sizeof(mpi_limb_t), info );
#else
    if( a->d )
	a->d = xrealloc(a->d, nlimbs * sizeof(mpi_limb_t) );
    else
	a->d = xmalloc_clear( nlimbs * sizeof(mpi_limb_t) );
#endif
    a->alloced = nlimbs;
}
mputil.c

Arithmetic Operations on MPIs

The MPI library supports the following arithmetic operations:

  • Addition and subtraction: mpi-add.c
  • Multiplication: mpi-mul.c
  • Division: mpi-div.c
  • Modular exponentiation: mpi-pow.c
  • GCD: mpi-gcd.c
  • Products of modular exponentiation: mpi-mpow.c
  • Ordering: mpi-cmp.c
  • Bitshifting of MPIs: mpi-bit.c
  • Multiplicative inverse in finite field: mpi-inv.c
A few specially optimized versions of the above exist, e.g. for multiplication by a power of 2.

Let's look carefully at addition. (By the end, you should appreciate just how disasterous the addition facility is). To begin, we need to know how to add arrays of limbs:

mpi_limb_t
mpihelp_add_n( mpi_ptr_t res_ptr, mpi_ptr_t s1_ptr,
	       mpi_ptr_t s2_ptr, mpi_size_t size)
{
    mpi_limb_t x, y, cy;
    mpi_size_t j;

    /* The loop counter and index J goes from -SIZE to -1.  This way
       the loop becomes faster.  */
    j = -size;

    /* Offset the base pointers to compensate for the negative indices. */
    s1_ptr -= j;
    s2_ptr -= j;
    res_ptr -= j;

    cy = 0;
    do {
	y = s2_ptr[j];
	x = s1_ptr[j];
	y += cy;		  /* add previous carry to one addend */
	cy = y < cy;		  /* get out carry from that addition */
	y += x; 		  /* add other addend */
	cy += y < x;		  /* get out carry from that add, combine */
	res_ptr[j] = y;
    } while( ++j );

    return cy;
}
mpih-add1.c

You may or may not notice that this is confusing as hell. Negative index values besides, the magic of < returning 1 is used to calculate the carry. And how can y be less than cy when we just added cy to y? Well, we're really checking if y wrapped around, duh! (Remember, y is unsigned.)

We also need a procedure to subtract an array of limbs, for the case where we add MPIs of differing signs:

mpi_limb_t
mpihelp_sub_n( mpi_ptr_t res_ptr, mpi_ptr_t s1_ptr,
				  mpi_ptr_t s2_ptr, mpi_size_t size)
{
    mpi_limb_t x, y, cy;
    mpi_size_t j;

    /* The loop counter and index J goes from -SIZE to -1.  This way
       the loop becomes faster.  */
    j = -size;

    /* Offset the base pointers to compensate for the negative indices.  */
    s1_ptr -= j;
    s2_ptr -= j;
    res_ptr -= j;

    cy = 0;
    do {
	y = s2_ptr[j];
	x = s1_ptr[j];
	y += cy;		  /* add previous carry to subtrahend */
	cy = y < cy;		  /* get out carry from that addition */
	y = x - y;		  /* main subtract */
	cy += y > x;		  /* get out carry from the subtract, combine */
	res_ptr[j] = y;
    } while( ++j );

    return cy;
}
mpih-sub1.c

Now we are adequately prepared to investigate addition:

void
mpi_add(MPI w, MPI u, MPI v)
{
    mpi_ptr_t wp, up, vp;
    mpi_size_t usize, vsize, wsize;
    int usign, vsign, wsign;

    if( u->nlimbs < v->nlimbs ) { /* Swap U and V. */
	usize = v->nlimbs;
	usign = v->sign;
	vsize = u->nlimbs;
	vsign = u->sign;
	wsize = usize + 1;
	RESIZE_IF_NEEDED(w, wsize);
	/* These must be after realloc (u or v may be the same as w).  */
	up    = v->d;
	vp    = u->d;
    }
    else {
	usize = u->nlimbs;
	usign = u->sign;
	vsize = v->nlimbs;
	vsign = v->sign;
	wsize = usize + 1;
	RESIZE_IF_NEEDED(w, wsize);
	/* These must be after realloc (u or v may be the same as w).  */
	up    = u->d;
	vp    = v->d;
    }
    wp = w->d;
    wsign = 0;

    if( !vsize ) {  /* simple */
	MPN_COPY(wp, up, usize );
	wsize = usize;
	wsign = usign;
    }
    else if( usign != vsign ) { /* different sign */
	/* This test is right since USIZE >= VSIZE */
	if( usize != vsize ) {
	    mpihelp_sub(wp, up, usize, vp, vsize);
	    wsize = usize;
	    MPN_NORMALIZE(wp, wsize);
	    wsign = usign;
	}
	else if( mpihelp_cmp(up, vp, usize) < 0 ) {
	    mpihelp_sub_n(wp, vp, up, usize);
	    wsize = usize;
	    MPN_NORMALIZE(wp, wsize);
	    if( !usign )
		wsign = 1;
	}
	else {
	    mpihelp_sub_n(wp, up, vp, usize);
	    wsize = usize;
	    MPN_NORMALIZE(wp, wsize);
	    if( usign )
		wsign = 1;
	}
    }
    else { /* U and V have same sign. Add them. */
	mpi_limb_t cy = mpihelp_add(wp, up, usize, vp, vsize);
	wp[usize] = cy;
	wsize = usize + cy;
	if( usign )
	    wsign = 1;
    }

    w->nlimbs = wsize;
    w->sign = wsign;
}
mpi-add.c

The first if block allows us to assume that the size of u is no smaller than that of v. Then the destination MPI w is resized to ensure that it is at least one limb larger than u, to ensure there is enough space for a possibly carry. Following the resizing, the one of the previously mentioned operations on limb arrays, mpihelp_sub_n or mpihelp_add, is invoked to add the limb arrays (mpihelp_add is an inlined version of mpihelp_add_n, defined in mpi-inline.h)

Concluding Remarks

I had wished to be more through, but after the above, I am both exhausted and disgusted. If the gentle reader wishes to dive further, then power to them, but I advise on the contrary.

Leave a Reply (USE HTML! Space not preserved!)