A Tour of NTL
_________________________________________________________________

Table of Contents

 1. Introduction
 2. Examples
 3. Summary of NTL's Main Modules
 4. NTL Implementation and Portability
 5. Some Performance Data
 6. Obtaining and Installing NTL
 7. Changes between NTL 1.0 and NTL 1.5

Introduction

NTL is a high-performance, portable C++ library providing data
structures and algorithms for manipulating signed, arbitrary-length
integers, and for vectors, matrices, and polynomials over the integers
and over finite fields.

NTL uses state-of-the-art algorithms. In particular, its code for
polynomial arithmetic is one of the fastest available. Early versions
of NTL have been used to set "world records" for polynomial
factorization and point counting on elliptic curves.

NTL is written entirely in C++, and can be easily installed on just
about any Unix platform, including PCs and 32- and 64-bit
workstations. Although NTL is written in C++ and avoids assembly,
NTL's performance is generally much better than is typical of such
portable libraries.

NTL is relatively easy to use, and it provides a good environment for
implementing new number-theoretic algorithms easily and quickly,
without sacrificing performance.

NTL is free software intended for research and educational purposes
only.

NTL is by no means a "complete" computer algebra package. Instead of
having NTL grow into a huge package, the aim is for NTL itself to
serve as a stable, portable platform for implementing other
algorithms. It is hoped that interested users with relevant expertise
will implement algorithms on top of NTL and make their software
available to other NTL users. Links to such software will be provided
on the NTL page.
_________________________________________________________________

Examples

Perhaps the best way to introduce the basics of NTL is by way of
example.
Please note, however, that NTL provides much more functionality than
these examples illustrate.

Example 1

The first example makes use of the class ZZ, which represents signed,
arbitrary-length integers. This program reads two numbers and prints
their product:

   #include "ZZ.h"

   main()
   {
      ZZ a, b, prod;

      cin >> a;
      cin >> b;
      mul(prod, a, b);
      cout << prod << "\n";
   }

Note that in NTL, one writes

   mul(prod, a, b);

instead of

   prod = a * b;

as many C++ programmers might expect. The reason is that the latter
form ultimately leads to a lot of unnecessary copying, and allocating
and freeing of memory, which might not be apparent to the programmer.
In designing NTL, efficiency won out over aesthetics in this case.

Example 2

The following routine sums up the numbers in a vector of ZZ's.

   #include "vec_ZZ.h"

   void sum(ZZ& s, const vector(ZZ)& v)
   {
      ZZ acc;
      clear(acc);

      long n = v.length();
      long i;

      for (i = 0; i < n; i++)
         add(acc, acc, v[i]);

      s = acc;
   }

The class vector(ZZ) is a dynamic-length array of ZZ's; more
generally, NTL provides macros to create a template-like class
vector(T) for any type T that acts as a dynamic-length array. The
reason that macros are used instead of true templates is simple: at
the present time, compiler support for templates is not entirely
satisfactory, and their use would make NTL much more difficult to
port. At some point in the future, a template version of NTL may be
made available.

The routine sum declares a local variable acc of type ZZ, first clears
it, and then sums up the array elements. One thing to notice is that
the routine add works properly even if the output (its first argument)
aliases one of its inputs. Except for a few exceptions that are
well-marked in the documentation, NTL routines allow their inputs to
alias their outputs. Also notice that by accumulating the sum in a
local variable, instead of directly in the argument s, the routine sum
will work correctly even if s aliases one of the elements of v.
Any space required for acc is automatically freed by acc's destructor
when the routine returns.

Vectors in NTL are indexed from 0, but in many situations it is
convenient or more natural to index from 1. The class vector(T) allows
for this; the above example could be written as follows.

   #include "vec_ZZ.h"

   void sum(ZZ& s, const vector(ZZ)& v)
   {
      ZZ acc;
      clear(acc);

      long n = v.length();
      long i;

      for (i = 1; i <= n; i++)
         add(acc, acc, v(i));

      s = acc;
   }

Example 3

There is also basic support for matrices in NTL. In general, the class
matrix(T) is a special kind of vector(vector(T)), where each row is a
vector of the same length. Row i of matrix M can be accessed as M[i]
(indexing from 0) or as M(i) (indexing from 1). Column j of row i can
be accessed as M[i][j] or M(i)(j); for notational convenience, the
latter is equivalent to M(i,j).

Here is a matrix multiplication routine, which in fact is already
provided by NTL.

   #include "mat_ZZ.h"

   void mul(matrix(ZZ)& X, const matrix(ZZ)& A, const matrix(ZZ)& B)
   {
      long n = A.NumRows();
      long l = A.NumCols();
      long m = B.NumCols();

      if (l != B.NumRows())
         Error("matrix mul: dimension mismatch");

      X.SetDims(n, m);

      long i, j, k;
      ZZ acc, tmp;

      for (i = 1; i <= n; i++) {
         for (j = 1; j <= m; j++) {
            clear(acc);
            for (k = 1; k <= l; k++) {
               mul(tmp, A(i,k), B(k,j));
               add(acc, acc, tmp);
            }
            X(i,j) = acc;
         }
      }
   }

Note that in case of a dimension mismatch, the routine calls the Error
function, which is a part of NTL and which simply prints the message
and aborts. That is generally how NTL deals with errors. Currently,
NTL makes no use of exceptions (for the same reason it does not use
templates--see above), but a future version may incorporate them.

Note that this routine will not work properly if X aliases A or B. The
actual matrix multiplication routine in NTL takes care of this.

Another thing to notice is that NTL generally avoids the type int,
preferring instead to use long.
This seems to go against what most "style" books preach, but
nevertheless seems to make the most sense in today's world. Although
int was originally meant to represent the "natural" word size, that
seems to no longer be the case. Usually, int and long are the same;
however, when they are not, int is effectively a hack for backward
compatibility with a past when word sizes were smaller. Thus, for
simplicity and safety, NTL uses long for all integer values.

Example 4

NTL provides extensive support for polynomial arithmetic. The class
ZZX represents univariate polynomials with integer coefficients. The
following program reads a polynomial, factors it, and prints the
factorization.

   #include "ZZXFactoring.h"

   main()
   {
      ZZX f;

      cin >> f;

      vector(pair(ZZX,long)) factors;
      ZZ c;

      factor(c, factors, f);

      cout << c << "\n";
      cout << factors << "\n";
   }

When this program is compiled and run on the input

   [2 10 14 6]

which represents the polynomial 2 + 10*X + 14*X^2 + 6*X^3, the output
is

   2
   [[[1 3] 1] [[1 1] 2]]

The first line of output is the content of the polynomial, which is 2
in this case, as each coefficient of the input polynomial is divisible
by 2. The second line is a vector of pairs; the first member of each
pair is an irreducible factor of the input, and the second is the
exponent to which it appears in the factorization. Thus, all of the
above simply means that

   2 + 10*X + 14*X^2 + 6*X^3 = 2 * (1 + 3*X) * (1 + X)^2

Admittedly, I/O in NTL is not exactly user friendly, but then NTL has
no pretensions about being an interactive computer algebra system: it
is a library for programmers.

Example 5

Here is another example. The following program prints out the first
100 cyclotomic polynomials.
   #include "ZZX.h"

   main()
   {
      vector(ZZX) phi;
      phi.SetLength(100);

      long i;
      for (i = 1; i <= 100; i++) {
         SetCoeff(phi(i), i);
         add(phi(i), phi(i), -1);

         long j;
         for (j = 1; j <= i-1; j++)
            if (i % j == 0)
               divide(phi(i), phi(i), phi(j));

         cout << phi(i) << "\n";
      }
   }

Example 6

NTL also supports modular integer arithmetic. The class ZZ_p
represents the integers mod p. Despite the notation, p need not in
general be prime, except in situations where this is mathematically
required. The classes vector(ZZ_p), matrix(ZZ_p), and ZZ_pX represent
vectors, matrices, and polynomials mod p, and work much the same way
as the corresponding classes for ZZ. Here is a program that reads a
prime number p and a polynomial f, and factors f.

   #include "ZZ_pXFactoring.h"

   main()
   {
      ZZ p;
      cin >> p;
      ZZ_pInit(p);

      ZZ_pX f;
      cin >> f;

      vector(pair(ZZ_pX,long)) factors;

      CanZass(factors, f);
      cout << factors << "\n";
   }

As a program is running, NTL keeps track of a "current modulus" for
the class ZZ_p, which can be initialized or changed using ZZ_pInit.
This must be done before any variables are declared or computations
are done that depend on this modulus.

Please note that for efficiency reasons, NTL does not make any attempt
to ensure that variables declared under one modulus are not used under
a different one. If that happens, the behavior of the program is
completely unpredictable.

Example 7

There is a mechanism for saving and restoring a modulus, which the
following example illustrates. This routine takes as input an integer
polynomial and a prime, and tests if the polynomial is irreducible
modulo the prime.

   #include "ZZX.h"
   #include "ZZ_pXFactoring.h"

   long IrredTestMod(const ZZX& f, const ZZ& p)
   {
      ZZ_pBak bak;
      bak.save();

      ZZ_pInit(p);

      ZZ_pX f1;
      f1 << f;

      long res = DetIrredTest(f1);

      bak.restore();
      return res;
   }

The current modulus is saved using the helper class ZZ_pBak and the
operation save(). The old modulus is then later restored using the
operation restore().
In theory, the call to restore() could be left out, as the destructor
for bak should call it; however, the paranoid programmer will call it
explicitly anyway.

The operator << is a conversion operator. NTL provides many conversion
operators between various types, all using the same syntax, and they
generally do the obvious thing. As a rule, NTL avoids implicit
conversion operators, as these can easily lead to gross inefficiencies
if one is not extremely careful.

The routine DetIrredTest is one of several tests for irreducibility
provided by NTL.

Example 8

Suppose in the above example that p is known in advance to be a small,
single-precision prime. In this case, NTL provides a class zz_p that
acts just like ZZ_p, along with corresponding classes vector(zz_p),
matrix(zz_p), and zz_pX. The interfaces to all of the routines are
generally identical to those for ZZ_p. However, the routines are much
more efficient, in both time and space. For small primes, the routine
in the previous example could be coded as follows.

   #include "ZZX.h"
   #include "zz_pXFactoring.h"

   long IrredTestMod(const ZZX& f, long p)
   {
      zz_pBak bak;
      bak.save();

      zz_pInit(p);

      zz_pX f1;
      f1 << f;

      long res = DetIrredTest(f1);

      bak.restore();
      return res;
   }

__________________________________________________________________________

Summary of NTL's Main Modules

NTL consists of a number of software modules. Generally speaking, for
each module foo, there is a header file foo.h and an implementation
file foo.c. There is also a documentation file foo.txt. This takes the
form of a header file, but stripped of implementation details and of
declarations of some of the more esoteric routines and data
structures, and it contains more complete and usually clearer
documentation. The following is a summary of the main NTL modules. The
corresponding ".txt" file can be obtained by clicking on the module
name.
   tools
          some basic types and utility routines, including a timing
          function

   vector
          macros for the template-like class vector(T), providing
          dynamic-size arrays

   matrix
          macros for the template-like class matrix(T), providing
          dynamic-size 2-dimensional arrays

   ZZ
          class ZZ: arbitrary-length integers; includes routines for
          GCDs, Jacobi symbols, modular arithmetic, and primality
          testing; also includes small prime generation routines and
          in-line routines for single-precision modular arithmetic

   ZZ_p
          class ZZ_p: integers mod p

   zz_p
          class zz_p: integers mod p, where p is single-precision

   xdouble
          class xdouble: double-precision floating-point numbers with
          extended exponent range

   RR
          class RR: arbitrary-precision floating-point numbers

   ZZX
          class ZZX: polynomials over ZZ; includes routines for GCDs,
          minimal and characteristic polynomials, norms, and traces

   ZZXFactoring
          routines for factoring univariate polynomials over ZZ

   ZZ_pX
          class ZZ_pX: polynomials over ZZ_p; includes routines for
          modular polynomial arithmetic, modular composition, minimal
          and characteristic polynomials, and interpolation

   ZZ_pXFactoring
          routines for factoring polynomials over ZZ_p; also includes
          routines for testing for and constructing irreducible
          polynomials

   zz_pX
          class zz_pX: polynomials over zz_p; provides the same
          functionality as class ZZ_pX, but for single-precision p

   zz_pXFactoring
          routines for factoring polynomials over zz_p; provides the
          same functionality as ZZ_pXFactoring, but for
          single-precision p

   mat_ZZ
          class matrix(ZZ); includes basic matrix arithmetic
          operations, including determinant calculation, matrix
          inversion, and solving nonsingular systems of linear
          equations

   HNF
          routines for computing the Hermite Normal Form of a lattice

   LLL
          routines for performing lattice basis reduction, including
          implementations of the Schnorr-Euchner LLL and block
          Korkin-Zolotarev reduction algorithms, as well as an
          integer-only reduction algorithm
   mat_ZZ_p
          class matrix(ZZ_p); includes basic matrix arithmetic
          operations, including determinant calculation, matrix
          inversion, solving nonsingular systems of linear equations,
          and Gaussian elimination

   mat_zz_p
          class matrix(zz_p); includes basic matrix arithmetic
          operations, including determinant calculation, matrix
          inversion, solving nonsingular systems of linear equations,
          and Gaussian elimination

   mat_RR
          class matrix(RR); includes basic matrix arithmetic
          operations, including determinant calculation, matrix
          inversion, and solving nonsingular systems of linear
          equations

   mat_poly_ZZ
          routine for computing the characteristic polynomial of a
          matrix(ZZ)

   mat_poly_ZZ_p
          routine for computing the characteristic polynomial of a
          matrix(ZZ_p)

   mat_poly_zz_p
          routine for computing the characteristic polynomial of a
          matrix(zz_p)

   vec_ZZ
          class vector(ZZ)

   ZZVec
          class ZZVec: fixed-length vectors of fixed-length ZZs; less
          flexible, but more efficient, than vector(ZZ)

   vec_ZZ_p
          class vector(ZZ_p)

   vec_zz_p
          class vector(zz_p)

   vec_RR
          class vector(RR)

__________________________________________________________________________

NTL Implementation and Portability

NTL is designed to be portable, fast, and relatively easy to use and
extend.

To make NTL portable, no assembly code is used. This is highly
desirable, as architectures are constantly changing and evolving, and
maintaining assembly code is quite costly. By avoiding assembly code,
NTL should remain usable, with virtually no maintenance, for many
years.

The main drawback of this philosophy is that without assembly code,
one cannot use machine instructions to obtain double-word products or
to perform double-word by single-word division. There are a number of
possible strategies for dealing with this. NTL's basic strategy uses a
combination of integer and floating-point instruction sequences.
To carry out this strategy, NTL makes two requirements of its
platform, neither of which is guaranteed by the C++ language
definition, but both of which appear to be essentially universal:

 1. Integers are represented using 2's complement, and integer
    overflow is not trapped, but rather just wraps around.
 2. Double-precision floating point conforms to the IEEE standard.

Actually, with some modification, NTL would not need the first
requirement, by exploiting language definitions dealing with unsigned
arithmetic. Future versions of NTL may incorporate this modification,
if there is any need for it.

Relying on floating point may seem prone to errors, but with the
guarantees provided by the IEEE standard, one can prove the
correctness of the NTL code that uses floating point. Actually, NTL is
quite conservative, and substantially weaker conditions are sufficient
for correctness. In particular, NTL works correctly with any rounding
mode, and also with any mix of double-precision and extended
double-precision operations (which arise, for example, with Intel x86
processors).

With this strategy, NTL represents arbitrary-length integers using a
30-bit radix on 32-bit machines, and a 50-bit radix on 64-bit
machines.

This general strategy is used in A. K. Lenstra's LIP library for
arbitrary-length integer arithmetic. Indeed, NTL's integer arithmetic
evolved from LIP, but over time almost all of this code has been
rewritten to enhance performance.

Long-integer multiplication is implemented using the classical
algorithm, crossing over to Karatsuba for very big numbers. Polynomial
multiplication and division are carried out using a combination of the
classical algorithm, Karatsuba, the FFT using small primes, and the
FFT using the Schoenhage-Strassen approach.

__________________________________________________________________________

Some Performance Data

NTL is high-performance software, offering high-quality
implementations of the best algorithms.
Here are some timing figures from the current version of NTL. The
figures were obtained using an IBM RS6000 workstation, Model 43P-133,
which has a 133 MHz PowerPC 604 processor. The operating system is AIX
and the compiler is xlC. The compiler options were

   -O2 -qarch=ppc -DAVOID_FLOAT -DTBL_REM

The first problem considered is the factorization of univariate
polynomials modulo a prime p. As test polynomials, we take the family
of polynomials defined in [V. Shoup, J. Symb. Comp. 20:363-397, 1995].
For every n, we define p to be the first prime greater than
2^{n-2}*PI, and the polynomial is \sum_{i=0}^n a_{n-i} X^i, where
a_0 = 1 and a_{i+1} = a_i^2 + 1. Here are some running times:

   n          64   128    256     512      1024
   hh:mm:ss    2    13   1:53   21:01   4:05:25

Also of interest is space usage. The n = 512 case used 4 MB of main
memory, and the n = 1024 case used 17 MB of main memory.

Another test suite, this time using small primes, was used by Kaltofen
and Lobo (Proc. ISSAC '94). One of their polynomials is a degree-10001
polynomial modulo the prime 127. This polynomial was factored with NTL
in just over 3 hours, using 17 MB of memory.

The second problem considered is factoring univariate polynomials over
the integers. We use two test suites. In the first, we factor
F_n(X)*F_{n+1}(X), where F_n(X) = \sum_{i=0}^n f_{n-i}*X^i, and f_i is
the i-th Fibonacci number (f_0 = 1, f_1 = 1, f_2 = 2, ...). Here are
some running times:

   n         100   200    300    400    500   1000
   hh:mm:ss   11    34   1:44   2:35   3:35  15:20

The space in the n = 500 case was under 5 MB, and in the n = 1000
case, under 13 MB.

The second test suite comes from Paul Zimmermann. The polynomial P1(X)
has degree 156, coefficients up to 424 digits, and 36 factors (12 of
degree 2, 15 of degree 4, 9 of degree 8). The polynomial P2(X) has
degree 196, coefficients up to 419 digits, and 12 factors (2 of degree
2, 4 of degree 12, and 6 of degree 24).
The polynomial P3(X) has degree 336, coefficients up to 597 digits,
and 16 factors (4 of degree 12 and 12 of degree 24). The polynomial
P4(X) has degree 462, coefficients up to 756 digits, and two factors,
of degrees 66 and 396. More details on this test suite are available.

Our running times (hh:mm:ss) were as follows: 21, 23, 1:16, 1:37:10.
In all cases less than 5 MB of main memory was used.

__________________________________________________________________________

Obtaining and Installing NTL

To obtain the source code and documentation for NTL, download
ntl-1.5.tar.Z, placing it in an empty directory, and then, working in
this directory, execute the following command:

   zcat ntl-1.5.tar.Z | tar xvf -

There is a makefile, which you might need to edit. You need to specify
a C++ compiler and, optionally, a compatible C compiler. A few source
files are written in pure C, and will compile under C or C++, but
using a C compiler sometimes yields better code. The default settings
in the makefile use the GNU compilers g++ and gcc.

There are a variety of compiler flags that you can set to customize
the compilation, affecting the quality of the compiled code.

   * The -O2 flag, which invokes the optimizer, is essential for
     reasonable performance.

   * The -g option causes symbol tables to be generated for use by a
     debugger; for the GNU compilers, this is compatible with the
     optimizer, but for some other compilers it is not. If not, leave
     it out.

   * The -mv8 option should be used on Sparc stations that have an
     integer multiply instruction (Sparc-10 and later); otherwise, the
     compiler will not generate this instruction, causing a serious
     performance loss.

   * The -qarch=ppc option should be used on the RS6000/PowerPC when
     using the AIX compilers xlc/xlC to get access to the full
     instruction set.

   * The -+ flag should be added to the CPPFLAGS line in the makefile
     when using the AIX compiler xlC.
   * The -DAVOID_FLOAT option should be used on machines with fast
     integer multiplication but slow floating point. On such machines,
     this will generally yield faster code. PowerPCs, for example,
     perform much better with this option.

   * The -DTBL_REM option avoids some divisions in implementing
     multiplication in ZZ_pX. If you use the -DAVOID_FLOAT flag, then
     you should probably use this flag too.

   * The two flags -DAVOID_BRANCHING and -DFFT_PIPELINE should yield a
     performance gain on just about any RISC architecture, such as
     Sparc stations and Alphas. They don't seem to help on Pentiums,
     however. With the first flag, branches are replaced at several
     key points with equivalent code using shifts and masks. The
     second flag causes the FFT routine to use a software pipeline.

   * The -DRANGE_CHECK flag causes vector indices to be checked at run
     time, resulting in slower execution, of course.

   * The -DCPLUSPLUS_ONLY flag may be used if you are using only a C++
     compiler, and no C compiler. This flag eliminates unnecessary "C"
     linkage.

   * On 32-bit platforms that support it, the -DSINGLE_MUL flag causes
     a 26-bit radix to be used, and avoids integer multiplications,
     using floating-point multiplications instead. This yields better
     code on machines with slow integer multiplication and fast
     floating point. On old Sparc stations, this yields a marked
     performance gain. However, the 26-bit radix can be awkward in
     some applications. The -DFAST_INT_MUL flag can be used in
     conjunction with the -DSINGLE_MUL flag; it causes integer
     multiplications to be used where possible. One technical point to
     be aware of is that if the library is compiled with (resp.
     without) the -DSINGLE_MUL flag, then any code calling the library
     must also be compiled with (resp. without) this flag.

After editing the makefile, just execute make. The first thing that
the makefile does is to make mach_desc.h, which defines some machine
characteristics such as word size and machine precision.
This is done by compiling and running a C program that figures out
these characteristics on its own, and prints some diagnostics to the
terminal.

After this, the makefile will compile all the source files, and then
create the library ntl.a. Finally, the makefile compiles and runs a
series of test programs. The output generated should indicate if there
are any problems.

Executing make clean will remove unnecessary object files. Executing
make clobber removes everything that was generated by a previous
installation.

When linking a program, you need to include ntl.a and -lm as
libraries.

The compilation should run smoothly on just about any UNIX platform.
The only real trouble I've heard of is on the RS6000/PowerPC, where
the GNU compiler seems to have a code-generation bug. The AIX
compilers xlc/xlC, however, work fine.

__________________________________________________________________________

Changes between NTL 1.0 and NTL 1.5

Programs written using NTL 1.0 should still work without change with
NTL 1.5. In addition to some minor performance tuning and bug fixes,
NTL 1.5 offers the following new features:

   * Implementation of Schnorr-Euchner algorithms for lattice basis
     reduction, including deep insertions and block Korkin-Zolotarev
     reduction. These are significantly faster than the LLL algorithm
     in NTL 1.0.

   * Implementation of arbitrary-precision floating point.

   * Implementation of double precision with extended exponent range,
     which is useful for lattice basis reduction when the coefficients
     are large.

   * Faster polynomial multiplication over the integers, incorporating
     the Schoenhage-Strassen method.

   * Compilation flags that increase performance on machines with poor
     floating-point performance.