A Tour of NTL
_________________________________________________________________

Table of Contents

 1. Introduction
 2. Examples
 3. Summary of NTL's Main Modules
 4. NTL Implementation and Portability
 5. Some Performance Data
 6. Obtaining and Installing NTL
 7. Changes between NTL 1.0 and NTL 1.5

Introduction

NTL is a high-performance, portable C++ library providing data
structures and algorithms for manipulating signed, arbitrary-length
integers, and for vectors, matrices, and polynomials over the integers
and over finite fields.

NTL uses state-of-the-art algorithms. In particular, its code for
polynomial arithmetic is one of the fastest available. Early versions
of NTL have been used to set "world records" for polynomial
factorization and point counting on elliptic curves.

NTL is written entirely in C++, and can be easily installed on just
about any Unix platform, including PCs and 32- and 64-bit
workstations. Although NTL is written in C++ and avoids assembly,
NTL's performance is generally much better than is typical of such
portable libraries.

NTL is relatively easy to use, and it provides a good environment for
implementing new number-theoretic algorithms easily and quickly,
without sacrificing performance.

NTL is free software intended for research and educational purposes
only.

NTL is by no means a "complete" computer algebra package. Instead of
having NTL grow into a huge package, the aim is for NTL itself to
serve as a stable, portable platform for implementing other
algorithms. It is hoped that interested users with relevant expertise
will implement algorithms on top of NTL and make their software
available to other NTL users. Links to such software will be provided
on the NTL page.
_________________________________________________________________

Examples

Perhaps the best way to introduce the basics of NTL is by way of
example.
Please note, however, that NTL provides much more functionality than
these examples illustrate.

Example 1

The first example makes use of the class ZZ, which represents signed,
arbitrary-length integers. This program reads two numbers and prints
their product:

   #include "ZZ.h"

   main()
   {
      ZZ a, b, prod;

      cin >> a;
      cin >> b;
      mul(prod, a, b);
      cout << prod << "\n";
   }

Note that in NTL, one writes

   mul(prod, a, b);

instead of

   prod = a * b;

as many C++ programmers might expect. The reason is that the latter
form ultimately leads to a lot of unnecessary copying, and allocating
and freeing of memory, which might not be apparent to the programmer.
In designing NTL, efficiency won out over aesthetics in this case.

Example 2

The following routine sums up the numbers in a vector of ZZ's.

   #include "vec_ZZ.h"

   void sum(ZZ& s, const vector(ZZ)& v)
   {
      ZZ acc;
      clear(acc);

      long n = v.length();
      long i;

      for (i = 0; i < n; i++)
         add(acc, acc, v[i]);

      s = acc;
   }

The class vector(ZZ) is a dynamic-length array of ZZ's; more
generally, NTL provides macros to create a template-like class
vector(T) for any type T that acts as a dynamic-length array. The
reason that macros are used instead of true templates is simple: at
the present time, compiler support for templates is not entirely
satisfactory, and their use would make NTL much more difficult to
port. At some point in the future, a template version of NTL may be
made available.

The routine sum declares a local variable acc of type ZZ, first clears
it, and then sums up the array elements. One thing to notice is that
the routine add works properly even if the output (its first argument)
aliases one of its inputs. Except for a few exceptions that are
well-marked in the documentation, NTL routines allow their inputs to
alias their outputs. Also notice that by accumulating the sum in a
local variable, instead of directly in the argument s, the routine sum
will work correctly even if s aliases one of the elements of v.
Any space required for acc is automatically freed by acc's destructor
when the routine returns.

Vectors in NTL are indexed from 0, but in many situations it is
convenient or more natural to index from 1. The class vector(T) allows
for this; the above example could be written as follows.

   #include "vec_ZZ.h"

   void sum(ZZ& s, const vector(ZZ)& v)
   {
      ZZ acc;
      clear(acc);

      long n = v.length();
      long i;

      for (i = 1; i <= n; i++)
         add(acc, acc, v(i));

      s = acc;
   }

Example 3

There is also basic support for matrices in NTL. In general, the class
matrix(T) is a special kind of vector(vector(T)), where each row is a
vector of the same length. Row i of matrix M can be accessed as M[i]
(indexing from 0) or as M(i) (indexing from 1). Column j of row i can
be accessed as M[i][j] or M(i)(j); for notational convenience, the
latter is equivalent to M(i,j).

Here is a matrix multiplication routine, which in fact is already
provided by NTL.

   #include "mat_ZZ.h"

   void mul(matrix(ZZ)& X, const matrix(ZZ)& A, const matrix(ZZ)& B)
   {
      long n = A.NumRows();
      long l = A.NumCols();
      long m = B.NumCols();

      if (l != B.NumRows())
         Error("matrix mul: dimension mismatch");

      X.SetDims(n, m);

      long i, j, k;
      ZZ acc, tmp;

      for (i = 1; i <= n; i++) {
         for (j = 1; j <= m; j++) {
            clear(acc);
            for (k = 1; k <= l; k++) {
               mul(tmp, A(i,k), B(k,j));
               add(acc, acc, tmp);
            }
            X(i,j) = acc;
         }
      }
   }

Note that in case of a dimension mismatch, the routine calls the Error
function, which is a part of NTL and which simply prints the message
and aborts. That is generally how NTL deals with errors. Currently,
NTL makes no use of exceptions (for the same reason it does not use
templates--see above), but a future version may incorporate them.

Note that this routine will not work properly if X aliases A or B. The
actual matrix multiplication routine in NTL takes care of this.

Another thing to notice is that NTL generally avoids the type int,
preferring instead to use long.
This seems to go against what most "style" books preach, but
nevertheless seems to make the most sense in today's world. Although
int was originally meant to represent the "natural" word size, that
seems to no longer be the case. Usually, int and long are the same;
however, when they are not, int is effectively a hack for backward
compatibility with a past when word sizes were smaller. Thus, for
simplicity and safety, NTL uses long for all integer values.

Example 4

NTL provides extensive support for polynomial arithmetic. The class
ZZX represents univariate polynomials with integer coefficients. The
following program reads a polynomial, factors it, and prints the
factorization.

   #include "ZZXFactoring.h"

   main()
   {
      ZZX f;

      cin >> f;

      vector(pair(ZZX,long)) factors;
      ZZ c;

      factor(c, factors, f);

      cout << c << "\n";
      cout << factors << "\n";
   }

When this program is compiled and run on the input

   [2 10 14 6]

which represents the polynomial 2 + 10*X + 14*X^2 + 6*X^3, the output
is

   2
   [[[1 3] 1] [[1 1] 2]]

The first line of output is the content of the polynomial, which is 2
in this case, as each coefficient of the input polynomial is divisible
by 2. The second line is a vector of pairs; the first member of each
pair is an irreducible factor of the input, and the second is the
exponent to which it appears in the factorization. Thus, all of the
above simply means that

   2 + 10*X + 14*X^2 + 6*X^3 = 2 * (1 + 3*X) * (1 + X)^2

Admittedly, I/O in NTL is not exactly user friendly, but then NTL has
no pretensions about being an interactive computer algebra system: it
is a library for programmers.

Example 5

Here is another example. The following program prints out the first
100 cyclotomic polynomials.
   #include "ZZX.h"

   main()
   {
      vector(ZZX) phi;
      phi.SetLength(100);

      long i;
      for (i = 1; i <= 100; i++) {
         SetCoeff(phi(i), i);
         add(phi(i), phi(i), -1);

         long j;
         for (j = 1; j <= i-1; j++)
            if (i % j == 0)
               divide(phi(i), phi(i), phi(j));

         cout << phi(i) << "\n";
      }
   }

Example 6

NTL also supports modular integer arithmetic. The class ZZ_p
represents the integers mod p. Despite the notation, p need not in
general be prime, except in situations where this is mathematically
required. The classes vector(ZZ_p), matrix(ZZ_p), and ZZ_pX represent
vectors, matrices, and polynomials mod p, and work much the same way
as the corresponding classes for ZZ. Here is a program that reads a
prime number p and a polynomial f, and factors f.

   #include "ZZ_pXFactoring.h"

   main()
   {
      ZZ p;
      cin >> p;
      ZZ_pInit(p);

      ZZ_pX f;
      cin >> f;

      vector(pair(ZZ_pX,long)) factors;

      CanZass(factors, f);
      cout << factors << "\n";
   }

As a program is running, NTL keeps track of a "current modulus" for
the class ZZ_p, which can be initialized or changed using ZZ_pInit.
This must be done before any variables are declared or computations
are done that depend on this modulus.

Please note that for efficiency reasons, NTL does not make any attempt
to ensure that variables declared under one modulus are not used under
a different one. If that happens, the behavior of the program is
completely unpredictable.

Example 7

There is a mechanism for saving and restoring a modulus, which the
following example illustrates. This routine takes as input an integer
polynomial and a prime, and tests if the polynomial is irreducible
modulo the prime.

   #include "ZZX.h"
   #include "ZZ_pXFactoring.h"

   long IrredTestMod(const ZZX& f, const ZZ& p)
   {
      ZZ_pBak bak;
      bak.save();

      ZZ_pInit(p);

      ZZ_pX f1;
      f1 << f;

      long res = DetIrredTest(f1);

      bak.restore();
      return res;
   }

The current modulus is saved using the helper class ZZ_pBak and the
operation save(). The old modulus is then later restored using the
operation restore().
In theory, the call to restore() could be left out, as the destructor
for bak should call it; however, the paranoid programmer will call it
explicitly anyway.

The operator << is a conversion operator. NTL provides many conversion
operators between various types, all using the same syntax, and they
generally do the obvious thing. As a rule, NTL avoids implicit
conversion operators, as these can easily lead to gross inefficiencies
if one is not extremely careful.

The routine DetIrredTest is one of several tests for irreducibility
provided by NTL.

Example 8

Suppose in the above example that p is known in advance to be a small,
single-precision prime. In this case, NTL provides a class zz_p that
acts just like ZZ_p, along with corresponding classes vector(zz_p),
matrix(zz_p), and zz_pX. The interfaces to all of the routines are
generally identical to those for ZZ_p. However, the routines are much
more efficient, in both time and space. For small primes, the routine
in the previous example could be coded as follows.

   #include "ZZX.h"
   #include "zz_pXFactoring.h"

   long IrredTestMod(const ZZX& f, long p)
   {
      zz_pBak bak;
      bak.save();

      zz_pInit(p);

      zz_pX f1;
      f1 << f;

      long res = DetIrredTest(f1);

      bak.restore();
      return res;
   }

__________________________________________________________________________

Summary of NTL's Main Modules

NTL consists of a number of software modules. Generally speaking, for
each module foo, there is a header file foo.h and an implementation
file foo.c. There is also a documentation file foo.txt. This takes the
form of a header file, but stripped of implementation details and of
declarations of some of the more esoteric routines and data
structures, and it contains more complete and usually clearer
documentation. The following is a summary of the main NTL modules. The
corresponding ".txt" file can be obtained by clicking on the module
name.
   tools
          some basic types and utility routines, including a timing
          function

   vector
          macros for the template-like class vector(T), providing
          dynamic-size arrays

   matrix
          macros for the template-like class matrix(T), providing
          dynamic-size 2-dimensional arrays

   ZZ
          class ZZ: arbitrary-length integers; includes routines for
          GCDs, Jacobi symbols, modular arithmetic, and primality
          testing; also includes small prime generation routines and
          in-line routines for single-precision modular arithmetic

   ZZ_p
          class ZZ_p: integers mod p

   zz_p
          class zz_p: integers mod p, where p is single-precision

   xdouble
          class xdouble: double-precision floating-point numbers with
          extended exponent range

   RR
          class RR: arbitrary-precision floating-point numbers

   ZZX
          class ZZX: polynomials over ZZ; includes routines for GCDs,
          minimal and characteristic polynomials, norms, and traces

   ZZXFactoring
          routines for factoring univariate polynomials over ZZ

   ZZ_pX
          class ZZ_pX: polynomials over ZZ_p; includes routines for
          modular polynomial arithmetic, modular composition, minimal
          and characteristic polynomials, and interpolation

   ZZ_pXFactoring
          routines for factoring polynomials over ZZ_p; also includes
          routines for testing for and constructing irreducible
          polynomials

   zz_pX
          class zz_pX: polynomials over zz_p; provides the same
          functionality as class ZZ_pX, but for single-precision p

   zz_pXFactoring
          routines for factoring polynomials over zz_p; provides the
          same functionality as ZZ_pXFactoring, but for
          single-precision p

   mat_ZZ
          class matrix(ZZ); includes basic matrix arithmetic
          operations, including determinant calculation, matrix
          inversion, and solving nonsingular systems of linear
          equations

   HNF
          routines for computing the Hermite Normal Form of a lattice

   LLL
          routines for performing lattice basis reduction, including
          implementations of the Schnorr-Euchner LLL and block
          Korkin-Zolotarev reduction algorithms, as well as an
          integer-only reduction algorithm
   mat_ZZ_p
          class matrix(ZZ_p); includes basic matrix arithmetic
          operations, including determinant calculation, matrix
          inversion, solving nonsingular systems of linear equations,
          and Gaussian elimination

   mat_zz_p
          class matrix(zz_p); includes basic matrix arithmetic
          operations, including determinant calculation, matrix
          inversion, solving nonsingular systems of linear equations,
          and Gaussian elimination

   mat_RR
          class matrix(RR); includes basic matrix arithmetic
          operations, including determinant calculation, matrix
          inversion, and solving nonsingular systems of linear
          equations

   mat_poly_ZZ
          routine for computing the characteristic polynomial of a
          matrix(ZZ)

   mat_poly_ZZ_p
          routine for computing the characteristic polynomial of a
          matrix(ZZ_p)

   mat_poly_zz_p
          routine for computing the characteristic polynomial of a
          matrix(zz_p)

   vec_ZZ
          class vector(ZZ)

   ZZVec
          class ZZVec: fixed-length vectors of fixed-length ZZs; less
          flexible, but more efficient, than vector(ZZ)

   vec_ZZ_p
          class vector(ZZ_p)

   vec_zz_p
          class vector(zz_p)

   vec_RR
          class vector(RR)

__________________________________________________________________________

NTL Implementation and Portability

NTL is designed to be portable, fast, and relatively easy to use and
extend.

To make NTL portable, no assembly code is used. This is highly
desirable, as architectures are constantly changing and evolving, and
maintaining assembly code is quite costly. By avoiding assembly code,
NTL should remain usable, with virtually no maintenance, for many
years.

The main drawback of this philosophy is that without assembly code,
one cannot use machine instructions to obtain double-word products or
to perform double-word by single-word division. There are a number of
possible strategies for dealing with this. NTL's basic strategy uses a
combination of integer and floating-point instruction sequences.
To carry out this strategy, NTL makes two requirements of its
platform, neither of which is guaranteed by the C++ language
definition, but both of which appear to be essentially universal:

 1. Integers are represented using 2's complement, and integer
    overflow is not trapped, but rather just wraps around.
 2. Double-precision floating point conforms to the IEEE standard.

Actually, with some modification, NTL would not need the first
requirement, by exploiting language definitions dealing with unsigned
arithmetic. Future versions of NTL may incorporate this modification,
if there is any need for it.

Relying on floating point may seem prone to errors, but with the
guarantees provided by the IEEE standard, one can prove the
correctness of the NTL code that uses floating point. Actually, NTL is
quite conservative, and substantially weaker conditions are sufficient
for correctness. In particular, NTL works correctly with any rounding
mode, and also with any mix of double-precision and extended
double-precision operations (which arise, for example, with Intel x86
processors).

With this strategy, NTL represents arbitrary-length integers using a
30-bit radix on 32-bit machines, and a 50-bit radix on 64-bit
machines.

This general strategy is used in A. K. Lenstra's LIP library for
arbitrary-length integer arithmetic. Indeed, NTL's integer arithmetic
evolved from LIP, but over time almost all of this code has been
rewritten to enhance performance.

Long-integer multiplication is implemented using the classical
algorithm, crossing over to Karatsuba for very big numbers. Polynomial
multiplication and division are carried out using a combination of the
classical algorithm, Karatsuba, the FFT using small primes, and the
FFT using the Schoenhage-Strassen approach.

__________________________________________________________________________

Some Performance Data

NTL is high-performance software, offering high-quality
implementations of the best algorithms.
Here are some timing figures from the current version of NTL. The
figures were obtained using an IBM RS6000 workstation, Model 43P-133,
which has a 133 MHz PowerPC 604 processor. The operating system is AIX
and the compiler is xlC. The compiler options were

   -O2 -qarch=ppc -DAVOID_FLOAT -DTBL_REM

The first problem considered is the factorization of univariate
polynomials modulo a prime p. As test polynomials, we take the family
of polynomials defined in [V. Shoup, J. Symb. Comp. 20:363-397, 1995].
For every n, we define p to be the first prime greater than
2^{n-2}*PI, and the polynomial is \sum_{i=0}^n a_{n-i} X^i, where
a_0 = 1 and a_{i+1} = a_i^2 + 1. Here are some running times:

   n          64   128    256     512      1024
   hh:mm:ss    2    13   1:53   21:01   4:05:25

Also of interest is space usage. The n = 512 case used 4 MB of main
memory, and the n = 1024 case used 17 MB of main memory.

Another test suite, this time using small primes, was used by Kaltofen
and Lobo (Proc. ISSAC '94). One of their polynomials is a degree-10001
polynomial modulo the prime 127. This polynomial was factored with NTL
in just over 3 hours, using 17 MB of memory.

The second problem considered is factoring univariate polynomials over
the integers. We use two test suites. In the first, we factor
F_n(X)*F_{n+1}(X), where F_n(X) = \sum_{i=0}^n f_{n-i}*X^i, and f_i is
the i-th Fibonacci number (f_0 = 1, f_1 = 1, f_2 = 2, ...). Here are
some running times:

   n         100   200    300    400    500   1000
   hh:mm:ss   11    34   1:44   2:35   3:35  15:20

The space in the n = 500 case was under 5 MB, and in the n = 1000
case, under 13 MB.

The second test suite comes from Paul Zimmermann. The polynomial P1(X)
has degree 156, coefficients up to 424 digits, and 36 factors (12 of
degree 2, 15 of degree 4, 9 of degree 8). The polynomial P2(X) has
degree 196, coefficients up to 419 digits, and 12 factors (2 of degree
2, 4 of degree 12, and 6 of degree 24).
The polynomial P3(X) has degree 336, coefficients up to 597 digits,
and 16 factors (4 of degree 12 and 12 of degree 24). The polynomial
P4(X) has degree 462, coefficients up to 756 digits, and two factors,
of degrees 66 and 396. More details on this test suite are available.

Our running times (hh:mm:ss) were as follows: 21, 23, 1:16, 1:37:10.
In all cases less than 5 MB of main memory was used.

__________________________________________________________________________

Obtaining and Installing NTL

To obtain the source code and documentation for NTL, download
ntl-1.5.tar.Z, placing it in an empty directory, and then, working in
this directory, execute the following command:

   zcat ntl-1.5.tar.Z | tar xvf -

There is a makefile, which you might need to edit. You need to specify
a C++ compiler and, optionally, a compatible C compiler. A few source
files are written in pure C, and will compile under C or C++, but
using a C compiler sometimes yields better code. The default settings
in the makefile use the GNU compilers g++ and gcc.

There are a variety of compiler flags that you can set to customize
the compilation, affecting the quality of the compiled code.

   * The -O2 flag, which invokes the optimizer, is essential for
     reasonable performance.

   * The -g option causes symbol tables to be generated for use by a
     debugger; for the GNU compilers, this is compatible with the
     optimizer, but for some other compilers it is not. If not, leave
     it out.

   * The -mv8 option should be used on Sparc stations that have an
     integer multiply instruction (Sparc-10 and later); otherwise, the
     compiler will not generate this instruction, causing a serious
     performance loss.

   * The -qarch=ppc option should be used on the RS6000/PowerPC when
     using the AIX compilers xlc/xlC to get access to the full
     instruction set.

   * The -+ flag should be added to the CPPFLAGS line in the makefile
     when using the AIX compiler xlC.
   * The -DAVOID_FLOAT option should be used on machines with fast
     integer multiplication but slow floating point. On such machines,
     this will generally yield faster code. PowerPCs, for example,
     perform much better with this option.

   * The -DTBL_REM option avoids some divisions in implementing
     multiplication in ZZ_pX. If you use the -DAVOID_FLOAT flag, then
     you should probably use this flag too.

   * The two flags -DAVOID_BRANCHING and -DFFT_PIPELINE should yield a
     performance gain on just about any RISC architecture, such as
     Sparc stations and Alphas. They don't seem to help on Pentiums,
     however. With the first flag, branches are replaced at several
     key points with equivalent code using shifts and masks. The
     second flag causes the FFT routine to use a software pipeline.

   * The -DRANGE_CHECK flag causes vector indices to be checked at run
     time, resulting in slower execution, of course.

   * The -DCPLUSPLUS_ONLY flag may be used if you are using only a C++
     compiler, and no C compiler. This flag eliminates unnecessary "C"
     linkage.

   * On 32-bit platforms that support it, the -DSINGLE_MUL flag causes
     a 26-bit radix to be used, and avoids integer multiplications,
     using floating-point multiplications instead. This yields better
     code on machines with slow integer multiplication and fast
     floating point. On old Sparc stations, this yields a marked
     performance gain. However, the 26-bit radix can be awkward in
     some applications. The -DFAST_INT_MUL flag can be used in
     conjunction with the -DSINGLE_MUL flag; it causes integer
     multiplications to be used where possible. One technical point to
     be aware of is that if the library is compiled with (resp.
     without) the -DSINGLE_MUL flag, then any code calling the library
     must also be compiled with (resp. without) this flag.

After editing the makefile, just execute make. The first thing that
the makefile does is to make mach_desc.h, which defines some machine
characteristics such as word size and machine precision.
This is done by compiling and running a C program that figures out
these characteristics on its own, and prints some diagnostics to the
terminal.

After this, the makefile will compile all the source files, and then
create the library ntl.a. Finally, the makefile compiles and runs a
series of test programs. The output generated should indicate if there
are any problems.

Executing make clean will remove unnecessary object files. Executing
make clobber removes everything that was generated by a previous
installation.

When linking a program, you need to include ntl.a and -lm as
libraries.

The compilation should run smoothly on just about any UNIX platform.
The only real trouble I've heard of is on the RS6000/PowerPC, where
the GNU compiler seems to have a code-generation bug. The AIX
compilers xlc/xlC, however, work fine.

__________________________________________________________________________

Changes between NTL 1.0 and NTL 1.5

Programs written using NTL 1.0 should still work without change with
NTL 1.5. In addition to some minor performance tuning and bug fixes,
NTL 1.5 offers the following new features:

   * Implementation of Schnorr-Euchner algorithms for lattice basis
     reduction, including deep insertions and block Korkin-Zolotarev
     reduction. These are significantly faster than the LLL algorithm
     in NTL 1.0.

   * Implementation of arbitrary-precision floating point.

   * Implementation of double precision with extended exponent range,
     which is useful for lattice basis reduction when the coefficients
     are large.

   * Faster polynomial multiplication over the integers, incorporating
     the Schoenhage-Strassen method.

   * Compilation flags that increase performance on machines with poor
     floating-point performance.