Change Log
Release 0.5.0 - (under development)
- Switch to PEP 440 version numbering.
- Replace distribute_setup.py with ez_setup.py.
- Improve support for latest NVIDIA GPUs.
- Add more wrappers for CUBLAS 5 functions (enh. by Teodor Moldovan, Sander Dieleman).
- Add support for CULA Dense Free R17 (enh. by Alex Rubinsteyn).
- Memoize elementwise kernel used by ifft scaling (#37).
- Speed up misc.maxabs using reduction and kernel memoization.
- Speed up misc.cumsum using scan and kernel memoization.
- Speed up linalg.conj and misc.diff using elementwise kernel and memoization.
- Speed up special.{sici,exp1,expi} using elementwise kernel and memoization.
- Add wrappers for experimental multi-GPU CULA routines in CULA Dense R14+.
- Use ldconfig to find library paths rather than libdl (#39).
- Fix win32 platform detection.
- Add Cholesky factorization/solve routines (enh. by Steve Taylor).
- Fix Cholesky factorization/solve routines (fix by Thomas Unterthiner).
- Enable dot() function to operate in place (enh. by Thomas Unterthiner).
- Python 3 compatibility improvements (enh. by Thomas Unterthiner).
- Support for Fortran-order arrays in dot() and cho_solve() (enh. by Thomas Unterthiner).
- CULA-based matrix inversion (enh. by Thomas Unterthiner).
- Add add_diag() function (enh. by Thomas Unterthiner).
- Use cublas*copy in diag() function (enh. by Thomas Unterthiner).
- Improved Mac OS X compatibility (enh. by Michael M. Forbes).
- Find CUBLAS version even when it is only accessible via LD_LIBRARY_PATH (enh. by Frédéric Bastien).
- Get both major and minor version numbers from the CUBLAS library when determining its version.
- Handle unset LD_LIBRARY_PATH variable (fix by Jan Schlüter).
- Fix library search on Mac OS X (fix by capdevc).
- Fix library search on Windows.
- Add Windows support to CULA wrappers.
- Enable specification of a memory pool allocator to linalg functions (enh. by Thomas Unterthiner).
- Improve misc.select_block_grid_sizes() logic to handle different GPU hardware.
- Compute transpose using CUDA 5.0 CUBLAS functions rather than an inefficient naive kernel.
Release 0.042 - (March 10, 2013)
- Add complex exponential integral.
- Fix typo in cublasCgbmv.
- Use CUBLAS v2 API, add preliminary support for CUBLAS 5 functions.
- Detect CUBLAS version without initializing the GPU.
- Work around numpy bug #1898.
- Fix issues with pycuda installations done via easy_install/pip.
- Add support for specifying streams when creating FFT plans.
- Correctly find CULA R13a libraries.
- Raise exceptions when functions in the full release of CULA Dense are invoked without the library installed.
- Perform post-fft scaling in-place.
- Fix broken Python 2.6 compatibility (#19).
- Download distribute for package installation if it isn’t available.
- Prevent absence of CULA from causing import errors (enh. by Jacob Frelinger).
- FFT batch tests and FFTW mode configuration (enh. by Lars Pastewka).
Release 0.041 - (May 22, 2011)
- Fix bug preventing installation with pip.
Release 0.04 - (May 11, 2011)
- Fix bug in cutoff_invert kernel.
- Add get_compute_capability function and other goodies to misc module.
- Use pycuda-complex.hpp to improve kernel readability.
- Add integrate module.
- Add unit tests for high-level functions.
- Automatically determine device used by current context.
- Support batched and multidimensional FFT operations.
- Extend dot() function to support implicit transpose/Hermitian transpose.
- Support for in-place computation of singular vectors in svd() function.
- Simplify kernel launch setup.
- Add more CULA routine wrappers.
- Wrappers for CULA R11 auxiliary routines.
Release 0.03 - (November 22, 2010)
- Add support for some functions in the premium version of the CULA toolkit.
- Add wrappers for all LAPACK functions in the basic CULA toolkit.
- Fix pinv() to properly invert complex matrices.
- Add Hermitian transpose.
- Add tril function.
- Fix missing library detection.
- Include missing CUDA headers in package.
Release 0.02 - (September 21, 2010)
- Add documentation.
- Update copyright information.
Release 0.01 - (September 17, 2010)