scikits.cuda.cublas.cublasSgeam

scikits.cuda.cublas.cublasSgeam(handle, transa, transb, m, n, alpha, A, lda, beta, B, ldb, C, ldc)

Matrix-matrix addition/transposition (single-precision real).

Computes the sum of two scaled and possibly (conjugate) transposed single-precision real matrices, i.e., C = alpha*op(A) + beta*op(B), where op(X) is X, X transposed, or X conjugate transposed as selected by transa and transb.

Parameters:

handle : int

CUBLAS context

transa, transb : char

‘t’ if the corresponding matrix is transposed, ‘c’ if it is conjugate transposed, ‘n’ otherwise.

m : int

Number of rows in op(A) and C.

n : int

Number of columns in op(B) and C.

alpha : numpy.float32

Constant by which to scale A.

A : ctypes.c_void_p

Pointer to first matrix operand (A).

lda : int

Leading dimension of A.

beta : numpy.float32

Constant by which to scale B.

B : ctypes.c_void_p

Pointer to second matrix operand (B).

ldb : int

Leading dimension of B.

C : ctypes.c_void_p

Pointer to result matrix (C).

ldc : int

Leading dimension of C.
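
Notes

CUBLAS expects matrices in column-major (Fortran) order, whereas numpy arrays default to row-major (C) order; an m-by-n C-ordered array occupies memory exactly like its n-by-m transpose stored in column-major order. The first example below works unchanged because, with no transposition, the addition is purely elementwise; the second example passes .T.copy() arrays so that the column-major views seen by CUBLAS match the intended operands. A quick check of the layout identity:

>>> import numpy as np
>>> x = np.arange(6, dtype=np.float32).reshape(2, 3)
>>> # C-order memory of x matches column-major (Fortran) memory of x.T:
>>> np.allclose(x.ravel(), x.T.ravel(order='F'))
True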

Examples

>>> import pycuda.autoinit
>>> import pycuda.gpuarray as gpuarray
>>> import numpy as np
>>> from scikits.cuda.cublas import cublasCreate, cublasSgeam, cublasDestroy
>>> alpha = np.float32(np.random.rand())
>>> beta = np.float32(np.random.rand())
>>> a = np.random.rand(2, 3).astype(np.float32)
>>> b = np.random.rand(2, 3).astype(np.float32)
>>> c = alpha*a+beta*b
>>> a_gpu = gpuarray.to_gpu(a)
>>> b_gpu = gpuarray.to_gpu(b)
>>> c_gpu = gpuarray.empty(c.shape, c.dtype)
>>> h = cublasCreate()
>>> cublasSgeam(h, 'n', 'n', c.shape[0], c.shape[1], alpha, a_gpu.gpudata, a.shape[0], beta, b_gpu.gpudata, b.shape[0], c_gpu.gpudata, c.shape[0])
>>> np.allclose(c_gpu.get(), c)
True
>>> a = np.random.rand(2, 3).astype(np.float32)
>>> b = np.random.rand(3, 2).astype(np.float32)
>>> c = alpha*a.T+beta*b
>>> a_gpu = gpuarray.to_gpu(a.T.copy())
>>> b_gpu = gpuarray.to_gpu(b.T.copy())
>>> c_gpu = gpuarray.empty(c.T.shape, c.dtype)
>>> transa = 'c' if np.iscomplexobj(a) else 't'  # 'c' and 't' coincide for real data
>>> cublasSgeam(h, transa, 'n', c.shape[0], c.shape[1], alpha, a_gpu.gpudata, a.shape[0], beta, b_gpu.gpudata, b.shape[0], c_gpu.gpudata, c.shape[0])
>>> np.allclose(c_gpu.get().T, c)
True
>>> cublasDestroy(h)
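
Because B is not read when beta is zero (per the CUBLAS documentation for geam), cublasSgeam can also perform an out-of-place transpose. A minimal sketch, continuing with the imports above:

>>> a = np.random.rand(2, 3).astype(np.float32)
>>> a_gpu = gpuarray.to_gpu(a)
>>> at_gpu = gpuarray.empty((3, 2), np.float32)  # will hold a.T
>>> h = cublasCreate()
>>> # beta = 0, so the B operand is ignored; reuse A's pointer for it:
>>> cublasSgeam(h, 't', 'n', a.shape[0], a.shape[1], np.float32(1.0), a_gpu.gpudata, a.shape[1], np.float32(0.0), a_gpu.gpudata, a.shape[0], at_gpu.gpudata, a.shape[0])
>>> np.allclose(at_gpu.get(), a.T)
True
>>> cublasDestroy(h)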