Matrix-matrix addition/transposition (single-precision real).
Computes C = alpha*op(A) + beta*op(B), the sum of two scaled and possibly (conjugate-)transposed single-precision real matrices, where op(X) is X, X.T, or X.conj().T depending on `transa` and `transb`.
Parameters
----------
handle : int
    CUBLAS context handle.
transa, transb : char
    Operation to apply to A and B: 'n' (no transpose), 't' (transpose),
    or 'c' (conjugate transpose).
m : int
    Number of rows in op(A) and C.
n : int
    Number of columns in op(B) and C.
alpha : numpy.float32
    Scalar applied to op(A).
A : ctypes.c_void_p
    Pointer to the input matrix A.
lda : int
    Leading dimension of A.
beta : numpy.float32
    Scalar applied to op(B).
B : ctypes.c_void_p
    Pointer to the input matrix B.
ldb : int
    Leading dimension of B.
C : ctypes.c_void_p
    Pointer to the output matrix C.
ldc : int
    Leading dimension of C.
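For reference, here is a minimal NumPy sketch of the operation the routine performs, following the standard cuBLAS geam semantics described above (the name `geam_reference` is hypothetical, not part of scikit-cuda):

    import numpy as np

    def geam_reference(transa, transb, alpha, a, beta, b):
        """CPU model of geam: C = alpha*op(A) + beta*op(B)."""
        op = {'n': lambda x: x,           # no transpose
              't': lambda x: x.T,         # transpose
              'c': lambda x: x.conj().T}  # conjugate transpose
        return alpha*op[transa](a) + beta*op[transb](b)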
Examples
--------
>>> import pycuda.autoinit
>>> import pycuda.gpuarray as gpuarray
>>> import numpy as np
>>> from skcuda.cublas import cublasCreate, cublasSgeam, cublasDestroy
>>> alpha = np.float32(np.random.rand())
>>> beta = np.float32(np.random.rand())
>>> a = np.random.rand(2, 3).astype(np.float32)
>>> b = np.random.rand(2, 3).astype(np.float32)
>>> c = alpha*a+beta*b
>>> a_gpu = gpuarray.to_gpu(a)
>>> b_gpu = gpuarray.to_gpu(b)
>>> c_gpu = gpuarray.empty(c.shape, c.dtype)
>>> h = cublasCreate()
>>> cublasSgeam(h, 'n', 'n', c.shape[0], c.shape[1], alpha, a_gpu.gpudata, a.shape[0], beta, b_gpu.gpudata, b.shape[0], c_gpu.gpudata, c.shape[0])
>>> np.allclose(c_gpu.get(), c)
True
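The call above passes the row-major NumPy buffers straight to cuBLAS even though cuBLAS assumes column-major storage; this is safe in the 'n', 'n' case because the elementwise sum touches the same buffer positions under either interpretation. When a transpose is involved, as in the next example, the data must actually be laid out column-major, which `a.T.copy()` provides. A quick check of that equivalence (not part of the original example):

>>> np.array_equal(a.T.copy().ravel(), a.ravel(order='F'))
True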
>>> a = np.random.rand(2, 3).astype(np.float32)
>>> b = np.random.rand(3, 2).astype(np.float32)
>>> c = alpha*a.T+beta*b
>>> a_gpu = gpuarray.to_gpu(a.T.copy())
>>> b_gpu = gpuarray.to_gpu(b.T.copy())
>>> c_gpu = gpuarray.empty(c.T.shape, c.dtype)
>>> transa = 'c' if np.iscomplexobj(a) else 't'
>>> cublasSgeam(h, transa, 'n', c.shape[0], c.shape[1], alpha, a_gpu.gpudata, a.shape[0], beta, b_gpu.gpudata, b.shape[0], c_gpu.gpudata, c.shape[0])
>>> np.allclose(c_gpu.get().T, c)
True
>>> cublasDestroy(h)
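As a usage sketch, the no-transpose pattern from the first example can be wrapped in a small helper; `sgeam_add` below is a hypothetical convenience function (not part of scikit-cuda), assuming two C-contiguous 2D float32 gpuarrays of equal shape:

    import numpy as np
    import pycuda.gpuarray as gpuarray
    from skcuda.cublas import cublasSgeam

    def sgeam_add(handle, alpha, a_gpu, beta, b_gpu):
        """Return alpha*a_gpu + beta*b_gpu for same-shape row-major float32 arrays."""
        assert a_gpu.shape == b_gpu.shape and a_gpu.dtype == np.float32
        m, n = a_gpu.shape
        c_gpu = gpuarray.empty_like(a_gpu)
        # Elementwise case: the row-major buffers can be handed to the
        # column-major routine unchanged (see the note after the first example).
        cublasSgeam(handle, 'n', 'n', m, n,
                    alpha, a_gpu.gpudata, m,
                    beta, b_gpu.gpudata, m,
                    c_gpu.gpudata, m)
        return c_gpu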