A NumPy-compatible array library accelerated by CUDA

High performance with CUDA

CuPy is an open-source array library for GPU-accelerated computing with Python, built on NVIDIA CUDA. It uses CUDA libraries including cuBLAS, cuDNN, cuRAND, cuSOLVER, cuSPARSE, cuFFT and NCCL to make full use of the GPU architecture.
The figure shows CuPy speedup over NumPy. Most operations perform well on a GPU using CuPy out of the box, and some are sped up by more than 100x. Read the original benchmark article, "Single-GPU CuPy Speedups", on the RAPIDS AI Medium blog.
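To get a rough feel for the speedup on your own hardware, a timing sketch along the following lines may help. It does not reproduce the benchmark article; the matrix size, the repeat count and the helper name `time_matmul` are arbitrary choices made for illustration. GPU work is launched asynchronously, so the sketch synchronizes the stream before reading the clock, and it falls back to timing NumPy alone when CuPy or a usable GPU is not available.

```python
import time

import numpy as np

try:
    import cupy as cp
    cp.zeros(1)  # touch the device so that a missing GPU fails here
except Exception:
    cp = None  # assumption for this sketch: time NumPy only


def time_matmul(xp, n=256, repeat=5):
    """Average seconds per n x n float32 matrix product using module xp."""
    a = xp.random.rand(n, n).astype(xp.float32)
    b = xp.random.rand(n, n).astype(xp.float32)
    on_gpu = cp is not None and xp is cp

    xp.matmul(a, b)  # warm-up run, excluded from the timing
    if on_gpu:
        cp.cuda.Stream.null.synchronize()

    start = time.perf_counter()
    for _ in range(repeat):
        xp.matmul(a, b)
    if on_gpu:
        # Wait for the queued GPU work to finish before stopping the clock.
        cp.cuda.Stream.null.synchronize()
    return (time.perf_counter() - start) / repeat


cpu_seconds = time_matmul(np)
print(f"NumPy: {cpu_seconds:.6f} s per matmul")
if cp is not None:
    gpu_seconds = time_matmul(cp)
    print(f"CuPy:  {gpu_seconds:.6f} s per matmul")
```

Small arrays often show little or no gain because launch and transfer overheads dominate, so results from a sketch like this should be read as indicative only.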

Highly compatible with NumPy

CuPy's interface is highly compatible with NumPy; in most cases it can be used as a drop-in replacement: just replace numpy with cupy in your Python code. The Basics of CuPy tutorial is a good place to learn the first steps with CuPy.
CuPy supports various methods, indexing, data types, broadcasting and more. The comparison table lists NumPy / SciPy APIs and their corresponding CuPy implementations.

>>> import cupy as cp
>>> x = cp.arange(6).reshape(2, 3).astype('f')
>>> x
array([[ 0.,  1.,  2.],
       [ 3.,  4.,  5.]], dtype=float32)
>>> x.sum(axis=1)
array([  3.,  12.], dtype=float32)
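One way to exploit this compatibility is to write a function once and let it run on either library. The sketch below is illustrative only: the `softmax` function is a made-up example, and the fallback to NumPy when CuPy or a GPU is missing is an assumption of this sketch rather than something CuPy does for you. It relies on `cupy.get_array_module` to pick the module matching the input array, and on `cupy.asarray` / `cupy.asnumpy` to move data between host and device.

```python
import numpy as np

try:
    import cupy as cp
    cp.zeros(1)  # touch the device so that a missing GPU fails here
except Exception:
    cp = None  # assumption for this sketch: run on NumPy only


def softmax(x):
    """Row-wise softmax that works on NumPy and CuPy arrays alike."""
    # Pick numpy or cupy depending on where the input array lives.
    xp = cp.get_array_module(x) if cp is not None else np
    shifted = x - x.max(axis=1, keepdims=True)  # for numerical stability
    e = xp.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)


x_cpu = np.arange(6, dtype=np.float32).reshape(2, 3)
out_cpu = softmax(x_cpu)  # computed by NumPy

if cp is not None:
    x_gpu = cp.asarray(x_cpu)      # host -> device copy
    out_gpu = softmax(x_gpu)       # same code, computed by CuPy
    out_back = cp.asnumpy(out_gpu)  # device -> host copy
    print(np.allclose(out_cpu, out_back, atol=1e-6))
```

Keeping data on the device between operations, rather than converting back and forth, is usually what makes the GPU version worthwhile.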

Easy to install

The easiest way to install CuPy is to use pip. CuPy provides wheels (precompiled binary packages) for the recommended environments. These packages include cuDNN and NCCL. Please read Install CuPy for more details.
CuPy can also be installed from source code. The install script in the source code automatically detects installed versions of CUDA, cuDNN and NCCL in your environment.

# For CUDA 9.0
pip install cupy-cuda90

# For CUDA 9.2
pip install cupy-cuda92

# For CUDA 10.0
pip install cupy-cuda100

# For CUDA 10.1
pip install cupy-cuda101

# For CUDA 10.2
pip install cupy-cuda102

# For CUDA 11.0
pip install cupy-cuda110

# For CUDA 11.1 (see here for Linux)
pip install cupy-cuda111

# Install CuPy from source
pip install cupy
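After installing, a quick check like the one below can confirm that CuPy imports and can see a GPU. This is a minimal sketch; the helper name `cupy_install_info` and the shape of the returned dictionary are made up for illustration, while `cupy.cuda.runtime.runtimeGetVersion` and `cupy.cuda.runtime.getDeviceCount` are thin wrappers over the CUDA runtime API.

```python
def cupy_install_info():
    """Describe the CuPy installation, or return None if CuPy cannot be imported."""
    try:
        import cupy
    except ImportError:
        return None

    info = {"cupy_version": cupy.__version__}
    try:
        # The CUDA runtime reports its version as 1000 * major + 10 * minor,
        # for example 11010 for CUDA 11.1.
        info["cuda_runtime_version"] = cupy.cuda.runtime.runtimeGetVersion()
        info["device_count"] = cupy.cuda.runtime.getDeviceCount()
    except Exception as exc:
        # Typically a missing driver or no visible GPU.
        info["error"] = repr(exc)
    return info


info = cupy_install_info()
print(info if info is not None else "CuPy is not installed")
```

If the reported CUDA runtime version does not match the wheel you installed, reinstalling the package that matches your CUDA version is the likely fix.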

Easy to write a custom kernel

If you need your code to run faster, you can write a custom CUDA kernel with only a small snippet of C++. CuPy automatically wraps and compiles it into a CUDA binary. Compiled binaries are cached and reused in subsequent runs. Please read the User-Defined Kernels tutorial.
You can also use raw CUDA kernels via Raw modules.

>>> x = cp.arange(6, dtype='f').reshape(2, 3)
>>> y = cp.arange(3, dtype='f')
>>> kernel = cp.ElementwiseKernel(
...     'float32 x, float32 y', 'float32 z',
...     '''if (x - 2 > y) {
...       z = x * y;
...     } else {
...       z = x + y;
...     }''',
...     'my_kernel')
>>> kernel(x, y)
array([[ 0.,  2.,  4.],
       [ 0.,  4., 10.]], dtype=float32)
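For the raw-kernel route, a sketch using `cupy.RawKernel` might look like the following. The SAXPY kernel, the `saxpy` wrapper and the block size of 128 are illustrative choices, not part of CuPy. Scalars are passed as NumPy scalars so that their C types match the kernel signature, and the NumPy fallback exists only so the sketch still runs on a machine without CuPy or a GPU.

```python
import numpy as np

try:
    import cupy as cp
    cp.zeros(1)  # touch the device so that a missing GPU fails here
except Exception:
    cp = None  # assumption for this sketch: fall back to NumPy

# Full CUDA C source; extern "C" keeps the kernel name unmangled.
SAXPY_SOURCE = r'''
extern "C" __global__
void saxpy(const float a, const float* x, const float* y,
           float* out, const int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) {  // guard threads past the end of the array
        out[i] = a * x[i] + y[i];
    }
}
'''


def saxpy(a, x_host, y_host):
    """Compute a * x + y for float32 host arrays and return a host array."""
    if cp is None:
        return np.float32(a) * x_host + y_host  # NumPy fallback

    kernel = cp.RawKernel(SAXPY_SOURCE, 'saxpy')
    x = cp.asarray(x_host)
    y = cp.asarray(y_host)
    out = cp.empty_like(x)
    n = x.size
    threads = 128
    blocks = (n + threads - 1) // threads  # ceiling division
    # Arguments: grid size, block size, kernel arguments.
    kernel((blocks,), (threads,), (np.float32(a), x, y, out, np.int32(n)))
    return cp.asnumpy(out)


x_host = np.arange(1000, dtype=np.float32)
y_host = np.ones(1000, dtype=np.float32)
result = saxpy(2.0, x_host, y_host)
print(result[:4])
```

Unlike `ElementwiseKernel`, a raw kernel leaves indexing, bounds checks and launch dimensions to you, which is the price of full control over the CUDA code.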

Watch videos.

PyBay 2019, San Francisco
SciPy Japan Conference 2020, Virtual

Companies supporting CuPy.