====================
ISPC Examples README
====================

This directory contains a number of sample ispc programs ported to GEN. Before building them,
install the appropriate ispc compiler binary and runtime, and add the ISPC binary to your PATH.
Then do the following:

    mkdir build
    cd build
    cmake ../

Some of the benchmarks run the ispc implementation on CPU/GEN and then a regular serial C++
implementation, printing out the execution time of each.

Simple
======

This is the most basic example. It executes a simple kernel on the target device
(which can be a GEN GPU or a CPU) and demonstrates the basic concepts
of the ISPC Runtime API (such as device, module, kernel, and memory view).
It uses the C++ API of ispcrt.

If no command line arguments are provided, the example chooses the device
to execute on automatically. A specific device can be forced with
command line options:

    simple [ --cpu | --gpu ]

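The device selection described for the Simple example can be sketched as follows.
This is a minimal illustration, not the code of the example itself: the DeviceChoice
enum and the parseDeviceChoice function are hypothetical names introduced here, and
the rule that the last flag wins is an assumption. In the real example the result
would presumably be mapped to an ispcrt device type.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for a runtime device-type constant.
enum class DeviceChoice { Auto, Cpu, Gpu };

// Scan the arguments for --cpu or --gpu. With neither flag the choice stays
// Auto. If both flags appear, the last one wins (an assumption made for this
// sketch, not documented behaviour of the example).
inline DeviceChoice parseDeviceChoice(int argc, const char *const *argv) {
    assert(argc >= 0);
    DeviceChoice choice = DeviceChoice::Auto;
    for (int i = 1; i < argc; ++i) { // argv[0] is the program name
        if (std::strcmp(argv[i], "--cpu") == 0) {
            choice = DeviceChoice::Cpu;
        } else if (std::strcmp(argv[i], "--gpu") == 0) {
            choice = DeviceChoice::Gpu;
        }
    }
    return choice;
}
```

Calling parseDeviceChoice(argc, argv) at the top of main gives a single value that
the rest of the program can switch on when it creates the device.
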
Simple-USM
==========

This example corresponds to the Simple example, but uses shared memory
mechanisms. The shared memory functionality in Level Zero allows
allocating memory that is shared between the CPU and the GPU
and forms Unified Shared Memory (pointers valid on the CPU are also
valid on the GPU). There is no need to explicitly copy data between
the host and the device; Level Zero handles this.

The ISPC Runtime exposes USM through the Array type
and provides an allocator that can be used in standard C++ containers, such
as std::vector.

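To show what "an allocator that can be used in standard C++ containers" means in
practice, here is a minimal sketch of the allocator shape that std::vector expects.
It is not the ispcrt allocator: SharedAllocatorSketch and makeInput are hypothetical
names, and std::malloc is only a placeholder for the point where a USM-backed
allocator would request shared memory from the runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <limits>
#include <new>
#include <vector>

// Minimal allocator usable with standard containers. std::malloc/std::free
// are placeholders for shared-memory allocation and release.
template <typename T> struct SharedAllocatorSketch {
    using value_type = T;

    SharedAllocatorSketch() = default;
    template <typename U>
    SharedAllocatorSketch(const SharedAllocatorSketch<U> &) noexcept {}

    T *allocate(std::size_t n) {
        if (n > std::numeric_limits<std::size_t>::max() / sizeof(T))
            throw std::bad_alloc(); // n * sizeof(T) would overflow
        void *p = std::malloc(n * sizeof(T));
        if (p == nullptr && n != 0)
            throw std::bad_alloc();
        return static_cast<T *>(p);
    }

    void deallocate(T *p, std::size_t) noexcept { std::free(p); }
};

// The allocator holds no state, so any two instances are interchangeable.
template <typename T, typename U>
bool operator==(const SharedAllocatorSketch<T> &, const SharedAllocatorSketch<U> &) noexcept {
    return true;
}
template <typename T, typename U>
bool operator!=(const SharedAllocatorSketch<T> &, const SharedAllocatorSketch<U> &) noexcept {
    return false;
}

// Build a vector whose storage comes from the allocator above and fill it
// with 0, 1, 2, ...
inline std::vector<float, SharedAllocatorSketch<float>> makeInput(std::size_t count) {
    std::vector<float, SharedAllocatorSketch<float>> v(count);
    assert(v.size() == count);
    for (std::size_t i = 0; i < count; ++i)
        v[i] = static_cast<float>(i);
    return v;
}
```

With a real USM allocator in place of the placeholder, v.data() would be a pointer
that both the host code and the kernel can dereference, which is why no explicit
copy is needed.
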
AOBench
=======

This is an ISPC implementation of the "AO bench" benchmark
(http://syoyo.wordpress.com/2009/01/26/ao-bench-is-evolving/).
The command line arguments are:

    ao (num iterations) (x resolution) (y resolution)

This example also demonstrates usage of the C interface of ispcrt, so you can see how to
execute the same ISPC kernel on CPU and GPU in a seamless way.

It executes the program for the given number of iterations, rendering an
(xres x yres) image each time and measuring the computation time of the
serial and ispc implementations on CPU and GEN.

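The "run for a number of iterations and measure the computation time" pattern can be
sketched as below. This is an illustration under assumptions: minTimeMs is a
hypothetical helper, and reporting the fastest run (rather than, say, the average)
is a choice made for this sketch, not something this README states about the
benchmarks.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <limits>

// Run `work` `iterations` times and return the fastest wall-clock time of a
// single run, in milliseconds.
inline double minTimeMs(int iterations, const std::function<void()> &work) {
    assert(iterations > 0);
    double best = std::numeric_limits<double>::infinity();
    for (int i = 0; i < iterations; ++i) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(stop - start).count();
        if (ms < best)
            best = ms;
    }
    return best;
}
```

The same helper can wrap the serial renderer and the ispc launch in turn, so that
both timings are taken in the same way.
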
Mandelbrot
==========

Mandelbrot set generation. This example is extensively documented at the
http://ispc.github.com/example.html page. The command line arguments are:

    mandelbrot [--scale=<factor>] [tasks iterations] [serial iterations]

This example also demonstrates usage of the C++ interface of ispcrt, so you can see how to
execute the same ISPC kernel on CPU and GPU in a seamless way.

It executes the program for the given number of iterations, rendering an
image of fixed size each time and measuring the computation time of the
serial and ispc implementations on CPU and GEN.
You can change the scale of the image with the --scale option.

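For orientation, a serial C++ Mandelbrot reference of the kind the ispc kernel is
timed against might look like the sketch below. It is the textbook escape-time
iteration, not a copy of the example's source: the function names, parameter order,
and row-major output layout are assumptions made here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of iterations before the point c = (cRe, cIm) escapes the circle of
// radius 2, capped at maxIters.
inline int mandel(float cRe, float cIm, int maxIters) {
    float zRe = cRe, zIm = cIm;
    int i = 0;
    for (; i < maxIters; ++i) {
        if (zRe * zRe + zIm * zIm > 4.0f)
            break;
        const float newRe = zRe * zRe - zIm * zIm;
        const float newIm = 2.0f * zRe * zIm;
        zRe = cRe + newRe;
        zIm = cIm + newIm;
    }
    return i;
}

// Evaluate mandel() over a width x height grid covering [x0, x1) x [y0, y1)
// and return the iteration counts in row-major order.
inline std::vector<int> mandelbrotSerial(float x0, float y0, float x1, float y1,
                                         int width, int height, int maxIters) {
    assert(width > 0 && height > 0 && maxIters > 0);
    const float dx = (x1 - x0) / width;
    const float dy = (y1 - y0) / height;
    std::vector<int> out(static_cast<std::size_t>(width) * height);
    for (int j = 0; j < height; ++j) {
        for (int i = 0; i < width; ++i) {
            out[static_cast<std::size_t>(j) * width + i] =
                mandel(x0 + i * dx, y0 + j * dy, maxIters);
        }
    }
    return out;
}
```

Each pixel is independent of the others, which is what makes the loop body a
natural fit for an ispc kernel.
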
Noise
=====

This example has an implementation of Ken Perlin's procedural "noise"
function, as described in his 2002 "Improving Noise" SIGGRAPH paper. The command
line arguments are:

    noise [niterations] [group threads width] [group threads height]

This example also demonstrates usage of the C++ interface of ispcrt, so you can see how to
execute the same ISPC kernel on CPU and GPU in a seamless way.

It executes the program for the given number of iterations in the given
thread space, rendering an image of fixed size each time and measuring the
computation time of the serial and ispc implementations on CPU and GEN.

SGEMM
=====

This program uses ISPC to implement a naive version of matrix multiply. It also contains
a CM implementation, so if you have the CM compiler installed you can compare ISPC and CM
performance.

The command line arguments (all optional) are:

    sgemm [num iterations] [group threads width] [group threads height]

This example demonstrates usage of pure Level Zero.

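As a point of reference, a naive matrix multiply can be sketched in serial C++ as
follows. This is the standard triple loop, not the example's ISPC or CM source; the
function name sgemmNaive and the row-major layout are assumptions made for this
illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// C = A * B for row-major matrices: A is m x k, B is k x n, C is m x n.
inline std::vector<float> sgemmNaive(const std::vector<float> &a,
                                     const std::vector<float> &b,
                                     std::size_t m, std::size_t n, std::size_t k) {
    assert(a.size() == m * k && b.size() == k * n);
    std::vector<float> c(m * n, 0.0f);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (std::size_t p = 0; p < k; ++p)
                sum += a[i * k + p] * b[p * n + j]; // row i of A times column j of B
            c[i * n + j] = sum;
        }
    }
    return c;
}
```

A version like this is also convenient as a correctness check for an accelerated
implementation on small inputs.
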
Simple-DPCPP
============

This simple example demonstrates a basic scenario of interoperability between ISPC
and the oneAPI DPC++ Compiler. It runs an ISPC kernel using the ISPC Runtime and then
creates a SYCL context using native Level Zero handles obtained from ISPCRT.
Then it runs a corresponding SYCL kernel in that SYCL context. The results are compared
to confirm that they are identical.

It requires the oneAPI DPC++ Compiler.

To enable this example, configure the build of the ISPC examples using the following
command line:

    cmake -DCMAKE_C_COMPILER=<dpcpp_path>/bin/clang -DCMAKE_CXX_COMPILER=<dpcpp_path>/bin/clang++ \
          -DISPC_INCLUDE_DPCPP_EXAMPLES=ON <examples source dir>

Running this example may require setting the LD_LIBRARY_PATH environment variable to include
the oneAPI DPC++ Compiler libraries.

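The final comparison step, checking that the ISPC and SYCL results are identical,
can be sketched as below. The function name firstMismatch and the use of float
buffers are assumptions made for this illustration; the example's own check may
be written differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Return the index of the first element where the two result buffers differ.
// If they are identical, the returned value equals the buffer size.
inline std::size_t firstMismatch(const std::vector<float> &ispcOut,
                                 const std::vector<float> &syclOut) {
    assert(ispcOut.size() == syclOut.size());
    const auto diff = std::mismatch(ispcOut.begin(), ispcOut.end(), syclOut.begin());
    return static_cast<std::size_t>(diff.first - ispcOut.begin());
}
```

Reporting the index of the first difference, instead of a plain yes/no, makes it
easier to see where the two kernels start to disagree.
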
Simple-DPCPP-L0
===============

This simple example demonstrates a basic scenario of interoperability between ISPC
and the oneAPI DPC++ Compiler. It runs an ISPC kernel in a Level Zero context and then
a corresponding SYCL kernel in a SYCL context created from the same Level Zero context.
Then the results are compared to check that they are identical.
The key difference between this and the previous example is that this one uses
the native Level Zero API, while the previous one uses ISPCRT.

It requires the oneAPI DPC++ Compiler.

To enable this example, configure the build of the ISPC examples using the following
command line:

    cmake -DCMAKE_C_COMPILER=<dpcpp_path>/bin/clang -DCMAKE_CXX_COMPILER=<dpcpp_path>/bin/clang++ \
          -DISPC_INCLUDE_DPCPP_EXAMPLES=ON <examples source dir>

Running this example may require setting the LD_LIBRARY_PATH environment variable to include
the oneAPI DPC++ Compiler libraries.

Pipeline-DPCPP
==============

This example demonstrates how to create a pipeline of ISPC and oneAPI DPC++ kernels
that cooperate on a single problem represented by a memory region. The memory region
is shared between the kernels, and it is also shared between the CPU and the GPU.
The Level Zero runtime takes care of the necessary data movements in an efficient way,
so the user does not need to manage copying data to or from the GPU.

This example requires the oneAPI DPC++ Compiler.

To enable this example, configure the build of the ISPC examples using the following
command line:

    cmake -DCMAKE_C_COMPILER=<dpcpp_path>/bin/clang -DCMAKE_CXX_COMPILER=<dpcpp_path>/bin/clang++ \
          -DISPC_INCLUDE_DPCPP_EXAMPLES=ON <examples source dir>

Running this example may require setting the LD_LIBRARY_PATH environment variable to include
the oneAPI DPC++ Compiler libraries.