=== v1.21.0 === (18 August 2023)

An ISPC release with support for function template specializations; changed
rules for signed integer overflow, which now match C/C++ behavior and enable
more aggressive optimizations; an enhanced ISPC Runtime; multiple stability and
performance fixes; and more. The release is based on patched LLVM 15.0.7.

Language changes:

- Added support for function template specializations with explicit template
  arguments. For more details, please refer to the Function Templates section
  of the documentation.

- Changed behavior on signed integer overflow. `ispc` now treats signed integer
  overflow as undefined behavior, as C and C++ do. This change may cause
  compatibility issues. The behavior is controlled by the
  `--[no-]wrap-signed-int` compiler switch: `--wrap-signed-int` preserves the
  default behavior of versions before 1.21.0, in which signed integers have
  well-defined wraparound on overflow, though it may limit some compiler
  optimizations.
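
To illustrate the difference, here is a minimal C sketch (plain C, not ISPC;
the helper name `wrap_add_i32` is hypothetical) of the wraparound semantics
that `--wrap-signed-int` keeps. The addition is done in unsigned arithmetic,
where overflow is defined to wrap modulo 2^32, and the bits are then
reinterpreted as a two's complement `int32_t`. Without such a guarantee, a
compiler may, for example, fold `x + 1 > x` to `true`, because it is allowed to
assume that the overflow never happens.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: signed 32-bit addition with defined wraparound.
 * Unsigned arithmetic wraps modulo 2^32 by definition in C, and int32_t is
 * guaranteed to be two's complement without padding, so copying the bits
 * back yields the wrapped signed result (e.g. INT32_MAX + 1 == INT32_MIN). */
int32_t wrap_add_i32(int32_t a, int32_t b) {
    uint32_t bits = (uint32_t)a + (uint32_t)b;
    int32_t result;
    memcpy(&result, &bits, sizeof result);
    return result;
}
```

Writing `INT32_MAX + 1` directly with `int32_t` operands would be undefined
behavior in C, which is exactly the situation the new default now also treats
as undefined in ISPC.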

New hardware support:

Added support for Intel Meteor Lake Xe-LPG graphics:

- Added two new ISPC targets: `xelpg-x16` and `xelpg-x8`
- Added two new device names: `mtl-m` and `mtl-p`

Infrastructure changes:

- ISPC now uses LLVM's new pass manager. The optimization pipeline was modified
  by introducing an early LoopFullUnrollPass, which in many cases makes loops
  unrolled by ISPC match manually unrolled loops.
- Introduced the ISPC superbuild, which facilitates building ISPC with Xe
  dependencies (LLVM, L0, vc-intrinsics, SPIRV-Translator). It can generate an
  archive with dependencies or consume a pre-built archive to build ISPC only.
  It also enables generating LTO or LTO+PGO enabled builds of LLVM and ISPC.
- Added support for building ISPC with LLVM 16.

New compiler switches:

- The `--mcmodel` switch, which accepts the `small` and `large` values, with
  semantics similar to gcc/clang. The `large` model enables programs larger
  than 2 GB.
- The `--opt=disable-gathers` and `--opt=disable-scatters` options, which
  disable generation of gather and scatter instructions on platforms that
  support them (for performance experiments).
- The `--wrap-signed-int` and `--no-wrap-signed-int` switches, which
  respectively preserve or do not preserve wraparound behavior on signed
  integer overflow.

ISPC Runtime improvements:

- Added `ispcrtSetTaskingCallbacks` to the ISPCRT API, allowing the default
  implementations of `ISPCLaunch`, `ISPCAlloc`, and `ISPCSync` to be
  overridden.
- Removed the compile-time Level Zero dependency from ISPCRT, which is no longer
  necessary since ISPCRT was split into CPU and GPU
  parts.
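
For readers writing their own callbacks, the following C sketch shows a
minimal serial implementation of the three tasking entry points. It is an
illustration under assumptions, not ISPCRT code. The function signatures
follow those given in the ISPC User's Guide for `ISPCAlloc`, `ISPCLaunch`, and
`ISPCSync`, and the task function type `TaskFn` mirrors the one in the
reference `tasksys.cpp` shipped with the ISPC examples. How such functions are
registered through `ispcrtSetTaskingCallbacks` is not shown. `TaskGroup`,
`Alloc`, `get_group`, and `count_task` are hypothetical names.

```c
#include <stdint.h>
#include <stdlib.h>

/* Task entry point type, mirroring TaskFuncType in the reference tasksys.cpp
 * (assumption): one call per task of a 3D launch grid. */
typedef void (*TaskFn)(void *data, int threadIndex, int threadCount,
                       int taskIndex, int taskCount,
                       int taskIndex0, int taskIndex1, int taskIndex2,
                       int taskCount0, int taskCount1, int taskCount2);

/* Hypothetical bookkeeping: the handle owns every block handed out by
 * ISPCAlloc for one launch group until ISPCSync releases it. */
typedef struct Alloc { struct Alloc *next; void *ptr; } Alloc;
typedef struct { Alloc *allocs; } TaskGroup;

/* The handle is assumed to be NULL until the first call; create it lazily. */
static TaskGroup *get_group(void **handlePtr) {
    if (*handlePtr == NULL)
        *handlePtr = calloc(1, sizeof(TaskGroup));
    return (TaskGroup *)*handlePtr;
}

void *ISPCAlloc(void **handlePtr, int64_t size, int32_t alignment) {
    TaskGroup *g = get_group(handlePtr);
    size_t align = (size_t)alignment;
    /* aligned_alloc expects the size to be a multiple of the alignment */
    size_t rounded = ((size_t)size + align - 1) / align * align;
    Alloc *a = malloc(sizeof *a);
    if (g == NULL || a == NULL) {
        free(a);
        return NULL;
    }
    a->ptr = aligned_alloc(align, rounded);
    a->next = g->allocs;
    g->allocs = a;
    return a->ptr;
}

/* Serial "launch": runs every task immediately on the calling thread. */
void ISPCLaunch(void **handlePtr, void *f, void *data,
                int countx, int county, int countz) {
    get_group(handlePtr);
    TaskFn fn = (TaskFn)f;
    int total = countx * county * countz;
    for (int z = 0; z < countz; ++z)
        for (int y = 0; y < county; ++y)
            for (int x = 0; x < countx; ++x) {
                int index = x + countx * (y + county * z);
                fn(data, 0, 1, index, total, x, y, z, countx, county, countz);
            }
}

/* Every task already finished inside ISPCLaunch, so sync only frees memory. */
void ISPCSync(void *handle) {
    TaskGroup *g = (TaskGroup *)handle;
    if (g == NULL)
        return;
    for (Alloc *a = g->allocs; a != NULL;) {
        Alloc *next = a->next;
        free(a->ptr);
        free(a);
        a = next;
    }
    free(g);
}

/* Hypothetical example task: counts how many tasks ran. */
void count_task(void *data, int threadIndex, int threadCount,
                int taskIndex, int taskCount,
                int taskIndex0, int taskIndex1, int taskIndex2,
                int taskCount0, int taskCount1, int taskCount2) {
    *(int *)data += 1;
}
```

A serial implementation like this is mainly useful for debugging or for
environments without a threading library. A production callback would
dispatch tasks to a thread pool in `ISPCLaunch` and block in `ISPCSync` until
they complete.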

Recommended versions of Runtime Dependencies when targeting GPU:

Linux:

- Intel(R) Graphics Compute Runtime
  https://github.com/intel/compute-runtime/releases/tag/23.22.26516.18
- Level Zero Loader
  https://github.com/oneapi-src/level-zero/releases/tag/v1.13.5
- Threading Building Blocks (TBB)

Alternatively, you can use a validated graphics driver stack supporting
Intel® Arc™, available at https://dgpu-docs.intel.com/driver/installation.html

Windows:

- Intel(R) Graphics Windows(R) DCH Drivers 31.0.101.4644
  https://www.intel.com/content/www/us/en/download/726609/intel-arc-iris-xe-graphics-whql-windows.html
- Level Zero Loader
  https://github.com/oneapi-src/level-zero/releases/tag/v1.13.5
- OpenCL™ Offline Compiler (OCLOC)
  https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html
  (needed for AoT compilation on Windows only)
- Supported GPU platforms: Intel(R) Arc Graphics, 11th-13th Gen Intel(R) Core
  processor graphics

Component revisions used in the GPU-enabled build:

- https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/e82ecc2
- https://github.com/intel/vc-intrinsics/commit/910db48
- https://github.com/oneapi-src/level-zero/commit/e1f09b4 (v1.13.5)
- https://github.com/llvm/llvm-project/commit/8dfdcc7 (llvmorg-15.0.7) +
  patches from llvm_patches folder

=== v1.20.0 === (5 May 2023)

An ISPC release with compile time improvements, enhancements in the ISPC
Runtime, and a number of code generation fixes. The release is based on patched
LLVM 15.0.7.

ISPC distribution changes.

ISPC binaries got faster and smaller: they are approximately 1/3 smaller and a
few percent faster. The macOS distribution now includes x86_64, arm64, and
Universal Binaries. On Linux, a snap package with the latest ISPC is available.

ISPC Runtime.

- ispcrt was split under the hood into GPU and CPU parts, which are loaded
  dynamically. This means you don't need GPU dependencies when running CPU-only
  code using ispcrt.
- ispcrt got support for fences to enable asynchronous CPU/GPU computations.
- ispcrt no longer depends on the OpenMP runtime, but requires TBB.

New targets.

For better fine-tuning when targeting old platforms, sse4 targets were split
into sse4.1 and sse4.2 targets. All changes are backward compatible: sse4
targets are aliased to sse4.2, and multi-target compilation allows only one
sse4 target, so build systems are not confused.

Improvements for contributors.

We got a brand new GitHub Codespaces config, so you are welcome to start
hacking on ISPC in the browser. Give it a try!

Recommended versions of Runtime Dependencies when targeting GPU.

Linux:

- Intel(R) Graphics Compute Runtime
  https://github.com/intel/compute-runtime/releases/tag/23.09.25812.14
- Level Zero Loader
  https://github.com/oneapi-src/level-zero/releases/tag/v1.10.0
- Threading Building Blocks (TBB)

Alternatively, you can use a validated graphics driver stack supporting
Intel® Arc™, available at https://dgpu-docs.intel.com/releases/stable_602_20230323.html

Windows:

- Intel(R) Graphics Windows(R) DCH Drivers 31.0.101.4146
  https://www.intel.com/content/www/us/en/download/726609/772016/intel-arc-iris-xe-graphics-whql-windows.html
- Level Zero Loader
  https://github.com/oneapi-src/level-zero/releases/tag/v1.10.0
- OpenCL™ Offline Compiler (OCLOC)
  https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html
  (needed for AoT compilation on Windows only)
- Supported platforms: Intel(R) Arc Graphics, 11th-13th Gen Intel(R) Core
  processor graphics

Component revisions used in the GPU-enabled build:

- https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/855eb27
- https://github.com/intel/vc-intrinsics/commit/29fe787
- https://github.com/oneapi-src/level-zero/commit/0d56d8e (v1.10.0)
- https://github.com/llvm/llvm-project/commit/8dfdcc7 (llvmorg-15.0.7) +
  patches from llvm_patches folder

=== v1.19.0 === (28 February 2023)

An ISPC release with a long-awaited technical preview of function templates;
new hardware support for 4th generation Intel® Xeon® Scalable (codename
Sapphire Rapids) CPUs, Intel® Data Center GPU Max (codename Ponte Vecchio), and
updated support for Intel® Arc™ GPUs; improved performance and compile time; an
enhanced ISPC Runtime; a bunch of stability fixes; and more. The release is
based on patched LLVM 14.0.6.

Language changes.

Support for function templates was introduced in ISPC. It is currently a
technical preview, meaning that the language definition may change in future
versions. For more details, please refer to the Function Templates section of
the documentation.

ISPC also got several other language changes needed for ISPC/SYCL
interoperability (an experimental feature):

1. Support for the `__regcall` attribute.
2. A new language construct, `invoke_sycl`, which is used to call SYCL
   functions from ISPC. The function must be declared on the ISPC side with the
   `extern "SYCL" __regcall` qualifiers.
3. Support for `extern "C"` function definitions.

New hardware support.

1. Targets for 4th generation Intel® Xeon® Scalable (codename Sapphire Rapids)
   CPUs were introduced: `avx512spr-x4`, `avx512spr-x8`, `avx512spr-x16`,
   `avx512spr-x32`, `avx512spr-x64`. The key difference from other AVX512
   targets is native support for FP16.
2. New `xehpc-x16`/`xehpc-x32` targets were added for Intel® Data Center GPU
   Max (codename Ponte Vecchio). A new `pvc` device name was introduced.
3. New device names `acm-g10`, `acm-g11`, and `acm-g12` were added for Intel®
   Arc™ Graphics. The `dg2` device name has been removed.
4. Support for AArch64 targets was enabled on Windows.

ISPC Runtime.

1. A chunking allocator was introduced, which can be enabled with
   `ISPCRT_MEM_POOL`.
2. An API was added to link input modules: `ispcrtStaticLinkModules` (using
   linking at the vISA level under the hood) and `ispcrtDynamicLinkModules`
   (using binary linking under the hood).
3. Support for creating multiple devices within a single context was added,
   along with an API to get a function pointer from a module. It is also now
   possible to construct ISPC RT objects from native handles.
4. ISPC RT verbose mode was added, which can be enabled through `ISPCRT_VERBOSE`.
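
As a rough illustration of the chunking technique, the C sketch below carves
small allocations out of large pre-allocated chunks. Many small requests then
cost one underlying `malloc` per chunk instead of one per request. This is a
generic sketch, not the ISPCRT implementation: how ISPCRT sizes, reuses, or
frees its chunks is not described here. `ChunkPool`, `pool_alloc`,
`pool_release`, and `CHUNK_SIZE` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CHUNK_SIZE 4096 /* hypothetical chunk size in bytes */

/* One malloc'ed block; allocations are bump-allocated from its start. */
typedef struct Chunk {
    struct Chunk *next;
    size_t used;
    size_t capacity;
    unsigned char *data;
} Chunk;

typedef struct { Chunk *head; size_t chunks_allocated; } ChunkPool;

static Chunk *chunk_new(size_t capacity) {
    Chunk *c = malloc(sizeof *c);
    if (c == NULL) return NULL;
    c->data = malloc(capacity);
    if (c->data == NULL) { free(c); return NULL; }
    c->next = NULL;
    c->used = 0;
    c->capacity = capacity;
    return c;
}

/* Try to carve `size` bytes aligned to `align` (a power of two) from c. */
static void *chunk_try(Chunk *c, size_t size, size_t align) {
    uintptr_t base = (uintptr_t)c->data;
    uintptr_t start = (base + c->used + align - 1) & ~(uintptr_t)(align - 1);
    size_t offset = (size_t)(start - base);
    if (offset + size > c->capacity) return NULL;
    c->used = offset + size;
    return c->data + offset;
}

void *pool_alloc(ChunkPool *pool, size_t size, size_t align) {
    void *p = pool->head ? chunk_try(pool->head, size, align) : NULL;
    if (p != NULL) return p;
    /* Current chunk is full: open a new one, big enough for this request. */
    size_t capacity = size + align > CHUNK_SIZE ? size + align : CHUNK_SIZE;
    Chunk *c = chunk_new(capacity);
    if (c == NULL) return NULL;
    c->next = pool->head;
    pool->head = c;
    pool->chunks_allocated++;
    return chunk_try(c, size, align);
}

/* Frees every chunk at once; individual allocations are never freed. */
void pool_release(ChunkPool *pool) {
    Chunk *c = pool->head;
    while (c != NULL) {
        Chunk *next = c->next;
        free(c->data);
        free(c);
        c = next;
    }
    pool->head = NULL;
    pool->chunks_allocated = 0;
}
```

For simplicity, this sketch only allocates from the newest chunk, so space
left over in older chunks is wasted. Real pool allocators usually track free
space more carefully.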

Performance.

Performance on Xe targets improved significantly, thanks to updates in the
ISPC optimization pipeline and the use of the new spill-cost IGC finalizer
function, which dramatically reduces spill size.

Utilities.

1. ISPC `link` mode was introduced, which allows linking several LLVM bitcode
   or SPIR-V files and outputting the result as LLVM bitcode or SPIR-V. For
   example:

       ispc link test_a.bc test_b.bc --emit-spirv -o test.spv

2. CMake utilities were improved, and support was added for building an ISPC
   GPU target from multiple ISPC files, linking them with ISPC `link` mode. The
   ISPC part of an application's CMakeLists.txt could look like this:

       add_ispc_library(my_ispc_lib filea.ispc fileb.ispc)
       ispc_target_include_directories(my_ispc_lib <some directory path>)
       ispc_target_compile_definitions(my_ispc_lib -DMY_DEFINE=1)

       add_ispc_library(my_ispc_kernel filec.ispc)
       ispc_target_link_libraries(my_ispc_kernel my_ispc_lib)

Runtime Dependencies when targeting GPU.

Linux:

- Intel(R) Graphics Compute Runtime
  https://github.com/intel/compute-runtime/releases/tag/22.49.25018.24
- Level Zero Loader
  https://github.com/oneapi-src/level-zero/releases/tag/v1.9.4
- OpenMP Runtime. Consult your Linux distribution documentation for
  instructions on installing the OpenMP runtime. No specific version is
  required.

Windows:

- Intel(R) Graphics Windows(R) DCH Drivers 30.0.101.4091
  https://www.intel.com/content/www/us/en/download/726609/intel-arc-iris-xe-graphics-whql-windows.html
- Level Zero Loader
  https://github.com/oneapi-src/level-zero/releases/tag/v1.9.4
- OpenCL™ Offline Compiler (OCLOC)
  https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html
  (needed for AoT compilation on Windows only)

Component revisions used in the GPU-enabled build:

- https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/c469fa8
- https://github.com/intel/vc-intrinsics/commit/3ac855c
- https://github.com/oneapi-src/level-zero/commit/4ed13f3 (v1.9.4)
- https://github.com/llvm/llvm-project/commit/f28c006 (llvmorg-14.0.6) +
  patches from llvm_patches folder

=== v1.18.0 === (5 May 2022)

An ISPC release with a bunch of stability and performance fixes, improvements
for the ISPC Runtime, and complete stdlib support for the `float16` type. This
release is based on patched LLVM 13.0.1.

The `-E` switch was introduced to run the preprocessor only. An old bug that
prevented the compiler from failing on preprocessor errors was fixed, and the
compiler now properly fails in this case. As some users found the old behavior
convenient, the `--ignore-preprocessor-errors` switch was introduced to keep
it.

Target naming was changed for targets with native masking support to drop the
"base type" from the naming scheme; the old names are still accepted for
compatibility. This affected AVX512 target names; the new names are
`avx512skx-x4`, `avx512skx-x8`, `avx512skx-x16`, `avx512skx-x32`,
`avx512skx-x64`, and `avx512knl-x16`.

For debugging, and for those who are interested in understanding compiler
internals, the `--ast-dump` switch was introduced. The produced AST (Abstract
Syntax Tree) dump intentionally looks like the clang AST dump, for
convenience.

The standard library gained full support for the `float16` type. Note that it
is fully supported only on targets with native hardware support. On other
targets, emulation is not guaranteed yet, but may work in some cases.

Among other fixes, it is worth mentioning the following:
- fixed bug #1308 affecting multi-target compilation
- a bunch of fixes to make it easier to build ISPC on FreeBSD, even though
  FreeBSD is not officially supported

Improvements for the ISPC Runtime in this release:
- flexible task system selection during build
- support for building ISPCRT separately from ISPC
- support for building ISPCRT for CPU only
- version check in CMake
- new API to get the type of allocated memory (`ispcrtGetMemoryViewAllocType`
  and `ispcrtGetMemoryAllocType`)
- new API for memory copy on device (`ispcrtCopyMemoryView`)
- support for device-only memory without corresponding application memory

Performance on Xe targets was significantly improved in this release due to
optimizations in ISPC and the Vector Compute backend.

Runtime Dependencies when targeting GPU:

Linux:

- Intel(R) Graphics Compute Runtime
  https://github.com/intel/compute-runtime/releases/tag/22.17.23034
- Level Zero Loader
  https://github.com/oneapi-src/level-zero/releases/tag/v1.7.15
- OpenMP Runtime. Consult your Linux distribution documentation for
  instructions on installing the OpenMP runtime. No specific version is
  required.

Windows:

- Intel(R) Graphics Windows(R) DCH Drivers 30.0.101.1660
  https://www.intel.com/content/www/us/en/download/19344/intel-graphics-windows-dch-drivers.html
- Level Zero Loader
  https://github.com/oneapi-src/level-zero/releases/tag/v1.7.15
- OpenCL™ Offline Compiler (OCLOC)
  https://registrationcenter-download.intel.com/akdlm/irc_nas/18653/ocloc_win_101.1660.zip
  (needed for AoT compilation on Windows only)

Component revisions used in the GPU-enabled build:

- https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/d7a0304
- https://github.com/intel/vc-intrinsics/commit/1e2562d
- https://github.com/oneapi-src/level-zero/commit/bb7fff0 (v1.7.15)
- https://github.com/llvm/llvm-project/commit/75e33f7 (llvmorg-13.0.1) +
  patches from llvm_patches folder

=== v1.17.0 === (14 January 2022)

An ISPC release with a massive update of Xe targets, including support for the
forthcoming XeHPG GPUs, improvements for the `double` type on AVX512 targets,
and multiple standard library improvements. Windows and Linux binaries in this
release support both CPU and GPU targets, while the macOS binary supports only
CPU. This release is based on patched LLVM 12.0.1.

Improvements for CPU targets:

- Performance improvements for the `double` type on AVX512 targets: better use
  of gather/scatter instructions, and 2-5x improvements for the `rsqrt()` and
  `rcp()` standard library functions.
- New `avx512skx-i32x4` target.
- `aos_to_soa` and `soa_to_aos` performance improvements for `-x8` and `-x16`
  targets on CPU.
- `--math-lib=svml` mode was fixed and extended; it requires the Intel® C++
  Compiler (`icc` or `icx`) to link the binary.
- `zen1`, `zen2`, and `zen3` CPU definitions were added.
- Added experimental support for the PS5 platform.
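
For reference, the conversion performed by `aos_to_soa` and `soa_to_aos` is a
transposition between an array of structures (`x0 y0 z0 x1 y1 z1 ...`) and
per-field arrays (`x0 x1 ...`, `y0 y1 ...`, `z0 z1 ...`). The plain C sketch
below is a scalar model for three-component data and a gang of `width`
program instances. The `_ref` function names are hypothetical, and the ISPC
stdlib functions produce varying values rather than writing output arrays.

```c
#include <stddef.h>

/* Hypothetical scalar model of aos_to_soa3: `aos` holds `width` interleaved
 * (x, y, z) triples; lane i of v0, v1, v2 receives component 0, 1, 2 of
 * triple i. */
void aos_to_soa3_ref(const float *aos, size_t width,
                     float *v0, float *v1, float *v2) {
    for (size_t i = 0; i < width; ++i) {
        v0[i] = aos[3 * i + 0];
        v1[i] = aos[3 * i + 1];
        v2[i] = aos[3 * i + 2];
    }
}

/* Inverse model of soa_to_aos3: interleave the per-lane arrays into triples. */
void soa_to_aos3_ref(const float *v0, const float *v1, const float *v2,
                     size_t width, float *aos) {
    for (size_t i = 0; i < width; ++i) {
        aos[3 * i + 0] = v0[i];
        aos[3 * i + 1] = v1[i];
        aos[3 * i + 2] = v2[i];
    }
}
```

The strided accesses in these loops are what vectorized implementations
replace with wide loads and shuffles, which is where `-x8` and `-x16` targets
gain.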

The ISPC language got experimental support for the IEEE 754 half-precision
data type, `float16`. Not all library functions support this type yet. The key
focus in this release was on hardware with native support for this type.
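
As a reminder of what the format holds, an IEEE 754 half-precision value packs
1 sign bit, 5 exponent bits (bias 15), and 10 fraction bits into 16 bits,
which gives a largest finite value of 65504. The C sketch below decodes such a
bit pattern into a `float` and covers normal, subnormal, infinite, and NaN
encodings. The name `half_bits_to_float` is hypothetical, the sketch assumes
that `float` is IEEE 754 binary32, and it does not describe how ISPC itself
converts `float16` values.

```c
#include <stdint.h>
#include <string.h>

/* Decode an IEEE 754 binary16 bit pattern into a binary32 float. */
float half_bits_to_float(uint16_t h) {
    uint32_t sign = (uint32_t)(h >> 15) << 31;
    uint32_t exponent = (h >> 10) & 0x1Fu; /* 5-bit biased exponent */
    uint32_t fraction = h & 0x3FFu;        /* 10-bit fraction */
    uint32_t bits;
    if (exponent == 0x1Fu) {
        /* Infinity (fraction == 0) or NaN: all-ones float exponent */
        bits = sign | 0x7F800000u | (fraction << 13);
    } else if (exponent != 0) {
        /* Normal: rebias exponent from 15 to 127, widen fraction to 23 bits */
        bits = sign | ((exponent + 112u) << 23) | (fraction << 13);
    } else {
        /* Zero or subnormal: value is fraction * 2^-24, exact in a float */
        float magnitude = (float)fraction / 16777216.0f;
        return sign ? -magnitude : magnitude;
    }
    float result;
    memcpy(&result, &bits, sizeof result);
    return result;
}
```

For example, `0x3C00` decodes to `1.0f` and `0x7BFF` decodes to `65504.0f`,
the largest finite `float16` value.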

This update includes breaking changes in compiler switches for Xe targets:
- Graphics targets `genx-x8` and `genx-x16` were renamed to `gen9-x8` and
  `gen9-x16`.
- Compiler architectures for graphics targets were renamed from `genx32` and
  `genx64` to `xe32` and `xe64`.
- Xe device names were changed from uppercase to lowercase (so instead of
  SKL/TGLLP it is now skl/tgllp).
- A new `--device` switch (an alias for the existing `--cpu` switch) was
  introduced. The recommended way to specify the required platform for CPU and
  GPU is now `--device=<platform>`.

This release also changes the definition of `export` and `task` functions on
GPU. A GPU kernel is now an ISPC `task` function only; `export` functions can
no longer be invoked from the host (i.e., called from the ISPC Runtime or L0
Runtime). Instead, `export` functions can be linked with and called from other
GPU modules. Currently, ISPC experimentally supports such interoperability with
the Explicit SIMD SYCL* Extension (ESIMD).

New Xe targets were added:
- `xelp-x8` and `xelp-x16`. XeLP refers to the XeLP generation of hardware
  (TigerLake chips and the like).
- `xehpg-x8` and `xehpg-x16`. XeHPG is the architecture name for the
  forthcoming Intel® Arc™ GPUs, codename Alchemist.

The GPU part has a bunch of stability, performance, and usability improvements,
including but not limited to support for `alloca()` with a constant parameter,
`assume()` support, and improved performance of double math functions and
integer division.

`ISPC Runtime` performance was improved several-fold by fixing the setting of
the local group size for kernels, using events as a synchronization mechanism,
and utilizing HW compute and copy engines. There is also a new structure,
`ISPCRTModuleOptions`, to pass additional options to the VC backend if needed.
Currently, `ISPCRTModuleOptions` allows setting the stack size for the VC
backend, which is used to compile SPIR-V.

Runtime Dependencies when targeting GPU:

Linux:

- Intel(R) Graphics Compute Runtime
  https://github.com/intel/compute-runtime/releases/tag/22.02.22151
- Level Zero Loader
  https://github.com/oneapi-src/level-zero/releases/tag/v1.7.4
- OpenMP Runtime. Consult your Linux distribution documentation for
  instructions on installing the OpenMP runtime. No specific version is
  required.

Windows:

- Intel(R) Graphics Windows(R) DCH Drivers 30.0.101.1191
  https://www.intel.com/content/www/us/en/download/19344/intel-graphics-windows-dch-drivers.html
- Level Zero Loader
  https://github.com/oneapi-src/level-zero/releases/tag/v1.7.4
- OpenCL™ Offline Compiler (OCLOC)
  https://software.intel.com/sites/downloads/ocloc/ocloc_win_101.1191.zip
  (needed for AoT compilation on Windows only)

Component revisions used in the GPU-enabled build:

KhronosGroup/SPIRV-LLVM-Translator@ed25f1b
intel/vc-intrinsics@3a5f4b4
oneapi-src/level-zero@2824c1f (v1.7.4)
llvm/llvm-project@fed4134 (llvmorg-12.0.1) + patches from llvm_patches folder

=== v1.16.1 === (15 July 2021)

A minor ISPC update with a bug fix for [issue
#2111](https://github.com/ispc/ispc/issues/2111). It is based on a patched
version of LLVM 12.0.1.

The bug affects x86 targets only and shows up as incorrect code generation for
a sequence of `shuffle()` and `reduce_add()` stdlib function calls.

If you are building `ispc` from source, note that the fix is implemented as a
patch for the LLVM backend, and LLVM must be built with this patch applied for
the fix to take effect. A stock build of LLVM 12.0.1 does not contain this bug
fix.

=== v1.16.0 === (11 June 2021)

An ISPC release with language extensions for performance fine-tuning, CPU
definitions for `AlderLake` and `SapphireRapids` targets, support for macOS ARM
targets, and a massive update of Intel GPU support. Windows and Linux binaries
in this release support both CPU and GPU targets, while the macOS binary
supports only CPU. This release is based on patched LLVM 12.0.0.

The language changes include the following:
- The ability to directly call LLVM intrinsics from ISPC source. This should be
  handy for performance fine-tuning and for reaching hardware instructions not
  yet covered by the standard library. Note that it is an experimental feature
  and is enabled only with the `--enable-llvm-intrinsics` switch. Please refer
  to the `LLVM Intrinsic Functions` section of the user manual for more
  details.
- The `assume()` optimization hint, which can be used to communicate
  assumptions to the optimizer. Unlike `assert()` calls, it does not lead to a
  runtime check. It is intended for optimizations like removing null pointer
  checks, removing loop remainders, communicating alignment information to the
  optimizer, etc. Please refer to the `Compiler Optimization Hints` section of
  the user manual for more details.
- Support for stack memory allocations through `alloca()` calls.
- New `trunc()` standard library functions.

Changes for CPU targets:
- CPU definitions for `AlderLake` and `SapphireRapids` were added: `alderlake`
  and `sapphirerapids` respectively.
- CPU definitions for Apple ARM chips were added: `apple-a7`, `apple-a10`,
  `apple-a11`, `apple-a12`, `apple-a13`, `apple-a14`.
- Support for macOS ARM targets was added.

Using the GPU-enabled binaries, you can build ISPC programs and run them on
Intel(R) Core(tm) Processors with Gen9 graphics (formerly `Skylake`,
`Kaby Lake`, `Coffee Lake`) and Gen12 graphics (TigerLake mobile CPU) using the
`--target` options (`genx-x8` and `genx-x16`) and the `--cpu` option for
specifying a particular platform (e.g. `--cpu=TGLLP`).

The main GPU feature of this release is Windows support. There are also a
bunch of stability and performance improvements. Here are some of them:
- The ISPC Runtime got support for unified shared memory and multiple GPUs.
  There is also a new `TaskQueue::submit()` method, which starts execution
  without waiting for completion.
- Thread-private memory was mapped to SVM in the VC backend. This greatly
  improves the stability of the current release. It may affect performance on
  Gen9 graphics, but we do not expect any significant changes on Gen12.
- L0 binary generation was reworked to use libocloc (supported on Linux only).

More details about the current state of GPU support are available here:
https://ispc.github.io/ispc_for_xe.html

For build instructions, check our Docker recipe:
https://github.com/ispc/ispc/blob/main/docker/ubuntu/xpu_ispc_build/Dockerfile

GPU support is still in the Beta stage, so you may experience some issues, but
we strongly encourage you to try it out and give us feedback! You can reach us
through GitHub discussions and issues, or on Twitter (@ispc_updates).

Runtime Dependencies when targeting GPU:

Linux:
- Intel(R) Graphics Compute Runtime
  https://github.com/intel/compute-runtime/releases/tag/21.21.19914
- Level Zero Loader
  https://github.com/oneapi-src/level-zero/releases/tag/v1.2.3
- OpenMP Runtime. Consult your Linux distribution documentation for
  instructions on installing the OpenMP runtime. No specific version is
  required.

Windows:
- Intel(R) Graphics - BETA Windows(R) 10 DCH Drivers 30.0.100.9667
  https://downloadcenter.intel.com/download/30522/Intel-Graphics-BETA-Windows-10-DCH-Drivers
- Level Zero Loader
  https://github.com/oneapi-src/level-zero/releases/tag/v1.2.3

Component revisions used in the GPU-enabled build:

KhronosGroup/SPIRV-LLVM-Translator@0592c4f
intel/vc-intrinsics@2d0795c
oneapi-src/level-zero@0d30b1f (v1.2.3)
llvm/llvm-project@d28af7c (llvmorg-12.0.0) + patches from llvm_patches folder

=== v1.15.0 === (18 December 2020)

An ISPC release with several improvements for CPU and Beta support of Intel
graphics hardware architectures. The binaries in this release include CPU
versions for Windows, Linux, and macOS, and a GPU-enabled Linux binary, which
supports both CPU and GPU. CPU binaries are based on patched LLVM 11.0.0; the
GPU binary is based on patched LLVM 10.0.1.

CPU changes include:

- New loop unroll pragmas: the '#pragma unroll' and '#pragma nounroll'
  directives provide loop unrolling optimization hints to the compiler. These
  pragmas may be used immediately before a loop statement. Currently, this
  functionality is limited to uniform 'for' and 'do-while' loops.
- A more efficient implementation of the 'packed_[load|store]_active()' stdlib
  functions (up to 2.5x faster), which now also support 64-bit types.
- New CPUs: 'icelake-server', 'tigerlake', 'alderlake', 'sapphirerapids'.
- Several stability fixes related to SOA types, bool varying type
  initialization, broken alignment information, and type scoping.
- Compile time improvements.
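
To make the semantics of 'packed_store_active()' and 'packed_load_active()'
concrete, the C sketch below models one gang of `width` program instances with
an explicit execution mask. It assumes the behavior described in the ISPC
standard library documentation: active lanes write to (or read from)
consecutive array elements in lane order, and the function returns the number
of elements processed. This is the usual building block for stream compaction.
The `_ref` functions and the `mask` array are hypothetical scalar stand-ins,
not ISPC code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Scalar model of packed_store_active(): the values of active lanes
 * (mask[lane] true) are written to consecutive elements of out, in lane
 * order. Returns the number of values written. */
size_t packed_store_active_ref(int *out, const int *vals,
                               const bool *mask, size_t width) {
    size_t n = 0;
    for (size_t lane = 0; lane < width; ++lane)
        if (mask[lane])
            out[n++] = vals[lane];
    return n;
}

/* Scalar model of packed_load_active(): active lanes read consecutive
 * elements of in, in lane order; inactive lanes keep their old value.
 * Returns the number of values read. */
size_t packed_load_active_ref(const int *in, int *vals,
                              const bool *mask, size_t width) {
    size_t n = 0;
    for (size_t lane = 0; lane < width; ++lane)
        if (mask[lane])
            vals[lane] = in[n++];
    return n;
}
```

With the mask `{1, 0, 1, 1, 0, 0, 1, 0}` over the values `10..17`, the store
packs `10, 12, 13, 16` into the first four output elements and returns 4.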

ISPC support was added to CMake 3.19, so you can now use the standard CMake
approach to find ISPC on the system and use it in your build:
https://cmake.org/cmake/help/latest/release/3.19.html#languages

Using the GPU-enabled Linux binary, you can build ISPC programs and run them on
Intel(R) Core(tm) Processors with Gen9 graphics (formerly Skylake, Kaby Lake,
Coffee Lake) and Gen12 graphics (TigerLake mobile CPU) using the '--target'
options ('genx-x8' and 'genx-x16') and the '--cpu' option for specifying a
particular platform (e.g. '--cpu=TGLLP').

Stability and performance were significantly improved in this release. Here is
the list of new features:

- Initial support for ahead-of-time compilation to the oneAPI Level Zero binary
  format using the '--emit-zebin' switch. You can use this binary from the ISPC
  Runtime by setting the ISPCRT_USE_ZEBIN environment variable to 1. Please
  note that the SPIR-V format is still the recommended and default way.
- Initial implementation of function pointers.
- Global atomics support.
- Double math functions support.
- Memory functions support.
- Reworked masking approach. We disabled the genx hardware mask and use a
  software mask by default.
- Improved differentiation of address spaces.
- Initial debug support.
- TGLLP (TigerLake mobile CPU) support ('--cpu=TGLLP').

We also added examples to demonstrate interoperability with the oneAPI DPC++
Compiler.

More details about the current state of GPU support are available here:
https://ispc.github.io/ispc_for_xe.html

For build instructions, check our Docker recipe:
https://github.com/ispc/ispc/blob/main/docker/ubuntu/gen/Dockerfile

GPU support is in the Beta stage, so you may experience some issues, but we
strongly encourage you to try it out and give us feedback! You can reach us
through GitHub discussions and issues, the ISPC mailing list
(ispc-users@googlegroups.com), or on Twitter (@ispc_updates).

Runtime Dependencies:

- Intel(R) Graphics Compute Runtime
  https://github.com/intel/compute-runtime/releases/tag/20.50.18716
- Level Zero Loader
  https://github.com/oneapi-src/level-zero/releases/tag/v1.0.22
- OpenMP Runtime. Consult your Linux distribution documentation for
  instructions on installing the OpenMP runtime. No specific version is
  required.

Component revisions used in the GPU-enabled build:

KhronosGroup/SPIRV-LLVM-Translator@ab5e12a
intel/vc-intrinsics@2de2dd4
oneapi-src/level-zero@c6fa2cd (v1.0.22)
llvm/llvm-project@ef32c61 (llvmorg-10.0.1) + patches from llvm_patches folder

=== v1.14.1 === (28 August 2020)

A minor ISPC update with a fix for an AVX512 detection problem on macOS (for
more details, see issue #1854) and an update of the GPU version to use Level
Zero v1.0. CPU binaries are based on patched LLVM 10.0.1.

Runtime Dependencies for the GPU-enabled build:
- Intel(R) Graphics Compute Runtime
  https://github.com/intel/compute-runtime/releases/tag/20.33.17675
- Level Zero Loader
  https://github.com/oneapi-src/level-zero/releases/tag/v1.0
- OpenMP Runtime
  Consult your Linux distribution documentation for instructions on installing
  the OpenMP runtime.

Component revisions used in the GPU-enabled build:
KhronosGroup/SPIRV-LLVM-Translator@1a5c52f
intel/vc-intrinsics@f39ff1e
oneapi-src/level-zero@fcc7b7a (v1.0)
llvm/llvm-project@ef32c61 (llvmorg-10.0.1) + patches from llvm_patches folder

=== v1.14.0 === (30 July 2020)

An ISPC release with several improvements for CPU and initial support for
Intel graphics hardware architectures. As in previous releases, the binaries
include CPU versions for Windows, Linux, and macOS, plus a GPU-enabled Linux
binary, which supports both CPU and GPU. CPU binaries are based on patched
LLVM 10.0.1.

CPU changes include:
- new avx2-i8x32, avx2-i16x16, avx512skx-i8x64, avx512skx-i16x32 targets.
- "generic" targets were removed.
- several stability fixes, including bugs discovered by fuzzing ISPC with
  YARPGen.
- integer division performance improvements.
- support for the __vectorcall calling convention on Windows x64 (enabled by
  '--vectorcall').

Using the GPU-enabled Linux binary, you can build ISPC programs and run them on
Intel(R) Core(tm) Processors with Gen9 graphics (formerly Skylake, Kaby Lake,
Coffee Lake) using the new '--target' options: 'genx-x8' and 'genx-x16'. For
code generation, ISPC uses the Vector Compute backend, which is part of the
'Intel(R) Graphics Compute Runtime', through the SPIR-V interface. This release
also includes the ISPC Runtime, based on 'oneAPI Level Zero' for GPU and
'OpenMP Runtime' for CPU, which provides a unified abstraction for executing
ISPC code on CPU and GPU.

More details are available here: https://ispc.github.io/ispc_for_xe.html

For build instructions, check our Docker recipe:
https://github.com/ispc/ispc/blob/main/docker/ubuntu/gen/Dockerfile

The stability and performance of the GPU part of this release are not mature
yet, but we strongly encourage you to try it out and give us feedback! You can
reach us through GitHub issues, the ISPC mailing list
(ispc-users@googlegroups.com), or on Twitter (@ispc_updates).

Runtime Dependencies:
- Intel(R) Graphics Compute Runtime
  https://github.com/intel/compute-runtime/releases/tag/20.29.17408
- Level Zero Loader
  https://github.com/oneapi-src/level-zero/releases/tag/v0.91.21
- OpenMP Runtime
  Consult your Linux distribution documentation for instructions on installing
  the OpenMP runtime.

Component revisions used in this build:
KhronosGroup/SPIRV-LLVM-Translator@1e661b2
intel/vc-intrinsics.git@a0b66f2
oneapi-src/level-zero@v0.91.21
llvm/llvm-project@llvmorg-10.0.0

=== v1.13.0 === (23 April 2020)

An ISPC update, which graduates cross-compilation support to production and
brings multiple code generation improvements and bug fixes. AVX512 targets may
get the biggest performance boost due to the changed internal representation of
masks (we observed up to 5% speedups) and the new switch '--opt=disable-zmm',
which disables the use of zmm registers in favour of ymm for the avx512skx-i32x16
target. All targets also benefit from the LLVM 10.0 backend used in this release.

Here is the list of other changes:

- a new switch '--support-matrix' was added to display information about supported
  cross-compilation targets, which are managed by the '--target-os=<os>',
  '--target=<ispc-target>', and '--arch=<arch>' switches.
- the representation of the 'bool' type in storage was changed to match C/C++
  (i.e. one 'bool' occupies one byte) for better interoperability.
- type aliases for unsigned types were added: 'uint8', 'uint16', 'uint32',
  'uint64', and 'uint'. To detect whether these types are supported, you can
  check whether the ISPC_UINT_IS_DEFINED macro is defined.
- 'extract()'/'insert()' for boolean arguments, and 'abs()' for all integer and
  FP types were added to the standard library.
- FreeBSD was added to the list of supported target OSes, but it's not well
  tested.

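Since one ISPC 'bool' now occupies one byte in storage, a struct with bool members can be mirrored on the C side using C's own bool. A minimal C sketch, assuming a hypothetical shared struct named Flags and a mainstream ABI where sizeof(bool) is 1 (the C standard leaves that size implementation-defined):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct shared between C and ISPC code; the ISPC side would
   declare the same members in the same order. */
struct Flags {
    bool visible;  /* one byte, like an ISPC 'bool' in storage */
    bool dirty;    /* one byte, placed right after 'visible' */
    int32_t id;    /* 4-byte aligned, so padding puts it at offset 4 */
};

/* Size of a single stored bool on this platform (1 on common ABIs). */
size_t bool_storage_size(void) {
    return sizeof(bool);
}
```

With byte-sized bools on both sides, member offsets agree, so the struct can be passed between C and ISPC without repacking.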
Supported platforms in this release are listed below. Rows are hosts, columns
are targets. x86 and arm cover both 32-bit and 64-bit variants, where appropriate.

Host \ Target | Windows | Linux    | macOS | Android  | iOS | PS4 | FreeBSD
Windows       | x86     | x86, arm | x86   | x86, arm |     | x86 | x86, arm
Linux         |         | x86, arm | x86   | x86, arm |     |     | x86, arm
macOS         |         | x86, arm | x86   | x86, arm | arm |     | x86, arm

=== v1.12.0 === (15 August 2019)

This ISPC update includes experimental cross-OS compilation support, ARM and
AARCH64 support, and a bunch of language features and stability fixes.

Here are the details:

- ISPC is now a cross-OS compiler: you can build ISPC programs for Windows, Linux,
  macOS, iOS, Android and PS4 targets from Windows, Linux and macOS hosts.
- ARM and AARCH64 support has been enabled for ISPC. ARM support currently exists
  for the neon-i32x4, neon-i8x16 and neon-i16x8 targets. AARCH64 is supported for
  neon-i32x4 as well as for a new "double-pumped" 8-wide target: neon-i32x8.
- A new 128-bit AVX2 target (avx2-i32x4) was added.
- Added a CPU definition for Ice Lake client CPUs (--cpu=icl). Note that there is
  no special target for the new instructions in the Ice Lake flavor of AVX512 yet.
  For now, you can use the SKX targets (avx512skx-i32x8 and avx512skx-i32x16)
  with --cpu=icl.
- Removed the generic targets for KNC and KNL, so ISPC no longer supports KNC.
  KNL is still supported through the native target (avx512knl-i32x16).
- Removed AVX1.1 (IvyBridge) targets (use AVX1 targets instead).
- Introduced new language features:
  - 'noinline' function qualifier.
  - 'rsqrt_fast()' and 'rcp_fast()' functions.
  - Static initialization for varying.
- A new command line option '--emit-llvm-text' was added to dump LLVM IR in text
  format.

An ISPC top-of-trunk build is now available in Compiler Explorer
(https://godbolt.org).

The release is based on a patched LLVM 8.0.0 backend.

=== v1.11.0 === (19 April 2019)

An ISPC update with a bunch of new features and stability bug fixes, based on a
patched LLVM 8.0.0 backend.

Notable new features are:
- A new 256-bit AVX512 target (avx512skx-i32x8).
- Modified -O1 switch to optimize for size.
- “#pragma once” in auto-generated headers.
- Better debugging support with -O0.

We also resumed support for the PS4 build.

To write ISPC programs more efficiently, you can now use the ISPC plug-in for
VSCode:
https://marketplace.visualstudio.com/items?itemName=intel-corporation.ispc

=== v1.10.0 === (18 January 2019)

An ISPC update, which brings several new features, a bunch of stability and
performance bug fixes, and infrastructure improvements for those who are
interested in hacking on the ISPC trunk. We are also deprecating KNC support and
the KNL-generic target (in favor of the native KNL target, i.e. avx512knl-i32x16).

We've added:
- a streaming store and load implementation (see the "Streaming Load and Store
  Operations" section in the documentation)
- support for 64-bit wide types in the aos_to_soa/soa_to_aos intrinsics
- an option to specify the assembler style (see the --x86-asm-syntax switch
  documentation in the help message)
- a pragma to disable warnings locally (search for "#pragma ignore" in the
  documentation)

Our examples include a new SGEMM example, which demonstrates different versions
of matrix multiply with various levels of optimality. It is useful for learning
how to start from a naive implementation and then add various optimizations
afterwards. Also, our build system is now based on CMake, as are the examples,
so you can use it as a reference for integrating ISPC into your CMake-based
project.

For those who are interested in hacking on ISPC or trying a bleeding-edge
development version, we have CI on Linux (Travis-CI) and Windows (Appveyor),
including automatic package builds on Windows. We also have Dockerfiles, which
demonstrate how to bring up your environment for ISPC development.

The release is based on a patched LLVM 5.0.2 backend.

=== v1.9.2 === (10 November 2017)

An ISPC update, which brings out-of-the-box debug support on Windows,
better performance for most of the targets and a bunch of stability
and performance bug fixes.

The release is based on a patched LLVM 5.0 backend.

The Windows build now supports only VS2015 and newer. If you are using earlier
versions, the only known problem that you may encounter is a problem with the
"print" ISPC library function.

AVX512 targets are the main beneficiaries of the newer LLVM backend and
demonstrate the biggest performance improvements. SVML support is also
now available on these targets (requires linking with the ICC compiler).

=== v1.9.1 === (8 July 2016)

An ISPC update with a new native AVX512 target for future Xeon CPUs and
improvements for debugging, including a new switch --dwarf-version to support
debugging on old systems.

The release is based on a patched LLVM 3.8.

=== v1.9.0 === (12 Feb 2016)

An ISPC release with AVX512 (KNL flavor) support and a number of bug fixes,
based on a fresh LLVM 3.8 backend.

For AVX512, two modes are supported: generic and native. For instructions on how
to use them, please refer to the wiki. Going forward, we assume that native mode
is the primary way to get AVX512 support and that generic mode will be deprecated.
If you observe significantly better performance in generic mode, please report
it via GitHub issues.

Starting with this release, we are shipping two versions on Windows:
(1) for VS2013 and earlier releases
(2) for VS2015 and newer releases
The reason for doing this is the redesigned C run-time library in VS.
The implementation of the "print" ISPC standard library function relies on the C
runtime library, which has changed. If you are not using the "print" function in
your code, you are safe to use either version.

A new option was introduced to improve debugging: --no-omit-frame-pointer.

=== v1.8.2 === (29 May 2015)

An ISPC update with several important stability fixes and experimental
AVX512 support.

The current level of AVX512 support targets the new generation of Xeon Phi,
codename Knights Landing. It's implemented in two different ways: as a generic
and as a native target. The generic target is similar to KNC support, requires
the Intel C/C++ Compiler (15.0 and newer), and is available in the regular ISPC
build, which is based on LLVM 3.6.1. For the native AVX512 target, we have a
separate ISPC build, which is based on LLVM trunk (3.7). This build is less
stable and has several known issues. Nevertheless, if you are interested in
AVX512 support for your code, we encourage you to try it and report bugs. We are
actively working with LLVM maintainers to fix all AVX512 bugs, so your feedback
is important to us and will ensure that bugs affecting your code are fixed by
the LLVM 3.7 release.

Other notable changes and fixes include:

* Broadwell support via --cpu=broadwell.

* Changed CPU naming to accept CPU codenames. Check help for more details.

* --cpu switch disallowed in multi-target mode.

* Alignment of structure fields (in generated header files) was changed to be
  more consistent regardless of the C/C++ compiler used.

* --dllexport switch was added on Windows to make non-static functions DLL
  exports.

* --print-target switch was added to dump details of the LLVM target machine.
  This may help you to debug issues with code generation for an incorrect target
  (or, more likely, to ensure that code generation is done right).

* A bug was fixed, which caused uniform statements to be executed with an
  all-off mask under some circumstances.

* The restriction on using some uniform types as a return type in multi-target
  mode with targets of different widths was relaxed.

Also, if you are using ISPC for code generation for the current generation of
Xeon Phi (Knights Corner), the following changes are for you:

* A bunch of stability fixes for KNC.

* A bug, which affected projects with multiple ISPC source files compiled with
  the generic target, was fixed. As a side effect, you may see multiple warnings
  about unused static functions; you need to add the "-wd177" switch to ICC when
  compiling generic output files.

The release includes LLVM 3.6.1 binaries for Linux, MacOS, Windows and a
Windows-based cross-compiler for Sony PlayStation4. An experimental LLVM
3.5-based Linux binary with NVPTX support (now also supporting K80) is included
as well.

Native AVX512 support is available in a set of less stable LLVM 3.7-based
binaries for Linux, MacOS and Windows.

=== v1.8.1 === (31 December 2014)

A minor update of ``ispc`` with several important stability fixes, namely:

* The auto-dispatch mechanism is fixed in pre-built Linux binaries (it used to
  select an overly conservative target).

* Compile crash with "-O2 -g" is fixed.

KNC (Xeon Phi) support is also further improved.

The release includes an experimental build for the Sony PlayStation4 target
(Windows cross-compiler), as well as experimental NVPTX support (64-bit Linux
binaries only). Note that NVPTX compilation may fail with CUDA 7.0.

Similar to 1.8.0, all binaries are based on LLVM 3.5. MacOS binaries are built
for MacOS 10.9 Mavericks. Linux binaries are compatible with kernel 2.6.32
(ok for RHEL6) and later.

=== v1.8.0 === (16 October 2014)

A major new version of ISPC, which introduces experimental support for the NVPTX
target, brings numerous improvements to our KNC (Xeon Phi) support, introduces
debugging support on Windows and fixes several bugs. We also ship an experimental
build for the Sony PlayStation4 target in this release. Binaries for all
platforms are based on LLVM 3.5.

Note that MacOS binaries are built for MacOS 10.9 Mavericks. Linux binaries are
compatible with kernel 2.6.32 (ok for RHEL6) and later.

More details:

* Experimental NVPTX support is available for users of our binary distribution
  on Linux only at the moment. MacOS and Windows users willing to experiment
  with this target are welcome to build it from source. Note that the GPU imposes
  some limitations on the ISPC language, which are discussed in the corresponding
  section of the ISPC User's Guide. NVPTX support was implemented by our
  contributor Evghenii Gaburov.

* KNC support was greatly extended in the knc.h header file. Beyond new features,
  there are stability fixes and changes for icc 15.0 compatibility. Stdlib
  prefetch functions were improved to map to KNC vector prefetches.

* The PS4 experimental build is a Windows-to-PS4 cross-compiler, which disables
  arch and cpu selection (which are preset to PS4 hardware).

* Debug info support on Windows (compatible with VS2010, VS2012 and VS2013).

* Critical fix for a bug that, under some conditions, caused code generation
  for an incorrect target despite explicit target switches.

* Stability fix for a bug that caused the print() function to execute with an
  all-off mask under some conditions.

=== v1.7.0 === (18 April 2014)

A major new version of ISPC with several language and library extensions and
fixes in debug info support. Binaries for all platforms are based on a patched
version of LLVM 3.4. There are also performance improvements beyond the
switchover to LLVM 3.4.

The list of language and library changes:

* Support for varying types in exported functions was added. See documentation
  for more details.

* The get_programCount() function was moved from stdlib.ispc to
  examples/util/util.isph, which needs to be included somewhere in your
  project if you want to use it.

* Library functions for saturated arithmetic were added. add/sub/mul/div
  operations are supported for signed and unsigned 8/16/32/64-bit integer types
  (both uniform and varying).

* The algorithm for selecting overloaded functions was extended to cover more
  types of overloading. Handling of reference types in overloaded functions was
  fixed. The rules for selecting the best match were changed to match C++,
  which requires the function to be the best match for all parameters. In
  ambiguous cases, a warning is issued, but it will be converted to an error
  in the next release.

* Explicit typecasts between any two reference types were allowed.

* Implicit cast of a pointer to a const type to void* was disallowed.

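Saturated arithmetic clamps a result to the range of its type instead of wrapping around. As a scalar C illustration only (this is not the ISPC standard library code, and the function names are hypothetical), signed 8-bit addition and unsigned 8-bit subtraction could be written as:

```c
#include <stdint.h>

/* Saturating signed 8-bit addition: the sum is computed in a wider type,
   then clamped to [INT8_MIN, INT8_MAX] instead of wrapping. */
int8_t sat_add_i8(int8_t a, int8_t b) {
    int16_t sum = (int16_t)((int16_t)a + (int16_t)b);
    if (sum > INT8_MAX) return INT8_MAX;
    if (sum < INT8_MIN) return INT8_MIN;
    return (int8_t)sum;
}

/* Saturating unsigned 8-bit subtraction: clamps at 0 instead of wrapping
   around to a large value. */
uint8_t sat_sub_u8(uint8_t a, uint8_t b) {
    return (a > b) ? (uint8_t)(a - b) : 0;
}
```

For example, with wrapping arithmetic 100 + 100 in int8 would become -56, while the saturating version yields 127.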
The list of other notable changes is:

* A number of fixes for better debug info support.

* A memory corruption bug, which caused rare, non-reproducible compile-time
  failures, was fixed.

* Alias analysis was enabled (more aggressive optimizations are expected).

* A bug involving inaccurate handling of the "const" qualifier was fixed. As a
  result, more "const" qualifiers may appear in .h files, which may cause
  compilation errors.

=== v1.6.0 === (19 December 2013)

A major new version of ISPC with major improvements in performance and
stability. Linux and MacOS binaries are based on a patched version of LLVM 3.3,
while the Windows version is based on LLVM 3.4rc3. LLVM 3.4 significantly
improves stability on the Win32 platform, so we've decided not to wait for the
official LLVM 3.4 release.

The list of the most significant changes is:

* A new avx1-i32x4 target was added. It may work well for you if you are focused
  on integer computations or if the FP unit in your hardware is 128 bits wide.

* Support for calculations in double precision was extended with two new
  targets: avx1.1-i64x4 and avx2-i64x4.

* Language support for overloaded operators was added.

* A new library function shift() was added, which is similar to rotate(), but
  is non-circular.

* The language was extended to accept 3-dimensional tasking, a syntactic sugar
  that may facilitate programming of some tasks.

* A regression, which broke --opt=force-aligned-memory, is fixed.

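The difference between rotate() and the non-circular shift() can be sketched in C by modelling a gang as a plain array of lanes. This assumes that lane i reads the value from lane i + offset, with rotate() wrapping around the gang and shift() filling out-of-range lanes with zero; the function names are hypothetical and this is not the ISPC standard library code.

```c
#include <stddef.h>

/* rotate(): circular. Lane i takes the value from lane (i + offset) mod count;
   the double modulo keeps negative offsets in range. */
void rotate_lanes(const int *in, int *out, size_t count, int offset) {
    long n = (long)count;
    for (size_t i = 0; i < count; ++i) {
        long src = (((long)i + offset) % n + n) % n;
        out[i] = in[src];
    }
}

/* shift(): non-circular. Lanes whose source index falls outside the gang
   receive 0 instead of a wrapped-around value. */
void shift_lanes(const int *in, int *out, size_t count, int offset) {
    for (size_t i = 0; i < count; ++i) {
        long src = (long)i + offset;
        out[i] = (src >= 0 && src < (long)count) ? in[src] : 0;
    }
}
```

With lanes {1, 2, 3, 4} and an offset of 1, the rotate sketch produces {2, 3, 4, 1} while the shift sketch produces {2, 3, 4, 0}.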
If you are not using pre-built binaries, you may notice the following changes:

* VS2012/VS2013 are supported.

* alloy.py (with the -b switch) can build LLVM for you on any platform now
  (except MacOS 10.9, but we know about the problem and are working on it).
  This is the preferred way to build LLVM for ISPC, as all required patches for
  better performance and stability are applied automatically.

* LLVM 3.5 (current trunk) is supported.

There are also multiple fixes for better performance and stability; the most
notable are:

* Fixed a performance problem for x2 targets.

* Fixed a problem with incorrect vzeroupper insertion on the AVX target on Win32.

=== v1.5.0 === (27 September 2013)

A major new version of ISPC with several new targets and important bug fixes.
Here's a list of the most important changes if you are using pre-built
binaries (which are based on a patched version of LLVM 3.3):

* The naming of targets was changed to explicitly include the data type width
  and the number of program instances in the gang. For example, avx2-i32x8 is an
  avx2 target, which uses 32-bit types as a base and has 8 program instances in
  a gang. The old naming scheme is still supported, but deprecated.

* New SSE4 targets for calculations based on 8-bit and 16-bit data types:
  sse4-i8x16 and sse4-i16x8.

* A new AVX1 target for calculations based on 64-bit data types: avx1-i64x4.

* SVML support was extended and improved.

* The behavior of the -g switch was changed so that it does not affect the
  optimization level.

* The ISPC debug infrastructure was redesigned. See --help-dev for more info and
  enjoy the capabilities of the new --debug-phase=<value> and --off-phase=<value>
  switches.

* Fixed an auto-dispatch bug, which caused AVX code execution when the OS
  doesn't support AVX (but the hardware does).

* Fixed a bug, which discarded the uniform/varying keyword in typedefs.

* Several performance regressions were fixed.

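The new target names follow an <isa>-i<width>x<count> pattern. A small C sketch of decoding such a name, using a hypothetical helper parse_target() that assumes the last '-' separates the ISA from the width/count suffix (it only illustrates the naming scheme; it is not how ISPC parses --target):

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: splits a name such as "avx2-i32x8" into its ISA
   ("avx2"), base element width in bits (32), and gang size (8).
   Returns 1 on success, 0 if the name does not match the pattern. */
int parse_target(const char *name, char *isa, size_t isa_len,
                 int *width, int *gang) {
    const char *dash = strrchr(name, '-');
    if (dash == NULL) return 0;

    size_t n = (size_t)(dash - name);
    if (n == 0 || n >= isa_len) return 0;
    memcpy(isa, name, n);
    isa[n] = '\0';

    /* Suffix must be exactly i<width>x<gang>; the trailing %c rejects
       extra characters, so a successful match assigns exactly 2 fields. */
    char extra;
    if (sscanf(dash + 1, "i%dx%d%c", width, gang, &extra) != 2) return 0;
    return 1;
}
```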
If you are building ISPC yourself, then the following changes are also available
to you:

* --cpu=slm for targeting Intel Atom codename Silvermont (if LLVM 3.4 is used).

* ARM NEON targets are available (if enabled in the build system).

* --debug-ir=<value> is available to generate debug information based on LLVM
  IR (if LLVM 3.4 is used). In the debugger, you'll see LLVM IR instead of source
  code.

* A redesigned and improved test and configuration management system is
  available to facilitate the process of building LLVM and testing the ISPC
  compiler.

Standard library changes/fixes:

* The __pause() function was removed from the standard library.

* Fixed the reduce_[min|max]_[float|double] intrinsics, which were producing
  incorrect code under some conditions.

Language changes:

* By default, a floating-point constant without a suffix is a single-precision
  constant (32-bit). A new suffix "d" was introduced to allow double-precision
  constants (64-bit). Please refer to tests/double-consts.ispc for syntax
  examples.

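This default is the opposite of C, where an unsuffixed constant is a double and 'f' selects single precision. So an unsuffixed ISPC 0.1 corresponds to C's 0.1f, and ISPC's 0.1d to C's 0.1. A C illustration of why the difference matters once the value is used in double-precision math (helper names are hypothetical):

```c
/* C's unsuffixed constant is double precision, like ISPC's 0.1d. */
double tenth_as_double(void) {
    return 0.1;
}

/* C's 'f'-suffixed constant is single precision, like ISPC's unsuffixed 0.1;
   it is rounded to float first and only then widened to double. */
double tenth_as_float(void) {
    return 0.1f;
}
```

The two results differ in the low-order bits, even though both round to the same float.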
=== v1.4.4 === (19 July 2013)

A minor version update with several stability fixes requested by customers.

=== v1.4.3 === (25 June 2013)

A minor version update with several stability improvements:

* Two bugs were fixed (including a bug in LLVM) to improve stability on 32-bit
  platforms.

* A bug affecting several examples was fixed.

* The --instrument switch is fixed.

All tests and examples now properly compile and execute on native targets on
Unix platforms (Linux and MacOS).

=== v1.4.2 === (11 June 2013)

A minor version update with a few important changes:

* Stability fix for the AVX2 target (Haswell): a problem with gather
  instructions was fixed in LLVM 3.4. If you build with LLVM 3.2 or 3.3, the
  fix is available in our repository (llvm_patches/r183327-AVX2-GATHER.patch)
  and needs to be applied manually.

* Stability fix for a widespread issue on the Win32 platform (#503).

* Performance improvements for Xeon Phi related to mask representation.

LLVM 3.3 has also been released, and it's now the recommended version for
building ISPC. Precompiled binaries are also built with LLVM 3.3.

=== v1.4.1 === (28 May 2013)

A major new version of ispc has been released with stability and performance
improvements on all supported platforms (Windows, Linux and MacOS).
This version supports LLVM 3.1, 3.2, 3.3 and 3.4. The released binaries are
built with 3.2.

New compiler features:

* By default, ISPC memory allocation returns memory aligned to the platform's
  natural vector register alignment. Alignment can also be managed via
  --force-alignment=<value>.

Important bug fixes/changes:

* ISPC was fixed to be fully functional when built by GCC 4.7.

* Major cleanup of build and test scripts on Windows.

* Gather/scatter performance improvements on Xeon Phi.

* FMA instructions are enabled for the AVX2 instruction set.

* Support for the RDRAND instruction (Ivy Bridge), when available, via the
  rdrand library function.

The release also contains numerous bug fixes and minor improvements.

=== v1.3.0 === (29 June 2012)

This is a major new release of ispc, with support for more compilation
targets and a number of additions to the language. As usual, the quality
of generated code has also been improved in a number of cases and a number
of small bugs have been fixed.

New targets:

* This release provides "beta" support for compiling to the Intel® Xeon
  Phi™ processor, code named Knights Corner, the first processor in
  the Intel® Many Integrated Core Architecture. See
  http://ispc.github.io/ispc.html#compiling-for-the-intel-xeon-phi-architecture
  for more details on this support.

* This release also has an "avx1.1" target, which provides support for the
  new instructions in the Intel Ivy Bridge microarchitecture.

New language features:

* The foreach_active statement allows iteration over the active program
  instances in a gang. (See
  http://ispc.github.io/ispc.html#iteration-over-active-program-instances-foreach-active)

* foreach_unique allows iterating over subsets of program instances in a
  gang that share the same value of a variable. (See
  http://ispc.github.io/ispc.html#iteration-over-unique-elements-foreach-unique)

* An "unmasked" function qualifier and statement in the language allow
  re-activating execution of all program instances in a gang. (See
  http://ispc.github.io/ispc.html#re-establishing-the-execution-mask)

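To make the foreach_unique semantics concrete, here is a scalar C emulation of one gang. It is a sketch under stated assumptions: the gang is modelled as arrays of per-lane values and an active mask (at most a hypothetical MAX_LANES lanes), and foreach_unique_emulated() is a hypothetical name, not the code ispc generates.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_LANES 64 /* assumption: emulated gangs are at most 64 lanes wide */

typedef void (*unique_body_fn)(int value, const bool *lanes, size_t count,
                               void *ctx);

/* Calls body (if non-NULL) once per distinct value held by the active lanes,
   passing a sub-mask of the lanes that share that value. Returns the number
   of iterations, or 0 if count exceeds MAX_LANES. */
size_t foreach_unique_emulated(const int *values, const bool *active,
                               size_t count, unique_body_fn body, void *ctx) {
    bool handled[MAX_LANES] = {false};
    bool lanes[MAX_LANES];
    size_t iterations = 0;

    if (count > MAX_LANES) return 0;
    for (size_t i = 0; i < count; ++i) {
        if (!active[i] || handled[i]) continue;
        /* Sub-mask: active, not-yet-handled lanes with lane i's value. */
        for (size_t j = 0; j < count; ++j) {
            lanes[j] = active[j] && !handled[j] && values[j] == values[i];
            if (lanes[j]) handled[j] = true;
        }
        if (body != NULL) body(values[i], lanes, count, ctx);
        ++iterations;
    }
    return iterations;
}
```

For per-lane values {1, 2, 1, 3, 2, 1} with all six lanes active, the loop body runs three times: once for 1, once for 2, and once for 3.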
Standard library updates:

* The seed_rng() function has been modified to take a "varying" seed value
  when a varying RNGState is being initialized.

* An isnan() function has been added, to check for floating-point "not a
  number" values.

* The float_to_srgb8() routine does high-performance conversion of
  floating-point color values to SRGB8 format.

Other changes:

* A number of bugfixes have been made for compiler crashes with malformed
  programs.

* Floating-point comparisons are now "unordered", so that any comparison
  where one of the operands is a "not a number" value returns false. (This
  matches standard IEEE floating-point behavior.)

* The code generated for 'break' statements in "varying" loops has been
  improved for some common cases.

* Compile time and compiler memory use have both been improved,
  particularly for large input programs.

* A number of bugs have been fixed in the debugging information generated
  by the compiler when the "-g" command-line flag is used.

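The IEEE behavior referred to above can be seen directly in C. The hypothetical helper below evaluates every ordered comparison; all of them are false when an operand is NaN. Note that in C the inequality operator is the exception: x != x is true for a NaN, which is also a classic way to detect one alongside the C isnan() macro.

```c
#include <math.h>
#include <stdbool.h>

/* Returns true if any of <, <=, >, >=, == holds for a and b.
   Under IEEE 754 rules, all of them are false if a or b is NaN. */
bool any_ordered_comparison_true(float a, float b) {
    return (a < b) || (a <= b) || (a > b) || (a >= b) || (a == b);
}
```

This assumes default floating-point semantics; options such as -ffast-math allow the compiler to assume NaNs never occur.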
=== v1.2.2 === (20 April 2012)

This release includes a number of small additions to functionality and a
number of bugfixes. New functionality includes:

* It's now possible to forward declare structures as in C/C++: "struct
  Foo;". After such a declaration, structs with pointers to "Foo" and
  functions that take pointers or references to Foo structs can be declared
  without the entire definition of Foo being available.

* New built-in types size_t, ptrdiff_t, and [u]intptr_t are now available,
  corresponding to the equivalent types in C.

* The standard library now provides atomic_swap*() and
  atomic_compare_exchange*() functions for void * types.

* The C++ backend has seen a number of improvements to the quality and
  readability of generated code.

A number of bugs have been fixed in this release as well. The most
significant are:

* Fixed a bug where nested loops could cause a compiler crash in some
  circumstances (issues #240 and #229).

* Gathers could access invalid memory (and cause the program to crash) in
  some circumstances (#235).

* References to temporary values are now handled properly when passed to a
  function that takes a reference typed parameter.

* A case where incorrect code could be generated for compile-time-constant
  initializers has been fixed (#234).

=== v1.2.1 === (6 April 2012)

This release contains only minor new functionality and mostly consists of many
small bugfixes and improvements to error handling and error reporting.
The new functionality that is present is:

* Significantly more efficient versions of the float / half conversion
  routines are now available in the standard library, thanks to Fabian
  Giesen.

* The last member of a struct can now be a zero-length array; this allows
  the trick of dynamically allocating enough storage for the struct and
  some number of array elements at the end of it.

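The allocation trick mentioned above looks like this in C, where the standard spelling is a C99 flexible array member ("float data[];"). The struct and function names are hypothetical; the sketch only shows the single-allocation pattern that a trailing zero-length array enables.

```c
#include <stdlib.h>

/* Hypothetical struct whose last member has no fixed size. */
struct Samples {
    int count;
    float data[]; /* storage for the elements follows the header */
};

/* Allocates the header plus `count` trailing elements in one block and
   zero-initializes the elements. Returns NULL on allocation failure. */
struct Samples *samples_alloc(int count) {
    struct Samples *s =
        malloc(sizeof(struct Samples) + (size_t)count * sizeof(float));
    if (s != NULL) {
        s->count = count;
        for (int i = 0; i < count; ++i) s->data[i] = 0.0f;
    }
    return s;
}
```

A single free(s) releases both the header and the elements.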
Significant bugs fixed include:

* Issue #205: When a target ISA isn't specified, use the host system's
  capabilities to choose a target for which it will be able to run the
  generated code.

* Issues #215 and #217: Don't allocate storage for global variables that
  are declared "extern".

* Issue #197: Allow NULL as a default argument value in a function
  declaration.

* Issue #223: Fix bugs where taking the address of a function wouldn't work
  as expected.

* Issue #224: When there are overloaded variants of a function that take
  both reference and const reference parameters, give the non-const
  reference preference when matching values of that underlying type.

* Issue #225: An error is issued when a varying lvalue is assigned to a
  reference type (rather than crashing).

* Issue #193: Permit conversions from array types to void *, not just the
  pointer type of the underlying array element.

* Issue #199: Still evaluate expressions that are cast to (void).

The documentation has also been improved, with FAQs added to clarify some
aspects of the ispc pointer model.

=== v1.2.0 === (20 March 2012)

This is a major new release of ispc, with a number of significant
improvements to functionality, performance, and compiler robustness. It
does, however, include three small changes to language syntax and semantics
that may require changes to existing programs:

* Syntax for the "launch" keyword has been cleaned up; it's no longer
  necessary to bracket the launched function call with angle brackets.
  (In other words, now use "launch foo();", rather than "launch < foo() >;".)

* When using pointers, the pointed-to data type is now "uniform" by
  default. Use the varying keyword to specify varying pointed-to types when
  needed. (i.e. "float *ptr" is a varying pointer to uniform float data,
  whereas previously it was a varying pointer to varying float values.)
  Use "varying float *" to specify a varying pointer to varying float data,
  and so forth.

* The details of "uniform" and "varying" and how they interact with struct
  types have been cleaned up. Now, when a struct type is declared, if the
  struct elements don't have explicit "uniform" or "varying" qualifiers,
  they are said to have "unbound" variability. When a struct type is
  instantiated, any unbound variability elements inherit the variability of
  the parent struct type. See http://ispc.github.io/ispc.html#struct-types
  for more details.

ispc has a new language feature that makes it much easier to use the
efficient "(array of) structure of arrays" (AoSoA, or SoA) memory layout of
data. A new "soa<n>" qualifier can be applied to structure types to
specify an n-wide SoA version of the corresponding type. Array indexing
and pointer operations with arrays of SoA types automatically handle the
two-stage indexing calculation to access the data. See
http://ispc.github.io/ispc.html#structure-of-array-types for more details.

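The two-stage indexing that soa<n> arrays perform automatically can be written out by hand in C. A minimal sketch, assuming a hypothetical point type stored in blocks of SOA_WIDTH elements (4 is an arbitrary choice for the n in soa<n>); it illustrates the layout and index arithmetic, not the code ispc emits:

```c
#include <stddef.h>

#define SOA_WIDTH 4 /* the n in soa<n>; 4 is an arbitrary example */

/* One block holds SOA_WIDTH x values, then SOA_WIDTH y values, then
   SOA_WIDTH z values, so each field is contiguous within the block. */
struct PointBlock {
    float x[SOA_WIDTH];
    float y[SOA_WIDTH];
    float z[SOA_WIDTH];
};

/* Two-stage indexing: logical element i lives in block i / n at lane i % n. */
float *point_y(struct PointBlock *blocks, size_t i) {
    return &blocks[i / SOA_WIDTH].y[i % SOA_WIDTH];
}
```

For example, logical element 5 maps to block 1, lane 1.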
For more efficient access of data that is still in "array of structures"
(AoS) format, ispc has a new "memory coalescing" optimization that
automatically detects series of strided loads and/or gathers that can be
transformed into a more efficient set of vector loads and shuffles. A
diagnostic is emitted when this optimization is successfully applied.

Smaller changes in this release:

* The standard library now provides memcpy(), memmove() and memset()
  functions, as well as single-precision asin() and acos() functions.

* -I can now be specified on the command-line to specify a search path for
  #include files.

* A number of improvements have been made to error reporting from the
  parser, and a number of cases where malformed programs could cause the
  compiler to crash have been fixed.

* A number of small improvements to the quality and performance of generated
  code have been made, including finding more cases where 32-bit addressing
  calculations can be safely done on 64-bit systems and generating better
  code for initializer expressions.

=== v1.1.4 === (4 February 2012)

There are two major bugfixes for Windows in this release. First, a number
of failures in AVX code generation on Windows have been fixed; AVX on
Windows now has no known issues. Second, a longstanding bug in parsing 64-bit
integer constants on Windows has been fixed.

This release features a new experimental scalar target, contributed by Gabe
Weisz <gweisz@cs.cmu.edu>. This target ("--target=generic-1") compiles
gangs of single program instances (i.e. programCount == 1); it can be
useful for debugging ispc programs.

The compiler now supports dynamic memory allocation in ispc programs (with
"new" and "delete" operators based on C++). See
http://ispc.github.io/ispc.html#dynamic-memory-allocation in the
documentation for more information.

ispc now performs "short circuit" evaluation of the || and && logical
operators and the ? : selection operator. (This represents the correction
of a major incompatibility with C.) Code like "(index < arraySize &&
array[index] == 1)" thus now executes as in C, where "array[index]" won't
be evaluated unless "index" is less than "arraySize".

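The guarded-access idiom from the paragraph above, as it behaves in C. The function name is hypothetical; the point is that the right-hand operand of && is never evaluated when the bounds check fails, so out-of-range indices (and even a NULL array with size 0) are safe.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if array[index] == 1. Because && short-circuits,
   array[index] is only read when index < arraySize. */
bool is_one_at(const int *array, size_t arraySize, size_t index) {
    return index < arraySize && array[index] == 1;
}
```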
The standard library now provides "local" atomic operations, which are
atomic across the gang of program instances (but not across other gangs or
other hardware threads). See the updated documentation on atomics for more
information:
http://ispc.github.io/ispc.html#atomic-operations-and-memory-fences.

The standard library now offers a clock() function, which returns a uniform
int64 value that counts processor cycles; it can be used for
fine-resolution timing measurements.

Finally (of limited interest now): ispc now supports the forthcoming AVX2
instruction set, due with Haswell-generation CPUs. All tests and examples
compile and execute correctly with AVX2. (Thanks specifically to Craig
Topper and Nadav Rotem for work on AVX2 support in LLVM, which made this
possible.)

=== v1.1.3 === (20 January 2012)

With this release, the language now supports "switch" statements, with the
same semantics and syntax as in C.

This release includes fixes for two important performance related issues:
the quality of code generated for "foreach" statements has been
substantially improved (https://github.com/ispc/ispc/issues/151), and a
performance regression with code for "gathers" that was introduced in
v1.1.2 has been fixed in this release.

A number of other small bugs were fixed in this release as well, including
one where invalid memory would sometimes be incorrectly accessed
(https://github.com/ispc/ispc/issues/160).

Thanks to Jean-Luc Duprat for a number of patches that improve support for
building on various platforms, and to Pierre-Antoine Lacaze for patches so
that ispc builds under MinGW.

=== v1.1.2 === (9 January 2012)

The major new feature in this release is support for "generic" C++
vectorized output; in other words, ispc can emit C++ code that corresponds
to the vectorized computation that the ispc program represents. See the
examples/intrinsics directory in the ispc distribution for two example
implementations of the set of functions that must be provided to map the
vector calls generated by ispc to target-specific functions.

ispc now has partial support for 'goto' statements; specifically, goto is
allowed if any enclosing control flow statements (if/for/while/do) have
'uniform' test expressions, but not if they have 'varying' tests.

A number of improvements have been made to the code generated for gathers
and scatters; one of them (better matching x86's "free" scale by 2/4/8 for
addressing calculations) improved the performance of the noise example by
14%.

Many small bugs have been fixed in this release as well, including issue
numbers 138, 129, 135, 127, 149, and 142.

=== v1.1.1 === (15 December 2011)

This release doesn't include any significant new functionality, but does
include small improvements in generated code and a number of bug fixes.

The one user-visible language change is that integer constants may be
specified with 'u' and 'l' suffixes, like in C. For example, "1024llu"
defines the constant with unsigned 64-bit type.

More informative and useful error messages are printed when function
overload resolution fails.

Masking is avoided in additional cases when the mask can be
statically determined to be all on.

A number of small bugs have been fixed:
- Under some circumstances, incorrect masks were used when assigning a
  value to a reference and when doing gathers/scatters.
- Incorrect code could be generated in some cases when some instances
  returned partway through a function but others continued executing.
- Type checking wasn't being performed for calls through function pointers;
  now an error is issued if the arguments don't match up, etc.
- Incorrect code was being generated for gather/scatter to structs that had
  elements with varying short-vector types.
- Type checking wasn't being performed for "foreach" statements; this led to
  problems like function overload resolution not being performed if an
  overloaded function call was used to determine the iteration range.
- A number of symbols would be multiply-defined when compiling to multiple
  targets and using the sse2-x2 target as one of them (issue #131).

=== v1.1.0 === (5 December 2011)

This is a major new release of the compiler, with significant additions to
language functionality and capabilities. It includes a number of small
language syntax changes that will require modification of existing
programs. These changes should generally be straightforward and all are
steps toward eliminating parts of ispc syntax that are incompatible with
C/C++. See
http://ispc.github.io/ispc.html#updating-ispc-programs-for-changes-in-ispc-1-1
for more information about these changes.

ispc now fully supports pointers, including pointer arithmetic, implicit
conversions of arrays to pointers, and all of the other capabilities of
pointers in C. See http://ispc.github.io/ispc.html#pointer-types for more
information about pointers in ispc and
http://ispc.github.io/ispc.html#function-pointer-types for information
about function pointers in ispc.

Reference types are now declared with C++ syntax (e.g. "const float &foo").

ispc now supports 64-bit addressing. For performance reasons, this
capability is disabled by default (even on 64-bit targets), but can be
enabled with a command-line flag:
http://ispc.github.io/ispc.html#selecting-32-or-64-bit-addressing.

This release features new parallel "foreach" statements, which make it
easier in many instances to map program instances to data for data-parallel
computation than the programIndex/programCount mechanism:
http://ispc.github.io/ispc.html#parallel-iteration-statements-foreach-and-foreach-tiled.

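The chunking that foreach takes care of can be modeled in plain C as below. This is an illustrative scalar model under assumed names (PROGRAM_COUNT, model_foreach()); it is not how ispc generates code, and the gang size of 8 is just an example value. Each outer step stands for one gang-wide iteration, the inner loop stands for the program instances, and the bounds test plays the role of the execution mask on the last, partially filled chunk.

```c
#define PROGRAM_COUNT 8  /* hypothetical gang size; real targets vary */

/* Scalar model of iterating over [0, n) a gang at a time: lane k of each
   chunk handles index base + k, and lanes past n are masked off.
   Returns the number of gang-wide steps taken. */
int model_foreach(int n, int *visits) {
    int steps = 0;
    for (int base = 0; base < n; base += PROGRAM_COUNT) {
        for (int lane = 0; lane < PROGRAM_COUNT; lane++) {
            int i = base + lane;
            if (i < n)        /* mask for the ragged final chunk */
                visits[i]++;
        }
        steps++;
    }
    return steps;
}
```

With n = 20 and a gang of 8, the model takes three steps and the last one has only four active lanes.
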
Finally, all of the system's documentation has been significantly revised.
The documentation of ispc's parallel execution model has been rewritten:
http://ispc.github.io/ispc.html#the-ispc-parallel-execution-model, and
there is now a more specific discussion of similarities and differences
between ispc and C/C++:
http://ispc.github.io/ispc.html#relationship-to-the-c-programming-language.
There is now a separate FAQ (http://ispc.github.io/faq.html), and a
Performance Guide (http://ispc.github.io/perfguide.html).

=== v1.0.12 === (20 October 2011)

This release includes a new "double-pumped" 8-wide target for SSE2,
"sse2-x2". Like the sse4-x2 and avx-x2 targets, this target may deliver
higher performance for some workloads than the regular sse2 target. (For
other workloads, it may be slower.)

The ispc language now includes an "assert()" statement. See
http://ispc.github.io/ispc.html#assertions for more information.

The compiler now sets a preprocessor #define based on the target ISA; for
example, ISPC_TARGET_SSE4 is defined for the sse4 targets, and so forth.

The standard library now provides high-performance routines for converting
between some "array of structures" and "structure of arrays" formats.
See
http://ispc.github.io/ispc.html#converting-between-array-of-structures-and-structure-of-arrays-layout
for more information.

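To make the two layouts concrete, here is a scalar C sketch for 3-component data: AoS stores x0 y0 z0 x1 y1 z1 ..., while SoA keeps separate x[], y[], and z[] arrays. The function names are hypothetical and do not reflect the signatures of ispc's standard library routines, which are described at the documentation link above.

```c
/* Hypothetical scalar model: split interleaved x,y,z triples (AoS)
   into three separate arrays (SoA). */
void aos_to_soa3_model(const float *aos, int count,
                       float *x, float *y, float *z) {
    for (int i = 0; i < count; i++) {
        x[i] = aos[3 * i + 0];
        y[i] = aos[3 * i + 1];
        z[i] = aos[3 * i + 2];
    }
}

/* Inverse direction: interleave x[], y[], z[] back into AoS order. */
void soa_to_aos3_model(const float *x, const float *y, const float *z,
                       int count, float *aos) {
    for (int i = 0; i < count; i++) {
        aos[3 * i + 0] = x[i];
        aos[3 * i + 1] = y[i];
        aos[3 * i + 2] = z[i];
    }
}
```

SoA is usually the layout that maps one element per program instance onto contiguous vector loads, which is why converting to it can pay off.
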
Inline functions now have static linkage.

A number of improvements have been made to the optimization passes that
detect when gathers and scatters can be transformed into vector loads and
stores, respectively. In particular, these passes now handle variables that
are used as loop induction variables much better.

=== v1.0.11 === (6 October 2011)

The main new feature in this release is support for generating code for
multiple targets (e.g., SSE2, SSE4, and AVX) and having the compiled code
select the best variant at execution time. For more information, see
http://ispc.github.io/ispc.html#compiling-with-support-for-multiple-instruction-sets.

All of the examples now take advantage of the support for multiple
compilation targets; thus, if one has an AVX system, it's not necessary to
recompile the examples to use the AVX target.

Performance of the built-in task system that is used in the examples has
been improved.

Finally, the print() statement now works on OSX; it had been broken for the
last few releases.

=== v1.0.10 === (30 September 2011)

This release features an extensive new example showing the application of
ispc to a deferred shading algorithm for scenes with thousands of lights
(examples/deferred). This is an implementation of the algorithm that Johan
Andersson described at SIGGRAPH 2009 and that was implemented by Andrew
Lauritzen and Jefferson Montgomery. The basic idea is that a pre-rendered
G-buffer is partitioned into tiles, and in each tile, the set of lights
that contribute to the tile is computed. The pixels in the tile are then
shaded using those light sources. (See slides 19-29 of
http://s09.idav.ucdavis.edu/talks/04-JAndersson-ParallelFrostbite-Siggraph09.pdf
for more details on the algorithm.)

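The per-tile culling step described above can be sketched in C as follows. This is a simplified illustration, not the code in examples/deferred: lights are reduced to screen-space circles, and the Light type, light_touches_tile(), and cull_lights_for_tile() are names invented for this sketch. A full implementation would typically also use each tile's depth range from the G-buffer, which this sketch omits.

```c
#include <stdbool.h>

/* Hypothetical screen-space light: circle of influence in pixel units. */
typedef struct { float x, y, radius; } Light;

/* Does the light's circle overlap the tile [x0, x1] x [y0, y1]? Clamp the
   center to the tile, then compare squared distance with squared radius. */
static bool light_touches_tile(const Light *l, float x0, float y0,
                               float x1, float y1) {
    float cx = l->x < x0 ? x0 : (l->x > x1 ? x1 : l->x);
    float cy = l->y < y0 ? y0 : (l->y > y1 ? y1 : l->y);
    float dx = l->x - cx, dy = l->y - cy;
    return dx * dx + dy * dy <= l->radius * l->radius;
}

/* Build the light list for one tile; out_indices must hold num_lights
   entries. Returns how many light indices were written. */
int cull_lights_for_tile(const Light *lights, int num_lights, int tile_x,
                         int tile_y, int tile_size, int *out_indices) {
    float x0 = (float)(tile_x * tile_size), y0 = (float)(tile_y * tile_size);
    float x1 = x0 + (float)tile_size, y1 = y0 + (float)tile_size;
    int count = 0;
    for (int i = 0; i < num_lights; i++)
        if (light_touches_tile(&lights[i], x0, y0, x1, y1))
            out_indices[count++] = i;
    return count;
}
```

Shading then loops over only the returned indices for the pixels in that tile, instead of over every light in the scene.
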
The mechanism for launching tasks from ispc code has been generalized to
allow multiple tasks to be launched with a single launch call (see
http://ispc.github.io/ispc.html#task-parallelism-language-syntax for more
information).

A few new functions have been added to the standard library: num_cores()
returns the number of cores in the system's CPU, and variants of all of the
atomic operators that take 'uniform' values as parameters have been added.

=== v1.0.9 === (26 September 2011)

The binary release of v1.0.9 is the first that supports AVX code
generation. Two targets are provided: "avx", which runs with a
programCount of 8, and "avx-x2", which runs 16 program instances
simultaneously. (This binary is also built using the in-progress LLVM 3.0
development libraries, while previous ones were built with the
released 2.9 version of LLVM.)

This release has no other significant changes beyond a number of small
bugfixes (https://github.com/ispc/ispc/issues/100,
https://github.com/ispc/ispc/issues/101, https://github.com/ispc/ispc/issues/103).

=== v1.0.8 === (19 September 2011)

A number of improvements have been made to handling of 'if' statements in
the language:
- A bug was fixed where invalid memory could be incorrectly accessed even
  if none of the running program instances wanted to execute the
  corresponding instructions (https://github.com/ispc/ispc/issues/74).
- The code generated for 'if' statements is a bit simpler and thus more
  efficient.

There is now a '--pic' command-line argument that causes position-independent
code to be generated (Linux and OSX only).

A number of additional performance improvements:
- Loops are now unrolled by default; the --opt=disable-loop-unroll
  command-line argument can be used to disable this behavior.
  (https://github.com/ispc/ispc/issues/78)
- A few more cases where gathers/scatters could be determined at compile
  time to actually access contiguous locations have been added.
  (https://github.com/ispc/ispc/issues/79)

Finally, warnings are now issued (if possible) when it can be determined
at compile time that an out-of-bounds array index is being used
(https://github.com/ispc/ispc/issues/98).

=== v1.0.7 === (3 September 2011)

The various atomic_*_global() standard library functions are generally
substantially more efficient. They all previously issued one hardware
atomic instruction for each running program instance but now locally
compute a reduction over the operands and issue a single hardware atomic,
giving the same effect and results in the end (issue #57).

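The equivalence described above can be illustrated with C11 atomics. In this sketch, an array of lane values plus an activity mask stands in for a gang (GANG_SIZE, add_per_instance(), and add_reduced() are hypothetical names). It models only the final value left in memory, not how ispc computes each program instance's return value.

```c
#include <stdatomic.h>
#include <stdint.h>

#define GANG_SIZE 8  /* hypothetical gang width */

/* Old strategy: one hardware atomic per active program instance. */
void add_per_instance(_Atomic int32_t *target, const int32_t vals[GANG_SIZE],
                      const int active[GANG_SIZE]) {
    for (int lane = 0; lane < GANG_SIZE; lane++)
        if (active[lane])
            atomic_fetch_add(target, vals[lane]);
}

/* New strategy: reduce the active operands locally, then issue a single
   atomic. Returns the value that was in memory before the add. */
int32_t add_reduced(_Atomic int32_t *target, const int32_t vals[GANG_SIZE],
                    const int active[GANG_SIZE]) {
    int32_t sum = 0;
    for (int lane = 0; lane < GANG_SIZE; lane++)
        if (active[lane])
            sum += vals[lane];
    return atomic_fetch_add(target, sum);
}
```

Both versions leave the same total in memory; the second touches the shared location once instead of once per active instance.
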
CPU/ISA target handling has been substantially improved. If no CPU is
specified, the host CPU type is used, not just a default of "nehalem". A
number of bugs were fixed so that LLVM no longer generates instructions
beyond SSE2 when using the SSE2 target (fixes issue #82).

Right shifts of unsigned integer types now use a logical shift right
instruction, not an arithmetic shift right (fixes issue #88).

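The difference between the two kinds of shift only shows up when the top bit is set. The C sketch below works on raw 32-bit patterns; shr_arithmetic() is a hypothetical helper that spells out the sign-filling behavior portably, since right-shifting a negative signed value is implementation-defined in C.

```c
#include <stdint.h>

/* Logical shift right: zeros shift in from the top. In C this is what >>
   does for unsigned operands, matching the fixed behavior. */
uint32_t shr_logical(uint32_t x, unsigned s) {
    return x >> s;
}

/* Arithmetic shift right on raw bits: copies of the top (sign) bit shift
   in from the top. Valid for s < 32. */
uint32_t shr_arithmetic(uint32_t x, unsigned s) {
    uint32_t shifted = x >> s;
    if (x & 0x80000000u)
        shifted |= ~(0xFFFFFFFFu >> s);  /* fill vacated high bits with 1s */
    return shifted;
}
```

For the unsigned value 0x80000000, a logical shift by 4 gives 0x08000000, while the arithmetic variant would give 0xF8000000, which is the wrong answer for an unsigned type.
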
When emitting header files, 'extern' declarations of globals used in ispc
code are now outside of the ispc namespace. Fixes issue #64.

The stencil example has been modified to do runs with and without
parallelism.

Many other small bugfixes and improvements.

=== v1.0.6 === (17 August 2011)

Some additional cross-program instance operations have been added to the
standard library. reduce_equal() checks to see if the given value is the
same across all running program instances, and exclusive_scan_{add,and,or}()
computes a scan over the given value in the running program instances.
See the documentation of these new routines for more information:
http://ispc.github.io/ispc.html#cross-program-instance-operations.

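The semantics of these routines can be modeled in C by treating a gang as an array of lane values with an activity mask. The names below (GANG_SIZE, model_reduce_equal(), model_exclusive_scan_add()) are hypothetical, and the value written for inactive lanes is an arbitrary choice of this sketch rather than documented ispc behavior.

```c
#include <stdbool.h>

#define GANG_SIZE 4  /* hypothetical gang width */

/* Model of reduce_equal(): true if every active lane holds the same value. */
bool model_reduce_equal(const int v[GANG_SIZE], const bool active[GANG_SIZE]) {
    bool seen = false;
    int first = 0;
    for (int lane = 0; lane < GANG_SIZE; lane++) {
        if (!active[lane]) continue;
        if (!seen) { first = v[lane]; seen = true; }
        else if (v[lane] != first) return false;
    }
    return true;
}

/* Model of exclusive_scan_add(): each active lane receives the sum of the
   values in the active lanes before it; the first active lane gets 0. */
void model_exclusive_scan_add(const int v[GANG_SIZE],
                              const bool active[GANG_SIZE],
                              int out[GANG_SIZE]) {
    int running = 0;
    for (int lane = 0; lane < GANG_SIZE; lane++) {
        if (!active[lane]) { out[lane] = 0; continue; }  /* arbitrary here */
        out[lane] = running;
        running += v[lane];
    }
}
```

For lane values {3, 1, 4, 1} with all lanes active, the exclusive add scan is {0, 3, 4, 8}; the and/or variants follow the same pattern with a different operator and identity value.
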
The simple task system implementations used in the examples have been
improved. The Windows version no longer has a hard limit on the number of
tasks that can be launched, and all versions have less dynamic memory
allocation and less locking. More of the examples now have paths that also
measure performance using tasks along with SPMD vectorization.

Two new examples have been added: one that shows the implementation of a
ray-marching volume rendering algorithm, and one that shows a 3D stencil
computation, as might be done for PDE solutions.

Standard library routines to issue prefetches have been added. See the
documentation for more details: http://ispc.github.io/ispc.html#prefetches.

Fast versions of the float to half-precision float conversion routines have
been added. For more details, see:
http://ispc.github.io/ispc.html#conversions-to-and-from-half-precision-floats.

There is the usual set of small bug fixes. Notably, a number of details
related to handling 32-bit versus 64-bit targets have been fixed, which in
turn has fixed a bug related to tasks having incorrect values for pointers
passed to them.

=== v1.0.5 === (1 August 2011)

Multi-element vector swizzles are supported; for example, given a 3-wide
vector "foo", expressions like "foo.zyx" and "foo.yz" can be used to
construct other short vectors. See
http://ispc.github.io/ispc.html#short-vector-types
for more details. (Thanks to Pete Couperus for implementing this code!)

int8 and int16 datatypes are now supported. It is still generally more
efficient to use int32 for intermediate computations, even if the in-memory
format is int8 or int16.

There are now standard library routines to convert to and from 'half'-format
floating-point values (half_to_float() and float_to_half()).

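For reference, a 'half' value is the IEEE 754 binary16 layout: 1 sign bit, 5 exponent bits with a bias of 15, and 10 mantissa bits. The plain C decoder below is written for illustration; half_bits_to_float() is a hypothetical name, and this is not ispc's half_to_float() implementation.

```c
#include <stdint.h>
#include <string.h>

/* Decode an IEEE 754 binary16 bit pattern into a 32-bit float. */
float half_bits_to_float(uint16_t h) {
    uint32_t sign = (uint32_t)(h >> 15) & 0x1u;
    uint32_t exp  = (uint32_t)(h >> 10) & 0x1Fu;  /* 5-bit exponent, bias 15 */
    uint32_t mant = (uint32_t)h & 0x3FFu;         /* 10-bit mantissa */
    uint32_t bits;

    if (exp == 0) {
        /* Zero or subnormal: value = mant * 2^-24, exact in float. */
        float mag = (float)mant * (1.0f / 16777216.0f);
        return sign ? -mag : mag;
    } else if (exp == 0x1Fu) {
        /* Infinity (mant == 0) or NaN (mant != 0). */
        bits = (sign << 31) | (0xFFu << 23) | (mant << 13);
    } else {
        /* Normal: rebias the exponent from 15 to 127, widen the mantissa. */
        bits = (sign << 31) | ((exp - 15u + 127u) << 23) | (mant << 13);
    }

    float f;
    memcpy(&f, &bits, sizeof f);  /* reinterpret the bit pattern */
    return f;
}
```

For example, 0x3C00 decodes to 1.0 and 0x7BFF decodes to 65504.0, the largest finite half value.
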
There is a new example with an implementation of Perlin's Noise function
(examples/noise). It shows a speedup of approximately 4.2x versus a C
implementation on OSX and a 2.9x speedup versus C on Windows.

=== v1.0.4 === (18 July 2011)

Enums are now supported in ispc; see the section on enumeration types in
the documentation (http://ispc.github.io/ispc.html#enumeration-types) for
more information.

Bools are converted to integers with zero extension, not sign extension as
before (i.e. a 'true' bool converts to the value one, not 'all bits on').
For cases where sign extension is still desired, there is a
sign_extend(bool) function in the standard library.

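The two conversions can be illustrated in plain C on 32-bit integers. bool_zero_extend() and bool_sign_extend() are hypothetical helpers written for this sketch; the second only mirrors the described result of ispc's sign_extend(bool).

```c
#include <stdbool.h>
#include <stdint.h>

/* Zero extension: true -> 1, false -> 0 (the new default conversion). */
int32_t bool_zero_extend(bool b) {
    return (int32_t)b;
}

/* Sign extension: true -> -1 (all 32 bits set), false -> 0. */
int32_t bool_sign_extend(bool b) {
    return -(int32_t)b;
}
```

The 'all bits on' result is handy as a bit mask (for example, x & mask), which is the kind of case where the explicit sign_extend() is still useful.
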
Support for 64-bit types in the standard library is much more complete than
before.

64-bit integer constants are now supported by the parser.

Storage for parameters to tasks is now allocated dynamically on Windows,
rather than on the stack; with this fix, all tests now run correctly on
Windows.

There is now support for atomic swap and compare/exchange with float and
double types.

A number of additional small bugs have been fixed, as have a number of cases
where the compiler would crash given a malformed program.

=== v1.0.3 === (4 July 2011)

ispc now has a built-in preprocessor (from LLVM's clang compiler).
(Thanks to Pete Couperus for this patch!) It is therefore no longer
necessary to use cl.exe for preprocessing on Windows; the MSVC project
files for the examples have been updated accordingly.

There is another variant of the shuffle() function in the standard
library: "<type> shuffle(<type> v0, <type> v1, int permute)", where the
permutation vector indexes over the concatenation of the two vectors
(e.g. the value 0 corresponds to the first element of v0, the value
2*programCount-1 corresponds to the last element of v1, etc.)

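The indexing rule for the two-vector shuffle() can be modeled in C as below with a gang of 4 (PROGRAM_COUNT and model_shuffle2() are hypothetical names). Indices 0..3 pick from v0 and 4..7 pick from v1; the sketch assumes every permute value is in that range.

```c
#define PROGRAM_COUNT 4  /* hypothetical gang size */

/* Lane i takes element permute[i] of the concatenation v0 ++ v1. */
void model_shuffle2(const float v0[PROGRAM_COUNT],
                    const float v1[PROGRAM_COUNT],
                    const int permute[PROGRAM_COUNT],
                    float out[PROGRAM_COUNT]) {
    for (int i = 0; i < PROGRAM_COUNT; i++) {
        int p = permute[i];
        out[i] = (p < PROGRAM_COUNT) ? v0[p] : v1[p - PROGRAM_COUNT];
    }
}
```

With v0 = {0, 1, 2, 3}, v1 = {10, 11, 12, 13}, and permute = {7, 0, 4, 3}, the output is {13, 0, 10, 3}.
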
ispc now supports the usual range of atomic operations (add, subtract, min,
max, and, or, and xor) as well as atomic swap and atomic compare and
exchange. There is also a facility for inserting memory fences. See the
"Atomic Operations and Memory Fences" section of the user's guide
(http://ispc.github.io/ispc.html#atomic-operations-and-memory-fences) for
more information.

There are now both 'signed' and 'unsigned' variants of the standard library
functions like packed_load_active() that take references to arrays of
signed int32s and unsigned int32s respectively. (The
{load_from,store_to}_{int8,int16}() functions have similarly been augmented
to have both 'signed' and 'unsigned' variants.)

In initializer expressions with variable declarations, it is no longer
legal to initialize arrays and structs with single scalar values that then
initialize their members; they now must be initialized with initializer
lists in braces (or initialized after the declaration, for example with a
loop over array elements).

=== v1.0.2 === (1 July 2011)

Floating-point hexadecimal constants are now parsed correctly on Windows
(fixes issue #16).

SSE2 is now the default target if --cpu=atom is given in the command line
arguments and another target isn't explicitly specified.

The standard library now provides broadcast(), rotate(), and shuffle()
routines for efficient communication between program instances.

The MSVC solution files to build the examples on Windows now use
/fpmath:fast when building.

=== v1.0.1 === (24 June 2011)

ispc no longer requires that pointers to memory that are passed in to ispc
have alignment equal to the target's vector width; now alignment just has to
be the regular element alignment (e.g. 4 bytes for floats, etc.) This
change also fixed a number of cases where it previously incorrectly
generated aligned load/store instructions in cases where the address wasn't
actually aligned (even if the base address passed into ispc code was).

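The case this fix addresses, where the base pointer is aligned but an address derived from it is not, can be shown with a small C check. is_aligned() is a hypothetical helper, and 16 bytes (4 floats) is used only as an example vector width.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if address p is a multiple of `align` bytes (align a power of two). */
bool is_aligned(const void *p, uintptr_t align) {
    return ((uintptr_t)p & (align - 1)) == 0;
}
```

If data is declared with 16-byte alignment, &data[0] passes a 16-byte check, but &data[1] is only 4-byte aligned, so an aligned 16-byte vector load from it would be incorrect.
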
=== v1.0 === (21 June 2011)

Initial Release