# 3.x series change log

This page summarizes the major functional and performance changes in each
release of the 3.x series.

All performance data on this page is measured on an Intel Core i5-9600K
clocked at 4.2 GHz, running `astcenc` using AVX2 and 6 threads.

<!-- ---------------------------------------------------------------------- -->
## 3.7

**Status:** April 2022

The 3.7 release contains another round of performance optimizations, including
significant improvements to the command line front-end (faster PNG loader) and
the arm64 build of the codec (faster NEON implementation).

* **General:**
  * **Feature:** The command line tool PNG loader has been switched to use
    the Wuffs library, which is robust and significantly faster than the
    previous stb_image implementation.
  * **Feature:** Support for non-invariant builds returns. Opt in to slightly
    faster, but not bit-exact, builds by setting `-DNO_INVARIANCE=ON` for the
    CMake configuration. This improves performance by around 2%.
  * **Optimization:** Changed SIMD `select()` so that it matches the default
    NEON behavior (bitwise select), rather than the default x86-64 behavior
    (lane select on MSB). Specialization `select_msb()` added for the one case
    we want to select on a sign-bit, where NEON needs a different
    implementation. This provides a significant (>25%) performance uplift on
    NEON implementations.

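The difference between the two select behaviors described for 3.7 can be
illustrated with a scalar sketch. This is not the astcenc SIMD code: the
functions below operate on a single 32-bit lane, and the `_sketch` names are
hypothetical, chosen only for this illustration.

```cpp
#include <cstdint>

// Bitwise select (the NEON-style default): every result bit is taken from
// `b` where the matching mask bit is 1, and from `a` where it is 0.
constexpr std::uint32_t select_bitwise_sketch(
    std::uint32_t a, std::uint32_t b, std::uint32_t mask)
{
    return (a & ~mask) | (b & mask);
}

// MSB select (the x86-64-style default): the whole lane is taken from `b`
// if the top bit of the mask is 1, otherwise the whole lane comes from `a`.
constexpr std::uint32_t select_msb_sketch(
    std::uint32_t a, std::uint32_t b, std::uint32_t mask)
{
    return (mask & 0x80000000u) != 0u ? b : a;
}

// With a full comparison mask (all ones or all zeros) both variants agree.
static_assert(select_bitwise_sketch(0x11111111u, 0xEEEEEEEEu, 0xFFFFFFFFu)
              == 0xEEEEEEEEu, "full mask selects b");
static_assert(select_msb_sketch(0x11111111u, 0xEEEEEEEEu, 0xFFFFFFFFu)
              == 0xEEEEEEEEu, "full mask selects b");

// With a mask that only has the sign bit set they differ: the bitwise form
// mixes bits from both inputs, which is why a separate MSB variant is needed.
static_assert(select_bitwise_sketch(0x11111111u, 0xEEEEEEEEu, 0x80000000u)
              == 0x91111111u, "bitwise select mixes bits");
static_assert(select_msb_sketch(0x11111111u, 0xEEEEEEEEu, 0x80000000u)
              == 0xEEEEEEEEu, "MSB select takes the whole lane");
```

Masks produced by SIMD comparisons are all ones or all zeros per lane, so the
bitwise form is safe for them; only code that selects directly on a sign bit
needs the MSB form.
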
### Performance:

Key for charts:

* Color = block size (see legend).
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).

**Relative performance vs 3.5 release:**

<!-- ---------------------------------------------------------------------- -->
## 3.6

**Status:** April 2022

The 3.6 release contains another round of performance optimizations.

There are no interface changes in this release, but in general the API is not
designed to be binary compatible across versions. We always recommend
rebuilding your client-side code using the updated `astcenc.h` header.

* **General:**
  * **Feature:** Data tables are now optimized for contexts without the
    `SELF_DECOMPRESS_ONLY` flag set. The flag therefore no longer improves
    compression performance, but still reduces context creation time and
    context data table memory footprint.
  * **Feature:** Image quality for 4x4 `-fastest` configuration has been
    improved.
  * **Optimization:** Decimation modes are reliably excluded from processing
    when they are only partially selected in the compressor configuration (e.g.
    if used for single plane, but not dual plane modes). This is a significant
    performance optimization for all quality levels.
  * **Optimization:** Fast-path block load function variant added for 2D LDR
    images with no swizzle. This is a moderate performance optimization for the
    fast and fastest quality levels.

### Performance:

Key for charts:

* Color = block size (see legend).
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).

**Relative performance vs 3.5 release:**

<!-- ---------------------------------------------------------------------- -->
## 3.5

**Status:** March 2022

The 3.5 release contains another round of performance optimizations.

There are no interface changes in this release, but in general the API is not
designed to be binary compatible across versions. We always recommend
rebuilding your client-side code using the updated `astcenc.h` header.

* **General:**
  * **Feature:** Compressor configurations using `SELF_DECOMPRESS_ONLY` mode
    store compacted partition tables, which significantly improves both
    context create time and runtime performance.
  * **Feature:** Bilinear infill for decimated weight grids supports a new
    variant for half-decimated grids which are only decimated in one axis.

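To show why a half-decimated grid allows a cheaper infill, here is a minimal
floating-point sketch. It is not the codec's implementation (which this change
log does not describe in detail); the `WeightGridSketch` type, the function
names, and the edge-aligned coordinate mapping are all assumptions made for
illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical floating-point weight grid, stored row-major.
struct WeightGridSketch
{
    int width;
    int height;
    std::vector<float> weights;

    float at(int x, int y) const
    {
        assert(x >= 0 && x < width && y >= 0 && y < height);
        return weights[static_cast<std::size_t>(y) * width + x];
    }
};

// Map a texel index to a grid coordinate so that the first and last texels
// land exactly on the first and last grid samples (an assumed mapping).
inline float to_grid_coord(int texel, int block_dim, int grid_dim)
{
    if (block_dim <= 1)
    {
        return 0.0f;
    }
    return static_cast<float>(texel * (grid_dim - 1)) /
           static_cast<float>(block_dim - 1);
}

// General case: the grid is decimated in both axes, so four samples are
// blended for every texel.
inline float infill_bilinear(const WeightGridSketch& grid,
                             int block_w, int block_h, int tx, int ty)
{
    float gx = to_grid_coord(tx, block_w, grid.width);
    float gy = to_grid_coord(ty, block_h, grid.height);
    int x0 = static_cast<int>(gx);
    int y0 = static_cast<int>(gy);
    int x1 = std::min(x0 + 1, grid.width - 1);
    int y1 = std::min(y0 + 1, grid.height - 1);
    float fx = gx - static_cast<float>(x0);
    float fy = gy - static_cast<float>(y0);

    float top = grid.at(x0, y0) * (1.0f - fx) + grid.at(x1, y0) * fx;
    float bottom = grid.at(x0, y1) * (1.0f - fx) + grid.at(x1, y1) * fx;
    return top * (1.0f - fy) + bottom * fy;
}

// Half-decimated case: the grid is decimated in X only, so every texel row
// has its own grid row and two samples are enough.
inline float infill_linear_x(const WeightGridSketch& grid,
                             int block_w, int tx, int ty)
{
    float gx = to_grid_coord(tx, block_w, grid.width);
    int x0 = static_cast<int>(gx);
    int x1 = std::min(x0 + 1, grid.width - 1);
    float fx = gx - static_cast<float>(x0);
    return grid.at(x0, ty) * (1.0f - fx) + grid.at(x1, ty) * fx;
}
```

When the grid height equals the block height, both functions return the same
value for every texel, but the two-tap version does half the sample fetches
and skips the vertical blend.
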
### Performance:

Key for charts:

* Color = block size (see legend).
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).

**Relative performance vs 3.4 release:**

<!-- ---------------------------------------------------------------------- -->
## 3.4

**Status:** February 2022

The 3.4 release introduces another round of optimizations, removing a number
of power-user configuration options to simplify the core compressor data path.

Reminder for users of the library interface - the API is not designed to be
binary compatible across versions, and this release is not compatible with
earlier releases. Please update and rebuild your client-side code using the
updated `astcenc.h` header.

* **General:**
  * **Feature:** Many memory allocations have been moved off the stack into
    dynamically allocated working memory. This significantly reduces the peak
    stack usage, allowing the compressor to run in systems with 128KB stack
    limits.
  * **Feature:** Builds now support `-DBLOCK_MAX_TEXELS=<count>` to allow a
    compressor to support a subset of block sizes. This can reduce binary size
    and runtime memory footprint, and improve performance.
  * **Feature:** The `-v` and `-va` options to set a per-texel error weight
    function are no longer supported.
  * **Feature:** The `-b` option to set a per-texel error weight boost for
    block border texels is no longer supported.
  * **Feature:** The `-a` option to set a per-texel error weight based on texel
    alpha value is no longer supported as an error weighting tool, but is still
    supported for providing sprite-sheet RDO.
  * **Feature:** The `-mask` option to set an error metric for mask map
    textures is still supported, but is currently a no-op in the compressor.
  * **Feature:** The `-perceptual` option to set a perceptual error metric is
    still supported, but is currently a no-op in the compressor for mask map
    and normal map textures.
  * **Bug-fix:** Corrected decompression of error blocks in some cases, which
    now return the expected error color (magenta for LDR, NaN for HDR). Note
    that astcenc determines the error color to use based on the output image
    data type, not the decoder profile.
* **Binary releases:**
  * **Improvement:** Windows binaries changed to use ClangCL 12.0, which gives
    up to 10% performance improvement.

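The error color rule from the 3.4 bug-fix entry can be pictured with a small
sketch. The enum, the function name, and the mapping of 8-bit output to LDR
and floating-point output to HDR are assumptions for illustration; this is not
the library's API.

```cpp
#include <array>
#include <cassert>
#include <limits>

// Hypothetical stand-in for the output image data type chosen by the caller.
enum class OutputTypeSketch { U8, F16, F32 };

// Returns the RGBA error color as normalized floats: opaque magenta for
// 8-bit (LDR) output, NaN in every channel for floating-point (HDR) output.
// The choice depends only on the output data type, not on a decoder profile.
inline std::array<float, 4> error_color_sketch(OutputTypeSketch type)
{
    if (type == OutputTypeSketch::U8)
    {
        return {{ 1.0f, 0.0f, 1.0f, 1.0f }};
    }

    const float nan = std::numeric_limits<float>::quiet_NaN();
    return {{ nan, nan, nan, nan }};
}
```

A caller that decodes into a floating-point image can therefore detect error
blocks by testing texels for NaN, while 8-bit output shows them as magenta.
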
### Performance:

Key for charts:

* Color = block size (see legend).
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).

**Relative performance vs 3.3 release:**

<!-- ---------------------------------------------------------------------- -->
## 3.3

**Status:** November 2021

The 3.3 release improves image quality for normal maps and two-component
textures. Normal maps are expected to compress 25% slower than the 3.2
release, although they are still faster to compress in 3.3 than when using
the 2.5 series. This release also fixes one reported stability issue.

* **General:**
  * **Feature:** Normal map image quality has been improved.
  * **Feature:** Two-component image quality has been improved, provided
    that unused components are correctly zero-weighted using e.g. `-cw` on the
    command line.
  * **Bug-fix:** Improved stability when trying to compress complex blocks that
    could not beat even the starting quality threshold. These will now always
    compress into a constant color block.

<!-- ---------------------------------------------------------------------- -->
## 3.2

**Status:** August 2021

The 3.2 release is a bugfix release; no significant image quality or
performance differences are expected.

* **General:**
  * **Bug-fix:** Improved stability when new contexts were created while other
    contexts were compressing or decompressing an image.
  * **Bug-fix:** Improved stability when decompressing blocks with invalid
    block encodings.

<!-- ---------------------------------------------------------------------- -->
## 3.1

**Status:** July 2021

The 3.1 release gives another performance boost, typically between 5 and 20%
faster than the 3.0 release, as well as further incremental improvements to
image quality. A number of build system improvements make astcenc easier and
faster to integrate into other projects as a library, including support for
building universal binaries on macOS. The full change list is shown below.

Reminder for users of the library interface - the API is not designed to be
binary compatible across versions, and this release is not compatible with
earlier releases. Please update and rebuild your client-side code using the
updated `astcenc.h` header.

* **General:**
  * **Feature:** RGB color data now supports `-perceptual` operation. The
    current implementation is simple, weighting color channel errors by their
    contribution to perceived luminance. This mimics the behavior of the human
    visual system, which is most sensitive to green, then red, then blue.
  * **Feature:** Codec supports a new low weight search mode, which is a
    simpler weight assignment for encodings with a low number of weights in the
    weight grid. The weight threshold can be overridden using the new
    `-lowweightmodelimit` command line option.
  * **Feature:** All platform builds now support building a native binary.
    Native binaries automatically select the SIMD level based on the default
    configuration of the compiler in use. Native binaries built on one machine
    may use different SIMD options than native binaries built on another.
  * **Feature:** macOS platform builds now support building universal binaries
    containing both `x86_64` and `arm64` target support.
  * **Feature:** Building the command line tool can be disabled when using
    astcenc as a library in another project. Set `-DCLI=OFF` during the CMake
    configure step.
  * **Feature:** A standalone minimal example of the core codec API usage has
    been added in the `./Utils/Example/` directory.
* **Core API:**
  * **Feature:** Config flag `ASTCENC_FLG_USE_PERCEPTUAL` works for color data.
  * **Feature:** Config option `tune_low_weight_count_limit` added.
  * **Feature:** New heuristic added which prunes dual weight plane searches if
    they are unlikely to help. This heuristic is not user controllable.
  * **Feature:** Image quality has been improved. In general we see significant
    improvements (up to 0.2dB) for high bitrate encodings (4x4, 5x4), and a
    smaller improvement (up to 0.1dB) for lower bitrate encodings.
  * **Bug-fix:** Arm "none" SIMD builds could be invariant with other builds.
    This fix has also been back-ported to the 2.x LTS branch.

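The `-perceptual` weighting for RGB data listed under 3.1 can be sketched as a
luminance-weighted squared error. The change log does not state which weights
the codec uses, so the Rec. 709 luma coefficients below are an assumption, and
the `RgbSketch` type and function name are hypothetical.

```cpp
#include <cassert>

// Hypothetical RGB texel with float channels.
struct RgbSketch
{
    float r;
    float g;
    float b;
};

// Assumed luminance weights (Rec. 709 luma coefficients): green contributes
// most to perceived luminance, then red, then blue.
constexpr float kWeightR = 0.2126f;
constexpr float kWeightG = 0.7152f;
constexpr float kWeightB = 0.0722f;

// Squared error between two texels, with each channel's error scaled by its
// assumed contribution to perceived luminance.
inline float perceptual_error_sketch(const RgbSketch& a, const RgbSketch& b)
{
    const float dr = a.r - b.r;
    const float dg = a.g - b.g;
    const float db = a.b - b.b;
    return kWeightR * dr * dr + kWeightG * dg * dg + kWeightB * db * db;
}
```

With these weights, an error of a given size in the green channel costs more
than the same error in red, which in turn costs more than the same error in
blue, so a search guided by this metric spends its bits where they are most
visible.
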
### Performance:

Key for charts:

* Color = block size (see legend).
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).

**Relative performance vs 3.0 release:**

<!-- ---------------------------------------------------------------------- -->
## 3.0

**Status:** June 2021

The 3.0 release is the first in a series of updates to the compressor that are
making more radical changes than we felt we could make with the 2.x series.
The primary goals of the 3.x series are to keep the image quality ~static or
better compared to the 2.5 release, but continue to improve performance.

Reminder for users of the library interface - the API is not designed to be
binary compatible across versions, and this release is not compatible with
earlier releases. Please update and rebuild your client-side code using the
updated `astcenc.h` header.

* **General:**
  * **Feature:** The code has been significantly cleaned up, with improved
    comments, API documentation, function naming, and variable naming.
* **Core API:**
  * **API Change:** The core APIs for `astcenc_compress_image()` and for
    `astcenc_decompress_image()` now accept swizzle structures by `const`
    pointer, instead of pass-by-value.
  * **API Change:** Calling the `astcenc_compress_reset()` and the
    `astcenc_decompress_reset()` functions between images is no longer required
    if the context was created for use by a single thread.
  * **Feature:** New heuristics have been added for controlling when to search
    beyond 2 partitions and 1 plane, and when to search beyond 3 partitions and
    1 plane. The previous `tune_partition_early_out_limit` config option has
    been removed, and replaced with two new options
    `tune_2_partition_early_out_limit_factor` and
    `tune_3_partition_early_out_limit_factor`. See command line help for more
    detailed documentation.
  * **Feature:** New heuristics have been added for controlling when to use
    dual weight planes. The previous `tune_two_plane_early_out_limit` has been
    renamed to `tune_2_plane_early_out_limit_correlation`. See command line
    help for more detailed documentation.
  * **Feature:** Support for using dual weight planes has been restricted to
    single partition blocks; it rarely helps blocks with 2 or more partitions
    and takes considerable compression search time.

### Performance:

Key for charts:

* Color = block size (see legend).
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).

**Relative performance vs 2.5 release:**

- - -

_Copyright © 2021-2022, Arm Limited and contributors. All rights reserved._