Write your own concurrent Java BitSet with blackjack and thread safety
Recently, I needed to write a simple cache on the local disk to store telemetry data grouped by one-minute intervals. To keep track of whether data was available for a specific minute on a given date, I decided to use a BitSet (1 means data is available, 0 means it is not, and the index is the sequential minute number counted from 00:00 of the current date). Since the cache needed to support asynchronous operations, I needed a concurrent implementation of BitSet.
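To make the indexing concrete, here is a minimal sketch of that mapping (the class and method names are illustrative, not part of the cache; it uses the plain BitSet that the rest of this article replaces with a concurrent one):
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.BitSet;

class MinuteAvailabilityIndex {

    // 24 hours * 60 minutes = 1440 one-minute slots per day
    private static final int MINUTES_PER_DAY = (int) Duration.ofDays(1).toMinutes();

    private final BitSet availableMinutes = new BitSet(MINUTES_PER_DAY);

    // sequential minute number starting from 00:00 of the current date
    static int minuteIndexOf(LocalDateTime timestamp) {
        return timestamp.toLocalTime().toSecondOfDay() / 60;
    }

    void markAvailable(LocalDateTime timestamp) {
        availableMinutes.set(minuteIndexOf(timestamp));
    }

    boolean isAvailable(LocalDateTime timestamp) {
        return availableMinutes.get(minuteIndexOf(timestamp));
    }
}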
It sounds impressive, but there’s a caveat: the native Java implementation of BitSet is not thread-safe. The authors themselves caution about this in the class’s documentation: “A BitSet is not safe for multithreaded use without external synchronization.”
I tried to implement my own thread-safe BitSet and want to show what came out of it. Important: the complete implementation will be included in the new version of my moonshine project, where you’ll be able to check it out; the code used in this article can be found in the concurrrent-bitset repository.
For now, let’s consider only 4 methods of the thread-safe BitSet.
public interface ConcurrentBitSet {

    boolean get(int bitIndex);

    void set(int bitIndex);

    void clear(int bitIndex);

    void flip(int bitIndex);
}
Below, I will provide code examples of various ways to implement a thread-safe BitSet and their comparison using the following tests:
1. a thread safety correctness test, which will demonstrate whether a specific implementation is free from data races;
2. a benchmark, which will show the performance of each implementation in terms of average time per operation (the mode configured in the JMH benchmark below).
The thread safety correctness test is written using the MultithreadedTC framework, while the benchmark is conducted using the Java Microbenchmark Harness (JMH). Important: All tests and benchmarks were conducted on JVM 21. When using a different version of the JVM, the results may vary. In JVM 8 and 11, for instance, the difference between approaches 1–4 is not as significant as in versions 14–21.
To begin, let’s run the thread safety correctness test against the native Java implementation of BitSet and verify both that it’s indeed not thread-safe and that the test can detect this.
public class ConcurrentBitSetMultithreadingTest extends MultithreadedTest {

    private BitSet nativeBitSet;

    @Override
    public void initialize() {
        this.nativeBitSet = new BitSet(10);
    }

    public void thread1() throws InterruptedException {
        this.nativeBitSet.set(1);
    }

    public void thread2() throws InterruptedException {
        this.nativeBitSet.set(2);
    }

    @Override
    public void finish() {
        assertTrue(nativeBitSet.get(1) && nativeBitSet.get(2));
    }

    @Test
    public void set_get_test() throws Throwable {
        TestFramework.runManyTimes(new ConcurrentBitSetMultithreadingTest(), 1000);
    }
}
In this test, two threads simultaneously execute the thread1 and thread2 methods, which set bits 1 and 2, respectively. If there are no data races, both threads should be able to set their designated bits and see the result. *In the Java BitSet implementation, bits 1 and 2 belong to the same array element.* The finish method checks the values of bits 1 and 2: if both bits were set, assertTrue won’t throw an exception; otherwise, the test fails.
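To see why the lost update can happen, recall that BitSet keeps its bits in a long[] and that set() is a read-modify-write of a whole word. Below is a simplified, single-threaded model of the problematic interleaving (my own sketch, not the actual JDK source):
public class LostUpdateSketch {

    public static void main(String[] args) {
        // the word index of a bit is bitIndex >> 6, so bits 1 and 2 share words[0]
        System.out.println(1 >> 6); // 0
        System.out.println(2 >> 6); // 0

        // set(bitIndex) is essentially a non-atomic read-modify-write of that word
        long[] words = new long[1];
        long seenByThread1 = words[0];          // thread1 reads 0
        long seenByThread2 = words[0];          // thread2 also reads 0
        words[0] = seenByThread1 | (1L << 1);   // thread1 stores bit 1
        words[0] = seenByThread2 | (1L << 2);   // thread2 overwrites the word, bit 1 is lost
        System.out.println(Long.toBinaryString(words[0])); // prints 100: only bit 2 survived
    }
}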
Such a test on the native Java implementation of BitSet is expected to fail, indicating that the test has identified thread safety issues.
-------------------------------------------------------
T E S T S
-------------------------------------------------------
Running io.github.ololx.samples.concurrent.bitset.ConcurrentBitsetTest
Tests run: 16, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.056 sec - in io.github.ololx.samples.concurrent.bitset.ConcurrentBitsetTest
Running io.github.ololx.samples.concurrent.bitset.ConcurrentBitSetMultithreadingTest
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.019 sec <<< FAILURE! - in io.github.ololx.samples.concurrent.bitset.ConcurrentBitSetMultithreadingTest
set_get_test(io.github.ololx.samples.concurrent.bitset.ConcurrentBitSetMultithreadingTest) Time elapsed: 0.018 sec <<< FAILURE!
junit.framework.AssertionFailedError: null
at junit.framework.Assert.fail(Assert.java:55)
at junit.framework.Assert.assertTrue(Assert.java:22)
at junit.framework.Assert.assertTrue(Assert.java:31)
at io.github.ololx.samples.concurrent.bitset.ConcurrentBitSetMultithreadingTest.finish(ConcurrentBitSetMultithreadingTest.java:28)
Results :
Failed tests:
ConcurrentBitSetMultithreadingTest.set_get_test:33->finish:28 null
Tests run: 17, Failures: 1, Errors: 0, Skipped: 0
1. Concurrent BitSet using synchronization
The first and most obvious approach is to synchronize every access to the internal data structure (the array of bits).
Effectively, it’s enough to create a wrapper around the native BitSet instance in which every invocation of that instance’s methods is synchronized.
public class ConcurrentBitSetWithFullSynchronization implements ConcurrentBitSet {

    private final BitSet bitSet;

    private final Object monitor;

    public ConcurrentBitSetWithFullSynchronization(int size) {
        this.bitSet = new BitSet(size);
        this.monitor = new Object();
    }

    @Override
    public boolean get(int bitIndex) {
        synchronized (this.monitor) {
            return this.bitSet.get(bitIndex);
        }
    }

    @Override
    public void set(int bitIndex) {
        synchronized (this.monitor) {
            this.bitSet.set(bitIndex);
        }
    }

    @Override
    public void clear(int bitIndex) {
        synchronized (this.monitor) {
            this.bitSet.clear(bitIndex);
        }
    }

    @Override
    public void flip(int bitIndex) {
        synchronized (this.monitor) {
            this.bitSet.flip(bitIndex);
        }
    }
}
Such an implementation of BitSet is thread-safe and passes the thread safety correctness test. It’s important to note that now only one thread can be inside any method at a time, even when different parts of our BitSet are being accessed, so under heavy multi-threaded usage high performance should not be expected. I won’t show benchmark results for this implementation yet, since there’s nothing to compare them against at the moment.
2. Concurrent BitSet using read-write lock
The main drawback of the first implementation is that every thread accessing the BitSet methods is synchronized, which puts the threads into a sequential queue and forces them to wait for one another. It might be worth not blocking threads that read the BitSet unless some thread is trying to modify it. To implement such an approach, a read-write lock is a suitable choice.
A read-write lock distinguishes between read operations and write operations:
1. Read Lock: Multiple threads can hold a read lock simultaneously as long as no threads hold a write lock. This allows for concurrent read operations without blocking each other, which is useful when the data is being read more frequently than being written.
2. Write Lock: Only one thread can hold a write lock at a time, and during this time, no other thread can hold either a read or write lock. This ensures that write operations are exclusive and don’t interfere with each other or ongoing read operations.
The java.util.concurrent package in Java provides the ReentrantReadWriteLock class, which implements a read-write lock. This kind of lock can help achieve better concurrency in read-heavy scenarios, where reads greatly outnumber writes.
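A quick illustration of these rules with ReentrantReadWriteLock (a single-threaded sketch using tryLock, just to demonstrate which acquisitions succeed; it is not part of the BitSet implementation):
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadWriteLockRulesSketch {

    public static void main(String[] args) {
        var lock = new ReentrantReadWriteLock();

        lock.readLock().lock();
        // several read locks can be held at the same time
        System.out.println("second read lock: " + lock.readLock().tryLock());          // true
        // the write lock is unavailable while read locks are held
        System.out.println("write lock while reading: " + lock.writeLock().tryLock()); // false

        lock.readLock().unlock();
        lock.readLock().unlock();

        // once all read locks are released, the write lock becomes available
        System.out.println("write lock after reading: " + lock.writeLock().tryLock()); // true
        lock.writeLock().unlock();
    }
}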
To implement a thread-safe BitSet using Java’s ReentrantReadWriteLock, it’s enough to create a wrapper around the native BitSet instance. In this wrapper, we also need to initialize an instance of ReadWriteLock, and every call to the BitSet methods is surrounded by the appropriate lock:
- A write-lock for methods that modify the BitSet’s value.
- A read-lock for methods that read the BitSet’s value.
public class ConcurrentBitSetWithGeneralRWLock implements ConcurrentBitSet {

    private final BitSet bitSet;

    private final ReadWriteLock readWriteLock;

    public ConcurrentBitSetWithGeneralRWLock(int size) {
        this.bitSet = new BitSet(size);
        this.readWriteLock = new ReentrantReadWriteLock();
    }

    @Override
    public boolean get(int bitIndex) {
        return this.lockAndGet(bitIndex, () -> this.bitSet.get(bitIndex));
    }

    @Override
    public void set(int bitIndex) {
        this.lockAndSet(bitIndex, this.bitSet::set);
    }

    @Override
    public void clear(int bitIndex) {
        this.lockAndSet(bitIndex, this.bitSet::clear);
    }

    @Override
    public void flip(int bitIndex) {
        this.lockAndSet(bitIndex, this.bitSet::flip);
    }

    private Boolean lockAndGet(int bitIndex, Supplier<Boolean> getBitSupplier) {
        this.readWriteLock.readLock().lock();
        try {
            return getBitSupplier.get();
        } finally {
            this.readWriteLock.readLock().unlock();
        }
    }

    private void lockAndSet(int bitIndex, Consumer<Integer> setBitConsumer) {
        this.readWriteLock.writeLock().lock();
        try {
            setBitConsumer.accept(bitIndex);
        } finally {
            this.readWriteLock.writeLock().unlock();
        }
    }
}
Such an implementation, in theory, should be more performant than the previous one, as not all threads would be synchronized at all times. Let’s test this assumption in practice using a benchmark.
@State(Scope.Thread)
@Warmup(
    iterations = 5,
    time = 100,
    timeUnit = TimeUnit.MILLISECONDS
)
@Measurement(
    iterations = 5,
    time = 100,
    timeUnit = TimeUnit.MILLISECONDS
)
@Fork(3)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public class ConcurrentBitSetBenchmark {

    private static final int AVAILABLE_CPU = Runtime.getRuntime().availableProcessors();

    private static final String FULL_SYNCHRONIZATION = "javaNativeWithSynchronizationByThis";

    private static final String ONE_READ_WRITE_LOCK = "javaNativeWithOneReadWriteLock";

    private ConcurrentBitSet concurrentBitSet;

    @Param(
        {
            FULL_SYNCHRONIZATION,
            ONE_READ_WRITE_LOCK
        }
    )
    private String typeOfBitSetRealization;

    private ExecutorService executor;

    @Param({"10", "100", "1000"})
    private int sizeOfBitSet;

    @Param({"1", "10"})
    private int countOfSetters;

    @Param({"1", "10"})
    private int countOfGetters;

    public ConcurrentBitSetBenchmark() {}

    public static void main(String[] args) throws RunnerException {
        var options = new OptionsBuilder()
            .include(ConcurrentBitSetBenchmark.class.getSimpleName())
            .build();
        new Runner(options).run();
    }

    @Setup
    public void setup() {
        switch (typeOfBitSetRealization) {
            case FULL_SYNCHRONIZATION -> concurrentBitSet = new ConcurrentBitSetWithFullSynchronization(sizeOfBitSet);
            case ONE_READ_WRITE_LOCK -> concurrentBitSet = new ConcurrentBitSetWithGeneralRWLock(sizeOfBitSet);
        }
        executor = Executors.newWorkStealingPool(AVAILABLE_CPU);
    }

    @TearDown
    public void tearDown() {
        executor.shutdown();
    }

    @Benchmark
    public void get_set_benchmark(Blackhole blackhole) {
        // one future per reading thread and one per writing thread
        var tasksCount = countOfGetters + countOfSetters;
        var bitsetInvocations = new ArrayList<CompletableFuture<Void>>(tasksCount);
        for (int getBitOpNumber = 0; getBitOpNumber < countOfGetters; getBitOpNumber++) {
            bitsetInvocations.add(CompletableFuture.runAsync(() -> {
                for (int bitNumber = 0; bitNumber < sizeOfBitSet; bitNumber++) {
                    blackhole.consume(concurrentBitSet.get(bitNumber));
                }
            }, executor));
        }
        for (int setBitOpNumber = 0; setBitOpNumber < countOfSetters; setBitOpNumber++) {
            bitsetInvocations.add(CompletableFuture.runAsync(() -> {
                for (int bitNumber = 0; bitNumber < sizeOfBitSet; bitNumber++) {
                    concurrentBitSet.set(bitNumber);
                }
            }, executor));
        }
        CompletableFuture.allOf(bitsetInvocations.toArray(CompletableFuture[]::new)).join();
    }
}
This test is executed with different combinations of parameters (a sketch for running only a subset of them follows the list):
- typeOfBitSetRealization — the type of thread-safe BitSet implementation:
- javaNativeWithSynchronizationByThis — concurrent BitSet using synchronization;
- javaNativeWithOneReadWriteLock — concurrent BitSet using read-write lock;
- countOfGetters — the number of accesses for reading a specific bit from the BitSet (number of threads for reading);
- countOfSetters — the number of accesses for writing a specific bit to the BitSet (number of threads for writing);
- sizeOfBitSet — the number of bits in the BitSet, also determines the number of bits read/written by each thread.
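As mentioned above, a subset of these combinations can be selected without editing the benchmark, by fixing parameter values in the OptionsBuilder (a sketch; the parameter names and values are the ones declared in the benchmark class, while the runner class name is illustrative):
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class ConcurrentBitSetBenchmarkRunner {

    public static void main(String[] args) throws RunnerException {
        var options = new OptionsBuilder()
            .include(ConcurrentBitSetBenchmark.class.getSimpleName())
            // only the read-write lock variant, on a 1000-bit set
            .param("typeOfBitSetRealization", "javaNativeWithOneReadWriteLock")
            .param("sizeOfBitSet", "1000")
            // 10 reading threads versus 1 writing thread
            .param("countOfGetters", "10")
            .param("countOfSetters", "1")
            .build();
        new Runner(options).run();
    }
}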
As a result, we can observe that the implementation with a read-write lock performs better than the synchronized approach under high contention, and the advantage grows as read accesses become more frequent relative to writes.
3. Concurrent BitSet with fine-grained locking
In the previous enhancement, I attempted to block a thread’s read access to the BitSet only while another thread was writing a value to it. The next enhancement is based on the following idea: instead of locking access to the entire BitSet, it’s better to synchronize access only when threads touch the same element of the underlying array of bits. This approach is known as fine-grained locking. It’s worth noting that in a BitSet all bits are stored in an array (in the native Java BitSet, this is a long[]), where each array element represents a machine word.
Two threads writing to adjacent array elements shouldn’t interfere with each other, because word tearing is forbidden in the JVM. For the ‘words’ array we need a corresponding array of locks. The size of the lock array should be less than or equal to the internal size of the ‘words’ array; this is important so that a single word is never guarded by two different locks. The size of the lock array could be made a constructor parameter and tuned more finely (a striped variant is sketched after the listing below), but for simplicity we’ll create a lock array with the same dimension as the ‘words’ array.
public class ConcurrentBitSetWithSegmentsRWLocks implements ConcurrentBitSet {

    private static final int ADDRESS_BITS_PER_WORD = 6;

    private final BitSet bitSet;

    private final ReadWriteLock[] readWriteLocks;

    public ConcurrentBitSetWithSegmentsRWLocks(int size) {
        this.bitSet = new BitSet(size);
        this.readWriteLocks = new ReadWriteLock[wordIndex(size - 1) + 1];
        IntStream.range(0, this.readWriteLocks.length)
            .forEach(index -> this.readWriteLocks[index] = new ReentrantReadWriteLock());
    }

    @Override
    public boolean get(int bitIndex) {
        return this.lockAndGet(bitIndex, () -> this.bitSet.get(bitIndex));
    }

    @Override
    public void set(int bitIndex) {
        this.lockAndSet(bitIndex, this.bitSet::set);
    }

    @Override
    public void clear(int bitIndex) {
        this.lockAndSet(bitIndex, this.bitSet::clear);
    }

    @Override
    public void flip(int bitIndex) {
        this.lockAndSet(bitIndex, this.bitSet::flip);
    }

    private Boolean lockAndGet(int bitIndex, Supplier<Boolean> getBitSupplier) {
        this.readWriteLocks[wordIndex(bitIndex)].readLock().lock();
        try {
            return getBitSupplier.get();
        } finally {
            this.readWriteLocks[wordIndex(bitIndex)].readLock().unlock();
        }
    }

    private void lockAndSet(int bitIndex, Consumer<Integer> setBitConsumer) {
        this.readWriteLocks[wordIndex(bitIndex)].writeLock().lock();
        try {
            setBitConsumer.accept(bitIndex);
        } finally {
            this.readWriteLocks[wordIndex(bitIndex)].writeLock().unlock();
        }
    }

    // bitIndex / 64: index of the 64-bit word that stores this bit
    private int wordIndex(int bitIndex) {
        return bitIndex >> ADDRESS_BITS_PER_WORD;
    }
}
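As mentioned before the listing, the lock array can also be smaller than the ‘words’ array; the only requirement is that every word maps to exactly one lock. A minimal sketch of such lock striping (the stripe count is an illustrative constructor parameter, not something the implementation above exposes):
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class StripedLocksSketch {

    private static final int ADDRESS_BITS_PER_WORD = 6;

    private final ReadWriteLock[] stripes;

    public StripedLocksSketch(int stripeCount) {
        this.stripes = new ReadWriteLock[stripeCount];
        for (int i = 0; i < stripeCount; i++) {
            this.stripes[i] = new ReentrantReadWriteLock();
        }
    }

    // stable word -> stripe mapping: different words may share a lock,
    // but a single word is never guarded by two different locks
    ReadWriteLock lockFor(int bitIndex) {
        int wordIndex = bitIndex >> ADDRESS_BITS_PER_WORD;
        return stripes[wordIndex % stripes.length];
    }
}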
Such an implementation passes the thread safety correctness test and, in theory, should be more performant than the previous implementations, which the benchmark results confirm. In the new benchmark, another value of the typeOfBitSetRealization parameter is added:
- javaNativeWithSynchronizationByThis — concurrent BitSet using synchronization;
- javaNativeWithOneReadWriteLock — concurrent BitSet using read-write lock;
- javaNativeWithManyReadWriteLocksBySegments — concurrent BitSet using fine-grained locking.
As a result, you can observe that fine-grained locking is more performant than the two previous approaches, and the performance advantage becomes more noticeable as the size of the BitSet increases.
4. Concurrent BitSet using a lock-free approach
The previous implementation turned out to be even more performant, but it still queues up threads, forcing them to wait for another thread to finish.
Additionally, the overhead of synchronization and context switching grows as the number of waiting threads grows. Therefore, the emphasis in this new implementation is on eliminating locks entirely.
The requirements for the new implementation can be formulated as follows:
1. Any two threads working with different words should not be synchronized or queued (word tearing is forbidden in the JVM).
2. If multiple threads access the same word for read/write, there should be no reordering of operations (happens-before guarantees between threads should be provided).
3. Multiple readers for the same word index should not be blocked unless there is a writer among them.
Looking at these requirements, (1) is already guaranteed by the JVM and was relied on in the previous implementation, while (2) and (3) require volatile semantics for access to our ‘words’ array. Unfortunately, we can’t simply declare the ‘words’ array as volatile and be done with it: even if an array is declared volatile, that doesn’t provide volatile semantics for reads or writes of its elements. The volatile modifier applies only to the array reference itself, so concurrent access to the k-th element still requires external synchronization. However, we can implement our own array with atomic features (similar to the atomic data types), which fulfills both (2) and (3): it gives us volatile read operations and non-blocking updates of elements through CAS operations.
class AtomicWordArray {

    private static final VarHandle WORDS_VAR_HANDLER = MethodHandles.arrayElementVarHandle(byte[].class);

    private final byte[] words;

    private AtomicWordArray(int size) {
        this.words = new byte[size];
    }

    private void setWordVolatile(int wordIndex, Function<Byte, Byte> binaryOperation) {
        byte currentWordValue;
        byte newWordValue;
        do {
            // re-read the current word, apply the update and try to CAS it in;
            // retry if another thread changed the word in the meantime
            currentWordValue = this.getWordVolatile(wordIndex);
            newWordValue = binaryOperation.apply(currentWordValue);
        } while (!WORDS_VAR_HANDLER.compareAndSet(this.words, wordIndex, currentWordValue, newWordValue));
    }

    private byte getWordVolatile(int wordIndex) {
        return (byte) WORDS_VAR_HANDLER.getVolatile(this.words, wordIndex);
    }
}
I intentionally use a byte[] as the 'words' array, as this allows us to use 1 byte when creating an 8-bit BitSet instead of 64 bits, and simplifies the logic of working with bits. However, nothing prevents us from implementing a similar Atomic array over long[].
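For comparison, here is what a similar atomic array over long[] might look like (my own sketch mirroring the byte[] version; only the VarHandle element type and the word width change). Note that java.util.concurrent.atomic.AtomicLongArray already offers equivalent volatile and CAS access to long elements out of the box.
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.util.function.LongUnaryOperator;

class AtomicLongWordArray {

    private static final VarHandle WORDS_VAR_HANDLE = MethodHandles.arrayElementVarHandle(long[].class);

    private final long[] words;

    AtomicLongWordArray(int size) {
        this.words = new long[size];
    }

    // same CAS retry loop as in the byte[] version, but over 64-bit words
    void setWordVolatile(int wordIndex, LongUnaryOperator operation) {
        long currentWordValue;
        long newWordValue;
        do {
            currentWordValue = this.getWordVolatile(wordIndex);
            newWordValue = operation.applyAsLong(currentWordValue);
        } while (!WORDS_VAR_HANDLE.compareAndSet(this.words, wordIndex, currentWordValue, newWordValue));
    }

    long getWordVolatile(int wordIndex) {
        return (long) WORDS_VAR_HANDLE.getVolatile(this.words, wordIndex);
    }
}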
Now it's enough to use the AtomicWordArray inside the lock-free BitSet and implement the ConcurrentBitSet methods on top of it.
public class NonBlockingConcurrentBitSet implements ConcurrentBitSet {

    private static final int WORD_SIZE = Byte.SIZE;

    private final AtomicWordArray data;

    public NonBlockingConcurrentBitSet(int size) {
        this.data = new AtomicWordArray((size + WORD_SIZE - 1) / WORD_SIZE);
    }

    @Override
    public boolean get(int bitIndex) {
        int wordIndex = bitIndex / WORD_SIZE;
        int bitOffset = bitIndex % WORD_SIZE;
        int bitMask = 1 << bitOffset;
        return (this.data.getWordVolatile(wordIndex) & bitMask) >> bitOffset != 0;
    }

    @Override
    public void set(int bitIndex) {
        int wordIndex = bitIndex / WORD_SIZE;
        int bitOffset = bitIndex % WORD_SIZE;
        int bitMask = 1 << bitOffset;
        this.data.setWordVolatile(wordIndex, (word) -> (byte) (word | bitMask));
    }

    @Override
    public void clear(int bitIndex) {
        int wordIndex = bitIndex / WORD_SIZE;
        int bitOffset = bitIndex % WORD_SIZE;
        int bitMask = 1 << bitOffset;
        this.data.setWordVolatile(wordIndex, (word) -> (byte) (word & ~bitMask));
    }

    @Override
    public void flip(int bitIndex) {
        int wordIndex = bitIndex / WORD_SIZE;
        int bitOffset = bitIndex % WORD_SIZE;
        int bitMask = 1 << bitOffset;
        this.data.setWordVolatile(wordIndex, (word) -> (byte) (word ^ bitMask));
    }

    private static class AtomicWordArray {

        private static final VarHandle WORDS_VAR_HANDLER = MethodHandles.arrayElementVarHandle(byte[].class);

        private final byte[] words;

        private AtomicWordArray(int size) {
            this.words = new byte[size];
        }

        private void setWordVolatile(int wordIndex, Function<Byte, Byte> binaryOperation) {
            byte currentWordValue;
            byte newWordValue;
            do {
                // re-read the current word, apply the update and try to CAS it in;
                // retry if another thread changed the word in the meantime
                currentWordValue = this.getWordVolatile(wordIndex);
                newWordValue = binaryOperation.apply(currentWordValue);
            } while (!WORDS_VAR_HANDLER.compareAndSet(this.words, wordIndex, currentWordValue, newWordValue));
        }

        private byte getWordVolatile(int wordIndex) {
            return (byte) WORDS_VAR_HANDLER.getVolatile(this.words, wordIndex);
        }
    }
}
This implementation passes the thread safety correctness test. In the new benchmark, one more value of the typeOfBitSetRealization parameter is added:
- javaNativeWithSynchronizationByThis — concurrent BitSet using synchronization;
- javaNativeWithOneReadWriteLock — concurrent BitSet using read-write lock;
- javaNativeWithManyReadWriteLocksBySegments — concurrent BitSet using fine-grained locking;
- NonBlockingConcurrentBitset — concurrent BitSet using a lock-free approach.
As a result, it’s evident that this implementation has outperformed all the previous ones, while avoiding the trade-offs present in the 2nd and 3rd implementations. It’s also worth noting that with low or no contention (1 setter, 1 getter, and 64 bits), all implementations exhibit similar performance.
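To close the loop with the use case from the introduction, here is a small usage sketch of the lock-free implementation (the demo class and the executor setup are illustrative): several writers mark minutes of the day as available concurrently, and no updates are lost.
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.stream.IntStream;

public class NonBlockingConcurrentBitSetDemo {

    public static void main(String[] args) {
        // 1440 one-minute slots, as in the telemetry cache from the introduction
        var minutesPerDay = 1440;
        var bitSet = new NonBlockingConcurrentBitSet(minutesPerDay);
        var executor = Executors.newFixedThreadPool(4);
        try {
            // four writers, each marking its own quarter of the day as available
            var writers = IntStream.range(0, 4)
                .mapToObj(quarter -> CompletableFuture.runAsync(() -> {
                    for (int minute = quarter * 360; minute < (quarter + 1) * 360; minute++) {
                        bitSet.set(minute);
                    }
                }, executor))
                .toArray(CompletableFuture[]::new);
            CompletableFuture.allOf(writers).join();

            // every minute of the day should now be marked
            var allMarked = IntStream.range(0, minutesPerDay).allMatch(bitSet::get);
            System.out.println("all minutes marked: " + allMarked);
        } finally {
            executor.shutdown();
        }
    }
}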