Application Scenarios for the High-Throughput Framework DISRUPTOR
Years ago, on the ifeve concurrency site (http://ifeve.com/disruptor), I came across DISRUPTOR, a concurrency framework I regarded as black magic at the time, and I wondered why NETTY had never integrated it. Later I noticed log4j2 and jstorm gradually adopting it, yet I never had a real chance to use it or study its internals; most of the time Doug Lea's JDK concurrency package was good enough. Recently, for business reasons, I wrapped a thin vertx-like layer around NETTY called luoying-server (https://github.com/zealzeng/luoying-server). Business logic is not suited to running inside the event loop, so I simply used a bounded ThreadPoolExecutor as the worker pool, and I considered integrating disruptor instead. After two days of reading I realized I had been misunderstanding disruptor's use cases.
If you need a proper introduction, see the ifeve articles or the official repository: https://github.com/LMAX-Exchange/disruptor
1. Official performance test cases
1.1 JDK BlockingQueue throughput test
Let's start with the one-producer, one-consumer case: https://github.com/zealzeng/fabric-samples/blob/master/disruptor-demo/src/main/java/com/lmax/disruptor/queue/OneToOneQueueThroughputTest.java
package com.lmax.disruptor.queue;

import com.lmax.disruptor.AbstractPerfTestQueue;
import com.lmax.disruptor.support.ValueAdditionQueueProcessor;
import com.lmax.disruptor.util.DaemonThreadFactory;

import java.util.concurrent.*;

import static com.lmax.disruptor.support.PerfTestUtil.failIf;

/**
 * <pre>
 * UniCast a series of items between 1 publisher and 1 event processor.
 *
 * +----+    +-----+
 * | P1 |--->| EP1 |
 * +----+    +-----+
 *
 * Queue Based:
 * ============
 *
 *        put      take
 * +----+    +====+    +-----+
 * | P1 |--->| Q1 |<---| EP1 |
 * +----+    +====+    +-----+
 *
 * P1  - Publisher 1
 * Q1  - Queue 1
 * EP1 - EventProcessor 1
 *
 * </pre>
 */
public final class OneToOneQueueThroughputTest extends AbstractPerfTestQueue
{
    private static final int BUFFER_SIZE = 1024 * 64;
    private static final long ITERATIONS = 1000L * 1000L * 10L;
    private final ExecutorService executor = Executors.newSingleThreadExecutor(DaemonThreadFactory.INSTANCE);
    private final long expectedResult = ITERATIONS * 3L;

    ///////////////////////////////////////////////////////////////////////////////////////////////

    private final BlockingQueue<Long> blockingQueue = new LinkedBlockingQueue<Long>(BUFFER_SIZE);
    private final ValueAdditionQueueProcessor queueProcessor =
        new ValueAdditionQueueProcessor(blockingQueue, ITERATIONS - 1);

    ///////////////////////////////////////////////////////////////////////////////////////////////

    @Override
    protected int getRequiredProcessorCount()
    {
        return 2;
    }

    @Override
    protected long runQueuePass() throws InterruptedException
    {
        final CountDownLatch latch = new CountDownLatch(1);
        queueProcessor.reset(latch);
        Future<?> future = executor.submit(queueProcessor);
        long start = System.currentTimeMillis();

        for (long i = 0; i < ITERATIONS; i++)
        {
            blockingQueue.put(3L);
        }

        latch.await();
        long opsPerSecond = (ITERATIONS * 1000L) / (System.currentTimeMillis() - start);
        queueProcessor.halt();
        future.cancel(true);

        failIf(expectedResult, 0);

        return opsPerSecond;
    }

    public static void main(String[] args) throws Exception
    {
        OneToOneQueueThroughputTest test = new OneToOneQueueThroughputTest();
        test.testImplementations();
    }
}
The main thread puts elements onto the queue, while a single consumer thread runs ValueAdditionQueueProcessor, a Runnable whose logic simply takes elements off the queue and accumulates them; the per-element cost is tiny, essentially an empty loop. Even on an old 4th-gen i5 the throughput is quite high, in the millions of ops per second. In a real-world scenario, though, ValueAdditionQueueProcessor would spend tens to hundreds of milliseconds on business logic, so single-threaded consumption won't cut it; the conventional approach is a ThreadPoolExecutor that essentially assigns a thread to each dequeued task.
Starting Queue tests
Run 0, BlockingQueue=4,539,264 ops/sec
Run 1, BlockingQueue=5,414,185 ops/sec
Run 2, BlockingQueue=4,657,661 ops/sec
Run 3, BlockingQueue=5,288,207 ops/sec
Run 4, BlockingQueue=5,339,028 ops/sec
Run 5, BlockingQueue=5,246,589 ops/sec
Run 6, BlockingQueue=5,197,505 ops/sec
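For contrast, here is a minimal sketch of that conventional approach: a bounded ThreadPoolExecutor whose internal BlockingQueue buffers the tasks and whose pool threads run them. The pool sizes, queue capacity, and the simulated 50 ms of business work are assumptions for illustration only, not values taken from the official test.

import java.util.concurrent.*;

public class ConventionalWorkerPoolSketch
{
    public static void main(String[] args) throws Exception
    {
        // Bounded queue + bounded pool: the usual way to keep slow business
        // logic off the producer thread (e.g. a NETTY event loop).
        ExecutorService workerPool = new ThreadPoolExecutor(
            4, 8, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>(1024));

        for (int i = 0; i < 100; i++)
        {
            final int taskId = i;
            workerPool.submit(() ->
            {
                try
                {
                    // Simulated business work (assumed ~50 ms; real handlers
                    // take tens to hundreds of milliseconds as noted above).
                    Thread.sleep(50);
                }
                catch (InterruptedException e)
                {
                    Thread.currentThread().interrupt();
                }
                System.out.println("task " + taskId + " done on " + Thread.currentThread().getName());
            });
        }

        workerPool.shutdown();
        workerPool.awaitTermination(1, TimeUnit.MINUTES);
    }
}

With 100 tasks at roughly 50 ms each on a handful of threads, throughput is bounded by the handler, not by the queue, which is the point the following sections keep returning to.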
1.2 Disruptor with one producer and one consumer
package com.lmax.disruptor.sequenced;

import static com.lmax.disruptor.RingBuffer.createSingleProducer;
import static com.lmax.disruptor.support.PerfTestUtil.failIfNot;

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import com.lmax.disruptor.*;
import com.lmax.disruptor.support.PerfTestUtil;
import com.lmax.disruptor.support.ValueAdditionEventHandler;
import com.lmax.disruptor.support.ValueEvent;
import com.lmax.disruptor.util.DaemonThreadFactory;

/**
 * <pre>
 * UniCast a series of items between 1 publisher and 1 event processor.
 *
 * +----+    +-----+
 * | P1 |--->| EP1 |
 * +----+    +-----+
 *
 * Disruptor:
 * ==========
 *              track to prevent wrap
 *              +------------------+
 *              |                  |
 *              |                  v
 * +----+    +====+    +====+   +-----+
 * | P1 |--->| RB |<---| SB |   | EP1 |
 * +----+    +====+    +====+   +-----+
 *      claim      get    ^        |
 *                        |        |
 *                        +--------+
 *                          waitFor
 *
 * P1  - Publisher 1
 * RB  - RingBuffer
 * SB  - SequenceBarrier
 * EP1 - EventProcessor 1
 *
 * </pre>
 */
public final class OneToOneSequencedThroughputTest extends AbstractPerfTestDisruptor
{
    private static final int BUFFER_SIZE = 1024 * 64;
    private static final long ITERATIONS = 1000L * 1000L * 100L;
    private final ExecutorService executor = Executors.newSingleThreadExecutor(DaemonThreadFactory.INSTANCE);
    private final long expectedResult = PerfTestUtil.accumulatedAddition(ITERATIONS);

    ///////////////////////////////////////////////////////////////////////////////////////////////

    private final RingBuffer<ValueEvent> ringBuffer =
        createSingleProducer(ValueEvent.EVENT_FACTORY, BUFFER_SIZE, new YieldingWaitStrategy());
    private final SequenceBarrier sequenceBarrier = ringBuffer.newBarrier();
    private final ValueAdditionEventHandler handler = new ValueAdditionEventHandler();
    private final BatchEventProcessor<ValueEvent> batchEventProcessor =
        new BatchEventProcessor<ValueEvent>(ringBuffer, sequenceBarrier, handler);

    {
        ringBuffer.addGatingSequences(batchEventProcessor.getSequence());
    }

    ///////////////////////////////////////////////////////////////////////////////////////////////

    @Override
    protected int getRequiredProcessorCount()
    {
        return 2;
    }

    @Override
    protected PerfTestContext runDisruptorPass() throws InterruptedException
    {
        PerfTestContext perfTestContext = new PerfTestContext();
        final CountDownLatch latch = new CountDownLatch(1);
        long expectedCount = batchEventProcessor.getSequence().get() + ITERATIONS;
        handler.reset(latch, expectedCount);
        executor.submit(batchEventProcessor);
        long start = System.currentTimeMillis();

        final RingBuffer<ValueEvent> rb = ringBuffer;

        for (long i = 0; i < ITERATIONS; i++)
        {
            long next = rb.next();
            rb.get(next).setValue(i);
            rb.publish(next);
        }

        latch.await();
        perfTestContext.setDisruptorOps((ITERATIONS * 1000L) / (System.currentTimeMillis() - start));
        perfTestContext.setBatchData(handler.getBatchesProcessed(), ITERATIONS);
        waitForEventProcessorSequence(expectedCount);
        batchEventProcessor.halt();

        failIfNot(expectedResult, handler.getValue());

        return perfTestContext;
    }

    private void waitForEventProcessorSequence(long expectedCount) throws InterruptedException
    {
        while (batchEventProcessor.getSequence().get() != expectedCount)
        {
            Thread.sleep(1);
        }
    }

    public static void main(String[] args) throws Exception
    {
        OneToOneSequencedThroughputTest test = new OneToOneSequencedThroughputTest();
        test.testImplementations();
    }
}
This test case doesn't use the fully wrapped Disruptor class; it drives RingBuffer and BatchEventProcessor directly. With the same processing logic, throughput is in the tens of millions of ops per second. But the earlier caveat still applies: if ValueAdditionEventHandler took tens to hundreds of milliseconds per event, no lock-free ring buffer, however efficient, would help. So next let's look at disruptor's two consumption modes.
Starting Disruptor tests
Run 0, Disruptor=32,701,111 ops/sec BatchPercent=95.16% AverageBatchSize=20
Run 1, Disruptor=36,805,299 ops/sec BatchPercent=62.61% AverageBatchSize=2
Run 2, Disruptor=69,348,127 ops/sec BatchPercent=86.93% AverageBatchSize=7
Run 3, Disruptor=69,396,252 ops/sec BatchPercent=87.21% AverageBatchSize=7
Run 4, Disruptor=67,430,883 ops/sec BatchPercent=86.10% AverageBatchSize=7
Run 5, Disruptor=69,108,500 ops/sec BatchPercent=86.49% AverageBatchSize=7
Run 6, Disruptor=66,979,236 ops/sec BatchPercent=86.42% AverageBatchSize=7
2. Disruptor event consumption modes
2.1 Multi-cast (broadcast) events
Most of the official getting-started examples use this mode, i.e. Disruptor.handleEventsWith(EventHandler... handlers). Under the hood each EventHandler is wrapped in its own BatchEventProcessor, and each EventProcessor runs on its own thread, pulling events off the ring buffer and invoking its handler on that same thread. Each additional call to Disruptor.handleEventsWith() adds more BatchEventProcessor consumer threads. But note that this mode is a broadcast: every BatchEventProcessor receives every published Event.
// Construct the Disruptor (the factory creates the pre-allocated events;
// bufferSize must be a power of 2)
Disruptor<LongEvent> disruptor = new Disruptor<>(factory, bufferSize, DaemonThreadFactory.INSTANCE);
// Connect the handler; each handler gets its own BatchEventProcessor thread
disruptor.handleEventsWith(new LongEventHandler());
This is not multiple EventProcessor consumers competing for one Event; it is a broadcast. If the EventHandler is slow, the mode is of little use by itself, and you would end up spinning off a thread pool for asynchronous processing anyway.
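Here is a minimal sketch of the broadcast behavior. The LongEvent type is the one from the official getting-started guide reduced to a single long field; the handler names and counts are illustrative assumptions. Two handleEventsWith calls yield two consumer threads, and both print every published value.

import com.lmax.disruptor.EventHandler;
import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.util.DaemonThreadFactory;

public class MultiCastSketch
{
    // Minimal event type, as in the official getting-started guide
    static class LongEvent
    {
        long value;
    }

    public static void main(String[] args) throws Exception
    {
        Disruptor<LongEvent> disruptor =
            new Disruptor<>(LongEvent::new, 1024, DaemonThreadFactory.INSTANCE);

        // Two separate handleEventsWith calls -> two BatchEventProcessors,
        // each on its own thread, and BOTH receive every published event.
        EventHandler<LongEvent> handlerA =
            (event, sequence, endOfBatch) -> System.out.println("A saw " + event.value);
        EventHandler<LongEvent> handlerB =
            (event, sequence, endOfBatch) -> System.out.println("B saw " + event.value);
        disruptor.handleEventsWith(handlerA);
        disruptor.handleEventsWith(handlerB);

        disruptor.start();
        for (long i = 0; i < 3; i++)
        {
            final long v = i;
            disruptor.getRingBuffer().publishEvent((event, seq) -> event.value = v);
        }
        Thread.sleep(100); // crude wait; the consumers are daemon threads
        disruptor.shutdown();
    }
}

Both A and B should print 0, 1 and 2 (six lines in total), confirming broadcast rather than competition.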
2.2 Worker pool mode
This means calling Disruptor.handleEventsWithWorkerPool: each WorkHandler gets its own thread and processes only the Events it claims, which is essentially the familiar thread-pool usage pattern.
public final EventHandlerGroup<T> handleEventsWithWorkerPool(final WorkHandler<T>... workHandlers)
{
    return createWorkerPool(new Sequence[0], workHandlers);
}
There are several official examples of this mode too, with throughput in the millions of ops per second; when the WorkHandler is slow it behaves much like an ordinary thread pool. https://github.com/zealzeng/fabric-samples/blob/master/disruptor-demo/src/main/java/com/lmax/disruptor/workhandler/OneToThreeWorkerPoolThroughputTest.java
Starting Disruptor tests
Run 0, Disruptor=4,349,906 ops/sec BatchPercent=0.00% AverageBatchSize=-1
Run 1, Disruptor=4,591,579 ops/sec BatchPercent=0.00% AverageBatchSize=-1
Run 2, Disruptor=4,590,946 ops/sec BatchPercent=0.00% AverageBatchSize=-1
Run 3, Disruptor=4,662,222 ops/sec BatchPercent=0.00% AverageBatchSize=-1
Run 4, Disruptor=4,695,276 ops/sec BatchPercent=0.00% AverageBatchSize=-1
Run 5, Disruptor=4,690,211 ops/sec BatchPercent=0.00% AverageBatchSize=-1
Run 6, Disruptor=4,713,201 ops/sec BatchPercent=0.00% AverageBatchSize=-1
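A minimal sketch of the worker-pool mode, reusing the same assumed LongEvent type as in the multi-cast sketch: each published event is claimed by exactly one of the WorkHandlers, which is the competing-consumer behavior of a conventional thread pool.

import com.lmax.disruptor.WorkHandler;
import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.util.DaemonThreadFactory;

public class WorkerPoolModeSketch
{
    static class LongEvent
    {
        long value;
    }

    public static void main(String[] args) throws Exception
    {
        Disruptor<LongEvent> disruptor =
            new Disruptor<>(LongEvent::new, 1024, DaemonThreadFactory.INSTANCE);

        // Three WorkHandlers -> three worker threads; each event is handed
        // to exactly ONE of them, unlike the broadcast mode above.
        WorkHandler<LongEvent> w1 = event -> System.out.println("w1 took " + event.value);
        WorkHandler<LongEvent> w2 = event -> System.out.println("w2 took " + event.value);
        WorkHandler<LongEvent> w3 = event -> System.out.println("w3 took " + event.value);
        disruptor.handleEventsWithWorkerPool(w1, w2, w3);

        disruptor.start();
        for (long i = 0; i < 9; i++)
        {
            final long v = i;
            disruptor.getRingBuffer().publishEvent((event, seq) -> event.value = v);
        }
        Thread.sleep(100); // crude wait for the daemon worker threads
        disruptor.shutdown();
    }
}

Each of the nine values should appear exactly once across w1, w2 and w3, though which worker takes which value is nondeterministic.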
3. Disruptor use cases
Let's look at a few frameworks that use disruptor.
3.1 Log4j2
Log4j2's asynchronous logging uses disruptor. Logging generally goes through a buffer that is flushed to file when full, and incremental appends combined with NIO should be fast, so whether the work runs in an EventHandler or a WorkHandler the per-event latency should be small, and only a handful of files are written. That makes it a good fit.
3.2 JStorm
Stream processing exchanges data between threads, and much of the computation happens in memory; stream computing is fast in, fast out, so disruptor should be a good choice there.
3.3 Baidu uid-generator
It uses a ring buffer and false-sharing avoidance, among other ideas, to cache pre-generated UIDs, and presumably borrows in part from disruptor.
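The false-sharing avoidance mentioned above deserves a quick sketch. Assuming 64-byte cache lines (typical on x86, but an assumption here), padding a hot counter out to its own cache line keeps two counters updated by different threads from invalidating each other's lines; disruptor's Sequence class uses a variation of this trick. The class below is illustrative, not code from disruptor or uid-generator.

// A minimal sketch of cache-line padding, assuming 64-byte lines.
class PaddedCounter
{
    // 7 leading longs (56 bytes) of padding ...
    long p1, p2, p3, p4, p5, p6, p7;
    // ... the hot field another thread writes frequently ...
    volatile long value;
    // ... and 7 trailing longs, so no neighboring hot field can share its cache line.
    long p9, p10, p11, p12, p13, p14, p15;
}

JDK 8 also added the @sun.misc.Contended annotation (enabled with -XX:-RestrictContended) for the same purpose, but hand-written padding like this is what disruptor historically did.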
3.4 Summary
Using disruptor as the worker pool behind the event loop in Luoying-framework would not bring much of a performance gain, because the business logic inside the server involves database queries and similar work; disruptor only speeds up the data exchange, and slow business logic stays slow.
It's like mailing a contract to the northeast: some couriers are genuinely fast and deliver in two days, others take three or four, but the courier only gets the document into the recipient's hands; the recipient may well take a week or two to process the contract, and no courier, however fast, fixes slow contract handling.
Scenarios that need both fast data exchange between threads and fast processing of that data are not something I run into often in my own field; they probably find their use in big-data and middleware development.
I haven't studied Disruptor in depth, but the codebase seems fairly small; please point out any mistakes. The ring buffer, sequencer and related code are well worth studying.
- Original author: Zealot
- Original link: https://www.51discuss.com/posts/disruptor-introduction/
- Copyright: this work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. For non-commercial reposts, please credit the source (author and original link); for commercial use, please contact the author for permission.