Characteristics and Comparison of Stream Processing

Published: 2012-12-22 12:05:06  Author: rapoo


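Notes on reading The 8 Requirements of Real-Time Stream Processing

The paper presents eight requirements that a real-time stream processing system should meet, then compares how many of the eight are satisfied by a traditional DBMS (or an in-memory DBMS), a rule engine, and a stream processing engine (SPE) when handling streaming data.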



Rule 1: Keep the data moving

The first requirement for a real-time stream processing system is to process messages “in-stream”, without any requirement to store them to perform any operation or sequence of operations. Ideally the system should also use an active (i.e., non-polling) processing model.
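To make the "active, non-polling" idea concrete, here is a minimal sketch of a push-based stream. The `Stream` class, the `on_trade` handler, and the trade fields are hypothetical names chosen for illustration, not part of the paper or of any particular engine.

```python
from typing import Callable, List


class Stream:
    """Hypothetical push-based stream: handlers run as each message arrives."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[dict], None]] = []

    def subscribe(self, handler: Callable[[dict], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, message: dict) -> None:
        # The message is handed straight to each handler: nothing is
        # stored first and no consumer has to poll for new data.
        for handler in self._subscribers:
            handler(message)


alerts: List[str] = []


def on_trade(trade: dict) -> None:
    # Illustrative rule: flag any trade priced above 100.
    if trade["price"] > 100.0:
        alerts.append(trade["symbol"])


trades = Stream()
trades.subscribe(on_trade)
trades.publish({"symbol": "ABC", "price": 101.5})
trades.publish({"symbol": "XYZ", "price": 99.0})
```

After the two `publish` calls, `alerts` holds only the symbol of the first trade. The point of the sketch is the control flow: the arrival of a message drives the computation.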

Rule 2: Query using SQL on Streams (StreamSQL)

The second requirement is to support a high-level “StreamSQL” language with built-in extensible stream-oriented primitives and operators.
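One commonly cited stream-oriented primitive is the window, which bounds an otherwise endless stream so that an aggregate can be computed over it. The sketch below shows what such an operator does under the hood, written in Python rather than in any StreamSQL dialect; `sliding_average` is a hypothetical helper and a count-based window is assumed.

```python
from collections import deque
from typing import Deque, Iterable, Iterator


def sliding_average(values: Iterable[float], size: int) -> Iterator[float]:
    """Yield the average of the last `size` values after each arrival."""
    window: Deque[float] = deque(maxlen=size)  # oldest value drops out automatically
    for value in values:
        window.append(value)
        yield sum(window) / len(window)


averages = list(sliding_average([10.0, 20.0, 30.0, 40.0], size=2))
```

A StreamSQL language would let the user state the window declaratively instead of writing this loop by hand.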

Rule 3: Handle stream imperfections (delayed, missing, and out-of-order data)

The third requirement is to have built-in mechanisms to provide resiliency against stream “imperfections”, including missing and out-of-order data, which are commonly present in real-world data streams.
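One possible way to cope with out-of-order data is a bounded reorder buffer. This is a sketch of that general technique, not necessarily the mechanism the paper has in mind. It assumes every event carries an integer timestamp and arrives at most `max_delay` time units late; `reorder` and `max_delay` are hypothetical names.

```python
import heapq
from typing import Iterable, Iterator, List, Optional, Tuple

Event = Tuple[int, str]  # (timestamp, payload)


def reorder(events: Iterable[Event], max_delay: int) -> Iterator[Event]:
    """Emit events in timestamp order, assuming bounded lateness."""
    buffer: List[Event] = []
    newest: Optional[int] = None  # highest timestamp seen so far
    for ts, payload in events:
        heapq.heappush(buffer, (ts, payload))
        newest = ts if newest is None else max(newest, ts)
        # Under the lateness assumption, nothing older than
        # newest - max_delay can still arrive, so release those events.
        while buffer and buffer[0][0] <= newest - max_delay:
            yield heapq.heappop(buffer)
    # End of input: flush whatever is still buffered, in order.
    while buffer:
        yield heapq.heappop(buffer)


arrived = [(1, "a"), (3, "c"), (2, "b"), (5, "e"), (4, "d")]
ordered = list(reorder(arrived, max_delay=2))
```

An event that breaks the lateness assumption would be emitted late, so a real system also needs a policy for such stragglers (drop, log, or correct downstream).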

Rule 4: Generate Predictable Outcomes

The fourth requirement is that a stream processing engine must guarantee predictable and repeatable outcomes.
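One ingredient of repeatable results is processing several input streams in a well-defined order, for example by timestamp, rather than in whatever order they happen to be read. The sketch below assumes each input is already sorted by timestamp and breaks ties by source name; the `ticks` and `splits` data and the `merged` helper are made up for illustration.

```python
import heapq
from typing import List, Tuple

Event = Tuple[int, str, float]  # (timestamp, source, value)

ticks: List[Event] = [(1, "tick", 100.0), (3, "tick", 102.0)]
splits: List[Event] = [(2, "split", 2.0)]


def merged(*streams: List[Event]) -> List[Event]:
    # Order by (timestamp, source) so that the result does not depend
    # on the order in which the input streams are supplied.
    return list(heapq.merge(*streams, key=lambda e: (e[0], e[1])))


run_a = merged(ticks, splits)
run_b = merged(splits, ticks)
```

Both runs yield the same sequence, so replaying the same inputs, for instance during recovery, reproduces the same output.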

Rule 5: Integrate Stored and Streaming Data

The fifth requirement is to have the capability to efficiently store, access, and modify state information, and combine it with live streaming data. For seamless integration, the system should use a uniform language when dealing with either type of data.
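As a rough illustration of combining state with live data, the sketch below keeps the last known price per symbol in a dictionary (standing in for stored state) and compares each incoming trade against it. The function name `price_moves` and the sample data are assumptions made for this example.

```python
from typing import Dict, Iterable, Iterator, Tuple


def price_moves(
    trades: Iterable[Tuple[str, float]], last_price: Dict[str, float]
) -> Iterator[Tuple[str, float]]:
    """Yield (symbol, change) against stored state, then update the state."""
    for symbol, price in trades:
        previous = last_price.get(symbol)  # access stored state
        if previous is not None:
            yield symbol, price - previous  # combine it with the live event
        last_price[symbol] = price  # modify stored state


state = {"ABC": 100.0}  # state that existed before the stream started
live = [("ABC", 101.0), ("XYZ", 50.0), ("ABC", 99.5)]
moves = list(price_moves(live, state))
```

Here the same Python code touches both the stored dictionary and the live events, which loosely mirrors the "uniform language" point; in an SPE the state would normally live in an embedded store rather than in a plain dictionary.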

Rule 6: Guarantee Data Safety and Availability

The sixth requirement is to ensure that the applications are up and available, and the integrity of the data maintained at all times, despite failures.

Rule 7: Partition and Scale Applications Automatically

The seventh requirement is to have the capability to distribute processing across multiple processors and machines to achieve incremental scalability. Ideally, the distribution should be automatic and transparent.
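A common building block for distributing a stream is to partition events by key, so that all events for one key reach the same worker. The sketch below shows only that routing step, done by hand; `partition_for` and `partition` are hypothetical helpers, and a real engine would ideally perform this placement automatically.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def partition_for(key: str, num_partitions: int) -> int:
    # crc32 is stable across processes, unlike Python's built-in hash()
    # for strings, which is randomized per interpreter run.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition(
    events: Iterable[Tuple[str, int]], num_partitions: int
) -> Dict[int, List[Tuple[str, int]]]:
    """Group (key, value) events by the partition their key maps to."""
    parts: Dict[int, List[Tuple[str, int]]] = defaultdict(list)
    for key, value in events:
        parts[partition_for(key, num_partitions)].append((key, value))
    return dict(parts)


events = [("ABC", 1), ("XYZ", 2), ("ABC", 3)]
parts = partition(events, num_partitions=4)
```

Because routing depends only on the key, per-key order is preserved inside a partition. Changing the number of partitions remaps keys, which is one reason transparent repartitioning is hard.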

Rule 8: Process and Respond Instantaneously

The eighth requirement is that a stream processing system must have a highly-optimized, minimal-overhead execution engine to deliver real-time response for high-volume applications.


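A DBMS stores data first and processes it afterwards, the “process-after-store” model, so it is inherently ill-suited to real-time data streams. An in-memory database can ease the performance penalty, and a DBMS also offers triggers, but none of this makes it scalable enough.

A rule engine can process real-time data streams to some extent, but its rule language falls short: it lacks the expressive power of SQL, and the operations it can apply to a stream are limited.

Only the SPE is purpose-built for real-time stream data, with many native features designed specifically for processing and manipulating streams.

The three compare as follows: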

Requirement                    DBMS        Rule engine   SPE
Keep the data moving           No          Yes           Yes
SQL on streams                 No          No            Yes
Handle stream imperfections    Difficult   Possible      Possible
Predictable outcome            Difficult   Possible      Possible
High availability              Possible    Possible      Possible
Stored and streamed data       No          No            Yes
Distribution and scalability   Possible    Possible      Possible
Instantaneous response         Possible    Possible      Possible
