A universal Streamer for Apache Ignite based on Apache Camel

Apache Ignite has the concept of Data Streamers: components to ingest fast data in a streaming fashion into an Ignite cache from a variety of protocols, technologies or platforms, such as JMS, MQTT, Twitter, Flume, Kafka, etc.

However, with Apache Ignite 1.5.0 we released the jack of all trades: an Apache Camel streamer. If you are not familiar with Camel, skip to the "About Apache Camel" section below for a brief introduction, then come back.

NOTE: I have also committed a camel-ignite component (woohoo!) to the Camel codebase, but this post is about the Ignite Camel streamer. I'll cover the camel-ignite component in another post.

Ignite Camel Streamer

This streamer allows you to consume from any Camel endpoint straight into Ignite. This means you can ingest entries into an Ignite cache based on:

  • Calls received on a Web Service (SOAP or REST), by extracting the body or headers.
  • Listening on a TCP or UDP channel for messages.
  • The content of files received via FTP or written to the local filesystem.
  • Email messages received via POP3 or IMAP.
  • A MongoDB tailable cursor.
  • An AWS SQS queue.
  • And many others.

This is what I call direct ingestion from an endpoint.

You can also create a Camel route that performs more complex processing on incoming messages (transformations, validations, splitting, aggregating, idempotency, resequencing, enrichment, etc.) and ingest only the result into the Ignite cache. This is what I call mediated ingestion.

A brief example

Let's assume we want to consume from an HTTP endpoint served by Jetty. We are going to receive HTTP POSTs with temperature readings from weather stations.

The payload's Content-Type is text/plain and it contains the reading. The station ID is passed in an HTTP request header called X-StationId:

// Imports used by this example.
import java.util.AbstractMap;
import java.util.Map;

import org.apache.camel.Exchange;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;
import org.apache.ignite.stream.StreamSingleTupleExtractor;
import org.apache.ignite.stream.camel.CamelStreamer;

// Start Apache Ignite and make sure the 'mycache' cache exists.
Ignite ignite = Ignition.start();
ignite.getOrCreateCache("mycache");

// Create a streamer pipe which ingests into the 'mycache' cache.
IgniteDataStreamer<String, String> pipe = ignite.dataStreamer("mycache");

// Create a Camel streamer and connect it.
CamelStreamer<String, String> streamer = new CamelStreamer<>();
streamer.setIgnite(ignite);
streamer.setStreamer(pipe);

// This endpoint starts a Jetty server and consumes from all network interfaces on port 8080 and context path /ignite.
streamer.setEndpointUri("jetty:http://0.0.0.0:8080/ignite?httpMethodRestrict=POST");

// This is the tuple extractor. We'll assume each message contains only one tuple.
// If your message contains multiple tuples, use a StreamMultipleTupleExtractor.
// The tuple extractor receives the Camel Exchange and returns a Map.Entry<?,?> with the key and value.
streamer.setSingleTupleExtractor(new StreamSingleTupleExtractor<Exchange, String, String>() {
    @Override public Map.Entry<String, String> extract(Exchange exchange) {
        String stationId = exchange.getIn().getHeader("X-StationId", String.class);
        String temperature = exchange.getIn().getBody(String.class);
        // Build the (key, value) pair that becomes the cache entry.
        return new AbstractMap.SimpleImmutableEntry<>(stationId, temperature);
    }
});

// Start the streamer.
streamer.start();
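
If a single POST may carry several readings, the comment above points to StreamMultipleTupleExtractor instead. The parsing such an extractor would perform can be sketched in plain Java. This is a minimal sketch under assumptions: the one "station=temperature" pair per line format and the BatchReadingParser class are hypothetical, and the wiring shown in the trailing comment assumes CamelStreamer exposes a setMultipleTupleExtractor setter whose extractor returns a Map of keys to values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper: turns a batch body such as "st-1=21.5\nst-2=19.0"
 * into a map of station ID to temperature (one entry per station).
 */
public final class BatchReadingParser {

    private BatchReadingParser() {
    }

    public static Map<String, String> parse(String body) {
        // LinkedHashMap keeps readings in arrival order.
        Map<String, String> entries = new LinkedHashMap<>();
        if (body == null) {
            return entries;
        }
        // \R matches any line break (\n, \r\n, ...).
        for (String line : body.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate blank lines
            }
            int eq = trimmed.indexOf('=');
            if (eq <= 0 || eq == trimmed.length() - 1) {
                throw new IllegalArgumentException("Malformed reading: " + trimmed);
            }
            entries.put(trimmed.substring(0, eq).trim(), trimmed.substring(eq + 1).trim());
        }
        return entries;
    }

    public static void main(String[] args) {
        System.out.println(parse("st-1=21.5\nst-2=19.0")); // prints {st-1=21.5, st-2=19.0}

        // Assumed wiring into the streamer (not compiled here):
        // streamer.setMultipleTupleExtractor(exchange ->
        //     BatchReadingParser.parse(exchange.getIn().getBody(String.class)));
    }
}
```

Rejecting malformed lines with an exception, rather than silently skipping them, makes bad payloads visible to the caller instead of quietly losing readings.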

By default, the response sent back to the caller (if it is a synchronous endpoint) is simply an echo of the original request. If you want to customise the response, set a Camel Processor as a responseProcessor:

import org.apache.camel.Processor;

streamer.setResponseProcessor(new Processor() {
    @Override public void process(Exchange exchange) throws Exception {
        exchange.getOut().setHeader(Exchange.HTTP_RESPONSE_CODE, 200);
        exchange.getOut().setBody("OK");
    }
});
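
To try the endpoint without a real weather station, a client only needs to send a text/plain POST carrying the X-StationId header. Below is a minimal sketch using the JDK 11 java.net.http client, which is just my choice for illustration (any HTTP client will do); the StationClient class, the station-42 ID and the localhost URL are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical test client that posts one temperature reading to the Jetty endpoint. */
public final class StationClient {

    private StationClient() {
    }

    /** Builds the POST a weather station would send: the reading in the body, the station ID in a header. */
    public static HttpRequest buildReading(String uri, String stationId, String temperature) {
        return HttpRequest.newBuilder(URI.create(uri))
                .header("Content-Type", "text/plain")
                .header("X-StationId", stationId)
                .POST(HttpRequest.BodyPublishers.ofString(temperature))
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = buildReading("http://localhost:8080/ignite", "station-42", "21.5");
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // With the response processor above in place, expect status 200 and body "OK";
        // without it, the body is an echo of the request.
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```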

You can also pass in a custom CamelContext to customise Camel further. In fact, that is exactly how you implement mediated ingestion: create a route that consumes from the endpoint and sends the exchange to a direct: endpoint, from which the CamelStreamer consumes.

As I said before, you'll want this approach when you need more sophistication. In the following example, we consume from the same Jetty endpoint as before, but this time we receive JSON. We unmarshal it into an object and validate it with Bean Validation.

If all goes well, we dispatch it to the direct:ignite.ingest endpoint, which our Ignite streamer consumes from (so we need to set this URI on the streamer). Direct endpoints in Camel are just synchronous, in-memory hand-offs of messages within the same CamelContext.

// Imports used by this example (Camel 2.x package names).
import org.apache.camel.CamelContext;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.impl.DefaultCamelContext;
import org.apache.camel.model.dataformat.JsonLibrary;

// Create a CamelContext with a custom route that will:
//  (1) consume from our Jetty endpoint,
//  (2) transform incoming JSON into a Java object with Jackson,
//  (3) use JSR 303 Bean Validation to validate the object, and
//  (4) dispatch to the direct:ignite.ingest endpoint, from which the streamer consumes.
CamelContext context = new DefaultCamelContext();
context.addRoutes(new RouteBuilder() {
    @Override
    public void configure() throws Exception {
        from("jetty:http://0.0.0.0:8080/ignite?httpMethodRestrict=POST")
            .unmarshal().json(JsonLibrary.Jackson)
            .to("bean-validator:validate")
            .to("direct:ignite.ingest");
    }
});

// Hand our custom CamelContext to the streamer, so that it consumes from the route above.
streamer.setCamelContext(context);

// Remember our Streamer is now consuming from the Direct endpoint above.
streamer.setEndpointUri("direct:ignite.ingest");
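
One detail to keep in mind with this route: the body arriving at direct:ignite.ingest is no longer plain text. Unless you give Camel's Jackson data format a target class, it unmarshals a JSON object into a java.util.Map, so the single-tuple extractor from the first example, which reads the body as a String, would need adapting. A minimal sketch of that conversion follows; the stationId and temperature field names and the ReadingEntries class are hypothetical, and the wiring in the trailing comment is an assumed lambda form of the setSingleTupleExtractor call used earlier.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: maps an unmarshalled JSON reading to a (stationId, temperature) cache entry. */
public final class ReadingEntries {

    private ReadingEntries() {
    }

    public static Map.Entry<String, String> toEntry(Map<?, ?> json) {
        // Assumed payload shape, e.g. {"stationId": "station-42", "temperature": 21.5}.
        Object stationId = json.get("stationId");
        Object temperature = json.get("temperature");
        if (stationId == null || temperature == null) {
            throw new IllegalArgumentException("Missing stationId or temperature in " + json);
        }
        // JSON numbers arrive as Integer/Double, so normalise both sides to String for the String cache.
        return new AbstractMap.SimpleImmutableEntry<>(stationId.toString(), temperature.toString());
    }

    public static void main(String[] args) {
        Map<String, Object> json = new HashMap<>();
        json.put("stationId", "station-42");
        json.put("temperature", 21.5);
        System.out.println(toEntry(json)); // prints station-42=21.5

        // Assumed wiring for the mediated route (not compiled here):
        // streamer.setSingleTupleExtractor(exchange ->
        //     ReadingEntries.toEntry(exchange.getIn().getBody(Map.class)));
    }
}
```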

Feel free to contact the Ignite team on Gitter or via the mailing lists if you have questions.

About Apache Camel
(skip if you're already familiar)

Apache Camel is an enterprise integration framework built around the well-known Enterprise Integration Patterns popularised by Gregor Hohpe and Bobby Woolf, such as channels, pipes, filters, splitters, aggregators, routers and resequencers, which you piece together like Lego bricks to create integration routes that connect systems.

To date, there are over 200 components for Camel, many of which are adapters for different technologies: JMS, SOAP, HTTP, files, FTP, POP3, SMTP and SSH; cloud services like Amazon Web Services, Google Compute Engine and Salesforce; social networks like Twitter and Facebook; new-generation databases like MongoDB and Cassandra; and data processing technologies like Hadoop (HDFS, HBase) and Spark.

Camel runs in any environment (standalone Java, OSGi, Servlet containers, Spring Boot, Java EE application servers, etc.), and it's fully modular, so you only deploy the components you actually use and nothing else.

Check out "What is Camel?" for more information.
