Do you need to retrieve and process huge amounts of data, such as videos, images and documents, while moving it from a source to a target system?
Streaming in DataWeave (DW) is one way to handle data at this scale, and this blog explains how it works.
What is Streaming?
The term "streaming" refers to a continuous flow of data that can be consumed as it arrives, without first being downloaded in full.
Streams improve efficiency and scalability because a service does not have to load a huge amount of data into memory before it starts executing. They also speed up the processing of large documents without overburdening memory. DataWeave supports what is known as "end-to-end streaming" in Mule applications.
Streaming in DW
When streaming is enabled, DW processes the data as it arrives instead of first reading the entire document to index it. With the deferred option, DW can also send the streamed output directly to the next message processor. This behaviour allows DataWeave in Mule to process data more quickly while using less memory.
To enable streaming, we need the following configuration properties:
- Streaming property: reads data from the source as a stream.
- Deferred property: passes the output stream directly to the next message processor.
Streaming is enabled by setting the outputMimeType attribute to the required data format with streaming set to true, for example outputMimeType="application/json; streaming=true".
Below is an example scenario where DW streaming is used with the HTTP connector.
Because the data is huge, streaming is enabled to avoid overloading memory and to speed up processing. In this scenario, we process one file at a time.
In the current example, a third-party system stores the content and exposes it through a REST API.
1. The system API uses an HTTP requestor to invoke the REST API and get the content, and then returns that content in its response without using any transform activity.
Note: If any transformation is performed on the response received, the data will be loaded into memory.
Below are the steps to be followed to enable streaming.
- The streaming property is added to the HTTP requestor in the system API that fetches the attachment information from the target system.
<http:request method="GET" doc:name="Request TARGET API To Get Content" doc:id="918128c2-fe2c-4523-be70-860da6941aa8" config-ref="HTTPS_Request_configuration" url="${getcontenturl}" outputMimeType="application/json; streaming=true"/>
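Putting step 1 together, a system API flow that simply passes the stream through might look like the sketch below. The flow name, the listener path and the HTTP_Listener_config reference are hypothetical placeholders, and the namespace declarations of the enclosing mule element are omitted. Only the http:request attributes come from the example above.

```xml
<!-- Sketch only: flow name, listener path and listener config are assumptions -->
<flow name="get-content-system-api-flow">
    <!-- Entry point that the process API calls -->
    <http:listener config-ref="HTTP_Listener_config" path="/getcontent"/>

    <!-- Fetch the content from the third-party REST API and read it as a stream -->
    <http:request method="GET"
                  config-ref="HTTPS_Request_configuration"
                  url="${getcontenturl}"
                  outputMimeType="application/json; streaming=true"/>

    <!-- No Transform Message component here: the streamed payload
         becomes the response without being loaded into memory -->
</flow>
```

Leaving out the transform is deliberate: the payload returned by the requestor is handed back to the caller as it is.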
2. The process API calls the system API above to get the content information, so streaming is enabled in this step as well to keep it going through the pipeline.
- The streaming property is added to the HTTP requestor in the process API that calls the system API.
<http:request method="GET" doc:name="GET Content Details" doc:id="f58b7c40-0444-4a72-bbda-a46c4438a708" config-ref="HTTP_Request_configuration" path="/getcontent" outputMimeType="application/json; streaming=true"/>
- If the payload has to be accessed in a transform activity and then passed to the next processor, the deferred property must be set in that transform activity.
In this way, DW streaming is used in this scenario to deliver better performance. If multiple attachments have to be processed, concurrency and streaming have to be used together. Streaming is supported for the JSON, XML and CSV data formats.
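As an illustration of the deferred property, a transform activity placed after the streaming requestor could be written roughly as follows. The payload structure (an attachments array with name and size fields) is assumed purely for this sketch and is not taken from the actual API.

```xml
<!-- Sketch only: the attachments/name/size fields are hypothetical -->
<ee:transform doc:name="Select Content Fields">
    <ee:message>
        <ee:set-payload><![CDATA[%dw 2.0
// deferred=true hands the output stream to the next message processor
output application/json deferred=true
---
payload.attachments map ((attachment) -> {
    name: attachment.name,
    size: attachment.size
})]]></ee:set-payload>
    </ee:message>
</ee:transform>
```

The script touches each element once and in order, which is what lets the input be consumed as a stream while the output is being written.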
The limitations of DW Streaming
The DataWeave streaming solution has a few limitations. For example:
- If deferred=true is set in the transform activity and an error occurs in that activity during end-to-end processing, the error is not thrown in that transform, which makes the source of the failure ambiguous.
- During streaming, DW accesses each unit of the stream sequentially, so random access to the object or document is not supported.
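The sequential-access limitation can be pictured with the sketch below. The attachments field is again a hypothetical payload structure, and the commented-out selector is given only as an example of the kind of random access that, to our understanding, a streamed input cannot serve.

```xml
<!-- Sketch only: payload structure is an assumption -->
<ee:transform doc:name="Sequential Access Only">
    <ee:message>
        <ee:set-payload><![CDATA[%dw 2.0
output application/json deferred=true
---
// Not stream-friendly (kept as a comment on purpose):
//   payload.attachments[-1]
// A negative index needs the whole collection before it can answer.

// Stream-friendly: every element is visited once, in arrival order.
payload.attachments map ((attachment) -> attachment.id)]]></ee:set-payload>
    </ee:message>
</ee:transform>
```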
If you would like to find out more about how we can help you leverage RAML and MuleSoft, give us a call or email us at salesforce@coforge.com.