Tokenize
Since Camel 2.0
The tokenizer language is a built-in language in camel-core, which is most often used with the Split EIP to split a message using a token-based strategy.
The tokenizer language is intended to tokenize text documents using a specified delimiter pattern. It can also tokenize XML documents, but only with limited capability. For truly XML-aware tokenization, the XML Tokenize language is recommended, as it offers faster, more efficient tokenization designed specifically for XML documents.
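As a rough illustration of delimiter-based tokenizing, the plain-Java sketch below approximates the token and regex options with java.util.regex.Pattern: a literal token is quoted, while a regex token is used as-is. The class and method names (TokenizeSketch, tokenize) are hypothetical, and this is not Camel's implementation; it only shows the idea on an in-memory string.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class TokenizeSketch {

    // Hypothetical helper approximating the "token" and "regex" options:
    // the token is matched literally unless regex is true.
    static List<String> tokenize(String body, String token, boolean regex) {
        // Pattern.quote makes the token a literal match (e.g. "." means a dot, not "any char")
        Pattern delimiter = Pattern.compile(regex ? token : Pattern.quote(token));
        return Arrays.asList(delimiter.split(body));
    }

    public static void main(String[] args) {
        // Literal token: "." splits on dots only
        System.out.println(tokenize("a.b.c", ".", false));     // [a, b, c]
        // Regex token: one or more commas or semicolons act as a single delimiter
        System.out.println(tokenize("a,b;;c", "[,;]+", true)); // [a, b, c]
    }
}
```

Note that Pattern.split drops trailing empty strings; Camel's own tokenizer may treat consecutive or trailing delimiters differently.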
Tokenize Options
The Tokenize language supports 12 options, which are listed below.
Name | Default | Java Type | Description |
---|---|---|---|
token | | String | Required. The (start) token to use as tokenizer, for example you can use the new line token. You can use simple language as the token to support dynamic tokens. |
endToken | | String | The end token to use as tokenizer if using start/end token pairs. You can use simple language as the token to support dynamic tokens. |
inheritNamespaceTagName | | String | To inherit namespaces from a root/parent tag name when using XML. You can use simple language as the tag name to support dynamic names. |
regex | false | Boolean | Whether the token is a regular expression pattern. The default value is false. |
xml | false | Boolean | Whether the input consists of XML messages. This option must be set to true if working with XML payloads. |
includeTokens | false | Boolean | Whether to include the tokens in the parts when using pairs. When including tokens, the endToken option must also be configured (to use pair mode). The default value is false. |
group | | String | To group N parts together, for example to split big files into chunks of 1000 lines. You can use simple language as the group to support dynamic group sizes. |
groupDelimiter | | String | Sets the delimiter to use when grouping. If not set, the token is used as the delimiter. |
skipFirst | false | Boolean | To skip the very first element. |
source | | String | Source to use instead of the message body. You can prefix with variable:, header:, or property: to specify the kind of source. Otherwise, the source is assumed to be a variable. Use empty or null to use the default source, which is the message body. |
resultType | | String | Sets the class of the result type (type from output). |
trim | true | Boolean | Whether to trim the value to remove leading and trailing whitespaces and line breaks. |
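To make the interplay of group, groupDelimiter, skipFirst, endToken, and includeTokens more concrete, the plain-Java sketch below simulates them on in-memory strings. It is a hypothetical approximation (TokenizeOptionsSketch, group, and pairs are invented names), not Camel's implementation; it takes the group size as an int rather than a simple-language expression and ignores the XML-related options.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenizeOptionsSketch {

    // Hypothetical simulation of token + group + groupDelimiter + skipFirst.
    // Tokens are collected `group` at a time and joined with groupDelimiter,
    // which falls back to the token itself when it is null (not set).
    static List<String> group(String body, String token, int group,
                              String groupDelimiter, boolean skipFirst) {
        String[] tokens = body.split(Pattern.quote(token));
        String delimiter = groupDelimiter != null ? groupDelimiter : token;
        List<String> result = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (int i = skipFirst ? 1 : 0; i < tokens.length; i++) {
            current.add(tokens[i]);
            if (current.size() == group) {
                result.add(String.join(delimiter, current));
                current.clear();
            }
        }
        // Emit any leftover tokens as a final, smaller group
        if (!current.isEmpty()) {
            result.add(String.join(delimiter, current));
        }
        return result;
    }

    // Hypothetical simulation of pair mode (token + endToken); includeTokens
    // decides whether the start/end tokens are kept in each part.
    static List<String> pairs(String body, String token, String endToken, boolean includeTokens) {
        Pattern pair = Pattern.compile(
                Pattern.quote(token) + "(.*?)" + Pattern.quote(endToken), Pattern.DOTALL);
        Matcher m = pair.matcher(body);
        List<String> result = new ArrayList<>();
        while (m.find()) {
            result.add(includeTokens ? m.group() : m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String csv = "header\nr1\nr2\nr3\nr4\nr5";
        // Skip the header line, then group the remaining lines two at a time, joined by ";"
        System.out.println(group(csv, "\n", 2, ";", true));   // [r1;r2, r3;r4, r5]

        String xml = "<item>1</item><item>2</item>";
        System.out.println(pairs(xml, "<item>", "</item>", false)); // [1, 2]
        System.out.println(pairs(xml, "<item>", "</item>", true));  // [<item>1</item>, <item>2</item>]
    }
}
```

Leaving groupDelimiter unset (null) in this sketch joins each group with the token itself, so the first call would yield parts such as "r1\nr2", mirroring the groupDelimiter description in the table.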
Example
The following example shows how to take a request from the direct:a endpoint, split it into pieces using an Expression, and forward each piece to direct:b:
<route>
    <from uri="direct:a"/>
    <split>
        <tokenize token="\n"/>
        <to uri="direct:b"/>
    </split>
</route>
And in Java DSL:
from("direct:a")
    .split(body().tokenize("\n"))
    .to("direct:b");
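The sketch below mimics, in plain Java and without Camel on the classpath, what the route above does to a three-line body: the body is split on the newline token and each piece is handed, in order, to a Consumer standing in for the direct:b endpoint. SplitRouteSketch and splitAndForward are hypothetical names; in a real route, Camel creates a separate exchange for each piece.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.regex.Pattern;

public class SplitRouteSketch {

    // Hypothetical stand-in for split(body().tokenize("\n")).to("direct:b"):
    // each newline-delimited piece of the body is forwarded to the "endpoint" in order.
    static void splitAndForward(String body, Consumer<String> directB) {
        for (String piece : body.split(Pattern.quote("\n"))) {
            directB.accept(piece);
        }
    }

    public static void main(String[] args) {
        List<String> received = new ArrayList<>();
        splitAndForward("line1\nline2\nline3", received::add);
        System.out.println(received); // [line1, line2, line3]
    }
}
```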
See Also
For more examples see Split EIP.