Welcome! This is part one of a three-part series all about Talend’s own data mapping tool: Talend Data Mapper. Subjects like input and output structures, using EDI documents, conditional statements and filtering will be covered over the course of the series to help you make sense of the mapping madness.
We are also sharing posts and previews on channels like Instagram, TikTok, LinkedIn and YouTube. Feel free to join the communities over there and keep up to date on Talend, SQL, Data Integration, Analytics and much more!
Why is there a Data Mapper?
The Talend Data Mapper is a low-code solution for matching and transforming hierarchically structured data.
Hierarchical data structures like JSON and XML are commonly used but can become quite complex with many loops and arrays existing both on the same level and deeply nested. Accessing these structures requires looping during parsing, which quickly turns into a technically complex product to develop and maintain.
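To give an idea of the hand-written looping this refers to, here is a minimal sketch in plain Java. The structure (orders containing shipments containing items, each with a sku) is a hypothetical example and not taken from a real Talend mapping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class NestedLoopSketch {
    // Collects every "sku" value from a hypothetical orders -> shipments -> items hierarchy.
    // Each nesting level needs its own loop and its own handling of missing lists.
    @SuppressWarnings("unchecked")
    static List<String> collectSkus(List<Map<String, Object>> orders) {
        List<String> skus = new ArrayList<>();
        for (Map<String, Object> order : orders) {
            List<Map<String, Object>> shipments =
                    (List<Map<String, Object>>) order.getOrDefault("shipments", List.of());
            for (Map<String, Object> shipment : shipments) {
                List<Map<String, Object>> items =
                        (List<Map<String, Object>>) shipment.getOrDefault("items", List.of());
                for (Map<String, Object> item : items) {
                    Object sku = item.get("sku");
                    if (sku != null) {
                        skus.add(sku.toString());
                    }
                }
            }
        }
        return skus;
    }
}
```

Three levels of nesting already need three loops and two unchecked casts, which is the kind of code a drag-and-drop mapping is meant to replace.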
The Talend Data Mapper is a drag-and-drop solution for matching fields at any depth of a list or array. Beneath the surface are development functions that help with different parts of the Data Integration process: database lookups, conditionals and complex Java code can all be used in the tool.
The Data Mapper is a single data integration component used in a Job in Talend Studio, which makes it an easy-to-understand solution for the client and an easier-to-maintain job for the developer.
Data Mapping
A single mistake during data mapping can be replicated across records, ripple through your data process and, ultimately, lead to low-quality data for analysis or costly misinterpretation.
Data Mapping is crucial for data integration, data migration, data warehouse automation, automated data extraction and other data management processes used in all layers of business. It is the bridge between systems, data models or sources, and makes data usable at the destination.
While data mapping has been common practice for some time, the amount of data (and the number of sources) has been increasing, making the data mapping process more complex. This calls for tools that can be automated and customized for large data projects.
Talend Data Mapper Applied
The data mapper has many functions, utilities and tricks up its sleeve. Getting to know some of these may help you in deciding if the Talend Data Mapper can help you in the future.
Filtering loops
Using Talend Data Mapper makes data mapping for complex hierarchical structures as easy as dragging and dropping elements, but some mappings require more specific choices depending on certain conditions. That is why loop functions have a Filter argument, which takes an expression that is evaluated for each instance of the loop. If the expression returns True, the instance is included in the output. If it returns False, it is not included.
Filtering enables more control over which data gets passed on to the output and multiple filters can be applied on each loop level. Filters can be as simple as a constant or as complex as filtering on a validation rule.
Simple Filtering on Loop
In this example we create a simple filter on the main loop, to make sure the mapping only processes incoming messages that are from the specific RetailFormula ‘STORE’.
- When highlighting the root node, we see the main loop being a SimpleLoop function on the root node of the input data. This means that for each instance of the input root node, the output root node and all its child nodes run once.
- Below the Input Map Element argument in the SimpleLoop function is the Filter argument. From the functions panel, drag the Equal function onto the Filter argument.
- The Equal function takes two arguments to compare. For the first, drag the RetailFormula element from the input onto the first argument.
- Again from the functions panel, drag the Constant function onto the second argument.
- Double-click the constant and set its value to the string ‘STORE’.
This results in a filtered output for this complete mapping since it now filters on the root node of the output structure. For each instance of the incoming data, the mapping will evaluate if the value of the element RetailFormula is equal to ‘STORE’. Only if the result is true will the mapping be executed.
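For readers who think in code, the behaviour of this filter can be approximated in plain Java. This is only a sketch of the logic, not what the Data Mapper generates: the class and method names are hypothetical, and each incoming message is represented as a simple map of element names to values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class SimpleLoopFilterSketch {
    // Stand-in for Equal(RetailFormula, Constant 'STORE') placed on the Filter argument.
    static boolean filter(Map<String, String> message) {
        return "STORE".equals(message.get("RetailFormula"));
    }

    // Stand-in for the SimpleLoop on the root node: every input instance is
    // evaluated once and only mapped to the output when the filter returns true.
    static List<Map<String, String>> simpleLoop(List<Map<String, String>> messages) {
        List<Map<String, String>> output = new ArrayList<>();
        for (Map<String, String> message : messages) {
            if (filter(message)) {
                output.add(message);
            }
        }
        return output;
    }
}
```

In this sketch a message without a RetailFormula element is simply skipped, because the comparison returns false for a missing value.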
Filtering on a Validation Rule
As a more complex example of filtering loops, we will set up a validation rule and apply this rule to a filter. Three functions are involved in setting up a filter on a validation rule: ValidateGroup, Contains and IsValid. They are explained as they are used in the example.
- First, to prepare the Validation Rule, we assign the looping input element to a Validation Group using the ValidateGroup function. From the functions panel, drag the ValidateGroup function onto the looping input element.
- Specify the Message and the Number describing the validation issue.
- For the data argument we drag in the DeliveryMethod element from the input structure.
- Next, select that same DeliveryMethod element in the input structure and open the Validate tab.
- From the functions panel, drag in the Contains function and add the DeliveryMethod element to the InputValue argument.
- Double-click the Contains function and specify the contained string as ‘HomeDelivery’.
For every loop instance the filter will evaluate if the mapped input value is valid according to its validation rule and return a Boolean, which then either includes the instance or excludes the instance from the output.
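The effect of the rule can be sketched in plain Java as follows. The class and method names are hypothetical, and the sketch assumes a case-sensitive substring match, which is how Java's String.contains behaves; whether the Contains function in the Data Mapper matches case in the same way is not covered here.

```java
import java.util.ArrayList;
import java.util.List;

final class ValidationFilterSketch {
    // Stand-in for the Contains rule defined on the Validate tab of DeliveryMethod.
    static boolean isValid(String deliveryMethod) {
        return deliveryMethod != null && deliveryMethod.contains("HomeDelivery");
    }

    // Stand-in for the loop filter: an instance is kept only when its
    // DeliveryMethod value passes the validation rule.
    static List<String> filterLoop(List<String> deliveryMethods) {
        List<String> output = new ArrayList<>();
        for (String method : deliveryMethods) {
            if (isValid(method)) {
                output.add(method);
            }
        }
        return output;
    }
}
```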
Aggregate on Loop
When building a mapping in the Talend Data Mapper, connecting looping elements from the input to looping elements in the output is a straightforward drag-and-drop. For each element the data mapper will define a loop expression to specify how the incoming element is handled. But when mapping a looping element from the input to a non-looping element in the output, there is no loop handling available and you need to define an aggregate function to handle the incoming loop.
The Talend Data Mapper has several functions available to aggregate on incoming loops, with the four most used being: AgConcat, AgConcatFirstPresentValue, AgCount and AgSum.
AgConcat
The AgConcat function Aggregates all loop iterations and Concatenates element values.
This function requires defining a loop expression to handle the incoming looping element. It will then loop according to the loop expression and concatenate all elements. Within the loop expression, the filter is commonly used to define which elements from the incoming loop are written to the output, for example by filtering on indexNumber or on the key of a key/value pair.
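As a rough illustration, the following Java sketch concatenates the values of a key/value loop and filters on the key. The names are hypothetical, and the sketch assumes the values are joined without a separator.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class AgConcatSketch {
    // Loops over all key/value pairs and concatenates the values of the
    // pairs that pass the filter (assumption: no separator between values).
    static String agConcat(List<Map.Entry<String, String>> loop,
                           Predicate<Map.Entry<String, String>> filter) {
        StringBuilder result = new StringBuilder();
        for (Map.Entry<String, String> element : loop) {
            if (filter.test(element)) {
                result.append(element.getValue());
            }
        }
        return result.toString();
    }
}
```

Calling agConcat with a filter such as `e -> e.getKey().equals("street")` would write only the street values to the single output element.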
AgConcatFirstPresentValue
The AgConcatFirstPresentValue function Aggregates all loop iterations and Returns the First Present Value.
It evaluates each argument in order, optionally applying a filter, and returns only the first value that is present and not blank. If it does not find any present or non-blank value, it returns nothing. If the returned value is itself a loop, it returns just the first value of that loop.
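The "first present, non-blank value" rule can be sketched in Java like this. The names are hypothetical, a null stands in for an absent element, and an empty Optional stands in for "returns nothing".

```java
import java.util.List;
import java.util.Optional;

final class FirstPresentValueSketch {
    // Walks the values in order and returns the first one that is
    // present (not null) and not blank; empty if there is none.
    static Optional<String> firstPresentValue(List<String> values) {
        for (String value : values) {
            if (value != null && !value.trim().isEmpty()) {
                return Optional.of(value);
            }
        }
        return Optional.empty();
    }
}
```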
AgCount
The AgCount function Aggregates and Counts all loop iterations and returns this number.
Just like the other aggregate functions, AgCount requires a loop expression that defines how to handle the incoming loop. It then counts all iterations of the loop and returns the number as a single integer.
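In plain Java the same idea is a counter inside a loop. The sketch below uses hypothetical names and adds an optional filter, on the assumption that the loop expression behind AgCount can carry a filter like any other loop.

```java
import java.util.List;
import java.util.function.Predicate;

final class AgCountSketch {
    // Counts the loop iterations that pass the filter and returns a single integer.
    static <T> int agCount(List<T> loop, Predicate<T> filter) {
        int count = 0;
        for (T element : loop) {
            if (filter.test(element)) {
                count++;
            }
        }
        return count;
    }
}
```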
AgSum
The AgSum function Aggregates all loop iterations and calculates the sum of all elements.
Within the loop expression you can specify which elements to sum with a filter. It will then sum all numbers and return a value, which can be cast to the specified data type for the output element.
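A minimal Java sketch of this behaviour, with hypothetical names, could look as follows. It uses BigDecimal so that the cast to the output type is an explicit step at the end.

```java
import java.math.BigDecimal;
import java.util.List;
import java.util.function.Predicate;

final class AgSumSketch {
    // Sums the values of all loop iterations that pass the filter.
    static BigDecimal agSum(List<BigDecimal> loop, Predicate<BigDecimal> filter) {
        BigDecimal total = BigDecimal.ZERO;
        for (BigDecimal value : loop) {
            if (filter.test(value)) {
                total = total.add(value);
            }
        }
        return total;
    }
}
```

The result can then be converted to the type of the output element, for example with `intValueExact()` for an integer field or `doubleValue()` for a floating-point field.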
Using Routines & Beans
While the Talend Platform and the Talend Data Mapper have many built-in functions, sometimes custom Java code is needed to perform more intricate or repeated operations. For reusability you can factor your code out into a routine or a bean, which are classes that encapsulate one or more objects. These can then be called within a mapping in the Data Mapper by specifying the class name and method name you want to use.
While in the mapping perspective, select the output field where the routine is going to be used. From the functions panel, drag the ‘Java’ function to the ‘Value’ tab. It will ask you to specify the class and method.
In the Class Name field enter ‘routines’ followed by your class name, which in the example is ‘routines.DataLookUp’. For Method Name you specify the method that should be used here, for example ‘findCountryCode’.
If your method needs any parameters, you can drag a ‘Constant’ from the functions panel onto the Java function, which will then pass it on as a parameter. You can also pass fields from the Input Structure as parameters by dragging them onto the Java function.
With the Java expression in place, whenever you run a job with this mapping, it will call the routine and return the result in the output of the Data Mapper component in your job. And any changes you make in your routine will be directly applied in your mapping!
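To make this concrete, here is a sketch of what a routine like DataLookUp might look like. Only the class name and the method name come from the example above; the parameter, the lookup table and the choice to return an empty string for unknown countries are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// In Talend Studio a routine is generated as a public class in the `routines`
// package; both are left out here so the sketch compiles as a single file.
final class DataLookUp {
    // Hypothetical lookup table; a real routine might query a database instead.
    private static final Map<String, String> COUNTRY_CODES = new HashMap<>();
    static {
        COUNTRY_CODES.put("netherlands", "NL");
        COUNTRY_CODES.put("belgium", "BE");
        COUNTRY_CODES.put("germany", "DE");
    }

    // Hypothetical signature: one String parameter, supplied in the mapping
    // by a Constant or by a field dragged in from the Input Structure.
    public static String findCountryCode(String countryName) {
        if (countryName == null) {
            return "";
        }
        String key = countryName.trim().toLowerCase(Locale.ROOT);
        return COUNTRY_CODES.getOrDefault(key, "");
    }
}
```

Routine methods are static, so the mapping can call them by class name and method name without creating an object first.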
More on the Talend Data Mapper is coming in the next blog! There is even more educational content available in other posts in our blog. Also follow us on Instagram, TikTok, LinkedIn, YouTube for more information and don’t hesitate to contact us!