Two of the most important and most used components in Talend are tMap and tXMLMap. These components map source schemas to target schemas. The list of elements in those schemas can be quite long and complex, especially when working with XSD-based XML.
Did you know there is a fast way to import an XSD schema into your tXMLMap?
In this blog post you will find an easy example of how to do this quickly and efficiently.
The files used
For the example we made an XSD schema that will be used in our Talend process. It looks like this:
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="employee">
    <xs:complexType>
      <xs:sequence>
        <xs:element type="xs:short" name="employee_id"/>
        <xs:element type="xs:string" name="first_name"/>
        <xs:element type="xs:string" name="last_name"/>
        <xs:element type="xs:string" name="email"/>
        <xs:element type="xs:date" name="birthdate"/>
        <xs:element name="address">
          <xs:complexType>
            <xs:sequence>
              <xs:element type="xs:string" name="city"/>
              <xs:element type="xs:string" name="state"/>
              <xs:element type="xs:short" name="zip"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
The input of the Talend job will be a CSV file. It looks like this:
employee_id;first_name;last_name;email;birthdate;city;state;zip
1573;Steve;Walters;steve.walters@outlook.com;1985-03-12;Antwerp;Antwerp;2000
4863;Mike;Foster;mfoster@gmail.com;1993-10-16;Ghent;East-Flanders;9000
5131;Jeff;Collins;jeffcollins@outlook.com;1979-06-29;Bruges;West-Flanders;8000
The final output is XML. Below is an example:
<?xml version="1.0" encoding="UTF-8"?>
<employee>
  <employee_id>1573</employee_id>
  <first_name>Steve</first_name>
  <last_name>Walters</last_name>
  <email>steve.walters@outlook.com</email>
  <birthdate>1985-03-12</birthdate>
  <address>
    <city>Antwerp</city>
    <state>Antwerp</state>
    <zip>2000</zip>
  </address>
</employee>
The Talend job
The following screenshot gives the global picture of the job we want to build.
The process begins with reading the CSV file. The rows go through a tXMLMap.
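To make the mapping concrete, here is a minimal sketch in plain Java of what the job does conceptually: one CSV row goes in, one `<employee>` document comes out. This is not the code Talend generates; the class name `EmployeeRowToXml` and the method `toXml` are made up for illustration, and only the standard JDK XML APIs are used.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class EmployeeRowToXml {

    // Column order of the sample CSV file.
    static final String[] COLUMNS = {
        "employee_id", "first_name", "last_name", "email",
        "birthdate", "city", "state", "zip"
    };

    /** Converts one semicolon-separated CSV row into an <employee> XML string. */
    static String toXml(String csvRow) {
        String[] values = csvRow.split(";", -1); // -1 keeps trailing empty fields
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element employee = doc.createElement("employee");
            doc.appendChild(employee);
            Element address = doc.createElement("address");

            for (int i = 0; i < COLUMNS.length; i++) {
                Element field = doc.createElement(COLUMNS[i]);
                field.setTextContent(values[i]);
                // The last three columns are nested under <address>, as in the XSD.
                (i < 5 ? employee : address).appendChild(field);
            }
            employee.appendChild(address);

            // Serialize the DOM tree to a String without the XML declaration.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("Could not build the XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toXml(
            "1573;Steve;Walters;steve.walters@outlook.com;1985-03-12;Antwerp;Antwerp;2000"));
    }
}
```

In the Talend job itself none of this is written by hand: the tXMLMap editor produces the equivalent mapping once the output tree has been imported from the XSD.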
The integrity of the XML is safeguarded by using the XSD file as the schema of the output flow. To do so, set the output column to the Document type, then right-click on the column in the mapping and pick ‘Import from File’ or ‘Import from Repository’. In our example we will import the XSD from a file. See Figure 3.
In our case, all columns can be filled in with a corresponding value. This won’t always be the case with real projects. When you leave a field empty, Talend will generate an empty XML element. That can result in a violation of the XSD. Sometimes it’s better to not generate the element at all, rather than to generate an empty one, when this is allowed within the XSD.
Talend has a built-in setting for this in the output schema. See Figure 4.
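The difference between an empty element and an omitted one can be illustrated outside Talend with a small plain-Java sketch. The class `OptionalElements` and its methods are hypothetical names used only for this illustration. Note that omitting an element is only valid when the XSD marks it as optional (for example with `minOccurs="0"`); in the sample schema above every element is required, so there both an empty `<zip/>` and a missing `<zip>` would be rejected.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class OptionalElements {

    /** Appends <name>value</name> to parent, unless value is null or empty. */
    static boolean appendIfPresent(Element parent, String name, String value) {
        if (value == null || value.isEmpty()) {
            return false; // skip: no element at all instead of an empty <name/>
        }
        Element child = parent.getOwnerDocument().createElement(name);
        child.setTextContent(value);
        parent.appendChild(child);
        return true;
    }

    /** Builds an <address> element that contains only the non-empty fields. */
    static Element buildAddress(String city, String state, String zip) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element address = doc.createElement("address");
            doc.appendChild(address);
            appendIfPresent(address, "city", city);
            appendIfPresent(address, "state", state);
            appendIfPresent(address, "zip", zip);
            return address;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No XML parser available", e);
        }
    }

    public static void main(String[] args) {
        // The empty state is left out, so <address> gets two children, not three.
        Element address = buildAddress("Ghent", "", "9000");
        System.out.println(address.getChildNodes().getLength());
    }
}
```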
Afterwards, the mapping of columns can be done as usual.
Even though the schema is generated from the XSD, it is always a good idea to validate the output. This can be done with a tXSDValidator. This component, however, does not accept values of the Document type; it accepts only Strings. Therefore the Document has to be converted before the validation, using a tConvertType or the Java method ‘Document.toString()’.
Set the tXSDValidator to ‘FlowMode’. Enter the new XML along with the path to the XSD file. There will be two output flows: one for valid rows, another for invalid rows.
As usual, Talend will tell you why a row did not comply with the XSD file.
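For readers who want to see the same check outside a Talend job, the sketch below validates an XML string against an XSD with the standard `javax.xml.validation` API and returns the reason when validation fails. It is not what tXSDValidator runs internally; the class `XsdCheck`, the method `validate` and the trimmed-down address-only schema are assumptions made to keep the example short.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class XsdCheck {

    // Trimmed-down version of the sample schema: only the <address> part.
    static final String ADDRESS_XSD =
        "<xs:schema attributeFormDefault=\"unqualified\" elementFormDefault=\"qualified\""
        + " xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"address\"><xs:complexType><xs:sequence>"
        + "<xs:element type=\"xs:string\" name=\"city\"/>"
        + "<xs:element type=\"xs:string\" name=\"state\"/>"
        + "<xs:element type=\"xs:short\" name=\"zip\"/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    /** Returns null when the XML is valid, otherwise the validator's error message. */
    static String validate(String xsd, String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("The XSD itself is invalid", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return null; // valid row
        } catch (SAXException e) {
            return e.getMessage(); // invalid row: the reason it violates the XSD
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String valid = "<address><city>Ghent</city><state>East-Flanders</state>"
                     + "<zip>9000</zip></address>";
        String emptyZip = "<address><city>Ghent</city><state>East-Flanders</state>"
                        + "<zip/></address>";
        System.out.println("valid row:   " + validate(ADDRESS_XSD, valid));
        System.out.println("invalid row: " + validate(ADDRESS_XSD, emptyZip));
    }
}
```

The second document fails because an empty `<zip/>` is not a valid `xs:short`, which is exactly the kind of violation an empty field can cause.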