Using XSD files in Talend

Two of the most important and most used components in Talend are tMap and tXMLMap. These components map source schemas to target schemas. The list of elements in those schemas can be long and complex, especially when working with XSD-based XML.

Did you know there is a fast way to import an XSD schema into your tXMLMap?

In this blog post you will find an easy example of how to do this quickly and efficiently.

The files used

For the example we made an XSD schema that is used in our Talend process. It looks like this:

<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
   <xs:element name="employee">
      <xs:complexType>
         <xs:sequence>
            <xs:element type="xs:short" name="employee_id"/>
            <xs:element type="xs:string" name="first_name"/>
            <xs:element type="xs:string" name="last_name"/>
            <xs:element type="xs:string" name="email"/>
            <xs:element type="xs:date" name="birthdate"/>
            <xs:element name="address">
               <xs:complexType>
                  <xs:sequence>
                     <xs:element type="xs:string" name="city"/>
                     <xs:element type="xs:string" name="state"/>
                     <xs:element type="xs:short" name="zip"/>
                  </xs:sequence>
               </xs:complexType>
            </xs:element>
         </xs:sequence>
      </xs:complexType>
   </xs:element>
</xs:schema>

The input of the Talend job will be a CSV file. It looks like this:

employee_id;first_name;last_name;email;birthdate;city;state;zip
1573;Steve;Walters;steve.walters@outlook.com;1985-03-12;Antwerp;Antwerp;2000
4863;Mike;Foster;mfoster@gmail.com;1993-10-16;Ghent;East-Flanders;9000
5131;Jeff;Collins;jeffcollins@outlook.com;1979-06-29;Bruges;West-Flanders;8000

The final output is XML. Below is an example:

<?xml version="1.0" encoding="UTF-8"?>
<employee>
   <employee_id>1573</employee_id>
   <first_name>Steve</first_name>
   <last_name>Walters</last_name>
   <email>steve.walters@outlook.com</email>
   <birthdate>1985-03-12</birthdate>
   <address>
      <city>Antwerp</city>
      <state>Antwerp</state>
      <zip>2000</zip>
   </address>
</employee>
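
To make the target structure concrete, the mapping from one CSV row to the nested XML can be sketched in plain Java, outside Talend. This is only an illustration of what the tXMLMap has to produce, not the code Talend generates; the class and method names (`EmployeeXmlSketch`, `toXml`, `element`, `escape`) are made up for this example.

```java
public class EmployeeXmlSketch {

    // Escape the characters that may not appear verbatim in XML text content.
    static String escape(String value) {
        return value.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Build one simple element on its own line, e.g. <city>Antwerp</city>.
    static String element(String indent, String name, String value) {
        return indent + "<" + name + ">" + escape(value) + "</" + name + ">\n";
    }

    // Turn one semicolon-separated CSV row into the nested <employee> document.
    static String toXml(String csvRow) {
        String[] f = csvRow.split(";", -1); // -1 keeps trailing empty fields
        if (f.length != 8) {
            throw new IllegalArgumentException("Expected 8 fields but got " + f.length);
        }
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<employee>\n"
                + element("   ", "employee_id", f[0])
                + element("   ", "first_name", f[1])
                + element("   ", "last_name", f[2])
                + element("   ", "email", f[3])
                + element("   ", "birthdate", f[4])
                // city, state and zip are flat in the CSV but nested under <address> in the XML
                + "   <address>\n"
                + element("      ", "city", f[5])
                + element("      ", "state", f[6])
                + element("      ", "zip", f[7])
                + "   </address>\n"
                + "</employee>\n";
    }

    public static void main(String[] args) {
        System.out.print(toXml(
                "1573;Steve;Walters;steve.walters@outlook.com;1985-03-12;Antwerp;Antwerp;2000"));
    }
}
```

The only structural change is that the last three CSV columns move one level down, under `address`. Everything else is a one-to-one copy, which is why the mapping in the tXMLMap is mostly drag and drop once the XSD is imported.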

The Talend job

The following screenshot gives the global picture of the job we want to build.

Figure 1: Overview of the Talend job

The process begins with reading the CSV file. The rows go through a tXMLMap.

Figure 2: Overview of the tXMLMap

The integrity of the XML is guaranteed by using the XSD file as the schema of the output flow. To do this, set the output column to the Document type, then right-click the column in the mapping and pick ‘Import from File’ or ‘Import from Repository’. In our example we import the XSD from a file. See Figure 3.

Figure 3: Importing an XML or XSD in a tXMLMap

In our case, all columns can be filled in with a corresponding value. This won’t always be the case in real projects. When you leave a field empty, Talend generates an empty XML element, which can violate the XSD. When the XSD allows it, it is sometimes better not to generate the element at all than to generate an empty one.
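
The difference between an empty element and an omitted one can be checked outside Talend with the standard `javax.xml.validation` API. The sketch below makes some assumptions: it uses a reduced, hypothetical variant of the schema above with only two fields, and it adds `minOccurs="0"` to `birthdate` to make that element optional (the original XSD does not do this). The class name `OptionalElementSketch` is made up for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class OptionalElementSketch {

    // Assumption: reduced variant of the employee schema; minOccurs="0" makes birthdate optional.
    static final String XSD =
            "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\" elementFormDefault=\"qualified\">"
          + "<xs:element name=\"employee\"><xs:complexType><xs:sequence>"
          + "<xs:element type=\"xs:string\" name=\"first_name\"/>"
          + "<xs:element type=\"xs:date\" name=\"birthdate\" minOccurs=\"0\"/>"
          + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Returns true when the XML string validates against the XSD above.
    static boolean isValid(String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("The XSD itself could not be compiled", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false; // validation errors surface as SAXException
        }
    }

    public static void main(String[] args) {
        // Empty element: rejected, because an empty string is not a valid xs:date.
        System.out.println(isValid(
                "<employee><first_name>Steve</first_name><birthdate/></employee>"));
        // Omitted element: accepted, because minOccurs="0" allows leaving it out.
        System.out.println(isValid(
                "<employee><first_name>Steve</first_name></employee>"));
    }
}
```

An empty `xs:string` element would still pass, so the problem mainly shows up with typed elements such as dates and numbers.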

Talend has a built-in setting for this in the output schema. See Figure 4.

Figure 4: The settings of the output of a tXMLMap

Afterwards, the mapping of columns can be done as usual.

Even though the schema is generated from the XSD, it is always a good idea to validate the output. This can be done with a tXSDValidator. This component, however, does not accept values of the Document type; it only accepts Strings. The Document therefore has to be converted before validation, using either a tConvertType or the Java method ‘Document.toString()’.

Set the tXSDValidator to ‘FlowMode’. Enter the new XML along with the path to the XSD file. There will be two output flows: one for valid rows, another for invalid rows.

Figure 5: The settings of a tXSDValidator

As usual, Talend will report why a row did not comply with the XSD file.
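
The idea of splitting rows into a valid flow and a reject flow, each reject carrying a reason, can be sketched with the standard `javax.xml.validation` API. This is not how tXSDValidator is implemented; it is a plain-Java analogy under stated assumptions: the schema is a reduced, hypothetical variant of the one above, and the names `XsdFlowSketch` and `validationError` are made up for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class XsdFlowSketch {

    // Assumption: reduced variant of the employee schema with two fields.
    static final String XSD =
            "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\" elementFormDefault=\"qualified\">"
          + "<xs:element name=\"employee\"><xs:complexType><xs:sequence>"
          + "<xs:element type=\"xs:short\" name=\"employee_id\"/>"
          + "<xs:element type=\"xs:string\" name=\"first_name\"/>"
          + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Returns null when the XML is valid, otherwise the validator's error message.
    static String validationError(String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("The XSD itself could not be compiled", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException | IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        String[] rows = {
            "<employee><employee_id>1573</employee_id><first_name>Steve</first_name></employee>",
            // 40000 does not fit in xs:short, whose maximum is 32767
            "<employee><employee_id>40000</employee_id><first_name>Mike</first_name></employee>"
        };
        List<String> valid = new ArrayList<>();
        List<String> rejected = new ArrayList<>();
        for (String row : rows) {
            String error = validationError(row);
            if (error == null) {
                valid.add(row);
            } else {
                rejected.add(row + " | reason: " + error);
            }
        }
        System.out.println("valid: " + valid);
        System.out.println("rejected: " + rejected);
    }
}
```

The second row also shows a limit of the example schema: `xs:short` works for the sample data, but an employee_id or zip above 32767 would end up in the reject flow.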
