Talend Open Studio: How To Create A Custom Component (2/4)

Call to action: your first component!

The first part of this tutorial covered the basics of component creation. If you haven’t read it yet, I suggest you do so before continuing. In this second part you’ll create your very first component. We’ll start with a very basic version of a component and then, in the third part of this tutorial, we’ll add some extra functionality to it.

As stated before, I prefer to create custom components manually so throughout this tutorial we’ll follow that method. The component that we’ll create will be called tFirstComponent. First of all, you have to go to your Custom_Components folder. Within this folder you have to create a folder with the same name as your component. Go ahead and create that folder. Switch over to Talend and open the Component Designer perspective. If you select the COMPONENT_PROJECT root folder and press F5, the perspective as a whole is refreshed. The latest changes to the Custom_Components folder are then visible. After refreshing the perspective you’ll see the folder appear as a subfolder underneath the root folder.

Adding the XML descriptor file

We’ll now add files to the newly created tFirstComponent folder one by one. These files were already discussed in the “location and associated files” section of the first part of this tutorial. We’ll start with the XML descriptor file. Only a few parts of the XML structure of this file are mandatory. There’s no information available on this matter. However, thanks to PowerUp, a standard structure can be defined. Before defining the actual structure, we have to create the file in the tFirstComponent folder. An XML descriptor file follows this naming convention: NameOfTheComponent_java.xml. Do not use a different convention! This name is what allows Talend to recognize the file as an XML descriptor file. Go to your tFirstComponent folder and create the tFirstComponent_java.xml file.

Switch over to Talend again and hit refresh. The XML descriptor file is now visible under the tFirstComponent folder. Double-click on the file and it’ll open in the editor of the Component Designer perspective. Paste the following structure in the file:

<?xml version="1.0" encoding="UTF-8"?>
<COMPONENT>
<HEADER
AUTHOR="Jeremy Boterdael"
COMPATIBILITY="ALL"
PLATEFORM="ALL"
RELEASE_DATE="20160920"
SERIAL=""
STARTABLE="true"
STATUS="BETA"
VERSION="0.1">
<SIGNATURE/>
</HEADER>
<FAMILIES>
<FAMILY>IntoData Tutorial</FAMILY>
</FAMILIES>
<DOCUMENTATION>
<URL>https://intodata.eu/talend-open-studio-how-to-create-a-custom-component-part-1/</URL>
</DOCUMENTATION>
<CONNECTORS>
<CONNECTOR CTYPE="FLOW"/>
</CONNECTORS>
<PARAMETERS>
</PARAMETERS>
<CODEGENERATION/>
<RETURNS>
</RETURNS>
</COMPONENT>

Let’s discuss the different tags which are present in this structure. The first tag we see here is the HEADER tag. All of the attributes within this tag are mandatory, but you can give most of them an arbitrary value of your choice. It won’t affect your component in any way, they just have to be filled in. There is one attribute, called STARTABLE, which cannot just get an arbitrary value. This attribute does affect the component directly. Whether STARTABLE should be true or false depends on the role of the component you’re creating. As you know, some components are classified as input components. These components fetch data, from files for example, and then pass the data to the next component via a connection. Input components are always located at the start of a subjob. If you connect the dots, you’ll come to a conclusion: these kinds of components need STARTABLE set to true. A component that receives data, transforms it and sends it to the next component, or a component that can be classified as an output component, needs the value false instead.

On a side note, you might notice one attribute is called PLATEFORM. I have no idea why there’s a spelling error in this word, but you have to write it this way. PLATFORM won’t work at all, so don’t try to correct it.
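If you want to double-check the HEADER attributes outside of Talend, something like the following could help. It’s only a sketch in plain Java: the class DescriptorHeaderCheck and its method are names I made up, and the list of expected attributes is simply copied from the descriptor above, not taken from any official Talend specification.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper, not part of Talend.
class DescriptorHeaderCheck {

    // Attribute names copied from the descriptor shown in this tutorial (assumption).
    static final String[] EXPECTED = {
        "AUTHOR", "COMPATIBILITY", "PLATEFORM", "RELEASE_DATE",
        "SERIAL", "STARTABLE", "STATUS", "VERSION"
    };

    // Returns the expected attribute names that are absent from the first <HEADER> element.
    static List<String> missingAttributes(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element header = (Element) doc.getElementsByTagName("HEADER").item(0);
            List<String> missing = new ArrayList<>();
            for (String name : EXPECTED) {
                // An attribute with an empty value (SERIAL="") still counts as present.
                if (header == null || !header.hasAttribute(name)) {
                    missing.add(name);
                }
            }
            return missing;
        } catch (Exception e) {
            throw new IllegalArgumentException("Descriptor could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        // PLATFORM is spelled "correctly" here on purpose, so PLATEFORM is reported.
        String xml = "<COMPONENT><HEADER AUTHOR=\"me\" COMPATIBILITY=\"ALL\" PLATFORM=\"ALL\""
                + " RELEASE_DATE=\"20160920\" SERIAL=\"\" STARTABLE=\"true\" STATUS=\"BETA\""
                + " VERSION=\"0.1\"><SIGNATURE/></HEADER></COMPONENT>";
        System.out.println(missingAttributes(xml)); // prints [PLATEFORM]
    }
}
```

Running the check before pushing the component is optional, of course. It just catches the typical typo early.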

The next tag is called FAMILIES. The inner tag is called FAMILY and that’s the one I’m going to talk about. Basically, the use of this tag is very easy to explain. When you go to the Integration perspective and create a job there or open an existing job, you’ll see the Palette appear on the right side. This specific tag is related to the Palette itself. In general, you’ll see the Palette is divided into certain categories or “families” such as Big Data, Business Intelligence, Business, Cloud, … The FAMILY tag allows you to specify your very own category or “family”. Later on, we’ll push our component to the Palette and your own specified family will then become visible.

There’s also a DOCUMENTATION tag present. The use of this tag should be quite clear. It allows you to add documentation or useful information by specifying a URL between the URL tags. As a point of reference I’ve added a link to the first part of this tutorial. 

Next up is the CONNECTOR tag. This tag allows you to specify the inputs and outputs of your component. It indicates how components interact with each other depending on their role. As you can see, we specified a CTYPE attribute and gave it the value “FLOW”. This is the most common connector type. It corresponds to the usual “row<number>” connections between components. Additional information can of course be added to this tag, but we’ll get into that later.

The PARAMETERS tag allows you to specify input parameters for your component. These parameters are used to pass information to the component itself based on the needs of the user. In other words, they’re the options most of the standard Talend components provide. You can choose which parameters apply to your own component. As an example, think of the settings of the tFileInputDelimited component, such as the file name, the row separator and the field separator.

The CODEGENERATION tag will not be discussed because it isn’t part of this tutorial. The final tag is called RETURNS. The inner tag, called RETURN, allows you to specify the output of the component, which can be used afterwards. The most commonly used return parameter is a global variable called NB_LINE. It returns the number of records that have been handled by the component. I’ve provided an example of the usage below.

<RETURN NAME="NB_LINE" TYPE="id_Integer" AVAILABILITY="AFTER"/>
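To give you an idea of how such a return value is used afterwards: in the generated job, return values end up in a map called globalMap, under a key made of the unique name of the component followed by the name of the return value. The sketch below only imitates this with a plain HashMap. The class NbLineSketch, the instance name tFirstComponent_1 and the value 5 are assumptions for illustration. Our component doesn’t declare NB_LINE yet.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-alone illustration; the map below only imitates the globalMap of a generated job.
class NbLineSketch {

    // Builds the key under which a component instance stores its NB_LINE value.
    static String nbLineKey(String cid) {
        return cid + "_NB_LINE";
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        String cid = "tFirstComponent_1"; // assumed unique name of the instance

        // What the end section of a component would typically do.
        globalMap.put(nbLineKey(cid), 5);

        // What a later component (a tJava, for example) could do to read the value.
        Integer nbLine = (Integer) globalMap.get(nbLineKey(cid));
        System.out.println("Rows handled: " + nbLine); // prints Rows handled: 5
    }
}
```

AVAILABILITY="AFTER" in the RETURN tag matches this idea: the value is meant to be read once the component has finished.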

Adding the message property file

The XML descriptor file has been completely discussed. Now, let me explain a very important step in the process of creating a component. If you switch over to the Integration perspective, you’ll notice that your component isn’t available yet in the Palette. In order for that to happen you have to push your component to the Palette. To push the component to the Palette you have to right-click the subfolder, which is present in the Component Designer perspective called tFirstComponent, and then click on “Push Components to Palette”. Remember to always hit refresh whenever you add a file to the subfolder before you push your component to the Palette. Right now pushing your component to the Palette won’t do anything at all. It’ll say that the component has been published but it won’t be visible just yet. The reason for that is the fact that you need another file to actually make it appear. That file is the next one in line: the message property file.

This file is absolutely mandatory. Go ahead and create another file in the tFirstComponent subfolder called tFirstComponent_messages.properties. After that switch over to Talend and open the Component Designer perspective. The first thing you have to do is refresh the tFirstComponent subfolder. You’ll see that the newly created file’s been added. Now open the file and paste the following text in it:

LONG_NAME=My very own first component

You can add comments in this file by starting a line with the “#” character. The line above provides a label for the LONG_NAME parameter of the component. It’s that simple. Later on we’ll see more examples of this sort of labeling. This is just a standard line, which is always present in this file.
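The message property file uses the standard Java properties format, so you can read it with java.util.Properties if you ever want to check its content. Here’s a minimal sketch. The class name MessagesSketch is made up, and the comment line in the sample content is my own addition.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper that reads LONG_NAME from the content of a messages file.
class MessagesSketch {

    static String longName(String fileContent) {
        Properties messages = new Properties();
        try {
            messages.load(new StringReader(fileContent));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return messages.getProperty("LONG_NAME");
    }

    public static void main(String[] args) {
        // Lines starting with '#' are comments and are ignored when loading.
        String content = "# label shown for the component\n"
                + "LONG_NAME=My very own first component\n";
        System.out.println(longName(content)); // prints My very own first component
    }
}
```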

It’s now possible for you to push the component to the Palette so go ahead and do that. If you haven’t created an empty job yet, then please do it right now. Switch over to the Integration perspective. If you’ve followed this tutorial very carefully and you’ve pushed your component to the Palette, then you should see the family you’ve previously defined appear in the Palette. For me that would be “IntoData Tutorial”, for you it might be something else. If you click on this family, you should see your component in the shape of an element belonging to it.

Adding an icon

So there you have it! You can see your component in the Palette! However, do not drag your component to the job canvas and use it yet. I cannot stress this enough. If you do, a null pointer exception will pop up and ruin your day. Why exactly does this null pointer exception occur? Well, because we’re missing another element: an icon for the component. Remember, you need an icon with a size of 32*32 and it has to be in PNG format. Go ahead and make an icon or download one, whichever you prefer. Once you’ve done that, rename the icon to tFirstComponent_icon32.png and put it in the tFirstComponent folder. After that, all you have to do is hit refresh and push the component to the Palette once more. Switch over to the Integration perspective, check your component again and you’ll notice that the icon has changed. Drag the component to the job canvas and then run the job. Of course, as stated before, nothing happens, but there’s no exception either. On the canvas, the component is now displayed with the icon you provided.
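Before copying an icon into the folder, you could verify that it really is a 32*32 PNG. The sketch below does that with the standard ImageIO class. IconCheckSketch and its methods are names I made up, and the blank test image is generated in memory just to have something to check.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import javax.imageio.ImageIO;

// Hypothetical helper: checks that some bytes are a PNG image of 32 by 32 pixels.
class IconCheckSketch {

    // Every PNG file starts with these eight bytes.
    private static final byte[] PNG_SIGNATURE = {
        (byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A
    };

    static {
        ImageIO.setUseCache(false); // keep everything in memory, no temp files
    }

    static boolean isValidIcon(byte[] data) {
        if (!Arrays.equals(Arrays.copyOf(data, PNG_SIGNATURE.length), PNG_SIGNATURE)) {
            return false; // not a PNG at all
        }
        try {
            BufferedImage image = ImageIO.read(new ByteArrayInputStream(data));
            return image != null && image.getWidth() == 32 && image.getHeight() == 32;
        } catch (IOException e) {
            return false;
        }
    }

    // Creates an empty transparent PNG, only used to demonstrate the check.
    static byte[] blankPng(int width, int height) {
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            ImageIO.write(image, "png", out);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(isValidIcon(blankPng(32, 32))); // prints true
        System.out.println(isValidIcon(blankPng(16, 16))); // prints false
    }
}
```

For a real icon you’d read the bytes of tFirstComponent_icon32.png from disk and pass them to isValidIcon.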

Adding the JET files

The basic required files are all set up. The component is visible in the Palette and it can be used in a job. I’d say it’s about time to write some code! If you’ve followed this tutorial from beginning to end so far, you’ll notice a set of important files hasn’t been discussed yet: the JET files. Remember that I told you a component consists of sections, namely the begin, main and end section. Each section is represented by a JET file. Let’s start creating these files. Create three files and call them tFirstComponent_begin.javajet, tFirstComponent_main.javajet and tFirstComponent_end.javajet. Switch over to Talend, open the Component Designer perspective and hit refresh. The tFirstComponent folder should now contain six files: the XML descriptor file, the message property file, the icon and the three JET files.
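Since the file names all derive from the component name, the whole set can be listed with a few lines of Java. This is just a sketch to summarize the naming conventions used in this tutorial. The class ComponentFilesSketch is hypothetical and not something Talend provides.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that lists the file names used in this tutorial for a component.
class ComponentFilesSketch {

    static List<String> expectedFiles(String componentName) {
        List<String> files = new ArrayList<>();
        files.add(componentName + "_java.xml");            // XML descriptor
        files.add(componentName + "_messages.properties"); // message property file
        files.add(componentName + "_icon32.png");          // 32*32 icon
        files.add(componentName + "_begin.javajet");       // begin section
        files.add(componentName + "_main.javajet");        // main section
        files.add(componentName + "_end.javajet");         // end section
        return files;
    }

    public static void main(String[] args) {
        for (String file : expectedFiles("tFirstComponent")) {
            System.out.println(file);
        }
    }
}
```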

Okay, let’s proceed. Open the tFirstComponent_begin.javajet file. In each of the JET files classes are imported at the beginning of the file. These classes vary and are entirely based on your needs. If you need a certain class, you just import it there. In this tutorial we’re going to use some basic import statements. These statements are present when you use the Component Designer wizard to create these files so we can state that they’re default. Go ahead and paste the code below in the file:

<%@ jet
imports="
org.talend.core.model.process.INode
org.talend.core.model.process.ElementParameterParser
org.talend.core.model.metadata.IMetadataTable
org.talend.core.model.metadata.IMetadataColumn
org.talend.core.model.process.IConnection
org.talend.core.model.process.IConnectionCategory
org.talend.designer.codegen.config.CodeGeneratorArgument
org.talend.core.model.metadata.types.JavaTypesManager
org.talend.core.model.metadata.types.JavaType
"
%>

In this tutorial we’ll use these import statements in every JET file. Treat them as a reference: in your own components the imports can differ per file, depending on what each section needs. So go ahead and open the tFirstComponent_main.javajet and tFirstComponent_end.javajet files as well and paste the above code in those files.

The example I’ve given at the beginning of the tutorial, to show you guys how the sections work using the tJavaFlex component, will be used here again. I’d like you to open the tFirstComponent_begin.javajet file, if it isn’t open already, and paste the following code below the import statements (after the closing tag):

System.out.println("First section has been executed --> Start");

This is just plain Java output code. The point is to print something on the console when the component is used in a job and that job gets executed. Hit refresh and then push the component to the Palette. Switch over to the Integration perspective. If you’ve already created a job, you can drag the component to the job canvas and execute the job. If the component is on the canvas already, you’ll get a pop-up when executing the job, saying the component has to be reloaded. You should see the above message appear on the console.

I’d like to note that I had quite a few problems with this exact step in the beginning. The message was never displayed on the console, which means the changes hadn’t actually reached the Palette. Possible solutions for this are:

  • If the component was already present in a certain job, delete the component and then drag it back to the job canvas. After that execute the job again and it should work.
  • If the above doesn’t work, then you have to add an extra step to the solution. Basically all you have to do is go back to the Component Designer first, hit refresh again and push the component to the Palette. Then follow the above steps.
  • If these steps don’t solve your problem either, you can add one more step to the process. Just restart Talend first and then follow all of the previously described steps.

To complete the example, I’d like you to open the tFirstComponent_main.javajet and paste the following code below the import statements:

System.out.println("Second section has been executed --> Main");

After that open the tFirstComponent_end.javajet and paste the following code below the import statements:

System.out.println("Third section has been executed --> End");

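Then follow the usual two steps of refreshing and pushing the component to the Palette. Go to your job and execute it. The three messages should appear on the console in order: first the Start message, then the Main message and finally the End message.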

I’d like to expand the example with a loop to prove a certain point. The loop simply prints a line of text five times. Go to the tFirstComponent_begin.javajet file and paste the following code below the previously written code. Note the opening brace, which will only be closed in the end section:

for (int counter=1;counter<=5;counter++)
{

Open the tFirstComponent_main.javajet file and change the previously written code to:

System.out.println("The main section has been executed " + counter + " time(s)");

Open the tFirstComponent_end.javajet file and add a closing brace (“}”) above the existing line:

}
System.out.println("Third section has been executed --> End");

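Perform the usual steps in the Component Designer, switch over to the Integration perspective and execute your job. You should see the Start message, followed by the main section message five times with the counter running from 1 to 5, and finally the End message.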

The reason why I’ve asked you to add this loop was to point out something very interesting. I hope you noticed we opened the loop in the begin section, then added the output code in the main section and finally closed the loop in the end section. The point I’m trying to make here is that these sections are executed as one big block, one after another. Behind the scenes code snippets are joined together in that block. This allows for a certain amount of flexibility when writing code in these files.
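To make this more tangible, here’s roughly what the three snippets amount to once they’re joined. This is my own plain Java reconstruction, not the code Talend actually generates, which contains a lot more. The class JoinedSectionsSketch and the PrintStream parameter are only there to make the sketch runnable on its own.

```java
import java.io.PrintStream;

// Plain Java equivalent of the begin, main and end snippets placed one after another.
class JoinedSectionsSketch {

    static void run(PrintStream out) {
        // ---- begin section ----
        out.println("First section has been executed --> Start");
        for (int counter = 1; counter <= 5; counter++)
        {
            // ---- main section ----
            out.println("The main section has been executed " + counter + " time(s)");
            // ---- end section ----
        }
        out.println("Third section has been executed --> End");
    }

    public static void main(String[] args) {
        run(System.out);
        // prints the Start line, then the main line for counter 1 to 5, then the End line
    }
}
```

On its own, the begin snippet wouldn’t even compile because of the open brace. It only makes sense once the end snippet closes it.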

Let’s take a look at an interesting feature provided by Talend and change the code once again. When you use a certain component multiple times in the same job, there has to be a way to tell which variables in the code belong to which component. If that weren’t the case, you’d get errors all over your Talend job stating that the same variable is declared more than once. The solution to this problem is the UNIQUE NAME. Each instance of a component in a job gets its own unique name, so if you have two of the same components in the same job, each of them has a different unique name. By adding this unique name to your variable names, you keep them apart. So always add the unique name when creating a local variable. I’ll show you exactly how to do that. I can already give away that this is really easy.

First of all, open the tFirstComponent_begin.javajet file and add the following piece of template code between the import statements and the written code below it:

---- import statements ----
<%
CodeGeneratorArgument codeGenArgument = (CodeGeneratorArgument) argument;
INode node = (INode)codeGenArgument.getArgument();
String cid = node.getUniqueName();
%>
---- previously written code ----

When you use the Component Designer wizard, this piece of template code is added to every JET file automatically. Of course we’ve decided to do this manually, so you have to add it yourself. I’m not going to get into too much detail. What you need to know is that an instance of a component in a job is called a node. With the getUniqueName() method you can retrieve the unique name of that specific node. The unique name is then stored in the variable called cid (Component ID).

Let’s adjust the code we’ve previously written too. Change the code to the following:

System.out.println("First section has been executed --> Start");
for (int counter_<%=cid %>=1;counter_<%=cid %><=5;counter_<%=cid %>++)
{

With the <%= %> expression we’re passing values from the template code to the Java output code. Now go to the tFirstComponent_main.javajet file and add the same template code (the block that defines cid) below its import statements. Then change the previously written code to the following:

System.out.println("The main section has been executed " + counter_<%=cid %> + " time(s)");

Follow the usual steps in the Component Designer, then go ahead and execute your job again. Everything should still work. Now take a look at the code of the job itself by selecting the “Code” tab, which is right next to the “Designer” tab in the job window, and look for the counter variable. You’ll notice that every “_<%=cid %>” in the template has been replaced by the unique name of the component instance in the generated Java code of the job. This isn’t just a best practice but a necessity, so use it at all times!
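If you’d like to see the effect without opening Talend, the substitution can be imitated with a simple string replacement. Keep in mind that this is only an imitation: the real JET engine compiles the template rather than replacing text, and the instance names tFirstComponent_1 and tFirstComponent_2 are assumed here for illustration.

```java
// Imitates what happens to "<%=cid %>" for two instances of the same component.
class UniqueNameSketch {

    // Replaces every cid expression in the template line with the given unique name.
    static String render(String template, String cid) {
        return template.replace("<%=cid %>", cid);
    }

    public static void main(String[] args) {
        String template =
                "for (int counter_<%=cid %>=1;counter_<%=cid %><=5;counter_<%=cid %>++)";

        // Two instances in the same job end up with two different variable names.
        System.out.println(render(template, "tFirstComponent_1"));
        System.out.println(render(template, "tFirstComponent_2"));
    }
}
```

The first line comes out with counter_tFirstComponent_1 and the second with counter_tFirstComponent_2, so the two loops can live in the same job without clashing.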

In the third part of this tutorial we’ll add some more functionality to the component. This will include parameters.

 
