Talend: tips and tricks part 4

Welcome to our next batch of Data Tips & Tricks. As an official Talend partner, we have more information to share on using the Data Integration and Data Management solutions in the Talend platform. That is why we are continuing our educational content with more tips for aspiring data engineers and IT professionals.

We are also sharing all our tips and tricks on channels like Instagram, TikTok, LinkedIn and YouTube. Feel free to join the communities over there and keep up to date on Talend, SQL, Data Integration, Analytics and much more!

In this post:

      • Talend: Temporary Files vs Hash Storage
      • Talend/Java: Using Context Variables in Routines
      • Talend/Java: Checking for Empty Files in a flow
      • Talend: Quickly Remapping in tMap components
      • Talend/Cloud: Connection Parameters in Talend Cloud

Talend: Temporary Files vs Hash Storage

When developing a job on the Talend platform you create a flow of data from start to finish. Connectors between components are called rows and there is always a ‘main’ row. This is the ‘main’ flow of data from component to component. Which data is passed through depends on the schema of the sending and receiving components.

Sometimes you pass through a limited set of data but might need to access all the starting rows later in the job. Especially for batch jobs or jobs with multiple processes and checks it can be useful to store and pass data to different parts. There are two easy solutions for this problem.

Temporary Files

When creating jobs for larger batches of data you can load the data into a temporary file, so as not to overload the memory of the machine that runs the job. Talend has a component called ‘tCreateTemporaryFile’ which creates a real file on the system to hold the data.

You can load the data into the file with any of the tFileOutput components and then reload the data through the file into other parts of the flow. While creating and loading the Temporary File might involve a few more steps and may sometimes be slower, it is a stable solution for use on large data sets.
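Outside of Talend, the same idea can be sketched in plain Java. The example below is only an illustration of the staging pattern, not the code that tCreateTemporaryFile or the tFileOutput components generate; the class name, method names and sample rows are assumptions made for this sketch.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class TempFileSketch {

    // Writes the rows to a new temporary file and returns its path.
    static Path stage(List<String> rows) throws IOException {
        Path tmp = Files.createTempFile("talend_stage_", ".csv");
        // Ask the JVM to remove the file when the program exits.
        tmp.toFile().deleteOnExit();
        Files.write(tmp, rows, StandardCharsets.UTF_8);
        return tmp;
    }

    // Reads the staged rows back for a later part of the flow.
    static List<String> reload(Path tmp) throws IOException {
        return Files.readAllLines(tmp, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        List<String> rows = Arrays.asList("1;Alice", "2;Bob", "3;Carol");
        Path tmp = stage(rows);
        System.out.println(reload(tmp).size() + " rows reloaded from " + tmp);
    }
}
```

The data lives on disk between the two steps, so the memory footprint stays small at the cost of extra file I/O.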

Hash Storage

If you want to pass data to a different stage of your job without using the main flow, you can use the tHashOutput and tHashInput components that Talend provides. These components keep the data in the memory of the machine running the job, so loading too much data can cause issues. For that reason they are considered technical components, and you can enable them in the project settings.
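To make the memory trade-off concrete, here is a conceptual Java sketch of an in-memory row store. It is not how tHashOutput and tHashInput are implemented; the class, its methods and the store name are hypothetical and only show that every stored row stays on the heap until it is cleared.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class InMemoryStoreSketch {

    // All rows are held in memory, keyed by a store name.
    private static final Map<String, List<String[]>> STORE = new HashMap<>();

    // Adds one row to the named store (the "output" side).
    static void put(String name, String[] row) {
        STORE.computeIfAbsent(name, k -> new ArrayList<>()).add(row);
    }

    // Returns the rows of the named store (the "input" side).
    static List<String[]> get(String name) {
        return STORE.getOrDefault(name, Collections.<String[]>emptyList());
    }

    // Frees the memory held by the named store.
    static void clear(String name) {
        STORE.remove(name);
    }

    public static void main(String[] args) {
        put("customers", new String[] {"1", "Alice"});
        put("customers", new String[] {"2", "Bob"});
        System.out.println(get("customers").size() + " rows held in memory");
        clear("customers");
    }
}
```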

 

Talend / Java: Using Context Variables in Routines

When you want to create reusable Java code in Talend you can create what is called a Routine. This is a Java class in Talend that you can call from almost anywhere to use in your data integration flows. When creating factorized code in your Routine you might need to use Context Variables to have even more execution flexibility.

When developing your User Routine you can parameterize your functions so that Context Variables can be passed in when you call them from a job. Instead of hard coding the parameters when calling the routine, pass in the context variables. Keep in mind that the routine expects certain data types for its parameters, so you might need to convert your context variables to the right type when passing them in.
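A minimal sketch of such a parameterized routine could look like the following. The class name, the method and the context variable names (baseDir, fileName, version) are hypothetical; in Talend Studio a user routine would normally sit in the routines package, which is left out here so the example runs on its own.

```java
public class FileNameRoutines {

    // Builds a target path from values that the calling job supplies.
    // Nothing is hard coded, so the job can pass in context variables.
    public static String buildTargetPath(String baseDir, String fileName, int version) {
        String dir = baseDir.endsWith("/") ? baseDir : baseDir + "/";
        return dir + fileName + "_v" + version + ".csv";
    }

    public static void main(String[] args) {
        // In a job the call might look like this (hypothetical variable names):
        //   FileNameRoutines.buildTargetPath(context.baseDir, context.fileName,
        //           Integer.parseInt(context.version))
        // Here a String stands in for a context variable that must be converted to int.
        String versionFromContext = "3";
        String path = buildTargetPath("/data/out", "customers",
                Integer.parseInt(versionFromContext));
        System.out.println(path); // prints /data/out/customers_v3.csv
    }
}
```

The Integer.parseInt call shows the kind of conversion that is needed when a context variable is stored as a String but the routine expects an int.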

Talend / Java: Checking for Empty Files in a Flow

It is often good practice (and saves resources) to process only files that contain data. Checking whether a file is empty before starting the complete flow can be much more efficient. This tip shows two similar approaches: a Talend component combined with some Java code, or Java code alone. Either way, processing only starts if the file is not empty.

The Talend Component Way

We are using a tFileProperties component to assess the properties of each file coming into the process. We also dropped a tJavaRow component into the design workspace to implement the logic that sets a Global Variable to a Boolean depending on the output of the properties component. Below is the simple snippet of Java code used.

// The size column from tFileProperties holds the file size in bytes
if (input_row.size == 0) {
    globalMap.put("isEmpty", true);
} else {
    globalMap.put("isEmpty", false);
}

 

The Java Code Way

In the Java code way, we use Java to inspect the file instead of the Talend component. While this reduces the number of components in a flow, it can become harder to figure out later what the goal of the Java code was. Since Talend is a low-code platform, and one of the goals of using Talend is to make data integration modular, it might be easier in the long run to use the Talend components. Even so, it is always good to know multiple solutions and understand the purpose of the integration.

// Point to the file that should be checked
java.io.File newFile = new java.io.File("C:/ExampleFile");

// length() returns the size in bytes
if (newFile.length() == 0) {
    System.out.println("File is empty.");
    globalMap.put("isEmpty", true);
} else {
    System.out.println("File is not empty.");
    globalMap.put("isEmpty", false);
}
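One detail to be aware of: java.io.File.length() also returns 0 when the file does not exist, so a missing file looks the same as an empty one. If that distinction matters, a standalone helper along these lines could be used. This is only a sketch; the class and method names are assumptions, and inside a Talend job the result would still be stored with globalMap.put as shown above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class EmptyFileCheck {

    // Returns true when the path is not a regular file or contains zero bytes.
    static boolean isMissingOrEmpty(Path file) throws IOException {
        if (!Files.isRegularFile(file)) {
            return true; // missing file or a directory
        }
        return Files.size(file) == 0L;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("empty_check_", ".txt");
        System.out.println(isMissingOrEmpty(tmp)); // true: file exists but is empty

        Files.write(tmp, "data".getBytes(StandardCharsets.UTF_8));
        System.out.println(isMissingOrEmpty(tmp)); // false: file now has content

        Files.delete(tmp);
        System.out.println(isMissingOrEmpty(tmp)); // true: file no longer exists
    }
}
```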

Talend: Quickly remapping in tMap Components

Mapping is an important part of a Data Integration solution and can be error-prone when changes need to be made. Typing changes to source and target columns by hand isn’t always viable when many mapped items need to change. That is why Talend lets you drag and drop your mappings. Dragging normally adds a source column to a target, but when you hold the CTRL key it replaces the existing mapping.

As an example, we have three source columns and three target columns, which are mapped in a 1-1 fashion. Our goal is to change the mapping by shifting all source columns down one place. Below are our original mapping (A) and our goal (B).

(A)

(B)

Normally, dragging a source column onto an already mapped target column adds the field behind the existing one in the mapping. This is called ‘append mode’. It would take multiple actions to either type in the new source column or remove the mapping and drag the new column in. An easier way is to hold the CTRL key while dragging the new source column on top of the target. This is called ‘overwrite mode’. It will show a popup explaining that you are going to replace a mapping (C).

(C)

Talend / Cloud: Connection Parameters in Talend Cloud

When using Talend v8 you can use Talend Cloud as your Management Tool and it is accessible anywhere with an internet connection. As a part of Talend Cloud you can easily manage and share connections between the jobs you develop. To set up connections in the Talend Cloud you can use the Connection Parameters in your jobs. These are special context parameters that are recognized in the Management Console on the Cloud.

To use the connection parameters, your Context Variables need to follow a specific naming format. This ensures that any job using these parameters can access the connection in the cloud, making operational management a lot easier and more centralized. The format is as follows:

“connection_<applicationName>_<parameterName>”

There are some preset keywords for the applicationName that Talend Cloud recognizes, so you can set up connections to common database types or file transfer systems. These also use common parameterNames to recognize, for example, a host, username, or password. Important to note here is the use of camelCase: join all words together and capitalize the first letter of each new word.
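As a small illustration of the naming format, the Java sketch below builds and checks such a variable name. The helper class, the regular expression and the example names (myDatabase, hostName) are assumptions made for this sketch. They are not preset Talend keywords and not the validation that Talend Cloud itself applies.

```java
public class ConnectionParamName {

    // Builds a name in the form connection_<applicationName>_<parameterName>.
    static String name(String applicationName, String parameterName) {
        return "connection_" + applicationName + "_" + parameterName;
    }

    // Assumed check: two alphanumeric parts after the "connection_" prefix.
    static boolean matchesFormat(String contextVariableName) {
        return contextVariableName.matches("connection_[A-Za-z0-9]+_[A-Za-z0-9]+");
    }

    public static void main(String[] args) {
        String host = name("myDatabase", "hostName");
        System.out.println(host);                          // connection_myDatabase_hostName
        System.out.println(matchesFormat(host));           // true
        System.out.println(matchesFormat("db_host_name")); // false: prefix is missing
    }
}
```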

 

When you’ve published your job to the Cloud you can create the connection and access it through those context variables. The application you set up in the cloud is the applicationName in the format of the context variable, making it recognizable for Talend Studio. Now, any job you develop can use these context variables to make use of the connection in Talend Cloud.

 

That’s it for another 5 Data Tips from IntoData. There are more Data Tips and other educational content available on our blog. You can follow us on Instagram, TikTok, LinkedIn, YouTube and for more information don’t hesitate to contact us!

Want to exchange ideas?

If you have questions about these tips and tricks, you can reach us here. Send us your questions and comments and we will be happy to help!