About using Talend Data

    Some time ago I was working on ETL and came across Talend, a software suite that specializes in ETL processes.

    Talend is offered as several products, each with a different set of functions.

    All of these products are built on Eclipse.

    Here I mainly use Data Quality and Big Data. Data Quality runs without much trouble, but Big Data tends to freeze and crash. If you have a solid-state drive, try installing and running Big Data from it.

    Data Quality is simple and quick to get started with. As the name suggests, it analyzes the data in a table. Normally, when we want to find abnormal data in a table, we go through the rows one by one. That kind of traversal is accurate at catching abnormal data, and it is acceptable for a small amount of data. With a large amount of data, going row by row is still reliable, but the time cost is too high; after all, people's energy is limited. The role of Data Quality is to let us quickly see roughly what data the table contains.
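
    To make the idea of a quick overview concrete, here is a minimal sketch in Python. It is not Talend code: the sample rows and the `profile_column` helper are made up for illustration, and a real analysis would read from the actual table.

```python
from collections import Counter

# Hypothetical sample rows; in practice these would come from the table being analyzed.
rows = [
    {"id": 1, "city": "Beijing"},
    {"id": 2, "city": "beijing"},
    {"id": 3, "city": None},
    {"id": 4, "city": "Shanghai"},
    {"id": 5, "city": "Beijing"},
]


def profile_column(rows, column):
    """Summarize one column: row count, missing values, distinct values, frequencies."""
    values = [row.get(column) for row in rows]
    non_null = [v for v in values if v is not None]
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "frequencies": Counter(non_null).most_common(),
    }


summary = profile_column(rows, "city")
print(summary)
```

    A summary like this shows the missing value and the inconsistent spelling ("Beijing" vs "beijing") at a glance, without reading every row.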

    Of course, you can't expect a piece of software to point out all the problematic data in one go. It can only give a rough picture of which fields and values appear. It works mainly through SQL statements. Put simply, the SQL statements we would otherwise write ourselves for data cleansing are packaged up for us, so we can run a specific SQL statement by clicking a few buttons instead of writing it by hand. The early stage of ETL should be an iterative process: run it once, check whether the result still contains abnormal data, then adjust the ETL conditions and run it again.
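
    The following sketch shows the kind of SQL that a profiling tool might run behind its buttons, and how the iteration plays out. It uses Python's built-in `sqlite3` module; the `customer` table, its data, and the `profile` and `abnormal_rows` helpers are assumptions made for illustration, not the statements Talend actually generates.

```python
import sqlite3

# Hypothetical table and data, used only to illustrate the idea.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, age INTEGER)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(1, 34), (2, None), (3, -5), (4, 34), (5, 210)],
)


def profile(conn):
    """Basic indicators for customer.age: rows, nulls, distinct values, min, max."""
    return conn.execute(
        "SELECT COUNT(*), "
        "SUM(CASE WHEN age IS NULL THEN 1 ELSE 0 END), "
        "COUNT(DISTINCT age), MIN(age), MAX(age) "
        "FROM customer"
    ).fetchone()


def abnormal_rows(conn, low, high):
    """Rows whose age is missing or outside the accepted range [low, high]."""
    return conn.execute(
        "SELECT id, age FROM customer "
        "WHERE age IS NULL OR age < ? OR age > ? ORDER BY id",
        (low, high),
    ).fetchall()


print(profile(conn))

# First pass with a loose condition, then a second pass with a tighter one
# after seeing that an implausible value (210) was still accepted.
first_pass = abnormal_rows(conn, 0, 300)
second_pass = abnormal_rows(conn, 0, 120)
print(first_pass)
print(second_pass)
```

    The second pass catches a row that the first one let through, which is the point of iterating: each run's result tells you how to adjust the conditions for the next.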
