Reading .csv files with and without headers using different delimiters in Mule 4

Flat files especially .csv files are very common for data loading and data transfers. We generally come across scenarios where we want to read a .csv file, which may or may not contain the header row. Also, there is a chance .csv files don't have ','(comma) as delimiter rather has ''|"(pipe) as a delimiter or any other character as a delimiter. This article will discuss various ways to read such files in Mule 4 and create a JSON output of it.

Mule Flow

This will be a simple flow, where we will be reading a file using a file connector. Then convert the read data to JSON using transform message. Finally, log the JSON payload to the console using Logger. Here, we are not doing any complex transformation, it will be more like a MIME type conversion from application/csv to application/json, for better understanding the structure of input data.

Steps:

  • Create a flow in your mule application.
  • Drag and drop file connector with operation "On New or Updated file" on the source of mule flow.
  • In file connector properties under the general tab, specify or create a new connector configuration. Provide input directory, and polling frequency(here it is kept as 10 seconds). Specify the post-processing action like deleting the file, moving the file, or renaming the file. As per the below configurations, the mule application will look for a file in the inbound folder every 10 seconds. If the file is available, then it will process and move the file to the archive directory, deleting the file from the inbound location.
  • Drag and drop "Transform Message" from mule palette, and have output mime type as "application/json" as below.
  • Drag and drop logger from mule palette. Here we will be logging payload after transformation.
  • The final mule flow will look as below.

.csv file having header row in the file content

Let us take below employeeData.csv file as an example. For simplicity, we are taking dummy data of only 5 employees.
id,first name,last name,email,gender,designation
1,Dom,Lissett,dlissett0@test.com,Male,Recruiter
2,Susan,Moogan,smoogan1@test.com,Female,Executive Secretary
3,Carl,Cotesford,lcotesford2@test.com,Male,Tech Architect
4,Debor,Sprakes,dsprakes3@test.com,Female,Account Coordinator
5,Priscella,Genny,pgenny5@test.com,Female,Environmental Tech

In this example, data in the first row signifies the headers separated by "," as a delimiter. Rest all other rows represent employee data.
  • Since the above .csv file contains a header row, all we need to do is to set the MIME type of file connector as "application/csv". To do so, click on the file connector. Click on MIME Type and select "application/csv" from the drop-down list of values.
  • Place the file as defined in the file connector. Run the application. We will see the JSON response logged in the console as shown below.
INFO  2021-07-27 23:08:55,916 [[MuleRuntime].uber.03: [csv-read].csv-readFlow.CPU_INTENSIVE @1e42a374] [processor: csv-readFlow/processors/1; event: 895d1200-efb1-11eb-8134-d4d25289cb8d] org.mule.runtime.core.internal.processor.LoggerMessageProcessor: [
  {
    "id": "1",
    "first name": "Dom",
    "last name": "Lissett",
    "email": "dlissett0@test.com",
    "gender": "Male",
    "designation": "Recruiter"
  },
  {
    "id": "2",
    "first name": "Susan",
    "last name": "Moogan",
    "email": "smoogan1@test.com",
    "gender": "Female",
    "designation": "Executive Secretary"
  },
  {
    "id": "3",
    "first name": "Carl",
    "last name": "Cotesford",
    "email": "lcotesford2@test.com",
    "gender": "Male",
    "designation": "Tech Architect"
  },
  {
    "id": "4",
    "first name": "Debor",
    "last name": "Sprakes",
    "email": "dsprakes3@test.com",
    "gender": "Female",
    "designation": "Account Coordinator"
  },
  {
    "id": "5",
    "first name": "Priscella",
    "last name": "Genny",
    "email": "pgenny5@test.com",
    "gender": "Female",
    "designation": "Environmental Tech"
  }
]

Data in the first row became the keys for the JSON output.

.csv file without header row in the file content

Let us assume there is no header row in the content of employeeData.csv
1,Dom,Lissett,dlissett0@test.com,Male,Recruiter
2,Susan,Moogan,smoogan1@test.com,Female,Executive Secretary
3,Carl,Cotesford,lcotesford2@test.com,Male,Tech Architect
4,Debor,Sprakes,dsprakes3@test.com,Female,Account Coordinator
5,Priscella,Genny,pgenny5@test.com,Female,Environmental Tech

If we try to run with the same configurations, it will take the first row as keys and console output will look like.
INFO  2021-07-27 23:21:15,609 [[MuleRuntime].uber.03: [csv-read].csv-readFlow.CPU_INTENSIVE @1e42a374] [processor: csv-readFlow/processors/1; event: 4254cbd0-efb3-11eb-8134-d4d25289cb8d] org.mule.runtime.core.internal.processor.LoggerMessageProcessor: [
  {
    "1": "2",
    "Dom": "Susan",
    "Lissett": "Moogan",
    "dlissett0@test.com": "smoogan1@test.com",
    "Male": "Female",
    "Recruiter": "Executive Secretary"
  },
  {
    "1": "3",
    "Dom": "Carl",
    "Lissett": "Cotesford",
    "dlissett0@test.com": "lcotesford2@test.com",
    "Male": "Male",
    "Recruiter": "Tech Architect"
  },
  {
    "1": "4",
    "Dom": "Debor",
    "Lissett": "Sprakes",
    "dlissett0@test.com": "dsprakes3@test.com",
    "Male": "Female",
    "Recruiter": "Account Coordinator"
  },
  {
    "1": "5",
    "Dom": "Priscella",
    "Lissett": "Genny",
    "dlissett0@test.com": "pgenny5@test.com",
    "Male": "Female",
    "Recruiter": "Environmental Tech"
  }
]

By default, after setting MIME type to "application/json", mule considers the first row as the header row. In such cases, we don't want to consider the first row as the header row.
  • Click on the file connector. Click on MIME Type. Click on add parameter. Set key as "header" and value as "false". Save the changes.
Now, by using the same file, we will be able to read data properly. But in this case, autogenerated values for keys like coloumn_0, column_1, etc. will be populated. Below is a sample log snippet.
INFO  2021-07-27 23:35:57,158 [[MuleRuntime].uber.05: [csv-read].csv-readFlow.CPU_INTENSIVE @6fbaa2fe] [processor: csv-readFlow/processors/1; event: 4fbf0590-efb5-11eb-8134-d4d25289cb8d] org.mule.runtime.core.internal.processor.LoggerMessageProcessor: [
  {
    "column_0": "1",
    "column_1": "Dom",
    "column_2": "Lissett",
    "column_3": "dlissett0@test.com",
    "column_4": "Male",
    "column_5": "Recruiter"
  },
  {
    "column_0": "2",
    "column_1": "Susan",
    "column_2": "Moogan",
    "column_3": "smoogan1@test.com",
    "column_4": "Female",
    "column_5": "Executive Secretary"
  },
  {
    "column_0": "3",
    "column_1": "Carl",
    "column_2": "Cotesford",
    "column_3": "lcotesford2@test.com",
    "column_4": "Male",
    "column_5": "Tech Architect"
  },
  {
    "column_0": "4",
    "column_1": "Debor",
    "column_2": "Sprakes",
    "column_3": "dsprakes3@test.com",
    "column_4": "Female",
    "column_5": "Account Coordinator"
  },
  {
    "column_0": "5",
    "column_1": "Priscella",
    "column_2": "Genny",
    "column_3": "pgenny5@test.com",
    "column_4": "Female",
    "column_5": "Environmental Tech"
  }
]

.csv file without header row and having a "|" as a delimiter

If we replace "," delimiter with "|" delimiter in employeeData.csv
1|Dom|Lissett|dlissett0@test.com|Male|Recruiter
2|Susan|Moogan|smoogan1@test.com|Female|Executive Secretary
3|Carl|Cotesford|lcotesford2@test.com|Male|Tech Architect
4|Debor|Sprakes|dsprakes3@test.com|Female|Account Coordinator
5|Priscella|Genny|pgenny5@test.com|Female|Environmental Tech

On running the same mule application with the above data, the output of the above employeeData.csv file will be.

INFO  2021-07-27 23:42:06,938 [[MuleRuntime].uber.05: [csv-read].csv-readFlow.CPU_INTENSIVE @6fbaa2fe] [processor: csv-readFlow/processors/1; event: 2c2f0f20-efb6-11eb-8134-d4d25289cb8d] org.mule.runtime.core.internal.processor.LoggerMessageProcessor: [
  {
    "column_0": "1|Dom|Lissett|dlissett0@test.com|Male|Recruiter"
  },
  {
    "column_0": "2|Susan|Moogan|smoogan1@test.com|Female|Executive Secretary"
  },
  {
    "column_0": "3|Carl|Cotesford|lcotesford2@test.com|Male|Tech Architect"
  },
  {
    "column_0": "4|Debor|Sprakes|dsprakes3@test.com|Female|Account Coordinator"
  },
  {
    "column_0": "5|Priscella|Genny|pgenny5@test.com|Female|Environmental Tech"
  }
]

Now, we are getting only one key-value pair for each row. By default mule considers "," as delimiter for "application/csv" MIME type data. We need to explicitly define "|" as a delimiter in parameters under the MIME type.

  • Click on the file connector. Click on MIME Type. Click on add parameter. Set Key as "separator" and Value as "|". Save the changes.


Now, by using the same file, we will be able to read data properly as shown below.

INFO  2021-07-27 23:51:57,557 [[MuleRuntime].uber.06: [csv-read].csv-readFlow.CPU_INTENSIVE @6efd4175] [processor: csv-readFlow/processors/1; event: 8c31a710-efb7-11eb-8134-d4d25289cb8d] org.mule.runtime.core.internal.processor.LoggerMessageProcessor: [
  {
    "column_0": "1",
    "column_1": "Dom",
    "column_2": "Lissett",
    "column_3": "dlissett0@test.com",
    "column_4": "Male",
    "column_5": "Recruiter"
  },
  {
    "column_0": "2",
    "column_1": "Susan",
    "column_2": "Moogan",
    "column_3": "smoogan1@test.com",
    "column_4": "Female",
    "column_5": "Executive Secretary"
  },
  {
    "column_0": "3",
    "column_1": "Carl",
    "column_2": "Cotesford",
    "column_3": "lcotesford2@test.com",
    "column_4": "Male",
    "column_5": "Tech Architect"
  },
  {
    "column_0": "4",
    "column_1": "Debor",
    "column_2": "Sprakes",
    "column_3": "dsprakes3@test.com",
    "column_4": "Female",
    "column_5": "Account Coordinator"
  },
  {
    "column_0": "5",
    "column_1": "Priscella",
    "column_2": "Genny",
    "column_3": "pgenny5@test.com",
    "column_4": "Female",
    "column_5": "Environmental Tech"
  }
]

References:


Please share your valuable feedback 😊
This is a repost of the LinkedIn article....



Comments

Popular posts from this blog

DateTime formatting using xp20:format-dateTime ()

Create Delimited String from XML Nodes and Vice Versa in SOA 12c

Import and Export MDS artifacts in SOA 12c