How To Remove Double Quotes From Data In HIVE Table?

Hello! When ingesting data into Hadoop via Hive tables, we often run into a situation where the raw data contains double quotes. It looks something like this:

userid      name      occupation
"102030"    "Paul"    "Manager"
"405067"    "Alex"    "Technical Lead"
"124567"    "Raj"     "Software Engineer"
Double quotes in data within each cell of the HIVE Table

Of course, if the datasets are small, Excel will be our go-to tool. But what if you are dealing with huge datasets (more than 10 GB, say)?

Well, not to worry! To remove double quotes from data in a Hive table, Hive offers a built-in SerDe (serializer/deserializer) that you specify in the table definition.

The feature I’m referring to is the OpenCSV SerDe:

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'

The CREATE TABLE statement that handles the removal of double quotes is given below:

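-- OpenCSVSerde exposes every column as STRING, whatever type is declared,
-- so userid is declared as string here and can be cast at query time.
CREATE TABLE IF NOT EXISTS user_info_tab
(userid string,
 name string,
 occupation string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
"separatorChar" = "\;",
"quoteChar" = '\"'
);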
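
As a minimal sketch of how such a table might be populated and checked: the file path /tmp/user_info.csv below is hypothetical, and the file is assumed to hold semicolon-delimited rows with each field wrapped in double quotes. Because OpenCSVSerde returns every column as a string, the query casts userid where a number is needed.

```sql
-- Hypothetical path: point this at your own semicolon-delimited file.
LOAD DATA LOCAL INPATH '/tmp/user_info.csv'
OVERWRITE INTO TABLE user_info_tab;

-- The values come back without the enclosing double quotes.
-- OpenCSVSerde returns strings, so cast userid where a number is needed.
SELECT CAST(userid AS BIGINT) AS userid,
       name,
       occupation
FROM user_info_tab
LIMIT 10;
```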

Please note:

"separatorChar" = "\;" 

This specifies the delimiter of the raw dataset. In this case, the delimiter is a semicolon, so we are “escaping” it. Otherwise, Hive will throw an error, because the Hive CLI can treat an unescaped semicolon as the end of the statement.

If the delimiter is a comma, or a pipe, then you don’t need to escape it.
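
For comparison, here is a sketch of the same kind of definition for a pipe-delimited file, where the delimiter is written without a backslash. The table name user_info_pipe_tab is made up for illustration.

```sql
-- Hypothetical table for pipe-delimited input with double-quoted fields.
CREATE TABLE IF NOT EXISTS user_info_pipe_tab
(userid string,
 name string,
 occupation string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
"separatorChar" = "|",
"quoteChar" = '\"'
);
```

For comma-delimited, double-quoted files, both properties match the SerDe defaults.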

"quoteChar" = '\"'

This tells the SerDe which character encloses each field, so that the double quotes are stripped from the data when the table is read. Again, we have to “escape” the double quote while specifying it. The same applies to any other quote character that you want to remove.
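
If the fields in the raw data were wrapped in single quotes instead, one possible approach is to change the property on the existing table rather than recreate it. This is only a sketch, assuming the user_info_tab table defined earlier:

```sql
-- Switch the enclosing character from a double quote to a single quote.
ALTER TABLE user_info_tab
SET SERDEPROPERTIES ("quoteChar" = "'");

-- Check the SerDe properties that are now in effect.
DESCRIBE FORMATTED user_info_tab;
```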

So, with this simple yet effective method, you can easily remove the double quotes from data in a Hive table. The result will look something like this:

userid    name    occupation
102030    Paul    Manager
405067    Alex    Technical Lead
124567    Raj     Software Engineer
Double quotes removed from data in HIVE table

Please let me know if this was useful in the comments below. 🙂 Also, let me know if there are any queries/issues with the above method in the comments. 😀

Also, check out this cool Hadoop project, to enhance your HIVE skills!

Peace!
