April 24th 2022

When you import data into HDFS using the ‘sqoop import’ command, the column values in each record are separated by ‘,’ by default.

How to customize the field or column separator?

Using the --fields-terminated-by option, we can customize the field separator.
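One common reason to change the separator is that a column value may itself contain a comma, which makes comma-separated output ambiguous for a simple splitter. The sketch below illustrates this with a hypothetical record (the values are made up for illustration and do not come from retail_db). It only uses printf and awk, so it runs without Sqoop or Hadoop.

```shell
#!/bin/sh
# Hypothetical record whose street address contains a comma (illustrative values only)
ID=10
NAME="Jane"
STREET="12 Main St, Apt 4"
CITY="Austin"

# Render the same four columns with two different field separators
comma_row() { printf '%s,%s,%s,%s\n' "$ID" "$NAME" "$STREET" "$CITY"; }
pipe_row()  { printf '%s|%s|%s|%s\n' "$ID" "$NAME" "$STREET" "$CITY"; }

# Count the fields a naive splitter sees for each separator
comma_row | awk -F',' '{ print NF }'   # prints 5: the address was split in two
pipe_row  | awk -F'|' '{ print NF }'   # prints 4: one field per column
```

Picking a separator that is unlikely to appear in the data, such as |, avoids this kind of mis-split.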

Example

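The --where filter in this example is assumed to be customer_id < 10, which is consistent with the nine records retrieved in the output.

sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username "root" \
--password "cloudera" \
--table "customers" \
--target-dir /field_separator_demo \
-m 1 \
--where "customer_id < 10" \
--fields-terminated-by '|'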

[cloudera@quickstart ~]$ sqoop import \
> --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
> --username "root" \
> --password "cloudera" \
> --table "customers" \
> --target-dir /field_separator_demo \
> -m 1 \
> --where "customer_id < 10" \
> --fields-terminated-by '|'
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
22/04/03 21:27:40 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.13.0
22/04/03 21:27:40 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
22/04/03 21:27:41 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
22/04/03 21:27:41 INFO tool.CodeGenTool: Beginning code generation
22/04/03 21:27:41 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `customers` AS t LIMIT 1
22/04/03 21:27:41 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `customers` AS t LIMIT 1
22/04/03 21:27:41 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-cloudera/compile/d2f02433d03afccf7129901fd95ce877/customers.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
22/04/03 21:27:43 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/d2f02433d03afccf7129901fd95ce877/customers.jar
22/04/03 21:27:43 WARN manager.MySQLManager: It looks like you are importing from mysql.
22/04/03 21:27:43 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
22/04/03 21:27:43 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
22/04/03 21:27:43 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
22/04/03 21:27:43 INFO mapreduce.ImportJobBase: Beginning import of customers
22/04/03 21:27:43 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
22/04/03 21:27:43 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
22/04/03 21:27:44 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
22/04/03 21:27:44 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
22/04/03 21:27:45 WARN hdfs.DFSClient: Caught exception 
java.lang.InterruptedException
	at java.lang.Object.wait(Native Method)
	at java.lang.Thread.join(Thread.java:1281)
	at java.lang.Thread.join(Thread.java:1355)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)
22/04/03 21:27:46 WARN hdfs.DFSClient: Caught exception 
java.lang.InterruptedException
	at java.lang.Object.wait(Native Method)
	at java.lang.Thread.join(Thread.java:1281)
	at java.lang.Thread.join(Thread.java:1355)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)
22/04/03 21:27:46 INFO db.DBInputFormat: Using read commited transaction isolation
22/04/03 21:27:46 INFO mapreduce.JobSubmitter: number of splits:1
22/04/03 21:27:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1649003113144_0006
22/04/03 21:27:46 INFO impl.YarnClientImpl: Submitted application application_1649003113144_0006
22/04/03 21:27:47 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1649003113144_0006/
22/04/03 21:27:47 INFO mapreduce.Job: Running job: job_1649003113144_0006
22/04/03 21:27:53 INFO mapreduce.Job: Job job_1649003113144_0006 running in uber mode : false
22/04/03 21:27:53 INFO mapreduce.Job:  map 0% reduce 0%
22/04/03 21:27:59 INFO mapreduce.Job:  map 100% reduce 0%
22/04/03 21:27:59 INFO mapreduce.Job: Job job_1649003113144_0006 completed successfully
22/04/03 21:27:59 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=171819
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=87
		HDFS: Number of bytes written=673
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=1
		Other local map tasks=1
		Total time spent by all maps in occupied slots (ms)=3727
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=3727
		Total vcore-milliseconds taken by all map tasks=3727
		Total megabyte-milliseconds taken by all map tasks=3816448
	Map-Reduce Framework
		Map input records=9
		Map output records=9
		Input split bytes=87
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=48
		CPU time spent (ms)=580
		Physical memory (bytes) snapshot=139567104
		Virtual memory (bytes) snapshot=1510182912
		Total committed heap usage (bytes)=60751872
	File Input Format Counters 
		Bytes Read=0
	File Output Format Counters 
		Bytes Written=673
22/04/03 21:27:59 INFO mapreduce.ImportJobBase: Transferred 673 bytes in 14.7801 seconds (45.5341 bytes/sec)
22/04/03 21:27:59 INFO mapreduce.ImportJobBase: Retrieved 9 records.
[cloudera@quickstart ~]$

Let’s print the contents of the folder ‘/field_separator_demo’ and confirm that | is used as the field separator.

[cloudera@quickstart ~]$ hadoop fs -cat /field_separator_demo/*
1|Richard|Hernandez|XXXXXXXXX|XXXXXXXXX|6303 Heather Plaza|Brownsville|TX|78521
2|Mary|Barrett|XXXXXXXXX|XXXXXXXXX|9526 Noble Embers Ridge|Littleton|CO|80126
3|Ann|Smith|XXXXXXXXX|XXXXXXXXX|3422 Blue Pioneer Bend|Caguas|PR|00725
4|Mary|Jones|XXXXXXXXX|XXXXXXXXX|8324 Little Common|San Marcos|CA|92069
5|Robert|Hudson|XXXXXXXXX|XXXXXXXXX|10 Crystal River Mall |Caguas|PR|00725
6|Mary|Smith|XXXXXXXXX|XXXXXXXXX|3151 Sleepy Quail Promenade|Passaic|NJ|07055
7|Melissa|Wilcox|XXXXXXXXX|XXXXXXXXX|9453 High Concession|Caguas|PR|00725
8|Megan|Smith|XXXXXXXXX|XXXXXXXXX|3047 Foggy Forest Plaza|Lawrence|MA|01841
9|Mary|Perez|XXXXXXXXX|XXXXXXXXX|3616 Quaking Street|Caguas|PR|00725
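As a quick check of the result, the pipe-separated records can be split with awk. This is a minimal sketch: the helper names sample_rows, extract_name_state and count_fields are made up for illustration, and the sample rows are copied from the output above so that the snippet runs without a cluster. On the cluster, the input would instead come from hadoop fs -cat /field_separator_demo/*.

```shell
#!/bin/sh
# Sample rows copied from the import output above (first three records)
sample_rows() {
cat <<'EOF'
1|Richard|Hernandez|XXXXXXXXX|XXXXXXXXX|6303 Heather Plaza|Brownsville|TX|78521
2|Mary|Barrett|XXXXXXXXX|XXXXXXXXX|9526 Noble Embers Ridge|Littleton|CO|80126
3|Ann|Smith|XXXXXXXXX|XXXXXXXXX|3422 Blue Pioneer Bend|Caguas|PR|00725
EOF
}

# Split each record on '|' and print fields 1, 2 and 8 (id, first name, state)
extract_name_state() {
  awk -F'|' '{ print $1, $2, $8 }'
}

# Print the distinct field counts; a single value means every row has the same width
count_fields() {
  awk -F'|' '{ print NF }' | sort -u
}

sample_rows | extract_name_state
sample_rows | count_fields
```

For the three sample rows, the first command prints the id, first name and state of each customer, and the second prints 9, the number of columns in each record.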

This post first appeared on Java Tutorial : Blog To Learn Java Programming.