
Sqoop export: Export data from HDFS to a database table

In this post, I am going to explain how to export data from HDFS to a database table.

 

Step 1: Let’s create a database and an employee table.

 

Log in to the MySQL shell.

[cloudera@quickstart ~]$ mysql -u root -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 75
Server version: 5.1.73 Source distribution

Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>

 

‘cloudera’ is the password for the Cloudera QuickStart VM instance.

 

Create a database ‘export_demo’.

mysql> CREATE DATABASE export_demo;
Query OK, 1 row affected (0.00 sec)
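
To verify the database exists before continuing (an optional check, not part of the original session):

mysql> SHOW DATABASES LIKE 'export_demo';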

Create an employee table in the export_demo database.

CREATE TABLE employee (id INT, name VARCHAR(20), age INT);

mysql> USE export_demo;
Database changed
mysql>
mysql> CREATE TABLE employee (id INT, name VARCHAR(20), age INT);
Query OK, 0 rows affected (0.01 sec)

mysql>
mysql> DESCRIBE employee;
+-------+-------------+------+-----+---------+-------+
| Field | Type        | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| id    | int(11)     | YES  |     | NULL    |       |
| name  | varchar(20) | YES  |     | NULL    |       |
| age   | int(11)     | YES  |     | NULL    |       |
+-------+-------------+------+-----+---------+-------+
3 rows in set (0.00 sec)
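
Note that the table has no primary key, which keeps this demo simple, but it also means that re-running the same export will insert every row again. If you want re-runs to update existing rows instead, a variant with a primary key (a sketch, not part of the original walkthrough) pairs with Sqoop's --update-key option:

CREATE TABLE employee (
  id   INT PRIMARY KEY,  -- unique ids; with sqoop export --update-key id, re-runs become UPDATEs
  name VARCHAR(20),
  age  INT
);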

Step 2: Let’s create an emp.csv file and copy it to HDFS.

 

emp.csv

1,Ram,31
2,Krishna,32
3,Joel,41
4,Shankar,38
5,Shanthi,48
6,Sameer,29
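
One way to create this file on the local file system (any text editor works just as well):

[cloudera@quickstart ~]$ cat > emp.csv << 'EOF'
> 1,Ram,31
> 2,Krishna,32
> 3,Joel,41
> 4,Shankar,38
> 5,Shanthi,48
> 6,Sameer,29
> EOF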

Copy the emp.csv file to the HDFS location /export_demo.

[cloudera@quickstart ~]$ hadoop fs -mkdir /export_demo
[cloudera@quickstart ~]$

[cloudera@quickstart ~]$ hadoop fs -copyFromLocal emp.csv /export_demo
[cloudera@quickstart ~]$

[cloudera@quickstart ~]$ hadoop fs -ls /export_demo
Found 1 items
-rw-r--r-- 1 cloudera supergroup 70 2022-04-04 09:37 /export_demo/emp.csv
[cloudera@quickstart ~]$

[cloudera@quickstart ~]$ hadoop fs -cat /export_demo/emp.csv
1,Ram,31
2,Krishna,32
3,Joel,41
4,Shankar,38
5,Shanthi,48
6,Sameer,29

Step 3: Export the content of the emp.csv file to the employee table.

sqoop export \
--connect "jdbc:mysql://quickstart.cloudera:3306/export_demo" \
--username "root" \
--password "cloudera" \
--table "employee" \
--export-dir /export_demo/emp.csv
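
By default, Sqoop export expects comma-separated fields, which matches emp.csv, so no delimiter options are needed here. If your input file used a different separator, say tabs, you would tell Sqoop explicitly (a sketch with a hypothetical emp.tsv file):

sqoop export \
--connect "jdbc:mysql://quickstart.cloudera:3306/export_demo" \
--username "root" \
--password "cloudera" \
--table "employee" \
--export-dir /export_demo/emp.tsv \
--input-fields-terminated-by '\t'

Running the original command produces the following output.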

[cloudera@quickstart ~]$ sqoop export \
> --connect "jdbc:mysql://quickstart.cloudera:3306/export_demo" \
> --username "root" \
> --password "cloudera" \
> --table "employee" \
> --export-dir /export_demo/emp.csv
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
22/04/04 09:42:37 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.13.0
22/04/04 09:42:37 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
22/04/04 09:42:38 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
22/04/04 09:42:38 INFO tool.CodeGenTool: Beginning code generation
22/04/04 09:42:38 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employee` AS t LIMIT 1
22/04/04 09:42:38 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employee` AS t LIMIT 1
22/04/04 09:42:38 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-cloudera/compile/c9f4658e25d6663f2b3cec91b53fba16/employee.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
22/04/04 09:42:40 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/c9f4658e25d6663f2b3cec91b53fba16/employee.jar
22/04/04 09:42:40 INFO mapreduce.ExportJobBase: Beginning export of employee
22/04/04 09:42:40 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
22/04/04 09:42:40 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
22/04/04 09:42:41 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
22/04/04 09:42:41 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
22/04/04 09:42:41 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
22/04/04 09:42:42 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
22/04/04 09:42:43 WARN hdfs.DFSClient: Caught exception
java.lang.InterruptedException
	at java.lang.Object.wait(Native Method)
	at java.lang.Thread.join(Thread.java:1281)
	at java.lang.Thread.join(Thread.java:1355)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)
(the same WARN and stack trace are logged twice more; the job is not affected)
22/04/04 09:42:43 INFO input.FileInputFormat: Total input paths to process : 1
22/04/04 09:42:43 INFO input.FileInputFormat: Total input paths to process : 1
22/04/04 09:42:43 INFO mapreduce.JobSubmitter: number of splits:4
22/04/04 09:42:43 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
22/04/04 09:42:43 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1649003113144_0010
22/04/04 09:42:44 INFO impl.YarnClientImpl: Submitted application application_1649003113144_0010
22/04/04 09:42:44 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1649003113144_0010/
22/04/04 09:42:44 INFO mapreduce.Job: Running job: job_1649003113144_0010
22/04/04 09:42:51 INFO mapreduce.Job: Job job_1649003113144_0010 running in uber mode : false
22/04/04 09:42:51 INFO mapreduce.Job: map 0% reduce 0%
22/04/04 09:43:05 INFO mapreduce.Job: map 25% reduce 0%
22/04/04 09:43:08 INFO mapreduce.Job: map 50% reduce 0%
22/04/04 09:43:09 INFO mapreduce.Job: map 100% reduce 0%
22/04/04 09:43:10 INFO mapreduce.Job: Job job_1649003113144_0010 completed successfully
22/04/04 09:43:11 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=683716
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=799
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=19
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=0
    Job Counters
        Launched map tasks=4
        Data-local map tasks=4
        Total time spent by all maps in occupied slots (ms)=56387
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=56387
        Total vcore-milliseconds taken by all map tasks=56387
        Total megabyte-milliseconds taken by all map tasks=57740288
    Map-Reduce Framework
        Map input records=6
        Map output records=6
        Input split bytes=596
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=615
        CPU time spent (ms)=2250
        Physical memory (bytes) snapshot=502194176
        Virtual memory (bytes) snapshot=6032302080
        Total committed heap usage (bytes)=243007488
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=0
22/04/04 09:43:11 INFO mapreduce.ExportJobBase: Transferred 799 bytes in 29.0627 seconds (27.4923 bytes/sec)
22/04/04 09:43:11 INFO mapreduce.ExportJobBase: Exported 6 records.
[cloudera@quickstart ~]$
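
As the Sqoop log itself warns, passing --password on the command line is insecure. Two safer alternatives (sketches; the password-file path below is hypothetical, adjust it to your setup). First, -P prompts for the password interactively:

sqoop export \
--connect "jdbc:mysql://quickstart.cloudera:3306/export_demo" \
--username "root" \
-P \
--table "employee" \
--export-dir /export_demo/emp.csv

Or store the password in a file only you can read and point Sqoop at it (echo -n avoids a trailing newline, which would otherwise become part of the password):

echo -n "cloudera" > /home/cloudera/.mysql-pass
chmod 400 /home/cloudera/.mysql-pass

sqoop export \
--connect "jdbc:mysql://quickstart.cloudera:3306/export_demo" \
--username "root" \
--password-file file:///home/cloudera/.mysql-pass \
--table "employee" \
--export-dir /export_demo/emp.csv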

Let’s query the employee table and confirm that the records were exported.

mysql> SELECT * FROM employee;
+------+---------+------+
| id   | name    | age  |
+------+---------+------+
|    6 | Sameer  |   29 |
|    1 | Ram     |   31 |
|    2 | Krishna |   32 |
|    3 | Joel    |   41 |
|    4 | Shankar |   38 |
|    5 | Shanthi |   48 |
+------+---------+------+
6 rows in set (0.00 sec)
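
The rows come back in arbitrary order because four map tasks wrote to the table in parallel. If you need a deterministic order, sort in the query:

mysql> SELECT * FROM employee ORDER BY id;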


Note

Sqoop jobs, whether import or export, are map-only jobs; there is no reducer phase.
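
Also, the number of map tasks for an export defaults to 4, which is why the log above shows "number of splits:4" and "Launched map tasks=4". For a tiny file like emp.csv you can reduce the parallelism with -m (a sketch):

sqoop export \
--connect "jdbc:mysql://quickstart.cloudera:3306/export_demo" \
--username "root" \
--password "cloudera" \
--table "employee" \
--export-dir /export_demo/emp.csv \
-m 1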



