Remove duplicate lines from text files (with sort)

A quick method to remove duplicate lines from text files, including CSV files. Duplicates often build up when records are appended (perhaps automatically) at different times, leaving multiple copies of the same record scattered throughout the file. Here is a simple bash one-liner that removes them using sort.

This method is sensitive to the line endings of the file. If a file has been edited on a combination of Unix/Linux/Mac/Windows systems, it may contain a mix of line endings, and two otherwise identical records with different line endings will not be treated as duplicates.

The simplest route is to run the file through dos2unix before attempting the sort/unique filter.
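
As a minimal sketch of that normalize-then-deduplicate step: the file name sample.csv and its contents are made up for illustration, and the tr fallback is an assumption for systems where dos2unix is not installed (it strips every carriage return, not only those at line ends).

```shell
# Hypothetical sample: the record "a,1" saved once with a Windows (CRLF)
# ending and once with a Unix (LF) ending
printf 'a,1\r\nb,2\na,1\n' > sample.csv

# Before normalizing, the two "a,1" lines differ by a trailing carriage
# return, so sort -u is likely to keep both of them
sort -u sample.csv | wc -l

# Normalize the line endings in place
if command -v dos2unix >/dev/null 2>&1; then
    dos2unix sample.csv
else
    # Fallback (assumption): remove all carriage returns with tr
    tr -d '\r' < sample.csv > sample.tmp && mv sample.tmp sample.csv
fi

# Now the duplicate records really are identical lines
sort -u sample.csv -o sample.csv
cat sample.csv
```

After normalization the file should contain each record exactly once.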

In a bash shell enter:

sort -u file.csv -o file.csv

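This takes your file, sorts it (using sort), keeps only one copy of each distinct line (-u) and writes the result to the output file (-o), which here is the same as the input file. Note that the output is in sorted order, so the original line order is not preserved, and a CSV header row will be sorted in along with the data.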
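
To see the command in action, here is a small sketch using a made-up file, records.csv, with duplicates scattered through it. It also shows why -o is used rather than a shell redirect: sort reads all of its input before opening the -o file, whereas a redirect to the same file is truncated by the shell before sort ever reads it.

```shell
# Hypothetical sample data with duplicate records scattered through the file
printf 'c,3\na,1\nb,2\na,1\nc,3\n' > records.csv

# Safe in-place de-duplication: output goes back to the input file via -o
sort -u records.csv -o records.csv

cat records.csv
# a,1
# b,2
# c,3

# Unsafe (left commented out): the shell empties records.csv before sort
# reads it, so the result would be an empty file
# sort -u records.csv > records.csv
```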

This post first appeared on Martin Fitzpatrick – Python Coder, Postgraduate.
