Hi all,

I'm hoping someone can shed a little light on this. I'm trying to write a script that searches through a massive CSV file (over 6 GB!) for duplicate lines. Ideally, the script would read the first line of the CSV, search the rest of the file for any copies of that line, then move on to the second line, and so on. I hope this doesn't complicate things too much, but I'd also like the script to delete any duplicates it finds, leaving a substantially smaller CSV with a single instance of each line.
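
To make the goal concrete, here is a minimal sketch of one way this might be done, assuming Python is an option. Note that it does not rescan the file for every line as described above (that would mean billions of comparisons on a 6 GB file). Instead it swaps in a single pass that remembers a short hash of each line already seen and writes only first occurrences to a new file. The function name dedupe_lines and its parameters are made up for illustration. It also assumes every record sits on one physical line (no quoted fields containing line breaks) and that a set of 16-byte digests, one per unique line, fits in memory.

```python
import hashlib


def dedupe_lines(src_path, dst_path):
    """Copy src_path to dst_path, keeping only the first copy of each line.

    Hypothetical helper for illustration. Returns (lines_read, lines_written).
    """
    seen = set()  # 16-byte digests of the lines written so far
    lines_read = 0
    lines_written = 0
    # Binary mode avoids decoding errors and leaves the bytes untouched.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:  # streams the file one line at a time
            lines_read += 1
            # Strip the line ending before hashing so a final line without
            # a trailing newline still matches earlier copies of itself.
            key = hashlib.blake2b(line.rstrip(b"\r\n"), digest_size=16).digest()
            if key in seen:
                continue  # duplicate: skip it
            seen.add(key)
            dst.write(line)
            lines_written += 1
    return lines_read, lines_written
```

Calling something like dedupe_lines("big.csv", "big_dedup.csv") (file names are placeholders) would leave the original untouched and write the smaller copy next to it, preserving line order. Storing digests rather than whole lines keeps memory use down, at the cost of a vanishingly small chance that two different lines hash to the same value. If even the digests do not fit in memory, an external sort followed by dropping adjacent duplicates may be a better fit.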

Any help would be greatly appreciated.

Cheers :)