+2 votes

My CSV looks like this:

123, xxx, yyy
abc, xxx, yyy
000, xxx, yyy
555, xxx, yyy
000, xxx, yyy
000, xxx, yyy
def, xxx, yyy
ghi, xxx, yyy
746, xxx, yyy

I have to count how many (distinct) values in the first field are numbers.
What I've tried:

grep "^[0-9]" input.csv | sed "s/,.*//" | uniq | wc -l

The problem: it prints 5, but there are only 4 distinct numbers: 123, 000, 555, 746


2 Answers

+2 votes
Best answer

From man uniq:

Filter adjacent matching lines from INPUT (or standard input), writing to OUTPUT (or standard output).
Note: 'uniq' does not detect repeated lines unless they are adjacent.
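
To see the adjacency rule from the man page in action, here is a minimal sketch that feeds the numeric first-column values from the question straight into uniq, with and without sorting first. The variable names are only illustrative.

```shell
# Numeric first-field values from the question, in their original order.
values='123
000
555
000
000
746'

# uniq alone: only the two adjacent 000 lines collapse into one,
# so the earlier, separate 000 is still counted.
unsorted_count=$(printf '%s\n' "$values" | uniq | wc -l)

# sort first: all 000 lines become adjacent, so uniq removes every repeat.
sorted_count=$(printf '%s\n' "$values" | sort | uniq | wc -l)

echo "uniq only:   $unsorted_count"
echo "sort | uniq: $sorted_count"
```

The first count should match the 5 reported in the question, and the second should be the expected 4.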

You want to sort before counting:

$ grep "^[0-9]\+," input.csv | sed "s/,.*//" | sort | uniq | wc -l

Or, more concisely, let sort drop the duplicates itself:

$ grep "^[0-9]\+," input.csv | sed "s/,.*//" | sort -u | wc -l
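
For anyone who wants to try this without an input.csv at hand, the following is a self-contained sketch that recreates the sample data from the question in a temporary file and runs an equivalent pipeline. It swaps in grep -E and cut as an assumption about portability: \+ in a basic regular expression is a GNU extension, while + under -E is standard. The temporary file and the variable names are hypothetical.

```shell
# Recreate the sample CSV from the question in a temporary file.
csv=$(mktemp)
cat > "$csv" <<'EOF'
123, xxx, yyy
abc, xxx, yyy
000, xxx, yyy
555, xxx, yyy
000, xxx, yyy
000, xxx, yyy
def, xxx, yyy
ghi, xxx, yyy
746, xxx, yyy
EOF

# Keep lines whose first field is all digits, take that field,
# then sort with -u so each distinct value is counted once.
count=$(grep -E '^[0-9]+,' "$csv" | cut -d, -f1 | sort -u | wc -l)

echo "distinct numeric values: $count"

# Clean up the temporary file.
rm -f "$csv"
```

With the sample data this should report 4 distinct values (000, 123, 555 and 746).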
+2 votes

If you are OK with using Ruby as a text processor:

ruby -r csv -e 'p CSV.parse(ARGF.read, col_sep: ",").filter_map {|e| e.first if e.first =~ /\A\d+\z/}.uniq.length' FILENAME

+1 Thanks, this also works great, but I've selected the other as the best answer because it only uses bash.


Technically, tools like grep, sed, sort and wc are not part of Bash but separate programs: sort, uniq and wc come from coreutils, while grep and sed are packages of their own! :)

Contributions licensed under CC0