Question
Answer and Explanation
To sort a file and print only unique values using shell commands, you can combine the `sort` and `uniq` utilities. Here's a breakdown of how to do it:
1. The `sort` Command:
- The `sort` command arranges the lines of a file in ascending order, by default. This is crucial because `uniq` only detects consecutive duplicate lines.
- The basic syntax is: `sort filename.txt`.
2. The `uniq` Command:
- The `uniq` command collapses adjacent duplicate lines into a single line. It compares each line only with the one before it, so on unsorted input it misses duplicates that are not consecutive.
- To print one copy of each distinct line, run `uniq` on input that is already sorted.
3. Combining `sort` and `uniq`:
- You use a pipe (|) to send the output of `sort` as input to `uniq`. This combination ensures that all identical lines are adjacent for `uniq` to process effectively.
- The command to achieve this is: `sort filename.txt | uniq`.
4. Example:
- Suppose you have a file named `data.txt`:
apple
banana
apple
orange
banana
grape
- Running the command `sort data.txt | uniq` will produce the following output:
apple
banana
grape
orange
5. `uniq` Options:
- The `uniq -c` option counts occurrences and displays them: `sort data.txt | uniq -c` will show how many times each unique line appears.
- The `uniq -d` option prints only the duplicated lines: `sort data.txt | uniq -d` will list, once each, the lines that appeared more than once in the original file.
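- The `uniq -u` option prints only the lines that occur exactly once. This is not the same as plain `uniq`, which prints one copy of every distinct line: for `data.txt`, `sort data.txt | uniq -u` prints only grape and orange, because apple and banana are repeated.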
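The three options can be compared side by side on the sample data. This is a minimal sketch: the temporary directory and the variable names (`workdir`, `counts`, `repeated`, `singles`) are assumptions made for the demonstration, not part of the original answer.

```shell
# Recreate the sample file in a throwaway directory (location is an assumption)
workdir=$(mktemp -d)
printf '%s\n' apple banana apple orange banana grape > "$workdir/data.txt"

# -c: prefix each distinct line with its count
# (the amount of padding before the number varies between implementations)
counts=$(sort "$workdir/data.txt" | uniq -c)

# -d: one copy of each line that occurs more than once
repeated=$(sort "$workdir/data.txt" | uniq -d)

# -u: only the lines that occur exactly once
singles=$(sort "$workdir/data.txt" | uniq -u)

printf 'uniq -c:\n%s\n\n' "$counts"
printf 'uniq -d:\n%s\n\n' "$repeated"
printf 'uniq -u:\n%s\n' "$singles"

rm -rf "$workdir"
```

With this input, `-c` should report 2 for apple and banana and 1 for grape and orange, `-d` should list apple and banana, and `-u` should list grape and orange.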
In Summary:
By combining the `sort` command with `uniq`, you can sort a file and extract its unique values with two standard shell commands, which is useful for data cleaning and analysis tasks. Remember that the `sort` step is essential because `uniq` compares only adjacent lines, so duplicates must sit next to each other to be detected.
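To see why the sort step matters, it may help to run `uniq` on the unsorted sample file and compare line counts. The following is a rough sketch; the temporary directory and the variable names `unsorted_count` and `sorted_count` are hypothetical and chosen only for this illustration.

```shell
# Recreate the sample file in a throwaway directory (location is an assumption)
workdir=$(mktemp -d)
printf '%s\n' apple banana apple orange banana grape > "$workdir/data.txt"

# Without sorting, no two identical lines are adjacent, so uniq removes nothing.
# tr strips the padding that some wc implementations add around the number.
unsorted_count=$(uniq "$workdir/data.txt" | wc -l | tr -d ' ')

# After sorting, duplicates are adjacent and each run collapses to one line.
sorted_count=$(sort "$workdir/data.txt" | uniq | wc -l | tr -d ' ')

echo "uniq alone:  $unsorted_count lines"
echo "sort | uniq: $sorted_count lines"

rm -rf "$workdir"
```

On the six-line sample, `uniq` alone should leave all 6 lines in place, while `sort | uniq` should reduce them to 4.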