How to use Ruby instead of sed and awk
Many unix utilities like sed, awk and grep provide powerful ways to manipulate text. But I always need to dig through the man pages and tutorials before I can do anything with them.
This morning, I needed to remove all the empty lines from a text file. Searching for ways to do this using unix tools turned up a few options:
awk 'NF' input.txt
sed -i '/^$/d' input.txt
grep -v '^$' input.txtRemembering how to use these tools is always a challenge, so I decided to look at how to do this in Ruby. Ruby allows us to pass one-liner scripts from the command line, which lets us use it in the same way we would use awk.
Before we try replacing sed or awk with Ruby, let’s look at how we can run simple Ruby one-liners from the command line. For example:
$ ruby -e 'puts 42'
42Running this prints “42” to the console,
as you might have guessed.
The -e flag tells Ruby to read the script
from the command line,
and therefore executes puts 42.
Next, let’s look that the -n flag
which lets you pipe in text to Ruby,
and execute some code for each line of text.
$ echo 'foo' | ruby -n -e 'puts $_.upcase'
FOO$_ is a special variable that contains the last line read from STDIN.
In this case, it prints out ‘FOO’.
This also works with multiple lines of input.
Say we have a file foo.txt
with the words foo, bar and baz on each line:
$ touch foo.txt
$ echo 'foo' >> foo.txt
$ echo 'bar' >> foo.txt
$ echo 'baz' >> foo.txt
$ cat foo.txt
foo
bar
bazAnd we want to print them in uppercase.
$ cat foo.txt | ruby -n -e 'puts $_.upcase'
FOO
BAR
BAZHere, the -n flag takes each line being piped in,
and puts it in $_.
This is the equivalent of doing this:
while gets
puts $_.upcase
endThere are other interesting things we could do with this.
We could use BEGIN and END blocks to sort the lines in a file.
$ cat foo.txt | ruby -ne 'BEGIN{ $x=[]}; $x << $_.chomp; END { puts $x.sort }'
bar
baz
fooThe BEGIN block is executed before it starts processing the lines,
so we initialize a global variable to contain the lines.
The $x << $_.chomp line adds each line to the array.
The END block is executed after all lines have been processed.
Now, let’s look at the -a flag
that splits the input and stores it in
a variable $F.
If we put the following text in a file:
$ touch matz.txt
$ echo 'matz:ruby' >> matz.txt
$ echo 'guido:python' >> matz.txt
$ echo 'brendan:js' >> matz.txt
$ echo 'jose:elixir' >> matz.txt
$ cat matz.txt
matz:ruby
guido:python
brendan:js
jose:elixirand we need to extract the programming language names, we could do it like this:
$ cat matz.txt | ruby -a -F: -ne 'puts $F[1]'
ruby
python
js
elixirThat finally brings me to the original problem that I was trying to solve - remove empty lines from a text file:
$ touch empty_lines.txt
$ echo 'lorem ipsum' >> empty_lines.txt
$ echo '' >> empty_lines.txt
$ echo 'lorem ipsum' >> empty_lines.txt
$ echo '' >> empty_lines.txt
$ echo 'lorem ipsum' >> empty_lines.txt
$ echo '' >> empty_lines.txt
$ echo 'lorem ipsum' >> empty_lines.txt
$ cat empty_lines.txt
lorem ipsum
lorem ipsum
lorem ipsum
lorem ipsumAnd now, all we need to do to remove those empty lines is:
$ cat empty_lines.txt | ruby -ne 'puts $_ unless $_.chomp.empty?'
lorem ipsum
lorem ipsum
lorem ipsum
lorem ipsumAnd if we wanted to write it to a file, we can just pipe the output.
$ cat empty_lines.txt | ruby -ne 'puts $_ unless $_.chomp.empty?' >> out.txtAlthough special purpose tools like awk are very powerful, we can still use Ruby as a unix utility if we want to.
Links
- Awk-ward Ruby (an excellent essay by Ryan Tomayko about Ruby’s awk-like features)
- Text processing one-liners: Ruby vs. Awk