Text Processing Flashcards
How can you pass code as an argument to the Ruby interpreter?
Use the -e flag when invoking Ruby:
ruby -e 'puts "Hello world"'
Implement cat
ruby -ne 'puts $_' file.txt
The -n switch acts as though the code you pass to Ruby was wrapped in the following:
while gets
  # code here
end
In short, this means that the code you pass in the -e argument is executed once for each line in your input. So, imagining that you had a file called file.txt, with the following content:
foo
bar
baz
Then invoking Ruby like so:
$ ruby -ne 'puts $_' file.txt
Will output:
foo
bar
baz
Congratulations! You’ve just implemented cat in Ruby.
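To make the implicit loop visible, here is a rough sketch of the cat one-liner with the -n wrapper written out by hand. It assumes an in-memory StringIO in place of file.txt so it runs on its own, and `ruby_cat` is a hypothetical helper name; it collects the output in a string instead of writing to the screen.

```ruby
require "stringio"

# Hypothetical helper: `ruby -ne 'puts $_'` with the -n loop made explicit.
def ruby_cat(input)
  out = +""
  while input.gets                         # StringIO#gets, like Kernel#gets, sets $_
    out << $_                              # puts adds no extra "\n" when $_ already ends in one...
    out << "\n" unless $_.end_with?("\n")  # ...but does add one to an unterminated last line
  end
  out
end

print ruby_cat(StringIO.new("foo\nbar\nbaz\n"))
```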
The -n switch
The -n switch acts as though the code you pass to Ruby was wrapped in the following:
while gets
  # code here
end
In short, this means that the code you pass in the -e argument is executed once for each line in your input.
$_: what is it?
Throughout these examples, you’ll perhaps have noticed the special global variable $_. When you invoke Ruby this way, each call to gets sets $_ to the line currently being processed (including its trailing newline), so printing only the lines that start with “f” is easy:
ruby -ne 'puts $_ if $_ =~ /^f/' file.txt
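A minimal sketch of where $_ comes from, assuming a StringIO stands in for the input file: StringIO#gets, like Kernel#gets, stores the line it just read in $_, so the filter can test it directly. The helper name `lines_starting_with_f` is hypothetical.

```ruby
require "stringio"

# Hypothetical helper mirroring `ruby -ne 'puts $_ if $_ =~ /^f/'`.
def lines_starting_with_f(input)
  matches = []
  while input.gets               # assigns the line just read to $_
    matches << $_ if $_ =~ /^f/  # $_ still carries its trailing "\n"
  end
  matches
end

p lines_starting_with_f(StringIO.new("foo\nbar\nbaz\nfizz\n"))
```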
Print all the lines in a file that begin with the letter f.
ruby -ne 'puts $_ if $_ =~ /^f/' file.txt
Explain this one-liner.
The -p switch
The -p switch acts similarly to -n, in that it loops over each of the lines in the input. However, it goes a bit further: after your code has finished, it always prints the value of $_. So, you can imagine it as:
while gets
  # code here
  print $_
end
It’s really useful, then, for doing transformations on the input. If you wanted to take every line you were given, but replace every instance of the letter e you found with the letter a, you could do:
echo "eats, shoots, and leaves" | ruby -pe '$_.gsub!("e", "a")'
aats, shoots, and laavas
Here, we modify the value of $_ in place, and this modified value is what’s printed to the screen.
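A rough sketch of the shape of the -p loop, under the assumption that a block can stand in for the -e code: the block receives the line (playing the role of $_) and may modify it in place, and the loop then appends the line with print semantics, so no newline is added. `p_loop` is a hypothetical name.

```ruby
require "stringio"

# Hypothetical helper: run the "-e code" (the block) for each line,
# then emit the possibly modified line, as -p does with `print $_`.
def p_loop(input)
  out = +""
  while (line = input.gets)
    yield line   # the code passed with -e; may mutate the line in place
    out << line  # print adds nothing after the line
  end
  out
end

result = p_loop(StringIO.new("eats, shoots, and leaves\n")) { |line| line.gsub!("e", "a") }
print result
```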
BEGIN block
Of course, our code here runs in a loop; what if we wanted to run something just once, before our loop starts? We might want to initialise a variable, for example.
In Ruby, we can use BEGIN blocks to do this. They allow us to execute code just once, at the start of the program.
So, to output line numbers from your input, you could do:
printf "foo\nbar\nbaz\n" | ruby -ne 'BEGIN { i = 1 }; puts "#{i} #{$_}"; i += 1'
Here, we initialise i to 1 at the start of the script. The BEGIN block runs only once, before the loop starts, so it is not re-run for each line; we can then increment i on every iteration, producing the following output:
1 foo
2 bar
3 baz
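The same BEGIN behaviour works in an ordinary script file, which makes it easy to try. This is a sketch that assumes a fixed word list as a stand-in for the lines -n would read: the BEGIN block runs once, before everything else in the file, and the local it assigns is shared with the top-level code that follows.

```ruby
# Runs once, before any other code in this file.
BEGIN { i = 1 }

numbered = []
# Stand-in (assumption) for the lines the -n loop would read from input.
%w[foo bar baz].each do |line|
  numbered << "#{i} #{line}"
  i += 1
end

puts numbered
```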
Double-space only non-blank lines.
ruby -ne 'print; puts unless ~/^$/'
Explain your code.
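One way to unpack the one-liner is to write the loop out. This sketch assumes a StringIO as input and uses a hypothetical helper name; it relies on two behaviours of the original: `print` with no arguments prints $_, and `~/^$/` matches the regex against $_, which only succeeds for an empty line.

```ruby
require "stringio"

# Hypothetical helper: `ruby -ne 'print; puts unless ~/^$/'` written out.
def double_space_non_blank(input)
  out = +""
  while input.gets            # sets $_ to the current line
    out << $_                 # `print` with no arguments prints $_
    out << "\n" unless ~/^$/  # bare `puts` prints "\n"; skipped for blank lines
  end
  out
end

print double_space_non_blank(StringIO.new("foo\n\nbar\n"))
```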
Precede each line by its file-specific line number (left-aligned)
ruby -pe 'print $<.file.lineno, "\t"'
How is the -p switch helping?
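Because -p prints $_ after the code runs, the code only has to print the number and a tab; the line itself follows automatically. In the one-liner, $< is ARGF and $<.file is the file currently being read, which is why the number is file-specific. This sketch assumes a single StringIO in place of that file; `number_with_tabs` is hypothetical.

```ruby
require "stringio"

# Hypothetical helper: `ruby -pe 'print $<.file.lineno, "\t"'` for one input.
def number_with_tabs(file)
  out = +""
  while (line = file.gets)
    out << "#{file.lineno}\t"  # the -e code: line number, then a tab
    out << line                # what -p prints after the code
  end
  out
end

print number_with_tabs(StringIO.new("foo\nbar\n"))
```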
Count the number of lines in a file
ruby -ne 'END{printf "%8d %s\n", $., $FILENAME}'
How can we solve this problem using a standard Unix utility?
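The -n loop body is empty here; all the work happens in END, after every line has been read. A rough equivalent, assuming a StringIO input whose lineno plays the role of $. and a caller-supplied name standing in for $FILENAME (`count_lines` is hypothetical):

```ruby
require "stringio"

# Hypothetical helper: read every line, then format the count as the END block does.
def count_lines(input, filename)
  input.gets until input.eof?                 # the -n loop with no body
  format("%8d %s\n", input.lineno, filename)  # END { printf "%8d %s\n", $., $FILENAME }
end

print count_lines(StringIO.new("foo\nbar\nbaz\n"), "file.txt")
```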
Print the sums of the fields of every line (expects fields to be integers).
ruby -ane 'puts $F.reduce(0){|sum,x| sum+x.to_i}'
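The -a switch makes Ruby run `$F = $_.split` at the start of each loop iteration, so every line is split on whitespace before your code sees it. A sketch over a string of lines, with a hypothetical helper name:

```ruby
# Hypothetical helper: per-line field sums, as `puts $F.reduce(0){...}` computes them.
def field_sums(text)
  text.each_line.map do |line|
    fields = line.split                         # what -a puts in $F
    fields.reduce(0) { |sum, x| sum + x.to_i }  # the -e code
  end
end

p field_sums("1 2 3\n10 20 30\n")
```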
Print the number of fields on each line, followed by the line.
ruby -ane 'BEGIN{$,="\t"}; print $F.size, $_'
ruby -ane 'printf "%3d %s", $F.size, $_'
Explain what is going on in the code above.
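The two versions differ only in how the count and the line are joined. In the first, setting $, (the output field separator) to a tab inside BEGIN makes `print` insert a tab between its arguments; in the second, printf right-aligns the count in a three-character column. A sketch of both for a single line, with hypothetical helper names:

```ruby
# Version 1: `print $F.size, $_` with $, = "\t" puts a tab between the arguments.
def with_tab(line)
  [line.split.size, line].join("\t")
end

# Version 2: `printf "%3d %s", $F.size, $_` pads the count to width 3.
def with_printf(line)
  format("%3d %s", line.split.size, line)
end

print with_tab("a b c\n"), with_printf("a b c\n")
```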
Print the last field of the last line.
ruby -ane 'END{puts $F.last}'
What will happen if you run this code on a CSV file?
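By default -a splits on whitespace, so a CSV line with no spaces comes back as a single field. The sketch below, using a hypothetical `last_field` helper and a made-up CSV line, compares that with splitting on commas, which is what Ruby's -F switch (for example `-F,`) changes on the command line. Note that the comma split keeps the trailing newline in the last field.

```ruby
# Hypothetical helper: the last field of a line, split the way -a would.
# A nil separator means whitespace splitting (assuming $; is left unset).
def last_field(line, separator = nil)
  line.split(separator).last
end

csv_line = "id,name,score\n"  # hypothetical last line of a CSV file
p last_field(csv_line)        # the whole line is one field
p last_field(csv_line, ",")   # the last comma-separated field, "\n" included
```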
Print every line with more than 4 fields.
ruby -ane 'print if $F.size > 4'
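A sketch of the same filter over a string of lines, assuming whitespace-separated fields as -a produces them; the helper name is hypothetical.

```ruby
# Hypothetical helper: keep lines with more than 4 fields, like `print if $F.size > 4`.
def lines_with_more_than_four_fields(text)
  text.each_line.select { |line| line.split.size > 4 }
end

p lines_with_more_than_four_fields("a b c d e\na b c\n1 2 3 4 5 6\n")
```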
Print every line for which the value of the last field is > 4.
ruby -ane 'print if $F.last.to_i > 4'
Explain every word of your code.
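Written out as a sketch (hypothetical helper, whitespace-separated fields assumed), the condition also shows why blank lines do not cause an error: splitting an empty line gives no fields, `last` returns nil, and `nil.to_i` is 0. Non-numeric text likewise converts to 0 with String#to_i.

```ruby
# Hypothetical helper: keep lines whose last field, as an integer, is > 4,
# like `print if $F.last.to_i > 4`.
def lines_with_last_field_over_four(text)
  text.each_line.select { |line| line.split.last.to_i > 4 }
end

p lines_with_last_field_over_four("a 5\nb 3\n\nc x\nd 10\n")
```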