20 August 2007

redirect file list to grep, ls, less, more and such...

Quite soon I understood the problem: grep, ls, less, more and similar commands do not take their file list from STDIN; they expect it as command-line arguments. So you cannot pipe a file list (produced, for example, by some other command chain) into such commands:
find . -name "*.foo" | grep bar # it doesn't work: this filters the file names, not the file contents
A quick solution is backtick command substitution:
grep bar `find . -name "*.foo"`
If your keyboard layout doesn't have the backtick, this can be quite cumbersome. OK, I set a keyboard shortcut to change the layout on the fly, but I still have to switch back and forth repeatedly...
The next solution is to use the other syntax for command substitution:
grep bar $(find . -name "*.foo")
but finally I discovered the right solution, xargs, which keeps the commands in their natural piping order:
find . -name "*.foo" | xargs grep bar
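One caveat with the plain pipe: xargs splits its input on whitespace, so a file name containing a space reaches grep in pieces. A minimal sketch of the NUL-delimited variant, assuming a find and xargs that support -print0 and -0 (the GNU and BSD versions do). The sandbox directory and file names are made up for illustration:

```shell
# Hypothetical sandbox: two .foo files, one with a space in its name.
workdir=$(mktemp -d)
printf 'bar here\n' > "$workdir/plain.foo"
printf 'bar too\n'  > "$workdir/with space.foo"

# -print0 and -0 separate file names with NUL bytes instead of whitespace,
# so "with space.foo" reaches grep as a single argument.
matches=$(find "$workdir" -name "*.foo" -print0 | xargs -0 grep -l bar | sort)
printf '%s\n' "$matches"

rm -rf "$workdir"
```

grep -l prints only the names of the matching files, which makes it easy to see that both names survived the trip through the pipe.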
Enjoy.

23 comments:

Edo said...

great! I didn't know that!

Marco said...

Mmm... sounds a bit complicated... 'find' has an -exec option that can perfectly work around the 'grep' limitation:

find . -name "*.foo" -exec grep bar {} \;

Have fun. M.
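A side note on the -exec form: ending it with \; runs one grep per file, while ending it with + lets find pass many file names to a single grep, much as xargs does. A small sketch under assumed file names (the sandbox is hypothetical):

```shell
# Hypothetical sandbox: only a.foo contains the string "bar".
workdir=$(mktemp -d)
printf 'bar\n' > "$workdir/a.foo"
printf 'baz\n' > "$workdir/b.foo"

# "{} +" appends as many file names as fit to one grep invocation.
hits=$(find "$workdir" -name "*.foo" -exec grep -l bar {} +)
printf '%s\n' "$hits"

rm -rf "$workdir"
```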

hronir said...

Thanks, Marco.
The point is the case where you're not using find to get the file list for the next command. For example, when you get the file list from a three-line chain of awk'n'grep'n'cut commands that parse a single log file... :)
Anyway, I later found xargs rather less useful than it seems from this post. In fact it does not correctly handle, for example, a file list made of wildcards. Let me give you a simplified example from a three-line chain of commands of mine:
echo "*.pdf" | xargs ls #it doesn't work!
So I went back to the backtick solution
ls `echo "*.pdf *.doc"`
which is not so elegant (mainly because it breaks the pipe flow of my three-line chain), but at least it has worked in every situation I have dealt with...
Cheers.
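The reason for the failure is that wildcards are expanded by the shell, not by ls or xargs: xargs hands the literal string *.pdf to ls. One possible way to stay in the pipe is to let xargs start a small shell that does the expansion. This is only a sketch with made-up file names, and it assumes the patterns themselves contain no spaces:

```shell
# Hypothetical sandbox with three files.
workdir=$(mktemp -d)
touch "$workdir/a.pdf" "$workdir/b.doc" "$workdir/c.txt"

# xargs passes the patterns as arguments to "sh -c"; the unquoted $*
# is then glob-expanded by that inner shell before ls runs.
# The trailing "sh" fills $0, so the patterns land in $1, $2, ...
listing=$(cd "$workdir" && echo "*.pdf *.doc" | xargs sh -c 'ls $*' sh)
printf '%s\n' "$listing"

rm -rf "$workdir"
```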

Cristian ++ said...

You don't have to change the keyboard layout to get the backtick: on the Italian keyboard you get it with
ALT GR + ' (apostrophe).
Enjoy!!

hronir said...

Great!
Thanx, Cristian Cantoro! :)

treponemanichols said...

Computers are so strange... ;)

The command

find . -name "*.foo" | grep bar

works flawlessly on my (plain vanilla) Linux box.

Some commands that usually don't take their input from STDIN may be persuaded to do it using a "-" (dash) as the filename, e.g. you may want to try this:

find . -name "*.foo" | grep bar -

But I never had any problem feeding grep, less or more with data from STDIN. Maybe your situation is more complex than what I can figure from the post...

Bye, Enrico

hronir said...

Hi Enrico,
with the command
find . -name "*.foo" | grep bar
you get the filelist of .foo files which have the "bar" (sub)string in their name.
What I was looking for is to grep the string "bar" inside each of the *.foo files. This is what the xargs trick lets you do.
Anyway, yes, later on I did discover the "trailing minus" trick... :)
Thanx, bye

treponemanichols said...

>What I was looking for is to grep the string "bar" inside each of the *.foo files.

Yup! Now it's clear, I knew I was missing something.

Just for the sake of mentioning every single way of doing this same thing, for jobs like that I have always tended to use a "for" loop, in bash:

for filename in `find . -name "*.foo"`; do grep bar $filename ; done

but this breaks the piping chain - I find it good for readability, but this may just be the way my brain is wired. In this form it's just as powerful as the backtick trick, but loops (with tests and bash's mathematical capabilities) can grow pretty powerful.

Well, this small exchange at least opened my eyes to the usefulness of xargs :-)

Bye, Enrico
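A note on the loop form: a for loop over backtick output splits on every space, so a name like "my notes.foo" becomes two words. A common alternative is a while/read loop, sketched here with hypothetical file names. It copes with spaces, though not with newlines inside file names:

```shell
# Hypothetical sandbox: one matching file with a space in its name.
workdir=$(mktemp -d)
printf 'bar\n'     > "$workdir/my notes.foo"
printf 'nothing\n' > "$workdir/other.foo"

# read -r takes one whole line per iteration; IFS= keeps leading and
# trailing blanks, and quoting "$filename" keeps it a single argument.
found=$(find "$workdir" -name "*.foo" | while IFS= read -r filename; do
    if grep -q bar "$filename"; then
        printf '%s\n' "$filename"
    fi
done)
printf '%s\n' "$found"

rm -rf "$workdir"
```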

hronir said...

Yes, the for loop is more readable.
The point is when you are working bottom-up, building your many-line command step by step, adding one filter command after another. There is nothing sacred about the pipe chain, so you are allowed to break it :) it simply goes in the same direction as you do when you are writing not a script but a deeply nested command-line statement...
Concerning opening eyes, your misunderstanding finally made it clear to me why such commands do not work (and could never work) in my situations: how could they possibly know whether I want to grep the file list itself as text, or each file that the text refers to?
BTW, I was imprecise in the opening of this post: these commands do take input from STDIN, while what I was looking for is a way to give them arguments from STDIN! OK, the name xargs was supposed to give me some hint about that... :)

Cheers!

Mau said...

The claim that those commands do not accept input from STDIN did sound strange to me... but I did not want to play the "saputello" (know-it-all). As a little contribution, I can add that you can use $() instead of the backticks. Did you know?

hronir said...

Mau, Mau... you are not merely allowed, you are required to play the "saputello"!
Otherwise I will.
For example, you fail: the $() syntax for command substitution was already mentioned in this very post! Moreover, in the comments, Cristian ++ gave me the trick for getting the backtick even on an Italian-layout keyboard...!
Mau Mau... :)

Anonymous said...

Thanks!

hronir said...

You are welcome!

Mohammed AbdelRahman said...

Thanks, but that doesn't remove the need for an STDIN feed, because arguments are always limited in total length by the shell. This applies to xargs, the backticks and the $(command) syntax.

hronir said...

Yes, Mohammed, there is the "Argument list too long" problem, which is related to the number of pages that the kernel allocates for command-line arguments (see the article "Beyond Arguments and Limitations" in Linux Journal).

Thanks for your comment!
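For what it's worth, xargs is less exposed to this limit than the two substitution syntaxes: instead of building one huge command line, it splits its input into as many invocations of the command as needed to stay under the system limit. A small sketch of that batching; the -n 2 option is used here only to force tiny batches so the splitting becomes visible:

```shell
# The system limit on total argument length, in bytes (varies by OS).
getconf ARG_MAX

# Five input items, at most two per invocation: echo runs three times,
# printing "1 2", "3 4" and "5" on separate lines.
batches=$(printf '%s\n' 1 2 3 4 5 | xargs -n 2 echo | wc -l | tr -d ' ')
echo "$batches"
```

Without -n, xargs picks the batch size itself, so a command that aggregates over all files at once (a single sorted listing, say) can still be split across several runs.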

jade said...

How about a nested backtick?

$ grep secondstring `grep -l searchstring *`

works for me
-joviano dias

hronir said...

Yep, Jade, I think this is just the "short solution" mentioned in the post...

PS
Happy to see how this more-than-5-years-old post still keeps on being a hit, here at the outer boundary of the blogosphere.

BernardK said...

xargs works fine for me on OS X Mountain Lion. I have lost a file, so I want to find all files containing a certain string that I know:

find / -ls | sed -E 's/ +/,/g' | cut -d , -f 3,11 | grep '^-' | cut -d , -f 2 | xargs grep 'fn_size'

Thanks for all these tips.

hronir said...

Thank you for your feedback, BernardK!

BernardK said...

Oops! I didn't check carefully enough before posting...
It doesn't work if directory names contain spaces. The corrected command could be:

find / -ls | cut -c 17-26,89- | grep '^-' | cut -c 11- | sed -E 's/ /\\ /g' | xargs grep 'fn_size'

but it ends up with the error: xargs: unterminated quote
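That error usually means a path contains an apostrophe or a double quote: by default xargs treats quotes and backslashes in its input as quoting characters. One possible workaround, assuming one path per line and an xargs that supports -0, is to turn the newlines into NUL bytes so that every path is taken literally. The file name below is made up:

```shell
# Hypothetical sandbox: a file whose name contains an apostrophe and a space.
workdir=$(mktemp -d)
printf 'fn_size = 1\n' > "$workdir/Bernard's notes.txt"

# tr converts "one path per line" into NUL-separated input;
# with -0, xargs does no quote or whitespace processing at all.
result=$(find "$workdir" -type f | tr '\n' '\0' | xargs -0 grep -l 'fn_size')
printf '%s\n' "$result"

rm -rf "$workdir"
```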

I tried the for loop, but starting at / walks through too many system folders, the output is cluttered with Permission denied messages, and the result is not satisfactory. Eventually I wrote my own filter:

ruby -w /.../search_in_files.rb / fn_size

The program is far from perfect and has not been rigorously tested with a test suite, but it satisfies my current needs. I offer it to those who arrive on this page from a Google search on passing a list of files to grep in a piped command and run into difficulties, as I did.

# search_in_files.rb : Search a given string inside files, starting at a given directory.
# If starting from root (/), do not search system directories.
# Display only the name of the file containing the search string.

require 'find'

if ARGV.size < 2
  puts "Usage : ruby -w /#{File.basename(__FILE__)} search-directory string-to-search"
  exit
end

search_directory = ARGV[0]
search_string = ARGV[1]
search_directory = '/' if search_directory == '.' && Dir.pwd == '/' # or else doesn't traverse

puts "Searching for `#{search_string}` within files, starting at #{search_directory} ..."
paths = []

Find.find(search_directory) do |path|
  if search_directory == '/'
    # exclude system files and symbolic links when searching from root directory
    Find.prune if File.directory?(path) && (path =~ /^\/[A-Z]/ ||
                                            path =~ /^\..+/ ||
                                            path =~ /^\/bin/ ||
                                            path =~ /^\/cores/ ||
                                            path =~ /^\/dev/ ||
                                            path =~ /^\/home/ ||
                                            path =~ /^\/net/ ||
                                            path =~ /^\/private/ ||
                                            path =~ /^\/sbin/ ||
                                            path =~ /^\/usr/)
    Find.prune if File.symlink?(path)
    Find.prune if File.file?(path) && File.basename(path) =~ /^\./ # hidden files
    Find.prune if path == '/mach_kernel'
  end

  paths << path if File.file?(path) # collect names only if the path is a regular file
end

paths.each do |path|
  begin
    # File.foreach closes the file when the block ends, including on break
    File.foreach(path) do |line|
      if line.index(search_string)
        # found first search_string
        puts path
        break # we are interested only in the file name, stop searching
      end
    end
  rescue StandardError => e
    # unreadable files (e.g. permission denied) are reported and skipped
    puts "Couldn't open #{path} #{e}"
  end
end

BernardK said...

oops ! indentation completely lost. Is there a way to preserve spaces at the beginning of lines ?