20 August 2007

redirect file list to grep, ls, less, more and such...

Quite soon I understood that the problem is that grep, ls, less, more and similar commands do not take input from STDIN. They expect the file list as command-line arguments instead. So you cannot redirect a file list (produced, for example, by some other command chain) to such commands:
find . -name "*.foo" | grep bar #IT DOESN'T WORK!
A short solution is to make use of the backtick command-output substitution:
grep bar `find . -name "*.foo"`
If your keyboard layout has no backtick, this can be quite cumbersome. OK, I set a keyboard shortcut to change the layout on the fly, but this way I still have to switch back and forth repeatedly...
The next solution is to make use of the other syntax for command-output substitution:
grep bar $(find . -name "*.foo")
but finally I discovered the right solution, which preserves the piping redirection order of the commands, namely xargs:
find . -name "*.foo" | xargs grep bar
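To make the difference concrete, here is a small sketch in a throwaway directory (the directory and the file names are invented for illustration). Piped straight into grep, the file list is searched as text; passed through xargs, the same list becomes grep's arguments, so the contents of the files are searched.

```shell
# Sandbox with hypothetical files, so nothing real is touched.
demo_dir=$(mktemp -d)
printf 'bar inside\n' > "$demo_dir/alpha.foo"
printf 'nothing\n'    > "$demo_dir/bar.foo"

# Plain pipe: grep reads the file *names* on STDIN and filters them.
by_name=$(cd "$demo_dir" && find . -name "*.foo" | grep bar)

# xargs: the names become arguments, so grep opens the files.
# -l prints only the names of the files whose contents match.
by_content=$(cd "$demo_dir" && find . -name "*.foo" | xargs grep -l bar)

echo "matched by name:    $by_name"
echo "matched by content: $by_content"
rm -rf "$demo_dir"
```

Note that this simple form assumes file names without spaces or quotes.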
Enjoy.

23 comments:

  1. great! I didn't know that!

  2. Anonymous, 30/8/07 09:30

    Mmm... sounds a bit complicated... 'find' has an -exec option that can perfectly work around the 'grep' limitation:

    find . -name "*.foo" -exec grep bar {} \;

    Have fun. M.

  3. Anonymous, 30/8/07 09:54

    Thanks, Marco.
    The point is when you're not using find to get the file list for the next command. For example, when you are getting the file list from a 3-line chain of awk'n'grep'n'cut commands that parse a single log file... :)
    Anyway, I later found xargs rather less useful than it seems from this post. In fact, it does not correctly handle, for example, a file list made up of wildcards. Let me give you a simplified example from a 3-line chain of commands of mine:
    echo "*.pdf" | xargs ls #it doesn't work!
    So I went back to the backtick solution
    ls `echo "*.pdf *.doc"`
    which is not so elegant (mainly because it breaks the pipe flow of my 3-line chain), but at least it has worked in every situation I have dealt with...
    Cheers.

  4. This comment has been removed by the author.

  5. You don't have to change the keyboard layout to get the backtick: on the Italian keyboard you get it with
    ALT GR + ' (apostrophe).
    Enjoy!!

  6. Anonymous, 24/9/07 14:22

    Great!
    Thanx, Cristian Cantoro! :)

  7. Computers are so strange... ;)

    The command

    find . -name "*.foo" | grep bar

    works flawlessly on my (plain vanilla) Linux box.

    Some commands that usually don't take their input from STDIN may be persuaded to do it using a "-" (dash) as the filename, e.g. you may want to try this:

    find . -name "*.foo" | grep bar -

    But I never had any problem feeding grep, less or more with data from STDIN. Maybe your situation is more complex than what I can figure from the post...

    Bye, Enrico

  8. Anonymous, 28/9/07 21:32

    Hi Enrico,
    with the command
    find . -name "*.foo" | grep bar
    you get the filelist of .foo files which have the "bar" (sub)string in their name.
    What I was looking for is to grep the string "bar" inside each of the *.foo files. This is what the xargs trick lets you do.
    Anyway, yes, later on I also discovered the "trailing minus" trick... :)
    Thanx, bye

  9. >What I was looking for is to grep the string "bar" inside each of the *.foo files.

    Yup! Now it's clear, I knew I was missing something.

    Just for the sake of mentioning every single way of doing this same thing, for jobs like that I have always tended to use a "for" loop, in bash:

    for filename in `find . -name "*.foo"`; do grep bar $filename ; done

    but this breaks the piping chain - I find it good for readability, but this may just be the way my brain is wired. In this form it's just as powerful as the backtick trick, but loops (with tests and bash's mathematical capabilities) can grow pretty powerful.

    Well, this small exchange at least opened my eyes on the usefulness of xargs :-)

    Bye, Enrico

  10. Anonymous, 29/9/07 18:31

    Yes, the for loop is more readable.
    The point is when you are working with a bottom-up strategy, building your multi-line command step by step, adding one filter command after the other. There is nothing sacred about the pipe chain that says you must never break it :) It simply goes in the same direction as you do when you are writing not a script but a deeply nested command-line statement...
    Speaking of opening eyes, your misunderstanding finally made it clear to me why such commands do not work (and could never work) in my situations: how could they possibly tell whether I want to grep the file list itself as text, or each file that the text refers to?
    BTW, I wrote exactly the wrong thing in the opening of this post: these commands do take input from STDIN, while what I was looking for was a way to give them arguments from STDIN! OK, the name xargs was supposed to give me some hint about that... :)

    Cheers!

  11. Anonymous, 29/9/07 23:36

    The claim that those commands do not accept input from STDIN did sound strange to me... but I did not want to play the "saputello" (know-it-all). As a little contribution, I can add that you can use $() instead of the backticks. Did you know?

  12. Anonymous, 30/9/07 00:11

    Mau, Mau... you are not just supposed to, you are required to play the "saputello"!
    Otherwise I will.
    For example, you fail: the $() syntax for command substitution was already mentioned in the post itself! Moreover, in the comments, Christian++ gave me the trick for getting the backtick even on an Italian-layout keyboard...!
    Mau Mau... :)

  13. Thanks, but that doesn't remove the need for an STDIN feed, because the argument list is always limited in length by the shell. This applies to xargs, the backtick and the $(command) syntax.

  14. Yes, Mohammed, there is the "Argument list too long" problem, which is related to the number of pages allocated within the kernel for command-line arguments (see the article "Beyond Arguments and Limitations" in Linux Journal).

    Thanks for your comment!

  15. This comment has been removed by the author.

  16. How about a nested backtick?

    $ grep secondstring `grep -l searchstring *`

    works for me
    -joviano dias

  17. Yep, Jade, I think this is just the "short solution" mentioned in the post...

    PS
    Happy to see how this more-than-5-years-old post still keeps on being a hit, here at the outer boundary of the blogosphere.

  18. xargs works fine for me on OS X Mountain Lion. I had lost a file, so I wanted to find all files containing a certain string I know:

    find / -ls | sed -E 's/ +/,/g' | cut -d , -f 3,11 | grep '^-' | cut -d , -f 2 | xargs grep 'fn_size'

    Thanks for all these tips.

  19. Thank you for your feedback, BernardK!

  20. Oops! I didn't check carefully enough before posting...
    It doesn't work if directory names contain spaces. The corrected command could be:

    find / -ls | cut -c 17-26,89- | grep '^-' | cut -c 11- | sed -E 's/ /\\ /g' | xargs grep 'fn_size'

    but it ends up with the error : xargs: unterminated quote

    I tried with the for loop, but starting at / browses too many system folders, the output is cluttered with Permission denied messages and the result is not satisfactory. Eventually I wrote my own filter :

    ruby -w /.../search_in_files.rb / fn_size

    The program is far from perfect and has not been rigorously tested with a test suite, but it satisfies my current needs. I offer it to those who arrive on this page from a Google search for passing a list of files to grep in a piped command and run into difficulties, as I did.

    # search_in_files.rb : Search a given string inside files, starting at a given directory.
    # If starting from root (/), do not search system directories.
    # Display only the name of the file containing the search string.

    require 'find'

    if ARGV.size < 2
    then
    puts "Usage : ruby -w /#{File.basename(__FILE__)} search-directory string-to-search"
    exit
    end

    search_directory = ARGV[0]
    search_string = ARGV[1]
    search_directory = '/' if search_directory == '.' && Dir.pwd == '/' # or else doesn't traverse

    puts "Searching for `#{search_string}` within files, starting at #{search_directory} ..."
    paths = Array.new

    Find.find(search_directory) do |path|
    if search_directory == '/'
    then # exclude system files and symbolic links when searching from root directory
    Find.prune if File.directory?(path) && (path =~ /^\/[A-Z]/ ||
    path =~ /^\..+/ ||
    path =~ /^\/bin/ ||
    path =~ /^\/cores/ ||
    path =~ /^\/dev/ ||
    path =~ /^\/home/ ||
    path =~ /^\/net/ ||
    path =~ /^\/private/ ||
    path =~ /^\/sbin/ ||
    path =~ /^\/usr/ )
    Find.prune if File.symlink?(path)
    Find.prune if File.file?(path) && File.basename(path) =~ /^\./ # hidden files
    Find.prune if path == '/mach_kernel'
    end

    paths << path if File.file?(path) # collect names only if the path is a strict file
    end

    paths.each do |path|
    begin
    File.foreach(path) do |line| # foreach closes the file, even on break
    if line.index(search_string)
    then # found first search_string
    puts path
    break # we are interested only in the file name, stop searching
    end
    end
    rescue StandardError => e
    puts "Couldn't open #{path} #{e}"
    end
    end

  21. oops ! indentation completely lost. Is there a way to preserve spaces at the beginning of lines ?

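For readers who hit the same "xargs: unterminated quote" error: a common way around it is to separate the file names with NUL bytes instead of whitespace. The sketch below assumes a find and an xargs that support -print0 and -0 (the GNU and BSD versions do, but these options are extensions and may be missing elsewhere). It runs in a throwaway directory with invented file names, and searches for the string fn_size from the comments above.

```shell
# Sandbox with hypothetical paths containing spaces and a quote.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/dir with spaces"
printf 'int fn_size = 3;\n' > "$demo_dir/dir with spaces/it's here.txt"
printf 'unrelated\n'        > "$demo_dir/other.txt"

# -type f keeps regular files only; -print0 and -0 delimit names with NUL,
# so spaces and quotes in paths are passed through untouched.
# 2>/dev/null hides "Permission denied" messages printed by find.
found=$(cd "$demo_dir" && find . -type f -print0 2>/dev/null | xargs -0 grep -l 'fn_size')

echo "$found"
rm -rf "$demo_dir"
```

Where -print0 is not available, find . -type f -exec grep -l 'fn_size' {} + is a possible alternative: find passes the names to grep itself, in batches, so no parsing of the list is involved.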