14 March 2007

get environment variables of a process with ps

Have you ever heard of the e option of ps? As the man page says, it shows the environment after the command. Used together with the wide-output option given twice, ww, which removes the limit on line width, it makes ps flood the screen with the full environment each process was launched with. Try it yourself, for example:
ps eww | tail -1
As the tail -1 suggests, the huge mess you get is a single line of output, corresponding to a single process (most likely the tail process itself...). If you look at that mess, you can see that it is a long list of VAR=value pairs, and it represents, as said, a snapshot of the environment at the moment the process started. For example, you can find SHELL=/bin/bash, USER=hronir, HOME=/home/hronir and so on.
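To make that single line readable, one option is to break it at the spaces and keep only the tokens that look like VAR=value. This is a minimal sketch, assuming that no value contains spaces (ps does not quote them, so such a value would be split into pieces); the function name env_pairs and the sample line are made up for illustration.

```shell
# env_pairs: read ps-style lines on stdin, print one VAR=value token per line.
# Assumption: values contain no spaces; a value with spaces would be
# broken into several pieces, and only the first would be kept.
env_pairs() {
    tr ' ' '\n' | grep -E '^[A-Za-z_][A-Za-z0-9_]*='
}

# Made-up sample of what a `ps eww` line looks like
sample='4242 pts/0 S 0:00 tail -1 SHELL=/bin/bash USER=hronir HOME=/home/hronir'
printf '%s\n' "$sample" | env_pairs
# prints:
# SHELL=/bin/bash
# USER=hronir
# HOME=/home/hronir
```

On a real system it would be used as: ps eww | tail -1 | env_pairs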
Yes, all this information is too much, but there are situations where one or a few of those pairs can be valuable. Imagine, for example, that you launch the same script from different directories (say, to analyze different data sets stored in those directories). Then, some time later, you find out that some of these scripts, for whatever reason, failed to finish. For example, you check with top or ps, grepping for your script, and it turns out that fewer scripts are running than you launched. The big question now is: which scripts have died, and which have not? Which data are still under analysis and which need to be resubmitted? (Your analysis takes a lot of time, and you hope to find a way not to resubmit all the scripts...!)
Well, the answer to all these questions lies in the eww options of ps, and in particular in the PWD=/full/path/ pair, which tells you where each still-running script was launched from.
The point now is to make it easy to read the value of this pair among the many others, since several screen lines for each line of ps output are very cumbersome to handle.
Well, after a full afternoon struggling with sed, awk and regular expressions, I came up with this very poor result. Take your ps call, grepping for your scripts:
ps -ely --forest | grep myscript.pl
make sure to add the eww options:
ps -ely --forest eww | grep myscript.pl
and pipe its output to sed as follows:
ps -ely --forest eww | grep myscript.pl | sed -r 's/(.* )? ([A-Z|_]+)=.* (PWD=\S* ).*/\1 \3/ ' | grep --color=auto PWD
It would be too boring and pedantic to explain how I got to that regular expression. Let me point out only a few things.
First of all, I still do not understand the behavior of this regex pattern, in particular the (.* )? part (which is supposed to be related to greedy matching, a concept I think I understand well in general, but not the particular behavior I find in this case) and some reasonable variations I tried...
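A possible explanation for part of this, offered as a guess rather than a verified diagnosis: the s command replaces only the portion of the line that the pattern matches, and everything outside the match is printed as it is. Since the pattern is not anchored with ^, the match can start in the middle of the line, right where the first VAR=value pair begins, so the leading ps columns may survive simply because they were never matched, whatever \1 contains. The sketch below shows the effect on a made-up line with simplified patterns (not the original regex); -E is the same as -r in GNU sed, and [^ ]* stands in for \S*.

```shell
# Made-up line imitating ps output: a few columns, then VAR=value pairs
line='4242 pts/0 myscript.pl SHELL=/bin/bash PWD=/data/run1 USER=hronir'

# Unanchored pattern: the match starts at the space before SHELL=, so the
# columns before it lie outside the match and are printed unchanged.
printf '%s\n' "$line" | sed -E 's/ [A-Z_]+=.* (PWD=[^ ]*).*/ \1/'
# prints: 4242 pts/0 myscript.pl PWD=/data/run1

# Anchored pattern: the whole line is matched, so only the capture survives.
printf '%s\n' "$line" | sed -E 's/^.* (PWD=[^ ]*).*$/\1/'
# prints: PWD=/data/run1
```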
Moreover, most of the time I wasted was spent trying to get a parametric version of this solution: a way, I mean, to ask for any one of the pairs that ps eww prints out. I actually tried to write a (ba)sh script (or function), a Perl script... but I didn't find a way to make something slim enough to be used next to the ps command. The best I got was to define an alias like this:
alias selectPWD="sed -r 's/(.* )? ([A-Z|_]+)=.* (PWD=\S* ).*/\1 \3/ ' | grep --color=auto PWD"
to be used as follows:
ps -ely --forest eww | grep myscript.pl | selectPWD
From this, of course, I could easily derive any selectXXX I might need, but... can you find a parametric solution?!?
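One possible parametric version is sketched below as a shell function wrapping awk; it is only an illustration, not a tested replacement for the alias. The name select_env is made up. It assumes that values contain no spaces and that the command line itself contains no token of the form NAME=..., since the first such token is taken as the start of the environment.

```shell
# select_env VAR: for each ps-style line on stdin, print everything before
# the first NAME=... token (the ps columns and the command), followed by
# the requested VAR=value pair. Lines without that pair are skipped.
select_env() {
    awk -v var="$1" '
    {
        # Locate the first token that looks like NAME=...
        if (!match($0, / [A-Za-z_][A-Za-z0-9_]*=/)) next
        head = substr($0, 1, RSTART - 1)   # ps columns, spacing preserved
        n = split(substr($0, RSTART), tok, " ")
        for (i = 1; i <= n; i++)
            # index() == 1 means the token starts with "VAR=",
            # so PWD does not also pick up OLDPWD
            if (index(tok[i], var "=") == 1) { print head " " tok[i]; next }
    }'
}

# Made-up sample line
sample='4242 pts/0 S 0:00 perl myscript.pl SHELL=/bin/bash PWD=/data/run1 USER=hronir'
printf '%s\n' "$sample" | select_env PWD
# prints: 4242 pts/0 S 0:00 perl myscript.pl PWD=/data/run1
```

It would be used next to ps like the alias: ps -ely --forest eww | grep myscript.pl | select_env PWD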
 
PS
Have you ever heard of the --color option for grep? Long ago I set alias grep='grep --color=auto' in my .bashrc...

7 comments:

Mau said...

Honestly, I don't understand the (.*)?, which should match any character zero or more times, and then that whole expression zero or one time...???
However, did you try using sed with double quotes instead of single quotes? I mean:
ENVVAR=PWD
ps -ely --forest eww | sed -r "s/(.* )? ([A-Z|_]+)=.* ($ENVVAR=\S* ).*/\1 \3/ " | grep $ENVVAR


Thank you for pointing me to the grep --color option!!

P.S.: Is there an RSS feed for the comments on this blog?

hronir said...

Ok, I'm quite new to sed, and there's more that I don't understand
than I said... Actually I don't understand the substitution
part either (I tried to print only the environment variable I'm
interested in, using only \3 and no \1, but sed prints almost
everything anyway...). Coming to what you say, maybe you missed the
space after the star. My idea of "(.* )?", or what I figured out,
is that it matches any character zero or more times FOLLOWED BY ONE
SPACE CHARACTER, and then that whole expression zero or one time. And
this, I suppose, would match the space-separated fields of the ps
output, but only until it reaches a VAR=value pair, because that is
matched by the next part, "([A-Z|_]+)=.*". In turn, this matches all
the pairs up to the one that starts with PWD, because that one is
matched by the next part, "(PWD=\S* )". Yes, I don't understand why I
had to add those brackets to keep every pair except the PWD one from
being printed... But I don't understand the whole substitution
mechanism...

As for your suggestion: is the expansion of variables the only
difference between single and double quotes? And what advantage of
doing it this way are you pointing out? Do you mean setting up an
alias like the one I showed, but with the ENVVAR variable in place of
the fixed PWD string? That way you still have to set or change the
ENVVAR variable before using the alias... but, you are right, it's an
improvement!


PS
Yes, you can find it here:
http://feeds.feedburner.com/hronircomments
otherwise, you can use the following service I regularly use:
http://co.mments.com/

Mau said...

Uhm... I noticed I get different behaviour on different machines... maybe it depends on the sed version...

On Debian sarge, this is what I get:

apachexen:~# sed --version
GNU sed version 4.1.2
Copyright (C) 2003 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE,
to the extent permitted by law.
apachexen:~# echo "ciao esempio=cavolo quando=a/merenda dove=giardino" | sed -r 's/(.*)? ([a-z].*)=.* (quando=\S* ).*/\1 \3/'
ciao esempio=cavolo quando=a/merenda dove=giardino

that is: no substitution at all!!

On my PC, running Debian sid, this is what happened:

maurizio:~# sed --version
GNU sed versione 4.1.5
Copyright (C) 2003 Free Software Foundation, Inc.
Questo è software libero; si veda il sorgente per le condizioni di copiatura.
NON c'è garanzia; neppure di COMMERCIABILITA' o IDONEITA' AD UN PARTICOLARE
SCOPO, nei limiti permessi dalla legge.
maurizio:~# echo "ciao esempio=cavolo quando=a/merenda dove=giardino" | sed -r 's/(.*)? ([a-z].*)=.* (quando=\S* ).*/\1 \3/'
ciao quando=a/merenda


So... I don't think it's just a problem of understanding sed... Could it be a matter of configuration?

But... what is actually your aim?

Mau said...

Ok... I just saw you solved with perl! Brilliant!

hronir said...

Yes, the few times I have dealt with sed it was always a big mess... I should have started with my dear perl from the beginning... :)
Thanks for getting involved!

Edo said...

Can't you do it like this?


ps eww |tail -1 | gawk -F " " ' {for (i=1;i <= NF;i++){ print $i} }'| gawk -F "=" '{if ($1=="PWD") print $2}'


You just separate the fields with awk: first you split the line into the VARIABLE=value pairs, which are separated by spaces; then you split each VARIABLE=value pair at the "=", and print only the value of the pair whose VARIABLE is "PWD" (in this case).

Would it do the job?

hronir said...

Yes, but this way you completely lose all the (first) relevant info about the process (the PID, the name, etc.). In fact the most difficult part was separating the first fields of the ps output from the whole list of VAR=value pairs. My script acts just like the ps command alone PLUS the single VAR=value you need to know...