usernamenumber: (devi)
[personal profile] usernamenumber
I've got a thing running that involves downloading a whoooole lot of data. Unfortunately the tool in question (apt-mirror) doesn't always provide frequent feedback, leading to the age-old question: is the program stuck, or just taking a while?

This turned out to be a situation where the random bits of Linux trivia I've gathered over the years came in handy enough that I thought I'd write up a quick explanation.

Ok, so...

PROBLEM:
A program is either hung, or just taking a while to download data.

SOLUTION (the short version):

  1. Use ps and less to find the process IDs (PIDs) of the program and related processes

  2. Use strace -p PID to eavesdrop on the program's communications with the OS

  3. Bonus: Use /proc/PID/fd/ to get more information about download progress



SOLUTION (details):

  1. In Linux/Unix, every program gets a numeric ID called a PID (if you're bothering to read this you probably already know that, but I might as well start at the beginning). Anyway the first thing I did was get the PID.

    Run ps -afx | less to pull up a searchable list of processes (I use less here instead of grep, because sometimes a process will have child processes with different names, which grep would omit)


  2. In less, type /apt-mirror (or whatever you're looking for). This will bring you to the first match, which in my case looked something like this:

    8028 pts/4 S+ 0:18 | \_ /usr/bin/perl -w /usr/bin/apt-mirror
    8105 pts/4 D+ 0:10 | \_ wget --no-cache --limit-rate=100m -t 5 -r -N -l inf -o /media/TUNAPANDA/apt//var/archive-log.0 -i /media/TUNAPANDA/apt//var/archive-urls.0
    8106 pts/4 D+ 0:08 | \_ wget --no-cache --limit-rate=100m -t 5 -r -N -l inf -o /media/TUNAPANDA/apt//var/archive-log.1 -i /media/TUNAPANDA/apt//var/archive-urls.1
    ...and a bunch more of these...

    So, 8028 is the PID of the master process, but the actual work is being done by a bunch of wget processes that it spawned, like 8105 and 8106.


  3. Now, the question is: are those wget processes hung, or just taking a while?

    If you've ever wished you could just stick your head inside the computer and see what a program is doing, and you don't already know the power and glory that is strace, today is your lucky day.

  4. Whenever a program wants to interact with hardware (saving data from the network, for example),it has to ask the OS to act on its behalf, since only the OS can talk directly to hardware (generally speaking). Programs use special functions called system calls to make these requests.

    The strace tool shows you the system calls that a process is making in (practically) real-time. The -p option lets you specify the process by PID, so let's grab one of those wget instances and take a look:



    $ strace -p 8106
    Process 8106 attached - interrupt to quit
    select(5, [4], NULL, NULL, {900, 0}) = 1 (in [4], left {899, 999992})
    read(4, "\n\246\32\214\317\376/\276\276\306;\376l\24\316B]\247I\24\324/\334\351\310\35v\312S\21c\t"..., 8192) = 8192
    clock_gettime(CLOCK_MONOTONIC, {56434, 320446651}) = 0
    write(5, "\n\246\32\214\317\376/\276\276\306;\376l\24\316B]\247I\24\324/\334\351\310\35v\312S\21c\t"..., 8192) = 8192
    ... and lots more...

    This will keep producing output as long as the process is running. You can get out of it with Ctrl+c


  5. If you've never seen strace's output before, this might look like a lot to take in, but as is often the case in Linux administration, the key isn't understanding every little bit of output, but knowing what to look for and ignoring the rest.

    In this case, here's what you need to know...


    • Each line represents a system call, in the format system_call_name(arguments)

    • ...so, in the excerpt above we see examples of the select, read, and write system calls.

    • ...this alone is enough to answer our question. The program is reading from one location (presumably the network), and writing to another (presumably the disk), which is exactly what it's supposed to be doing. The process is <i>not</i> hung, I'm just impatient. Good to know.


  6. However, you can take this a step further! The first argument to the read() and write() system calls is a file descriptor or FD. Every time a process opens a data source for reading or writing, it is assigned an FD. We can see from the strace output that wget is reading from FD4 and writing to FD5. Perhaps it would be useful if we could see what those point to, maybe even use them to track progress.

    Now enters the field of battle another powerful ally: the /proc pseudo-filesystem. /proc is crazy cool if you like being able to dig into the fiddly bits of your OS. Basically it looks and acts like a tree of folders and files that live on your hard disk, but it's actually not (hence, pseudo-filesystem). /proc is actually just a clever way for Linux to share real-time information about the processes running under it with you. If you look in /proc, you'll find that most of the directories therein have numbers for names. These numbers correspond to running process IDs. So, if we want more info on PID 8106...


    $ ls -l
    attr cmdline environ latency mem ns personality smaps syscall
    autogroup comm exe limits mountinfo oom_adj root stack task
    auxv coredump_filter fd loginuid mounts oom_score sched stat wchan
    cgroup cpuset fdinfo map_files mountstats oom_score_adj schedstat statm
    clear_refs cwd io maps net pagemap sessionid status

    Each of these "files" contains some kind of information about the process. I won't go into all of them here (you're welcome), but they're all nicely documented if you run man proc


  7. Notice that there is an fd directory in there. Do an ls -l on it, and...

    $ ls -l /proc/8106/fd
    lrwx------ 1 teacher teacher 64 Mar 18 13:50 0 -> /dev/pts/4
    lrwx------ 1 teacher teacher 64 Mar 18 13:50 1 -> /dev/pts/4
    lrwx------ 1 teacher teacher 64 Mar 18 13:49 2 -> /dev/pts/4
    l-wx------ 1 teacher teacher 64 Mar 18 13:50 3 -> /media/TUNAPANDA/apt/var/archive-log.1
    lrwx------ 1 teacher teacher 64 Mar 18 13:50 4 -> socket:[189936]
    l-wx------ 1 teacher teacher 64 Mar 18 14:29 5 -> /media/TUNAPANDA/apt/mirror/archive.ubuntu.com/ubuntu/pool/main/l/language-pack-gnome-ko-base/language-pack-gnome-ko-base_12.04+20120417_all.deb

    The fd directory shows all of a process' file descriptors, represented as symbolic links (basically, shortcuts) to show where each one is pointed.

    Remember how strace showed that wget was reading from FD4 and writing to FD5? Now we can see that FD4 is a network socket, and FD5 is the specific file that is being written to the disk right now.


  8. Ok, one last trick. I decided I wanted to keep a closer eye on the progress of the download(s), so I did this:

    $ watch -n1 'for i in /proc/8106/fd/*; do ls -l $i ; du -Dh $i; echo ""; done'
    lrwx------ 1 teacher teacher 64 Mar 18 13:50 /proc/8106/fd/0 -> /dev/pts/4
    0 /proc/8106/fd/0

    lrwx------ 1 teacher teacher 64 Mar 18 13:50 /proc/8106/fd/1 -> /dev/pts/4
    0 /proc/8106/fd/1

    lrwx------ 1 teacher teacher 64 Mar 18 13:49 /proc/8106/fd/2 -> /dev/pts/4
    0 /proc/8106/fd/2

    l-wx------ 1 teacher teacher 64 Mar 18 13:50 /proc/8106/fd/3 -> /media/TUNAPANDA/apt/var/archive-log.1
    1.9M /proc/8106/fd/3

    lrwx------ 1 teacher teacher 64 Mar 18 13:50 /proc/8106/fd/4 -> socket:[208541]
    0 /proc/8106/fd/4

    l-wx------ 1 teacher teacher 64 Mar 18 14:55 /proc/8106/fd/5 -> /media/TUNAPANDA/apt/mirror/archive.ubuntu.com/ubuntu/pool/main/l/language-pack-gnome-ml-base/language-pack-gnome-ml-base_12.04+20120417_all.deb
    640K /proc/8106/fd/5

    ... refreshed every second ...

    Basicaly, it does the following:


    • Once every second... (watch -n1 '...')

    • For each of the process' FDs... (for i in /proc/8106/fd/* ; do ... ; done)

    • Show where the FD is pointing... (ls -l $i)

    • ...and the size of the the thing it's pointing to (du -Dh $i)

    • ...followed by a blank line, just to make things look nicer (echo "")


    This provides a nice way to keep an eye on what the process is writing where (and roughly how quickly).




This whole experience really reminded me of a lot of the reasons why I love working in Linux. It's not always the most intuitive or friendly environment, but the more progress you make along its learning curve, the more interesting little nooks and crannies there are, and the more useful things you can do with them!
This account has disabled anonymous posting.
If you don't have an account you can create one now.
HTML doesn't work in the subject.
More info about formatting

Profile

usernamenumber

October 2016

S M T W T F S
      1
2345678
9101112131415
16171819202122
232425 26272829
3031     

Most Popular Tags

Style Credit

Expand Cut Tags

No cut tags
Page generated Jul. 4th, 2025 02:42 pm
Powered by Dreamwidth Studios