Saturday, July 27, 2013

Command-line Sound Recording

Chances are if you're running the common varieties of Linux or Windows or MacOS you've got a GUI tool already for recording from either your sound card or an internal/external microphone or whatever.  Turns out I don't since a while back I gave up on my former infatuation with Ubuntu and Xubuntu.  Switching to Crunchbang is a decision I would make again in a heartbeat, but it does mean that I'm back to the days of searching for tools and deciding between options rather than finding them all pre-installed.

This is, for what it's worth, a good thing in my mind.

So my situation is a little complex.  The summary is I wanted to record a phone call and turn it into a sound file that's in some way useful to me.  That is, some raw audio format or high quality MP3 or something.  Should be easy enough, right?  Time was I'd just use Google Voice to record my call, or Google Talk, but the former has become much more difficult to use in Canada these days and the latter no longer appears to support recording of outgoing calls.  I searched, trust me.  I suspect that's a workaround for problems they had with enabling recording and having someone try to dial in to conference bridges or use automated menu systems or such.  I don't miss the loss of that feature at all, by the way, except in this one scenario.

Okay, so I can't use Google Voice or Google Talk.  Next obvious option is hunt through the Google Play store and find something to record calls there, then make the call on my phone.  I won't link to the various apps I tried, I'm sure most of them are quite good and have probably earned their 4+ star ratings.  I tried six different ones before I gave up on that angle.  Not a single one successfully recorded any audio on my phone.  Well, two of them might have, but they only produced 3gpp files and I wasn't able to find anything that played back 3gpp files for me on my laptop.  Or if I did, those files were full of silence as well, it's kind of tough to tell.

So back to the desktop, then.  There are a lot of audio recorder programs in Linux.  The best of them, though, either seem to depend on Gnome or KDE and there is no way I wanted to drag in all of that onto my nice, relatively clean platform just to record a single phone call.  That's ridiculous.

Back to searching, then.

Now anyone who talks to me for any length of time about operating systems knows I absolutely hate two things about Linux.  Both of them suck without qualification of any kind.  Printers.  Printing in Linux is a disaster and it shows little sign of improving over where it was in the late 1990s when I started in with Linux.  And sound.  When it works well, it's okay.  Often, though, it barely works or produces inconsistent results.  That infuriates me.  Almost enough to make me learn something about sound architectures in Linux and try to fix one of them enough to be usable.  The barrier, there, is the distributed nature of sound in Linux.  It's not like "sound" is even a thing.  There's kernel support for at least two architectures, and there are userspace tools that interact with some set of those architectures.

Of those tools, the one that I have the most distaste for -- though probably because it is the one I encounter the most, not because it's any worse than any other -- is Pulse.  Pulseaudio is gawdawful.  The only redeeming quality it has, if I can feel that it has any, is it is installed by default on my last three chosen distributions and oftentimes it successfully manages to produce sounds from my speakers.  Frequently it is even the sounds I want it to produce (though if I ever sort out the agony it is currently causing me when playing music from Chrome, I'll have another post about that little slice of hell...).

But today it actually turns out to be a good thing for me.  Some refinements on my search criteria turned up pacat.  I've never heard of this thing before, but within a minute of finding the manpage I was done, I had exactly what I needed.

Since I wanted to capture audio that was playing through my speakers, I was looking for a monitor on an ALSA output device (I knew ahead of time that I was using ALSA; if you don't know whether you're using ALSA or OSS or something else, you'll need to google that too, I suppose), so I did this:

% pactl list | grep Monitor\ Source
Monitor Source: alsa_output.pci-0000_00_1b.0.analog-stereo.monitor

And searched for monitors.  In my case, there happened to be only one.  That helps.  So then I make my phone call and start up pacat with flac as the file format, because why not?  The call is short and I wanted good quality:

% pacat --verbose --device=alsa_output.pci-0000_00_1b.0.analog-stereo.monitor \
    --record --file-format=flac phone.flac
Opening a recording stream with sample specification 's16le 2ch 44100Hz' and channel map 'front-left,front-right'.
Connection established.
Stream successfully created.
Buffer metrics: maxlength=4194304, fragsize=352800
Using sample spec 's16le 2ch 44100Hz', channel map 'front-left,front-right'.
Connected to device alsa_output.pci-0000_00_1b.0.analog-stereo.monitor (0, not suspended).
Got signal, exiting

And there you are.  phone.flac is a capture of what was playing on the speakers from the time I started until the time I ^C'd it.
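For anyone who would rather not copy the device name by hand, the two steps above can be sketched as a pair of shell functions. This is only a sketch: the function names first_monitor_source and record_monitor are made up for illustration, it blindly takes the first monitor that pactl reports, and it uses only the pactl and pacat options shown above.

```shell
# Print the first "Monitor Source:" name found in `pactl list` style
# output arriving on stdin.
first_monitor_source() {
    awk -F': ' '/Monitor Source:/ { print $2; exit }'
}

# Record whatever is playing on that monitor into a FLAC file until ^C.
# record_monitor is a hypothetical helper name, not part of PulseAudio.
record_monitor() {
    out=${1:-phone.flac}
    dev=$(pactl list | first_monitor_source)
    if [ -z "$dev" ]; then
        echo "no monitor source found" >&2
        return 1
    fi
    pacat --record --device="$dev" --file-format=flac "$out"
}
```

Run it as `record_monitor call.flac`, then ^C when the call is over. With more than one sound card you would want to pick the monitor yourself rather than trust the first match.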

Definitely want to remember this one.  By far the easiest way to record audio in Linux I've ever encountered.  So, y'know, score one for pulseaudio.

Thursday, July 25, 2013

Stupid shell trick (or: who's been up-chucking all over my /tmp?)

This isn't actually something going mental on my own machine, though it could very well have been, I suppose.  I'm doing a lot of work with bitbake these days for work.  One of the features I've been spending time with is the PR Service, which, in the absence of any other configuration changes, starts up a process that queries and feeds an sqlite3 database with hashes and revision numbers.  The intent is to get around problems that arise when you don't properly update the PR value in your recipe.

And if that's all gibberish to you, don't worry, that's just setting the stage.

Here's the thing.  If the server doesn't shut down cleanly -- for whatever reason -- it leaves a pidfile in /tmp.
Aside: I briefly considered providing a patch to bitbake that would move the pidfile out of /tmp mostly on religious grounds. I'm offended by anything that leaves plain files in /tmp as a matter of course, but I opted to not bother since I didn't think most people would care or would welcome the change if I did submit it and really, so long as you're not having problems with your build, bitbake actually does clean up after itself and removes the pidfile. The problem is, of course, the only reason I get dragged into this stuff is when builds don't go well. So, observer's bias, I guess.
So then I log in to a particularly ill machine and discover this:
% ls -l /tmp/PRServer_127.0.0.1_* | wc -l
599
... yeah.  We do not have 599 active PR Servers running on this machine.  But it's also, let's say, challenging to find anything I actually happen to care about in /tmp now, thanks to this heap of detritus.

Ah.  At last we're getting near the point.

The pidfile holds the PID of the server.  So for starters, let's collect up a list of potential PIDs.  I figure that's a good first step, then I can look to see if those PIDs are still active and, if they are, if they happen to be a PR Service (since given the length of time this has obviously been running amok, there's no reason to assume that an active PID is actually the same one that was given to the PR Server in question).

First step is easy.  Of course I typed this all on the command line, I don't actually create scripts ... well, much, if ever, honestly.  I create shell functions more often than that by far, but I don't even bother enshrining most of these quick hacks in functions.  If they age out of my shell history, I recreate them from scratch.  There's no magic here, anyway.  But for the sake of readability, let's look at this as if it were a script or function:
for i in /tmp/PRServer_127.0.0.1_*; do
    cat "$i"
done
Why not just 'cat /tmp/PRServer_127.0.0.1_*' you ask?  Because that destroys any ability I have to operate on individual files.  Bear with me.

Right.  Got a list of PIDs now.  Conveniently, 599 of 'em.  Now what?

My first thought was send something relatively benign like SIGCONT to each, check the return value and hopefully harvest some info from each of them at the same time about their command line so I could tell which of the active ones were still the processes I was interested in.  That would look something like this:
for i in /tmp/PRServer_127.0.0.1_*
    for j in $(cat $i)
        if [[ kill -SIGCONT $j ]]
            # appears to be an active process, check it
            if [ -n "$(ps aux | grep $j | grep -v grep | grep prserv)" ]
                # active and a PR server, probably the right one.
                # remove the dead pidfile
                rm /tmp/PRServer_127.0.0.1_$
Don't use that.  It almost certainly won't work.  I never even typed it into the shell.

I got thinking a bit more about this.  Considered using -p rather than SIGCONT, googled a bit for some tool to help, then thought:  this is dumb, I have the info I need in the shell.  So here's what I ended up with:
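for i in /tmp/PRServer_127.0.0.1_*; do
    for j in $(cat "$i"); do
        # keep the pidfile only while its PID is alive and still a bitbake process
        if [ -d "/proc/$j" ] && grep -q bitbake "/proc/$j/cmdline"; then
            continue
        fi
        rm -f "$i"
    done
done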
So how'd that work out?
% ls -l /tmp/PRServer_127.0.0.1_* | wc -l
9
Still a suspiciously large number, but one I can live with.
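Since the /proc check works for any heap of stale pidfiles, it can be sketched as a general-purpose function. The name prune_stale_pidfiles and its argument order are invented for illustration, and it assumes a Linux-style /proc and pidfiles that contain nothing but a single PID.

```shell
# prune_stale_pidfiles PATTERN FILE...
# Remove each FILE unless the PID it contains is a live process whose
# command line matches PATTERN.  (Hypothetical helper, Linux /proc only.)
prune_stale_pidfiles() {
    pattern=$1
    shift
    for f in "$@"; do
        [ -f "$f" ] || continue
        pid=$(cat "$f")
        if [ -n "$pid" ] && [ -d "/proc/$pid" ] &&
           grep -q "$pattern" "/proc/$pid/cmdline" 2>/dev/null; then
            continue    # live and matching: leave the pidfile alone
        fi
        rm -f "$f"
    done
}
```

The case in this post would then be `prune_stale_pidfiles bitbake /tmp/PRServer_127.0.0.1_*`.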
Turns out, though, there's another (in this case better) way to do it.  The pidfile's name encodes the port number where the server is listening.  If there's nothing still listening on that port, it's safe to assume the server is dead.
for i in /tmp/PRServer_127.0.0.1_*; do
    for j in $(basename "$i" | awk -F_ '{ print $3 }' | cut -f1 -d.); do
        # nothing listening on that port any more: the server is dead
        if ! netstat -l | grep -q "localhost:$j"; then
            rm "$i"
        fi
    done
done
This relies on behaviour of the PR Service, though, whereas my first attempt works for any ridiculously large list of processes.  I think I like it better, though, because I like anything that lets me bring awk into the equation.
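The only fiddly bit in that second approach is digging the port out of the file name, so here it is on its own. The function name pidfile_port and the example file name are made up; the PRServer_<address>_<port>.<suffix> layout is an assumption inferred from the awk and cut fields used above.

```shell
# Print the port encoded in a pidfile name of the assumed form
# PRServer_<address>_<port>.<suffix>.  (Hypothetical helper name.)
pidfile_port() {
    # third "_"-separated field, minus everything after the first "."
    basename "$1" | awk -F_ '{ print $3 }' | cut -f1 -d.
}
```

For example, `pidfile_port /tmp/PRServer_127.0.0.1_8585.pid` prints 8585, which is what gets handed to the netstat check.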

Tuesday, March 12, 2013

find sucks

There are certainly times when find is useful, but nearly every time I end up doing something like this:
# find . -type f -name foo
That's my age showing, probably. Regardless, one of the old 2¢ Tips I remember from back in the day that I hardly ever used until recently was using locatedb to find stuff on your disk.

Yes, there are absolutely a lot of options in "modern" distributions for finding files, often going as far as indexing contents and letting you search a single interface to find them all. That's great for a lot of people, I'm sure. But I'm a software developer most of the time, meaning I would rather have my machine sit idle most of the time than be busy doing something else when I want it to be compiling. Similarly, I need my disk space because I'm going to have a lot of intermediate files. They don't live long, sure, but if I run out of disk space in the middle of a build, that's kinda bad. So I don't want all-singing-all-dancing indexers running in the background (or worse still, on boot) and leaving databases lying around full of stuff I'll never be searching for anyway.

But back to the point.  The old tip that I've only recently started adapting for my own purposes.  The essence of it is this:
Create a custom locatedb containing only the stuff you want to search, update it on demand, then search that.
That's it.  So the simplest possible implementation of the idea is this:
# cd $HOME
# updatedb -o $HOME/.homedir.db -l 0 -U .
That's it.  After that, all subsequent searches are of the form:
# locate -d $HOME/.homedir.db --regex ".*foo$"
Simpler?  Maybe not.  Faster?  Actually, not greatly on a local disk with a small hierarchy.  Consider that by small I mean something like this: 
# locate -S -d $HOME/.homedir.db
Database /home/joe/.homedir.db:
        24,135 directories
        196,394 files
        16,616,477 bytes in file names
        6,598,887 bytes used to store database
For example, my search for all mp3s in my home directory results in this:
skynet ~ time locate -d $HOME/.homedir.db --regex ".*mp3$"
[output deleted]
    1.83s real     1.73s user     0.00s system

skynet ~ time find . -type f -name  "*.mp3"

[output deleted]

    0.55s real     0.23s user     0.28s system
Not really a win.  But let's look at something more "real world".  In my case this is a project source tree.  Any mp3s in there?
turd src time locate -d ./.src-list.db --regex ".*mp3$"
    0.65s real     0.65s user     0.00s system
turd src time find . -type f -name  "*.mp3"
   83.28s real     1.40s user     4.15s system
Nope.  Didn't expect any, but there's the speed-up we were looking for.  FWIW, the database is significantly different than my home directory one:

turd src locate -S -d ./.src-list.db 
Database ./.src-list.db:
40,534 directories
412,043 files
51,894,579 bytes in file names
11,188,602 bytes used to store database
turd src time locate -d ./.src-list.db --regex ".*layer.conf$"
[output deleted]
    0.56s real     0.56s user     0.00s system
turd src time find . -type f -name "layer.conf"
[output deleted]
    1.25s real     0.60s user     0.65s system

So there, again, we see a speed-up, though a more modest one this time: roughly twice as fast.  If you're unfortunate enough to be doing anything over shared filesystems, you're going to see improvements in the range of a couple of orders of magnitude.

Beware:  I'd been using this trick for a while before I realized just how fragmented the *locate world has become.  What I have here works for mlocate but not necessarily for any other version.
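For the mlocate case, the update-then-search routine could be wrapped in a couple of shell functions along these lines. Treat it as a sketch: the names dbupdate and dbfind and the MYDB variable are invented here, and the only options used are the mlocate ones from the commands above.

```shell
# Database to build and search; MYDB is a made-up variable name.
MYDB=${MYDB:-$HOME/.homedir.db}

# Rebuild the database for the tree rooted at $1 (default: current directory).
dbupdate() {
    updatedb -o "$MYDB" -l 0 -U "${1:-.}"
}

# Search the database with a regex, e.g.  dbfind '.*mp3$'
dbfind() {
    locate -d "$MYDB" --regex "$1"
}
```

Pointing MYDB somewhere else before calling them (say, `MYDB=./.src-list.db dbfind '.*layer.conf$'`) keeps one database per tree.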

Also, note that this isn't quite the same as the old tip that first enlightened me about locate.  That was for indexing CDs (or, more likely, floppies) where you wanted to be able to search a library of them without manually popping each into the drive.  That still works, though I'm hard pressed to think of a practical use for it now that we live in the age of the cloud.  But for the sake of conversation and completeness, here's how that worked.

  1. Mount your CD to a custom location (eg. /mnt/cd-label-1)
  2. updatedb -o $HOME/cd-database.db -l 0 -U /mnt/cd-label-1
  3. search at your leisure as above
Now you need only search the database, and the path name will tell you which CD your files are on.  Except who backs stuff up to CD anymore in the age of cheap, effectively infinite cloud storage?  Hmm.
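If you did keep one database per disc, mlocate's locate takes a colon-separated list of databases for -d, so a whole shelf can be searched in one go. A sketch, assuming the databases all sit in a single directory with a .db suffix; the function name db_path_list and the $HOME/cd-databases directory are invented for illustration.

```shell
# Join every *.db file in directory $1 into a colon-separated list
# suitable for locate -d.  (Hypothetical helper name.)
db_path_list() {
    list=
    for db in "$1"/*.db; do
        [ -f "$db" ] || continue
        list=${list:+$list:}$db
    done
    printf '%s\n' "$list"
}
```

Something like `locate -d "$(db_path_list $HOME/cd-databases)" --regex ".*mp3$"` would then search every catalogued disc at once.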