Backtrack 5r3: Capturing Voice from Remote Mic and Converting it to Searchable Text

Okay, I introduced the cool capability of using Metasploit to capture remote voice via mic and then converting it into keyword searchable text in the last post. As promised, we will take a closer look at setting it up to work on Backtrack 5r3 in this post.

I am going to warn you up front, this can be quite a process, but well worth it.

In this tutorial we will be using a Windows 7 laptop as our target, Backtrack 5r3 as our “attacker” system, the Social Engineering Toolkit (SET), Metasploit, AT&T’s voice to text developer platform, and a proof of concept AT&T interface script by Metasploit developer Sinn3r.

Getting a .Wav file from Remote Mic

The first thing you’re going to need is a remote shell. I have covered this A LOT on this blog, so I won’t spend time on it here. In this instance, I just used the Social Engineering Toolkit (SET) to create a Java-based backdoor session to the Windows 7 laptop:

Active Sessions 2

Just connect to the session by typing “sessions -i 1” and then type “record_mic” at the Meterpreter prompt. This will turn on the remote mic, record any sound, and save it as a .wav file on the Backtrack 5 system.


Okay, let me stop right here for a minute. When you run “record_mic”, for some odd reason it only records 1-3 seconds of audio. That is not really enough time to get anything useful from it.

(NOTE: You can use the post module “record_mic” mentioned in sinn3r’s article if you would like. I just found that running the built-in one is easier. And yes, they are a bit different, even though the name is the same.)

So, one thing we need to do is change the “record_mic” script so it will provide us some useful length .wav files.

It took me a while to find the actual “record_mic” script. The problem was that it isn’t in its own file, but included in the STDAPI webcam.rb script file!

The easiest way to find it is to perform a drive search for the file: “webcam.rb”.
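If you would rather script the search, here is a minimal Ruby sketch of it. The function name `find_stdapi_webcam` is made up for this example, and matching on "stdapi" anywhere in the path is an assumption about how your install is laid out, so adjust it if your paths look different.

```ruby
require 'find'

# Walk +root+ and return every webcam.rb whose path contains "stdapi".
# The case-insensitive "stdapi" match is an assumption for illustration;
# adjust it to the directory layout of your own Metasploit install.
def find_stdapi_webcam(root)
  matches = []
  Find.find(root) do |path|
    next unless File.file?(path) && File.basename(path) == 'webcam.rb'
    matches << path if path.downcase.include?('stdapi')
  end
  matches.sort
rescue Errno::ENOENT
  # The starting directory does not exist, so there is nothing to report.
  []
end
```

Call it with the directory you want searched, for example `puts find_stdapi_webcam('/')`. Searching the whole drive can take a while.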

It should find several; we are looking for the one in the STDAPI directory. Once you find it, edit the file and look for the following section:

Record_Mic Change

As you can see, here is our problem. The recording duration is set to one second! Change this to something more reasonable and save it.

I chose 20 seconds on mine:

Record_Mic New Value

Okay, now when we run “record_mic” we will get 20 seconds of recording time instead of a whole one second.

Much better!

Because we used SET to create the backdoor, it will save any .wav file to the “Program Junk” directory as seen below:


We now have a sound recording from the target laptop. That is actually all we need from the target system.

Setting up the AT&T Program Interface

The next thing we need to do is feed that .wav file into AT&T’s Speech to text system. So, let’s take a look at getting Sinn3r’s program interface to work on Backtrack 5.

If you haven’t done so already, grab Sinn3r’s proof-of-concept program from the link on the Metasploit article page. You will need a couple things to get the program to work right:

  1. You need to sign up for the AT&T Developer’s Free Trial
  2. AT&T will give you an API key and a Secret key; you will need these later.
  3. You need to install ffmpeg to convert the Metasploit .wav files into AT&T readable files.
  4. You will also need the Ruby Gems “att_speech” module
  5. You will need Ruby 1.9.3 installed

I’ll let you figure out steps one and two; they should be self-explanatory.

For step three just install ffmpeg by typing “apt-get install ffmpeg”.

Steps four and five can be a pain.

When you try to install the “att_speech” gem by typing “gem install att_speech”, you are probably going to get this error in BT5r3:

Ruby Celluloid Error

You need Ruby 1.9.2 or greater installed. That is really odd, as you most likely already do.

If you type “update-alternatives --config ruby” you will see all the installed Ruby versions, as below:

Ruby is 192

Well, looky there, we HAVE 1.9.2 installed!

What to do?

I got around this by installing Ruby 1.9.3. To do it the easy way, I just installed RVM:

(Note: I followed step 1 from this website to get RVM installed.)

RVM Install 1

RVM Install 2

RVM Install 3

RVM Install 4

RVM Install 5
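Before retrying the gem install, it can help to confirm that the Ruby you are actually running is the new one. This is a minimal sketch; `REQUIRED_RUBY` and `ruby_new_enough?` are names invented for this example, and the 1.9.3 floor is simply the version this tutorial settled on.

```ruby
require 'rubygems'

# Minimum interpreter version this tutorial settles on.
REQUIRED_RUBY = '1.9.3'

# True when +current+ is at least +required+, compared as version numbers
# rather than as plain strings.
def ruby_new_enough?(current = RUBY_VERSION, required = REQUIRED_RUBY)
  Gem::Version.new(current) >= Gem::Version.new(required)
end

puts "Running Ruby #{RUBY_VERSION}: #{ruby_new_enough? ? 'OK' : 'too old'}"
```

If this reports an older version than you expect, the shell is probably still picking up the system Ruby rather than the RVM one.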

Now that we have Ruby 1.9.3 installed, let’s try “gem install att_speech” again:

RVM Install 6


Almost done now!

Okay, if we have Sinn3r’s script, Ruby updated and ffmpeg installed, we should be all set.

Well, not quite. If you are running on OS X, the script will buzz right through and work great. On Linux, not so much. There are a couple more changes we need to make.

First, set ffmpeg as the audio decoder, and then remove the .tmp extensions in the code or they will confuse the poor ffmpeg program. I’ll make it easy for you: just open Sinn3r’s script, find the ffmpeg section and make it look like this:

wav analyzer change mmpeg tmp to wav
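To show why the extension matters, here is a rough sketch of calling ffmpeg from Ruby. This is NOT Sinn3r’s code; `ffmpeg_command` and `convert_wav` are names made up for this example, and the 8000 Hz mono target is my assumption about what the speech service accepts, so check the AT&T documentation before relying on it. Unless told otherwise, ffmpeg picks the output format from the file extension, which is why the output keeps a real .wav name instead of .tmp.

```ruby
# Build an ffmpeg argument list that resamples +input+ to mono WAV.
# The 8000 Hz mono target is an assumption, not taken from Sinn3r's script.
def ffmpeg_command(input, output_dir, sample_rate = 8000)
  base = File.basename(input, File.extname(input))
  # Keep a real .wav extension so ffmpeg can infer the output format.
  output = File.join(output_dir, "#{base}_converted.wav")
  ['ffmpeg', '-y', '-i', input, '-ar', sample_rate.to_s, '-ac', '1', output]
end

# Run the conversion; returns the output path, or nil if ffmpeg failed
# or could not be started.
def convert_wav(input, output_dir)
  cmd = ffmpeg_command(input, output_dir)
  system(*cmd) ? cmd.last : nil
end
```

Passing the command as an array to `system` avoids shell quoting problems with odd file names.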

Now we are ready!

Running the AT&T API Script

Now we need to execute the script. Don’t forget to point it at the directory that contains the “record_mic” .wav files, give it an output directory to store converted files in, and most importantly, put in your AT&T-provided API and secret keys:

ruby ./d3v_wav_analyze.rb -i ~/.msf4/loot/ -o /tmp -a [API_KEY] -s [SECRET_KEY]

When run it will look something like this:

wavalyzer working on Backtrack

As you can see it worked!

Here is a closeup of the voice that was captured remotely from the mic, turned into text and keyword searched by the AT&T system, looking for the keyword – “Password”:

Backtrack Voice to Text Close Up

Okay, I must confess, the process isn’t perfect…

I said, “open secured file, password is 7743-9824…” and it translated it to “Open picture file, password is”

You will also notice that the AT&T program thought the password was a phone number, so it tagged it as a PHONE variable, dropping the first number.

Was it perfect? No…

And the cool thing is that it knows it wasn’t perfect in the translation. Notice the confidence rate: .480…

Basically it knew that it only got about 50% of the translation right. But when you think that Metasploit grabbed the voice from a remote laptop, converted it to a wav file, uploaded it to AT&T’s voice recognition software that converted it to searchable text, and correctly found that I said “password” – That is amazing!

I’ve had mixed results with the translations, one as low as 11%. But if you look at the Metasploit article, when Sinn3r tried it he got up to a 95% translation rate!
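With results that vary this much, the confidence figure is handy for deciding which recordings to listen to yourself. Here is a minimal sketch of that idea. The `Result` struct and `keyword_hits` are made up for this example and do not reflect the real AT&T response format or Sinn3r’s script; the file names and the second transcript are invented sample data, and the 0.5 cutoff is an arbitrary choice.

```ruby
# Hypothetical result shape for illustration; the real AT&T response differs.
Result = Struct.new(:file, :text, :confidence)

# Return results whose text mentions +keyword+ (case-insensitive),
# highest confidence first.
def keyword_hits(results, keyword)
  needle = keyword.downcase
  results.select { |r| r.text.downcase.include?(needle) }
         .sort_by { |r| -r.confidence }
end

results = [
  Result.new('a.wav', 'Open picture file, password is', 0.48),
  Result.new('b.wav', 'See you at lunch', 0.90),
]

keyword_hits(results, 'Password').each do |r|
  # Flag shaky transcriptions so you know to check the audio by ear.
  flag = r.confidence < 0.5 ? ' (low confidence, listen to the audio)' : ''
  puts "#{r.file}: #{r.text}#{flag}"
end
```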

Pretty amazing stuff, and I am sure it will only get better. Hopefully you can see how this could be used to do other security related things. I can already think of another pretty sweet idea how this could be used and will hopefully get another tutorial up by the end of this week or next… If it works, lol!

Sorry about the length of this tutorial. But as with all things in security, nothing is perfect, and I wanted to save you a ton of time by explaining the workarounds for the snags that I encountered when trying to get it to run on Backtrack 5.

I hope it works well for you and you enjoy it!

(For securing against this type of attack, don’t let your users run Java! Also, disable the webcam and microphone on your system if they are not needed!)
