Finding and Reverse Engineering Deleted SMS Messages


Recovering deleted SMS messages from Android phones is a request I get frequently. Luckily, there are several places and ways to recover these on an Android phone.  After working a case that involved manually carving hundreds of juicy, case-making messages, I collaborated with cheeky4n6monkey on a way to automate the process.  A huge thank you to Adrian, because I think the only way to truly appreciate the script is to do the manual work first.

That being said, in my last post, "Dude, Where's my Data", I explored the importance of knowing what your automatic tools are doing and of digging deeper, as there may be critical information these tools are not parsing.   Harlan Carvey contributed a great comment which I think sums it up nicely: “Tools provide a layer of abstraction over the data itself, often hiding the data from the analyst who is not curious.”

I am not trying to give these tools a bad rap.  In fact, I use my "all in one" tools every day. However, by understanding the raw data, you can leverage these tools to help you find and understand critical data not automatically provided.

Recently I used Cellebrite to understand the structure of SMS messages, which I could then apply to SMS fragments found in unallocated space and the mmssms.db-journal file.  Although Cellebrite recovers deleted messages, it does not do so from areas outside of the SMS database (to my knowledge).  Of course, these "other places" contained the most important data for my case.

In this post, I am going to cover some common locations in the file system to recover deleted text messages.  Additionally, because the SMS structure can vary across Android devices, I am going to show how I deconstructed the SMS message, and then applied the information to SMS messages found in unallocated space.  I am sure there is more than one way to skin this cat, and some may even be better; this is just the way I did it.

For this example, I used a Samsung GSM SGH-T959V Galaxy S.  Even if you don't do mobile forensics, the principles of this example can be applied to work out the layout of structured data found in unallocated space.


Where the Messages are hiding

When working with cell phones, several types of acquisitions may be taken:  logical, file system and physical.  A logical acquisition is usually the information as the end user sees it: text messages, call logs, etc.  It does not include deleted data.  A file system acquisition is the next step up. It provides access to the file system, but not unallocated space.  A physical acquisition is a bit-by-bit copy of the flash memory and thus includes unallocated space. For more information on these three types of acquisitions, check out this page on Mobile Forensics on Wikipedia.

For recovering deleted text messages, a physical extraction is best.  However, there are several locations in a file system extraction that can yield deleted text messages: the SMS database, the SMS journal file and a log database.

SMS Database

Text messages are stored in an SQLite database named mmssms.db, typically under the location /Root/data/com.android.providers.telephony/databases/. SQLite databases can retain deleted records until the freed space is reused. If you are using a program like Cellebrite, it will "automatically" recover deleted text messages from this database. However, I also manually check for fragments using a hex viewer, or an SQLite viewer like Oxygen Forensics SQLite Viewer. They offer a 30 day free trial if you want to play around with it. This SQLite viewer shows blocks of deleted data:



SMS Journal File
The mmssms.db-journal file is a rollback journal written by the SQLite database engine. If this file exists, it will be in the same directory as the mmssms.db file.  It can contain numerous deleted text messages. Since cheeky4n6monkey helped develop a script to parse this, he has done an excellent write-up on the format and structure, which I will post a link to once it's up.  I won't go into too much detail here, except to say the text messages have the same structure as in the SMS database.

logs.db
This file appears to be a database that logs various activities on the phone such as calls and SMS messages.  The field holding the SMS messages appears to contain the first 50 or so characters of a text message.  Because it is an SQLite database, you can view deleted data as explained above.

Unallocated Space
Glorious unallocated space - my favorite location to find deleted text messages.  I have found hundreds of deleted text messages here through various keyword searches.  If you are lucky, you can carve a whole mmssms.db SQLite database from unallocated space as explained here by Richard Drinkwater.  This will allow you to open the complete file with an SQLite viewer and view information such as the phone number, date, sent date, status and folder.  If not, this is where leveraging your tools can help you parse this information manually.
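As a rough illustration of the keyword search step, here is a minimal Python sketch that looks for a keyword in raw bytes and prints a small hex/ASCII dump around each hit, similar to what a hex viewer shows. The function names (find_keyword, hexdump) and the sample bytes are made up for this example; they are not part of any forensic tool.

```python
import re

def find_keyword(data: bytes, keyword: str, context: int = 64):
    """Yield (offset, surrounding_bytes) for each case-insensitive ASCII hit."""
    pattern = re.compile(re.escape(keyword.encode("ascii")), re.IGNORECASE)
    for match in pattern.finditer(data):
        start = max(match.start() - context, 0)
        end = min(match.end() + context, len(data))
        yield match.start(), data[start:end]

def hexdump(chunk: bytes, width: int = 16) -> str:
    """Format bytes as hex plus printable ASCII; offsets are relative to the chunk."""
    lines = []
    for i in range(0, len(chunk), width):
        row = chunk[i:i + width]
        hex_part = " ".join(f"{b:02X}" for b in row).ljust(width * 3)
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in row)
        lines.append(f"{i:08X}  {hex_part} {ascii_part}")
    return "\n".join(lines)

# Hypothetical bytes standing in for a slice of a physical image.
sample = b"\x00" * 10 + b"I ate the Trix" + b"\x00" * 10
hits = list(find_keyword(sample, "trix", context=8))
for offset, chunk in hits:
    print(f"hit at offset {offset}")
    print(hexdump(chunk))
```

For a real physical image you would not load the whole file into memory like this; reading it in overlapping chunks (or memory-mapping it) would be the more practical route.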

Determining the SMS structure

Case Setup:
Let's say a bowl of Trix has been eaten, and you need to determine who did it and when.  A keyword search for "trix" brings you to the following message in unallocated space (phone numbers have been changed for this example):



 As you can see from the example above, the phone number and SMS body are readily apparent.  However, just by looking at the ASCII data on the right, it is not possible to tell if this was a sent or received message. In this case, that is critical information.  If the message was sent, the suspect whose phone you're examining ate the Trix.  If this text message was received from somebody, the sender ate the Trix.  We also need to figure out what the date of the message was.

The steps I use to determine the structure of the SMS are as follows: View the SMS database to see the schema and the format of the data. Do some pattern matching using existing SMS messages and apply the pattern to deleted messages.

First, I use one of my 'all in one' tools to see if it was able to parse the existing messages on the phone. Below is a text message in the existing inbox of the phone (picture from Cellebrite):



By viewing the mmssms.db file in Hex view and locating the text message above, I can begin to see how this data is stored in a raw format. I can see information such as the phone number, body and SMSC (the phone numbers are normally 10 digits long, but messages from T-Mobile come through with a phone number of 456). I know the date information must be hiding in here somewhere:


By viewing the Schema of the mmssms.db SQLite database table that holds the SMS messages and viewing the message within the database, I can gather more information about this message:

Schema 



Message


 By looking at the database, I can see that the date is stored as an epoch timestamp, and there are some flags that look like they might correspond to whether the message was sent or received: the fields named “read” and “type”.
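If you would rather pull the schema and the existing rows with a few lines of Python instead of a viewer, something like the sketch below works. It builds a tiny in-memory database so it runs on its own; the table name "sms", the reduced column list and the message body are assumptions for illustration, and a real mmssms.db will have many more columns that differ from phone to phone.

```python
import sqlite3

def table_columns(conn, table="sms"):
    """Return column names in schema order using PRAGMA table_info."""
    # The table name is interpolated directly, so only pass trusted names.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

def dump_messages(conn, table="sms"):
    """Return the fields discussed in this post for every live message."""
    cur = conn.execute(f"SELECT address, date, read, type, body FROM {table}")
    return cur.fetchall()

# Stand-in database with a hypothetical, cut-down schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sms (_id INTEGER PRIMARY KEY, address TEXT, "
    "date INTEGER, read INTEGER, type INTEGER, body TEXT)"
)
conn.execute(
    "INSERT INTO sms (address, date, read, type, body) VALUES (?, ?, ?, ?, ?)",
    ("456", 1358782280310, 1, 1, "example body"),
)

columns = table_columns(conn)
rows = dump_messages(conn)
print(columns)
print(rows)
```

Against a working copy of the real file, you could open it read-only with sqlite3.connect("file:mmssms.db?mode=ro", uri=True) so the copy is not modified.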

By using all of this information, I can begin to figure out the structure.  The order of the fields in the raw data follows the order of the SMS schema. For clarity's sake, I have only included the most common fields in the picture below:


Address:
The Phone number associated with the message.

Date Field
To determine how the date is stored, I viewed this value in the database and noted it was stored as epoch time.  I tried converting the hex value 01 3C 5D BC 32 76 to decimal, which gives 1358782280310. This matches the value in the database, so the date is a count of milliseconds since the Unix epoch, stored as a big-endian integer. Converting to UTC yields Mon, 21 January 2013 15:31:20 UTC, which matches the value displayed by Cellebrite.
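The same conversion can be checked in a few lines of Python. This is only a sketch of the arithmetic; the function name decode_epoch_ms is made up for the example.

```python
from datetime import datetime, timedelta, timezone

def decode_epoch_ms(hex_bytes: str):
    """Read hex bytes as a big-endian integer of milliseconds since the Unix epoch."""
    raw = bytes.fromhex(hex_bytes)  # spaces between byte pairs are ignored
    millis = int.from_bytes(raw, byteorder="big")
    # Add the milliseconds to the epoch with integer math to avoid float rounding.
    when = datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(milliseconds=millis)
    return millis, when

millis, when = decode_epoch_ms("01 3C 5D BC 32 76")
print(millis)
print(when.strftime("%a, %d %B %Y %H:%M:%S UTC"))
```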

Read/Type
By using my all in one tool, I looked through the existing SMS Messages and cross referenced these values to what I was seeing in the database to establish what the flag values of "read" and "type" mean.
    Read:
        00 = Message was unread
        01 = Message was read
    Type (aka "folder"):
        01 = Inbox
        02 = Sent

Now I can apply the same structure to the message found in unallocated space and parse it manually:


 Date: Hex value 01 39 22 20 57 66 converts to decimal 1344897308518. Converting to UTC yields Mon, 13 August 2012 22:35:08 UTC.

Type:  02, which means the message was sent.

So now I have some valuable information about this deleted text message. This message was sent by the user of the phone indicating they ate the Trix, and the message was sent on August 13, 2012 at 3:35PM AZ time.
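Here is the manual parse of the unallocated fragment written out as a small Python sketch. The date bytes and type byte are the ones from the example above; the helper name parse_fragment and the fixed UTC-7 offset for Arizona (which does not observe daylight saving time in most of the state) are my assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

# Flag meanings established from the existing messages on this phone.
TYPE_FLAGS = {0x01: "Inbox", 0x02: "Sent"}

# Assumption: a fixed UTC-7 offset is good enough for Arizona local time.
ARIZONA = timezone(timedelta(hours=-7), "MST")

def parse_fragment(date_hex: str, type_byte: int) -> dict:
    """Decode the date and type fields carved from a message fragment."""
    millis = int.from_bytes(bytes.fromhex(date_hex), byteorder="big")
    utc = datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(milliseconds=millis)
    return {
        "millis": millis,
        "utc": utc,
        "local": utc.astimezone(ARIZONA),
        "folder": TYPE_FLAGS.get(type_byte, f"unknown (0x{type_byte:02X})"),
    }

result = parse_fragment("01 39 22 20 57 66", 0x02)
print(result["millis"])
print(result["utc"].strftime("%Y-%m-%d %H:%M:%S UTC"))
print(result["local"].strftime("%Y-%m-%d %H:%M:%S %Z"))
print(result["folder"])
```

The flag table only applies to this phone; on another model the values would need to be re-established from its own database first.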

I should also mention that within Cellebrite, you can view an existing message in the mmssms.db file in Hex view.  Cellebrite color blocks data it has parsed, and when you hover over a block, it will show you what values it parsed. I wanted to show the more "manual" way first in case some other tools do not do this, or if for some reason the database was not parsed, which I have run into. Please note, it does not do this for messages in unallocated space.

 
Depending on the phone, the SMS structure may be totally different. Also, because the information is stored in an SQLite database and deleted fragments may be located outside of an SQLite database, there does not seem to be a "header" or "magic number" to carve for.
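One workaround I can imagine when there is no header to carve for (this is just a heuristic sketch of mine, not how any particular tool or script works) is to slide over the raw bytes and flag any 6-byte big-endian value that falls inside a plausible date window, since the date field on this phone was a 6-byte millisecond epoch value. The function name plausible_timestamps, the date window and the sample bytes are all made up for the example.

```python
from datetime import datetime, timezone

def plausible_timestamps(data: bytes, start: datetime, end: datetime):
    """Yield (offset, millis) for 6-byte big-endian values inside [start, end]."""
    lo = int(start.timestamp() * 1000)
    hi = int(end.timestamp() * 1000)
    for offset in range(len(data) - 5):
        value = int.from_bytes(data[offset:offset + 6], byteorder="big")
        if lo <= value <= hi:
            yield offset, value

# Hypothetical fragment: two filler bytes, the date bytes from the example
# message above, then its type byte.
fragment = b"\xff\xff" + bytes.fromhex("013922205766") + b"\x02"
window_start = datetime(2012, 1, 1, tzinfo=timezone.utc)
window_end = datetime(2013, 12, 31, tzinfo=timezone.utc)

candidates = list(plausible_timestamps(fragment, window_start, window_end))
print(candidates)
```

Expect false positives on a real image; each candidate would still need to be checked by hand against the surrounding fields, and the field width may differ on other phones.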

Here are three SMS schemas from three different Android Phones. Some different fields of interest have been circled. The same method used above could be used to determine how these values are stored "in the raw" to parse deleted messages found in unallocated space.


Samsung CDMA SPH-D720 Nexus S




Samsung GSM SGH-T839 Sidekick 4G

Samsung GSM SGH-T959V Galaxy S

 
As you can see from the three phones above, each phone has a different way of storing the SMS messages, which can make it difficult to write a script to "recover all deleted text messages from all Android phones". I have left that up to the monkeys, or at least one cheeky one.

Addendum [added 02/18/2013]
Please make sure and read the comment posted by Brian below. He makes a great point.  For any of the above to work, the data generated by the forensic tool/software needs to be validated against the actual device. I had done that in this case, but failed to mention it.

[added 02/25/2013]
Adrian's script is now available. Make sure to check it out.
