personal financial analysis
We’re buying a house. We’re being fairly conservative about how much house we’re buying so as not to impact our lifestyle too much, because we really enjoy it. That said, I wanted to crunch the numbers and see how I’m spending my money.
First, I needed to determine what I was going to use as a source for my data. I specifically wanted the transaction details for my checking account. Luckily, my bank allows you to download up to a year’s worth of transaction history in Quicken’s .qif format, so I downloaded everything since July 1, 2008. I opened the file in my text editor to see what I was dealing with: a fairly straightforward text-based format. I could have parsed it myself with Perl, but I figured someone had probably already done this and there was no sense in re-inventing the wheel. I jumped on CPAN and found the Finance::QIF module. I copied the example from the CPAN page verbatim and ran it against my file, and it dumped the details of my history to STDOUT.
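Getting from that example to a tab-delimited dump is only a few more lines. This is a sketch from memory rather than my exact script: the file name is made up, and the field keys (date, payee, and, if memory serves, transaction for the amount) are how I recall Finance::QIF naming them.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Finance::QIF;

# "checking.qif" is a stand-in for the file downloaded from the bank.
my $qif = Finance::QIF->new( file => "checking.qif" );

while ( my $record = $qif->next ) {
    # Only interested in bank transactions, not other record types.
    next unless $record->{header} eq "Type:Bank";

    # One tab-delimited line per transaction: date, payee, amount.
    print join( "\t",
        $record->{date},
        $record->{payee},
        $record->{transaction},    # the amount (T) field
    ), "\n";
}
```

Redirect STDOUT to a file and you have something MySQL’s loader can ingest directly.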
This was nice, but I didn’t want to write the code to do the calculations and such…I wanted to load it into a MySQL database. I could have done the load via the DBI module, but I opted to just dump the data to a tab-delimited text file and let the MySQL loader ingest it. I decided to do it this way because this was a one-time analysis, and the SQL to load data from a file is pretty easy, much easier than writing code to process a file and make a bunch of DB inserts.
I processed the file twice. First, I pulled out all of the unique payees (places I’ve spent money). I opened the resulting text file in Excel and went through a painstaking process of categorizing each entry. For example, I created a "Restaurant - Fast Food" category for places like McDonalds and a "Gas" category for service stations. I then told Excel to save this file as a tab-delimited text file, and I loaded the data into a table called payee_category that had two fields: the payee and the category.
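The load itself is a single statement. Something like this (the file name is just an example, and whether you need the LOCAL option depends on your setup):

```sql
-- Load the Excel export into the payee_category table.
LOAD DATA LOCAL INFILE 'payee_category.txt'
INTO TABLE payee_category
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
(payee, category);
```

The transaction dump loads the same way, with (date, payee, amount) as the column list.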
Next, I processed my input file again and dumped the date of the transaction, the payee, and the amount into another tab-delimited text file. I loaded this data into a table called transactions. Now I could do cool stuff like group my transactions by category for a certain time period. To do this, I used a query similar to this:
SELECT
    pc.category,
    SUM(t.amount)
FROM
    transactions t,
    payee_category pc
WHERE
    t.payee = pc.payee
    AND t.date BETWEEN '2009-07-01' AND '2009-07-31'
GROUP BY pc.category;
This would give me a result set like the following (I’ve changed some of the data and omitted some categories):
+------------------------+---------------+
| category | SUM(t.amount) |
+------------------------+---------------+
| Bills - Cable | -169.30 |
| Bills - Cell Phone | -102.38 |
| Bills - Insurance | -263.10 |
| Car Payment | -430.00 |
| Cash | -43.75 |
| Clothing | -15.00 |
| Coffee | -4.40 |
| Gas | -53.56 |
| Grocery | -417.83 |
| Gym | -58.64 |
| Movies | -19.50 |
| Restaurant - Fast Food | -25.22 |
| Restaurant - Sit Down | -41.99 |
| Retail | -18.00 |
| Student Loan | -200.00 |
| To Savings | -700.00 |
| Toys | -44.68 |
+------------------------+---------------+
Here’s the DDL for my tables:
CREATE TABLE payee_category (
    payee VARCHAR(255),
    category VARCHAR(32)
);
CREATE TABLE transactions (
    date DATE,
    payee VARCHAR(255),
    amount DECIMAL(10,2)
);
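With both tables in place, the same join answers other questions too. For example, a month-by-month breakdown per category over the whole year looks something like this (same implicit-join style as the query above):

```sql
-- Total spending per category, per month.
SELECT
    DATE_FORMAT(t.date, '%Y-%m') AS month,
    pc.category,
    SUM(t.amount) AS total
FROM
    transactions t,
    payee_category pc
WHERE
    t.payee = pc.payee
GROUP BY
    month, pc.category
ORDER BY
    month, pc.category;
```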
Did I learn anything from this exercise? Yes and no. Over the course of a year, I found that we spend slightly more money eating out than we do eating at home. I also found that I spend about 20% of my take-home pay via my credit card. I hardly ever carry a balance from month to month, so I’m simply using my credit card to pay for things and then paying it off quickly. This makes me want to do a similar analysis on my credit card transactions to see where that money is going, but I haven’t had a chance to see if my credit card company offers a similar transaction-detail download.
Another thing I learned - well, not really learned but more reinforced by this exercise - was the power of CPAN. It took almost no effort to set up and parse the data file. The most tedious task was manually categorizing the 200+ payees.