
Writing Ledger in Python

Author: Martin Blais <blais@furius.ca>


Why I'm Rewriting Ledger

IMHO, Ledger falls slightly short in the following ways:

(Re-)Implementation of Ledger / TODO

Ideas for Reconciliation

  • How difficult would it be to find two imbalanced transactions with the same dollar amount, propose merging them, and pair them up automatically? Not very. This could even be a command-line tool working on a database.
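A minimal sketch of the pairing, assuming each imbalanced transaction has already been reduced to a hypothetical (txn_id, commodity, residual) tuple, where residual is the amount by which it fails to balance. Two transactions are proposed for merging when their residuals are equal in size and opposite in sign; pairing within a bucket is greedy, in input order.

```python
from collections import defaultdict
from decimal import Decimal


def propose_merges(imbalanced):
    """Pair imbalanced transactions whose residuals cancel out.

    imbalanced: list of (txn_id, commodity, residual) tuples, a
    hypothetical representation. Returns a list of (id_a, id_b) pairs.
    """
    # Bucket by commodity and absolute amount, split by sign.
    buckets = defaultdict(lambda: {"pos": [], "neg": []})
    for txn_id, commodity, residual in imbalanced:
        if residual == 0:
            continue  # Already balanced, nothing to pair.
        side = "pos" if residual > 0 else "neg"
        buckets[(commodity, abs(residual))][side].append(txn_id)

    pairs = []
    for sides in buckets.values():
        # Greedily pair each positive residual with a negative one.
        pairs.extend(zip(sides["pos"], sides["neg"]))
    return pairs
```

A real tool would only propose these pairs and let me confirm each merge, since two unrelated transactions can share an amount by coincidence.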

Implementation details

  • I want to talk about a Python version of the parser, which would be slow, but I have a way (that I used in Nabu) to do incremental loading of text chunks by using a checksum. I want to discuss this with you.
  • C++ is ... well... C++ sucks. I've managed to coredump ledger a few times today, but I think the bug is in the parsing, because it's easy to fix by twiddling the syntax. CL is really hard to deploy and sucks in different ways (ways that Schemers will be quick to point out). Why not try Python or Haskell? Python is slow, but I have a few tricks I could suggest for incremental updates, and besides, does it really have to be so fast? For me as long as it's < 3 seconds it's ok. I'll be writing a prototype parser.
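Here is a minimal sketch of the checksum trick, re-sketched for illustration rather than copied from Nabu: split the file into blank-line separated paragraphs, hash each one, and only reparse paragraphs whose checksum is not already cached. The `parse` callback and the in-memory `cache` dict are stand-ins; a real loader would persist the cache (e.g. in the database) and evict checksums that no longer appear in the file.

```python
import hashlib


def split_paragraphs(text):
    """Split a Ledger file into blank-line separated chunks."""
    chunks, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            chunks.append("\n".join(current))
            current = []
    if current:
        chunks.append("\n".join(current))
    return chunks


def incremental_load(text, cache, parse):
    """Return (parsed chunks, number reparsed), reusing cached results.

    cache maps checksum -> parsed result and is updated in place;
    parse is a (hypothetical) function that parses a single chunk.
    """
    results, reparsed = [], 0
    for chunk in split_paragraphs(text):
        key = hashlib.sha1(chunk.encode("utf-8")).hexdigest()
        if key not in cache:
            cache[key] = parse(chunk)
            reparsed += 1
        results.append(cache[key])
    return results, reparsed
```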

Notes:

  • Make the account primary key include the base security

  • Rename the confusing "journal" to "posting" in db model

  • remove the "memo" column, merge with the description.

  • consider naming posting description as "payee"

  • do not support a date+time syntax for importing entries right away. We need only make the transaction have a date/time for now.

  • Do not support times for now either, let's see if dates are good enough.

  • I like to keep the signs pure (unlike in accounting), it will make calculations easier, and it avoids the need to make accounts into a debit account vs. a credit account (other than for the purpose of reporting).

  • if you leave the last entry empty, it should automatically complete.

  • implement "50 AAPL @ 101.52 USD" syntax

  • support a comment line

  • support automated rules, like "=" rules, that automatically add components to transactions.

  • how does it deal with stock splits?

  • if loading the entire database is too slow,

    1. see if we can avoid parsing the text file by simply computing checksums on each paragraph, that would be saved in the DB and cross-checked.
    2. how about adding an --update switch, and we could select-region and send it just the update to be made?
    3. how about checksumming every line in the file, and storing that in a temp file for figuring out which updates to send?
  • each entry could have a potentially different syntax

  • support a "cleared" flag: Cleared transactions are indicated by an asterisk placed just before the payee name in a transaction. The meaning of this flag is up to the user, but typically it means that an entry has been seen on a financial statement. Pending transactions use an exclamation mark in the same position, but are mainly used by reconciling software. Uncleared transactions are for things like uncashed checks, credit charges that haven't appeared on a statement yet, etc.

  • Real transactions are all non-virtual transactions, where the account name is not surrounded by parentheses or square brackets. Virtual transactions are useful for showing a transfer of money that never really happened, like money set aside for savings without actually transferring it from the parent account.

  • flag auto-generated postings vs. "actual" postings; this can be useful for generating a budget report.

  • maybe we can support a statement that "declares" all the valid accounts, so that validation can occur later and also partial/regexp matching.

  • support tags:

    ... ; :holidays:

  • entering a transaction similar to a previous one: support an "entry" command:

    Here are a few more examples of the 'entry' command, assuming the above journal entry:

    ledger entry 4/9 viva 11.50
    ledger entry 4/9 viva 11.50 checking # (from `checking')
    ledger entry 4/9 viva food 11.50 tips 8
    ledger entry 4/9 viva food 11.50 tips 8 cash
    ledger entry 4/9 viva food $11.50 tips $8 cash
    ledger entry 4/9 viva dining "DM 11.50"
    
  • look at the detailed file format description; there are lots of clever ideas in there.
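A minimal sketch of two of the notes above: the "50 AAPL @ 101.52 USD" syntax and automatically completing an empty last posting. It assumes amounts are written as NUMBER COMMODITY (prefix forms like "$11.50" are not handled), that the account is separated from the amount by two or more spaces or a tab, and that a priced posting balances at its total cost. `parse_posting` and `complete_entry` are hypothetical names.

```python
import re
from decimal import Decimal

# Assumed amount syntax: "NUMBER COMMODITY", optionally "@ PRICE COMMODITY".
NUMBER = r"-?\d*\.?\d+"
AMOUNT_RE = re.compile(
    rf"^(?P<number>{NUMBER})\s+(?P<commodity>[A-Za-z]+)"
    rf"(?:\s+@\s+(?P<price>{NUMBER})\s+(?P<price_commodity>[A-Za-z]+))?$"
)


def parse_posting(line):
    """Parse '  ACCOUNT  [NUMBER COMMODITY [@ PRICE COMMODITY]]'."""
    # Account names may contain single spaces, so split on 2+ spaces or a tab.
    parts = re.split(r"\s{2,}|\t", line.strip(), maxsplit=1)
    posting = {"account": parts[0], "amount": None, "price": None}
    if len(parts) == 2:
        m = AMOUNT_RE.match(parts[1].strip())
        if m is None:
            raise ValueError(f"cannot parse amount: {parts[1]!r}")
        posting["amount"] = (Decimal(m["number"]), m["commodity"])
        if m["price"]:
            posting["price"] = (Decimal(m["price"]), m["price_commodity"])
    return posting


def weight(posting):
    """Amount used for balancing: the total cost if priced, else the amount."""
    number, commodity = posting["amount"]
    if posting["price"]:
        price, price_commodity = posting["price"]
        return number * price, price_commodity
    return number, commodity


def complete_entry(postings):
    """Fill in the single posting that was left without an amount."""
    missing = [p for p in postings if p["amount"] is None]
    if len(missing) > 1:
        raise ValueError("only one posting may be left empty")
    totals = {}
    for p in postings:
        if p["amount"] is not None:
            number, commodity = weight(p)
            totals[commodity] = totals.get(commodity, Decimal(0)) + number
    residual = {c: n for c, n in totals.items() if n != 0}
    if not missing:
        if residual:
            raise ValueError(f"entry does not balance: {residual}")
        return postings
    if len(residual) != 1:
        raise ValueError("cannot infer a single amount for the empty posting")
    [(commodity, number)] = residual.items()
    missing[0]["amount"] = (-number, commodity)
    return postings
```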

Testing

About testing for speed:

(16:36:46) johnw: that results in about 400,000 transactions
(16:36:53) blais: 400k transactions, I'll try that.
(16:36:58) johnw: I make a 10Mb file, a 50Mb file, and a 100Mb file
(16:37:09) blais: Jey, Haskell is a beautiful language for this kind of problem.

Proposal for New Directives

Some ideas that I will implement in my Python parser and which you may or may not find useful. I have implemented some of these as comments already. They are complementary to the existing functionality.

Account Declarations

Declaration of valid accounts:

I'm afraid of mistyping the name of an account, or of accidentally forgetting to rename accounts when I move stuff around. I know it'll show up in the balance sheet hierarchy if I generate it, but I would much rather restrict the set of valid accounts and have the parser check for them. Does it not make sense to add a directive to declare accounts, and an option to enforce that every account used matches one of those previously declared? This should be optional, obviously; I still like the power of having accounts appear simply by using them (although IMO that should be the alternative behaviour).

  • Also related, but I'd like to be able to restrict the kinds of commodities that can be contained by some accounts. The same declaration should allow the user to do that.
  • I would like to keep the sort order of the hierarchy of accounts intact, and a declaration would solve that problem by establishing a priority.
  • Accounts could also be declared as 'debit' or 'credit' accounts, for the purpose of later reporting things in the traditional way.
  • I want to do this without making my files incompatible with Ledger, but I would prefer to not make them a "comment" either. Is there a good way?

As mentioned above, I want to validate that my entries always fit a fixed set of accounts, which I declare near the top of the document. Moreover, those declarations further constrain the kinds of commodities that can be deposited in an account (possibly wildcard). It looks like this:

;X De  Assets:Current:Cash
;X De  Assets:Current:RBC                        CAD
;X De  Assets:Current:RBC:Savings                CAD
;X De  Assets:Current:RBC:Checking               CAD
;X De  Assets:Current:RBC:US-Checking            USD
;X De  Assets:Current:HSBC                       USD
;X De  Assets:Current:HSBC:Checking              USD
;X De  Assets:Current:HSBC:Secured               USD
;X De  Assets:Current:HSBC:Savings               USD
;X De  Assets:Loans
;X De  Assets:Loans:Loan-Pierre-Blais            CAD
;X De  Assets:Fixed:Home                         CAD
;X De  Assets:Investments
;X De  Assets:Investments:Furius
;X De  Assets:Investments:RBC-Broker
;X De  Assets:Investments:RBC-Broker:Account-USD         USD
;X De  Assets:Investments:RBC-Broker:Account-CAD         CAD
;X De  Assets:Investments:RBC-Broker:Account-RSP         CAD
;X De  Assets:Investments:HSBC-Broker
;X De  Assets:Investments:OANDA
;X De  Assets:Investments:London-Life-Policy             CAD
;X De  Assets:Investments:Private
;X De  Assets:Investments:Private:Safehouse Shares       AUD
;X De  Assets:Investments:Private:Safehouse Options      AUD
;X De  Assets:AccountsReceivable
;X De  Assets:Furius-Expenses
;X De  Assets:Furius-Expenses:ForVISA
;X Cr  Liabilities
;X Cr  Liabilities:AccountsPayable
;X Cr  Liabilities:RBC
;X Cr  Liabilities:RBC:Credit-Line
;X Cr  Liabilities:RBC:Mortgage
;X Cr  Liabilities:RBC:Mortgage:Loan
;X Cr  Liabilities:RBC:Mortgage:Credit-Line
;X Cr  Liabilities:Credit-Card:RBC-VISA
;X Cr  Liabilities:Credit-Card:HSBC-MasterCard
;X Cr  Equity
;X Cr  Equity:Opening-Balances

Note that I could simplify further:

  • If the commodity is unspecified, any commodity can be inserted in that account. Lists of allowable commodities can be specified as a comma-separated list.
  • A child account's commodity should default to that of its parent.
  • Note that accounts are marked as "debit" (De) or "credit" (Cr) only for reporting purposes; you still enter the numbers with the appropriate sign.
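A minimal sketch of reading these declarations and enforcing them, assuming the layout above: `;X`, `De` or `Cr`, the account name, then two or more spaces before an optional comma-separated commodity list (so names like "Safehouse Shares" survive). An account without commodities inherits them from its nearest ancestor that has some; if none does, anything goes. All function names are hypothetical.

```python
import re

# ";X De  Account:Name   CAD,USD" -- the commodity list is optional.
DECL_RE = re.compile(r"^;X\s+(De|Cr)\s+(.+?)(?:\s{2,}(\S+))?\s*$")


def parse_declarations(lines):
    """Map account name -> (kind, set of commodities or None)."""
    accounts = {}
    for line in lines:
        m = DECL_RE.match(line)
        if m:
            kind, account, commodities = m.groups()
            allowed = set(commodities.split(",")) if commodities else None
            accounts[account] = (kind, allowed)
    return accounts


def allowed_commodities(accounts, account):
    """Walk up the hierarchy until some ancestor restricts commodities."""
    name = account
    while name:
        if name in accounts and accounts[name][1] is not None:
            return accounts[name][1]
        name = name.rpartition(":")[0]  # "A:B:C" -> "A:B" -> "A" -> ""
    return None  # No restriction anywhere: any commodity is allowed.


def validate_posting(accounts, account, commodity):
    """Raise if the account is undeclared or the commodity is not allowed."""
    if account not in accounts:
        raise ValueError(f"undeclared account: {account}")
    allowed = allowed_commodities(accounts, account)
    if allowed is not None and commodity not in allowed:
        raise ValueError(f"{commodity} not allowed in {account}")
```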

I'm also toying with the idea of giving accounts longer, more descriptive names, which would be used in reporting, but I'm not sure it won't just make things more complicated than necessary. Each account has a specific name that the bank uses for it, for example "High-Power e-Savings Account". I'd like to be able to see those in the final report. Maybe I can leave it out of the ledger format and just use a join to an ancillary table in my reporting code.

Account number mapping

I wrote a script that converts an OFX file into Ledger syntax. I simply insert that file in my document in order to reconcile that account. The import script does need to be able to map each OFX account-id to a Ledger account, so in the file, I added this directive:

;  account-id              account-name
;M 000067632326            Assets:Current:RBC:Checking
;M 000023245336            Assets:Current:RBC:US-Checking
;M 000018783275            Assets:Current:RBC:Savings
;M 3233762676464639        Liabilities:Credit-Card:RBC-VISA
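A small sketch of how the import script could read these `;M` lines back, assuming the layout shown (directive, account-id, account name, separated by whitespace). Account ids are kept as strings so leading zeros survive, and duplicate ids are rejected. `parse_account_map` is a hypothetical name.

```python
def parse_account_map(lines):
    """Map OFX account-id -> Ledger account name from ';M' directive lines."""
    mapping = {}
    for line in lines:
        if not line.startswith(";M "):
            continue  # Skip the header comment and any other lines.
        account_id, account = line[3:].split(None, 1)
        if account_id in mapping:
            raise ValueError(f"duplicate account-id: {account_id}")
        mapping[account_id] = account.strip()
    return mapping
```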

"Check" feature

Here is a cool idea: my statements sometimes give precise amounts that each account should be at; for example, the statement will give me the "Closing Balance" amount.

Right now, I enter all the transactions and "visually" make sure that those amounts match the ones from my statements (usually there is a warm fuzzy feeling that ensues)... but it would be super nice if I could just insert a check in the input file, something like this:

@check 2008/01/31  Assets:Investments:RBC-Broker:Account-CAD   121.41 CAD

This would instruct Ledger to assert amounts at those precise dates. I'm going to add this to my Python version.
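A minimal sketch of what the Python version could do, assuming each check names one account and one single-commodity amount, and (my assumption) counts every posting to that exact account dated on or before the check date. Other commodities held in the account are ignored. `run_checks` and the tuple layouts are hypothetical.

```python
from decimal import Decimal


def run_checks(postings, checks):
    """Return failed checks as a list of (check, actual_balance) pairs.

    postings: list of (date, account, number, commodity)
    checks:   list of (date, account, expected, commodity)
    Only the check's own commodity is summed; postings dated on or
    before the check date are included (an assumed convention).
    """
    failures = []
    for check in checks:
        check_date, account, expected, commodity = check
        actual = sum(
            (number for d, acct, number, comm in postings
             if acct == account and comm == commodity and d <= check_date),
            Decimal(0),
        )
        if actual != expected:
            failures.append((check, actual))
    return failures
```

Failures come back as data rather than printed, so that a check failure can be reported with the same severity as an imbalanced transaction.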

johnw:

This is a cool feature, but the only reason I wouldn't do this in the data format is that I don't compute running totals for each account (expensive) while parsing the data file.

However, one could very easily write a script to validate amounts after certain dates. It would just have to invoke Ledger to do the calculations for each account you care about, and then verify the amounts returned at certain points in time.

Threads:

On Wed, 9 Apr 2008 00:15:30 -0400, "John Wiegley" <johnw@newartisans.com> said:
> Thought you'd like to know that I implemented @cleared today, and it
> works great. I just need to think through what will happen for

Awesome! You're a machine :-)

> multiple @cleared tags applied to the same account (but for different
> dates), and how best to output the informational warnings.

A check occurs at a specific date anyway, so checks for the same account at different dates are simply separate checks, each verifying the balance at its own date.

IMO there should be no output unless there is an error. An error in a check should be of equivalent magnitude as for an imbalanced transaction.

Also, I think the word "cleared" may not be the most appropriate... there is already a concept of "cleared" transactions/postings in Ledger, which has a different meaning than "it balances". "Cleared" is supposed to reflect the user's confidence that a transaction is accurate. The directive above is definitely not about that, so I would instead suggest "@check" or something else. Overloading the meaning of "cleared" will create a lot of confusion IMO.

> I realized that @cleared needs to accept balances (your "wallets")
> instead of just amounts, which I solved by allowing value expressions:
>
>   @cleared DATE ACCOUNT $100.00 + 200 CAD
>
> Just chain all the totals you expect to make up the account balance
> using addition operators.

Actually, I had an idea for this one: it would take a single-commodity amount (consistent with the other entries), and it would check the wallet for just that commodity. If you want multiple checks, you would input multiple lines. If you want 0, you would have a line for 0. If you don't have a line, there is no check. I prefer that to the expressions (I don't have expressions anywhere in my Ledger, and I would like the format to let you use the system even without support for expressions; I think the simplistic Python loader I will write would not support expressions for a little while).

@cleared DATE ACCOUNT $100.00
@cleared DATE ACCOUNT 200 CAD

In the version you propose above, if there is a non-zero EUR balance in that account, does your check fail? If so, how do you check for just $ and CAD?

What do you think?

Marking for reconciliation

I would like to be able to flag an account as having been reconciled up to a specific date. I would like for this to be recorded somehow, and to be able to query the system to find out what work I have to do to bring the system up-to-date. This is a very important part of the process.

Note that this is only valid for certain accounts, i.e., "real" accounts.

One problem I have is that it is not easy to figure out at a glance which accounts need updating. I would like a directive that says "this account is up-to-date as of DATE". For this purpose, we reuse the @check directive mentioned above. You can do this:

... some activity

@check 2008/03/01  Assets:Current:RBC:Checking-US          1.44 USD

@check 2008/04/01  Assets:Current:RBC:Checking-US          1.44 USD

@check 2008/05/01  Assets:Current:RBC:Checking-US          1.44 USD

... some activity

@check 2008/06/01  Assets:Current:RBC:Checking-US          343.23 USD

Only accounts for which such a directive has been seen should be included in the list of valid date ranges. Note that only the minimum and maximum values matter. Using a command to generate the ranges, it's really easy to see that you forgot to update a specific account (I have so many accounts now, I get very confused; I like the computer to help out.)
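A sketch of generating those ranges from the @check directives: keep the earliest and latest check date per account, then list the accounts whose latest check is older than some threshold. The 31-day threshold and the function names are hypothetical.

```python
def reconciled_ranges(checks):
    """Map account -> (first check date, last check date).

    checks: iterable of tuples starting with (date, account, ...).
    """
    ranges = {}
    for check_date, account, *_ in checks:
        lo, hi = ranges.get(account, (check_date, check_date))
        ranges[account] = (min(lo, check_date), max(hi, check_date))
    return ranges


def stale_accounts(checks, as_of, max_age_days=31):
    """Accounts whose most recent check is older than max_age_days."""
    return sorted(
        account
        for account, (_, last) in reconciled_ranges(checks).items()
        if (as_of - last).days > max_age_days
    )
```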

One-over syntax

Q: I'd like to be able to express exchange rates with the inverse rate, and for the computer to automatically take care of it, e.g.:

2006/02/01 * Change dollars into euros and place in safe.
  Assets:Cash                       -100.00 USD @ 1/1.5823 EUR
  Assets:Safe

It would be great if you could specify your prices with 1/RATE:

Assets:Investments:RBC-Broker:Account-CAD 121.41 CAD @ 1/@ 0.96760 USD

You can do this with an expression.
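The Python loader won't have expressions for a while, so here is a minimal sketch of special-casing the 1/RATE form instead, assuming Python's decimal module and rounding to two places (both my choices). `parse_rate` and `convert` are hypothetical names.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def parse_rate(text):
    """Parse a price that is either 'RATE' or the inverse form '1/RATE'."""
    text = text.strip()
    if text.startswith("1/"):
        return Decimal(1) / Decimal(text[2:])
    return Decimal(text)


def convert(number, rate_text, places="0.01"):
    """Convert an amount at the given (possibly inverted) rate, then round."""
    return (number * parse_rate(rate_text)).quantize(
        Decimal(places), rounding=ROUND_HALF_EVEN)
```

For the cash example above, `convert(Decimal("-100.00"), "1/1.5823")` gives -63.20 (EUR).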

Support for Merging

We should be able to associate an id with an OFX file, so that we can detect previously entered entries.
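OFX transactions carry a FITID (a financial-institution transaction id), which is a natural candidate for that id. A minimal sketch, assuming the importer records each (account-id, FITID) pair it has already written and skips repeats; the dict layout of an imported transaction and the function name are hypothetical.

```python
def new_transactions(imported, seen_ids):
    """Return imported transactions whose (account-id, FITID) is new.

    imported: iterable of dicts with 'acctid' and 'fitid' keys (assumed).
    seen_ids: set of (acctid, fitid) pairs already in the ledger;
              updated in place so duplicates within a batch are dropped too.
    """
    fresh = []
    for txn in imported:
        key = (txn["acctid"], txn["fitid"])
        if key not in seen_ids:
            seen_ids.add(key)
            fresh.append(txn)
    return fresh
```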

Pages

Another way to associate a custom tag/field with entries is by virtue of their organisation in the file. We could tag a sequence of consecutive entries in a block, like this:

@page_begin Vacations

...

@page_end

This gives us yet another dimension of tagging of transactions:

  1. The account in which a transaction belongs
  2. The page in which a transaction was declared.
  3. The "notes" at the end of postings
  4. The description of the transaction
  5. The file in which a transaction was defined.

These are all fields that can be used for selecting a subset of transactions. Some of these fields may allow us to simplify our accounts hierarchy to some extent.
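A minimal sketch of the page idea, assuming pages do not nest and that an entry starts on any line beginning with a digit (its date). `assign_pages` is a hypothetical name; a real parser would attach the page to the entry object instead of returning pairs.

```python
def assign_pages(lines):
    """Return (entry first line, page name or None) for each entry."""
    page = None
    tagged = []
    for lineno, line in enumerate(lines, 1):
        if line.startswith("@page_begin"):
            if page is not None:
                raise ValueError(f"line {lineno}: nested @page_begin")
            page = line[len("@page_begin"):].strip()
        elif line.startswith("@page_end"):
            if page is None:
                raise ValueError(f"line {lineno}: @page_end without @page_begin")
            page = None
        elif line[:1].isdigit():
            # A line starting with a digit begins a new entry.
            tagged.append((line, page))
    if page is not None:
        raise ValueError("unterminated @page_begin")
    return tagged
```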

Report generation

Mini-language / Aspects

from DATE / to DATE : specify the date range to filter transactions by

maxlevel NO : level to render down to in the tree of accounts

cumul : whether the amounts shown are cumulative (each parent includes its children's amounts) or show just the balance held within each account.

basis : whether we convert the value of all assets contained within an account to a basis currency, using the prices at the given dates of the transactions.
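A sketch of the cumul and maxlevel aspects, assuming a single commodity and per-account balances that have already been filtered by the from/to date range; basis conversion is left out. `tree_balances` is a hypothetical name.

```python
from collections import defaultdict
from decimal import Decimal


def tree_balances(balances, cumul=True, maxlevel=None):
    """Roll per-account balances up the account tree.

    balances: dict account -> Decimal (single commodity, for simplicity).
    cumul:    if True, each node includes its descendants' amounts.
    maxlevel: if set, drop nodes with more than this many components.
    """
    out = defaultdict(Decimal)
    for account, amount in balances.items():
        parts = account.split(":")
        if cumul:
            # Credit the account itself and every one of its ancestors.
            for depth in range(1, len(parts) + 1):
                out[":".join(parts[:depth])] += amount
        else:
            out[account] += amount
    if maxlevel is not None:
        out = {a: v for a, v in out.items() if a.count(":") < maxlevel}
    return dict(out)
```

With Cash at -8410.85 and RBC:Checking at 18439.54 under Assets:Current, `maxlevel=2` leaves just Assets and Assets:Current, both at 10028.69.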

Experiences Using Other Softwares

GnuCash Problems

Note on my attempts at using GnuCash. Verdict: not ready for prime time.

  • There is no undo!!!!!!!!! This is unreal!! I have to send SIGKILL, restart, and redo some of my work (hoping that auto-save didn't save between the time I realized my mistake and the time I killed it).
  • The system does not enforce matching underlying currencies between account transactions... you can easily screw up by having a journal entry with CAD on one side and USD on the other.
  • The general ledger doesn't seem to work properly: there are several entries that are missing (isn't this thing supposed to show ALL transactions?).
  • The GUI is broken (in a very Gnome-ish way): each time you open a new account, you have to fiddle with the headers to be able to see the entire account.
  • When you finish editing an entry, the GUI jumps around, it is very disorienting. I've seen it drop transactions at least once.
  • Importing does not take into account the GUIDs that are supposed to prevent duplicate transactions.
  • I can't figure out how to match two transactions so that I don't have to 1) fix one of them, and then 2) delete the other.

FiveDash Problems

[2008-04-02]

I could not get it to work. The main page displays this:

Error
'NoneType' object has no attribute 'rollback'
Continue

Hacking the code, this is what I now get:

Error
permission denied for schema system
Continue

Ledger

Questions Answered

Actual dates vs Effective Dates

In "2.7 File format", the part about per-unit transaction costs is impossible to understand:

The `ACCOUNT' may be surrounded by parentheses if it is a virtual
transactions, or square brackets if it is a virtual transactions
that must balance.  The `AMOUNT' can be followed by a per-unit
transaction cost, by specifying ` AMOUNT', or a complete
transaction cost with `@ AMOUNT'.  Lastly, the `NOTE' may specify
an actual and/or effective date for the transaction by using the
syntax `[ACTUAL_DATE]' or `[=EFFECTIVE_DATE]' or
`[ACTUAL_DATE=EFFECtIVE_DATE]'.

What is an "actual" date? What is an "effective" date? How are they being used?

(There is little or no mention of these concepts in the rest of the documentation.)

Also, what does =EDATE refer to? There is no mention of it on the page:

A line beginning with a number denotes an entry.  It may be
followed by any number of lines, each beginning with whitespace,
to denote the entry's account transactions.  The format of the
first line is:

     DATE[=EDATE] [*|!] [(CODE)] DESC

Q: IMPORTANT: what is the role of the optional [date] and [=date] dates?

How are they handled?

  • trade vs. settlement?
  • what is ACTUAL DATE and EFFECTIVE DATE for transactions and postings?

I assume it has something to do with effective dates, but what are they, actual vs. effective dates?

A: I'll never use the effective date; it's only for budgeting, just an alternative date field.

(16:45:55) blais: I don't understand the dates.
(16:45:59) blais: let me explain
(16:48:03) blais:           DATE[=EDATE] [*|!] [(CODE)] DESC
(16:48:10) blais: (for an entry)
(16:48:13) blais: what is EDATE for?
(16:48:16) blais: how is it treated?
(16:48:18) johnw: the effective date
(16:48:25) blais: also, for transactions (within an entry):
(16:48:32) blais: Lastly, the `NOTE' may specify
(16:48:32) blais:      an actual and/or effective date for the transaction by using the
(16:48:32) blais:      syntax `[ACTUAL_DATE]' or `[=EFFECTIVE_DATE]' or
(16:48:32) blais:      `[ACTUAL_DATE=EFFECtIVE_DATE]'.
(16:48:36) johnw: if you use the option --effective, it will report in terms of EDATE instead of DATE
(16:48:41) blais: there are two things: ACTUAL DATE and EFFECTIVE DATE
(16:48:46) blais: what's the point of this?
(16:48:54) johnw: yes, actual date is the date you actually did it -- what your bank will see
(16:48:59) blais: The manual doesn't explain that, I think it may be worth a couple of paragraphs.
(16:49:00) johnw: the effective date is the date you did it "for"
(16:49:16) johnw: for example, if you pay June's rent on May 29, you would set the effective date to 6/1, but the actual date to 5/29
(16:49:16) blais: What's a use case?
(16:49:34) johnw: this is needed for budgeting to make sense compared to reality sometimes
(16:49:34) blais: also, why can a transaction have an actual date different than its parent entry?
(16:49:47) blais: oh wait, the last one can make sense actually.
(16:49:58) blais: for two sides where one "lags" on the other.
(16:50:02) johnw: exactly
(16:50:08) blais: but I can't see how I would use the effective date.
(16:50:16) johnw: if not, just don't use it
(16:50:25) johnw: but when you need it, it's there :)
(16:50:48) blais: sure.
(16:50:54) blais: You only treat it as an alternative date field?
(16:51:03) blais: all other processing is the same, mutatis mutandis?
(16:51:05) johnw: it only comes into play when --effective is used
(16:51:14) johnw: otherwise, it is read/stored but ignored
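A minimal sketch of that behaviour: parse DATE[=EDATE] and pick which date to report with, the way --effective is described above. The function names are hypothetical, and accepting both '/' and '-' separators is my own choice.

```python
from datetime import date


def parse_entry_dates(token):
    """Parse 'DATE' or 'DATE=EDATE' into (actual, effective or None)."""
    def to_date(text):
        year, month, day = (int(x) for x in text.replace("-", "/").split("/"))
        return date(year, month, day)

    actual, _, effective = token.partition("=")
    return to_date(actual), (to_date(effective) if effective else None)


def reporting_date(actual, effective, use_effective=False):
    """Mimic --effective: use EDATE when requested and present."""
    if use_effective and effective is not None:
        return effective
    return actual
```

For the rent example, "2008/05/29=2008/06/01" reports under 5/29 by default and under 6/1 with the effective switch.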

Actual dates and "money in transit"

(16:51:46) blais: now let's talk about the actual date.
(16:52:06) blais: If you set an actual date for a transaction != than for its parent, don't you run the risk that the books be imbalanced for a short period of time?
(16:52:33) johnw: imbalance?
(16:52:46) blais: were you to render the book between those dates, you would have a book that is not coherent.
(16:52:58) johnw: reporting is only done in terms of transactions, mind you
(16:53:01) blais: (I thought this was the whole point of this double-entry mechanism.)
(16:53:07) blais: ah, right.
(16:53:07) johnw: so take an entry E with two transactions X and Y
(16:53:16) johnw: normally X and Y both take on E's date, that's a convenience
(16:53:23) johnw: but you can specify the dates for X and Y manually if you prefer
(16:53:26) blais: so, a different actual date for a transaction would only affect the register view?
(16:53:35) johnw: yes
(16:53:41) blais: NOT the balance view.
(16:53:42) blais: right?
(16:53:53) johnw: it can affect the balance view if you use -l to filter the component transactions by date
(16:53:58) blais: actually, I've got some real world cases like that already in my file.
(16:54:08) blais: oh
(16:54:10) blais: then I'm right
(16:54:17) blais: the book *may* be incoherent.
(16:54:26) johnw: oh, you mean it doesn't balance
(16:54:30) johnw: yes, you are right
(16:54:49) blais: so if I'm using --end, how do I know if I have a coherent book?
(16:54:56) blais: (you don't, I guess)
(16:55:00) johnw: there are times when the money might be "in transit" and Ledger does not presently have a way to report that to you

(16:58:05) blais: About actual dates: IMHO we should never show a balance that uses "actual dates". Balance views only make sense when both sides of every included entry are present.
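To make the "money in transit" case concrete, a sketch of how a loader could flag the entries that a posting-date cutoff would split in two. The entry representation and the function name are hypothetical; each posting carries its own actual date.

```python
def in_transit(entries, end):
    """Return descriptions of entries split in two by an end date.

    entries: list of (description, [(posting_date, account, amount), ...]).
    An entry is "in transit" at `end` when some, but not all, of its
    postings are dated on or before `end`; a balance built from posting
    dates would then not be coherent.
    """
    cut = []
    for description, postings in entries:
        included = [d <= end for d, _, _ in postings]
        if any(included) and not all(included):
            cut.append(description)
    return cut
```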

Default account feature

The default account directive does not appear to be documented:

; Default account for withdraw is cash account.
A Assets:Current:Cash

johnw:

This feature is not documented yet (2.6).

Per-Transaction Pending state

In the Emacs support file, there seems to be a hint that individual "transactions" (within entries) could be marked as pending. In fact, when I try this in a file, it doesn't barf:

2007/12/31 * Start of year.
  ! Assets:Current:RBC:Checking           168.82 CAD
  * Equity:Opening-Balances

I don't see a need for marking individual sides of a transaction as pending... is this a remnant from the past?

(17:05:40) blais: In the Emacs support file, there seems to be an hint that individual
(17:05:40) blais: "transactions" (within entries) could be marked as pending. In fact,
(17:05:40) blais: when I try this in a file, it doesn't barf::
(17:05:40) blais:   2007/12/31 * Start of year.
(17:05:40) blais:     Assets:Current:RBC:Checking           168.82 CAD
(17:05:40) blais:     Equity:Opening-Balances
(17:05:40) blais: I don't see a need for marking individual sides of a transaction as
(17:05:40) blais: pending... is this a remnant from the past?
(17:06:08) johnw: no, that's part of the present
(17:06:09) blais: (sorry, the example is missing the !
(17:06:14) johnw: individual transaction marking was new in 2.5
(17:06:20) johnw: I use it all of the time
(17:06:21) blais: right before the account names, ! or *
(17:06:25) blais: I see.
(17:06:36) blais: Undocumented I believe.
(17:06:38) blais:

Q: Periods don't work

Questions for John:

(18:08:59) blais: banane:~$ ledger-blais -E -s -b 2008/01/01 -e 2008/02/01 bal
(18:09:03) blais: this gives me 0
(18:09:10) blais: i have a hard time getting -b and -e to work.
(18:09:12) blais: what is the format?

all the period options don't seem to work:

This is a bug; use DEBUG_CLASS=ledger.config.predicates and compile
with --enable-debug.

Using the DEBUG_CLASS:

// The result has to be an iostream.
DEBUG_PRINT("class", foo << bar << baz);
DEBUG_PRINT_(foo << bar << baz);
DEBUG_CLASS("class");

assert()
CONFIRM() // More expensive than assert(), slower.
VERIFY() // Much slower.

How do you reset the "default account" feature?

You can't, right now, but ACK that this is a problem.

(17:07:37) blais: about the default account feature
(17:07:41) blais: A <account>
(17:08:00) blais: I need to be able to "reset" it to "no account", within the file.
(17:08:02) blais: How do I do that?
(17:08:24) blais: It's a cool feature, but the side-effect is that if you screw up and forget a transaction, it just fills it in.
(17:08:30) blais: I need to be able to reset.
(17:08:42) johnw: you can specify a "block"
(17:08:59) blais: how?
(17:09:11) johnw: one sec
(17:09:24) johnw: oops, I guess you can't
(17:09:28) johnw: there is no way to reset it
(17:09:31) blais: oopsy.
(17:09:35) johnw: you'd have to use a separate file
(17:09:36) blais: I can't use it then... too dangerous.
(17:09:43) blais: This one is way important IMO.
(17:10:04) johnw: i think i'd rather remove it
(17:10:31) johnw: it's dangerous for entries to self-balance like that

Currency and prices

Q: How do currencies and prices work? What exactly does the @ <PRICE>
syntax do?
(17:23:43) blais: Q: How does currency and prices work?  @ <PRICE> syntax, what does it
(17:23:43) blais:    do exactly?
(17:23:57) johnw: what's the context?
(17:23:59) blais: I'm not entirely sure what happens (technically) when you specify a price.
(17:24:06) johnw: lots of stuff happens, actually
(17:24:10) johnw: let me enumerate
(17:24:13) blais: it's used for balancing, and sometimes it affects the choice of the empty entry's commodity
(17:24:18) blais: please!
(17:24:35) johnw: 1. It records a historical price for that commodity at the date of the entry (or transaction, if it has its own actual date)
(17:24:53) johnw: 2. It creates a lot-qualified (i.e., annotated) commodity with that price data as its qualifying details
(17:25:15) johnw: 3. It sets the basis cost of the transaction for the purpose of balancing against the remainder of the entry
(17:25:19) johnw: end
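A sketch of those three effects for a priced posting such as "50 AAPL @ 101.52 USD". The price database keyed by (commodity, date) and the tuple used for the lot-qualified commodity are my own hypothetical representations, not Ledger's internals.

```python
def apply_price(entry_date, number, commodity, price, price_commodity, price_db):
    """Sketch of the effects of 'NUMBER COMMODITY @ PRICE PRICE_COMMODITY'.

    Returns (lot, cost) and records the price in price_db.
    """
    # 1. Record a historical price for the commodity at the entry date.
    price_db[(commodity, entry_date)] = (price, price_commodity)
    # 2. Qualify the commodity with its price, giving a distinct lot.
    lot = (commodity, price, price_commodity)
    # 3. Compute the cost used to balance against the rest of the entry.
    cost = (number * price, price_commodity)
    return lot, cost
```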

Slash date format

  • Your date format is not ISO, why not use "-" instead of "/"? In any case, the parser could easily be modified to be able to support both ways. Just curious.
Q: Is there any reason you chose to use the '/' format for dates? This is rather unusual... ISO-8601 specifies '-', and '/' is normally used for the American format (mm/dd/yyyy). I've never seen it used like in Ledger. What's the story?

(17:35:34) johnw: you can use -
(17:35:36) johnw: i'm just used to /

Real vs virtual transactions

Q: How are real vs. virtual transactions handled?

(17:36:30) johnw: --real strips virtuals, otherwise they are the same as real
(17:36:33) blais: and is there anything different btw () virtual and [] virtual?
(17:36:37) johnw: it's just a single bit on the transaction, that's the only difference
(17:36:39) blais: just a filter.
(17:36:41) blais: that's it?
(17:36:45) johnw: [] must balance within the entry, () does not need to balance
(17:36:47) johnw: yes, it's just a filter
(17:36:49) blais: cool.
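A minimal sketch of that distinction as I understand it from the exchange above: parentheses mark a virtual posting that need not balance, square brackets a virtual posting that must, and --real simply filters virtual postings out. The function names are hypothetical, and how bracketed postings are balanced against the real ones is left out.

```python
def classify_account(text):
    """Return (account, is_virtual, must_balance) for a posting's account."""
    text = text.strip()
    if text.startswith("(") and text.endswith(")"):
        return text[1:-1], True, False  # Virtual, need not balance.
    if text.startswith("[") and text.endswith("]"):
        return text[1:-1], True, True   # Virtual, must balance.
    return text, False, True            # Real posting.


def real_only(accounts):
    """Rough equivalent of --real: drop every virtual posting."""
    return [a for a in accounts if not classify_account(a)[1]]
```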

Coherent Ordering

When we parse the file, we should keep some sort of ordering integer so that the order that appears in the file is the same order that appears when we list the register. This is just for coherence, while comparing statements.

2.6 does this, apparently.
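A sketch of the ordering idea: remember each entry's position in the file and use it to break ties between entries with the same date. Python's `sorted` is already stable, so the explicit index mainly matters once entries live in a database. `register_order` is a hypothetical name.

```python
def register_order(entries):
    """Sort entries by date, breaking ties by their position in the file.

    entries: list of (date, description) in file order.
    """
    indexed = [(d, i, desc) for i, (d, desc) in enumerate(entries)]
    return [desc for _, _, desc in sorted(indexed)]
```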

Syntax for comments vs. notes

The syntax for comments should be distinct from the syntax for notes; using ; for both is misleading.

Placing many entries together

Adding these entries separately works fine:

2007/12/31 * Cost basis.
  Assets:Investments:RBC-Broker:Account-CAD     8.00 CRA   ; lot:ba8c951719fd
  Equity:Opening-Balances:Cost               1395.43 CAD

2007/12/31 * Cost basis.
  Assets:Investments:RBC-Broker:Account-USD     70.00 GLD   ; lot:25dac39a5583
  Equity:Opening-Balances:Cost                5152.25 USD

but combining them into a single entry doesn't:

2007/12/31 * Cost basis.
  Assets:Investments:RBC-Broker:Account-CAD     8.00 CRA   ; lot:ba8c951719fd
  Equity:Opening-Balances:Cost               1395.43 CAD
  Assets:Investments:RBC-Broker:Account-USD     70.00 GLD   ; lot:25dac39a5583
  Equity:Opening-Balances:Cost                5152.25 USD

Are you figuring out the price automatically from the trades?

As much as possible, but in this case it's impossible.
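One plausible rule that would explain this (an assumption on my part, not a description of Ledger's actual algorithm): sum the postings per commodity, and infer an implicit price only when exactly two commodities are left unbalanced. Each separate entry above has two commodities; the combined one has four, so there is no unique price. `infer_price` is a hypothetical name.

```python
from decimal import Decimal


def infer_price(postings):
    """Infer an implicit price for an entry, or None if impossible.

    postings: list of (number, commodity) with no explicit prices.
    Returns (commodity, price, price_commodity) such that
    number_1 * price + number_2 == 0 when exactly two commodities
    are unbalanced.
    """
    totals = {}
    for number, commodity in postings:
        totals[commodity] = totals.get(commodity, Decimal(0)) + number
    unbalanced = [(c, n) for c, n in totals.items() if n != 0]
    if len(unbalanced) != 2:
        return None  # Zero, or too many, unknowns.
    (c1, n1), (c2, n2) = unbalanced
    return c1, -n2 / n1, c2
```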

Q: Generating reports with debit/credit columns

  • Isn't there a way to generate a report with the signs inverted (in two columns), in the usual way accountants like to see it? I understand that we're trying to abstract away the whole debit account/credit account confusion, but for reporting purposes, it would make sense to show it in the "usual" manner.

    (17:48:14) johnw: you can invert with "-t -a"
    (17:49:52) johnw: "-t /account/ ? -a : a"
    (17:50:06) johnw: surround everything after -t with single quotes
    (17:50:30) johnw: for a truly custom report, you'll have to export as XML and then read it in; or finish your Python port :)
    (17:50:50) johnw: ok, my wife and I are heading home, and will probably watch a TV show, I'll see you on IRC when I have time again!
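For the Python port, a minimal sketch of a two-column debit/credit report, assuming the pure-sign convention from the notes (positive amounts are debits, negative ones credits) and the De/Cr kinds from the account declarations, used here only to order the rows. `debit_credit_rows` is a hypothetical name.

```python
from decimal import Decimal


def debit_credit_rows(balances, kinds):
    """Split signed balances into (account, debit, credit) rows.

    balances: account -> Decimal with pure signs (positive = debit).
    kinds:    account -> 'De' or 'Cr'; debit accounts are listed first.
    """
    rows = []
    for account in sorted(balances, key=lambda a: (kinds.get(a) != "De", a)):
        amount = balances[account]
        if amount >= 0:
            rows.append((account, amount, None))
        else:
            rows.append((account, None, -amount))
    return rows
```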
    

About reports

  • I want to use columns for different currencies. The current display is very difficult to read.

  • Why do the nodes collapse like this? It seems to me that the intermediate nodes should be shown in this case instead:

    10028.69 CAD  Assets:Current
    -8410.85 CAD    Cash
    18439.54 CAD    RBC:Checking
    -6971.44 CAD  Equity:Opening-Balances
     1750.73 CAD
    

About Emacs Support

  • The comment-syntax should be set appropriately.
  • The highlight mode does not work on posting lines which do not have a price. They should get highlighted, and possibly of a different color.
  • I want to setup error parsing so that emacs can jump to errors in the file when I run ledger from within Emacs.
  • I will build a function to align an entry's contents so it easily looks nice.
  • I want to be able to have emacs sort the entries by date for me. I would select a region, and then invoke the function. All the entries should just sort (I know, it's not really necessary).
  • Comments should appear grayed out! An unambiguous comment syntax would solve this problem.
  • ledger-align-amounts needs to be improved so that it automatically figures out what the widest element is. When a region is not selected, it should automatically select the current paragraph.
  • You need to write a little function to pipe the paragraph under point into ledger and visualize the output in order to see what Ledger is doing with inference.

Phone conversation

@word !word

@include @account @alias @def : lets you create a new function. @end

The flag could be anything.

Think about changing the syntax.

numbers in symbols.

Writing a Python output: look at emacs.c for an example of creating one.

transaction: is a posting
entry: contains transactions
journal: file

Put up the file on the wiki

Getting pricing data: ledger calls out to getquote().

ftp://ftp.newartisans.com/pub/python/catalog.py

auto-reconciler: goal number, look for uncleared transactions which when summed together will give you your goal number.
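A brute-force sketch of the auto-reconciler: try subsets of uncleared postings, smallest first, until one sums exactly to the goal number. This is the subset-sum problem, so the search is capped at `max_size` postings (an arbitrary limit of mine); the names are hypothetical.

```python
from decimal import Decimal
from itertools import combinations


def find_clearing_set(uncleared, goal, max_size=4):
    """Find uncleared postings whose amounts sum exactly to `goal`.

    uncleared: list of (posting_id, Decimal amount).
    Returns the ids of the first matching subset, or None.
    """
    for size in range(1, max_size + 1):
        for combo in combinations(uncleared, size):
            if sum((amount for _, amount in combo), Decimal(0)) == goal:
                return [pid for pid, _ in combo]
    return None
```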

Normalize.lisp has all the comments for the balancing algorithm.

There is an abstraction for a wallet of amounts.
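The note only names the abstraction. One plausible shape for it, assumed here rather than taken from Ledger, is a mapping from commodity to quantity. Amounts in different commodities are carried side by side and never added together. `Wallet` and its methods are hypothetical.

```python
from collections import defaultdict
from decimal import Decimal

class Wallet:
    """A bag of amounts keyed by commodity; different commodities never mix."""

    def __init__(self):
        self._amounts = defaultdict(Decimal)

    def add(self, number, commodity):
        self._amounts[commodity] += Decimal(number)
        if self._amounts[commodity] == 0:
            del self._amounts[commodity]  # keep only non-zero positions
        return self  # allow chaining

    def is_zero(self):
        """True when every commodity nets out, e.g. for a balanced transaction."""
        return not self._amounts

    def items(self):
        return sorted(self._amounts.items())

if __name__ == "__main__":
    w = Wallet().add("9.95", "USD").add("-4", "RHT").add("72.06", "CAD")
    print(w.items())
```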

What does -B do? -B is just a reporting thing.

-V just converts into the unit of the price; otherwise, it keeps the
original units.

Ledger does not seem to be able to accept multiple -f options (further -f's get ignored silently). It should simply concatenate all the specified files into a single data set.
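Python's standard `argparse` module can accept repeated `-f` options with `action="append"`. The following sketch assumes the data files are plain text that can simply be joined. `parse_args` and `load_all` are hypothetical names.

```python
import argparse

def parse_args(argv=None):
    parser = argparse.ArgumentParser(prog="ledger")
    # action="append" collects every -f occurrence instead of keeping only the last.
    parser.add_argument("-f", "--file", action="append", dest="files", metavar="FILE")
    args = parser.parse_args(argv)
    args.files = args.files or []  # None when no -f was given
    return args

def load_all(paths):
    """Concatenate all the specified files into a single data set (plain text here)."""
    chunks = []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            chunks.append(f.read())
    return "\n".join(chunks)
```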

I wish I could just say:

2008/01/03=2007/12/28 * Sell -- RHT -- RED HAT INC CA TAUX DE CHANGE    .96590
  Assets:Investments:RBC-Broker:Account-RSP                              -4.00 RHT @      21.14 CAD
  Expenses:Financial:Commissions                                          9.95 USD @     .96590 CAD
  Assets:Investments:RBC-Broker:Account-RSP                              72.06 CAD
  Expenses:Financial:Fees                                                      CAD

... to tell Ledger which currency to use to complete the entry.
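Here is a sketch of how the trailing `CAD` could be honoured. Every other posting is converted into the requested currency using its `@` price, and the empty posting receives the negated sum. Postings are assumed to be `(number, commodity, price)` tuples, with the price expressed in the target currency. `complete_posting` is a hypothetical helper. On the example above it yields 2.889295 CAD for `Expenses:Financial:Fees`, which rounds to 2.89 CAD.

```python
from decimal import Decimal

def complete_posting(postings, currency):
    """Return the amount in `currency` that makes the postings balance.

    Each posting is (number, commodity, price): price is the per-unit cost in
    `currency`, or None when the posting is already in `currency`.
    """
    total = Decimal(0)
    for number, commodity, price in postings:
        if price is not None:
            total += Decimal(number) * Decimal(price)
        elif commodity == currency:
            total += Decimal(number)
        else:
            raise ValueError(f"no price to convert {commodity} into {currency}")
    return -total

if __name__ == "__main__":
    postings = [
        ("-4.00", "RHT", "21.14"),   # Assets:Investments:RBC-Broker:Account-RSP
        ("9.95", "USD", ".96590"),   # Expenses:Financial:Commissions
        ("72.06", "CAD", None),      # Assets:Investments:RBC-Broker:Account-RSP
    ]
    print(complete_posting(postings, "CAD"))
```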

For all rounding, there should be a way to tell Ledger which precision to use (for each commodity).
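One way to express per-commodity precision is a table of `Decimal` quanta used with `Decimal.quantize`. The table contents, the half-up rounding rule and the `round_amount` name are assumptions for illustration. In practice the table could be filled from a directive in the ledger file.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical per-commodity precision table (JPY has no minor unit).
PRECISION = {
    "CAD": Decimal("0.01"),
    "USD": Decimal("0.01"),
    "JPY": Decimal("1"),
}
DEFAULT_QUANTUM = Decimal("0.01")

def round_amount(number, commodity, precision=PRECISION):
    """Round a number to the precision declared for its commodity."""
    quantum = precision.get(commodity, DEFAULT_QUANTUM)
    return Decimal(number).quantize(quantum, rounding=ROUND_HALF_UP)
```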

Implementation Suggestions

From: Martin Blais <blais@furius.ca>
To: "John Wiegley" <johnw@newartisans.com>
Cc: "Filippo Tampieri" <filippo.tampieri@gmail.com>
Subject: Re: Haskell parser for Ledger
> > I'm writing a simple parser in Python today. I want to see
> > how fast it is (I mean, most of the parsing will be regexps
> > anyway, which is fast-as-C, so I can't imagine it'll be much
> > worse than the C code, at least not on my files... I'll ask
> > someone on the forums to try to run it on a very large file,
> > to see how it holds up).
>
> I think a friend of mine wrote a Haskell parser based on regexps. I
> also have a complete parser in Common Lisp, if you're interested in
> reading that instead of the C++ one.

Can you point out the location of that code in CL? (I have a
checkout of all newartisans Mercurial projects).


> For certain usage scenarios (such as using Ledger as a backend library
> for a full application), I entirely agree with you. My model suits
> one class of users: command-line users. All choices have been made in
> their favor (even if the set of possibilities was seriously limited,
> like reporting options).
>
> But I'm all for extending the model based on a common data format. I
> mean, the really important part is that I have a complete, immutable
> data set going back 6 years now; the reporting and querying logic is
> secondary to that -- albeit where all the exciting stuff happens.

Your comment rings so true. I am 100% aligned with your way
of thinking. The "little text file" way is for me a way of
life, a way of dealing with information which, for people
who are able to master the art of editing text files, is
really, really powerful.

However, you seem to imply that this is incompatible with
using an SQL backend, and on that point only I would beg to
differ. Apart from the hassle of having to set up the
database server (a task which is by now very easy on all
three platforms), the mode of working is still very much a
cmdline oriented process. (Note that if/when sqlite obtains
a nice fixed-point decimal storage class we can then always
assume an SQL backend, because setting up sqlite is a
trivial matter.)
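Until then, a common workaround is to keep exact amounts as text. The following sketch uses the adapter and converter hooks of the standard `sqlite3` module, so `Decimal` values round-trip exactly. The table layout is hypothetical. The declared type `decimal_text` was chosen because it contains "TEXT", which gives the column TEXT affinity. A type named just `decimal` would get NUMERIC affinity, under which SQLite converts numeric-looking text such as `-35.00` into an INTEGER or REAL value.

```python
import sqlite3
from decimal import Decimal

# Decimals are stored as their exact string form...
sqlite3.register_adapter(Decimal, str)
# ...and come back as Decimal for columns declared with type "decimal_text".
sqlite3.register_converter("decimal_text", lambda raw: Decimal(raw.decode("ascii")))

def open_db(path=":memory:"):
    conn = sqlite3.connect(path, detect_types=sqlite3.PARSE_DECLTYPES)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posting ("
        " account TEXT, amount decimal_text, commodity TEXT)")
    return conn

if __name__ == "__main__":
    conn = open_db()
    conn.execute("INSERT INTO posting VALUES (?, ?, ?)",
                 ("Assets:Current:RBC:Checking", Decimal("-35.00"), "CAD"))
    print(conn.execute("SELECT amount FROM posting").fetchone()[0])
```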

In fact, my reporting code will generate abstract table
data, which will then be fed to either a text renderer (for
cmdline access), or to an HTML renderer (for creating web
pages).
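Here is a minimal sketch of that split. It assumes a table is just a header row plus rows of cells. `render_text` and `render_html` are hypothetical names, and the only library call is the standard `html.escape`.

```python
from html import escape

def render_text(header, rows):
    """Render a table as aligned plain text for the command line."""
    table = [[str(cell) for cell in row] for row in [header, *rows]]
    widths = [max(len(row[i]) for row in table) for i in range(len(header))]
    return "\n".join(
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in table)

def render_html(header, rows):
    """Render the same abstract table as an HTML fragment for a web page."""
    head = "".join(f"<th>{escape(str(cell))}</th>" for cell in header)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(cell))}</td>" for cell in row) + "</tr>"
        for row in rows)
    return f"<table><thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>"

if __name__ == "__main__":
    header = ["Account", "Balance"]
    rows = [["Assets:Current:RBC:Checking", "18439.54 CAD"],
            ["Equity:Opening-Balances", "-6971.44 CAD"]]
    print(render_text(header, rows))
    print(render_html(header, rows))
```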


> > 'ledger' has a ton of options for reporting, and I find them
> > difficult to use and understand (oftentimes the options don't do
> > what I expect them to, and their meaning depends on the actually
> > command in use). I'd rather write my own reporting from the data
> > model.
>
> Ah, you are right here. I find them confusing myself, sometimes.
> UNIX command-line options are a poor substitute for a rich reporting
> language.

I've gone down that avenue myself before :-)
It's exciting while your task is fresh in memory, but over
time it becomes hard to remember what meant what...


> > * I want to make all the basic reporting available from my web
> > server as HTML pages (much prettier, and I can access it anywhere,
> > all the time).
>
> I started doing this same thing with the CL port of Ledger. I really
> liked this style of reporting, since it conduces to _reading_ more
> than paging through command-line output. Plus, you can use bold,
> italics and colors to much greater effect, not to mention charting.

... and most of all, nicely aligned tables!

Plus you can have links that you click on to obtain the
register view...

... and pie charts...

... and graphs of PnL windows over time...

... and I'm sure, lots of other things I can't think of now...


> > * If you fiddle with the price syntaxes, you can easily make ledger
> > dump core (I found it to be pretty stable, but there must be a few
> > bugs left to iron out with the price stuff). This might be a problem
> > when I start entering data for the investment accounts.
>
> I would love bug reports!!! :)

I'll take note the next time I crash it in the course of
using it, but I was under the impression that you were
moving on to the CL version.


> > * I think that the author has decided to move on and more-or-less
> > stop working on the C++ version and move to a CL implementation
> > (using SBCL). It's a cool idea, and as you know I love LISP, but
> > deployment is a real PIA, and I want to be able to run this
> > everywhere-all-the-time on all my platforms (Mac UNIX Windoze), with
> > minimal effort. You can't beat Python for that.
>
> The deployment issues finally beat me down. I'm turning back to
> making the C++ version 100% correct and stable, and then I will stop
> working on Ledger altogether.
>
> What you are proposing sounds like "the next chapter", and I would be
> very interested in supporting you in that endeavor.

Awesome! Let me flesh out a first version, and then I'll
set up a Mercurial repository on furius.ca, so you can
participate first-hand if you like.

I have a few questions about prices and such, things I do
not quite understand clearly, that would be best discussed
over voice.


> > As mentioned above, I want to validate that my entries always fit a
> > fixed set of accounts, which I declare near the top of the document.
> > Moreover, those declarations further constrain the kinds of
> > commodities that can be deposited in an account (possibly with wildcards).
>
> Very interesting, kind of like an asserts mechanism for your Ledger
> data. That is really an awesome idea!!

Asserts: exactly what I want! (more below, I think this can
be integrated in a more general idea.)
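The declaration check itself is small. The following sketch assumes the declarations are collected into a dict mapping each account name to glob patterns for its allowed commodities. The patterns are matched with the standard `fnmatch` module. The data and the `check_posting` name are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical declarations: account name -> glob patterns of allowed commodities.
DECLARED = {
    "Assets:Current:RBC:Checking": ["CAD"],
    "Assets:Investments:RBC-Broker:Account-RSP": ["*"],  # any commodity
    "Expenses:Financial:Fees": ["CAD", "USD"],
}

def check_posting(account, commodity, declared=DECLARED):
    """Return an error message, or None if the posting fits the declarations."""
    patterns = declared.get(account)
    if patterns is None:
        return f"undeclared account: {account}"
    if not any(fnmatchcase(commodity, pattern) for pattern in patterns):
        return f"commodity {commodity} not allowed in {account}"
    return None
```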


> I'd like to see a more general syntax for this based on value
> expressions, which would offer a full constraints mechanism. For
> example, to constrain all transactions to being less than $10,000 in
> an account:
>
> ? Constrain all transactions to less than $10,000
> /Expenses:Food/ a < $10,000
>
> The "?" indicates a "constraints entry". Each transaction would have
> two value expressions: one to match every applicable transaction in
> the file, and another to provide the boolean logic of the constraint.
>
> Then, while the file is being parsed, any violations of a constraint
> would be treated as an error, the same as when an entry fails to
> balance to zero. I suppose making these warnings could be a
> possibility as well.
>
> Here is how you'd constrain commodities in this model:
>
> ? Guarantee commodities within accounts
> Assets:Checking comm(a) == $1.00
>
> (At the moment there is no value expr function that would allow:
> comm(a) == "$")
>
> Then, of course, there could be a specific declaration option -- such
> as you have above -- for just this case, which internally would be
> parsed as a constraints entry.

This is a fantastic idea; I like the idea of creating a
little language for constraints, but also this could be
written in the form of a simple Python script that gets run
as part of a checking stage. Once you're provided with easy
access to the data model, a simple loop written in Python to
do some custom checks is a 10-line program or something...
if we find a way to open up user-code from within Python,
you can leverage all the power of Python (and its libraries,
accessing the network, etc) and then you don't even have to
spend time on creating your own language.
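For comparison, here is a sketch of the constraints entry from the quoted message written directly in Python. Each constraint pairs a predicate that selects the postings it applies to with a predicate that must hold. A violation raises an error or, optionally, only a warning. The `Posting` and `Constraint` classes are hypothetical. The substring test stands in for the `/Expenses:Food/` regexp, and "$" is used as the commodity symbol.

```python
import warnings
from dataclasses import dataclass
from decimal import Decimal
from typing import Callable

@dataclass
class Posting:
    account: str
    amount: Decimal
    commodity: str

@dataclass
class Constraint:
    description: str
    applies: Callable[[Posting], bool]  # which postings the constraint covers
    holds: Callable[[Posting], bool]    # the condition those postings must satisfy
    warn_only: bool = False

def enforce(constraints, postings):
    """Raise ValueError (or warn) for every posting that violates a constraint."""
    for constraint in constraints:
        for posting in postings:
            if constraint.applies(posting) and not constraint.holds(posting):
                message = f"{constraint.description}: {posting}"
                if constraint.warn_only:
                    warnings.warn(message)
                else:
                    raise ValueError(message)

CONSTRAINTS = [
    Constraint("Constrain all transactions to less than $10,000",
               lambda p: "Expenses:Food" in p.account,
               lambda p: p.amount < Decimal("10000")),
    Constraint("Guarantee commodities within accounts",
               lambda p: p.account == "Assets:Checking",
               lambda p: p.commodity == "$"),
]
```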

Note that this is also true for reporting: your reporting
cmdline options are a form of language per se. IMO you would
be better off creating a "reporting language" instead of the
options.



Note that I think this belongs in a later stage. Now think
about this: it will be much more practical if we separate
the concepts of "check" into distinct stages:

A. Loading phase: simple syntax checking that occurs during
parsing:

At this stage, we don't even check if it balances; we
just parse and create the data structure, whether the
entries balance or not.

B. Checking phase, which includes:

1. "Balance check": check that each of the Transactions
balances individually.

2. "Account validity check": check that all the account
names are in the list of defined names.

3. "Constraint check": the idea you flesh out above. I'm
not sure I would use it myself--I would rather
implement a little Python loop in a small source file
that gets automatically imported and run--but I'm sure
it would be useful.

4. "User-defined checks": user code that gets loaded and
run, being handed the data structure.

I'm not 100% sure that all these checks should involve some
syntax defined in the file (e.g. my list of account checks
could feed from a separate file that defines the accounts,
your list of conditions could be cmdline args), but I'm
heavily biased towards the simplicity of keeping everything
in a single file.
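The staging could look like the following sketch. It assumes phase A has already produced transactions as plain dicts with every amount filled in. Each phase-B check is just a function that returns a list of error messages. Balance checks, account checks and user-defined checks therefore all plug in the same way. The function names are hypothetical.

```python
from collections import defaultdict
from decimal import Decimal

def check_balances(transactions):
    """B.1: each transaction must sum to zero in every commodity."""
    errors = []
    for txn in transactions:
        sums = defaultdict(Decimal)
        for _account, amount, commodity in txn["postings"]:
            sums[commodity] += amount
        unbalanced = {c: total for c, total in sums.items() if total != 0}
        if unbalanced:
            errors.append(f"{txn['date']} {txn['payee']}: off by {unbalanced}")
    return errors

def make_account_check(valid_accounts):
    """B.2: build a check that every posting uses a declared account."""
    def check_accounts(transactions):
        return [f"{txn['date']} {txn['payee']}: unknown account {account}"
                for txn in transactions
                for account, _amount, _commodity in txn["postings"]
                if account not in valid_accounts]
    return check_accounts

def run_checks(transactions, checks):
    """Phase B: run every check on the loaded data and collect all errors."""
    errors = []
    for check in checks:
        errors.extend(check(transactions))
    return errors
```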

Python as input syntax

Think about this silly idea: if we could make the input syntax Python code itself, the possibilities for syntax additions are endless. With suitable imports, here is a suggestion:

#!/usr/bin/env python
from ledger import *

Trans('2008-04-05', '*', 'Movie tix for Hulk',
('Assets:Checking', -24.00), ('Expenses:Movies',))

or simpler versions of this (it's a matter of how much you want Python to do the parsing for you):

Trans('2008-04-05 * Movie tix for Hulk',
  ('Assets:Checking', -24.00),
  ('Expenses:Movies',))

Trans('2008-04-05 * Movie tix for Hulk',
  'Assets:Checking  -24.00',
  'Expenses:Movies')

Trans("""2008-04-05 * Movie tix for Hulk
  Assets:Checking         -24.00
  Expenses:Movies
""")

At the end of your program you could just invoke some custom code to be run... then you don't have to create a language at all. You can just have this at the top:

# Define my list of valid accounts.
Account('Assets:Checking')
Account('Expenses:Movies')

You could even create a loop to generate the list of valid accounts, or feed it from the network or something.

In the end, I don't like it too much, because it removes a lot of the cosmetic elegance of your file format. But think about this: I could provide a function that parses the transactions format, and then your custom input script is a Python program, with a massive string defined near the top:

#!/usr/bin/env python
from decimal import Decimal
from ledger import *

trans = """

2008/01/07 * NSF ITEM FEE -- 1 @ $35.00
  Assets:Current:RBC:Checking                                      -35.00 CAD
  Expenses:Financial:Fees

2008/01/10 * WWW3RD PTY DEP-7983 -- Expenses payment
  Assets:Current:RBC:Checking                                     2000.00 CAD
  Assets:Expense-Account                                         -2000.00 CAD

... more transactions ...

"""

# Create my ledger and parse some transactions into it.
ledger = Ledger()
parse_transactions(trans, ledger)

# Check that all the transactions in the ledger balance.
check_balances(ledger)

# Check account validity.
my_accounts = """\
  Expenses:Financial:Fees
  Assets:Current:RBC:Checking
  Assets:Expense-Account
""".split()  # split() also drops the indentation

check_accounts(ledger, my_accounts)

# Check some user-defined constraints.
def my_constraints(ledger):
    maxamt = Decimal('10000.00')
    for posting in ledger.iter_all_postings():
        if not (posting.amount < maxamt):
            raise ValueError("Amount of posting %s too large!" % posting)

my_constraints(ledger)

# Generate some reports to files.
tables = generate_balance_report(ledger)
open('balance.txt', 'w').write(format_tables(tables, 80))

tables = generate_global_register(ledger)
open('register.txt', 'w').write(format_tables(tables, 80))

If you want to support your existing files, you write this simple utility (you can call it "ledger" :-) ):

#!/usr/bin/env python
from ledger import *
opts, command = handle_ledger_legacy_cmdline()
ledger = parse_ledger(opts.file)
command.run(ledger)

Simple, no? This is how I'm planning to implement my version: I want to leverage as much of the Python language as possible whilst remaining compatible with a simple, convenient syntax.

Note that with this approach a bunch of problems go away; e.g., you don't need include directives anymore. You can still provide something simple for people who do not want to write a single line of code, but if someone wants to do something a little more custom, they just write a little program in Python.

And I'm happy too:

#!/usr/bin/env python
from ledger import *
opts, command = handle_cmdline()
ledger = parse_ledger(opts.file)
load_sql_database(ledger,
                  dbname='myfinances',
                  user='blais',
                  hostname='furius.ca')

Balance checks may be optional

About the "balance check": note that in the great majority of cases you want to run it, but I like the idea of being able to leave some transactions unbalanced (as long as they are easy to find; the program can do that).

Terminology

BTW after fiddling with a few systems I would suggest the
following names for the objects/tables:

- Posting: an individual entry in a specific account (the
basic register view is a list of postings).

- Transaction: the thing that contains the individual
postings.

The word "journal entry" seems to have been abused left and
right, and is best avoided. "Item" is too generic.
The names above are what I'm using throughout this long email.
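In code, the suggested terminology maps onto two small record types, sketched here with dataclasses. The field names are assumptions, including "payee", which the earlier notes considered for the description. The `register` helper is also hypothetical.

```python
import datetime
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List, Optional

@dataclass
class Posting:
    """An individual entry in one account; a register view is a list of these."""
    account: str
    amount: Optional[Decimal] = None  # None when left for the program to complete
    commodity: Optional[str] = None

@dataclass
class Transaction:
    """The dated container holding postings that, together, should balance."""
    date: datetime.date
    flag: Optional[str]
    payee: str
    postings: List[Posting] = field(default_factory=list)

def register(transactions, account):
    """Basic register view: (date, payee, posting) for one account, in date order."""
    return [(txn.date, txn.payee, posting)
            for txn in sorted(transactions, key=lambda t: t.date)
            for posting in txn.postings
            if posting.account == account]
```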



> > I'm also toying with the idea of giving accounts longer,
> > more descriptive names, which would be used in reporting,
> > but I'm not sure it won't just make things more complicated
> > than necessary. Each account has a specific name that the
> > bank uses for it, for example "High-Power e-Savings
> > Account". I'd like to be able to see those in the final
> > report. Maybe I can leave it out of the ledger format and
> > just use a join to an ancillary table in my reporting code.
>
> Account aliases are already possible, such as:
>
> @alias Savings = High-Power e-Savings Account
>
> Now you can use "Savings" in your ledger data, but the real account name
> used will be the long form.

Ah, coool! I didn't know that. Awesome!!!
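Here is a sketch of how such aliases could be read and applied. It assumes the `@alias NAME = LONG NAME` form quoted above. It also assumes that an alias stands only for the first component of an account name, which is not confirmed by the quoted message. `read_aliases` and `resolve` are hypothetical.

```python
import re

# Matches lines such as "@alias Savings = High-Power e-Savings Account".
ALIAS_RE = re.compile(r"@alias\s+(\S+)\s*=\s*(.+?)\s*$")

def read_aliases(lines):
    """Collect alias definitions from the lines of a ledger file."""
    aliases = {}
    for line in lines:
        match = ALIAS_RE.match(line)
        if match:
            aliases[match.group(1)] = match.group(2)
    return aliases

def resolve(account, aliases):
    """Replace an alias used as the first component of an account name."""
    head, sep, rest = account.partition(":")
    return aliases.get(head, head) + sep + rest
```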


> > Reconciled Dates
> > ~~~~~~~~~~~~~~~~
> >
> > One problem I have is that it is not obvious to figure out
> > at a glance which accounts need updating. I would like a
> > directive that says "this account is up-to-date as of DATE".
> > From this, you can generate a little table that prints out
> > the last updated date ranges for each account for which it
> > matters. Only accounts for which such a directive has been
> > shown should be included in the list::
> >
> > ;B 2008/04/01 Liabilities:Credit-Card:RBC-VISA
> >
> > Note that only the minimum and maximum values matter. Using
> > a command to generate the ranges, it's really easy to see
> > that you forgot to update a specific account (I have so many
> > accounts now, I get very confused; I like the computer to
> > help out.)
>
> If you use the -U reporting option, it will show you all transactions
> which are not up-to-date yet. Can you describe the usage profile for
> this option?

Actually, I don't think the transactions embody all the
information: if an account has been inactive for two months
(i.e., it has no transactions in it), I still want to be able
to mark in the file that "I have checked until this date",
so that I won't have to go check again that really, nothing
happened since the last transaction. This is just to manage
the process of entering and reconciling the data (for the
human).
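Here is a sketch of the report proposed in the quoted message. It assumes the `;B DATE ACCOUNT` directive form shown there. It collects the earliest and latest checked date for each account, then lists the accounts with the stalest latest date first. `checked_ranges` and `report` are hypothetical names.

```python
import re
from datetime import date

# Directive form from the proposal: ";B YYYY/MM/DD ACCOUNT".
CHECKED_RE = re.compile(r";B\s+(\d{4})/(\d{2})/(\d{2})\s+(\S.*?)\s*$")

def checked_ranges(lines):
    """Map each account with ;B directives to its (earliest, latest) checked date."""
    ranges = {}
    for line in lines:
        match = CHECKED_RE.match(line.strip())
        if not match:
            continue
        year, month, day, account = match.groups()
        d = date(int(year), int(month), int(day))
        lo, hi = ranges.get(account, (d, d))
        ranges[account] = (min(lo, d), max(hi, d))
    return ranges

def report(ranges):
    """One line per account; the least recently checked accounts come first."""
    return [f"{lo:%Y/%m/%d} .. {hi:%Y/%m/%d}  {account}"
            for account, (lo, hi) in sorted(ranges.items(), key=lambda kv: kv[1][1])]
```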