## Introduction

In my previous articles, I have discussed how to use pandas as a replacement

for Excel as a data wrangling tool. In many cases, a python + pandas solution

is superior to the highly manual processes many people use for manipulating data

in Excel. However, Excel is used for many scenarios in a business environment – not just data wrangling.

This specific post will discuss how to do financial modeling in pandas instead of

Excel. For this example, I will build a simple amortization table in pandas and show

how to model various outcomes.

In some ways, building the model is easier in Excel (there are many examples just

a google search away). However, as an exercise in learning about pandas, it is useful

because it forces one to think about how to use pandas strengths to solve a problem

in a way different from the Excel solution. In my opinion the solution is more powerful

because you can build on it to run multiple scenarios, easily chart various outcomes

and focus on aggregating the data in a way most useful for your needs.

## What is an amortization schedule?

Financial modeling can take many forms but for this article, I wanted to focus

on a problem that many people will encounter in their lifetime. Namely, the

finance aspects of a large loan.

The wikipedia page has a good explanation of an amortization schedule. In the

simplest terms, an amortization schedule is a table that shows the periodic

principal and interest payments needed to pay of a debt. The most common case

is the payoff schedule for a mortgage.

Using the mortgage as an example, in each period (typically monthly) a home owner

writes a check to their bank for a specified amount. This amount is split into a

principal and interest payment. The bank keeps the interest and the principal is

applied to the outstanding loan. Over a period of time the final balance will go to

0 and you will own the home.

Even with historically low interest rates, the amount of interest paid on a typical

15 or 30 year mortgage is very high. It is possible that you can pay almost as much

in interest as the original loan was worth. Because of the financial importance

of this purchase, it is important to understand all the financial ramifications

of a mortgage. In addition, there are many variables that can affect the mortgage payments:

- Interest rate
- Duration of the loan
- Payment frequency (monthly vs bi-weekly, etc)
- Additional principal payments

There are certainly many on-line calculators and examples that show how to build

tools in Excel. However, using the pandas solution is handy as a teaching tool to

understand pandas in more detail and in using pandas to build a simple way to model and compare

multiple scenarios. Before I go through the pandas-based solution, it’s helpful to see the

Excel based solution so we have a model to use as a basis for the pandas solution:

The basic model is simple. Each period results in a small decrease in the principal.

At the end of 30 years, the balance is $0 and the loan is complete.

This model assumes that an individual pays exactly the prescribed amount each period.

However, there can be financial benefits to paying extra principal and paying off

the loan faster. As I think about modeling my mortgage, I’m curious to understand

things like:

- How much do I save in interest if I contribute a little more principal each payment period?
- When will I pay off the loan?
- What is the impact of various interest rates?

Using the pandas solution can be useful for comparing and contrasting multiple options.

## Payment, Principal and Interest

Not surprisingly, the numpy library has all the built in functions we need to do the

behind the scenes math. In fact, the documentation shows one approach to build

the amortization table. This approach certainly works but I’d like to include

the results in a pandas DataFrame so that I can more easily dump the results to Excel

or visualize the results.

I am going to walk through the basic parts of the solution for a 30 year $200K mortgage

structured with a monthly payment and an annual interest rate of 4%. For an added

twist, I’m going to build the solution with an extra $50/month to pay down the

principal more quickly.

Get started with the imports of all the modules we need:

```
import pandas as pd
import numpy as np
from datetime import date
```

Define the variables for the mortgage:

```
Interest_Rate = 0.04
Years = 30
Payments_Year = 12
Principal = 200000
Addl_Princ = 50
start_date = (date(2016,1,1))
```

Now, let’s play with the basic formulas so we understand how they work.

Calculating the total payment requires us to pass the right values to the numpy

pmt

function.

```
pmt = np.pmt(Interest_Rate/Payments_Year, Years*Payments_Year, Principal)
```

-954.83059093090765

This means that every month we need to pay $954.83 (which matches the Excel solution above).

But, how much of this is interest and how much is principal? Well, it depends.

The payment stays constant over time but the amount applied to principal increases

and the interest decreases as we move forward in time.

For example, for period 1, here is the interest and principal:

```
# Period to calculate
per = 1
# Calculate the interest
ipmt = np.ipmt(Interest_Rate/Payments_Year, per, Years*Payments_Year, Principal)
# Calculate the principal
ppmt = np.ppmt(Interest_Rate/Payments_Year, per, Years*Payments_Year, Principal)
print(ipmt, ppmt)
```

-666.6666666666667 -288.163924264

In other words, the first payment of $954.83 is composed of $666.67 in interest and only $288.16

in principal. Ouch.

Let’s look at what the breakdown is for period 240 (20 years in the future).

```
per = 240 # Period to calculate
# Calculate the interest
ipmt = np.ipmt(Interest_Rate/Payments_Year, per, Years*Payments_Year, Principal)
# Calculate the principal
ppmt = np.ppmt(Interest_Rate/Payments_Year, per, Years*Payments_Year, Principal)
print(ipmt, ppmt)
```

-316.49041533656924 -638.340175594

In this case, we are paying much more towards the principal ($638.34) and much

less towards the interest ($316.49).

That should be fairly straightforward. But, what if I want to know what my

balance is at period 240? Well, then I need to understand the cumulative effect

of all my principal payments. This is not as straightforward in pandas.

This is where the Excel solution is a little simpler to conceptualize.

In Excel, it is easy to reference the row above and use that value in the current

row. Here is the Excel version for maintaining the balance due:

As you can see, in row 10, the balance formula references row 9. This type of formula

is simple in Excel but in pandas a reference like this seems difficult. Your first

instinct might be to try writing a loop but we know that is not optimal.

Fortunately there is another approach that is more consistent with pandas.

I will get to that in a moment. Before we go there, let’s get the basic pandas

structure in place.

are normally better options to accomplish the same goal.

## Building the Table

To answer the question about the balance change over time, we need to build

a pandas DataFrame from scratch. There are extra steps here (as compared to Excel)

but this is a useful adventure into some of the pandas functions I have not discussed previously.

First, let’s build a

DateTimeIndex

for the next 30 years based on

MS

(Month Start):

```
rng = pd.date_range(start_date, periods=Years * Payments_Year, freq='MS')
rng.name = "Payment_Date"
```

DatetimeIndex(['2016-01-01', '2016-02-01', '2016-03-01', '2016-04-01', '2016-05-01', '2016-06-01', '2016-07-01', '2016-08-01', '2016-09-01', '2016-10-01', ... '2045-03-01', '2045-04-01', '2045-05-01', '2045-06-01', '2045-07-01', '2045-08-01', '2045-09-01', '2045-10-01', '2045-11-01', '2045-12-01'], dtype='datetime64[ns]', name='Payment_Date', length=360, freq='MS')

This helpful function creates a range for the next 30 years starting on Jan 1, 2016.

The range will be used to build up the basic DataFrame we will use for the amortization schedule.

Note that we need to make sure the first period is 1 not 0, hence the need to use the

df.index += 1

:

```
df = pd.DataFrame(index=rng,columns=['Payment', 'Principal', 'Interest', 'Addl_Principal', 'Balance'], dtype='float')
df.reset_index(inplace=True)
df.index += 1
df.index.name = "Period"
```

Here is what the stub DataFrame looks like:

Payment_Date | Payment | Principal | Interest | Addl_Principal | Balance | |
---|---|---|---|---|---|---|

Period | ||||||

1 | 2016-01-01 | NaN | NaN | NaN | NaN | NaN |

2 | 2016-02-01 | NaN | NaN | NaN | NaN | NaN |

3 | 2016-03-01 | NaN | NaN | NaN | NaN | NaN |

4 | 2016-04-01 | NaN | NaN | NaN | NaN | NaN |

5 | 2016-05-01 | NaN | NaN | NaN | NaN | NaN |

This looks similar to what we have in Excel so we’re on the right track.

Adding the payment is easy because it is a simple formula that produces a consistent value.

```
df["Payment"] = np.pmt(Interest_Rate/Payments_Year, Years*Payments_Year, Principal)
```

However the interest and principal change over time. Fortunately the formula is based on

the period which we have available in our DataFrame as

df.index

. We can

reference it in our formula to get the unique values for the specified period:

```
df["Principal"] = np.ppmt(Interest_Rate/Payments_Year, df.index, Years*Payments_Year, Principal)
df["Interest"] = np.ipmt(Interest_Rate/Payments_Year, df.index, Years*Payments_Year, Principal)
```

The final step is to add the Additional Principal (as a negative number) and round the values:

```
# Convert to a negative value in order to keep the signs the same
df["Addl_Principal"] = -Addl_Principal
df = df.round(2)
```

The table is starting to come together:

Payment_Date | Payment | Principal | Interest | Addl_Principal | Curr_Balance | |
---|---|---|---|---|---|---|

Period | ||||||

1 | 2016-01-01 | -954.83 | -288.16 | -666.67 | -50 | NaN |

2 | 2016-02-01 | -954.83 | -289.12 | -665.71 | -50 | NaN |

3 | 2016-03-01 | -954.83 | -290.09 | -664.74 | -50 | NaN |

4 | 2016-04-01 | -954.83 | -291.06 | -663.78 | -50 | NaN |

5 | 2016-05-01 | -954.83 | -292.03 | -662.81 | -50 | NaN |

All that’s left is figuring out how to manage the

Curr_Balance

column.

Before I show you the better solution (I won’t say best because I would not

be surprised if there is an even better option), I am going to show you the

ugly approach I first took.

## Maintaining the Balance – Try 1

I am showing this example because I suspect many novice pandas users would

go down this path when trying to solve a similar problem. It also shows how a

little time spent thinking about the solution yields a much better approach than

just charging in with the first idea that comes to mind.

First, we calculate the balance for the first period by doing the calculation

for the first row:

```
df["Balance"] = 0
df.loc[1, "Balance"] = Principal + df.loc[1, "Principal"] + df.loc[1, "Addl_Principal"]
```

Payment_Date | Payment | Principal | Interest | Addl_Principal | Balance | |
---|---|---|---|---|---|---|

Period | ||||||

1 | 2016-01-01 | -954.830591 | -288.163924 | -666.666667 | -50 | 199661.836076 |

2 | 2016-02-01 | -954.830591 | -289.124471 | -665.706120 | -50 | 0.000000 |

3 | 2016-03-01 | -954.830591 | -290.088219 | -664.742372 | -50 | 0.000000 |

4 | 2016-04-01 | -954.830591 | -291.055180 | -663.775411 | -50 | 0.000000 |

5 | 2016-05-01 | -954.830591 | -292.025364 | -662.805227 | -50 | 0.000000 |

It works but it’s starting to get a little cumbersome.

My next step was to loop through each row and calcuate the balance:

```
for i in range(2, len(df)+1):
# Get the previous balance as well as current payments
prev_balance = df.ix[i-1, 'Balance']
principal = df.ix[i, 'Principal']
addl_principal = df.ix[i, "Addl_Principal"]
# If there is no balance, then do 0 out the principal and interest
if prev_balance == 0:
df.ix[i, ['Payment', 'Principal', 'Interest', 'Balance', 'Addl_Principal']] = 0
continue
# If this payment does not pay it off, reduce the balance
if abs(principal + addl_principal) <= prev_balance:
df.ix[i, 'Balance'] = principal + prev_balance + addl_principal
# If it does pay it off, zero out the balance and adjust the final payment
else:
# Just adjust the principal down
if prev_balance <= abs(principal):
principal = -prev_balance
addl_principal = 0
else:
addl_principal = (prev_balance - abs(principal_payment))
df.ix[i, 'Balance'] = 0
df.ix[i, 'Principal'] = principal
df.ix[i, 'Addl_Principal'] = addl_principal
df.ix[i, "Payment"] = principal + df.ix[i, "Interest"]
df = df.round(2)
```

Payment_Date | Payment | Principal | Interest | Addl_Principal | Balance | |
---|---|---|---|---|---|---|

Period | ||||||

1 | 2016-01-01 | -954.83 | -288.16 | -666.67 | -50 | 199661.84 |

2 | 2016-02-01 | -954.83 | -289.12 | -665.71 | -50 | 199322.71 |

3 | 2016-03-01 | -954.83 | -290.09 | -664.74 | -50 | 198982.62 |

4 | 2016-04-01 | -954.83 | -291.06 | -663.78 | -50 | 198641.57 |

5 | 2016-05-01 | -954.83 | -292.03 | -662.81 | -50 | 198299.54 |

Oh boy. That works but the code smell is quite intense. At this point, I almost

ditched this article because the solution was not very pretty.

I decided to regroup by doing some research and found this post by

Brandon Rhodes which helped me re-frame my problem and develop a much better solution.

## Maintaining the Balance – Try 2

described below are not correct in those instances where additional principal

payments are being made. Work is underway to resolve.

(26-Nov-2016)

After reading Brandon’s article, I realized that by adding an additional column

with my cumulative principal payments, I could very easily calculate the balance. The

pandas authors realized some of the challenges of calculating results based on

prior rows of data so they included several cumulative functions.

In this example, I will use

cumsum

to build a running total of my

principal payments.

```
df["Cumulative_Principal"] = (df["Principal"] + df["Addl_Principal"]).cumsum()
```

One thing that is interesting is that with the additional principal payments, I

end up with paying more in principal that I originally planned to.

Payment_Date | Payment | Principal | Interest | Addl_Principal | Curr_Balance | Cumulative_Principal | |
---|---|---|---|---|---|---|---|

Period | |||||||

356 | 2045-08-01 | -954.83 | -939.07 | -15.76 | -50 | NaN | -214012.32 |

357 | 2045-09-01 | -954.83 | -942.20 | -12.63 | -50 | NaN | -215004.52 |

358 | 2045-10-01 | -954.83 | -945.35 | -9.49 | -50 | NaN | -215999.87 |

359 | 2045-11-01 | -954.83 | -948.50 | -6.33 | -50 | NaN | -216998.37 |

360 | 2045-12-01 | -954.83 | -951.66 | -3.17 | -50 | NaN | -218000.03 |

This is obviously not correct so I need to put a floor (or

clip

) the results

so that I never exceed $200,000 in total principal payments:

```
df["Cumulative_Principal"] = df["Cumulative_Principal"].clip(lower=-Principal)
```

Now that I have that out of the way, the Current Balance for any given period is

very simple to calculate:

```
df["Curr_Balance"] = Principal + df["Cumulative_Principal"]
```

Payment_Date | Payment | Principal | Interest | Addl_Principal | Curr_Balance | Cumulative_Principal | |
---|---|---|---|---|---|---|---|

Period | |||||||

1 | 2016-01-01 | -954.83 | -288.16 | -666.67 | -50 | 199661.84 | -338.16 |

2 | 2016-02-01 | -954.83 | -289.12 | -665.71 | -50 | 199322.72 | -677.28 |

3 | 2016-03-01 | -954.83 | -290.09 | -664.74 | -50 | 198982.63 | -1017.37 |

4 | 2016-04-01 | -954.83 | -291.06 | -663.78 | -50 | 198641.57 | -1358.43 |

5 | 2016-05-01 | -954.83 | -292.03 | -662.81 | -50 | 198299.54 | -1700.46 |

Wow. This approach is much simpler than the looping solution I tried in my first

iteration. The only thing left is figuring out how to clean up the table if we

pay it off early.

## The Big Payoff

When an amortization table is built, the assumption is that the payments over each

period will just be enough to cover the principal and interest and at the end of the

time period, the balance goes to 0. However, there may be scenarios where you want

to accelerate the payments in order to pay off the loan earlier. In the example

we have been running with, the model includes $50 extra each month.

In order to find the last payment, we want to find the the payment where the

Curr_Balance first goes to 0:

Payment_Date | Payment | Principal | Interest | Addl_Principal | Curr_Balance | Cumulative_Principal | |
---|---|---|---|---|---|---|---|

Period | |||||||

340 | 2044-04-01 | -954.83 | -890.38 | -64.45 | -50 | 1444.24 | -198555.76 |

341 | 2044-05-01 | -954.83 | -893.35 | -61.48 | -50 | 500.89 | -199499.11 |

342 | 2044-06-01 | -954.83 | -896.33 | -58.50 | -50 | 0.00 | -200000.00 |

343 | 2044-07-01 | -954.83 | -899.32 | -55.52 | -50 | 0.00 | -200000.00 |

Based on this view, you can see that our last payment would be in period 342.

We can find this value by using

idxmax

```
last_payment = df.query("Curr_Balance <= 0")["Curr_Balance"].idxmax(axis=1, skipna=True)
df.ix[last_payment]
```

Payment_Date 2044-06-01 00:00:00 Payment -954.83 Principal -896.33 Interest -58.5 Addl_Principal -50 Curr_Balance 0 Cumulative_Principal -200000 Name: 342, dtype: object

Now we know the last payment period, but astute readers may have noticed

that we payed $896.33 + $50 in principal but we only owed $500.89. We can clean

this up with a couple of statements using

last_payment

as the index:

```
df.ix[last_payment, "Principal"] = -(df.ix[last_payment-1, "Curr_Balance"])
df.ix[last_payment, "Payment"] = df.ix[last_payment, ["Principal", "Interest"]].sum()
df.ix[last_payment, "Addl_Principal"] = 0
```

Payment_Date | Payment | Principal | Interest | Addl_Principal | Curr_Balance | Cumulative_Principal | |
---|---|---|---|---|---|---|---|

Period | |||||||

338 | 2044-02-01 | -954.83 | -884.48 | -70.36 | -50 | 3322.04 | -196677.96 |

339 | 2044-03-01 | -954.83 | -887.42 | -67.41 | -50 | 2384.62 | -197615.38 |

340 | 2044-04-01 | -954.83 | -890.38 | -64.45 | -50 | 1444.24 | -198555.76 |

341 | 2044-05-01 | -954.83 | -893.35 | -61.48 | -50 | 500.89 | -199499.11 |

342 | 2044-06-01 | -559.39 | -500.89 | -58.50 | 0 | 0.00 | -200000.00 |

For a final step, we can truncate the DataFrame so that we only include through

period 342:

```
df = df.ix[0:last_payment]
```

Now we have a complete table, we can summarize and compare results.

## Time to Analyze

It has taken some time to pull this solution together but now that we know how

to solve the problem, we can put it into a function that allows us to input

various scenarios, summarize the results and visualize them in various ways.

I have built an amortization table function that looks like this:

```
def amortization_table(interest_rate, years, payments_year, principal, addl_principal=0, start_date=date.today()):
""" Calculate the amortization schedule given the loan details
Args:
interest_rate: The annual interest rate for this loan
years: Number of years for the loan
payments_year: Number of payments in a year
principal: Amount borrowed
addl_principal (optional): Additional payments to be made each period. Assume 0 if nothing provided.
must be a value less then 0, the function will convert a positive value to
negative
start_date (optional): Start date. Will start on first of next month if none provided
Returns:
schedule: Amortization schedule as a pandas dataframe
summary: Pandas dataframe that summarizes the payoff information
"""
```

Refer to this notebook for the full code as well as example usage.

You can call it to get summary info as well as the detailed amortization schedule:

```
schedule1, stats1 = amortization_table(0.05, 30, 12, 100000, addl_principal=0)
```

Which yields a schedule:

Payment_Date | Payment | Principal | Interest | Addl_Principal | Curr_Balance | Cumulative_Principal | |
---|---|---|---|---|---|---|---|

Period | |||||||

1 | 2016-12-01 | -536.82 | -120.15 | -416.67 | 0 | 99879.85 | -120.15 |

2 | 2017-01-01 | -536.82 | -120.66 | -416.17 | 0 | 99759.19 | -240.81 |

3 | 2017-02-01 | -536.82 | -121.16 | -415.66 | 0 | 99638.03 | -361.97 |

4 | 2017-03-01 | -536.82 | -121.66 | -415.16 | 0 | 99516.37 | -483.63 |

5 | 2017-04-01 | -536.82 | -122.17 | -414.65 | 0 | 99394.20 | -605.80 |

and summary stats:

payoff_date | Interest Rate | Number of years | Period_Payment | Payment | Principal | Addl_Principal | Interest | |
---|---|---|---|---|---|---|---|---|

0 | 11-01-2046 | 0.05 | 30 | -536.82 | -193255.2 | -100000.02 | 0.0 | -93255.69 |

The powerful aspect of this approach is that you can run multiple scenarios and combine

them into 1 table:

```
schedule2, stats2 = amortization_table(0.05, 30, 12, 100000, addl_principal=-200)
schedule3, stats3 = amortization_table(0.04, 15, 12, 100000, addl_principal=0)
# Combine all the scenarios into 1 view
pd.concat([stats1, stats2, stats3], ignore_index=True)
```

payoff_date | Interest Rate | Number of years | Period_Payment | Payment | Principal | Addl_Principal | Interest | |
---|---|---|---|---|---|---|---|---|

0 | 11-01-2046 | 0.06 | 30 | -599.55 | -215838.00 | -99999.92 | 0.0 | -115838.23 |

1 | 04-01-2037 | 0.05 | 30 | -736.82 | -131689.78 | -51200.00 | -48800.0 | -80490.16 |

2 | 11-01-2031 | 0.04 | 15 | -739.69 | -133144.20 | -100000.03 | 0.0 | -33143.85 |

Finally, because the data is in a DataFrame, we can easily plot the results

to see what the payoff time lines look like for the various scenarios:

```
fig, ax = plt.subplots(1, 1)
schedule1.plot(x='Payment_Date', y='Curr_Balance', label="Scenario 1", ax=ax)
schedule2.plot(x='Payment_Date', y='Curr_Balance', label="Scenario 2", ax=ax)
schedule3.plot(x='Payment_Date', y='Curr_Balance', label="Scenario 3", ax=ax)
plt.title("Pay Off Timelines")
```

Or, we can look at the interest payments by year:

```
fig, ax = plt.subplots(1, 1)
y1_schedule = schedule1.set_index('Payment_Date').resample("A")["Interest"].sum().abs().reset_index()
y1_schedule["Year"] = y1_schedule["Payment_Date"].dt.year
y1_schedule.plot(kind="bar", x="Year", y="Interest", ax=ax, label="30 Years @ 5%")
plt.title("Interest Payments");
```

Obviously there are lots of available options for visualizing the results but this

gives you a flavor for some of the options.

## Closing Out

Thank you for reading through this example. I have to admit that this was one of my

more time consuming articles. It was also one where I learned a lot about how

to work with pandas and use some functions that I did not have much familiarity with.

I hope this article will help others build their knowledge of python and pandas

and might even be useful as a tool to analyze your own mortgage payment options.

## Article Updates

**Nov-26-2016** – Calculation Accuracy:

Based on feedback in the comments and discussions off-line, I realized

that the calculations are not correctly working with the extra principal

payments. After looking into this in more detail, I figured out that

the interest and principal payments do indeed to be recalculated each period

which is proving to be problematic in pandas. I am working on a solution but

in the meantime want to make sure to note the issue.I am keeping the article up since I think it is helpful to

show additional pandas functionality but do regret that the results

are not correct.If you have ideas on how to fix, please let me know.

**Dec-19-2016** – Corrected Article:

- A new article has been posted that contains corrected code to fix the errors

identified above.

**Source From: pbpython.com.** **Original article title: “Building a Financial Model with Pandas”.**

**This full article can be read at Building a Financial Model with Pandas.**

**Advertisement**