How Economists Convert Quarterly Data into Monthly: Cubic Spline Interpolation

January 20, 2010

A common problem economists face with time-series data is getting them into the right time interval. Some data are daily or weekly, while others are in monthly, quarterly or annual intervals. Since most regression models require consistent time intervals, an econometrician’s first job is usually getting data into the same frequency.

In this post I’ll explain how to solve a common problem we’ve run into: how to divide quarterly data into monthly data for econometric analysis. To do so, we’ll use a method known as “cubic spline interpolation.”

Cubic Spline Interpolation
One of the most widely used data sources in economics is the National Income and Product Accounts (NIPAs) from the U.S. Bureau of Economic Analysis. They’re the official source for U.S. GDP, personal income, trade flows and more. Unfortunately, most data are published only quarterly or annually. So if you’re hoping to run a regression using monthly observations — for example, this simple estimate of the price elasticity of demand for gasoline — you’ll need to split these quarterly data into monthly ones.

A common way to do this is by “cubic spline interpolation.” Here’s how it works. We start with n quarterly data points. That means we have n-1 spaces between them. Across each space, we draw a unique 3rd-degree (or “cubic”) polynomial connecting the two points. This is called a “piecewise polynomial” function.

To make sure our connecting lines form a smooth line, we force all our first and second derivatives to be continuous; that is, at each connecting point we make them equal to the derivitive on either side. When all these requirements are met — along with a couple end-point conditions you can read about here — we have a (4n-4) x (4n-4) linear system that can be solved for the coefficients of all n-1 cubic polynomials.

Once we have these n-1 piecewise polynomials, we can plug in x values for whatever time intervals we want: monthly, weekly or even daily. The polynomials will give us a pretty good interpolation between our known quarterly data points.

An Example Using MATLAB
While the above method seems simple, doing cubic splines by hand is not. A spline for just four data points requires setting up and solving a 12 x 12 linear system, then manually evaluating three different polynomials at the desired x values. That’s a lot of work. To get a sense of how hard this is, here’s an Excel file showing what’s involved in fitting a cubic spline to four data points by hand.

In practice, the best way to do a cubic spline is to use MATLAB. It takes about five minutes. Here’s how to do it.

MATLAB has a built-in “spline()” function that does the dirty work of cubic spline interpolation for you. It requires three inputs: a list of x values from the quarterly data you want to split; a list of y values from the quarterly data; and a list of x values for the monthly time intervals you want. The spline() function formulates the n-1 cubic polynomials, evaluates them at your desired x values, and gives you a list of interpolated monthly y values.

Here’s an Excel file showing how to use MATLAB to split quarterly data into monthly. In the file, the first two columns are quarterly values from BEA’s Personal Income series. Our goal is to convert these into monthly values. The next three columns (highlighted in yellow) are the three inputs MATLAB needs: the original quarterly x values (x); the original quarterly y values (y); and the desired monthly x values (xx).

In the Excel file, note that the first quarter is listed as month 2, the second quarter as month 5, and so on. Why is this? BEA’s quarterly data represent an average value over the three-month quarter. That means they should be treated as a mid-point of the quarter. For Q1 that’s month 2, for Q2 that’s month 5, and so on.

The next step is to open MATLAB and paste in these three columns of data. In MATLAB, type ” x = [ ", cut and paste the column of x values in from Excel, type " ] ” and hit return. This creates an n x 1 vector with the x values. Repeat this for the y, and xx values in the Excel file.

Once you have x, y, and xx defined in MATLAB, type “yy = spline(x,y,xx)” and hit return. This will create a new vector yy with the interpolated monthly y values we’re looking for. Each entry in yy will correspond to one of the x values you specified in the xx vector.

Copy these yy values from MATLAB, paste them into Excel, and we’re done. We now have an estimated monthly Personal Income series.

Here’s an Excel file summarizing the above example for splitting quarterly Personal Income data into monthly using MATLAB. Also, here’s a MATLAB file with the x, y, xx, and yy vectors from the above exercise.


New Study of the Waxman-Markey Cap-and-Trade Bill

September 30, 2009

We’re pleased to release our latest study today, examining the economic impact on U.S. households of the Waxman-Markey cap-and-trade bill (H.R. 2454).

The study is basically a critique of recent Congressional Budget Office (CBO) distributional estimates that suggest the bill’s impact is likely to be progressive across income groups. We find the bill is much more likely to be regressive across income groups once the microeconomic response of regulated public utilities is taken into account. Under this framework, we estimate the bill will result in net benefits to the nation’s highest-earning 20 percent of earners, while imposing net costs on the lowest-earning 80 percent of U.S. households.

Check out the full study and news release here.


New Study of Household Burdens from a U.S. Cap-and-Trade System

March 16, 2009

We’ve released a new study today exploring the likely cost to U.S. households of a typical cap-and-trade system aimed at cutting carbon emissions by 15 percent. The study uses a standard input-output model to estimate how the costs of cap-and-trade regulations will be borne by households in various income groups, age groups, family types and U.S. regions. The study is No. 6 in the Working Paper series at the Tax Foundation in Washington, D.C.

You can view the full study here.


Core Concepts: The Economics of Tax Incidence

December 13, 2008

All good economics starts with theory. The world is a complicated place—far too complex to make sense of directly. Economic theory helps collapse that complexity into a few key relationships we can work out mathematically and check against the facts. The first step in every analysis is to sit down with pencil and pad to work out the theory.

To help our clients better understand the economic theory underlying our work, we’ll be posting an ongoing series of articles titled “Core Concepts.” The goal is to provide a collection of simple and brief introductions to the core theoretical concepts used by Chamberlain Economics, L.L.C.

As the first in the series we’ve posted “Core Concepts: The Economics of Tax Incidence“. The piece is designed as a refresher on the basics of tax incidence, and how it’s derived analytically from elasticities of supply and demand in the marketplace. This idea serves as the foundation for nearly all of our work on tax modeling and policy analysis.

Check out the article here.


Creating an Input-Output Table from BEA Data: A How-to Guide

November 20, 2008

One of the hard parts about building Leontief input-output models is that the source data are hard to use.

Instead of producing a traditional “square” input-output table, the Bureau of Economic Analysis (BEA) produces what they call “use” and “make” tables, which are both “rectangular”—that is, the don’t have the same number of industries and commodities. The make table shows products produced by each industry, while the use table shows how products get used by industries, consumers and government. However, what we need for Leontief models is a table that shows only the industry-by-industry relationships, with each industry producing one commodity.

In this post I’ll explain how to create a “square” input-output table from BEA’s “rectangular” make and use tables. At the bottom, I’ve posted a spreadsheet with an I-O table I developed from the new 2002 BEA Benchmark Input-Output Data.

Building the I-O Table
The first step is to download the BEA’s use and make tables. The easiest ones to work with are the “Standard Make and Use Tables at the summary level,” which you’ll find at http://www.bea.gov/industry/xls/2002summary_makeuse.xls. These have 133 industry groups, which is enough detail for most research.

Once you’ve downloaded them, open the use table. Delete the two rows labeled “Noncomparable Imports” and “ROW adjustment”. There’s no domestic industry that produces these things, so you won’t need them in your industry-by-industry table. Next, label the intermediate-uses portion of the Use table “U”—you can label ranges in Excel by typing names in the upper-left box in the toolbar.

Next, open the make table. This is a 133×134 table showing industries down the rows and the products they make across the columns. Below the make table, create a new 133×134 matrix where each element is equal to the corresponding make table element divided by the column total. Label this matrix “M”.

Now we’re ready to build the I-O table. The classic table has three sections, which we’ll build one at a time: intermediate uses, final demands, and industry value added.

First, let’s build the intermediate-uses section. Call this “S”. You’ll calculate the “S” matrix by multiplying M*U, which we’ve defined above. This will create a 133×133 matrix that’s the intermediate-uses part of the table. Set this aside for now.

Next let’s calculate the final demands part of the table. That’s the right-hand section showing how much output from each industry is used by consumers, government and the rest of the world. In the use table, label the final demands portion of the table “D”. Create a new matrix labeled “Di” by multiplying M*D. This gives us a 133×13 matrix of final demands for the I-O table. This Di matrix will sit to the right of the intermediate portion of the input-output table (“S”) we calculated above. As a final step, add a column on the far right of the table that sums the intermediate uses and final demand for each row. This is total output for each industry.

Finally, let’s fill in the value-added section at the bottom of the table. To do this, first copy the total industry output we added to the far right column and transpose it into the bottom row of the table. This forces the row output of each industry to equal their column output, which is a fundamental equality in I-O tables.

Once you’ve pasted in the total output line, you’ll need to use the entries in the “Gross Operating Surplus” row in the table as balancing items to get the column totals for each industry to equal the row totals.

Once the row and column totals are equal—that is, output used for intermediate uses + final demands = inputs purchased + value added for all 133 industries—we’re done with our I-O table.

Bottom Line: 2002 Input-Output Table for the U.S.
Following the process above, you should end up with an I-O table that looks something like this. Here’s my own table built from the 2002 BEA Benchmark tables, which you’re free to use:

2002 U.S. Input-Output Table

Once you’ve got an I-O table like this, it’s pretty easy to turn it into a Leontief model. This will let you model the distributional impact of carbon taxes, measure tax pyramiding of gross receipts taxes, and more.


New Study of Business & Occupation Tax Pyramiding

November 2, 2008

We’ve released the latest Chamberlain Economics study this week, which examines tax pyramiding from Washington State’s Business & Occupation (B&O) tax.

Gross receipts taxes like the B&O tax work like a sales tax, except they apply to business inputs as well as final goods. For a baker selling loaves of bread, the flour, electricity and packaging are all taxed first, then the loaf itself is taxed when sold to consumers. These extra layers of taxation get quietly built into the final selling price—something economists call “tax pyramiding.”

Here’s the abstract for the piece:

Using newly released 2002 Washington State input-output data, we provide the first estimates of tax pyramiding from the state’s Business & Occupation (B&O) tax since 2001. We find tax pyramiding is more severe than found by previous studies that did not distinguish between imported and domestically produced products. We find the B&O tax pyramids an average of 3.0 times, ranging from 1.6 times on architectural, engineering and computing services to 16.7 times on petroleum and coal products manufacturing.

A file with some tables of findings is here. If you’d like to learn how we can develop an input-output model like this for your own study, give us a call today.


How Economists Measure Price Elasticity

October 19, 2008

(See also this post about converting quarterly data into monthly using cubic splines interpolation.)

Like most concepts in economics, price elasticity is easy to talk about but hard to measure.

In this post, I’ll show you how Chamberlain Economics measures demand elasticities in the real world. We’ll develop a simple theory, write it down mathematically, find some data and crunch the numbers in Excel. At the end, I’ll hand over a spreadsheet with our own elasticity estimates for retail gasoline that replicate the numbers from a well-known recent econometric study.

Start with Theory
Our goal is to estimate the price elasticity of demand for retail gasoline. The first step is to start with a theory about the demand for gas.

The simplest theory is that we know gasoline — like everything else — should have a demand curve. What should it look like? In the simplest case, it should be driven by two things: the price of gas, and how much income people have. If gas prices rise consumption should fall; conversely, if income goes up gas consumption should rise also.

So there’s our theory. Now, let’s write it down mathematically. If gas demand is a function of prices and income, one way we can write it is like this:

G = a*P + b*Y

Where:

G = Gallons of gas demanded per year
P = The price of gas
Y = Average income in the economy
a, b = Coefficients for the magnitude of the impact of prices and income on gas demand. (Note: According ot our theory, the “a” coefficient on prices should be a negative number and the “b” coefficient on income should be positive.)

Now that we’ve got a theory, the next step is to translate it into a form we can estimate in the real world. Think of the theory as an architect’s drawing — it’s a guide, but our goal is to actually to build it with hammer and nails.

To do this, think about real-world factors that might complicate our simple theory. For one, we should probably control for population by using per capita figures. Next, we should control for inflation by inflation-adjusting everything. Finally, we should control for seasonal variation somehow, since gas demand always peaks in summer and slows in winter.

Taking these messy details into account, here’s how we translate our theory into a relationship we can actually estimate. Economists call this “specifying the model”:

Gij = A + a*Pij + b*Yij + ei + eij

Where:

G = Per capita gas demand in month i and year j
A = The y-intercept term in our linear demand curve
Pij = The inflation-adjusted gas price in month i and year j
Yij = Real per capita disposable personal income in month i and year j
a, b = Coefficients on price and income
ei = A dummy variable for the month of the year to control for seasonal variation (there are actually eleven of these, one for each month January through November; they’re one if it’s the month in question and zero otherwise); this is called “seasonal fixed effects”
eij = A mean-zero random error term for month i and year j.

This way of specifying our model is called “linear”. This isn’t the only way to do it — at the end I’ll mention another way called “double log” that has some advantages. But for now, we’re ready to collect some data and run a regression.

Go Find the Data
The best source for data is always official government sources. Here’s the data we’ll use for this:

1. Gallons of gas demanded: We’ll use data from the U.S. Energy Information Administration. It’s called “product supplied”. The numbers are in barrels, so you’ll have to multiply them by 42 to convert them to gallons:

http://tonto.eia.doe.gov/dnav/pet/hist/mgfupus1m.htm

2. Gas prices: We’ll use numbers from the U.S. Bureau of Labor Statistics here. It’s from their “average price data” series, and it’s the monthly retail price of gas:

http://www.bls.gov/cpi/home.htm

3. Income: We’ll use data from the U.S. Bureau of Economic Analysis for this one. It’s called “disposable personal income”, and it comes from line 26 on Table 2.1 from the National Income and Product Account (NIPA) tables:

http://www.bea.gov/national/nipaweb/SelectTable.asp?Selected=N#S2

4. Something to inflation-adjust prices and income: For this one, we’ll use the implicit price deflator for GDP from the Bureau of Economic Analysis. It’s on line 1 of NIPA Table 1.1.9:

http://www.bea.gov/national/nipaweb/SelectTable.asp?Selected=N#S1

5. Population figures to turn gallons and income into per capita figures: This is a hard one, because we need monthly figures and the U.S. Census Bureau only produces annual figures. Also, it’s hard to piece together a consistent series from before and after each decennial census. Thankfully I’ve done the hard work for you — the Excel files below include a monthly population series I put together myself.

Once you’ve compiled these data in columns in an Excel sheet, you’re ready to run your regression. When you do, you should find something like this:

For 2000-2007, the coefficient on gas prices should be about -1.07, and the coefficient on income should be about 0.0007. Using the formula for price elasticity of E = (Average price over the period/Average quantity over the period)*(price coefficient), that implies a price elasticity of demand of about -0.048 and an income elasticity of about 0.51.

And that’s about what we’d expect. We know short-run demand for gas is inelastic, and has a negative relationship with prices. And we know that income should be positively related to gas demand, which it is.

The above method is based on a well-known 2006 study from Hughes, Knittel and Sperling which you can read at http://papers.ssrn.com/sol3/papers.cfm?abstract_id=930730.

For those who’d like to see the final product, here are Chamberlain Economics’ own elasticity estimates for gasoline. The first file uses the linear specification above. The second one uses a “double log” specification, which basically takes the log of the data. The big advantage of the latter is that the regression coefficients are also the price and income elasticities, which is handy:

Price elasticity of demand for gasoline: Linear model.
Price elasticity of demand for gasoline: Double log model.

If you want the full data as STATA files instead, here they are:

Price elasticity of demand for gasoline: Linear model.
Price elasticity of demand for gasoline: Double log model.

Questions? Give us a call here.


We’re Open for Business

October 12, 2008

Every think tank, business and government agency has studies they’d like done. But they often lack the technical skills and manpower to do them. Our goal at Chamberlain Economics is to expand the research capabilities of these small organizations, offering them an affordable way to produce the sophisticated and clear economic analysis they need.

Our economists offer custom economic studies like this, elegant nonprofit communication pieces like this, and easy-to-use econometric models like this and this. Give us a call today to learn how we can help.