Tuesday, November 01, 2005

Real World Model Building

I recently won a contract to build a cost and schedule prediction model for a government office. We are basing our work on COCOMO, a well-established model for estimating the cost and schedule of software development projects that has been evolving for 25 years.

Collecting data to drive our study is one of our biggest problems. Judging from COCOMO and similar models, a typical model of this kind includes about 20 variables. Our research indicates that over a span of 20 years, the COCOMO researchers at USC have collected only about 180 data points to build and validate their model.

Last night we had dinner with an MIT researcher who is building a variation called COSYSMO. So far, he has collected 42 data points to drive his 18-variable model.

If our own model has 18-20 variables, then we believe we would need about 80-100 data points to arrive at some decent confidence level.
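To put these figures side by side, here is a quick sketch that works out data points per variable for each case. It uses only the numbers quoted in this post; the names `cases` and `points_per_variable` are my own, and the ratio is just a rough yardstick, not a formal sample-size rule.

```python
# Data points per model variable, using only the figures quoted in this post.
# The "target" rows take the two ends of the 80-100 points / 18-20 variables range.
cases = {
    "COCOMO (USC)": (180, 20),
    "COSYSMO": (42, 18),
    "our target (low)": (80, 20),
    "our target (high)": (100, 18),
    "our starting point": (5, 18),
}


def points_per_variable(points, variables):
    """Return how many data points are available for each model variable."""
    return points / variables


ratios = {name: points_per_variable(p, v) for name, (p, v) in cases.items()}

for name, ratio in ratios.items():
    print(f"{name:20s} {ratio:5.2f} data points per variable")
```

By this yardstick our target is roughly 4 to 5.5 data points per variable, while our starting point is well under one.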

At the outset we are expecting to work with about 5 data points.

How can we possibly build a model with 18 variables from only 5 data points? We must rely heavily on the work that has been done before. As we change or add variables, we must keep in mind that every change trades a value that has more validity for one that has less.
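One way to picture "relying on prior work" is to keep the inherited parameters fixed and use our few data points to tune a single constant. The sketch below is only an illustration under assumptions of my own: it uses a generic power-law form, effort = constant * size ** exponent, of the kind found in COCOMO-style models; `PRIOR_EXPONENT` is a placeholder and not a published value; and `local_projects` is made-up data, not anything from our study.

```python
import math

# Hypothetical placeholder for a parameter inherited from prior work.
# This is NOT a published COCOMO value.
PRIOR_EXPONENT = 1.1

# Made-up local projects as (size, actual effort) pairs, for illustration only.
local_projects = [(10, 30.0), (25, 80.0), (40, 150.0), (60, 230.0), (100, 400.0)]


def calibrate_constant(projects, exponent):
    """Fit only the constant in effort = constant * size ** exponent.

    The exponent stays fixed at its inherited value, so the five data
    points estimate one number instead of eighteen. The fit is a simple
    average in log space.
    """
    logs = [math.log(effort) - exponent * math.log(size) for size, effort in projects]
    return math.exp(sum(logs) / len(logs))


def predict(size, constant, exponent):
    """Predict effort for a project of the given size."""
    return constant * size ** exponent


constant = calibrate_constant(local_projects, PRIOR_EXPONENT)
print(f"calibrated constant: {constant:.3f}")
print(f"predicted effort at size 50: {predict(50, constant, PRIOR_EXPONENT):.1f}")
```

The point of the design is that five data points are asked to support only one free parameter; everything else borrows its validity from the earlier work.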

Luckily, this class has helped me appreciate the situation we are in. It is too soon to say much about how hard this is going to be.

