Maturity Levels of OR Projects

We recently went through a re-org in my organization. I am now horizontal, responsible for raising the bar on all things Operations Research in our corner of the world. This is a pretty ideal role for me. I get to learn a lot, focus on math, and act as a force multiplier.

I previously worked for more than six years as an Operations Researcher at the RAND Corporation. My new role reminds me of that job. Somewhat disappointingly, there were no reverse vampires or “run[ning] the world in a Malthusian manner.” Investigative journalism for an outlet that emphasizes not having a point of view and has a low readership would be a more apt put-down. I digress.

What does it mean to raise the bar on OR work? My previous post was a description of, in my humble opinion, a successful effort. But can we generalize? I have seen a number of Medium posts on maturity levels in Data Science and Machine Learning. Let’s define maturity levels for OR projects, starting with the basics but also mentioning features I rarely see in practice.

The Basics

The core of an OR project is a specific type of mathematical model (which I went over previously). An OR project should have obvious objective function(s) being maximized or minimized. The objective function(s) should be consistent with business goals. The objective function(s) should also be closed-form expressions of decision variables and model parameters.

Are there any unnecessary variables or model parameters in the objective function(s)? I often see complicated expressions of input data that could be computed before the model is formulated and solved. It’s not always necessary to simplify things, but doing so helps us focus on the core mathematical problem and is a sign of maturity with respect to OR.
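As a minimal sketch of that kind of simplification, suppose (hypothetically) that each item's contribution to the objective depends on a price, a discount, and a unit cost. All three are input data, so they can be collapsed into one coefficient per item before the model is built. The names `Item`, `margin_coefficients`, and `objective_value` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Raw input data for one item (hypothetical fields)."""
    name: str
    price: float
    discount: float   # fraction of the price, between 0 and 1
    unit_cost: float


def margin_coefficients(items):
    """Collapse the raw data into one objective coefficient per item.

    This runs once, before the model is formulated.
    """
    return {it.name: it.price * (1.0 - it.discount) - it.unit_cost for it in items}


def objective_value(coefficients, quantities):
    """The objective the model actually sees: sum over items of c_i * x_i."""
    return sum(coefficients[name] * qty for name, qty in quantities.items())


items = [
    Item("widget", price=10.0, discount=0.5, unit_cost=3.0),
    Item("gadget", price=8.0, discount=0.25, unit_cost=4.0),
]
c = margin_coefficients(items)
```

The model then only needs `c`, and the pricing logic can change without touching the formulation.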

Is the objective function defined in a way that makes it amenable to optimization, for example as a linear function of the decision variables?

Does the project have defined constraints? Are these also consistent with what the business wants or expects? Are they closed-form expressions, that is, equalities or inequalities involving decision variables that appear in or otherwise affect the objective function(s)?

Are the constraints as simple and amenable to optimization as they can be? Could any of the constraints be merged? Could any be eliminated without affecting the optimal solution? Redundant constraints are surprisingly common. An example might involve a constraint that places a lower bound on a weighted sum of terms that the objective is already trying to maximize. Do any constraints boil down to just fixing the values of certain decision variables? If so, it would probably be better to fix those variables directly.
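One of these checks is easy to automate. The sketch below assumes linear constraints stored as `(coefficients, sense, rhs)` tuples, which is a representation I picked for illustration rather than any library's format. It pulls out the equality constraints that touch a single variable, since those do nothing more than fix that variable.

```python
def split_fixing_constraints(constraints):
    """Separate constraints that merely fix one variable from the rest.

    Each constraint is a tuple (coefficients, sense, rhs), where
    coefficients maps variable names to numbers and sense is one of
    "<=", ">=", "==". This representation is an assumption of the sketch.
    """
    fixed = {}
    remaining = []
    for coeffs, sense, rhs in constraints:
        # Ignore variables that appear with a zero coefficient.
        nonzero = {var: a for var, a in coeffs.items() if a != 0}
        if sense == "==" and len(nonzero) == 1:
            ((var, a),) = nonzero.items()
            fixed[var] = rhs / a
        else:
            remaining.append((coeffs, sense, rhs))
    return fixed, remaining


constraints = [
    ({"x": 1, "y": 2}, "<=", 10),
    ({"y": 4, "x": 0}, "==", 8),   # only says y = 2
    ({"x": 1}, ">=", 0),
]
fixed, remaining = split_fixing_constraints(constraints)
```

The sketch does not check whether two such constraints fix the same variable to different values; a real pre-processing step would want to flag that as infeasible.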

Is the overall formulation a canonical optimization problem? It typically isn’t, but I’ve noticed that mature OR projects are more likely to be based on solving a canonical problem. There are often known, high-quality solution algorithms for such problems. It takes an OR expert to recognize that the project boils down to, for example, the set packing problem.
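For reference, the weighted set packing problem is usually written as the integer program below. The symbols are introduced here for illustration: $U$ is a ground set of elements, $S_1, \dots, S_n \subseteq U$ are the candidate subsets, $w_j$ is the weight of subset $S_j$, and $x_j$ indicates whether $S_j$ is selected.

```latex
\begin{align*}
\max_{x} \quad & \sum_{j=1}^{n} w_j x_j \\
\text{s.t.} \quad & \sum_{j \,:\, i \in S_j} x_j \le 1 && \forall i \in U, \\
& x_j \in \{0, 1\} && j = 1, \dots, n.
\end{align*}
```

Each constraint says that an element can belong to at most one selected subset, so the selected subsets are pairwise disjoint.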

Is it clear how the overall formulation is optimized? Which commercial solver or custom algorithm is used? What are the stopping conditions for the solver / algorithm? What happens when an optimal solution has not been found after some stopping condition is met?
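To make the stopping-condition questions concrete, here is a toy sketch in which a plain scan over candidate solutions stands in for a real solver. The names `SolveResult` and `maximize_with_limits` and the three status strings are assumptions of the sketch, not any solver's API. The point is that the result carries a status, so the caller can tell a proven best from a best-so-far.

```python
import time
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SolveResult:
    status: str            # "exhausted", "iteration_limit", or "time_limit"
    best: Optional[Any]    # best solution found, or None if none was found
    value: float           # objective value of `best` (-inf if none)


def maximize_with_limits(candidates, evaluate, max_iters, time_limit_s):
    """Scan candidate solutions, stopping at whichever limit is hit first.

    Only the "exhausted" status means every candidate was evaluated, so
    only then is the incumbent known to be the best candidate.
    """
    start = time.monotonic()
    best, best_value = None, float("-inf")
    status = "exhausted"
    for i, cand in enumerate(candidates):
        if i >= max_iters:
            status = "iteration_limit"
            break
        if time.monotonic() - start > time_limit_s:
            status = "time_limit"
            break
        value = evaluate(cand)
        if value > best_value:
            best, best_value = cand, value
    return SolveResult(status, best, best_value)


def score(x):
    return -(x - 7) ** 2


full = maximize_with_limits(range(10), score, max_iters=100, time_limit_s=60.0)
cut = maximize_with_limits(range(10), score, max_iters=3, time_limit_s=60.0)
```

What happens on a non-exhausted status is then an explicit decision in the calling code: accept the incumbent, fall back to a previous solution, or raise an alert.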

When I arrived at Walmart, there were multiple projects where people had defined reasonable-looking objective functions but never mentioned constraints. It was interesting; they were using (some of) the language of OR but never got as far as having an actual formulation.

Some projects have objective functions, or less frequently constraints, that call on other functions. An example that seems to be getting more common is the objective function that is the output of a Machine Learning model. These projects can involve relatively sophisticated customized solution algorithms. They almost have to, since you can’t pass most commercial solvers a formulation containing black-box calculations. I would still classify such projects as relatively immature in terms of OR. Little thought was likely given to how the mathematical form of the ML model affects the runtimes or even the solutions of the optimization.

What are the differences, if any, between the objective functions and the metrics being used by the business and its managers to evaluate outputs? They don’t have to be the same. In fact, we should be skeptical of results showing that an optimization-based approach performs better than an alternative algorithm on a measure that the optimization-based approach maximizes or minimizes. Of course it does! On the other hand, differences between objective functions and metrics that people care about can be a sign that the formulation needs to be refactored.

Documentation and Project Management

I am a firm believer in documentation. This is the area where commercial projects are least likely to be mature by my standards. It may not seem like a rational choice for a startup to invest developer time in documentation. But documentation can speed up and improve collaboration.

One of the best things we did early on in my prior project was to document data input and output requirements. Expectations were set and then met. This is probably the most important form of documentation to have.
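Input requirements can also be written down in a form that is checked automatically. The sketch below is hypothetical: the `DemandRecord` fields and their allowed ranges are invented for illustration and are not taken from the project I described. The docstring is the documentation, and `validate` enforces it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DemandRecord:
    """One row of a hypothetical demand input feed.

    store_id: non-empty string identifier
    week:     ISO week number, 1 to 53
    units:    forecast demand in units, non-negative
    """
    store_id: str
    week: int
    units: float


def validate(record):
    """Return a list of human-readable violations (empty if the row is valid)."""
    problems = []
    if not record.store_id:
        problems.append("store_id is empty")
    if not 1 <= record.week <= 53:
        problems.append(f"week {record.week} is outside 1..53")
    if record.units < 0:
        problems.append(f"units {record.units} is negative")
    return problems


good = DemandRecord("S001", week=12, units=40.0)
bad = DemandRecord("", week=60, units=-1.0)
```

Returning a list of violations, rather than raising on the first one, makes it easier to send a complete report back to whoever owns the data.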

Meaningful improvement on OR projects often comes from changes to the formulations at the heart of the projects. It is much easier to consider such changes if formulations are properly documented. A mature OR project will have its formulation(s), including objective function(s), constraints, decision variables, and model parameters, described in a text document. The document will be available to everyone on the project and to teams working with the project team. The document will include both the closed-form mathematical expressions mentioned above and text explaining them.

Going a step further, it is a good sign if some document(s) have been presented, reviewed, and published outside of the project team, ideally in a scientific outlet. None of this is necessary, but it does demonstrate a level of maturity on a project: it ensures that outside reviewers have looked at the details and provided feedback. Uber surge pricing is the best example I’ve seen here.

The documentation should motivate design decisions. This is particularly important if a custom solution algorithm was developed. Why was this type of algorithm developed? What are some alternatives? Why are they less suited to the problem at hand? If the algorithm is a heuristic, is there any proof that it would not be practical to solve the problem to optimality? Most published scientific papers in OR fail to answer some of these questions, in my humble opinion.

Time spent doing basic documentation quickly pays for itself. For example, onboarding is much slower and less effective without documentation. Even so, managers may fail to prioritize documentation if their incentives aren’t set properly. Developers may avoid documentation or do a terrible job of it, again especially if incentives aren’t set properly. Project management can help.

It is the role of the project manager to ensure that there is time set aside to document. A mature OR project will further have project management practices that account for investigation of interesting results and debugging as needed.

Ideally, there would also be time, space, and the right staff set aside to perform research on potential improvements to the formulation, code, and solution algorithms. These are all things that are difficult to plan for in the poorly managed “agile” processes that have infested data science. But without them, project methodologies become outdated and code bases become unwieldy. It was understood at Uber that there would be continual research into the matching algorithm(s).

The Code, Automation, Etc.

The most important part of a tech project is of course the code. The code should be efficient and relatively easy to maintain. It is particularly important on an OR project to separate the sections of the code responsible for:

collection of data inputs,
model representation and storage,
model solution, and
post-processing and delivery of outputs.
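The four-way separation above can be sketched on a toy knapsack problem. Everything here is illustrative: the function names are my own, the data is hard-coded, and brute-force enumeration stands in for a real solver. What matters is that each stage only talks to the next one through its return value.

```python
from itertools import product


def collect_inputs():
    """Stage 1: gather data. Hard-coded here; a real project would query a source."""
    return {
        "values": {"a": 6, "b": 5, "c": 4},
        "weights": {"a": 4, "b": 3, "c": 2},
        "capacity": 5,
    }


def build_model(data):
    """Stage 2: represent the model, independently of how it is solved."""
    items = sorted(data["values"])
    return {
        "items": items,
        "objective": lambda x: sum(data["values"][i] * x[i] for i in items),
        "feasible": lambda x: sum(data["weights"][i] * x[i] for i in items)
        <= data["capacity"],
    }


def solve(model):
    """Stage 3: solve. Brute force stands in for a real solver."""
    best, best_value = None, float("-inf")
    for bits in product((0, 1), repeat=len(model["items"])):
        x = dict(zip(model["items"], bits))
        if model["feasible"](x) and model["objective"](x) > best_value:
            best, best_value = x, model["objective"](x)
    return best, best_value


def deliver(solution, value):
    """Stage 4: post-process into whatever the consumer expects."""
    chosen = [i for i, picked in solution.items() if picked]
    return {"chosen": chosen, "total_value": value}


report = deliver(*solve(build_model(collect_inputs())))
```

Because the stages are decoupled, `solve` could be replaced by a call to a commercial solver without changing the other three functions.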

There can be benefits to using different methods, languages, and/or developers for these various parts of an OR project.

Let’s focus on the modeling parts of the project. Transparency should be the highest priority here, in the parts of the code that are typically the most difficult to follow. Someone familiar with the mathematical modeling language or library used should be able to quickly understand the formulation from the code.

I now use the Pyomo library in Python, define each objective function and constraint in its own function, and keep separate files for the model parameters, decision variables, objective function(s), and constraint sets. This seems to help.

One nice thing about Pyomo as a modeling language is that it’s simple to change the solver. The importance of being able to swap solvers was brought up by a colleague at Walmart. I now take it as a sign of a mature OR project.

Articles on the maturity of ML projects often focus on automation. We can apply the same lens to OR projects. Is there an automated pipeline that goes through the various parts of the OR code mentioned above in sequence? Are there policies and processes in place to run the pipeline? Are inputs and outputs automatically tested? All of these would be signs of a mature OR project, although not all are necessary for all OR projects.
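Automated output testing can be as simple as re-checking a solution against the constraints and recomputing its objective value, independently of the solver. The sketch below does this for a toy knapsack-style solution. The function name `check_solution` and the data are assumptions for illustration.

```python
def check_solution(weights, values, capacity, chosen, reported_value, tol=1e-9):
    """Return a list of problems with a knapsack-style solution (empty if none)."""
    problems = []
    unknown = [i for i in chosen if i not in weights]
    if unknown:
        # Nothing else can be checked reliably if the items are not recognized.
        problems.append(f"unknown items: {unknown}")
        return problems
    if len(set(chosen)) != len(chosen):
        problems.append("an item was chosen more than once")
    used = sum(weights[i] for i in chosen)
    if used > capacity + tol:
        problems.append(f"capacity exceeded: {used} > {capacity}")
    recomputed = sum(values[i] for i in chosen)
    if abs(recomputed - reported_value) > tol:
        problems.append(f"reported value {reported_value} != recomputed {recomputed}")
    return problems


weights = {"a": 4, "b": 3, "c": 2}
values = {"a": 6, "b": 5, "c": 4}
ok = check_solution(weights, values, 5, ["b", "c"], 9)
broken = check_solution(weights, values, 5, ["a", "b"], 12)
```

A check like this can run at the end of every pipeline execution and block delivery of outputs when the list of problems is not empty.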

That’s all I can think of at the moment. Did I miss anything? Let me know.
