NL Opt 3: Operations Analysis

Packages Used: No new packages are used in this notebook.

1. Non-numeric Arrays and Tuples

Non-numeric Arrays

Arrays can have any data type as their elements. Unlike a regular matrix, an array can be used to store arrays of different lengths:

The individual top-level elements in the array are each an array that can be accessed:

To access an element in the second-level array:

To add an element to the end of a second-level array:

To add an element to the end of the top-level array:
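
The original code cells are not shown here; the following sketch, using made-up values, illustrates the operations described above:

A = [[1, 2, 3], [4, 5]]   # array whose elements are arrays of different lengths
A[2]                      # access the second top-level element: [4, 5]
A[2][1]                   # access an element in the second-level array: 4
push!(A[2], 6)            # add an element to the end of a second-level array
push!(A, [7, 8])          # add an element to the end of the top-level array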

Arrays can be used to store text strings:

Many functions can accept arrays of strings as input:

If not sure of the capitalization, the strings can be converted to lowercase before matching:
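
A sketch of these string operations, with hypothetical strings (`occursin`, `lowercase`, and broadcast `.` syntax are standard Julia):

S = ["Apple", "banana", "Cherry"]      # array of strings
length.(S)                             # many functions broadcast over string arrays
occursin.("an", S)                     # elementwise string matching
findfirst(==("apple"), lowercase.(S))  # lowercase before matching: returns 1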

Tuples

A tuple is similar to an array but differs in that it is defined using ( ) instead of [ ] and, once defined, it cannot be changed; that is, it is immutable. A tuple can be used to display multiple outputs:

A tuple can be used to store all of the input arguments for a function:

Define a function of three inputs:

The single tuple abc can instead be used to "splat" the inputs a, b, c into the function by following the tuple with ...:
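
A sketch of the tuple operations described above, using made-up values:

tup = (4, 5)              # defined with ( ); tup[1] = 9 would error since tuples are immutable
x, y = 10, 20
x, y                      # a tuple displays multiple outputs: (10, 20)
f(a, b, c) = a + b + c    # a function of three inputs
abc = (1, 2, 3)           # a tuple storing all of the input arguments
f(abc...)                 # ... "splats" the tuple into the inputs a, b, c: 6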

2. Operations Analysis

Based on the name Operations Research, it is not surprising that the techniques of OR are used to analyze the operations of a variety of different systems, including production systems that are involved in manufacturing goods or providing services; for example, a hospital can be thought of as a system that produces healthcare services. In most cases, the OR analysis will involve developing a model of the system and then somehow optimizing the model in the hope that these results can be transferred into improved operation of the system. Nonlinear optimization is the OR technique that has the widest application in operations analysis.

Ex 1: Ultrasonic Testing Operations

A testing center has a device that is able to use ultrasonic waves transmitted within a special solution to detect defects in parts like aircraft engines. The part is placed in a tank that, in order to operate correctly, is filled with just enough solution to submerge the part; the solution is then drained from the tank and disposed of since it can become contaminated during testing. Currently, test results are being delayed because it is time-consuming to fill the tank with the solution. In order to speed up testing, a suggestion has been made to consider pre-filling the tank and then either adding or draining the solution as needed to submerge a part. The following demand data is available regarding the cubic volume of solution that was needed to test recent parts: 237, 214, 161, 146, 331, 159, 423, 332, 139, 327, 152, 98, and 116. Determine the cubic volume of solution that should be pre-filled in the tank to minimize the time required to submerge the part, given that the tank can be filled or drained at the same rate.

Step 1: Display the data

It is a good idea to look at the data to see if there is a pattern and to check whether there are any anomalous or outlier values that could be erroneous. If anything is found, it might be possible to ask whoever provided the data about the issue before doing any further analysis. If some data is in error, then it can be deleted; outlier data should not automatically be eliminated unless it is a real error.

It is easy to display 1-D data, a little harder but doable for 2-D data, and difficult for data in three or more dimensions, where usually the best that can be done is to look at different 1-D and 2-D cross-sections of the data. For this data, since there is no time element, a histogram is used instead of a scatter or line plot.
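
A sketch of the histogram, assuming the Plots package introduced in an earlier notebook:

using Plots
d = [237, 214, 161, 146, 331, 159, 423, 332, 139, 327, 152, 98, 116]
histogram(d, legend=false, xlabel="Volume of solution demanded", ylabel="Count")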

Since nothing unusual appears in the histogram, the next step is to develop a model that captures the "cost" of different pre-fill levels. The cost, in this case, is the time required to either fill or drain the tank from the pre-fill level, where it is assumed that, for a given volume of solution, the fill and drain times are the same since their rates are the same.

Step 2: Drain/fill time as a function of pre-fill level and part demand

$ \quad \mbox{Time: } t(q,d_i) = \bigl| d_i - q \bigr|$
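
This can be coded directly from the formula:

t(q, d) = abs(d - q)    # time to fill or drain from pre-fill level q to demand d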

Step 3: Determine pre-fill level that minimizes total time
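
One simple approach (not necessarily the one used originally) is to search integer pre-fill levels up to a generous upper bound:

T(q) = sum(t(q, di) for di in d)    # total fill/drain time over all observed demands
qstar = argmin(T, 0:500)            # returns 161, the sample median
qstar, T(qstar)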

Checking Robustness of the Solution

Since the solution is only as good as the model and data used to generate it, it is helpful to try to determine how robust the solution is with respect to the data that was used and the assumptions behind the model.

Starting with data robustness, one thing that can be checked is what if there had been an extreme outlier in the data. For example, if the 423 demand was instead 1423, would the solution change significantly?
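
A quick check, re-running the same search with the outlier substituted in:

d2 = replace(d, 423 => 1423)          # substitute an extreme outlier
T2(q) = sum(t(q, di) for di in d2)
argmin(T2, 0:1500)                    # still 161: the median is unchanged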

Although, as expected, the total time increases by 1000, it might seem surprising that the optimal pre-fill level remains unchanged. The reason is that, for this particular objective function, the optimal solution is the median demand value. Any unit increase in the solution would increase by one unit the times of all demands less than the solution and decrease by one unit the times of all demands greater than it. Since the unit costs of filling and draining are the same and, at the median point, there are equal numbers of demands less than and greater than the solution, the optimal solution remains unchanged.

The most questionable assumption in the model is that the fill and drain rates are the same; this would only be true if the diameters of the fill and drain tubes are the same along with the flow rates being the same. If, after further investigation, it was found that the drain rate was twice the fill rate, then the model can be modified to incorporate this. Since the "cost" of filling and draining is now unbalanced, with filling taking twice as long as draining, the median point will not, in general, correspond to the optimal solution.

$ \quad \mbox{Pre-fill cost: } c(q,d_i) = \begin{cases} 2(d_i - q), & \mbox{if } d_i > q \\ q - d_i, & \mbox{otherwise } \end{cases}$
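
The modified cost can be coded and re-optimized in the same way (the function is named `cost` here to avoid clashing with other names):

cost(q, d) = d > q ? 2*(d - q) : q - d    # filling takes twice as long as draining
C(q) = sum(cost(q, di) for di in d)
argmin(C, 0:500)                          # optimal level is now above the median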

As expected, since it is more costly to fill than to drain, the optimal pre-fill level has increased and is no longer the median but is still robust with respect to data outliers.

Ex 2: Lee Pharm's Bakery

Lee Pharm owns a bakery and has to decide each morning how many loaves of bread to bake for the day. Any loaves unsold at the end of the day are donated to the local homeless shelter. He would like some help with how to best decide the number of loaves to bake each day. He has available the number of loaves sold for each of the last 13 days:

216, 214, 161, 146, 216, 159, 216, 216, 139, 216, 152, 98, and 116. 

In looking at the data, the first thing noticeable is that the number 216 appears quite often. After talking with Lee, it turns out that he typically makes 216 loaves per day, which corresponds to 18 trays of a dozen loaves. Since there was likely unmet demand any day that 216 loaves were sold, after checking with Lee, it turns out that he also has been recording the number of customers asking for bread each day after it was sold out. After adding in this unmet demand, the adjusted potential demand is:

237, 214, 161, 146, 331, 159, 423, 332, 139, 327, 152, 98, and 116.

After confirming with Lee that these past 13 days of demand are reasonably representative of typical demand, the only other information needed for you to provide some guidance is the average selling price of each loaf and the cost to bake each loaf, which turns out to be \$5 and \$1, respectively. Also, Lee agrees that it is reasonable to assume that any demand for a day that is not filled is lost; this is because most customers consume their bread the day of purchase since it is made without preservatives.

Step 1: Display the data

A line plot is used because the 13 days of demand represent a time series.

No trend or seasonality is apparent that would need to be accounted for in any model of the data. Most importantly, the variability does not show a long-term pattern, and so the data will be assumed stationary. There are statistical tests for stationarity, but these would not typically be used for so few data points. Since stationarity implies the lack of trend/seasonality, the data can be randomly re-arranged (or permuted), and it should look about the same. We can do this to verify stationarity:
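
A sketch of the line plot and two random permutations, again assuming the Plots package from an earlier notebook:

using Random, Plots
d = [237, 214, 161, 146, 331, 159, 423, 332, 139, 327, 152, 98, 116]
plot(d, legend=false, xlabel="Day", ylabel="Demand")                  # original series
plot(plot(shuffle(d)), plot(shuffle(d)), layout=(1,2), legend=false)  # two permutations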

The two plots of a random permutation of the demands also look stationary, lacking trend/seasonality.

Step 2: Determine profit as a function of loaves baked and daily demand

The following determines the daily operating profit if $q$ loaves are baked that morning (i.e., the production rate) and demand turns out to be $d_i$ that day:

$ \quad \begin{eqnarray*} \mbox{Profit: } \pi(q,d_i) &=& \begin{cases} pd_i - cq, & \mbox{if } q > d_i \\ pq - cq = (p - c)\,q, & \mbox{otherwise } \end{cases} \\ &=& p \min \bigl\{ q,d_i \bigr\} - cq \end{eqnarray*}$
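
Coding the profit function directly from the formula:

p, c = 5, 1                          # selling price and baking cost per loaf
profit(q, d) = p*min(q, d) - c*q     # daily profit for bake quantity q and demand d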

Step 3: Establish a baseline and upper bound on total profits

In order to be able to compare the effectiveness of any procedure, a baseline and upper bound on total profit can be determined. The baseline is the total profit associated with using the current practice of baking 216 loaves per day. An upper bound can be determined by assuming that the number of loaves baked each day exactly matches that day's demand (perfect prediction).
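
Both can be computed from the adjusted demand data:

baseline = sum(profit(216, di) for di in d)    # current practice: 8517
ub = sum(profit(di, di) for di in d)           # perfect prediction: (p - c)*sum(d) = 11340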

Step 4: Determine number of loaves that maximize total profits

Since the demand data is assumed stationary, we will not try to predict each day's demand; instead, we will assume the historical demand data is representative of the variability of any day's demand and will, as a result, use all of the data to determine the same (optimal) number of loaves to bake each day. Note: 500 is used as the UB for the search since daily demand might exceed the maximum (423) found in the 13 days of data.
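
A simple search over candidate bake quantities:

Π(q) = sum(profit(q, di) for di in d)    # total profit if q loaves are baked every day
qstar = argmax(Π, 0:500)                 # 500 as UB for the search
qstar, Π(qstar)                          # (331, 9407)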

The solution recommends that Lee bake 331 loaves each morning, which should result in an over 10% increase in total profits as compared to baking only 216 loaves. The procedure works because the data is stationary, lacking trend/seasonality. If the data has a trend/seasonality, techniques exist that can try to remove the trend/seasonality so that all of the data can be used; otherwise, for non-stationary data, you would use only the most recent data in any estimate. If, for example, sales have significantly increased in the past week and are expected to remain at this higher level for at least the next few days, then the nonlinear optimization could be rerun using just the last week's data.

Mean and Median: By way of comparison, in light of the discussion of MSE (L2) vs. MAD (L1) in NL Opt 2, what if the mean or median value of demand were used to determine the number of loaves to bake each morning:
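
Evaluating the total profit at the mean and median demand:

using Statistics
mean(d), median(d)          # ≈ 218 and 161
Π(mean(d)), Π(median(d))    # ≈ 8542 and 7592 (see the table below)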

After checking with Lee, the fact that the mean value of 218 is close to the 216 loaf value he was using is no accident: it turns out that he averaged an earlier set of historical demand values to determine the 216 value.

Linear Regression

Lee is quite pleased with the prospect of a 10% increase in profits and happens to mention that a large number of his customers preorder their purchases and that the number of orders that he has received each morning seems positively correlated with the number of sales he sees that day. He provides the number of orders available each morning for the past 13 days:

121, 143, 63, 80, 198, 52, 160, 162, 85, 106, 129, 141, and 106

During each day, additional walk-in demand occurs, and some orders are canceled; as a result, the final demand at the end of the day differs from the amount ordered at the start of the day. Nevertheless, it is thought that using each day's preorders to help determine the number of loaves to bake that day can increase total profits.

The correlation of the two time series can be calculated to see whether the order data is likely to help in estimating demand:
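
Using `cor` from the Statistics standard library:

using Statistics
orders = [121, 143, 63, 80, 198, 52, 160, 162, 85, 106, 129, 141, 106]
cor(orders, d)    # almost 60%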

An almost 60% correlation is promising, and so it is likely to be beneficial to include the order data in the analysis. This can be done through linear regression by making the demand a function of order data. Previously, profit was maximized using just the prior demand data. One can think of the demand estimates used there as single-parameter estimation models; linear regression provides a two-parameter estimation model.

Previously:

The total profit function needs to change since a different size ($q_i$) is associated with each different demand ($d_i$). In the single-parameter models considered previously, the same size ($q$) was associated with each different demand ($d_i$).

Profit using just order data: What if just the number of orders received by each morning were used to determine the number of loaves to bake that day:
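
Baking exactly the morning's orders each day gives:

sum(profit(orders[i], d[i]) for i in 1:13)    # 5969 (see the table below)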

Linear regression using orders and profit maximization as objective:
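
One simple way to fit the two coefficients with total profit as the objective is a coarse grid search; the original notebook may have used a nonlinear optimizer instead, and the search ranges below are assumptions:

Πlin(α0, α1) = sum(profit(α0 + α1*orders[i], d[i]) for i in 1:13)
grid = [(Πlin(α0, α1), α0, α1) for α0 in 0:1.0:300, α1 in 0:0.05:3]
best = maximum(grid)    # tuples compare by first element: (total profit, α0, α1)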

Question: How would the number of loaves to bake for day 14 be determined if 120 loaves have been ordered by the morning of day 14:
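
Given the fitted coefficients, the prediction is a direct evaluation of the model:

_, α0, α1 = best
q14 = round(Int, α0 + α1*120)    # loaves to bake given 120 preorders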

Quadratic Regression

One can consider using other models instead of a line; for example, a parabola (the model is still linear in the coefficients $\alpha$).
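
The model function changes, but the same profit-maximizing fit applies; a sketch of the form:

qmodel(α, x) = α[1] + α[2]*x + α[3]*x^2    # quadratic in orders, linear in α
Πquad(α) = sum(profit(qmodel(α, orders[i]), d[i]) for i in 1:13)

Πquad can then be maximized over the three coefficients with the same kind of search or optimizer.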

The small increase in total profit associated with the quadratic model indicates that it would provide little benefit. Although the 13 data points here are too few, with more data, one can perform model validation by randomly selecting a subset of the data to train the model (i.e., estimate its coefficients) and then using the remaining data to test what the resulting total profit would be with the trained model. In this way, the performance of the linear and quadratic models could be compared.

Multiple Regression

After further discussion, Lee mentions that he also can access each morning the number of unique visitors to his bakery's website over the past day. We can see if this additional data can help:

A 46% correlation indicates a moderate relationship that may help. In order to effectively use the visitor data, it can be added to the model along with the order data in a multiple regression:
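
The visitor data is not reproduced here, so the following sketch only shows the form of the three-coefficient model, with `visits` a hypothetical vector of the daily website-visitor counts:

qmodel2(α, x, w) = α[1] + α[2]*x + α[3]*w    # orders x and website visitors w
Πmult(α, visits) = sum(profit(qmodel2(α, orders[i], visits[i]), d[i]) for i in 1:13)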

Question: How would the number of loaves to bake for day 14 be determined if 120 loaves have been ordered and there have been 45 unique visitors to the website by the morning of day 14:
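
Assuming α holds the coefficients fitted by maximizing Πmult, the day-14 bake quantity would be:

q14 = round(Int, α[1] + α[2]*120 + α[3]*45)    # 120 preorders, 45 unique visitors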

| Procedure | Total Profit |
| --- | --- |
| Baseline | 8517 |
| Single parameter mean (demand) | 8542 |
| Single parameter median (demand) | 7592 |
| Single parameter max profit (demand) | 9407 |
| Using just order data (orders) | 5969 |
| Linear regression (demand; orders) | 9854 |
| Quadratic regression (demand; orders) | 9856 |
| Multiple regression (demand; orders, website visitors) | 9903 |
| Perfect prediction (UB) | 11,340 |

Reading Seven Weeks of Data

The following dataset contains seven weeks of demand and order data for Lee's bakery:
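
The dataset itself is not reproduced here. Assuming it is stored in a delimited text file (the file name below is hypothetical), it could be read with the DelimitedFiles standard library:

using DelimitedFiles
data = readdlm("bakery7wk.csv", ',')    # hypothetical file; columns: demand, orders
demand, orders = data[:, 1], data[:, 2]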

Ex 3: Location of a Monitoring Device

There are twenty-five switches located throughout an industrial site. A single monitoring device has just been ordered and will be located so that it can receive data from the switches. There will be an expensive, dedicated communications line from the device to each of the switches it is monitoring. The coordinates of the switches in the site can be generated by running the following code:

using Random
Random.seed!(73248)    # fix the seed so the same switch locations are generated each run
P = 100*rand(25,2)     # x and y coordinates of the 25 switches in a 100 × 100 site

Determine where to locate the device to minimize the length of communications line needed to reach the switches, assuming it can be located anywhere.
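
Minimizing the total Euclidean distance to all switches is the classic geometric-median (Weber) problem; one standard approach, not necessarily the one used in the notebook, is Weiszfeld's algorithm (a minimal sketch, ignoring the degenerate case where an iterate lands exactly on a switch):

using LinearAlgebra
function geomedian(P; iters=1000)
    x = vec(sum(P, dims=1)) / size(P, 1)    # start at the centroid
    for _ in 1:iters                        # iteratively re-weight by inverse distance
        w = [1 / norm(P[i, :] .- x) for i in 1:size(P, 1)]
        x = vec(sum(w .* P, dims=1)) / sum(w)
    end
    return x
end
xstar = geomedian(P)
sum(norm(P[i, :] .- xstar) for i in 1:25)    # total length of communications line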

The device has arrived and is ready to be installed, but it turns out that it only has five ports (next time, the specs will be checked before ordering). Luckily, the communications line has not yet been ordered, and so now the problem is to determine where to locate the device to minimize the length of the communications line needed to reach five switches.
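
Since the five nearest switches change with the device location, the objective is nonsmooth; a derivative-free grid search is one simple sketch (see the warning below about automatic differentiation):

using LinearAlgebra
dist5(x) = sum(partialsort([norm(P[i, :] .- x) for i in 1:25], 1:5))    # 5 nearest switches
grid = [(dist5([x1, x2]), x1, x2) for x1 in 0:0.5:100, x2 in 0:0.5:100]
minimum(grid)    # (total line length, x-coordinate, y-coordinate)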

Warning: Automatic differentiation does not work for this problem, possibly because some of the objective function code is not pure Julia at a low level; beware that the solution is nevertheless reported as being a "success"!