## Welcome to Dijemeric Visualizations

Where photography and mathematics intersect with some photography, some math, some math of photography, and an occasional tutorial.

## Wednesday, May 12, 2010

### Strategies for Sampling in a Potato Warehouse

You rent a warehouse in the Mission District of San Francisco to store potatoes.  The storage space is small, with barely enough capacity to hold 5 tons (10,000 pounds) of potatoes.  The owner of the warehouse is concerned that you might try to exceed that quantity and has placed a penalty clause in the contract in case you do.  He also requires an annual inventory of the quantity of potatoes you are storing.

You hire a high school student to weigh potatoes so you can determine how much is in the warehouse.  To estimate the weight of potatoes in the warehouse without paying the student to weigh each and every potato, you've decided that you will have him make no more than 100 weighings of randomly selected potatoes.  Once the potatoes are sampled and weighed, you will estimate the total weight by multiplying the average weight per gallon of potatoes by the total volume of potatoes in gallons in the warehouse (you know this because you have a calculator and are good at math).
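The estimate described above can be sketched in a few lines of Python.  This is only an illustration: the function name `estimate_total_weight`, the sample weights, and the 2,000-gallon warehouse volume are all made up for the example, and "average weight per gallon" is taken here to mean total sampled weight divided by total sampled volume, which is one reasonable reading.

```python
def estimate_total_weight(samples, total_volume_gal):
    """Estimate the total weight (lb) of potatoes in the warehouse.

    samples: list of (weight_lb, volume_gal) pairs, one per weighing.
    total_volume_gal: total volume of potatoes in the warehouse, in gallons.
    """
    if not samples:
        raise ValueError("need at least one weighing")
    sampled_weight = sum(weight for weight, _ in samples)
    sampled_volume = sum(volume for _, volume in samples)
    # Average weight per gallon, scaled up to the whole warehouse.
    return (sampled_weight / sampled_volume) * total_volume_gal


# Hypothetical data: three one-gallon collections weighed by the student.
weighings = [(5.0, 1.0), (5.2, 1.0), (4.8, 1.0)]
print(estimate_total_weight(weighings, total_volume_gal=2000))
```

The same function works for single potatoes as long as each weighing comes with the volume of the potato that was weighed.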

To do this weighing, would it be better to have the student weigh lots of individual potatoes one at a time, or to make fewer weighings of composites of several potatoes, in collections of a gallon each?  And should you caution the student to take care in weighing?  Does it matter if you are just a little bit over the 10,000 pounds?  Are the weighings more likely to show you are over if they are precise?

The answers may not be immediately obvious and depend on at least three variables: 1) the precision of the student in taking the weights and volumes, 2) the difference between the actual 'true' weight of potatoes in the warehouse and the penalty threshold, and 3) whether the actual weight is above or below the threshold.  To test how these three variables can affect the outcome, I have developed A Monte Carlo Model for Sampling Potatoes or Anything Else.  Try it for yourself.

Enter the measurement precision (also known as percent relative standard deviation, or %RSD) in cell B3, the potato penalty threshold in cell B4, and the expected true weight of potatoes in cell B5.  The outputs are in cells D7 (percentage of hits exceeding the penalty threshold for weights of single potatoes) and E7 (percentage of hits for collections of potatoes).
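If you would rather experiment outside the spreadsheet, the same idea can be roughed out in Python.  This is a minimal sketch, not a copy of the spreadsheet model: the function name `hit_rate`, the Gaussian noise, the default of 2,000 trials, and the choice of 10 weighings for the collection approach are all assumptions made for illustration.  Each simulated weighing reports the true value times a random factor whose spread is set by the %RSD, and a "hit" is a trial in which the estimated total exceeds the threshold.

```python
import random


def hit_rate(true_weight, threshold, rsd_percent, n_weighings,
             trials=2000, seed=0):
    """Fraction of simulated inventories whose estimate exceeds the threshold.

    Assumption: measurement error is Gaussian and proportional to the
    true value, with relative standard deviation rsd_percent.
    """
    rng = random.Random(seed)
    sigma = rsd_percent / 100.0
    hits = 0
    for _ in range(trials):
        # One noisy scaling factor per weighing, centered on 1.0.
        factors = [rng.gauss(1.0, sigma) for _ in range(n_weighings)]
        estimate = true_weight * sum(factors) / n_weighings
        if estimate > threshold:
            hits += 1
    return hits / trials


# Same three inputs as cells B3, B4, and B5.
rsd, threshold, true_weight = 50, 10_000, 10_000
singles = hit_rate(true_weight, threshold, rsd, n_weighings=100)
# Hypothetical: 10 weighings for the collection approach.
collections = hit_rate(true_weight, threshold, rsd, n_weighings=10, seed=1)
print(f"single potatoes: {singles:.1%} hits")
print(f"collections:     {collections:.1%} hits")
```

When the true weight sits exactly on the threshold, roughly half the trials come out as hits no matter how many weighings are made.  The number of weighings and the precision start to matter once the true weight moves away from the threshold.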

The small chart displays overall statistics of average, standard deviation (a measure of variability), RSD (a measure of precision), and number of Hits (number of times the given sampling method exceeds the penalty threshold).  In the example shown here, the averages for both methods exceed 10,000 pounds and the number of hits is 4,800 for the single-potato approach and 5,500 for the collection-of-potatoes approach.  The true value was set at 10,000 and precision at 50% (i.e., sloppy technique) for the default settings.
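For readers who want to see how those four statistics relate to one another, here is a short Python sketch.  The function name `summarize` and the three sample estimates are hypothetical, and the sample (n - 1) standard deviation is used here as an assumption; the spreadsheet may compute it differently.

```python
from statistics import mean, stdev


def summarize(estimates, threshold):
    """Return average, standard deviation, %RSD, and hit count."""
    avg = mean(estimates)
    sd = stdev(estimates)      # sample standard deviation (n - 1)
    rsd = 100.0 * sd / avg     # relative standard deviation, in percent
    hits = sum(1 for e in estimates if e > threshold)
    return {"average": avg, "sd": sd, "rsd_percent": rsd, "hits": hits}


# Hypothetical estimated totals (lb) from three simulated inventories.
print(summarize([9000, 10000, 11000], threshold=10_000))
```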

The long chart displays the first ten samplings of the student under the two different sampling methods and can be used as a check to track his performance.

Try entering different values for RSD, threshold, and true weight and see what you get.  For example, if the student is sloppy does it make any difference if the threshold is above or below the true value?  If the student is very precise can you get closer to the threshold without a penalty?  And don't be timid about the actual weight of potatoes.  You might try putting in more than 10,000 pounds and see if you can avoid the penalty!