Wine Analytics: How do you judge the quality of a wine? How do you decide which wine will improve with age? And how do people set the price of a wine even before it matures?

These questions probably bring to mind a wine taster or a Master Sommelier.

Is there a relationship between wine price prediction, statistics and machine learning? If your answer is “No”, you are humbly requested to reconsider. Princeton University professor Orley Ashenfelter demonstrated such a relationship in the early 1990s.

You must have heard of Bordeaux wines. Bordeaux is a place in France and it is famous for its wines.

Bordeaux wines taste better as they get older and, as a result, also become more expensive. Wine producers therefore tend to store their wines and let them age, since older wines fetch higher prices. It is very difficult to predict how a wine will taste from its early days, because the taste changes drastically with time.

Wine producers rely on wine tasters, or sommeliers, to judge which wines will mature to good taste and quality. These experts would normally taste the wine and give their opinion.

Prof. Ashenfelter, being a wine lover, was keen on decoding the art of wine price prediction, but without tasting the wine. He used the age-old technique of linear regression and showed that he could predict the price of the wine using this statistical technique. Linear regression uses a linear combination of input (independent) variables to predict the output, or dependent, variable.


In statistics, linear regression is an approach to modelling the relationship between a dependent variable and one or more independent variables. The case of one independent variable is called simple linear regression. With more than one independent variable, it is called multiple linear regression.
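As a minimal sketch of the simple (one-variable) case, the snippet below fits a straight line by least squares using the standard closed-form formulas. The function name and the data points are made up for illustration and are not part of the wine study.

```python
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Slope = covariance of x and y divided by the variance of x
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line always passes through the point of means
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up illustrative points that lie exactly on y = 1 + 2x
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
b0, b1 = fit_simple_linear_regression(x, y)
print(b0, b1)
```

With real, noisy data the points would not lie exactly on a line, and the same formulas would return the line that minimizes the sum of squared vertical distances to the points.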

The price of a wine depends on independent factors such as rainfall and the number of years it has been stored. Choosing which factors might affect the price is subjective, and calls for subject matter expertise and creativity. The variables used in the study were:

Year: year in which grapes were harvested to make wine.
Price: logarithm of the average market price for Bordeaux vintages according to 1990–1991 auctions. The price is relative to the price of the 1961 vintage, regarded as the best one ever recorded.
WinterRain: winter rainfall (in mm).
AGST: Average Growing Season Temperature (in Celsius degrees).
HarvestRain: harvest rainfall (in mm).
Age: age of the wine measured as the number of years stored in a cask.
FrancePop: population of France at Year (in thousands).


The linear regression equation, Y = b0 + (b1)X1 + (b2)X2 + ... + (bp)Xp, includes b0, which is the Y-intercept, and b1 to bp, which are the slope terms for the different variables represented by the Xs (e.g. amount of rainfall, temperature in the growing season, etc.). The parameters (b0, b1 to bp) are chosen so that the predicted values of the dependent variable are as close as possible to the actual values, that is, so that the deviation between them is minimized (in ordinary least squares, the sum of squared deviations).
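To make the idea of minimizing the deviation concrete, here is a small sketch of ordinary least squares for several input variables, written in plain Python by solving the normal equations. The helper names and the toy data are assumptions for illustration; in practice a library routine would normally be used, and this is not the code used in the study.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are left untouched
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_least_squares(rows, y):
    """Return [b0, b1, ..., bp] minimizing the sum of squared deviations."""
    design = [[1.0] + list(row) for row in rows]  # leading 1.0 gives the intercept b0
    k = len(design[0])
    # Normal equations: (X^T X) b = X^T y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Made-up data generated exactly from y = 1 + 2*x1 - 3*x2
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1 + 2 * x1 - 3 * x2 for x1, x2 in rows]
coefficients = fit_least_squares(rows, y)
print([round(b, 6) for b in coefficients])
```

Because the toy data follow the generating equation exactly, the fit recovers the intercept and both slopes up to floating-point rounding. Real data would leave some residual error that the fit makes as small as possible.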


In the case of Prof. Ashenfelter’s linear regression, the linear combination of variables was as follows:

Price of Wine (Y) ~ b0 + (b1) Age + (b2) AGST* + (b3) Winter Rain + (b4) Harvest Rain

*Average Growing Season Temperature


Prof. Ashenfelter used data from the past few decades to arrive at the beta values, that is, the coefficients b0 to b4. With the equation and the beta values in hand, predicting the price of a wine becomes an exercise in substituting the values of the variables into the linear regression equation. Once we feed the equation with data for these input variables, it returns a price, which is the dependent variable.
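The substitution step can be sketched as below. The coefficient values are hypothetical placeholders chosen for illustration, not the fitted values from the study, and the function names and input values are likewise made up. Converting the log price back to a relative price with `math.exp` assumes the logarithm in the Price variable is the natural logarithm, which the article does not state.

```python
import math

# HYPOTHETICAL coefficients for illustration only; not the values fitted in the study
COEFFICIENTS = {
    "intercept": -11.0,
    "Age": 0.02,
    "AGST": 0.6,
    "WinterRain": 0.001,
    "HarvestRain": -0.004,
}


def predict_log_price(age, agst, winter_rain, harvest_rain, coef=COEFFICIENTS):
    """Substitute the inputs into Y = b0 + b1*Age + b2*AGST + b3*WinterRain + b4*HarvestRain."""
    return (
        coef["intercept"]
        + coef["Age"] * age
        + coef["AGST"] * agst
        + coef["WinterRain"] * winter_rain
        + coef["HarvestRain"] * harvest_rain
    )


def relative_price(log_price):
    """Undo the logarithm (assumed natural log) to get the price relative to the 1961 vintage."""
    return math.exp(log_price)


# Made-up inputs: a 10-year-old wine, 17.0 C growing season, 600 mm winter rain, 100 mm harvest rain
log_price = predict_log_price(age=10, agst=17.0, winter_rain=600, harvest_rain=100)
print(log_price, relative_price(log_price))
```

With these placeholder coefficients, a warmer growing season and more winter rain raise the predicted price, while more harvest rain lowers it. The signs and sizes in a real model come entirely from the fitted data.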

Using this technique, Prof. Ashenfelter claimed that he could predict the taste of the wine well in advance, without tasting even a drop of it. The wine tasters rejected his claim and called it a sham. However, Ashenfelter was able to show that his predictions were as good as those of the sommeliers.

To conclude, linear regression can be used to predict any continuous dependent variable, such as the sales of a particular product given the amount spent on advertising it, or the sale price of a property given its carpet area and other associated variables. It is an extremely powerful tool for predicting a continuous variable, and Ashenfelter successfully used it to predict the price of wine without needing to taste a single drop.
