Increasing efficiency while consuming as few materials as possible is the biggest challenge for car factory owners. Read how we used data science to help a leading brand in the global automotive market improve its prediction of materials consumption.



Our client, a German car manufacturer, wanted to predict daily glue consumption based on the type and number of parts that enter the factory on a given day. The purpose of the daily prediction was to improve the planning of glue stock and avoid unnecessary purchases.

The client turned to the data science team at Stremedia to verify the internal team's idea, refine the business side of the project, and conduct the analysis using artificial intelligence (machine learning).



Our data scientists conducted a full business interview to identify the most suitable solution. They collected information about how the factory works, what data the client has, and when that data is collected, in order to build a model for the analysis.


The main challenge was that the client had only a small amount of data. For this reason, we used regularized linear models with feature elimination, which are less prone to overfitting on small data sets.

For each model we build, we start with the amounts of all car parts as features and then repeat the following steps:
  1. Cross-validate the model (check its performance) on the car parts that have not yet been removed.
  2. Remove the part to which the model assigns the lowest relevance score.


By comparing the average cross-validation errors obtained on smaller and smaller sets of features, we decided on the optimal number of parts to include in the model. For each value to be predicted, we used cross-validation and correlation analysis to develop the features that describe it best, and then built a suitable model.
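The elimination loop and the choice of the final feature set can be sketched roughly as follows. This is a minimal illustration, not the exact project code: it assumes a Ridge model as the regularized linear model, the absolute value of each coefficient as the "relevance score" (which only makes sense when the features are on comparable scales), mean absolute error as the cross-validation metric, and synthetic data with hypothetical part names in place of the client's records.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score


def backward_elimination(X, y, feature_names, alpha=1.0, cv=5):
    """Drop the weakest feature one at a time, recording the CV error at each step."""
    remaining = list(range(X.shape[1]))
    history = []  # list of (feature names used, mean cross-validated MAE)

    while remaining:
        X_sub = X[:, remaining]
        model = Ridge(alpha=alpha)  # assumption: Ridge as the regularized model

        # Step 1: cross-validate on the parts that have not been removed yet.
        scores = cross_val_score(
            model, X_sub, y, cv=cv, scoring="neg_mean_absolute_error"
        )
        history.append(([feature_names[i] for i in remaining], -scores.mean()))

        if len(remaining) == 1:
            break

        # Step 2: refit on all rows and remove the part with the smallest
        # absolute coefficient (our stand-in for the relevance score).
        model.fit(X_sub, y)
        weakest = int(np.argmin(np.abs(model.coef_)))
        remaining.pop(weakest)

    return history


def best_subset(history):
    """Return the (features, error) pair with the lowest average CV error."""
    return min(history, key=lambda item: item[1])


# Synthetic example: only part_0 and part_1 actually drive consumption.
rng = np.random.default_rng(0)
X = rng.poisson(20, size=(60, 6)).astype(float)
y = 0.5 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(0, 0.5, size=60)
names = [f"part_{i}" for i in range(6)]

history = backward_elimination(X, y, names)
features, error = best_subset(history)
print(features, round(error, 3))
```

With so few rows, the cross-validation error can be noisy, so in practice it is worth looking at the whole error curve in `history` rather than trusting the single minimum blindly.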



Pic. 1 A set of test data to check the model


In the chart on the left (expected consumption plotted against real consumption), the closer the dots lie to the diagonal, the better the prediction. In the chart on the right, blue is the real adhesive consumption and yellow is the expected consumption. The values are sorted so that you can see how high they are: a significant amount of glue is used.


With more data, the team would consider using random forests or simple neural networks. With very large numbers of data points, more complex neural architectures or gradient boosting methods such as XGBoost or LightGBM would be used.


As a result, we improved the quality of materials consumption prediction by 15%, as measured by the weighted mean absolute percentage error. In the longer term, this prediction leads to more accurate planning and less waste in the manufacturing process.
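For readers unfamiliar with the metric, here is a small sketch of one common definition of the weighted mean absolute percentage error: the sum of absolute errors divided by the sum of absolute actual values. The exact weighting used in the project is not specified here, so treat this variant as an assumption; the numbers in the example are made up purely for illustration.

```python
def wmape(actual, predicted):
    """Weighted MAPE: sum of absolute errors over sum of absolute actuals."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")

    total_actual = sum(abs(a) for a in actual)
    if total_actual == 0:
        raise ValueError("WMAPE is undefined when all actual values are zero")

    total_error = sum(abs(a - p) for a, p in zip(actual, predicted))
    return total_error / total_actual


# Illustrative (made-up) daily glue consumption values.
actual = [100.0, 50.0, 150.0]
predicted = [90.0, 60.0, 150.0]
score = wmape(actual, predicted)  # (10 + 10 + 0) / 300
print(f"{score:.4f}")
```

Unlike the plain mean absolute percentage error, this variant does not blow up on days with very low consumption, because each day is effectively weighted by its actual volume.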


For this work, the team used the Jupyter environment for Python together with the scikit-learn and pandas libraries.