Build AI data lakes to optimize data-driven collaboration, safeguard planning and forecasting, and drive supply chain cost efficiencies that protect profits and grow your income
Imagine receiving thousands of tables and 300+ different spreadsheets. Each tab in each workbook contains different data, and some of it has no column headers at all. You can dump all of this data into a large language model (LLM) and expect it to figure out what each field is. But first you have to train your LLM.
This process is called feature extraction, and it is used when deploying AI in manufacturing supply chains.
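As a rough sketch of what that looks like in practice, the Python snippet below samples values from headerless workbook tabs and hands them to a classifier. It assumes pandas can read the workbooks; the file pattern is invented, and classify_column() is a placeholder standing in for whatever trained model or LLM prompt you would actually use.

```python
# Minimal sketch, assuming the workbooks are .xlsx files pandas can read and
# that classify_column() stands in for the trained LLM/model you would use.
import glob
import pandas as pd

def classify_column(sample_values):
    """Placeholder for a trained LLM or model call that labels a column
    (e.g., 'part_number', 'supplier', 'unit_cost'). A crude heuristic is
    used here only so the sketch runs end to end."""
    if sample_values and all(
        v.replace(".", "", 1).replace(",", "").isdigit() for v in sample_values
    ):
        return "numeric_measure"
    return "unclassified_text"

def extract_features(path_pattern="data/*.xlsx"):  # hypothetical file layout
    records = []
    for path in glob.glob(path_pattern):
        # header=None because many tabs arrive without column headers
        sheets = pd.read_excel(path, sheet_name=None, header=None)
        for tab, df in sheets.items():
            for col in df.columns:
                sample = df[col].dropna().astype(str).head(20).tolist()
                records.append({
                    "workbook": path,
                    "tab": tab,
                    "column_index": col,
                    "predicted_label": classify_column(sample),
                })
    return pd.DataFrame(records)

print(extract_features())
```

The output is a simple inventory of every column in every tab with a predicted label, which is exactly the kind of intermediate artifact a feature-extraction project produces before anything lands in a data lake.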
When manufacturers with extended contract manufacturing supply chains, such as Dell, General Motors, or Samsung, engage platforms like Snowflake, Databricks, Azure Synapse, or BigQuery, it becomes a multi-million-dollar project.
This is because ETL data cleansing can last forever. ETL is the process of extracting, transforming, and loading data from multiple sources into a large, central repository called a data warehouse or data lake, and the cleansing step is what ensures manufacturers use only quality, relevant data.
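For readers who want to see the moving parts, here is a minimal ETL sketch in Python. The CSV file names, column cleanup rules, table name, and the local SQLite file standing in for the warehouse or data lake are all assumptions made for illustration, not a production pipeline.

```python
# Minimal ETL sketch (assumptions: two CSV source extracts and a local SQLite
# file standing in for the data warehouse / data lake; names are invented).
import sqlite3
import pandas as pd

def extract(sources):
    # Extract: pull raw data from each source system
    return [pd.read_csv(src) for src in sources]

def transform(frames):
    # Transform: combine, de-duplicate, and standardize column names
    combined = pd.concat(frames, ignore_index=True).drop_duplicates()
    combined.columns = [c.strip().lower().replace(" ", "_") for c in combined.columns]
    return combined

def load(df, db_path="warehouse.db", table="supply_chain_facts"):
    # Load: write the cleansed data into the central repository
    with sqlite3.connect(db_path) as conn:
        df.to_sql(table, conn, if_exists="replace", index=False)

if __name__ == "__main__":
    load(transform(extract(["plant_a_orders.csv", "plant_b_orders.csv"])))
```

The code itself is trivial; the expensive, open-ended part of real projects is deciding what "transform" should do for each of those thousands of tables, which is where the time and money go.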
Bayesian networks and data processing
ETL data processing is simple, but good ETL requires domain expertise (i.e., subject matter experts) for accurate classification. Selecting the right people is important, and if the right process is in place, ETL projects don't have to last forever.
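To make that classification step concrete, here is a small, hedged sketch of a Bayesian-style approach: a naive Bayes text classifier trained on a handful of values labeled by subject matter experts (scikit-learn assumed; the sample part numbers, suppliers, and costs are invented, and a full Bayesian network would be richer than this).

```python
# Illustrative only: SME-labeled sample values train a naive Bayes classifier
# that learns the formatting patterns of each field type.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Labels supplied by subject matter experts during the ETL working sessions
training_values = ["DL-4410-XPS", "GM-88231", "ACME Plastics Ltd",
                   "Foxconn", "12.50", "1,049.99"]
training_labels = ["part_number", "part_number", "supplier",
                   "supplier", "unit_cost", "unit_cost"]

# Character n-grams capture patterns like dashes, digit runs, and company names
model = make_pipeline(
    CountVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    MultinomialNB(),
)
model.fit(training_values, training_labels)

print(model.predict(["SM-20443-QLED", "Jabil Circuit", "7.25"]))
```

A production system would use far more labeled data, but the principle holds: the experts supply a modest set of labels, and the probabilistic model generalizes the classification across the rest of the data.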
In most large manufacturing enterprises, ETL can be performed in four or five meetings of up to 30 people. It should also be mandated that every person identified for these ETL meetings attends each one.