Deep Learning Applications & Advanced Analysis in Hydrocarbon Refineries

Deep learning is an umbrella term that covers many things. In a strict sense, it means using computer programs to do what intelligent humans can do, often even better. Deep learning (cognitive science) is also now the most popular computer science course at universities in the USA and Europe.

Five Attributes of Deep Learning

The cognitive tasks of AI can be divided into five categories:

(1) Perception.

(2) Learning.

(3) Forecasting.

(4) Reasoning.

(5) Coordinating.

With perception, deep learning can sense and understand its environment, detecting and recognizing occurrences: is that smell a fuel leak? From there it can learn by synthesizing that information into knowledge, for example the relationship between temperature set points and distillate yield.

With forecasting, it extracts value from the generated data by predicting with high precision. We can simulate outcomes such as reservoir performance (IPR) at various topside operating conditions.

When it comes to solving logical problems, or reasoning, deep learning can make decisions or suggest the best solutions; given what I know, what is the optimal distribution of my products at different terminal sites?

Despite the expanding range of problems deep learning can solve, there is one task in which no deep learning program has been able to replace humans: defining the problem itself.

Given the obvious benefits that can be derived from adopting ML in refineries, what are the challenges that downstream oil and gas companies face when they embark on a program?

One of the biggest mistakes that companies make is that they embark on deep learning without first defining the problem. They collect lots of data, but do not know what to do with it, since they do not know what problem they are trying to solve by collecting all this data.

Machine Learning roles in Dynamic Nodal Analysis

One of the most popular subcategories today is machine learning. In fact, machine learning has become so popular that many people equate machine learning to deep learning.

Machine learning is popular because it overcomes scientific unknowns through large quantities of historical data, and hence has made fortunes for companies that in the past found their data too complex to interpret.

Machine Learning Algorithm Definition:

Machine learning is based on pattern recognition, and machine learning methods treat all data as either inputs (features) or outputs (predictions). Multiple inputs are fed into an algorithm that produces an output. If the output does not match the actual data, the algorithm is tweaked to do better next time. This is called training in machine learning.
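
To make the idea of training concrete, here is a minimal sketch of that loop. It assumes the simplest possible model (a straight line fitted by gradient descent) and uses made-up toy data; the function name `train` and all numbers are illustrative only and do not come from any refinery data set.

```python
# Toy data (made up for illustration): one input feature x and one output y,
# generated from y = 2x + 1 so we know what the model should learn.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

def train(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y ~ w*x + b by repeatedly nudging w and b to reduce the error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for every sample with the current parameters.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # The "tweak" step: move the parameters against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

w, b = train(xs, ys)
print(f"learned w={w:.3f}, b={b:.3f}")
```

Real problems have many inputs and far noisier data, but the cycle is the same: predict, compare with the actual data, adjust, repeat.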

Because machine learning relies on large quantities of data about the same subject, it is better at very focused problems and parameters, such as what is the relationship between vibration and engine failure?

Machine learning behaves poorly when the problem is a system problem with more complexity, such as a refining process or a logistics supply chain for oil that has many moving parts, which prevents repeating patterns.

It can also struggle when most of the information is domain specific, such as the pressure setting on a steam boiler, which has a certain relationship with the steam energy generated and, subsequently, with the processes in the distillation column. Such domain-specific information cannot be extracted from the data unless an engineer or data scientist has spent time structuring and correlating the data to represent these relationships correctly; this is work that machine learning cannot replace. The cost of this manual work is often ignored when companies want to train models on their data, and they end up without meaningful conclusions.

Another problem occurs when time and sequence are important. Most machine learning programs do not incorporate time-based patterns. For example, the best way to predict the loading queue at the terminal in the next hour is to count the current queue length. Fuel demand estimates at a retail fuel station need information such as the month of the year and the day of the week in order to predict more accurately. This is where time series come in. The central point that differentiates time series problems from most other statistical problems is that, in a time series, observations are not mutually independent. Rather, a single chance event may affect all later data points.

Yet existing time series technology alone does not solve all the new problems either. Enterprises are trying to aggregate and store all data in time series format, which captures time but misses the correlations across the domain. This correlation across the domain of operations is critical for gaining contextual intelligence. Even though the historian is a familiar technology to start with, it is not sufficient.
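
The two ideas in this paragraph can be sketched in a few lines: a naive forecast that simply carries the current queue length forward, and a lag-1 autocorrelation that measures how strongly each observation depends on the previous one. The queue numbers below are hypothetical, and the function names are our own, chosen for illustration.

```python
from statistics import mean

def persistence_forecast(series):
    """Naive forecast: the next value equals the most recent observation."""
    return series[-1]

def lag1_autocorrelation(series):
    """Correlation between each observation and the one before it."""
    mu = mean(series)
    num = sum((a - mu) * (b - mu) for a, b in zip(series[:-1], series[1:]))
    den = sum((v - mu) ** 2 for v in series)
    return num / den

# Hypothetical hourly truck-queue lengths at a terminal (made-up numbers).
queue = [3, 4, 6, 7, 9, 8, 6, 5, 4, 3]

print("next-hour forecast:", persistence_forecast(queue))
print("lag-1 autocorrelation:", round(lag1_autocorrelation(queue), 2))
```

An autocorrelation well above zero is the signal that observations are not independent, which is exactly when time series methods are worth using over a model that ignores ordering.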

Companies should consider the nature of the problems before they invest. You need the right AI tool based on the problem you have defined. Define the problem first, so that you can select the right tool. Do not make the auto industry’s mistake.

Categories of problem in downstream oil and gas

1-Scheduling/ allocation/ coordination problems:

2-Process optimization:

3-Monitoring, detection, faster responses:

4-Supply chain logistics:

Terminal Efficiency

In downstream terminals, maximizing loading efficiency can have a significant impact on the performance of terminal operations. Scheduling is a complex process: it involves truck arrival times, the terminal queue, the loading bay queues, and loading times, as well as the many combinations of trucks that require different products, set against the required volumes and the flow rates from pumps into the different loading bays. The number of calculations grows exponentially as you consider all the variables in this process, and it becomes a nearly impossible task for humans.

Today’s manual process is typically experience based, with some amount of guesswork, and does not optimize terminal operations. With predictive analysis from A-Stack, these variables can be used to calculate an optimized schedule that determines which loading bay each truck should use. This minimizes overall queuing at the terminal, maximizes loading efficiency, and improves supply chain logistics.
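
As a rough illustration of what bay assignment involves, the sketch below uses a simple greedy rule: each truck, in order of arrival, takes whichever bay becomes free first. This is not how A-Stack works (its method is not described here); it is a textbook heuristic, and the truck data, the function name `assign_bays`, and the two-bay terminal are all assumptions made for the example.

```python
import heapq

def assign_bays(trucks, num_bays):
    """Greedy list scheduling: each truck takes the bay that frees up first.

    trucks: list of (truck_id, arrival_time, loading_time), in minutes.
    Returns a list of (truck_id, bay, start_time, wait_time).
    """
    # Min-heap of (time the bay becomes free, bay index).
    bays = [(0.0, bay) for bay in range(num_bays)]
    heapq.heapify(bays)
    schedule = []
    for truck_id, arrival, loading in sorted(trucks, key=lambda t: t[1]):
        free_at, bay = heapq.heappop(bays)
        start = max(arrival, free_at)  # wait if the bay is still busy
        schedule.append((truck_id, bay, start, start - arrival))
        heapq.heappush(bays, (start + loading, bay))
    return schedule

# Hypothetical trucks: (id, arrival minute, loading duration in minutes).
trucks = [("T1", 0, 30), ("T2", 5, 20), ("T3", 10, 25), ("T4", 12, 15)]
schedule = assign_bays(trucks, num_bays=2)
total_wait = sum(wait for _, _, _, wait in schedule)

for truck_id, bay, start, wait in schedule:
    print(f"{truck_id}: bay {bay}, starts at {start} min, waits {wait} min")
print("total waiting time:", total_wait, "min")
```

A production scheduler would also have to respect product compatibility per bay, pump flow rates, and uncertain arrival times, which is where the combinatorial growth described above comes from.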


Five categories of intelligence can be identified, based on the various modes of operation and the different areas of application in the oil and gas industry.

1 Future intelligence: the ability to forecast future events with good confidence & high accuracy based on the learning from current events.

2 Historical intelligence: the ability to understand what happened.

3 Contextual intelligence: the ability to correlate multiple factors in a context and make sense of what is happening.

4 Domain intelligence: the ability to deepen domain knowledge/science.

5 Logical intelligence: the ability to compute numerous logical conditions simultaneously and find the solution.

Until next time,

Emad Gebesy, Founder of Optimize Global Solutions, Subject Matter Expert

Machine Learning and Deep Learning for Fire Detection

By definition, a fire is a process in which substances combine chemically with oxygen from the air and typically give out bright light, heat, and smoke; in other words, combustion or burning. In the oil and gas industry, fire hazards are a major issue that is faced on a regular basis. For companies to understand how to reduce these fires, as well as the resulting damage and injuries, it is important to know the differences between the types of fire that can occur. This article is about understanding those types, specifically comparing a jet fire, a pool fire, and a flash fire. We will also look at how these fires can be prevented with the use of machine learning. To understand the differences, we first need to see what causes these fires, which variables are common to all three, and which ones set them apart.

Jet fires are caused by high-pressure releases of hydrocarbon, which make the flames shoot out in one direction, much like a flamethrower. For this fire to occur, oxygen is also needed to let the fire breathe, and for the fire to start there must be a source of ignition. There are many potential sources of ignition, such as a spark, heat from hot surfaces, or even the reflection from mobile or tablet devices. Cigarettes are another source, which is why they are forbidden even at gas stations. The image above represents a case of a jet fire, which we can identify for certain by the way the fire is shooting in one direction.

Pool fires, on the other hand, are the result of liquid hydrocarbon interacting with oxygen and a source of ignition. Since the hydrocarbon in this case is a liquid, the fire spreads out along the liquid rather than shooting out from a highly pressurized point. This can be seen in the image above: the flames are not high in pressure and are much more spread out. This type of fire can start when liquid hydrocarbon comes into contact with air and any one of the sources of ignition mentioned previously. The resulting flame then spreads across the liquid, as shown in the image.

Finally, a flash fire is caused when hydrocarbon gas slowly seeps out until it is ignited by a source of ignition, whether a spark or high temperatures. The fire then travels through the air until it has consumed all the gas in the air. The image above represents a scenario like this, where the fire is fully spread out in the air, consuming the gases all around.

Understanding how the different fires happen allows companies to optimize their work and fix the root causes of problems before they occur. For jet fires, it is important that highly pressurized gases cannot come into contact with oxygen and a source of ignition. To prevent pool fires, pipe leaks of liquid hydrocarbon must be sealed off so that they cannot catch fire. As for flash fires, hydrocarbon gas should not be able to seep out of pipes, as it can catch fire in an instant, creating safety risks for workers. These steps are simple in theory but much harder in practice, considering how big the facilities are and how many pipes need to be maintained.

Luckily, as technology progresses rapidly, we have more equipment at our disposal, such as sensors and IoT devices. However, it is still hard to process the data they produce in order to prevent potential fires. This is why deep learning is used to create a predictive model that helps companies predict disasters ahead of time, allowing them to be proactive rather than reactive. This is done by combining information from past results with real-time results to create simple insights that allow users to detect defects which would have led to a malfunction in the system.


The challenge lies in the fact that the oil and gas industry is structured towards avoiding failure, meaning that the needed examples of failure patterns can be hard, though not impossible, to find. Still, what deep learning brings, if done correctly, is an improved approach that can prevent many catastrophic events, both for the environment and for workers. Over time these methods will improve further, removing many more risks from an industry that is heavily connected to our everyday lives.

Power of Machine Learning in Dynamic QRA using PyRISK™

We are celebrating our ongoing successful business case with PetroGulf Misr: performing a Dynamic QRA for their offshore topside facilities to understand the baseline risk and any additional risk, with the intention of making recommendations, if required, to reduce the risk to tolerable limits.

One of the requirements under this project is to perform a Quantitative Risk Assessment (QRA) to identify the risk. It has been highlighted to the PetroGulf Misr team that a static QRA is limited in that it usually provides an overview of the risk at a fixed reference point in time. Such studies involve hundreds, sometimes thousands, of scenarios covering operating conditions, manning levels, and maintenance activities. The cost can be substantial, and a study may need repeating every few years.

As an alternative, Optimize Global Solutions and Kageera have offered the use of Dynamic QRA for this challenging project to empower decision making by HSE managers and executives. PyRISK™, Optimize Global Solutions’ product developed in cooperation with Kageera, unlocks the opportunities of machine learning in risk management. It supports customers already using PyRISK™ to perform Dynamic QRA in getting even more value from their data at no additional cost.

Data Preparation

Given the critical importance of having the right, clean data, Optimize Global Solutions and Kageera apply a holistic approach to executing their oil and gas projects, one that sets them apart from competitors: the processes, from inception to culmination, fall within their own expertise and knowledge base. The following processes are applied in order to ensure that the data is fit for the subsequent machine learning activities.

1-Problem Definition and Framing out.

2-Functional Block Assessment.

3-Frequency Calculations.

4-Consequence Modeling for governing Cases.

5-Sensitivity Analysis using VBA-based PyRISK™.

Problem Frame-out

The Geisum North Field platform comprises four (4) decks, namely:

1-Upper Deck.

2-Main Deck.

3-Machinery Deck.

4-Lower Deck

Red blocks denote hazardous operations, where the facilities handle hydrocarbons.

Functional Block Assessment

The objective of the functional blocks is to identify the global scenarios and the static and dynamic inventories for each key unit operation. Each functional block should include the operating temperature and pressure, plus the fluid compositions, for further consequence modeling using Phast (DNV GL).

From the P&IDs, the pipe sizes, approximate lengths, and the numbers of flanges, manual valves, instrument connections, etc. are counted in order to calculate the frequency of failure.

Frequency Calculations

The frequency of a leak event is important for understanding the likelihood of each scenario and the frequency of occurrence of each hazard. Following the latest codes and standards in the oil and gas industry, and drawing on our programming experience, the VBA-based CCE™ application calculates accurate frequencies for various leak-size scenarios, e.g. small, medium, and large leaks.
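
The parts-count idea behind this step can be sketched as follows: multiply the number of each component type in a functional block by a generic leak frequency for that component and leak size, then sum. This is only an illustration of the arithmetic, not the CCE™ implementation, and every frequency value below is an invented placeholder; a real study takes these from the failure-rate database selected for the project.

```python
# PLACEHOLDER leak frequencies per item per year (per metre-year for pipe).
# These numbers are invented for illustration only; a real study would take
# them from the failure-rate database chosen for the project.
GENERIC_LEAK_FREQ = {
    "flange":       {"small": 3e-5, "medium": 1e-5, "large": 2e-6},
    "manual_valve": {"small": 5e-5, "medium": 2e-5, "large": 5e-6},
    "instrument":   {"small": 2e-4, "medium": 5e-5, "large": 1e-5},
    "pipe_m":       {"small": 2e-6, "medium": 8e-7, "large": 2e-7},
}

def leak_frequency(parts_count):
    """Sum count x generic frequency over all component types, per leak size."""
    totals = {"small": 0.0, "medium": 0.0, "large": 0.0}
    for component, count in parts_count.items():
        for size, freq in GENERIC_LEAK_FREQ[component].items():
            totals[size] += count * freq
    return totals

# Hypothetical parts count for one functional block, as read off the P&IDs.
separator_block = {"flange": 40, "manual_valve": 12, "instrument": 8, "pipe_m": 60}

for size, freq in leak_frequency(separator_block).items():
    print(f"{size} leak: {freq:.2e} per year")
```

Keeping the generic frequencies in one table makes it easy to swap in a different data source or to rerun the calculation when the parts count changes.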

Consequence Modeling

Quantifying the hazard intensity threshold (severity) of each potential hazard is essential for our machine learning solutions.

Data Generation & Dynamic QRA

This step is a combination of science and art: the generated data should be properly distributed, with the intention of avoiding over-fitting and under-fitting.

The VBA-based PyRISK™ application (an in-house tool programmed by Optimize and Kageera) is fed with thermodynamic data from Phast and is able to run various sensitivity cases. This cutting-edge tool was instrumental in cutting the analysis time by nearly 65% compared with conventional ways of working.

Front-end Machine Learning Application

Starting from exploratory data analysis, moving through algorithm testing, and culminating in the selection of the most accurate predictive model: this is one of the outstanding strengths of Kageera, and it enables our smart solutions to take their place in the industry with full recognition from our clients.

(1) Quantitative risk assessment (QRA) benefits an operator more than simply complying with varying regulations.

(2) Dynamically updating QRA offers cost, production, and safety benefits.

(3) Dynamic QRA supports decision making and planning to improve risk management.

(4) Digitization assisted by online tools enables dynamic QRA.

How Machine Learning Will Change Healthcare

First of all, AI (read: deep learning) won’t replace doctors any time soon, but AI should be the tool that helps doctors be even better at their jobs and achieve a better success rate in treating patients and improving overall patient wellbeing.

Healthcare is a data goldmine, but it is still restricted by different regulations in different countries (for example, HIPAA). McKinsey estimates that deep learning and machine learning in medicine could generate a value of up to $100B annually, based on better decisions, optimized innovation, improved efficiency of research trials, and the creation of new tools for doctors, patients, insurance companies, and policymakers.

What is the main problem in Healthcare, and how to make it better than the current status quo?

The healthcare market size is USD 439B, and 78% of the global population suffers from health or wellness issues. Market growth per year is 45%, and 76% of the world’s population travels for different treatments. Global healthcare spending was projected to reach USD 8.7 trillion by 2020, which, due to corona, will be 20% more. With PyHEALTH (our implementation of machine learning and deep learning in healthcare), we can reduce costs by 30% to 40% annually. An enormous amount of data is collected daily, but only a small percentage, up to 4%, is used practically in the industry.

The healthcare industry is the same as it was during the Spanish flu in 1918. We are now in 2020, with innovations such as flights to Mars and self-driving cars, yet we are dying from influenza just as we were in 1918. The only difference is that, thanks to our better-equipped hospitals and better respirators, the pandemic will be shorter than the one in 1918. Still, in the last 100 years we have not had the innovation in healthcare that could help us prevent or minimize different diseases. Worldwide pandemics are a severe threat, and COVID-19 is just the beginning of the pandemics we will have in the future.

At Kageera, we research how machine learning and deep learning are impacting the healthcare industry as part of our PyHEALTH service.

How can machine learning solutions help us?

  • Better understand who is most at risk
  • Diagnose patients
  • Develop drugs faster
  • Find old drugs that can help
  • Predict the spread of the disease
  • Understand viruses better
  • Map where viruses come from
  • Predict the next pandemic

Machine learning is the best tool currently in the world to predict different types of risks. One example is a prediction of potential hazards in the oil and gas industry or even the nuclear energy industry.

We need to invest more in Healthcare, pharma, and biomedicine innovation with machine learning and deep learning tools on the go.

Early statistics show that the essential risk factors that determine how likely an individual is to have a disease include:

  • Age
  • Pre-existing conditions
  • General hygiene habits
  • Social habits
  • Mental state
  • General stress scores
  • General diet and wellness
  • Number of human interactions
  • Frequency of interactions
  • Location and climate
  • Socio-economic status

Essential data may vary depending on the potential disease. So every disease has particular data points to track.

It takes years to understand diseases and to get practical outcomes. Even then, diagnosis is a time-consuming process. This puts pressure on doctors since, as we all know, doctors are in short supply in every country.

Machine learning and deep learning algorithms can make disease diagnostics cheaper and more accessible. Machine learning can learn from patterns, just as a doctor does. One difference is that machine learning algorithms don’t need to rest, and they have the same accuracy at any time of the day. The key difference between machine learning and a doctor, however, is that experts can instantly see what the problem is and find a potential cure, while algorithms need a lot of data in order to learn. That is the key restriction, because a lot of hospitals don’t share their data or don’t even collect it. Another issue is that the data needs to be machine-readable.

Machine learning and deep learning can be used for detecting and minimizing different diseases, such as:

  • Lung cancer or breast cancer on CT scans
  • Risk of sudden cardiac death based on electrocardiograms and cardiac MRI
  • Risk of different dental diseases based on CT scans

What is the most important value that machine learning is bringing to healthcare?

Copyright United Nations Goals

Every person can have access to the same quality of healthcare as that provided by top experts, and at a low price. Machine learning can help ensure healthy lives and well-being for all, which is one of the main goals of the United Nations.

Personalized patient treatments

Every person is different and has a higher or lower risk of getting different diseases. We also react differently to different drugs and treatments. Personalized patient treatment has enormous potential with the use of machine learning and deep learning.

Machine learning can automate this complicated statistical work and help discover which characteristics indicate that a patient will have a particular response to a particular treatment, so the algorithm can predict a patient’s probable response to that treatment.

The system learns this by cross-referencing similar patients and comparing their treatments and outcomes. The resulting outcome predictions make it much easier for doctors to design the right treatment plan. So, machine learning is a tool that helps doctors do their job even better.
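
One simple way to picture this cross-referencing is a nearest-neighbour lookup: find the past patients most similar to the new one and see how they responded. The sketch below is an assumption about how such a lookup could work, not a description of PyHEALTH; the two features, the records, and the function name `predict_response` are all hypothetical.

```python
import math

def predict_response(patient, history, k=3):
    """Estimate response probability from the k most similar past patients.

    history: list of (features, responded) where responded is True or False.
    """
    # Rank past patients by Euclidean distance to the new patient.
    nearest = sorted(history, key=lambda rec: math.dist(patient, rec[0]))[:k]
    # Fraction of those similar patients who responded to the treatment.
    return sum(1 for _, responded in nearest if responded) / k

# Hypothetical, already-normalised features: (age, biomarker level).
history = [
    ((0.20, 0.90), True), ((0.30, 0.80), True), ((0.25, 0.70), True),
    ((0.80, 0.20), False), ((0.70, 0.30), False), ((0.90, 0.10), False),
]

print(predict_response((0.30, 0.85), history))
print(predict_response((0.80, 0.15), history))
```

In practice the features would need careful clinical selection and scaling, and any such estimate would support a doctor's decision rather than replace it.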

What can we do with machine learning now?

Warning notifications of the potential risk of new diseases: warning notifications can help doctors predict potential diseases and prepare in time for future ones.

We need to do more work to develop prediction models for direct disease transmission, but knowing which data we need and working together with experts from the field is the first step to a successful machine learning implementation. The key is problem discovery in the healthcare industry, followed by getting the data needed to resolve the problem with machine learning.

Machine learning and deep learning are important tools in fighting different diseases, including COVID-19. We need to take this opportunity, especially as time is of the essence NOW. People’s lives are at stake. We, as a company, can use our knowledge to collect the data, pool our expertise, and build cross-functional teams with expert doctors, healthcare providers, and companies working with healthcare providers in order to save many lives now and in the future.

Kageera’s mission and vision are to build machine learning solutions that help humans live longer and focus on the things that matter the most: people, profit, planet.

If you need our urgent assistance in healthcare and COVID-19 projects send me a message at manja.bogicevic(at) or send me a message on LINKEDIN

For more follow me on LINKEDIN.

Until next time,

Happy Machine learning


P.S. I want to share four random things about me:

  1. I am one of the first self-made women Machine learning Entrepreneurs in the world.
  2. I am on the mission to become a self-made millionaire ForbesUnder30 (3 years to go).
  3. I have a strong economics and business background, which, in combination with my machine learning skills, delivers invaluable guidance in making strategic business decisions.
  4. I am an ex-professional tennis player, and I have run four half-marathons.

Optimize Oil & Gas Production with Digital Twins (LENAᵀᴹ)

We believe data, algorithms, and software should power the industry and humans to use their creativity to shape a profitable, safe, and sustainable present and future. Today, heavy-asset industries like oil and gas, renewables, and energy have reached a digitalization tipping point. Increasing access to data has made data handling a game changer, even in industries that have historically been considered far from high-tech.

We believe machine learning is not a magic wand; it is an entirely new technology that is just at the beginning of its use. Only 4% of companies in the whole world are in the early-practice phase. Humans, though, will still be the ones to drive the change.

Companies need to consider investing in digital twin technology (LENAᵀᴹ) that will amplify the experience and skills of their own people and assets. Digital twin technology is there to inform users about their operations and suggest measures to avoid any downtime. Once again, humans bring their own expertise to the table, supplemented by data-driven decisions and the ability to examine the data more deeply before taking any action. A creative mind is something that cannot be automated, and humans are essential in the artificial intelligence reality that is coming.

Online Digital Twins

As the operational life continues, the digital copy is updated automatically, in real-time, with current data, work records, and engineering information to optimize maintenance and operational activities. Using this information, engineers, managers, and operators can easily search the asset tags to access critical up-to-date engineering and work information and find the health of a particular asset. Previously, such tasks would take considerable time and effort, and would often lead to issues being missed, leading to failures or production outages. With Online LENAᵀᴹ, operational and asset issues are flagged and addressed early on, and the workflow becomes preventative, instead of reactive.

The reliable, real-time process data from the digital twin can be fed into simulation and analytics to optimize overall production, process conditions, and even predict failures ahead of time. A digital twin, when combined with powerful analytics and machine learning, enables predictive maintenance and optimized processes. Analytics leverage advanced pattern recognition, statistical models, mathematical models and machine learning algorithms to model an asset’s operating profile and processes and predict future performance. Appropriate, timely actions are then recommended to reduce unplanned downtime and to optimize operating conditions. With the digital twin, process simulation can also be performed to optimize the operating models based on their physical properties and thermodynamic laws.

The following three-step approach to optimizing oil and gas production, from gathering systems to gas processing plants, is enabled by the digital twin and is fundamental to improving performance and boosting profitability:

1. Steady State

At the Front-End Engineering and Design (FEED) stage, steady-state simulation models of gas processing and other units can be created to optimize the design. During operations, engineers and operators can perform engineering studies offline to identify design changes that will significantly increase throughput and the reliability and safety of plant operation.

Data analytics can be used to model fluid flow behaviors in pipeline multiphase or single-phase flow to predict pipeline holdup and potential slugging in the network. Understanding flow performance is key to optimizing the gathering network design, reducing CAPEX, and optimizing pipeline. With a unified simulation platform, the evolution from a steady-state to dynamic simulation can be achieved effortlessly.

2. Dynamic Modeling

Dynamic modeling based on ordinary differential equations and partial differential equations can be performed on these models to validate process design such as relief and flare systems, changes in feedstock, production capacity adjustment, and controls, enabling engineers to optimize the design and reduce CAPEX and OPEX. In addition, the dynamic simulation allows effective troubleshooting, control system checkouts, and comprehensive evaluations of standard and emergency operating procedures to shorten time requirements for safe plant start-up and shutdown.
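
As a small illustration of what an ODE-based dynamic model looks like, the sketch below integrates the liquid level in a single vessel with explicit Euler steps. The model (gravity-driven outflow proportional to the square root of the level) and all parameter values are assumptions chosen for the example; a commercial dynamic simulator solves far larger coupled systems with more robust integrators.

```python
import math

def simulate_tank_level(h0, q_in, area, k_out, dt, steps):
    """Integrate dh/dt = (q_in - k_out*sqrt(h)) / area with explicit Euler."""
    h = h0
    levels = [h]
    for _ in range(steps):
        q_out = k_out * math.sqrt(max(h, 0.0))  # gravity-driven outflow
        h = max(h + dt * (q_in - q_out) / area, 0.0)  # level cannot go negative
        levels.append(h)
    return levels

# Hypothetical vessel: all parameter values are made up for illustration.
levels = simulate_tank_level(h0=1.0, q_in=0.5, area=2.0, k_out=0.25, dt=1.0, steps=600)
steady_state = (0.5 / 0.25) ** 2  # level at which inflow equals outflow

print(f"final level {levels[-1]:.3f} m, analytical steady state {steady_state:.3f} m")
```

Changing `q_in` partway through a run is the simplest version of the feedstock and capacity changes mentioned above: the model shows how long the vessel takes to settle at its new level.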

3. Predictive Analytics to Monitor Equipment Health

Minimizing plant downtime is therefore key to improving production. This is where predictive analytics comes in. Predictive analytics enables modeling of rotating equipment performance, such as pumps, compressors, and turbines, using advanced pattern recognition and machine learning algorithms to identify and diagnose any potential operating issues days or weeks before failures occur. Operating models including past loading, ambient, and operational conditions are used to create a unique asset signature for each type of equipment. Real-time operating data is then compared against these models to detect any subtle deviations from expected equipment behavior, allowing reliable and effective monitoring of different types of equipment with no programming required during setup. The early-warning notification allows reliability and maintenance teams to assess, identify, and resolve problems, preventing major breakdowns that can cost companies millions of dollars in production slowdowns or stoppages.
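
The comparison of live data against an asset signature can be pictured with a deliberately simple stand-in: summarise healthy operation as a mean and standard deviation, then flag readings that drift more than three standard deviations away. Commercial tools use multivariate pattern-recognition models rather than this single-variable rule, and the vibration readings and function names here are hypothetical.

```python
from statistics import mean, stdev

def build_signature(baseline):
    """Summarise healthy operation as a mean and standard deviation."""
    return mean(baseline), stdev(baseline)

def flag_deviations(readings, signature, threshold=3.0):
    """Return indices of readings more than `threshold` sigmas from baseline."""
    mu, sigma = signature
    return [i for i, r in enumerate(readings) if abs(r - mu) > threshold * sigma]

# Hypothetical pump vibration data in mm/s (made-up numbers).
healthy = [2.1, 2.0, 2.2, 1.9, 2.0, 2.1, 2.0, 1.9, 2.2, 2.1]
live = [2.0, 2.1, 2.2, 2.6, 3.1, 3.4]

signature = build_signature(healthy)
print("indices flagged for review:", flag_deviations(live, signature))
```

The rising tail of the live series is what an early-warning notification is meant to catch: the pump is still running, but it no longer behaves like its healthy baseline.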

Brief Overview of Digital Twins

A digital twin works as a rigorous back-end model that runs and reports key data, such as production throughput and product purity, but it also includes extra features and tools for optimizing oil and gas production. Digital twins can work either offline (case studies and what-if analysis) or online (data acquisition and optimization), depending on the business requirements.

Digital twins enable operational excellence by helping oil and gas engineers, managers, operators, and owners take a model-focused approach that quickly turns massive amounts of data into business value.

These powerful data insights mean:

  1. Asset failure can be predicted.
  2. Hidden revenue opportunities can be uncovered and realized.
  3. Businesses can continuously improve in the ever-changing, competitive marketplace.

In a nutshell, digital twins are the foundation of a digital transformation that optimizes production, detects equipment problems before failure occurs, and uncovers new opportunities for process improvement, all while reducing unplanned downtime. Depending on the facility itself, you can combine data analytics, machine learning, Big Data, and software applications to harness the power of the data and turn it into business value for every oil and gas company.

Until next time,

Manja Bogicevic