Data & Analytics

The Ultimate Showdown in Eagle Ford Production Forecasting

This article presents a comparative study evaluating four machine-learning approaches, including three deep-learning methods, for forecasting gas and condensate production over a 5-year horizon.

Source: Igor Borisenko/Getty Images

Machine learning provides a valuable data-driven alternative to traditional methods for production forecasting in complex unconventional assets such as the Eagle Ford Shale. This article presents a comparative study evaluating four machine-learning approaches, including three deep-learning methods, for forecasting gas and condensate production over a 5-year horizon. The study details the production data set used and describes the methodology, architecture, and tuning for each model: convolutional neural network (CNN), long short-term memory (LSTM) with and without non-negative matrix factorization (NMF), and extreme gradient boosting (XGBoost).

Results compare the forecast performance, analyzing accuracy, overfitting, bias, interpretability, and training efficiency of each model. Ultimately, the article identifies the most effective approach for the production forecasting task with implications for future machine-learning applications in unconventional reservoir forecasting.

Benchmarking three deep-learning approaches against the XGBoost technique confirms the utility of deep learning for production forecasting and provides insight into developing reliable and robust deep-learning models, directly addressing the questions: Are all deep-learning methods created equal? Can deep learning outperform boosted learning?

Motivation
Previous studies have effectively demonstrated the potential of machine learning for various production forecasting tasks in unconventional reservoirs, including estimating ultimate recovery, predicting well productivity, forecasting in real time, and optimizing complex operations. While these individual efforts showcase the success of specific models for particular problems, a critical gap exists in the current body of work. There is a lack of direct, comprehensive benchmarking that compares multiple deep-learning methods against one another and against traditional machine-learning techniques with a consistent data set and forecasting objective.

This absence makes it challenging to definitively assess the relative strengths and weaknesses of different approaches and determine which are most suitable for specific applications. Therefore, this study is motivated by the need to fill this gap by providing a detailed comparative analysis.

By benchmarking three distinct deep-learning approaches against a robust traditional machine-learning technique such as XGBoost, this work aims to confirm the utility of deep learning for production forecasting and provide crucial insight into developing reliable and robust deep-learning models.

The following key metrics will be used for the model comparison and to derive useful insights from this study:

  • Forecasting accuracy in terms of mean and median absolute percentage error in percent for both gas and condensate
  • Number of trainable parameters
  • Training epochs
  • Susceptibility to overfitting
  • Tuning complexity
  • Human bias in model training
  • Ease of use
  • Inference time for 100 samples
  • Model robustness

Analyzing these factors will allow for a comprehensive evaluation of the different deep-learning approaches and the XGBoost benchmark, ultimately helping to answer the question of whether all deep-learning methods are created equal for the production forecasting task.

Data Description
This study uses a simulated data set, generated using Computer Modelling Group’s IMEX simulator, to create a machine-learning framework for rapid production forecasting in heterogeneous gas/condensate shale reservoirs, analogous to the Eagle Ford Shale. Designed to replicate realistic reservoir heterogeneity, uncertainty, and completion variability, the data set consists of 1,600 training and 400 testing realizations, representing an 80/20 random split for model development and generalization evaluation, respectively.

Each realization serves as a scenario with nine distinct features, including geological and completion parameters, and two targets: monthly gas and condensate rates over 5 years (61 months). Three of the nine features capture vertical spatial variability in permeability, porosity, and water saturation across 21 stacked layers, while the remaining six are scalar properties representing natural and induced fracture characteristics. Each of the two targets for a given realization consists of 61 values illustrating the production rate decline over the 5-year duration.

The reservoir model is developed in Computer Modelling Group’s simulator using a 3D grid of 140×30×21 cells (88,200 total), each measuring 50×50×5 ft, resulting in a total thickness of 105 ft and lateral dimensions of 7,000×1,500 ft. Spatial heterogeneity in porosity, permeability, and saturation is distributed layer by layer, and natural fracture effects are included via global parameters for spacing and permeability.

A horizontal well is placed at the center (Layer 11) of each model, intersected by planar hydraulic fractures, with completion parameters varying across realizations using discrete values for fracture half-length (200–500 ft), fracture height (20–60 ft), and cluster spacing (100–350 ft). Together, these geological and completion variations form the basis for the diverse gas/condensate shale reservoir scenarios modeled in this study that serve as an analog for the Eagle Ford shale reservoir.

Features and Targets
To effectively implement and understand the machine-learning forecasting model, a clear definition of the input features and output targets is essential. This study uses a carefully constructed data set designed to represent distinct and realistic reservoir scenarios analogous to the Eagle Ford Shale.

The features in this study were meticulously sampled to cover a broad parameter space, ensuring reliable reservoir models and production profiles. Geological properties such as porosity and water saturation follow normal distributions, while permeability is log-normal, reflecting typical shale characteristics. As shown in Fig. 1, the data set includes nine distinct features per scenario: three represent vertical spatial variability of permeability, porosity, and water saturation across 21 layers, and six are scalar properties for natural and hydraulically induced fracture characteristics.

Completion parameters (i.e., fracture half-length, height, cluster spacing) are discrete, reflecting design constraints. Similarly, natural fracture spacing and permeability are discretely distributed. The production constraint follows a uniform distribution to incorporate varied operating conditions. These distributions provide insight into the data set’s geological and completion characteristics.

Forecast_Fig1.jpg
Fig. 1—Histogram distributions of geological and completion parameters used in the machine-learning model. The top row shows continuous reservoir properties, with porosity approximately normal, permeability right skewed, and saturation bimodal. The bottom two rows include mostly discrete completion parameters and uniformly sampled variables.

The targets of this study are modeled to reflect realistic production responses of an unconventional shale analogous to the Eagle Ford Shale under different geological and completion conditions. Each production profile is unique, driven by the variations in geological properties (i.e., porosity, permeability, saturation), completion parameters (i.e., fracture spacing, height, and half-length), natural fracture effects, and operational constraints such as production limits. For each realization, the targets consist of monthly gas rate and condensate rate over a 5-year period (61 months). Each of these two targets for a realization contains 61 numerical values representing the production rate decline over the 5-year duration.

As shown in Fig. 2, these production profiles exhibit significant variability, including sharp initial declines followed by gradual tapering, or short plateaus before slower declines, particularly in gas production. Condensate production rates generally show more gradual declines but also substantial variation. These diverse production patterns are a direct consequence of the variations in the underlying geological and completion parameters and underscore the high heterogeneity and complexity of gas/condensate shale reservoirs, reinforcing the need for robust forecasting models.

Forecast_Fig2.jpg
Fig. 2—Gas rates (top) and condensate rates (bottom) over a 5-year period for 100 randomly sampled realizations. Each curve represents one realization. The production rates vary significantly because of differences in geological properties and completion designs. Gas rates show a mix of gradual and steep declines, while condensate rates tend to exhibit sharper initial drops followed by slower declines.

The Four Forecasting Models: A Brief Overview
The full input feature set is constructed by merging the 63 geological parameters (porosity, permeability, and saturation across 21 layers) with six completion parameters, resulting in a total of 69 features per hydraulically fractured well in the gas/condensate shale formation.
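
For concreteness, the sketch below shows one way this 69-feature matrix could be assembled; all array names and the random placeholders are assumptions, and only the shapes follow the text.

```python
# Hypothetical assembly of the 69-feature input matrix (shapes from the text).
import numpy as np

rng = np.random.default_rng(0)
poro = rng.random((1600, 21))   # porosity per layer (placeholder values)
perm = rng.random((1600, 21))   # permeability per layer (placeholder values)
sw   = rng.random((1600, 21))   # water saturation per layer (placeholder values)
comp = rng.random((1600, 6))    # six scalar fracture/completion parameters

X = np.hstack([poro, perm, sw, comp])   # 63 geological + 6 completion features
assert X.shape == (1600, 69)
```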

Model 1: Convolutional Neural Network (CNN). The CNN workflow begins with preprocessing: Features are scaled using MinMaxScaler ([0, 1]), and targets are standardized using StandardScaler (mean 0, unit variance) for efficient training. Separate CNN models are developed for gas and condensate rates, both processing the same 69 input features (63 reservoir, six completion). The gas model uses a deeper architecture with three 1D convolutional layers, followed by batch normalization, max-pooling, a 256-neuron dense layer with dropout, and a 61-unit output layer. The condensate model is simpler, with two 1D convolutional layers and similar subsequent layers. Both use ReLU activations in hidden layers and linear output. Training uses the Adam optimizer with a fixed 0.001 learning rate. L2 regularization (lambda = 0.005) and a 0.3 dropout rate are applied to control overfitting. A batch size of 64 is used, and models are trained for up to 100 epochs with EarlyStopping (patience 25) to restore best weights and ReduceLROnPlateau (factor 0.5, patience 15, min_lr 1e−5) for dynamic learning rate adjustment.
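
A minimal Keras sketch of the gas CNN is given below. The layer counts, regularization, optimizer settings, and callbacks follow the text; the filter counts, kernel sizes, and MSE loss are not reported in the article and are illustrative assumptions.

```python
# Sketch of the gas-rate CNN (hyperparameters from the text where stated;
# filter counts, kernel sizes, and the loss function are assumptions).
import tensorflow as tf
from tensorflow.keras import layers, models, regularizers, callbacks

l2 = regularizers.l2(0.005)  # L2 lambda from the text

model = models.Sequential([
    layers.Input(shape=(69, 1)),            # 69 scaled features as a 1D sequence
    layers.Conv1D(64, 3, activation="relu", kernel_regularizer=l2),
    layers.BatchNormalization(),
    layers.MaxPooling1D(2),
    layers.Conv1D(128, 3, activation="relu", kernel_regularizer=l2),
    layers.BatchNormalization(),
    layers.MaxPooling1D(2),
    layers.Conv1D(256, 3, activation="relu", kernel_regularizer=l2),
    layers.BatchNormalization(),
    layers.MaxPooling1D(2),
    layers.Flatten(),
    layers.Dense(256, activation="relu", kernel_regularizer=l2),
    layers.Dropout(0.3),
    layers.Dense(61, activation="linear"),  # one output per monthly rate
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), loss="mse")

cbs = [
    callbacks.EarlyStopping(patience=25, restore_best_weights=True),
    callbacks.ReduceLROnPlateau(factor=0.5, patience=15, min_lr=1e-5),
]
# model.fit(X_train, y_train, validation_split=0.2,
#           epochs=100, batch_size=64, callbacks=cbs)
```

The condensate model would follow the same pattern with one fewer convolutional block.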

Model 2: Long Short-Term Memory (LSTM) Network With Non-Negative Matrix Factorization (NMF) for Dimensionality Reduction. The LSTM with NMF workflow begins with preprocessing the spatial geological variations. Permeability, which follows a log-normal distribution, is log-transformed for normalization. All input features are then scaled using MinMax scaling. Target production rates are also MinMax-scaled but on a samplewise basis to preserve well-specific trends; the scaling parameters are saved for inverse transformation of predictions to obtain the unscaled targets. To address the high dimensionality of spatial reservoir properties (porosity, permeability, saturation across 21 layers), NMF is applied, decomposing the data into 15 lower-dimensional components while preserving non-negativity and physical meaning to maintain a reconstruction error below 7.5%. Fig. 3 shows the workflow.
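
The sketch below illustrates the NMF reduction and the samplewise target scaling with scikit-learn; the component count and the 7.5% error check follow the text, while variable names and placeholder data are assumptions.

```python
# NMF reduction of the 63 spatial geological features to 15 components.
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
X_spatial = rng.random((1600, 63))   # placeholder for log-transformed, stacked properties

X_scaled = MinMaxScaler().fit_transform(X_spatial)  # non-negative, as NMF requires
nmf = NMF(n_components=15, init="nndsvda", max_iter=1000, random_state=0)
W = nmf.fit_transform(X_scaled)      # 15 low-dimensional components per realization

# Check reconstruction error against the 7.5% threshold from the text
recon = W @ nmf.components_
rel_err = np.linalg.norm(X_scaled - recon) / np.linalg.norm(X_scaled)
print(f"relative reconstruction error: {rel_err:.2%}")

# Samplewise MinMax scaling of targets; per-well min/max kept for inverse transform
y = rng.random((1600, 61))           # placeholder monthly rates
y_min = y.min(axis=1, keepdims=True)
y_max = y.max(axis=1, keepdims=True)
y_scaled = (y - y_min) / (y_max - y_min)
```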

LSTM networks are used for time-series regression to capture temporal dependencies in the production data. Separate two-layer LSTM models are trained for gas and condensate rates, using the NMF-transformed reservoir properties combined with completion parameters as input. The architecture, stacked LSTM layers with dropout regularization before a fully connected output layer, was selected based on convergence behavior and manual hyperparameter tuning. Training uses a batch size of 32 and a maximum of 80 epochs, with EarlyStopping (patience 30) to prevent overfitting and restore best weights. Optimal units were 256 and 128 for gas and 64 and 32 for condensate, with tuned learning rates (0.0006 for gas, 0.02 for condensate) and a dropout rate of 0.1.
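
A sketch of the gas model is shown below; the unit counts, learning rate, dropout, batch size, epoch cap, and EarlyStopping patience come from the text, while the single-timestep input framing mirrors Model 3 and is an assumption here, as is the MSE loss.

```python
# Sketch of the two-layer gas LSTM trained on NMF components + completion parameters.
import tensorflow as tf
from tensorflow.keras import layers, models, callbacks

n_features = 15 + 6  # NMF components plus completion parameters

model = models.Sequential([
    layers.Input(shape=(1, n_features)),  # (time steps, features)
    layers.LSTM(256, return_sequences=True),
    layers.Dropout(0.1),
    layers.LSTM(128),
    layers.Dropout(0.1),
    layers.Dense(61),                     # 61 monthly gas rates
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0006), loss="mse")
es = callbacks.EarlyStopping(patience=30, restore_best_weights=True)
# model.fit(X_nmf, y_scaled, epochs=80, batch_size=32,
#           validation_split=0.2, callbacks=[es])
```

Per the text, the condensate model swaps in 64 and 32 units and a 0.02 learning rate.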

Forecast_Fig3.jpg
Fig. 3—Production forecasting workflow based on LSTM model with NMF.

Model 3: LSTM Network Without Dimensionality Reduction. This LSTM workflow follows a structure similar to the previous approach but omits dimensionality reduction, such as NMF. The preprocessing phase begins with normalization using MinMaxScaler (0–1 range), while the target production rates are standardized using StandardScaler (mean 0, unit variance) to handle variance effectively. Finally, the input matrix is reshaped to a 3D format (samples, time steps, features) with a timestep of one, as required by LSTM.
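
A short sketch of this preprocessing, assuming the 69-feature matrix described earlier:

```python
# Model 3 preprocessing: feature/target scaling and 3D reshape for the LSTM.
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

rng = np.random.default_rng(0)
X = rng.random((1600, 69))            # placeholder feature matrix
y = rng.random((1600, 61))            # placeholder monthly rates

X_scaled = MinMaxScaler().fit_transform(X)
y_scaler = StandardScaler()
y_scaled = y_scaler.fit_transform(y)  # keep the scaler for inverse transform

X_lstm = X_scaled.reshape(-1, 1, 69)  # (samples, time steps=1, features)
```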

Two separate LSTM models, identical in architecture, are constructed to forecast gas and condensate production rates over 5 years (61 monthly outputs). Each model consists of three stacked LSTM layers, each with 64 hidden units. Dropout (0.1 rate) is applied after each LSTM layer to mitigate overfitting, and LayerNormalization is used after the first two LSTMs for training stability. ReLU activation is used in the first two LSTM layers and Tanh in the final LSTM, followed by a linear Dense layer for the final rate predictions. Training uses the Adam optimizer (0.001 learning rate), a batch size of 64, and the Huber loss function for robustness to outliers. Regularization includes EarlyStopping (patience 20) and ReduceLROnPlateau (patience 15, factor 0.5) to ensure generalization and prevent overfitting. As shown in Fig. 4, the training and validation loss curves exhibit smooth convergence, confirming the effectiveness of the selected configuration.
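
The corresponding architecture, as described, could look like the following sketch (all stated hyperparameters from the text; the commented fit call reuses the arrays from the preprocessing sketch above):

```python
# Sketch of the Model 3 stacked LSTM (three layers of 64 units each).
import tensorflow as tf
from tensorflow.keras import layers, models, callbacks

model = models.Sequential([
    layers.Input(shape=(1, 69)),
    layers.LSTM(64, activation="relu", return_sequences=True),
    layers.LayerNormalization(),
    layers.Dropout(0.1),
    layers.LSTM(64, activation="relu", return_sequences=True),
    layers.LayerNormalization(),
    layers.Dropout(0.1),
    layers.LSTM(64, activation="tanh"),
    layers.Dropout(0.1),
    layers.Dense(61, activation="linear"),  # 61 monthly rates
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss=tf.keras.losses.Huber())  # robust to outliers

cbs = [
    callbacks.EarlyStopping(patience=20),
    callbacks.ReduceLROnPlateau(patience=15, factor=0.5),
]
# model.fit(X_lstm, y_scaled, epochs=200, batch_size=64,
#           validation_split=0.2, callbacks=cbs)
```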

Forecast_Fig4.jpg
Fig. 4—Training and validation loss curves for both gas and condensate models. These curves demonstrate that the selected hyperparameters resulted in stable learning, with convergence observed within 200 epochs.

Model 4: Extreme Gradient Boosting (XGBoost). The XGBoost workflow uses two XGBoost regressors to separately forecast gas and condensate production rates. The preprocessing pipeline, determined through trial and error, handles continuous geological data and categorical completion data separately. Continuous data is standardized using StandardScaler, then the geological features are reduced in dimensionality to 10 components using truncated singular value decomposition (TSVD), which captures 95% of the explained variance and is chosen for its suitability for continuous/sparse data and compatibility with XGBoost. Categorical data undergoes no preprocessing to avoid introducing unintended relationships. The transformed continuous data and original categorical columns are then concatenated into a single data frame for model training. The XGBoost model is optimized using GridSearchCV with threefold cross-validation, using negative mean absolute error as the scoring metric for hyperparameter tuning. After generating predictions, a Savitzky-Golay filter is applied to the predicted gas and condensate rate curves. This post-processing step smooths out short-term fluctuations to better reflect the underlying production trend and provide more representative error metrics by avoiding over-penalizing for noise. Fig. 5 shows the XGBoost workflow for the production forecasting.
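
The sketch below strings these steps together; the TSVD component count, threefold CV, MAE scoring, and Savitzky-Golay smoothing follow the text, whereas the MultiOutputRegressor wrapper (one way to emit 61 monthly values), the grid values, and the filter window are assumptions.

```python
# Sketch of the XGBoost workflow: scale + TSVD for continuous geology,
# raw categorical completion data, grid-searched regressor, then smoothing.
import numpy as np
from scipy.signal import savgol_filter
from sklearn.decomposition import TruncatedSVD
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import MultiOutputRegressor
from sklearn.preprocessing import StandardScaler
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
X_geo = rng.random((200, 63))                        # continuous geological features
X_comp = rng.integers(0, 4, (200, 6)).astype(float)  # discrete completion features
y_gas = rng.random((200, 61))                        # placeholder monthly gas rates

X_geo_red = TruncatedSVD(n_components=10).fit_transform(
    StandardScaler().fit_transform(X_geo))           # ~95% explained variance per the text
X_all = np.hstack([X_geo_red, X_comp])               # categorical columns left untouched

grid = GridSearchCV(
    MultiOutputRegressor(XGBRegressor(objective="reg:squarederror")),
    param_grid={"estimator__n_estimators": [100, 200],
                "estimator__max_depth": [4, 6]},
    cv=3, scoring="neg_mean_absolute_error")
grid.fit(X_all, y_gas)

y_pred = grid.predict(X_all[:5])
y_smooth = savgol_filter(y_pred, window_length=7, polyorder=2, axis=1)
```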

Forecast_Fig5.jpg
Fig. 5—XGBoost workflow chart given both geologic and completion parameters, using Standard Scaler and TSVD for preprocessing. A Savitzky-Golay filter is applied to the predicted rates before error evaluation to smooth short-term noise.

Forecasting Performances of the Four Forecasting Models
Figs. 6 through 9 collectively illustrate the forecasting performances of the four predictive workflows, Models 1 to 4, by comparing predicted (dashed) and actual (solid) monthly production rates for five randomly selected wells over the 5-year horizon. These plots consistently demonstrate strong alignment between the models' predictions and the observed data, effectively capturing production trends, rate magnitudes, and decline behavior across a range of well types, including both high- and low-rate producers. While minor discrepancies are observed in some cases, particularly for gas wells exhibiting production plateaus because of constraints, the overall visual correlation between predicted and actual rates confirms the models' accuracy and robust predictive capability across the majority of test scenarios.

Forecast_Fig6.jpg
Fig. 6—Actual vs. forecast monthly production rates for CNN-based forecasting.
Forecast_Fig7.jpg
Fig. 7—Actual vs. forecast monthly production rates for LSTM+NMF-based forecasting.
Forecast_Fig8.jpg
Fig. 8—Actual vs. forecast monthly production rates for LSTM-based forecasting (without NMF).
Forecast_Fig9.jpg
Fig. 9—Actual vs. forecast monthly production rates for XGBoost-based forecasting.

Comparative Assessment of the Forecasting Models
In this study, we compare three deep-learning workflows—a CNN and LSTM networks with and without feature reduction via NMF—and one traditional boosting-based machine-learning model, XGBoost, on the task of forecasting 5 years of monthly gas and condensate production rates for horizontal wells drilled in shale reservoirs. Each of the four forecasting workflows was developed and tuned independently by separate teams using the same data set, allowing for a consistent basis of evaluation.

To provide a holistic comparison, each forecasting model is evaluated not only on predictive accuracy on the same testing data set but also on model complexity, training time, observed overfitting, and user bias. A summary of these criteria is provided in Table 1. Comparing the four data-driven production-forecasting methods across well-defined criteria furnishes a comprehensive and nuanced evaluation of each technique. This systematic comparison offers clear insights into the trade-offs between crucial aspects such as accuracy, complexity, training efficiency, ease of use, and deployment readiness. Consequently, readers can make informed decisions regarding the most suitable method for their specific hydrocarbon production forecasting needs in shale reservoirs, significantly enhancing the practical value and overall impact of this research.

The four techniques are compared based on the following criteria:

  • Mean and median absolute percentage error (MAPE) in percent—Represents the mean/median of the MAPEs calculated for each well. The median of the per-well MAPEs is used when evaluating the forecasting performances of the four models (see the short sketch after this list).
  • Trainable parameters—Indicates the complexity of each model. Comparing the number of trainable parameters helps in understanding not only the model's capacity to learn intricate patterns but also its susceptibility to overfitting and computational demands.
  • Training epochs—Reflects the number of iterations required for each model to converge during training. Comparing training epochs provides insights into the learning efficiency of each method.
  • Overfitting observed—Provides qualitative assessment crucial for determining how well each model generalizes to unseen data. A model with significant overfitting, even with good training performance, will likely perform poorly in real-world applications.
  • Tuning complexity—Assesses the effort and expertise required to optimize the hyperparameters of each model. A lower tuning complexity makes a model more accessible and easier to implement in practice.
  • Human bias/setup—A qualitative metric that acknowledges the degree to which human decisions (e.g., feature engineering, model architecture selection) might influence the performance of each method. Lower human bias generally leads to more objective and reproducible results.
  • Ease of use—Reflects the user-friendliness of each method, including aspects such as software availability, documentation, and the learning curve involved in implementation.
  • Inference time (for 100 samples)—Critical for real-time or near real-time forecasting applications. Comparing inference times helps determine which models are computationally efficient enough for practical deployment.
  • Robustness—Qualitatively evaluates the stability and reliability of each model’s performance under different conditions or with slightly different data sets. A robust model should maintain consistent predictive power.
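
As referenced in the first bullet, here is a short sketch of the per-well MAPE summary (the standard MAPE formula; array names and placeholder values are assumptions):

```python
# Per-well MAPE (%) over 61 months, then mean/median across the test wells.
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.random((400, 61)) + 0.5                           # placeholder actual rates
y_pred = y_true * (1 + 0.05 * rng.standard_normal((400, 61)))  # placeholder forecasts

per_well_mape = 100.0 * np.mean(np.abs((y_true - y_pred) / y_true), axis=1)
print("mean MAPE:   %.2f%%" % per_well_mape.mean())
print("median MAPE: %.2f%%" % np.median(per_well_mape))
```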
Forecast_Table1.jpg
Table 1—Comparative summary of the four production forecasting models’ performance and implementation characteristics. LSTM with NMF achieved the best accuracy, though it required dimensionality reduction and custom preprocessing. CNN and standard LSTM models also performed well but required more manual tuning and input formatting. XGBoost was the most straightforward to use, with competitive accuracy and minimal overfitting observed.

Key Insights From Benchmarking Results
This study compares three deep-learning workflows—a CNN and LSTM networks with and without feature reduction via NMF—and one traditional boosting-based machine-learning model, XGBoost, on the task of forecasting 5 years of monthly gas and condensate production rates for horizontal wells drilled in shale reservoirs. Each of the four forecasting workflows was developed and tuned independently by separate teams using the same data set, allowing for a consistent basis of evaluation. Each technique was assessed using a diverse set of performance and implementation criteria, offering a well-rounded understanding of their respective strengths and limitations.

Quantitatively, LSTM with NMF demonstrated the highest accuracy, achieving mean MAPEs of 3.04% (gas) and 3.96% (condensate), and median MAPEs of 2.22% (gas) and 2.90% (condensate). This contrasts with CNN (mean MAPE: 6.26%/7.72%; median MAPE: 5.32%/6.44%), LSTM without NMF (mean MAPE: 5.09%/5.24%; median MAPE: 3.51%/3.35%), and XGBoost (mean MAPE: 5.44%/5.24%; median MAPE: 4.05%/3.94%). LSTM with NMF also exhibited high robustness and no observed overfitting, despite having a moderate number of trainable parameters (489,661 for gas, 36,445 for condensate) and requiring 80 training epochs. These results are presented in Table 1.

In contrast, XGBoost, while showing slightly higher MAPE values than LSTM with NMF, demonstrated excellent usability and very fast inference time (approximately 0.2 seconds total for 100 samples, or ~2 ms per sample). It also exhibited no overfitting and only moderate tuning complexity (handled via GridSearchCV).

CNN and standard LSTM models performed reasonably well in terms of accuracy but required more manual tuning and were more susceptible to overfitting (slight overfitting observed for LSTM without NMF).

The standard LSTM model had fewer trainable parameters (104,573 for both gas and condensate) and a faster inference time (0.1695s/0.1589s for 100 samples) compared with CNN (665,661/154,045 parameters, 0.3872s/0.1537s inference time), making it potentially preferable in some cases despite requiring more training epochs (200) than CNN (100) or LSTM with NMF (80).

Human bias/setup was highest for the LSTM without NMF (fully manual), which was also rated hard for ease of use because of the LSTM sequence formatting, while XGBoost was rated medium for human bias/setup (trial-based pipeline) and very easy for ease of use (plug-and-play).

Recommendations Based on the Benchmarking Results
The benchmarking results offer crucial insight for real-world applications in the energy industry. While deep learning, particularly the LSTM with NMF approach, can provide superior accuracy for complex production forecasting tasks, its implementation often requires significant data preprocessing, feature-engineering expertise, and computational resources. This makes it highly valuable for scenarios where maximizing predictive precision is paramount and the necessary technical infrastructure and expertise are available.

Conversely, boosting-based methods such as XGBoost, while potentially sacrificing a small degree of accuracy compared with the best deep-learning model, offer a compelling balance of competitive performance, exceptional ease of use, rapid inference time, and lower tuning complexity. This makes XGBoost a highly practical and readily deployable solution for operational environments, rapid screening of scenarios, or situations where interpretability and ease of implementation are key priorities.

The choice between these approaches in practice is not a simple one-size-fits-all decision but rather a strategic trade-off based on the specific needs, available resources, and priorities of the application. This comparative framework empowers practitioners to make informed choices tailored to their forecasting goals and technical constraints, moving beyond simply selecting the most accurate model in isolation. Future efforts should also explore automating preprocessing pipelines and hyperparameter tuning to improve the accessibility and reproducibility of advanced deep-learning methods across diverse production data sets.

Based on these findings, the following recommendations are made:

  • For highest accuracy in production forecasting, especially when computational resources and expertise are available, LSTM with NMF is recommended. Its performance justifies the added complexity in preprocessing and setup.
  • For rapid deployment and ease of use, particularly in operational environments with limited machine-learning infrastructure, XGBoost offers the best trade-off between performance and simplicity. It remains robust, scalable, and interpretable with minimal tuning.
  • For applications where architecture flexibility is desired, CNN and standard LSTM are viable, although they may require more attention to model design, tuning, and data formatting.
  • Future efforts should consider automating preprocessing pipelines and hyperparameter tuning, particularly for models requiring dimensionality reduction or sequence formatting. This would improve accessibility and reproducibility across diverse production data sets.

This work was supported by the Texas A&M Data Science Institute under the Data Analytics for Petroleum Industry Certificate program, generously funded by ConocoPhillips.