Production

Adaptive Production Forecasting Using a Long Short-Term Memory Network

This article presents a deep-learning approach, the long short-term memory network, for adaptive hydrocarbon production forecasting that takes historical operational and production information as input sequences to predict oil production as a function of operational plans.

Source: amtitus/Getty Images

Production forecasting is a crucial task in reservoir management and reservoir development decisions. However, uncertain geological conditions, limited production histories, imperfect data quality, and uncertain operational factors make it difficult to predict well production accurately over time. This article presents a deep-learning approach, the long short-term memory network, for adaptive hydrocarbon production forecasting. The method takes historical operational and production information as input sequences to predict oil production as a function of operational plans.

Volve Data Set
Equinor’s Volve data set is a publicly available subsurface data set. Well 15/9-F-12 from the Volve data set is used in this study to demonstrate the effectiveness of the adaptive production forecasting methodology. The data set is named after the Volve oil field, which was discovered in 1993 and produced oil from 2008 to 2016 in the North Sea, offshore Norway. For production forecasting, multiple features were considered that include onstream hours, average bottomhole pressure and temperature, wellhead pressure, choke size, as well as daily hydrocarbon production.

Long Short-Term Memory (LSTM) Network
The proposed LSTM neural network model for production forecasting accounts for well productivity and operational variabilities. It is designed for time-dependent data and can retain long-term dependence while maintaining accuracy over short time scales. LSTM networks are ideal for processing sequential data because of their unique memory cell and gating mechanism, which selectively stores and retrieves information over time. This makes them suitable for handling forecasting problems that require consideration of past and present inputs and outputs.

LSTM Hyperparameters. Hyperparameter tuning is necessary for improving the performance of production forecasting, offering advantages such as performance enhancement, better generalization, and computational resource savings. Hyperparameter tuning can, however, be a time-consuming process, and care must be taken not to overfit the model to the validation set, which can lead to poor performance on unseen data. It is important, therefore, to use cross-validation techniques and to evaluate the final model on a separate test set. The choice of hyperparameters governs the final structure of the LSTM network best suited for production forecasting on the Volve data set (Fig. 1).

The following are a few of the more important hyperparameters of LSTM and their roles:

  • Number of LSTM units—This hyperparameter determines the number of memory cells (or LSTM units) in each LSTM layer. A higher number of LSTM units allows the model to remember more information but also increases the computational cost and the risk of overfitting. For the best model accuracy, we used 128 and 64 units in our model.
  • Number of LSTM layers—This hyperparameter determines the depth of the LSTM network. A deeper network allows the model to capture more complex patterns in the input sequence but also increases the risk of overfitting. In our model, two LSTM layers were enough to build the neural network.
  • Learning rate—This hyperparameter controls the step size of the gradient descent algorithm during training. A higher learning rate allows the model to learn faster but can also cause it to overshoot the optimal solution and converge to a suboptimal one. A lower learning rate, 0.0001 in our case, makes the model converge more slowly but more steadily, although it can also leave the model trapped in local minima.
  • Dropout rate—Dropout is a regularization technique that helps prevent overfitting by forcing the model to learn more robust features. A higher dropout rate can increase the model’s ability to generalize but can also decrease its capacity to learn complex patterns. For the best model performance, we used two dropout layers, each with a rate of 0.5, so that a fraction of LSTM unit outputs is randomly dropped during each training iteration.
  • Batch size—This hyperparameter determines the number of samples processed in each training batch. A larger batch size can speed up the training process and improve model stability, but it also increases memory usage and the risk of overfitting. Given this, we chose a batch size of 32.
  • Activation function—This hyperparameter introduces nonlinearity into the model. The activation function is applied to the outputs of the LSTM units; in our case, the ReLU and linear activation functions are used. Choosing the best activation function helps the model learn more accurate representations of the input sequence.
  • Early stopping—By using early stopping, the model can be trained in less time, reducing the risk of overfitting and improving the model’s generalization performance. Additionally, early stopping can save time and resources that would otherwise be wasted on training an overfitted model.
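The early-stopping rule described above can be sketched as a simple patience counter on the validation loss. This is a minimal illustration under assumed defaults, not the exact implementation used in the study; the class name `EarlyStopper` and its parameters are hypothetical.

```python
class EarlyStopper:
    """Stop training once validation loss fails to improve for `patience` epochs."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience      # epochs to wait after the last improvement
        self.min_delta = min_delta    # minimum decrease that counts as improvement
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def should_stop(self, val_loss):
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss  # improvement: reset the counter
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1       # no improvement this epoch
        return self.bad_epochs >= self.patience
```

In a framework such as Keras, equivalent behavior is available through the built-in `EarlyStopping` callback with its `patience` and `min_delta` arguments.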
LSTM_Fig1.JPG
Fig. 1—The structure of the LSTM model after hyperparameter tuning.
Source: Siddharth Misra

Time-Series Nested Cross Validation. To train and tune the LSTM network on time-dependent data, time-series nested cross-validation is used. It evaluates the model’s performance and selects hyperparameters while accounting for the temporal ordering of the data. The data is split into multiple folds, with each fold consisting of a contiguous block of time (Fig. 2). The folds are divided into an outer and inner loop. In the outer loop, a sliding window approach is used to split the data into training and testing sets. In the inner loop, each training set is further split into a training and validation set and multiple models are trained with different hyperparameters. The best-performing hyperparameters are selected based on the validation set. Our work uses fivefold time-series nested cross validation, with the average performance of the five folds measured using the mean squared logarithmic error (MSLE). This cross-validation method searches for hyperparameters that best learn from various chronological sequences of the time-dependent data and evaluates the model’s performance on the remaining chronological sequence.
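The outer-loop splits described above can be generated with a simple chronological windowing scheme, sketched below. This is a hypothetical illustration (the study does not specify its exact fold boundaries): each fold's test block is the contiguous block of time immediately following its training block, so the temporal order is never violated.

```python
def time_series_folds(n_samples, n_folds=5):
    """Yield (train_indices, test_indices) pairs that respect temporal order:
    each test block is a contiguous block of time after its training data."""
    fold_size = n_samples // (n_folds + 1)
    for k in range(1, n_folds + 1):
        train = list(range(0, k * fold_size))                    # all data up to the split
        test = list(range(k * fold_size, (k + 1) * fold_size))   # next contiguous block
        yield train, test
```

In the nested scheme, each `train` block would be split again the same way to form the inner-loop validation sets on which hyperparameters are compared. A ready-made equivalent of the outer loop is scikit-learn's `TimeSeriesSplit`.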

LSTM_Fig2.JPG
Fig. 2—Time series nested cross-validation.
Source: Siddharth Misra

Loss Function. The loss function evaluates the difference between the predictions of the LSTM network model under training and the actual outputs used for training. In our LSTM model, the MSLE is used as the loss function. The MSLE is defined as the mean of the squared differences between the natural logarithms of the predicted and actual values.
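The MSLE definition above can be written out directly. The sketch below uses the common log(1 + x) convention so the metric stays defined when production is zero; whether the study applied the +1 offset is not stated.

```python
import math

def msle(y_true, y_pred):
    """Mean squared logarithmic error: mean of (ln(1+y) - ln(1+y_hat))^2."""
    assert len(y_true) == len(y_pred)
    return sum(
        (math.log(1.0 + t) - math.log(1.0 + p)) ** 2
        for t, p in zip(y_true, y_pred)
    ) / len(y_true)
```

Because the error is taken on logarithms, the MSLE penalizes relative rather than absolute deviations, which suits production rates that span orders of magnitude.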

The loss is computed at each iteration of the LSTM model to track the current model’s performance. The model is stopped early if the validation loss increases, because validation loss that starts rising after a number of epochs indicates the model is overfitting to patterns in the training data. Fig. 3 shows the drop in the training and validation loss curves as the LSTM network is trained; early stopping is activated at around 120 epochs.

LSTM_Fig3.jpg
Fig. 3—The training and validation loss curves for the LSTM network trained for adaptive production forecasting on the Volve data.
Source: Siddharth Misra

Adaptive Production Forecasting Using the LSTM Network

Input and Output Sequences. The overall forecasting work flow, as depicted in Fig. 4, uses a sequence-to-sequence strategy with 10 historical operational and production sequences, in addition to three future operational sequences, to predict one future production sequence. Extensive hyperparameter tuning of the LSTM model showed that 5 days of the 10 historical production and operational sequences, together with 5 days of the three future operational sequences, are best suited to predicting a 5-day oil production sequence. These three operational parameters are onstream hours, average choke size, and differential pressure (DP) choke size. The adaptive production forecasting uses both historical information and operational plans to predict oil production.

Specifically, our forecasting work flow uses 10 operational and production parameters for time indices n-5 through n-1 to represent the previous 5 days. These 10 historical sequences are used as input sequences for the LSTM network. Additionally, the values of three specific operational parameters for time indices n through n+4 are used as future feature sequences. Together, these 13 sequences serve as the input features for the LSTM model, which predicts one oil production sequence for the following 5 days (time indices n through n+4). Fig. 4 elaborates the 13 input sequences and one output sequence.
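The windowing above can be sketched for a table with one row per day: all historical columns over days n-5 through n-1, the three planned operational columns over days n through n+4, and the oil-rate column over days n through n+4 as the target. This is an illustrative sketch; the column names and the exact preprocessing used in the study are assumptions.

```python
def make_training_pairs(data, hist_cols, future_cols, target_col, window=5):
    """Build (historical, future, target) triples from a list of per-day dicts.

    historical: `window` days of all `hist_cols` (the 10 historical sequences)
    future:     `window` days of `future_cols` (the 3 planned operational sequences)
    target:     `window` days of `target_col` (the oil production sequence)
    """
    pairs = []
    for n in range(window, len(data) - window + 1):
        hist = [[row[c] for c in hist_cols] for row in data[n - window:n]]
        future = [[row[c] for c in future_cols] for row in data[n:n + window]]
        target = [row[target_col] for row in data[n:n + window]]
        pairs.append((hist, future, target))
    return pairs
```

Each triple corresponds to one training sample: 13 input sequences (10 historical plus 3 future) and one 5-day output sequence, matching the layout in Fig. 4.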

LSTM_Fig4.JPG
Fig. 4—The input/feature sequence (green and yellow) and output/target sequence (red) for the adaptive production forecasting. The oil production sequence to be predicted is shown in red. The 10 historical operational and production sequences that serve as inputs are shown in green. The three future operational sequences that serve as inputs are shown in yellow.
Source: Siddharth Misra

Model Evaluation. After the training and tuning, we perform the forecasting for a period of 1 year, 5 days at a time, on the test data that was kept separate. To assess the accuracy of our forecasting model, we used the MSLE metric when evaluating the model’s performance on the testing set. Errors in forecasting are low except around 70 days and 230 days (shown at the top of Fig. 5). The LSTM performs very well under normal conditions but fails at the two instances of sudden shutdown; it handles production changes well as long as they are not very large, whereas abrupt events such as major shutdowns are not captured. In Fig. 5, the actual oil production profile of the test data is shown in green and the forecasted production profile is shown in red. The blue curve at the top of Fig. 5 shows the forecasting error in terms of MSLE.

LSTM_Fig5.jpg
Fig. 5—Error analysis on testing data for a period of 1 year.
Source: Siddharth Misra

Model Deployment for Adaptive Production Forecasting. Deployment happens after the completion of the training/testing stage. To demonstrate the deployment of the adaptive production forecasting method, we changed only three operational parameters—onstream hours, average choke size, and DP choke size—shown in the bottom three plots of Fig. 6, which represent the future operational sequences used as input to the LSTM. The remaining input sequences were kept constant. Fig. 6 shows the forecast of the oil production for 1 year, predicted 5 days at a time. In the deployment stage, the performance of the model cannot be evaluated because of the lack of measured data, which is available only for the training and testing stages. Fig. 6 confirms that the adaptive production forecasting methodology can learn from the historical data to predict oil production that is sensitive to changes in onstream hours, average choke size, and DP choke size. The adaptive production forecasting methodology can successfully learn from the training data for deployment over a period of 1 year.
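Deployment then reduces to a rolling loop: each step takes the most recent 5 days of history plus the planned 5 days of the three operational parameters, predicts 5 days of production, and feeds the prediction back as pseudo-history before advancing. The sketch below is a hypothetical illustration; `dummy_model` is a stand-in for the trained LSTM, and the record layout is assumed.

```python
def rolling_forecast(model, history, future_ops, window=5):
    """Forecast production `window` days at a time over the planned horizon.

    history:    list of past (ops, production) records, newest last
    future_ops: planned operational parameters, one record per future day
    """
    forecasts = []
    for start in range(0, len(future_ops) - window + 1, window):
        recent = history[-window:]                   # last 5 days of context
        planned = future_ops[start:start + window]   # next 5 days of the plan
        pred = model(recent, planned)                # 5 predicted daily rates
        forecasts.extend(pred)
        # feed predictions back as pseudo-history for the next step
        history = history + list(zip(planned, pred))
    return forecasts

# Trivial stand-in model: production proportional to planned onstream hours.
def dummy_model(recent, planned):
    return [0.5 * ops["onstream_hours"] for ops in planned]
```

Because measured production is unavailable during deployment, each 5-day prediction becomes the history for the next step, which is why forecast quality hinges on the model having learned the sensitivity to the three planned operational parameters.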

LSTM_Fig6.jpg
Fig. 6—Oil production forecast (top plot) for a period of 1 year as a function of variations in three operational parameters (bottom three plots), namely onstream hours, average choke size, and DP choke size.
Source: Siddharth Misra