Unconventional/complex reservoirs

Case Study: Optimization of Bakken Operations Using Data Science: Examples of the Workflows and Lessons Learned

Data mining techniques are unlocking new insights from the performance of thousands of Bakken wells. This article explores how those insights are helping refine completion optimization.

An oil pipeline in North Dakota
Source: Steve Oehlenschlager/Getty Images/iStockphoto.

Since approximately 2012, large multistage completions in horizontal wells have significantly boosted oil production in the Bakken Formation. As of the time of writing, more than 18,000 stimulated wells have been completed in the Bakken using various completion designs.

Operators have experimented with well spacing and with the size and intensity of stimulation jobs, injecting increasingly large volumes of fluid and proppant to maximize oil production while minimizing costs. However, wells that received larger treatments did not always perform as expected. Furthermore, the aggregated impact of various interrelated completion design parameters and reservoir characteristics was not fully understood, leaving room for improvement in completion optimization evaluations.

Publicly accessible completion and production data, meticulously collected by the North Dakota Industrial Commission (NDIC), combined with advancements in data science, have created an excellent opportunity to optimize drilling and completion strategies using statistical data analysis and predictive modeling. Observations and experiences from thousands of producing Bakken wells can now be analyzed and interpreted using data mining techniques.

This article presents examples of data science applied to completion optimization in the Bakken.

Completion Design Optimization Using Predictive Modeling

The initial optimization study analyzed drilling and completion results from more than 12,000 oil-producing wells across the Bakken in North Dakota using 2020 data (URTeC 3723843).

Simple interpretations of the relationship between production performance and completion parameters, based on bivariate (two-dimensional) scatterplots, proved challenging due to the nonlinear nature of these dependencies.

Data-mining techniques capable of accommodating nonlinear relationships and complex, incomplete information were applied to identify optimal completion practices. The dataset used in this analysis included eight publicly available completion design parameters:

  1. Perforated interval
  2. Injected fluid volume
  3. Proppant amount
  4. Stage count
  5. Injection rate
  6. Injection pressure
  7. Proppant type
  8. Completion type

These parameters were used to predict well performance, measured by cumulative 6-month oil production. Predictive modeling was performed using the gradient boosting (GB) data-mining tool, in which multiple decision trees described the variability of the multidimensional dataset. The initial predictive modeling conducted for the entire Bakken play demonstrated acceptable performance for the training model (evaluated by R²), but the test model—using 20% of all data—showed weaker performance.
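The train-versus-test comparison described above can be illustrated with a minimal sketch. It is not the Minitab SPM workflow used in the study: it assumes a single synthetic predictor (proppant, in million lb), a made-up response with a plateau, and a hand-rolled gradient boosting of one-split decision trees. All names (`fit_stump`, `fit_gb`, `r_squared`) and all data are hypothetical.

```python
import random

def fit_stump(x, residuals):
    """Return (threshold, left_mean, right_mean) of the best single split on x."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # skip the max so both sides are non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lm, rm)
    return best[1:]

def fit_gb(x, y, n_rounds=40, learning_rate=0.3):
    """Least-squares gradient boosting: each stump is fitted to the current residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        threshold, lm, rm = fit_stump(x, residuals)
        stumps.append((threshold, lm, rm))
        preds = [p + learning_rate * (lm if xi <= threshold else rm)
                 for xi, p in zip(x, preds)]
    return base, learning_rate, stumps

def predict(model, x):
    base, lr, stumps = model
    return [base + lr * sum(lm if xi <= t else rm for t, lm, rm in stumps) for xi in x]

def r_squared(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot

# Synthetic wells (assumption): production rises with proppant, then flattens.
random.seed(0)
proppant = [random.uniform(2, 14) for _ in range(150)]
oil = [10 * min(p, 10) + random.gauss(0, 5) for p in proppant]

# Hold out 20% of the wells for testing, as in the article.
split = int(0.8 * len(proppant))
x_train, x_test = proppant[:split], proppant[split:]
y_train, y_test = oil[:split], oil[split:]

model = fit_gb(x_train, y_train)
r2_train = r_squared(y_train, predict(model, x_train))
r2_test = r_squared(y_test, predict(model, x_test))
print(f"train R2 = {r2_train:.2f}, test R2 = {r2_test:.2f}")
```

A large gap between the two scores on real data would be the signal of the weaker test performance discussed next.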

This phenomenon, known as overfitting, occurred when a single statistical model struggled to accurately predict well production performance across various Bakken locations, which are characterized by heterogeneous geology and reservoir properties.

To mitigate the impact of geologic variability and improve test statistical models, the analysis was conducted separately on three groups of wells located in distinct Bakken subareas representing low-, moderate-, and high-productivity regions, with approximately 300 wells in each group (URTeC 3723843). The predictive models developed for each of the three groups of wells generated:

  1. Variable importance graphs, ranking the significance of each completion design parameter
  2. One-variable dependence graphs, used to estimate the optimal values of completion parameters that maximized 6-month production while minimizing stimulation job size (e.g., fluid volume or proppant mass)

Unsurprisingly, across all three investigated subareas, total proppant and total fluid emerged as the most influential factors affecting well performance. This finding aligns with common completion practices and previous studies demonstrating a strong correlation between these parameters and oil production.

However, the results also suggested different optimal completion configurations for each subarea. The high-productivity subarea located in the core area of the Bakken benefited from higher total proppant and higher treatment pressure, while the moderate- and low-productivity subareas achieved maximum oil production with slightly lower proppant amounts, increased fluid volumes, and lower treatment pressures. These differences in completion strategies were attributed to variations in formation depth, temperature, pressure, maturity level, reservoir thickness, and other geologic characteristics.

Completion optimization evaluations used a series of partial dependence plots, which show how increasing a given completion design parameter affects well performance (6-month production) while all other predictors (completion parameters) are held constant at their average values.

For instance, Fig. 1 shows that maximum production performance in the core area of the Bakken is achieved when 10 million lb of proppant are injected.
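The procedure described above, sweeping one completion parameter while the others stay at their averages, can be sketched as follows. This follows the article's description; the more common definition of partial dependence averages the model's predictions over the whole dataset instead. The function `toy_model` is a hypothetical stand-in for a fitted model, and the field names and well values are invented for illustration.

```python
def partial_dependence_at_means(model, rows, feature, grid):
    """Sweep `feature` over `grid` with every other predictor fixed at its mean."""
    baseline = {name: sum(r[name] for r in rows) / len(rows) for name in rows[0]}
    curve = []
    for value in grid:
        point = dict(baseline)
        point[feature] = value
        curve.append((value, model(point)))
    return curve

def toy_model(well):
    # Hypothetical response: proppant helps up to 10 million lb, then stops helping.
    return 12.0 * min(well["proppant_mmlb"], 10.0) + 3.0 * well["fluid_kbbl"] ** 0.5

# Invented wells (assumption): proppant in million lb, fluid in thousand bbl.
rows = [
    {"proppant_mmlb": 4.0, "fluid_kbbl": 100.0},
    {"proppant_mmlb": 8.0, "fluid_kbbl": 196.0},
    {"proppant_mmlb": 12.0, "fluid_kbbl": 256.0},
]
grid = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0]
curve = partial_dependence_at_means(toy_model, rows, "proppant_mmlb", grid)
for proppant, response in curve:
    print(f"{proppant:5.1f} million lb -> {response:7.2f}")
```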

Fig. 1—Partial dependence plot generated from GB (Minitab SPM data mining tool) for wells in the high-productivity subarea of the Bakken (core area) using the 2020 dataset, showing an optimal total proppant value of 10 million lb. Source: URTeC 3723843.

Further increases of the proppant amount beyond this threshold do not improve oil production. Similar plots were built for the injected fluid, number of stages, injection rate, proppant type, and other completion parameters. It is important to note that the optimal proppant amount in Fig. 1 was determined using the 2020 dataset representing a section of the core area. The results of optimization evaluations are subject to change if newer or larger datasets are used or if the analysis focuses on different Bakken locations.
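One way to read such a threshold off a partial dependence curve is to take the smallest parameter value whose predicted response is within a chosen tolerance of the curve's maximum. The sketch below assumes positive responses. The tolerance and the curve values are illustrative and are not taken from Fig. 1.

```python
def plateau_start(curve, tolerance=0.01):
    """Smallest x whose response is within `tolerance` (a fraction) of the maximum."""
    best = max(y for _, y in curve)
    for x, y in sorted(curve):
        if y >= best * (1.0 - tolerance):
            return x

# Illustrative (proppant in million lb, predicted 6-month oil) pairs, not real data.
curve = [(2, 40.0), (4, 70.0), (6, 88.0), (8, 97.0), (10, 100.0), (12, 100.2), (14, 100.1)]
print(plateau_start(curve))        # 1% tolerance
print(plateau_start(curve, 0.05))  # 5% tolerance picks a smaller job size
```

A looser tolerance trades a small amount of predicted production for a smaller stimulation job, which is the cost side of the optimization.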

Cluster Analysis Creates Subareas of Bakken

Given that both geology and completion design control well performance in unconventional oil and gas plays, the completion optimization methodology was further developed by providing a quantitative means for integrating geologic factors into completion optimization analysis (Chakhmakhchev et al., 2021).

Cluster analysis based on more than 14,700 wells was completed using publicly available cartographic and tabular data (Sonnenberg 2017, NDIC, and North Dakota Geological Survey). The input geologic data included formation depth, thickness, temperature, pressure gradient, porosity, and permeability. Additional geochemical information integrated into this analysis included source rock characteristics such as hydrogen index (HI), Tmax, total organic carbon (TOC), and bulk oil and gas properties.

A k-means clustering algorithm was used to group wells and create subareas characterized by similar geologic and reservoir characteristics. The resulting five subareas had well-defined boundaries with minimal overlap (Fig. 2). The values of key geologic and geochemical variables driving classification aligned with domain expertise in the Bakken.
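A minimal sketch of this kind of grouping is shown below, assuming three invented inputs per well (depth, thickness, Tmax) and three clusters rather than the study's five. Variables are standardized first so that depth, measured in thousands of feet, does not dominate the distance. The deterministic farthest-point initialization is a choice made here for reproducibility and is not described in the source.

```python
import math

def standardize(rows):
    """Convert each column to z-scores so all variables share one scale."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, n_iter=50):
    # Farthest-point initialization: start at the first well, then repeatedly
    # add the well farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Invented wells (assumption): (depth_ft, thickness_ft, tmax_degC).
wells = [
    (9500, 30, 435), (9600, 32, 436), (9550, 31, 434),
    (10800, 45, 448), (10900, 47, 450), (10850, 46, 449),
    (8000, 15, 425), (8100, 16, 426), (8050, 14, 424),
]
labels, centroids = kmeans(standardize(wells), k=3)
print(labels)
```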

Fig. 2—K-means clustering algorithm created the studied subareas in the North Dakota Bakken characterized by similar geologic and reservoir characteristics. Source: Chakhmakhchev et al., 2021.

This stratified analysis approach can be extended beyond individual well evaluation to optimize drilling spacing units (DSUs) (Fig. 3). The DSU optimization analysis aggregates well completion and production values while incorporating additional predictors, such as well spacing and well count per DSU, to account for interactions between wells.
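The aggregation step might look like the sketch below, which rolls well-level records up to one row per DSU. The field names (`dsu_id`, `oil_6mo_bbl`, `proppant_lb`, `spacing_ft`) and the values are hypothetical, and the choice of sums and a mean spacing is an assumption about how the aggregation could be done.

```python
from collections import defaultdict

def aggregate_by_dsu(wells):
    """Summarize well-level completion and production values per DSU."""
    groups = defaultdict(list)
    for well in wells:
        groups[well["dsu_id"]].append(well)
    summary = {}
    for dsu_id, members in groups.items():
        n = len(members)
        summary[dsu_id] = {
            "well_count": n,
            "total_oil_6mo_bbl": sum(w["oil_6mo_bbl"] for w in members),
            "total_proppant_lb": sum(w["proppant_lb"] for w in members),
            "mean_spacing_ft": sum(w["spacing_ft"] for w in members) / n,
        }
    return summary

# Invented records (assumption), two wells in DSU "A" and one in DSU "B".
wells = [
    {"dsu_id": "A", "oil_6mo_bbl": 90000, "proppant_lb": 8000000, "spacing_ft": 600},
    {"dsu_id": "A", "oil_6mo_bbl": 70000, "proppant_lb": 6000000, "spacing_ft": 800},
    {"dsu_id": "B", "oil_6mo_bbl": 50000, "proppant_lb": 5000000, "spacing_ft": 1200},
]
summary = aggregate_by_dsu(wells)
print(summary["A"])
```

The resulting per-DSU rows, including `well_count` and `mean_spacing_ft`, could then serve as predictors and response in the same kind of model used for single wells.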

Fig. 3—Completion design optimization workflow in Bakken using data science. Source: Chakhmakhchev, 2025.

Lessons Learned

The completion optimization evaluations were based on the analysis of actual completion design strategies implemented in thousands of stimulated wells. The statistical approach used in these evaluations did not model physical processes but instead learned from historical operational data.

Since the beginning of Bakken development in 2005, completion technologies have continually evolved. This evolution has included increased stimulation intensity, longer laterals, the application of diverters, and recompletions. To ensure meaningful optimization calculations, the input dataset must consist of wells completed using similar technologies. One way to achieve this is by filtering data based on year of completion, with 2012 and 2017 serving as major technological milestones.
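A simple way to apply this filter is to bin wells into completion eras before modeling. The sketch below assumes that each milestone year starts a new era; the article names 2012 and 2017 as milestones but does not say which side of the boundary they fall on. The record fields are hypothetical.

```python
def completion_era(year):
    """Assign a completion year to a technology era (boundary handling is assumed)."""
    if year < 2012:
        return "pre-2012"
    if year < 2017:
        return "2012-2016"
    return "2017+"

def group_by_era(wells):
    groups = {}
    for well in wells:
        groups.setdefault(completion_era(well["completion_year"]), []).append(well)
    return groups

# Invented records (assumption).
wells = [
    {"well_id": "W1", "completion_year": 2009},
    {"well_id": "W2", "completion_year": 2012},
    {"well_id": "W3", "completion_year": 2016},
    {"well_id": "W4", "completion_year": 2019},
]
eras = group_by_era(wells)
print({era: [w["well_id"] for w in members] for era, members in eras.items()})
```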

Future enhancements of the partial dependence plots could include the use of heat maps or 3D charts to visualize the combined effects of multiple completion parameters on production performance. Furthermore, rather than assuming static values for other parameters in partial dependence plots, sensitivity analysis on a simulated dataset is recommended. This approach would set predictors at their respective quantiles (e.g., 20, 50, 70, and 90%) and generate partial dependence plots reflecting multiple scenarios.
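The quantile-scenario idea can be sketched as below: instead of one curve with the other predictors at their means, one curve is produced per quantile level. Setting all other predictors to the same quantile level at once is an assumption made here for brevity. `toy_model`, the field names, and the data are hypothetical.

```python
def quantile(values, q):
    """Quantile by linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def scenario_curves(model, rows, feature, grid, levels=(0.2, 0.5, 0.7, 0.9)):
    """One dependence curve per quantile level of the remaining predictors."""
    others = [name for name in rows[0] if name != feature]
    curves = {}
    for q in levels:
        fixed = {name: quantile([r[name] for r in rows], q) for name in others}
        curves[q] = [(v, model({**fixed, feature: v})) for v in grid]
    return curves

def toy_model(well):
    # Hypothetical interaction: extra fluid scales the benefit of proppant.
    return min(well["proppant_mmlb"], 10.0) * (1.0 + 0.05 * well["fluid_kbbl"])

# Invented wells (assumption).
rows = [{"proppant_mmlb": p, "fluid_kbbl": f}
        for p, f in [(4, 100), (6, 150), (8, 200), (10, 250), (12, 300)]]
curves = scenario_curves(toy_model, rows, "proppant_mmlb", grid=[4, 8, 12])
for q, curve in curves.items():
    print(q, [round(y, 1) for _, y in curve])
```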

The 6-, 12-, and 24-month cumulative oil production metrics have proven to be reliable indicators of well performance.

However, a more refined assessment can be achieved by calculating the best 6-, 12-, and 24-month cumulative production volumes using a sliding window approach. Relying solely on the first 6- or 12-month cumulative volumes may not always yield an accurate assessment of well performance, as initial production rates are often influenced by operational constraints. Operators frequently curtail early production due to both technical and nontechnical reasons, such as facility limitations, market conditions, or pressure management strategies. By incorporating the sliding window technique, the impact of such constraints can be mitigated, leading to a more precise evaluation of well productivity.
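The sliding window calculation can be sketched as a running sum over consecutive months, keeping the largest total. The monthly volumes below are invented, with the first two months curtailed to show the effect.

```python
def best_window_cumulative(monthly_oil, window):
    """Largest cumulative volume over any `window` consecutive months."""
    if len(monthly_oil) < window:
        return None  # not enough production history
    current = sum(monthly_oil[:window])
    best = current
    for i in range(window, len(monthly_oil)):
        # Slide by one month: add the new month, drop the oldest.
        current += monthly_oil[i] - monthly_oil[i - window]
        best = max(best, current)
    return best

# Invented monthly volumes in thousand bbl (assumption); months 1-2 are curtailed.
monthly = [5, 6, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11]
first_6 = sum(monthly[:6])
best_6 = best_window_cumulative(monthly, 6)
print(first_6, best_6)  # 85 105
```

In this example the first 6 months understate the well because of the curtailed start, while the best 6-month window captures its unconstrained output.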

The k-means algorithm remains an effective tool for assigning wells to geologically similar clusters. Although k-means is an unsupervised learning method, the number of clusters must be specified in advance.

Formal approaches such as the elbow or silhouette methods can assist in determining the optimal number of clusters, but domain knowledge should ultimately guide the selection to ensure alignment with regional and local geologic trends.
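As an illustration of the silhouette method, the sketch below scores two candidate groupings of the same invented points; a higher mean silhouette indicates tighter, better separated clusters. It takes the cluster labels as given, from any clustering run, and treats a single-member cluster as scoring zero, which is a common convention.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient; requires at least two clusters."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    scores = []
    for i, p in enumerate(points):
        same = [dist(p, q) for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        if not same:
            scores.append(0.0)  # singleton cluster
            continue
        a = sum(same) / len(same)  # mean distance within the point's own cluster
        b = min(                   # mean distance to the nearest other cluster
            sum(dist(p, q) for j, q in enumerate(points) if labels[j] == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Invented 2D points forming three visible groups (assumption).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
labels_k3 = [0, 0, 0, 1, 1, 1, 2, 2, 2]
labels_k2 = [0, 0, 0, 0, 0, 0, 1, 1, 1]  # first two groups merged
score_k3 = silhouette_score(points, labels_k3)
score_k2 = silhouette_score(points, labels_k2)
print(f"k=3: {score_k3:.2f}, k=2: {score_k2:.2f}")
```

A score like this only ranks candidate cluster counts; whether the resulting subareas make geologic sense remains a judgment call, as noted above.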

For Further Reading

URTeC 3723843 Completion Optimization in the Bakken Petroleum System Using Data Mining by A. Chakhmakhchev, N. Azzolina, B. Kurz, X. Yu, C. Dalkhaa, J. Kovacevich, and J. Sorensen.

Using Cluster Analysis To Enhance Completion Optimization Studies of the Bakken Petroleum System by A. Chakhmakhchev, N. Azzolina, B. Kurz, X. Yu, J. Kovacevich, K. Glazewski, J. Sorensen, C. Gorecki, J. Harju, and E. Steadman. EERC, Bakken Production Optimization Program report in public domain (2021).

The Giant Continuous Oil Accumulation in the Bakken Petroleum System, US Williston Basin by S.A. Sonnenberg, C. Theloy, and H. Jin. AAPG Memoir, Giant Fields of the Decade 2000–2010. R.K. Merrill and C.A. Sternbach (eds), American Association of Petroleum Geologists (2017).

Alexander Chakhmakhchev, PhD, SPE, independent consultant, has more than 20 years of experience in the international petroleum industry. He previously served as a principal scientist at the Energy & Environmental Research Center (EERC) at the University of North Dakota from 2019 to 2025. He led research under the Bakken Production Optimization Program, focusing on enhanced oil recovery, production optimization, geochemical solutions, and environmental protection. He also previously worked as a senior data scientist at Applied Chem Data and as principal geochemist at SGS Oil, Gas, and Chemical Services in The Woodlands, Texas. He can be reached at alchak@sbcglobal.net.