Failure Prediction for Rod Pump Artificial Lift Systems

Bibliographic Details
Main Author:	Liu, Yintao
Format:	Dissertation
Language:	English
Published:	ProQuest Dissertations & Theses 01.01.2013
Subjects:	Computer science
ISBN:	9781303694042, 1303694042
Online Access:	Full text

Description
Summary:	Failure prediction, a subset of anomaly detection that targets the precursory events that can trigger failures, is of great value in keeping complex engineering systems reliable. Given a massive amount of historical data in the form of multivariate time series for a complex system, data mining and machine learning techniques can play an important role by learning from historical failures; the learned models can then be integrated into real-world monitoring and fault-prevention applications. In this dissertation, such data mining and machine learning techniques are applied to failure prediction in digital oil fields. However, applying them to oil fields with wells in multiple assets raises two major categories of challenges: 1) within a single domain/asset, and 2) across multiple domains/assets. Within a single domain/asset, the data collected are high-dimensional multivariate time series that contain uncertainties such as noise, missing data, and inconsistent data. Across multiple domains/assets, failure events are rare relative to the heterogeneity of thousands of assets and the diversity of failure patterns, labels are sparse, and resources are limited, so it is unrealistic to build an individual predictive model for each asset. This thesis addresses two problems of failure prediction on multiple multivariate time series: 1) how to systematically learn from historical failures and train an effective model that is applicable in failure prediction applications; 2) how to train a generalized model from the labeled dataset that is efficient in predicting failures among thousands of multivariate time series that exhibit clustered heterogeneity. This thesis emphasizes, but is not limited to, down-hole mechanical failures of rod pump artificial lift systems (a.k.a. rod pumps), which are the most common type of oil producer system.
The first part of the thesis addresses the first challenge category by presenting the Smart Engineering Apprentice (SEA) system, which involves data extraction, data preparation, feature extraction, machine learning, alert generation, and knowledge management. The data extraction stage extracts the required data, including the time series data and event logs, from the enterprise database. Noise and missing values are partly handled in the data preparation stage. The denoised data is then fed into the feature extraction stage to obtain features. Given a labeled dataset, general supervised learning algorithms can be applied to train, test, and evaluate models in the machine learning stage. In the alerting stage, the system visually depicts alerts to warn of impending failures. Finally, in the knowledge management stage, a wide range of factors are combined to train a confidence-level model for ranking across multiple assets from multiple categories. SEA has been successfully applied to failure prediction for rod pump artificial lift systems. The second part of the thesis addresses the second challenge by presenting an in-depth model for generalizing the learning algorithm, so that a unified model can be applied to multiple heterogeneous fields while maintaining comparable precision and recall. Our objective is to build a generalized model that: 1) automatically recognizes examples based on limited knowledge from the subject matter experts (SMEs); 2) takes advantage of the larger number of recognizable examples in all historical data, so that the learned model is statistically more robust; 3) can be better customized, so that different fields can exhibit variations that arise from other important uncertainties that were difficult to account for in previous algorithms.
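As a rough illustration of the data preparation, feature extraction, and alerting stages named above, here is a minimal Python sketch. It is not the SEA implementation: the forward-fill strategy, the window features (median and net trend), the fixed threshold that stands in for a trained classifier, and all function names and sample readings are assumptions made for this example.

```python
from statistics import median


def fill_missing(series):
    """Data preparation (assumed strategy): forward-fill missing readings."""
    # Leading gaps are filled with the first valid reading (0.0 if none exists).
    first_valid = next((v for v in series if v is not None), 0.0)
    filled, last = [], first_valid
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def window_features(series, width):
    """Feature extraction: (median, net trend) for each sliding window."""
    features = []
    for end in range(width, len(series) + 1):
        window = series[end - width:end]
        features.append((median(window), window[-1] - window[0]))
    return features


def generate_alerts(features, baseline, drop_ratio=0.8):
    """Alerting: flag windows whose median drops below drop_ratio * baseline.

    This fixed threshold is a placeholder for the learned model.
    """
    return [level < drop_ratio * baseline for level, _ in features]


# Hypothetical daily readings from one well; None marks a missing value.
readings = [10.0, 10.2, None, 9.9, 10.1, 7.0, None, 6.5, 6.0]
clean = fill_missing(readings)
features = window_features(clean, width=3)
alerts = generate_alerts(features, baseline=10.0)
print(alerts)  # [False, False, False, False, True, True, True]
```

The median is used as the window level because it is less sensitive to single noisy readings than the mean; a real system would replace the threshold with a classifier trained on labeled failures.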
We proposed an unsupervised rule-enhanced labeling approach with a support vector machine (SVM) that enables the SEA system to learn from a much larger body of historical data from multiple fields. We then further improved this algorithm by proposing a multitask learning algorithm that combines multiple decision-relevant factors to yield a better generalized global model. As a purely data-driven system, SEA is evaluated using real-world data from thousands of rod pump artificial lift systems in multiple heterogeneous oil fields. Experiments show that SEA produces good results and significant economic value for use in oil fields.
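The summary gives no algorithmic details, so the following Python sketch only illustrates the general idea of combining rule-based labeling with an SVM. A hypothetical expert rule labels the feature windows it is confident about, a linear SVM is trained on those labels, and the model then classifies the windows the rule left unlabeled. The rule thresholds, the two features (normalized load and trend), the sample data, and the plain sub-gradient training loop are all assumptions for illustration, not the thesis's method.

```python
import random


def rule_label(x):
    """Hypothetical expert rule on (normalized load, trend)."""
    load, trend = x
    if load < 0.4 and trend < 0:
        return 1       # resembles a failure precursor
    if load > 0.8 and trend >= 0:
        return -1      # resembles normal operation
    return None        # rule is not confident; leave unlabeled


def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=300, seed=0):
    """Linear SVM via sub-gradient descent on the L2-regularized hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            # Regularization step: shrink the weights (the bias is not shrunk).
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge step: only examples inside the margin move the model.
            if y[i] * score < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b


def predict(w, b, x):
    """Return 1 (failure precursor) or -1 (normal)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical feature windows pooled from several wells.
windows = [
    (0.20, -0.20), (0.30, -0.10), (0.25, -0.30), (0.35, -0.15),
    (0.90, 0.05), (0.85, 0.00), (0.95, 0.10), (0.90, 0.02),
    (0.10, 0.00), (1.00, -0.01),
]
labels = [rule_label(x) for x in windows]
X = [x for x, lab in zip(windows, labels) if lab is not None]
y = [lab for lab in labels if lab is not None]
w, b = train_linear_svm(X, y)

# The model generalizes to windows the rule could not label.
unlabeled = [x for x, lab in zip(windows, labels) if lab is None]
predictions = [predict(w, b, x) for x in unlabeled]
print(list(zip(unlabeled, predictions)))
```

The design point is that the rule only needs to be right where it is confident; the classifier extends those labels to the much larger set of windows the rule cannot decide.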