
The Big Data Challenges Associated with Validating Safe and Secure Operation of Critical Energy Infrastructure

Sifting Vast Amounts of Documents to Locate Critical Records as a Precursor to Validation

The United States has some of the largest and oldest industrial infrastructure in the world, including its natural gas, oil, electrical, and communications pipelines and wires, more than 85% of which is controlled by private industry.  Placed into service at the beginning of the modern industrial era, much of this infrastructure is still in use, serving the needs of society.  The operation of this infrastructure comes under multiple regulatory purviews, including but not limited to the Department of Transportation (safe operation of pipelines), Homeland Security (secure operation of critical infrastructure), the Federal Energy Regulatory Commission (the NERC CIP standards), and specific state regulations governing intrastate operation.  A substantial portion of the infrastructure was built before 1970 and placed into service when the pipes and lines ran through unpopulated areas; today those areas are cities, towns, and suburbs, so an incident is far more likely to destroy property and take lives.
 
In order to validate the safety or security of any physical asset, it is necessary to determine whether the asset is operating within design specifications.  Depending on the type of asset, this generally means locating the “as-built” records, which confirm the materials used and the results of testing performed before the constructor turned the asset over to an operator.  In addition, repairs made over the course of several decades constitute further sets of records that need to be found and analyzed.  The weakest link determines the strength of the entire chain: one weak valve or section of line sets the upper limit on operating capacities and pressures for the whole system.  Knowing which pieces of the system are weak lessens the potential for mishap or catastrophe caused by operating above safe design limits.  A good handle on the as-built and repair records also improves the ability to protect the systems from malicious attack, since preventative measures can be designed and implemented from sound knowledge of the design.
 
The failure modes of some types of infrastructure have a domino effect, whereby a surge (a tidal wave of energy) is created.  This happened during the Northeast Blackout of 2003, when a sequence of events, including power lines sagging into trees and a software bug in a control-room alarm system, triggered a cascading failure that automatically took 265 power plants offline.
 
In San Bruno, California, a high-pressure natural gas pipeline exploded in 2010, destroying many homes in a neighborhood near San Francisco and killing eight people.  The subsequent investigation found defective welds to be the primary cause of the failure, and final fine amounts are still being discussed.  The San Bruno event led to the enactment of new DOT regulations requiring pipeline operators to verify the test and materials records associated with every mile of pipe.
 
The big data challenge associated with these initiatives lies in the vast amounts of un-indexed or poorly indexed documents going back decades.  Physical records were created at the time of construction and during repairs, and have been collected and preserved in one of several ways:
  • Stored in boxes with other records; each box holds an average of 2,500 pages, and only a short description of its contents is stored in a database.
  • Scanned to microfilm, with or without a film index.
  • Scanned to an image (usually TIFF or PDF); common practice is to combine hundreds of documents and records into a single PDF, hundreds of pages long, with no index.
  • Repair records are commonly separated from the as-built records, and are among the most difficult to find.
Unfortunately, locating the critical records that validate testing performed years ago, and unearthing materials design data, requires a tremendous amount of research.  The data is generally of poor quality because of its age and because it was scanned at poor resolution.  Since the assets may have been bought and sold many times over several decades, little historical context is available: the companies and people who created the data are long gone, and the data has changed hands multiple times.
 
Tasking a team of people to manually review the data is an expensive and time-consuming activity.  The approach is also error prone, as the work is mind-numbingly repetitive and tedious, and it is a poor use of precious engineering resources.  To ease the task of finding the needles in the haystack, the recommended approach is to leverage technology to interrogate the data on your behalf.  This includes tasks such as:
  • Analyzing box and file descriptions: mining key phrases of higher interest from the database, and fingerprinting boxes whose descriptions match a controlled vocabulary of phrases that indicate relevance (see the first sketch after this list).
  • Choosing the boxes most likely to be relevant to sample, based on that analysis, then scanning each box to a giant PDF.
  • Splitting giant PDFs, whether produced by the box scanning or already held by the corporation, into their original documents, and auto-classifying each document as belonging to a standard document type (second sketch below).
  • Working with very poor quality data, using predictive analytics to determine whether a fragment of a word or phrase in a document title matches a target, and populating a database with correct document descriptions (third sketch below).
  • Using machine learning to teach the system to recognize the unique patterns of data belonging to a particular category (fourth sketch below).
  • Using auto-extraction techniques to populate critical data points, such as materials data, into a database (fifth sketch below).
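
To make the first task concrete, here is a minimal Python sketch of fingerprinting box descriptions against a controlled vocabulary and ranking boxes for sampling. The vocabulary phrases, box IDs, and match-count scoring are illustrative assumptions, not a description of any particular vendor's method.

```python
CONTROLLED_VOCABULARY = [
    # Illustrative phrases that signal relevance; a real list would be
    # curated with pipeline engineers and records managers.
    "as-built", "hydrostatic test", "weld map", "pressure test",
    "mill certificate", "material test report", "repair log",
]

def fingerprint_box(description: str) -> list[str]:
    """Return the vocabulary phrases that appear in a box description."""
    text = description.lower()
    return [phrase for phrase in CONTROLLED_VOCABULARY if phrase in text]

def rank_boxes(boxes: dict[str, str]) -> list[tuple[str, list[str]]]:
    """Rank boxes by how many relevance phrases their descriptions match."""
    scored = [(box_id, fingerprint_box(desc)) for box_id, desc in boxes.items()]
    return sorted(scored, key=lambda item: len(item[1]), reverse=True)

sample = {
    "BX-0001": "Misc. correspondence, 1968-1974",
    "BX-0002": "Hydrostatic test charts and weld maps, Line 132",
}
for box_id, hits in rank_boxes(sample):
    print(box_id, hits)  # BX-0002 ranks first with two phrase hits
```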
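
For splitting giant PDFs back into their original documents, the sketch below uses the open-source pypdf library. The header phrases used to detect the first page of a new document are hypothetical, and purely image-based scans would need an OCR pass first before any text heuristic like this can see anything.

```python
from pypdf import PdfReader, PdfWriter

# Hypothetical heuristic: a page mentioning one of these headers is treated
# as the first page of a new document inside the combined scan.
DOCUMENT_HEADERS = ("pressure test report", "weld inspection", "bill of materials")

def split_giant_pdf(path: str, out_prefix: str) -> None:
    """Split a combined scan into separate PDFs at detected document headers."""
    reader = PdfReader(path)
    writer, doc_index = PdfWriter(), 0
    for page in reader.pages:
        text = (page.extract_text() or "").lower()
        if any(h in text for h in DOCUMENT_HEADERS) and len(writer.pages) > 0:
            # Flush the document accumulated so far, then start a new one.
            with open(f"{out_prefix}_{doc_index:04d}.pdf", "wb") as fh:
                writer.write(fh)
            writer, doc_index = PdfWriter(), doc_index + 1
        writer.add_page(page)
    if len(writer.pages) > 0:
        with open(f"{out_prefix}_{doc_index:04d}.pdf", "wb") as fh:
            writer.write(fh)
```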
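
For matching degraded title fragments against target document titles, the standard library's difflib similarity ratio can stand in for the predictive matching described above; the target titles and the 0.6 threshold are assumptions for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical target titles; OCR of aged microfilm often garbles these,
# so exact string matching misses them.
TARGET_TITLES = ["hydrostatic pressure test report", "pipe mill test certificate"]

def best_match(ocr_fragment: str, threshold: float = 0.6) -> str | None:
    """Return the closest target title if its similarity clears the threshold."""
    fragment = ocr_fragment.lower().strip()
    scored = [(SequenceMatcher(None, fragment, t).ratio(), t) for t in TARGET_TITLES]
    score, title = max(scored)
    return title if score >= threshold else None

# A garbled fragment still resolves to the right target title.
print(best_match("hydr0stat1c pres5ure tcst rep0rt"))
```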
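
For the machine learning step, one common approach (assumed here, not necessarily the author's) is TF-IDF features feeding a linear classifier, as in this scikit-learn sketch; the four-document training set and the two categories are purely illustrative, and a production system would train on an engineer-reviewed seed set.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny illustrative training set: text from documents already labeled as
# critical records versus everything else.
texts = [
    "hydrostatic test chart max pressure 1150 psig line 132",
    "invoice for office supplies net 30 days",
    "girth weld radiograph acceptance report api 1104",
    "meeting minutes quarterly budget review",
]
labels = ["record", "other", "record", "other"]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)
print(model.predict(["weld inspection report pressure test"]))  # likely 'record'
```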
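
Finally, for auto-extraction of critical data points such as materials data, pattern-based extraction is a plausible baseline. The attribute patterns below (pipe grade, wall thickness, test pressure) are hypothetical examples of fields an operator might target, not a canonical schema.

```python
import re

PATTERNS = {
    # Hypothetical patterns for attributes that commonly appear on as-built
    # records; real extraction would need many more variants per attribute.
    "pipe_grade": re.compile(r"\bAPI\s*5L\s*(?:Grade\s*)?(X\d{2}|[AB])\b", re.I),
    "wall_thickness_in": re.compile(r'\b(\d+\.\d+)\s*(?:in\.?|")\s*(?:w\.?t\.?|wall)', re.I),
    "test_pressure_psig": re.compile(r"\b(\d{3,4})\s*psig\b", re.I),
}

def extract_attributes(text: str) -> dict[str, str]:
    """Pull whichever target attributes appear in the document text."""
    record = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            record[field] = match.group(1)
    return record

print(extract_attributes('API 5L Grade X52, 0.375" wall, tested to 1150 psig'))
```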

The benefits of using a technology-assisted process include:
  • Reducing the cost of locating critical records by at least 30%, by cutting the amount of labor needed to complete the work.
  • Reducing the risk of not finding critical records, by using more powerful search methods.
  • Creating a robust database of high-quality records and attributes which can be interrogated relative to the objectives.
  • Affirmatively demonstrating to government regulators the preemptive steps taken to fully comply with prescribed policies and procedures.
  • Freeing up engineering resources to work on higher level tasks.
 

About Haystac
Haystac LLC is a software company specializing in solutions that leverage intelligent analytics to create structure out of unstructured data, for use cases including information governance, due diligence readiness, and fraud detection. Headquartered in Boston, MA, Haystac offers two products: Haystac RetenGine, which processes data held in the enterprise environment, and Haystac Web, which processes data on the web. Both products leverage machine learning, other predictive analytics methods, and data extraction tools to improve the understanding and management of data.  Visit us at www.haystac.com or contact us at efritsch@haystac.com, 212 599 5349 (W), or 917 886 4145 (C).
