Monitor is a simple, (substantially) pure-Python software system that implements on-line monitoring of process-control performance in multiple process plants. It is a subpackage of the pctools package.
- monitoring that actually works (experience with Matrikon ProcessDoctor has been poor)
- keep it simple and maintainable by engineers (it must not compete with large vendor-supplied systems)
- can monitor multiple process plants
- automatically analyses and diagnoses problems (as well as allowing ad hoc analysis)
- doesn't overload networks or historians (by using these resources efficiently)
- pure python except for support software such as databases etc.
- use a single collector process (to eliminate multiple simultaneous "writers" and simplify data-storage requirements)
- collects data from an existing on-site collector (e.g. historian or SCADA)
- starts with a "use what data is there, as best as possible" philosophy and develops the site's data-collection capability over time.
- emphasis on automatically and correctly selecting the data to be analysed (Matrikon ProcessDoctor fails badly here, producing useless reports when the loop is not in-control for 100% of the day).
- collects all data from external sources and stores them for later access
- runs in a batch-wise mode, collecting a period of data from each site in sequence
- retrieve small batches, but frequently - to spread the burden of network and historian load
- needs a database of which sites, data sources, variables and collection frequencies are to be used.
- need "drivers" for (say) PI Historian, Citect SCADA, etc. (do any of these implement the historical data functionality of OPC?)
- robust design so that data is not lost (i.e. re-fetches any data whose acquisition failed last time)
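A minimal sketch of the batch-wise collection loop with automatic backfill of failed acquisitions. All names here (`fetch`, `store`, the site records, the cursor field) are hypothetical placeholders, not an existing driver API:

```python
def collect_cycle(sites, fetch, store, now, batch_seconds=3600):
    """One collection pass: for each site, fetch any outstanding
    batches (including ones that failed previously) and store them."""
    for site in sites:
        # site["cursor"] marks the end of the last successfully stored batch,
        # so a failed acquisition is automatically retried next cycle.
        while site["cursor"] + batch_seconds <= now:
            start = site["cursor"]
            end = start + batch_seconds
            try:
                data = fetch(site["name"], start, end)
            except IOError:
                break             # historian/network down: retry next cycle
            store(site["name"], data)
            site["cursor"] = end  # advance only after a successful store
```

Keeping the cursor per site (rather than a global "last run" time) is what makes the single-collector design robust: nothing is marked as collected until it has been stored.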
- range of options: text files (compressed), numpy arrays/recarrays (chunk-size?), pyTables (meta-data!), pySQLite (server-less), SQL Server
- archiving system?
- automatically runs analyses on the collected data, at a frequency that makes sense for the plants' loops
- generates reports (ready for word-processor) and analysis results (stored for later followup analysis/diagnosis)
- instability check on all analog tags
- specific (configured) loop analyses (see the Csense and ProcessDoctor example files attached)
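One simple form the instability check on analog tags could take - an illustrative choice, not a mandated method - is counting sign changes of the mean-removed signal:

```python
def oscillation_index(x):
    """Crude instability screen: fraction of consecutive sample pairs where
    the mean-removed signal changes sign. Near 1.0 suggests rapid cycling,
    near 0.0 a steady or drifting signal."""
    m = sum(x) / len(x)
    d = [v - m for v in x]
    crossings = sum(1 for a, b in zip(d, d[1:]) if a * b < 0)
    return crossings / float(len(x) - 1)
```

A screen like this is cheap enough to run on every analog tag each batch, flagging only the worst tags for the heavier configured loop analyses.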
- automatically calculates or estimates (EKF) critical parameters and performance measures for:
- instrumentation (e.g. that weightometer mass-balances agree)
- process control (what? oscillation index for whole plant areas?)
- generates exception reports for parameters that have moved drastically or past a threshold (i.e. normal defined as within absolute bounds and within rate-of-change bounds).
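The "normal defined as within absolute bounds and within rate-of-change bounds" rule can be sketched directly (parameter names are illustrative):

```python
def exceptions(samples, lo, hi, max_rate, dt):
    """Flag values outside absolute bounds [lo, hi], or whose change
    between consecutive samples exceeds max_rate (units/second).
    Returns (index, reason) tuples for the exception report."""
    out = []
    for i, v in enumerate(samples):
        if not (lo <= v <= hi):
            out.append((i, "absolute"))
        elif i > 0 and abs(v - samples[i - 1]) / dt > max_rate:
            out.append((i, "rate"))
    return out
```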
- just dialog UIs for a database? python dictionaries (imported from files)?
- data must be easily accessible by people using Matlab etc. (preferably in a data file form rather than needing SQL statements etc.)
- e.g. HDF5 file access is built into Matlab, has Python libraries (pyTables or h5py), and has "browsers"
- tools to make this access easier (e.g. show list of variable tags available from a site along with descriptions etc., get set of tags for specified period, ...)
- administrator (integrity of collection and storage processing)
- What does it need? tree-charts, reports, ...?
- pyTables persistent storage (keep it simple!)
- we don't need to absolutely guarantee data validity or retention (we can either go back and get it again later, or just eliminate it from our analysis). Does this mean we shouldn't use it for valuation purposes, since throughput etc. may not be perfectly valid? But neither is the original collected data (some gets lost, and more gets corrupted by compression techniques, etc.).
- use the pyTables hierarchy and metadata to implicitly structure the objects involved as well as store the configuration
- accessible through Matlab, Python and HDF5 browsers
- for a PID loop, store a branch/node containing:
- configuration data for the loop (description, tag (probably the name of the node), ...) - either as separate leaves in the node or as a single config string (dictionary or executable python objects), in each case the config inherits from a base class and just updates the "unusual" attributes rather than listing all attributes
- a single very-long table of the timeseries of raw data (optionally): TimeStamp, PV, CO, SP, FF, mode
- a table of analysis outputs at a nominal period: TimeStamp, analysis1, analysis2 (recognising that each analysis column can contain a recursive hierarchy of data results, e.g. a list of values - could this include multiple arrays as well?)
- the PID fits into a hierarchy of branches
- plant (or plant area)
- Analysis objects (inheritance hierarchy?)
- group (variables) - for cross-correlation?
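A sketch of what one pyTables node per PID loop could look like, with the config held as node attributes and an (initially empty) raw-data table. Column names, group layout and the `description` attribute are illustrative, not a fixed schema:

```python
import tables

class LoopRow(tables.IsDescription):
    # raw timeseries row: timestamp plus the loop's signals
    TimeStamp = tables.Float64Col()
    PV = tables.Float32Col()
    CO = tables.Float32Col()
    SP = tables.Float32Col()
    FF = tables.Float32Col()   # feedforward, if present
    mode = tables.UInt8Col()   # e.g. 0=manual, 1=auto, 2=cascade

def create_loop_node(f, plant, tag, description):
    """Create /<plant>/<tag> with config stored as node attributes
    and an empty raw-data table (tag doubles as the node name)."""
    group = f.create_group("/%s" % plant, tag, createparents=True)
    group._v_attrs.description = description   # config as HDF5 metadata
    f.create_table(group, "raw", LoopRow, "raw 5 s timeseries")
    return group
```

Because the config lives in HDF5 attributes and the data in a plain table, the file stays readable from Matlab and generic HDF5 browsers as well as Python.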
- report generation
- using python API for pdf generation (but can't put this in a word-processor!), or
- as a matplotlib figure, then save (automatically or otherwise) as any format you like
- SVG file?
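The matplotlib route might look like the minimal sketch below (function name and labels are illustrative); the output format - PNG, SVG, PDF, etc. - is chosen simply by the file extension:

```python
import matplotlib
matplotlib.use("Agg")        # headless backend for batch report generation
import matplotlib.pyplot as plt

def save_loop_report(times, pv, sp, path):
    """Plot PV against SP for one loop and save it in whatever
    format the file extension implies (.png, .svg, .pdf, ...)."""
    fig, ax = plt.subplots()
    ax.plot(times, pv, label="PV")
    ax.plot(times, sp, label="SP", linestyle="--")
    ax.set_xlabel("time (s)")
    ax.legend()
    fig.savefig(path)        # format inferred from the extension
    plt.close(fig)
```

Bitmap or SVG output drops straight into a word-processor, which sidesteps the PDF-only limitation of the direct-to-PDF route.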
- if HDF5, need to decide on h5py or pyTables:
- h5py: closer to the underlying HDF5 standard - guarantees of being readable by other HDF5 software (e.g. HDF5 GUIs)
- h5py: nice numpy array behaviour directly from the file (could reasonably do normal list comprehension here?)
- h5py: thread-safe (but no info on locking for multiple-process access) - probably don't need this (one writer only)
- h5py: the HDF5 documentation (which is very relevant) is comprehensive
- pyTables: some database-like selection schemes (if data collection is not at a fixed timestep, then selection based on time will be the primary access!)
- pyTables: peculiarities that corrupt transferability of data (but careful selection of storage format may fix this)
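Whichever library is chosen, time-window selection will be the primary access pattern. With timestamps stored in ascending order (true for append-only collection), numpy's `searchsorted` gives this directly - a sketch:

```python
import numpy as np

def window(timestamps, values, t0, t1):
    """Return the slice of values with t0 <= timestamp < t1.
    Assumes timestamps are sorted ascending (append-only collection)."""
    i = np.searchsorted(timestamps, t0, side="left")
    j = np.searchsorted(timestamps, t1, side="left")
    return values[i:j]
```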
- perhaps store all data from a plant in a single table
- segregating into (say) controller groups for object orientation doesn't make sense when wanting to do correlation analysis across all PVs (for example)
- in each PID "group", store the tag names of where to get the data from the main table + analysis results tables
- would need to "force" all tags to be recorded at one rate (e.g. 5 seconds) regardless of origin rate.
- is lumping all of a plant's data into one table going to impact access speed?
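Forcing all tags onto the one rate could be a zero-order-hold resample onto a fixed grid - consistent with last-known-value historian semantics. A sketch (function name and signature are illustrative):

```python
import numpy as np

def to_fixed_rate(times, values, t0, t1, step=5.0):
    """Zero-order-hold resample of an irregularly sampled tag onto a
    fixed grid [t0, t1) with the given step (seconds)."""
    grid = np.arange(t0, t1, step)
    # index of the last sample at or before each grid point
    idx = np.searchsorted(times, grid, side="right") - 1
    idx = np.clip(idx, 0, len(times) - 1)  # hold first value before first sample
    return grid, values[idx]
```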
- only record "useful" mostly-analog tags
- estimate, for 1 plant: 200 loops (SP, PV, CO) + 400 PV-only --> ~1000 analog tags
- single-precision float (4 bytes), 5 sec frequency, for 1 year --> 4 * 1000 * (12*60*24*365) ≈ 25 GB (without compression)
- several of these plants can fit on a disk (e.g. 10 plants = 250 GB; with 50% compression, 20 plants = 250 GB)
- if the above data is retrieved every hour --> 4 * 1000 * (12*60) ≈ 3 MB per plant per hour
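The storage and bandwidth estimates above, checked as arithmetic:

```python
BYTES = 4                               # single-precision float
TAGS = 1000                             # ~200 loops x 3 tags + 400 PV-only tags
SAMPLES_PER_YEAR = 12 * 60 * 24 * 365   # one sample every 5 seconds

year_bytes = BYTES * TAGS * SAMPLES_PER_YEAR   # ~25 GB/plant/year uncompressed
hour_bytes = BYTES * TAGS * 12 * 60            # ~3 MB/plant/hour retrieved
```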
- How do we handle "high-speed" collection rates (for fast loops) when the on-site historian doesn't collect this fast?
- maybe do "burst" collection, bypassing the historian, but only collecting for a short period suitable for stability analysis of the fast loop. Would probably need to check status of loop before bothering to collect the data. Data would be collected for the tags of the loop only (e.g. 4 tags)
- note: PI documentation contains Visual Basic examples, so translating these to Python is useful
http://www.boddie.org.uk/python/COM.html - tutorial on using COM from Python, with Outlook as the example (helpful e.g. for the PI-SDK)
http://oreilly.com/catalog/9781565926219/ - Python Programming On Win32 (book) containing lots of info on COM programming, DB, etc.
http://www.ecp.cc/pyado.html - Python and ActiveX Data Objects (ADO)
- (2015-04-04 10:21:00, 232.0 KB) [[attachment:Csense_loopdiagnostics.pdf]]
- (2015-04-04 10:21:00, 254.3 KB) [[attachment:Matrikon_ProcessDoctor_loop_report.pdf]]
- (2015-04-04 10:21:00, 1.9 KB) [[attachment:PI_APIpythonInfo.txt]]
- (2015-04-04 10:21:00, 338.7 KB) [[attachment:old_PI-API_linking_python_with_Win32_dll.mht]]