Monitor is a simple, (substantially) pure-Python software system that implements on-line monitoring of process-control performance in multiple process plants. It is a subpackage of the pctools package.
- monitoring that actually works (experience with Matrikon ProcessDoctor has been poor)
- keep it simple and maintainable by engineers (it must not compete with large vendor-supplied systems)
- can monitor multiple process plants
- automatically analyses and diagnoses problems (as well as allowing ad hoc analysis)
- doesn't overload networks or historians (by using these resources efficiently)
- pure python except for support software such as databases etc.
- use a single collector process (to eliminate multiple simultaneous "writers" and simplify data-storage requirements)
- collects data from an existing on-site collector (e.g. historian or SCADA)
- starts with a "use what data is there, as best as possible" philosophy and develops the site's data-collection capability over time.
- emphasis on automatically and correctly selecting the data to be analysed (Matrikon ProcessDoctor fails badly here, producing useless reports when the loop is not in-control for 100% of the day).
- collects all data from external sources and stores them for later access
- runs in a batch-wise mode, collecting a period of data from each site in sequence
- retrieve small batches, but frequently - to spread the burden of network and historian load
- needs a database of which sites, data sources, variables and collection frequencies are to be used.
- need "drivers" for (say) PI Historian, Citect SCADA, etc. (do any of these implement the historical data functionality of OPC?)
- robust design so that data is not lost (i.e. re-fetches any data whose acquisition failed last time)
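A minimal sketch of the batch-wise collection loop with automatic backfill of failed acquisitions. All names here (`fetch`, `store`, the site records, the cursor field) are hypothetical placeholders, not an existing driver API:

```python
def collect_cycle(sites, fetch, store, now, batch_seconds=3600):
    """One collection pass: for each site, fetch any outstanding
    batches (including ones that failed previously) and store them."""
    for site in sites:
        # site["cursor"] marks the end of the last successfully stored batch,
        # so a failed acquisition is automatically retried next cycle.
        while site["cursor"] + batch_seconds <= now:
            start = site["cursor"]
            end = start + batch_seconds
            try:
                data = fetch(site["name"], start, end)
            except IOError:
                break             # historian/network down: retry next cycle
            store(site["name"], data)
            site["cursor"] = end  # advance only after a successful store
```

Keeping the cursor per site (rather than a global "last run" time) is what makes the single-collector design robust: nothing is marked as collected until it has been stored.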
- range of options: text files (compressed), numpy arrays/recarrays (chunk-size?), pyTables (meta-data!), pySQLite (server-less), SQL Server
- archiving system?
- automatically runs analyses on the collected data, at a frequency that makes sense for the plants' loops
- generates reports (ready for word-processor) and analysis results (stored for later followup analysis/diagnosis)
- instability check on all analog tags
- specific (configured) loop analyses (see the Csense and ProcessDoctor example files attached)
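One simple form the instability check on analog tags could take - an illustrative choice, not a mandated method - is counting sign changes of the mean-removed signal:

```python
def oscillation_index(x):
    """Crude instability screen: fraction of consecutive sample pairs where
    the mean-removed signal changes sign. Near 1.0 suggests rapid cycling,
    near 0.0 a steady or drifting signal."""
    m = sum(x) / len(x)
    d = [v - m for v in x]
    crossings = sum(1 for a, b in zip(d, d[1:]) if a * b < 0)
    return crossings / float(len(x) - 1)
```

A screen like this is cheap enough to run on every analog tag each batch, flagging only the worst tags for the heavier configured loop analyses.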
- automatically calculates or estimates (EKF) critical parameters and performance measures for:
- instrumentation (e.g. that weightometer mass-balances agree)
- process control (what? oscillation index for whole plant areas?)
- generates exception reports for parameters that have moved drastically or past a threshold (i.e. normal defined as within absolute bounds and within rate-of-change bounds).
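The "normal defined as within absolute bounds and within rate-of-change bounds" rule can be sketched directly (parameter names are illustrative):

```python
def exceptions(samples, lo, hi, max_rate, dt):
    """Flag values outside absolute bounds [lo, hi], or whose change
    between consecutive samples exceeds max_rate (units/second).
    Returns (index, reason) tuples for the exception report."""
    out = []
    for i, v in enumerate(samples):
        if not (lo <= v <= hi):
            out.append((i, "absolute"))
        elif i > 0 and abs(v - samples[i - 1]) / dt > max_rate:
            out.append((i, "rate"))
    return out
```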
- just dialog UIs for a database? python dictionaries (imported from files)?
- data must be easily accessible by people using Matlab etc. (preferably in a data file form rather than needing SQL statements etc.)
- e.g. HDF5 file access is built into Matlab, has Python libraries (pyTables or h5py), and has "browsers"
- tools to make this access easier (e.g. show list of variable tags available from a site along with descriptions etc., get set of tags for specified period, ...)
- administrator (integrity of collection and storage processing)
- What does it need? tree-charts, reports, ...?
- pyTables persistent storage (keep it simple!)
- we don't need to absolutely guarantee data validity or retention (we can either go back and get it again later, or just eliminate it from our analysis). Does this mean we shouldn't use it for valuation purposes, since throughput etc. may not be perfectly valid? But neither is the original collected data (some gets lost, and more gets corrupted by compression techniques, etc.).
- use the pyTables hierarchy and metadata to implicitly structure the objects involved as well as store the configuration
- accessible through Matlab, Python and HDF5 browsers
- for a PID loop, store a branch/node containing:
- configuration data for the loop (description, tag (probably the name of the node), ...) - either as separate leaves in the node or as a single config string (dictionary or executable python objects), in each case the config inherits from a base class and just updates the "unusual" attributes rather than listing all attributes
- a single very-long table of the timeseries of raw data (optionally): TimeStamp, PV, CO, SP, FF, mode
- a table of analysis outputs at a nominal period: TimeStamp, analysis1, analysis2 (recognising that each analysis column can contain a recursive hierarchy of data results, e.g. a list of values - could this include multiple arrays as well?)
- the PID fits into a hierarchy of branches
- plant (or plant area)
- Analysis objects (inheritance hierarchy?)
- group (variables) - for cross-correlation?
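A sketch of what one pyTables node per PID loop could look like, with the config held as node attributes and an (initially empty) raw-data table. Column names, group layout and the `description` attribute are illustrative, not a fixed schema:

```python
import tables

class LoopRow(tables.IsDescription):
    # raw timeseries row: timestamp plus the loop's signals
    TimeStamp = tables.Float64Col()
    PV = tables.Float32Col()
    CO = tables.Float32Col()
    SP = tables.Float32Col()
    FF = tables.Float32Col()   # feedforward, if present
    mode = tables.UInt8Col()   # e.g. 0=manual, 1=auto, 2=cascade

def create_loop_node(f, plant, tag, description):
    """Create /<plant>/<tag> with config stored as node attributes
    and an empty raw-data table (tag doubles as the node name)."""
    group = f.create_group("/%s" % plant, tag, createparents=True)
    group._v_attrs.description = description   # config as HDF5 metadata
    f.create_table(group, "raw", LoopRow, "raw 5 s timeseries")
    return group
```

Because the config lives in HDF5 attributes and the data in a plain table, the file stays readable from Matlab and generic HDF5 browsers as well as Python.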
- report generation
- using python API for pdf generation (but can't put this in a word-processor!), or
- as a matplotlib figure, then save (automatically or otherwise) as any format you like
- SVG file?
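The matplotlib route might look like the minimal sketch below (function name and labels are illustrative); the output format - PNG, SVG, PDF, etc. - is chosen simply by the file extension:

```python
import matplotlib
matplotlib.use("Agg")        # headless backend for batch report generation
import matplotlib.pyplot as plt

def save_loop_report(times, pv, sp, path):
    """Plot PV against SP for one loop and save it in whatever
    format the file extension implies (.png, .svg, .pdf, ...)."""
    fig, ax = plt.subplots()
    ax.plot(times, pv, label="PV")
    ax.plot(times, sp, label="SP", linestyle="--")
    ax.set_xlabel("time (s)")
    ax.legend()
    fig.savefig(path)        # format inferred from the extension
    plt.close(fig)
```

Bitmap or SVG output drops straight into a word-processor, which sidesteps the PDF-only limitation of the direct-to-PDF route.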
- if HDF5, need to decide on h5py or pyTables:
- h5py: closer to the underlying HDF5 standard - guarantees of being readable by other HDF5 software (e.g. HDF5 GUIs)
- h5py: nice numpy array behaviour directly from the file (could reasonably do normal list comprehension here?)
- h5py: thread-safe (but no info on locking for multiple-process access) - probably don't need this (one writer only)
- h5py: the HDF5 documentation (which is very relevant) is comprehensive
- pyTables: some database-like selection schemes (if data collection is not at a fixed timestep, then selection based on time will be the primary access!)
- pyTables: peculiarities that corrupt transferability of data (but careful selection of storage format may fix this)
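Whichever library is chosen, time-window selection will be the primary access pattern. With timestamps stored in ascending order (true for append-only collection), numpy's `searchsorted` gives this directly - a sketch:

```python
import numpy as np

def window(timestamps, values, t0, t1):
    """Return the slice of values with t0 <= timestamp < t1.
    Assumes timestamps are sorted ascending (append-only collection)."""
    i = np.searchsorted(timestamps, t0, side="left")
    j = np.searchsorted(timestamps, t1, side="left")
    return values[i:j]
```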
- perhaps store all data from a plant in a single table
- segregating into (say) controller groups for object orientation doesn't make sense when wanting to do correlation analysis across all PVs (for example)
- in each PID "group", store the tag names of where to get the data from the main table + analysis results tables
- would need to "force" all tags to be recorded at one rate (e.g. 5 seconds) regardless of origin rate.
- is lumping all of a plant's data into one table going to impact access speed?
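Forcing all tags onto the one rate could be a zero-order-hold resample onto a fixed grid - consistent with last-known-value historian semantics. A sketch (function name and signature are illustrative):

```python
import numpy as np

def to_fixed_rate(times, values, t0, t1, step=5.0):
    """Zero-order-hold resample of an irregularly sampled tag onto a
    fixed grid [t0, t1) with the given step (seconds)."""
    grid = np.arange(t0, t1, step)
    # index of the last sample at or before each grid point
    idx = np.searchsorted(times, grid, side="right") - 1
    idx = np.clip(idx, 0, len(times) - 1)  # hold first value before first sample
    return grid, values[idx]
```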
- only record "useful" mostly-analog tags
- estimate, for 1 plant: 200 loops (SP, PV, CO) + 400 PV-only --> ~1000 analog tags
- single-precision float (4 bytes), 5 sec frequency, for 1 year --> 4 * 1000 * (12*60*24*365) ≈ 25 GB (without compression)
- several of these plants can fit on a disk (e.g. 10 plants = 250 GB; with 50% compression, 20 plants = 250 GB)
- if the above data is retrieved every hour --> 4 * 1000 * (12*60) ≈ 3 MB per plant per hour
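The storage and bandwidth estimates above, checked as arithmetic:

```python
BYTES = 4                               # single-precision float
TAGS = 1000                             # ~200 loops x 3 tags + 400 PV-only tags
SAMPLES_PER_YEAR = 12 * 60 * 24 * 365   # one sample every 5 seconds

year_bytes = BYTES * TAGS * SAMPLES_PER_YEAR   # ~25 GB/plant/year uncompressed
hour_bytes = BYTES * TAGS * 12 * 60            # ~3 MB/plant/hour retrieved
```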
- How do we handle "high-speed" collection rates (for fast loops) when the on-site historian doesn't collect this fast?
- maybe do "burst" collection, bypassing the historian, but only collecting for a short period suitable for stability analysis of the fast loop. Would probably need to check status of loop before bothering to collect the data. Data would be collected for the tags of the loop only (e.g. 4 tags)
- note: PI documentation contains Visual Basic examples, so translating these to Python is useful
http://www.boddie.org.uk/python/COM.html - tutorial on using COM from Python, with Outlook as the example (helpful e.g. for the PI-SDK)
http://oreilly.com/catalog/9781565926219/ - Python Programming On Win32 (book) containing lots of info on COM programming, DB, etc.
http://www.ecp.cc/pyado.html - Python and ActiveX Data Objects (ADO)
- (2015-04-04 10:21:00, 232.0 KB) [[attachment:Csense_loopdiagnostics.pdf]]
- (2015-04-04 10:21:00, 254.3 KB) [[attachment:Matrikon_ProcessDoctor_loop_report.pdf]]
- (2015-04-04 10:21:00, 1.9 KB) [[attachment:PI_APIpythonInfo.txt]]
- (2015-04-04 10:21:00, 338.7 KB) [[attachment:old_PI-API_linking_python_with_Win32_dll.mht]]