User Story #860 (new)
Opened 16 years ago
Last modified 14 years ago
Add ServerErrorEvent subsystem for notification of internal errors
Reported by: | jamoore | Owned by: | jamoore |
---|---|---|---|
Priority: | critical | Milestone: | Unscheduled |
Component: | Services | Keywords: | errors, exceptions, logging, asynchronous |
Cc: | atarkowska, cxallan, jburel, jrswedlow | Story Points: | n.a. |
Sprint: | n.a. | Importance: | n.a. |
Total Remaining Time: | 4.0d | Estimated Remaining Time: | n.a. |
Description (last modified by jmoore)
With more asynchronous logic in the server -- full text search processing, job processing, etc. -- it's difficult for server administrators to find problems when they only show up in the rather bloated logs.
All asynchronous processing subsystems should start raising a ServerErrorEvent in addition to logging an exception. The event can be handled by multiple listeners. E.g.:
- A simple LoggingServerErrorEventListener can write a special log file
- An EmailingServerErrorEventListener can send an email to a specified admin (emails are disabled if the configuration property is set to "", e.g. omero.servererror.email=)
- A WebAdminServerErrorEventListener could pass the information on to the WebAdmin console which administrators could check periodically.
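A rough sketch of how one raised event could reach several listeners, in plain Java. Only the listener names and the omero.servererror.email property come from this ticket; the event fields, the ServerErrorEventListener interface, the ServerErrorDispatcher class and the in-memory "outbox" and "lines" lists are hypothetical stand-ins (a real implementation would more likely hook into the server's existing event mechanism and actually write a file or send mail).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical event: which subsystem failed, and why. */
final class ServerErrorEvent {
    final String source;
    final Throwable cause;

    ServerErrorEvent(String source, Throwable cause) {
        this.source = source;
        this.cause = cause;
    }
}

/** Hypothetical listener contract; not an existing OMERO interface. */
interface ServerErrorEventListener {
    void onServerError(ServerErrorEvent event);
}

/** Stand-in for the "special log file": keeps the lines in memory. */
final class LoggingServerErrorEventListener implements ServerErrorEventListener {
    final List<String> lines = new ArrayList<>();

    @Override
    public void onServerError(ServerErrorEvent event) {
        lines.add(event.source + ": " + event.cause.getMessage());
    }
}

/** Queues a message per event unless the configured address is empty. */
final class EmailingServerErrorEventListener implements ServerErrorEventListener {
    final String recipient;
    final List<String> outbox = new ArrayList<>(); // stand-in for real mail delivery

    EmailingServerErrorEventListener(Properties config) {
        // Missing or empty property means emails are disabled.
        this.recipient = config.getProperty("omero.servererror.email", "").trim();
    }

    boolean isEnabled() {
        return !recipient.isEmpty();
    }

    @Override
    public void onServerError(ServerErrorEvent event) {
        if (!isEnabled()) {
            return;
        }
        outbox.add("To: " + recipient + " | " + event.source + ": " + event.cause.getMessage());
    }
}

/** Fans one event out to every registered listener. */
final class ServerErrorDispatcher {
    // Copy-on-write so asynchronous subsystems can raise events while listeners are added.
    private final List<ServerErrorEventListener> listeners = new CopyOnWriteArrayList<>();

    void register(ServerErrorEventListener listener) {
        listeners.add(listener);
    }

    void raise(ServerErrorEvent event) {
        for (ServerErrorEventListener listener : listeners) {
            try {
                listener.onServerError(event);
            } catch (RuntimeException e) {
                // One failing listener should not keep the others from being notified.
            }
        }
    }
}
```

An asynchronous subsystem would then call raise(...) from the same catch block in which it already logs the exception.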
Events which are of importance include:
- CorruptedFileServerError - when the sha1 of a Pixels or an OriginalFile does not match the value in the DB
- LuceneLockedServerError - some forms of exceptions can leave Lucene in a locked state, making search mostly unusable.
- NoJobProcessorServerError - if all jobs are failing/not being accepted, then JobHandler is essentially useless. The problem may be that all compute nodes are down.
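For the CorruptedFileServerError case, the check itself could look roughly like the following. This is only a sketch using the JDK's MessageDigest; the Sha1Check class and its method names are hypothetical, and it assumes the DB stores the digest as a hex string, which this ticket does not confirm.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: compares file contents against a stored SHA-1 hex digest. */
final class Sha1Check {

    /** Returns the SHA-1 of the data as lowercase hex. */
    static String sha1Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                // Mask to 0..255 so every byte becomes exactly two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a required algorithm on every Java platform.
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    /** True if the computed digest equals the stored one (assumed hex, case-insensitive). */
    static boolean matches(byte[] data, String storedSha1) {
        return sha1Hex(data).equalsIgnoreCase(storedSha1);
    }
}
```

A false result from matches(...) would be the point at which to raise the event, in addition to logging.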
Perhaps an "error level" can determine, for example, whether or not an email will be sent.
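One possible shape for the "error level" idea, as a sketch only. The three event names come from the list above; the ErrorLevel enum, the LeveledServerErrorEvent base class, the EmailPolicy class and in particular the level assigned to each event are assumptions made for illustration and have not been discussed.

```java
/** Hypothetical severity scale; declaration order defines the ranking. */
enum ErrorLevel { WARNING, ERROR, FATAL }

/** Hypothetical base class pairing a message with a severity. */
abstract class LeveledServerErrorEvent {
    final ErrorLevel level;
    final String message;

    LeveledServerErrorEvent(ErrorLevel level, String message) {
        this.level = level;
        this.message = message;
    }
}

// The levels below are guesses, not agreed values.
final class CorruptedFileServerError extends LeveledServerErrorEvent {
    CorruptedFileServerError(String message) {
        super(ErrorLevel.WARNING, message);
    }
}

final class LuceneLockedServerError extends LeveledServerErrorEvent {
    LuceneLockedServerError(String message) {
        super(ErrorLevel.ERROR, message);
    }
}

final class NoJobProcessorServerError extends LeveledServerErrorEvent {
    NoJobProcessorServerError(String message) {
        super(ErrorLevel.FATAL, message);
    }
}

/** Decides whether an event is severe enough to warrant an email. */
final class EmailPolicy {
    private final ErrorLevel threshold;

    EmailPolicy(ErrorLevel threshold) {
        this.threshold = threshold;
    }

    boolean shouldEmail(LeveledServerErrorEvent event) {
        // Enum.compareTo follows declaration order: WARNING < ERROR < FATAL.
        return event.level.compareTo(threshold) >= 0;
    }
}
```

With a threshold of ERROR, a corrupted file would only reach the special log, while a locked Lucene index or a missing job processor would also trigger an email.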
See:
- #1840 - notification needs to find new jobs and start processing
Change History (6)
comment:1 Changed 16 years ago by jmoore
- Milestone changed from Future to 3.0-Beta4
- Owner changed from josh to jmoore
- Priority changed from minor to critical
comment:2 Changed 15 years ago by jmoore
- Milestone changed from OMERO-Beta4 to OMERO-Beta4.1
comment:3 Changed 14 years ago by jmoore
- Cc jburel jrswedlow added
- Milestone changed from Unscheduled to OMERO-Beta4.2
Has not been clearly discussed with the team, but is on the 4.2 roadmap, so moving.
comment:4 Changed 14 years ago by jmoore
r6190 disables JobNotification (and an annoying exception in the log). This will need to get replaced by some other notification system.
comment:5 Changed 14 years ago by jmoore
- Description modified (diff)
comment:6 Changed 14 years ago by jmoore
- Milestone changed from OMERO-Beta4.2 to Unscheduled
Too much work for 4.0. Pushing.