
User Story #860 (new)

Opened 13 years ago

Last modified 10 years ago

Add ServerErrorEvent subsystem for notification of internal errors

Reported by: jamoore Owned by: jamoore
Priority: critical Milestone: Unscheduled
Component: Services Keywords: errors, exceptions, logging, asynchronous
Cc: atarkowska, cxallan, jburel, jrswedlow Story Points: n.a.
Sprint: n.a. Importance: n.a.
Total Remaining Time: 4.0d Estimated Remaining Time: n.a.

Description (last modified by jmoore)

With more asynchronous logic in the server -- full text search processing, job processing, etc. -- it's difficult for server administrators to find problems when they only show up in the rather bloated logs.

All asynchronous processing subsystems should start raising a ServerErrorEvent in addition to logging an exception. The event can be handled by multiple listeners. E.g.:

  • A simple LoggingServerErrorEventListener can write a special log file.
  • An EmailingServerErrorEventListener can send an email to a specified admin; emails are disabled if the configuration property is empty, e.g. omero.servererror.email=
  • A WebAdminServerErrorEventListener could pass the information on to the WebAdmin console, which administrators could check periodically.
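
The listener arrangement described above is essentially the observer pattern: one event, many independent handlers. Below is a minimal Java sketch, assuming a plain in-process publisher rather than any particular OMERO or Spring event mechanism. Only ServerErrorEvent, LoggingServerErrorEventListener, EmailingServerErrorEventListener and the omero.servererror.email property come from this ticket; the ServerErrorEventListener interface, the ServerErrorEventPublisher class and the register/publish/onServerError methods are hypothetical, and mail sending is stubbed out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical event carrying details about an internal server error.
class ServerErrorEvent {
    private final String source;   // subsystem that raised the error, e.g. "FullText"
    private final String message;
    private final Throwable cause; // may be null

    ServerErrorEvent(String source, String message, Throwable cause) {
        this.source = source;
        this.message = message;
        this.cause = cause;
    }

    String getSource() { return source; }
    String getMessage() { return message; }
    Throwable getCause() { return cause; }
}

// Hypothetical listener interface: each listener decides independently what to do.
interface ServerErrorEventListener {
    void onServerError(ServerErrorEvent event);
}

// Hypothetical publisher: a subsystem calls publish() after logging the exception.
class ServerErrorEventPublisher {
    private final List<ServerErrorEventListener> listeners = new CopyOnWriteArrayList<>();

    void register(ServerErrorEventListener listener) { listeners.add(listener); }

    void publish(ServerErrorEvent event) {
        for (ServerErrorEventListener l : listeners) {
            try {
                l.onServerError(event);
            } catch (RuntimeException e) {
                // A failing listener must not stop the others from being notified.
                System.err.println("listener failed: " + e);
            }
        }
    }
}

// Writes a one-line summary to a dedicated error log (kept in memory here).
class LoggingServerErrorEventListener implements ServerErrorEventListener {
    final List<String> lines = new ArrayList<>();

    @Override
    public void onServerError(ServerErrorEvent event) {
        lines.add(event.getSource() + ": " + event.getMessage());
    }
}

// Emails an admin; an empty address (omero.servererror.email=) disables it.
// Sending is stubbed out by recording the message.
class EmailingServerErrorEventListener implements ServerErrorEventListener {
    private final String address;
    final List<String> sent = new ArrayList<>();

    EmailingServerErrorEventListener(String address) { this.address = address; }

    @Override
    public void onServerError(ServerErrorEvent event) {
        if (address == null || address.isEmpty()) {
            return; // emails disabled
        }
        sent.add(address + " <- " + event.getMessage());
    }
}

public class ServerErrorDemo {
    public static void main(String[] args) {
        ServerErrorEventPublisher publisher = new ServerErrorEventPublisher();
        LoggingServerErrorEventListener log = new LoggingServerErrorEventListener();
        EmailingServerErrorEventListener mail = new EmailingServerErrorEventListener("");
        publisher.register(log);
        publisher.register(mail);
        publisher.publish(new ServerErrorEvent("FullText", "Lucene index locked", null));
        System.out.println(log.lines); // [FullText: Lucene index locked]
        System.out.println(mail.sent); // []
    }
}
```

Catching RuntimeException per listener is a deliberate choice in this sketch: a broken mail setup should not prevent the special log file from being written.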

Events which are of importance include:

  • CorruptedFileServerError - when the SHA-1 of a Pixels or OriginalFile object does not match the value in the DB.
  • LuceneLockedServerError - some exceptions can leave Lucene in a locked state, making search mostly unusable.
  • NoJobProcessorServerError - if all jobs are failing or not being accepted, JobHandler is essentially useless. The problem may be that all compute nodes are down.
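
The CorruptedFileServerError check amounts to recomputing a file's SHA-1 and comparing it with the stored value. A minimal Java sketch using the JDK's java.security.MessageDigest; the Sha1Check class and its sha1Hex and checkFile methods are hypothetical, and returning an Optional description stands in for actually raising a ServerErrorEvent.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Optional;

public class Sha1Check {
    // Lower-case hex SHA-1 of the given bytes.
    static String sha1Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b)); // unsigned two-digit hex per byte
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-1.
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical check: an error description if the stored hash does not match.
    // In the server, a present result would be turned into a CorruptedFileServerError.
    static Optional<String> checkFile(String fileName, byte[] contents, String storedSha1) {
        String actual = sha1Hex(contents);
        if (actual.equalsIgnoreCase(storedSha1)) {
            return Optional.empty();
        }
        return Optional.of("CorruptedFileServerError: " + fileName
                + " expected " + storedSha1 + " but was " + actual);
    }

    public static void main(String[] args) {
        byte[] data = "abc".getBytes(StandardCharsets.US_ASCII);
        System.out.println(sha1Hex(data)); // a9993e364706816aba3e25717850c26c9cd0d89d
        // Stored hash matches: no error is raised.
        System.out.println(checkFile("good.tif", data,
                "a9993e364706816aba3e25717850c26c9cd0d89d").isPresent()); // false
        // Stored hash differs (e.g. the file changed on disk): a description is produced.
        checkFile("bad.tif", data, "0000000000000000000000000000000000000000")
                .ifPresent(System.out::println);
    }
}
```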

Perhaps an "error level" can determine, for example, whether or not an email will be sent.
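
One way an error level could decide whether an email is sent is a severity threshold on the emailing listener. A minimal Java sketch; the ErrorLevel values and the ThresholdEmailer class are hypothetical and not part of OMERO, and sending is again stubbed out by recording the message.

```java
import java.util.ArrayList;
import java.util.List;

public class ErrorLevelDemo {
    // Hypothetical severity levels, declared from least to most severe.
    enum ErrorLevel { INFO, WARN, ERROR, FATAL }

    // Hypothetical emailing listener that only "sends" at or above a threshold.
    static class ThresholdEmailer {
        private final ErrorLevel threshold;
        final List<String> sent = new ArrayList<>();

        ThresholdEmailer(ErrorLevel threshold) { this.threshold = threshold; }

        void onServerError(ErrorLevel level, String message) {
            // Enum declaration order doubles as severity order.
            if (level.compareTo(threshold) >= 0) {
                sent.add(level + ": " + message); // stand-in for sending mail
            }
        }
    }

    public static void main(String[] args) {
        ThresholdEmailer emailer = new ThresholdEmailer(ErrorLevel.ERROR);
        emailer.onServerError(ErrorLevel.WARN, "slow indexing");
        emailer.onServerError(ErrorLevel.FATAL, "no job processor available");
        System.out.println(emailer.sent); // [FATAL: no job processor available]
    }
}
```

Declaring the constants in severity order lets compareTo act as the threshold test, so adding a level only means inserting it at the right position.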

See:

  • #1840 - notification needs to find new jobs and start processing

Change History (6)

comment:1 Changed 13 years ago by jmoore

  • Milestone changed from Future to 3.0-Beta4
  • Owner changed from josh to jmoore
  • Priority changed from minor to critical

comment:2 Changed 12 years ago by jmoore

  • Milestone changed from OMERO-Beta4 to OMERO-Beta4.1

Too much work for 4.0. Pushing.

comment:3 Changed 11 years ago by jmoore

  • Cc jburel jrswedlow added
  • Milestone changed from Unscheduled to OMERO-Beta4.2

Has not been clearly discussed with the team, but is on the 4.2 roadmap, so moving.

comment:4 Changed 10 years ago by jmoore

r6190 disables JobNotification (and an annoying exception in the log). This will need to get replaced by some other notification system.

comment:5 Changed 10 years ago by jmoore

  • Description modified (diff)

comment:6 Changed 10 years ago by jmoore

  • Milestone changed from OMERO-Beta4.2 to Unscheduled