
Task #9330 (closed)

Opened 12 years ago

Closed 12 years ago

Last modified 12 years ago

Bug: stateless service memory leak

Reported by: jamoore
Owned by: jamoore
Priority: blocker
Milestone: OMERO-4.4
Component: Performance
Version: n.a.
Keywords: n.a.
Cc: omero-team@…
Resources: n.a.
Referenced By: n.a.
References: n.a.
Remaining Time: 0.0d
Sprint: 2012-07-17 (19)

Description

Chris and Carlos are seeing *Tie instances persist in the server even after the session is closed client-side.

Attachments (4)

ss-9330-jvisualvm.png (108.3 KB) - added by jmoore 12 years ago.
jvisualvm screenshot after running GC.
ss-9330-jvisualvm-parallelgc.png (173.1 KB) - added by jmoore 12 years ago.
Same problem with ParallelGC
ss-9330-jvisualvm-concsweep.png (178.3 KB) - added by jmoore 12 years ago.
Same problem with UseConcMarkSweepGC
ss-9330-jvisualvm-post-fix.png (136.3 KB) - added by jmoore 12 years ago.
regular GC with the fix in place


Change History (10)

comment:1 Changed 12 years ago by jmoore

This is looking less like a memory leak and more like a garbage collection DoS. Using the following script, I can keep the server growing in terms of the number of *Tie instances. As soon as the script is cancelled, I can run a GC and the count drops back down to essentially zero.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import sys
sys.path.insert(0,"lib/python")
import omero
import threading


class Counter(object):
    def __init__(self):
        self.count = 0
        self.lock = threading.RLock()
    def incr(self):
        self.lock.acquire()
        self.count += 1
        self.lock.release()
    def get(self):
        self.lock.acquire()
        count = self.count
        self.lock.release()
        return count

count = Counter()
e = threading.Event()
c = omero.client("localhost")
try:
    c.createSession("root","ome")
    id = c.getSessionId()

    class T(threading.Thread):
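        # Repeatedly join the existing session, request a stateless
        # service, and close the client again until the event is set.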
        def run(self):
            while not e.isSet():
                count.incr()
                j = omero.client("localhost")
                try:
                    j.joinSession(id)
                    j.sf.getShareService()
                finally:
                    j.closeSession()
                    copy = count.get()
                    if copy % 100 == 0:
                        print copy

    threads = [T() for x in range(10)]
    for t in threads:
        t.start()

    try:
        print "Enter any key to exit"
        sys.stdin.read()
    finally:
        e.set()
        for t in threads:
            t.join()

finally:
    c.closeSession()
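
The script hard-codes a local server and the root/ome credentials shown above; ten worker threads join the same session in a loop, and the cumulative join count is printed every 100 iterations.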

comment:2 Changed 12 years ago by jmoore

When the script has printed out 39K, there are 39,322 _IShareTie instances in the server:

$ jmap -histo 59088 | grep Tie
  54:         39322         629152  omero.api._IShareTie
 615:            40            640  omero.api._ServiceFactoryTie
2011:             1             16  omero.api._IQueryTie
2387:             1             16  omero.grid._SharedResourcesTie
2502:             1             16  omero.api._IConfigTie

comment:3 Changed 12 years ago by jmoore

The instances don't decrease until a GC is forced from jvisualvm:

$ jmap -histo 59088 | grep Tie
1560:             2             32  omero.api._ServiceFactoryTie
2206:             1             16  omero.grid._SharedResourcesTie
2271:             1             16  omero.api._IConfigTie
2380:             1             16  omero.api._IQueryTie
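
As an aside, the same check can also be made from the command line: jmap's :live option forces a full collection before taking the histogram, so only reachable objects are counted (59088 being the same PID used above):

$ jmap -histo:live 59088 | grep Tie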

Changed 12 years ago by jmoore

jvisualvm screenshot after running GC.

comment:4 Changed 12 years ago by jmoore

So it does now look like ServiceFactoryI instances are not being properly cleaned up, and since they aren't, the stateless services they hold aren't being removed either. Currently working on a refactoring.

Changed 12 years ago by jmoore

Same problem with ParallelGC

Changed 12 years ago by jmoore

Same problem with UseConcMarkSweepGC

Changed 12 years ago by jmoore

regular GC with the fix in place

comment:5 Changed 12 years ago by jmoore

  • Resolution set to fixed
  • Status changed from new to closed

Did find a leak with respect to sf.destroy(). Fixed and included in today's build:

commit 200dd334714e248335ffe6eddfc518e160605a5c
Author: jmoore <josh@glencoesoftware.com>
Date:   Wed Jul 11 14:41:23 2012

    Fix memory leak on joinSession (Fix #9330)
    
    Since a Tie is created per every ServiceFactoryI,
    we need to make sure that on destroy() all stateless
    services are properly closed.
    
    High-level changes made:
    
     * SessionManagerI now is responsible for ServantHolder creation
     * ServantHolders are responsible for tracking clientIds
     * SessionIs (incl.SF) now cleanup stateless services on destroy
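
A minimal sketch of the cleanup idea described above, assuming hypothetical names (ServantRegistry, StatelessServant); it is not the actual OMERO code, only an illustration of releasing stateless servants from the destroy() path so their Tie wrappers become collectable:

// Hypothetical sketch only -- not the actual OMERO implementation.
// All names below (ServantRegistry, StatelessServant) are made up.
import java.util.ArrayList;
import java.util.List;

class ServantRegistry {

    interface StatelessServant {
        void close();
    }

    private final List<StatelessServant> servants =
            new ArrayList<StatelessServant>();

    // Record each stateless service (IShare, IQuery, ...) created for a
    // session so it can be cleaned up later.
    synchronized void register(StatelessServant servant) {
        servants.add(servant);
    }

    // Called from the session/service-factory destroy() path. Skipping this
    // step leaves the servants strongly reachable from the registry, which
    // is the kind of leak observed in this ticket.
    synchronized void destroy() {
        for (StatelessServant servant : servants) {
            servant.close();
        }
        servants.clear();
    }
}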

comment:6 Changed 12 years ago by jmoore <josh@…>

  • Remaining Time set to 0

(In [200dd334714e248335ffe6eddfc518e160605a5c/ome.git] on branch develop) Fix memory leak on joinSession (Fix #9330)

Since a Tie is created per every ServiceFactoryI,
we need to make sure that on destroy() all stateless
services are properly closed.

High-level changes made:

 * SessionManagerI now is responsible for ServantHolder creation
 * ServantHolders are responsible for tracking clientIds
 * SessionIs (incl. SF) now cleanup stateless services on destroy
