
Task #9330 (closed)

Opened 7 years ago

Closed 7 years ago

Last modified 7 years ago

Bug: stateless service memory leak

Reported by: jamoore Owned by: jamoore
Priority: blocker Milestone: OMERO-4.4
Component: Performance Version: n.a.
Keywords: n.a. Cc: omero-team@…
Resources: n.a. Referenced By: n.a.
References: n.a. Remaining Time: 0.0d
Sprint: 2012-07-17 (19)

Description

Chris and Carlos are seeing *Tie instances persist in the server even after the session is closed on the client side.

Attachments (4)

ss-9330-jvisualvm.png (108.3 KB) - added by jmoore 7 years ago.
jvisualvm screenshot after running GC.
ss-9330-jvisualvm-parallelgc.png (173.1 KB) - added by jmoore 7 years ago.
Same problem with ParallelGC
ss-9330-jvisualvm-concsweep.png (178.3 KB) - added by jmoore 7 years ago.
Same problem with UseConcMarkSweepGC
ss-9330-jvisualvm-post-fix.png (136.3 KB) - added by jmoore 7 years ago.
regular GC with the fix in place


Change History (10)

comment:1 Changed 7 years ago by jmoore

This is looking less like a memory leak and more like a garbage-collection DoS. Using the following script, I can keep the server growing (in terms of the number of *Tie instances). As soon as the script is cancelled, I can run GC and the count drops back down to essentially zero.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import sys
sys.path.insert(0,"lib/python")
import omero
import threading


class Counter(object):
    def __init__(self):
        self.count = 0
        self.lock = threading.RLock()
    def incr(self):
        self.lock.acquire()
        self.count += 1
        self.lock.release()
    def get(self):
        self.lock.acquire()
        count = self.count
        self.lock.release()
        return count

count = Counter()
e = threading.Event()
c = omero.client("localhost")
try:
    c.createSession("root","ome")
    id = c.getSessionId()

    class T(threading.Thread):
        def run(self):
            while not e.isSet():
                count.incr()
                # Create the client outside the try block so closeSession()
                # is only called once the client actually exists.
                j = omero.client("localhost")
                try:
                    j.joinSession(id)
                    j.sf.getShareService()
                finally:
                    j.closeSession()
                    copy = count.get()
                    if copy % 100 == 0:
                        print copy

    threads = [T() for x in range(10)]
    for t in threads:
        t.start()

    try:
        print "Enter any key to exit"
        sys.stdin.read()
    finally:
        e.set()
        for t in threads:
            t.join()

finally:
    c.closeSession()

comment:2 Changed 7 years ago by jmoore

When the script has printed out 39K, there are 39,322 _IShareTie instances in the server:

$ jmap -histo 59088 | grep Tie
  54:         39322         629152  omero.api._IShareTie
 615:            40            640  omero.api._ServiceFactoryTie
2011:             1             16  omero.api._IQueryTie
2387:             1             16  omero.grid._SharedResourcesTie
2502:             1             16  omero.api._IConfigTie
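
A quick way to watch the same count without jvisualvm is to poll the jmap histogram while the reproduction script runs. This is only a minimal sketch, not part of the original workflow: it assumes jmap is on the PATH and that 59088 is the server PID taken from the output above.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Illustrative sketch (not from the original ticket): poll the jmap
# histogram and report the live _IShareTie count every few seconds.
# 59088 is the example server PID from the histogram above; adjust
# the PID and interval for a local deployment.
import re
import subprocess
import time

pid = "59088"
while True:
    histo = subprocess.check_output(["jmap", "-histo", pid])
    for line in histo.splitlines():
        if "_IShareTie" in line:
            # histogram rows look like:
            #   54:         39322         629152  omero.api._IShareTie
            print re.split(r"\s+", line.strip())[1], "live _IShareTie instances"
    time.sleep(5)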

comment:3 Changed 7 years ago by jmoore

The instances don't decrease until a GC is forced from jvisualvm:

$ jmap -histo 59088 | grep Tie
1560:             2             32  omero.api._ServiceFactoryTie
2206:             1             16  omero.grid._SharedResourcesTie
2271:             1             16  omero.api._IConfigTie
2380:             1             16  omero.api._IQueryTie
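
For reference, the same full collection can be triggered from the command line rather than the jvisualvm button, e.g. via jcmd (JDK 7+). A minimal sketch, assuming the JDK tools are on the PATH and that 59088 is the same server PID as above:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Illustrative sketch (not from the original ticket): force a full GC on
# the running JVM and re-run the histogram, equivalent to pressing
# "Perform GC" in jvisualvm. Assumes jcmd/jmap are on the PATH and that
# 59088 is the server PID used above.
import subprocess

pid = "59088"
subprocess.check_call(["jcmd", pid, "GC.run"])            # trigger a full collection
histo = subprocess.check_output(["jmap", "-histo", pid])
print "\n".join(l for l in histo.splitlines() if "Tie" in l)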

Changed 7 years ago by jmoore

jvisualvm screenshot after running GC.

comment:4 Changed 7 years ago by jmoore

So it does now look like ServiceFactoryI instances are not being properly cleaned up, and since they aren't cleaned up, their stateless services aren't being removed either. Currently working on a refactoring.

Changed 7 years ago by jmoore

Same problem with ParallelGC

Changed 7 years ago by jmoore

Same problem with UseConcMarkSweepGC

Changed 7 years ago by jmoore

regular GC with the fix in place

comment:5 Changed 7 years ago by jmoore

  • Resolution set to fixed
  • Status changed from new to closed

Did find a leak wrt sf.destroy(). Fixed and in today's build:

commit 200dd334714e248335ffe6eddfc518e160605a5c
Author: jmoore <josh@glencoesoftware.com>
Date:   Wed Jul 11 14:41:23 2012

    Fix memory leak on joinSession (Fix #9330)
    
    Since a Tie is created per every ServiceFactoryI,
    we need to make sure that on destroy() all stateless
    services are properly closed.
    
    High-level changes made:
    
     * SessionManagerI now is responsible for ServantHolder creation
     * ServantHolders are responsible for tracking clientIds
     * SessionIs (incl.SF) now cleanup stateless services on destroy

comment:6 Changed 7 years ago by jmoore <josh@…>

  • Remaining Time set to 0

(In [200dd334714e248335ffe6eddfc518e160605a5c/ome.git] on branch develop) Fix memory leak on joinSession (Fix #9330)

Since a Tie is created per every ServiceFactoryI,
we need to make sure that on destroy() all stateless
services are properly closed.

High-level changes made:

 * SessionManagerI now is responsible for ServantHolder creation
 * ServantHolders are responsible for tracking clientIds
 * SessionIs (incl.SF) now cleanup stateless services on destroy
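
To make the last bullet concrete: the pattern is that a per-session holder keeps a reference to every stateless servant handed out for a client, and destroy() closes them all so the corresponding Ties become collectable. Below is a rough Python stand-in for that pattern; the real classes live in ome.git and are Java, and all names here are illustrative rather than the actual API.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Rough stand-in (not the actual ome.git Java code) for the cleanup
# pattern described in the commit: track every stateless servant handed
# out for a client and close them all when the service factory is
# destroyed, so the corresponding Ties can be garbage collected.

class ServantHolder(object):
    """Tracks the servants registered for one client id."""
    def __init__(self, client_id):
        self.client_id = client_id
        self.servants = []

    def add(self, servant):
        self.servants.append(servant)

    def close_all(self):
        # Closing each servant drops the last server-side reference to
        # its Tie, so a later GC can actually reclaim it.
        while self.servants:
            self.servants.pop().close()


class StatelessServant(object):
    """Placeholder for a stateless service such as IShare."""
    def __init__(self, name):
        self.name = name

    def close(self):
        pass  # release any server-side resources


class SessionStandIn(object):
    """Stand-in for the per-connection service factory (ServiceFactoryI)."""
    def __init__(self, holder):
        self.holder = holder

    def get_share_service(self):
        servant = StatelessServant("IShare")
        self.holder.add(servant)   # every lookup is tracked in the holder
        return servant

    def destroy(self):
        # The fix: destroy() now closes all tracked stateless services.
        self.holder.close_all()


holder = ServantHolder("client-1")
session = SessionStandIn(holder)
session.get_share_service()
session.destroy()   # the servant (and hence its Tie) is released here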
