Notice: In order to edit this ticket you need to be either: a Product Owner, The owner or the reporter of the ticket, or, in case of a Task not yet assigned, a team_member"

Bug #1088 (closed)

Opened 11 years ago

Closed 11 years ago

Slow query

Reported by: atarkowska Owned by: jamoore
Priority: critical Cc: cxallan
Sprint: n.a.
Total Remaining Time: n.a.

Description

This query is very slow:

[select el from EventLog el left outer join fetch el.event ev where
el.entityType in ('ome.model.core.Pixels', 'ome.model.core.Image',
'ome.model.containers.Dataset',
el.entityType='ome.model.containers.Project') and
ev.experimenter='100' and ev.time > '2008-09-01 00:00:00.000' and
ev.time < '2008-09-30 23:59:59.999', null]

Change History (5)

comment:1 Changed 11 years ago by atarkowska

  • Priority changed from minor to critical

Java exaple:

Login l = new Login("test", "ome");
Server s = new Server("localhost", 1099);
ServiceFactory sf = new ServiceFactory(s, l);

IAdmin ia = sf.getAdminService();
IQuery iq = sf.getQueryService();
EventContext ctx = ia.getEventContext();
System.out.println(ctx.getCurrentSessionUuid());
Long id = ctx.getCurrentUserId();

Timestamp start = new Timestamp(1218529874000L); //"2008-09-01 00:00:00.000");

Timestamp end = new Timestamp(1221121874000L);//"2008-09-30 23:59:59.999");

QueryParameter qp1 = new QueryParameter("start", Timestamp.class, start);
QueryParameter qp2 = new QueryParameter("end", Timestamp.class, end);
QueryParameter uid = new QueryParameter("uid", Long.class, id);
Parameters p = new Parameters();
p.add(qp1);
p.add(qp2);
p.add(uid);

String sql = "select el from EventLog el left outer join fetch el.event ev " +
      "where el.entityType in ('ome.model.core.Pixels', 'ome.model.core.Image', " +
      "'ome.model.containers.Dataset', el.entityType='ome.model.containers.Project') " +
      "and ev.experimenter.id=:uid and ev.time > :start and ev.time < :end";

System.out.println(iq.findAllByQuery(sql, p).size());

Python example:

c = omero.client(['--Ice.Config='+ICE_CONFIG])
c.createSession("test", "ome")

q = c.sf.getQueryService()
admin = c.sf.getAdminService()
cx = admin.getEventContext()
print cx

p = omero.sys.Parameters()
p.map = {}
p.map["uid"] = omero.RLong(cx.userId)
p.map['start'] = start = omero.RTime(1218529874000)
p.map['end'] = end = omero.RTime(1221121874000)

sql = "select el from EventLog el left outer join fetch el.event ev " \
      "where el.entityType in ('ome.model.core.Pixels', 'ome.model.core.Image', " \
      "'ome.model.containers.Dataset', el.entityType='ome.model.containers.Project') " \
      "and ev.experimenter.id=:uid and ev.time > :start and ev.time < :end"
print len(q.findAllByQuery(sql, p))

comment:2 Changed 11 years ago by atarkowska

Both are slow on mage.

comment:3 Changed 11 years ago by atarkowska

comment:4 Changed 11 years ago by atarkowska

  • Cc callan added; jmoore removed
  • Owner changed from atarkowska to jmoore

comment:5 Changed 11 years ago by jmoore

  • Milestone changed from 3.0-Beta4 to 3.0-Beta3.1
  • Resolution set to fixed
  • Status changed from new to closed

See r2803 for how to rewrite this query for more performance.


The SQL for this query is:

   select
        eventlog0_.id as id125_0_,
        event1_.id as id33_1_,
        eventlog0_.action as action125_0_,
        eventlog0_.external_id as external6_125_0_,
        eventlog0_.permissions as permissi3_125_0_,
        eventlog0_.entityId as entityId125_0_,
        eventlog0_.entityType as entityType125_0_,
        eventlog0_.event as event125_0_,
        event1_.containingEvent as containi5_33_1_,
        event1_.external_id as external6_33_1_,
        event1_.permissions as permissi2_33_1_,
        event1_.experimenter as experime7_33_1_,
        event1_.experimenterGroup as experime8_33_1_,
        event1_.session as session33_1_,
        event1_.status as status33_1_,
        event1_.time as time33_1_,
        event1_.type as type33_1_ 
    from
        eventlog eventlog0_ 
    left outer join
        event event1_ 
            on eventlog0_.event=event1_.id 
    where
        ( 
            eventlog0_.entityType in (
                'ome.model.core.Pixels' , 'ome.model.core.Image' , 'ome.model.containers.Dataset'
            )
        ) 
        and event1_.experimenter=0
        and event1_.time> '2008-08-12 10:31:14'
        and event1_.time< '2008-09-11 10:31:14';

which you can generate by adding:

hibernate.show_sql=true
hibernate.format_sql=true

to dist/etc/hibernate.properties. If you also want the bind parameters, add:

  <category name="org.hibernate.type">
    <priority value="DEBUG"/>
  </category>

to dist/etc/log4j.xml.

This query is EXPLAIN ANALYZE'd by postgres as:

$ psql mage < hib-el.sql 
                                                                                QUERY PLAN                                                                                
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..2424.66 rows=1 width=547) (actual time=197.989..192080.686 rows=1585 loops=1)
   Join Filter: (eventlog0_.event = event1_.id)
   ->  Seq Scan on event event1_  (cost=0.00..650.20 rows=1 width=217) (actual time=0.049..24.081 rows=3939 loops=1)
         Filter: ((experimenter = 0) AND ("time" > '2008-08-12 10:31:14'::timestamp without time zone) AND ("time" < '2008-09-11 10:31:14'::timestamp without time zone))
   ->  Seq Scan on eventlog eventlog0_  (cost=0.00..1763.25 rows=897 width=330) (actual time=0.025..47.939 rows=2514 loops=3939)
         Filter: ((entitytype)::text = ANY (('{ome.model.core.Pixels,ome.model.core.Image,ome.model.containers.Dataset}'::character varying[])::text[]))
 Total runtime: 192082.860 ms
(7 rows)

which shows two sequential scans on two of our biggest tables: event and eventlog, causing the query to take almost 200 seconds.

If we rewrite this with a subselect as:

EXPLAIN ANALYZE
   select
        eventlog0_.id as id125_0_,
        event1_.id as id33_1_,
        eventlog0_.action as action125_0_,
        eventlog0_.external_id as external6_125_0_,
        eventlog0_.permissions as permissi3_125_0_,
        eventlog0_.entityId as entityId125_0_,
        eventlog0_.entityType as entityType125_0_,
        eventlog0_.event as event125_0_,
        event1_.containingEvent as containi5_33_1_,
        event1_.external_id as external6_33_1_,
        event1_.permissions as permissi2_33_1_,
        event1_.experimenter as experime7_33_1_,
        event1_.experimenterGroup as experime8_33_1_,
        event1_.session as session33_1_,
        event1_.status as status33_1_,
        event1_.time as time33_1_,
        event1_.type as type33_1_ 
    from
        eventlog eventlog0_ , 
        event event1_ 
    where
        eventlog0_.event = event1_.id and
        ( 
            eventlog0_.entityType in (
                'ome.model.core.Pixels' , 'ome.model.core.Image' , 'ome.model.containers.Dataset'
            )
        ) and
        event1_.id in (select id from event where
            experimenter=0
            and time> '2008-08-12 10:31:14'
            and time< '2008-09-11 10:31:14');

then the EXPLAIN ANALYZE output is:

$ psql mage < sub-el.sql 
                                                                                         QUERY PLAN                                                                                         
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=632.65..2405.30 rows=1 width=547) (actual time=17.575..68.536 rows=1585 loops=1)
   ->  Hash Join  (cost=632.65..2397.02 rows=1 width=338) (actual time=17.476..60.453 rows=1585 loops=1)
         Hash Cond: (eventlog0_.event = event.id)
         ->  Seq Scan on eventlog eventlog0_  (cost=0.00..1761.00 rows=896 width=330) (actual time=0.074..41.316 rows=2474 loops=1)
               Filter: ((entitytype)::text = ANY (('{ome.model.core.Pixels,ome.model.core.Image,ome.model.containers.Dataset}'::character varying[])::text[]))
         ->  Hash  (cost=632.64..632.64 rows=1 width=8) (actual time=17.372..17.372 rows=3939 loops=1)
               ->  HashAggregate  (cost=632.63..632.64 rows=1 width=8) (actual time=14.422..15.436 rows=3939 loops=1)
                     ->  Seq Scan on event  (cost=0.00..632.62 rows=1 width=8) (actual time=0.043..10.068 rows=3939 loops=1)
                           Filter: ((experimenter = 0) AND ("time" > '2008-08-12 10:31:14'::timestamp without time zone) AND ("time" < '2008-09-11 10:31:14'::timestamp without time zone))
   ->  Index Scan using event_pkey on event event1_  (cost=0.00..8.27 rows=1 width=217) (actual time=0.003..0.004 rows=1 loops=1585)
         Index Cond: (eventlog0_.event = event1_.id)
 Total runtime: 69.274 ms
(12 rows)

or > 2700x faster.

In Python/HQL, this looks like:

        sql1 = "select el from EventLog el left outer join fetch el.event ev " \
               "where el.entityType in ('ome.model.core.Pixels', 'ome.model.core.Image', " \
               "'ome.model.containers.Dataset', 'ome.model.containers.Project') " \
               "and ev.id in (select id from Event where experimenter.id=:uid and time > :start and time < :end)"


        sql2 = "select el from EventLog el left outer join fetch el.event ev " \
               "where el.entityType in ('ome.model.core.Pixels', 'ome.model.core.Image', " \
               "'ome.model.containers.Dataset', 'ome.model.containers.Project') " \
               "and ev.experimenter.id=:uid and ev.time > :start and ev.time < :end"

        import time
        sql1_start = time.time()
        l = q.findAllByQuery(sql1, p)
        sql1_stop = time.time()
        print "\nSQL1: %s objects in %s seconds" % (str(len(l)), str(sql1_stop - sql1_start))

        sql2_start = time.time()
        l = q.findAllByQuery(sql2, p)
        sql2_stop = time.time()
        print "SQL2: %s objects in %s seconds\n" % (str(len(l)), str(sql2_stop - sql2_start))

with output of:

SQL1: 1592 objects in 1.52403688431 seconds
SQL2: 1592 objects in 193.370220184 seconds

which is considerably less of a speed up due to the Hibernate and network overhead, but still significant.

Note: See TracTickets for help on using tickets. You may also have a look at Agilo extensions to the ticket.

1.3.13-PRO © 2008-2011 Agilo Software all rights reserved (this page was served in: 0.76176 sec.)

We're Hiring!