Bug #1088 (closed)
Opened 11 years ago
Closed 11 years ago
Slow query
| Reported by: | atarkowska | Owned by: | jamoore |
|---|---|---|---|
| Priority: | critical | Cc: | cxallan |
| Sprint: | n.a. | ||
| Total Remaining Time: | n.a. |
Description
This query is very slow:
[select el from EventLog el left outer join fetch el.event ev where
el.entityType in ('ome.model.core.Pixels', 'ome.model.core.Image',
'ome.model.containers.Dataset',
el.entityType='ome.model.containers.Project') and
ev.experimenter='100' and ev.time > '2008-09-01 00:00:00.000' and
ev.time < '2008-09-30 23:59:59.999', null]
Change History (5)
comment:1 Changed 11 years ago by atarkowska
- Priority changed from minor to critical
comment:2 Changed 11 years ago by atarkowska
Both are slow on mage.
comment:3 Changed 11 years ago by atarkowska
comment:4 Changed 11 years ago by atarkowska
- Cc callan added; jmoore removed
- Owner changed from atarkowska to jmoore
comment:5 Changed 11 years ago by jmoore
- Milestone changed from 3.0-Beta4 to 3.0-Beta3.1
- Resolution set to fixed
- Status changed from new to closed
See r2803 for how to rewrite this query for more performance.
The SQL for this query is:
select
eventlog0_.id as id125_0_,
event1_.id as id33_1_,
eventlog0_.action as action125_0_,
eventlog0_.external_id as external6_125_0_,
eventlog0_.permissions as permissi3_125_0_,
eventlog0_.entityId as entityId125_0_,
eventlog0_.entityType as entityType125_0_,
eventlog0_.event as event125_0_,
event1_.containingEvent as containi5_33_1_,
event1_.external_id as external6_33_1_,
event1_.permissions as permissi2_33_1_,
event1_.experimenter as experime7_33_1_,
event1_.experimenterGroup as experime8_33_1_,
event1_.session as session33_1_,
event1_.status as status33_1_,
event1_.time as time33_1_,
event1_.type as type33_1_
from
eventlog eventlog0_
left outer join
event event1_
on eventlog0_.event=event1_.id
where
(
eventlog0_.entityType in (
'ome.model.core.Pixels' , 'ome.model.core.Image' , 'ome.model.containers.Dataset'
)
)
and event1_.experimenter=0
and event1_.time> '2008-08-12 10:31:14'
and event1_.time< '2008-09-11 10:31:14';
which you can generate by adding:
hibernate.show_sql=true hibernate.format_sql=true
to dist/etc/hibernate.properties. If you also want the bind parameters, add:
<category name="org.hibernate.type">
<priority value="DEBUG"/>
</category>
to dist/etc/log4j.xml.
This query is EXPLAIN ANALYZE'd by postgres as:
$ psql mage < hib-el.sql
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..2424.66 rows=1 width=547) (actual time=197.989..192080.686 rows=1585 loops=1)
Join Filter: (eventlog0_.event = event1_.id)
-> Seq Scan on event event1_ (cost=0.00..650.20 rows=1 width=217) (actual time=0.049..24.081 rows=3939 loops=1)
Filter: ((experimenter = 0) AND ("time" > '2008-08-12 10:31:14'::timestamp without time zone) AND ("time" < '2008-09-11 10:31:14'::timestamp without time zone))
-> Seq Scan on eventlog eventlog0_ (cost=0.00..1763.25 rows=897 width=330) (actual time=0.025..47.939 rows=2514 loops=3939)
Filter: ((entitytype)::text = ANY (('{ome.model.core.Pixels,ome.model.core.Image,ome.model.containers.Dataset}'::character varying[])::text[]))
Total runtime: 192082.860 ms
(7 rows)
which shows two sequential scans on two of our biggest tables: event and eventlog, causing the query to take almost 200 seconds.
If we rewrite this with a subselect as:
EXPLAIN ANALYZE
select
eventlog0_.id as id125_0_,
event1_.id as id33_1_,
eventlog0_.action as action125_0_,
eventlog0_.external_id as external6_125_0_,
eventlog0_.permissions as permissi3_125_0_,
eventlog0_.entityId as entityId125_0_,
eventlog0_.entityType as entityType125_0_,
eventlog0_.event as event125_0_,
event1_.containingEvent as containi5_33_1_,
event1_.external_id as external6_33_1_,
event1_.permissions as permissi2_33_1_,
event1_.experimenter as experime7_33_1_,
event1_.experimenterGroup as experime8_33_1_,
event1_.session as session33_1_,
event1_.status as status33_1_,
event1_.time as time33_1_,
event1_.type as type33_1_
from
eventlog eventlog0_ ,
event event1_
where
eventlog0_.event = event1_.id and
(
eventlog0_.entityType in (
'ome.model.core.Pixels' , 'ome.model.core.Image' , 'ome.model.containers.Dataset'
)
) and
event1_.id in (select id from event where
experimenter=0
and time> '2008-08-12 10:31:14'
and time< '2008-09-11 10:31:14');
then the EXPLAIN ANALYZE output is:
$ psql mage < sub-el.sql
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=632.65..2405.30 rows=1 width=547) (actual time=17.575..68.536 rows=1585 loops=1)
-> Hash Join (cost=632.65..2397.02 rows=1 width=338) (actual time=17.476..60.453 rows=1585 loops=1)
Hash Cond: (eventlog0_.event = event.id)
-> Seq Scan on eventlog eventlog0_ (cost=0.00..1761.00 rows=896 width=330) (actual time=0.074..41.316 rows=2474 loops=1)
Filter: ((entitytype)::text = ANY (('{ome.model.core.Pixels,ome.model.core.Image,ome.model.containers.Dataset}'::character varying[])::text[]))
-> Hash (cost=632.64..632.64 rows=1 width=8) (actual time=17.372..17.372 rows=3939 loops=1)
-> HashAggregate (cost=632.63..632.64 rows=1 width=8) (actual time=14.422..15.436 rows=3939 loops=1)
-> Seq Scan on event (cost=0.00..632.62 rows=1 width=8) (actual time=0.043..10.068 rows=3939 loops=1)
Filter: ((experimenter = 0) AND ("time" > '2008-08-12 10:31:14'::timestamp without time zone) AND ("time" < '2008-09-11 10:31:14'::timestamp without time zone))
-> Index Scan using event_pkey on event event1_ (cost=0.00..8.27 rows=1 width=217) (actual time=0.003..0.004 rows=1 loops=1585)
Index Cond: (eventlog0_.event = event1_.id)
Total runtime: 69.274 ms
(12 rows)
or > 2700x faster.
In Python/HQL, this looks like:
sql1 = "select el from EventLog el left outer join fetch el.event ev " \
"where el.entityType in ('ome.model.core.Pixels', 'ome.model.core.Image', " \
"'ome.model.containers.Dataset', 'ome.model.containers.Project') " \
"and ev.id in (select id from Event where experimenter.id=:uid and time > :start and time < :end)"
sql2 = "select el from EventLog el left outer join fetch el.event ev " \
"where el.entityType in ('ome.model.core.Pixels', 'ome.model.core.Image', " \
"'ome.model.containers.Dataset', 'ome.model.containers.Project') " \
"and ev.experimenter.id=:uid and ev.time > :start and ev.time < :end"
import time
sql1_start = time.time()
l = q.findAllByQuery(sql1, p)
sql1_stop = time.time()
print "\nSQL1: %s objects in %s seconds" % (str(len(l)), str(sql1_stop - sql1_start))
sql2_start = time.time()
l = q.findAllByQuery(sql2, p)
sql2_stop = time.time()
print "SQL2: %s objects in %s seconds\n" % (str(len(l)), str(sql2_stop - sql2_start))
with output of:
SQL1: 1592 objects in 1.52403688431 seconds SQL2: 1592 objects in 193.370220184 seconds
which is considerably less of a speed up due to the Hibernate and network overhead, but still significant.
Java exaple:
Login l = new Login("test", "ome"); Server s = new Server("localhost", 1099); ServiceFactory sf = new ServiceFactory(s, l); IAdmin ia = sf.getAdminService(); IQuery iq = sf.getQueryService(); EventContext ctx = ia.getEventContext(); System.out.println(ctx.getCurrentSessionUuid()); Long id = ctx.getCurrentUserId(); Timestamp start = new Timestamp(1218529874000L); //"2008-09-01 00:00:00.000"); Timestamp end = new Timestamp(1221121874000L);//"2008-09-30 23:59:59.999"); QueryParameter qp1 = new QueryParameter("start", Timestamp.class, start); QueryParameter qp2 = new QueryParameter("end", Timestamp.class, end); QueryParameter uid = new QueryParameter("uid", Long.class, id); Parameters p = new Parameters(); p.add(qp1); p.add(qp2); p.add(uid); String sql = "select el from EventLog el left outer join fetch el.event ev " + "where el.entityType in ('ome.model.core.Pixels', 'ome.model.core.Image', " + "'ome.model.containers.Dataset', el.entityType='ome.model.containers.Project') " + "and ev.experimenter.id=:uid and ev.time > :start and ev.time < :end"; System.out.println(iq.findAllByQuery(sql, p).size());Python example:
c = omero.client(['--Ice.Config='+ICE_CONFIG]) c.createSession("test", "ome") q = c.sf.getQueryService() admin = c.sf.getAdminService() cx = admin.getEventContext() print cx p = omero.sys.Parameters() p.map = {} p.map["uid"] = omero.RLong(cx.userId) p.map['start'] = start = omero.RTime(1218529874000) p.map['end'] = end = omero.RTime(1221121874000) sql = "select el from EventLog el left outer join fetch el.event ev " \ "where el.entityType in ('ome.model.core.Pixels', 'ome.model.core.Image', " \ "'ome.model.containers.Dataset', el.entityType='ome.model.containers.Project') " \ "and ev.experimenter.id=:uid and ev.time > :start and ev.time < :end" print len(q.findAllByQuery(sql, p))