Hi, we recently ran into a performance problem after upgrading PyGreSQL from 4.1 to 5.1 in a Python 2.7 environment.
The test case only exercises the fetchall interface: it simply queries a table of 2 million rows. The test case is:
import datetime
import time
import pgdb
try:
    conn = pgdb.connect(host='127.0.0.1:5433', user='postgres', database='postgres')
    c = conn.cursor()
    c.execute("select * from XXX_int")
    row = c.fetchall()
    c.close()
    conn.close()
except (pgdb.InternalError, Exception) as e:
    print(str(e) + "\nFAILED")
The server is PostgreSQL 12.4; its details are as follows:
postgres=# select version();
version
---------------------------------------------------------------------------------------------------------
PostgreSQL 12.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (EulerOS 4.8.5-28), 32-bit
(1 row)
postgres=# select count(1) from XXX_int;
count
---------
2000000
(1 row)
postgres=# \d xxx_int
Table "public.xxx_int"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
a | integer | | |
b | integer | | |
For internal reasons, we have to build a 32-bit version on an x86_64 platform with the -m32 flag.
We compared the execution time of the two versions.
py4.1
[/PyGreSQL-4.1/module/build/lib.linux-x86_64-2.7]$ time python test.py
real 0m7.868s
user 0m7.139s
sys 0m0.325s
[/PyGreSQL-4.1/module/build/lib.linux-x86_64-2.7]$ time python test.py
real 0m8.237s
user 0m7.473s
sys 0m0.333s
[/PyGreSQL-4.1/module/build/lib.linux-x86_64-2.7]$ time python test.py
real 0m7.856s
user 0m7.151s
sys 0m0.298s
py5.1
[/usr/local/lib/python2.7/site-packages/PyGreSQL-5.1-py2.7-linux-x86_64.egg]$ time python test.py
real 0m13.153s
user 0m12.408s
sys 0m0.318s
[/usr/local/lib/python2.7/site-packages/PyGreSQL-5.1-py2.7-linux-x86_64.egg]$ time python test.py
real 0m13.223s
user 0m12.505s
sys 0m0.291s
[/usr/local/lib/python2.7/site-packages/PyGreSQL-5.1-py2.7-linux-x86_64.egg]$ time python test.py
real 0m13.261s
user 0m12.535s
sys 0m0.313s
As you can see, the average time is about 8s with py4.1 versus about 13.2s with py5.1.
【Analysis】
Through analysis, we found that most of the time is spent in the following statement at the end of the fetchmany function, which constructs the returned result:
return [row_factory([typecast(value, typ)
    for typ, value in zip(coltypes, row)]) for row in result]
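For anyone who wants to reproduce this kind of measurement without a database, here is a minimal sketch using cProfile from the standard library. The `typecast` and `build_rows` functions are hypothetical stand-ins that only mimic the shape of the statement above; they are not PyGreSQL's code, and the 'int4' type name is just an assumed label for the integer columns.

```python
import cProfile
import pstats


def typecast(value, typ):
    # Hypothetical stand-in for the per-value conversion, not PyGreSQL's typecast.
    return int(value) if typ == 'int4' else value


def build_rows(result, coltypes):
    # Same shape as the list comprehension at the end of fetchmany.
    return [tuple([typecast(value, typ)
            for typ, value in zip(coltypes, row)]) for row in result]


def profile_build(n_rows=20000):
    # Fake result set: every row holds two integers as strings.
    result = [('1', '2')] * n_rows
    profiler = cProfile.Profile()
    profiler.enable()
    rows = build_rows(result, ['int4', 'int4'])
    profiler.disable()
    return rows, pstats.Stats(profiler)


rows, stats = profile_build()
# Show the five entries with the highest cumulative time.
stats.sort_stats('cumulative').print_stats(5)
```

With the real test script, the same thing can be done from the shell with `python -m cProfile -s cumulative test.py`.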
Comparing the fetchmany function in the two versions, we found a few small differences and rewrote it so that the attribute lookups are done once, outside the loop:
typecast = self.type_cache.typecast
row_factory = self.row_factory
coltypes = self.coltypes
return [row_factory([typecast(value, typ)
    for typ, value in zip(coltypes, row)]) for row in result]
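The effect of moving attribute lookups into local variables can be checked in isolation. The sketch below is only an illustration: `FakeCursor` is a hypothetical stand-in class, not PyGreSQL's cursor, and its `typecast` simply calls `int`. It builds the same rows both ways and times them with timeit.

```python
import timeit


class FakeCursor(object):
    """Hypothetical stand-in for a cursor; not PyGreSQL's class."""

    def __init__(self):
        self.coltypes = ['int4', 'int4']
        self.row_factory = tuple

    def typecast(self, value, typ):
        # Stand-in conversion: every column is treated as an integer.
        return int(value)

    def build_with_attribute_lookups(self, result):
        # self.row_factory, self.typecast and self.coltypes are looked up
        # again and again inside the loops.
        return [self.row_factory([self.typecast(value, typ)
                for typ, value in zip(self.coltypes, row)]) for row in result]

    def build_with_locals(self, result):
        # Look the attributes up once, then use plain local names.
        typecast = self.typecast
        row_factory = self.row_factory
        coltypes = self.coltypes
        return [row_factory([typecast(value, typ)
                for typ, value in zip(coltypes, row)]) for row in result]


cursor = FakeCursor()
result = [('1', '2')] * 50000

slow = timeit.timeit(lambda: cursor.build_with_attribute_lookups(result), number=5)
fast = timeit.timeit(lambda: cursor.build_with_locals(result), number=5)
print("attribute lookups: %.3fs, locals: %.3fs" % (slow, fast))
```

The absolute numbers depend on the interpreter and machine, so they are only useful as a relative comparison.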
The execution times after this change are as follows:
[ /usr/local/lib/python2.7/site-packages/PyGreSQL-5.1-py2.7-linux-x86_64.egg]$ time python test.py
real 0m11.158s
user 0m10.451s
sys 0m0.288s
[ /usr/local/lib/python2.7/site-packages/PyGreSQL-5.1-py2.7-linux-x86_64.egg]$ time python test.py
real 0m11.226s
user 0m10.530s
sys 0m0.300s
According to the test results, the execution time drops from about 13s to about 11s.
Another difference is the typecast function, but I do not know how to optimize it.
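One idea that might be worth trying, shown here only as a sketch under assumptions: if the per-value typecast call has to find the right conversion function for every single value, that lookup could instead be done once per column before the loop. The `CASTS` registry and all function names below are hypothetical and do not come from PyGreSQL; whether its typecast can be restructured this way is something I have not verified.

```python
def cast_int(value):
    return int(value)


def cast_text(value):
    return value


# Hypothetical registry mapping a type name to its conversion function.
CASTS = {'int4': cast_int, 'text': cast_text}


def typecast(value, typ):
    # Per-value dispatch: the cast function is looked up for every value.
    if value is None:
        return None
    cast = CASTS.get(typ, cast_text)
    return cast(value)


def build_rows_per_value(result, coltypes):
    return [tuple([typecast(value, typ)
            for typ, value in zip(coltypes, row)]) for row in result]


def build_rows_per_column(result, coltypes):
    # Resolve the cast function once per column, outside the row loop.
    casts = [CASTS.get(typ, cast_text) for typ in coltypes]
    return [tuple([None if value is None else cast(value)
            for cast, value in zip(casts, row)]) for row in result]


sample = [('1', '2'), ('3', None)]
print(build_rows_per_column(sample, ['int4', 'int4']))
```

Both functions return the same rows; the second one just avoids one function call and one dictionary lookup per value.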