
blocking node with large result sets #181

Closed · houmanb opened this issue Sep 2, 2012 · 8 comments

Comments

houmanb commented Sep 2, 2012

We have a table with more than 1x10^6 entries. We use the evented API for data retrieval.
Even when we ignore the row event (for debugging purposes), the node server is blocked during data processing.
In non-native mode the node server is not blocked, but it responds extremely slowly (up to 5 s instead of roughly 100 ms).
In native mode the node server is completely blocked and allocates more than 2 GB of RAM.

To dig into the problem we tried your own "large-datatset-bench.js" with a slightly different number of rows (1x10^6).
The execution time does not increase linearly with the number of items to be inserted, as we would expect.

What we noticed in both test cases is that the Postgres backend processes never used more than 1.5% CPU.

brianc (Owner) commented Sep 4, 2012

Yikes. The only thing I can think of is that postgres is pushing you rows as fast as possible and they're taking all the memory available. Though the API is evented, it can still get overwhelmed by a back-end system sending you 1x10^6 rows. Are you using a query with parameters or one without?

houmanb (Author) commented Sep 4, 2012

Hi, thanks for your response.
We use only non-parameterized statements (i.e. not prepared statements). Our statements look like this:

SELECT T.col_1, T.col_2, T.col_3, ... FROM T;

We actually do not load all the data into memory.

Maybe the issue is somehow correlated with the following behavior.
In the file "large-datatset-bench.js" in the "./pg/benchmark" directory, after adding the missing conString value

var conString = 'postgres://postgres@127.0.0.1:5432/test';

I was able to run the large-dataset-bench.js. The pure-JavaScript benchmark works fine; the native test ends with a segfault and terminates (which is not the issue here).
But if I increase the number of rows by multiples of 10 (i.e. starting from your value of 10000, to 100000, and then to 1000000), I get the non-linear behavior I mentioned in my last post.

Here are my results:

loading all rows into memory:

average(10000) = 110.9 ms
average(100000) = 1200.9 ms
average(1000000): not around 12000 ms, as would have been expected

The first encounter with this issue was in the following situation: given a database (DB1) containing a table with 10^6 rows and another database (DB2) with an empty copy of the same table, we created connections to both databases (evented API) and tried to read from DB1 and stream the data into DB2 on each 'row' event. Did we abuse the driver, or did we use it in a manner it was never meant to be used?

Thanks in advance.

rhibnes commented Mar 4, 2013

I have an issue with Node blocking as well. I've structured my query method using both the callback and the event-listener APIs, and while I get results back from both, the following scenario blocks.

The client requests a query of about 180,000 rows. An instance of the pg client is created and the query is sent to a remote server on the same network. The query takes about 10 seconds to run. All my console.log debugging messages fire immediately, alerting me that all my methods have finished except the query, which is how Node is supposed to work.

The client sends a second request 1 second later for a dataset of 500 rows. The client uses $.post in both instances, so I know it isn't blocking on the request side. My server will not take my second request until the dataset from the first request has been returned. Is there a problem with multiple async requests from the same client? Am I missing something? Here is my code.
pgClient = require 'pg'
@dbQuery = new pgClient.Client {'host': host, 'user': user, 'password': pass, 'port': port, 'database': database}
@dbQuery.connect()
@dbQuery.query qString, (err, result) =>
  if err
    console.log 'Error in getData'
  response.send result.rows

Thanks in advance.

brianc (Owner) commented Mar 4, 2013

The PostgreSQL server only supports one executing query per client. If you need to issue two queries concurrently, you need to use a pool of clients.
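
For reference, here is a minimal sketch of the pooled approach using the pg.Pool API from later node-postgres releases; the connection string, table names, and pool size below are placeholders:

// A shared pool checks out a separate client per query, so a long-running
// query does not block a short one.
const { Pool } = require('pg');

const pool = new Pool({
  connectionString: 'postgres://postgres@127.0.0.1:5432/test', // placeholder
  max: 10
});

// Each pool.query() call borrows its own client and releases it when done.
pool.query('SELECT * FROM big_table', (err, bigResult) => {
  if (err) return console.error(err);
  console.log('large query finished:', bigResult.rowCount);
});

pool.query('SELECT * FROM small_table LIMIT 500', (err, smallResult) => {
  if (err) return console.error(err);
  console.log('small query finished:', smallResult.rowCount); // returns without waiting for the large query
});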

rpedela (Contributor) commented May 20, 2013

Does this help?
http://stackoverflow.com/questions/15717351/libpq-using-large-amount-of-memory-for-simple-example
http://www.postgresql.org/docs/9.2/static/libpq-single-row-mode.html

Single-row mode in libpq is only available for 9.2+, though. Regardless, it seems like it would be worth implementing. I have been wondering how to handle very large result sets, and this seems like the solution as long as you are using 9.2+.

You can also use FETCH.
http://www.postgresql.org/docs/current/static/sql-fetch.html
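
As an illustration of the FETCH approach, here is a minimal node-postgres sketch that reads a large result set in fixed-size batches through a server-side cursor. The table name, cursor name, and batch size are placeholders, and it assumes a dedicated, already-connected client because the cursor lives inside a transaction:

// Read big_table in batches of 1000 rows so only one batch is in memory at a time.
function readInBatches(client, done) {
  client.query('BEGIN', (err) => {
    if (err) return done(err);
    client.query('DECLARE big_cur CURSOR FOR SELECT * FROM big_table', (err) => {
      if (err) return done(err);
      (function nextBatch() {
        client.query('FETCH 1000 FROM big_cur', (err, result) => {
          if (err) return done(err);
          // process result.rows here -- at most 1000 rows per batch
          if (result.rows.length > 0) return nextBatch();
          client.query('COMMIT', done); // the cursor is closed when the transaction ends
        });
      })();
    });
  });
}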

rpedela (Contributor) commented May 30, 2013

There is another use case for supporting single-row mode, which I just ran into. Let's say you want to do a COPY but want the output of a particular column reformatted, for example "true" instead of "t" for boolean columns. As far as I am aware, you can't do this with COPY, or at least not easily. But you could do it with a SELECT * FROM T and use single-row mode to avoid exceeding RAM.

rpedela pushed a commit to rpedela/node-postgres that referenced this issue Aug 22, 2013
…+. This will enable single row mode only when the user wants to stream rows.
rpedela pushed a commit to rpedela/node-postgres that referenced this issue Sep 4, 2013
…+. This will enable single row mode only when the user wants to stream rows.
brianc (Owner) commented Oct 20, 2014

Single-row mode would indeed help with this, but it still won't properly account for back-pressure when rows arrive faster than you can process them. The problem is that libpq, by default (and always, prior to 9.2), reads the entire result set into client memory before making any rows available to the application. If you really want to do this -- and I do this all the time -- the best way is to use pg-cursor and pg-query-stream. These will efficiently stream an unlimited amount of data and handle back-pressure properly. I use them daily to pipe a few gigabytes of database results into Elasticsearch without ever going above 200 MB of Node memory.
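
For illustration, a minimal pg-query-stream sketch (the connection string and table name are placeholders):

// Stream a large result set: rows are pulled on demand, so back-pressure
// from the destination is respected instead of buffering everything in RAM.
const { Client } = require('pg');
const QueryStream = require('pg-query-stream');

const client = new Client({ connectionString: 'postgres://postgres@127.0.0.1:5432/test' }); // placeholder
client.connect((err) => {
  if (err) throw err;
  const stream = client.query(new QueryStream('SELECT * FROM big_table')); // readable object stream of rows
  stream.on('data', (row) => {
    // process one row at a time; use pipe() into a Writable for automatic back-pressure
  });
  stream.on('end', () => client.end());
});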

brianc closed this as completed Oct 20, 2014
131 commented Oct 14, 2018

Hi @brianc, sorry to bother you here.
It's 2018, Node 10 is about to become LTS, and async generators are here.

Do you have plans to integrate async-generator cursors into pg-cursor?

(I can't assume everybody is familiar with async generators (a.k.a. cursors), so here's a gist that browses your whole filesystem with a for await (files of readdir('/')) loop.)
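
A sketch of what that could look like, wrapping pg-cursor in an async generator. It assumes a recent pg-cursor release whose read() and close() return promises; the query text and batch size are placeholders:

// Consume rows with for-await-of while fetching them in batches via pg-cursor.
const Cursor = require('pg-cursor');

async function* queryRows(client, text, batchSize = 100) {
  const cursor = client.query(new Cursor(text));
  try {
    while (true) {
      const rows = await cursor.read(batchSize);
      if (rows.length === 0) return; // cursor exhausted
      yield* rows;
    }
  } finally {
    await cursor.close();
  }
}

// Usage, given an already-connected pg Client:
// for await (const row of queryRows(client, 'SELECT * FROM big_table')) {
//   handleRow(row);
// }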
