
blocking node with large result sets #181

Closed · houmanb opened this issue Sep 2, 2012 · 8 comments

Comments

houmanb commented Sep 2, 2012

We have a table with more than 1x10^6 entries. We use the evented API for data retrieval.
Even when we ignore the row event (for debugging purposes), the node server is blocked during data processing.
In non-native mode the node server is not blocked, but it responds extremely slowly (up to 5 s instead of roughly 100 ms).
In native mode the node server is completely blocked and allocates more than 2 GB of RAM.

To dig into the problem we tried your own "large-datatset-bench.js" with a slightly different number of rows (1x10^6).
The execution time does not increase linearly with the number of items to be inserted, as we would expect.

What we noticed in both test cases is that the Postgres backend processes never used more than 1.5% CPU.

brianc (Owner) commented Sep 4, 2012

Yikes. The only thing I can think of is that postgres is pushing you rows as fast as possible and they're taking all the memory available. Though the API is evented, it can still get overwhelmed by a back-end system sending you 1x10^6 rows. Are you using a query with parameters or one without?

houmanb (Author) commented Sep 4, 2012

Hi, thanks for your response.
We use only non-parameterized statements (i.e. not prepared statements). Our statements look like this:

SELECT T.col_1, T.col_2, T.col_3, ... FROM T;

We actually do not load all the data into memory.

Maybe the issue is somehow correlated with the following behavior.
In the file "large-datatset-bench.js" in the "./pg/benchmark" directory, after adding the missing conString value

var conString = 'postgres://postgres@127.0.0.1:5432/test';

I was able to run the large-dataset-bench.js. The pure-JavaScript benchmark works fine; the native test ends with a segfault and terminates (which is not the issue here).
But if I increase the number of rows by multiples of 10 (i.e. starting from your value of 10000, to 100000, and then to 1000000), I get the non-linear behavior I mentioned in my last post.

Here are my results:

loading all rows into memory:

average(10000) = 110.9 ms
average(100000) = 1200.9 ms
average(1000000): not around 12000 ms, as would have been expected

The first encounter with this issue was in the following situation: given a database (DB1) containing a table with 10^6 rows and another database (DB2) with an empty copy of the same table, we created connections to both databases (evented API) and tried to read from DB1 and stream the data into DB2 on each 'row' event. Did we abuse the driver, or did we use it in a manner it was never meant to be used?

Thanks in advance.

rhibnes commented Mar 4, 2013

I have an issue with Node blocking as well. I've structured my query method using both the callback and the event-listener APIs, and while I get results back from both, the following scenario blocks.

The client requests a query of about 180,000 rows. An instance of the pg client is created and the query is sent to a remote server on the same network. The query takes about 10 seconds to run. All my console.log debugging messages fire immediately, alerting me that all my methods have finished except the query, which is how Node is supposed to work.

The client sends a second request 1 second later for a dataset of 500 rows. The client uses $.post in both instances, so I know it isn't blocking on the request side. My server will not take my second request until the dataset from the first request has been returned. Is there a problem with multiple async requests from the same client? Am I missing something? Here is my code.
pgClient = require 'pg'
@dbQuery = new pgClient.Client {'host': host, 'user': user, 'password': pass, 'port': port, 'database': database}
@dbQuery.connect()
@dbQuery.query qString, (err, result) =>
  if err
    console.log 'Error in getData'
  response.send result.rows

Thanks in advance.

brianc (Owner) commented Mar 4, 2013

The PostgreSQL server only supports one executing query per client. If you need to issue two queries concurrently, you need to use a pool of clients.
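
For reference, here is a minimal sketch of the pooled approach using the pg.Pool API from later node-postgres releases; the connection string, table names, and pool size below are placeholders:

// A shared pool checks out a separate client per query, so a long-running
// query does not block a short one.
const { Pool } = require('pg');

const pool = new Pool({
  connectionString: 'postgres://postgres@127.0.0.1:5432/test', // placeholder
  max: 10
});

// Each pool.query() call borrows its own client and releases it when done.
pool.query('SELECT * FROM big_table', (err, bigResult) => {
  if (err) return console.error(err);
  console.log('large query finished:', bigResult.rowCount);
});

pool.query('SELECT * FROM small_table LIMIT 500', (err, smallResult) => {
  if (err) return console.error(err);
  console.log('small query finished:', smallResult.rowCount); // returns without waiting for the large query
});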

rpedela (Contributor) commented May 20, 2013

Does this help?
http://stackoverflow.com/questions/15717351/libpq-using-large-amount-of-memory-for-simple-example
http://www.postgresql.org/docs/9.2/static/libpq-single-row-mode.html

Single-row mode in libpq is only available for 9.2+, though. Regardless, it seems like it would be worth implementing. I have been wondering how to handle very large result sets, and this seems like the solution as long as you are using 9.2+.

You can also use FETCH.
http://www.postgresql.org/docs/current/static/sql-fetch.html
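
As an illustration of the FETCH approach, here is a minimal node-postgres sketch that reads a large result set in fixed-size batches through a server-side cursor. The table name, cursor name, and batch size are placeholders, and it assumes a dedicated, already-connected client because the cursor lives inside a transaction:

// Read big_table in batches of 1000 rows so only one batch is in memory at a time.
function readInBatches(client, done) {
  client.query('BEGIN', (err) => {
    if (err) return done(err);
    client.query('DECLARE big_cur CURSOR FOR SELECT * FROM big_table', (err) => {
      if (err) return done(err);
      (function nextBatch() {
        client.query('FETCH 1000 FROM big_cur', (err, result) => {
          if (err) return done(err);
          // process result.rows here -- at most 1000 rows per batch
          if (result.rows.length > 0) return nextBatch();
          client.query('COMMIT', done); // the cursor is closed when the transaction ends
        });
      })();
    });
  });
}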

rpedela (Contributor) commented May 30, 2013

There is another use case for supporting single-row mode, which I just ran into. Let's say you want to do a COPY but want the output of a particular column reformatted, for example "true" instead of "t" for boolean columns. As far as I am aware, you can't do this with COPY, or at least not easily. But you could do it with a SELECT * FROM T and use single-row mode to avoid exceeding RAM.

rpedela pushed a commit to rpedela/node-postgres that referenced this issue Aug 22, 2013
…+. This will enable single row mode only when the user wants to stream rows.
rpedela pushed a commit to rpedela/node-postgres that referenced this issue Sep 4, 2013
…+. This will enable single row mode only when the user wants to stream rows.
brianc (Owner) commented Oct 20, 2014

Single-row mode would indeed help with this, but it still won't properly account for back-pressure when rows arrive faster than you can process them. The problem is that libpq, by default (and always, prior to 9.2), reads the entire result set into client memory before making any rows available to the application. If you really want to do this -- and I do this all the time -- the best way is to use pg-cursor and pg-query-stream. These will efficiently stream an unlimited amount of data and handle back-pressure properly. I use them daily to pipe a few gigabytes of database results into Elasticsearch without ever going above 200 MB of Node memory.
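
For illustration, a minimal pg-query-stream sketch (the connection string and table name are placeholders):

// Stream a large result set: rows are pulled on demand, so back-pressure
// from the destination is respected instead of buffering everything in RAM.
const { Client } = require('pg');
const QueryStream = require('pg-query-stream');

const client = new Client({ connectionString: 'postgres://postgres@127.0.0.1:5432/test' }); // placeholder
client.connect((err) => {
  if (err) throw err;
  const stream = client.query(new QueryStream('SELECT * FROM big_table')); // readable object stream of rows
  stream.on('data', (row) => {
    // process one row at a time; use pipe() into a Writable for automatic back-pressure
  });
  stream.on('end', () => client.end());
});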

brianc closed this as completed Oct 20, 2014
131 commented Oct 14, 2018

Hi @brianc, sorry to bother you here.
It's 2018, Node 10 is about to become LTS, and async generators are here.

Do you have plans to integrate async-generator cursors into pg-cursor?

(I can't assume everybody is familiar with async generators (a.k.a. cursors), so here's a gist that browses your whole filesystem with a for await (files of readdir('/')) loop.)
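
A sketch of what that could look like, wrapping pg-cursor in an async generator. It assumes a recent pg-cursor release whose read() and close() return promises; the query text and batch size are placeholders:

// Consume rows with for-await-of while fetching them in batches via pg-cursor.
const Cursor = require('pg-cursor');

async function* queryRows(client, text, batchSize = 100) {
  const cursor = client.query(new Cursor(text));
  try {
    while (true) {
      const rows = await cursor.read(batchSize);
      if (rows.length === 0) return; // cursor exhausted
      yield* rows;
    }
  } finally {
    await cursor.close();
  }
}

// Usage, given an already-connected pg Client:
// for await (const row of queryRows(client, 'SELECT * FROM big_table')) {
//   handleRow(row);
// }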
