
Long-running transaction within a pooled client #35


Closed
pixelglow opened this issue Jul 10, 2011 · 11 comments

@pixelglow

What happens when I issue several queries in a long-running transaction within a pooled client?

E.g.

query('BEGIN');
// do something else for a long time
query('COMMIT');

While the system is off processing the "do something else", there are no outstanding queries on the client, so shouldn't the client end up firing its drain event and getting checked back into the pool?

If this is an issue, are there any workarounds or fixes?

@pixelglow
Author

Note that the "do something else" for a long time could be an asynchronous call, with the query('COMMIT') attempted inside the completion callback of that call.

@brianc
Owner

brianc commented Jul 10, 2011

Excellent point. I think this might require an API addition to the pooled client. I think we need to keep the 'automatic' releasing back to the pool in the non-transactional case since that's how the API kinda works now...but here are a few ideas of how to better support transactions.

pg way

pg.connect('whatever', function(err, client) {
  pg.checkOut(client);
  client.query('BEGIN');
  client.query('select name, filePath from foo', function(err, result) {
    process.nextTick(function() {
      client.query('COMMIT', function(err, result) { 
        process.nextTick(function() {
          pg.checkIn(client);
        })    
      })
    })
  })
})

This way the 'pg' object is still responsible for handling the pooling. By 'default' it will check the client back in after the client drains, but if you call 'checkOut' then you have to explicitly check the client back in.

Another way that would kinda look nicer, possibly, would be this:

Client way

pg.connect('whatever', function(err, client) {
  client.pauseDrain();
  client.query('BEGIN');
  client.query('select name, filePath from foo', function(err, result) {
    process.nextTick(function() {
      client.query('COMMIT', function(err, result) { 
        process.nextTick(function() {
          client.resumeDrain();
        })    
      })
    })
  })
})

This way you could pause & resume the drain event. I'd like to keep the API on either pg or Client, not have to call something on both objects, because really I'd like the Client to not know it's being pooled, and have the Client API be useful even in a non-pooled scenario. I like the Client approach better because it extends Client with new functionality that would be useful in other places. It does slightly break encapsulation, though, because you kinda have to know "oh, I need to pause the drain event on the client so it doesn't fire and get released back to the pool." But then again... it's not the client making the decision, only your calling code, so it's not too dirty?

I believe the node-mysql folk solve this problem by relying on node-pool and requiring the user of node-mysql to implement their own pooling logic. I've entertained the idea of swapping out my pool implementation with node-pool internally as well, though it wouldn't matter because I would still like to provide a default pooling solution to allow for people to be up and running with little fuss.

Anyways, you bring up a great point pixelglow. Let's discuss a nice API for this!

@pixelglow
Author

Do checkOut/checkIn and pauseDrain/resumeDrain currently exist and work the way you outline above? In particular, I thought checkOut/checkIn were called automatically, and there doesn't seem to be any reference counting involved.

My preference is to interpose a transaction object. A good precedent is the .NET SqlTransaction object, but keeping it simple and asynchronous:

client.transact(function(err, transaction)); // calls BEGIN and returns a new transaction object
transaction.commit(); // calls COMMIT and also releases underlying client back to the pool (if no outstanding transactions)
transaction.rollback(); // calls ROLLBACK and also releases underlying client back to the pool (if no outstanding transactions)

I don't think you need callbacks for commit and rollback since they appear to be "fire-and-forget" commands.

You might want to add savepoints as well.

Essentially, each outstanding transaction would keep the client from being drained, much like outstanding queries do now. This should scale up to nested/multiple transactions too. Also, this proposed API:

  • is not tied to low-level details such as client pooling or drain logic (which your existing API admirably achieves!)
  • is less error-prone, since it issues BEGIN, COMMIT and ROLLBACK at the right time w.r.t. drainage instead of relying on the user to do so
  • and I can't really think of a good use case where you would want to delay drainage except for transactions anyway, so it's probably best to insulate and encapsulate the whole shebang in a friendly object.
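The savepoint idea above could be folded into the same transaction object. A minimal sketch, assuming hypothetical method names (savepoint, rollbackTo, commit, rollback — none of these exist in pg); here the object just records the SQL it would issue through a supplied query function:

```javascript
// Hypothetical transaction-object factory: wraps a query function and
// issues the bracketing SQL commands on the caller's behalf.
function makeTransaction(query) {
  query('BEGIN');
  return {
    savepoint(name)  { query('SAVEPOINT ' + name); },
    rollbackTo(name) { query('ROLLBACK TO SAVEPOINT ' + name); },
    commit()         { query('COMMIT'); },
    rollback()       { query('ROLLBACK'); },
  };
}

// Against a pg-style client, usage might look like:
//   const tx = makeTransaction(sql => client.query(sql));
//   tx.savepoint('before_update');
//   ... on error: tx.rollbackTo('before_update') ...
//   tx.commit();
```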

@pixelglow
Author

Rewriting your example using transaction objects:

pg.connect('whatever', function(err, client) {
  client.transact(function(err, transaction) {
    client.query('select name, filePath from foo', function(err, result) {
      process.nextTick(function() {
        transaction.commit();
      })
    })
  })
})

@brianc
Owner

brianc commented Jul 13, 2011

I am familiar with the .NET IDbConnection/IDbTransaction interfaces. I actually tried to model the node-mysql API more closely. I think the IDbConnection & IDbTransaction stuff is more tailored to .NET and also a lot more verbose (e.g. ExecuteQuery, ExecuteScalar, ExecuteCommand all issue commands to the database), and I was aiming for something more low-level and/or simple.

What I'm thinking is that transactions are pretty fine-grained, and sometimes you might want a particular transaction isolation level, etc., so we'd need to provide a way in the API to specify isolation levels. Internally the client would have to pause & resume drain events anyway, so maybe implementing pause/resume would be a good first step while nicer, higher-level transaction APIs take shape in client space for the time being. I'm guessing that in any app of serious size the postgres interaction (be it through pg.connect, new Client(), or some custom pooling mechanic) is kept in a data access layer... the data access layer could provide the proper sugar for transactions.

I do agree there aren't really many uses for pausing/resuming the 'drain' event other than for controlling the pooling behavior of a Client. It would be useful to be able to pause/resume 'drain' if you were implementing your own custom queue as well.

I like the transaction.commit style...I just think it's a lot of API additions.

@pixelglow
Author

Hmmm... OK, it's your call and your API. It's fine to build higher level transaction API's on your code and I appreciate your desire to keep it simple.

Let's take a slightly different tack then, more in line with your original suggestion. As I see it, the common abstraction over queries (implicit, single-command transactions) and explicit transactions (multiple commands) is that each holds a reference-counted "claim" on the client that prevents draining until it is done. If pauseDrain/resumeDrain incremented and decremented a reference count that is shared with queries (increment on query creation, decrement on query completion), and the drain event fired when the count reached zero, that would be ideal. This would help with nested or multiple transactions and other long-running claims that delay the drain event. In that case, pauseDrain/resumeDrain could be called retain/release, or something else indicative of a reference count.

@brianc
Owner

brianc commented Jul 19, 2011

I've been letting this churn in the back of my mind for some time, and the more I think about it the more I believe a pauseDrain and resumeDrain type of API might be best. I can think of cases where you may not be in a transaction but you are doing a few commands, reading a few files, and then doing a few more commands. While you may not care whether you're on the same connection (since you're not in a transaction), pg.connect might behave in surprising and unacceptable ways if you run a few commands on the client, then do something async (the client drains during the async work), and then try to access your client reference again. Example:

pg.connect('blah', function(err, client) {
  client.query("one query");
  process.nextTick(function() {
    //client has been released back to the pool and potentially picked up by another co-routine! 
    client.query('two query');
  })
})

I think naming the method (or method pair) something specifically related to transactions could cause confusion on these points:

  • does the transaction method issue any commands to the server? (no, but it might seem like that method would do that)
  • is it okay to call transaction when I'm not in a transaction?

What do you think?

@pixelglow
Author

I agree that we should support a more abstract API rather than being tied down to supporting transactions in particular. To reiterate and clarify my points:

  • The API should be reference-counted, not just a binary state. This helps with conceptually nested transactions, multiple outstanding transactions etc.
  • Conceptually, the API should interact correctly with outstanding queries. Each query beginning adds 1 to the reference count, and each query ending subtracts 1. Obviously, since you've already implemented a query queue, you may not need to literally update a reference count, but conceptually it should work this way, so that the API user can reason correctly about when the drain event will be triggered in the face of multiple queries and multiple claims.
  • The API should be named something that makes it clear it is a reference-counting scheme e.g. retain/release. You are right in not naming it a transaction or close to a transaction.

@pixelglow
Author

Alternatively:

Since the "client being stolen from under the user" problem really only applies to pooled clients obtained through the pg.connect call, you could KISS by having an equivalent pg.connectAndItWontBeStolen call which returns a client that won't be stolen but still has to be end()'ed when you're done with it. Under the covers you could return a "subclass" of the client class which calls pauseDrain at the start and resumeDrain at the end.

This neatly handles the long-running transaction and other such cases without layering on any other public API (retain/release/pause/resume etc.), and just pushes the (hopefully) rarer use case of multiple transactions out to the user to keep track of.
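A sketch of what that wrapper might look like, assuming the pauseDrain/resumeDrain pair proposed earlier in the thread (the function name connectUnstealable stands in for the thread's placeholder name and is not a real API):

```javascript
// Hypothetical "won't be stolen" connect: pauses the drain event as soon
// as the client is handed out, and hands the caller a done() function
// that restores it (the explicit check-in / end() step).
function connectUnstealable(pg, conninfo, callback) {
  pg.connect(conninfo, (err, client) => {
    if (err) return callback(err);
    client.pauseDrain();                      // pool can't reclaim it now
    const done = () => client.resumeDrain();  // caller's explicit release
    callback(null, client, done);
  });
}
```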

@brianc
Owner

brianc commented Sep 2, 2011

client.pauseDrain() and client.resumeDrain() are now implemented, and a new version has been pushed to npm. This allows explicit control over if and when a client fires its drain event and is therefore returned to the client pool. I will leave this issue open until I've finished documenting the methods in the wiki.

@brianc
Owner

brianc commented Oct 17, 2011

Added documentation to the wiki.

@brianc brianc closed this as completed Oct 17, 2011
marekventur added a commit to marekventur/node-postgres that referenced this issue Jul 30, 2015
brianc added a commit that referenced this issue Dec 18, 2019
return rowCount on insert
brianc pushed a commit that referenced this issue Dec 27, 2019