Long-running transaction within a pooled client #35
Note that the "do something else for a long time" could be an asynchronous call, with the `query('COMMIT')` attempted within the end callback of that call.
Excellent point. I think this might require an API addition to the pooled client. I think we need to keep the 'automatic' releasing back to the pool in the non-transactional case since that's how the API kinda works now... but here are a few ideas of how to better support transactions.

The `pg` way:

```javascript
pg.connect('whatever', function(err, client) {
  pg.checkOut(client);
  client.query('BEGIN');
  client.query('select name, filePath from foo', function(err, result) {
    process.nextTick(function() {
      client.query('COMMIT', function(err, result) {
        process.nextTick(function() {
          pg.checkIn(client);
        })
      })
    })
  })
})
```

This way the `pg` object is still responsible for handling the pooling. By default it will check the client back in after the client drains, but if you call `checkOut` then you have to explicitly check the client back in. Another way, possibly nicer looking, would be this:

The client way:

```javascript
pg.connect('whatever', function(err, client) {
  client.pauseDrain();
  client.query('BEGIN');
  client.query('select name, filePath from foo', function(err, result) {
    process.nextTick(function() {
      client.query('COMMIT', function(err, result) {
        process.nextTick(function() {
          client.resumeDrain();
        })
      })
    })
  })
})
```

This way you could pause & resume the drain event. I'd like to keep the API on either

I believe the node-mysql folks solve this problem by relying on node-pool and requiring the user of node-mysql to implement their own pooling logic. I've entertained the idea of swapping out my pool implementation with node-pool internally as well, though it wouldn't matter because I would still like to provide a default pooling solution so people can be up and running with little fuss. Anyway, you bring up a great point pixelglow. Let's discuss a nice API for this!
Do checkOut/checkIn and pauseDrain/resumeDrain currently exist and work the way you outline above? In particular, I thought checkOut/checkIn was called automatically, and there doesn't seem to be any reference counting involved.

My preference is to interpose a transaction object. A good precedent is the .NET SqlTransaction object, but keeping it simple and asynchronous:

```javascript
client.transact(function(err, transaction)); // calls BEGIN and returns a new transaction object
transaction.commit();   // calls COMMIT and releases the underlying client back to the pool (if no outstanding transactions)
transaction.rollback(); // calls ROLLBACK and releases the underlying client back to the pool (if no outstanding transactions)
```

I don't think you need callbacks for commit and rollback since they appear to be "fire-and-forget" commands. You might want to add savepoints as well. Essentially, each outstanding transaction would keep the client from being drained, much like outstanding queries do now. This should scale up to nested/multiple transactions too. Also this proposed API
Rewriting your example using transaction objects:

```javascript
pg.connect('whatever', function(err, client) {
  client.transact(function(err, transaction) {
    client.query('select name, filePath from foo', function(err, result) {
      process.nextTick(function() {
        transaction.commit();
      })
    })
  })
})
```
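The proposed transaction object could also be prototyped in user space. Here is a minimal sketch, assuming a client that exposes `query()` plus the `pauseDrain()`/`resumeDrain()` methods discussed in this thread; the `transact` helper and its callback shape are hypothetical, not part of the pg API:

```javascript
// Hypothetical user-space sketch of the transaction-object idea.
// Assumes `client` exposes query(), pauseDrain() and resumeDrain()
// (the method names proposed in this thread).
function transact(client, callback) {
  client.pauseDrain();          // keep the client checked out of the pool
  client.query('BEGIN');
  var transaction = {
    commit: function () {
      client.query('COMMIT');
      client.resumeDrain();     // allow 'drain' to fire again
    },
    rollback: function () {
      client.query('ROLLBACK');
      client.resumeDrain();
    }
  };
  callback(null, transaction);
}
```

With this shape, the example above becomes `transact(client, function(err, tx) { /* queries */ tx.commit(); })`.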
I am familiar with the .NET IDbConnection/IDbTransaction interfaces. I actually tried to model the node-mysql API more closely. I think the IDbConnection & IDbTransaction stuff is more tailored to .NET and also a lot more verbose (i.e. ExecuteQuery, ExecuteScalar, ExecuteCommand all issue commands to the database), and I was aiming for something more low-level and/or simple.

What I'm thinking is that transactions are pretty fine-grained, and sometimes you might want a particular transaction isolation level, etc., so we'd need to provide a way in the API to specify isolation levels. Internally the client would have to pause & resume drain events anyway, so implementing pause/resume might be a good first step while letting nicer, higher-level transaction APIs take shape in the client space for the time being. I'm guessing that in any app of serious size the Postgres interaction (be it through pg.connect, new Client(), or some custom pooling mechanic) is kept in a data access layer... the data access layer could provide the proper sugar for transactions.

I do agree there aren't really many uses for pausing/resuming the 'drain' event other than controlling the pooling behavior of a Client, though it would also be useful if you were implementing your own custom queue. I like the
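On isolation levels: they don't necessarily need first-class API support, since Postgres lets you set one with a plain statement right after `BEGIN`. A sketch, assuming only a client with `query()`; the helper name is made up:

```javascript
// Isolation levels via plain SQL: no dedicated API needed.
// `beginTransaction` is a hypothetical helper; client.query() is assumed.
function beginTransaction(client, isolationLevel) {
  client.query('BEGIN');
  if (isolationLevel) {
    // e.g. 'SERIALIZABLE', 'REPEATABLE READ', 'READ COMMITTED'
    client.query('SET TRANSACTION ISOLATION LEVEL ' + isolationLevel);
  }
}
```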
Hmmm... OK, it's your call and your API. It's fine to build higher-level transaction APIs on your code, and I appreciate your desire to keep it simple. Let's take a slightly different tack then, more in line with your original suggestion.

As I see it, the abstraction common to queries (implicit, one-command transactions) and explicit transactions (multiple-command queries) is that each has a reference-counted "claim" on the client that prevents draining until it is done. If pauseDrain/resumeDrain incremented and decremented a reference count shared with queries (increment on query creation, decrement on query completion), and the drain event fired when the count reached 0, that would be ideal. It would help in cases of nested or multiple transactions, or other long-running claims that delay the drain event. In that case, pauseDrain/resumeDrain could be called retain/release or something more indicative of a reference count.
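The reference-counting idea can be sketched independently of pg: queries and explicit claims share one counter, and 'drain' fires only when it returns to zero. All names below are hypothetical, not part of any real API:

```javascript
// Hypothetical sketch of a shared reference count for drain control.
// retain() is called on query creation or an explicit claim;
// release() on query completion or the end of the claim.
function DrainGate(onDrain) {
  var count = 0;
  return {
    retain: function () { count++; },
    release: function () {
      count--;
      if (count === 0) onDrain();   // last outstanding claim is gone
    }
  };
}
```

Because queries and explicit transaction claims use the same counter, a pending transaction keeps the client checked out even while its query queue is momentarily empty.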
I've been letting this churn in the back of my mind for some time, and the more I think about it the more I'm thinking a

```javascript
pg.connect('blah', function(err, client) {
  client.query("one query");
  process.nextTick(function() {
    // client has been released back to the pool and potentially picked up by another co-routine!
    client.query('two query');
  })
})
```

I think naming the method (or method pair) something related specifically to

What do you think?
I agree that we should support a more abstract API rather than be tied to supporting transactions in particular. To reiterate and clarify my points:
Alternatively: since the "client being stolen from under the user" really only applies to pooled clients obtained through the pg.connect call, you could KISS by having an equivalent pg.connectAndItWontBeStolen call which returns a client that won't be stolen but still has to be end()'ed when you're done with it. Under the covers you could return a "subclass" of the client class which does a pauseDrain at start and a resumeDrain at end. This neatly handles the long-running transaction and other such cases without layering on any other public API (retain/release/pause/resume etc.), and just pushes the (hopefully) rarer use case of multiple transactions out to the user to keep track of.
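That "won't be stolen" connect call could be sketched as a thin wrapper rather than a real subclass. A hypothetical version, assuming `pauseDrain()`/`resumeDrain()` on the underlying client; the function name is invented, not the real pg API:

```javascript
// Hypothetical wrapper for the "checked-out client" idea:
// pause the drain event at checkout, resume it when the user calls end().
function checkOut(client) {
  client.pauseDrain();
  return {
    query: function (text, cb) { return client.query(text, cb); },
    end: function () { client.resumeDrain(); }  // returns client to the pool
  };
}
```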
client.pauseDrain() and client.resumeDrain() are now implemented and a new version has been pushed to npm. This allows explicit control over if and when a client fires its 'drain' event and is therefore returned to the client pool. I will leave this issue open until I've finished documenting the methods in the wiki.
Added documentation to the wiki. |
What happens when I issue several queries in a long-running transaction within a pooled client? E.g.

```javascript
query('BEGIN');
// do something else for a long time
query('COMMIT');
```

When the system has gone away to process the "do something else", there are no outstanding queries in the client, so shouldn't the client end up issuing the drain event and get checked back into the pool?

If this is an issue, are there workarounds/fixes etc.?