OpenQuery() to Linked Server hangs, leaving SPID with open tran, then templog.ldf grows without limit.

Hi,

We have a customer that is using SQL Server 2000, and experiencing the following intermittent problem -- occasionally templog.ldf will grow and grow until it fills the entire disk.

This is a rare problem, it has happened less than 10 times in the past two years, for a job that runs once a minute. But it has happened on 3 different SQL Servers, (two production servers, and one test server).

Our suspicion is that the root cause is an OPENQUERY() to pull data from a linked server (Oracle database on Unix). We have seen that these OPENQUERY() statements occasionally hang and cannot be killed. The OPENQUERY() is used to populate a local table, and when we originally populated permanent tables with OPENQUERY(), then there would be a lock on this permanent table and nothing could be done until SQL Server was stopped and restarted.

To workaround the immediate problem, last year we modified all OPENQUERY() statements so that the local table was a #temp table. In that case, cancelling the job left the hung process, but the hung process had a lock on a #temp table and so that didn't keep the next execution of the job from creating a new #temp table and populating it.

However... it appears that the hung process, because it leaves an SPID with an open transaction, keeps the tempdb log file from being truncated. And eventually you run out of disk space, no matter how much disk space is on the server.

We have tried using "kill" on the SPID with the open transaction, but that doesn't work - the process never dies (we waited several days) and the transactions are never rolled back.

We would like to resolve this issue, because it has started coming back with some regularity, and of course it usually happens on nights or weekends. Plus there is the problem of scheduling a stop/restart of SQL Server on a production server, which we would like in all cases to avoid.

Here is some data from the most recent instant of this problem (test server):

Product version: 8.00.760
Product level: SP3
Edition: Standard Edition
Version: Microsoft SQL Server 2000 - 8.00.760 (Intel X86) Dec 17 2002
14:22:05 Copyright (c) 1988-2003 Microsoft Corporation Standard Edition
on Windows NT 5.0 (Build 2195: Service Pack 4)

SELECT * FROM master.dbo.sysprocesses returns this suspicious SPID:

spid: 51
kpid: 1876
blocked: 0
waittype: 0x0042
waittime: 135452422
lastwaittype: OLEDB
waitresource: _name of Oracle linked server_
dbid: 9
uid: 0
cpu: 188
physical_io: 0
memusage: 12
login_time: 2006-08-07 18:53:04
last_batch: 2006-08-07 18:53:04
ecid: 0
open_tran: 2
status: runnable
sid: …
hostname: …
program_name: SQLAgent - TSQL JobStep ...
hostprocess: 1500
cmd: INSERT
nt_domain: NT AUTHORITY
nt_username: SYSTEM
net_address: 000CF1C7A509
net_library: TCP/IP
loginame: NT AUTHORITY\SYSTEM
context_info: …
sql_handle: …
stmt_start: 0
stmt_end: -1

Note 1: This data is from 2006-08-09, so the SPID has been hanging around for almost 2 days.
Note 2: We did not attempt to KILL this SPID. In the past when we have done so that changes the status of the SPID but even after waiting several days does not get rid of the SPID & open transaction.
Note 3: These OPENQUERY() statements normally run in 1-5 seconds, so it does not seem likely that it takes more than 2 days to roll this transaction back when we attempt to kill it.
Note 4: From a web search, I have found other people with similar problems but no resolution.
Note 5: We have consciously not installed SQL Server 2000 SP4, because it is our understanding that there is a problem in SP4 with online defragmentation holding locks.

Any suggestions on how to:

1) Kill an SPID like this, when it is waiting for OLEDB

or

2) Keep templog.ldf from growing without limit when we get a hung transaction like this

Thanks!
Frank.



Answer this question

OpenQuery() to Linked Server hangs, leaving SPID with open tran, then templog.ldf grows without limit.

  • Embirath

    Suffered the same problem during a migration of system from a foxpro backend to a mssql backend.

    if you ran ANY openrowset() command on the sql server(ie mssqlsvr2k) through queryanalyser it wouldn't allow any new processes to connect. Any existing ones would be left alone but once they finished/disconnected nothing else would be allowed to happen. A reset of the server was the only thing that fixed it.

    The ONLY way to run an openrowset command like:

    INSERT INTO [mssqlsvr2k].[proddb].[dbo].[client]([clientno], [dob], [gname], [surname])
    SELECT dbf.[episodeno], dbf.[clientno], dbf.[epitype], dbf.[refdate], dbf.[epi_group]
    FROM OPENROWSET('MSDASQL',
    'Driver=Microsoft Visual Foxpro Driver;SourceType=DBF;SourceDB=C:\Foxpro Data Copy\',
    'SELECT * FROM client') AS dbf
    go

    through queryanalyser was to install mssql on your local PC and set up a linkserver relationship with the production server (mssqlsvr2k) and run the command on the localhost server. This totally killed the tempdb on the local machine though so i'm guessing this your main problem (ie, your openrowset runs 'locally' on the server).

    I tried a bunch of different ways to get it to run on the remote machine but anything I tried that used openrowset locked up the server.

    Tried the same code on the localhost server and it ran but tempdb grew until it had taken up all the physical mem & swapfile on my local pc. The only way I could migrate "larger" tables (500meg-1gig +) was using DTS. Have you guys considered setting your transfers up as a DTS


  • ratslav

    Hi,

    I am having a similar problem with the hanging connection being left on Sybase which I am Linked Server to from SQL Server. I tried your above link but no luck, would love to vote.

    Thanks,



  • SiTec

    Hi JuanitaG,

    That is puzzling, I do not have much experience with the connect site. I tried it from a different machine where I was not logged into Windows Live and can still jump to the page with bug report.

    Try google for "openquery linked server hangs", the first link that comes up (August 30 2006, 1am Eastern) is the bug report, the second link that comes up is this thread on MSDN forums.

    Here is the cached version of the bug report:

    http://72.14.203.104/search q=cache:FSBfT5Alxa8J:https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx%3FFeedbackID%3D187192+openquery+linked+server+hangs&hl=en&gl=us&ct=clnk&cd=1

    Regards,
    -Frank.


  • Juco

    Thanks Frank,

    New link worked. I notice that you are on SP3 I am having our server upgraded to SP4 following advice on this link

    http://www.kbalertz.com/822668/results.another.error.message.occurs.linked.server.query.aspx

    It seems that I have a couple of error messages but all giving me the same problem - left over dead (Sybase) conenctions from linked server.



  • RoggA

    Hi teeth777,

    Thanks for the info, it seems like this is a more common problem than I thought, independent of the back-end data source... searching for "openquery to linked server hangs", here's just a few examples:

    Oracle: http://groups.google.com/group/microsoft.public.sqlserver.odbc/browse_thread/thread/4c6a57ddf22aa5a9/3893e142c56f2306%233893e142c56f2306

    ASE: http://groups.google.com/group/microsoft.public.sqlserver.connect/browse_thread/thread/7a2df55eaa3c63fe/f7562f954e3726d7%23f7562f954e3726d7

    Sybase: http://groups.google.com/group/microsoft.public.sqlserver.odbc/browse_thread/thread/e1270ce025dfc4ad/52cb71b66c31a33b

    In general: http://groups.google.com/groups q=openquery+to+linked+server+hangs&start=0&scoring=d&hl=en&lr=&

    I've shied away from DTS after using it in the late 90s with SQL 7. From a maintenance point of view, having these separate packages outside of the code is another item to keep track of, and most of our transfers run once a minute and are parameterized to pull just the new activity in a table or set of tables. Using openquery to do so (if it works reliably) is very clean because then all the code is in one place.

    I wonder if it's like sqlmail - looks good in concept, but unreliable in practice (when I last used it, sqlmail also required a stop/restart of SQL Server if the slightest thing went wrong with the downstream mail server... I couldn't believe it at the time but it was confirmed by a Microsoft tech, they said there was no other way to fix sqlmail if it became unhappy).

    Can we get some input from Microsoft on this

    Thanks,
    Frank.


  • shahrul

    Hi Sloppypoet,

    When you check sysprocesses after the openquery() hangs, does it show open transactions If so, does the database ID refer to tempdb or another database (perhaps one of the Reporting services databases )

    After a reboot and running the first openquery(), if you check sysprocesses does it show a hanging process Perhaps the very first openquery() is holding a lock on a destination table.

    When this happens, if you check the Oracle side, does it show a hanging connection

    I think our customer is running now with the setup you are using that is hanging after the first query (Oracle ODBC). I will ask them to run sp_linkedservers and post the results.

    What version of the Oracle drivers are installed Can you use the Oracle query tool to query the database (Start | Programs | Oracle - OraHome | Application Development | SQL Plus)

    Our customer also had a problem once with the Oracle drivers install - if you accept the default folder location, then backup and specify a different location, the installation is toasted and cannot be fixed. The only thing we found that works in this case is to uninstall the Oracle drivers, search the registry for all references to Oracle, delete them, then reinstall, being very careful to specify the installation folder the first time when it asks. I thought this was crazy but then found a web posting documenting the same behavior.

    Best of luck,

    Frank.


  • archimed01

    Hi Microsoft,

    This problem [Openquery() hanging, leaving open connection with open tran, causing templog to blow up] has started happening more frequently. While this is good for troubleshooting, it is bad for the customer, who has to stop and restart SQL Server every time it happens.

    Here is the output from sp_linkedservers for the suspect linked server:

    SRV_NAME PROD2
    SRV_PROVIDERNAME MSDAORA
    SRV_PRODUCT Oracle
    SRV_DATASOURCE PROD2
    SRV_PROVIDERSTRING
    SRV_LOCATION NULL
    SRV_CAT NULL

    After this latest incident, the customer tried to KILL the hanging SPID, but it doesn't go away, sysprocesses lists the same SPID with cmd = KILLED/ROLLBACK, and continues to show open_tran = 2. Thus templog will continue to grow until they stop/restart SQL Server.

    I would like to get this issue resolved - telling a customer that they must stop/restart SQL Server on a production machine whenever something odd happens with an OPENQUERY() connection to Oracle is unacceptable for us. It seems to me that no matter what odd state the OPENQUERY() connection gets into, it should be possible to kill the process and move forward without having to stop/restart SQL Server.

    Thanks for any help you can provide.

    Regards,
    -Frank.


  • saravanakumar38

    We are running SQL Server 2005 and are experiencing a similar problem. After restarting the service we are able to pull back results from 1 openquery statement. After that, each subsequent openquery statement locks and we are unable to kill them. The only solution we can find is restarting the service. Also, unlike fdc2005, our templog.ldf is not growing.

    Report Services is calling reports using the openqueries and therefore this is a production impacting issue. We were using a link with msdaora previously which did not have this issue. However, after a reboot of the server, we were unable to restore that link and had to move to Oracle odbc. So there are two questions:

    1) How can we prevent the linked server from hanging

    and/or

    2) How can we re-initialize the MSDAORA linked server

    linked server script that did work prior to reboot and does not work after reboot is:

    EXEC sp_addlinkedserver @server='GLOG',

    @provider='MSDAORA',

    @srvproduct = 'Oracle',

    @datasrc='CRSTPGC3'

    EXEC sp_addlinkedsrvlogin

    @rmtsrvname ='GLOG',

    @useself ='false',

    @locallogin =NULL,

    @rmtuser = 'glogowner',

    @rmtpassword = 'glogowner'

    Any help is appreciated.


  • AlexBB

    Hi All,

    It has been two weeks now without a response from Microsoft. In that time this problem has happened again and my customer is becoming quite frustrated with having to stop and restart SQL Server at random times because a hung SPID cannot be killed.

    I have opened this issue as a bug:

    https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx FeedbackID=187192

    Please visit this bug report and vote on it if you are also having trouble with this issue.

    Thanks,
    -Frank.


  • pimmy21

    I think you can get more definite answer in the engine forum for such issue.
  • StGeorge

    Hi Nan,

    Thanks, I wasn't sure where it would go, there didn't seem to be a perfect spot for linked server questions like this.

    -Frank.


  • Autofreak

    We were able to get the link working again, though we do not have a clear resolution. Ultimately we attempted to reinstall the Oracle Client. That process hung but we were able to back out of it. We then re-started SQL Services to clear the deadlocked SPIDs. Miraculously, after these steps were done, the link began working again. Our gut feeling is that the root cause was the Oracle client but we do not have any evidence to support that conclusion. We will be testing this theory soon to determine a clearcut resolution.

    Thanks fdc2005 for the insight on the Oracle client itself.


  • Wil Burton

    I am having the same issue with a light modification. I am using OpenRowset() to open a connection with Visual FoxPro and I cannot see any lingering spid's or anything in sql server to indicate that a connection with FoxPro exists. One the server where FoxPro runs there are a .dbc, .dbt and .dbx file that remain locked. The problem is once these lockes are manually broken (via Computer Management -> System Tools -> Shared Folders -> Open Files) the SQL Server can no longer establish a connection to the files.

    Rebooting servers fixes the problem, but rebooting production regularly servers just doesn't cut it.

    The error I get when trying to connect via SQL is:

    Server: Msg 7399, Level 16, State 1, Procedure sprcFoxProUpdates_TEST, Line 74
    OLE DB provider 'MSDASQL' reported an error.
    [OLE/DB provider returned message: [Microsoft][ODBC Visual FoxPro Driver]Cannot open file \\crius\tpe\test2\.]
    OLE DB error trace [OLE/DB Provider 'MSDASQL' IDBInitialize::Initialize returned 0x80004005: ].

    Any help on this issue would be greatly appreciated, that means YOU MICROSOFT.


  • mexy

    Hi Juanita,

    Thanks for the link (and the vote on the bug report).

    We considered SP4 but have been wary of it due to a problem with DBCC INDEXDEFRAG holding locks and causing problems on production servers that are under high load... my understanding is that this problem was introduced by SP4. See for example this posting: http://www.sqlteam.com/forums/topic.asp TOPIC_ID=63194

    "paulrandal Microsoft SQL Server Product Team: ... Yup - unfortunately SP4 had a bug in the lock manager that means the short duration page locks that DBCC INDEXDEFRAG takes are not dropped when it drops them, leading to eventual escalation to an exclusive table lock..."

    Looks like there is a post-SP4 hot fix for the DBCC INDEXDEFRAG problem, however:

    http://support.microsoft.com/kb/907250/
    "FIX: You may experience concurrency issues when you run the DBCC INDEXDEFRAG statement in SQL Server 2000"

    If you find that SP4 fixes your problem with Sybase I will be interested to hear about it, perhaps we should install SP4 and this hotfix.

    Thanks,
    -Frank.


  • OpenQuery() to Linked Server hangs, leaving SPID with open tran, then templog.ldf grows without limit.