Mystic TCP/IP packet loss and MTU 1504 in Windows 7

We recently observed some strange things in our local area network. Some Internet resources, mostly from Microsoft, failed to open in any web browser (IE, Firefox, Chrome, Opera) and even via Telnet to port 80. As it happened on all computers, we thought that it was an issue with our ISP and wrote to their support. The ISP shortly responded that everything worked on their side.

Some random/example resources that didn’t work:

We started to dig deeper. After some troubleshooting and debugging we found that the Microsoft resources did not open because the CDN hosts behind them did not send any data. The strange thing was that the TCP/IP connection was established, but no data came our way (later I learned that a partial first packet did come through, but it was not visible in the Telnet console).

Some of the problematic CDNs:


It is important to note that at this point we thought it was a DNS issue: everything worked well through our second ISP, and when we put the IP addresses obtained via the second ISP into the Windows hosts file, the sites seemed to start working.

Days passed… conversations with our ISP… most of the Internet works for us, including Gmail, news, etc… Microsoft websites still do not work…

The ISP sends a technician to check the issue on site. He comes with a laptop and, to my great surprise, everything works flawlessly on his PC.

We start to debug: comparing IP addresses and DNS settings, changing IPs, changing DNS servers, using Google DNS, switching cables… and nothing works on our Windows 7 machines, but everything still works on his laptop.

Out of curiosity I start a virtual machine with Windows 7, open IE and… Microsoft sites open in the virtual machine, which runs on the same physical PC.

Ok. Now try to disable Windows Firewall, Antivirus, etc., etc…

Now I clearly see that the problem is related to our Windows PCs, not the ISP, so I start to think of all the dark scenarios: a rootkit, a virus, a broken hardware driver, broken hardware on all our PCs simultaneously… still no progress…

I start Wireshark. The connection to the CDN is established, but no data arrives except a partial first packet. Looking closer, Wireshark shows multiple [TCP Previous segment lost] and [A segment before this frame was lost] messages. I search for this in Google, and one topic talks about rare TCP segment loss:

This must be it, because it looks like Rare Case :)

From this point it was straightforward. The article talks about an MTU size mismatch, so the first thing I do is check the MTU of my network adapter. Unfortunately my NIC does not support changing the MTU via the GUI, so I use netsh (see: How do I change the MTU setting in Windows 7?).

To view MTU use the following command:
netsh interface ipv4 show subinterfaces

For me it had mysteriously changed from 1500 to 1504. I still do not know why or how it was changed.

To change it to 1500 (the default for Ethernet), start a command prompt (cmd.exe) as administrator and run:
netsh interface ipv4 set subinterface "Local Area Connection" mtu=1500 store=persistent

In my case it was “Local Area Connection 2”.
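The post does not say why four extra bytes would matter. One plausible reading, offered here as an assumption rather than a confirmed diagnosis, is that a host derives the TCP segment size it advertises from its own MTU, so a host set to 1504 invites full-sized packets that no longer fit through a standard 1500-byte Ethernet path. The arithmetic below is a minimal sketch that assumes IPv4 and TCP headers of 20 bytes each with no options; the function names are made up for illustration.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload that fits into one IPv4 packet of the given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


def overshoot(local_mtu: int, path_mtu: int) -> int:
    """Bytes by which a full-sized packet, sized for local_mtu, exceeds path_mtu."""
    full_packet = mss_for_mtu(local_mtu) + IPV4_HEADER + TCP_HEADER
    return max(0, full_packet - path_mtu)


if __name__ == "__main__":
    for mtu in (1500, 1504):
        print(f"MTU {mtu}: MSS {mss_for_mtu(mtu)}, "
              f"overshoot on a 1500-byte path: {overshoot(mtu, 1500)} bytes")
```

Under these assumptions, short responses fit either way and only full-sized packets are at risk, which would be consistent with most sites working while larger downloads stalled.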

See also:
The default MTU sizes for different network topologies

Dell Rack UPS 5600W + EBM + NMC + Environment probe

We got our hands on a brand new Dell Rack UPS 5600W. It comes with an EBM (extended battery module), an NMC (network management card) and an environment probe (temperature and humidity). It has some nice integration with VMware vCenter for automatic host/guest shutdown. This automation is included in the price.

It has an impressive run time of 49 minutes at half load. For our current configuration (about 1000W) it will last for about 5 hours without external power. Also, it draws so much power that its power cord must be hardwired.

Some images below.

Solution for: A transport-level error has occurred when receiving results from the server…

We have a program written in C# using Visual Studio 2012 and .NET 4.0 / .NET 4.5. It rapidly executes many queries against Microsoft SQL Server 2008 R2 and then exits. The number of executed queries varies from several hundred to many thousands.

Today, while running a particular task with a very large number of queries, I noticed an error in the log file:
A transport-level error has occurred when receiving results from the server. (provider: Session Provider, error: 19 - Physical connection is not usable)
at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection)
at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection)
at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning()
at System.Data.SqlClient.TdsParserStateObject.ReadSniError(TdsParserStateObject stateObj, UInt32 error)
at System.Data.SqlClient.TdsParserStateObject.ReadSni(DbAsyncResult asyncResult, TdsParserStateObject stateObj)
at System.Data.SqlClient.TdsParserStateObject.ReadNetworkPacket()
at System.Data.SqlClient.TdsParserStateObject.ReadBuffer()
at System.Data.SqlClient.TdsParserStateObject.ReadByte()
at System.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj)
at System.Data.SqlClient.SqlDataReader.ConsumeMetaData()
at System.Data.SqlClient.SqlDataReader.get_MetaData()
at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString)
at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async)
at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, DbAsyncResult result)
at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method)
at System.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior, String method)
at System.Data.SqlClient.SqlCommand.ExecuteReader()
at MyTestApp.Program.QueryOneLineOneValue(String cmdText)
at MyTestApp.Program.Worker(String[] args)
at MyTestApp.Program.Main(String[] args)

At first, the error message led me to think that something was wrong with the network. “Physical connection is not usable”… anyway, it didn’t say anything useful… and the SQL Server log file didn’t have anything useful either.

A quick Google search revealed that many people have the same problem:

  1. Minimizing Connection Pool errors in SQL Azure: it has a code example with a serious bug, and it leads to Reliability Update 1 for the .NET Framework 4, which may help judging from the description (“Issue 14 – A transport-level error has occurred when sending the request to the server.”). I didn’t try it, mostly because it talks about Azure services.
  2. A discussion in the Microsoft SQL Server Database Engine forum, which leads to an MSDN article about SQL Server Connection Pooling. That again does not help much, because I have only one connection in my application.
  3. A couple of not very useful pages from Stack Overflow: one suggests calling ClearAllPools and another one is about Azure again.

Again, I had to find the solution by myself. I looked at the code and didn’t see anything unusual. I did another run under the VS debugger, and after a while the same exception was thrown. Looking at the debug log file, I noticed that the program stopped in the same place as on the production server. By the same place I mean that the same number of SQL commands had been executed.

I quickly put a counter right before the SQL command execution and found that .NET always throws the exception on command number 32767. 32767 (2^15 - 1, the largest signed 16-bit integer) is a very familiar constant for programmers, so it was obvious that some resource was leaking. A quick glance at the code again revealed that a call to Close, or a using statement, was missing.

static SqlConnection conn; // the single connection, reused for every query

private static void QueryOneLineOneValue(String cmdText)
{
      string q = "SELECT ... FROM cmdText...";
      SqlCommand command =
         new SqlCommand(q, conn);
      SqlDataReader reader = command.ExecuteReader();
      while (reader.Read())
      {
         //do something with the data...
      }
      reader.Close(); //this line was missing
}
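The counting trick and the effect of closing each reader can be illustrated with a toy model. The sketch below is in Python rather than C# so that it runs without a database, and everything in it (ToyConnection, ToyReader, READER_LIMIT, run) is hypothetical: it only mimics the symptom described in the post and makes no claim about how SqlClient tracks open readers internally.

```python
from contextlib import closing

READER_LIMIT = 32767  # 2**15 - 1, the command number reported in the post


class ToyReader:
    """Hypothetical reader that occupies one slot on its connection until closed."""

    def __init__(self, conn):
        self._conn = conn
        self._closed = False

    def close(self):
        if not self._closed:
            self._closed = True
            self._conn.open_readers -= 1


class ToyConnection:
    """Hypothetical connection that refuses the READER_LIMIT-th open reader."""

    def __init__(self):
        self.open_readers = 0

    def execute_reader(self):
        if self.open_readers + 1 >= READER_LIMIT:
            raise RuntimeError("Physical connection is not usable")
        self.open_readers += 1
        return ToyReader(self)


def run(conn, commands, close_readers):
    """Return the 1-based number of the command that failed, or None."""
    for n in range(1, commands + 1):  # counter placed right before execution
        try:
            if close_readers:
                # closing() calls reader.close() even if the body raises
                with closing(conn.execute_reader()):
                    pass  # rows would be consumed here
            else:
                conn.execute_reader()  # reader is never closed: the leak
        except RuntimeError:
            return n
    return None


if __name__ == "__main__":
    print("leaky run failed at command:", run(ToyConnection(), 40000, False))
    print("fixed run failed at command:", run(ToyConnection(), 40000, True))
```

In this model the leaky loop always stops at the same command number, which is the kind of fixed, repeatable count that points to an exhausted resource rather than a flaky network.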

Real men don't make backups