Warren  P Warren P - 4 months ago 55
C# Question

dotnet process inside IIS crashes. How can I determine cause and remediate?

We are testing an ASP.net core application which crashes while running the dotnet.exe process inside the "reverse proxy" system that IIS automatically uses, if you have installed the IIS .net core hosting option into your machine.

Once the dotnet processes (five in our case) have crashed, all requests return 502 errors, and IIS does not recover by relaunching the dotnet processes.

There is nothing in the IIS logging (in event viewer) to indicate why the pool is crashing, and sometimes the IIS admin UI shows that the pool is "stopped" and sometimes that it is "started", but in task manager, all dotnet.exe processes have gone away.

We're going to bisect (rebuild from previous commits of our code) until we isolate it, but this question is mostly about the ISS and DotNet reverse proxy:


  • Is there anywhere else than Event viewer (
    eventvwr.msc
    ) that could contain relevant logging on the IIS reverse proxy and the dotnet.exe host process? There are no current items in event viewer that seem relevant. There are messages for days other than the days we have had the crashes, that show WAS starting and restarting IIS pools, but none when the dotnet processes have just gone away.



iis messages in event viewer


  • Is there any known bug in dotnet runtime causing it to crash, and IIS does not recover and restart the process?

  • The only logging I see from IIS is the GET/POST requests, no useful information in
    C:\inetpub\logs\LogFiles\...



Update: Starting today I now see some relevant event viewer entries:
Failed to start process with commandline '"dotnet" .\MyCompany.Authorization.Api.dll', ErrorCode = '0x80004005'.
from Source: IIS AspNetCore Module

Answer

It seems that IIS itself is crashing in this case, not the dotnet process inside my app pool. My evidence is that the log files in C:\inetpub\logs\LogFiles\W3SVC1 show this, which looks like IIS itself is crashing and being restarted. Note the "#" at the beginning of what looks like some initial bringup/startup logging:

2016-08-03 13:30:09 W3SVC1 192.168.215.35 GET /api/v3/fhir/patient/311 - 80 - 192.168.215.164 HTTP/1.1 python-requests/2.10.0 - dev-html5.mycompany.biz 404 0 2 1405 1099 0
2016-08-03 13:31:19 W3SVC1 192.168.215.35 OPTIONS /api/v3/fhir/code _dc=1470231079494 80 - 192.168.225.119 HTTP/1.1 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/51.0.2704.103+Safari/537.36 http://localhost:8086/ dev-html5.mycompany.biz 200 0 0 206 544 109
2016-08-03 13:31:21 W3SVC1 192.168.215.35 OPTIONS /api/v3/fhir/imagingstudy/297 _dc=1470231081268 80 - 192.168.225.119 HTTP/1.1 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/51.0.2704.103+Safari/537.36 http://localhost:8086/ dev-html5.mycompany.biz 200 0 0 206 556 312
2016-08-03 13:31:33 W3SVC1 192.168.215.35 OPTIONS /api/v3/fhir/code _dc=1470231093098 80 - 192.168.225.119 HTTP/1.1 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/51.0.2704.103+Safari/537.36 http://localhost:8086/ dev-html5.mycompany.biz 200 0 0 206 544 53
2016-08-03 13:31:34 W3SVC1 192.168.215.35 OPTIONS /api/v3/fhir/imagingstudy/297 _dc=1470231094714 80 - 192.168.225.119 HTTP/1.1 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/51.0.2704.103+Safari/537.36 http://localhost:8086/ dev-html5.mycompany.biz 200 0 0 206 556 59
#Software: Microsoft Internet Information Services 8.5
#Version: 1.0
#Date: 2016-08-03 13:33:03
#Fields: date time s-sitename s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs-version cs(User-Agent) cs(Referer) cs-host sc-status sc-substatus sc-win32-status sc-bytes cs-bytes time-taken
2016-08-03 13:33:03 W3SVC1 192.168.215.35 GET /api/v3/fhir/organization/1 - 80 - 192.168.215.35 HTTP/1.1 Mozilla/5.0+(Windows+NT+6.3;+WOW64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/51.0.2704.103+Safari/537.36 - dev-html5.mycompany.biz 502 5 0 1627 1255 519
2016-08-03 13:33:10 W3SVC1 192.168.215.35 GET /api/v3/authorize/login username=admin&password=mycompanyx 80 - 192.168.215.164 HTTP/1.1 python-requests/2.10.0 - dev-html5.mycompany.biz 502 5 0 1627 210 520
2016-08-03 13:33:23 W3SVC1 192.168.215.35 GET /api/v3/fhir/organization/1 - 80 - 192.168.215.35 HTTP/1.1 Mozilla/5.0+(Windows+NT+6.3;+WOW64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/51.0.2704.103+Safari/537.36 - dev-html5.mycompany.biz 502 5 0 1627 1255 523
#Software: Microsoft Internet Information Services 8.5
#Version: 1.0
#Date: 2016-08-03 13:34:06
#Fields: date time s-sitename s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs-version cs(User-Agent) cs(Referer) cs-host sc-status sc-substatus sc-win32-status sc-bytes cs-bytes time-taken
2016-08-03 13:34:06 W3SVC1 192.168.215.35 OPTIONS /api/v3/fhir/linkage/patient/315 _dc=1470231246063 80 - 192.168.225.30 HTTP/1.1 Mozilla/5.0+(Windows+NT+10.0;+WOW64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/51.0.2704.103+Safari/537.36 http://localhost:8086/ dev-html5.mycompany.biz 502 5 0 1627 511 548
#Software: Microsoft Internet Information Services 8.5
#Version: 1.0
#Date: 2016-08-03 13:34:18
#Fields: date time s-sitename s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs-version cs(User-Agent) cs(Referer) cs-host sc-status sc-substatus sc-win32-status sc-bytes cs-bytes time-taken
2016-08-03 13:34:18 W3SVC1 192.168.215.35 GET /api/v3/fhir/organization/1 - 80 - 192.168.215.35 HTTP/1.1 Mozilla/5.0+(Windows+NT+6.3;+WOW64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/51.0.2704.103+Safari/537.36 - dev-html5.mycompany.biz 502 5 0 1627 1255 531
2016-08-03 13:34:41 W3SVC1 192.168.215.35 GET /api/v3/authorize/login username=admin&password=mycompanyx 80 - 192.168.215.164 HTTP/1.1 python-requests/2.10.0 - dev-html5.mycompany.biz 502 5 0 1627 210 531
2016-08-03 13:34:46 W3SVC1 192.168.215.35 OPTIONS /api/v3/fhir/code _dc=1470231282672 80 - 192.168.225.119 HTTP/1.1 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/51.0.2704.103+Safari/537.36 http://localhost:8086/ dev-html5.mycompany.biz 502 5 0 1627 544 777
2016-08-03 13:34:46 W3SVC1 192.168.215.35 OPTIONS /api/v3/fhir/imagingstudy/297 _dc=1470231284575 80 - 192.168.225.119 HTTP/1.1 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/51.0.2704.103+Safari/537.36 http://localhost:8086/ dev-html5.mycompany.biz 502 5 0 1627 556 1037
2016-08-03 13:34:57 W3SVC1 192.168.215.35 GET /api/v3/fhir/organization/1 - 80 - 192.168.215.35 HTTP/1.1 Mozilla/5.0+(Windows+NT+6.3;+WOW64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/51.0.2704.103+Safari/537.36 - dev-html5.mycompany.biz 502 5 0 1627 1221 530
2016-08-03 13:35:00 W3SVC1 192.168.215.35 OPTIONS /api/v3/fhir/code _dc=1470231299728 80 - 192.168.225.119 HTTP/1.1 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/51.0.2704.103+Safari/537.36 http://localhost:8086/ dev-html5.mycompany.biz 502 5 0 1627 544 527

I am going to try to make a reproducible test base but for now this appears to be related to:

  1. IIS on Windows Server 2012 R2 with the dotnet IIS hosting update installed, and all the latest windows updates.

  2. IIS is hosting multiple applications in a single application pool. This decision (by me) to host multiple applications in a single pool may be "not recommended" by Microsoft due to known problems (like this), but I'll update this answer when I have better information.

Also it seems that IIS Failed Request Tracing (FREB) might be useful when IIS is falling over. See this article.