
Strange RG failover

2 weeks ago

Good day,

I have two SRX240H2 devices in cluster mode.

 

Some time ago I had a strange failover of nearly all of my redundancy groups. All of them failed over with the reason: Monitor failed: IF.
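For context on that reason code: a redundancy group fails over when the cumulative weight of its failed monitored interfaces reaches the group threshold of 255. My interface-monitor configuration is along these lines (just a sketch of a few representative lines; the interface names and weights match the monitoring table I post later in this thread):

set chassis cluster redundancy-group 1 interface-monitor ge-0/0/4 weight 127
set chassis cluster redundancy-group 1 interface-monitor ge-5/0/4 weight 127
set chassis cluster redundancy-group 2 interface-monitor ge-0/0/9 weight 255

So for RG-1 two monitored interfaces on the same node have to fail together, while for RG-2 a single monitored interface is enough to reach the threshold.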

 

But when I checked the information about the monitored interfaces (those related to the RGs), none of them showed any flaps (link down/up events).
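For reference, the check was along these lines (standard Junos operational commands; ge-5/0/7 is just one example from the RG-1 monitor list):

show chassis cluster interfaces | no-more
show interfaces ge-5/0/7 extensive | match "Last flapped"

None of the "Last flapped" timestamps fall inside the failover window, and the monitored status of every interface is Up.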

In the jsrpd log I found the following events at failover time:

Jul 27 00:18:21 Remote node suspended fabric monitoring
Jul 27 00:18:22 Successfully sent an snmp-trap due to a failover from secondary to primary on RG-1 on cluster 1 node 1. Reason: Remote is in secondary hold
Jul 27 00:18:22 entering primary for RG: 1

Jul 27 00:18:22 updated rg_info for RG-1 with failover-cnt 7 state: primary into ssam. Result = success, error: 0
Jul 27 00:18:22 reth1 ifd state changed from node0-primary -> node1-primary for RG-1
Jul 27 00:18:22 updating primary-node as node1 for RG-1 into ssam. Previous primary was node0. Result = success, Unknown error: 0
Jul 27 00:18:22 success or tried over 0: updating primary-node as node1 for RG-1 into ssam. Previous primary was node0.
Jul 27 00:18:22 Successfully sent an snmp-trap due to a failover from secondary to primary on RG-1 on cluster 1 node 1. Reason: Remote is in secondary hold
Jul 27 00:18:22 Successfully sent an snmp-trap due to a failover from secondary to primary on RG-2 on cluster 1 node 1. Reason: Remote is in secondary hold
Jul 27 00:18:22 entering primary for RG: 2

Jul 27 00:18:22 updated rg_info for RG-2 with failover-cnt 15 state: primary into ssam. Result = success, error: 0
Jul 27 00:18:22 reth3 ifd state changed from node0-primary -> node1-primary for RG-2
Jul 27 00:18:22 updating primary-node as node1 for RG-2 into ssam. Previous primary was node0. Result = success, Unknown error: 0

 

and so on for all the other RGs.
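For reference, these entries can be filtered with something like the following (assuming the jsrpd trace log is being written to the default /var/log/jsrpd file):

show log jsrpd | match "Jul 27 00:18"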

 

In the chassis cluster information I found the following events:

Fabric link events

node1:
Jul 18 20:21:13.772 : Fabric link fab0 is up
Jul 18 20:21:13.773 : Fabric link fab1 is up
Jul 18 20:31:17.721 : Fabric link fab0 is up
Jul 18 20:31:17.735 : Fabric link fab1 is up
Jul 23 01:39:44.693 : Fabric monitoring is suspended by remote node
Jul 23 01:40:04.201 : Fabric monitoring suspension is revoked by remote node
Jul 26 12:00:45.540 : Fabric monitoring is suspended by remote node
Jul 26 12:01:00.571 : Fabric monitoring suspension is revoked by remote node
Jul 27 00:18:21.058 : Fabric monitoring is suspended by remote node
Jul 27 00:18:42.085 : Fabric monitoring suspension is revoked by remote node

node0:
Jul 27 00:18:41.907 : Child ge-5/0/2 of fab1 is up
Jul 27 00:18:41.922 : Fabric link fab0 is up
Jul 27 00:18:42.001 : Fabric link fab0 is up
Jul 27 00:18:42.006 : Child ge-0/0/3 added to fab0
Jul 27 00:18:42.006 : Child ge-0/0/3 of fab0 is up
Jul 27 00:18:42.008 : Child link-0 of fab0 is up, pfe notification
Jul 27 00:18:42.016 : Fabric link fab0 is up
Jul 27 00:18:42.019 : Child ge-0/0/2 added to fab0
Jul 27 00:18:42.019 : Child ge-0/0/2 of fab0 is up
Jul 27 00:18:42.022 : Child link-0 of fab0 is up, pfe notification

 

It seems there was some kind of fabric link connection problem. So I also checked the detailed information for the fabric link interfaces, but they show no flap count either.
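The fabric-side checks were along these lines (standard commands; ge-0/0/2 is one example fabric child link):

show chassis cluster interfaces | no-more
show interfaces fab0 extensive | match "Last flapped"
show interfaces ge-0/0/2 extensive | match "Last flapped"
show chassis cluster statistics

The statistics output also includes the fabric probe counters, which can reveal a short fabric interruption even when the physical flap counters do not move.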

 

Any idea what the cause of my failover was?

 

I also have a core dump on the primary node from the failover time, but I can't read it. And my monitoring system reports high CPU utilization on the primary node at that time.
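For completeness, the core file and the CPU spike can be confirmed from the CLI with something like:

show system core-dumps
show chassis routing-engine | no-more

The first command lists the core files with their timestamps and paths; the second shows the Routing Engine (control-plane) CPU load that my monitoring graphs.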

 

Best Regards,

Oleg

 


Re: Strange RG failover

2 weeks ago

Hello Oleg,

 

Can you answer the questions below?

 

  1. What is the Junos version running on this device?
  2. I believe all the RG1+ groups failed over from node 0 to node 1. Am I right?
  3. Did only the RG1+ groups fail over, or did RG0 fail over as well?
  4. The jsrpd logs only give you information about cluster transitions. If you want to know the actual root cause of the IF monitoring failure, you need to check the messages log and the chassisd logs around the time of the issue (example filters are shown after this list).
  5. Which core dump do you see on the primary node? Also, is the timestamp of the core dump the same as that of the IF monitoring failure?
  6. When you say high CPU utilization, are you referring to the control-plane CPU or the data-plane CPU?
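For example, filters along these lines should pull the relevant entries around the event time (adjust the timestamp to your failover window):

show log messages | match "Jul 27 00:1"
show log chassisd | match "Jul 27 00:1"

In the messages log, look in particular for link-down events and for any kernel, chassisd or PFE entries in that minute.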

If you would like the core dumps decoded, you will need to contact JTAC for analysis. In the meantime, let me help on a best-effort basis.



Thanks,
π00bm@$t€®.
Please, Mark My Solution Accepted if it Helped, Kudos are Appreciated too!!!

Re: Strange RG failover

2 weeks ago

Hi,

 

1. JUNOS Software Release is 12.3X48-D101

2. Yes, you are right. All RG1+ groups failed over from node 0 to node 1.

3. Yes, only the RG1+ groups failed over. RG0 didn't move; it remains primary on node 1.

4. There is no information in the chassisd logs during the time of the issue. I will check the messages log for more information.

5. I have a ksyncd core dump on the primary node, and its timestamp exactly matches the IF monitoring failure.

6. Regarding CPU utilization: my data comes from Zabbix, which polls the chassis routing-engine CPU utilization (user plus kernel activity).
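So it looks like the spike was on the control-plane (Routing Engine) CPU. For reference, the two planes can be checked separately with something like the following (the second command is the branch-SRX form and assumes the SPU sits on FPC 0):

show chassis routing-engine
show security monitoring fpc 0

The first shows the RE (control-plane) utilization that Zabbix polls; the second shows the SPU (data-plane) CPU and memory.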

 

I will send more information after checking the messages log.

 

Best Regards,

Oleg


Re: Strange RG failover

2 weeks ago

Hi Oleg,

 

Thank you for providing that information.

 

At this point, I would suggest you open a case with JTAC, because decoding that core should be the first step.
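If it helps, the data JTAC will usually ask for can be collected with something like the following (the file names under /var/tmp are only examples):

request support information | save /var/tmp/rsi-node0.txt
file archive compress source /var/log/* destination /var/tmp/logs-node0.tgz

plus the ksyncd core file itself, whose exact path is listed by show system core-dumps.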



Thanks,
π00bm@$t€®.
Please, Mark My Solution Accepted if it Helped, Kudos are Appreciated too!!!

Re: Strange RG failover

2 weeks ago

Please provide the output of the following commands:

 

show chassis cluster interfaces | no-more

show chassis cluster information detail | no-more

jsrpd logs from node0

 

 

Thanks,
Nellikka
JNCIE x3 (SEC #321; SP #2839; ENT #790)
Please Mark My Solution Accepted if it Helped, Kudos are Appreciated too!!!

Re: Strange RG failover

a week ago

Hi Nellikka,

 

Here is the requested information:

show chassis cluster interfaces | no-more
Control link status: Up

Control interfaces:
Index Interface Monitored-Status Internal-SA
0 fxp1 Up Disabled

Fabric link status: Up

Fabric interfaces:
Name Child-interface Status
(Physical/Monitored)
fab0 ge-0/0/2 Up / Up
fab0 ge-0/0/3 Up / Up
fab1 ge-5/0/2 Up / Up
fab1 ge-5/0/3 Up / Up

Redundant-ethernet Information:
Name Status Redundancy-group
reth0 Up 6
reth1 Up 1
reth2 Up 3
reth3 Up 2
reth4 Up 5
reth5 Down Not configured
reth6 Up 7

Redundant-pseudo-interface Information:
Name Status Redundancy-group
lo0 Up 0

Interface Monitoring:
Interface Weight Status Redundancy-group
ge-5/0/7 127 Up 1
ge-5/0/6 127 Up 1
ge-5/0/5 127 Up 1
ge-5/0/4 127 Up 1
ge-0/0/7 127 Up 1
ge-0/0/6 127 Up 1
ge-0/0/5 127 Up 1
ge-0/0/4 127 Up 1
ge-0/0/9 255 Up 2
ge-0/0/8 255 Up 3
ge-0/0/11 255 Up 5
ge-0/0/15 255 Up 6
ge-5/0/14 255 Up 7

 

show chassis cluster information detail | no-more
node0:
--------------------------------------------------------------------------
Redundancy mode:
Configured mode: active-active
Operational mode: active-active
Cluster configuration:
Heartbeat interval: 1000 ms
Heartbeat threshold: 3
Control link recovery: Disabled
Fabric link down timeout: 66 sec
Node health information:
Local node health: Healthy
Remote node health: Healthy

Redundancy group: 0, Threshold: 255, Monitoring failures: none
Events:
Jul 1 13:28:42.295 : hold->secondary, reason: Hold timer expired

Redundancy group: 1, Threshold: 255, Monitoring failures: none
Events:
Jul 27 02:00:10.317 : secondary->primary, reason: Remote yield (100/0)
Jul 30 05:00:16.971 : primary->secondary-hold, reason: Monitor failed: IF
Jul 30 05:00:17.979 : secondary-hold->secondary, reason: Ready to become secondary
Jul 30 06:00:09.553 : secondary->primary, reason: Remote is in secondary hold
Jul 30 18:03:43.251 : primary->secondary-hold, reason: Monitor failed: IF
Jul 30 18:03:44.253 : secondary-hold->secondary, reason: Ready to become secondary
Jul 30 23:00:09.687 : secondary->primary, reason: Remote yield (100/0)
Aug 1 20:55:13.576 : primary->secondary-hold, reason: Monitor failed: IF
Aug 1 20:55:14.579 : secondary-hold->secondary, reason: Ready to become secondary
Aug 1 20:57:14.810 : secondary->primary, reason: Remote yield (100/0)

Redundancy group: 2, Threshold: 255, Monitoring failures: none
Events:
Jul 27 00:18:43.131 : secondary->primary, reason: Remote is in secondary hold
Jul 30 05:00:16.749 : primary->secondary-hold, reason: Monitor failed: IF
Jul 30 05:00:17.751 : secondary-hold->secondary, reason: Ready to become secondary
Jul 30 05:00:34.193 : secondary->primary, reason: Remote is in secondary hold
Jul 30 18:03:43.252 : primary->secondary-hold, reason: Monitor failed: IF
Jul 30 18:03:44.254 : secondary-hold->secondary, reason: Ready to become secondary
Jul 30 18:04:07.295 : secondary->primary, reason: Remote is in secondary hold
Aug 1 20:55:13.577 : primary->secondary-hold, reason: Monitor failed: IF
Aug 1 20:55:14.583 : secondary-hold->secondary, reason: Ready to become secondary
Aug 1 20:55:35.677 : secondary->primary, reason: Remote is in secondary hold

Redundancy group: 3, Threshold: 255, Monitoring failures: none
Events:
Jul 27 00:18:43.132 : secondary->primary, reason: Remote is in secondary hold
Jul 30 05:00:16.750 : primary->secondary-hold, reason: Monitor failed: IF
Jul 30 05:00:17.752 : secondary-hold->secondary, reason: Ready to become secondary
Jul 30 05:00:34.194 : secondary->primary, reason: Remote is in secondary hold
Jul 30 18:03:43.253 : primary->secondary-hold, reason: Monitor failed: IF
Jul 30 18:03:44.256 : secondary-hold->secondary, reason: Ready to become secondary
Jul 30 18:04:07.298 : secondary->primary, reason: Remote is in secondary hold
Aug 1 20:55:13.578 : primary->secondary-hold, reason: Monitor failed: IF
Aug 1 20:55:14.587 : secondary-hold->secondary, reason: Ready to become secondary
Aug 1 20:55:35.678 : secondary->primary, reason: Remote is in secondary hold

Redundancy group: 5, Threshold: 255, Monitoring failures: none
Events:
Jul 27 00:18:43.133 : secondary->primary, reason: Remote is in secondary hold
Jul 30 05:00:16.860 : primary->secondary-hold, reason: Monitor failed: IF
Jul 30 05:00:17.863 : secondary-hold->secondary, reason: Ready to become secondary
Jul 30 05:00:34.194 : secondary->primary, reason: Remote is in secondary hold
Jul 30 18:03:43.254 : primary->secondary-hold, reason: Monitor failed: IF
Jul 30 18:03:44.257 : secondary-hold->secondary, reason: Ready to become secondary
Jul 30 18:04:07.299 : secondary->primary, reason: Remote is in secondary hold
Aug 1 20:55:13.174 : primary->secondary-hold, reason: Monitor failed: IF
Aug 1 20:55:14.177 : secondary-hold->secondary, reason: Ready to become secondary
Aug 1 20:55:35.678 : secondary->primary, reason: Remote is in secondary hold

Redundancy group: 6, Threshold: 255, Monitoring failures: none
Events:
Jul 27 00:18:43.134 : secondary->primary, reason: Remote is in secondary hold
Jul 30 05:00:17.070 : primary->secondary-hold, reason: Monitor failed: IF
Jul 30 05:00:18.072 : secondary-hold->secondary, reason: Ready to become secondary
Jul 30 05:00:34.195 : secondary->primary, reason: Remote is in secondary hold
Jul 30 18:03:43.256 : primary->secondary-hold, reason: Monitor failed: IF
Jul 30 18:03:44.261 : secondary-hold->secondary, reason: Ready to become secondary
Jul 30 18:04:07.299 : secondary->primary, reason: Remote is in secondary hold
Aug 1 20:55:13.175 : primary->secondary-hold, reason: Monitor failed: IF
Aug 1 20:55:14.178 : secondary-hold->secondary, reason: Ready to become secondary
Aug 1 20:55:35.686 : secondary->primary, reason: Remote is in secondary hold

Redundancy group: 7, Threshold: 255, Monitoring failures: none
Events:
Jul 1 13:28:42.806 : hold->secondary, reason: Hold timer expired
Control link statistics:
Control link 0:
Heartbeat packets sent: 3017880
Heartbeat packets received: 3017405
Heartbeat packet errors: 0
Duplicate heartbeat packets received: 0
Control recovery packet count: 0
Sequence number of last heartbeat packet sent: 3017906
Sequence number of last heartbeat packet received: 3017472
Fabric link statistics:
Child link 0
Probes sent: 6043389
Probes received: 6043379
Child link 1
Probes sent: 6043385
Probes received: 6043376
Switch fabric link statistics:
Probe state : DOWN
Probes sent: 0
Probes received: 0
Probe recv errors: 0
Probe send errors: 0
Probe recv dropped: 0
Sequence number of last probe sent: 0
Sequence number of last probe received: 0

Chassis cluster LED information:
Current LED color: Green
Last LED change reason: No failures
Control port tagging:
Disabled

Cold Synchronization:
Status:
Cold synchronization completed for: N/A
Cold synchronization failed for: N/A
Cold synchronization not known for: N/A
Current Monitoring Weight: 0

Progress:
CS Prereq 1 of 1 SPUs completed
1. if_state sync 1 SPUs completed
2. fabric link 1 SPUs completed
3. policy data sync 1 SPUs completed
4. cp ready 1 SPUs completed
5. VPN data sync 1 SPUs completed
6. Dynamic addr sync 1 SPUs completed
CS RTO sync 1 of 1 SPUs completed
CS Postreq 1 of 1 SPUs completed

Statistics:
Number of cold synchronization completed: 0
Number of cold synchronization failed: 0

Events:
Jul 1 13:30:26.874 : Cold sync for PFE is RTO sync in process
Jul 1 13:30:28.859 : Cold sync for PFE is Post-req check in process
Jul 1 13:30:30.617 : Cold sync for PFE is Completed

Loopback Information:

PIC Name Loopback Nexthop Mbuf
-------------------------------------------------
Success Success Success

Interface monitoring:
Statistics:
Monitored interface failure count: 532

Events:
Aug 5 08:00:19.553 : Interface ge-5/0/7 monitored by rg 1, changed state from Up to Down
Aug 5 08:00:19.752 : Interface ge-5/0/4 monitored by rg 1, changed state from Up to Down
Aug 5 08:00:19.910 : Interface ge-5/0/5 monitored by rg 1, changed state from Up to Down
Aug 5 08:00:20.072 : Interface ge-5/0/6 monitored by rg 1, changed state from Up to Down
Aug 5 08:00:20.152 : Interface ge-5/0/7 monitored by rg 1, changed state from Down to Up
Aug 5 08:00:20.362 : Interface ge-5/0/4 monitored by rg 1, changed state from Down to Up
Aug 5 08:00:20.602 : Interface ge-5/0/5 monitored by rg 1, changed state from Down to Up
Aug 5 08:00:20.863 : Interface ge-5/0/6 monitored by rg 1, changed state from Down to Up
Aug 5 11:00:10.197 : Interface ge-5/0/6 monitored by rg 1, changed state from Up to Down
Aug 5 11:00:10.273 : Interface ge-5/0/6 monitored by rg 1, changed state from Down to Up

Fabric monitoring:
Status:
Fabric Monitoring: Enabled
Activation status: Active
Fabric Status reported by data plane: Up
JSRPD internal fabric status: Up

Fabric link events:
Aug 1 20:55:35.038 : Child ge-5/0/2 of fab1 is up
Aug 1 20:55:35.039 : Fabric link fab0 is up
Aug 1 20:55:35.073 : Fabric link fab0 is up
Aug 1 20:55:35.085 : Child ge-0/0/3 added to fab0
Aug 1 20:55:35.086 : Child ge-0/0/3 of fab0 is up
Aug 1 20:55:35.086 : Fabric link fab0 is up
Aug 1 20:55:35.093 : Child ge-0/0/2 added to fab0
Aug 1 20:55:35.093 : Child ge-0/0/2 of fab0 is up
Aug 1 20:55:35.783 : Child link-0 of fab0 is up, pfe notification
Aug 1 20:55:35.783 : Child link-1 of fab0 is up, pfe notification

Control link status: Up
Server information:
Server status : Inactive
Server connected to None
Client information:
Client status : Connected
Client connected to 130.16.0.1/62845
Control port tagging:
Disabled

Control link events:
Jul 1 13:28:11.425 : Control link fxp1 is down
Jul 1 13:28:22.598 : Control link fxp1 is down
Jul 1 13:28:23.376 : Control link fxp1 is up
Jul 1 13:29:06.458 : Control link fxp1 is up
Jul 1 13:29:11.841 : Control link fxp1 is up
Jul 1 13:33:11.365 : Control link fxp1 is up
Jul 18 20:21:00.486 : Control link fxp1 is up
Jul 18 20:31:03.475 : Control link fxp1 is up

Hardware monitoring:
Status:
Activation status: Enabled
Redundancy group 0 failover for hardware faults: Enabled
Hardware redundancy group 0 errors: 0
Hardware redundancy group 1 errors: 0

Schedule monitoring:
Status:
Activation status: Disabled
Schedule slip detected: None
Timer ignored: No

Statistics:
Total slip detected count: 3
Longest slip duration: 9(s)

Events:
Jul 1 13:28:26.443 : Detected schedule slip
Jul 1 13:29:28.495 : Cleared schedule slip
Jul 1 13:30:38.318 : Detected schedule slip
Jul 1 13:31:38.406 : Cleared schedule slip
Jul 1 13:33:22.031 : Detected schedule slip
Jul 1 13:34:22.122 : Cleared schedule slip

Configuration Synchronization:
Status:
Activation status: Enabled
Last sync operation: Auto-Sync
Last sync result: Succeeded
Last sync mgd messages:
mgd: rcp: /config/juniper.conf: No such file or directory
Non-existant dump device /dev/bo0s1b
mgd: commit complete

Events:
Jul 1 13:29:16.147 : Auto-Sync: In progress. Attempt: 1
Jul 1 13:33:13.972 : Auto-Sync: Clearing mgd. Attempt: 1
Jul 1 13:33:22.023 : Auto-Sync: Succeeded. Attempt: 1

Cold Synchronization Progress:
CS Prereq 1 of 1 SPUs completed
1. if_state sync 1 SPUs completed
2. fabric link 1 SPUs completed
3. policy data sync 1 SPUs completed
4. cp ready 1 SPUs completed
5. VPN data sync 1 SPUs completed
6. Dynamic addr sync 1 SPUs completed
CS RTO sync 1 of 1 SPUs completed
CS Postreq 1 of 1 SPUs completed

node1:
--------------------------------------------------------------------------
Redundancy mode:
Configured mode: active-active
Operational mode: active-active
Cluster configuration:
Heartbeat interval: 1000 ms
Heartbeat threshold: 3
Control link recovery: Disabled
Fabric link down timeout: 66 sec
Node health information:
Local node health: Healthy
Remote node health: Healthy

Redundancy group: 0, Threshold: 255, Monitoring failures: none
Events:
Jul 1 13:27:23.178 : hold->secondary, reason: Hold timer expired
Jul 1 13:27:40.247 : secondary->primary, reason: Only node present

Redundancy group: 1, Threshold: 255, Monitoring failures: none
Events:
Jul 27 02:00:12.507 : secondary-hold->secondary, reason: Ready to become secondary
Jul 30 05:00:16.617 : secondary->primary, reason: Remote yield (1/0)
Jul 30 06:00:09.525 : primary->secondary-hold, reason: Monitor failed: IF
Jul 30 06:00:10.552 : secondary-hold->secondary, reason: Ready to become secondary
Jul 30 18:03:43.256 : secondary->primary, reason: Remote is in secondary hold
Jul 30 23:00:10.746 : primary->secondary-hold, reason: Monitor failed: IF
Jul 30 23:00:12.025 : secondary-hold->secondary, reason: Ready to become secondary
Aug 1 20:55:12.874 : secondary->primary, reason: Remote yield (1/0)
Aug 1 20:57:15.826 : primary->secondary-hold, reason: Monitor failed: IF
Aug 1 20:57:17.077 : secondary-hold->secondary, reason: Ready to become secondary

Redundancy group: 2, Threshold: 255, Monitoring failures: none
Events:
Jul 27 00:18:44.096 : secondary-hold->secondary, reason: Ready to become secondary
Jul 30 05:00:16.747 : secondary->primary, reason: Remote is in secondary hold
Jul 30 05:00:34.121 : primary->secondary-hold, reason: Preempt (1/100)
Jul 30 05:00:35.140 : secondary-hold->secondary, reason: Ready to become secondary
Jul 30 18:03:43.419 : secondary->primary, reason: Remote is in secondary hold
Jul 30 18:04:07.230 : primary->secondary-hold, reason: Preempt (1/100)
Jul 30 18:04:08.246 : secondary-hold->secondary, reason: Ready to become secondary
Aug 1 20:55:13.029 : secondary->primary, reason: Remote yield (1/0)
Aug 1 20:55:35.621 : primary->secondary-hold, reason: Preempt (1/100)
Aug 1 20:55:36.629 : secondary-hold->secondary, reason: Ready to become secondary

Redundancy group: 3, Threshold: 255, Monitoring failures: none
Events:
Jul 27 00:18:44.107 : secondary-hold->secondary, reason: Ready to become secondary
Jul 30 05:00:16.794 : secondary->primary, reason: Remote is in secondary hold
Jul 30 05:00:34.140 : primary->secondary-hold, reason: Preempt (1/100)
Jul 30 05:00:35.153 : secondary-hold->secondary, reason: Ready to become secondary
Jul 30 18:03:43.467 : secondary->primary, reason: Remote is in secondary hold
Jul 30 18:04:07.245 : primary->secondary-hold, reason: Preempt (1/100)
Jul 30 18:04:08.257 : secondary-hold->secondary, reason: Ready to become secondary
Aug 1 20:55:13.089 : secondary->primary, reason: Remote yield (1/0)
Aug 1 20:55:35.628 : primary->secondary-hold, reason: Preempt (1/100)
Aug 1 20:55:36.638 : secondary-hold->secondary, reason: Ready to become secondary

Redundancy group: 5, Threshold: 255, Monitoring failures: none
Events:
Jul 27 00:18:44.121 : secondary-hold->secondary, reason: Ready to become secondary
Jul 30 05:00:16.857 : secondary->primary, reason: Remote is in secondary hold
Jul 30 05:00:34.153 : primary->secondary-hold, reason: Preempt (1/100)
Jul 30 05:00:35.167 : secondary-hold->secondary, reason: Ready to become secondary
Jul 30 18:03:43.539 : secondary->primary, reason: Remote is in secondary hold
Jul 30 18:04:07.256 : primary->secondary-hold, reason: Preempt (1/100)
Jul 30 18:04:08.267 : secondary-hold->secondary, reason: Ready to become secondary
Aug 1 20:55:13.188 : secondary->primary, reason: Remote is in secondary hold
Aug 1 20:55:35.638 : primary->secondary-hold, reason: Preempt (1/100)
Aug 1 20:55:36.652 : secondary-hold->secondary, reason: Ready to become secondary

Redundancy group: 6, Threshold: 255, Monitoring failures: none
Events:
Jul 27 00:18:44.129 : secondary-hold->secondary, reason: Ready to become secondary
Jul 30 05:00:17.068 : secondary->primary, reason: Remote is in secondary hold
Jul 30 05:00:34.164 : primary->secondary-hold, reason: Preempt (1/100)
Jul 30 05:00:35.174 : secondary-hold->secondary, reason: Ready to become secondary
Jul 30 18:03:43.722 : secondary->primary, reason: Remote is in secondary hold
Jul 30 18:04:07.265 : primary->secondary-hold, reason: Preempt (1/100)
Jul 30 18:04:08.282 : secondary-hold->secondary, reason: Ready to become secondary
Aug 1 20:55:13.233 : secondary->primary, reason: Remote is in secondary hold
Aug 1 20:55:35.650 : primary->secondary-hold, reason: Preempt (1/100)
Aug 1 20:55:36.666 : secondary-hold->secondary, reason: Ready to become secondary

Redundancy group: 7, Threshold: 255, Monitoring failures: none
Events:
Jul 1 13:27:23.215 : hold->secondary, reason: Hold timer expired
Jul 1 13:27:40.473 : secondary->primary, reason: Only node present
Control link statistics:
Control link 0:
Heartbeat packets sent: 3017460
Heartbeat packets received: 3017880
Heartbeat packet errors: 0
Duplicate heartbeat packets received: 0
Control recovery packet count: 0
Sequence number of last heartbeat packet sent: 3017473
Sequence number of last heartbeat packet received: 3017906
Fabric link statistics:
Child link 0
Probes sent: 6043540
Probes received: 6043388
Child link 1
Probes sent: 6043538
Probes received: 6043385
Switch fabric link statistics:
Probe state : DOWN
Probes sent: 0
Probes received: 0
Probe recv errors: 0
Probe send errors: 0
Probe recv dropped: 0
Sequence number of last probe sent: 0
Sequence number of last probe received: 0

Chassis cluster LED information:
Current LED color: Green
Last LED change reason: No failures
Control port tagging:
Disabled

Cold Synchronization:
Status:
Cold synchronization completed for: N/A
Cold synchronization failed for: N/A
Cold synchronization not known for: N/A
Current Monitoring Weight: 0

Progress:
CS Prereq 1 of 1 SPUs completed
1. if_state sync 1 SPUs completed
2. fabric link 1 SPUs completed
3. policy data sync 1 SPUs completed
4. cp ready 1 SPUs completed
5. VPN data sync 1 SPUs completed
6. Dynamic addr sync 1 SPUs completed
CS RTO sync 1 of 1 SPUs completed
CS Postreq 1 of 1 SPUs completed

Statistics:
Number of cold synchronization completed: 0
Number of cold synchronization failed: 0

Events:
Jul 1 13:30:26.271 : Cold sync for PFE is RTO sync in process
Jul 1 13:30:26.385 : Cold sync for PFE is Completed

Loopback Information:

PIC Name Loopback Nexthop Mbuf
-------------------------------------------------
Success Success Success

Interface monitoring:
Statistics:
Monitored interface failure count: 475

Events:
Aug 5 08:00:19.382 : Interface ge-5/0/7 monitored by rg 1, changed state from Up to Down
Aug 5 08:00:19.588 : Interface ge-5/0/4 monitored by rg 1, changed state from Up to Down
Aug 5 08:00:19.764 : Interface ge-5/0/5 monitored by rg 1, changed state from Up to Down
Aug 5 08:00:19.979 : Interface ge-5/0/6 monitored by rg 1, changed state from Up to Down
Aug 5 08:00:20.168 : Interface ge-5/0/7 monitored by rg 1, changed state from Down to Up
Aug 5 08:00:20.353 : Interface ge-5/0/4 monitored by rg 1, changed state from Down to Up
Aug 5 08:00:20.611 : Interface ge-5/0/5 monitored by rg 1, changed state from Down to Up
Aug 5 08:00:20.887 : Interface ge-5/0/6 monitored by rg 1, changed state from Down to Up
Aug 5 11:00:10.010 : Interface ge-5/0/6 monitored by rg 1, changed state from Up to Down
Aug 5 11:00:10.294 : Interface ge-5/0/6 monitored by rg 1, changed state from Down to Up

Fabric monitoring:
Status:
Fabric Monitoring: Enabled
Activation status: Active
Fabric Status reported by data plane: Up
JSRPD internal fabric status: Up

Fabric link events:
Jul 26 12:00:45.540 : Fabric monitoring is suspended by remote node
Jul 26 12:01:00.571 : Fabric monitoring suspension is revoked by remote node
Jul 27 00:18:21.058 : Fabric monitoring is suspended by remote node
Jul 27 00:18:42.085 : Fabric monitoring suspension is revoked by remote node
Jul 30 05:00:15.615 : Fabric monitoring is suspended by remote node
Jul 30 05:00:33.643 : Fabric monitoring suspension is revoked by remote node
Jul 30 18:03:42.092 : Fabric monitoring is suspended by remote node
Jul 30 18:04:05.225 : Fabric monitoring suspension is revoked by remote node
Aug 1 20:55:11.875 : Fabric monitoring is suspended by remote node
Aug 1 20:55:36.063 : Fabric monitoring suspension is revoked by remote node

Control link status: Up
Server information:
Server status : Connected
Server connected to 129.16.0.1/60967
Client information:
Client status : Inactive
Client connected to None
Control port tagging:
Disabled

Control link events:
Jul 1 13:26:47.513 : Control link fxp1 is down
Jul 1 13:26:55.586 : Control link fxp1 is down
Jul 1 13:26:58.261 : Control link fxp1 is up
Jul 1 13:27:40.251 : Control link fxp1 is up
Jul 1 13:28:07.511 : Control link fxp1 is up
Jul 1 13:28:50.220 : Control link fxp1 is up
Jul 1 13:28:55.103 : Control link fxp1 is up
Jul 1 13:29:54.844 : Control link fxp1 is up
Jul 18 20:21:13.797 : Control link fxp1 is up
Jul 18 20:31:17.750 : Control link fxp1 is up

Hardware monitoring:
Status:
Activation status: Enabled
Redundancy group 0 failover for hardware faults: Enabled
Hardware redundancy group 0 errors: 0
Hardware redundancy group 1 errors: 0

Schedule monitoring:
Status:
Activation status: Disabled
Schedule slip detected: None
Timer ignored: No

Statistics:
Total slip detected count: 5
Longest slip duration: 40(s)

Events:
Jul 1 13:27:03.130 : Detected schedule slip
Jul 1 13:27:24.208 : Detected schedule slip before it got cleared
Jul 1 13:28:51.168 : Detected schedule slip before it got cleared
Jul 1 13:29:15.599 : Detected schedule slip before it got cleared
Jul 1 13:29:38.294 : Detected schedule slip before it got cleared
Jul 1 13:30:40.291 : Cleared schedule slip

Configuration Synchronization:
Status:
Activation status: Enabled
Last sync operation: Auto-Sync
Last sync result: Not needed
Last sync mgd messages:

Events:
Jul 1 13:28:50.180 : Auto-Sync: Not needed.

Cold Synchronization Progress:
CS Prereq 1 of 1 SPUs completed
1. if_state sync 1 SPUs completed
2. fabric link 1 SPUs completed
3. policy data sync 1 SPUs completed
4. cp ready 1 SPUs completed
5. VPN data sync 1 SPUs completed
6. Dynamic addr sync 1 SPUs completed
CS RTO sync 1 of 1 SPUs completed
CS Postreq 1 of 1 SPUs completed

 

 

The jsrpd log file is attached. Its entries are from the first failover (27 Jul at 00:18). There is also some strange flapping of RG1, but in my opinion that is a separate question.
