Forum Discussion

Issac
Expert
2 months ago
Solved

Automatic CUG Addition/Removal

Hi All,

Is it possible to automatically remove collectors from a CUG if CPU/memory utilization has gone above 90%?

Device failover should be driven by Data Collector (DC) load and performance as well, not only by a collector being down.

12 Replies

  • Hello Isaac,

    Currently, aligning and removing collectors from a collector group is a manual process. Removing a collector when its CPU or memory utilization is over 90% is not advisable, because we do not know the root cause of the performance load. For example, if a single misconfigured device is causing CPU to spike above 90%, removing the collector will simply load-balance that device to a different collector, and that new collector's CPU will then spike over 90% as well. Not to mention the added load of redistributing the remaining devices across the surviving collectors.
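    As a toy illustration of that cascade (invented numbers, plain Python, not SL1 behavior):

    ```python
    # Each collector carries some CPU load; "removing" the hottest collector
    # spreads its load evenly across the survivors. Numbers are made up.
    collectors = {"dc1": 92.0, "dc2": 70.0, "dc3": 68.0}  # CPU % per collector
    THRESHOLD = 90.0

    hot = max(collectors, key=collectors.get)
    if collectors[hot] > THRESHOLD:
        load = collectors.pop(hot)      # remove dc1 from the group
        share = load / len(collectors)  # its devices land on the rest
        for dc in collectors:
            collectors[dc] += share

    print(collectors)  # {'dc2': 116.0, 'dc3': 114.0} -- now everyone is over 90%
    ```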

    If you do see a collector consistently at over 90%, adding a collector generally helps with spreading the load. If that still does not work, please submit a case to ScienceLogic Support and we can investigate the root cause of the issue and provide a solution.

    Antonio Andres

    Principal Technical Support Engineer | ScienceLogic

    • Issac
      Expert

      Hi Andres,

      At a high level, these factors must be considered for load balancing. It is not only a DC going down that can cause a service outage; a resource crunch also affects polling of end devices, leading to missed polls where data and alerts can be lost.

      • TonyAndres
        Moderator

        Hello Isaac,

        At the moment, the CUG load balancing does not have this capability. I recommend submitting this as a feature enhancement to the Ideas Hub here in the Nexus Community, so that our Product Management team is aware there is a desire for increased performance considerations in regard to CUG load balancing.

        Antonio Andres

        Principal Technical Support Engineer | ScienceLogic

  • If there were a feature that facilitated this, I would suggest it be triggered via configurable runbook actions, rather than something built directly into the CUG setup. For instance, in the use case given, if you oversubscribe a CUG and all of the collectors are running over a specific CPU utilization, you would not want all of your collectors removed from the group or put into a failed state. That would create complete data loss, and you would be much worse off than with some data gaps or delayed data. Lots of use cases would need to be explored to develop this kind of feature; a rough sketch of the kind of guard such an action would need follows this post.

    Jason.
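    A minimal sketch of that guard, assuming per-collector CPU figures are available to the runbook action (the names and thresholds here are hypothetical, not SL1 APIs):

    ```python
    # Guard for an auto-removal runbook action (hypothetical logic).
    # Refuse to pull a collector out of the CUG if too many of its peers
    # are also hot -- otherwise the whole group drains and all data is lost.

    CPU_THRESHOLD = 90.0    # percent, per the thread's example
    MAX_HOT_FRACTION = 0.5  # never act if half the group or more is already hot

    def should_remove(collector: str, cpu_by_collector: dict[str, float]) -> bool:
        hot = [c for c, cpu in cpu_by_collector.items() if cpu > CPU_THRESHOLD]
        if len(hot) / len(cpu_by_collector) >= MAX_HOT_FRACTION:
            return False  # group-wide overload: removal would only cascade
        return cpu_by_collector.get(collector, 0.0) > CPU_THRESHOLD

    # dc1 is hot but its peers have headroom, so removal is allowed:
    print(should_remove("dc1", {"dc1": 95.0, "dc2": 60.0, "dc3": 55.0}))  # True
    # two of three collectors are hot, so the guard refuses to act:
    print(should_remove("dc1", {"dc1": 95.0, "dc2": 92.0, "dc3": 55.0}))  # False
    ```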

    • Issac
      Expert

      Yes, I agree. Some logic would need to be defined for this use case as well.

  • Issac, if this is something you want in the product, we should come up with some reasonable use cases, and then you can write it up in depth in the Ideas Hub. But this would take a long time to implement and, in my opinion, has the potential to cause more issues than it solves. That said, it is going to be a tough sell to the product management team in charge of CUG management.

    • Issac
      Expert

      Jasonkeck-GDIT, assume you have a CUG running with 6 collectors and 700 devices each. If one of your collector's processes is hung or busy, polling of its 700 devices is impacted, even though collector availability still shows as up. In that case you get data gaps for 700 devices, and alerts are missed as well.

      If we had a smart way of actually checking core process load and performance, failover should happen automatically. For example, if MySQL is consuming more than 90% CPU, we should fail the devices over to other collectors and take the collector out of the CUG; after that it can recycle its own services.
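      A minimal sketch of that kind of check, assuming psutil is available on the collector; the failover hooks are made-up placeholders, not real SL1 calls:

      ```python
      import time
      import psutil  # assumed available on the collector

      CPU_THRESHOLD = 90.0  # percent, per the MySQL example above

      def process_cpu_percent(name: str) -> float:
          """Total CPU% across all processes named `name`, e.g. 'mysqld'."""
          procs = [p for p in psutil.process_iter(['name']) if p.info['name'] == name]
          for p in procs:
              p.cpu_percent(None)  # prime the per-process counters
          time.sleep(1.0)          # sample over a one-second window
          return sum(p.cpu_percent(None) for p in procs)

      def fail_over_and_recycle() -> None:
          # Placeholder steps -- SL1 exposes no public API for these.
          print("fail devices over to peer collectors")
          print("remove this collector from the CUG")
          print("recycle local services, then rejoin the CUG")

      if process_cpu_percent('mysqld') > CPU_THRESHOLD:
          fail_over_and_recycle()
      ```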

      • Ahhh, I see where you are going: not just removing the collector from the CUG, but following a workflow like this.

        If zero collectors are currently going through this process: run load balancing without the collector that triggered it > wait 2 minutes for current collections to finish (prevents data loss) > reboot the server and wait 5 minutes > validate that rows behind on the server = zero > mark the collector as working > trigger the load balancer with this collector back in the CUG to re-align devices to it for collection. If anything fails, leave the collector out and create a critical event for the SL1 administrators; if everything works as expected, create a minor event for the SL1 administrators to review the health of the CUG.
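        In rough Python (every helper here is a hypothetical stub, not a real SL1 call):

        ```python
        import time

        # All helpers are stand-in stubs -- SL1 exposes no public API by these names.
        def any_collector_in_remediation(cug): return False
        def rebalance(cug, exclude): print(f"rebalance CUG, excluding {exclude}")
        def reboot(dc): print(f"reboot {dc}")
        def rows_behind(dc): return 0
        def create_event(severity, msg): print(f"[{severity}] {msg}")

        def remediate(collector, cug):
            """Skeleton of the workflow described above."""
            if any_collector_in_remediation(cug):
                return                          # only one collector at a time
            rebalance(cug, exclude=collector)   # shed its devices to the peers
            time.sleep(120)                     # let in-flight collections finish
            reboot(collector)
            time.sleep(300)                     # give the DC time to come back
            if rows_behind(collector) == 0:     # data queue fully drained?
                rebalance(cug, exclude=None)    # re-align devices onto it
                create_event("minor", "collector recovered; review CUG health")
            else:
                create_event("critical", f"{collector} left out of CUG")

        remediate("dc1", ["dc1", "dc2", "dc3"])
        ```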

        Does a process like that make sense to you?

  • Sounds like a really cool use case that Skylar Analytics should be able to handle.