Tips for How to Optimize your SL1 System
ScienceLogic SL1 is a very powerful system that can make the lives of your operations staff easier, but those outcomes require some maintenance and optimization. Read on for a few tips on how to get the most out of your SL1 system.

1. Use Event Insights

You probably know that Event Insights (expand the left menu -> Events -> Event Insights) is a powerful view for seeing how SL1 is reducing noise in your system, but did you know that it also helps you spot potential optimizations? On the right side of the Event Insights page there is a section called “Tuning Targets”. By looking at which devices are creating the most events and which event policies are resulting in the most events, you can identify actions to take to clean things up.

In the example screenshot, I would want to look at device 203.0.113.249 to see what is triggering so many events. It could be a misconfiguration on the system, a hardware issue that needs to be remedied, or a threshold that could be tweaked. The lower section shows the event policies that are being triggered the most. Noise-reduction options here include requiring multiple triggers within a time frame to be sure that the events are persisting, suppressing the events for test or dev devices, or tweaking thresholds to be sure that the event reflects an actionable problem to fix.

2. Use Operational Insights

Operational Insights is a PowerPack built by ScienceLogic to help manage your SL1 stack. There are separate versions for self-hosted and SaaS customers, but most of the functionality is the same. Once it’s installed and configured, it presents data on a series of dashboards. These dashboards can help you see the status of your collectors, keep track of trends in the number of devices discovered and business services configured, and even keep an eye on under-the-hood metrics like Configuration Push time and the Event processing backlog.

3. Daily Health Tasks

Did you know that the SL1 Documentation has a list of daily health tasks? Some of these items are covered by Event Insights and Operational Insights, but some are not. For example, it’s a great idea to check the System Logs periodically to make sure you know how things are running. If you have concerns about other items in the “Healthy SL1 System” table, you can set event policies and automations to make sure that you are notified in case of any issues.

4. Join the ScienceLogic Nexus Community!

Finally, the ScienceLogic Nexus Community is a great resource for keeping things working at their best. Interacting with fellow customers, submitting enhancement ideas to the Product Management team, and keeping up with the latest information on new releases help you plan appropriately and use your time and energy well.

To get started:
Register with the Nexus Community (it's easy)
Visit our Community Information Forum for assistance and tips

Thanks for your time. I would love to see some feedback and ideas for topics of interest.

Introducing our newest Blogger: Joshua Ellsworth - Sr. Sales Engineer
Hello Nexus Community Members and ScienceLogic Customers,

I wanted to take a minute to introduce our newest blogger, Joshua Ellsworth, Sr. Sales Engineer.

Joshua Ellsworth is a Senior Sales Engineer primarily working with Enterprise customers at ScienceLogic. In his eleven years with the company, he filled roles in post-sales professional services and sales operations before moving to pre-sales. He has a wide-ranging background in IT, from end-user support to systems administration and web development, and has experience programming with Python, PHP, and JavaScript.

Fun fact: Josh has a Master’s Degree in Library and Information Science.

Skylar AI - The Future of AIOps and Observability
Continuing from Part 1 of this blog: SL1, ScienceLogic’s flagship AIOps product, has arguably the industry’s most complete set of integrations for telemetry collection, enabling insights and rapid automation, and supports almost any kind of device and environment, from legacy through to containers and cloud platforms. Skylar AI is designed to work alongside SL1 to deliver a new class of insights, recommendations, visualizations and user experience.

At a high level, the engine behind Skylar AI receives telemetry from SL1 and sends insights back to SL1 that can drive automations and third-party integrations and leverage all other capabilities of SL1. Support for ingesting third-party streaming telemetry and customer documents (such as KB articles and support tickets) is also planned. The data lake within Skylar AI can also be accessed via third-party tools that support ODBC.

Skylar AI is Being Released as a Suite of Services

Skylar AI will initially comprise three primary services, which are briefly described below.

1. Skylar Automated RCA - AI-based Root Cause Analysis for Logs

Solves the pain point of identifying and troubleshooting problems found in application and infrastructure logs. Streaming logs are analyzed in real time to quickly identify details of problems and summarize their root cause, showing key log lines and using GenAI to create plain-language summaries. Skylar Automated RCA is available now.

2. Skylar Analytics - A set of AI/ML and advanced analytic and data exploration capabilities

Skylar Analytics is designed to derive new value from data collected by SL1 by offering:

Predictive Alerting: Accurately models and predicts impending issues with key resources such as disks, memory and CPU. Predictive alerting is designed to let you know about potential future issues with enough time to act and prevent them from happening in the first place.
Always-on Anomaly Detection: When troubleshooting a problem, allows you to see any associated metrics that appear to be abnormal. There is also a capability to set up alerting when a particular metric is anomalous.
Data Visualization: A modern and highly flexible dashboarding environment and no-code visualization builder with over 40 visualization types.
Data Exploration: The ability to use third-party reporting and data exploration tools that support the industry-standard ODBC interface.

Skylar Analytics is planned for release in the fall.

3. Skylar Advisor - Automated guidance to optimize IT operations and proactively avoid issues

Skylar Advisor introduces a fundamentally new user experience for the IT Operator. It will automatically provide the user with important insights, predictions and actionable recommendations to keep everything running optimally. Details on availability of Skylar Advisor are coming soon.

The latest information about Skylar AI can be found by visiting: https://sciencelogic.com/platform/skylar-ai

Skylar AI - The Future of AIOps and Observability
Everything’s down. Frustrated customers are calling, emailing and posting on social media. You assemble a war room. News of the problem is spreading like wildfire and has just reached the CEO at the most inopportune moment (think dinner/vacation/out with the kids). Your cell phone and Slack feed are going nuts. Meanwhile, the front-line technical teams are looking at dashboards and digging through logs, events and metrics, trying to understand what happened. They’re not making much progress, so you escalate and call in more experts. An hour later there’s still no obvious answer. You escalate it again, but this time to the development team. And, eventually, 14 very long hours later, a “rockstar” developer figures it out. The immediate crisis is over, but there will be painful days of work ahead for you and your team dealing with disgruntled customers, driving a detailed postmortem process and then coming up with an action plan to prevent a repeat of what just happened.

Enter Skylar AI - The Future of AIOps and Observability

Now imagine a different scenario:

Skylar: “Your orders server just went down because it hit a race condition in one of the open source components. Good news, there’s a quick solution. You are running v2.3.1.2 of the component and the issue has been fixed in v2.3.1.4. Would you like me to show you how to upgrade to the fixed version?”

The goal of Skylar AI is to reason over not just telemetry, but also the stored knowledge of an organization to deliver accurate insights, recommendations, and predictions, so that:

When something fails, it will tell you in plain language what happened and how to fix it.
If something is going to fail, it will tell you how to prevent it from failing and impacting production.
It will also be able to deliver all insights and answer any question asked of it by drawing on telemetry together with relevant information from a company’s stored knowledge (e.g. KB articles, support tickets, bug databases, product documentation, etc.).

Gen AI is Everywhere, How is Skylar AI Different?

The Fundamental Challenge

Before explaining how Skylar AI is different, it’s important to understand the problem Skylar AI was designed to address: today’s best-of-breed AIOps, monitoring and observability tools place too much reliance on human expertise. The situation above is a prime example. The problem had never been seen before (an “unknown/unknown”), so there were no rules in place to catch it. This meant human experts with the right tribal knowledge were needed to figure things out. The tools provided all the information needed to ultimately solve the problem, but at each step of the way, the right human experts needed to interpret and analyze the data, decide on the next course of action and keep iterating until the problem was finally solved. Ultimately, this required multiple escalations and caused a lot of wasted time and frustration.

The goal of Skylar AI is to make Level 1 and 2 teams more effective and productive and able to solve a broader range of problems more quickly, so that situations like the one above can be handled without the pain and wasted time.

So Why Not Follow the Industry Trend and Build an AI Assistant?

The industry is quickly moving towards the use of Generative AI (GenAI) and large language models (LLMs) in the form of “AI Assistants” or “AI Chatbots”. These appear as small chat panels in a product’s UI and allow a user to ask questions of the tool in plain language. So, instead of learning how to construct a complex query and visualize it on a chart, with an assistant you can simply ask, “create a dashboard showing average latency for the top 10 devices over the last 3 months”, and it will do just that.

However, although assistants are useful in simplifying the way a user interacts with a tool, a skilled user still needs to know what questions to ask of the assistant at each step of the way. In other words, deep expertise is still required to get the most out of the tool.

Our Breakthrough Invention: An AI Advisor, Rather Than an AI Assistant

To deliver on our vision, a new paradigm was needed: an “AI Advisor.” The concept of Skylar Advisor is that instead of a skilled user always having to know what questions to ask of the tool, the Advisor automatically gives the user the answers to curated questions tailored to the user’s role, without the user even having to ask them in the first place! In other words, Skylar Advisor automatically tells the user what the user needs to know in the form of easy-to-understand insights, predictions, and actions to take. This allows teams of all levels to be far more effective, avoid wasting precious time, and perform more tasks with less effort.

The creation of Skylar Advisor didn’t just necessitate building a pipeline of AI and machine learning (ML) technologies, including a self-hosted LLM. It also required a completely reimagined user experience:

An uncluttered and intuitive UI (not a traditional dashboard/event list display)
A curated list of what’s most important to the user based on the user’s role and areas of responsibility
A multi-modal interface to describe a problem
The use of contextual “Quick Prompts,” where Skylar suggests what actions a skilled expert would take at each step of the way

To see a demonstration of Skylar Advisor, please visit here.

Make sure to read Part 2 of this blog, which explains the components of Skylar AI and how it works with SL1.

Introducing our newest Blogger: Gavin Cohen VP of AI Product Management
Hello Nexus Community Members and ScienceLogic Customers,

I wanted to take a minute to introduce our newest blogger, who will be keeping us up to date on AI product-related announcements, trends and best practices.

Gavin Cohen is VP of AI Product Management at ScienceLogic and has over 20 years’ experience across a diverse range of technology roles. At ScienceLogic, he is responsible for defining the company’s AI/ML product roadmap and strategy, working closely with customers, partners and internal teams. Gavin joined ScienceLogic through the acquisition of Zebrium, where he was VP of Product and Marketing and part of the founding team. Prior to joining Zebrium, he was VP of Product and Solutions Marketing at Nimble Storage, where he redefined the company’s category and positioning, leading to a successful acquisition by HPE. He has also held senior product management, business development and technical evangelist roles in Australia and the U.S. Gavin has a Bachelor of Computer Science and an MBA.

Be the Operations Superstar with the Hollywood/OL8 Upgrade Guide
Hollywood, the latest release of ScienceLogic SL1, isn’t simply another extension with powerful new features. It’s a breakthrough upgrade for existing customers, incorporating hundreds of user suggestions and major platform enhancements. It delivers enhanced performance, new unified user interfaces, enhanced automation, and cutting-edge security features, to name but a few. But Hollywood (v12.1) goes even further, enabling a new monthly feature upgrade framework. With this upgrade, keeping your SL1 platform up to date with the latest capabilities and fixes will be easier than ever. Hollywood is another example of ScienceLogic’s vision to help your team achieve truly optimized human-hybrid IT operations: Autonomic IT.

ScienceLogic Support has released a series of support guides and how-to videos to walk you through your upgrade to Hollywood. First, we’ve included a new step-by-step upgrade video in the Nexus Resource Center. The video includes an overview with prerequisites, a step-by-step outline, and a live demonstration of the upgrade process. Second, the 12.1 OL8 upgrade resource center in the Support Portal provides complete details and full upgrade documentation. And for SaaS-hosted customers, upgrading to Hollywood is even easier: simply open an upgrade ticket in the Support Center, and our experts will take care of it for you.

Unlock the full potential of your IT operations with ScienceLogic SL1 Hollywood 12.1. Upgrade today and join the revolution in intelligent IT operations management!

Automated Root Cause Analysis: Finding Diamonds in Mountains of Logs
We’ve all been there at one time or another in our careers: a business-critical service fails, and the emergency recovery clock starts ticking with a vengeance. Not just ticking; it’s a blaring siren and a firehose of inquiries from concerned application owners and executives. Worse, both your customers and your team are frustrated: customers because an app they depend on has brought them to a halt, and your team because resources are diverted to fix the crisis. And perhaps the most perennial troubleshooting resource drain during application outages is the manual analysis of dozens or hundreds of logs, and millions or billions of messages.

It would be easy to think that by now ops teams would have access to powerful analytic tools to make quick work of automating root cause discovery. To be fair, both vendors and open-source projects have delivered log aggregation and query platforms that at least simplify the first-order log problem: making log data easier to access. But they still require admins with talent and a deep understanding of applications to spot a never-ending list of novel failures deep within application frameworks. Fortunately, machine learning and artificial intelligence are now being combined to help operators quickly identify the root cause of issues and begin resolution right away. ScienceLogic Zebrium AI Log Analysis is a great example.

Automated Analysis Begins With Automated Learning

Step one in supercharging incident response is the automatic ingestion and processing of millions or even billions of log messages in real time. However, that function must be truly automatic. Overloaded human admins do not have time to train yet another tool. ScienceLogic Zebrium AI Log Analysis automatically learns how to understand log messages, including what data is significant, which messages are unusual, which are noisy, and even how to decode the details in previously unseen log formats. This unsupervised machine learning typically begins delivering results within 24 hours of exposure to new logs. Better, it can result in a tenfold faster resolution process.

Untangling Unknown Unknowns

Modern applications are complex, and the novel nature of many errors makes understanding what broke a daunting task. This is why logs remain the gold standard for troubleshooting issues. Well-understood failure modes send alerts or event messages clearly indicating the issue and providing context for repair. However, most critical outages result from issues never previously encountered, and the only evidence might be an obscure, single message among millions of lines of noise. Zebrium correlates unusual behavior with recent changes and performance metrics, helping you understand potential business service impacts before they become full-blown incidents.

Fluent Klingon Not Required

You're not alone if reading logs feels like deciphering a foreign language. Each log has its own unique syntax and vocabulary, making troubleshooting challenging. That challenge is multiplied for each new log that must be manually investigated. Zebrium AI Log Analysis automatically translates arcane formats and fragmented details into plain language that’s easy for the whole team to understand. Going beyond identifying which log lines are related to the cause of issues, Zebrium’s AI engine explains issue details in plain language. Its natural language model goes further to generate root cause summaries that describe the systems involved and the relationships between application elements. It also visualizes the most critical keywords from related log messages. When teams immediately recognize application details, they can trust the accuracy of automated analysis.

Ready, Set, Analyze!

If you’re ready to transform your log analysis and incident resolution process, or are simply curious about how automated root cause analysis might streamline your troubleshooting, you can request a free trial of ScienceLogic Zebrium AI Log Analysis today. It’s SaaS-hosted and easy to get up and running in minutes. Experience the next level of incident troubleshooting today and get back to doing what you love: delivering great service for your customers.

Restorepoint
Restorepoint is a Disaster Recovery and Secure Configuration Management appliance for network devices such as routers, switches, proxies, and firewalls. Restorepoint can automatically retrieve your network device configurations, detect changes and compliance violations, and report these automatically to network administrators. In this PowerHour session we will share with you how to add devices into Restorepoint, why having backups collected is useful in an autonomic IT environment, and how to leverage the SL1 platform in support of key workflows. Additionally, we will present the benefits of Governance, Risk and Compliance (GRC).

Within the ScienceLogic Network Change and Configuration Management (NCCM) platform you will be able to back up over 100 different types of network equipment, including firewalls, to a central repository. Once the backup is collected, you can track change over time for auditing purposes as well as for day-to-day operational needs to manage effective change control. Integration with the monitoring and automation platform adds layers of value, which we will discuss during our session.

When your organization is tasked with managing GRC, Restorepoint will apply your rules to assist in real-time awareness of compliance. When configuration drift occurs, an alert will be sent to the SL1 platform for execution of automations in support of your defined workflow. For instance, if you need to collect the difference between the last two configurations to examine an unplanned change while opening an incident in your IT Service Management (ITSM) product, that can be completed so your operations team only needs to receive the enhanced information set and begin resolution steps. Often the best step is to revert the change, and of course that’s supported from SL1 to allow for a reduction in your Mean Time to Repair (MTTR).

At the end of the June PowerHour you will have learned how to add devices into Restorepoint, why having backups collected is valuable, and how to leverage the SL1 platform in support of key workflows. Additionally, we will present the benefits of Governance, Risk and Compliance (GRC) and how it blends into the overall workflow and operational needs of the SL1 platform.

Harden the SL1 Platform with Oracle Linux 8 (OL8)
In the upcoming ScienceLogic PowerHour we are covering the ‘Harden the Foundation’ of SL1 topic and why you should upgrade from the existing Oracle Linux 7 (OL7) to the new Oracle Linux 8 (OL8) platform. The virtual appliance format that SL1 utilizes allows us to harden the core with OL8 to improve platform security, scalability, and application performance. This session will share the value of that migration with your teams, along with details that can be used in your internal conversations about the upgrade process.

One of the most important aspects of the upgrade is enhanced security with OL8, which will enable SL1 users to support advanced security features, for instance dedicated OL8 STIG builds with FIPS 140-2 and TLS 1.3. Additionally, OL8 brings package Application Streams, the DNF (YUM) package manager, and faster SL1 system updates.

Another major reason to join the PowerHour is the SL1 application improvements. These include increased database processing speed, better I/O performance (opening 7K business services in 10 seconds), and improved handling of large SQL queries, which will enable the platform to have a roughly 30% faster query response.

Join us for the May 22nd PowerHour and learn about all the options and value in the newest upgrades from ScienceLogic. To learn more about the conversion process, please visit the Conversion Resource Center.

Building Effective Run Book Automations: Maximizing Operational Efficiency with SL1
In today's dynamic IT landscape, operational efficiency and control are paramount for businesses to stay competitive and resilient. ScienceLogic's Run Book Automation (RBA) offers a comprehensive solution aimed at streamlining operations, enhancing control, and identifying critical events. In SL1, an automation policy defines the event conditions that must be met before SL1 triggers an automatic action.

Consider a scenario where an unplanned network device configuration change triggers a compliance alert in SL1, but the alert doesn’t provide all the information necessary to determine the best action for resolution. ScienceLogic’s automations can collect additional event information through a Python script and return the data to SL1 and/or your incident platform. With this full information set, the best course for remediation can be determined. If appropriate, automations can also assist with resolution steps to avoid human error as you work to reduce your mean time to repair (MTTR). This systematic approach ensures that key events are promptly addressed, reducing the risk of compliance violations and operational disruptions.

Operational efficiency is further enhanced through tailored automation actions that alleviate repetitive tasks. Another common scenario is a web server supporting the front end of your most important application that has a performance problem that can come and go at a moment's notice. At the time of occurrence, the SL1 platform can trigger an automation to collect the data your support team needs to decide on the best resolution. Furthermore, if the resolution includes a scriptable solution (and SL1 has many), the resolution step can be performed and tracked, thereby reducing MTTR.

ScienceLogic RBAs offer a powerful solution by identifying critical events, streamlining processes, and enhancing operational control. With the ability to align automation policies with critical events, organizations can adapt to evolving challenges with agility and confidence.

To learn how to build effective RBAs to maximize your operational efficiency with SL1, attend our upcoming PowerHour session on April 24, 2024. I’ll walk you through how to align automation policies with critical events and how to create automations that help reduce repetitive tasks. If you have automation questions leading up to the event, let me know. Post them below!
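
To make the configuration-change scenario above a little more concrete, here is a minimal Python sketch of the kind of enrichment an automation script could perform: comparing the last two configuration snapshots and summarizing what changed. It uses only the standard-library difflib module. The function name, the payload fields, and the device name are hypothetical and are not part of any SL1 or Restorepoint API. How a real action receives the configurations and hands data back to SL1 or an ITSM tool is not covered here.

```python
import difflib
import json


def summarize_config_change(previous: str, current: str, device: str) -> dict:
    """Return a small, hypothetical enrichment payload describing a config change."""
    diff_lines = list(
        difflib.unified_diff(
            previous.splitlines(),
            current.splitlines(),
            fromfile=f"{device}/previous",
            tofile=f"{device}/current",
            lineterm="",
        )
    )
    # The first two lines of a unified diff are the '---' / '+++' file headers.
    body = diff_lines[2:]
    added = [line[1:] for line in body if line.startswith("+")]
    removed = [line[1:] for line in body if line.startswith("-")]
    return {
        "device": device,
        "changed": bool(diff_lines),
        "added": added,
        "removed": removed,
        "diff": "\n".join(diff_lines),
    }


if __name__ == "__main__":
    # Illustrative snapshots only; real configurations would come from your backup tool.
    old_cfg = (
        "hostname edge-fw-01\n"
        "ntp server 192.0.2.10\n"
        "snmp-server community public RO\n"
    )
    new_cfg = (
        "hostname edge-fw-01\n"
        "ntp server 192.0.2.10\n"
        "snmp-server community private RW\n"
    )
    print(json.dumps(summarize_config_change(old_cfg, new_cfg, "edge-fw-01"), indent=2))
```

A payload like this could be attached to the incident so that the operations team sees the added and removed lines right away, instead of pulling both configurations and comparing them by hand.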