Why will you enjoy this new opportunity?
VMware's Cloud Solution SRE team is seeking an exceptional Site Reliability Engineer to join us in our SaaS transformation journey! As part of VMware's next compelling cloud offering, we are building a talented team to operate, maintain, and improve the quality of our service. Bring your passion for cloud technologies and experience operating SaaS to influence how our development teams release quality products. While you manage the availability of our services to meet our service level targets and drive fleet-wide operations, you will also play a critical role in ensuring proper operational readiness is met by teams as new services and features are onboarded.
As part of this role in the SRE team, you will be responsible for operating hundreds of SaaS services running on Kubernetes, supporting our customers using orchestration and operational tools. We are responsible for the availability, reliability, and performance of these services based on SLA/SLI/SLO definitions. You will also have the opportunity to collaborate with development teams across VMware and with other architects in the team, who work on areas such as Autoscaling, Performance, Reliability, Availability, DevOps tooling, Security, and Monitoring, all cutting-edge technologies that help develop modern applications.
If you think you are that SaaS-minded engineer with customer empathy and a relentless focus to solve any complex problem at scale, we would love to talk!
Success in the Role: What are the performance outcomes over the first 6-12 months you will work toward completing?
The following is a schedule for the first year of joining:
Within 1 month of employment, become familiar with VMware's rules, guidelines, and standard operating procedures.
Within 3 months of employment, learn the product/SaaS offering that you will help secure and the technology stack it uses, become familiar with all the operational aspects, and get accustomed to the regular on-call shifts.
Within 6 months of employment, work towards handling priority issues and become the go-to person for the owned service.
Within 1 year of employment, work towards becoming an expert in handling multiple products and SaaS offerings, along with automating runbooks.
The Work: What type of work will you be doing? What assignments, requirements, or skills will you be performing on a regular basis?
Your regular activities may be modified to suit your professional strengths, but you can expect a variety of the following assignments:
The Staff SRE will be responsible for automating and maintaining the delivery of SaaS services and monitoring solutions hosted and consumed by the VMC on AWS team and our customers. This includes the automation of core application services and associated infrastructure services. You will also be required to enable other product engineering teams to drive towards automated problem resolution. On the observability side, we help service owners define and instrument SLOs and alerts that follow best practices, build tools and dashboards, facilitate post-mortems, and continuously enhance our existing systems and processes to improve the reliability of the SaaS offering.
You should have a strong DevOps-oriented mindset, be willing to take on challenges, maintain a high degree of ownership and transparency, and work effectively both on a team and independently.
The SRE role is a great fit for engineers who want to own production solutions while getting hands-on with a wide variety of the latest and greatest open-source technologies, and who love to push the boundaries of what cloud infrastructure software, observability, and tooling can achieve.
Role Responsibilities:
Providing technical expertise and direction for multiple teams across multiple geos
Driving the improvement of development practices and tooling as well as reviewing designs or improvements to our code base and services.
Mentor and coach site reliability engineers and seek opportunity for continuous improvement, champion engineering best-practices, tooling, and efficiency.
Development, configuration and maintenance of CI/CD pipelines and tools to facilitate rapid deployment of code and faster testing of new features and products.
Ensure fast and reliable delivery of the automation code to production datacenters. Emphasis on code quality through code reviews, CI process for the automation code and unit tests where applicable.
Participate in team on-call rotation
Work closely with internal software engineering teams to improve availability & observability of services & design and develop deployment automation pipelines for new cloud service offerings.
Manage multiple different types of infrastructure services efficiently leveraging practices for configuration management, Infrastructure as Code, efficient auto-remediation etc.
Working under pressure in production environments running production customer workloads and services. Work per escalation, notification, and incident management practices.
Drive the product towards higher availability and reliability & assist with on-call support on a rotating schedule for incident escalations.
Required Skills:
10+ years of relevant experience in a Development, SaaS Operations, Site Reliability, or comparable Cloud Engineering position with a demonstrated track-record of execution and delivery
Excellent system design and development skills
Proven development and programming background with experience at least in the following languages: Python, Bash
Experience deploying, provisioning, and administering production SaaS systems
Experience with both Jenkins and Concourse build automation tools and CI/CD orchestration tools
Experience with monitoring, logging, and remediation services including but not limited to: Wavefront, vRLIC, AppDynamics, Elasticsearch, StackStorm
Experience with container technologies and microservice architecture including Kubernetes and Docker
Experience with on-premises and cloud-based infrastructure and services including but not limited to: vSphere products, VMware Cloud on AWS, Amazon MQ, Amazon RDS, Amazon EC2, Amazon S3, Amazon SNS
Experience with deployment frameworks including Terraform
Proven ability to handle multiple, complex technical projects and have the flexibility to work in a very dynamic environment
BS/MS/PhD in computer science or related technical field
Preferred Skills:
Have developed software using one or more object-oriented programming languages, e.g. Java, Python, Golang
Knowledgeable of security in the cloud: intrusion, penetration, and vulnerability scanning
Identify, gather, analyse and automate responses to key performance metrics, logs, and alerts.
Good working knowledge of build automation and continuous integration/delivery ecosystem: Git, Gerrit, Maven/Gradle, Jenkins, Docker, Nexus, Artifactory, Selenium.
Have used tools such as Atlassian's JIRA and PagerDuty
Curious to learn other coding languages and new technologies as needed
Hands-on working experience with AWS toolsets is a plus
Background with infrastructure configuration and automation processes and tools: Terraform, Puppet, Ansible, Chef, Fabric, Helm, etc. is a plus.
Knowledge of VMware's SDDC stack or any related VMware certification is a big plus
What is the leadership like for this role? What is the structure and culture of the team like?
The Cloud Solutions SRE and Monitoring team is headed by Matt Williamson, the Director, who is based in Austin, TX, USA. The leadership team reporting to Matt is based in India, Europe, and the US. In this role, you will join the US team, which is led by Matt, with engineers located around the US. The leadership encourages independent thinking and gives a free hand to innovate. We have a diverse, inclusive, and open culture in the team. We encourage continuous learning, sharing of ideas and thoughts, and growing together as a team. The team consists mainly of SRE engineers with experience ranging from
Where is the role located?
Remote: The location of this role is remote within the US.
What are the benefits and perks of working at VMware?
You and your loved ones will be supported with a competitive and comprehensive benefits package. Below are some highlights.
Employee Stock Purchase Plan
Medical Coverage, Retirement, and Parental Leave Plans for All Family Types
Generous Time Off Programs
40 hours of paid time to volunteer in your community
Financial contributions to your personal/career development (conference participation, training, course work, certifications, etc.)
Wellness reimbursement and online fitness and wellbeing classes
This job may require the candidate to travel and/or work from a facility that requires full vaccination prior to entry.
Category: Engineering and Technology
Subcategory: Site Reliability
Experience: Business Leadership
Full Time/ Part Time: Full Time
Posted Date: 2022-10-11
VMware Company Overview: At VMware, we believe that software has the power to unlock new opportunities for people and our planet. We look beyond the barriers of compromise to engineer new ways to make technologies work together seamlessly. Our cloud, mobility, and security software form a flexible, consistent digital foundation for securely delivering the apps, services and experiences that are transforming business innovation around the globe. At the core of what we do are our people who deeply value execution, passion, integrity, customers, and community. Shape what's possible today at http://careers.vmware.com.
Equal Employment Opportunity Statement: VMware is an Equal Opportunity Employer and Prohibits Discrimination and Harassment of Any Kind: VMware is committed to the principle of equal employment opportunity for all employees and to providing employees with a work environment free of discrimination and harassment. All employment decisions at VMware are based on business needs, job requirements and individual qualifications, without regard to race, color, religion or belief, national, social or ethnic origin, sex (including pregnancy), age, physical, mental or sensory disability, HIV Status, sexual orientation, gender identity and/or expression, marital, civil union or domestic partnership status, past or present military service, family medical history or genetic information, family or parental status, or any other status protected by the laws or regulations in the locations where we operate. VMware will not tolerate discrimination or harassment based on any of these characteristics. VMware encourages applicants of all ages. VMware will provide reasonable accommodation to employees who have protected disabilities consistent with local law.