In-demand jobs: Site Reliability Engineer

08 Aug 2020

LinkedIn’s latest report, ‘Emerging jobs 2020’, lists Site Reliability Engineer (SRE) among the top 10 in-demand jobs for 2020. Companies in the telecom, marketing, advertising, and information technology and services industries across the US, UK, and India are at the forefront of hiring SREs, a trend explained by their aggressive automation and digitization strategies. Site reliability engineers are also among the highest-paid IT professionals, which makes the job coveted by candidates.

What do Site Reliability Engineers do?

Site Reliability Engineers are ‘custodians’ of the hundreds or even thousands of software systems owned by a company. Their job is to ensure that these systems remain ‘reliable’ for customers as the company scales. SRE is a comparatively new role that combines software engineering and IT systems management. On a typical day at work, an SRE spends time writing code to automate the management of software systems and resolving the infrastructural and operational challenges those systems present.

A little back story

SRE was conceptualized by Google in 2003. Back then, Google put together a team of software engineers to tackle its large-scale site problems. They performed the task so well that the approach captured the attention of other technology companies like Amazon, Dropbox, and Netflix. Over time, SRE became a distinct IT domain dedicated to developing automated practices for areas like disaster response, change management, latency, performance, and capacity planning. Today, SRE is a huge community and has its own conference, SREcon.

The nitty-gritty

SREs balance operations work and development work. They help teams decide which new features should be launched, and when, by using Service-level agreements (SLAs) to set out the required reliability of the system, expressed through Service-level indicators (SLIs) and Service-level objectives (SLOs). An SLI is a defined measure of a particular aspect of the service level provided. An SLO is a target value or range for a specific service level, as measured by an SLI. Key SLIs include error rate, system throughput, and request latency. The SLO for the required system reliability then determines how much downtime is acceptable over a given period. That permitted amount is called the error budget: the maximum allowable level of errors and outages.
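As a rough illustration of how these terms relate, the sketch below computes an error-rate SLI from request counts, checks it against an SLO, and converts an availability SLO into a downtime error budget. The function names, the 30-day window, and the sample numbers are assumptions made for this example, not values from any particular SLA.

```python
def error_rate_sli(total_requests: int, failed_requests: int) -> float:
    """SLI: the fraction of requests that failed in the measurement window."""
    if total_requests == 0:
        return 0.0  # no traffic, so no observed errors
    return failed_requests / total_requests


def meets_slo(measured_error_rate: float, slo_max_error_rate: float) -> bool:
    """SLO check: the measured SLI must not exceed the target value."""
    return measured_error_rate <= slo_max_error_rate


def downtime_budget_minutes(availability_slo: float, window_days: int = 30) -> float:
    """Error budget: minutes of downtime an availability SLO permits per window."""
    if not 0.0 < availability_slo <= 1.0:
        raise ValueError("availability_slo must be a fraction in (0, 1]")
    return (1.0 - availability_slo) * window_days * 24 * 60


# Hypothetical numbers: 5 failures in 10,000 requests against a 0.1% error-rate SLO.
sli = error_rate_sli(10_000, 5)
print(meets_slo(sli, 0.001))
# A 99.9% availability SLO over 30 days allows roughly 43.2 minutes of downtime.
print(round(downtime_budget_minutes(0.999), 1))
```

The stricter the SLO, the smaller the budget: moving the hypothetical target from 99.9% to 99.99% shrinks the 30-day allowance to about 4.3 minutes.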

When releasing a new feature, the development team spends part of the error budget. Using the SLO and the error budget, the team can decide whether a product or service can launch based on the budget that remains. If a service is running within its error budget, the development team can launch at any time. If the system has produced too many errors, or has been down for longer than the error budget allows, no new launches can happen until errors are back within the specified limit.
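The launch rule described above can be sketched as a simple gate. This is a minimal illustration under stated assumptions: the function names and the sample figures are hypothetical, and real teams typically apply more nuanced policies than a single threshold.

```python
def remaining_budget_minutes(budget_minutes: float, downtime_minutes: float) -> float:
    """Error budget still unspent in the current window (negative if overspent)."""
    return budget_minutes - downtime_minutes


def launch_allowed(budget_minutes: float, downtime_minutes: float) -> bool:
    """Allow launches only while downtime so far is within the error budget."""
    return remaining_budget_minutes(budget_minutes, downtime_minutes) >= 0


# Hypothetical window with a 43.2-minute budget.
print(launch_allowed(43.2, 20.0))  # within budget, so launches may proceed
print(launch_allowed(43.2, 60.0))  # budget overspent, so launches are frozen
```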

How to become a Site Reliability Engineer

For an entry-level SRE role, companies prefer candidates with a bachelor’s degree in Computer Science or a related field. Certification as a Site Reliability Engineer or a Software Engineer is an added advantage. Those with work experience as a systems administrator, DevOps engineer, or software engineer have a significant edge over others. Before making an offer, the hiring manager thoroughly scrutinizes the candidate’s proficiency in programming languages, familiarity with operating systems, and knowledge of automation technologies. Here is the technical skillset that makes a candidate sought-after in the SRE job market:

  • Scripting and coding languages and platforms like Java, JavaScript, .NET, Python, Ruby, Bash, Perl, Node.js, Golang, or Scala
  • Operating systems like Linux and Windows
  • Automation technologies
  • Cloud computing technologies including Software as a Service, Platform as a Service and Infrastructure as a Service
  • Container orchestration tools like Kubernetes or Docker Swarm, and configuration management tools like Ansible, Chef, Puppet, and SolarWinds

After getting through the door, an SRE has to work constantly to stay on top of technology shifts. They should become familiar with the front-end and back-end technologies that make up a software system so that they can understand a problem and expedite a solution.

From SRE to where?

Junior-level SREs who perform exceptionally well in their roles get a chance to work on larger and more complex computer systems. A willingness to collaborate with other IT experts, enthusiasm for learning new technologies, the ability to thrive under pressure, think on their feet, and solve problems, and excellent communication skills go a long way toward pushing them up the ladder. Senior site reliability engineers often choose to continue in the same role for a long time before moving into managerial positions. Shifting into a similar role, such as DevOps engineer or system administrator, is not an unusual move.

Some companies that frequently hire SREs are Google, Tesla, Microsoft, Twitter, Adobe, Slack, and Apple, as well as non-tech giants like The Walt Disney Co., Mastercard, and Capital One.
