Site Reliability Engineer (All Levels, Remote Worldwide)
Posted on: January 16, 2022
Who We Are
Element is the startup that employs the core team behind matrix.org
- the leading project for secure, open decentralised communication.
Matrix's mission is to make messaging as open as email - allowing
everyone to choose where their data is hosted, enjoy private
conversations thanks to advanced encryption, and ultimately be in
control of their own communication.
Matrix powers our flagship messaging apps for the web, iOS &
Android, along with Element Matrix Services, our SaaS platform for
personal & professional use.
We build things for everyone, and we know we can't succeed without
a diverse team. Our hiring process is designed to give candidates
the best chance to show us what they can do. If we ever fall down on
this, please let us know.
About Your Team
Today we are a small team of five engineers working hard at
transforming how operations and infrastructure are done within the
organisation. We come from various backgrounds and are a
remote-first team with all of us working from different countries
across Europe and the UK.
As part of our day-to-day operations, we use or touch on (in no
particular order) AWS, UpCloud, Postgres, Grafana, Prometheus,
Loki, Elasticsearch, PagerDuty, Python, AWS EKS, Red Hat OpenShift,
GitLab, GitHub, Ansible, AWX, Terraform, Keycloak, Linux,
Containers, Golang, HAProxy and Nginx, to name a few.
The Team Today
- We manage both internal and client infrastructure across
private and public clouds, in private data centers and on
Kubernetes clusters. Translation - we ssh into boxes, apply
Ansible, use Terraform, manage Kubernetes clusters, manage
configurations and release roll-outs.
- We react to and resolve various issues within the
infrastructure. Translation - we respond to alerts and pages, we go
on-call, look at Grafana dashboards, isolate/debug production
issues, roll out mitigations where we can, etc. We are predominantly
responsible for the availability of most services deployed in the
- We are responsible for internal IT. Translation - we help
onboard new employees and manage things like mail, calendaring,
access, SSO, etc.
- We help our clients understand their needs, identify
bottlenecks and manage their on-premise Matrix services.
Translation - we do a bit of consulting work with our Professional
Services team and also manage services on-premise on behalf of our
clients.
- We are working with intent towards our tomorrow. Translation -
we dedicate time where we can to automate, modernise or fix our
current assets, improve our processes/platforms and build best
practices for our engineering teams.
The Team Tomorrow
- We are cloud native and container first. Translation - our
focus is on developing and delivering artefacts and automation
predominantly targeting cloud environments. In particular, we are
focused on container-native environments running on both managed
and self-managed Kubernetes clusters at scale.
- We are focused on developer enablement. Translation - we focus
on enhancing developer experiences by improving CI/CD pipelines,
sharing cloud native development expertise and codifying our
expertise in this area. We provide reliable infrastructure and a
platform for developers to build and deploy services to production.
Developers are responsible for the day-to-day automation of their
services, assisted by the tools and processes provided by us.
- We are focused on Site Reliability. Translation - we codify
operational tasks, automate the recovery from incidents and
manage cattle, not pets. We provide infrastructure, platform and
tools for services to run at scale. And we enable automated
operations across most if not all our mission critical
We are presently in the process of working towards our tomorrow.
And we want to bring more team members along for this journey. We
want to work with collaborative and kind people who do not mind
experimenting with the unknown.
What We Care About
- You are kind, empathetic and willing to share your knowledge
- You are willing to ask for help and to provide it when you
- You are keen on learning new things and figuring out how to
improve the status quo.
- We try (operative word here) to focus on getting things done
right, but this does not mean we do it right the first time as
being pragmatic is important to us.
What Are The Basics You Need
- We do not need you to have any experience in decentralised
communication, nor do we expect you to be experienced and
knowledgeable in everything. However, there are a few things that
being familiar with will let you get your job done.
- Linux Servers - You can ssh into a machine, update packages,
get at logs, figure out why it is misbehaving.
- Containers - You have built your own containers before, used them
in anger, and understand the basics of how they work.
- Infrastructure automation - You have worked with at least one
of Terraform or Ansible. We are also happy if you have worked with
similar tools, such as Puppet, Chef or SaltStack.
- Public/Private Cloud Providers - You have used at least one of
AWS / Azure / GCP. Hopefully, you have used Terraform to automate
infrastructure on them.
- Programming languages - You have written some meaningful code
that did some of your automation for you. Preferably in Python or
Go. You are able to look at an unknown code base and understand it
enough to try debugging it in production.
About The Process
- Opportunity fit. At this stage you will be talking to the
person you will be reporting to and a member of our people team. We
will talk a bit about the company, about you, about the role etc.
You will get a chance to ask questions and understand better if the
role fits what you are looking for. Obviously, we will have a few
questions for you as well, but no whiteboards or algorithms
- Offline coding exercise. We will send you a coding exercise,
nothing overly complex. Something that might take someone 1-2 hours
to complete. We are looking for your approach, an insight into how
you solve a problem and basic smell tests on your coding
- Interview with the team. In this stage, you will be talking to
a couple of members from the team. We will talk about your coding
exercise submission and talk about improvements etc. You will get
an opportunity to talk to the team about the day-to-day, the
organisation, the team, their experiences etc. We will definitely
ask some technical questions here. We are looking for insights into
how you communicate your ideas and technical solutions. We expect
the conversation to be two-way.
- Architecture. Here, you will be talking to our VP of
Engineering and the Founder/CTO. Expect conversations around
architecture, building at scale etc. You will get the opportunity
to understand more about Element's history, its future direction
If you have any questions before making an application, reach out to
Mohand (@mohandbouhadouf:matrix.org) via https://app.element.io
Our general approach is to treat people like adults and acknowledge
that by being flexible we create an environment for people to do
their best work. For more details, see our manifesto. That said,
here are some specific points that differentiate us.
- Our projects are almost entirely Free and Open Source Software,
with high visibility and a large, enthusiastic community.
- We fully support remote and flexible work, but also maintain
offices in London and Rennes.
- We strive to create a family-friendly environment; many of the
team have small children and we look to accommodate that as best we
- People tend to stay with the company for a long time. We take
this as a sign that we have a cohesive, supportive culture, that we
have engaging, challenging work and that people can develop their
skills and careers here for the long term.
- Since our technology is relevant to anything that requires
real-time comms, the role provides exposure to a wide range of
domains from more traditional web and app development down to VoIP
Element does not discriminate on the basis of race, sex, colour,
religion, age, national origin, marital status, disability, veteran
status, genetic information, sexual orientation, gender identity or
any other reason prohibited by law in provision of employment
opportunities and benefits.