Lead Site Reliability Engineer Engineering - Stow, MA at Geebo

Lead Site Reliability Engineer

Job Description

Join a team of more than 30,000 team members across our Club Support Center, over 230 clubs, and 7 distribution centers.
We're committed to delivering value and convenience to our Members, helping them save every day on everything they need for their families and homes.
BJ's Wholesale Club offers a collaborative, team environment where all team members can learn, grow and be themselves.

The Benefits of Working at BJ's

  • BJ's pays weekly
  • Generous time off programs to support busy lifestyles: Vacation, Personal, Holiday, Sick, Bereavement Leave, Jury Duty
  • Benefit plans for your changing needs: three medical plans, Health Reimbursement Account (HRA), Health Savings Account (HSA), two dental plans, flexible spending

Eligibility requirements vary by position; medical plans vary by location.

Site Reliability Engineering (SRE) is an engineering discipline that combines software engineering and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems.
The Lead SRE position within the BJ's Digital team is a hands-on role focused on expanding our tooling and automation and on improving the resilience and availability of digital platform applications and processes through SRE principles and practices.

Major Tasks, Responsibilities, and Key Accountabilities

  • Own the end-to-end availability, security, and performance of mission-critical applications and services that are part of the Digital systems
  • Analyze technical issues, identify the root cause, and provide a fix in the production environment (never solve the same problem twice)
  • Partner with multiple internal teams to groom nonfunctional requirements and work on implementations
  • Automate or streamline manual tasks and redundancies within the infrastructure organization
  • Implement SRE best practices to ensure availability, reliability, and fault tolerance wherever applicable
  • Be the SRE ambassador on an Agile software development team
  • Drive product reliability improvements through monitoring, alerting, and application of software development best practices
  • Identify creative ways to break the system, uncover and report nonfunctional defects, and validate that systems and solutions are operating as intended
  • Perform proofs of concept to prove out new technologies and integrations
  • Able to work fast and reliably under pressure
  • A strong critical thinker who identifies problems before they happen
  • Troubleshoot performance and stability issues using a wide variety of tools
  • Evaluate and manage application and environment security
  • Share off-hours on-call duty with the team for any production issues

Qualifications

  • Bachelor's degree in Computer Science or a related field, with continuous and progressive experience
  • 6 years of total IT experience (2 years' experience in development roles and 3 years' experience in a reliability engineering role)
  • Hands-on experience with performance analysis, scalability, and reliability testing techniques
  • Experience with APM tools (New Relic, Dynatrace, or similar) and log monitoring tools (Splunk, Scalyr, or similar)
  • Strong knowledge of and hands-on experience with Linux, SQL, and shell scripting
  • Familiarity with object-oriented programming languages (Java) and concepts, and hands-on experience with Java applications (Spring Boot services)
  • Hands-on experience with cloud service concepts, preferably AWS
  • Hands-on experience with SRE practices and with writing and running chaos engineering experiments
  • Knowledge of HCL Commerce and IBM Sterling platforms is advantageous
  • A strong critical thinker who identifies problems before they happen
  • Strong written and oral communication skills, with a high degree of comfort speaking with engineering management, developers, and leadership
  • Demonstrated ability to adapt to new technologies and learn quickly

Nice To Have

  • Previous experience in an eCommerce-based company
  • Experience implementing CI/CD
  • Blue/Green deployments using CI/CD

Environmental Job Conditions

  • Support and maintain globally distributed, multi-cloud (public and/or private) environments
  • Automate common, repeatable tasks at large scale to streamline operational procedures
  • Follow change management processes during implementations
  • Use and maintain version control for application infrastructure
  • Work in a diverse and global team environment
  • Cross-train with other global team members
  • Participate in an on-call rotation as required
  • Promote the DevOps/SRE mindset

Recommended Skills

  • Adaptability
  • Agile Methodology
  • Amazon Web Services
  • Automation
  • Change Management
  • Cloud Computing

Estimated Salary: $20 to $28 per hour based on qualifications.
