QCon 2022 @San Francisco Diaries

2022-10-28: This is my second time in California and my very first time in San Francisco, and what better occasion to visit the West Coast again than QCon 2022! Here’s a partial list of the talks I attended, including a brief description of each talk taken from https://qconsf.com. You’ll find some happy emojis next to the ones I liked the most 😍👌. And an unsurprising honorable mention to the one and only Kent Beck for the closing speech, at the top of my very personal list!

Regular bad selfie
The one and only

Tidy First?
Kent Beck Original Signer of the Agile Manifesto, Author of the Extreme Programming Book Series, Rediscoverer of Test-Driven Development

Programmers face a software design dilemma hourly–I need to change this code, it’s messy, do I tidy first? The answer is, of course, it depends. It depends on coupling/cohesion, economics, psychology, & teamwork. These are the same factors affecting all software design decisions but here we can study them under a microscope.
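
To make the dilemma concrete for myself, here is a tiny Python sketch of what a "tidy first" step can look like: a guard-clause tidying that changes no behavior, done before the actual feature change lands. This is purely my own toy example, not code from the talk, and all names are hypothetical.

```python
# Toy "tidy first" illustration (mine, not Kent's): tidy the nesting into
# guard clauses first, with no behavior change, so the next real change
# lands in readable code.

def order_total_before(order):
    if order is not None:
        if order["items"]:
            return sum(item["price"] for item in order["items"])
    return 0.0

def order_total_tidied(order):
    # Same behavior, just tidied: guard clauses instead of nesting.
    if order is None or not order["items"]:
        return 0.0
    return sum(item["price"] for item in order["items"])

assert order_total_before(None) == order_total_tidied(None) == 0.0
assert order_total_before({"items": [{"price": 2.5}]}) == order_total_tidied({"items": [{"price": 2.5}]}) == 2.5
```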

On Beyond Serverless: CALM Lessons and a New Stack for Programming the Cloud 😍👌😍👌😍👌😍
Joe Hellerstein CS Professor @UCBerkeley, ACM Fellow, Forbes list of “50 smartest people in technology”

Serverless computing promised a boundless programmable cloud, but delivered an army of incommunicado amnesiacs. Still, serverless computing is a hint of a better future: clouds as globe-spanning, auto-expanding supercomputers for anyone to program. In this talk I’ll share lessons from research – both foundations like the CALM Theorem and practical experience from open source like the Anna KVS prototype – on how we can do so much more to deliver stateful, communicating, autoscaling cloud software. Then I’ll describe our ongoing work in the Hydro Project at Berkeley, which looks ahead to a fully-featured programmable cloud, powered by a new data-centric compiler stack.
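
My reading of the CALM idea, in a few lines of Python rather than anything from the talk itself: programs whose outputs only grow monotonically (like a union of sets) don't need coordination to stay consistent. A grow-only replicated set is the simplest example. This is my own minimal sketch, not the Anna or Hydro APIs.

```python
# Minimal sketch (mine, not from Anna or Hydro): a grow-only set is monotone,
# so replicas converge under union regardless of message order or duplication.

class GSet:
    def __init__(self):
        self.items = set()

    def add(self, x):
        self.items.add(x)

    def merge(self, other):
        # Union is associative, commutative, and idempotent:
        # no coordination needed, any delivery order converges.
        self.items |= other.items

a, b = GSet(), GSet()
a.add("cart:42")
b.add("cart:7")
b.add("cart:42")           # duplicates are harmless

a.merge(b)
b.merge(a)
assert a.items == b.items  # replicas agree without any locking or consensus
```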

Adopting Continuous Deployment at Lyft
Tom Wanielista Senior Staff Software Engineer @Lyft

Opening talk

All organizations, regardless of size, need to be able to make rapid changes and improvements in their constantly growing systems. How can we handle all this change while maintaining a reliable product?

In 2018, Lyft operated a few hundred services. Deploying a change was difficult: a developer had to take a lock, read a runbook, then execute several manual steps, all while monitoring for potential problems. This took time, resulting in large, delayed deploy trains and, ultimately, reliability issues. Today, Lyft operates over 1,000 services, and, by adopting continuous deployment, more than 90% are now automatically deployed to production, with no manual intervention. This has significantly improved reliability, freed up developer time, and sped up our ability to ship changes.
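
As a mental model of what "no manual intervention" means in practice, here is my own generic Python sketch, not Lyft's actual pipeline: roll out to a canary, watch a health signal for a bake period, then promote or revert without a human holding a lock. All function names below are stand-ins.

```python
import random
import time

# Hypothetical sketch of an automated deploy gate, not Lyft's real tooling.
# The stubs below stand in for real deploy machinery and a metrics backend.

def deploy(service, version, target):
    print(f"deploying {service}@{version} to {target}")

def rollback(service, target):
    print(f"rolling back {service} on {target}")

def error_rate(service, target):
    return random.uniform(0.0, 0.02)   # pretend metric

def automated_deploy(service, version, bake_seconds=60, max_error_rate=0.01):
    """Deploy to a canary, bake, then promote or revert with no human in the loop."""
    deploy(service, version, target="canary")
    deadline = time.time() + bake_seconds
    while time.time() < deadline:
        if error_rate(service, target="canary") > max_error_rate:
            rollback(service, target="canary")
            return "reverted"
        time.sleep(5)
    deploy(service, version, target="production")
    return "promoted"

print(automated_deploy("rides-api", "v2022.10", bake_seconds=10))
```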

Enabling Change @ Scale Roundtable
Tom Wanielista Senior Staff Software Engineer @Lyft

Increasing the safe delivery of change has immense business value across a number of dimensions, so how can we improve our ability to manage change at scale? In this roundtable, we will be joined by software engineers and leaders with experience improving large-scale change mechanisms in their own companies. Topics will include technical strategies, operational considerations, people and culture, and general lessons learned. As an audience member, you will also have the chance to ask the panel questions.

Dark Side of DevOps
Mykyta Protsenko Senior Software Engineer @Netflix

Topics like “you build it, you run it” and “shifting testing/security/data governance left” are popular: moving things to earlier stages of software development, empowering engineers, and shifting control all sound good.

Yet what is the cost? What does it mean for the developers who are involved?
The benefits for developers are clear: you get more control, you can address issues earlier in the development cycle, and you shorten the feedback loop. However, your responsibilities grow beyond your code – now they include security, infrastructure and other things that have been “shifted left”. That’s especially important since the best practices in those areas are constantly evolving – the demands of upkeep are high (and so is the cost!).

The topics covered in this talk:

the trade-offs that companies face during the process of shifting left
how to ease cognitive load for the developers without mandating a one-size-(doesn’t really)-fit-all solution
how to keep up with the evolving practices without putting even more load on engineers.
How far can you go with DevOps and Shifting Left? What can we do to break the grip of the dark side? Let’s find out!

Infrastructure as Code: Past, Present, Future 😍👌😍👌😍👌😍😍👌😍👌😍👌😍😍👌😍👌😍👌😍
Joe Duffy Founder and CEO @PulumiCorp

Infrastructure as code enables us to automate and manage all sorts of infrastructure, from on-premises virtual machines to cloud resources, and everything in between. Using code, we can codify definitions and processes, rather than performing them manually, ensuring repeatability, scalability, and many of those other “-ilities” we know so well in the software industry.

But alas, there is no single flavor of infrastructure as code. Indeed, there are a dozen well-known tools in this space, each with its own unique benefits. In Infrastructure as Code: Past, Present, Future, we’ll discuss why and how IaC came about, where it has gone, and where it is going. We’ll look at some of the challenges (and solutions) that we’ve experienced and how this shapes the future of IaC. After this talk, we’ll all be informed and ready to choose the right IaC solution now and for new projects.
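
Since Pulumi is Joe's company, here is roughly what infrastructure as code looks like in its Python SDK: you declare resources as ordinary objects and `pulumi up` reconciles desired and actual state. The bucket name is just a placeholder of mine, and running it assumes an existing Pulumi project plus AWS credentials.

```python
import pulumi
import pulumi_aws as aws   # pip install pulumi pulumi-aws

# Declare a resource; "qcon-notes" is only a placeholder name of mine.
bucket = aws.s3.Bucket("qcon-notes")

# Export the resolved (auto-generated) bucket name as a stack output,
# so `pulumi stack output bucket_name` can read it after `pulumi up`.
pulumi.export("bucket_name", bucket.id)
```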

Leveraging Determinism
Frank Yu Senior Engineering Manager @Coinbase, previously Principal Engineer and Director @FairX

Determinism is a very powerful concept when paired with fast business logic. We discuss both intuitive and not-so-obvious architecture choices that can be made to dramatically scale and simplify systems with these properties.

We built our latency-sensitive exchange around a blazing-fast open source Raft cluster. After some time in production, observations about the nature of our request and event messages, coupled with some timely advice, led us to upend our service topology. Find out how rerunning core logic at the edges of a system can:

Decrease bandwidth usage and buffering across the system
Protect against thundering herd problems by making network usage more predictable
Simplify the logic of gateway and persistence services downstream from your core logic
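
Here is how I pictured the trick while listening, as a toy Python sketch of my own (nothing from Coinbase's actual engine): if the core logic is a deterministic function of an ordered event log, edge services can replay the small input log and recompute state locally, instead of receiving the much larger outputs.

```python
# Toy illustration: a deterministic "order book" reducer. Given the same
# ordered input log, every replica computes identical state, so gateways and
# persistence services can replay inputs instead of consuming bulky outputs.

def apply(book, event):
    kind, order_id, qty = event
    if kind == "add":
        book[order_id] = qty
    elif kind == "fill":
        book[order_id] -= qty
        if book[order_id] <= 0:
            del book[order_id]
    return book

log = [("add", "o1", 100), ("add", "o2", 50), ("fill", "o1", 100)]

core_replica = {}
edge_replica = {}
for event in log:                  # same inputs, same order...
    apply(core_replica, event)
    apply(edge_replica, event)

assert core_replica == edge_replica == {"o2": 50}   # ...same state everywhere
```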

Innovating for the Future You’ve Never Seen: Distributed Systems Architecture & the Grid 👌😍
Astrid Atkinson CEO & co-founder @CamusEnergy, previously early leader in SRE @Google

As climate action accelerates, the existing electrical grid plays a central role in decarbonizing our energy supply. We know that software can transform how we manage networks – now we need to take what we’ve learned, and apply it to managing the grid. Bringing software innovation to critical real-world infrastructure requires careful attention to reliability – moving quickly, without breaking things. How do we leverage experience with high reliability innovation in big tech, to transform our energy system and decarbonize the grid?

API Evolution Without Versioning 👌😍👌😍😍😍😍😍😍😍👌👌👌😍😍👌
Brandon Byars North America Head of Technology @thoughtworks

Versioning is usually the first–and too often, the only–technique architects reach for when imagining a breaking change to an API’s interface. Based on my experience managing the evolution of a public API, I’ve recently cataloged several alternative techniques and their tradeoffs. I released the first version of an open-source service virtualization tool called mountebank in early 2014. Since that time, I have made or incorporated several breaking changes to its public API, some of them significant, without once requiring users to manage a version upgrade process.

In this presentation, I’ll share the learnings I’ve collected over nearly a decade of changes, including:

Patterns of evolution in addition to versioning

The natural tradeoffs that exist between API elegance, obviousness, and stability

Broadening the conversation from API evolution as an architectural concern to a broader product management concern
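
Two of the non-versioning moves that stuck with me, sketched in Python with hypothetical field names (loosely mountebank-flavored, but not its real API): read requests tolerantly so an old payload shape keeps working, and keep emitting a deprecated response field for a grace period instead of bumping a version.

```python
# My own sketch of two evolution-without-versioning moves; the field rename
# ("responses" -> "behaviors") and these names are hypothetical.

def parse_stub(body: dict) -> dict:
    # Tolerant reader: accept the old key alongside the new one.
    behaviors = body.get("behaviors", body.get("responses", []))
    return {"behaviors": behaviors}

def render_stub(stub: dict) -> dict:
    # Emit both shapes during a deprecation window, then drop the alias later.
    return {"behaviors": stub["behaviors"], "responses": stub["behaviors"]}

old_client_payload = {"responses": [{"status": 200}]}
print(render_stub(parse_stub(old_client_payload)))
# {'behaviors': [{'status': 200}], 'responses': [{'status': 200}]}
```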

Dark Energy, Dark Matter and the Microservices Patterns?!
Chris Richardson Creator of microservices.io, Java Champion, & Core Microservices Thoughtleader

Dark matter and dark energy are mysterious concepts from astrophysics that are used to explain observations of distant stars and galaxies. The Microservices pattern language – a collection of patterns that solve architecture, design, development, and operational problems – enables software developers to use the microservice architecture effectively. But how could there possibly be a connection between microservices and these esoteric concepts from astrophysics?

In this presentation, I describe how dark energy and dark matter are excellent metaphors for the competing forces (a.k.a. concerns) that must be resolved by the microservices pattern language. You will learn that dark energy, an anti-gravity-like force, is a metaphor for the repulsive forces that encourage decomposition into services. I describe how dark matter, invisible matter that has a gravitational effect, is a metaphor for the attractive forces that resist decomposition and encourage the use of a monolithic architecture. You will learn how to use the dark energy and dark matter forces as a guide when designing services and operations.

Scaling GraphQL Adoption at Netflix
Tejas Shikhare Senior Software Engineer @Netflix

GraphQL is steadily gaining popularity as an API technology choice for client-to-server communication. However, it can be daunting to realize the benefits of GraphQL without significant investment. Furthermore, there are migration pains and multiple architectural patterns for a GraphQL API strategy. Is it worth it?

At Netflix, we have been operating a Federated GraphQL platform where developers can contribute to the unified GraphQL API. This platform powers APIs for everything from Netflix Streaming to the Netflix Studio applications and most recently, internal development tools.

In this presentation, I will share

Key value propositions and some common misconceptions of GraphQL
Pros and Cons of Monolithic versus Federated GraphQL architecture
Challenges and lessons from operating the Federated GraphQL platform and some criteria to consider before adopting this approach
Best Practices for Schema Design and Evolution that will set you and your organization up for success
Developer tools we have built to facilitate collaborative schema design and ease GraphQL development.
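
For anyone who hasn't seen federation, the gist is that each team owns a subgraph schema and marks its entities with a key so the gateway can stitch them into one unified API. Below is a heavily simplified sketch of mine, validated with graphql-core; the real federation spec (and Netflix's DGS tooling) involves more machinery, and the types here are invented.

```python
from graphql import build_schema   # pip install graphql-core

# Simplified, hypothetical subgraph SDL: @key marks "Show" as an entity that
# other teams' subgraphs can reference or extend through the gateway.
sdl = """
directive @key(fields: String!) repeatable on OBJECT | INTERFACE

type Show @key(fields: "id") {
  id: ID!
  title: String
}

type Query {
  show(id: ID!): Show
}
"""

schema = build_schema(sdl)
print(list(schema.type_map["Show"].fields))   # ['id', 'title']
```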

Navigating Complex Environments and Evolving Relationships 😍👌😍👌👌👌👌😍👌😍😍😍
Jennifer Davis Engineering Manager @Google

Organizations evolve. Industry tools and practices change. Individuals have a wide array of opportunities. As leaders, we have to navigate ambiguity and provide structure for individuals to support their growth, while also enabling a group of individuals to come together and align to deliver business value. There is no perfect path to building a team that is optimized for every context and can weather each change. But there are practices and tools you can adopt to support your teams’ journey towards team flow.

In this talk, I will share my people-centric approach to building teams with individuals who engage, connect, align, and support one another to deliver value in a sustainable manner. You’ll walk away with some tips that you can implement (as well as the context for when to apply them) in three key areas:

Functional Leadership – Different situations require different kinds of leadership. You need to enable the right kind of leadership at different moments in time. How do you enable individuals to embrace their leadership capabilities?

Boundary Setting – What boundaries do you need to set that support the individuals and the team? How do you enforce the boundaries?

Learning and Adaptability – How do you make time for people to not only learn, but share what they’ve learned so the rest of the team can make sense of the decisions they’ve made and the team as a whole can adapt to this knowledge?

Harnessing Technology for Good – Transformation and Social Impact
Lisa Gelobter CEO and Founder @tEQuitable, previously Chief Digital Service Officer @Ed.gov, Chief Digital Officer @BET.com

Using real world examples, Lisa Gelobter will explore how we really can use technology to make the change we want to see in the world. From healthcare to education to workplace culture and from public sector to private, we will look at how to use the same best practices, innovative strategies, and a product development approach to affect societal and systemic level change.

Amazon DynamoDB: Evolution of a Hyper-Scale Cloud Database Service
Akshat Vig Principal Engineer NoSQL databases @awscloud

Amazon DynamoDB is a cloud database service that provides consistent performance at any scale. Hundreds of thousands of customers rely on DynamoDB for its fundamental properties: consistent performance, availability, durability, and a fully managed serverless experience. In 2022, during the Amazon Prime Day shopping event, Amazon systems — including Alexa, the Amazon.com sites, and Amazon fulfillment centers — made trillions of API calls to DynamoDB, peaking at 105.2 million requests per second, while experiencing high availability with single-digit millisecond performance. Reliability is essential, as even the slightest disruption can significantly impact customers.

Since the launch of DynamoDB in 2012, its design and implementation have evolved in response to our experiences operating it. The system has successfully dealt with issues related to fairness, traffic imbalance across partitions, monitoring, and automated system operations without impacting availability or performance. This talk presents our experience operating DynamoDB at massive scale and how the architecture continues to evolve to meet the ever-increasing demands of customer workloads.

Honeycomb: How We Used Serverless to Speed Up Our Servers 😍👌👌👌👌👌👌😍😍😍
Jessica Kerr Principal Developer Evangelist @honeycombio

Honeycomb is the state of the art in observability: customers send us lots of data and then compose complex, ad-hoc queries. Most are simple, some are not. Some are REALLY not; this load is complex, spontaneous, and urgent. It would be prohibitively expensive to size a server cluster to handle these big queries quickly, so we took a different approach: farm the work out to Lambda, Amazon’s serverless offering.

In this model, Lambda becomes an on-demand accelerator for our always-on servers. The benefits are immense, improving response times by an order of magnitude. But the challenges are numerous and often unexpected. In this talk, I’ll review the benefits (user experience on demand!) and constraints (everything in AWS has a limit!) of serverless-as-accelerator, and give practical advice based on our own hard-won experience.
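
The "serverless as accelerator" pattern, as I understood it, in a rough Python sketch of my own (not Honeycomb's code): split the expensive query into segments, invoke a worker Lambda per segment in parallel from the always-on server, and merge the partial aggregates. The function name is hypothetical, and running this assumes AWS credentials plus a deployed worker.

```python
import json
from concurrent.futures import ThreadPoolExecutor

import boto3  # assumes AWS credentials and a deployed worker function

lambda_client = boto3.client("lambda")

def query_segment(segment: dict) -> dict:
    # Synchronously invoke one worker Lambda for a slice of the query.
    resp = lambda_client.invoke(
        FunctionName="query-worker",           # hypothetical function name
        InvocationType="RequestResponse",
        Payload=json.dumps(segment).encode(),
    )
    return json.loads(resp["Payload"].read())

def run_big_query(segments: list[dict]) -> dict:
    # Fan the segments out in parallel, then merge partial per-group counts.
    with ThreadPoolExecutor(max_workers=32) as pool:
        partials = pool.map(query_segment, segments)
    merged: dict = {}
    for partial in partials:
        for key, count in partial.items():
            merged[key] = merged.get(key, 0) + count
    return merged
```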

The Engineer/Manager Pendulum 😍👌👌👌👌👌👌😍😍😍
Charity Majors CTO @Honeycombio, Previously engineer & manager @Facebook @Parse & @Linden Lab

Should you be a manager? Or should you be an engineer? The old wisdom used to say that you should pick a lane and stick to it, but this is bad advice. The best tech leads I have ever worked with are ones who have done time as a manager, learning how to influence and persuade others, or how to connect their work to business goals. The best engineering managers I have worked with are never more than a few years away from hands-on coding, because they periodically go back to the well to refresh their skills. And the most powerful senior engineering leaders of all stripes? They tend to be people who have done both, swinging back and forth between management and engineering over the course of their careers. Repeatedly. Like a pendulum.

We’ll talk about how this benefits companies as well as individuals, and about how to craft the sociotechnical systems that encourage this kind of career development. We’ll show you how to convince your leadership hierarchy to buy into this model. And we’ll talk about how you can successfully move back into an engineering role, even if it’s been a long time or you aren’t sure how to do it.

Magic Pocket: Dropbox’s Exabyte-Scale Blob Storage System
Facundo Agriel Software Engineer / Tech Lead @Dropbox, previously @Amazon

Magic Pocket is used to store all of Dropbox’s data. It is a horizontally scalable exabyte-scale blob storage system which operates out of multiple regions, is able to maintain 99.99% availability and has extremely high durability guarantees, while being more cost efficient than operating in the cloud.

This system is able to accommodate new drive technology, handle millions of queries per second, and automatically identify and repair hundreds of hardware failures per day. We are constantly innovating in this space and work closely with hard drive vendors to adopt the latest drive technology (https://techcrunch.com/2020/10/26/dropbox-begins-shift-to-high-efficiency-western-digital-shingled-magnetic-recording-disks/). Each storage device contains 100+ drives and is multiple petabytes in size. Given the blast radius of single device failures, it is critical that our erasure codes and traffic are all built with this in mind.

In this talk we will deep dive into the architecture of Magic Pocket, some early key design patterns that we still live by to this day, and the challenges of operating such a system at this scale in order to be cost efficient and support many critical requirements.

The key takeaways for this talk are:

Provide an overview of the architecture of Magic Pocket. This includes key services, databases, how multi-region replication works, repairs, and a discussion on the storage devices.
Key architecture lessons, which had the most impact on Magic Pocket.
How we are able to operate such a system, while being extremely cost efficient.
Our system is much cheaper than operating in the cloud, but it has to meet a high bar. We discuss these challenges, and what the trade-offs look like, in more detail for others looking to make a similar transition.
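
The erasure-coding point is worth a toy illustration (my own, far simpler than Magic Pocket's actual codes): with a single XOR parity block you can lose any one data block and rebuild it from the survivors. That is the basic trade of extra storage for durability that real schemes like Reed-Solomon generalize to multiple simultaneous failures.

```python
# Toy single-parity "erasure code": XOR all equal-length data blocks to get a
# parity block; XOR the survivors with the parity to rebuild any one lost block.

def xor_blocks(blocks):
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

data = [b"dropbox!", b"exabyte!", b"storage!"]   # equal-length blocks
parity = xor_blocks(data)

# Lose one block, rebuild it from the survivors plus the parity block.
lost_index = 1
survivors = [b for i, b in enumerate(data) if i != lost_index]
rebuilt = xor_blocks(survivors + [parity])
assert rebuilt == data[lost_index]
```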