Episode 034: Cloud Native Patterns and Practices

Mike Pfeiffer on August 14, 2019

In this episode we chat with Cornelia Davis about her latest book, “Cloud Native Patterns: Designing Change-Tolerant Software”.

Cornelia Davis is a software technologist with more than 25 years experience who helps to drive technical strategy, product development and go to market, and to help customers leverage said technology to further their business goals. I prefer to spend half my time directly engaged with customers and prospects deeply understanding their needs and helping them solve their problems. The other half of the time I distill what I learn through these engagements and use that to drive product evolution as well as industry advancement through evangelism – conferences and writing. My personal mantra is “free your mind.” And I still cut code, even if only a bit of the time.

A self-proclaimed propellerhead, Cornelia Davis is Chief Technology Officer at Weaveworks, the leading provider of operational (#gitops), Kubernetes-based solutions. In that role she is responsible for the technology strategy of the company, hyper-focused on helping customers develop and execute on their cloud platform strategies. Ultimately, her aim is to enable developers and operations teams to fully support the business needs of their organizations.

When not doing those things you can find her on the yoga mat or in the kitchen.

Full Transcript:

Mike Pfeiffer:
All right everybody, welcome back to another episode of CloudSkills.fm. As usual, I really appreciate you guys being here. Today’s episode, I’m really excited for. We’ve got Cornelia Davis. She’s the vice president of technology at Pivotal Software, and she’s also the author of a new book called Cloud Native Patterns: Designing Change-Tolerant Software. Cornelia, welcome to the show.

Cornelia Davis:
Thank you so much. I’m so delighted to be here.

Mike Pfeiffer:
It’s really exciting because everybody’s talking about cloud native these days, but it seems like there might be a little bit of confusion, so we’ve got a lot to talk about. I started reading your book this week. It’s amazing. People are talking about Pivotal Software’s Cloud Foundry, which you guys produce and work on, but to get started, maybe we could talk about your career, your backstory, and what you do as a vice president of technology at a major software company.

Cornelia Davis:
Sure, sure. Thank you. I am a computer scientist by training, and I have been in the industry for about 30 years, which is really fun, especially now that we’re in this cloud native space because it is so different. I’ll touch upon that a little bit. I am still, after 30 years, cutting code, as you can see in my book. Maybe it’s mostly on the nights and weekends. I think in my day job, I spend a lot of time doing technical strategy, not so much cutting code anymore, although those are my best days when I get to do that.

Cornelia Davis:
I have spent time in my career. I started my career in aerospace actually working for Hughes Aircraft, doing imaging systems for them. Then I moved over into the commercial space, and about 20 years ago, almost exactly 20 years ago, went to work for a company called eRoom Technology, which did web-based collaboration. I spend a lot of time actually speaking with young people early in their careers as well, and when I describe to them what we did 20 years ago in eRoom, where we did things like, “Ooh, web-based document sharing or web-based threaded discussion,” they look at me like I have two heads.

Cornelia Davis:
Like, “What? That hasn’t always existed?” Long story short, very small startup company got acquired by Documentum. Documentum got acquired by EMC. I went from being employee number 65 in eRoom to being one employee of 30,000. I worked for small companies and large companies across the board, pretty much always stayed in a technical capacity. At EMC, I worked in the corporate CTO office doing architecture and emerging technology, and then EMC and VMware did a spinoff, which is now Pivotal, and that is where I am still in my role as VP of technology, still focused on architecture and emerging technology.

Cornelia Davis:
Now, the emerging technology space that I started working on about six or seven years ago, around the time of the Pivotal spinoff, was Platform as a Service. Now, Platform as a Service, really where Cloud Foundry was successful, was really successful in this space of… The timing was perfect. Cloud native was starting to become understood. I won’t sit here and tell you that we had it all figured out at Pivotal and in the Cloud Foundry platform six years ago, but what ended up emerging was there’s this whole new class of software that we need to build as we move into the cloud, because where we used to think of software…

Cornelia Davis:
We used to build software assuming some level of stability from the infrastructure. Now, when we moved to the cloud, we can no longer make that assumption. Our software has to remain running even when AWS has a regional outage. That’s in fact how I start the book is talking about this regional outage that AWS had and all these businesses that were affected by it. For Netflix, it was a shrug. They were like, “Minor, minor glitch,” and yet there were companies that were offline for 12 hours. It was far more than a minor glitch for them.

Cornelia Davis:
That’s what cloud native is all about, and so I focus on these emerging spaces and help our customers come along into these emerging spaces, and cloud native has been by far the overarching, biggest disruptor in the last I would say half dozen years or so.

Mike Pfeiffer:
That’s so fascinating. I actually had planned on asking you about that because when I was reading the first part of the book, I remember that incident, and I remember Netflix being responsive to it because, I think, they went through a bigger one maybe several years before that, and they’ve learned from that lesson just to your point. They do things like chaos engineering, which is a new concept for a lot of folks, but Netflix has got it so figured out. To your point, they’re doing stuff in multiple geographic regions so they can actually recover from these outages [inaudible 00:05:36].

Mike Pfeiffer:
That’s something we’ve never been able to really do before, the low-cost entry, which is fascinating with these cloud platforms. When it comes to the book, I know there’s a lot in there. Is it something where it’s going to be, “I have to know Pivotal’s Cloud Foundry software or I have to be a specific platform expert at AWS or Google or Azure,” or what’s the idea? Is it more of a big picture type of thing?

Cornelia Davis:
It’s definitely more of a big picture thing. I like to describe the book as it’s a software architecture book supported by code samples and supported by real deployments. Now, the deployment, so when I have my readers go through the exercises… By the way, I was just speaking with somebody who’s read the book, who did not do any of the code exercises and said, “My goodness, I learned so much just understanding the concepts without going through the code exercises.” She’s not somebody who cuts code on a regular basis, so she didn’t need to take it to that level, but when I do have my readers, if you do follow along and do the exercises, I actually have you do the deployments into Kubernetes.

Cornelia Davis:
Most of them, you can do on a local Kubernetes instance, no cost, nothing at all. There are a couple of places in the book where I need a larger deployment of Kubernetes because I do a simulation. Actually, I do a simulation of a network outage, and I want to see some of the cascading failures that happen. If you follow along with those, you need a bigger cluster, so you can do that on GKE or EKS, Google’s or Amazon’s, or on Azure’s, or if you have some other capacity where you can get Kubernetes. Kubernetes, by the way, is the new shiny emerging technology that I’ve focused on for the last two or three years and have been working with customers to help them understand what’s the actual business value that you can glean from adopting Kubernetes.

Mike Pfeiffer:
It’s been interesting to see so much around Kubernetes. Even in just the last six, seven months, there’s so much coming through on a daily basis on social. You can tell it’s very hot. Is that something that you guys… I haven’t really spent a lot of time lately digging into what you’re doing at Pivotal. Is that what Cloud Foundry is leveraging these days, Kubernetes?

Cornelia Davis:
Absolutely. It’s moving in that direction. We have always been a container-based platform. We have always had what I like to call the DNA of Kubernetes built into it, but it predates the existence of Kubernetes. In fact, it even predates the existence of Docker. We have been leveraging containers for seven or eight years in the Cloud Foundry system, but now that Kubernetes has emerged, rather than maintaining our own orchestration engine for containers, we are moving over to leveraging where all of this innovation is happening around Kubernetes. You will see us moving in that direction.

Mike Pfeiffer:
Wow. That’s really interesting. I think that it’ll be fun to see how things play out because it seems like there’s a lot of people that maybe are going down the road of Kubernetes or things like microservices. Maybe they don’t know what they’re getting into yet, and they don’t know that maybe that is going to work for them or maybe it’s too much for them. Are you seeing that? Are you seeing anything where folks are maybe taking a sledgehammer to a thumbtack, so to speak, like they’re over-engineering or doing things when it might be a little bit easier, or do you find that these newer patterns and practices are easy for teams to pick up and it’s pretty straightforward?

Cornelia Davis:
I wouldn’t say that it’s easy for people to pick up. If we circle back to the book, one of my goals… I would love to say that three years ago, I had this goal in mind, but I was a novice author three years ago and learned as I wrote the book. I teach patterns in the book. As you said, the book has cloud native patterns. I teach patterns like retries or circuit breakers, which are terms that we hear, as you were just mentioning, on social or in articles. We see lots of articles about those types of things.

Cornelia Davis:
Those patterns in and of themselves actually aren’t that hard. A retry, it’s a super simple concept. We no longer throw up our hands and say, “Shrug. I made a request, never got a response, so therefore, it’s okay for me to fail.” No. A retry says, “If I make a request and I time out, well, maybe I’ll just retry that request.” We do that as humans. If we’re navigating the web and we get to a webpage and the little icon’s spinning and the page never renders, we hit stop and refresh. That’s our retry. We can do that in software, and the concept of a retry is really that simple.
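As a minimal sketch of the retry idea described here, the snippet below makes a request and, if it times out, simply tries again, much like hitting refresh in a browser. The names `call_with_retry` and `flaky_request` are hypothetical and are not taken from the book; the stand-in service is simulated in memory.

```python
def call_with_retry(request, attempts=2):
    """Call request(); if it times out, try again, like hitting refresh."""
    last_error = None
    for _ in range(attempts):
        try:
            return request()
        except TimeoutError as err:
            # No response came back in time: remember the error and retry.
            last_error = err
    # Every attempt timed out, so surface the last timeout to the caller.
    raise last_error


# Hypothetical stand-in for a remote service that times out once, then answers.
calls = {"count": 0}


def flaky_request():
    calls["count"] += 1
    if calls["count"] == 1:
        raise TimeoutError("no response")
    return "page rendered"


result = call_with_retry(flaky_request)
print(result, "after", calls["count"], "calls")
```

The attempt count is deliberately small here; an unbounded version of this loop is the naive form of retry.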

Cornelia Davis:
Now, I was just mentioning a moment ago that I did a simulation where I wanted to show some cascading failure. It turns out that if you do retries in a very naive way, where you just keep retrying, then you run the risk of creating what’s called a retry storm, where you’ve got a whole bunch of retries that are queued up so that when the network does come back, let’s say the reason you’ve never heard back was that there was a network blip. When the network comes back, you overwhelm the system at the other end with a whole bunch of queued up retries. Then that in fact is exactly what was the cause of that Amazon outage, that regional outage, was it ended up being a retry storm on RDS at the back end.

Mike Pfeiffer:
I see. Interesting.

Cornelia Davis:
Exactly. That’s what then created this cascading failure. You do need to understand retries at the next level. There are some fairly simple heuristics that you can put in place to protect yourself from a retry storm, so you can make sure that from a client perspective, when you’re doing retries, you don’t just do it indefinitely. Maybe you say, “I’m going to do no more than three or four retries, and then I will actually error out.” You can put in place caches so that after three or four times, if I’m still timing out, I’m going to leverage cached data.
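The two client-side heuristics mentioned here, capping the number of attempts and falling back to cached data, could be sketched roughly as follows. `CachedClient` and `fetch_profile` are hypothetical names chosen for illustration, and the in-memory dictionary stands in for whatever cache a real client would use.

```python
class CachedClient:
    """Wraps a remote call with a retry cap and a last-known-good cache."""

    def __init__(self, fetch, max_attempts=3):
        self.fetch = fetch
        self.max_attempts = max_attempts
        self.cache = {}

    def get(self, key):
        for _ in range(self.max_attempts):
            try:
                value = self.fetch(key)
            except TimeoutError:
                continue  # timed out: use up one attempt and try again
            self.cache[key] = value  # remember the last good answer
            return value
        # Attempts exhausted: serve cached data if there is any.
        if key in self.cache:
            return self.cache[key]
        raise TimeoutError(f"{key!r}: gave up after {self.max_attempts} attempts")


# Hypothetical service that is healthy at first and then stops responding.
healthy = {"up": True}
attempts = []


def fetch_profile(key):
    attempts.append(key)
    if not healthy["up"]:
        raise TimeoutError("network blip")
    return f"profile for {key}"


client = CachedClient(fetch_profile, max_attempts=3)
fresh = client.get("alice")   # succeeds and populates the cache
healthy["up"] = False
stale = client.get("alice")   # three timed-out attempts, then the cached value
```

Because the loop is bounded, a client like this stops adding load after a fixed number of attempts instead of queuing retries indefinitely.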

Cornelia Davis:
You can also put in exponential backoffs or logarithmic backoffs where you say, “Okay, well, I’m not going to try every half a second, but I’ll try, and if I get past three, then I’ll wait, and I’ll wait five seconds before I do the retry.” You can throttle from the client’s side and then you can also protect yourself on the service side from retry storms with things like circuit breakers. Again, circuit breaker, the concept is relatively simple. It’s just like in your house. If you are getting overwhelmed with load, and your wires are getting very hot, then you flip the breaker, and you say, “I’m not going to actually address any incoming requests until I give my system a chance to relax and then somebody is going to go flip that breaker back.”
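Both ideas in this passage can be sketched in a few lines, with some assumptions stated up front: the backoff shown is the exponential variant (starting at half a second and capped at five seconds, echoing the numbers mentioned), and the breaker resets itself after a cool-down rather than waiting for somebody to flip it back. `backoff_delays`, `CircuitBreaker`, and `CircuitOpenError` are hypothetical names, not an API from the book or from any library.

```python
import time


def backoff_delays(base=0.5, factor=2.0, cap=5.0, attempts=4):
    """Yield the wait before each retry: base, base*factor, ... capped at cap."""
    delay = base
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= factor


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the request is rejected outright."""


class CircuitBreaker:
    def __init__(self, handler, failure_threshold=3, reset_after=5.0,
                 clock=time.monotonic):
        self.handler = handler
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after  # cool-down in seconds (an assumption)
        self.clock = clock              # injectable so the cool-down is testable
        self.failures = 0
        self.opened_at = None           # None means the breaker is closed

    def call(self, *args):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Still cooling down: shed the request without doing any work.
                raise CircuitOpenError("breaker open; request rejected")
            # Cool-down elapsed: let one trial request through (half-open),
            # primed so that a single further failure reopens the breaker.
            self.opened_at = None
            self.failures = self.failure_threshold - 1
        try:
            result = self.handler(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip the breaker
            raise
        self.failures = 0  # a success closes the breaker fully
        return result


delays = list(backoff_delays())  # waits a client would sleep between retries
```

A client would sleep for each value in `delays` between attempts, while the service wraps its request handler in `CircuitBreaker` so that an overloaded system gets time to recover before taking traffic again.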

Cornelia Davis:
Again, a relatively simple concept, but back to your question of are people able to just do this? Are they picking up on this? The concepts themselves are simple. Knowing when to apply them and how to apply them is actually where the complexity is. What I do in the book and why I think my colleague who said she didn’t execute the code samples but still gleaned a lot of value was that she understood I probably spend more time on the context than I do the patterns themselves, recognizing when and where to apply retries and how to do it carefully, when and where to apply circuit breakers.

Cornelia Davis:
I think a lot of people talk about circuit breakers, but when do you do it and in what context? That’s super important. What is far more difficult is to get that holistic picture, and that takes time. That takes time for engineers who are coming maybe from a traditional place, three-tier client-server architectures with assumptions of stable infrastructure, and release cycles that were 18 months or maybe six months but not release cycles that are three times a day. Moving over into this new world I think is definitely something that requires getting your legs under you.

Mike Pfeiffer:
It makes a lot of sense. It’s easy to see the slew of tutorials that fly through your stream. Then you’re like, “Oh, it looks so easy,” but to your point, it’s simple perhaps in concept, but in practice and knowing when, I really love that you brought that up because knowing the context of when to use it actually makes a lot of sense. Going down that road a little bit, it’s interesting how established software patterns just in writing code have been so important for developers, but now, this concept of knowing cloud native patterns, just what you were talking about, knowing those circuit breakers and all of this, and actually coding to a contract might be a complete departure for folks.

Mike Pfeiffer:
We’ve got people that are operations focused listening to the show. We’ve got developers as well, but one of the things that people keep talking about is the concept of getting into cloud native and really programming to a contract and things like the 12-factor application model. I think that’s something that you talk about in the book, the 12-factor app concepts?

Cornelia Davis:
What’s interesting about the 12 factors is, and just to give you a little bit of history, ironically, when Manning reached out to me initially three years ago, it’s been about three years, the first person who reached out to me from Manning said, “Hey, you seem to know something about 12 factors.” That was the topic that they teed up, and that then eventually turned into cloud native, cloud native patterns and those types of things, but one of the things that I think is interesting about the 12 factors, and for your listeners who maybe aren’t familiar with this, if you go to 12factor.net, with the number 12, that is where you’ll find the 12 factors.

Cornelia Davis:
It’s very brief. Some of those factors are software architectural patterns. It does talk about things like design for failure, which covers some of the things we’ve just been talking about like retries and circuit breakers and those types of things. It also talks about practices, so it talks about various practices, some of which are again around developing code. It talks about single repos and things like that for each service. It also talks about some of the operational practices. I think it’s critically important that we as developers understand that element of the context as well.

Cornelia Davis:
Not only the context of what does the architecture look like and what kind of architectural changes do I need to be able to adapt to, but also what are some of these operational practices? I actually spend an entire chapter in the book. The book is broken up into two parts. The first part has no specific patterns, no code samples, but it is setting the overall broader context. In chapter one, I talk about how I define cloud native and differentiate it from cloud. In chapter two, I actually dedicate the whole chapter to what it means to operate cloud native software, because it’s important for us as developers to have empathy and to have an understanding of what those operational practices are so that we can support them.

Cornelia Davis:
Then the third chapter says, “Well, as you’re learning these patterns, in fact, you’re not necessarily as a developer responsible for implementing every one of them. You’ve got to understand them,” but the implementation of some of these things actually will come through a platform. Kubernetes as a platform provides implementations of some of the patterns that you need to understand and understand when to leverage them from the platform in your cloud native software.

Mike Pfeiffer:
It’s interesting that you mentioned empathy because the light bulb went off in my head, because I’ve been dealing with a lot of teams lately, enterprises that are getting into cloud for the first time, and they’re still in the traditional model of IT teams. You see a lot of lack of empathy, and you see a lot of people stepping out of their lane because your point there is you want to control everything. It’s been interesting the cultural shifts as well as the technical side of it.

Mike Pfeiffer:
Are you seeing that as a challenge for the folks that you work with and the customers that you deal with? Has that been a common thing?

Cornelia Davis:
No question. This is a constant dialogue that we have within Pivotal. I’ve been, like I said, with Pivotal since the spinoff. I’ll tell you that the first couple of Cloud Foundry sales, if you will, that we had where we sold Cloud Foundry as a product, I don’t think we even at Pivotal had fully appreciated the other changes that needed to come along with embracing a new platform like that, because the platform again is really designed for cloud native software. In retrospect now, I will tell you that it’s become really, really valuable to understand and to really break out a couple of different personas, and for the customers, where they’ve really embraced this model and they’ve actually changed some of their organizational structures and changed even the way that IT worked.

Cornelia Davis:
IT historically has been an application team comes along. They’ve done their evaluation of what they need. They come to the IT team and they say, “Here’s the infrastructure I need. When will you have it ready for me to be able to deploy my application into production on this?” Each one was maybe bespoke and those types of things. Well, you talked about this earlier actually Mike, where you said it’s about the contract, right? If we’re coding to a contract, this understanding around cloud native and around cloud native patterns, then we can start to make some of those primitives available in a platform, and a platform team can be responsible for those.

Cornelia Davis:
They can be responsible for providing a platform as a product to an app team that can then depend on some of those things that they no longer have to code themselves. Now, we start to have fewer and fewer snowflakes, and fewer and fewer snowflakes is so essential to these agile deployment practices that we put in place, where we want to do deployments very frequently. We can’t do deployments frequently or go from code complete to a deployment in production if each time we need to do some bespoke configuration or even bespoke standing up of infrastructure in some way. We need to have a little bit more of a repeatable pattern there.

Cornelia Davis:
We definitely have to help our customers get there.

Mike Pfeiffer:
Totally. It’s an interesting time because so many people are not only trying to figure out the tech, but they’re trying to figure out the bigger picture. I think they’re struggling there. I think what I love the most about your book is what we talked about earlier: it’s not specific to a particular technology. It’s the patterns, the practices that can be applied everywhere. One of the things I see people struggling with is they’re getting caught up in the details of a certain cloud platform like, “Oh, I need to learn this specific piece of AWS or this specific piece of Google,” but the reality is all the major public cloud platforms have the same services, right?

Mike Pfeiffer:
Different names, but ultimately, they’re very similar. It’s almost like understanding the big picture is the first part. Understanding the cloud native patterns is the first key, and then you can drill into the details because everybody’s talking about multi-cloud, right? This all supports that, your book and even Pivotal Cloud Foundry if I’m not mistaken, right?

Cornelia Davis:
That’s right. That has been the main aim for us from the very beginning, to be multi-cloud.

Mike Pfeiffer:
People seem like they’re talking about it a lot this year. Part of the other big difference I’m seeing too with enterprises is that before, they weren’t really as serious as they are now. They’re really doing true proofs of concept now, where before, a developer maybe was spinning up their own account, their trial account, on their own credit card. Now, stuff’s actually more formal. It’s been really interesting. Are there any other major struggling points that you’ve seen out there for folks getting into this new world, so that somebody listening might be able to close the gap on that as they’re getting started?

Cornelia Davis:
I mean, it’s very much related to what we were just talking about a moment ago, and that is that when we go into organizations and start to chat with them about cloud native, both software patterns and operational patterns and deployment and how you get to production, I think probably the biggest barrier that we run into is that “we’ve always done it this way.” When we think about it, I just want to really say this directly… I started my career about 30 years ago, and I was at the university right when we were making that shift from mainframe into client-server.

Cornelia Davis:
This was in the early, well, actually probably mid to late ’80s. When I started at the university in 1983, I still coded on the mainframe, but not for too, too long. After that, we were moving over to client-server. The phase that we’re going into now, where we’re going from that into cloud native, that is the shift that we’re having for the first time in three decades. When we think about it, we can sometimes be a little bit judgmental and we say a legacy system. Legacy system sounds so judgmental, or legacy process. That’s the way we’ve always done it.

Cornelia Davis:
Even just now when I said that, “we’ve always done it this way,” I guess to some extent it might’ve sounded a little judgmental, but the thing that I want to point out is that 30 years is a long time, and we generated, I learned in the university, software engineering practices that were waterfall. That’s what we were taught. That was the state of the art in terms of software engineering practices. A great number of my colleagues come from that era as well. Even if you went to school 10 years ago, you probably were still learning about waterfall because that was still relevant.

Cornelia Davis:
Things like ITIL and these development practices were established, and they truly were best practices for the time, and so there shouldn’t be judgment on that. I think that’s one of the areas that we struggle is we get a little judgmental about, “Oh gosh, look how silly that was,” but it wasn’t silly at all. We do have to recognize that we are right now coming up with ITIL, if you will, for the next big wave. Now, do I think this wave is going to be 30 years? I don’t know. I think that we’re accelerating the rate of change, so I don’t know if it’s going to be 30 years, but I certainly think that these fundamentals around cloud native are going to be around for the next 10 to 20 years, and we still have to generate all the practices around those.

Cornelia Davis:
I think, what is the ITIL for this next generation? I was hinting at that a little bit with the platforms and platform teams and all of that stuff, but we’re still in the early days of coming up with and scaling those best practices.

Mike Pfeiffer:
That’s really interesting because I think one of the things people tend to do is, and you’re right, they tend to stick with what they know, and that can prohibit them from going forward, but we shouldn’t judge it. We should just acknowledge it and say, “Okay, it’s all right. I’ll keep moving.” It seems like a lot of people are having a problem with letting go of the idea of you’re going to have to always keep learning something. I think you’re a great example of somebody that has been in the game 30 years, and you’re still writing code. You’re still doing all these different things.

Mike Pfeiffer:
I think maybe for some folks, the thought is that you’re just going to get to a certain point and coast for a while, but in this industry, it’s even faster than ever now. I think that’s going to be a big thing for people to wrap their brain around going forward as well.

Cornelia Davis:
Personally, that would drive me nuts if I wasn’t learning something new every day. I’m totally a change junkie.

Mike Pfeiffer:
I’m the same way. I cannot not learn. It’s what I’m all about. That’s why I love this business. There’s always something new to learn, right? Just a couple other things. I know you’re super busy, but I know that we’ve worked with Manning, the publisher of your book, Cloud Native Patterns. I love the subtitle, Designing Change-Tolerant Software, because you have to think that way, right? When I worked at AWS, I would always hear Werner Vogels, the CTO of Amazon, always talk about “everything fails all the time,” and just trying to ingrain that in folks’ minds.

Mike Pfeiffer:
I love that you’re doing that with your book, but we’re working with your publisher Manning to give away a certain number of free copies so the folks listening, you guys can hit the show notes. There’ll be links to all that stuff in the show notes. Where else can we send people to after this episode, Cornelia, that might be interesting for stuff you’re working on or stuff that Pivotal’s working on or even just maybe supplemental things for your book? Any other resources we might want to point folks at?

Cornelia Davis:
A couple of things. I mean, the first thing is that the company that I work for in my day job, not associated with the book directly, but the company I work for in my day job, as we’ve talked about, Pivotal, because we focus so much on cloud native and we focus on this new way of building software. In fact, I mean, that’s our tagline: we transform the way the world builds software, with software architecture and platforms to support those as well as ways of working. We do a lot with Agile and XP, extreme programming, and those agile ways of working.

Cornelia Davis:
We have a very vibrant blog where you can learn about a lot of the things that are relevant to our customers, and our customers are the ones that are embracing cloud native. I think that’s a really good place to just look at the Pivotal blog. I will tell you that one of the sources that I use, because I’m still learning, I don’t know everything about cloud native and there’s always the next new thing as we were just talking about, is I am a huge fan of Medium and the Medium platform.

Cornelia Davis:
When I of course registered at Medium, I was able to go in and select the areas that I was interested in, which includes things like software and cloud and those types of things, and so my daily news feed has lots of stuff around Kubernetes and those types of things. Kubernetes is a very vibrant community that is very much in that cloud native space. That is a great place where you can see conversations. The other thing that I’ll tell you just a little bit of… There’s never enough hours in the day, but I watch videos when I’m on the treadmill in the morning, and so I have my feed.

Cornelia Davis:
Of course, YouTube is good enough that once I start watching things, like this morning, I was actually watching some videos on RSocket, which is hugely interesting. I want to do a whole nother chapter on going deeper into the network and what does cloud native mean for the network. In my book, we certainly as an industry have talked a lot about what cloud native means at the application layer. I do touch a little bit upon cloud native and the implications on the data layer. I definitely talk about interactions like when we talked about retries and circuit breakers and things like that, but I made the assumption of those interactions happening largely over the existing protocols that we have, which is HTTP, TCP.

Cornelia Davis:
RSocket is now saying, “Well, what if we actually make the network itself cloud native?” That’s a hugely interesting thing. All of that is to say that I learn a ton. I’m one of these people who likes to listen to things as well as watch, and so also YouTube. If you start watching a couple of things, do some Googles on cloud native, and then those things will start being suggested to you. I find that to be a great source of seeing broadly what’s happening in the market and then know where I want to drill into more details.

Mike Pfeiffer:
That’s like a master hack to do your cardio or get your workout in and get up to speed. I’m the same way. I’m always learning something. I know I’ll never get to the end and wrap my brain around everything. I think everybody listening has to commit to that concept of being an eternal student, so I love that.

Cornelia Davis:
Exactly.

Mike Pfeiffer:
Well, Cornelia Davis, I really appreciate you being on the show. Everybody listening, you should run out and buy the book, but make sure you join the giveaway. You might get a free copy. It’s going to be awesome. I really enjoyed having you on the show. Thank you so much, Cornelia.

Cornelia Davis:
It was such a delight to be here. Thank you so much.

Mike Pfeiffer:
Want to keep up with what’s going on in cloud computing? If so, subscribe to my weekly newsletter and get my top five tips every week for staying on top of Azure, AWS, and Google Cloud. Just go to askmike.io/subscribe to join today. Every week, I’ll send out information about cloud architecture and development, containerized applications with Docker and Kubernetes, DevOps and automation, and strategies for getting the latest cloud computing certifications. If that sounds awesome to you, go to askmike.io/subscribe to join the list today.
