Chris Pepin (00:04): Ladies and gentlemen, welcome back to the manufacturing talent podcast. I’m your host, Chris Pepin, founder of Progressive Reliability. On the seven part series I’m going to be joined by Tim Goshert and Doug Plucknette. And we’re going to be going through back to the basics, the fundamentals and the importance of getting it right within manufacturing, maintenance, and reliability. Welcome to the conversation!
Chris Pepin (00:26): Welcome back to the manufacturing talent podcast. My name is Chris Pepin, founder of Progressive Reliability. And I am joined by Doug Plucknette as well as Tim Goshert. Doug is an expert in today’s topic, which happens to be failure modes. And then we’ve also got Tim’s view, having been a leader of reliability at Cargill. Doug, this is a really great topic for you. You have a lot of experience, and there’s two ways to go about it that we were just discussing before we started the conversation. So why don’t you go ahead and lead the way?
Doug Plucknette (00:59): Well, good morning, Chris. Or whatever time of day it happens to be that you’re all listening to this. Anyway, failure modes. Your maintenance strategy, and I don’t care where you are, what you do, what type of manufacturing you’re in, whether you’re in utilities, your maintenance plan or your maintenance strategy should be based on failure modes. And this was something that, you know, I’d been working in the trades probably 10 years before somebody enlightened me to this. And the company that I worked for, Eastman [inaudible: 00:01:31] company, we fell into the same trap that everybody else did. You buy a new piece of equipment or you design it and install it and you do a great job. And then you wait for things to start to fail. And when they do, you say, hey, we really should have a PM for that, some type of maintenance plan to keep that from occurring again, so that we don’t have this downtime, and we make sure that we have the parts available and so on and so forth.
Doug Plucknette (01:47): And in doing so we created this culture of component replacers, people that didn’t think about why things failed. We just replaced components. And then the other piece that it created was a maintenance strategy that was heavily based on PM. And when you learn that PM, or preventative maintenance, as it’s called, only applies to about 7% of your component failures, all of a sudden you realize, holy smokes, we’ve gone and made a mess of what we have here, and no wonder we struggled to get our maintenance completed. So I want to say it was in the early 1990s that I was introduced to this topic of failure modes and what a failure mode was.
Doug Plucknette (02:44): And realistically there wasn’t really a good definition. It was, you know, why does something fail? Well, realistically, a failure mode is three parts. It’s the part that failed, the problem, and then the cause, all right? So it could be bearing, which would be your part; seizes, as the problem; cause, lack of lubrication. That’s a specific cause. So, when you have those three parts and you lay it out like that, once you get to the point where you’re at a specific cause you can say, all right, what do I need to do to make sure this bearing gets lubricated? How do I make sure it’s the right type, the right amount, the correct frequency, right? If we step back and look at just saying, all right, I’m going to make my maintenance strategy and call a failure mode “the bearing failed,” I could have 20 reasons why a bearing fails, right?
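The three-part definition Doug gives here (part, problem, cause) can be written down as a small record. The Python sketch below is only an illustration: the class name `FailureMode`, the `describe` method, and the task wording are assumptions made for this example, not anything the speakers describe using.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """A failure mode in three parts: the part, the problem, and the cause."""
    part: str
    problem: str
    cause: str

    def describe(self) -> str:
        # Reads as a single statement: "<part> <problem> due to <cause>".
        return f"{self.part} {self.problem} due to {self.cause}"


# Doug's example: bearing (part), seizes (problem), lack of lubrication (cause).
bearing_seizes = FailureMode("bearing", "seizes", "lack of lubrication")

# A specific cause points to specific questions; this task list is hypothetical.
tasks = {
    bearing_seizes: [
        "confirm lubricant type",
        "confirm lubricant amount",
        "confirm lubrication frequency",
    ],
}

print(bearing_seizes.describe())
```

By contrast, a record that stopped at "the bearing failed" would have no cause field to hang a task on, which is the gap Doug is pointing at.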
Doug Plucknette (03:37): And each of those reasons might have a different task. Now, realistically, with a bearing, you’re going to end up with four or five different tasks. You’re going to have some PM tasks to lubricate it. You’re going to have some on-condition or predictive tasks to monitor it and to detect potential failures, right? You may have some type of operator care, cleaning, those types of things. So getting to that failure mode and understanding failure modes is a key point. So when I was introduced to that, it was my introduction to RCM, of course. And I thought, wow, this is really fantastic. You know, this makes sense to me. Now we can develop a maintenance plan where you’re going to get the design reliability out of the [inaudible: 00:04:25] if you can complete these things and identify them and do them, right?
Doug Plucknette (04:31): And then you go and you do your first RCM analysis and you come to the realization, there’s no way we could do this on everything. Right? It would take forever, it would cost us a ton of money. And so, you take a step back and you go, all right, so what other way is there? So the first way to identify failure modes is RCM. The second way is to r[inaudible: 00:04:] library. All right, now the problem with failure modes libraries is who created it, what was their experience. And understand that, because you have a library, and I’ve seen some libraries that had thousands of components and literally 35,000, 36,000 tasks, right? And the next thing you know, you match your hierarchy up with this and you have 35,000 things to go through to make your maintenance strategy. And it’s overwhelming. So understanding that there’s these two methods, you have to kind of find a way to marry the two of them. Right? And so for your noncritical assets, you’re going to use that library, but you’re going to use a library that’s realistic. All right? So what’s a realistic library look like? That’s more like 1,800 parts or components as opposed to 18,000. Tim, I know, has some experience with all this stuff and doing this across multiple business units at Cargill. Tim, what are your thoughts on a failure modes based maintenance strategy?
Tim Goshert (06:06): Well, that’s what you really have to get to eventually. And take me back about 20, 25 years in Cargill. As you said, most of the work that we did at that time was failure maintenance, meaning it broke and we had to fix it. Or we did some type of PM that typically wasn’t quantitative in nature, more qualitative. And our PM systems at the time were riddled with tasks that got there because something had failed in the past. We had an emotional moment because we lost production or quality or it cost us a lot of money. And we put in a PM and we said, we’re going to PM it till the cows come home. Right? Or another task we had is that we were maybe a little bit more proactive. We would have a piece of equipment.
Tim Goshert (07:15): We go to the OEM, the person, you know, the company that built the equipment. And then we follow what they said to do to their piece of equipment. And some of that stuff was pretty good, but other stuff was basically over-maintaining or over-PM-ing the system because it wasn’t the right thing. And at the time we really didn’t do much condition-based monitoring, and not much of the work was based upon, you know, the condition of the assets. So we were in a pretty sad state. And about the same time you said, Doug, in the mid-nineties, is when I learned about RCM, and I said, this is the way to go. And how do you get there? And we took a two-pronged approach. First of all, we had to go through and do a criticality assessment, which we talked about in the last podcast. Once we knew what our most critical items were, and we talk about the top 10%, top 15%, but we started with the top 10%, we went on an endeavor to do RCM analysis and do it correctly.
Tim Goshert (08:37): And that’s probably when I met you, Doug, way back when we were on this path: we’re going to at least get through the top 10% of our critical equipment and get to a failure modes based strategy, so we can keep our top 10% operating correctly. And then we worked down from there, and we ended up probably doing maybe 15% of our equipment, depending on the business unit or the facility, or the desire of that unit. For the rest, the 80 to 85%, we used a failure modes library. Let’s just say there’s 85% left: we used it on, you know, maybe 70% of that 85, and the last 15%, the least critical, we just ran to failure. And this, you know, this is sort of a journey.
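The split Tim describes can be sketched as a simple assignment over a criticality-ranked asset list. This Python sketch rests on stated assumptions: it reads his numbers as shares of the whole fleet (roughly 15% RCM, 70% library, 15% run to failure), and the function name, parameter names, and asset labels are hypothetical.

```python
def assign_strategies(assets_by_criticality, rcm_share=0.15, run_to_failure_share=0.15):
    """Map each asset to a strategy tier.

    assets_by_criticality must be ordered from most to least critical.
    The shares are assumed fractions of the whole fleet.
    """
    n = len(assets_by_criticality)
    rcm_count = round(n * rcm_share)
    rtf_count = round(n * run_to_failure_share)
    plan = {}
    for rank, asset in enumerate(assets_by_criticality):
        if rank < rcm_count:
            plan[asset] = "RCM analysis"            # most critical assets
        elif rank >= n - rtf_count:
            plan[asset] = "run to failure"          # least critical assets
        else:
            plan[asset] = "failure modes library"   # everything in between
    return plan


# Hypothetical fleet of 20 assets, already ranked by a criticality assessment.
fleet = [f"asset-{i:02d}" for i in range(1, 21)]
plan = assign_strategies(fleet)
counts = {tier: list(plan.values()).count(tier) for tier in set(plan.values())}
print(counts)
```

The shares are parameters because, as Tim notes, the actual percentage varied by business unit and facility.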
Tim Goshert (09:50): That’s what we started with in the late nineties. It probably took us five to seven years as we went through plant by plant, business unit by business unit. And when I left Cargill, we probably still had units that we didn’t even get to on this, especially our real small plants, elevators, feed mills, and things like that. So it is a process that you have to devote yourself to, you have to put resources to it, and it’s a goal to get there. But once you get there, what you find is, once you have a strategy and it’s based on failure modes, and you do and implement the tasks that are needed to be done, you’ll find that the plant runs better, operates better, makes more production, better quality. You have less failures, and over time the costs will be reduced.
Chris Pepin (10:54): And one of the questions that was coming up as I was listening to your journey is what about members of our audience who maybe don’t have the position you know, have authority to scope out the entire project or folks that are reaching, you know, running into a lot of internal pushback. What’s going to come up along the way of the journey within the organization and within the interpersonal dynamics that folks are going to need to look to that you’ve seen either something as big as Cargill or down just to a single site.
Tim Goshert (11:22): Well, I think it goes back to your sphere of influence. Everybody has a sphere of influence where you have the ability to act. So if you’re a department manager, your sphere of influence is that department and maybe some surrounding departments. But you can start where your sphere of influence is, and you’re going to say, I am going to have a failure modes based strategy for the critical equipment in my department.
Tim Goshert (11:51): And, you know, you start where you’re at. And as you do that, and you implement it, the success will give you a larger sphere of influence. You might go from a department now to a larger area of the plant, where they say, well, this works so well in this department, will you help us get to that in the next area, this bigger area? And then that leads to a plant. And then that leads to maybe a business unit. And so your sphere of influence grows as you do the right things, and you have success throughout the years, but it doesn’t happen overnight. And this is hard work. It’s not easy. Oh, it’s fairly easy to do, but you have to be disciplined. You have to put it on a schedule. You have to have timelines. Then once you know what those failure modes are, you have to implement and periodically do those tasks to understand what the health of your asset is. And then you have to fix things and fix them correctly. So this is a journey. Like I said, it takes a while.
Doug Plucknette (13:07): And if I could add to that, because there’s been, I would say, well over 50% of the places that I’ve been called into, we’re starting at, you got a maintenance supervisor that’s got a budget and he’s put some of his budget on the line to do this, right? So not a very big sphere or circle of influence. Right? And what I teach them is to say, all right, we’re going to do this and we’re going to implement it, but we’re going to show the business case of why. Right? And then we’re going to show that we’ve got a return on investment for doing it. And once you do that, it’s pretty easy to sell more. Right?
Tim Goshert (13:47): In fact, Doug, it sells itself. People then want to jump on doing more RCMs before they get done implementing the tasks that they need to on the one they just completed. I know you’ve run across that, Doug, in your life. And I’ve certainly done it.
Doug Plucknette (14:07): I have a list of probably 200 or more people that sat across the table from me and said, this will never get done. Right? It goes from maintenance mechanics to operators, to supervisors, to engineers. And I tell them, all right, this is the deal we’re going to make. Right? We’re going to start out, we’re going to have a contract or a charter or whatever you want to call it. We’re going to get some people to sign it. Right. But we’re going to talk about this for one machine. Right? You’ve got 500 machines out there, right. We’re only going to do this on one, because we just need to prove to people that if we focus on and do the right things, things will change. And sometimes the only way you can do that is by doing it on one asset because, oh no, we’ll never get this done on everything.
Doug Plucknette (14:55): Well, of course not. Right? You just can’t sell it that much. Unless you’ve got some type of corporate sponsor. And in most cases they don’t. So it comes down to, can we agree to do this on one thing? And I’ve yet to have a company that the operations manager didn’t look at it and go, yeah, one thing I’ll give you one thing. And when it comes up on the schedule, we’ll make sure it gets done when it’s supposed to, and we’ll see what the results are. And if I don’t see results three months down the road, we’re not going to do another one. And I say, hey, I’m good with that. Right?
Chris Pepin (15:27): Yeah. It seems like the starting of things is always the most difficult part, to actually figure out how to break it down and get it going. Like we were talking about with the clean the garage analogy, a couple of episodes ago, just kind of getting it going with respect to the MRO. Yeah. I’m, I’m just kind of curious. It sounds like it’s a lot more of progress than perfection.
Doug Plucknette (15:53): It is a process and there are certainly smart ways to do this. And that’s, you know, the great part about Cargill is they had a leadership team that, you know, I could sit and talk with or send emails to. And we had discussions about what’s the best way to get this done. When you’re talking about a [inaudible: 00:16:10] that has got assets all over the world and business units that you know, are separated sometimes by hundreds of miles, the smart thing is to say, okay, here’s a critical asset that’s critical at 25 sites. Let’s bring some people in from multiple sites, and we’ll talk about the RCM that way, right? And then when we get done, we can implement this across multiple sites. Well, then it gets into the discussion. I want them to understand you’re going to have a template, but that doesn’t mean this is going to work at all the sites.
Doug Plucknette (16:42): And they look at me like, well, what are you talking about? Well, they may operate a little bit differently, right? The products they run might be a little bit different. So they might see different failure modes. They work in a different environment. Some places are really cold. Other places are really hot. So your maintenance strategy might change. So we teach them to understand that their operating context is a little bit different. So they’re going to need to take this base of what we developed by having a group of people together, and then look at the outcome and say, all right, we do or don’t have that failure mode, right? Or we do have that failure mode and it doesn’t occur as often. So if it takes us five days to get through a major piece of equipment, it’ll take them a day to do their assessment to say, how does this apply? Right? And are there some things that we have that are different, that weren’t mentioned in here, that we need to add? And by doing that, you can gain some momentum that way.
Tim Goshert (17:38): That strategy worked extremely well, especially in Cargill, which had business units that, you know, had a fleet of 25, 50, 75 plants that were very similar and they had the same equipment. So it worked. That strategy saved a lot of money, saved a lot of time and got us huge traction when we picked the right piece of equipment, one that was really critical to that business unit’s, you know, production and cost parameters. So that works very well. We also used different strategies by business unit, depending on maybe its size or location. How did we get that RCM done? In large business units, we had Doug train internal facilitators that understood RCM and could lead, at the large facilities, their own RCMs, and then work off their own lists. Whereas smaller ones, smaller business units and smaller sites.
Tim Goshert (18:51): We took the approach like we just talked about: can we do it on one similar piece of equipment and then leverage it over, you know, 25 different plants or 25 different departments? That’s how it is. So how you do it, you need to be creative and think about what are the best ways to get it done in a cost-effective and productive manner. And beyond the two that were just pointed out here, there’s probably a couple more different ways of doing it that you’ve seen, Doug, but those were the two strategies that we worked at Cargill that worked pretty well.
Doug Plucknette (19:29): And even with those, I would add, you know, I not only train them, I certify them. Right. So there’s a mentoring process and there were some that didn’t make it through. Right? So it’s one of those things: picking the right people, getting them trained, getting them mentored, getting them certified to where you’re confident that they know what they’re doing. Right? And they can lead a group.
Tim Goshert (19:54): The thing about that, Doug, is it reminds me of some of the folks that we initially trained to do it at a facility. You know, they were just young reliability engineers, some right out of school, that had the desire, the motivation. And guess what, today, 20 years later, they’re now plant managers and business [inaudible: 00:20:18] operational leaders. Now, how great is that, to have that type of knowledge and background in a plant manager or a business unit ops leader? I mean, they know the fundamentals of equipment failure modes and strategies. It’s a beautiful way to grow the organization, but it takes patience. So you can’t think it’s going to happen in two years. It might take 10, 15 years, but you’re much better off in the end.
Chris Pepin (20:50): Doug, I’m curious, you know, I know we’ve talked about the journey, talked about your experience back in the early days when this was coming about, as well as Tim’s high-level view. What can we relate back to sites today? What are you seeing out there in the field in terms of what people are missing and what the real opportunities are? You know, as we’re doing this recording and we’re coming out of 2020, and everything that came along with it, what’s the best way to move forward?
Doug Plucknette (21:20): Yeah, the best way to move forward in terms of failure modes is the exact order. Go back to session one that we started with and listen to it and do it exactly in that order, right? Start with the hierarchy. You do the criticality analysis, identify your top critical assets, start doing the RCM and do that first, right? There’s going to be an attraction to that failure modes library, to say, well, we can do that on everything, right. And I’m going to tell you, it’ll help. Right? If you did that on everything, it would help, but you need people to really understand these concepts. And by doing the library, you miss out on that, right? Because it’s given you, here’s the common failure modes, here’s the common tasks that go along with it. It doesn’t ever consider the operating context, which is where most of your failures result from: the context in which you operate the equipment. So once you identify the critical assets, do like Tim did and take the top five, 10%. Start doing your RCM analysis. Once you’ve got a couple of those implemented and people see that it makes changes, right, and they see that it delivers results, then you’re mature enough to start saying, well, let’s bring a failure modes library in for those noncritical assets and [inaudible: 00:22:45].
Chris Pepin (22:47): And so, the other challenge or temptation that would come along with, you know, every other fire to put out, and every other thing going on with the day-to-day operations is how do you make sure that each step is done at least well enough to move on to the next? How do you keep from skipping steps or inadvertently cutting corners, or you know, management shifts, and all the other things that can come and get in the way of that sort of process.
Doug Plucknette (23:11): Yeah. That really comes with experience. Right? I, in fact, tell people when we get started with this, right, we’re going to do a critical asset. We’re going to go through and do this RCM, it’s going to come out with a maintenance strategy. And it’s not going to deliver a result that eliminates all emergency demand jobs. And they look at me, they go, well, what are we doing this for? Well, it’s going to drastically reduce them. But to think that you can cover everything by sitting down and going through this, you’re going to miss some things, right. It’s just the way human nature is. But if you’re at a point where you’re 40, 50, 60% emergency demand work, that’s going to go away. Right?
Chris Pepin (23:54): And how do you track that and show it because especially in a reactive organization, I think it’s really important for anybody trying to do a change management process is to have really good data on here’s how things are moving, here’s how things are changing. So as we’re kind of at the mid point of these conversations, in the mid point of the basics, what’s a good way at this phase to make sure you’re reinforcing the actual progress when everything still looks like work, and frankly, things aren’t just perfect.
Doug Plucknette (24:21): Well, this is where good RCM training comes in. And a good RCM training course is going to teach, how do we measure this upfront? I want to know the performance of that asset before we start. I want to see the performance of it. If you could get me three years of data, I’d love it. Right? But typically I go, give me the three months previous data, how did that machine run? I want to see OEE, overall equipment effectiveness. And I want to see it broken into the major loss categories, and show me what the reliability of that asset was. Then we’re going to do this analysis, we’re going to implement, and we’re going to continue to track that measurement. Right? And you’re going to see that measurement change drastically, right? If we do the right job, you will see that change, and that results in money, right?
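For readers who have not calculated OEE before, the usual formulation multiplies availability, performance, and quality, and those three factors correspond to the major loss categories Doug asks to see. The Python sketch below is a minimal illustration under that standard definition; the function name, parameter names, and shift figures are made up for the example and are not data from the podcast.

```python
def oee(planned_minutes, downtime_minutes, ideal_cycle_minutes, total_count, good_count):
    """Return OEE and its three factors for one measurement period."""
    run_minutes = planned_minutes - downtime_minutes
    availability = run_minutes / planned_minutes                      # downtime losses
    performance = (ideal_cycle_minutes * total_count) / run_minutes   # speed losses
    quality = good_count / total_count                                # quality losses
    return {
        "availability": availability,
        "performance": performance,
        "quality": quality,
        "oee": availability * performance * quality,
    }


# Hypothetical baseline shift, recorded before any RCM tasks are implemented.
baseline = oee(
    planned_minutes=480,
    downtime_minutes=120,
    ideal_cycle_minutes=0.5,
    total_count=600,
    good_count=540,
)
print(baseline)
```

Recording the same figures for the months before the analysis and again after implementation gives the before-and-after comparison Doug describes.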
Tim Goshert (25:09): In Cargill, Doug, remember in the beginning, if you pick the right asset, and the reason you pick it is because you’re having huge problems with it, and you do the RCM and you implement the tasks, the outcome’s self-evident to the entire organization. I can recall in our oilseeds business that we had, what was it, a cup filler line, Doug, that supplied stuff to McDonald’s. And we were struggling with that. We did an RCM analysis and it just flipped the results, and production became a hundred percent better. That’s just a recollection of how, if you pick the right asset that you’re having huge problems with, you do the right work on it, and then you implement it, the results just shine in the entire organization.
Doug Plucknette (26:10): And that was, when you have, you know, and I remember because at that plant, I did a number of RCMs. And when you come back in and the operator says, I’ve had three Saturdays [inaudible: 00:26:24] off, right? And you go, what are you talking about? You have no idea how much overtime I had to work. Right? Because this thing wouldn’t run, and I was the most experienced person. Right? And the maintenance guys are saying the same thing. Right? I don’t have to, you know, my phone hasn’t rung in the middle of the night, you know, I’ve been able to see my kids’ Little League games. It changes not only what goes on at the plant, but it changes people’s lives. And all of a sudden, you’ve got believers, right? And you’ve got operators and maintenance people telling the other operators and maintenance people, you get a chance to sit in on one of these, do it. Right? Because it’s going to change the way your machine runs. It’s incredible. All right? So, it’s pretty self-evident. I’m big on the measures, but as Tim says, it really does become evident. And it goes across the site when people see it, they go, holy smokes.
Tim Goshert (27:23): A different problem we had there, Doug, if you recall: then they wanted to do RCM after RCM after RCM. And we had to slow them down to say, okay, we’ll do a next RCM, whichever one we want to do. But before we do any more RCMs, we’re going to implement everything. Right? We’re going to implement what we say we’re going to do. That was the next challenge, because people get excited and they want it across all their equipment. Well, sure. But you can’t, you have to do it methodically.
Doug Plucknette (27:58): And in fact, they were cherry picking at first as well. And what I call cherry picking is, they go through the list of tasks and they go, okay, here’s the ones that are most important. Let’s do those first. And then they start seeing results. And then they say, let’s move on to the next thing. I go, no, no, no, no, no. You only implemented 20% of what’s out there. You’ve got another 80% to implement before we do the next one. Right. At least 80% needs to be done. So they do. It gets exciting. It’s fun to see.
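Doug’s rule of thumb, at least 80% of the tasks implemented before the next analysis starts, amounts to a simple gate. The Python sketch below is only an illustration of that check; the function name `ready_for_next_rcm` and the task counts are hypothetical.

```python
def ready_for_next_rcm(tasks_implemented, tasks_total, threshold=0.80):
    """Return True once the implemented share of RCM tasks meets the threshold."""
    if tasks_total <= 0:
        raise ValueError("tasks_total must be positive")
    return tasks_implemented / tasks_total >= threshold


# Cherry-picking case: only the 20 "most important" of 100 tasks are done.
print(ready_for_next_rcm(20, 100))

# At least 80% done: the team can move on to the next asset.
print(ready_for_next_rcm(80, 100))
```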
Chris Pepin (28:32): Well, it sounds like we’re at the point where there’s real hope and real light in the process. This is where the changes start to become evident and things really start to work the way they should. As a final question, how does this line up with our initial conversation around all the new technologies coming out, smart machines, you know, sensors, data and everything else that’s able to be used? At what point does this integrate really well with everything else, kind of new, shiny and exciting out there?
Doug Plucknette (29:04): I have to chuckle, Chris, because, you know, I see all this stuff and I go, understand, if you’re adding 25 or 30 new [inaudible: 00:29:12] components, they all have their own failure modes. Right? And you need to assess those as well. Right? How can they fail? What causes them to fail? What can we do in terms of a strategy to either eliminate the failure mode, detect the failure mode, or reduce the consequences of the failure mode? Right? And so that’s just the first way that it lines up, right? The second is, if we go through and do failure modes analysis on all the things that those sensors are looking for, so the components [inaudible: 00:29:50], or the failure modes they’re detecting, if we can eliminate a number of those, that makes the purpose of this even more direct, right? And it helps to ensure reliability, so that we’re only seeing the failure modes that we see based on wear, or end of life, of those components, right? We’re not seeing things because we didn’t install it right, we didn’t align it right, we didn’t balance it right. All those types of things, we eliminate those failure modes.
Chris Pepin (30:26): Well, I think with that ladies and gentlemen, thanks for joining us for another conversation. My name is Chris, and this is Tim and Doug at the manufacturing talent podcast. And please join us on the next episode where we will be talking about maintenance tasks development, and the five types of maintenance tasks and the requirement for each task type. Gentlemen, thanks for joining us and to our audience, thank you for being here as well.
Chris Pepin (30:54): Well, thank you for joining Tim, Doug and myself again. We encourage you to download the white paper, which you can find at our website www.proreli.Com, and you can also find myself, Tim, and Doug on LinkedIn. We look forward to joining you on the next one.