Last weekend I participated in the Deep Agile Embedded Conference. In this article I’ll answer some of the panel questions related to concurrent hardware development. There seems to be a theme here: because hardware is involved, an embedded development team really can’t be agile. That’s not my point of view, or my experience. I am not a hardware engineer, but I have worked near them. Let’s look at some of the questions and answers.
With agile iterations only a few weeks long, do agile teams really expect to modify hardware that frequently?
It would be better to ask a hardware engineer this question, but the hardware engineers I have worked with did not get their complex designs right the first time. They would do some design, then build a prototype, then test the design. It’s an incremental process. Hardware design, as I’ve watched it, starts with a high-level block diagram, then some specific area is taken to more detail. Circuits, like code, are incrementally designed and refined based on continual learning.
I know teams that do overnight board turns. It’s expensive, so they don’t do it every night, but they have it as an option and find it a valuable capability.
Because hardware lead-times are long, don’t you have to design most of the system up-front? Isn’t this against what agile calls for?
and
In embedded systems development, hardware, software and mechanics are built concurrently. That means a BDUF is needed. Comments?
Hardware lead times are long, and granted, HW is not as easy to change as software. Big design up front might seem like the answer. If that is your answer, you might hear an agile wise guy ask, “How’s that BDUF working for you?” After the laughter dies down (admitting that there is a problem is the first step to recovery), we can get real. BDUF has problems; there is still rework; there is still incremental learning and progress.
At an early stage of development we need to understand the HW/SW boundary in concept, but maybe not in detail. The actual boundary will be fuzzy in the beginning while engineering is still learning; it solidifies over time. Design is a creative learning process no matter how much we want to be able to finish it right the first time.
The fuzzy boundary does not have to halt progress. We can make good progress on both sides of the fuzzy boundary while learning the issues. An agile team will get good at keeping options open while making concrete progress toward the product vision. If you commit too early, you can end up with significant rework, the very thing that concerns you with agile.
The waterfall approach encourages committing before there is enough information to make a good decision on all the technical issues. I am not suggesting that we digress into analysis paralysis. On the contrary: get good at keeping options open, and make concrete progress by building something and testing it. As the lean folks say: commit at the last responsible moment. It’s a judgement call, but that guidance helps us keep design options open longer and helps avoid designing ourselves into a corner.
Design, be it hardware or software, is an incremental process. Code gets written one line at a time. Boards are designed one circuit at a time. Mechanics are… well, I don’t know much about that, but it is a creative process. Don’t do too much work on a speculative basis without trying the ideas out. The feedback and course corrections will lead to a better designed product with visible progress.
Hardware is often not ready until late in the product cycle. How can testing be continuous from the beginning without hardware?
From a software perspective, developers must decouple the core application code from the hardware. This allows code to be tested both on and off the target, an approach called dual targeting. So that progress can be made early and often, code is first tested on the development system. Periodically, the code is compiled and run on the target. The target may not be available early in the development process, so dual targeting is critical for early software progress. That progress is measured in passing automated tests rather than document weight.
To pull this off, embedded software developers must manage their dependencies on the hardware and operating system. A thin layer should separate the actual hardware from the core system behaviors.
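To make that concrete, here is a minimal sketch of what such a thin layer might look like in C, using a hypothetical LED driver; the names and register layout are made up for illustration. The core code talks only to the interface, and only the Create call knows where the real hardware lives.

```c
/* led_driver.h -- a hypothetical thin interface between core code and the hardware */
#ifndef LED_DRIVER_H
#define LED_DRIVER_H

#include <stdint.h>

void LedDriver_Create(volatile uint16_t *ledRegisterAddress);
void LedDriver_TurnOn(int ledNumber);
void LedDriver_TurnOff(int ledNumber);

#endif
```

```c
/* led_driver.c -- production implementation; only this file knows the LEDs
   are a memory-mapped 16-bit register */
#include "led_driver.h"

static volatile uint16_t *ledsAddress;

void LedDriver_Create(volatile uint16_t *ledRegisterAddress)
{
    ledsAddress = ledRegisterAddress;
    *ledsAddress = 0;                          /* all LEDs off at startup */
}

void LedDriver_TurnOn(int ledNumber)
{
    *ledsAddress |= (uint16_t)(1u << (ledNumber - 1));
}

void LedDriver_TurnOff(int ledNumber)
{
    *ledsAddress &= (uint16_t)~(1u << (ledNumber - 1));
}
```

On the target, LedDriver_Create is handed the real register address; off the target, it can be handed the address of an ordinary variable, which is what makes dual targeting work.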
With the ability to run some tests on the development system and some on the target, the next thing to do is to get a continuous integration server going. There are a number of tools out there; I have found Hudson a breeze to set up. A CI server monitors your source repository and initiates a build when there are changes. First the code is automatically built and the automated tests are run on the development system. Next the CI script should build for the target and automatically deploy the test executable to the target, or to a target-compatible execution environment. If there are failures, all contributors to the build are notified so the build can be fixed immediately. Many integration problems are uncovered automatically with this approach.
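As a rough sketch of how the build stays honest, the same test runner can be compiled for both the development system and the target, and the CI job can key off its exit code. The suite names here are illustrative, not from any particular project or framework.

```c
/* all_tests.c -- one test runner, built for both the host and the target.
   The CI job treats a non-zero exit code (or a FAILED line captured from the
   target's serial port) as a broken build. */
#include <stdio.h>

/* In a real project each suite lives in its own file; stubs keep this
   sketch self-contained.  Each suite returns its failure count. */
static int LedDriverTests_Run(void) { return 0; }
static int SchedulerTests_Run(void) { return 0; }

int main(void)
{
    int failures = 0;
    failures += LedDriverTests_Run();
    failures += SchedulerTests_Run();

    if (failures == 0)
        printf("OK - all tests passed\n");
    else
        printf("FAILED - %d test(s) did not pass\n", failures);

    return failures;   /* non-zero tells the CI server to flag the build */
}
```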
The advice to abstract the hardware is nothing unique to agile. Think of your own recent history: if you had abstracted the hardware, would that major redesign have been needed when moving your software to the new hardware platform? Instead, you would have been readapting your hardware abstraction layer to the new HW. The TDD approach encourages the creation of these test seams in your code.
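Here is a rough example of the kind of test seam I mean, building on the hypothetical LED driver above: off the target, the “hardware register” is just a variable the test can inspect.

```c
/* led_driver_test.c -- host-side test that exercises the driver only through
   its interface; the memory-mapped register is replaced by a plain variable */
#include <assert.h>
#include <stdint.h>
#include "led_driver.h"

static uint16_t virtualLeds;       /* stands in for the LED hardware register */

int main(void)
{
    LedDriver_Create(&virtualLeds);
    assert(virtualLeds == 0);      /* driver starts with all LEDs off */

    LedDriver_TurnOn(1);
    assert(virtualLeds == 0x0001); /* bit 0 drives LED 1 */

    LedDriver_TurnOff(1);
    assert(virtualLeds == 0);

    return 0;
}
```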
See Progress Before Hardware for an in-depth look at this topic.
Great stuff. Thanks for sharing it!
These are good points. We have discussed some of these before with our hardware people. The problem is one of “seeing is believing” and they don’t believe enough to try to see. Meaning, they hear the above arguments, declare them fantasy and refuse to try.
Other than calling in a management edict, I have not been successful getting over such flat dismissals of these agile hardware development benefits. Do you have any suggestions on providing the hard skeptics with a positive experience?
When you discussed hardware, you twice mentioned boards but never mentioned chips. Chips (of the non-FPGA type) take two to three months to make. Before sending the design off to be made into chips, hardware and software engineers can work together using agile techniques on FPGAs, virtual prototypes, and co-simulation. However, those techniques have their limitations, preventing the full system from being tested. At some point, the chip design has to be frozen and sent off to be fabricated. During the intervening two or three months, hardware is no longer agile. After the chips come back, then the full and complete system testing can be done with hardware and software integrated.
As you pointed out, if there is a problem with the board, you can get an overnight turn for a hefty fee. But no amount of money can respin a chip in less than two months (six weeks if you’re lucky).
In order to increase the likelihood of a successful chip fabrication, many non-agile techniques must be used, such as designing for the future, putting in more features than you need right now, and adding lots of hooks to mitigate defects that will show up. And that creates a problem on the software side because now they have to provide support for it.
In addition to the agile techniques on the software side, focusing on and following design guidelines and best practices for the interface between hardware and software will give the product a better chance of a successful hardware/software integration and succeeding in the market.
Thanks for adding to the discussion.
It sounds like the self-preservation techniques, which you call non-agile, are very important. I assume it’s similar today to days gone by, where there was an emphasis on test and simulation. In the olden days of the ’80s we would spend lots of effort on test vectors for custom gate arrays. We would not dream of sending the design to fab without them. You get to toss one golden horseshoe, so you want to aim true.
So, when you commit to a design, you want to be pretty darn sure that it has a good chance of working. In the lean world they call that waiting for the last responsible moment. You also design in flexibility. It sounds like you hedge your bets by designing in alternatives and probably some programmability.
I see from your workshop that you value some of the same things agile software developers value: collaboration, practices, principles, post-mortems (we call them retrospectives). I don’t see defect prevention, automated test or continuous improvement, but I suspect they are there.
I bet there are efforts underway to reduce the time from design submission to first silicon. Is it better today than 10 years ago? How many spins of silicon are typical? What are the practices to get it right the first time? What are the lean/agile techniques for silicon design?
I don’t know these answers but I am interested.
Yes, there is still a big emphasis on verification. In fact, the verification industry is very prominent, providing many tools, experts, and conferences in that area.
Usually when I see “collaboration” mentioned it is in context of having the same types collaborating together, such as two sw engineers doing pair programming, or one sw engineer practicing continuous integration so their sw is quickly made available to other sw engineers. In my workshop, I focus on the interface between hw and sw, which means my collaboration focus is on getting hw engineers to collaborate with sw engineers. Things like co-simulations and virtual prototypes are excellent tools to get the two sides working together.
Much of what I discuss in the workshop is focused on defect prevention by correct design. I’ve dealt with many defects over the years that simply would have been avoided had some design practices been followed. This is hard at the hw/sw boundary because hw engineers do not understand sw nuances. This is why collaboration is so important.
I do talk about continuous improvement but in the silicon world, it is 12 to 18 months (maybe) before you see improvement from one version to the next. Automated testing is not an area I cover.
“First-time-right” efforts have been underway for a few years. Ten years ago, companies had the money and time to do three, four, or five respins to get the chip right. But with the business emphasis on shortening time-to-market and minimizing development costs (one respin these days can cost up to three months and a million dollars), much has been done to reduce respins. The numbers that I have seen lately are that an average chip requires 1.5 to 3 respins. That is really good considering that the leading process technology today is 45nm, roughly one sixth of the 250nm process used to make chips 10 years ago. Because of the smaller geometries, chips now come with millions of gates, making for a very complex part to design and verify.
ESL (Electronic System Level) design is big in the industry right now, where design efforts start at the system level. The hw/sw boundary is fuzzy at this time. Models are used to emulate components. Simulations help refine the models and designs. Once things are decided at the system level, detailed design can progress. This helps cut out waste from designing the wrong building blocks or putting in the wrong features (i.e., lean and agile techniques).
As far as practices to get it right the first time, several are mentioned in my newsletters.
It sounds like you are using a very similar approach to what in software we call the walking skeleton, wherein we define a very early implementation of the architecture. All the components of the architecture exist but are typically hard-coded inside each component to satisfy one or two key user stories (transactions). We drive this effort using automated acceptance tests.
Building the walking skeleton helps us focus on the interfaces early in the project and to verify that the components will be able to interact to achieve the overall goal. Then teams can go off and add capabilities (muscles) inside each of the components to handle more user stories. Since the architecture is already integrated and running, we have a lot fewer integration problems later on. We can still evolve the interfaces but these are relatively small changes in the context of the overall architecture.
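In code form, a walking skeleton might look something like the following sketch (the component names are invented for illustration): every architectural piece exists and is wired together end to end, but each body is hard-coded just enough to pass one key story, checked by an acceptance-style test.

```c
/* walking_skeleton.c -- all the components exist and are connected, but their
   insides are hard-coded to satisfy a single story: "a sensor reading makes
   it all the way to the wire". */
#include <assert.h>
#include <string.h>

static const char *SensorReader_Read(void)            { return "42"; }  /* hard-coded reading */
static const char *Protocol_Encode(const char *value) { return value; } /* pass-through for now */
static const char *Transport_Send(const char *frame)  { return frame; } /* pretend it was sent */

int main(void)
{
    /* acceptance test for the one story the skeleton supports */
    const char *onTheWire = Transport_Send(Protocol_Encode(SensorReader_Read()));
    assert(strcmp(onTheWire, "42") == 0);
    return 0;
}
```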
Does this sound similar to what you are doing in hardware?