Boeing has long embraced the power of redundancy to protect its jets and their passengers from a range of potential disruptions, from electrical faults to lightning strikes.
The company typically uses two or even three separate components as fail-safes for crucial tasks to reduce the possibility of a disastrous failure. Its most advanced planes, for instance, have three flight computers that function independently, with each computer containing three different processors manufactured by different companies.
So even some of the people who have worked on Boeing’s new 737 MAX airplane were baffled to learn that the company had designed an automated safety system that abandoned the principles of component redundancy, ultimately entrusting the automated decision-making to just one sensor — a type of sensor that was known to fail. Boeing’s rival, Airbus, has typically depended on three such sensors.
“A single point of failure is an absolute no-no,” said one former Boeing engineer who worked on the MAX, who requested anonymity to speak frankly about the program in an interview with The Seattle Times. “That is just a huge system engineering oversight. To just have missed it, I can’t imagine how.”
Boeing’s design made the flight crew the fail-safe backup to the safety system known as the Maneuvering Characteristics Augmentation System, or MCAS.
The Times has interviewed eight people in recent days who were involved in developing the MAX, which remains grounded around the globe in the wake of two crashes that killed a total of 346 people.
A faulty reading from an angle-of-attack (AOA) sensor, which is used to assess whether the plane is angled up so much that it is at risk of stalling, is now suspected in the October crash of a 737 MAX in Indonesia, with data suggesting that MCAS pushed the aircraft’s nose toward Earth to avoid a stall that wasn’t happening. Investigators have said another crash in Ethiopia this month has parallels to the first.
Boeing has been working to rejigger its MAX software in recent months, and that includes a plan to have MCAS consider input from both of the plane’s angle-of-attack sensors, according to officials familiar with the new design.
“Our proposed software update incorporates additional limits and safeguards to the system and reduces crew workload,” Boeing said in a statement.
But one problem with two-point redundancies is that if one sensor goes haywire, the plane may not be able to automatically determine which of the two readings is correct, so Boeing has indicated that the MCAS safety system will not function when the sensors record substantial disagreement.
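The two-sensor trade-off can be sketched in a few lines of Python. This is an illustration only, not Boeing's software: the function name `mcas_may_engage` and the 5-degree threshold for "substantial disagreement" are assumptions made for the example.

```python
# Illustrative sketch only; the name and threshold are hypothetical, not Boeing's.
DISAGREE_LIMIT_DEG = 5.0  # assumed threshold for "substantial disagreement"


def mcas_may_engage(left_aoa_deg: float, right_aoa_deg: float) -> bool:
    """Return True only when the two angle-of-attack readings agree.

    With two sensors, a large gap shows that one reading is wrong but not
    which one, so the only safe automatic response is to inhibit the system.
    """
    return abs(left_aoa_deg - right_aoa_deg) <= DISAGREE_LIMIT_DEG


print(mcas_may_engage(4.0, 5.0))   # True: readings agree
print(mcas_may_engage(4.0, 24.0))  # False: one reading is bad, but which one?
```

The check can detect a fault but cannot isolate it, which is why the fallback in this sketch is to switch the automation off rather than to pick a side.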
Some observers, including the former Boeing engineer, think the safest option would be for Boeing to have a third sensor to help ferret out an erroneous reading, much like the three-sensor systems on the airplanes at rival Airbus. Adding that option, however, could require a physical retrofit of the MAX.
Andrew Kornecki, a former professor at Embry-Riddle Aeronautical University who has studied redundancy systems in Airbus and Boeing planes, said operating the automated system with one or two sensors would be fine if all the pilots were sufficiently trained in how to assess and handle the plane in the event of a problem. But, he said, if he were designing the system from scratch, he would emphasize the training while also building the plane with three sensors.
“As they say: belt and suspenders,” Kornecki said.
Boeing had been exploring the construction of an all-new airplane earlier this decade. But after American Airlines began discussing orders for a new plane from Airbus in 2011, Boeing abruptly changed course, settling on the faster alternative of modifying its popular 737 into a new MAX model.
Rick Ludtke, a former Boeing engineer who worked on designing the interfaces on the MAX’s flight deck, said managers mandated that any differences from the previous 737 had to be small enough that they wouldn’t trigger the need for pilots to undergo new simulator training.
That left the team working on an old architecture and layers of different design philosophies that had piled on over the years, all to serve an international pilot community that was increasingly expecting automation.
“It’s become such a kludge, that we started to speculate and wonder whether it was safe to do the MAX,” Ludtke said.
Ludtke didn’t work directly on the MCAS, but he worked with those who did. He said that if the group had built the MCAS to depend on two sensors and to shut itself off if one failed, he thinks the company would have needed to install an alert in the cockpit to make the pilots aware that the safety system was off.
And if that had happened, Ludtke said, the pilots would potentially have needed training on the new alert and the underlying system. That could have meant simulator time, which was off the table.
“The decision path they made with MCAS is probably the wrong one,” Ludtke said. “It shows how the airplane is a bridge too far.”
Boeing said Tuesday that the company’s internal analysis determined that relying on a single source of data was acceptable and in line with industry standards because pilots would have the ability to counteract an erroneous input.
In addition to the imminent software fix for the MCAS, people familiar with Boeing’s plans said the company now intends to make standard two features that previously were optional add-ons at extra cost.
The MAX cockpit will now include a warning light that will illuminate when the two angle-of-attack sensors disagree. And airlines can opt to add, free of charge, angle-of-attack data to the primary flight display.
The company has also started holding information sessions with airlines and regulators about the proposed software fix.
Even before the MCAS system activated during the 737 MAX crash of Lion Air Flight 610 last October, the flight’s data showed signs of a problem.
While the pilots didn’t know it, the plane’s two angle-of-attack sensors were recording substantial disagreements even before takeoff. Once the plane left the ground, the pilots immediately got warnings about the airspeed and the risk of a stall.
The pilots managed to ascend because the MCAS system isn’t designed to operate until pilots retract the flaps used on takeoff. It also doesn’t operate in autopilot mode.
Once the Lion Air pilots retracted the flaps at an altitude of 5,000 feet, however, the MCAS interpreted the erroneous angle-of-attack information and automatically swiveled the airplane’s horizontal tail so as to push the jet’s nose sharply down.
Although the pilot countered the horizontal tail movement and brought the nose back up, the faulty signal continued, so the MCAS pushed it back down again repeatedly during a harrowing 12-minute roller-coaster ride. The pilot lost control and the plane plunged into the sea, killing 189 people.
Initial reports on the crash this month of another 737 MAX, Ethiopian Airlines Flight 302, suggest that its shorter flight followed a similar trajectory. And a part found in the wreckage, the jackscrew, shows that on impact, the horizontal tail was swiveled so as to point the nose sharply down. These clues mean the MCAS is suspected in that tragedy also.
Matt Menza, a former Boeing pilot who worked on the MAX, said that during flight testing of planes ready for delivery, he wasn’t aware of any events that indicated a problem with the stall warning or the MCAS system. But he said an ideal system would have been built on two angle-of-attack probes, so that a single bad value wouldn’t cause problems.
Menza and two other pilots who have worked on the MAX said they were unaware that the system used only one AOA probe.
Still, Menza pointed out that handling uncommanded inputs from the MCAS would be the same as past procedures for any similar problems, with pilots able to easily flip cutout switches to regain manual control.
“A properly trained pilot should be able to solve an MCAS anomaly or any uncommanded flight-control input through procedures that are taught to all 737 pilots,” said Menza, noting that the emergency information Boeing distributed in December reiterated those procedures.
Boeing has contended since the Lion Air crash that the pilots, even though they’d been told nothing about the MCAS, should still have realized that the nose was turning down because of uncommanded movement of the horizontal tail. A large wheel beside the pilot is connected to the tail and would have spun each time the horizontal tail moved.
Boeing told The Times Tuesday that the company’s internal analysis determined that a pilot would be able to counteract an erroneous command by using trim switches on the control column, or by following the standard checklist to use cutoff switches that would have turned off all automatic movement of the horizontal tail.
When the preliminary investigation report into the Lion Air crash was published a month after the accident, Boeing issued a long statement that emphasized this perspective and pointed out that on the day before the crash, a different flight crew on that same jet had encountered similar behavior and had hit the cutoff switches, which allowed the flight to continue uneventfully.
On that earlier flight, Bloomberg reported, a third pilot who happened to be on board helped the two pilots figure out that they needed to trip the cutoff switches. So while three heads troubleshooting the problem managed to work out the correct response, it appears that on the subsequent flight, the two pilots were overwhelmed.
Two or three sensors
Peter Seiler, a professor at the University of Minnesota who previously worked on the flight-control electronics for the Boeing 787 aircraft, said it would be highly unusual to have a safety-critical system dependent on a single sensor.
“It’s a huge part of the design. It’s a huge part of the certification process,” Seiler said.
But Seiler said he thinks the MAX would be fine if the MCAS depended on two angle-of-attack sensors. If they disagree with each other substantially, the plane will know that one is malfunctioning and can then prevent the MCAS from engaging. Seiler said pilots can be made aware when that happens.
Since it would be an unusual circumstance, Seiler said he thinks it would be fine for the pilots to continue flying for a few hours without the MCAS safety protection activated before getting the sensor fixed when back on the ground.
“The only issue you then get is if the system failed and the pilot is confused,” Seiler said. “You don’t want to operate the airplane all the time that way.”
The sensors don’t fail often, but FAA records reviewed by The Times show it’s happened on a wide variety of aircraft from Boeing and other manufacturers, including a 2009 flight of a 737 out of Dallas-Fort Worth and a 2013 flight of a 747.
In a 2014 flight of a 767 out of Miami, FAA records show, the flight crew reported a disagreement in airspeed readings after takeoff. An emergency was declared, the plane returned to the airport, and the left angle-of-attack sensor was replaced.
Angle-of-attack sensors have been around since the 1940s. On its airplanes, including the 737, Boeing uses two angle-of-attack sensors, one on either side of the plane, with each sensor feeding the angle measurement to instruments on the corresponding side of the cockpit.
If the sensor on one side indicates the plane is at a stall angle, it will trigger warnings for the pilot on that side of the plane, including a “stick shaker.” That pilot’s control column will begin to shake.
If only one side gives such warnings, the cockpit crew has the responsibility of assessing the overall circumstances of the flight to determine which side is correct and proceed accordingly.
In contrast, three angle-of-attack sensors are the norm on airplanes from Airbus, which also uses them to automatically move the plane’s nose down in the event of a stall.
Researchers who have studied Airbus’ system have said it considers all three sensor readings and generally relies on the middle of the three. If one of the sensors drifts far out of range, that sensor is ignored, and the flight-control system continues using an average of the two remaining sensors.
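A rough Python sketch of that kind of three-sensor voting follows. It is based only on the outline the researchers gave, not on Airbus code: the function name `voted_aoa` and the 5-degree rejection limit are assumptions made for illustration.

```python
from statistics import median

# Illustrative sketch only; the limit and function name are hypothetical.
OUTLIER_LIMIT_DEG = 5.0  # assumed distance from the middle value that marks a sensor as bad


def voted_aoa(readings_deg: list[float]) -> float:
    """Combine three angle-of-attack readings into one value."""
    if len(readings_deg) != 3:
        raise ValueError("expected exactly three readings")
    mid = median(readings_deg)
    # Keep only the sensors that sit close to the middle value.
    trusted = [r for r in readings_deg if abs(r - mid) <= OUTLIER_LIMIT_DEG]
    if len(trusted) == 3:
        return mid                      # all three healthy: use the middle reading
    return sum(trusted) / len(trusted)  # at least one rejected: average the rest


print(voted_aoa([3.0, 4.0, 6.0]))   # 4.0 (middle of three healthy sensors)
print(voted_aoa([3.0, 4.0, 30.0]))  # 3.5 (30.0 ignored, remaining two averaged)
```

Unlike a two-sensor comparison, the third reading lets the logic identify which sensor has drifted and keep operating on the other two.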
That triple-sensor system isn’t foolproof, however.
In 2008, on a customer-acceptance flight of an Airbus A320, two of the angle-of-attack sensors froze and those two sensors then outvoted the third. When the pilots went to demonstrate the stall-prevention system, they were not aware of the malfunctioning sensors. The plane crashed, killing the seven people on board.
The same problem arose again on a 2014 Airbus A321 Lufthansa flight leaving Spain. Eight minutes after takeoff, two of the angle-of-attack sensors froze at the same pitch. This time, after a drop in altitude, the pilots were able to regain control and complete the flight.
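The weakness in both incidents can be shown with a short Python sketch. It assumes simple middle-value voting as a stand-in for the real flight-control logic, and the readings are invented for illustration: two sensors are frozen at the same angle while the third tracks the true angle as the nose rises.

```python
from statistics import median


def middle_value(a: float, b: float, c: float) -> float:
    """Middle-value vote across three readings (hypothetical stand-in logic)."""
    return median([a, b, c])


FROZEN_DEG = 4.0  # invented value at which two sensors are stuck

for true_aoa_deg in (4.0, 8.0, 12.0, 16.0):
    voted = middle_value(FROZEN_DEG, FROZEN_DEG, true_aoa_deg)
    print(f"true={true_aoa_deg:4.1f}  voted={voted:4.1f}")
```

However high the true angle climbs, the voted value stays at the frozen reading, because two sensors that fail in the same way look to the voter like a healthy majority.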