FlightSmart: FlightSafety and IBM Launch AI-Based Pilot Training Performance Analysis and Predictive Tool
FlightSmart applies artificial intelligence and machine learning algorithms to evaluate a pilot’s ability to perform critical tasks and maneuvers during all phases of flight. Any identified deficiencies result in a remedial training action path, personalized to the pilot, to increase proficiency.
FlightSafety believes the tool’s capability to automatically predict student performance and identify corrective action is a key differentiator, beyond any other simulator data-driven product currently on the market.
The launch customer for FlightSmart is the US Air Force Air Education and Training Command (AETC) for implementation on T-6A initial and operational flight training devices.
MS&T Editor in Chief Rick Adams spoke with Matt Littrell and Bert Sawyer of FlightSafety International. Littrell is Product Director, AI and Adaptive Learning; Sawyer is Director of Strategic Management.
MS&T: Let’s start with the elevator speech.
Sawyer: What we’re ultimately trying to do with the FlightSmart product is take training to the next level in terms of taking the subjectivity out of the evaluation of student performance and bringing objectivity to that.
The simulators, of course, are giant computers in essence. And we’re able to capture a tremendous amount of data out of them. As we collect that data, we utilize the algorithms of FlightSmart to analyze and grade the performance of the student a couple of different ways.
We do it on a basic level of simply evaluating against a known standard. In most cases, that’s going to be the applicable SOPs. It could be whatever standard is appropriate for that customer.
But we’ve taken a step further. Rather than just a binary, did you pass or fail, we apply a grading scale so we can get some granularity. Did you do better than or worse than the criteria?
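As an illustration only, the difference between a binary grade and a graded scale can be sketched in Python. The 0-100 scale, the function names and the 100-foot default tolerance are assumptions made for this sketch; the interview does not describe FlightSmart's actual scoring formula.

```python
def binary_grade(max_deviation_ft, tolerance_ft=100.0):
    """Pass/fail: True if the worst deviation stays within the tolerance."""
    return abs(max_deviation_ft) <= tolerance_ft


def scaled_grade(max_deviation_ft, tolerance_ft=100.0):
    """Hypothetical 0-100 scale (an assumption, not the product's formula).

    100 means no deviation, 50 means exactly at the tolerance,
    and 0 means twice the tolerance or worse.
    """
    ratio = abs(max_deviation_ft) / tolerance_ft
    return max(0.0, 100.0 * (1.0 - ratio / 2.0))


if __name__ == "__main__":
    for deviation in (10, 99, 101, 250):
        print(deviation, binary_grade(deviation), round(scaled_grade(deviation), 1))
```

Under this sketch a 10-foot and a 99-foot deviation both pass the binary check, but the scale separates them.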
We also look at another element – evaluating that performance against a “gold standard.”
MS&T: What parameters are used to determine the gold standard of performance?
Littrell: When we first set up FlightSmart for a particular aircraft, we gather a number of data points from what we would term “best of the best” pilots. Generally those are instructors, could be line pilots. They fly a whole slew of runs, covering all the training tasks that we’re looking at. On average, in a simulator, we’re collecting about two thousand parameters. As we work through the data, we’re actually training the system to do the initial parsing of the data and then the human will intervene and throw out the “corner cases.” Ultimately, we end up with a baseline.
We could have the simulator fly a perfect maneuver, but there is no such thing as a perfect maneuver or training task. There’s no human that can fly a perfect scenario. We intentionally use humans so we could capture the human element.
MS&T: How does FlightSmart evaluate the student against the standards?
Littrell: We evaluate the performance of the student against the baseline in two different ways. One, we measure the physical closeness of the student plot to the baseline plot.
But then we also look at how smooth they were. We can look at the frequency and the amplitude of that plot line: where they were very smooth and precise, where they were over-controlling, where they were under-controlling.
Ultimately that leads us to identify what we’re calling “personas.” It could be somebody that’s timid on the controls; they’re continually late to respond with control inputs to the deviations. Or maybe they’re aggressive; they’re just manhandling that airplane constantly.
Think about the binary grading: pass or fail? Let’s say the standard is plus or minus 100 feet on altitude. They could have been bouncing off the walls between 99 feet above and 99 feet below and, per the standard, technically they passed. The standard binary grading doesn’t take into account their precision. And what we’re accomplishing with this element of the evaluation against the baseline is determining were they bouncing off the walls constantly or were they very smooth and precise?
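The two measurements described here, closeness to the baseline and smoothness, can be sketched as follows. This is a minimal illustration: root-mean-square error stands in for "closeness," and peak amplitude plus a count of sign changes stands in for "amplitude and frequency." Those metric choices, the function names and the sample altitudes are all assumptions, not FlightSmart's method.

```python
import math


def closeness_rms(student, baseline):
    """Root-mean-square distance between the student trace and the baseline."""
    diffs = [s - b for s, b in zip(student, baseline)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def oscillation_stats(student, baseline):
    """Return (peak amplitude, number of sign changes) of the deviation.

    Many sign changes with a large amplitude suggest over-controlling.
    """
    diffs = [s - b for s, b in zip(student, baseline)]
    amplitude = max(abs(d) for d in diffs)
    crossings = sum(1 for a, b in zip(diffs, diffs[1:]) if a * b < 0)
    return amplitude, crossings


# Made-up altitude samples in feet; both traces stay inside plus or minus 100.
baseline = [5000.0] * 8
bouncy = [5099, 4901, 5099, 4901, 5099, 4901, 5099, 4901]
smooth = [5020, 5018, 5015, 5012, 5010, 5008, 5005, 5003]

if __name__ == "__main__":
    for name, trace in (("bouncy", bouncy), ("smooth", smooth)):
        print(name, round(closeness_rms(trace, baseline), 1),
              oscillation_stats(trace, baseline))
```

Both sample traces would pass a plus-or-minus-100-foot binary check, yet the metrics separate the pilot who bounces between the limits from the one who holds a steady line.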
That helps feed into the overall score, but it also helps us down the line as we get to the predictive analytics and ultimately the adaptive learning so we can tailor the remedial learning. That is, this is the ideal path or method to train this task for this “persona,” somebody that’s aggressive on the controls. It either lays out the best path or provides that information to the instructor to help them understand and tailor the training appropriately to that persona and really enhance overall performance.
So we have this objective measurement of training tasks that each student would go through. We’re looking at the competencies of that student against those tasks. We have signs that the algorithms are using to decide what is the performance of the student. Is he proficient at this task or does he need some remediation?
It’s all that root cause. Evaluation of his performance comes back to some prediction of the success of that student.
MS&T: What are some of the parameters that you’re measuring to get to this best-pilot profile?
Littrell: Let’s say that we want to look at steep turns, V1 cuts and rejected takeoffs. So the (best of the best) pilots fly a whole series of those training tasks. And repetition so we can get some good data on those. And, again, throw out those corner cases. We create an average of all those.
With big data, there’s a learning element or a self-learning element to that. Over time, as we collect more data, we’ll be able to continue to refine that baseline and make it even more representative of that top pilot population.
Sawyer: We also have a shaded area that represents the 25th to the 75th percentile of that gold standard population of pilots. You can look at the student plot line, and if you’re within that shaded area you know you’re between the 25th and 75th percentile of the gold standard.
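A rough sketch of how an averaged baseline and a 25th-to-75th percentile band could be computed from a set of expert runs is given here. It assumes the runs are already time-aligned and that the corner cases have already been removed; the function names and the sample numbers are hypothetical, and the real pipeline is not described in the interview.

```python
from statistics import mean, quantiles


def build_baseline(expert_runs):
    """Average expert runs sample by sample and compute a quartile band.

    expert_runs: list of equal-length lists, one per expert run,
    assumed to be time-aligned and already cleaned of outliers.
    Returns (baseline, lower, upper) as three lists.
    """
    baseline, lower, upper = [], [], []
    for samples in zip(*expert_runs):
        q1, _median, q3 = quantiles(samples, n=4, method="inclusive")
        baseline.append(mean(samples))
        lower.append(q1)
        upper.append(q3)
    return baseline, lower, upper


def fraction_inside_band(student, lower, upper):
    """Share of student samples that fall inside the shaded band."""
    inside = sum(1 for s, lo, hi in zip(student, lower, upper) if lo <= s <= hi)
    return inside / len(student)


if __name__ == "__main__":
    runs = [[0, 10], [2, 12], [4, 14], [6, 16], [8, 18]]
    baseline, lower, upper = build_baseline(runs)
    print(baseline, lower, upper)
    print(fraction_inside_band([3, 17], lower, upper))
```

As more expert runs are collected, rerunning `build_baseline` on the larger set is one simple way the baseline could be refined over time.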
MS&T: Explain, please, the difference between the diagnostic and predictive analysis.
Sawyer: The diagnostic is the “why did you deviate?” We may be focused primarily on seven parameters identified by the SMEs on the aircraft as being the most important ones to focus on. But rather than limiting our vision to just those seven, once we see that there has been a deviation, the machine learning will then go look at all two thousand of those parameters and, in essence, create an error chain to determine what ultimately led to you deviating from that standard.
So let’s say you deviated on your altitude. It may recognize that two minutes and 36 seconds ago you bumped the rudder pedal, which destabilized you, which led to this, this and this and ultimately led to that deviation. And it can really identify causal factors that the instructor may not be able to get to on their own or, if they ever do, it would take a fairly significant period of time for them to actually get there. And that’s one of the tremendous values – giving the instructors insights into those things that they would not necessarily otherwise have visibility into, things they either physically can’t see in the simulator or cannot connect well enough to determine that they are causal.
Once we’ve figured out that root cause, why they did it, then we can absolutely provide remediation. It may be as simple as reduce the amount of control input in this particular situation, and you’ll do better.
Take something as simple as a glide slope. You have a tolerance of plus/minus 100 feet, but a student is outside of tolerance. It doesn’t say why he did that. (The instructor) doesn’t know that the guy was on the throttles, that he was putting too much force on the stick. And that’s where the machine learning tells the instructor, this is the reason the guy got into that situation. This is why he was outside of the tolerance.
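To make the idea of an error chain concrete, here is a deliberately naive look-back in Python. It only flags parameters that left an assumed normal range before the deviation and orders them by time; it does not establish causality and it is not FlightSmart's machine learning. The parameter names, the z-score threshold and the data layout are assumptions.

```python
def find_precursors(traces, norms, deviation_index, z_threshold=3.0):
    """List parameters that went abnormal before the main deviation.

    traces: {parameter name: list of samples}
    norms:  {parameter name: (mean, standard deviation)} from baseline data
    Returns [(first abnormal sample index, parameter name)], earliest first.
    """
    events = []
    for name, samples in traces.items():
        mu, sigma = norms[name]
        for i, value in enumerate(samples[:deviation_index]):
            if sigma > 0 and abs(value - mu) / sigma > z_threshold:
                events.append((i, name))
                break  # only the first excursion of each parameter is kept
    return sorted(events)


if __name__ == "__main__":
    traces = {
        "rudder_pedal": [0, 0, 5, 0, 0, 0],
        "pitch": [2, 2, 2, 4, 2, 2],
        "throttle": [50, 50, 50, 50, 50, 50],
    }
    norms = {"rudder_pedal": (0, 1), "pitch": (2, 0.5), "throttle": (50, 2)}
    print(find_precursors(traces, norms, deviation_index=5))
```

In this made-up data the rudder pedal excursion precedes the pitch excursion, which is the kind of ordered list an instructor could review as a candidate chain of events.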
So with all those connections, based on historical data, you should go spend 25 minutes on the flat-panel trainer practicing this specific task and you need to go read these two paragraphs in your training manual and X, Y or Z, because, based on the data, we can prove that students that do that will have a high probability of being successful on this training task based on your specific situation.
It really takes the art out of it and applies science to figuring out how to improve their performance.
The vision of FlightSmart is connecting the entire training ecosystem, whether it’s the simulator or the aircraft or mixed reality or computer-based training. Whatever it is, any type of training the students are doing, we’re connecting together.
MS&T: Do you see the potential to analyze whether a pilot is competent in a certain training task or training requirement?
Littrell: That’s a challenging one that I think everybody would love to figure out. I firmly believe that we will be able to get there. It’s a matter of finding the right combination of technology and algorithms to figure out how to do it. But the fact is, those are the soft skills, not something that there’s data directly tied to, and that makes it a lot more challenging to do.
Sawyer: In one example, during rejected takeoff training, FlightSmart showed the instructor that the co-pilot was compensating for the pilot on the controls. The instructor couldn’t tell that; he couldn’t see that from where he sat. That’s where I believe we’re helping these instructors see more about the competency of pilots than what we have in the past.
MS&T: With FlightSmart, how does the instructor’s role change?
Littrell: Whenever I talk to instructors, the very first question is always, “Are you trying to replace me?” And the answer is no. They’re already great at what they do. Our goal is to provide them with additional tools that make them even better at what they do and more efficient.
From the beginning, our goal has been that this cannot significantly increase the burden on the instructor. One way we do that is the automatic identification of training tasks, rather than the instructor sitting in the simulator having to select, okay, now I’m going to have them do a steep turn. The algorithms automatically identify those within the data.
The impact on the instructors, how we see their lives changing, is in reviewing the results on the dashboard: seeing the insights that are generated and the recommended remediation, taking that knowledge, combining it with their own knowledge and experience, and being able to more quickly and more effectively communicate to the student how to improve their performance.
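One simple way to picture automatic task identification is a rule that scans recorded bank angle for sustained steep-bank segments. The sketch below is only that, a threshold rule; the 40-degree threshold, the minimum duration and the function name are assumptions, and the interview does not say how FlightSmart's algorithms actually recognize a task.

```python
def find_steep_turns(bank_deg, min_bank=40.0, min_samples=3):
    """Return (start, end) sample indices of sustained steep-bank segments.

    A segment counts only if the absolute bank angle stays at or above
    min_bank for at least min_samples consecutive samples.
    """
    segments = []
    start = None
    for i, bank in enumerate(bank_deg):
        if abs(bank) >= min_bank:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_samples:
                segments.append((start, i - 1))
            start = None
    # Close a segment that is still open at the end of the recording.
    if start is not None and len(bank_deg) - start >= min_samples:
        segments.append((start, len(bank_deg) - 1))
    return segments


if __name__ == "__main__":
    bank = [0, 10, 45, 46, 44, 45, 10, 0, -45, -45, 2, -45, -46, -45]
    print(find_steep_turns(bank))
```

With this made-up trace, the brief two-sample bank to the left is ignored, while the sustained right and left turns are picked out without the instructor having to select anything.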
MS&T: So we’re not talking about it being a robot instructor.
Littrell: No, not noticeably. We’ve had requests from a customer to integrate a coach, especially in freeplay type environments. If a student gets into an FTD and is practicing on their own without an instructor, we are looking at adding functionality to serve as, in essence, a robotic instructor to help coach them through that training. But in a live instructor training session, it’s not designed to replace that instructor – it’s designed to provide additional information to the instructor.
You need more pilots through the system faster and with the pilot shortage that means we have an instructor shortage. That’s within the DoD and in any commercial area. This tool is there to help supplement the instructor. It makes them a learning manager where they can manage more.
Sawyer: One of the reasons we picked the DoD is they have better control over the regulatory requirements and their syllabus. And they can decide how fast the student progresses in the system.
MS&T: What’s the status of the T-6A launch program?
Littrell: We’re in the implementation phase right now, which includes modifying the simulators. Those are 20-plus-year-old simulators. So there’s a little bit of work we have to do for them to enable the data capture faster.
This spring, we will have the training developed. We’ll have an initial implementation on the base with the instructors utilizing the tool in their training. Late spring, we’ll be getting some really good insights into the improvements that it’s making to their training.
MS&T: How about on the commercial side? Do you anticipate a customer launch there?
Sawyer: Internally, we have this installed on several of our devices and we’ve done some test cases using our current customer base and our instructors. We’re looking at instructor utilization, aircraft and platform types, and putting a strategy together on how we’re going to roll this out.
I would expect that by next year we’ll have this in learning centers and available, but it’s going to be based on whatever the customer base is willing to sign up for.
We’ve also had some very good conversations with, we’ll say, a large university that has tremendous interest in what we’re doing. Their vision is to go full-on adaptive learning and FlightSmart does a wonderful job of creating an environment that allows for that and provides the tools that are necessary to accomplish it. It’s a robust market opportunity with primary training. Even with the skill level of the students, there’s a lot of opportunity to use that data to shape their training experience and tailor it to their individual needs.
MS&T: How would you compare this with some of the other data-related products that are on the market or are announced as coming to market?
Sawyer: I would say one of the key differentiators is the automatic identification of the training tasks (to be remediated). I’ve not seen any others on the market that have that capability. That’s key to reducing the burden on the instructors. They have enough on their plates already.
Another is the use of the machine learning / AI and really getting into the diagnostic and predictive capabilities. Some of (the competitor products) that I’ve seen, all they provide is an evaluation of how the student performed against the standard; that’s the extent of how far it goes. It doesn’t give any detail on why, that error chain or the root cause analysis as to exactly why they deviated. And from there, what specifically should they do in the future to ensure success? Everything that I’ve seen on the market stops there: you deviated from your altitude, and then it’s up to the instructor to apply their traditional instructional techniques to deduce why that was and what they need to do differently in the future.
I would add to that, a lot of this is scalability. When you look at how we’ve implemented these algorithms, this is capable of scaling from platform to platform.
MS&T: What does IBM bring to the team on this?
Littrell: They have 300 and some odd very high-level data scientists available in a pool that we can drive as needed. They have tremendous amounts of experience in adaptive learning in several different industries. (They are) one of the pioneers on data privacy, (healthcare) HIPAA and so forth. So they bring a very well-rounded experience of capability to us to help augment where we may not be as strong or they help us see outside the box. They bring in that perspective from other industries, which has been very helpful, different ways to look at the data or manipulate it to get to where we’re going.
Sawyer: We wanted to be able to scale this, and we can lean back into IBM and rapidly build these algorithms as we go from aircraft to aircraft. They give us the ability to scale, which is really very important as we roll this out into the training ecosystem.
We did a lot of analysis, as we were going to launch with the Department of Defense being the first customer. IBM is ranked third with the DoD in regard to “validated algorithms.” The definition is that it requires no human interaction from the decision tree to happen and for changes to be imposed, which is huge because that’s just what we’re doing. We want the system to be able to provide prescriptive remediation and then monitor the student throughout.
MS&T: When collecting all of this data from aircraft and simulators and so forth, sometimes the pilots or the unions get uptight. What sort of privacy controls are built into the system?
Littrell: That’s definitely been a challenge on the commercial side, with the pilot unions especially. Thankfully, they’re already at least somewhat accustomed to it with FOQA where they’ve been collecting local data for 20-plus years.
With the Air Force, they’re not so much concerned about the data privacy and they’ve requested that we identify the data all the way down to the student or the individual level, which really is the ideal level, because now we can get a history for you as the pilot and really provide the predictive abilities and the tailored training based on your history.
Other customers don’t feel comfortable due to privacy concerns, especially when it comes to the unions. In some cases, we will identify solely to the customer level. We know that it is a pilot that flies for XY airline, but we don’t know that specific individual’s name, just a pilot with that airline.
And the final level is it’s completely anonymous; it’s pretty limited what you can do with that data.
This goes back to IBM’s experience with data privacy; they’ve provided a lot of assistance to help with our knowledge of data privacy, on how best to structure it to protect that information. And ultimately, at the end of the day, the identified data resides in the possession of the customer.
The only data that is transmitted to us outside of a development effort is purely de-identified. It’s what’s called “pseudo anonymized.” There’s a random ID that’s attached to it; we don’t have the decoder ring to identify exactly who that is. There are a number of different tools that we employ to tailor that privacy.
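The "random ID without the decoder ring" arrangement can be illustrated with a small sketch in which the lookup table never leaves the customer. The class name, the record fields and the use of Python's `secrets` module are assumptions made for the example; the actual tooling is not described in the interview.

```python
import secrets


class Pseudonymizer:
    """Customer-side helper: swaps pilot names for random IDs.

    The name-to-ID table (the "decoder ring") lives only in this object,
    which in this sketch is assumed to stay with the customer.
    """

    def __init__(self):
        self._ids = {}

    def pseudonym(self, pilot_name):
        """Return a stable random ID for this pilot, creating it if needed."""
        if pilot_name not in self._ids:
            self._ids[pilot_name] = secrets.token_hex(8)
        return self._ids[pilot_name]

    def export(self, record):
        """Return a copy of the record that is safe to transmit."""
        shared = dict(record)
        shared["pilot"] = self.pseudonym(shared.pop("pilot_name"))
        return shared


if __name__ == "__main__":
    customer_side = Pseudonymizer()
    record = {"pilot_name": "Pat Example", "task": "steep turn", "score": 82}
    print(customer_side.export(record))
```

Because the same pilot always maps to the same random ID, a training history can still be built per ID, while the recipient of the exported records cannot tell who the ID belongs to.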
MS&T: Would you also anticipate aggregating some of the data, say, for a customer or an aircraft type, so that you could tell this customer that your group of pilots that go through our training have these tendencies?
Littrell: Absolutely, there’s tremendous value at that level. If we implement it at an airline, they do their own training. But really, this needs standardization. We can provide analytics or metrics on performance of that population either at an airplane level or at a total population level. You know where they are struggling and where they excel. The population is doing very well in this area, but not so well over here. Maybe we can fine tune the training program to focus less on this area and focus more on that area.
Also, the airplanes are continually evolving either through service bulletins or avionics upgrades and so forth. Historically, the industry’s not done the best job of validating training that they developed to support those changes. We can do pre- and post-implementation comparisons of student performance relating to those areas to help identify the best training that was developed to support this change.
You can look at instructor standardization: how well the students under this instructor are performing compared to that instructor, ensuring that everybody’s on a level playing field. Say the students with this instructor are historically not doing as well as they are with these other instructors. What remediation do we provide that instructor to get them back up to the level of everybody else?
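A bare-bones version of that instructor comparison might look like the following. It simply averages student scores per instructor and flags anyone whose average trails the overall mean by more than a chosen margin; the score scale, the five-point margin and the function names are assumptions, and a real analysis would need to account for sample size and student mix.

```python
from collections import defaultdict
from statistics import mean


def instructor_averages(results):
    """results: iterable of (instructor, student score) pairs."""
    by_instructor = defaultdict(list)
    for instructor, score in results:
        by_instructor[instructor].append(score)
    return {name: mean(scores) for name, scores in by_instructor.items()}


def below_peers(results, margin=5.0):
    """Instructors whose average trails the overall mean by more than margin."""
    results = list(results)
    averages = instructor_averages(results)
    overall = mean(score for _, score in results)
    return sorted(name for name, avg in averages.items() if avg < overall - margin)


if __name__ == "__main__":
    results = [("A", 90), ("A", 80), ("B", 85), ("B", 87), ("C", 60), ("C", 70)]
    print(instructor_averages(results))
    print(below_peers(results))
```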