ADS-B Workflow

This notebook outlines an example workflow and quality assurance/quality control steps for working with ADS-B flight trajectories.

Specifically:

  1. Fetching of data from the ADS-B telemetry endpoint

  2. Overview of ADS-B data fields and anatomy

  3. Extraction of individual flight instances

  4. Validation & QA/QC of flight instances

NOTE: A flight instance refers to the collection of waypoints representing a single flight trajectory (one instance of takeoff to landing).

Prerequisites

Familiarize yourself with the Contrails API ADS-B Telemetry endpoint.

Step 1a: Fetch two days of global ADS-B data (METHOD 1 - synchronous)

The following approach iteratively (synchronously) fetches 1-hour chunks of global ADS-B data from the API and saves those data to parquet files on local disk.

This approach takes longer to run, but it is easier to debug and less prone to memory or Python async-management issues.

[ ]:
import os

import pandas as pd  # pip install pandas
import requests  # pip install requests

URL = "https://api.contrails.org"
API_KEY = os.environ["CONTRAILS_API_KEY"]
HEADERS = {"x-api-key": API_KEY}

times = pd.date_range("2025-02-01T00", "2025-02-02T23", freq="1h", inclusive="both", tz="UTC")

# iterate over datetime_range and fetch ADS-B data, saving it to local disk
destination_dir = "adsb_method1"
os.makedirs(destination_dir, exist_ok=True)

for time in times:
    params = {"date": time.strftime("%Y-%m-%dT%H")}

    r = requests.get(f"{URL}/v1/adsb/telemetry", params=params, headers=HEADERS, timeout=120)
    r.raise_for_status()

    # write out response content as parquet file
    with open(f"{destination_dir}/{params['date']}.pq", "wb") as f:
        f.write(r.content)

Step 1b: Fetch two days of global ADS-B data (METHOD 2 - asynchronous)

The following approach concurrently (asynchronously) fetches 1-hour chunks of global ADS-B data from the API and saves those data to parquet files on local disk.

This approach runs faster, but it requires an understanding of memory management and Python's asynchronous programming model.

[ ]:
import asyncio
import os

import httpx  # pip install httpx
import pandas as pd  # pip install pandas

URL = "https://api.contrails.org"
API_KEY = os.environ["CONTRAILS_API_KEY"]
HEADERS = {"x-api-key": API_KEY}

times = pd.date_range("2025-02-01T00", "2025-02-02T23", freq="1h", inclusive="both", tz="UTC")

# iterate over datetime_range and fetch ADS-B data, saving it to local disk
destination_dir = "adsb_method2"
os.makedirs(destination_dir, exist_ok=True)


async def fetch_target_hour(
    semaphore: asyncio.locks.Semaphore, time: pd.Timestamp, destination_directory: str
) -> None:
    """Call the telemetry endpoint for a single time, and save the parquet file to disk."""

    params = {"date": time.strftime("%Y-%m-%dT%H")}
    async with semaphore, httpx.AsyncClient() as client:
        r = await client.get(
            f"{URL}/v1/adsb/telemetry",
            params=params,
            headers=HEADERS,
            timeout=120,
        )
        r.raise_for_status()

        # write out response content as parquet file
        with open(f"{destination_directory}/{params['date']}.pq", "wb") as f:
            f.write(r.content)


async def run_routines(semaphore: asyncio.locks.Semaphore):
    """Run the fetch_target_hour() function for each time in the times list."""
    routines = [fetch_target_hour(semaphore, time, destination_dir) for time in times]
    await asyncio.gather(*routines)


# limit the number of concurrent tasks running at any given time
max_concurrent_tasks = 6
sem_lock = asyncio.Semaphore(max_concurrent_tasks)
await run_routines(sem_lock)

Step 2 - Load data

Load the parquet files (fetched from Step 1) into a pandas dataframe.

[1]:
import pandas as pd

destination_dir = "adsb_method2"

df = pd.read_parquet(destination_dir)  # load all pq files present in the directory
df
[1]:
timestamp latitude longitude collection_type altitude_baro altitude_gnss icao_address flight_id callsign tail_number flight_number aircraft_type_icao airline_iata departure_airport_icao departure_scheduled_time arrival_airport_icao arrival_scheduled_time
0 2025-02-01 00:59:59 25.676445 54.505299 terrestrial 36975 NaN 0101DE 09fbb48d-6678-4bc8-9af2-f9cc39f5a92a MSR951 SU-GEU MS951 B789 MS HECA 2025-01-31 22:20:00 ZSPD 2025-02-01 08:50:00
1 2025-02-01 00:59:59 29.590485 32.469460 terrestrial 28000 NaN 0100DB deb06dee-361d-4f55-8240-825c0dcec2fa MSR931 SU-GCM MS931 B738 MS OOMS 2025-01-31 20:10:00 HECA 2025-02-01 00:40:00
2 2025-02-01 00:59:59 35.852440 -119.786217 terrestrial 36000 NaN 0D07A8 5c0a0363-e3e5-4ceb-8ded-49be4daac430 VOI7712 XA-VOY Y47712 A320 Y4 MMLO 2025-01-31 21:38:00 KOAK 2025-02-01 01:56:00
3 2025-02-01 00:59:59 41.100597 -78.485016 terrestrial 37000 NaN 0C6061 dbd65885-f2a4-469e-b82a-c8acace589de BWA600 9Y-TTO BW600 B38M BW TTPP 2025-01-31 19:55:00 CYYZ 2025-02-01 02:00:00
4 2025-02-01 00:59:59 8.320747 -79.484901 terrestrial 11175 NaN 0C21A3 1a0404bc-26cf-409d-ab3f-e1456391e828 CMP132 HP-9801CMP CM132 B38M CM SPJC 2025-01-31 21:40:00 MPTO 2025-02-01 01:20:00
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
52280762 2025-02-02 23:00:00 35.741821 -79.977608 terrestrial 27175 NaN A8AE7F 0e33f482-7857-44ec-8f07-d7cd552e0be6 DAL2855 N659DL DL2855 B752 DL KRDU 2025-02-02 22:35:00 KATL 2025-02-03 00:12:00
52280763 2025-02-02 23:00:00 32.650223 -95.647522 terrestrial 23000 NaN AA01DE 309893aa-4117-4898-848e-cab88a433703 SKW6486 N744EV AA6486 CRJ7 OO KDFW 2025-02-02 22:35:00 KAEX 2025-02-02 23:50:00
52280764 2025-02-02 23:00:00 33.047882 -96.192169 terrestrial 19925 NaN A7E9E7 682f72f7-4258-43f8-ac5c-a43f9b0555d1 JIA5605 N609NN AA5605 CRJ9 OH KDFW 2025-02-02 22:35:00 KLIT 2025-02-02 23:51:00
52280765 2025-02-02 23:00:00 40.460953 -111.951111 terrestrial 14125 NaN A32C1E c0098d49-b0c8-46d4-9042-2e3041f013fb SKW3737 N303SY DL3737 E75L OO KSLC 2025-02-02 22:35:00 KPSP 2025-02-03 00:29:00
52280766 2025-02-02 23:00:00 29.902313 -94.515434 terrestrial 17525 NaN A0FE7A 8cb8df5d-26ea-41d4-82e0-cc5d2376e9f6 SKW5356 N163SY UA5356 E75L OO KIAH 2025-02-02 22:35:00 KPNS 2025-02-03 00:18:00

52280767 rows × 17 columns

Discussion - Anatomy of an ADS-B waypoint

The ADS-B telemetry endpoint returns approximately 2 waypoints per minute for a given aircraft. This is a best-effort approach to provide observations at the top and bottom of each minute’s interval.
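
As a quick sanity check on this cadence, the following sketch (assuming the df loaded in Step 2) computes the median time between consecutive waypoints for each aircraft:

[ ]:
# median seconds between consecutive waypoints, per aircraft
# (a rough check of the ~2 waypoints/minute cadence; assumes ``df`` from Step 2 is in memory)
cadence_s = (
    df.sort_values(["icao_address", "timestamp"])
    .groupby("icao_address")["timestamp"]
    .diff()
    .dt.total_seconds()
)
print(f"median time between waypoints: {cadence_s.median():.0f} s")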

Each ADS-B waypoint contains the following fields:

timestamp
latitude
longitude
collection_type
altitude_baro
icao_address
flight_id
callsign
tail_number
flight_number
aircraft_type_icao
airline_iata
departure_airport_icao
departure_scheduled_time
arrival_airport_icao
arrival_scheduled_time

BEWARE: in principle, all fields except latitude, longitude, altitude, and timestamp (space/time) should be invariant on a per-flight-instance (flight_id) basis. As we will see in the data handling and QA/QC section, this is often not the case. For instance, it is common to see the callsign of a flight change several minutes after the start of a new flight instance.
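
A quick way to surface these inconsistencies is to count the distinct values each supposedly invariant field takes within each flight_id. Below is a minimal sketch, assuming the df from Step 2; the field list is a subset of the invariant fields described in this section:

[ ]:
# count distinct values of fields that should be constant within a flight_id
# (a sketch; assumes ``df`` from Step 2 is in memory)
invariant_cols = ["callsign", "tail_number", "flight_number", "departure_airport_icao", "arrival_airport_icao"]
nunique = df.groupby("flight_id")[invariant_cols].nunique()

# flights where at least one of these fields takes more than one value
inconsistent = nunique[(nunique > 1).any(axis=1)]
print(f"{len(inconsistent)} of {len(nunique)} flight_ids have a non-invariant field")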

timestamp

This is the timestamp, in second precision, for the reported aircraft position. Timezone is UTC.

latitude & longitude

X,Y positional data of the aircraft at the reported timestamp.

collection_type

An indicator of whether these data were collected from terrestrial or satellite receivers. Possible values for this field are terrestrial and satellite. Notably, records with collection_type of satellite do not have a flight_id (i.e. null value), as Spire is unable to assign a flight_id to these records. The data handling section further down discusses how to impute flight_ids for records that are missing them.

Please note that, starting Dec 2024, Spire is not providing any records with collection_type of satellite, as their satellite constellation was decommissioned earlier than expected due to unusually high solar-related wear and tear. They do plan to gradually reintroduce satellite data coverage, but the volume and exact dates for those data are not yet clear.
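
The relationship between collection_type and missing flight_id values can be checked directly. A minimal sketch, assuming the df from Step 2 (for this February 2025 sample, expect few or no satellite rows given the note above):

[ ]:
# per collection_type: total waypoints and waypoints missing a flight_id
# (a sketch; assumes ``df`` from Step 2 is in memory)
summary = (
    df.assign(missing_flight_id=df["flight_id"].isna())
    .groupby("collection_type")["missing_flight_id"]
    .agg(waypoints="size", missing="sum")
)
print(summary)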

altitude_baro

Barometric altitude of the aircraft, in feet above MSL, at the reported timestamp. Barometric altitude is imputed from the pressure measured by an aircraft’s static port and is equal to geometric altitude only under specific atmospheric conditions.

altitude_gnss

Geometric altitude of the aircraft, derived from GNSS satellite data, at the reported timestamp. Geometric altitude is not reported by all flights, and missing values are filled with NaN.

icao_address

The ICAO address of the aircraft. This is unique to the aircraft and, for commercial jets, is typically the identifier of the aircraft’s Mode S transponder. This is considered to be the most robust unique identifier for grouping all data belonging to a specific aircraft.

flight_id

The UUID generated by Spire as a best-effort to uniquely identify a flight instance. All waypoints grouped on a flight_id should represent the trajectory of a single flight instance. It does happen, from time to time, that the flight_id is erroneous. This is addressed in the data QA/QC section further down. As mentioned above, the flight_id is generally non-null, except for waypoints that are satellite observations.

callsign

The reported call sign for the aircraft on a given flight instance.

tail_number

The reported tail number of the aircraft.

flight_number

The reported flight number of the aircraft for a given flight instance.

aircraft_type_icao

The type of aircraft (ICAO designator).

airline_iata

The airline IATA designator, for flights belonging to a registered airline (this is null for general aviation).

departure_airport_icao & arrival_airport_icao

The departure or arrival airport (ICAO designator) for a given flight instance.

departure_scheduled_time & arrival_scheduled_time

The departure or arrival scheduled time for a given flight instance. Timezone is UTC.

Step 3 - Data Workflow

The following workflow demonstrates some common data manipulations applied to these data.

Step 3a - Filtering

Let’s filter our data to make the subsequent data wrangling more manageable, reducing our dataset to only those data of interest for our hypothetical workflow.

Suppose our workflow only concerns itself with American Airlines (airline_iata: AA) flights.

[2]:
filt = df["airline_iata"] == "AA"

# apply filter and only retain American Airlines waypoints
df = df[filt]

Step 3b - Flight ID imputation

Next, we’ll impute any missing flight_id values. As noted above, this is typically encountered for records with collection_type of satellite where Spire is unable to provide a flight_id match.

The imputation logic applies a simple heuristic. First, we sort by icao_address (first) and timestamp (second). This gives us the temporally continuous lineage of positions for each aircraft (takeoff, flight, landing, takeoff, flight, landing, etc…). We then simply forward-fill flight_id for any missing data. This assumes that a given flight instance will always have a non-null flight_id at the beginning of the flight, which is a good assumption, given that terrestrial ADS-B observations are almost always available for an aircraft taking off from an airport.

[3]:
df = df.sort_values(["icao_address", "timestamp"], ascending=True)
df["flight_id"] = df["flight_id"].ffill()

Step 3c - Janitorial work

Also, let’s perform some janitorial work and remove any fragmented flight instances. Because we pulled an arbitrary time range of data, we expect fragmented flights at the head and tail end of our dataset (flights that began before our time window started, or flights that started but didn’t end before our time window ended).

[4]:
# to identify flight fragments, we will find instances where the first or last waypoint in a flight has an altitude above a threshold
# any waypoints with those flagged flight_ids will be removed from the dataset

# min altitude for takeoff or landing
# highest airport in world is ~14.5k ft
alt_threshold_ft = 15_000

df = df.sort_values(["flight_id", "timestamp"], ascending=True)
[5]:
def flight_start_end_above_alt(df: pd.DataFrame) -> bool:
    """Check if the first or last waypoint in a flight is above a threshold.

    Assumes dataframe is temporally sorted.
    """
    start_above_thres = df["altitude_baro"].iloc[0] > alt_threshold_ft
    end_above_thres = df["altitude_baro"].iloc[-1] > alt_threshold_ft
    return start_above_thres or end_above_thres


pre_filter_n_waypoints = len(df)
df = df.groupby("flight_id").filter(lambda x: not flight_start_end_above_alt(x))
post_filter_n_waypoints = len(df)
print(f"Dropped {pre_filter_n_waypoints - post_filter_n_waypoints} waypoints")
Dropped 160524 waypoints

Step 3d - Flight Instance Manipulation (Resample, Heal, Validate)

As noted in the Discussion - Anatomy of an ADS-B waypoint section, waypoints belonging to a flight_id should represent a well-formed flight instance. Some flight instances, however, may violate one or more of the rules that define a well-formed flight instance.

The validation handler in pycontrails is a utility that encapsulates a ruleset for evaluating a flight instance. The validation handler consumes a single flight instance and, if a violation occurs, returns one or more Exceptions describing the nature of the violation(s).

SchemaError

The flight instance has waypoints with malformed or missing fields. See the SCHEMA class variable of the ValidateTrajectoryHandler (Reference)

OrderingError

The flight instance dataframe is not sorted correctly (timestamps are not in ascending order).

FlightDuplicateTimestamps

The flight instance dataframe has one or more rows with duplicate timestamps.

FlightInvariantFieldViolation

The flight instance has one or more fields (columns) whose values are not invariant. A valid trajectory assumes that the values for the following fields do not vary for a given flight instance (Reference):

[
 "icao_address",
 "flight_id",
 "callsign",
 "tail_number",
 "aircraft_type_icao",
 "airline_iata",
 "departure_airport_icao",
 "departure_scheduled_time",
 "arrival_airport_icao",
 "arrival_scheduled_time",
]

OriginAirportError

The flight instance’s first waypoint is too far from the origin airport’s location.

DestinationAirportError

The flight instance’s last waypoint is too far from the destination airport’s location.

FlightTooShortError

The flight instance is too short (in time duration).

FlightTooLongError

The flight instance is too long (in time duration).

FlightTooSlowError

The flight instance has periods where the speed is too slow. This can be either the instantaneous speed (speed imputed between consecutive waypoints) or the rolling average speed (the rolling average speed threshold and window width are defined in the handler’s class vars).

FlightTooFastError

Same as FlightTooSlowError except with upper speed thresholds.

ROCDError

The rate of climb or rate of descent of the aircraft (calculated between consecutive waypoints) is above a threshold.

FlightAltitudeProfileError

The flight instance drops below an altitude (floor) during the flight.

Cleanup/Heal Trajectories

Before we scan the trajectories with the ValidateTrajectoryHandler, we’ll first apply some common cleanup/fixes to flight trajectories.

The heuristics applied here will eventually be encapsulated in a Handler available via the pycontrails package.

[6]:
def key_max_value_count(df, column_name):
    """If multiple unique values exist in a column, return the value with the highest count.

    Note that null values are not considered in the stack rank.
    """
    counts = df[column_name].value_counts()
    return counts.index[0] if not counts.empty else None


def get_priority_map(df: pd.DataFrame, cols: list) -> dict:
    """Return a mapping of the column name to the value of highest count in the column.

    Parameters
    ----------
    df
        A pandas dataframe
    cols
        Names of columns for evaluation, e.g. cols=["callsign", "airline_iata"]

    Returns
    -------
    A dict with mapping of cols to value of highest count
    e.g. ``{"callsign": None, "airline_iata": AA}``
    """

    ret = {}
    for col in cols:
        prio_val = key_max_value_count(df, col)
        ret[col] = prio_val
    return ret


def dataframe_convert_types(df: pd.DataFrame) -> pd.DataFrame:
    """Attempt to convert types for each dataframe column to expected type.

    Implicitly also checks for existence of expected columns.
    """
    cols = {
        "icao_address": str,
        "flight_id": str,
        "callsign": str,
        "tail_number": str,
        "flight_number": str,
        "aircraft_type_icao": str,
        "airline_iata": str,
        "departure_airport_icao": str,
        "departure_scheduled_time": "datetime64[ns]",
        "arrival_airport_icao": str,
        "arrival_scheduled_time": "datetime64[ns]",
        "timestamp": "datetime64[ns]",
        "latitude": float,
        "longitude": float,
        "collection_type": str,
        "altitude_baro": int,
    }
    df = df.astype(cols)
    df["ingestion_time"] = pd.NaT
    return df


def drop_duplicate_ts(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows with duplicate timestamp fields."""
    return df.drop_duplicates(subset="timestamp", keep="first")


def heal(df: pd.DataFrame) -> pd.DataFrame:
    """
    Manipulate trajectories with qaqc heuristics.

    Returns
    -------
    A dataframe mirroring the input dataframe, with the manipulations applied.
    """

    try:
        df = dataframe_convert_types(df)
    except KeyError as e:
        raise KeyError("flight trajectory dataframe is missing an expected column.") from e

    df = df.replace("nan", None)

    # drop dupes
    df = drop_duplicate_ts(df)

    # --------------
    # update dataset so the following target keys are uniform/distinct for a given flight
    # --------------
    target_cols = [
        "callsign",
        "flight_number",
        "arrival_airport_icao",
        "departure_airport_icao",
        "airline_iata",
    ]

    priority_values = get_priority_map(df, target_cols)

    # fill any null values with our priority values
    for col, val in priority_values.items():
        if val:
            df[col] = df[col].fillna(val)

    # drop any rows where our column values don't match the priority value
    for col, val in priority_values.items():
        if val:
            keep_filter = df[col] == val
            df = df[keep_filter]

    df = df.sort_values(by="timestamp", ascending=True).reset_index(drop=True)
    if df.empty:
        raise ValueError("flight trajectory is empty.")
    return df


# scan all our flight instances and apply the "heal" heuristics
# let's grab the first 100 flights from American Airlines to work with in the rest of the notebook
flight_instance_grps = df.groupby("flight_id")
healed_flights: list[pd.DataFrame] = []
for _, flight_df in flight_instance_grps:
    airline_iata = flight_df["airline_iata"]
    if airline_iata.iloc[0] == "AA":
        healed_flights.append(heal(flight_df))
    if len(healed_flights) == 100:
        break

Resample flight trajectories

The following workflow demonstrates how to use the pycontrails resample_and_fill method to resample a flight trajectory to 1-minute flight segments. This method also applies some cleanup and manipulation heuristics (for example, climbs/descents aren’t linearly interpolated between waypoints; rather, a ramp function is applied).

Ref: https://github.com/contrailcirrus/pycontrails/blob/da099eed6e65eca4d3a04a809fb33845ea83e646/pycontrails/core/flight.py#L801

Note that there is some busy-work to marshal the flight dataframe into and out of a pycontrails Flight object.

[7]:
import warnings

from pycontrails import Flight

warnings.filterwarnings("ignore")

healed_resampled_flights: list[pd.DataFrame] = []
for flight_df in healed_flights:
    df_tmp = flight_df.rename(columns={"altitude_baro": "altitude_ft", "timestamp": "time"})
    df_resampled = Flight(df_tmp).resample_and_fill().dataframe
    if df_resampled.empty:
        continue

    # Recompute the altitude_ft (pycontrails Flight.resample_and_fill returns altitude [m])
    df_resampled["altitude_baro"] = df_resampled.pop("altitude").multiply(3.28).astype(int)
    df_resampled = df_resampled.rename(columns={"time": "timestamp"})

    # resample_and_fill drops all of our metadata columns (everything except lat, lon, time, altitude)
    # so we need to patch that back in
    # note that our healing process in the previous step guarantees that all of those metadata fields are invariant in value
    # thus we're safe to grab any row's value from the input df and patch into the output df

    # collection_type and ingestion_time are exceptions. There is no handling of those fields when interpolating between records
    # here, we just mask the value with null, to preserve schema

    col_patches = [
        "icao_address",
        "flight_id",
        "callsign",
        "tail_number",
        "flight_number",
        "aircraft_type_icao",
        "airline_iata",
        "departure_airport_icao",
        "departure_scheduled_time",
        "arrival_airport_icao",
        "arrival_scheduled_time",
        "collection_type",
    ]
    for col in col_patches:
        df_resampled[col] = flight_df[col].iloc[0]

    df_resampled["ingestion_time"] = pd.NaT
    df_resampled["collection_type"] = None

    df_resampled = df_resampled.sort_values("timestamp", ascending=True)
    healed_resampled_flights.append(df_resampled)

healed_resampled_flights[0]
[7]:
longitude latitude timestamp altitude_baro icao_address flight_id callsign tail_number flight_number aircraft_type_icao airline_iata departure_airport_icao departure_scheduled_time arrival_airport_icao arrival_scheduled_time collection_type ingestion_time
0 -107.055359 39.577457 2025-02-02 00:10:00 13971 AC7581 0007726f-28fc-4add-ade5-fae5d07542bf AAL2165 N9012 AA2165 A319 AA KEGE 2025-02-02 00:02:00 KDFW 2025-02-02 02:20:00 None NaT
1 -107.109846 39.625671 2025-02-02 00:11:00 17280 AC7581 0007726f-28fc-4add-ade5-fae5d07542bf AAL2165 N9012 AA2165 A319 AA KEGE 2025-02-02 00:02:00 KDFW 2025-02-02 02:20:00 None NaT
2 -107.094761 39.698813 2025-02-02 00:12:00 19884 AC7581 0007726f-28fc-4add-ade5-fae5d07542bf AAL2165 N9012 AA2165 A319 AA KEGE 2025-02-02 00:02:00 KDFW 2025-02-02 02:20:00 None NaT
3 -106.996112 39.767361 2025-02-02 00:13:00 21257 AC7581 0007726f-28fc-4add-ade5-fae5d07542bf AAL2165 N9012 AA2165 A319 AA KEGE 2025-02-02 00:02:00 KDFW 2025-02-02 02:20:00 None NaT
4 -106.856049 39.822471 2025-02-02 00:14:00 22874 AC7581 0007726f-28fc-4add-ade5-fae5d07542bf AAL2165 N9012 AA2165 A319 AA KEGE 2025-02-02 00:02:00 KDFW 2025-02-02 02:20:00 None NaT
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
90 -97.067077 32.896418 2025-02-02 01:40:00 774 AC7581 0007726f-28fc-4add-ade5-fae5d07542bf AAL2165 N9012 AA2165 A319 AA KEGE 2025-02-02 00:02:00 KDFW 2025-02-02 02:20:00 None NaT
91 -97.063822 32.894102 2025-02-02 01:41:00 774 AC7581 0007726f-28fc-4add-ade5-fae5d07542bf AAL2165 N9012 AA2165 A319 AA KEGE 2025-02-02 00:02:00 KDFW 2025-02-02 02:20:00 None NaT
92 -97.060568 32.891786 2025-02-02 01:42:00 774 AC7581 0007726f-28fc-4add-ade5-fae5d07542bf AAL2165 N9012 AA2165 A319 AA KEGE 2025-02-02 00:02:00 KDFW 2025-02-02 02:20:00 None NaT
93 -97.057313 32.889470 2025-02-02 01:43:00 774 AC7581 0007726f-28fc-4add-ade5-fae5d07542bf AAL2165 N9012 AA2165 A319 AA KEGE 2025-02-02 00:02:00 KDFW 2025-02-02 02:20:00 None NaT
94 -97.054059 32.887154 2025-02-02 01:44:00 774 AC7581 0007726f-28fc-4add-ade5-fae5d07542bf AAL2165 N9012 AA2165 A319 AA KEGE 2025-02-02 00:02:00 KDFW 2025-02-02 02:20:00 None NaT

95 rows × 17 columns

Run the ValidateTrajectoryHandler

⚠️ Use pycontrails version 0.54.8 or later

The following workflow demonstrates how to run the validation handler against flight instances.

We iterate over all the flight instances (groups of waypoints on a per-flight_id basis), and for each flight instance we check whether the flight violates any rules in the ValidateTrajectoryHandler ruleset. We keep flight instances that have no violations, and we quarantine flight instances that violate one or more rules.

Monkeypatching Custom Thresholds

If you want to modify the thresholds used in the ruleset, you can monkey patch the class vars of the handler. See example below.

Ref: https://github.com/contrailcirrus/pycontrails/blob/da099eed6e65eca4d3a04a809fb33845ea83e646/pycontrails/datalib/spire/spire.py#L78

[8]:
from pycontrails.datalib.spire import ValidateTrajectoryHandler

# iterate through our list of flight instance dataframes, and run the validation handler against each flight
# the validation handler will return a list of exception objects if a flight violates one or more rules
# (See the descriptions of the exceptions in Step 3d)

good_flights: list[pd.DataFrame] = []
bad_flights: list[tuple[pd.DataFrame, list[Exception]]] = []

vh = ValidateTrajectoryHandler()
vh.AIRPORT_DISTANCE_THRESHOLD_KM = 500.0  # relax threshold distance for origin/destination airports

for flight_df in healed_resampled_flights:
    vh.set(flight_df)
    violations = vh.evaluate()
    augmented_flight_df = vh.validation_df
    vh.unset()

    if violations:
        bad_flights.append((augmented_flight_df, violations))
    else:
        good_flights.append(flight_df)

print(
    f"Found {len(good_flights)} flights with no violations.\n"
    f"Found {len(bad_flights)} flights with one or more violations."
)
Found 90 flights with no violations.
Found 8 flights with one or more violations.

Useful Helper: The “Validation Dataframe object”

Notice that in the above code block, we access the validation_df property of the ValidateTrajectoryHandler instance. This is a dataframe that includes all of the input dataframe values AND additional columns for the imputed fields used in evaluating the flight.

These additional columns can be very helpful in assessing a flight instance that has one or more violations. The validation_df can also be a helpful utility for gaining access to these additional values, even for totally valid flight instances (see the sketch after the list below).

The additional values (columns) added to the validation_df property dataframe by the ValidateTrajectoryHandler include:

  • elapsed_seconds (elapsed time between consecutive waypoints)

  • elapsed_distance_m (elapsed distance between consecutive waypoints)

  • ground_speed_m_s (speed between consecutive waypoints)

  • rocd_fps (rate of climb/descent between consecutive waypoints)

  • departure_airport_lon (longitude of the departure airport)

  • departure_airport_lat (latitude of the departure airport)

  • departure_airport_alt_ft (altitude of the departure airport)

  • arrival_airport_lon (longitude of the arrival airport)

  • arrival_airport_lat (latitude of the arrival airport)

  • arrival_airport_alt_ft (altitude of the arrival airport)

  • departure_airport_dist_m (distance from the current waypoint to the departure airport)

  • arrival_airport_dist_m (distance from the current waypoint to the arrival airport)
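
For example, the sketch below (assuming the bad_flights list built in the validation step above) pulls the waypoints of one quarantined flight where the imputed ground speed falls below 100 m/s, the handler's default rolling-average threshold, used here simply as a reference value:

[ ]:
# inspect the imputed columns for one quarantined flight
# (a sketch; assumes ``bad_flights`` from the validation loop above)
val_df, violations = bad_flights[0]
print(violations)

# waypoints where the waypoint-to-waypoint ground speed is below 100 m/s
slow = val_df[val_df["ground_speed_m_s"].abs() < 100.0]
print(slow[["timestamp", "altitude_baro", "ground_speed_m_s", "rocd_fps"]])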

Bad flights? What’s next…

The ValidateTrajectoryHandler applies a strict and opinionated set of rules to a flight instance. It is up to you, the implementer, to determine how you’d like to handle violations to the rules.

Some options include:

  • monkey patching the handler with less strict thresholds (see the example above where we relax the airport distance threshold)

  • ignore certain violations, but honor others (this can be done by scanning and selectively handling the list of violations returned by the handler; see the sketch after this list)

  • selectively ignore certain violations, for instance, ignore a FlightTooFastError if there are fewer than 5 instances of the flight going too fast (this can be achieved by parsing the exception response message in a FlightTooFastError violation)

  • selectively heal certain violations, for instance, interpolate the first waypoint in a trajectory to the departure airport, for flights with an OriginAirportError violation.
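
As a sketch of the selective-handling option above (assuming healed_resampled_flights and the handler instance vh from the validation step), the loop below treats every violation except FlightTooSlowError as blocking; the exception is matched by class name to avoid assuming an import path:

[ ]:
# honor all violations except FlightTooSlowError
# (a sketch; assumes ``healed_resampled_flights`` and ``vh`` from the cells above)
accepted: list[pd.DataFrame] = []
rejected: list[tuple[pd.DataFrame, list[Exception]]] = []

for flight_df in healed_resampled_flights:
    vh.set(flight_df)
    violations = vh.evaluate()
    vh.unset()

    # match on the class name rather than importing the exception type
    blocking = [v for v in violations if type(v).__name__ != "FlightTooSlowError"]
    if blocking:
        rejected.append((flight_df, blocking))
    else:
        accepted.append(flight_df)

print(f"{len(accepted)} flights accepted when FlightTooSlowError is ignored")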

Bad flight – case study

From the quarantined flights above, the flight with flight_id 01fe97aa-173c-4041-906e-f1e9193502dd has a FlightTooSlowError with a message of "Found 4 instances where rolling average speed is below threshold of 100.0 m/s (rolling window of 30.0 minutes). max value: 99.56874690879854, min value: 83.59578017138627", indicating that at 4 points in the trajectory the rolling average speed dropped below the validation handler’s threshold.

[9]:
import matplotlib.pyplot as plt

df = bad_flights[2][0]
df.loc[:, "total_elapsed_sec"] = df["elapsed_seconds"].cumsum()
fig, ax1 = plt.subplots(1, 1, figsize=(10, 8))
ax1.plot(df["total_elapsed_sec"], df["altitude_baro"])
ax1.set_ylabel("altitude [ft]")
ax1.set_xlabel("total elapsed time [sec]")

ax2 = ax1.twinx()
ax2.plot(df["total_elapsed_sec"], df["ground_speed_m_s"].abs(), color="r")
ax2.set_ylabel("ground speed (m/sec)", color="r")
plt.show()
../_images/notebooks_adsb_workflow_26_0.png
[10]:
# plot flight trajectory -- visually inspect for anomalies
import cartopy.crs as ccrs

ax = plt.axes(projection=ccrs.PlateCarree())
ax.stock_img()
cm = plt.get_cmap("RdYlBu")  # plt.cm.get_cmap is removed in newer matplotlib releases

plt.scatter(
    df["longitude"],
    df["latitude"],
    marker=".",
    c=df["total_elapsed_sec"],
    cmap=cm,
    transform=ccrs.Geodetic(),
)

plt.scatter(
    df["departure_airport_lon"].iloc[0],
    df["departure_airport_lat"].iloc[0],
    marker="x",
    color="black",
    s=100,
)
plt.scatter(
    df["arrival_airport_lon"].iloc[0],
    df["arrival_airport_lat"].iloc[0],
    marker="+",
    color="black",
    s=100,
)

ax.set_xlim([df["longitude"].min() - 3, df["longitude"].max() + 3])
ax.set_ylim([df["latitude"].min() - 3, df["latitude"].max() + 3])
plt.show()
../_images/notebooks_adsb_workflow_27_0.png

Discussion

An initial inspection of this flight suggests that it is a valid flight, and the low speed observed in the last ~1 hour of flight time does not suggest an issue with the flight. Decreasing the default AVG_LOW_GROUND_SPEED_THRESHOLD_MPS would prevent this violation from being raised. Increasing the rolling average window width (AVG_LOW_GROUND_SPEED_ROLLING_WINDOW_PERIOD_MIN) may also smooth out and avoid triggering on shorter periods of low ground speed.
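
A minimal sketch of that tuning, re-running the validation with relaxed speed thresholds (the specific values below are illustrative, not recommendations):

[ ]:
# re-run validation with relaxed ground-speed thresholds
# (a sketch; the threshold values are illustrative, not recommendations)
vh = ValidateTrajectoryHandler()
vh.AIRPORT_DISTANCE_THRESHOLD_KM = 500.0  # keep the earlier relaxation
vh.AVG_LOW_GROUND_SPEED_THRESHOLD_MPS = 80.0  # default is 100.0 m/s
vh.AVG_LOW_GROUND_SPEED_ROLLING_WINDOW_PERIOD_MIN = 60.0  # default is 30.0 minutes

still_bad = 0
for flight_df in healed_resampled_flights:
    vh.set(flight_df)
    if vh.evaluate():
        still_bad += 1
    vh.unset()

print(f"{still_bad} flights still have one or more violations")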

In general, this type of threshold parameter tuning should be carried out in the context of the application use case, with judgements based on acceptable false positive and false negative rates.