Published by Brady Moon and Jay Patrikar

TrajAir: A General Aviation Trajectory Dataset

This novel dataset, TrajAir, provides recorded trajectories of multiple aircraft operating around a standard non-towered airport, along with the weather conditions during these operations. TrajAir lends itself well to training, testing, and benchmarking trajectory prediction algorithms, including socially-aware models.


The TrajAir dataset was collected at the Pittsburgh-Butler Regional Airport (ICAO: KBTP), a single-runway general aviation (GA) airport 10 miles north of the city of Pittsburgh, Pennsylvania. Additional information about KBTP is available online. Aircraft entering and leaving non-towered airspace need to follow guidelines established by the FAA to ensure the safety and efficiency of all participating agents. KBTP uses left traffic patterns for both runway directions, Runway 8 and Runway 26. These patterns are rectangular, with left-hand turns relative to the direction of landing or takeoff. Figure 1 shows the traffic patterns for Runways 8 and 26 around KBTP with the corresponding direction of traffic flow. Aircraft usually take off and land into the wind, and the leg names follow from this. After takeoff, an aircraft is on the upwind leg. A left turn puts it on the crosswind leg, followed by turns onto the downwind leg and the base leg. The final left turn puts the aircraft on the final leg for touchdown. FAA guidelines also state that entry into the pattern should be at a 45-degree angle to the downwind leg.
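
As a rough illustration of the leg sequence described above, the sketch below derives approximate headings for each leg of a left traffic pattern. It assumes that the runway number times ten approximates the runway heading in degrees and that every turn is exactly 90 degrees; the function name `left_pattern_headings` is hypothetical and is not part of the dataset tooling.

```python
def left_pattern_headings(runway_number: int) -> dict:
    """Approximate headings (degrees) for the legs of a left traffic pattern.

    Assumption: runway number * 10 approximates the runway heading, and the
    pattern is an ideal rectangle with 90-degree left turns.
    """
    upwind = (runway_number * 10) % 360
    legs = {}
    heading = upwind
    for leg in ("upwind", "crosswind", "downwind", "base"):
        legs[leg] = heading
        heading = (heading - 90) % 360  # a left turn reduces the heading by 90
    legs["final"] = upwind  # the final leg is aligned with the runway again
    return legs


if __name__ == "__main__":
    for runway in (8, 26):
        print(runway, left_pattern_headings(runway))
```

Under these assumptions, the downwind leg for Runway 8 runs roughly opposite to the runway heading, which is consistent with the lobes visible on either side of the runway in Figure 1.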

Figure 1: The dataset and its collection setup at the Pittsburgh-Butler Regional Airport, a non-towered GA airport which serves as a primary location for the dataset. Lighter color indicates lower altitude. (a) A snippet of the processed dataset with aircraft trajectories showing clear lobes for traffic patterns for both runways. (b) The left traffic pattern and nomenclature for the runways at the airport. The star-marker shows the approximate location of the data collection setup at the airport. (c) The data-collection setup.

The trajectory data spans 18 Sept 2020 to 23 Apr 2021 and includes a total of 111 days of data, excluding downtime, repairs, and bad-weather days with no traffic. Data was collected from 1:00 AM to 11:00 PM local time.

Collection Process

The dataset uses an Automatic Dependent Surveillance-Broadcast (ADS-B) receiver placed within the airport premises to capture the trajectory data. The ADS-B In receiver picks up data broadcast directly by other aircraft equipped with ADS-B Out. For aircraft that do not have ADS-B Out, the Traffic Information Service-Broadcast (TIS-B) takes the position and altitude of the aircraft from radar, converts that information into an ADS-B-compatible format, and broadcasts it to our receiver. The receiver listens for these broadcasts on both the 1090 MHz and 978 MHz frequencies. ADS-B uses satellite navigation to produce an accurate location and timestamp for each target, which are recorded on-site using our custom setup.

We also include weather data for the data collection period to provide environmental context. The weather data is obtained post-hoc from the METeorological Aerodrome Report (METAR) strings generated by the Automated Weather Observing System (AWOS) at KBTP. We use the Iowa State METAR repository to gather all the weather data for the trajectory collection time frame. The raw METAR string is then appended to the raw trajectory data by matching the closest UTC timestamps.
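
The closest-timestamp matching described above could look roughly like the following. This is a minimal sketch, not the authors' pipeline: it assumes timestamps are UTC epoch seconds, that the METAR list is non-empty and sorted by time, and the names `nearest_index` and `attach_metar` are hypothetical.

```python
from bisect import bisect_left


def nearest_index(sorted_times, t):
    """Return the index of the value in sorted_times that is closest to t."""
    i = bisect_left(sorted_times, t)
    if i == 0:
        return 0
    if i == len(sorted_times):
        return len(sorted_times) - 1
    before, after = sorted_times[i - 1], sorted_times[i]
    # On a tie, prefer the earlier report.
    return i if (after - t) < (t - before) else i - 1


def attach_metar(track_rows, metar_rows):
    """Append the closest-in-time METAR string to each trajectory row.

    track_rows: list of tuples whose first element is a UTC epoch timestamp.
    metar_rows: non-empty list of (utc_epoch, raw_metar_string), sorted by time.
    """
    times = [t for t, _ in metar_rows]
    return [row + (metar_rows[nearest_index(times, row[0])][1],)
            for row in track_rows]
```

A binary search keeps the lookup cheap even for months of one-second trajectory samples, since METAR reports are far sparser than ADS-B messages.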

Data Processing

The data obtained from the ADS-B receiver and the METAR strings is processed to make it suitable for training networks. The following steps are performed:

  • Removal of data points that have corrupt or no location fields.
  • Removal of duplicate data points with the same aircraft identifier and location fields.
  • Removal of data points where the altitude is more than 6000 feet and distance is more than 5 km from one end of the runway.
  • Transforming the data to a local Cartesian coordinate frame in SI units. The origin is at the end of the runway with the x-axis pointing along the runway.
  • Processing raw METAR strings to get wind velocity and direction along and across the runway in the local Cartesian frame in SI units.
  • Interpolating trajectory data every second for all agents using a cubic interpolation strategy.
  • Segmenting the data into “scenes” with at least one active aircraft in the airport vicinity. A scene starts when at least one aircraft is active or enters the threshold and ends when all aircraft have left the vicinity or are inactive.
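
Two of the steps above, the wind decomposition and the scene segmentation, can be sketched as follows. This is an illustration under stated assumptions, not the released processing code: the y-axis is assumed to point 90 degrees to the left of the x-axis, the METAR wind direction is taken as the direction the wind blows from, and the names `wind_components`, `segment_scenes`, `x_axis_heading_deg`, and `max_gap` are hypothetical.

```python
import math

KNOT_TO_MS = 0.514444  # 1 knot in metres per second


def wind_components(speed_kt, from_deg, x_axis_heading_deg):
    """Return (wind_x, wind_y) in m/s in a runway-aligned frame.

    Assumptions: from_deg is the direction the wind blows FROM, the x-axis
    points along x_axis_heading_deg, and the y-axis is 90 degrees to its left.
    """
    speed = speed_kt * KNOT_TO_MS
    rel = math.radians(from_deg - x_axis_heading_deg)
    # A wind from straight ahead (rel = 0) blows toward negative x.
    return -speed * math.cos(rel), speed * math.sin(rel)


def segment_scenes(active_seconds, max_gap=1):
    """Group timestamps with at least one active aircraft into scenes.

    active_seconds: integer timestamps (seconds) at which any aircraft is
    active. A new scene starts whenever the gap to the previous active
    second exceeds max_gap. Returns a list of (start, end) tuples.
    """
    scenes = []
    for t in sorted(set(active_seconds)):
        if scenes and t - scenes[-1][1] <= max_gap:
            scenes[-1][1] = t  # extend the current scene
        else:
            scenes.append([t, t])  # start a new scene
    return [tuple(s) for s in scenes]
```

With the interpolated 1 Hz data, a `max_gap` of one second treats any second without an active aircraft as a scene boundary; a larger value would merge scenes separated by short dropouts.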

Data Structure

The data is grouped into five folders. The folder titled 111_days contains the entire dataset, whereas the folders titled 7days1, 7days2, 7days3 and 7days4 each contain a full week of contiguous data. The contents and structure of each folder are shown in the example folder below:

│   └───train
│   │   │   1.txt
│   │   │   4.txt
│   │   │   ...
│   │
│   └───test
│       │   2.txt
│       │   3.txt
│       │   ...
        │   1.csv

The associated dates spanned by the processed data can be found by looking at the folder names within the raw_data folder. In the raw data, there is a CSV file for each day of data in the week. Within the processed data, the scenes are randomly separated into a train and test set with a 70/30 split.
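
For readers who want to re-split the scenes themselves, a random 70/30 partition can be sketched as below. The released folders already contain the authors' split; the function name `split_scenes` and the fixed seed are assumptions made only for this illustration.

```python
import random


def split_scenes(scene_ids, train_fraction=0.7, seed=0):
    """Randomly partition scene identifiers into (train, test) lists."""
    ids = list(scene_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    cut = round(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


if __name__ == "__main__":
    train, test = split_scenes(range(1, 11))
    print(len(train), len(test))
```

Splitting at the scene level, rather than at the frame level, keeps all frames of one scene in the same subset.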

Each scene is a text file with the following structure:

Frame # Aircraft ID x (km) y (km) z (km) windx (m/s) windy (m/s)
0 10620674 1.3407 0.0026 0.3353 0.0 0.0
1 10620674 1.3135 0.0021 0.3353 0.0 0.0
2 10620674 1.2863 0.0017 0.3353 0.0 0.0
405 10620674 -3.8946 1.5872 0.9751 0.0 0.0

The frames are at a rate of 1 Hz. The x, y, and z positions are given with respect to the inertial frame centered on the end of the runway, with the x-axis aligned with the runway. The variables windx and windy are the wind velocity components in the x and y directions.
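
A scene file in this layout can be read with a few lines of Python. The sketch below assumes whitespace-separated columns in the order shown above and skips any line that does not parse as seven numeric fields (for example a header row, if one is present); the function name `load_scene` is hypothetical.

```python
from collections import defaultdict


def load_scene(lines):
    """Parse scene rows into {aircraft_id: [(frame, x, y, z, windx, windy), ...]}.

    Positions are in km and wind components in m/s, as in the column header.
    """
    tracks = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 7:
            continue  # skip blank, header, or malformed lines
        try:
            frame, aircraft_id = int(parts[0]), int(parts[1])
            values = tuple(float(p) for p in parts[2:])
        except ValueError:
            continue  # skip rows with non-numeric fields
        tracks[aircraft_id].append((frame, *values))
    return dict(tracks)


if __name__ == "__main__":
    sample = [
        "0 10620674 1.3407 0.0026 0.3353 0.0 0.0",
        "1 10620674 1.3135 0.0021 0.3353 0.0 0.0",
    ]
    print(load_scene(sample))
```

Because the function accepts any iterable of strings, an open file handle for a scene file such as 1.txt can be passed to it directly.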

The weather data can be found in the weather_data folder, which holds a CSV file with the weather conditions for the full period spanned by the dataset. Explanations of the CSV variables can be found on the website of the Iowa State METAR repository.


The data is hosted on Kilthub and can be downloaded directly from there or from the file explorer below:

Additional Info


Jay Patrikar, Brady Moon, Jean Oh, and Sebastian Scherer. "Predicting Like A Pilot: Dataset and Method to Predict Socially-Aware Aircraft Trajectories in Non-Towered Terminal Airspace."


Jay Patrikar -

Brady Moon -


This work is supported by the U.S. Department of Energy (Grant DE-EE0008463). This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant No. DGE1745016.

Thank you to the Pittsburgh-Butler Regional Airport and the employees who helped in the data collection for the past year, especially to Airport Manager Richard E. “Ike” Kelly and Maintenance Manager Chuck Ritchey.

Terms of use

This work is licensed under a Creative Commons Attribution 4.0 International License.