Advanced Trino queries
This section demonstrates advanced query patterns for working with the OpenSky Trino database.
Using schema objects with SQLAlchemy
PyOpenSky provides schema classes for all Trino tables. You can use SQLAlchemy’s query builder for complex queries:
from sqlalchemy import select
from pyopensky.schema import StateVectorsData4, FlightsData4
from pyopensky.trino import Trino
trino = Trino()
# Select specific columns
query = (
select(StateVectorsData4)
.with_only_columns(
StateVectorsData4.time,
StateVectorsData4.icao24,
StateVectorsData4.lat,
StateVectorsData4.lon,
StateVectorsData4.baroaltitude,
StateVectorsData4.callsign,
)
.where(StateVectorsData4.callsign.startswith("AFR"))
.where(StateVectorsData4.hour >= "2023-01-01")
.where(StateVectorsData4.hour < "2023-01-02")
.limit(1000)
)
df = trino.query(query)
Selecting specific columns
Use with_only_columns() to reduce data transfer and improve query performance:
from pyopensky.schema import StateVectorsData4
from pyopensky.trino import Trino
trino = Trino()
# Minimal trajectory data
df = trino.history(
start="2023-01-03 16:00",
stop="2023-01-03 20:00",
icao24="400a0e",
selected_columns=(
"time",
"icao24",
"lat",
"lon",
"baroaltitude",
"velocity",
"heading",
),
)
Note
Column names can be specified as:
String:
"lat"(automatically prefixed with table name)Schema attribute:
StateVectorsData4.latQualified string:
"StateVectorsData4.lat"or"FlightsData4.estdepartureairport"
Spatial queries with bounds
Filter data by geographic bounding box:
from pyopensky.trino import Trino
trino = Trino()
# Download data within geographic bounds (west, south, east, north)
df = trino.history(
start="2023-07-01",
stop="2023-07-02",
bounds=(1.06, 43.38, 1.74, 43.85), # Toulouse area
StateVectorsData4.baroaltitude < 3048, # Below 10,000 ft
)
Spatial functions with SQLAlchemy
Use Trino’s geospatial functions for advanced filtering:
from sqlalchemy import func
from pyopensky.schema import StateVectorsData4
from pyopensky.trino import Trino
trino = Trino()
# Aircraft within 50 nautical miles of an airport
airport_lat, airport_lon = 52.308601, 4.76389 # Amsterdam Schiphol
df = trino.history(
"2023-03-01 12:00",
"2023-03-01 13:00",
func.ST_Distance(
func.to_spherical_geography(func.ST_Point(airport_lon, airport_lat)),
func.to_spherical_geography(
func.ST_Point(StateVectorsData4.lon, StateVectorsData4.lat)
),
) <= 50 * 1852, # 50 nautical miles in meters
arrival_airport="EHAM",
)
Tip
The example above finds aircraft within a circular area. This is more accurate than rectangular bounds for airport vicinity queries.
Ring-shaped area queries
Query aircraft in a specific distance range (e.g., for approach patterns):
from sqlalchemy import func
from pyopensky.schema import StateVectorsData4
from pyopensky.trino import Trino
trino = Trino()
airport_lat, airport_lon = 47.464722, 8.549167 # Zurich Airport
# Aircraft between 49 and 50 nautical miles from airport
df = trino.history(
start="2023-01-01",
stop="2023-02-01",
func.ST_Distance(
func.to_spherical_geography(func.ST_Point(airport_lon, airport_lat)),
func.to_spherical_geography(
func.ST_Point(StateVectorsData4.lon, StateVectorsData4.lat)
),
) <= 50 * 1852,
func.ST_Distance(
func.to_spherical_geography(func.ST_Point(airport_lon, airport_lat)),
func.to_spherical_geography(
func.ST_Point(StateVectorsData4.lon, StateVectorsData4.lat)
),
) > 49 * 1852,
arrival_airport="LSZH",
)
Query data from specific ADS-B receivers
Filter by sensor serial numbers to get data from specific ground stations:
from pyopensky.trino import Trino
trino = Trino()
# Data from specific sensors
df = trino.history(
start="2021-08-24 09:00",
stop="2021-08-24 09:10",
bounds=(17.8936, 59.6118, 17.9894, 59.6716),
serials=(-1408232560, -1408232534), # Specific receiver IDs
)
Note
Serial numbers can be positive or negative integers. Use a single integer or tuple of integers.
Extending temporal range for airport queries
Use time_buffer to include aircraft that departed/arrived slightly outside the time window:
from pyopensky.trino import Trino
trino = Trino()
# Get flights with 30-minute buffer before/after time window
df = trino.history(
start="2024-03-16 09:00",
stop="2024-03-16 11:00",
airport="UGTB",
time_buffer="30m", # Also accepts "1h", "45min", etc.
)
This is useful when you want complete flight trajectories that may start or end outside your time range.
Filtering with boolean expressions
Use SQLAlchemy expressions for complex filtering:
from sqlalchemy import not_, func
from pyopensky.schema import StateVectorsData4
from pyopensky.trino import Trino
trino = Trino()
# Non-commercial aircraft (negative lookahead for commercial pattern)
commercial_pattern = "^([A-Z]{3})[0-9](([0-9]{0,3})|([0-9]{0,2})([A-Z])|([0-9]?)([A-Z]{2}))\\s*$"
df = trino.history(
"2023-07-13",
"2023-07-14",
not_(func.regexp_like(StateVectorsData4.callsign, commercial_pattern)),
limit=1000,
)
Pattern matching with like()
Use .like() for SQL pattern matching with wildcards:
from sqlalchemy import select
from pyopensky.schema import StateVectorsData4, FlightsData4
from pyopensky.trino import Trino
trino = Trino()
# French registered aircraft (icao24 starting with 39)
query = (
select(StateVectorsData4)
.where(StateVectorsData4.icao24.like("39%"))
.where(StateVectorsData4.hour >= "2023-01-01")
.where(StateVectorsData4.hour < "2023-01-02")
.limit(1000)
)
df = trino.query(query)
# Multiple character wildcard patterns
query = (
select(FlightsData4)
.where(FlightsData4.callsign.like("AFR%")) # Air France flights
.where(FlightsData4.day >= "2023-01-01")
.where(FlightsData4.day < "2023-02-01")
)
df = trino.query(query)
Tip
SQL wildcards: % matches any sequence of characters, _ matches a single character.
Filtering for non-null values
Use != None to filter out rows with missing data:
from sqlalchemy import select
from pyopensky.schema import FlightsData4
from pyopensky.trino import Trino
trino = Trino()
# Only flights with known arrival airport
query = (
select(FlightsData4)
.with_only_columns(
FlightsData4.icao24,
FlightsData4.callsign,
FlightsData4.firstseen,
FlightsData4.lastseen,
FlightsData4.estdepartureairport,
FlightsData4.estarrivalairport,
)
.where(FlightsData4.callsign.like("AFR%"))
.where(FlightsData4.day >= "2023-01-01")
.where(FlightsData4.day < "2023-02-01")
.where(FlightsData4.estarrivalairport != None)
.limit(100)
)
df = trino.query(query)
Note
This is useful for filtering flights with complete metadata, especially when working with estimated airport fields that may be null.
Inspecting database schema
Discover available tables and columns:
from sqlalchemy import MetaData
from pyopensky.trino import Trino
trino = Trino()
connection = trino.connect()
# Reflect database schema
metadata = MetaData()
metadata.reflect(connection)
# List all tables
print("Available tables:")
for table_name in metadata.tables.keys():
print(f" - {table_name}")
# List columns in a specific table
print("\nColumns in state_vectors_data4:")
for column in metadata.tables["state_vectors_data4"].columns:
print(f" - {column.name}: {column.type}")
Available tables
Common tables in the OpenSky Trino database:
state_vectors_data4: Historical ADS-B state vectors (position, velocity, altitude)flights_data4: Flight metadata (departure/arrival airports, times)flights_data5: Enhanced flight data with takeoff/landing timesposition_data4: Raw position messagesvelocity_data4: Raw velocity messagesidentification_data4: Raw identification/callsign messagesrollcall_replies_data4: Mode S rollcall replies (squawk, BDS registers)adsc: ADS-C messages (for oceanic/remote areas)
See also
The Trino database - Complete API reference for the Trino class
Rebuilding trajectories - Reconstructing trajectories from raw messages