DSL iNatGet
General Principles
Scripts (`*.inat` or `*.rb`) are regular Ruby scripts with the DSL methods preloaded. Users write in full Ruby but have access to a specialized set of methods for working with iNaturalist data.
Core DSL principles:
- Laziness: creating and combining datasets never triggers immediate API or database access
- Composability: complex queries are built from simple ones through dataset operations
- Caching transparency: users don't manage the cache explicitly; the system decides when to update data
- Isolation: direct access to model classes and internal mechanisms is neither required nor provided
Datasets
A dataset is a lazy representation of a set of observations or taxa with an attached selection condition. The condition is not evaluated until the data is actually accessed.
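To illustrate laziness, here is a minimal plain-Ruby sketch, assuming a dataset can be modeled as a stored predicate over an in-memory array. `LazyDataset`, its `evaluations` counter, and the record hashes are hypothetical; iNatGet itself translates the condition into API and SQL queries rather than filtering an array.

```ruby
# Hypothetical toy model: a dataset stores a condition and evaluates it
# only when records are actually requested.
class LazyDataset
  include Enumerable

  attr_reader :evaluations

  def initialize(source, &condition)
    @source = source        # stands in for the API / local database
    @condition = condition  # the attached selection condition
    @evaluations = 0        # how many times the condition was run over the source
  end

  def each(&block)
    @evaluations += 1
    @source.select(&@condition).each(&block)
  end
end

records = [
  { id: 1, taxon: 'Aves' },
  { id: 2, taxon: 'Mammalia' },
  { id: 3, taxon: 'Aves' }
]

birds = LazyDataset.new(records) { |r| r[:taxon] == 'Aves' }
before = birds.evaluations            # nothing has been evaluated yet
bird_ids = birds.map { |r| r[:id] }   # first data access triggers evaluation
```

Constructing `birds` runs nothing; only `map`, which calls `each`, evaluates the condition.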
Creating Datasets
| Method | Returns | Description |
|---|---|---|
| `select_observations(**query)` | `Dataset` | Set of observations by condition |
| `select_taxa(**query)` | `Dataset` | Set of taxa by condition |
| `select_places(*ids)` | `Array<Place>` | Places by ID or slug |
| `select_projects(*args, **query)` | `Dataset` or `Array<Project>` | Projects by condition or ID |
| `select_users(*ids)` | `Array<User>` | Users by ID or login |
| `get_observation(id)` | `Observation \| nil` | Single observation by ID |
| `get_taxon(id)` | `Taxon \| nil` | Single taxon by ID |
| `get_place(id)` | `Place \| nil` | Single place by ID or slug |
| `get_project(id)` | `Project \| nil` | Single project by ID or slug |
| `get_user(id)` | `User \| nil` | Single user by ID or login |
Query Parameters for select_observations
| Parameter | Type | Description |
|---|---|---|
| `taxon` | Taxon, Integer, Set[Taxon], Set[Integer] | Taxon or set of taxa (including descendants) |
| `place` | Place, Integer, String, Set[...] | Place by object, ID or slug |
| `user` | User, Integer, String, Set[...] | User by object, ID or login |
| `project` | Project, Integer, String, Set[...] | Project by object, ID or slug |
| `rank` | Rank, Range[Rank], Set[Rank] | Taxon rank (species, genus, etc.) |
| `quality_grade` | String, Symbol, Set[...] | Quality (research, needs_id, casual) |
| `captive`, `mappable`, `threatened`, `introduced`, etc. | Boolean | Observation flags |
| `observed`, `created` | Time, Date, Range[Time] | Time ranges |
| `latitude`, `longitude` | Float, Range[Float] | Geographic coordinates |
| `accuracy` | Integer, Range[Integer], nil | Location accuracy in meters |
| `license`, `photo_license`, `sound_license` | String, Symbol, Set[...] | Licenses |
| `geoprivacy`, `taxon_geoprivacy` | String, Symbol, Set[...] | Geodata privacy |
| `id`, `observed_year`, `observed_month`, etc. | Integer, Set[Integer], Range[Integer] | Identifiers and time components |
| `iconic_taxa` | String, Symbol, Set[...] | Iconic taxa (Aves, Mammalia, etc.) |
| `identified`, `verifiable`, `licensed`, `photos`, `sounds`, `popular` | Boolean | Special flags |
For the full list, see `lib/inat-get/data/helpers/observations.rb`.
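Parameters accept a plain value, a `Range`, or a `Set`. One way to read these forms, sketched below under the assumption that matching behaves like Ruby's case equality (`===`), is a single helper; `matches?` is hypothetical, and the real library converts these values into API parameters and SQL instead.

```ruby
require 'set'

# Hypothetical helper: does a record satisfy every query parameter?
# Range#=== tests coverage, Set#=== tests membership (Ruby >= 2.5),
# and for plain values === falls back to equality.
def matches?(record, **query)
  query.all? { |field, expected| expected === record[field] }
end

obs = { quality_grade: :research, observed_year: 2023, accuracy: 15 }

ok_year  = matches?(obs, observed_year: 2020..2024)
ok_grade = matches?(obs, quality_grade: Set[:research, :needs_id])
bad_acc  = matches?(obs, accuracy: 0..10)
```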
Query Parameters for select_taxa
| Parameter | Type | Description |
|---|---|---|
| `id` | Integer, Set[Integer] | Taxon ID |
| `parent` | Taxon, Integer | Parent taxon |
| `is_active` | Boolean | Whether taxon is active |
| `rank` | Rank, Range[Rank], Set[Rank] | Rank |
The name Variable
Inside a script, the variable `name` holds the task filename without its extension. It is used for forming output filenames:

```ruby
File.open "#{name}.md", 'w' do |file|
  file.puts "## Report #{name}"
end
```
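Assuming `name` is derived the usual Ruby way from the script path, the computation amounts to the following; `task_name` is a hypothetical helper, not part of the DSL.

```ruby
# Hypothetical: derive the task name from a script path the way `name`
# is described (filename without directory and extension).
def task_name(path)
  File.basename(path, File.extname(path))
end

from_inat = task_name('tasks/01_user_stat.inat')
from_rb   = task_name('share/inat-get/demo/02_underfound.rb')
report    = "#{from_inat}.md" # the output file an example script would write
```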
Dataset Operations
All operations require operand compatibility: both datasets must have the same helper (i.e., belong to the same entity type — observations or taxa). Incompatible operands raise an exception.
| Operator | Semantics | Result |
|---|---|---|
| `ds1 + ds2` | Union | Dataset with OR condition |
| `ds1 * ds2` | Intersection | Dataset with AND condition |
| `ds1 - ds2` | Difference | Dataset with AND-NOT condition |
Algebraic properties:
- `+` is commutative and associative
- `*` is commutative and associative
- `-` is not commutative
- Priority: standard Ruby (`*` binds tighter than `+`)
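A toy condition tree makes the three operators and their precedence concrete. `Cond`, `Leaf`, `Binary`, `Or`, `And`, and `AndNot` are hypothetical classes over plain integers, not iNatGet's condition types; they only show how `+`, `*`, and `-` map to OR, AND, and AND-NOT.

```ruby
# Hypothetical toy condition tree; each operator builds a node instead of
# running a query, mirroring how dataset operators combine conditions.
class Cond
  def +(other)
    Or.new(self, other)      # union: OR
  end

  def *(other)
    And.new(self, other)     # intersection: AND
  end

  def -(other)
    AndNot.new(self, other)  # difference: AND-NOT
  end
end

class Leaf < Cond
  def initialize(&test)
    @test = test
  end

  def match?(record)
    @test.call(record)
  end
end

class Binary < Cond
  def initialize(left, right)
    @left = left
    @right = right
  end
end

class Or < Binary
  def match?(record)
    @left.match?(record) || @right.match?(record)
  end
end

class And < Binary
  def match?(record)
    @left.match?(record) && @right.match?(record)
  end
end

class AndNot < Binary
  def match?(record)
    @left.match?(record) && !@right.match?(record)
  end
end

records = (1..6).to_a
even = Leaf.new { |n| n.even? }  # 2, 4, 6
big  = Leaf.new { |n| n > 3 }    # 4, 5, 6
one  = Leaf.new { |n| n == 1 }   # 1
matching = ->(cond) { records.select { |n| cond.match?(n) } }

union_ab   = matching.(even + big)        # same result in either order
union_ba   = matching.(big + even)
diff_ab    = matching.(even - big)        # order matters for difference
diff_ba    = matching.(big - even)
precedence = matching.(one + even * big)  # parsed as one + (even * big)
```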
Splitting Operation
ds % field — splits a dataset into subsets by field value.
Returns a List — container of datasets, where each dataset has a key (the field value).
```ruby
taxon_birds = get_taxon(3) # Aves

# Split bird observations by users
by_user = select_observations(taxon: taxon_birds) % :user

# Iterate over datasets, access key via ds.key
by_user.each do |ds|
  user = ds.key
  puts "#{user.login}: #{ds.count} observations"
end
```
Splitting works with any model fields:
- Associations (`:user`, `:taxon`, `:place`)
- Time components (`:observed_year`, `:created_month`, etc.)
- Other fields (`:quality_grade`, `:license`, etc.)
Current limitation: splitting materializes the full list of keys on creation. Lazy splitting is planned for the future.
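In plain Ruby terms, splitting resembles `Enumerable#group_by`: each distinct field value becomes a key with its own subset. The sketch below is a hypothetical in-memory analogue over hashes (`split` is an invented name); like the current behavior described above, it computes all keys up front.

```ruby
# Hypothetical in-memory analogue of `ds % field`: one subset per distinct value.
observations = [
  { id: 1, user: 'alice', observed_year: 2023 },
  { id: 2, user: 'bob',   observed_year: 2023 },
  { id: 3, user: 'alice', observed_year: 2024 }
]

def split(records, field)
  # group_by materializes every key immediately, like the current List
  records.group_by { |r| r[field] }
end

by_user = split(observations, :user)
by_user.each do |key, subset|
  puts "#{key}: #{subset.size} observations"
end

by_year = split(observations, :observed_year)
```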
Lists
List — container for a set of datasets with homogeneous keys. Not to be confused with Ruby Array.
List Operations
| Operator | Semantics | Description |
|---|---|---|
| `list + other` | Union | By keys: key present if in any list; datasets for common keys are merged |
| `list * other` | Intersection | By keys: key present if in both lists; datasets for common keys are also merged |
| `list - other` | Difference | By keys: keys from `other` are removed with their datasets |
| `list.to_dataset` | Folding | Merges all list datasets into one via OR |
Homogeneous keys requirement: operations on lists with mismatched key types raise an exception.
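The key-wise semantics can be modeled with a `Hash` from key to records. In this hypothetical sketch, merging two datasets is simplified to a de-duplicated union of their record arrays (the real List combines conditions with OR rather than arrays), and all helper names are invented for illustration.

```ruby
# Hypothetical model: a List as Hash{key => Array of record IDs}.
def merge_datasets(a, b)
  (a + b).uniq # stands in for OR-ing two datasets
end

# Union: key present if it is in either list; common keys are merged.
def list_union(l1, l2)
  l1.merge(l2) { |_key, a, b| merge_datasets(a, b) }
end

# Intersection: key present only if it is in both lists; datasets are merged.
def list_intersection(l1, l2)
  (l1.keys & l2.keys).map { |k| [k, merge_datasets(l1[k], l2[k])] }.to_h
end

# Difference: keys of l2 are removed together with their datasets.
def list_difference(l1, l2)
  l1.reject { |key, _| l2.key?(key) }
end

# Folding: all datasets combined into one via OR.
def to_dataset(list)
  list.values.reduce([]) { |acc, records| merge_datasets(acc, records) }
end

full = { 'Parus major' => [1, 2], 'Pica pica' => [3] }
mine = { 'Parus major' => [2, 4] }

others   = list_difference(full, mine)
both     = list_intersection(full, mine)
combined = list_union(full, mine)
folded   = to_dataset(combined)
```

Here `others` plays the role of `full_list - user_list` in the "Not found by you" report of Example 2.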
List Methods
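| Method | Description |
|---|---|
| `list.keys` | Array of keys |
| `list[key]` | Dataset by key or `nil` |
| `list.count`, `list.size` | Number of keys |
| `list.empty?` | Empty check |
| `list.filter { \|ds\| ... }` | List of datasets for which the block returns a truthy value |
| `list.filter_keys { \|key\| ... }` | List of datasets whose key satisfies the block |
| `list.sort { \|ds\| ... }` | New list sorted by the block's value |
| `list.sort!` | Sort by key (in place); also accepts a block, as used in the examples |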
Materialization and Iteration
A dataset becomes "materialized" on first data access:
| Method | Action |
|---|---|
| `ds.each { \|item\| ... }` | Iterate over records |
| `ds.count` | Count records (SQL-optimized) |
| `ds.first` | First record |
| `ds.to_a` | Array of all records (use with caution on large sets!) |
On materialization:
- `update!` is called to check whether an API data update is needed
- The condition is translated to SQL via Sequel
- The query is executed against the local database
Important: materializing one dataset does not affect others, even if they overlap by condition. Each dataset is independent.
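The following toy model shows per-dataset materialization, assuming the state is a simple flag: `update!` runs on first access, later iterations skip it, and `reset!` clears the flag. Only the names `update!` and `reset!` come from this document; `ToyDataset` and its counter are hypothetical.

```ruby
# Hypothetical model of per-dataset materialization state.
class ToyDataset
  attr_reader :update_calls

  def initialize(records)
    @records = records
    @materialized = false
    @update_calls = 0
  end

  def update!
    @update_calls += 1 # real code would decide here whether to query the API
  end

  def reset!
    @materialized = false
  end

  def each(&block)
    unless @materialized
      update!
      @materialized = true
    end
    @records.each(&block)
  end
end

a = ToyDataset.new([1, 2, 3])
b = ToyDataset.new([2, 3]) # overlaps with a, but keeps its own state

a.each { }                 # first access: update! runs
a.each { }                 # already materialized: no update!
after_two = a.update_calls
a.reset!
a.each { }                 # update! runs again after reset!
```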
Caching and Data Updates
Users don't manage the cache explicitly. The system automatically determines whether to access the iNaturalist API.
Automatic Behavior
- On the first materialization of a dataset, data freshness is checked according to the rules in `caching.md`
- If the data is stale, an API query is executed, taking previously cached queries into account
- Within a single script execution, the cache is considered immutable
Explicit Methods (rarely used)
| Method | Action |
|---|---|
| `ds.update!` | Forces a check of whether a data update is needed. This does not always trigger an API request; one is made only if the data is stale according to the caching rules |
| `ds.reset!` | Resets the dataset's materialization state; the next iteration calls `update!` again |
There is no way to forcefully ignore local cache. For debugging or obtaining "clean" data, use --db-reset or delete the database file.
Edge Cases and Limitations
API-Unsupported Conditions
iNaturalist API has limited filter support. In particular, negations (NOT) are not directly supported.
During condition normalization:
- Negations are reduced as far as possible and pushed down to the leaves
- Remaining negations are ignored when forming API requests
- A superset of the data is therefore loaded, and the final filtering is done in the local database
Example:
```ruby
# All bird observations are loaded, filtered during iteration
select_observations(taxon: taxon_birds) - select_observations(user: current_user)
```
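The normalization described above can be sketched with De Morgan's laws on a small tree: negations are pushed to the leaves, and when the API request is built any remaining `NOT` leaf is treated as "no constraint", so the API returns a superset that is filtered locally. The array-based tree format and both helpers are hypothetical; iNatGet's own normalizer is not described here.

```ruby
# Hypothetical condition trees:
#   [:leaf, hash] | [:and, a, b] | [:or, a, b] | [:not, x]

# Push negations down to the leaves using De Morgan's laws.
def push_not(cond)
  case cond[0]
  when :leaf then cond
  when :and, :or then [cond[0], push_not(cond[1]), push_not(cond[2])]
  when :not
    inner = cond[1]
    case inner[0]
    when :leaf then cond                 # NOT stays on a leaf
    when :not  then push_not(inner[1])   # NOT NOT x => x
    when :and  then [:or,  push_not([:not, inner[1]]), push_not([:not, inner[2]])]
    when :or   then [:and, push_not([:not, inner[1]]), push_not([:not, inner[2]])]
    end
  end
end

# API-side condition: NOT leaves cannot be expressed, so they become :any
# (no constraint). The result is looser than the original, never stricter.
def api_condition(cond)
  case cond[0]
  when :leaf then cond
  when :not  then :any
  when :and
    parts = [api_condition(cond[1]), api_condition(cond[2])] - [:any]
    return :any if parts.empty?
    parts.size == 1 ? parts[0] : [:and, *parts]
  when :or
    parts = [api_condition(cond[1]), api_condition(cond[2])]
    parts.include?(:any) ? :any : [:or, *parts]
  end
end

birds = [:leaf, { taxon: 3 }]
mine  = [:leaf, { user: 'me' }]

# birds - mine is birds AND NOT mine
query = push_not([:and, birds, [:not, mine]])
api   = api_condition(query)
```

For the example above, the API-side condition reduces to the birds leaf alone, which matches the note that all bird observations are loaded.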
Critical Limitation: Unbounded Queries
A dataset with condition equivalent to "everything" (ANYTHING) cannot be materialized — an exception is raised. This protects against attempts to load the entire iNaturalist database.
Invalid constructs:
```ruby
select_observations                                            # empty query
select_observations(taxon: taxon_birds) + select_observations  # OR with anything = anything
```
Valid empty results:
```ruby
select_observations(taxon: taxon_birds) * select_observations(taxon: taxon_mammals)  # NOTHING
```

The result is an empty dataset; no error is raised.
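A hypothetical sketch of how the two special conditions behave: `ANYTHING` absorbs `+`, an empty result (`NOTHING`) comes out of `*` on disjoint sets, and materializing `ANYTHING` raises. Modeling a condition as a `Set` of taxon IDs ignores the taxon hierarchy, and `UnboundedQuery` is an invented error class.

```ruby
require 'set'

# Hypothetical: a condition is ANYTHING, or a Set of allowed taxon IDs
# (an empty Set plays the role of NOTHING).
ANYTHING = :anything

def union(a, b)
  return ANYTHING if a == ANYTHING || b == ANYTHING # OR with anything = anything
  a | b
end

def intersection(a, b)
  return b if a == ANYTHING
  return a if b == ANYTHING
  a & b # disjoint sets give an empty Set (NOTHING)
end

class UnboundedQuery < StandardError; end

def materialize(cond, records)
  raise UnboundedQuery, 'refusing to load everything' if cond == ANYTHING
  records.select { |r| cond.include?(r[:taxon]) }
end

records = [{ taxon: 3 }, { taxon: 40151 }]
birds   = Set[3]
mammals = Set[40151]

empty = materialize(intersection(birds, mammals), records) # NOTHING: no error
error =
  begin
    materialize(union(birds, ANYTHING), records)
    nil
  rescue UnboundedQuery => e
    e
  end
```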
Usage Examples
Example 1: Simple User Report
From share/inat-get/demo/01_user_stat.rb:
```ruby
year = today.year
user = get_user 'shikhalev'

# Get observations
observations = select_observations user: user, observed: time_range(year: year), quality_grade: 'research'
by_taxon = observations % :taxon

File.open "#{name}.md", 'w' do |file|
  file.puts '## Report for user ' + user.login + (user.name ? " (#{user.name})" : '')
  file.puts ''
  by_taxon.each do |ds|
    # ds.key is a Taxon object
    file.puts "+ #{ds.key.common_name} *(#{ds.key.name})* — #{ds.count} obs."
  end
  file.puts ''
  file.puts "Total **#{observations.count}** observations"
end
```
Example 2: List Subtraction
From share/inat-get/demo/02_underfound.rb:
```ruby
user = get_user 'shikhalev'
place = get_place 'artinskiy-gorodskoy-okrug-osm-2023-sv-ru'

all_observations = select_observations place: place, quality_grade: 'research', rank: (.. Rank.complex)
full_list = all_observations % :taxon

user_observations = select_observations place: place, quality_grade: "research", rank: (.. Rank.complex), user: user
user_list = user_observations % :taxon

others_list = full_list - user_list
others_list.sort! { |ds| -ds.count }

File.open "#{name}.md", 'w' do |file|
  file.puts '## Not found by you'
  file.puts ''
  others_list.each do |ds|
    file.puts "+ #{ds.key.common_name} *(#{ds.key.name})* — #{ds.count} obs."
  end
  file.puts ''
  file.puts "Total **#{others_list.count}** taxa."
end
```
Example 3: Time-based Filtering
From share/inat-get/demo/03_newcomers.rb:
```ruby
project = get_project 'bioraznoobrazie-rayonov-sverdlovskoy-oblasti'

month = today.month - 1
year = if month == 0
  month = 12
  today.year - 1
else
  today.year
end
period = time_range year: year, month: month

observations = select_observations project: project, created: period
list = observations % :user
list.filter! { |ds| period === ds.key.created }
list.sort! { |ds| ds.key.created }

File.open "#{name}.md", 'w' do |file|
  file.puts "## Newcomers of project «#{project.title}»"
  file.puts "*#{period.begin.to_date} — #{period.end.to_date - 1}*"
  file.puts ''
  list.each do |ds|
    file.puts "+ #{ds.key.login} (#{ds.key.created.to_date}) — #{ds.count} obs."
  end
  file.puts ''
  file.puts "Total #{list.count} users"
end
```
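The previous-month arithmetic at the top of Example 3 can also be written with the standard library's `Date#prev_month`, which handles the January rollover itself. This is an alternative sketch rather than the demo's code; `previous_month` is a hypothetical helper, and fixed dates stand in for `today` so the result is checkable.

```ruby
require 'date'

# Standard-library alternative to the manual month/year rollover.
def previous_month(date)
  prev = date.prev_month
  [prev.year, prev.month]
end

january = previous_month(Date.new(2024, 1, 15))
june    = previous_month(Date.new(2024, 7, 31))
```

Inside a script this would read `year, month = previous_month(today)`.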
Example 4: Combining Conditions
```ruby
# Birds or mammals, but research grade only
birds = select_observations(taxon: get_taxon(3))        # Aves
mammals = select_observations(taxon: get_taxon(40151))  # Mammalia
research = select_observations(quality_grade: 'research')
target = (birds + mammals) * research
```
Example 5: Splitting and Aggregation
```ruby
# Top-10 users by bird observation count in a project
project = get_project 'some-project'
taxon_birds = get_taxon(3) # Aves
by_user = select_observations(project: project, taxon: taxon_birds) % :user

# Sort by descending count
sorted = by_user.sort { |ds| -ds.count }
sorted.first(10).each do |ds|
  puts "#{ds.key.login}: #{ds.count}"
end
```
Additional DSL Features
Time Utilities
| Method | Description |
|---|---|
| `today` | Current date (`Date.today`) |
| `now` | Current time (`Time.now`) |
| `time_range(...)` | Time range by various parameters |
| `start_time(...)`, `finish_time(...)` | Period start and end |
Supported parameters for time_range: date, century, decade, year, quarter, season (:winter, :spring, :summer, :autumn), month, week, day.
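Example 3 prints `period.end.to_date - 1` as the last day of the period, which suggests `time_range` returns an end-exclusive range. Under that assumption, a month range could look like the sketch below; `month_range` is hypothetical, and the real `time_range` may differ, for instance in time-zone handling.

```ruby
require 'date'

# Hypothetical: an end-exclusive Range of Time covering one calendar month.
def month_range(year, month)
  start  = Time.new(year, month, 1)
  finish = month == 12 ? Time.new(year + 1, 1, 1) : Time.new(year, month + 1, 1)
  start...finish
end

period   = month_range(2024, 2)
inside   = period === Time.new(2024, 2, 29, 12, 0, 0)
outside  = period === Time.new(2024, 3, 1) # the end point is excluded
last_day = period.end.to_date - 1          # same expression as in Example 3
```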
Version
| Method | Description |
|---|---|
| `version` | Gem version (`Gem::Version`) |
| `version_alias` | Version codename |
| `version?(*requirements)` | Check against requirements |
| `version!(*requirements)` | Check against requirements or raise an exception |
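Because `version` returns a `Gem::Version`, checks in the spirit of `version?` and `version!` can be expressed with RubyGems' `Gem::Requirement`. Whether iNatGet implements them this way is an assumption, the helper names are hypothetical, and the version number below is made up.

```ruby
require 'rubygems'

# Hypothetical equivalents of version? / version! built on RubyGems classes.
def version_ok?(current, *requirements)
  Gem::Requirement.new(*requirements).satisfied_by?(Gem::Version.new(current))
end

def version_ok!(current, *requirements)
  return true if version_ok?(current, *requirements)
  raise ArgumentError, "version #{current} does not satisfy #{requirements.join(', ')}"
end

current      = '0.9.3' # invented version number for illustration
compatible   = version_ok?(current, '>= 0.9', '< 1.0')
incompatible = version_ok?(current, '~> 1.0')
```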
Enumerations
`Rank` and `Iconic` are available directly in the DSL:
```ruby
# Taxon ranks
Rank.species  # species
Rank.genus    # genus
Rank.family   # family
# ... etc., see lib/inat-get/data/types/rank.rb

# Iconic taxa
Iconic.Aves      # birds
Iconic.Mammalia  # mammals
Iconic.Plantae   # plants
# ... etc., see lib/inat-get/data/types/iconic.rb
```
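Example 2 filters with `rank: (.. Rank.complex)`, a beginless Ruby range, which requires ranks to be ordered. Below is a sketch of such an ordered type, assuming ranks compare by iNaturalist's numeric rank levels (subspecies 5, species 10, complex 11, genus 20); `ToyRank` is hypothetical and not the class in `lib/inat-get/data/types/rank.rb`.

```ruby
# Hypothetical ordered rank type; levels assumed from iNaturalist's rank_level values.
class ToyRank
  include Comparable

  attr_reader :name, :level

  def initialize(name, level)
    @name = name
    @level = level
  end

  def <=>(other)
    level <=> other.level
  end
end

SUBSPECIES = ToyRank.new(:subspecies, 5)
SPECIES    = ToyRank.new(:species, 10)
COMPLEX    = ToyRank.new(:complex, 11)
GENUS      = ToyRank.new(:genus, 20)

up_to_complex = (..COMPLEX) # beginless range, Ruby >= 2.7
```

With this ordering, `up_to_complex === SPECIES` is true and `up_to_complex === GENUS` is false, which is the reading Example 2 relies on.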