Parser Configuration
Warning
This document contains bot-generated text and has not yet been reviewed by developers!
This guide explains how to configure and customize path parsers in StarryNight to work with your own data organization.
Understanding Path Parsers
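StarryNight uses a grammar-based path parsing system to extract structured metadata from file paths. Because the path rules are defined in a grammar, a different file organization scheme can be supported by supplying a different grammar (and, if needed, a matching transformer).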
How Path Parsing Works
- Grammar Definition: A grammar file (.lark) defines the rules for interpreting file paths
- Transformer: A transformer class converts the parsed structure into usable metadata
- Index Generation: The parsed metadata is stored in a structured index
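The three stages above can be sketched in plain Python. This is an illustration under stated assumptions, not StarryNight's implementation: a regular expression stands in for the Lark grammar, a small function stands in for the transformer, and the toy layout `<plate>/Well<well>_Point<site>.tiff` plus the names `parse_path`, `transform`, and `build_index` are hypothetical.

```python
from __future__ import annotations

import re

# Stage 1 stand-in for the grammar (hypothetical toy layout, not StarryNight's):
#   <plate_id>/Well<well_id>_Point<site_id>.tiff
TOY_PATTERN = re.compile(
    r"^(?P<plate_id>[A-Za-z0-9]+)/Well(?P<well_id>[A-Z]\d+)_Point(?P<site_id>\d+)\.tiff$"
)


def parse_path(path: str) -> dict[str, str] | None:
    """Stage 1: match the path and return the raw captured pieces, or None."""
    match = TOY_PATTERN.match(path)
    return match.groupdict() if match else None


def transform(raw: dict[str, str]) -> dict[str, object]:
    """Stage 2: turn raw strings into usable metadata (here, an int site index)."""
    return {**raw, "site_id": int(raw["site_id"])}


def build_index(paths: list[str]) -> list[dict[str, object]]:
    """Stage 3: keep one metadata row per parseable path, keyed by the path."""
    rows = []
    for path in paths:
        raw = parse_path(path)
        if raw is not None:
            rows.append({"key": path, **transform(raw)})
    return rows


if __name__ == "__main__":
    for row in build_index(["Plate1/WellA01_Point3.tiff", "notes.txt"]):
        print(row)
```

Paths that do not fit the pattern (such as `notes.txt`) are simply skipped here; a real pipeline may prefer to report them instead.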
The Default Parser
StarryNight includes a default parser ("vincent") that expects paths matching this pattern:
[dataset]/Source[source_id]/Batch[batch_id]/images/[plate_id]/[experiment_id]/Well[well_id]_Point[site_id]_[index]_Channel[channels]_Seq[sequence].ome.tiff
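To make the pattern concrete, here is a rough stand-in using Python's `re` module rather than the Lark grammar StarryNight actually uses. It assumes the `Source`, `Batch`, `Well`, `Point`, `Channel`, and `Seq` prefixes are literal exactly as written above, that the well, site, and index fields contain no underscores, and that multiple channel names are comma-separated; `VINCENT_LIKE` and `parse_vincent_like` are hypothetical names.

```python
from __future__ import annotations

import re

# Regex stand-in for the default ("vincent") layout described above.
# Assumptions: the Source/Batch/Well/Point/Channel/Seq prefixes are literal,
# well, site, and index fields contain no underscores, and channel names are
# comma-separated.
VINCENT_LIKE = re.compile(
    r"^/?(?P<dataset>[^/]+)"
    r"/Source(?P<source_id>[^/]+)"
    r"/Batch(?P<batch_id>[^/]+)"
    r"/images"
    r"/(?P<plate_id>[^/]+)"
    r"/(?P<experiment_id>[^/]+)"
    r"/Well(?P<well_id>[^_/]+)"
    r"_Point(?P<site_id>[^_/]+)"
    r"_(?P<index>[^_/]+)"
    r"_Channel(?P<channels>[^/]+?)"
    r"_Seq(?P<sequence>[^/.]+)"
    r"\.ome\.tiff$"
)


def parse_vincent_like(path: str) -> dict[str, object] | None:
    """Return the path's metadata fields, or None if it does not fit the layout."""
    match = VINCENT_LIKE.match(path)
    if match is None:
        return None
    metadata: dict[str, object] = dict(match.groupdict())
    # Split the channel field into a list (comma separator is an assumption).
    metadata["channels"] = match["channels"].split(",")
    return metadata


if __name__ == "__main__":
    example = (
        "dataset1/Source4/Batch2/images/PlateA/ExpA/"
        "WellA01_Point3_0000_ChannelDAPI,PhalloAF750_Seq0001.ome.tiff"
    )
    print(parse_vincent_like(example))
```

The non-greedy `channels` group stops at the first `_Seq`, so channel names may contain characters such as commas or dashes without confusing the sequence field.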
Understanding the Grammar File
The default grammar file (path_parser_vincent.lark) defines the rules for parsing file paths:
start: sep? dataset_id sep source_id sep _root_dir
_root_dir: batch_id sep (_images_root_dir | _illum_root_dir | _images_aligned_root_dir | _workspace_root_dir)
_images_root_dir: "images"i sep plate_id sep _plate_root_dir
...
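Each rule matches one component of the path, such as the dataset ID, batch ID, or plate ID. Rules whose names begin with an underscore (for example _root_dir) are inlined by Lark, so their children appear directly under the parent node instead of forming a separate subtree.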
Customizing the Parser
Option 1: Using CLI Parameters
When generating an index, you can specify a custom parser path:
starrynight index gen -i ./workspace/inventory/inventory.parquet \
-o ./workspace/index/ \
--parser /path/to/custom/parser.lark
Option 2: Creating a Custom Grammar File
To create a custom parser for your own file organization:
- Create a grammar file based on your file organization pattern
- Test your grammar with sample file paths
- Use it when generating the index
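A small harness for the "test your grammar" step might look like the following. It is a sketch, assuming your parser is wrapped in a function that returns a metadata dict and either raises or returns None on failure; `check_samples` and `toy_parse` are hypothetical, and a regular expression stands in for a real Lark grammar so the example runs with the standard library only.

```python
from __future__ import annotations

import re
from typing import Callable, Optional

# Any callable that returns metadata for a path, or None / raises on failure.
# With Lark this might wrap a parser's parse() plus a transformer; here a regex
# stands in so the sketch needs only the standard library.
Parser = Callable[[str], Optional[dict]]


def check_samples(parse: Parser, samples: list[str]) -> dict[str, str]:
    """Return {path: reason} for every sample path the parser cannot handle."""
    failures: dict[str, str] = {}
    for path in samples:
        try:
            result = parse(path)
        except Exception as exc:  # Lark parse errors subclass lark.exceptions.LarkError
            failures[path] = f"{type(exc).__name__}: {exc}"
            continue
        if not result:
            failures[path] = "no match"
    return failures


# Hypothetical stand-in parser for <plate>/<well>_<site>.tiff
_TOY = re.compile(r"^(?P<plate>\w+)/(?P<well>[A-P]\d{2})_(?P<site>\d+)\.tiff$")


def toy_parse(path: str) -> Optional[dict]:
    match = _TOY.match(path)
    return match.groupdict() if match else None


if __name__ == "__main__":
    samples = ["Plate1/A01_1.tiff", "Plate1/A1_1.tiff", "Plate1/A01_1.png"]
    for path, reason in check_samples(toy_parse, samples).items():
        print(f"FAIL {path}: {reason}")
```

Keeping a list of representative paths (including the awkward ones) next to the grammar makes it cheap to re-run this check whenever a rule changes.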
Example: Custom Grammar File
Here's an example grammar file for a different file organization pattern:
// Custom grammar for example_lab file organization
start: sep? project_name sep experiment_name sep plate_id sep _image_file
_image_file: well_id "_" site_id "_" channel "_" cycle_id "." extension
project_name: stringwithdashcommaspace
experiment_name: stringwithdashcommaspace
plate_id: string
well_id: (LETTER | DIGIT)~2
site_id: DIGIT~1..4
channel: stringwithdash
cycle_id: DIGIT~1..2
extension: stringwithdots
string: (LETTER | DIGIT)+
stringwithdash: (string | "-")+
stringwithdashcommaspace: ( string | "-" | "_" | "," | " " )+
stringwithdots: ( string | "." )+
sep: "/"
DIGIT: "0".."9"
%import common.LETTER
This would parse paths like:
MyProject/Experiment-01/Plate1/A1_1_DAPI_1.tiff
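Before wiring the grammar into StarryNight, it can help to pin down which fields each path should yield. The sketch below is a rough regular-expression approximation of the example_lab grammar, not a Lark parser and not how StarryNight parses; `EXAMPLE_LAB` and `parse_example_lab` are hypothetical names.

```python
from __future__ import annotations

import re

# Regex approximation of the example_lab grammar (a stand-in, not Lark):
#   project_name / experiment_name / plate_id / well_site_channel_cycle.extension
EXAMPLE_LAB = re.compile(
    r"^/?"
    r"(?P<project_name>[A-Za-z0-9_, -]+)/"
    r"(?P<experiment_name>[A-Za-z0-9_, -]+)/"
    r"(?P<plate_id>[A-Za-z0-9]+)/"
    r"(?P<well_id>[A-Za-z0-9]{2})_"  # mirrors (LETTER | DIGIT)~2
    r"(?P<site_id>[0-9]{1,4})_"  # mirrors DIGIT~1..4
    r"(?P<channel>[A-Za-z0-9-]+)_"
    r"(?P<cycle_id>[0-9]{1,2})\."  # mirrors DIGIT~1..2
    r"(?P<extension>[A-Za-z0-9.]+)$"
)


def parse_example_lab(path: str) -> dict[str, str] | None:
    """Return the named fields for a matching path, or None if it does not fit."""
    match = EXAMPLE_LAB.match(path)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_example_lab("MyProject/Experiment-01/Plate1/A1_1_DAPI_1.tiff"))
```

A path with a three-character well such as A01 is rejected, mirroring `well_id: (LETTER | DIGIT)~2`; widening that rule (for example to `~2..3`) would be needed for zero-padded well names.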
Advanced: Creating Custom Transformers
For even more customization, you can create your own transformer class:
- Extend the BaseTransformer class
- Override methods for each grammar rule
- Register your transformer with the system
Example:
from starrynight.parsers.common import BaseTransformer


class MyCustomTransformer(BaseTransformer):
    """Custom transformer for my file organization."""

    def __init__(self) -> None:
        super().__init__()
        self.channel_dict: dict[str, list[str]] = {"channel_dict": []}

    # In a Lark-style transformer, each method is named after a grammar rule
    # and receives that rule's (already transformed) children as a list.
    def project_name(self, items: list) -> dict:
        return {"project_name": items[0]}

    def experiment_name(self, items: list) -> dict:
        return {"experiment_name": items[0]}

    # Other methods for each rule in your grammar
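To see how per-rule methods like these combine, here is a dependency-free imitation of the dispatch that Lark's `Transformer` performs: children are transformed first, then the method named after the rule is called with the list of results. `Node`, `MiniTransformer`, `SketchTransformer`, and the merging `start` method are hypothetical illustrations; StarryNight's BaseTransformer may combine results differently.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    """Minimal stand-in for lark.Tree: a rule name plus its children."""

    data: str
    children: list[Any] = field(default_factory=list)


class MiniTransformer:
    """Imitates Lark's bottom-up dispatch: transform children first, then call
    the method named after the rule with the list of transformed children."""

    def transform(self, node: Any) -> Any:
        if not isinstance(node, Node):
            return node  # leaves (token strings) pass through unchanged
        items = [self.transform(child) for child in node.children]
        method = getattr(self, node.data, None)
        # Rules without a method are rebuilt as a Node, roughly like Lark's default.
        return method(items) if method else Node(node.data, items)


class SketchTransformer(MiniTransformer):
    def project_name(self, items: list) -> dict:
        return {"project_name": items[0]}

    def experiment_name(self, items: list) -> dict:
        return {"experiment_name": items[0]}

    def start(self, items: list) -> dict:
        # Hypothetical top-level rule: merge the per-rule dicts into one record.
        metadata: dict[str, Any] = {}
        for item in items:
            if isinstance(item, dict):
                metadata.update(item)
        return metadata


if __name__ == "__main__":
    tree = Node("start", [Node("project_name", ["MyProject"]),
                          Node("experiment_name", ["Experiment-01"])])
    print(SketchTransformer().transform(tree))
```

Returning a small dict from each rule and merging them at the root keeps every rule method independent, which makes it easy to add or remove fields as the grammar evolves.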
Best Practices
When configuring parsers:
- Start Simple: Begin with basic grammar rules and refine them
- Test Thoroughly: Validate your parser with representative file paths
- Handle Edge Cases: Consider special file naming conventions
- Document Your Schema: Document your file organization for reference
Troubleshooting
Common issues with parsers:
- Parsing Errors: Check if your file paths match your grammar rules
- Missing Metadata: Ensure your grammar extracts all needed metadata fields
- Performance Issues: Complex or highly ambiguous grammars can parse noticeably more slowly, which adds up over large inventories
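One way to catch missing metadata early is to scan the parsed records for absent or empty fields before building on top of them. The sketch assumes each parsed path has become a plain dict (for example, rows loaded from the index); `REQUIRED_FIELDS` and `missing_field_report` are hypothetical, and the field names should be replaced with the ones your grammar actually produces.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Mapping

# Hypothetical set of fields downstream steps rely on; adjust to your schema.
REQUIRED_FIELDS = ("plate_id", "well_id", "site_id", "channel")


def missing_field_report(
    records: Iterable[Mapping[str, object]],
    required: Iterable[str] = REQUIRED_FIELDS,
) -> Counter[str]:
    """Count, per required field, how many records lack it or have it empty."""
    required = tuple(required)
    counts: Counter[str] = Counter()
    for record in records:
        for name in required:
            if record.get(name) in (None, ""):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    records = [
        {"plate_id": "Plate1", "well_id": "A01", "site_id": 1, "channel": "DAPI"},
        {"plate_id": "Plate1", "well_id": "", "site_id": 2},
    ]
    print(missing_field_report(records))
```

A non-empty report usually points to a grammar rule that matched but was never turned into a field by the transformer, or to paths that fell through an optional branch.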
Next Steps
After configuring your parser:
- Generate an inventory and index with your custom parser
- Validate the index contains the expected metadata
- Proceed with your StarryNight workflow