Configuration

Extral uses YAML configuration files to define ETL pipelines. This section documents the complete configuration syntax and all available options.

Configuration File Structure

The configuration file is structured in the following main sections:

logging:
  level: info

processing:
  workers: 4

pipelines:
  - name: pipeline1
    source:
      # source configuration
    destination:
      # destination configuration
    tables:  # or files:
      # table/file configurations

Global Configuration

Logging Configuration

Controls the logging behavior of Extral.

logging:
  level: info  # debug, info, warning, error, critical

Options:

  • level (string, default: “info”) - Log level for the application

Processing Configuration

Controls parallel processing behavior.

processing:
  workers: 4  # Number of parallel workers

Options:

  • workers (integer, default: 4) - Number of parallel table processing workers

Pipeline Configuration

Extral supports multiple pipelines in a single configuration file. Each pipeline defines a complete ETL workflow.

Basic Pipeline Structure

pipelines:
  - name: my_pipeline
    source:
      # Source connector configuration
    destination:
      # Destination connector configuration
    workers: 2  # Optional: override global workers setting

Pipeline Options:

  • name (string, required) - Unique name for the pipeline

  • source (object, required) - Source connector configuration

  • destination (object, required) - Destination connector configuration

  • workers (integer, optional) - Override global worker count for this pipeline

Source and Destination Connectors

Database Connectors

MySQL and PostgreSQL connectors share the same configuration structure:

source:  # or destination:
  type: mysql  # or postgresql
  host: localhost
  port: 3306   # 3306 for MySQL, 5432 for PostgreSQL
  user: username
  password: password
  database: database_name
  schema: public      # PostgreSQL only, optional
  charset: utf8mb4    # MySQL only, default: utf8mb4
  tables:
    - name: table1
      # table configuration options

Database Connector Options:

  • type (string, required) - “mysql” or “postgresql”

  • host (string, required) - Database server hostname

  • port (integer, optional) - Database server port (defaults: MySQL=3306, PostgreSQL=5432)

  • user (string, required) - Database username

  • password (string, required) - Database password

  • database (string, required) - Database name

  • schema (string, optional) - Schema name (PostgreSQL only)

  • charset (string, optional) - Character set (MySQL only, default: “utf8mb4”)

  • tables (array, required) - List of table configurations

File Connectors

File connectors support CSV and JSON files from local filesystem or HTTP URLs:

source:  # or destination:
  type: file
  files:
    - name: customers_data
      format: csv  # or json
      file_path: /path/to/customers.csv
      # OR
      http_path: https://example.com/customers.csv
      options:
        delimiter: ","
        quotechar: "\""
        encoding: utf-8
      strategy: replace
      merge_key: id

File Connector Options:

  • type (string, required) - Must be “file”

  • files (array, required) - List of file configurations

File Item Options:

  • name (string, required) - Logical name for the file (like table name)

  • format (string, required) - “csv” or “json”

  • file_path (string) - Local file path (either this or http_path required)

  • http_path (string) - HTTP/HTTPS URL (either this or file_path required)

  • options (object, optional) - Format-specific options

  • strategy (string, optional) - Load strategy: “append”, “replace”, “merge” (default: “replace”)

  • merge_key (string) - Required if strategy is “merge”

  • batch_size (integer, optional) - Number of records to process per batch

Table Configuration

Tables define how individual database tables or files are processed during the ETL operation.

Basic Table Configuration

tables:
  - name: customers
    strategy: merge
    merge_key: id
    batch_size: 1000

Table Options:

  • name (string, required) - Name of the table

  • strategy (string, optional) - Load strategy: “append”, “replace”, “merge” (default: “replace”)

  • merge_key (string) - Primary key field, required if strategy is “merge”

  • batch_size (integer, optional) - Number of records to process per batch

Load Strategies

Append Strategy

Adds new records without modifying existing data:

tables:
  - name: logs
    strategy: append

Replace Strategy

Replaces all data in the destination table:

tables:
  - name: reference_data
    strategy: replace
    replace:
      how: recreate  # or truncate

Replace Options:

  • replace.how (string, optional) - “recreate” (default) drops and recreates the table, “truncate” only deletes records

Merge Strategy

Updates existing records and inserts new ones based on a merge key:

tables:
  - name: customers
    strategy: merge
    merge_key: customer_id

Merge Options:

  • merge_key (string, required) - Field used to identify existing records

Incremental Loading

Incremental loading processes only new or updated records based on a cursor field:

tables:
  - name: customers
    strategy: merge
    merge_key: id
    incremental:
      field: updated_at
      type: datetime
      initial_value: '2022-01-01T00:00:00'

Incremental Options:

  • field (string, required) - Name of the cursor field

  • type (string, required) - Data type: “datetime”, “integer”, “string”

  • initial_value (string, optional) - Starting value for first extraction

Complete Example

Here’s a complete configuration file example:

logging:
  level: info

processing:
  workers: 4

pipelines:
  - name: mysql_to_postgres
    source:
      type: mysql
      host: mysql.example.com
      port: 3306
      user: extractor
      password: secret123
      database: production
      charset: utf8mb4
      tables:
        - name: customers
          batch_size: 100
          strategy: merge
          merge_key: id
          incremental:
            field: updated_on
            type: datetime
            initial_value: '2022-01-01T00:00:00'
        - name: orders
          strategy: append
          batch_size: 500
        - name: product_categories
          strategy: replace
          replace:
            how: truncate

    destination:
      type: postgresql
      host: postgres.example.com
      port: 5432
      user: loader
      password: secret456
      database: warehouse
      schema: public

  - name: csv_to_postgres
    source:
      type: file
      files:
        - name: customer_updates
          format: csv
          file_path: /data/customer_updates.csv
          options:
            delimiter: ","
            quotechar: "\""
            encoding: utf-8
          strategy: merge
          merge_key: customer_id

    destination:
      type: postgresql
      host: postgres.example.com
      port: 5432
      user: loader
      password: secret456
      database: warehouse
      schema: staging

Legacy Configuration Format

Extral also supports a legacy single-pipeline configuration format for backward compatibility:

# Legacy format - automatically converted to pipeline format internally
source:
  type: mysql
  # ... source configuration

destination:
  type: postgresql
  # ... destination configuration

tables:
  # ... table configurations

This format is internally converted to the new pipeline format with a default pipeline name.