AWS S3 Vectors
Since Camel 4.17
Both producer and consumer are supported
The AWS S3 Vectors component stores and queries vector embeddings using Amazon S3 Vectors.
Prerequisites
You need an AWS account with access to S3 Vectors. See the Amazon S3 Vectors documentation for details.
URI Format
aws2-s3-vectors://vectorBucketName[?options]
You can append query options to the URI:
?option1=value&option2=value&…
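As a minimal sketch of the URI format above, the following plain-Java helper assembles an endpoint URI from a bucket name and a map of options. The class and method names are hypothetical and not part of Camel; the option names `operation` and `vectorIndexName` are taken from the examples later on this page.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Illustrative helper (not part of Camel) that assembles an endpoint URI string. */
final class VectorsUriExample {

    /** Builds aws2-s3-vectors://vectorBucketName[?options] from the given parts. */
    static String endpointUri(String vectorBucketName, Map<String, String> options) {
        String base = "aws2-s3-vectors://" + vectorBucketName;
        if (options.isEmpty()) {
            return base;
        }
        // Options become key=value pairs joined by '&' after a single '?'.
        // Values are appended as-is; this sketch does no URL encoding.
        StringJoiner query = new StringJoiner("&", "?", "");
        options.forEach((name, value) -> query.add(name + "=" + value));
        return base + query;
    }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("operation", "queryVectors");
        options.put("vectorIndexName", "my-index");
        System.out.println(endpointUri("my-bucket", options));
    }
}
```

In a real route you would normally write the URI string directly or use the Endpoint DSL; the helper only makes the structure of the URI explicit.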
Configuring Options
Camel components are configured on two separate levels:
- component level
- endpoint level
Configuring Component Options
At the component level, you set general and shared configurations that are then inherited by the endpoints. It is the highest configuration level.
For example, a component may have security settings, credentials for authentication, URLs for network connections, and so forth.
Some components have only a few options, and others may have many. Because components typically have pre-configured defaults that are commonly used, you often need to configure only a few options on a component, or none at all.
You can configure components using:
- the Component DSL.
- in a configuration file (application.properties, *.yaml files, etc).
- directly in the Java code.
Configuring Endpoint Options
You usually spend more time setting up endpoints because they have many options. These options help you customize what you want the endpoint to do. The options are also categorized into whether the endpoint is used as a consumer (from), as a producer (to), or both.
Configuring endpoints is most often done directly in the endpoint URI as path and query parameters. You can also use the Endpoint DSL and DataFormat DSL as a type-safe way of configuring endpoints and data formats in Java.
A good practice when configuring options is to use Property Placeholders.
Property placeholders provide a few benefits:
- They help prevent using hardcoded URLs, port numbers, sensitive information, and other settings.
- They allow externalizing the configuration from the code.
- They help the code to become more flexible and reusable.
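To make the idea concrete, here is a hand-rolled sketch of `{{key}}` substitution, the token syntax Camel uses for property placeholders. It is not Camel's own implementation, which resolves placeholders for you at runtime; the class name and the property keys `vectors.bucket` and `vectors.index` are hypothetical.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hand-rolled illustration of {{key}} substitution; not Camel's implementation. */
final class PlaceholderExample {

    // Matches {{some.key}} and captures the key between the braces.
    private static final Pattern TOKEN = Pattern.compile("\\{\\{([^}]+)\\}\\}");

    /** Replaces every {{key}} token in the template with its value from props. */
    static String resolve(String template, Properties props) {
        Matcher matcher = TOKEN.matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String key = matcher.group(1).trim();
            String value = props.getProperty(key);
            if (value == null) {
                throw new IllegalArgumentException("Missing property: " + key);
            }
            // quoteReplacement keeps '$' and '\' in values from being treated specially.
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("vectors.bucket", "my-bucket");
        props.setProperty("vectors.index", "my-index");
        String uri = "aws2-s3-vectors://{{vectors.bucket}}"
                + "?operation=putVectors&vectorIndexName={{vectors.index}}";
        System.out.println(resolve(uri, props));
    }
}
```

The point of the sketch is that the URI in the code stays the same while the bucket and index names live in external configuration.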
The following two sections list all the options, first for the component and then for the endpoint.
Component Options
The AWS S3 Vectors component supports 33 options, which are listed below.
| Name | Description | Default | Type |
|---|---|---|---|
| | The component configuration. | | AWS2S3VectorsConfiguration |
| | The data type of the vector. Options: float32, float16. | float32 | String |
| | The distance metric to use for similarity search. Options: cosine, euclidean, dot-product. | cosine | String |
| | Set the need for overriding the endpoint. This option needs to be used in combination with the uriEndpointOverride option. | false | boolean |
| | The region in which the S3 Vectors client needs to work. When using this parameter, the configuration expects the lowercase name of the region (for example, ap-east-1). | | String |
| | The minimum similarity threshold for results. | | Float |
| | The number of top similar vectors to return in a query. | 10 | Integer |
| | Set the overriding URI endpoint. This option needs to be used in combination with the overrideEndpoint option. | | String |
| | The dimensions of the vector embeddings (default: 1536, which is the dimension for OpenAI text-embedding-3-small). | 1536 | Integer |
| | The name of the vector index. | | String |
| | Allows for bridging the consumer to the Camel routing Error Handler, which means any exceptions (if possible) that occur while the Camel consumer is trying to pick up incoming messages, or the like, will now be processed as a message and handled by the routing Error Handler. Important: This is only possible if the 3rd party component allows Camel to be alerted if an exception was thrown. Some components handle this internally only, and therefore bridgeErrorHandler is not possible. In other situations we may improve the Camel component to hook into the 3rd party component and make this possible for future releases. By default the consumer will use the org.apache.camel.spi.ExceptionHandler to deal with exceptions, which will be logged at WARN or ERROR level and ignored. | false | boolean |
| | Optional metadata filter for the consumer to filter vectors during polling. | | String |
| | The query vector to use for the consumer to poll for similar vectors. Specified as comma-separated float values (e.g., 0.1,0.2,0.3). If not specified, the consumer will not poll. | | String |
| | Milliseconds before the next poll for the consumer. | 500 | long |
| | Delete vectors after they have been consumed. | false | boolean |
| | The maximum number of messages to consume per poll for the consumer. | 10 | int |
| | Whether the producer should be started lazily (on the first message). Starting lazily allows CamelContext and routes to start up in situations where a producer may otherwise fail during starting and cause the route to fail to start. By deferring this startup, the startup failure can be handled during message routing via Camel's routing error handlers. Beware that when the first message is processed, creating and starting the producer may take a little time and prolong the total processing time. | false | boolean |
| | The operation to perform. | | AWS2S3VectorsOperations |
| | Whether autowiring is enabled. This is used for automatic autowiring options (the option must be marked as autowired) by looking up in the registry to find if there is a single instance of matching type, which then gets configured on the component. This can be used for automatically configuring JDBC data sources, JMS connection factories, AWS Clients, etc. | true | boolean |
| | Autowired. Reference to a software.amazon.awssdk.services.s3vectors.S3VectorsClient in the registry. | | S3VectorsClient |
| | Used for enabling or disabling all consumer based health checks from this component. | true | boolean |
| | Used for enabling or disabling all producer based health checks from this component. Notice: Camel has by default disabled all producer based health checks. You can turn on producer checks globally by setting camel.health.producersEnabled=true. | true | boolean |
| | To define a proxy host when instantiating the S3 Vectors client. | | String |
| | To define a proxy port when instantiating the S3 Vectors client. | | Integer |
| | To define a proxy protocol when instantiating the S3 Vectors client. | HTTPS | Protocol |
| | Amazon AWS Access Key. | | String |
| | If using a profile credentials provider, this parameter will set the profile name. | | String |
| | Amazon AWS Secret Key. | | String |
| | Amazon AWS Session Token used when the user needs to assume an IAM role. | | String |
| | Whether to trust all certificates when overriding the endpoint. | false | boolean |
| | Set whether the S3 Vectors client should expect to load credentials through a default credentials provider. | false | boolean |
| | Set whether the S3 Vectors client should expect to load credentials through a profile credentials provider. | false | boolean |
| | Set whether the S3 Vectors client should expect to use Session Credentials. This is useful in a situation in which the user needs to assume an IAM role for doing operations in S3 Vectors. | false | boolean |
Endpoint Options
The AWS S3 Vectors endpoint is configured using URI syntax:
aws2-s3-vectors://vectorBucketName
With the following path and query parameters:
Query Parameters (46 parameters)
| Name | Description | Default | Type |
|---|---|---|---|
| | The data type of the vector. Options: float32, float16. | float32 | String |
| | The distance metric to use for similarity search. Options: cosine, euclidean, dot-product. | cosine | String |
| | Set the need for overriding the endpoint. This option needs to be used in combination with the uriEndpointOverride option. | false | boolean |
| | The region in which the S3 Vectors client needs to work. When using this parameter, the configuration expects the lowercase name of the region (for example, ap-east-1). | | String |
| | The minimum similarity threshold for results. | | Float |
| | The number of top similar vectors to return in a query. | 10 | Integer |
| | Set the overriding URI endpoint. This option needs to be used in combination with the overrideEndpoint option. | | String |
| | The dimensions of the vector embeddings (default: 1536, which is the dimension for OpenAI text-embedding-3-small). | 1536 | Integer |
| | The name of the vector index. | | String |
| | Optional metadata filter for the consumer to filter vectors during polling. | | String |
| | The query vector to use for the consumer to poll for similar vectors. Specified as comma-separated float values (e.g., 0.1,0.2,0.3). If not specified, the consumer will not poll. | | String |
| | Milliseconds before the next poll for the consumer. | 500 | long |
| | Delete vectors after they have been consumed. | false | boolean |
| | The maximum number of messages to consume per poll for the consumer. | 10 | int |
| | If the polling consumer did not poll any messages, you can enable this option to send an empty message (no body) instead. | false | boolean |
| | Allows for bridging the consumer to the Camel routing Error Handler, which means any exceptions (if possible) that occur while the Camel consumer is trying to pick up incoming messages, or the like, will now be processed as a message and handled by the routing Error Handler. Important: This is only possible if the 3rd party component allows Camel to be alerted if an exception was thrown. Some components handle this internally only, and therefore bridgeErrorHandler is not possible. In other situations we may improve the Camel component to hook into the 3rd party component and make this possible for future releases. By default the consumer will use the org.apache.camel.spi.ExceptionHandler to deal with exceptions, which will be logged at WARN or ERROR level and ignored. | false | boolean |
| | To let the consumer use a custom ExceptionHandler. Notice that if the option bridgeErrorHandler is enabled, then this option is not in use. By default the consumer will deal with exceptions, which will be logged at WARN or ERROR level and ignored. | | ExceptionHandler |
| | Sets the exchange pattern when the consumer creates an exchange. | | ExchangePattern |
| | A pluggable org.apache.camel.spi.PollingConsumerPollStrategy allowing you to provide a custom implementation to control error handling for errors that usually occur during the poll operation, before an Exchange has been created and routed in Camel. | | PollingConsumerPollStrategy |
| | The operation to perform. | | AWS2S3VectorsOperations |
| | Whether the producer should be started lazily (on the first message). Starting lazily allows CamelContext and routes to start up in situations where a producer may otherwise fail during starting and cause the route to fail to start. By deferring this startup, the startup failure can be handled during message routing via Camel's routing error handlers. Beware that when the first message is processed, creating and starting the producer may take a little time and prolong the total processing time. | false | boolean |
| | Autowired. Reference to a software.amazon.awssdk.services.s3vectors.S3VectorsClient in the registry. | | S3VectorsClient |
| | To define a proxy host when instantiating the S3 Vectors client. | | String |
| | To define a proxy port when instantiating the S3 Vectors client. | | Integer |
| | To define a proxy protocol when instantiating the S3 Vectors client. | HTTPS | Protocol |
| | The number of subsequent error polls (failed due to some error) that should happen before the backoffMultiplier should kick in. | | int |
| | The number of subsequent idle polls that should happen before the backoffMultiplier should kick in. | | int |
| | To let the scheduled polling consumer back off if there has been a number of subsequent idles/errors in a row. The multiplier is then the number of polls that will be skipped before the next actual attempt happens again. When this option is in use, backoffIdleThreshold and/or backoffErrorThreshold must also be configured. | | int |
| | If greedy is enabled, then the ScheduledPollConsumer will run immediately again if the previous run polled 1 or more messages. | false | boolean |
| | Milliseconds before the first poll starts. | 1000 | long |
| | Specifies a maximum limit on the number of fires. If you set it to 1, the scheduler will only fire once. If you set it to 5, it will only fire five times. A value of zero or negative means fire forever. | 0 | long |
| | The consumer logs a start/complete log line when it polls. This option allows you to configure the logging level for that. | TRACE | LoggingLevel |
| | Allows for configuring a custom/shared thread pool to use for the consumer. By default each consumer has its own single-threaded thread pool. | | ScheduledExecutorService |
| | To use a cron scheduler from either the camel-spring or camel-quartz component. Use the value spring or quartz for the built-in scheduler. | none | Object |
| | To configure additional properties when using a custom scheduler or any of the Quartz or Spring based schedulers. This is a multi-value option with prefix: scheduler. | | Map |
| | Whether the scheduler should be auto started. | true | boolean |
| | Time unit for the initialDelay and delay options. | MILLISECONDS | TimeUnit |
| | Controls if fixed delay or fixed rate is used. See ScheduledExecutorService in the JDK for details. | true | boolean |
| | Amazon AWS Access Key. | | String |
| | If using a profile credentials provider, this parameter will set the profile name. | | String |
| | Amazon AWS Secret Key. | | String |
| | Amazon AWS Session Token used when the user needs to assume an IAM role. | | String |
| | Whether to trust all certificates when overriding the endpoint. | false | boolean |
| | Set whether the S3 Vectors client should expect to load credentials through a default credentials provider. | false | boolean |
| | Set whether the S3 Vectors client should expect to load credentials through a profile credentials provider. | false | boolean |
| | Set whether the S3 Vectors client should expect to use Session Credentials. This is useful in a situation in which the user needs to assume an IAM role for doing operations in S3 Vectors. | false | boolean |
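
The consumer query vector is given as comma-separated float values (for example, 0.1,0.2,0.3). The following plain-Java sketch shows one way to turn such a string into a list of floats and check it against an expected number of dimensions. How the component parses the value internally is not documented here, so treat this as an illustration of the format only; the class name and the dimension check are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative parser for a comma-separated query vector; not the component's own code. */
final class QueryVectorExample {

    /** Parses "0.1,0.2,0.3" into a list of floats and verifies its length. */
    static List<Float> parse(String csv, int expectedDimensions) {
        List<Float> vector = new ArrayList<>();
        for (String part : csv.split(",")) {
            // parseFloat throws NumberFormatException on malformed input.
            vector.add(Float.parseFloat(part.trim()));
        }
        // Assumed sanity check: a query vector should match the index dimensions.
        if (vector.size() != expectedDimensions) {
            throw new IllegalArgumentException(
                    "Expected " + expectedDimensions + " dimensions but got " + vector.size());
        }
        return vector;
    }

    public static void main(String[] args) {
        System.out.println(parse("0.1,0.2,0.3", 3));
    }
}
```

Checking the length up front gives a clearer error than a failed query when the vector does not match the configured dimensions (1536 by default).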
Message Headers
The AWS S3 Vectors component supports 17 message headers, which are listed below:
| Name | Description | Default | Type |
|---|---|---|---|
| CamelAwsS3VectorsOperation (common) | The operation to perform. | | String |
| CamelAwsS3VectorsVectorBucketName (common) | The name of the vector bucket which will be used for the current operation. | | String |
| CamelAwsS3VectorsVectorIndexName (common) | The name of the vector index which will be used for the current operation. | | String |
| CamelAwsS3VectorsVectorId (producer) | The unique identifier for a vector. | | String |
| CamelAwsS3VectorsVectorData (producer) | The vector embedding data as a list of floats or float array. | | List or float[] |
| CamelAwsS3VectorsVectorDimensions (producer) | The dimensions of the vector. | | Integer |
| CamelAwsS3VectorsDataType (producer) | The data type of the vector (float32 or float16). | | String |
| CamelAwsS3VectorsVectorMetadata (producer) | Additional metadata for the vector as a map. | | Map |
| CamelAwsS3VectorsQueryVector (producer) | The query vector for similarity search as a list of floats or float array. | | List or float[] |
| CamelAwsS3VectorsTopK (producer) | The number of top similar vectors to return. | | Integer |
| CamelAwsS3VectorsDistanceMetric (producer) | The distance metric to use for similarity search (cosine, euclidean, dot-product). | | String |
| CamelAwsS3VectorsSimilarityThreshold (producer) | The minimum similarity threshold for results. | | Float |
| CamelAwsS3VectorsMetadataFilter (producer) | Optional filter expression for metadata filtering during vector search. | | String |
| CamelAwsS3VectorsSimilarityScore (consumer) | The similarity score of the returned vector. | | Float |
| CamelAwsS3VectorsResultCount (consumer) | The number of vectors returned in the result. | | Integer |
| CamelAwsS3VectorsIndexStatus (consumer) | The status of the vector index. | | String |
| CamelAwsS3VectorsVectorBucketArn (consumer) | The ARN of the vector bucket. | | String |
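
Several headers above refer to a distance metric and a similarity score. As background, here is a minimal sketch of textbook cosine similarity between two vectors in plain Java. It is only meant to show what the cosine metric measures; how S3 Vectors or this component actually computes, scales, or reports scores is not specified on this page, and the class name is hypothetical.

```java
/** Textbook cosine similarity; an illustration, not the service's scoring code. */
final class CosineSimilarityExample {

    /** Returns dot(a, b) / (|a| * |b|), a value between -1 and 1. */
    static double cosineSimilarity(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimensions");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            double x = a[i];
            double y = b[i];
            dot += x * y;
            normA += x * x;
            normB += y * y;
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("Cosine similarity is undefined for a zero vector");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        float[] stored = {0.1f, 0.2f, 0.3f};
        float[] query = {0.15f, 0.25f, 0.35f};
        System.out.println(cosineSimilarity(stored, query));
    }
}
```

Vectors pointing in the same direction score close to 1 and orthogonal vectors score 0, which is the intuition behind filtering results with a minimum similarity threshold.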
Usage
Producer Operations
- putVectors - Insert vectors
- queryVectors - Search similar vectors
- deleteVectors - Delete vectors
- getVectors - Retrieve vectors by ID
- createVectorBucket - Create bucket
- deleteVectorBucket - Delete bucket
- listVectorBuckets - List buckets
- describeVectorBucket - Get bucket info
- createVectorIndex - Create index
- deleteVectorIndex - Delete index
- listVectorIndexes - List indexes
- describeVectorIndex - Get index info
Examples
Insert Vectors
Java

from("direct:insert")
    .setHeader(AWS2S3VectorsConstants.VECTOR_ID, constant("doc-001"))
    .setBody(constant(Arrays.asList(0.1f, 0.2f, 0.3f)))
    .to("aws2-s3-vectors://my-bucket?operation=putVectors&vectorIndexName=my-index");

YAML

- from:
    uri: direct:insert
    steps:
      - setHeader:
          name: CamelAwsS3VectorsVectorId
          constant: doc-001
      - setBody:
          constant:
            - 0.1
            - 0.2
            - 0.3
      - to:
          uri: aws2-s3-vectors://my-bucket
          parameters:
            operation: putVectors
            vectorIndexName: my-index

Query Similar Vectors
Java

from("direct:search")
    .setBody(constant(Arrays.asList(0.15f, 0.25f, 0.35f)))
    .setHeader(AWS2S3VectorsConstants.TOP_K, constant(5))
    .to("aws2-s3-vectors://my-bucket?operation=queryVectors&vectorIndexName=my-index")
    .log("Found ${body.size} similar vectors");

YAML

- from:
    uri: direct:search
    steps:
      - setBody:
          constant:
            - 0.15
            - 0.25
            - 0.35
      - setHeader:
          name: CamelAwsS3VectorsTopK
          constant: 5
      - to:
          uri: aws2-s3-vectors://my-bucket
          parameters:
            operation: queryVectors
            vectorIndexName: my-index
      - log: "Found ${body.size} similar vectors"

Consumer Polling
Java

from("aws2-s3-vectors://my-bucket?"
        + "vectorIndexName=my-index"
        + "&consumerQueryVector=0.1,0.2,0.3"
        + "&delay=5000"
        + "&maxMessagesPerPoll=10")
    .log("Vector ID: ${header.CamelAwsS3VectorsVectorId}")
    .to("direct:process");

YAML

- from:
    uri: aws2-s3-vectors://my-bucket
    parameters:
      vectorIndexName: my-index
      consumerQueryVector: "0.1,0.2,0.3"
      delay: 5000
      maxMessagesPerPoll: 10
    steps:
      - log: "Vector ID: ${header.CamelAwsS3VectorsVectorId}"
      - to: direct:process

Create Index
Java

from("direct:createIndex")
    .setHeader(AWS2S3VectorsConstants.VECTOR_DIMENSIONS, constant(1536))
    .setHeader(AWS2S3VectorsConstants.DATA_TYPE, constant("float32"))
    .setHeader(AWS2S3VectorsConstants.DISTANCE_METRIC, constant("cosine"))
    .to("aws2-s3-vectors://my-bucket?operation=createVectorIndex&vectorIndexName=my-index");

YAML

- from:
    uri: direct:createIndex
    steps:
      - setHeader:
          name: CamelAwsS3VectorsVectorDimensions
          constant: 1536
      - setHeader:
          name: CamelAwsS3VectorsDataType
          constant: float32
      - setHeader:
          name: CamelAwsS3VectorsDistanceMetric
          constant: cosine
      - to:
          uri: aws2-s3-vectors://my-bucket
          parameters:
            operation: createVectorIndex
            vectorIndexName: my-index

Dependencies
Maven users will need to add the following dependency to their pom.xml.
pom.xml
<dependency>
    <groupId>org.apache.camel</groupId>
    <artifactId>camel-aws2-s3-vectors</artifactId>
    <version>x.x.x</version>
    <!-- use the same version as your Camel core version -->
</dependency>

where x.x.x is the version number of the latest Camel release.
Spring Boot Auto-Configuration
When using aws2-s3-vectors with Spring Boot, make sure to use the following Maven dependency to have support for auto-configuration:

<dependency>
    <groupId>org.apache.camel.springboot</groupId>
    <artifactId>camel-aws2-s3-vectors-starter</artifactId>
    <version>x.x.x</version>
    <!-- use the same version as your Camel core version -->
</dependency>

The component supports 34 options, which are listed below.