sumologic / elasticsearch-client
Elasticsearch Client for Scala that operates against the REST Endpoint
License: Apache License 2.0
Currently, we have one API for BulkOperation and we construct different requests based on the operation type. The API gets awkward when we need to supply different configurations for different bulk operations, e.g. retryOnConflictOpt and upsertOpt for update operations.
When breaking changes are allowed, we should refactor BulkOperation into a trait and break the implementations down into BulkUpdateOperation, BulkCreateOperation, BulkDeleteOperation, etc.
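A minimal sketch of what that split could look like. Every name and field below is hypothetical and only mirrors the names mentioned above; the `_retry_on_conflict` key is an assumption about what the action line would carry, not the client's current behaviour.

```scala
object BulkOperationSketch {
  // Hypothetical stand-in for the client's document type.
  final case class Document(id: String, data: Map[String, Any])

  // One trait, one case class per operation type.
  sealed trait BulkOperation {
    def document: Document
    def operationName: String
  }

  final case class BulkCreateOperation(document: Document) extends BulkOperation {
    val operationName = "create"
  }

  // Update-only options live on the update operation instead of the shared API.
  final case class BulkUpdateOperation(
      document: Document,
      retryOnConflictOpt: Option[Int] = None,
      upsertOpt: Option[Document] = None
  ) extends BulkOperation {
    val operationName = "update"
  }

  final case class BulkDeleteOperation(document: Document) extends BulkOperation {
    val operationName = "delete"
  }

  // Builds the action metadata for one operation as a plain map.
  // The "_retry_on_conflict" key name is an assumption for illustration.
  def actionMetadata(op: BulkOperation): Map[String, Any] = {
    val base = Map[String, Any]("_id" -> op.document.id)
    val extras: Map[String, Any] = op match {
      case BulkUpdateOperation(_, Some(retry), _) => Map("_retry_on_conflict" -> retry)
      case _                                      => Map.empty
    }
    Map(op.operationName -> (base ++ extras))
  }
}
```

With this shape, options that only make sense for updates can no longer be passed to a create or delete by accident.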
In the readme under "Install/Download," it says you download the library as:
<dependency>
<groupId>com.sumologic.elasticsearch</groupId>
<artifactId>elasticsearch-core</artifactId>
<version>6.0.0</version>
</dependency>
However, according to Maven the most recent version is 3.0.1. Will you either update the readme, publish 6.0.0 to Maven, or tell us where 6.0.0 has been published?
new test:
"Support deleting a doc that doesn't exist" in {
val delFut = restClient.deleteDocument(index, tpe, new QueryRoot(TermQuery("text7", "here7")))
Await.result(delFut, 10.seconds) // May not need Await?
}
Failure mode:
- should Support deleting a doc that doesn't exist *** FAILED ***
com.sumologic.elasticsearch.restlastic.RestlasticSearchClient$ReturnTypes$ElasticErrorResponse: ElasticsearchError(status=400): JString({"error":{"root_cause":[{"type":"parse_exception","reason":"Failed to derive xcontent"}],"type":"parse_exception","reason":"Failed to derive xcontent"},"status":400})
at com.sumologic.elasticsearch.restlastic.RestlasticSearchClient$$anonfun$runRawEsRequest$1.apply(RestlasticSearchClient.scala:254)
at com.sumologic.elasticsearch.restlastic.RestlasticSearchClient$$anonfun$runRawEsRequest$1.apply(RestlasticSearchClient.scala:249)
at scala.util.Success$$anonfun$map$1.apply(Try.scala:237)
at scala.util.Try$.apply(Try.scala:192)
at scala.util.Success.map(Try.scala:237)
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
...
My best guess is it's this bit:
val documents = Await.result(query(index, tpe, deleteQuery, rawJsonStr = false), 10.seconds).rawSearchResponse.hits.hits.map(_._id)
bulkDelete(index, tpe, documents.map(Document(_, Map()))).map(res => RawJsonResponse(res.toString))
Looks like we fetch the matching documents (0 matches) and then delete them. Some googling (elastic/elasticsearch#8595 (comment)) tells me that the "Failed to derive xcontent" error occurs when the body is empty, so I think that's the issue here.
I'd correct it myself, but I'm not sure what to do when documents is empty, since the return type is Future[RawJsonResponse].
I think this was a regression introduced in #126 btw. cc @CCheSumo
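One possible way to handle the empty case, as a sketch only: skip the bulk call when nothing matched and return a synthetic response. The types below are stand-ins for the client's own, and the JSON returned for the empty case is an assumption; the maintainers may prefer a different value.

```scala
import scala.concurrent.{ExecutionContext, Future}

object EmptyDeleteSketch {
  // Stand-in for the client's RawJsonResponse.
  final case class RawJsonResponse(jsonStr: String)

  // Assumed placeholder body for "nothing to delete"; not taken from the library.
  val emptyBulkResponse: String = """{"took":0,"errors":false,"items":[]}"""

  // `bulkDelete` stands in for the real bulk call, which fails on an empty body.
  def deleteMatching(ids: Seq[String], bulkDelete: Seq[String] => Future[String])
                    (implicit ec: ExecutionContext): Future[RawJsonResponse] =
    if (ids.isEmpty) Future.successful(RawJsonResponse(emptyBulkResponse)) // never send an empty bulk body
    else bulkDelete(ids).map(RawJsonResponse(_))
}
```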
Delete the old version, rename the new version to the old version.
It is surprising when your passing test starts to fail when run with all tests in the file.
In RestlasticSearchClientTest the tests share state in ES and are isolated by the use of PhrasePrefixQuery on a field. It is not obvious when first working in the file.
A typical way of isolating test data in a database is to clear the data in some way before each new test run. I tried using OneInstancePerTest, which would provide a clean ES for each test, but it caused a few tests to fail. Those tests can't be run on their own either. It appears that they depend on the side effects of other tests to pass.
It seems worthwhile to make writing tests as easy as possible with few surprises in order to encourage outside contributors to write tests.
Hi,
We have an issue with the AWS credentials.
We are instantiating the service like in the README.md:
private val signer = new AwsRequestSigner(awsCredentials, "us-east-1", "es")
private val endpoint = new StaticEndpoint(Endpoint(elasticSearch.getHost, elasticSearch.getPort))
private val restClient = new RestlasticSearchClient(endpoint, Some(signer))
We are using InstanceProfileCredentialsProvider, the credentials provider implementation that loads credentials from the Amazon EC2 Instance Metadata Service.
After a few hours we get the following error message:
{"message":"The security token included in the request is expired"}
We should refresh the temporary credentials but it seems it's not possible with the current implementation of the client.
Do you have any advice on handling expiring tokens?
One option could be to create a new instance of RestlasticSearchClient for each request, but this would instantiate a new ActorSystem() each time, which is a really heavy operation.
Furthermore, I do not think instantiating an actor system in the client is a good thing. As the Akka docs say: An ActorSystem is a heavyweight structure that will allocate 1…N Threads, so create one per logical application.
One improvement could be to remove the creation of the ActorSystem from this class and pass it in as a dependency, like indexExecutionCtx: ExecutionContext or searchExecutionCtx: ExecutionContext, so that we can have only one ActorSystem per application.
What do you think?
Some resources:
Thanks,
Arnaud.
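To illustrate the refresh idea, here is a sketch in which the signer asks a provider for credentials on every request instead of capturing them once at construction. All types are hypothetical stand-ins (not the AWS SDK or this client's classes), and the actual SigV4 computation is left out; only the point where credentials are read is shown.

```scala
object RefreshingSignerSketch {
  // Hypothetical stand-ins; not the AWS SDK or the client's own types.
  final case class Credentials(accessKey: String, secretKey: String, sessionToken: Option[String])
  trait CredentialsProvider { def current(): Credentials }
  final case class Request(headers: Map[String, String])

  // Reads credentials at signing time, so rotated temporary tokens are picked up.
  final class ProviderBackedSigner(provider: CredentialsProvider) {
    def withSessionToken(request: Request): Request = {
      val creds = provider.current() // fetched per request, never cached here
      // The SigV4 signature itself is omitted from this sketch.
      val tokenHeader = creds.sessionToken.map("X-Amz-Security-Token" -> _)
      request.copy(headers = request.headers ++ tokenHeader)
    }
  }
}
```

The client instance (and its ActorSystem) can then live for the whole application while the credentials behind it rotate.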
[ERROR] Failed to execute goal on project elasticsearch-core: Could not resolve dependencies for project com.sumologic.elasticsearch:elasticsearch-core:jar:1.0.12-SNAPSHOT: Could not find artifact org.json4s:json4s-native_2.11:jar:3.4.0-SUMO in central (http://repo.maven.apache.org/maven2) -> [Help 1]
https://travis-ci.org/SumoLogic/elasticsearch-client/builds/124351424 cc @jakozaur
I am using ES 2.3 with sumologic 1.0.29, and we are migrating to ES5. Is there a tentative date for ES5/ES6 support? Any chance of a release this month?
Hi,
Is there an example of how to make a request with scroll batches? When I try to retrieve all indexed data, I receive this error:
Exception in thread "main" org.scalatest.exceptions.TestFailedException: The future returned an exception of type: com.sumologic.elasticsearch.restlastic.RestlasticSearchClient$ReturnTypes$ElasticErrorResponse, with message: ElasticsearchError(status=500): JString({"error":{"root_cause":[{"type":"query_phase_execution_exception","reason":"Batch size is too large, size must be less than or equal to: [10000] but was [10029]. Scroll batch sizes cost as much memory as result windows so they are controlled by the [index.max_result_window] index level setting."}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"materials","node":"zGodcjEaSrCXFO9_SJuvzg","reason":{"type":"query_phase_execution_exception","reason":"Batch size is too large, size must be less than or equal to: [10000] but was [10029]. Scroll batch sizes cost as much memory as result windows so they are controlled by the [index.max_result_window] index level setting."}}],"caused_by":{"type":"query_phase_execution_exception","reason":"Batch size is too large, size must be less than or equal to: [10000] but was [10029]. Scroll batch sizes cost as much memory as result windows so they are controlled by the [index.max_result_window] index level setting."}},"status":500}).
Here is my code:
var resFutQuery = restClient.query(index, tpe, new QueryRoot(MatchAll))
var total = 10
whenReady(resFutQuery , timeout(Span(10, Minutes))) { res =>
total = res.rawSearchResponse.hits.total;
}
val resFutScroll = restClient.startScrollRequest(index, tpe, new QueryRoot(MatchAll, sizeOpt = Some(total)))
whenReady(resFutScroll, timeout(Span(10, Minutes))) { res =>
res._2.rawSearchResponse.hits.hits.foreach(
x => {
Process
}
)
}
Thank you in advance
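The error above says a single scroll batch may not exceed index.max_result_window (10000 here), so the total hit count cannot be used as the batch size. A sketch of the usual alternative is to request a fixed batch size and keep scrolling until an empty batch comes back. The `Pager` trait below is hypothetical and only stands in for the client's scroll calls, whose exact signatures are not shown in this thread.

```scala
import scala.annotation.tailrec

object ScrollLoopSketch {
  // Hypothetical paging interface: each call returns a scroll id and one batch.
  trait Pager[A] {
    def start(batchSize: Int): (String, Seq[A])
    def next(scrollId: String): (String, Seq[A])
  }

  // Collects every hit by scrolling in fixed-size batches until a batch is empty.
  def fetchAll[A](pager: Pager[A], batchSize: Int): Vector[A] = {
    require(batchSize > 0 && batchSize <= 10000, "batch size must stay within the result window")

    @tailrec
    def loop(scrollId: String, batch: Seq[A], acc: Vector[A]): Vector[A] =
      if (batch.isEmpty) acc
      else {
        val (nextId, nextBatch) = pager.next(scrollId)
        loop(nextId, nextBatch, acc ++ batch)
      }

    val (firstId, firstBatch) = pager.start(batchSize)
    loop(firstId, firstBatch, Vector.empty)
  }
}
```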
Hi,
We have started using your library (we were using an internal library) for its Elasticsearch 2.3 and AWS support. We are really happy so far!
I didn't find any DSL to create nested queries (a nested query allows you to query nested objects / docs).
More information is available in the Elasticsearch docs.
I will create a pull request for NestedQuery support in a few minutes.
Please, tell me what you think!
I'll create more issues / PR for missing functionalities that we need in the following days if you are interested.
Thanks!
Arnaud.
Hi,
I didn't find any DSL to create multi match queries (the multi_match query builds on the match query to allow multi-field queries).
More information is available in the Elasticsearch docs.
I will create a pull request for MultiMatchQuery support in a few minutes.
Please, tell me what you think!
Thanks,
Arnaud.
Can you please explain to me how I could achieve something as simple as https://<username>:<password>@<host>:<port> with this client? Is that even possible?
I've found that runRawEsRequest is worth a shot, but the buildUri that it uses underneath is not ready to accept <username> and <password>.
Thanks!
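For reference, credentials in the user-info part of a URL are normally sent as an HTTP Basic Authorization header, so one route is to attach that header rather than change the URI. The helper below only builds the header; whether the client exposes a hook to attach it to each request is an assumption that would need checking against the source.

```scala
import java.nio.charset.StandardCharsets
import java.util.Base64

object BasicAuthSketch {
  // Builds the standard HTTP Basic header pair for the given credentials.
  def basicAuthHeader(username: String, password: String): (String, String) = {
    val raw = s"$username:$password".getBytes(StandardCharsets.UTF_8)
    "Authorization" -> s"Basic ${Base64.getEncoder.encodeToString(raw)}"
  }
}
```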
Quoting @davidcarltonsumo from the 2.3 upgrade PR #126:
So I think this is fine, especially since 1.6 is such an old version.
Having said that, given that the client is pure REST, it seems like we should be able to design it to be able to support multiple different versions, and that we might need to do that. I haven't looked in detail at which classes would have to move, but it feels to me like there's probably a tractable subset of the classes that we could put in packages whose names include a version number, and then we could expose a version of RestlasticSearchClient in that directory that people could use to interact with a specific version and also have a version in its current location that refers to the current version. And then, when we add support for a new version, we could just copy the files from the previous version wholesale to the package for the new version, not trying to be clever about reducing duplication or anything.
I'm not completely sure about the details, admittedly. E.g. #121 changes the state machine, so that makes me not completely confident that we'd be able to do this in a way that limits scope. (I guess the flip side is that it's also not completely obvious to me that it would be awful to copy basically everything when upgrading versions!) And it's not like there are that many serious changes between versions (at least if this plus #121 is representative of version changes), so maybe it's overkill - maybe we could just leave in support for old interfaces as well as new interfaces while using a uniform client. And, finally, there's the test issue - presumably when testing, we would have to specify a single elasticsearch version to test against, which would make it hard to detect regressions against old versions. (Hopefully we could do it in a way to make it easy enough to manually test against old versions, ideally by just temporarily editing one number in the pom, but who knows.)
Anyways, I'm fine merging this specific one, given that there hopefully aren't too many other people on 1.6 and given that we believe that the new version should work with 1.6, it just might be a little less performant. But it feels like something where we'll want to develop a strategy at some point, possibly even for the 5.1 change?
This is a very useful suggestion. We need a clear story for how the client can better support multiple versions, especially when we go for the 5.1 change.
Hi,
Is there a method to close the connection from RestlasticSearchClient to the Elasticsearch server?
Thank you in advance.
pom.xml:
<elasticsearch.version>2.3.5</elasticsearch.version>
README.md:
This project is currently targeted at Elasticsearch 1.x. Support for newer versions is planned but not yet built.
Entire README probably needs some TLC?
When we move to version 2.0, where we can introduce breaking changes, the default value for "highlight" in ElasticJsonDocument should be removed. Currently it is set to None.
Hi,
Thanks for this library, it works very well.
But there is one missing feature that prevents me from fully adopting it: the boosting query:
https://www.elastic.co/guide/en/elasticsearch/guide/current/_boosting_query_clauses.html
I don't know if it would be difficult to implement it.
I may open a PR if I manage to take some time!
The actual DELETE request on an /index/type/id succeeds, however the creation of the RawJsonResponse throws the mapping exception.
noException should be thrownBy( Await.result(restClient.runRawEsRequest(op = "", endpoint = "/testIndex/testType/bloopId", DELETE), Duration.Inf))
Does the current client support the exists query? There was an old PR adding support for it, but it was closed without merging.
Please advise. Thanks.
Hello,
I'm doing a raw request to list indices. My code looks like this:
client.runRawEsRequest("", "/_cat/indices?bytes=m", GET)
I get this warn log: RestlasticSearchClient$ - Failure response: {"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"No feature for name [indices?bytes=m]"}],"type":"illegal_argument_exception","reason":"No feature for name [indices?bytes=m]"},"status":400}
It works when I call the ES endpoint directly (through Postman, for example).
It seems to be the query parameter ?bytes=m that causes the error, because this request works perfectly:
client.runRawEsRequest("", "/_cat/indices", GET)
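The "No feature for name [indices?bytes=m]" message suggests the whole string, question mark included, reached Elasticsearch as the path. Assuming that is the cause (it is not confirmed here), a fix would need to separate the path from the query string before building the URI. A small sketch of that split, with a hypothetical helper name:

```scala
object EndpointSplitSketch {
  // Splits "/path?query" into the path and an optional raw query string.
  def splitEndpoint(endpoint: String): (String, Option[String]) =
    endpoint.indexOf('?') match {
      case -1 => (endpoint, None)
      case i  => (endpoint.substring(0, i), Some(endpoint.substring(i + 1)))
    }
}
```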
Hi there,
I am using the elastic search client and it is quite nice, so thanks for your work.
I have a query that uses sourceFilter and to my surprise in the SearchResponse jsonStr is empty.
I found the cause
I can workaround the issue using the rawSearchResponse.
I just would like to know, if there is a reason for that line to be there.
Best regards,
Jonas.
https://www.elastic.co/guide/en/elasticsearch/reference/2.0/mapping-timestamp-field.html
Deprecated in 2.0.0-beta2.
The _timestamp field is deprecated. Instead, use a normal date field and set its value explicitly
See #79
I am creating a request in the following manner:
// Client creation function:
def createClient(hostAddress: String): RestlasticSearchClient = {
val credentialsProvider = new DefaultAWSCredentialsProviderChain()
val awsCredentials = credentialsProvider.getCredentials
val signer = new AwsRequestSigner(awsCredentials, "us-west-2", "es")
val endpoint = new StaticEndpoint(new Endpoint(hostAddress, 443))
new RestlasticSearchClient(endpoint, Some(signer))
}
val client = createClient("our-es-endpoint-12345-etc.us-west-2.es.amazonaws.com")
val response = client.count(index, tpe, query)
When I execute the count query, I get an HTTP 403 response with the following message:
{"message":"The security token included in the request is invalid."}
I have a similar chunk of code running in Python, using the exact same credentials, endpoint and port, and it can access the Elasticsearch cluster without issue. It is using requests_aws4auth. Is there something I've missed? It seems there is a difference between how elasticsearch-aws and requests_aws4auth are forming the HTTP headers.
Hello,
There appears to be a bug in your API. When giving multiple field sorts, only the last one is taken.
After debugging the issue, we found the issue in QueryDsl QueryRoot.toJson:
override def toJson: Map[String, Any] = {
  Map(_query -> query.toJson) ++
    fromOpt.map(_from -> _) ++
    sizeOpt.map(_size -> _) ++
    timeout.map(t => _timeout -> s"${t}ms") ++
    sort.map(_sort -> _.toJson) ++
    sourceFilter.map(_source -> _)
}
The sort is a Seq[Sort] rather than an Option[Seq[Sort]], so the mapping is overwriting the key when we give multiple values.
At a glance, this could be resolved by making the primary constructor of QueryRoot accept an Option[Seq[Sort]] instead of converting in the apply method.
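The overwrite can be reproduced without the client. In the sketch below, `Sort` and the key names are simplified stand-ins for the QueryDsl types: mapping over a Seq yields one ("sort" -> ...) pair per element, and `++` keeps only the last pair for a repeated key. Emitting a single entry whose value is the whole list avoids that.

```scala
object SortOverwriteSketch {
  // Simplified stand-in for the DSL's sort type.
  final case class Sort(field: String, order: String) {
    def toJson: Map[String, Any] = Map(field -> Map("order" -> order))
  }

  // Mirrors the reported shape: one ("sort" -> ...) pair per element, same key each time.
  def buggy(sorts: Seq[Sort]): Map[String, Any] =
    Map[String, Any]("query" -> "match_all") ++ sorts.map(s => "sort" -> s.toJson)

  // One "sort" entry holding all sorts, or no entry when the list is empty.
  def fixed(sorts: Seq[Sort]): Map[String, Any] = {
    val sortEntry = if (sorts.isEmpty) None else Some("sort" -> sorts.map(_.toJson))
    Map[String, Any]("query" -> "match_all") ++ sortEntry
  }
}
```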
Create a BreakingChanges.md where we track breaking changes along with the rectification for release notes.
Attempting to run a startScrollRequest against an AWS Elasticsearch 5.1 instance results in an exception with the message
org.json4s.package$MappingException: No usable value for error
Do not know how to convert JObject(List((root_cause,JArray(List(JObject(List((type,JString(illegal_argument_exception)), (reason,JString(No search type for [scan]))))))), (type,JString(illegal_argument_exception)), (reason,JString(No search type for [scan])))) into class java.lang.String
The client currently supports 2.3, and it needs some changes to be able to support version 6.0.
Akka HTTP is the successor for spray. With the first non-experimental release of Akka HTTP, spray has reached its end-of-life. http://doc.akka.io/docs/akka-http/current/scala/http/migration-guide/migration-from-spray.html We would like to swap out spray for Akka HTTP. There was an initial effort here #111.
Although all the tests passed, there seems to be a bug in the PR that was not caught. As a result, a request with the change results in the following error
com.sumologic.elasticsearch.restlastic.RestlasticSearchClient$ReturnTypes$ElasticErrorResponse: ElasticsearchError(status=403): JString({"message":"The request signature we calculated does not match the signature you provided. Check your AWS Secret Access Key and signing method. Consult the service documentation for details.\n\nThe Canonical String for this request should have been\n'GET\n/_count\n\nhost:search-cche-metrics-es-23-crilv2kbplmz3epf4ykurg3fda.us-west-1.es.amazonaws.com\nx-amz-date:20170710T043641Z\n\nhost;x-amz-date\nbaa6846b65b050d71831bb2e4cd6e6f1593902f6d82b16a6c1f9979d14cfcd12'\n\nThe String-to-Sign should have been\n'AWS4-HMAC-SHA256\n20170710T043641Z\n20170710/us-west-1/es/aws4_request\nb8a1845acaa117e0d7db6b981f53738e749b98b02bfe0140eee14faa2229afb5'\n"})
at com.sumologic.elasticsearch.restlastic.RestlasticSearchClient$$anonfun$runRawEsRequest$1.apply(RestlasticSearchClient.scala:246)
at com.sumologic.elasticsearch.restlastic.RestlasticSearchClient$$anonfun$runRawEsRequest$1.apply(RestlasticSearchClient.scala:240)
I spent half a day on it; all the changes seem legit and I was not able to figure out what was going wrong. Reverting to spray fixes the problem. Since it is blocking our upgrade, I am going to keep the spray version and raise an issue here to fix the bug and re-apply the reverted changes from #111 (cc @rcoh @seanpquig).
Hi,
The error json message mapping from elasticsearch seems incorrect.
We have the following stacktrace when a document is missing:
org.json4s.package$MappingException: No usable value for error
Do not know how to convert JObject(List((root_cause,JArray(List(JObject(List((type,JString(document_missing_exception)), (reason,JString([engine_product][37035513]: document missing)), (shard,JString(1)), (index,JString(product))))))), (type,JString(document_missing_exception)), (reason,JString([engine_product][37035513]: document missing)), (shard,JString(1)), (index,JString(product)))) into class java.lang.String
at org.json4s.reflect.package$.fail(package.scala:93)
at org.json4s.Extraction$ClassInstanceBuilder.org$json4s$Extraction$ClassInstanceBuilder$$buildCtorArg(Extraction.scala:509)
at org.json4s.Extraction$ClassInstanceBuilder$$anonfun$14.apply(Extraction.scala:529)
at org.json4s.Extraction$ClassInstanceBuilder$$anonfun$14.apply(Extraction.scala:529)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
for the following error message:
{
"error": {
"root_cause": [{
"type": "document_missing_exception",
"reason": "[engine_product][37035513]: document missing",
"shard": "1",
"index": "product"
}],
"type": "document_missing_exception",
"reason": "[engine_product][37035513]: document missing",
"shard": "1",
"index": "product"
},
"status": 404
}
The class ElasticErrorResponse cannot be mapped by json4s because the error is not a String but an Object.
case class ElasticErrorResponse(error: String, status: Int) extends Exception(s"ElasticsearchError(status=$status): $error")
We can reproduce the error with the following test:
package com.sumologic.elasticsearch.restlastic
import com.sumologic.elasticsearch.restlastic.RestlasticSearchClient.ReturnTypes.ElasticErrorResponse
import org.json4s._
import org.json4s.native.JsonMethods._
import org.scalatest.{Matchers, WordSpec}
class ElasticErrorResponseTest extends WordSpec with Matchers {
private implicit val formats = org.json4s.DefaultFormats
"RestlasticSearchClient" should {
"Be able to create an index and setup index setting with keyword lowercase analyzer" in {
val jsonTree = parse(errorDocumentMissing)
val errorMessage = jsonTree.extract[ElasticErrorResponse]
errorMessage should be(ElasticErrorResponse("document_missing_exception", 404))
}
}
val errorDocumentMissing = """{"error":{"root_cause":[{"type":"document_missing_exception","reason":"[engine_product][37035513]: document missing","shard":"1","index":"product"}],"type":"document_missing_exception","reason":"[engine_product][37035513]: document missing","shard":"1","index":"product"},"status":404} """
}
We can have also the following error from ES:
{"message":"The security token included in the request is expired"}
So I suppose the error JSON format changes depending on the HTTP status code.
Arnaud.
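A sketch of a more tolerant extraction that accepts all three shapes seen here: "error" as a string (what the current case class expects), "error" as an object with a "reason", and a bare "message". To stay self-contained it uses a tiny hand-rolled JSON tree instead of json4s, and every name is hypothetical.

```scala
object ErrorShapeSketch {
  // Tiny stand-in for a JSON tree (json4s' JValue plays this role in the client).
  sealed trait Json
  final case class JStr(value: String) extends Json
  final case class JNum(value: Int) extends Json
  final case class JObj(fields: Map[String, Json]) extends Json

  final case class ElasticError(status: Option[Int], message: String)

  def parseError(body: JObj): ElasticError = {
    val status = body.fields.get("status").collect { case JNum(n) => n }
    val message = body.fields.get("error") match {
      case Some(JStr(s)) => s // "error" is a plain string
      case Some(JObj(err)) => // "error" is an object; use its "reason"
        err.get("reason").collect { case JStr(r) => r }.getOrElse("unknown error")
      case _ => // no usable "error" field; fall back to {"message": "..."}
        body.fields.get("message").collect { case JStr(m) => m }.getOrElse("unknown error")
    }
    ElasticError(status, message)
  }
}
```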
"Be able to create an index and setup index setting with keyword lowercase analyzer"
https://travis-ci.org/SumoLogic/elasticsearch-client/builds/161348069
Test:
"Support deleting more than 10 docs" in {
val insertFutures = (0 to 11).map(i => restClient.index(index, tpe, Document(s"doc$i", Map("text7" -> "here7"))))
val ir = Future.sequence(insertFutures)
Await.result(ir, 10.seconds)
refresh()
val delFut = restClient.deleteDocument(index, tpe, new QueryRoot(MatchAll))
Await.result(delFut, 10.seconds)
refresh()
val count = Await.result(restClient.count(index, tpe, new QueryRoot(MatchAll)), 10.seconds)
count should be (0)
}
Failure mode:
- should Support deleting more than 10 docs *** FAILED ***
2 was not equal to 0 (RestlasticSearchClientTest.scala:314)
It would be very useful.
case object delete extends OperationType {
override val jsonStr: String = "delete"
}
Hi, I keep getting the following errors:
[info] application - Connecting to Elasticsearch using 'XXXXXXXX.eu-west-1.es.amazonaws.com' on port '443'
[debug] c.s.e.r.RestlasticSearchClient$ - Got Es response: 403 Forbidden
[warn] c.s.e.r.RestlasticSearchClient$ - Failure response: {"message":"The security token included in the request is invalid."}
[warn] c.s.e.r.RestlasticSearchClient$ - Failing request: {"query":{"term":{"_id":"123"}}}
The code is:
```scala
def signer: Option[AwsRequestSigner] = {
  host match {
    case h if h.contains("es.amazonaws.com") =>
      val credentialsProviderChain = new DefaultAWSCredentialsProviderChain()
      val region = Option(Regions.getCurrentRegion).getOrElse(Region.getRegion(Regions.EU_WEST_1))
      Some(new AwsRequestSigner(credentialsProviderChain, region.getName, "es"))
    case _ => None
  }
}
override lazy val client = {
  Logger.info(s"Connecting to Elasticsearch using '$host' on port '$port'")
  new RestlasticSearchClient(new StaticEndpoint(Endpoint(host, port)), signer)
}
```
https://github.com/SumoLogic/elasticsearch-client/blob/master/CHANGELOG.md
Hasn't been updated at all. We should either drop it or update it. While I'd prefer the latter, it's probably best to nuke it.
We are running into an issue right now where we are only able to run a single RestlasticSearchClient on a machine because the ActorSystem being created is always default and we are unable to pass props.
implicit val system: ActorSystem = ActorSystem()
Normally I would just override the val for this and set my own value, but unfortunately with how Scala works, it is still calling the initialization and creating the ActorSystem prior to my value being set (so 2 actor systems are created).
Is there a workaround for this?
Otherwise please consider making it configurable.
Possible solutions:
Lazy Init:
implicit lazy val system: ActorSystem = ActorSystem()
or implicit parameter pass in:
class RestlasticSearchClient(endpointProvider: EndpointProvider, signer: Option[RequestSigner] = None, indexExecutionCtx: ExecutionContext = ExecutionContext.Implicits.global, searchExecutionCtx: ExecutionContext = ExecutionContext.Implicits.global)(implicit val system: ActorSystem = ActorSystem()){}
Thanks for your time.
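The difference between the two shapes can be shown without Akka. In the sketch below, `FakeSystem` is a stand-in for ActorSystem that counts how many instances are created, and both client classes are hypothetical. Overriding the val in a subclass still runs the parent's initializer, whereas a constructor parameter with a default only creates a system when the caller does not supply one.

```scala
object SystemInjectionSketch {
  // Stand-in for akka's ActorSystem; the companion counts creations.
  final class FakeSystem(val name: String)
  object FakeSystem {
    var created = 0
    def apply(name: String = "default"): FakeSystem = { created += 1; new FakeSystem(name) }
  }

  // Current shape: the system is created in the class body.
  // A subclass that overrides `system` still triggers this initializer.
  class EagerClient {
    val system: FakeSystem = FakeSystem()
  }

  // Proposed shape: the default is only evaluated when no system is passed in.
  class InjectedClient(val system: FakeSystem = FakeSystem())
}
```

Passing one shared system to every `InjectedClient` also gives the "one ActorSystem per application" setup.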
You should list the supported Elasticsearch version numbers in the readme file. I spent too much time on this only to find that it doesn't support the version of ES I'm using.
I'm on 5.2; after writing some code and running it, I get: Received illegal response: The server-side HTTP version is not supported