Is there an existing issue for this?

Who can help?
No response

What are you working on?
I'm using BGE-M3 as a sentence embedding model on Databricks to embed a text column. I followed these instructions: https://sparknlp.org/2024/02/11/bge_m3_xx.html

Current Behavior
I currently get an IllegalArgumentException when loading the model. The full error is as follows:
bge_m3 download started this may take some time.
Approximate size to download 391.8 MB
[OK!]

IllegalArgumentException: requirement failed: Embedding dimension mismatch: expected 15, but found 1

File <command-4667109844782750>, line 6
      2 from sparknlp.annotator.embeddings.sentence_embeddings import SentenceEmbeddings
      3 from pyspark.ml import Pipeline
      5 embeddings = (
----> 6     XlmRoBertaSentenceEmbeddings.pretrained("bge_m3", "xx")
      7     .setInputCols(["embedding_text"])
      8     .setOutputCol("embedding")
      9 )
     11 pipeline = Pipeline().setStages([embeddings])
     12 pipeline_model = pipeline.fit(df)

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/sparknlp/annotator/embeddings/xlm_roberta_sentence_embeddings.py:194, in XlmRoBertaSentenceEmbeddings.pretrained(name, lang, remote_loc)
    176 """Downloads and loads a pretrained model.
    177
    178 Parameters
   (...)
    191     The restored model
    192 """
    193 from sparknlp.pretrained import ResourceDownloader
--> 194 return ResourceDownloader.downloadModel(XlmRoBertaSentenceEmbeddings, name, lang, remote_loc)

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/sparknlp/pretrained/resource_downloader.py:95, in ResourceDownloader.downloadModel(reader, name, language, remote_loc, j_dwn)
     93 t1.start()
     94 try:
---> 95     j_obj = _internal._DownloadModel(reader.name, name, language, remote_loc, j_dwn).apply()
     96 except Py4JJavaError as e:
     97     sys.stdout.write("\n" + str(e))

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/sparknlp/internal/__init__.py:606, in _DownloadModel.__init__(self, reader, name, language, remote_loc, validator)
    605 def __init__(self, reader, name, language, remote_loc, validator):
--> 606     super(_DownloadModel, self).__init__(
    607         "com.johnsnowlabs.nlp.pretrained." + validator + ".downloadModel",
    608         reader,
    609         name,
    610         language,
    611         remote_loc,
    612     )

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/sparknlp/internal/extended_java_wrapper.py:27, in ExtendedJavaWrapper.__init__(self, java_obj, *args)
     25 super(ExtendedJavaWrapper, self).__init__(java_obj)
     26 self.sc = SparkContext._active_spark_context
---> 27 self._java_obj = self.new_java_obj(java_obj, *args)
     28 self.java_obj = self._java_obj

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/sparknlp/internal/extended_java_wrapper.py:37, in ExtendedJavaWrapper.new_java_obj(self, java_class, *args)
     36 def new_java_obj(self, java_class, *args):
---> 37     return self._new_java_obj(java_class, *args)

File /databricks/spark/python/pyspark/ml/wrapper.py:85, in JavaWrapper._new_java_obj(java_class, *args)
     83 java_obj = getattr(java_obj, name)
     84 java_args = [_py2java(sc, arg) for arg in args]
---> 85 return java_obj(*java_args)

File /databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1355, in JavaMember.__call__(self, *args)
   1349 command = proto.CALL_COMMAND_NAME +\
   1350     self.command_header +\
   1351     args_command +\
   1352     proto.END_COMMAND_PART
   1354 answer = self.gateway_client.send_command(command)
-> 1355 return_value = get_return_value(
   1356     answer, self.gateway_client, self.target_id, self.name)
   1358 for temp_arg in temp_args:
   1359     if hasattr(temp_arg, "_detach"):

File /databricks/spark/python/pyspark/errors/exceptions/captured.py:230, in capture_sql_exception.<locals>.deco(*a, **kw)
    226 converted = convert_exception(e.java_exception)
    227 if not isinstance(converted, UnknownException):
    228     # Hide where the exception came from that shows a non-Pythonic
    229     # JVM exception message.
--> 230     raise converted from None
    231 else:
    232     raise
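
As the traceback shows, the exception is thrown inside ResourceDownloader.downloadModel, i.e. during the pretrained() download/load itself, before the pipeline is ever fit. A minimal sketch that isolates just that step (assuming a fresh local session started with sparknlp.start(); this is not my original notebook code, where the Databricks-provided Spark session is used instead):

# Minimal isolation of the failing step: only the pretrained() download/load.
# sparknlp.start() is assumed here for a local test; it is not part of my
# original Databricks notebook.
import sparknlp
from sparknlp.annotator import XlmRoBertaSentenceEmbeddings

spark = sparknlp.start()
embeddings = XlmRoBertaSentenceEmbeddings.pretrained("bge_m3", "xx")  # IllegalArgumentException raised here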

Expected Behavior
I expect the embedding model to load without errors, since I'm following the instructions in the documentation.

Steps To Reproduce
You can reproduce the error with the following code:
from sparknlp.annotator.embeddings.xlm_roberta_sentence_embeddings import XlmRoBertaSentenceEmbeddings

embeddings = (
    XlmRoBertaSentenceEmbeddings.pretrained("bge_m3", "xx")
    .setInputCols(["embedding_text"])
    .setOutputCol("embedding")
)
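
For completeness, the surrounding pipeline in my notebook matches the cell shown in the traceback above. Below is a fuller sketch of how the text column would normally be fed in; the DocumentAssembler stage and the column wiring are illustrative assumptions, not part of the failing cell (which only contained the embeddings stage and the Pipeline/fit calls):

from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import XlmRoBertaSentenceEmbeddings

# Illustrative: converts the raw "embedding_text" string column into DOCUMENT annotations.
document_assembler = (
    DocumentAssembler()
    .setInputCol("embedding_text")
    .setOutputCol("document")
)

embeddings = (
    XlmRoBertaSentenceEmbeddings.pretrained("bge_m3", "xx")  # fails here, at download/load time
    .setInputCols(["document"])
    .setOutputCol("embedding")
)

pipeline = Pipeline().setStages([document_assembler, embeddings])
pipeline_model = pipeline.fit(df)  # df: Spark DataFrame with an "embedding_text" string column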

Spark NLP version and Apache Spark
Spark NLP 5.5.2, Apache Spark 3.5.0
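
(These two values can be confirmed in a Databricks notebook with a cell roughly like the following; spark is the session Databricks provides.)

import sparknlp

print(sparknlp.version())  # Spark NLP version  -> 5.5.2
print(spark.version)       # Apache Spark version -> 3.5.0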

Type of Spark Application
Python Application

Java Version
openjdk version "1.8.0_412"
OpenJDK Runtime Environment (Zulu 8.78.0.19-CA-linux64) (build 1.8.0_412-b08)
OpenJDK 64-Bit Server VM (Zulu 8.78.0.19-CA-linux64) (build 25.412-b08, mixed mode)

Java Home Directory
/usr/lib/jvm/zulu8-ca-amd64/jre/

Setup and installation
I followed the installation instructions under "Install Spark NLP on Databricks" from here: https://sparknlp.org/docs/en/install
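
Concretely, the cluster setup followed the pattern from that page: Spark NLP installed as both a PyPI and a Maven cluster library, plus the recommended Spark config. The values below are illustrative of my setup (reconstructed from the install page) rather than copied from the cluster:

PyPI library:   spark-nlp==5.5.2
Maven library:  com.johnsnowlabs.nlp:spark-nlp_2.12:5.5.2
Spark config:
    spark.serializer org.apache.spark.serializer.KryoSerializer
    spark.kryoserializer.buffer.max 2000M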
Operating System and Version
No response
Link to your project (if available)
No response
Additional Information
No response