
Commit 43ab794

2023-06-26-distilbert_embeddings_finetuned_sarcasm_classification_en (#13867)
* Add model 2023-06-26-distilbert_embeddings_finetuned_sarcasm_classification_en
* Add model 2023-06-26-distilbert_embeddings_distilbert_base_indonesian_id
* Add model 2023-06-26-distilbert_embeddings_BERTino_it
* Add model 2023-06-26-distilbert_embeddings_distilbert_base_uncased_sparse_85_unstructured_pruneofa_en
* Add model 2023-06-26-distilbert_embeddings_malaysian_distilbert_small_ms
* Add model 2023-06-26-distilbert_embeddings_distilbert_fa_zwnj_base_fa
* Add model 2023-06-26-distilbert_embeddings_javanese_distilbert_small_jv
* Add model 2023-06-26-distilbert_embeddings_javanese_distilbert_small_imdb_jv
* Add model 2023-06-26-distilbert_embeddings_indic_transformers_hi_distilbert_hi
* Add model 2023-06-26-distilbert_embeddings_marathi_distilbert_mr
* Add model 2023-06-26-distilbert_embeddings_indic_transformers_bn_distilbert_bn
* Add model 2023-06-26-distilbert_embeddings_distilbert_base_uncased_sparse_90_unstructured_pruneofa_en
* Add model 2023-06-26-deberta_embeddings_xsmall_dapt_scientific_papers_pubmed_en
* Add model 2023-06-26-deberta_embeddings_spm_vie_vie
* Add model 2023-06-26-deberta_embeddings_vie_small_vie
* Add model 2023-06-26-deberta_embeddings_tapt_nbme_v3_base_en
* Add model 2023-06-26-deberta_embeddings_erlangshen_v2_chinese_sentencepiece_zh
* Add model 2023-06-26-deberta_v3_xsmall_en
* Add model 2023-06-26-deberta_embeddings_mlm_test_en
* Add model 2023-06-26-deberta_v3_small_en
* Add model 2023-06-26-roberta_base_swiss_legal_gsw

---------

Co-authored-by: ahmedlone127 <[email protected]>
1 parent d054074 commit 43ab794

21 files changed (+2910, -0 lines)
Lines changed: 140 additions & 0 deletions
@@ -0,0 +1,140 @@
---
layout: model
title: Chinese Deberta Embeddings Cased model (from IDEA-CCNL)
author: John Snow Labs
name: deberta_embeddings_erlangshen_v2_chinese_sentencepiece
date: 2023-06-26
tags: [open_source, deberta, deberta_embeddings, debertav2formaskedlm, zh, onnx]
task: Embeddings
language: zh
edition: Spark NLP 5.0.0
spark_version: 3.0
supported: true
engine: onnx
annotator: DeBertaEmbeddings
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

Pretrained DebertaV2ForMaskedLM model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `Erlangshen-DeBERTa-v2-186M-Chinese-SentencePiece` is a Chinese model originally trained by `IDEA-CCNL`.

## Predicted Entities



{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_embeddings_erlangshen_v2_chinese_sentencepiece_zh_5.0.0_3.0_1687781761029.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_embeddings_erlangshen_v2_chinese_sentencepiece_zh_5.0.0_3.0_1687781761029.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

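The archive name in the Download link appears to encode the same metadata as the front matter. The sketch below splits it apart under an assumed layout of `<name>_<language>_<Spark NLP version>_<Spark version>_<epoch milliseconds>.zip`; that layout is inferred from this card only, and `ModelArchive` and `parse_archive_name` are hypothetical helper names, not part of Spark NLP.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class ModelArchive(NamedTuple):
    """Components that the archive file name appears to carry (assumed layout)."""
    name: str
    language: str
    sparknlp_version: str
    spark_version: str
    uploaded_at: datetime


def parse_archive_name(url: str) -> ModelArchive:
    """Split a model archive URL into its apparent components.

    Assumption: <name>_<language>_<Spark NLP version>_<Spark version>_<epoch millis>.zip
    """
    filename = url.rsplit("/", 1)[-1]
    stem = filename[:-4] if filename.endswith(".zip") else filename
    # The model name itself contains underscores, so split from the right.
    name, language, sparknlp_version, spark_version, millis = stem.rsplit("_", 4)
    uploaded_at = datetime.fromtimestamp(int(millis) / 1000, tz=timezone.utc)
    return ModelArchive(name, language, sparknlp_version, spark_version, uploaded_at)


archive = parse_archive_name(
    "https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/"
    "deberta_embeddings_erlangshen_v2_chinese_sentencepiece_zh_5.0.0_3.0_1687781761029.zip"
)
print(archive.name, archive.language, archive.sparknlp_version, archive.spark_version)
print(archive.uploaded_at.date().isoformat())
```

If the assumption holds, the parsed name, language, and versions should line up with the `name`, `language`, `edition`, and `spark_version` fields in the front matter, and the trailing number reads as an upload timestamp in UTC.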
## How to use

<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}

```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DeBertaEmbeddings
from pyspark.ml import Pipeline

# Start a Spark session with Spark NLP loaded
spark = sparknlp.start()

# Wrap the raw text column in a document annotation
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Split each document into tokens
tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

# Load the pretrained model and compute one embedding per token
embeddings = DeBertaEmbeddings.pretrained("deberta_embeddings_erlangshen_v2_chinese_sentencepiece","zh") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings") \
    .setCaseSensitive(True)

pipeline = Pipeline(stages=[documentAssembler, tokenizer, embeddings])

data = spark.createDataFrame([["I love Spark-NLP"]]).toDF("text")

result = pipeline.fit(data).transform(data)
```
```scala
import spark.implicits._
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.embeddings.DeBertaEmbeddings
import org.apache.spark.ml.Pipeline

// Wrap the raw text column in a document annotation
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

// Split each document into tokens
val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

// Load the pretrained model and compute one embedding per token
val embeddings = DeBertaEmbeddings.pretrained("deberta_embeddings_erlangshen_v2_chinese_sentencepiece","zh")
    .setInputCols(Array("document", "token"))
    .setOutputCol("embeddings")
    .setCaseSensitive(true)

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))

val data = Seq("I love Spark-NLP").toDS.toDF("text")

val result = pipeline.fit(data).transform(data)
```
</div>

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|deberta_embeddings_erlangshen_v2_chinese_sentencepiece|
|Compatibility:|Spark NLP 5.0.0+|
|License:|Open Source|
|Edition:|Official|
|Input Labels:|[sentence, token]|
|Output Labels:|[embeddings]|
|Language:|zh|
|Size:|443.8 MB|
|Case sensitive:|false|
Lines changed: 140 additions & 0 deletions
@@ -0,0 +1,140 @@
---
layout: model
title: English Deberta Embeddings model (from domenicrosati)
author: John Snow Labs
name: deberta_embeddings_mlm_test
date: 2023-06-26
tags: [deberta, open_source, deberta_embeddings, debertav2formaskedlm, en, onnx]
task: Embeddings
language: en
edition: Spark NLP 5.0.0
spark_version: 3.0
supported: true
engine: onnx
annotator: DeBertaEmbeddings
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

Pretrained DebertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `deberta-mlm-test` is an English model originally trained by `domenicrosati`.

## Predicted Entities



{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_embeddings_mlm_test_en_5.0.0_3.0_1687782209221.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_embeddings_mlm_test_en_5.0.0_3.0_1687782209221.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

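The two links above point at the same object: the Download link is the S3 URI rewritten as a path-style HTTPS URL. A minimal sketch of that mapping follows; `s3_uri_to_https` is a hypothetical helper name, and the `https://s3.amazonaws.com/<bucket>/<key>` form is taken from the links on this card rather than from any Spark NLP API.

```python
from urllib.parse import urlparse


def s3_uri_to_https(uri: str) -> str:
    """Rewrite s3://<bucket>/<key> as https://s3.amazonaws.com/<bucket>/<key>.

    Assumption: the bucket is served path-style, as the Download link on
    this card suggests.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"not an S3 URI: {uri!r}")
    # netloc holds the bucket name; path holds the object key with a leading slash.
    return f"https://s3.amazonaws.com/{parsed.netloc}{parsed.path}"


https_url = s3_uri_to_https(
    "s3://auxdata.johnsnowlabs.com/public/models/"
    "deberta_embeddings_mlm_test_en_5.0.0_3.0_1687782209221.zip"
)
print(https_url)
```

The result can be compared against the Download link to confirm that both buttons refer to the same archive.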
## How to use

<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}

```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DeBertaEmbeddings
from pyspark.ml import Pipeline

# Start a Spark session with Spark NLP loaded
spark = sparknlp.start()

# Wrap the raw text column in a document annotation
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Split each document into tokens
tokenizer = Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

# Load the pretrained model and compute one embedding per token
embeddings = DeBertaEmbeddings.pretrained("deberta_embeddings_mlm_test","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings") \
    .setCaseSensitive(True)

pipeline = Pipeline(stages=[documentAssembler, tokenizer, embeddings])

data = spark.createDataFrame([["I love Spark NLP"]]).toDF("text")

result = pipeline.fit(data).transform(data)
```
```scala
import spark.implicits._
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.embeddings.DeBertaEmbeddings
import org.apache.spark.ml.Pipeline

// Wrap the raw text column in a document annotation
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

// Split each document into tokens
val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

// Load the pretrained model and compute one embedding per token
val embeddings = DeBertaEmbeddings.pretrained("deberta_embeddings_mlm_test","en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("embeddings")
    .setCaseSensitive(true)

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))

val data = Seq("I love Spark NLP").toDS.toDF("text")

val result = pipeline.fit(data).transform(data)
```
</div>

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|deberta_embeddings_mlm_test|
|Compatibility:|Spark NLP 5.0.0+|
|License:|Open Source|
|Edition:|Official|
|Input Labels:|[sentence, token]|
|Output Labels:|[embeddings]|
|Language:|en|
|Size:|265.4 MB|
|Case sensitive:|false|
