chentong00 committed on
Commit
1495ad8
1 Parent(s): 1ac5f43

Update README.md

Files changed (1)
  1. README.md +73 -0
README.md CHANGED
@@ -1,3 +1,76 @@
  ---
  license: apache-2.0
  ---
+
+
+ This is the proposition segmentation model from ["Dense X Retrieval: What Retrieval Granularity Should We Use?"](https://arxiv.org/abs/2312.06648) by Chen et al. (2023).
+
+ # Usage
+
+ The prompt to the model is formatted as `Title: {title}. Section: {section}. Content: {content}`. The model outputs a list of propositions in JSON format.
+
+ For example, if we use the model to decompose the following passage:
+
+ ```
+ Title: Leaning Tower of Pisa. Section: . Content: Prior to restoration work performed between 1990 and 2001, Leaning Tower of Pisa leaned at an angle of 5.5 degrees, but the tower now leans at about 3.99 degrees. This means the top of the tower is displaced horizontally 3.9 meters (12 ft 10 in) from the center.
+ ```
+
+ The output will be:
+
+ ```
+ ["Prior to restoration work performed between 1990 and 2001, Leaning Tower of Pisa leaned at an angle of 5.5 degrees.", "Leaning Tower of Pisa now leans at about 3.99 degrees.", "The top of Leaning Tower of Pisa is displaced horizontally 3.9 meters (12 ft 10 in) from the center."]
+ ```
+
+ # Example Code
+
+ The following script loads the model and decomposes the passage above:
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+ import torch
+ import json
+
+ model_name = "chentong00/propositionizer-wiki-flan-t5-large"
+ device = "cuda" if torch.cuda.is_available() else "cpu"  # use a GPU when one is available
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to(device)
+
+ title = "Leaning Tower of Pisa"
+ section = ""
+ content = "Prior to restoration work performed between 1990 and 2001, Leaning Tower of Pisa leaned at an angle of 5.5 degrees, but the tower now leans at about 3.99 degrees. This means the top of the tower is displaced horizontally 3.9 meters (12 ft 10 in) from the center."
+
+ input_text = f"Title: {title}. Section: {section}. Content: {content}"  # prompt format described under Usage
+
+ input_ids = tokenizer(input_text, return_tensors="pt").input_ids
+ outputs = model.generate(input_ids.to(device), max_new_tokens=512).cpu()
+
+ output_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
+ try:
+     prop_list = json.loads(output_text)
+ except json.JSONDecodeError:  # fall back to an empty list if the output is not valid JSON
+     prop_list = []
+     print("[ERROR] Failed to parse output text as JSON.")
+ print(json.dumps(prop_list, indent=2))
+ ```
+
+ Expected output:
+
+ ```json
+ [
+   "Prior to restoration work performed between 1990 and 2001, Leaning Tower of Pisa leaned at an angle of 5.5 degrees.",
+   "Leaning Tower of Pisa now leans at about 3.99 degrees.",
+   "The top of Leaning Tower of Pisa is displaced horizontally 3.9 meters (12 ft 10 in) from the center."
+ ]
+ ```
+
+ # Citation
+
+ ```bibtex
+ @article{chen2023subsentence,
+   title={Dense X Retrieval: What Retrieval Granularity Should We Use?},
+   author={Tong Chen and Hongwei Wang and Sihao Chen and Wenhao Yu and Kaixin Ma and Xinran Zhao and Hongming Zhang and Dong Yu},
+   journal={arXiv preprint arXiv:2312.06648},
+   year={2023},
+   url={https://arxiv.org/pdf/2312.06648.pdf}
+ }
+ ```
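
The prompt format and the JSON parsing step from the README example can be factored into two small helpers, which makes them easy to reuse across many passages and to test without loading the model. The following is a minimal sketch and is not part of this commit: `build_prompt` and `parse_propositions` are hypothetical names, and the stricter parsing (rejecting output that is not a JSON list, dropping non-string items) is an assumption rather than behavior documented by the authors.

```python
import json
from typing import List


def build_prompt(title: str, section: str, content: str) -> str:
    """Format one passage using the prompt template given in the README."""
    return f"Title: {title}. Section: {section}. Content: {content}"


def parse_propositions(output_text: str) -> List[str]:
    """Parse decoded model output as a JSON list of strings.

    Returns an empty list when the text is not valid JSON or is not a list
    (an assumed fallback, mirroring the README's empty-list default).
    """
    try:
        parsed = json.loads(output_text)
    except json.JSONDecodeError:
        return []
    if not isinstance(parsed, list):
        return []
    # Keep only string items; anything else is treated as malformed output.
    return [item for item in parsed if isinstance(item, str)]
```

In the README script, `build_prompt(title, section, content)` would stand in for the f-string that builds `input_text`, and `parse_propositions(output_text)` for the `try`/`except` block.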