Arsive committed on
Commit 4fc37af
1 Parent(s): 7271565

Update README.md

Files changed (1)
  1. README.md +73 -20
README.md CHANGED
@@ -1,42 +1,103 @@
  ---
  library_name: transformers
- tags: []
  ---

  # Model Card for Model ID

  <!-- Provide a quick summary of what the model is/does. -->
- Input - Receipt image
  Output - JSON

-
  ## Model Details

  ### Model Description

  <!-- Provide a longer summary of what this model is. -->

- This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

- - **Developed by:** [More Information Needed]
- - **Funded by [optional]:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
  - **Model type:** [More Information Needed]
  - **Language(s) (NLP):** [More Information Needed]
  - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]

  ### Model Sources [optional]

  <!-- Provide the basic links for the model. -->

- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]

  ## Uses

  <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

  ### Direct Use

@@ -187,14 +248,6 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]

  [More Information Needed]

- ## More Information [optional]
-
- [More Information Needed]
-
- ## Model Card Authors [optional]
-
- [More Information Needed]
-
  ## Model Card Contact

- [More Information Needed]
  ---
  library_name: transformers
+ license: gemma
+ datasets:
+ - naver-clova-ix/cord-v2
+ language:
+ - en
  ---

  # Model Card for Model ID

  <!-- Provide a quick summary of what the model is/does. -->
+ Input - Receipt image <br>
  Output - JSON

  ## Model Details

+ Adapted from Donut's `token2json`:
+
+ ```python
+ # Use this code to convert the generated output to JSON.
+ import re
+
+
+ def token2json(tokens, is_inner_value=False, added_vocab=None):
+     """
+     Convert a (generated) token sequence into an ordered JSON format.
+     """
+     if added_vocab is None:
+         # Assumes `processor` (this model's processor) is defined in the calling scope.
+         added_vocab = processor.tokenizer.get_added_vocab()
+
+     output = {}
+
+     while tokens:
+         # Find the next opening field tag, e.g. <s_menu>.
+         start_token = re.search(r"<s_(.*?)>", tokens, re.IGNORECASE)
+         if start_token is None:
+             break
+         key = start_token.group(1)
+         key_escaped = re.escape(key)
+
+         end_token = re.search(rf"</s_{key_escaped}>", tokens, re.IGNORECASE)
+         start_token = start_token.group()
+         if end_token is None:
+             # Unclosed tag: drop it and keep parsing the rest.
+             tokens = tokens.replace(start_token, "")
+         else:
+             end_token = end_token.group()
+             start_token_escaped = re.escape(start_token)
+             end_token_escaped = re.escape(end_token)
+             content = re.search(
+                 f"{start_token_escaped}(.*?){end_token_escaped}", tokens, re.IGNORECASE | re.DOTALL
+             )
+             if content is not None:
+                 content = content.group(1).strip()
+                 if r"<s_" in content and r"</s_" in content:  # non-leaf node
+                     value = token2json(content, is_inner_value=True, added_vocab=added_vocab)
+                     if value:
+                         if len(value) == 1:
+                             value = value[0]
+                         output[key] = value
+                 else:  # leaf nodes
+                     output[key] = []
+                     for leaf in content.split(r"<sep/>"):
+                         leaf = leaf.strip()
+                         if leaf in added_vocab and leaf[0] == "<" and leaf[-2:] == "/>":
+                             leaf = leaf[1:-2]  # for categorical special tokens
+                         output[key].append(leaf)
+                     if len(output[key]) == 1:
+                         output[key] = output[key][0]
+
+             # Continue after the closing tag of the field just parsed.
+             tokens = tokens[tokens.find(end_token) + len(end_token) :].strip()
+             if tokens[:6] == r"<sep/>":  # non-leaf nodes
+                 return [output] + token2json(tokens[6:], is_inner_value=True, added_vocab=added_vocab)
+
+     if len(output):
+         return [output] if is_inner_value else output
+     else:
+         return [] if is_inner_value else {"text_sequence": tokens}
+ ```
+
  ### Model Description

  <!-- Provide a longer summary of what this model is. -->

+ This is the model card of paligemma-img-to-json, a 🤗 transformers model that has been pushed on the Hub.

+ - **Developed by:** [Arsive](https://huggingface.co/Arsive)
  - **Model type:** [More Information Needed]
  - **Language(s) (NLP):** [More Information Needed]
  - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [google/paligemma-3b-pt-224](https://huggingface.co/google/paligemma-3b-pt-224)

  ### Model Sources [optional]

  <!-- Provide the basic links for the model. -->

+ - **Repository:** [Repository](https://huggingface.co/Arsive/paligemma-img-to-json/tree/main)
+ - **Paper [optional]:** NIL
+ - **Demo [optional]:** NIL

  ## Uses

  <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+ Can be used to extract a JSON representation of a receipt image. The input image must contain a receipt.

  ### Direct Use


  [More Information Needed]

  ## Model Card Contact

+ Email: [arsive.ai@gmail.com](mailto:arsive.ai@gmail.com)
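The tag format that `token2json` parses is easiest to see from the opposite direction. Below is a minimal sketch of a hypothetical `json2token` helper (not part of this repository) that serializes a CORD-style dict into Donut-style `<s_key>value</s_key>` tags, joining list items with `<sep/>`. It assumes the fine-tuning targets followed this Donut convention, which the model card does not state; the receipt values are made up for illustration.

```python
from typing import Any


def json2token(obj: Any) -> str:
    """Serialize a JSON-like object into a Donut-style tag sequence.

    Hypothetical helper: it mirrors the format token2json parses and is
    assumed (not confirmed) to match this model's training targets.
    """
    if isinstance(obj, dict):
        # Each key becomes a pair of field tags wrapping its serialized value.
        return "".join(f"<s_{key}>{json2token(value)}</s_{key}>" for key, value in obj.items())
    if isinstance(obj, list):
        # List items are separated by the <sep/> token.
        return "<sep/>".join(json2token(item) for item in obj)
    # Leaf values are emitted as plain text.
    return str(obj)


# Illustrative receipt (made-up values, CORD-like keys).
receipt = {"menu": [{"nm": "Latte", "price": "4.50"}, {"nm": "Bagel", "price": "2.00"}]}
print(json2token(receipt))
# prints <s_menu><s_nm>Latte</s_nm><s_price>4.50</s_price><sep/><s_nm>Bagel</s_nm><s_price>2.00</s_price></s_menu>
```

Parsing such a string back with `token2json(..., added_vocab={})` should recover the original dict; passing an explicit `added_vocab` skips the `processor` lookup, and single-element lists come back collapsed to their only element.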