Unverified commit 61011d77, authored by Rana Banerjee, committed by GitHub

Fixing issue of ambiguous/duplicate knowledge triplet generation because of mismatched case or appended special character (double quotes) in generated entities (subject, predicate, object) (#10409)

* While generating triplets, many ambiguous duplicates are created because of mismatched case or double quote characters appended to the subject or the object. A line of code that strips double quotes and capitalizes the entities helps with disambiguation

* Format code with black

* Updated test_base.py with expected output for capitalized text

* Revert "Format code with black"

This reverts commit dd87dc46d7d7630e5bb611a469d6ec14f2e2e194.

* Fix end of files

* Remove unnecessary files
parent 6d456552
@@ -156,6 +156,12 @@ class KnowledgeGraphIndex(BaseIndex[KG]):
             if not subj or not pred or not obj:
                 # skip partial triplets
                 continue
+            # Strip double quotes and Capitalize triplets for disambiguation
+            subj, pred, obj = (
+                entity.strip('"').capitalize() for entity in [subj, pred, obj]
+            )
             results.append((subj, pred, obj))
         return results
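
To illustrate why the added normalization reduces duplicates, here is a minimal standalone sketch. It is not the library's code: `normalize_triplet` and `dedupe` are hypothetical helpers, and the sample triplets are made up for the example. Only the `strip('"').capitalize()` expression comes from the hunk above.

```python
from typing import List, Tuple

Triplet = Tuple[str, str, str]


def normalize_triplet(subj: str, pred: str, obj: str) -> Triplet:
    """Hypothetical helper that applies the expression added in the commit."""
    # strip('"') removes leading and trailing double quotes only;
    # capitalize() upper-cases the first character and lower-cases the rest.
    s, p, o = (entity.strip('"').capitalize() for entity in (subj, pred, obj))
    return (s, p, o)


def dedupe(triplets: List[Triplet]) -> List[Triplet]:
    """Normalize triplets and drop repeats, keeping first-seen order."""
    seen = set()
    unique = []
    for triplet in triplets:
        norm = normalize_triplet(*triplet)
        if norm not in seen:
            seen.add(norm)
            unique.append(norm)
    return unique


# Made-up variants of one fact, differing only in case and quoting.
raw = [
    ("foo", "is", "bar"),
    ('"Foo"', "Is", "bar"),
    ("FOO", "is", '"Bar"'),
]
print(dedupe(raw))  # [('Foo', 'Is', 'Bar')]
```

Without the normalization step, the three variants would be stored as three distinct triplets.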
@@ -232,6 +232,7 @@ def test__parse_triplet_response(
     )
     assert len(parsed_triplets) == 1
     assert len(parsed_triplets[0]) == 3
-    assert ("foo", "is", "bar") in parsed_triplets[0]
-    assert ("hello", "is not", "world") in parsed_triplets[0]
-    assert ("Jane", "is mother of", "Bob") in parsed_triplets[0]
+    # Expecting Capitalized triplet Outputs
+    assert ("Foo", "Is", "Bar") in parsed_triplets[0]
+    assert ("Hello", "Is not", "World") in parsed_triplets[0]
+    assert ("Jane", "Is mother of", "Bob") in parsed_triplets[0]
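
The expected values in the test follow from how Python's built-in `str.capitalize()` and `str.strip()` behave. The short sketch below shows why "Jane" and "Bob" are unchanged while "is mother of" becomes "Is mother of", along with two edge cases worth knowing about. `normalize_entity` is a hypothetical name used only here, and the last two inputs are illustrative, not taken from the test suite.

```python
def normalize_entity(entity: str) -> str:
    # Same expression as in the commit, applied to a single string.
    return entity.strip('"').capitalize()


# Already-capitalized single words are unchanged.
print(normalize_entity("Jane"))          # Jane
# Only the first character of a multi-word string is upper-cased.
print(normalize_entity("is mother of"))  # Is mother of
# Characters after the first are lower-cased, so inner capitals are lost.
print(normalize_entity("New York"))      # New york
# Only leading and trailing quotes are stripped; inner quotes remain.
print(normalize_entity('"a "quoted" word"'))  # A "quoted" word
```

One consequence is that multi-word proper nouns lose their internal capitalization after normalization.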