Comparing phrases in two DataFrames and returning the matching sentences with their order and index
Two datasets, df and df1, hold text split into one token per row, with each sentence terminated by a full stop '.'. I want to find the sentences that are present in both datasets and get the matching rows of the superset df, keeping the index of df.
I can only do this when the text is a plain string, not when it is split across rows like this. Could spaCy or another NLP library with a language model help with this?
df:

index  ID-0  ID-1  text
0      4     20    This
1      6     8     is
2      8     6     an
3      12    15    apple
4      29    9     .
5      45    5     The
6      56    8     apple
7      60    10    is
8      62    15    sweet
9      65    2     .
10     66    1     This
11     68    2     is
12     70    6     very
13     73    4     good
14     75    1     fruit
15     76    3     .
16     78    1     I
17     82    0     like
18     90    6     to
19     95    8     eat
20     99    2     apple
21     100   0     .

df1:

idx  text
1    The
2    apple
3    is
4    sweet
5    .
6    I
7    like
8    to
9    eat
10   apple
11   .

output:

index  ID-0  ID-1  text
5      45    5     The
6      56    8     apple
7      60    10    is
8      62    15    sweet
9      65    2     .
16     78    1     I
17     82    0     like
18     90    6     to
19     95    8     eat
20     99    2     apple
21     100   0     .
Should be pretty simple:
df_new = df[df.text.isin(df1.text)]
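Note that `isin` on the token column matches individual tokens, so with the sample data it would also keep rows such as index 1 (`is`), 3 (`apple`) and 4 (`.`), which are not in the expected output. If whole sentences have to match, one possible approach is to group the tokens into sentences first and compare those. The following is only a sketch, assuming every sentence ends with a `.` token and that a sentence matches only when all of its tokens are identical; the helper names `sentence_ids` and `sentence_keys` are made up for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "ID-0": [4, 6, 8, 12, 29, 45, 56, 60, 62, 65, 66,
             68, 70, 73, 75, 76, 78, 82, 90, 95, 99, 100],
    "ID-1": [20, 8, 6, 15, 9, 5, 8, 10, 15, 2, 1,
             2, 6, 4, 1, 3, 1, 0, 6, 8, 2, 0],
    "text": ("This is an apple . The apple is sweet . "
             "This is very good fruit . I like to eat apple .").split(),
})
df1 = pd.DataFrame(
    {"text": "The apple is sweet . I like to eat apple .".split()},
    index=pd.RangeIndex(1, 12, name="idx"),
)


def sentence_ids(tokens):
    """Number the sentences: a new one starts right after each '.' token."""
    return tokens.eq(".").shift(fill_value=False).cumsum()


def sentence_keys(frame):
    """Return, for every row, the full sentence that row belongs to."""
    sid = sentence_ids(frame["text"])
    sentences = frame.groupby(sid)["text"].agg(" ".join)
    return sid.map(sentences)


# Keep the rows of df whose whole sentence also occurs in df1
df_keys = sentence_keys(df)
df1_sentences = set(sentence_keys(df1))
df_new = df[df_keys.isin(df1_sentences)]
print(df_new)
```

On the sample data this keeps rows 5 to 9 and 16 to 21 of df with their original index. No language model is needed for exact matches like these; something like spaCy would only come into play if the sentences had to be matched loosely (different casing, lemmas, or tokenization).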