2021. 1. 31. 14:04

This is the hint for "Fixing an Inconsistent Character Set"

http://rosalind.info/problems/cset/

 

The most important concept you will use is:

"Two splits S1|Sc1 and S2|Sc2 conflict when all four intersections S1 ∩ S2, S1 Sc2, Sc1 S2, and Sc1 Sc2 are nonempty."

 

Let's say we are splitting a set with six elements: 1, 2, 3, 4, 5, and 6.

The splits can be {1, 2} | {3, 4, 5, 6} or {1, 4, 5} | {2, 3, 6} etc.

 

We will check if two splits conflict using the concept above. 

Let's say S1 = {1, 2}, Sc1 = {3, 4, 5, 6}, S2 = {1, 4, 5}, and Sc2 = {2, 3, 6}.

 

S1 ∩ S2 = {1, 2}{1, 4, 5} = {1}

S1 ∩ Sc2 = {1, 2}{2, 3, 6} = {2}

Sc1 ∩ S2 = {3, 4, 5, 6} {1, 4, 5} = {4, 5}

Sc1 ∩ Sc2 = {3, 4, 5, 6}{2, 3, 6} = {3, 6}

 

As you can see, all four intersections are non-empty. Thus, those two splits conflict.

The other two splits {1, 2} | {3, 4, 5, 6} and {1, 2, 3} | {4, 5, 6} do not conflict because

 

S1 ∩ S2 = {1, 2}

S1 ∩ Sc2 = {}  ← Empty!

Sc1 ∩ S2 = {3}

Sc1 ∩ Sc2 = {4, 5, 6}

 

Now, look at the sample dataset:

100001
000110
111000
100111

We will number the columns as 1, 2, 3, 4, 5, 6 and re-write them as splits. Note it is not important to distinguish between 0 and 1.

100001 → {1, 6} | {2, 3, 4, 5}

000110 → {1, 2, 3, 6} | {4, 5}

111000 → {1, 2, 3} | {4, 5, 6}

100111 → {1, 4, 5, 6} | {2, 3}

 

Then, find a row that conflicts with more than one other row.

 

Conflicts between rows. X indicates "conflict"

Only row 1 and row 3 conflict. Thus, you can delete either one of the rows. Note you will need to remove only one row. This will make computation easier.

 

---
I don't put ads in my blog. If this post helps you, please consider donating a small amount of money that will be a huge motivation.
https://www.buymeacoffee.com/harryincupboard