Algorithm - can set of strings be represented as composition of two sets?

Question

A is a set of strings over the alphabet {a, b, c}. Can it be represented as the composition of two other sets of strings, B and C? That is, A = BC, but A != C and A != B.

Composition means joining each string of set B with each string from set C (which creates set A). For example, if A = {aba, ca, abb, cb}, then the answer is positive, because B = {ab, c} and C = {a, b}. The empty string ε is also allowed.
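
To make the operation concrete, here is a minimal C++ sketch of the composition described above. It only builds BC from two given sets and compares it with A; it does not search for B and C. The names `StringSet`, `concat` and `checkQuestionExample` are invented for this illustration.

```cpp
#include <cassert>
#include <set>
#include <string>

using StringSet = std::set<std::string>;

// Returns BC: every string of B joined with every string of C.
// The result is a set, so equal products are stored only once.
StringSet concat(const StringSet& B, const StringSet& C) {
    StringSet result;
    for (const std::string& b : B)
        for (const std::string& c : C)
            result.insert(b + c);
    return result;
}

// The example from the question: B = {ab, c}, C = {a, b}.
void checkQuestionExample() {
    const StringSet A = {"aba", "ca", "abb", "cb"};
    assert(concat({"ab", "c"}, {"a", "b"}) == A);
}
```

Because the result is a set, BC can hold fewer than |B| x |C| strings when different pairs produce the same string.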

Example input:
List of strings:
aba ca abb cb

Expected output:
true

All I'm asking is whether there is an official name for this problem, so I can look up a solution myself. I don't mind getting a solution here, but this is an extremely hard problem to solve.

What I have tried:

A brute-force solution, but the deeper I go, the more I realize what a colossal number of possibilities I need to cover to create a solution that works for all examples.

Posted 7-Mar-22 23:24pm

Member 15047625

Updated 8-Mar-22 2:51am


This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Greg Utas · Answer 1 · 2022-03-08T00:57:00

I thought the term for this was the product, not the composition, of two sets. But I don't know what the reverse operation is called.

One approach would be to factor |A| (the cardinality of A, the set being decomposed). In your example, this yields 4x1 or 2x2, so you'd be looking for a |B| and |C| of 4 and 1, or 2 and 2.

But let's say that B contains a and ab, and C contains bc and c. BC then produces abc twice, so one of them would be dropped. In this case, trying to factor |A| no longer works. So we're back to a more brute force approach.

However, as you use a brute force approach, you're constructing candidate sets B and C. If |B| x |C| > |A|, it must be that BC produces some number of duplicates, so checking that this is true would probably prune a lot of the searches quickly.

EDIT: Another thing that would help is to consider each string in A individually. Each one must split into a prefix from B and a suffix from C. For ca, the split c + a puts c in B and a in C (the other possible splits pair the empty string with ca itself). cb is similar. These are simple examples, but even a long string must be split this way. This could quickly prune the search when a candidate BC produces a string that isn't in A.
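
The splitting idea above can be turned into a small exhaustive search. The following C++ sketch is not from the original answer, and all function names are hypothetical. It uses the question's notation (A = BC) and assumes A is finite and tiny, because it tries every subset of the prefixes of A as a candidate B. For each B it takes the largest C whose products with B all stay inside A, then checks whether BC covers A. For a finite, non-empty A, a factor equal to A forces the other factor to be the set holding only the empty string (compare the longest strings on both sides), so those two cases are skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

using StringSet = std::set<std::string>;

// BC: every string of B joined with every string of C.
StringSet concat(const StringSet& B, const StringSet& C) {
    StringSet result;
    for (const std::string& b : B)
        for (const std::string& c : C)
            result.insert(b + c);
    return result;
}

// Largest set C with BC inside A: a suffix s qualifies only if
// b + s is in A for every b in B.
StringSet largestRightFactor(const StringSet& A, const StringSet& B) {
    StringSet C;
    if (B.empty()) return C;
    const std::string& first = *B.begin();
    for (const std::string& a : A) {
        if (a.compare(0, first.size(), first) != 0) continue;  // not a prefix of a
        const std::string s = a.substr(first.size());
        bool fits = true;
        for (const std::string& b : B)
            if (A.count(b + s) == 0) { fits = false; break; }
        if (fits) C.insert(s);
    }
    return C;
}

// True for the empty set and for the set holding only "".
bool isTrivial(const StringSet& S) {
    return S.empty() || (S.size() == 1 && S.begin()->empty());
}

// True if A = BC for some B and C that both differ from A.
bool isDecomposable(const StringSet& A) {
    // Candidate members of B: every prefix of every string in A.
    StringSet prefixSet;
    for (const std::string& a : A)
        for (std::size_t len = 0; len <= a.size(); ++len)
            prefixSet.insert(a.substr(0, len));
    const std::vector<std::string> prefixes(prefixSet.begin(), prefixSet.end());
    assert(prefixes.size() < 25 && "exhaustive search is only for tiny inputs");

    const unsigned long long limit = 1ULL << prefixes.size();
    for (unsigned long long mask = 1; mask < limit; ++mask) {
        StringSet B;
        for (std::size_t i = 0; i < prefixes.size(); ++i)
            if (mask & (1ULL << i)) B.insert(prefixes[i]);
        if (isTrivial(B)) continue;  // B = {""} would force C = A
        const StringSet C = largestRightFactor(A, B);
        if (isTrivial(C)) continue;  // C = {""} would force B = A
        if (concat(B, C) == A) return true;
    }
    return false;
}
```

The search is exponential in the number of distinct prefixes, so it is only meant for checking small examples such as the ones in this thread.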

Arthur V. Ratz · Answer 2 · 2022-03-08T02:51:00

Solution 2

Here's an algorithm:

Alphabet: {a,b,c}

Strings: aba ca abb cb

One solution to this is to build a matrix M that maps each string onto the letters of the alphabet {a,b,c} that occur in it, once or multiple times. Repeated letters are not counted separately, so each entry is 1 if the letter occurs in the string and 0 otherwise. For the four strings above, M is a rectangular 4 x 3 matrix consisting of the following elements:

      a  b  c

aba : 1, 1, 0
ca  : 1, 0, 1
abb : 1, 1, 0
cb  : 0, 1, 1

For instance, the string "abb" contains one 'a' and two 'b's, so its row in the matrix M is { 1, 1, 0 }: the two 1 values correspond to the letters 'a' and 'b', respectively, and the 0 corresponds to the letter 'c', which does not occur in the string "abb".

Next, multiply the matrix M by its transpose to obtain the symmetric matrix I=MM^T of shape 4 x 4. Each entry I[i][j] is the inner product of rows i and j of M.

I=MM^T:

2 1 2 1
1 2 1 1
2 1 2 1
1 1 1 2

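Finally, check whether at least one element on the diagonal of the matrix I is equal to 0. The diagonal entry I[i][i] counts the distinct alphabet letters in the i-th string, so a 0 means that the string contains no letter of the alphabet. If so, the composition does not exist and the result is "FALSE"; otherwise, the result is "TRUE".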

Here's a code snippet, written in C++, illustrating the algorithm above:

C++

#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

int main()
{
    // Input: one string per line; an empty line is the empty string.
    const char* filename = "tZwciVLk.txt";

    std::ifstream ifs(filename, std::ifstream::in);

    std::vector<std::string> strings;

    std::string line;

    while (std::getline(ifs, line))
        strings.push_back(line);

    std::string chars = "abc";

    // A single string passes if it is empty or contains
    // at least one letter of the alphabet.
    if (strings.size() == 1) {

        std::size_t i = 0; bool f = false;
        const char* string = strings[0].c_str();
        f = strcmp("\0", string) == 0;
        while (i < strlen(string) && !f)
            f = strchr(chars.c_str(), string[i++]) != NULL;

        printf("Output: %s\n", f ? "True" : "False");

        return 0;
    }

    int** M = (int**)malloc(strings.size() * sizeof(int*));
    int** I = (int**)malloc(strings.size() * sizeof(int*));

    // Row n of M marks which alphabet letters occur in the n-th
    // non-empty string. Empty strings are skipped.
    std::size_t n = 0;
    for (std::size_t i = 0; i < strings.size(); i++) {
        M[i] = (int*)malloc(chars.size() * sizeof(int));
        I[i] = (int*)calloc(strings.size(), sizeof(int));
        const char* s = strings[i].c_str();
        if (strcmp("\0", s) < 0) {
            for (std::size_t j = 0; j < chars.size(); j++)
                M[n][j] = strchr(s, chars[j]) ? 1 : 0;

            n++;
        }
    }

    // I = M * M^T over the n non-empty strings. The loop stops at
    // the first zero on the diagonal.
    bool f = true;
    for (std::size_t i = 0; i < n && f; i++) {
        for (std::size_t j = 0; j < n; j++) {
            for (std::size_t k = 0; k < chars.size(); k++)
                I[i][j] += M[i][k] * M[j][k];

        }

        f = I[i][i] != 0;
    }

    for (std::size_t i = 0; i < strings.size(); i++)
         std::cout << strings[i] << " ";

    std::cout << std::endl;

    std::cout << "Output: " << (f ? "True" : "False") << std::endl;

    // Release the C-style buffers.
    for (std::size_t i = 0; i < strings.size(); i++) {
        free(M[i]);
        free(I[i]);
    }
    free(M);
    free(I);

    return 0;
}

Outputs:

Alphabet: {a,b,c}
Strings: aba ca abb cb
Output: True

Alphabet: {a,b,c}
Strings: aba ca abb cb d
Output: False

Alphabet: {a,b,c}
Strings: tt yyyy zz dd ppp
Output: False

Alphabet: {a,b,c}
Strings: xp
Output: False

Alphabet: {a,b,c}
Strings: ax
Output: True

Edit:

Also, this is implementation-specific. You can also find the intersection of the alphabet and each string to do the same check, something like:

C#

using System;
using System.Linq;

char[] chars = {'a','b','c'};
String[] strings = { "aba", "ca", "abb", "cb", "zz" };

// True only if every string shares at least one letter with the alphabet.
bool isComposition(String[] strings, char[] chars) {
    return strings.All(s => s.Intersect(chars).Any());
}

Console.WriteLine(isComposition(strings, chars));

Good Luck! :)

Posted 8-Mar-22 2:51am

Arthur V. Ratz

Updated 8-Mar-22 8:53am

v7

Comments

Member 15047625 8-Mar-22 9:22am

Program crashes for set of length 1 "abc". I can provide you my code (I prefer C# myself).

Arthur V. Ratz 8-Mar-22 10:32am

Here's a small update to the code, working for 1-string sets. Probably, this one is what you need. :)

Member 15047625 8-Mar-22 10:53am

Still crashing for me. j at some point becomes 1 and it gets out of range because I has size 1x1 for set of size 1. So it only has available indexes 0 and 0.

Arthur V. Ratz 8-Mar-22 11:12am

You might want also to try this. See the code fragment, from above.

Member 15047625 8-Mar-22 11:23am

Now it works. But there is still one thing off. This set of strings: https://pastebin.com/tZwciVLk (the first line is an empty string of length 0) returns false, but it should return true. I trust that this algorithm works fine, so I'm unsure where the problem is.

Arthur V. Ratz 8-Mar-22 12:10pm

Working??

Arthur V. Ratz 8-Mar-22 11:29am

Just remove the string in the first line, or add a check: if the scanned string is not empty, do the computation for it; otherwise, proceed with the next line.

Member 15047625 8-Mar-22 12:12pm

But empty string also can be used and shouldn't be ignored, for example in set A = {"", "a", "b", "aa", "ba"}. Result for this one is true.

Arthur V. Ratz 8-Mar-22 11:41am

Also you can add this f = strcmp("\0", string) == 0; to the line 15 of the code, above.

Member 15047625 8-Mar-22 12:13pm

About to check out.

Arthur V. Ratz 8-Mar-22 12:18pm

When done, just let me know about it. All right?

Member 15047625 8-Mar-22 12:21pm

Well, that change on line 15 only allows an empty string to pass (a set made of one element), which is wrong because it can't be divided into two that way.

Big set still isn't passing because there are no other changes (maybe "" needs to be added to alphabet somehow?)

Arthur V. Ratz 8-Mar-22 12:41pm

No. It is better to remove all empty strings first, and then apply the algorithm.

Member 15047625 8-Mar-22 12:44pm

Removing empty string might affect result. Empty string ε is allowed.

Arthur V. Ratz 8-Mar-22 12:49pm

All right, just let me see...

Member 15047625 8-Mar-22 12:54pm

It's tricky because empty string is not part of the alphabet, but it can be used when creating two sets.
Like I said for example A = {"", "a", "b", "aa", "ba"} gives B = {"", "a", "b"} and C = {"", "a"}
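
The example above can be checked mechanically. This is a small C++ sketch with hypothetical names (`isValidDecomposition`, `checkEmptyStringExample`); it only verifies a proposed pair B and C, treating the empty string as an ordinary member of a set.

```cpp
#include <cassert>
#include <set>
#include <string>

using StringSet = std::set<std::string>;

// True if A = BC and neither factor is A itself.
bool isValidDecomposition(const StringSet& A, const StringSet& B,
                          const StringSet& C) {
    if (A == B || A == C) return false;
    StringSet product;
    for (const std::string& b : B)
        for (const std::string& c : C)
            product.insert(b + c);  // "" + c == c and b + "" == b
    return product == A;
}

// A = {"", "a", "b", "aa", "ba"} with B = {"", "a", "b"}, C = {"", "a"}.
void checkEmptyStringExample() {
    const StringSet A = {"", "a", "b", "aa", "ba"};
    assert(isValidDecomposition(A, {"", "a", "b"}, {"", "a"}));
    assert(!isValidDecomposition(A, {""}, A));  // the trivial split is rejected
}
```

Dropping the empty string from A before the check would change the answer here, since B and C both rely on it.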

Arthur V. Ratz 8-Mar-22 12:53pm

1. First, find the count of non-empty strings, and then build the matrices M and I with a shape that matches that count.

2. After line 29, check whether the current string is non-empty. If it is, compute its row normally; otherwise, skip it and proceed to the next one.

Member 15047625 8-Mar-22 12:56pm

I will try to figure something out but no promises. This is awfully complex problem.
P.S
Your C# version also fails for the big set.

Arthur V. Ratz 8-Mar-22 13:18pm

And yes, here's another solution above. For each string, check whether it is non-empty (lines 31-35), and if so, compute its row of M, maintaining a separate index n that counts the non-empty strings. If empty strings were found, the matrix M will end up with empty rows of 0's at the final positions. Don't worry about that and build the matrix I as usual, since the dot product of two zero vectors is the scalar 0. It doesn't affect the result of the composition.

Look at my code above, and give it a try.

Member 15047625 8-Mar-22 13:23pm

The "big" set fails, everything else passes. So far it seems it works fine but only for very small sets.

Arthur V. Ratz 8-Mar-22 13:27pm

As far as I can understand, this problem has no impact on general performance, whether or not empty strings are processed smartly. Why do you think that the empty strings issue and performance are related problems here?

Member 15047625 8-Mar-22 13:32pm

I'm not the one who came up with this problem. Empty strings are tolerable and they can affect the result (set being divisible to two sets). That set in pastebin returns false, but it should return true. Which means that this solution is flawed, because it doesn't work for all cases. And I'm unsure how to adjust it to fix it.

Arthur V. Ratz 8-Mar-22 13:39pm

Could you please tell me how you've parsed the strings from pastebin, after reading the file line by line?

Arthur V. Ratz 8-Mar-22 13:42pm

In this file, a string at each line does not contain spaces or any other delimiters. Are you concatenating all of the strings into a single array/set? Let me give it a try doing this. And, I'll get back with my opinion later.

Member 15047625 8-Mar-22 13:56pm

File.ReadAllLines and saved to List<string>. First line is "" (string.Empty). The rest is just normal string without any spaces at the start or the end.
As I said so far everything works except this one, which implies something is off, which can also mean that correct results are accidents.

Arthur V. Ratz 8-Mar-22 14:55pm

So here's my final update to the solution :)

I've modified the code to read strings from the pastebin. You may try running the latest version of the code. For me, it works just fine in both cases, "True" and "False".

So try this out if you would like and post your response, please.

Member 15047625 8-Mar-22 15:18pm

Still returns false instead of true for this set. This is my code https://pastebin.com/k5e6UJEj, as I mentioned above I actually do it in c#, I decided to accept answers for more languages because there is zero information about this problem online, so I wanted to widen my search area.

It seems to be an exact copy of your C++ code, except now it's in C#.

Arthur V. Ratz 8-Mar-22 15:25pm

No. Please run my C++ code only to reproduce the error. As I've already explained, this code works correctly for me on your dataset. I still cannot figure out what the problem is. If you would like, I can submit screenshots of my code returning the proper results.

Member 15047625 8-Mar-22 15:52pm

I see. Then I would like to ask what is the best equivalent of strchr and strcmp for c#.

Arthur V. Ratz 9-Mar-22 0:08am

String s1 = "abcdefhi";
String s2 = "nnmmzztt";

Console.WriteLine(s1.IndexOf('d')); // like strchr: index of 'd', or -1 if absent
Console.WriteLine(s1.Equals(s2)); // like strcmp(s1, s2) == 0

Member 15047625 9-Mar-22 11:50am

Your c++ code run on onlinegdb.com returns true for set "", "a", "ab", which is incorrect, should return false. Link: https://onlinegdb.com/70yxdwKE6
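
A short argument for why this set has no decomposition: the empty string can only be written as the empty string joined with the empty string, so if "" is in A, it must be in both B and C. Every string of B and of C then appears in BC unchanged, which means B and C are subsets of A. The C++ sketch below (hypothetical names, tiny inputs only) uses that observation to try every pair of such subsets.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

using StringSet = std::set<std::string>;

// Builds {""} plus the strings of `rest` selected by the bits of `mask`.
StringSet pick(const std::vector<std::string>& rest, unsigned mask) {
    StringSet S = {""};
    for (std::size_t i = 0; i < rest.size(); ++i)
        if (mask & (1u << i)) S.insert(rest[i]);
    return S;
}

// Decides decomposability for a set A that contains "".
bool isDecomposableWithEmpty(const StringSet& A) {
    assert(A.count("") == 1 && "this sketch assumes the empty string is in A");
    std::vector<std::string> rest;  // the non-empty strings of A
    for (const std::string& a : A)
        if (!a.empty()) rest.push_back(a);
    assert(rest.size() < 12 && "exhaustive search is only for tiny inputs");

    const unsigned limit = 1u << rest.size();
    for (unsigned bm = 1; bm < limit; ++bm) {      // mask 0 would be B = {""}
        const StringSet B = pick(rest, bm);
        for (unsigned cm = 1; cm < limit; ++cm) {  // mask 0 would be C = {""}
            const StringSet C = pick(rest, cm);
            StringSet product;
            for (const std::string& b : B)
                for (const std::string& c : C)
                    product.insert(b + c);
            if (product == A) return true;
        }
    }
    return false;
}
```

For "", "a", "ab" every candidate pair produces a string outside A (aa, aab, aba or abab), so the function returns false, while "", "a", "b", "aa", "ba" succeeds with B = {"", "a", "b"} and C = {"", "a"}.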

Arthur V. Ratz 9-Mar-22 12:22pm

I'm sorry, but for your dataset from the pastebin it will always return false because the first line is always an empty string. Read your previous posts, carefully. Good Luck.

Member 15047625 9-Mar-22 12:24pm

Pastebin data should return true. Which means it doesn't work at all.
I do appreciate your help but it seems we achieved nothing in the end.

Arthur V. Ratz 9-Mar-22 12:27pm

As I've already explained below:

Arthur V. Ratz 9-Mar-22 12:28pm

I've just checked these sets with my running code.

Arthur V. Ratz 9-Mar-22 12:31pm

If you want, here's another version of my code, which does it the way you've elaborated in your post: https://ufile.io/z4s6kkwp

Arthur V. Ratz 9-Mar-22 12:33pm

Please, just let me guess what we must have achieved. :((

Arthur V. Ratz 9-Mar-22 12:26pm

Also, for the sets below, my code returns:

"" "a" "ab" "abba" "ca" "" "bc", the return is "True"
"" "aa" "abb" "aba" "cab" "ca" "" "bac" "" "dd", the return is "False"

And that's the only possible correct result for your assignment.

Arthur V. Ratz 9-Mar-22 12:34pm

... When the condition for handling empty strings is not known. Please make sure how the empty strings must be treated and then get back with your questions or comments if any.

Member 15047625 9-Mar-22 13:31pm

Your new version seems to work but I have to fully confirm it.
Empty string ε (also known as "", string.Empty, string of length 0) works like this: B = {"", "a"} C = {"a", "b"} => A = {"a", "b", "aa", "ab"}. A can be composed of these B and C, and we need to guess if such composition is possible or not. string.Empty + string = string. "" + "something" = "something".
Edit
Still doesn't work. https://onlinegdb.com/JvhxzTZ_g "" "a" "b" "aa" "ba" returns false, should be true.

Arthur V. Ratz 8-Mar-22 13:29pm

Without empty strings, the algorithm's complexity is proportional to O(N); when empty strings occur, it is O(N-e), where e is the number of empty strings. In the empty strings case, it should be even faster than without them.

Arthur V. Ratz 8-Mar-22 13:31pm

Also, to implement this algorithm smartly, it would be better to rework my code to use memory buffer re-allocation with std::malloc and std::realloc, or the C++ new/delete operators, rather than storing the data as 2D arrays.