After several timing runs with a 64-bit release build, I get times between 3 µs and 4 µs for the example.
I would be interested to know whether the C++ solution takes longer, and also how long MATLAB takes.
Here is my test program call with the measurement used and the data:
vector<int> X{ 4, 6, 3, 2 };
vector<int> Y{ 1, 2, 4, 5, 3, 8 };
vector<int> LX;
auto t1 = high_resolution_clock::now();
LX = ismember(X, Y);
auto t2 = high_resolution_clock::now();
duration<double, std::micro> us_double = t2 - t1;
std::cout << us_double.count() << "us\n";
Of course there are many other ways to measure time; I have used the standard C++ <chrono> facilities here. To avoid caching effects, I restarted the program for each run.
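For anyone who wants to reproduce this kind of measurement, here is a minimal self-contained sketch. The `ismember` below is only a hypothetical stand-in (a linear search that returns 1/0 flags, which is an assumption about what `LX` holds), and `median_time_us` is a helper name made up for illustration. It uses `steady_clock` and repeats the call inside one process, so it measures the warm-cache case rather than the restart-per-run case described above.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the ismember under discussion:
// LX[i] is 1 if X[i] occurs anywhere in Y, otherwise 0.
std::vector<int> ismember(const std::vector<int>& X, const std::vector<int>& Y) {
    std::vector<int> LX;
    LX.reserve(X.size());
    for (int x : X) {
        bool found = std::find(Y.begin(), Y.end(), x) != Y.end();
        LX.push_back(found ? 1 : 0);
    }
    return LX;
}

// Illustrative helper (not from the original post): times `runs` calls
// of ismember and returns the median duration in microseconds.
double median_time_us(const std::vector<int>& X, const std::vector<int>& Y, int runs) {
    assert(runs > 0);
    using clock = std::chrono::steady_clock;  // monotonic, suited to intervals
    std::vector<double> samples;
    samples.reserve(static_cast<std::size_t>(runs));
    volatile std::size_t sink = 0;  // keeps the result observably used
    for (int i = 0; i < runs; ++i) {
        auto t1 = clock::now();
        std::vector<int> LX = ismember(X, Y);
        auto t2 = clock::now();
        sink = sink + LX.size();
        std::chrono::duration<double, std::micro> us = t2 - t1;
        samples.push_back(us.count());
    }
    std::sort(samples.begin(), samples.end());
    return samples[samples.size() / 2];
}
```

Calling `median_time_us(X, Y, 101)` with the data above gives one number per program start. The median is used because single measurements at the microsecond scale are easily disturbed by the scheduler and by clock resolution.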
If your code takes much longer, feel free to post it. Then we can discuss it.