The Lounge is rated Safe For Work. If you're about to post something inappropriate for a shared office environment, then don't post it. No ads, no abuse, and no programming questions. Trolling, (political, climate, religious or whatever) will result in your account being removed.
Having seen various approaches, I first decided to pull Mrs into Team Parent and talked about my concern.
It seems Mrs and daughter had already had this discussion a few months ago when we gifted her the phone, and this is the status:
1. My kid's pvt account has my Mrs as a member and follower (on one condition, see 2), so Mrs can see whatever is being posted. It's another story that Mrs rarely looks, as she says it's too much chatter to even care. But this works, I guess, because Kid knows she can be checked anytime.
2. Mrs should never respond, join in or give an opinion on the postings etc. She has to be just a mute spectator, and any issues need to be discussed offline.
PS. I asked if she knew what they generally talk about, and it seems it's billie eilish,
billie eilish ,
billie eilish ,
fashion style for next day in school,
Why *insert name* is behaving so childish,
billie eilish ,
Too much of good is bad, mix some evil in it
I've rarely paid attention to music at all, with extremely few exceptions...I do have a small CD collection from the late 80s/early 90s, but if I got rid of it all I wouldn't miss most of it. I ain't much of a music "consumer". I work in total silence (save for computer fan noises)--the best environment for me to work in is one where you can hear a pin drop.
1. My kids pvt account has my Mrs as a member and follower
As was pointed out yesterday...that's the account Mrs knows about. Then there's the real one - the one that's not using a postbot to query Twitter to figure out what's trending and decide what to post about...
I have the following problem and was thinking I could use machine learning but I'm not completely certain it will work for my use case.
I have completed a machine learning course and have a data set of around a hundred million records containing customer data including names, addresses, emails, phones, etc and would like to find a way to clean this customer data and identify possible duplicates in the data set.
Most of the data has been manually entered using an external system with no validation so a lot of our customers have ended up with more than one profile in our DB, sometimes with different data in each record.
For instance, we might have 5 different entries for a customer John Doe, each with different contact details.
We also have the case where multiple records that represent different customers match on key fields like email. For instance, when a customer doesn't have an email address but the data entry system requires one, our consultants will use a random email address, resulting in many different customer profiles using the same email address. The same applies to phones, addresses, etc.
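One way to spot those filler values before any matching is to count how many profiles share each value and treat the heavily shared ones as placeholders. The following is only a sketch under assumptions: the record layout, the field name `email`, the `example.com` addresses and the cut-off `max_profiles=3` are all made up for illustration and would need tuning against the real data.

```python
from collections import Counter

def shared_value_counts(records, field):
    """Count how many records carry each non-empty value of `field`."""
    # Light normalisation so "NONE@example.com " and "none@example.com" count together.
    return Counter(r[field].strip().lower() for r in records if r.get(field))

def suspicious_values(records, field, max_profiles=3):
    """Return values shared by more than `max_profiles` records (assumed cut-off)."""
    counts = shared_value_counts(records, field)
    return {value for value, n in counts.items() if n > max_profiles}

# Hypothetical sample data, not taken from the real system.
records = [
    {"name": "John Doe", "email": "none@example.com"},
    {"name": "Jane Roe", "email": "none@example.com"},
    {"name": "Ann Lee", "email": "NONE@example.com "},
    {"name": "Bob Ray", "email": "none@example.com"},
    {"name": "John Doe", "email": "jdoe@example.com"},
]
placeholders = suspicious_values(records, "email", max_profiles=3)
```

Values flagged this way probably should not count as evidence that two profiles belong to the same customer.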
All of our data is indexed in Elasticsearch and stored in a SQL Server database. My first thought was to use Mahout as a machine learning platform (since this is a Java shop) and maybe use HBase to store our data (just because it fits with the Hadoop ecosystem, not sure if it will be of any real value), but the more I read about it, the more confused I am as to how it would work in my case. For starters, I'm not sure what kind of algorithm I could use, since I'm not sure which category this problem falls into: can I use a clustering algorithm or a classification algorithm? And of course, certain rules will have to be used as to what constitutes a profile's uniqueness, i.e. which fields.
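For what it's worth, this kind of problem is usually called record linkage or entity resolution, and it tends to be a bit of both: decide pair by pair whether two records match (classification-like), then group the matched pairs (clustering-like). Below is a rough sketch of that shape using only the Python standard library. The `postcode` field, the blocking rule and the 0.85 threshold are assumptions for illustration, not anything from the actual schema, and comparing every pair inside a block would still need care at a hundred million records.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

def block_key(record):
    # Assumed blocking rule: first letter of the last name token plus postcode.
    # Only records sharing a key are compared, which avoids all-pairs comparison.
    surname = record["name"].split()[-1].lower()
    return surname[:1] + "|" + record["postcode"]

def similarity(a, b):
    # Character-level similarity in [0, 1]; a stand-in for a real name matcher.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def find(parent, i):
    # Union-find root lookup with path halving.
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def cluster(records, threshold=0.85):
    """Group record indices whose names are similar within the same block."""
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[block_key(rec)].append(idx)
    parent = list(range(len(records)))
    for members in blocks.values():
        for i, j in combinations(members, 2):
            if similarity(records[i]["name"], records[j]["name"]) >= threshold:
                parent[find(parent, i)] = find(parent, j)
    groups = defaultdict(list)
    for idx in range(len(records)):
        groups[find(parent, idx)].append(idx)
    return sorted(groups.values())

# Hypothetical sample data.
records = [
    {"name": "John Doe", "postcode": "1000"},
    {"name": "Jon Doe", "postcode": "1000"},
    {"name": "John Doe", "postcode": "2000"},
    {"name": "Mary Poe", "postcode": "1000"},
]
clusters = cluster(records)
```

Here the two Doe records in postcode 1000 end up in one group, while the John Doe in postcode 2000 stays separate because the assumed blocking rule never compares them. That is exactly the kind of trade-off the uniqueness rules would have to settle.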
You forgot to add that there may be 50 different people with the same name, i.e. Mohamm John Doe (often just the one name, sometimes spelled differently, sometimes father and son sharing the same name...), birthdate sometimes unknown so it's 1-Jan-yyyy (and not always the same year for the same cust because they just don't know for sure). People may have moved, so address [or even locality] is not telling.
Only 1 million (or is that just the sample?), well over 6 million I thought.
If you have done a machine learning course then you will have a basic understanding of statistical modelling.
Call me simple, but surely all you need to do is create a training dataset and apply that dataset to a number of different models until you get the results you are expecting.
You can then apply this model to your live dataset and see what results you get.
I feel like I am patronising you by explaining the above, having myself spent only about 30 minutes learning about machine learning, so I am sure you know much more than me on this subject and can see the flaws in my suggestions.
“That which can be asserted without evidence, can be dismissed without evidence.”
My personal take on this sort of thing. And please don't take this the wrong way.
Can you, as a human being, with a human brain, look at any two such records and define some criteria by which you can decide what's a duplicate and what's not...? And then correctly decide what to do about the situation?
If you can't, then I'm afraid this is another case of "machine learning" being presented as a panacea to solve all of humanity's problems.
As I was saying just a few days ago in some unrelated thread...this is how "AI" and "big data" ends up showing me ads for articles I've just purchased, after the purchase was made...
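To make the "can a human define the criteria" question concrete, here is what one such written-down rule might look like. It is purely an illustrative sketch: the field names, the example records and the rule itself (same name plus either a shared non-placeholder email or a shared phone) are assumptions, not a recommendation for the real data.

```python
def normalise_phone(phone):
    """Keep digits only, so '+1 555-0100' and '15550100' compare equal."""
    return "".join(ch for ch in phone if ch.isdigit())

def is_probable_duplicate(a, b, placeholder_emails=frozenset()):
    """Assumed rule: same name AND (same real email OR same non-empty phone)."""
    same_name = a["name"].strip().lower() == b["name"].strip().lower()
    email_a = a["email"].strip().lower()
    email_b = b["email"].strip().lower()
    # A shared email only counts if it is not a known filler address.
    same_email = email_a == email_b and email_a not in placeholder_emails
    phone_a = normalise_phone(a["phone"])
    phone_b = normalise_phone(b["phone"])
    same_phone = phone_a != "" and phone_a == phone_b
    return same_name and (same_email or same_phone)

# Hypothetical records for illustration.
first = {"name": "John Doe", "email": "none@example.com", "phone": "+1 555-0100"}
second = {"name": "john doe", "email": "none@example.com", "phone": "15550100"}
third = {"name": "John Doe", "email": "none@example.com", "phone": ""}
fillers = frozenset({"none@example.com"})

phone_match = is_probable_duplicate(first, second, fillers)
filler_only = is_probable_duplicate(first, third, fillers)
```

With the filler address excluded, the first two records match on phone alone, while the third does not match at all. If rules like this can't be agreed on by people looking at real pairs, a model trained on those same pairs is unlikely to do better.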