
Welcome to the Lounge

For discussing anything related to a software developer's life that is not a programming question. Got a programming question?

The Lounge is rated Safe For Work. If you're about to post something inappropriate for a shared office environment, then don't post it. No ads, no abuse, and no programming questions. Trolling (political, climate, religious or whatever) will result in your account being removed.

Re: Women (theoldfool, 13-Nov-21 5:22)
Re: Women (Sander Rossel, 14-Nov-21 22:56)
Re: Women (Nuke Ashraf, 14-Nov-21 23:06)
Re: Women (DRHuff, 13-Nov-21 8:54)
Re: Women (Greg Utas, 13-Nov-21 13:37)
Re: Women (Eddy Vluggen, 14-Nov-21 9:11)
Re: Women (Kelly Herald, 15-Nov-21 3:24)
Detecting Information Bias (Randor, 13-Nov-21 0:05)
Hi,

I just stumbled on something and thought I would share it. A while back I mentioned that I was working on a CCC analyzer/solver in my spare time; it's a side project and I haven't finished it. Since I will have some free time towards the end of this year, I am picking up the project again. As part of the project I am analyzing crossword puzzles using a skip-gram word embedding trained on over 2 trillion tokens (3M-word vocabulary) with 500 dimensions. The embedding was trained on parts of the English Gigaword corpus, a Wikipedia dump and most of the news/science articles from 2011-2017. (Yes, a lot of data!)
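
For readers unfamiliar with skip-gram: the model is trained on (center word, context word) pairs taken from a sliding window, so two words that repeatedly sit side by side produce many training pairs and end up with similar vectors. The minimal sketch below shows only that pair-generation step, assuming a fixed window size. The function name, the window of 1 and the sample sentence are illustrative; real word2vec-style training also applies frequent-word subsampling, a randomly reduced window and negative sampling, none of which are shown.

```python
def skipgram_pairs(tokens, window=2):
    """Return the (center, context) pairs a skip-gram model trains on.

    Every token within `window` positions of the center word counts as context.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical sample sentence; window=1 means only immediate neighbors count.
tokens = "the whitlam government was dismissed".split()
pairs = skipgram_pairs(tokens, window=1)
print(pairs[:4])
# [('the', 'whitlam'), ('whitlam', 'the'), ('whitlam', 'government'), ('government', 'whitlam')]
```

Every occurrence of the phrase "whitlam government" adds a ('whitlam', 'government') pair, which is why a heavily repeated phrase can pull two vectors together.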

One of my unit tests checks the 100 most common nouns in the English language for certain characteristics.

[Top 10 correlations for Government]

governments   0.723813
minister   0.618532
administration   0.60618
federal   0.595554
governmental   0.587466
cabinet   0.584909
public   0.583068
ministry   0.579487
officials   0.572555
whitlam   0.565244
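
The post calls these scores correlations. Word-embedding toolkits usually rank neighbors by cosine similarity between word vectors, so the minimal sketch below assumes that is what the table shows. The helper names are hypothetical, and the three-dimensional vectors are made-up toy values, not numbers from the 500-dimensional model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(word, vectors, k=10):
    """Rank every other word in `vectors` by cosine similarity to `word`."""
    target = vectors[word]
    scores = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != word]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Toy 3-d vectors (hypothetical values, not taken from the real model).
vectors = {
    "government": [0.9, 0.1, 0.0],
    "governments": [0.85, 0.15, 0.05],
    "whitlam": [0.7, 0.3, 0.1],
    "banana": [0.0, 0.2, 0.9],
}
for w, score in top_k("government", vectors, k=3):
    print(f"{w:<14}{score:.6f}")
```

With a 3M-word vocabulary a brute-force loop like this is slow; a real implementation would typically normalize all vectors once and compute the scores with a single matrix product.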


I like to think that I have a good grasp of the English language. However, last night I noticed a word relation that seemed unusual: the word 'Whitlam' was showing up as very highly related to the word 'government'. I'd never heard of that word before, so I looked up the definition. It's not a word; it's a person's name. But how could the world's population of 7.9 billion use this name so frequently in the context of 'government'?

The Spearman[^] and Pearson correlations[^] were so high that it could only mean the word was being used directly next to the word 'government'. So I needed to find out how this bias had occurred and where it was coming from. Then I found it: Whitlam Government[^]. There are 434 articles on Wikipedia[^] containing this phrase, and a quick investigation shows that over 80,000 indexed web pages use it.
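
The post does not say which two series were correlated; the minimal sketch below assumes two equal-length numeric sequences, such as the embedding vectors of two words. Pearson's r measures linear association, while Spearman's rho is Pearson's r computed on the ranks (tied values share their average rank), so it rewards any monotonic relationship. The function names and sample data are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values receive the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: Pearson's r on the ranks of the data."""
    return pearson(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # monotonic but not linear
print(round(pearson(x, y), 4), round(spearman(x, y), 4))  # 0.9811 1.0
```

Because y rises with x everywhere, Spearman is exactly 1 (up to floating-point rounding), while Pearson is slightly lower since the relationship is curved.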

Interesting situation... Since I have historic Wikipedia dumps as well as news articles from prior years, I can look for this bias in earlier years. I generated an embedding representing the year 2013, and 'Whitlam' scores much, much lower. So it seems people are using this phrase much more today than in years past.
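
Rebuilding a full embedding per year is expensive. As a lighter-weight cross-check (a swapped-in count-based technique, not the embedding comparison described above), one could count how often the exact bigram 'whitlam government' appears per million tokens in each year's corpus. The function name and the token lists are hypothetical stand-ins for the yearly corpora.

```python
def bigram_rate(tokens, first, second):
    """Occurrences of the adjacent pair (first, second) per million tokens."""
    if not tokens:
        return 0.0
    hits = sum(1 for a, b in zip(tokens, tokens[1:]) if a == first and b == second)
    return 1_000_000 * hits / len(tokens)


# Hypothetical stand-ins for two yearly corpora (already lower-cased and tokenized).
corpora = {
    2013: "the whitlam era ended and the government changed".split(),
    2017: "the whitlam government was the first whitlam government".split(),
}
for year, tokens in sorted(corpora.items()):
    print(year, bigram_rate(tokens, "whitlam", "government"))
```

A sharp rise in this rate between years would point at the same phrase-driven bias without retraining anything.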

So this got me thinking... as an offensive information warfare (IW) attack against NationY, which is known to be using NLP to study TopicX, it should potentially be quite easy to distort and manipulate the outcome. In fact, you can easily calculate just how many words/articles would be needed to increase a word's rank/correlation.
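
A simplified version of that calculation, using raw word-frequency rank as a stand-in for the embedding's similarity rank (an assumption; the post does not say which rank it means): to reach rank r or better, a word's count must exceed the count of the r-th most frequent other word, and dividing by the number of mentions per article gives a rough article count. The function names, the counts and the five-mentions-per-article figure are all hypothetical.

```python
import math


def extra_occurrences_to_reach_rank(counts, word, target_rank):
    """How many more occurrences `word` needs to rank `target_rank` or better.

    Ranks are 1-based by descending count; ties count against `word`, so it
    must strictly exceed the count of the target_rank-th most frequent other word.
    """
    others = sorted((c for w, c in counts.items() if w != word), reverse=True)
    if target_rank > len(others):
        return 0  # already guaranteed this rank or better
    return max(0, others[target_rank - 1] + 1 - counts.get(word, 0))


def articles_needed(extra, mentions_per_article):
    """Round up: a partial article still has to be published."""
    return math.ceil(extra / mentions_per_article)


# Hypothetical word counts from a corpus slice.
counts = {"government": 900, "minister": 500, "cabinet": 300, "whitlam": 120}
extra = extra_occurrences_to_reach_rank(counts, "whitlam", target_rank=2)
print(extra, articles_needed(extra, mentions_per_article=5))  # 381 77
```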

As a defensive measure, it should be quite easy (from an omniscient internet viewpoint) to monitor words and phrases whose usage by the population begins to deviate from the current Zipfian distribution[^].
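
One possible way to turn that monitoring idea into code, assuming word counts from a baseline period and from a current period: fit Zipf's law, count(r) ≈ C / r^s, to the baseline by least squares on log-count versus log-rank, then flag words whose current count exceeds, by some factor, what the fit predicts at their baseline rank (scaled to the current corpus size). The function names, the threshold factor of 3 and the sample counts are illustrative assumptions.

```python
import math


def fit_zipf(counts):
    """Fit log(count) = log(C) - s * log(rank) by least squares.

    Returns (C, s, ranks) where ranks maps each word to its 1-based rank.
    Needs at least two words.
    """
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    xs = [math.log(r) for r in range(1, len(ordered) + 1)]
    ys = [math.log(c) for _, c in ordered]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    ranks = {w: r for r, (w, _) in enumerate(ordered, start=1)}
    return math.exp(my - slope * mx), -slope, ranks


def flag_drift(baseline, current, factor=3.0):
    """Words whose current count exceeds `factor` times the Zipf prediction
    at their baseline rank, scaled to the size of the current corpus."""
    c, s, ranks = fit_zipf(baseline)
    scale = sum(current.values()) / sum(baseline.values())
    flagged = []
    for word, count in current.items():
        rank = ranks.get(word, len(ranks) + 1)  # unseen words get the next rank
        expected = scale * c / rank ** s
        if count > factor * expected:
            flagged.append(word)
    return flagged


# Hypothetical counts: the baseline follows Zipf's law exactly (count = 1200 / rank).
baseline = {"the": 1200, "of": 600, "government": 400,
            "minister": 300, "cabinet": 240, "whitlam": 200}
current = dict(baseline, whitlam=2000)  # a sudden spike in one word
print(flag_drift(baseline, current))  # ['whitlam']
```

Comparing against the baseline rank, rather than the word's new rank, matters: a spiking word climbs the ranking, so measured against its new rank it could still look perfectly Zipfian.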

Wikipedia is not a reliable data source; I would recommend avoiding it for important NLP research.

Best Wishes,
-David Delaune
Re: Detecting Information Bias (Greg Utas, 13-Nov-21 1:28)
Re: Detecting Information Bias (ElectronProgrammer, 13-Nov-21 7:36)
Re: Detecting Information Bias (Randor, 13-Nov-21 8:55)
Re: Detecting Information Bias (ElectronProgrammer, 13-Nov-21 10:37)
Re: Detecting Information Bias (trønderen, 13-Nov-21 9:57)
Re: Detecting Information Bias (Randor, 13-Nov-21 11:17)
I feel like hot stuff right now. (honey the codewitch, 12-Nov-21 13:19)
Re: I feel like hot stuff right now. (Eddy Vluggen, 12-Nov-21 13:42)
It's not my fault, but since November is call Microsoft names month... (charlieg, 12-Nov-21 12:56)
Re: It's not my fault, but since November is call Microsoft names month... (Eddy Vluggen, 12-Nov-21 13:43)
Re: It's not my fault, but since November is call Microsoft names month... (charlieg, 15-Nov-21 10:19)
RIP: Graeme Edge, 'The Moody Blues' :( (0x01AA, 12-Nov-21 7:01)
Re: RIP: Graeme Edge, 'The Moody Blues' :( (OriginalGriff, 12-Nov-21 6:40)
Re: RIP: Graeme Edge, 'The Moody Blues' :( (Mike Hankey, 12-Nov-21 6:43)
Re: RIP: Graeme Edge, 'The Moody Blues' :( (Mike Hankey, 12-Nov-21 6:42)
Re: RIP: Graeme Edge, 'The Moody Blues' :( (Slow Eddie, 13-Nov-21 2:39)
Eureka! I was stewing on a rant I made here and it clicked! (honey the codewitch, 12-Nov-21 5:47)
