|
Jo_vb.net wrote: Sandy Denny song
Great song.
"the debugger doesn't tell me anything because this code compiles just fine" - random QA comment
"Facebook is where you tell lies to your friends. Twitter is where you tell the truth to strangers." - chriselst
"I don't drink any more... then again, I don't drink any less." - Mike Mullikins uncle
|
|
|
|
|
" ... do not copy or store in a retrieval system ...". There's the rub.
The "discovery" phase will force them to show their "training data".
"AI" is so "expensive" because it's constantly "structuring" unstructured data from "documents" that it "copied".
"Before entering on an understanding, I have meditated for a long time, and have foreseen what might happen. It is not genius which reveals to me suddenly, secretly, what I have to say or to do in a circumstance unexpected by other people; it is reflection, it is meditation." - Napoleon I
|
|
|
|
|
|
Could you not use AI to detect it?
As the aircraft designer said, "Simplicate and add lightness".
PartsBin an Electronics Part Organizer - Release Version 1.3.0 JaxCoder.com
Latest Article: SimpleWizardUpdate
|
|
|
|
|
Perhaps - but this could be a challenge
|
|
|
|
|
The following seems to be more readable without a subscription. The original link pops a huge subscription page over it.
The New York Times sues OpenAI and Microsoft for copyright infringement | CNN Business[^]
From my link...
"... unlawful use of The Times’s work to create artificial intelligence products that compete with it threatens The Times’s ability to provide that service."
Not clear to me what that means.
In general of course NYT provides news, like daily news. So how is AI competing with that?
Now perhaps it also provides reviews? (Movie, theater, etc) And AI is providing recommendations based on that?
But it doesn't say that in the article. So specifically how is it providing competitive material?
It says the following...
it[NYT] discovered months ago that its work had been used to train the companies’ large language models.
It doesn't say it wasn't paid for that access.
Now if I read a newspaper for 20 years, learn a lot about businesses from it, and then start my own business and make a billion dollars, can the NYT sue me because I "used" the paper to achieve my success?
If I have a digital subscription and re-read an article three times should I pay extra?
If I have print edition and I have been cutting out articles for 20 years and I re-read them regularly should I pay extra?
Keep in mind of course that my business is based on that and it is worth a billion dollars.
|
|
|
|
|
You are correct, the laws surrounding unjust or unlawful enrichment are tricky. The NYT will have to prove in court that the AI is not randomly piecing articles together or merely following a rule (like the standard "Who, What, When, Where, Why and How" of news article structure), but that the AI algorithm is using the stylistic pattern learned from the NYT articles. That pattern, when applied to "new" news articles, will allow the AI to impersonate the successful NYT style and unfairly compete with the NYT.
You are correct that there is nothing stopping you from studying the NYT article style and copying that style. But to compete with the NYT you would also need to raise money to start your own newspaper. You as a person will not be able to compete with a complete news organization. You would need to hire people and in the end, your organization would be similar but not identical to the NYT. However, an AI with proper hardware can replicate the work of hundreds of people. It can be identical because it is not creative. It is not sentient, it is not conscious. It is an algorithm.
The NYT is claiming that the news articles were not used for their intended purpose, which is to inform the public of events. Instead, they were used to train a machine to replicate the style that makes the NYT unique, and the result will be a machine that can unfairly compete with the NYT.
For that valuable training, the NYT wants to be compensated or the material removed from the training dataset.
It remains to be seen how this will play out in court.
|
|
|
|
|
Gary Stachelski 2021 wrote: the laws surrounding unjust or unlawful enrichment are tricky.
A follow-up on the actual video (CNN?) suggested that the NYT provided an 'example': a post where a real person could not find anything, so they used an AI, which responded with the first three paragraphs of an existing article.
Now one might say that is problematic. But any standard paywall is likely going to do something similar. The only alternatives with a paywall are to show only the headline or to provide a synopsis for every article.
The user/reader, if they wanted to see the entire article, would still need to access NYT.
So at least with that example I am not convinced where the problem lies.
Gary Stachelski 2021 wrote: nothing stopping you from studying the NYT article style
Nothing I have seen suggests that has anything to do with it. The problem is content in everything that I have seen.
|
|
|
|
|
Here is an article that just came out that sheds more light on the NYT suit.
One thing that I did not consider is that AI responses often hallucinate (fabricate) results. In some of the NYT examples, a GPT model completely fabricated an article that it claimed the NYT published on January 10, 2020, titled "Study Finds Possible Link between Orange Juice and Non-Hodgkin's Lymphoma". The NYT never published such an article. Other examples show a mix of fact and fabricated info. I had never thought about that aspect of AI responses.
NY Times sues Open AI, Microsoft over copyright infringement | Ars Technica[^]
|
|
|
|
|
But I doubt that is actionable. Not in this suit.
Their current claim is about how it is using the data it collected. Obviously this demonstrates something it didn't collect.
Not to mention they would also need to prove that what they publish is a standard of truth-telling and that the fabrication thus hurts them.
But the following example suggests otherwise.
What the New York Times UFO Report Actually Reveals[^]
|
|
|
|
|
|
Thanks for the link!
I don't think there is a problem with re-reading an article three times.
But an AI in learning mode can read thousands, hundreds of thousands, or more text paragraphs or articles, and re-read them on each optimization loop.
And there can be a huge number of loops.
|
|
|
|
|
Remember the Napster debacle?
They stab it with their steely knives but they just can't kill the beast.
|
|
|
|
|
One can go a step further.
All the words of the NYT articles are taken from a standard English dictionary, and the AI is just rearranging/reusing words from that dictionary into meaningful (sometimes meaningless?) sentences.
So the publishers of that dictionary can indeed sue the AI, can't they?
|
|
|
|
|
That's already settled law. An English dictionary publisher cannot sue everyone writing in English for breach of copyright.
What is protected in a copyright is not the individual words, but the creativity required to arrange them in a particular order. It is for this reason that derived works (set in the same "universe" as the original work) are also protected under copyright.
EDIT: corrected syntax
Freedom is the freedom to say that two plus two make four. If that is granted, all else follows.
-- 6079 Smith W.
|
|
|
|
|
Maybe they should sue the NYT first for using their words.
|
|
|
|
|
I'm probably way more amused by this than I have any right to be.
Check out my IoT graphics library here:
https://honeythecodewitch.com/gfx
And my IoT UI/User Experience library here:
https://honeythecodewitch.com/uix
|
|
|
|
|
Raises the question of whether they’re protecting their journalists’ work or their paywall.
Time is the differentiation of eternity devised by man to measure the passage of human events.
- Manly P. Hall
Mark
Just another cog in the wheel
|
|
|
|
|
|
Good for them; it's certainly an issue that needs to be addressed. In plain terms, AI isn't intelligence: it's parroting back what 'it' reads. Sometimes verbatim, sometimes glued together, and often mis-cited, appearing to come from sources that don't reflect the content.
In our industry, we can look at this from two different perspectives. One is "Houston, we have a problem building AI" and the other is "yeah, we need better IP protections". Creating intelligent content costs money; in some cases, a lot of money. If AI is allowed to trample IP rights, what is the motivation to invest the time and resources to create that content? What happens if the Times and other media cease to exist because their ability to make money ends? AI can't replace them, and the information age will be permanently stuck in 2023 to some extent.
As an aside, in my opinion, the Fed needs to revisit the entire IP realm. We, as an industry, have been stuck between the lame bar of copyright protection and the extreme bar of patent protection. The day may come when AI gets into recreating software on its own, possibly eliminating any IP protection. There are a lot of things that need to be sorted out.
|
|
|
|
|
I've come up with a simple defense that the OpenAI team of lawyers can use and that no one can possibly counter.
If the President of Harvard can do it, then ChatGPT can do it. If she is allowed to keep her job after so many clear instances of plagiarism because she belongs to a "protected" class, then what is more of a minority than the very first instance of an AI? Shouldn't it, too, be a protected class that is allowed to break the law and every form of ethics?
|
|
|
|
|
upvoted.
Charlie Gilley
“They who can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety.” BF, 1759
Has never been more appropriate.
|
|
|
|
|
Wait till they figure out everything is a derivative and nothing is original.
|
|
|
|
|
Bear with me, because as much as I am loath to holy roll about technology, I still have my peeves.
I went about porting my DFA lexer engine from C# to TypeScript. It was primarily an exercise in teaching myself TypeScript, plus brushing up on my JS.
So I implement the bones of it, and after adjusting my mental map to the JS way of doing things I got it mostly working.
Then I went about trying to use a Map keyed by Sets.
Turns out JS Map and Set only compare keys by value for primitive types (including strings); for objects they fall back to reference comparisons. You can't override the equality mechanism with your own either.
how to customize object equality for javascript set - Stack Overflow[^]
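To make the behavior concrete, here's a minimal sketch (assuming integer NFA state IDs, as in subset construction): two Sets with identical contents are still distinct Map keys, because object keys are compared by reference.

```typescript
// Two Sets with identical contents are distinct Map keys:
// Map compares object keys by reference, not structural equality.
const a = new Set([1, 2, 3]);
const b = new Set([1, 2, 3]);

const dfaStates = new Map<Set<number>, string>();
dfaStates.set(a, "q0");

console.log(dfaStates.has(a)); // true (same object)
console.log(dfaStates.has(b)); // false (same contents, different object)
```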
Consequently, there is no performant way to do subset construction to convert an NFA to a DFA in this language.
I've seen others solve this problem by using string keys, but this falls down for machines of non-trivial size.
Regex FA visualizer[^] is one example but I can basically crash it or stall it out for a long time at least with any non-trivial expression. This one also doesn't work properly besides, but I have no other link handy for you to try.
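For completeness, the string-key workaround mentioned above can be sketched like this (illustrative names, assuming states are small integers). The catch is that every lookup pays to build and hash a key whose length grows with the subset, which is exactly where it falls down on big machines:

```typescript
// Workaround sketch: derive a canonical string key from a state set.
// Sorting makes the key independent of insertion order.
function keyOf(states: Set<number>): string {
  return [...states].sort((x, y) => x - y).join(",");
}

// Canonical key -> DFA state id
const dfaStates = new Map<string, number>();

const s1 = new Set([2, 1, 3]);
const s2 = new Set([3, 2, 1]); // same subset, different order

dfaStates.set(keyOf(s1), 0);
console.log(dfaStates.has(keyOf(s2))); // true (same canonical key)
```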
This may be academic, but it is also basic computer science. A language should be able to allow you to implement computer sciencey algorithms and constructs - especially those that have been adapted to countless other programming languages. DFA by subset construction is basic.
And you can't do it in JS.
I can't even begin to imagine what LALR table generation would look like.
You may be wondering why I care.
Because node.js.
Because Angular
Because React-Native
It's not just for web front ends anymore. JS is an almost virulent technology these days. It needs to, if not be feature-complete, at least cover the fundamentals, or you're just spreading garbage around.
Without a way to do custom comparisons at the very least on hashed containers, your language isn't going to be able to do a lot of things other high level languages can accomplish handily.
Is it even a "real" language? Is it ready for primetime, or is it just being adopted because we can?
Check out my IoT graphics library here:
https://honeythecodewitch.com/gfx
And my IoT UI/User Experience library here:
https://honeythecodewitch.com/uix
modified 27-Dec-23 9:13am.
|
|
|
|
|
My experience has been that programming languages evolve mostly to meet practical needs. You can substitute the words economic or business for the word practical and still have a valid statement. While there is a certain amount of 'need' for the ability to implement computer-sciencey algorithms in a language in a performant way, I think it's a lower priority than other features that simplify or extend expression of common idioms.
honey the codewitch wrote: Consequently, there is no performant way to do subset construction to convert an NFA to a DFA in this language
I take it that it's not impossible, and your objection is to the performance of the implementation required by the language? It sounds like an edge case you run into with almost every language that needs an alternative solution.
For example: Since you're implementing this in TypeScript, it's a web app. That implies a server. What about serializing the NFA, shipping it to the server for conversion, and deserializing the DFA returned? 'Out-of-the-box', as it were.
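A minimal sketch of that round trip (names are illustrative; it assumes the NFA's state sets are sets of small integers). Note that Sets don't survive JSON.stringify as-is, so they have to be converted to arrays on the way out and rebuilt on the way in:

```typescript
// Sets serialize to {} under JSON.stringify, so convert them
// to arrays before shipping and back to Sets after parsing.
type NfaJson = { states: number[][] };

function serialize(states: Set<number>[]): string {
  return JSON.stringify({ states: states.map(s => [...s]) });
}

function deserialize(json: string): Set<number>[] {
  const parsed = JSON.parse(json) as NfaJson;
  return parsed.states.map(arr => new Set(arr));
}

const original = [new Set([1, 2]), new Set([3])];
const roundTripped = deserialize(serialize(original));
console.log(roundTripped[0].has(2)); // true
```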
Software Zen: delete this;
|
|
|
|