Click here to Skip to main content
15,886,788 members
Articles / Programming Languages / SQL
Article

Use SQL in Java to Search for Twitter Users Based on Platform Usage

Rate me:
Please Sign up or sign in to vote.
5.00/5 (2 votes)
29 Oct 2019CPOL 3.6K   2   2
There was a recent request from the Java community on how to extract Twitter users who have not followed a specific account, but have Tweeted about specific content. In this article, we will walk through using the CData JDBC Driver for Twitter to answer such a question.

This article appears in the Third Party Products and Tools section. Articles in this section are for the members only and must not be used to promote or advertise products in any way, shape or form. Please report any spam or advertising.

About the CData JDBC Driver for Twitter

The CData JDBC Driver for Twitter allows users to work with data from Twitter using standard SQL statements. With the Twitter Driver, you can easily connect to Twitter in Java applications, as well as BI, reporting, and ETL tools that support the JDBC standard. The CData Driver supports working with Tweets, direct messages, followers, replies, lists, and more.

Download and Install the JDBC Driver for Twitter

First, download the Twitter JDBC Driver: (click here for a 30-day free trial). Follow the instructions in the installer to complete the installation.

Image 1

Connect to Twitter with JDBC in a Java IDE

Choose your favorite Java IDE (Eclipse, IntelliJ, NetBeans, etc.) or Java tool that supports JDBC and use the native JDBC support to create a connection to Twitter. For this article, we use NetBeans.

  1. Register a new JDBC driver.
    • Driver: The driver JAR file (typically found in C:\Program Files\CData\CData JDBC Driver for Twitter\lib)
    • Driver Class: cdata.jdbc.twitter.TwitterDriver
    • Name: Any (we use CDataTwitter)
    Image 2
  2. Click "OK."
  3. Create a new connection. Select the newly registered driver and enter the authentication parameters in the JDBC URL.
     

    For example: jdbc:twitter:InitiateOAuth=GETANDREFRESH;...;

    The JDBC Driver for Twitter uses OAuth for authentication. To learn more, refer to the Twitter JDBC Driver Help Documentation.

  4. After testing the connection, click "Finish" to establish the connection and begin querying Twitter.

    Image 3

Query Twitter

To query Twitter for the list of users we want, we need to create a SQL query to represent the question we have asked. We will break down our question into separate parts first and combine the separate queries to find the users we want.

Find Users Who Are Not Following You

The JDBC Driver for Twitter exposes a Follower view, which is a list of Twitter accounts that are following your account. There is a Following column which is a Boolean value that can be used to find a list of accounts that you have followed but are not following you back:

SELECT ID, Screen_Name, Following FROM Followers WHERE Following = 'false';

Image 4

Find Tweets About Specific Content

In the Tweets table, you can get Tweets as records. Use a pseudo-column called SearchTerms to get Tweets that contain a specific string. Since all Tweets are searched, it is important to extract only Tweets containing the target character string (in order to avoid hitting the Twitter API rate limits).

SELECT From_User_ID, From_User_Name, Text FROM Tweets WHERE SearchTerms = 'JDBC'

Image 5

If your query still exceeds the rate limit, add LIMIT 10000 to the SQL statement to reduce the number of results.

Combine the SQL Queries

Once we have parsed the separate parts of the question, we can combine them into a single query. Below, we use a sub-select query to find all of the Tweets from our followers that contain the string "JDBC." From the results, we select those accounts that are not following our account.

SELECT * FROM 
(
  SELECT 
    Tweets.From_User_ID, 
    Tweets.From_User_Name, 
    Tweets.From_User_Screen_Name, 
    Followers.Following, 
    Tweets.Text
  FROM 
    Tweets 
  LEFT OUTER JOIN 
    Followers 
  ON 
    Tweets.From_User_ID = Followers.ID
  WHERE Tweets.SearchTerms = 'JDBC'
) 
WHERE Following = false;

Image 6

Additional Restrictions

As mentioned above, SearchTerms in the Tweets table is useful because it allows you to extract only a specific character string from many Tweets. If you put a LIKE condition in the Text column (the main text of a given Tweet), the API will query the entire timeline, hitting the API rate limit immediately. The same is true for the Hashtags column. As a workaround, write a LIMIT in the query and further filter by the Text and Hashtags columns.

A limitation of the SearchTerms pseudo-column is that you can only get the most recent Tweets. This is a limitation of the Twitter API. By default, Twitter supports returning only Tweets from searches of up to 7 days. However, customers with premium accounts can retrieve older Tweets by running a 30-day or full-archive search.

To help limit your results, search for content that is relevant for a short time period, such as event-specific hashtags or phrases. Otherwise, set up a program to query Twitter every 7 days to get complete results.

NOTE: SearchTerms cannot be queried in combination with columns such as User_Id. To work around this restriction, use an OUTER JOIN to combine the result sets.

Further Querying

If you want to see distinct results based on follower ID, you can use the following query:

SELECT * FROM
(
  SELECT 
    Tweets.From_User_ID, 
    Tweets.From_User_Name, 
    Tweets.From_User_Screen_Name, 
    Followers.Following 
  FROM 
    Tweets 
  LEFT OUTER JOIN 
    Followers 
  ON 
    Tweets.From_User_ID = Followers.ID
  WHERE 
    Tweets.SearchTerms = 'JDBC'
  GROUP BY 
    Tweets.From_User_ID
) 
WHERE Following = false ;

Summary

With the CData JDBC Driver for Twitter, you can use SQL to work with Twitter data. Download a free, 30-day trial and start working with your Twitter data today. For more on using SQL to access 150+ other SaaS, Big Data, and NoSQL data sources, explore our JDBC Drivers.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Technical Writer CData Software
United States United States
I'm an educator-turned-technology evangelist, with a short stint as a software developer. In all of the work I've done, data has been critical, and as businesses, industries, and services grow, I can't help but notice the growth in the breadth and depth of data usage. A common interface to data frees enterprises from the burden of connecting to their data and frees them to focus on their own business. By leveraging CData drivers to access common SQL interfaces to more than 100 SaaS, Big Data, and NoSQL sources, developers can build solid, data-driven products and analysts and data scientists can quickly and easily build insights that drive business.

While giving presentations, writing articles, engaging in webinars, and producing tutorial videos I get the opportunity to see first-hand the difference that standard connectivity makes, with regards to both the underlying data sources and the tools and apps consuming the data. Talk to me about partnering with CData to connect to your own organization's data, embedding connectivity into your data-driven solutions or building custom connectors for a new data source.

Comments and Discussions

 
QuestionWhat if I work with a LinkedIn ODBC driver? Pin
stc.reseller7-Nov-19 1:14
stc.reseller7-Nov-19 1:14 
Hi!
I'm interested in using this kinda ODBC on LinkedIn data, for a marketing research.
Please let me know if I'm right with the following prerequisites:

Driver: The driver JAR file (typically found in C:\Program Files\CData\CData JDBC Driver for LinkedIn\lib)
Driver Class: cdata.jdbc.LinkedIn.LinkedInDriver
Name: Any (we use CDataLinkedIn)
Thanks in advance!
AnswerRe: What if I work with a LinkedIn ODBC driver? Pin
Jerod Johnson12-Feb-20 5:37
sponsorJerod Johnson12-Feb-20 5:37 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.