Collect Business Intelligence from Twitter
Introduction
In this application scenario we are illustrating how to collect Business Intelligence from Twitter Feed and giving examples how to process and evaluate received data using simple SQL commands. We collect busniness intelligence from Twitter feed by using the Twitter In service available the Boomerang Notification Framework. The Twitter In service supports two different Twitter API methods; track and follow.
When tracking you filter for a keyword or keywords in the Twitter stream. In follow you specify one or more Twitter accounts to follow i.e. to receive status updates. The two methods may be used at the same time. Tweets resulting from tracking and following will be stored in the IN_TWIT and IN_TWIT_HEADER tables.
Track/Filter
Phrases of keywords are specified by a comma-separated list. Comma separated keywords and comma separated phrases are treated as logical ORs, quoted phrases are treated as logical ANDs. Words within phrases are delimited by spaces. A tweet matches if any phrase matches. A phrase matches if all of the words are present in the tweet. (e.g. ‘the twitter’ is the AND twitter, and ‘the, twitter’ is the OR twitter.). Terms are exact-matched ignoring punctuation. Each comma-separated term may be up to 60 characters long. The current limit imposed by Twitter is 400 keywords.
To track/filter on specific keywords open the Boomerang administration console and select the Twitter node. Under Status filter enter one or more keywords. In the example below all Twitts with the word Mexico, Sweden or USA will be captured.
To capture additional information from Twitter such as user location or user name check the checkbox Save incoming status headers.
Exercise caution in selecting filter keywords. Filtering on commonly used keywords may require high bandwidth and increased database storage space.
Changes made to the Status filter may take up to 30 seconds before starting to work. To read more about the track/filter method please refer to the Twitter documentation found here.
Follow
The follow method returns the public statuses for the given users. Users are specified by a comma separated list of the Twitter user Id’s (integer). To find out the user id assigned to a particular user name click here.
In the below example status updates for two users will be received by Boomerang and stored in the IN_TWIT and IN_TWIT_HEADER tables.
Examples
Track/filter for the word ’USA’ or ‘Canada’ and follow user id: 186574980 and 186566175.
Track/filter for the word ’USA’ or ‘Canada’
Track/filter for the word ‘USA’ and ‘Canada’ and follow user id: 186574980 and 186566175.
Post Processing
As with all Boomerang services, post processing is easy with the use of the available event handlers. In the case of the Twitter in services the stored procedure in question is sp_On_Twit_In. This store procedure will fire after each Twitter status update have been received. For example to discard Twitter messages which does not include a hyperlink, the event handler store procedure could be altered like this:
ALTER procedure [dbo].[sp_On_Twit_In] ( @rKey uniqueidentifier ) as begin --- Collect Business Intelligence from Twitter Feed ----------------- --- Delete incoming twitter status updtes that does not include a URL select null from IN_TWIT where Text like '%http://%' and rKey=@rKey if @@ROWCOUNT = 0 begin delete from IN_TWIT where rKey=@rKey end end GO |
The above code sample will remove any Twitter status update from IN_TWITT and IN_TWITT_HEADER tables as they are being captured.
Evaluating Twitter Updates
Now that only Twitter status updates exist that contain a URL let us further processes these records by parsing the actual twitter messages. To get a top 10 list of the most twittered URL’s over the last 24 hours would look like this:
--- Collect Business Intelligence from Twitter Feed ----------------- --- Parse and aggregate top 10 URL last 24 hours select top 10 TwitterURL , COUNT(*) NumberOfTwitts into #_tmp_top_10_tweets from ( select rtrim(case when EndPos = 0 then SUBSTRING(TwitterURL, 1, 140) else substring(TwitterURL,1, EndPos) end) as TwitterURL from ( select PATINDEX('% %', substring(Text, StartPos, 140)) EndPos, StartPos, substring(Text, StartPos, 140) as TwitterURL from ( select PATINDEX('%http://%', Text) as StartPos, Text from dbo.IN_TWIT it where Created_On > GETDATE()-1 ) s1 ) s2 ) s3 where LEN(TwitterURL) > 15 group by TwitterURL order by 2 desc |
Result:
TwitterURL NumberOfTwitts -------------------------------------------------- -------------- http://t.co/6U6B4UtN 22 http://t.co/cGykr8xa 13 http://t.co/Hucbg1LC 13 http://t.co/aSWBTOQV 10 http://t.co/AorOUOca 9 http://t.co/Crly4SRv 9 http://t.co/c2R2F11t 7 http://t.co/Gxvqi0tA 6 http://t.co/1aU7otWQ 5 http://t.co/9TpSDwxt 5 (10 row(s) affected) |
Twitter Profile Data
All Twitter profile data available through the API is stored in IN_TWITT_HEADER. The information is organized by category (tag) and value.
select Name from IN_TWIT_HEADER group by Name |
Result:
Name --------------------------------------------------------------- created_at geo geo:coordinates place user:created_at user:description user:lang user:location user:name user:profile_image_url user:screen_name user:time_zone user:url user:verified (14 row(s) affected) |
In order to only include Twitter status messages where the Twitter users language setting is set to English add the following:
--- Collect Business Intelligence from Twitter Feed ----------------- --- Parse and aggregate top 10 URL last 24 hours select top 10 TwitterURL , COUNT(*) NumberOfTwitts into #_tmp_top_10_tweets from ( select rtrim(case when EndPos = 0 then SUBSTRING(TwitterURL, 1, 140) else substring(TwitterURL,1, EndPos) end) as TwitterURL from ( select PATINDEX('% %', substring(Text, StartPos, 140)) EndPos, StartPos, substring(Text, StartPos, 140) as TwitterURL from ( select PATINDEX('%http://%', Text) as StartPos, Text from dbo.IN_TWIT it inner join dbo.IN_TWIT_HEADER ith on it.rKey=ith.rKey and ith.Name='user:lang' where Created_On > GETDATE()-1 and ith.Value='en' ) s1 ) s2 ) s3 where LEN(TwitterURL) > 15 group by TwitterURL order by 2 desc |
Manipulate Track/Follow values
To view or change the Twitter Track/Follow settings dynamically rather from the Boomerang administration console the SYSTEM_VARS table may be manipulated.
select CONVERT(Nvarchar (50),Value) from SYSTEM_VARS where [Key]='twit_status_filter' |
Result:
-------------------------------------------------- track=sql (1 row(s) affected) |
Conclusion
Boomerang takes away the chore of writing complex code to interface with notification infrastructure like twitter, e-mail, printers, file and fax servers. By means of the Boomerang notification framework IT Professionals can create robust and dynamic notification solution. As demonstrated above we collect business intelligence from Twitter Feed and automatically parsed and filtered incoming tweets with a minimum amount of T-SQL code.
References
- Learn more how to post Twitter status updates with Boomerang
- Configuring Twitter In and Out
- View current Top 10 List based on above sample