
OUseful.Info, the blog…

Trying to find useful things to do with emerging technologies in open education

Visualising Twitter User Timeline Activity in R


I’ve largely avoided “time” in R to date, but following a chat with @mhawksey at #dev8d yesterday, I went down a rathole last night exploring a few ways of visualising a Twitter user timeline. As a result, I also had a quick initial play with some of R’s time handling features, such as timeseries objects and generating daily, weekly and monthly summary counts of data values.

To start, let’s grab a user timeline. As Martin started it (?!), we’ll use his…;-)

require(twitteR)

#the most tweets we can bring back from a user timeline is the most recent 3200...
mht=userTimeline('mhawksey',n=3200)
tw.df=twListToDF(mht)

#As I've done in previous scripts, pull out the names of folk who have been "old-fashioned RTd"...
require(stringr)
trim <- function (x) sub('@','',x)

tw.df$rt=sapply(tw.df$text,function(tweet) trim(str_match(tweet,"^RT (@[[:alnum:]_]*)")[2]))
tw.df$rtt=sapply(tw.df$rt,function(rt) if (is.na(rt)) 'T' else 'RT')
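The old-style RT extraction can be checked without touching the Twitter API. Here’s a minimal sketch in base R, assuming a few made-up tweets; the extract_rt helper is my own illustrative name, not part of twitteR or stringr:

```r
# Made-up tweets, for illustration only
sample_tweets <- c("RT @psychemedia: a retweeted link",
                   "@wilm thanks for that",
                   "just a plain tweet")

# Capture the screen name that follows a leading "RT @"; NA when there is no match
extract_rt <- function(tweet) {
  m <- regmatches(tweet, regexec("^RT @([[:alnum:]_]+)", tweet))[[1]]
  if (length(m) == 2) m[2] else NA_character_
}

rt <- vapply(sample_tweets, extract_rt, character(1), USE.NAMES = FALSE)
# Label each tweet as an old-style retweet (RT) or an ordinary tweet (T)
rtt <- ifelse(is.na(rt), "T", "RT")
```

Only the first tweet starts with "RT @", so it is the only one labelled as a retweet; a reply that merely mentions someone is left alone.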

The returned data includes a created attribute (of the form “2012-02-17 11:40:25”) and a replyToSN attribute that includes the username of a user Martin was replying to via a particular tweet.

The simplest way I can think of displaying the data is to just display the screenName attribute of the sender (which in this case is always mhawksey) against time:

require(ggplot2)
ggplot(tw.df)+geom_point(aes(x=created,y=screenName))

As ever, things are never that simple… some tweets with old dates appear to have crept in somehow… A couple of things I tried relating to time based filtering caused R to have all sorts of malloc errors, so here’s a fudge I found to just display tweets that were created within the last 8,000 hours…

#Ask for the time difference in hours explicitly; otherwise the units are chosen automatically
tw.dfs=subset(tw.df,subset=(difftime(Sys.time(),created,units='hours')<8000))
ggplot(tw.dfs)+geom_point(aes(x=created,y=screenName))
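One thing to watch with this sort of filter: the difference between two times in R comes back in automatically chosen units (seconds, minutes, hours or days) unless you say otherwise. Here’s a minimal sketch of an hours-based filter, assuming some made-up timestamps and a fixed "now" so the result is repeatable:

```r
# Made-up timestamps and a fixed reference time, for illustration only
now <- as.POSIXct("2012-02-17 12:00:00", tz = "UTC")
toy <- data.frame(
  id = 1:3,
  created = as.POSIXct(c("2012-02-17 11:40:25",
                         "2011-12-01 09:00:00",
                         "2009-06-30 18:15:00"), tz = "UTC")
)

# State the units explicitly so the threshold really is measured in hours
age_hours <- as.numeric(difftime(now, toy$created, units = "hours"))

# Keep only the rows created within the last 8,000 hours
recent <- toy[age_hours < 8000, ]
```

The 2009 tweet is well over 8,000 hours old and drops out; the other two are kept.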

Okay, so not very interesting… It shows that Martin tweets…

Picking up on views of the style doodled in Visualising Activity Around a Twitter Hashtag or Search Term Using R, where we look at when new users appear in a hashtag stream, we can plot when Martin replies to another twitter user, arranging the user names in the order in which they were first publicly replied to:

require(plyr)
#Order the replyToSN factor levels in the order in which each user was first replied to
tw.dfx=ddply(tw.dfs, .var = "replyToSN", .fun = function(x) {return(subset(x, created %in% min(created),select=c(replyToSN,created)))})
tw.dfxa=arrange(tw.dfx,-desc(created))
tw.dfs$replyToSN=factor(tw.dfs$replyToSN, levels = tw.dfxa$replyToSN)

#and plot the result
ggplot(tw.dfs)+geom_point(aes(x=created,y=replyToSN))

The line at the top shows tweets where the replyToSN value was NA (not available).
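The "order the rows by first appearance" trick can also be sketched in base R, without plyr. This is just an illustration on made-up data (the toy data frame and its values are invented):

```r
# Made-up replies, for illustration only
toy <- data.frame(
  replyToSN = c("wilm", "psychemedia", "wilm", NA, "ambrouk", "psychemedia"),
  created = as.POSIXct("2012-01-01", tz = "UTC") + c(300, 100, 50, 400, 200, 600),
  stringsAsFactors = FALSE
)

# Earliest reply time per user; tapply drops rows where replyToSN is NA
first_seen <- tapply(as.numeric(toy$created), toy$replyToSN, min)

# Order the user names by the time they were first replied to
level_order <- names(first_seen)[order(first_seen)]

# Use that ordering for the factor levels, which ggplot uses to arrange the y axis
toy$replyToSN <- factor(toy$replyToSN, levels = level_order)
```

Rows with no replyToSN value stay as NA in the factor, which is why they end up on a line of their own in the chart.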

We can then go a little further and plot when folk are replied to or retweeted, as well as tweets that are neither a reply nor an old-style retweet:

ggplot() +
  geom_point(data=subset(tw.dfs,subset=(!is.na(replyToSN))),aes(x=created,y=replyToSN),col='red') +
  geom_point(data=subset(tw.dfs,subset=(!is.na(rt))),aes(x=created,y=rt),col='blue') +
  geom_point(data=subset(tw.dfs,subset=(is.na(replyToSN) & is.na(rt))),aes(x=created,y=screenName),col='green')

Here, the blue dots are old-style retweets, the red dots are replies, and the green dots are tweets that are neither replies nor old-style retweets. If a blue dot appears on a row before a red dot, it shows Martin RT’d them before ever replying to them. If blue dots are on a row that contains no red dot, then it shows Martin has RT’d but not replied to that person. A heavily populated row shows Martin has repeated interactions with that user.

We can generate an ordered bar chart showing who is most heavily replied to:

#First we need to count how many replies a user gets...
#http://stackoverflow.com/a/3255448/454773
r_table <- table(tw.dfs$replyToSN)
#..rank them...
r_levels <- names(r_table)[order(-r_table)]
#..and use this ordering to order the factor levels...
tw.dfs$replyToSN <- factor(tw.dfs$replyToSN, levels = r_levels) 

#Then we can plot the chart...
ggplot(subset(tw.dfs,subset=(!is.na(replyToSN))),aes(x=replyToSN)) + geom_bar() + theme(axis.text.x=element_text(angle=-90,size=6))

(Hmmm… how would I filter this to only show folk replied to more than 50 times, for example?)
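One possible answer to that question, sketched in base R on made-up data: count the replies per user, keep the names whose count clears a threshold, then filter on those names. The min_replies threshold is set to 2 here only so the toy data shows an effect; for the chart above it would be 50.

```r
# Made-up reply targets, for illustration only
replies <- c(rep("psychemedia", 5), rep("wilm", 3), "ambrouk", NA)

# Count replies per user (table ignores NA by default)
counts <- table(replies)

# Names of users replied to more than min_replies times
min_replies <- 2
keep <- names(counts)[counts > min_replies]

# Keep only those replies, with factor levels ordered from most to least replied to
filtered <- replies[replies %in% keep]
filtered <- factor(filtered, levels = keep[order(-counts[keep])])
```

With the real data, the same idea would presumably look something like subset(tw.dfs, replyToSN %in% keep) before handing the result to ggplot.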

Sometimes, a text view is easier…

head(table(tw.dfs$replyToSN))
#eg returns:
#psychemedia        wilm     ambrouk    sheilmcn  dajbelshaw  manmalik 
#        394          66          59          53          48        43
#Hmm..can we generalise this?
topTastic=function(dfc,num=5){
  r_table <- table(dfc)
  r_levels <- names(r_table)[order(-r_table)]
  head(table(factor(dfc, levels = r_levels)),num)
}
#so now, for example, I should be able to display the most old-style retweeted folk?
topTastic(tw.dfs$rt)
#or the 10 most replied to...
topTastic(tw.dfs$replyToSN,10)

Let’s try some time stuff now… From the R Cookbook, I find I can do this:

#label a tweet with the month number
tw.dfs$month=sapply(tw.dfs$created, function(x) {p=as.POSIXlt(x);p$mon})
#label a tweet with the hour
tw.dfs$hour=sapply(tw.dfs$created, function(x) {p=as.POSIXlt(x);p$hour})
#label a tweet with a number corresponding to the day of the week
tw.dfs$wday=sapply(tw.dfs$created, function(x) {p=as.POSIXlt(x);p$wday})
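The same labels can be pulled out in one go, because as.POSIXlt works on a whole vector of timestamps. A small sketch on two made-up timestamps, which also shows that the fields count from zero:

```r
# Made-up timestamps, for illustration only
created <- as.POSIXct(c("2012-02-17 11:40:25", "2012-01-01 23:05:00"), tz = "UTC")

# Convert the whole vector once, then read off the fields
lt <- as.POSIXlt(created)
parts <- data.frame(
  month = lt$mon,   # 0-11, so January is 0
  hour  = lt$hour,  # 0-23
  wday  = lt$wday   # 0-6, with Sunday as 0
)
```

So 17 February 2012 (a Friday) comes out as month 1, weekday 5, which is worth remembering when reading the axis labels on the charts.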

What this means is we can now chart a count of the number of tweets by month, day of the week, or hour… For example, here’s hour vs. day of the week:

ggplot(tw.dfs)+geom_jitter(aes(x=wday,y=hour))

Note that this jittered scattergraph, where each dot is a tweet, only approximates the time each tweet occurred – the jitter applied is a random quantity designed to separate out tweets posted within the same hour-and-day-of-the-week bin.

What about Martin’s tweeting behaviour over time?

#We can also generate barplots showing the distribution of tweet count over time:
ggplot(tw.dfs,aes(x=created))+geom_histogram()
#Hmm... I'm not sure how to manually set binwidth= sensibly, though?!

Here’s a plot of the number of counts per… I’m not sure: the bin width was calculated automatically…
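If the bins need to mean something definite, one option is to do the binning explicitly before plotting. Here’s a minimal sketch in base R, on made-up timestamps, that uses cut() to count tweets per calendar day:

```r
# Made-up timestamps, for illustration only
created <- as.POSIXct(c("2012-02-13 09:00:00", "2012-02-14 10:30:00",
                        "2012-02-14 17:45:00", "2012-02-17 11:40:25"), tz = "UTC")

# Bin the timestamps into calendar days; days with no tweets still get a (zero) count
per_day <- table(cut(created, breaks = "day"))
```

cut() also accepts "week" and "month" as breaks. As far as I know, ggplot2 measures binwidth in seconds when the x axis is a POSIXct date-time, so something like binwidth=24*60*60 should give daily bins in a histogram, though I have not checked that against this data.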

How about using the number of tweets in a particular day or hour bin to see what times of day or days of week Martin is tweeting?

#We can also plot the number of tweets within particular hour or time bins...
ggplot(tw.dfs,aes(x=wday))+geom_histogram(binwidth=1)
ggplot(tw.dfs,aes(x=hour))+geom_histogram(binwidth=1)

This chart shows activity (in terms of count…) per hour of day.

As well as doing the count of tweets per hour, for example, via a ggplot statistical graphical function, we can also get day, week, month, quarter and year counts from a set of functions associated with a particular sort of timeseries object…

Each element in a time series typically has two parts – a timestamp and a numeric value. We can generate a time series of a sort around a Twitter user timeline by creating a dummy quantity – such as the unit value, 1 – and associating it with each timestamp:

require(xts)
#The xts function creates a time series from a vector of values and a vector of timestamps.
#If we know how many tweets we have, we can just create a simple list or vector containing that number of 1s
ts=xts(rep(1,times=nrow(tw.dfs)),tw.dfs$created)

#We can now do some handy number crunching on the timeseries, such as applying a formula to values contained within day, week, month, quarter or year time bins.
#So for example, if we sum the unit values in each daily bin, we can get a count of the number of tweets per day
ts.sum=apply.daily(ts,sum) 
#also apply.weekly, apply.monthly, apply.quarterly, apply.yearly

#If for any reason we need to turn the timeseries into a dataframe, we can:
#http://stackoverflow.com/a/3387259/454773
ts.sum.df=data.frame(date=index(ts.sum), coredata(ts.sum))

colnames(ts.sum.df)=c('date','sum')

#We can then use ggplot to plot the timeseries...
ggplot(ts.sum.df)+geom_line(aes(x=date,y=sum))

#Having got the data in a timeseries form, we can do timeseries based things to it... such as checking the autocorrelation:
acf(ts.sum)

Hmmm.. so, one day is much the same as another, but there also appears to be a weekly (7 day periodicity) pattern…
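One caveat on reading lags as days: as far as I can tell, apply.daily only returns rows for days that actually contain tweets, so a day with no tweets is simply missing rather than counted as zero. Here’s a sketch in base R, on made-up timestamps, of building a daily count series with the empty days filled in:

```r
# Made-up timestamps, for illustration only
created <- as.POSIXct(c("2012-02-13 09:00:00", "2012-02-13 10:30:00",
                        "2012-02-16 17:45:00"), tz = "UTC")

# Reduce each timestamp to its calendar date
days <- as.Date(created, tz = "UTC")

# Every calendar day from the first tweet to the last, including empty ones
all_days <- seq(min(days), max(days), by = "day")

# Count tweets per day, using the full set of days as factor levels so gaps become zeros
daily <- as.vector(table(factor(format(days), levels = format(all_days))))
daily.df <- data.frame(date = all_days, sum = daily)
```

A series like daily.df$sum has one value per calendar day, so a lag of 7 in acf() would then correspond to exactly one week.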

Finally, here’s a handy script I found on the Revolution Analytics site for Charting time series as calendar heat maps in R:

##############################################################################
 #                        Calendar Heatmap                                    #
 #                                by                                          #
 #                         Paul Bleicher                                      #
 # an R version of a graphic from:                                            #
 # http://stat-computing.org/dataexpo/2009/posters/wicklin-allison.pdf        #
 #  requires lattice, chron, grid packages                                    #
 ############################################################################## 

## calendarHeat: An R function to display time-series data as a calendar heatmap 
## Copyright 2009 Humedica. All rights reserved.

## This program is free software; you can redistribute it and/or modify
## it under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 2 of the License, or
## (at your option) any later version.

## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
## GNU General Public License for more details.

## You can find a copy of the GNU General Public License, Version 2 at:
## http://www.gnu.org/licenses/gpl-2.0.html

calendarHeat <- function(dates, 
                         values, 
                         ncolors=99, 
                         color="r2g", 
                         varname="Values",
                         date.form = "%Y-%m-%d", ...) {
require(lattice)
require(grid)
require(chron)
if (is.character(dates) || is.factor(dates)) {
  dates <- strptime(as.character(dates), date.form)
        }
caldat <- data.frame(value = values, dates = dates)
min.date <- as.Date(paste(format(min(dates), "%Y"),
                    "-1-1",sep = ""))
max.date <- as.Date(paste(format(max(dates), "%Y"),
                     "-12-31", sep = ""))
dates.f <- data.frame(date.seq = seq(min.date, max.date, by="days"))

# Merge moves data by one day, avoid
caldat <- data.frame(date.seq = seq(min.date, max.date, by="days"), value = NA)
dates <- as.Date(dates) 
caldat$value[match(dates, caldat$date.seq)] <- values

caldat$dotw <- as.numeric(format(caldat$date.seq, "%w"))
caldat$woty <- as.numeric(format(caldat$date.seq, "%U")) + 1
caldat$yr <- as.factor(format(caldat$date.seq, "%Y"))
caldat$month <- as.numeric(format(caldat$date.seq, "%m"))
yrs <- as.character(unique(caldat$yr))
d.loc <- as.numeric()                        
for (m in min(yrs):max(yrs)) {
  d.subset <- which(caldat$yr == m)  
  sub.seq <- seq(1,length(d.subset))
  d.loc <- c(d.loc, sub.seq)
  }  
caldat <- cbind(caldat, seq=d.loc)

#color styles
r2b <- c("#0571B0", "#92C5DE", "#F7F7F7", "#F4A582", "#CA0020") #red to blue                                                                               
r2g <- c("#D61818", "#FFAE63", "#FFFFBD", "#B5E384")   #red to green
w2b <- c("#045A8D", "#2B8CBE", "#74A9CF", "#BDC9E1", "#F1EEF6")   #white to blue
            
assign("col.sty", get(color))
calendar.pal <- colorRampPalette((col.sty), space = "Lab")
def.theme <- lattice.getOption("default.theme")
cal.theme <-
   function() {  
  theme <-
  list(
    strip.background = list(col = "transparent"),
    strip.border = list(col = "transparent"),
    axis.line = list(col="transparent"),
    par.strip.text=list(cex=0.8))
    }
lattice.options(default.theme = cal.theme)
yrs <- (unique(caldat$yr))
nyr <- length(yrs)
print(cal.plot <- levelplot(value~woty*dotw | yr, data=caldat,
   as.table=TRUE,
   aspect=.12,
 layout = c(1, nyr%%7),
   between = list(x=0, y=c(1,1)),
   strip=TRUE,
   main = paste("Calendar Heat Map of ", varname, sep = ""),
   scales = list(
     x = list(
               at= c(seq(2.9, 52, by=4.42)),
               labels = month.abb,
               alternating = c(1, rep(0, (nyr-1))),
               tck=0,
               cex = 0.7),
     y=list(
          at = c(0, 1, 2, 3, 4, 5, 6),
          labels = c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday",
                      "Friday", "Saturday"),
          alternating = 1,
          cex = 0.6,
          tck=0)),
   xlim =c(0.4, 54.6),
   ylim=c(6.6,-0.6),
   cuts= ncolors - 1,
   col.regions = (calendar.pal(ncolors)),
   xlab="" ,
   ylab="",
   colorkey= list(col = calendar.pal(ncolors), width = 0.6, height = 0.5),
   subscripts=TRUE
    ) )
panel.locs <- trellis.currentLayout()
for (row in 1:nrow(panel.locs)) {
    for (column in 1:ncol(panel.locs))  {
    if (panel.locs[row, column] > 0)
{
    trellis.focus("panel", row = row, column = column,
                  highlight = FALSE)
xyetc <- trellis.panelArgs()
subs <- caldat[xyetc$subscripts,]
dates.fsubs <- caldat[caldat$yr == unique(subs$yr),]
y.start <- dates.fsubs$dotw[1]
y.end   <- dates.fsubs$dotw[nrow(dates.fsubs)]
dates.len <- nrow(dates.fsubs)
adj.start <- dates.fsubs$woty[1]

for (k in 0:6) {
 if (k < y.start) {
    x.start <- adj.start + 0.5
    } else {
    x.start <- adj.start - 0.5
      }
  if (k > y.end) {
     x.finis <- dates.fsubs$woty[nrow(dates.fsubs)] - 0.5
    } else {
     x.finis <- dates.fsubs$woty[nrow(dates.fsubs)] + 0.5
      }
    grid.lines(x = c(x.start, x.finis), y = c(k -0.5, k - 0.5), 
     default.units = "native", gp=gpar(col = "grey", lwd = 1))
     }
if (adj.start <  2) {
 grid.lines(x = c( 0.5,  0.5), y = c(6.5, y.start-0.5), 
      default.units = "native", gp=gpar(col = "grey", lwd = 1))
 grid.lines(x = c(1.5, 1.5), y = c(6.5, -0.5), default.units = "native",
      gp=gpar(col = "grey", lwd = 1))
 grid.lines(x = c(x.finis, x.finis), 
      y = c(dates.fsubs$dotw[dates.len] -0.5, -0.5), default.units = "native",
      gp=gpar(col = "grey", lwd = 1))
 if (dates.fsubs$dotw[dates.len] != 6) {
 grid.lines(x = c(x.finis + 1, x.finis + 1), 
      y = c(dates.fsubs$dotw[dates.len] -0.5, -0.5), default.units = "native",
      gp=gpar(col = "grey", lwd = 1))
      }
 grid.lines(x = c(x.finis, x.finis), 
      y = c(dates.fsubs$dotw[dates.len] -0.5, -0.5), default.units = "native",
      gp=gpar(col = "grey", lwd = 1))
      }
for (n in 1:51) {
  grid.lines(x = c(n + 1.5, n + 1.5), 
    y = c(-0.5, 6.5), default.units = "native", gp=gpar(col = "grey", lwd = 1))
        }
x.start <- adj.start - 0.5

if (y.start > 0) {
  grid.lines(x = c(x.start, x.start + 1),
    y = c(y.start - 0.5, y.start -  0.5), default.units = "native",
    gp=gpar(col = "black", lwd = 1.75))
  grid.lines(x = c(x.start + 1, x.start + 1),
    y = c(y.start - 0.5 , -0.5), default.units = "native",
    gp=gpar(col = "black", lwd = 1.75))
  grid.lines(x = c(x.start, x.start),
    y = c(y.start - 0.5, 6.5), default.units = "native",
    gp=gpar(col = "black", lwd = 1.75))
 if (y.end < 6  ) {
  grid.lines(x = c(x.start + 1, x.finis + 1),
   y = c(-0.5, -0.5), default.units = "native",
   gp=gpar(col = "black", lwd = 1.75))
  grid.lines(x = c(x.start, x.finis),
   y = c(6.5, 6.5), default.units = "native",
   gp=gpar(col = "black", lwd = 1.75))
   } else {
      grid.lines(x = c(x.start + 1, x.finis),
       y = c(-0.5, -0.5), default.units = "native",
       gp=gpar(col = "black", lwd = 1.75))
      grid.lines(x = c(x.start, x.finis),
       y = c(6.5, 6.5), default.units = "native",
       gp=gpar(col = "black", lwd = 1.75))
       }
       } else {
           grid.lines(x = c(x.start, x.start),
            y = c( - 0.5, 6.5), default.units = "native",
            gp=gpar(col = "black", lwd = 1.75))
           }

 if (y.start == 0 ) {
  if (y.end < 6  ) {
  grid.lines(x = c(x.start, x.finis + 1),
   y = c(-0.5, -0.5), default.units = "native",
   gp=gpar(col = "black", lwd = 1.75))
  grid.lines(x = c(x.start, x.finis),
   y = c(6.5, 6.5), default.units = "native",
   gp=gpar(col = "black", lwd = 1.75))
   } else {
      grid.lines(x = c(x.start + 1, x.finis),
       y = c(-0.5, -0.5), default.units = "native",
       gp=gpar(col = "black", lwd = 1.75))
      grid.lines(x = c(x.start, x.finis),
       y = c(6.5, 6.5), default.units = "native",
       gp=gpar(col = "black", lwd = 1.75))
       }
       }
for (j in 1:12)  {
   last.month <- max(dates.fsubs$seq[dates.fsubs$month == j])
   x.last.m <- dates.fsubs$woty[last.month] + 0.5
   y.last.m <- dates.fsubs$dotw[last.month] + 0.5
   grid.lines(x = c(x.last.m, x.last.m), y = c(-0.5, y.last.m),
     default.units = "native", gp=gpar(col = "black", lwd = 1.75))
   if ((y.last.m) < 6) {
      grid.lines(x = c(x.last.m, x.last.m - 1), y = c(y.last.m, y.last.m),
       default.units = "native", gp=gpar(col = "black", lwd = 1.75))
     grid.lines(x = c(x.last.m - 1, x.last.m - 1), y = c(y.last.m, 6.5),
       default.units = "native", gp=gpar(col = "black", lwd = 1.75))
   } else {
      grid.lines(x = c(x.last.m, x.last.m), y = c(- 0.5, 6.5),
       default.units = "native", gp=gpar(col = "black", lwd = 1.75))
    }
 }
 }
 }
trellis.unfocus()
} 
lattice.options(default.theme = def.theme)
}

If we pass the dataframed time series data counting the sum (count) of tweets per day, we can get a calendar heatmap view of Martin’s twitter activity:

calendarHeat(ts.sum.df$date, ts.sum.df$sum, varname="@mhawksey Twitter activity")

I’m not sure if this is even interesting, let alone useful, but I do think now I’ve found out a little bit about working with time in R, that could be handy…

Still to do: extract hashtags and visualise them; extend the twitteR library so it exposes things like retweet counts. But that’s for another day…

Written by Tony Hirst

February 17, 2012 at 3:26 pm

Posted in Rstats

Generating Twitter Wordclouds in R (Prompted by an Open Learning Blogpost)


A couple of weeks ago I saw a great example of an open learning blogpost from @katy_bird: Generating a word cloud (or not) from a Twitter hashtag. It described the trials and tribulations associated with trying to satisfy a request for the generation of a wordcloud based on tweets associated with a specific Twitter hashtag. A seemingly simple task, you might think, but things are never that easy… If you read the post, you’ll see Katy identified several problems, or stumbling blocks, along the way, as well as how she addressed them. There’s also a bit of reflection on the process as a whole.

Reading the post the first time (and again, just now) completely set me up for the day. It had a little bit of everything: a goal statement, the identification of a set of problems associated with trying to complete the task, some commentary on how the problems were tackled, and some reflection on the process as a whole. The post thus serves the purpose of capturing a problem discovery process, as well as the steps taken to try and solve each problem (although full documentation is lacking… This is something I have learned over the years: to use something like a gist on github to keep a copy of any code I generated to solve the problem, and to link to it from the associated blog post so that I and others can reuse it). The post captures a glimpse back at a moment in time – when Katy didn’t know how to generate a wordcloud – from the joyful moment at which she has just learned how to generate said wordcloud. More importantly, the post describes the learning problems that became evident whilst trying to achieve the goal in such a way that they can act as hooks on which others can hang alternative or additional ways of solving the problem, or act as mentor.

By identifying the learning journey and problems discovered along the way, Katy’s record of her learning strategy also provides an authentic, learner centric perspective on what’s involved in trying to create a wordcloud around a twitter hashtag.

Reading the post again has also prompted me to blog this recipe, largely copied from the RDataMining post Using Text Mining to Find Out What @RDataMining Tweets are About, for generating a word cloud around a twitter hashtag using R (I use RStudio; the recipe requires at least the twitteR, tm and wordcloud libraries):

require(twitteR)
searchTerm='#dev8d'
#Grab the tweets
rdmTweets <- searchTwitter(searchTerm, n=500)
#Use a handy helper function to put the tweets into a dataframe
tw.df=twListToDF(rdmTweets)

##Note: there are some handy, basic Twitter related functions here:
##https://github.com/matteoredaelli/twitter-r-utils
#For example:
RemoveAtPeople <- function(tweet) {
  gsub("@\\w+", "", tweet)
}
#Then for example, remove @'d names
tweets <- as.vector(sapply(tw.df$text, RemoveAtPeople))

##Wordcloud - scripts available from various sources; I used:
#http://rdatamining.wordpress.com/2011/11/09/using-text-mining-to-find-out-what-rdatamining-tweets-are-about/
#Call with eg: tw.c=generateCorpus(tw.df$text)
generateCorpus= function(df,my.stopwords=c()){
  #Load the text mining library
  require(tm)
  #The following is cribbed and seems to do what it says on the can
  tw.corpus= Corpus(VectorSource(df))
  # remove punctuation
  tw.corpus = tm_map(tw.corpus, removePunctuation)
  #normalise case (recent versions of tm need base functions wrapped in content_transformer)
  tw.corpus = tm_map(tw.corpus, content_transformer(tolower))
  # remove stopwords
  tw.corpus = tm_map(tw.corpus, removeWords, stopwords('english'))
  tw.corpus = tm_map(tw.corpus, removeWords, my.stopwords)

  tw.corpus
}

wordcloud.generate=function(corpus,min.freq=3){
  require(wordcloud)
  doc.m = TermDocumentMatrix(corpus, control = list(wordLengths = c(1, Inf)))
  dm = as.matrix(doc.m)
  # calculate the frequency of words
  v = sort(rowSums(dm), decreasing=TRUE)
  d = data.frame(word=names(v), freq=v)
  #Generate the wordcloud
  wc=wordcloud(d$word, d$freq, min.freq=min.freq)
  wc
}

print(wordcloud.generate(generateCorpus(tweets,'dev8d'),7))

##Generate an image file of the wordcloud
png('test.png', width=600,height=600)
wordcloud.generate(generateCorpus(tweets,'dev8d'),7)
dev.off()

#We could make it even easier if we hide away the tweet grabbing code. eg:
tweets.grabber=function(searchTerm,num=500){
  require(twitteR)
  rdmTweets = searchTwitter(searchTerm, n=num)
  tw.df=twListToDF(rdmTweets)
  as.vector(sapply(tw.df$text, RemoveAtPeople))
}
#Then we could do something like:
tweets=tweets.grabber('ukgc12')
wordcloud.generate(generateCorpus(tweets),3)

Here’s the result:

PS for an earlier route (once broken, now patched) to sketching a wordcloud from a Twitter search using Wordle, see How To Create Wordcloud from a Twitter Hashtag Search Feed in a Few Easy Steps.
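For anyone wondering what the corpus and term-document matrix steps in the recipe boil down to, here’s a rough base R sketch of the same idea: strip punctuation, lower the case, drop stopwords, count what is left. The two tweets and the tiny stopword list are made up, and this is a simplification of what tm does, not a replacement for it:

```r
# Made-up tweets, for illustration only
tweets <- c("Hacking at #dev8d today", "dev8d hacking again, hacking more")

# Strip punctuation, normalise case, and split on whitespace
words <- unlist(strsplit(tolower(gsub("[[:punct:]]", "", tweets)), "[[:space:]]+"))

# Drop empty strings and a tiny, hypothetical stopword list
my_stopwords <- c("at", "dev8d")
words <- words[nchar(words) > 0 & !(words %in% my_stopwords)]

# Word frequencies, most common first: this is what sizes the words in the cloud
freq <- sort(table(words), decreasing = TRUE)
```

The names and counts in freq play the same role as d$word and d$freq in the wordcloud.generate function.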

Written by Tony Hirst

February 15, 2012 at 9:40 pm

Posted in Rstats

Tagged with Twitter, wordcloud

More Thoughts on Potential Audience Metrics for Hashtag Communities


Following on from the sketched ideas relating to estimating the Potential Audience Size for a Hashtag Community?, here are a few quick doodles around the graph representation of the tag users – followers graph. They explore the extent to which we can use quite simple counts and analyses to get a feel for how the followers of a set of hashtag users are distributed, and the number of times they are likely to see a hashtagged tweet (I’m mulling over calling this potential view count “receipts”…)

require(igraph)

#Read in the graph: the graphs contain nodes representing Twitter users connected by directed weighted edges that represent 'is followed by' relations. The weights correspond to the number of hashtagged messages published by the from-node over the sample period 
g2=read.graph('/Users/ajh59/code/twapps/newt/reports/tmp/ddj_ncount.graphml',format='graphml')

summary(g2)
#The summary provides an overview of the graph. The number of nodes corresponds to the number of folk in the union of the set of hashtaggers and their followers, for example.

#We can count how many nodes have a particular in-degree count (where in-degree represents the number of hashtaggers the node follows)
g.nodes=as.data.frame(table(degree(g2,mode='in')))
g.nodes$Var1=as.numeric(levels(g.nodes$Var1)[as.integer(g.nodes$Var1)])

#Check: if we sum the node occurrence frequencies, we should get the total number of nodes as a result
sum(g.nodes$Freq)

#We can then chart the result to look at the distribution of how many hashtaggers are followed by how many people
require(ggplot2)
ggplot(g.nodes)+geom_linerange(aes(x=Var1,ymin=0,ymax=Freq)) + scale_y_log10() + xlab('In-degree of followers')

To start with, we can get a view of how the in-degree values of the follower nodes are distributed – this gives us an idea of how many of the hashtag users each member of the follower set actually follows.

For a tight knit, coherent community, where tag users know each other, we might expect that folk who are likely to be interested in the tag are following several of the tag users.

Note the use of a log10 scale for the count… Most followers are following one tag user (most likely a single user of the tag with a large follower count). Folk following none of the tag users are likely to be tag users who don’t follow any of the other tag users captured during the sample period (erm, maybe? They could also be tag users with private settings, so their friend/follower lists aren’t public…)
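The in-degree count can be illustrated without igraph by working from a plain edge list. In this sketch the toy edges (three hashtaggers a, b, c and three followers x, y, z) are made up; each row reads "from is followed by to":

```r
# Made-up 'is followed by' edges: from = hashtagger, to = follower,
# weight = number of tagged tweets the hashtagger published
edges <- data.frame(from   = c("a", "a", "a", "b", "b", "c"),
                    to     = c("x", "y", "z", "x", "y", "x"),
                    weight = c(2, 2, 2, 1, 1, 5))

# In-degree of each follower: how many hashtaggers they follow
in_degree <- table(edges$to)

# Distribution: how many followers have each in-degree value
degree_dist <- as.data.frame(table(as.vector(in_degree)), stringsAsFactors = FALSE)
degree_dist$Var1 <- as.numeric(degree_dist$Var1)
```

Here x follows all three hashtaggers, y follows two and z follows one. Unlike the igraph version, an edge list like this never shows nodes with an in-degree of zero, because they do not appear in the to column at all.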

Here’s the code for a second sketch…

#The incoming edges to follower nodes are weighted according to the number of tagged tweets the corresponding hashtagger published in the sample period.
#What this means is that we can count the total number of tagged tweets seen by each follower by summing the weights of edges incident on each node
g.weights=as.data.frame(table(graph.strength(g2,mode='in')))
g.weights$Var1=as.numeric(levels(g.weights$Var1)[as.integer(g.weights$Var1)])
#If we sum the product of message counts and frequencies, we see how many potential "receipts" of a tagged tweet there were.
sum(g.weights$Var1*g.weights$Freq)

#We can also plot the distribution of the number of tagged tweets potentially received by each follower
ggplot(g.weights)+geom_linerange(aes(x=Var1,ymin=0,ymax=Freq)) + scale_y_log10() + xlab('Incoming tagged message count')

This time, we get to see the distribution of the number of receipts of a tagged message across the follower set, where a receipt represents a publication of a tagged tweet in the sample period from any one of the tag users followed by an individual. Because the graph uses edges weighted according to the number of tagged tweets published by a user, we can easily calculate the number of tagged tweets potentially seen by a user by summing the weights of their incoming edges from tag users.

This chart makes it clear that most folk in the potential hashtag audience only had one potential receipt of a tagged tweet… Which makes me start thinking about ways of considering “conversion” rates based in part on the likelihood of a follower joining in a hashtag community given the number of tag users to date they follow and the number of followers each of those tag users has…

Note that the range of the incoming message count is greater than the range of the number of tag users followed because some tag users tweet using the tag more than once during the sample period.
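The "receipts" calculation is just a weighted in-degree: sum the weights of the edges arriving at each follower. A minimal sketch in base R on a made-up edge list (the edges and weights are invented for illustration):

```r
# Made-up 'is followed by' edges: from = hashtagger, to = follower,
# weight = number of tagged tweets the hashtagger published
edges <- data.frame(from   = c("a", "a", "a", "b", "b", "c"),
                    to     = c("x", "y", "z", "x", "y", "x"),
                    weight = c(2, 2, 2, 1, 1, 5))

# Potential receipts per follower: sum of the weights on their incoming edges
receipts <- tapply(edges$weight, edges$to, sum)

# Total potential receipts across the whole follower set
total_receipts <- sum(receipts)
```

Follower x follows three hashtaggers but has eight potential receipts, because hashtagger c tweeted with the tag five times. That is the effect described in the note about the two ranges.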

Finally, we chart as a histogram the distribution of the number of followers of each tag user, simply because we can easily do so…

#It's also easy enough to chart the distribution of the follower counts for each hashtagger:
tagger.nodes=subset(as.data.frame(table(degree(g2,mode='out'))),subset=(Var1!='0'))
tagger.nodes$Var1=as.numeric(levels(tagger.nodes$Var1)[as.integer(tagger.nodes$Var1)])
#Quick check on the number of taggers
sum(tagger.nodes$Freq)

#And the distribution of how many followers they have
ggplot(tagger.nodes)+geom_histogram(aes(x=Var1,weight=Freq),binwidth=250)  + xlab('Follower count')

Note the outliers…

For additional charts that can be generated from the graph representation, see: Experimenting With iGraph – and a Hint Towards Ways of Measuring Engagement?

PS Hmmm…pondering this.. focus of a tag user is the number of their followers who originate a tagged tweet in the sample period (RTs don’t count, and maybe neither do replies…) divided by the total number of their followers…? And maybe salience as the number of tagged tweets published by an individual during the sample period divided by the total number of tweets they published over the same period…?
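As a back-of-the-envelope sketch, those two ratios could be written as a pair of tiny R functions. The function names, argument names and numbers are all hypothetical, and the definitions are only the tentative ones floated in the PS:

```r
# Focus: share of a tag user's followers who originated a tagged tweet in the sample period
focus <- function(followers_who_tagged, followers_total) {
  followers_who_tagged / followers_total
}

# Salience: share of a user's tweets in the sample period that carried the tag
salience <- function(tagged_tweets, total_tweets) {
  tagged_tweets / total_tweets
}

# Made-up numbers: 25 of 500 followers tagged; 12 of 60 tweets carried the tag
example_focus <- focus(25, 500)
example_salience <- salience(12, 60)
```

With these made-up numbers the focus is 0.05 and the salience is 0.2.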

Written by Tony Hirst

February 11, 2012 at 12:58 am

Posted in Rstats

Tagged with twitter community

When A Comment Spammer’s Script Goes Wrong 3


Hmmmm… getting another of these so soon after posting When A Comment Spammer’s Script Goes Wrong 2 makes me think maybe I’d better stop this series of posts!

What’s up, just wanted to say, I loved this post. It was practical. Keep on posting!

Hello, I just wanted to mention, you’re wrong. Your point doesn’t make any sense.

Hello, how’s it going? Just shared this post with a colleague, we had a good laugh.

Incredible points. Sound arguments. Keep up the amazing work.

This information is worth everyone’s attention. Where can I find out more?

Highly descriptive post, I enjoyed that bit. Will there be a part 2?

Incredible quest there. What happened after? Take care!

Can you tell us more about this? I’d care to find out more details.

Great article, totally what I was looking for.

Hi there, I read your blogs regularly. Your humoristic style is witty, keep up the good work!

Hi there to every one, it’s truly a good for me to go to see this web page, it contains important Information.

I am actually delighted to glance at this website posts which contains tons of helpful information, thanks for providing these kinds of data.

This video post is really fantastic, the sound quality and the picture feature of this tape post is really awesome.

Hi there to every body, it’s my first pay a quick visit of this blog; this website carries remarkable and really good material in support of readers.

Wow, that’s what I was searching for, what a data! existing here at this blog, thanks admin of this web site.

What’s up, every time i used to check webpage posts here early in the dawn, because i enjoy to learn more and more.

Hi there to every one, for the reason that I am genuinely eager of reading this webpage’s post to be updated on a regular basis. It consists of fastidious stuff.

I always emailed this weblog post page to all my friends, for the reason that if like to read it after that my contacts will too.

Can you please forward me the code for this script or please enlighten me in detail about this script?

Your way of describing everything in this piece of writing is really pleasant, all be capable of effortlessly understand it, Thanks a lot.

Excellent way of describing, and pleasant post to obtain facts regarding my presentation subject matter, which i am going to deliver in institution of higher education.

Hello, I would like to subscribe for this weblog to get most recent updates, so where can i do it please assist.

Hi there, after reading this remarkable paragraph i am as well glad to share my experience here with friends.

Superb, what a weblog it is! This weblog provides valuable information to us, keep it up.

Hello every one, here every person is sharing such know-how, so it’s fastidious to read this website, and I used to visit this weblog everyday.

What a funny blog! I actually enjoyed watching this comic video with my family as well as including my colleagues.

Sketches are truly nice source of education instead of wording, its my knowledge, what would you say?

What’s up, its pleasant article on the topic of media print, we all be aware of media is a fantastic source of information.

What’s up, this weekend is pleasant in support of me, since this occasion i am reading this impressive educational article here at my residence.

This is my first time visit at here and i am genuinely impressed to read everthing at one place.

Nice respond in return of this difficulty with solid arguments and describing the whole thing concerning that.

I go to see everyday a few sites and information sites to read articles or reviews, but this website presents quality based articles.

Of course high resolution film quality contains much memory, that’s why it presents superior feature.

If you are going away to watch funny videos online then I suggest you to pay a quick visit this web page, it contains genuinely therefore comical not only videos but also other material.

I know this site gives quality based content and extra stuff, is there any other site which presents such things in quality?

Hello mates, its fantastic paragraph about teachingand fully explained, keep it up all the time.

Downloading material from this web site is as trouble-free as clicking the mouse rather than other web sites which transfer me here and there on the web pages.

What’s up everybody, I am sure you will be enjoying here by watching these funny video clips.

I constantly spent my half an hour to read this web site’s articles every day along with a mug of coffee.

Wow! At last I got a weblog from where I be capable of really take valuable facts concerning my study and knowledge.

–

Hi there, all is going sound here and ofcourse every one is sharing facts, that’s genuinely fine, keep up writing.

If you wish for to increase your know-how just keep visiting this website and be updated with the latest news posted here.

Hi there to all, the contents present at this website are really remarkable for people knowledge, well, keep up the good work fellows.

No one can refuse from the quality of this video posted at this web site, good job, keep it all the time.

No matter if some one searches for his vital thing, therefore he/she needs to be available that in detail, thus that thing is maintained over here.

Since the admin of this site is working, no question very quickly it will be well-known, due to its quality contents.

Quality articles is the important to attract the people to pay a visit the website, that’s what this web site is providing.

Hello everyone, it’s my first go to see at this web site, and post is actually fruitful designed for me, keep up posting these content.

Wow, what a video it is! Really nice feature video, the lesson given in this video is in fact informative.

If you are going for most excellent contents like myself, just pay a quick visit this website daily as it presents feature contents, thanks

It’s remarkable in favor of me to have a web page, which is beneficial in favor of my know-how. thanks admin

I visited multiple sites but the audio quality for audio songs present at this web page is in fact fabulous.

I’m gone to tell my little brother, that he should also go to see this weblog on regular basis to get updated from most up-to-date news.

Thankfulness to my father who informed me regarding this web site, this blog is really remarkable.

Hello Dear, are you really visiting this site daily, if so after that you will without doubt obtain pleasant know-how.

I am John, how are you everybody? This piece of writing posted at this site is actually nice.

Hello it’s me Fiona, I am also visiting this web site regularly, this web site is genuinely fastidious and the viewers are truly sharing pleasant thoughts.

Actually no matter if someone doesn’t understand after that its up to other people that they will help, so here it occurs.

Now I am going to do my breakfast, after having my breakfast coming again to read additional news.

Wow, this paragraph is good, my sister is analyzing these things, so I am going to convey her.

These are in fact impressive ideas in regarding blogging. You have touched some good factors here. Any way keep up wrinting.

Asking questions are actually nice thing if you are not understanding anything completely, except this article gives nice understanding even.

That’s really a pleasant video described in this article about how to write a paragraph, thus i got clear idea from here.

This post presents clear idea for the new users of blogging, that really how to do running a blog.

This article about SEO provides clear thought in favor of new SEO viewers that how to do Search engine optimization, so keep it up. Fastidious job

Thanks in favor of sharing such a nice thinking, article is fastidious, thats why i have read it entirely

If some one wants expert view on the topic of blogging and site-building afterward i advise him/her to go to see this website, Keep up the pleasant work.

Very quickly this web page will be famous among all blogging and site-building viewers, due to it’s nice articles

Every weekend i used to go to see this web site, because i wish for enjoyment, since this this website conations genuinely good funny material too.

It’s not my first time to pay a quick visit this web page, i am visiting this site dailly and get good facts from here every day.

–

Wow, what a quality it is! Because mostly YouTube movies have no fastidious quality, however this is genuinely a fastidious quality video.

Hi there to all, the YouTube film that is posted at here has truly good quality along with good audio quality

Remarkable video, in fact a good quality, this YouTube video touched me a lot in terms of quality.

Its fastidious funny YouTube video, I all the time go to pay a quick visit YouTube site designed for funny videos, since there is much more stuff available.

In support of my learn reasons, I at all times used to download the video lectures from YouTube, as it is easy to fan-out from there.

Hahahaha, what a humorous this YouTube record is! I am still laughing, thanks to admin of this site who had posted at this web site.

These are actually cool YouTube videos, its my good luck to pay a quick visit this web page and finding these cool YouTube video tutorials.

Remarkable YouTube videos posted at this web page, I am going to subscribe for daily updates, as I don’t desire to miss this series.

It’s my first go to see to this web page, and I am really surprised to see such a pleasant quality YouTube video posted here.

When I saw this website having awesome featured YouTube video clips, I decided to watch out these all movies.

Hahahahahahaha, this politics related YouTube video is really so humorous, I loved it. Thanks in support of sharing this.

Hello everyone, I be familiar with YouTube video consists of less bytes of memory due to that its quality is awful, but this YouTube video has wonderful picture quality.

My grand father all the time used to watch YouTube funny video clips, hehehehehe, as he wants to be delighted forever.

YouTube is world’s largest video sharing web site, no one can defeat it. Every one add video lessons at YouTube afterward obtain embed code and post anyplace.

Its extremely good YouTube video in terms of features, in fact good, its quality is in fact appreciable.

This website provides pleasant quality YouTube videos; I always down load the dance contest show movies from this web site.

What a pleasant YouTube video it is! Amazing, I liked it, and I am sharing this YouTube film with all my colleagues.

Sharing some thing is superior than keeping up-to our self, thus the YouTube video that is posted at this place I am going to share by means of my relatives and colleagues.

As the YouTube video lessons are posted here same like I also embed YouTube video code at my own site, because it is easy to get embedded code.

Today YouTube movies quality is more improved and improved, thus that’s the cause that I am watching this video at at this time.

These all YouTube gaming videos are actually in pleasant quality, I watched out all these along by means of my colleagues.

I and my colleagues watch the football game clips at YouTube forever, because they have in good quality.

In my residence when I get bored, after that I simply ON my laptop and open YouTube web page to watch the YouTube video lessons.

What’s up, it is understandable article along with this YouTube video; I can’t imagine that one can not understand this simple post having with video demonstration.

This piece of writing on the topic of how to embed a YouTube video code is really useful in support of new internet access visitors. Nice job, keep it up.

Within YouTube video embed script you can also stipulate parameters according to your hope like width, height or even border colors.

YouTube videos are well-known in entire world, since it is the largest video sharing site, and I turn out to be too happy by watching YouTube movies.

YouTube consists of not just funny and humorous movies but also it carries learning related video clips.

Hi there dear, are you enjoying with this comic YouTube video? Hmmm, that’s fastidious, I am as well watching this YouTube joke video at the moment.

I am happy to see this you tube video at this website, therefore now I am also going to upload all my movies at YouTube website.

Its my good luck to pay a quick visit at this website and find out my required piece of writing along with video presentation, that’s YouTube video and its also in quality.

Oh! Wow its actually a funny and jockey YouTube video posted at this place. thanks for sharing it.

I got so bored today afternoon, however as soon as I watched this YouTube comic clip at this blog I turn out to be fresh and delighted as well.

Hi there my friends, how is everything? Here it is in fact fastidious YouTube video lessons collection. i enjoyed a lot.

My boss is as well eager of YouTube comical videos, he also watch these even in workplace hehehe..

–

Hello, can any body assist me how to down load this video tutorial from this web page, I have watched and listen it at this place but would like to get it.

This post is in fact a fastidious one it assists new internet users, who are wishing for blogging.

When someone writes an post he/she keeps the image of a user in his/her brain that how a user can understand it. So that’s why this post is great. Thanks!

continuously i used to read smaller articles or reviews which also clear their motive, and that is also happening with this paragraph which I am reading here.

Paragraph writing is also a excitement, if you be acquainted with afterward you can write otherwise it is difficult to write.

It’s amazing to pay a quick visit this web site and reading the views of all colleagues on the topic of this post, while I am also keen of getting know-how.

I got this site from my pal who shared with me concerning this web page and at the moment this time I am browsing this website and reading very informative articles at this place.

I don’t waste my free time in watching videos but I be fond of to read articles or reviews on net and get updated from newest technologies.

If some one wants to be updated with latest technologies then he must be pay a visit this web site and be up to date everyday.

I think a visualized presentation can be enhanced then simply a trouble-free text, if things are defined in pictures one can without difficulty be familiar with these.

What a material of un-ambiguity and preserveness of precious know-how about unexpected feelings.

Hello, is it rite to only study from textbooks not to visit internet for most recent updates, what you say friends?

I read this piece of writing fully about the difference of newest and previous technologies, it’s awesome article.

Why YouTube movies are shared everywhere? I think one motive is that these are simple to obtain embed script and paste that code somewhere you want.

This web site is containing a pleasant data of comical YouTube videos, I liked it a lot.

For latest information you have to pay a quick visit world-wide-web and on web I found this website as a best website for most recent updates.

It’s very trouble-free to find out any matter on web as compared to books, as I fount this piece of writing at this web site.

You have to waste less time to search your necessary matter on world-wide-web, since nowadays the searching techniques of search engines are good. That’s why I fount this post at this place.

If you are concerned to learn Search engine optimization techniques then you must read this post, I am sure you will obtain much more from this post concerning Web optimization.

Truly it’s referred to as Search engine marketing that when i search for this piece of writing I found this site at the top of all web sites in search engine.

This piece of writing about Search engine optimisation is truly good one, and the back links are genuinely very helpful to promote your web page, its also referred to as Search engine optimization.

What’s up, for SEO real contents are genuinely necessary, if you simply copy and paste then you can not ranked in search engines.

Ahaa, its nice conversation regarding this piece of writing here at this blog, I have read all that, so now me also commenting here.

Wow, fastidious YouTube video regarding how to establish virtual directory, I fully got it. Thanks keep it up.

My relatives all the time say that I am wasting my time here at web, however I know I am getting experience all the time by reading thes fastidious articles.

Hi there i am kavin, its my first occasion to commenting anyplace, when i read this piece of writing i thought i could also make comment due to this good piece of writing.

It’s really very complicated in this busy life to listen news on TV, therefore I just use web for that purpose, and obtain the latest information.

I am sure this piece of writing has touched all the internet users, its really really good paragraph on building up new web site.

This article will help the internet visitors for setting up new webpage or even a weblog from start to end.

I am truly eager of reading content about creating new blog, or even about Search engine marketing.

–

Remarkable! Its really awesome post, I have got much clear idea concerning from this post.

What a lovely story! The tale in this YouTube video that is posted here is in fact a fastidious one with having pleasant picture quality.

I have read so many articles about the blogger lovers however this piece of writing is in fact a good paragraph, keep it up.

I every time download a full film in parts, that’s always present at YouTube, as my network connection is extremely slow and YouTube fulfils my desires.

I think the admin of this site is in fact working hard for his site, because here every information is quality based data.

Wow! this cartoon type YouTube video I have viewed when I was in primary level and at the present I am in school and watching that over again here.

If you wish for to take a great deal from this post then you have to apply these strategies to your won weblog.

The strategies pointed out in this post on the topic of to increase traffic at you own web site are truly good, thanks for such nice paragraph.

If you apply such techniques for increasing traffic on your own website, I am as expected you will get the difference in few days.

There is also one additional method to increase traffic in favor of your web site that is link exchange, so you as well try it

Link exchange is nothing else but it is only placing the other person’s web site link on your page at suitable place and other person will also do same in support of you.

One additional technique for advertising your web site is posting comments on different sites with your weblog link.

I have study much regarding free of charge blogging web pages, however I have no clear idea on the topic of that, can any one tell me which one is most excellent for free blogging and site-building?

What’s up, yes brother there are obviously several blogging sites, however I recommend you to use Google’s free of charge blogging services.

Yup, you are right Google is the most excellent in favor of blogging, Google’s webpage as well appear rapidly in search engines too.

Hmmm, yup no doubt Google is best for blogging however today word press is also good as a blogging since its SEO is fastidious defined already.

One more thing that I would like to share at this time is that, whatever you are using free blogging service but if you don’t update your website on regularly basis then it’s no more attraction.

Okay, you are correct buddy, regularly updating web site is actually essential in favor of SEO. Good argument keeps it up.

Hello friends, you are sharing your thoughts about website Web optimization, I am also new user of web, so I am also getting more from it. Thanks to the whole thing.

It’s wonderful that you are getting thoughts from this article as well as from our argument made at this time.

I am genuinely eager of watching funny movies at youtube, and this video clip is genuinely so comical, hehehhe.

Hello dear, me and my mom are also watch comical video clips however after I done my homework

Hello children, you all have to watch comic video tutorials, however keep in mind that first study then enjoyment okay.

Hi I am from Australia, this time I am viewing this cooking related video at this site, I am truly cheerful and learning more from it. Thanks for sharing.

Please add more videos related to cooking if you have, as I wish for to learn more and more concerning all recipes of cooking.

Hi, thanks for all the people, I will upload many more video clips in upcoming days, admin

Hello buddy, what a quality is! For this YouTube video, I am actually impressed, because I have never seen fastidious quality YouTube video before,

There are also so many video uploading web pages, and these also provide facility for distribution their video lessons, but I think YouTube is the most excellent.

Okay you are correct, YouTube is best video distribution web site, since YouTube is a lightly no much streaming time rather than other web pages.

I am actually thankful to the holder of this web site who has shared this impressive paragraph at at this time.

Hello to all, how is everything, I think every one is getting more from this web site, and your views are nice in support of new visitors.

This paragraph is related to website programming is genuinely fastidious for me as I am web developer. Thanks for sharing keep it up.

Wow! Its also fastidious piece of writing about JavaScript, I am truly keen of learning JavaScript. thanks admin

If any one wishes to be a successful blogger, afterward he/she must study this post, because it includes all strategies related to that.

Hi, of course this post is truly good and I have learned lot of things from it concerning blogging. thanks.

—

It’s an remarkable piece of writing in favor of all the web visitors; they will take benefit from it I am sure.

I always used to read article in news papers but now as I am a user of net so from now I am using net for articles, thanks to web.

Hi there friends, is there any other nice weblog related to JavaScript content, while this one is good in support of PHP programming.

Hi there, I also would like to share my thoughts here, when i don’t know even about a effortless thing related to Personal home pages, I always go to search that from internet.

Wow! It’s a fastidious jQuery script; I was also searching for that, thus i got it right now from at this place. Keep it up admin of this site.

When I desire to place gallery or LightBox or yet a slider on my website I every time attempt to use jQuery script in favor of that.

Genuinely programming is nothing but it’s a logic, if you get handle on it after that you are the master else not anything.

I like to work on PHP rather than .NET, even if .NET provides the feature of drag and drop elements, however I like PHP a lot.

All right you are correct, truly PHP is a open source and its help we can obtain free from any forum or web site as it occurs at this place at this website.

Hello to all, I am also really eager of learning Personal home pages programming, but I am new one, I always used to examine articles related to PHP programming.

What a video it is! Actually amazing and good quality, please upload more videos having such fastidious quality. Thanks.

Some people are eager to watch funny videos, but I like to watch terrible videos on YouTube.

Actually movie is the presentation of some one’s feelings; it gives the lesson to the viewers.

Hi there mates, nice article and pleasant urging commented here, I am genuinely enjoying by these.

Yes this YouTube video is much improved than previous one, this one has pleasant picture feature as well as audio.

Now I was so tired, and now this time I have got some relax by viewing this comic YouTube video, thanks, keep it up.

On every weekend, we all mates jointly used to watch film, as fun is also necessary in life.

I am keen of learning Flash, is there any piece of writing related to Flash, if yes, then please post it, thanks.

Yes I am also in look for of Flash tutorials, as I would like to learn more concerning flash, thus if you have please post it here.

I also like Flash, however I am not a good designer to design a Flash, but I have software by witch a Flash is automatically created and no additional to work.

Hello friends, I am again at this place, and reading this post related to Web optimization, its also a fastidious paragraph, thus keep it up.

Can any one tell me that is there any on the internet course for Search engine optimization, as I want to learn more about Search engine marketing.

Hi there every buddy, it’s a impressive entertaining at at this place viewing these funny YouTube video clips at at this place, nice material, thanks to admin of this site

It is the happiest day of my life so far, when I am watching these funny video clips at this place, since after complete day working I was so tired and now feeling nicely.

It’s going to be ending of mine day, however before end I am reading this great piece of writing to increase my knowledge.

Why visitors still use to read news papers when in this technological world everything is presented on web?

This paragraph is fastidious and fruitful for all new PHP related web programmers; they must read it and perform the practice.

Hi friends, how is the whole thing, and what you would like to say concerning this piece of writing, in my view its truly remarkable designed for me.

What’s up Jackson, if you are a new web user afterward you have to pay a visit every day this web page and read the updated posts at at this place.

Okay, and further more if you wish for update alerts from this site at that time you must subscribe for it, it will be a better for you Jackson. Have a nice day!

Written by Tony Hirst

February 10, 2012 at 2:13 pm

Posted in Anything you want

Tagged with spam, comment spam

Open Standards Consultation and Open Data Standards Challenges

Take a look around you… see that plug socket? If you’re in the UK, it should conform to British Standard BS1363 (you can read the spec if you have your credit card to hand…). Take a listen around you… is that someone listening to an audio device playing an MP3 music file? ISO/IEC 11172-3:1993 (or ISO/IEC 13818-3:1995) helped make that possible… “that” being the agreed upon standard that let the music publisher put the audio file into a digital format that the maker of the audio device knows how to recognise and decode. (Beware, though. The MP3 specification is tainted with all sorts of patents – so you need to check whether you need to pay someone in order to build a device that encodes or decodes MP3 files.) If the music happens to be being played from a CD (hard to believe, but bear with me!), then you’ll be thankful the CD maker and the audio player manufacturer agreed to both work with a physical object that conforms to IEC 60908 ed2.0 (“Audio recording – Compact disc digital audio system”), and that maybe makes use of Standard ECMA-130 (also available as ISO/IEC 10149:1995). That Microsoft Office XML document you just opened somewhere? ISO/IEC 29500-1:2011. And so on…

Standards make interoperability possible. Which means that standards can be a valuable thing. If I create a standard that allows lots of things to interoperate, and I “own” the “intellectual property” associated with that standard, I can make you pay every time you sell a device that implements that standard. If I control the process by which the standard is defined and updated, then I can make changes to the standard that may or may not be to your benefit but with which you have to comply if you want to continue to be able to use the standard.

There are at least a couple of issues we need to take into account, then, when we look at adopting or “buying in” to a standard: who says what goes into the standard, and how is agreement reached about those things; and under what terms is usage of the standard allowed (for example, do I have to pay to make use of the standard, do I have to pay in order to even read the standard?).

At the adoption level, there is also the question of who decides what standard to adopt, and the means by which adoption of the standard is forced onto other parties. In the case of legislation, governments have the power to inflict a considerable financial burden on companies and government agencies by passing legislation that mandates the adoption of a particular standard that has some sort of fee associated with its use. Even outside of legislation, if a large organisation requires its suppliers to use a particular standard, then it could be commercial suicide for a supplier not to adopt the standard even if there are direct licensing costs associated with using it.

If we want to reduce the amount of friction in a process that is introduced by costs associated with the adoption of standards that make that process possible, then “open standards” may be a way forward. But what are “open standards” and what might we expect of them?

A new consultation from the Cabinet Office seeks views on this matter, with a view towards adopting open standards (whatever they are?!;-) across government, wherever possible: Cabinet Office calls on IT Community to engage in Open Standards consultation. In particular, the consultation will inform:

- the definition of open standards in the context of government IT;
- the meaning of mandation and the effects compulsory standards may have on government departments, delivery partners and supply chains; and
- international alignment and cross-border interoperability.

The consultation closes on 1 May 2012.

(Hmm, the consultation doesn’t seem to be online commentable… wouldn’t it be handy if there was something around like the old WriteToReply…?;-)

Here’s a related “open data standards in government” session from UKGovCamp 2012:

Related to the whole open standards thang is a new challenge on the Standards Hub posted by the HM Gov Open Data Standards (Shadow*) Panel (disclaimer: I’m a member of said panel; it’s (Shadow) because the board it will report to has not been formally constituted yet). The challenge covers open standards for “Managing and Using Terms and Codes” and seeks input from concerned parties relating to document standards and specifications relating to the coding and publication of controlled term lists, their provenance, version control/change files, and so on. (So for example, if you happened to work on the W3C provenance data model (which I note has reached the third working draft stage), and think it’s relevant, it might be worth bringing it to the attention of the panel as a reply to the challenge).

It occurs to me that recent JISC activity relating to the UK Discovery initiative may have something to say about the issues involved with, and formats appropriate for, representing and sharing data lists, so I commend the challenge to them: open standards for “Managing and Using Terms and Codes” (I’ll also pick my way through the #ukdiscovery docs and feed anything I find there back to the panel). I also suspect the library and shambrarian community may have something to offer, as well as members of the Linked Universities community…

[A quick note on the Open Data Standards Panel - its role in part is to help identify and recommend open standards appropriate for adoption across government, as well as identify areas where there is a need for open standards development. It won't directly develop any standards, although it may have a role in recommending the commissioning of standards.]

A couple of other things to note on sort of tangentially related matters (this post is in danger of turning into a newsletter, methinks… [hmmm: should I do a weekly newsletter?!]):

- JISC just announced some invitations to tender on the production of some reports on Digital Infrastructure Directions. The reports are to cover the following areas: Advantages of APIs; Embedded Licences: What, Why and How; Activity Data: Analytics and Metrics; The Open Landscape; Access to citation data: a cost-benefit and risk review and forward look.
- The Open Knowledge Foundation has a post up Announcing the School of Data, “a joint venture between the Open Knowledge Foundation and Peer 2 Peer University (P2PU)”. The course is still in the early planning stage, and volunteers are being sought…

Related: last year, the OU co-produced a special series of programmes on “openness” with the BBC World Service Digital Planet/Click (radio) programme. You can listen to the programmes again here:

Written by Tony Hirst

February 10, 2012 at 12:36 pm

Do Retweeters Lack Commitment to a Hashtag?

I seem to be going down more ratholes than usual at the moment, in this case relating to activity round Twitter hashtags. Here’s a quick bit of reflection around a chart from Visualising Activity Around a Twitter Hashtag or Search Term Using R that shows activity around a hashtag that was minted for an event that took place before the sample period.

The y-axis is organised according to the time of first use (within the sample period) of the tag by a particular user. The x axis is time. The dots represent tweets containing the hashtag, coloured blue by default, red if they are an old-style RT (i.e. they begin RT @username:).
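
By way of illustration, here is a minimal base R sketch of how the data behind a chart like this might be prepared. The tweets are made up, and the column names (screenName, created, text) are simply assumed to match the ones twitteR's twListToDF produces; it is not the script used to generate the original chart.

```r
# Toy data: made-up tweets, purely for illustration
tweets <- data.frame(
  screenName = c("alice", "bob", "alice", "carol", "bob"),
  created = as.POSIXct(c("2012-02-08 10:00:00", "2012-02-08 10:05:00",
                         "2012-02-08 11:00:00", "2012-02-08 11:30:00",
                         "2012-02-08 12:00:00"), tz = "UTC"),
  text = c("Talk starting #tag", "RT @alice: Talk starting #tag",
           "Slides now up #tag", "RT @alice: Slides now up #tag",
           "Good session #tag"),
  stringsAsFactors = FALSE
)

# Time of each user's first use of the tag within the sample
first_use <- tapply(as.numeric(tweets$created), tweets$screenName, min)

# Order users by first use so the y-axis reads as "order of arrival"
user_order <- names(first_use)[order(first_use)]
tweets$user <- factor(tweets$screenName, levels = user_order)

# Flag old-style retweets (text beginning "RT @username")
tweets$is_rt <- grepl("^RT @[[:alnum:]_]+", tweets$text)

# x = time, y = user (in order of first use), red = RT, blue = other
plot(tweets$created, as.integer(tweets$user),
     col = ifelse(tweets$is_rt, "red", "blue"), pch = 19,
     yaxt = "n", xlab = "created", ylab = "user (by first use)")
axis(2, at = seq_along(user_order), labels = user_order, las = 1)
```

Ordering the factor levels by first use is what puts the earliest arrivals at the bottom of the y-axis and the latecomers at the top.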

So what sorts of thing might we look for in this chart, and what are the problems with it? Several things jump out at me:

- For many of the users, their first tweet (in this sample period at least) is an RT; that is, they are brought into the hashtag community through issuing an RT.
- Many of the users whose first use is via an RT don’t use the hashtag again within the sample period. Is this typical? Does this signal represent amplification of the tag without any real sense of engagement with it?
- A noticeable proportion of folk whose first use is not an RT go on to post further non-RT tweets. Does this represent an ongoing commitment to the tag?
- Note that this chart does not show whether tweets are replies, or “open” tweets. Replies (that is, tweets beginning @username) are likely to represent conversational threads within a tag context rather than “general” tag usage, so it would be worth using an additional colour to identify reply based conversational tweets as such.
- “New style” retweets are displayed as retweets by colouring… I need to check whether or not new-style RT information is available that I could use to colour such tweets appropriately. (Or alternatively, I’d have to do some sort of string matching to see whether or not a tweet was the same as a previously seen tweet, which is a bit of a pain:-(
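
A rough sketch of how a couple of those questions might be asked of the data, assuming (as before) a data frame of tweets with screenName and text columns, sorted by time. The data and the classify helper are made up for illustration; only old-style RTs are detected.

```r
# Toy data (made up): one row per tweet, already sorted by time
tweets <- data.frame(
  screenName = c("alice", "bob", "carol", "alice", "dave", "carol"),
  text = c("Talk starting #tag",
           "RT @alice: Talk starting #tag",
           "@alice which room? #tag",
           "Slides now up #tag",
           "RT @alice: Slides now up #tag",
           "Thanks, found it #tag"),
  stringsAsFactors = FALSE
)

# Three-way classification: old-style RT, reply, or "open" tweet
classify <- function(txt) {
  ifelse(grepl("^RT @[[:alnum:]_]+", txt), "RT",
         ifelse(grepl("^@[[:alnum:]_]+", txt), "reply", "open"))
}
tweets$type <- classify(tweets$text)

# For each user: the type of their first tweet, and how many tweets they sent
first_type <- tapply(tweets$type, tweets$screenName, function(x) x[1])
n_tweets <- tapply(tweets$type, tweets$screenName, length)

# Users who arrived via an RT and never used the tag again in the sample
rt_only_once <- names(first_type)[first_type == "RT" & n_tweets == 1]
```

The type column could then be mapped to colour in the chart, giving replies their own colour alongside RTs and open tweets.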

(Note that when I started mapping hashtag communities, I used to generate tag user names based on a filtered list of tweets that excluded RTs. this meant that folk who only used the tag as part of an RT and did not originate tweets that contained the tag, either in general or as part of a conversation, would not be counted as a member of the hashtag community. More recently, I have added filters that include RTs but exclude users who used the tag only once, for example, thus retaining serial RTers, but not single use users.)
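
The difference between those two membership filters can be sketched in a few lines of base R. The data is made up and the variable names are hypothetical; the point is just to show who each filter keeps.

```r
# Toy data (made up): who tweeted what with the tag
tweets <- data.frame(
  screenName = c("alice", "bob", "bob", "carol", "alice", "dave"),
  text = c("Talk starting #tag", "RT @alice: Talk starting #tag",
           "RT @alice: Slides now up #tag", "RT @alice: Talk starting #tag",
           "Slides now up #tag", "Good session #tag"),
  stringsAsFactors = FALSE
)
is_rt <- grepl("^RT @[[:alnum:]_]+", tweets$text)

# Earlier approach: drop RTs, then count whoever is left as a member
members_no_rt <- sort(unique(tweets$screenName[!is_rt]))

# Later approach: keep RTs, but drop users who used the tag only once
counts <- table(tweets$screenName)
members_repeat <- sort(names(counts)[counts > 1])
```

Here bob (a serial RTer) only counts as a member under the second filter, dave (a single original tweet) only under the first, and carol (a single RT) under neither.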

So what else might this chart tell us? Looking at vertical slices, it seems that new entrants to the tag community appear to come in waves, maybe as part of rapid fire RT bursts. This chart doesn’t tell us for sure that this is happening, but it does highlight areas of the timeline that might be worth investigating more closely if we are interested in what happened at those times when there does appear to be a spike in activity. (Are there any modifications we could make to this chart to make it more informative in this respect? The time resolution is very poor, for example, so being able to zoom in on a particular time might be handy. Or are there other charts that might provide a different lens that can help us see what was happening at those times?)
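One possible way of looking for those waves is to count how many users make their first appearance in each time bin. The following Python sketch does that; the function name, the ten minute default bin width and the sample timestamps are all assumptions made for the example, not anything taken from the scripts used to produce the chart.

```python
from collections import Counter
from datetime import datetime, timedelta

def new_entrants_per_bin(tweets, bin_minutes=10):
    """Count first-time tag users per time bin.

    tweets: iterable of (datetime, user) pairs, in any order.
    Returns a dict mapping the start time of each bin to the number
    of users whose first tweet falls inside that bin.
    """
    first_seen = {}
    for ts, user in sorted(tweets):
        # Only the earliest timestamp for each user is kept
        first_seen.setdefault(user, ts)
    width = timedelta(minutes=bin_minutes)
    epoch = datetime(1970, 1, 1)
    counts = Counter()
    for ts in first_seen.values():
        # Floor the timestamp to the start of its bin
        bin_start = epoch + ((ts - epoch) // width) * width
        counts[bin_start] += 1
    return dict(counts)

# Made-up sample data
t0 = datetime(2012, 2, 9, 18, 0)
sample = [
    (t0, "alice"),
    (t0 + timedelta(minutes=2), "bob"),
    (t0 + timedelta(minutes=3), "alice"),
    (t0 + timedelta(minutes=25), "carol"),
]
entrants = new_entrants_per_bin(sample)
print(entrants)
```

Bins with unusually high counts are candidates for a closer look, and shrinking bin_minutes is a crude way of zooming in on them.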

And as a final point – this stuff may be all very interesting, but is it useful? And if so, how? I also wonder how generalisable it is to other sorts of communication analysis. For example, I think we could use similar graphical techniques to explore engagement with an active comment thread on a blog, or Google+, or additions to an online forum thread. (For forums with multiple threads, we maybe need to rethink how this sort of chart would work, or how it might be coloured/what symbols we might use, to distinguish between starting a new thread, or adding to a pre-existing one, for example. I’m sure the literature is filled with dozens of examples for how we might visualise forum activity, so if you know of any good references/links…?! ;-) #lazyacademic)

Written by Tony Hirst

February 9, 2012 at 6:30 pm

What is the Potential Audience Size for a Hashtag Community?

with 2 comments

What’s the potential audience size around a Twitter hashtag?

Way back when, in the early days of web stats, reported figures tended to centre around the notion of hits, the number of calls made to a server via website activity. I forget the details, but the metric was presumably generated from server logs. This measure was always totally unreliable, because in the course of serving a web page, a server might be hit multiple times, once for each separately delivered asset, such as images, javascript files, css files and so on. Hits soon gave way to the notion of Page Views, which more accurately measured the number of pages (rather than assets) served via a website. This was complemented with the notion of Visits and Unique Visits: Visits, as tracked by cookies, represent a set of pages viewed around about the same time by the same person. Unique Visits (or “Uniques”) represent the number of different people who appear to have visited the site in any given period.

What we see here, then, is a steady evolution in the complexity of website metrics that reflects on the one hand dissatisfaction with one way of measuring or reporting activity, and on the other practical considerations with respect to instrumentation and the ability to capture certain metrics once they are conceived of.

Widespread social media monitoring/tracking is largely still in the realm of “hits” measurement. Personal dashboards for services such as Twitter typically display direct measures provided by the Twitter API, or measures trivially/directly identified from Twitter API or archived data – number of followers, numbers of friends, distribution of updates over time, number of mentions, and so on.

Something that both Martin Hawksey and I have been thinking about on and off for some time is how to report activity around Twitter hashtags. A commonly(?!) asked question in this respect relates to how much engagement (whatever that means) there has been with a particular tag. So here’s a quick mark in the sand about some of my current thinking about this. (Note that these ideas may well have been more formally developed in the academic literature – I’m a bit behind in my reading! If you know something that covers this in more detail, or that I should cite, please feel free to add a link in the comments… #lazyAcademic.)

One of the first metrics that comes to my mind is the number of people who have used a particular hashtag, and the number of their followers. Easily stated, it doesn’t take a lot of thought to realise even these “simple” measures are fraught with difficulty:

- What counts as a use of the hashtag? If I retweet a tweet of yours that contains a hashtag, have I used it in any meaningful sense? Does a “use” mean the creation of a new tweet containing the tag? What about if I reply to a tweet from you that contains the tag and I include the tag in my reply to you, even if I’m not sure what that tag relates to?
- The potential audience size for the tag (potential uniques?), based on the number of followers of the tag users. At first glance, we might think this can be easily calculated by adding together the follower counts of the tag users, but this is more strictly an upper bound approximation of the potential audience:
  - the set of followers of A may include some of the followers of B, or C;
  - do we count the tag users themselves amongst the audience? If so, the upper bound also needs to take into account the fact that none of the users may be followers of any of the other tag users.
  Note there is also a lower bound – the largest follower count amongst the tag users (whatever that means…) of the hashtag. Furthermore, if we want to count the number of folk not using the tag but who may have seen the tag, this lower bound can be revised downwards by subtracting the number of tag users minus one (for the tag user with the largest follower count). The value is still only an approximation, though, because it assumes that all the tag users are actually included as followers of at least one, each, of the tag users. (If you think these points are “just academic”, they are and they aren’t – observations like these can often be used to help formulate gaming strategies around metrics based on these measures.)
- The potential number of views of a tag, for example based on the product of the number of times a user tweets and their follower count?
- The reach of (or active engagement with?) the tag, as measured by the number of people who actually see the tag, or the number of people who take an action around it (such as replying to a tagged tweet, RTing it, or clicking on a link a tagged tweet contains); note that we may be able to construct probabilistic models (albeit quite involved ones) of the potential reach based on factors like the number of people someone follows, when they are online, the rate at which the people they follow tweet, and so on.
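To make the bounds a little more tangible, here is a small Python sketch that works them out for a toy example. It assumes we already hold the set of follower IDs for each tag user (the dictionary, the function names and the numbers are all invented), and it ignores the question of whether tag users themselves should be counted as audience members.

```python
def audience_bounds(followers_by_user):
    """Return (lower bound, unique audience, upper bound).

    followers_by_user maps each tag user to the set of their follower IDs.
    """
    sizes = [len(f) for f in followers_by_user.values()]
    # Upper bound: pretend no two tag users share a follower
    upper = sum(sizes)
    # Lower bound: pretend everyone else's followers also follow the biggest account
    lower = max(sizes) if sizes else 0
    # Size of the union of all follower sets
    unique = len(set().union(*followers_by_user.values()))
    return lower, unique, upper

def potential_views(tweet_counts, followers_by_user):
    """Sum over users of (number of tagged tweets) x (follower count)."""
    return sum(n * len(followers_by_user[u]) for u, n in tweet_counts.items())

# Made-up follower sets for three hypothetical tag users
followers = {"A": {1, 2, 3, 4}, "B": {3, 4, 5}, "C": {5}}
bounds = audience_bounds(followers)
views = potential_views({"A": 2, "B": 1, "C": 3}, followers)
print(bounds, views)
```

With these toy sets the summed follower count is 8 and the largest single follower count is 4, while the number of distinct followers is 5, so the true unique audience sits between the two bounds as expected.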

To try to make this a little more concrete, here are a couple of scripts for exploring the potential audience size of a tag based on the followers of the tag users (where a user is someone who publishes or retweets a tweet containing the tag over a specified period). The first, a Python script, runs a Twitter search and generates a list of unique users of the tag, along with the timestamp of their first use of the tag within the sample period. This script also grabs all the followers of the tag users, along with their counts, and generates a running cumulative (upper bound approximation) count of the tag user follower numbers, as well as calculating the rolling set of unique followers to date as each new tag user is observed. The second, an R script, plots the values.
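Stripped of all the Twitter API calls, the bookkeeping described above might look something like the following Python sketch. It is a simplified stand-in rather than the script itself: the function name and the sample data are made up, and the follower sets are assumed to have been collected already.

```python
def running_audience(first_uses, followers_by_user):
    """Track cumulative and unique follower counts as new tag users appear.

    first_uses: list of (timestamp, user) in chronological order,
                with one entry per user (their first use of the tag).
    followers_by_user: dict mapping each user to a set of follower IDs.
    """
    seen = set()       # unique followers seen to date
    crude_total = 0    # summed follower counts, overlap ignored
    rows = []
    for ts, user in first_uses:
        followers = followers_by_user[user]
        previous = len(seen)
        seen |= followers
        crude_total += len(followers)
        rows.append({
            "datetime": ts,
            "newuser": user,
            "count": len(seen),
            "crudetot": crude_total,
            "userFoCount": len(followers),
            "previousCount": previous,
        })
    return rows

# Made-up sample data
followers = {"A": {1, 2, 3, 4}, "B": {3, 4, 5}, "C": {5}}
rows = running_audience([("t1", "A"), ("t2", "B"), ("t3", "C")], followers)
for r in rows:
    # New audience members contributed by each new tag user
    print(r["newuser"], r["count"] - r["previousCount"])
```

The difference between count and previousCount on each row is the number of genuinely new potential audience members each new tag user brings with them.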

The first thing we can do is look at the incidence of new users of the hashtag over time:

(For a little more discussion of this sort of chart, see Visualising Activity Around a Twitter Hashtag or Search Term Using R and its inspiration, @mediaczar’s How should Page Admins deal with Flame Wars?.)

More relevant to this post, however, is a plot showing some counts relating to followers of users of the hashtag:

In this case, the top, green line represents the summed total number of followers for tag users as they enter the conversation. If every user had completely different followers, this might be meaningful, but where conversation takes place around a tag between folk who know each other, it’s highly likely that they have followers in common.

The middle, red line shows a count of the number of unique followers to date, based on the followers of users of the tag to date.

The lower, blue line shows the difference between the red and green lines. This represents the error between the summed follower counts and the actual number of unique followers.

Here’s a view over the number of new unique potential audience members at each time step (I think the use of the line chart here may be a mistake… I think bars/lineranges would probably be more appropriate…):

In the following chart, I overplot one line with another. The lower layer (a red line) is the total follower count for each new tag user. The blue is the increase in the potential audience count (that is, the number of the new users’ followers that haven’t potentially seen the tag so far). The range of the visible part of the red line thus shows the number of a new tag user’s followers who have potentially already seen the tag. Err… maybe (that is, if my code is correct and all the scripts are doing what I think they’re doing! If they aren’t, then just treat this post as an exploration of the sorts of charts we might be able to produce to explore audience reach;-)

Here are the scripts (such as they are!)

import newt,csv,tweepy
import networkx as nx

#the term we're going to search for
tag='ddj'
#how many tweets to search for (max 1500)
num=500

##Something along lines of:
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(SKEY, SSECRET)
api = tweepy.API(auth, cache=tweepy.FileCache('cache',cachetime), retry_errors=[500], retry_delay=5, retry_count=2)

#You need to do some work here to search the Twitter API
tweeters, tweets=yourSearchTwitterFunction(api,tag,num)
#tweeters is a list of folk who tweeted the term of interest
#tweets is a list of the Twitter tweet objects returned from the search
#My code for this is tightly bound up in a large and rambling library atm...

#Put tweets into chronological order
tweets.reverse()

#I was being lazy and wasn't sure what vars I needed or what I was trying to do when I started this!
#The whole thing really needs rewriting...
tweepFo={}
seenToDate=set([])
uniqSourceFo=[]
#runtot is crude and doesn't measure overlap
runtot=0
oldseentodate=0

#Construct a digraph from folk using the tag to their followers
DG=nx.DiGraph()

for tweet in tweets:
        user=tweet['from_user']
        #Look up the ID for every tweet, so the else branch below reweights the edges of the right user
        userID=tweet['from_user_id'] #check
        if user not in tweepFo:
                tweepFo[user]=[]
                print "Getting follower data for", str(user), str(len(tweepFo)), 'of', str(len(tweeters))
                mi=tweepy.Cursor(api.followers_ids,id=user).items()
                DG.add_node(userID,label=user)
                for m in mi:
                        tweepFo[user].append(m)
                        #construct graph
                        DG.add_edge(userID,m,weight=1)
                        DG.node[m]['label']=''
                ufc=len(tweepFo[user])
                runtot=runtot+ufc
                #seen to date is all people who have seen so far, plus new ones, so it's the union
                oldseentodate=len(seenToDate)
                seenToDate=seenToDate.union(set(tweepFo[user]))
                uniqSourceFo.append((tweet['created_at'],len(seenToDate),user,runtot,ufc,oldseentodate))
        else:
                #I'm weighting the edges so we can count how many times folk see the hashtag
                if len(DG.edges(userID))>0:
                        tmp1,tmp2=DG.edges(userID)[0]
                        weight=DG[userID][tmp2]['weight']+1
                        for fromN,toN in DG.edges(userID):
                                DG[fromN][toN]['weight']=weight


fo='reports/tmp/'+tag+'_ncount.csv'
f=open(fo,'wb+')
writer=csv.writer(f)
writer.writerow(['datetime','count','newuser','crudetot','userFoCount','previousCount'])
for ts,l,u,ct,ufc,ols in uniqSourceFo:
        print ts,l
        writer.writerow([ts,l,u,ct,ufc,ols])

f.close()

print "Writing graph.."
filter=[]
for n in DG:
        if DG.degree(n)>1: filter.append(n)
filter=set(filter)
H=DG.subgraph(filter)
nx.write_graphml(H, 'reports/tmp/'+tag+'_ncount_2up.graphml')
print "Writing other graph.."
nx.write_graphml(DG, 'reports/tmp/'+tag+'_ncount.graphml')

Here’s the R script…

#plyr provides arrange(), ggplot2 provides the plotting functions
require(plyr)
require(ggplot2)
ddj_ncount <- read.csv("~/code/twapps/newt/reports/tmp/ddj_ncount.csv")
#Convert the datetime string to a time object
ddj_ncount$ttime=as.POSIXct(strptime(ddj_ncount$datetime, "%a, %d %b %Y %H:%M:%S"),tz='UTC')

#Order the newuser factor levels into the order in which they first use the tag
dda=subset(ddj_ncount,select=c('ttime','newuser'))
dda=arrange(dda,-desc(ttime))
ddj_ncount$newuser=factor(ddj_ncount$newuser, levels = dda$newuser)

#Plot when each user first used the tag against time
ggplot(ddj_ncount) + geom_point(aes(x=ttime,y=newuser)) + theme(axis.text.x=element_text(size=6),axis.text.y=element_text(size=4))

#Plot the cumulative and union flavours of increasing possible audience size, as well as the difference between them
ggplot(ddj_ncount) + geom_line(aes(x=ttime,y=count,col='Unique followers')) + geom_line(aes(x=ttime,y=crudetot,col='Cumulative followers')) + geom_line(aes(x=ttime,y=crudetot-count,col='Repeated followers')) + labs(colour='Type') + xlab(NULL)

#Number of new unique followers introduced at each time step
ggplot(ddj_ncount)+geom_line(aes(x=ttime,y=count-previousCount,col='Actual delta'))

#Try to get some idea of how many of the followers of a new user are actually new potential audience members
ggplot(ddj_ncount) + theme(axis.text.x=element_text(angle=-90,size=4)) + geom_linerange(aes(x=newuser,ymin=0,ymax=userFoCount,col='Follower count')) + geom_linerange(aes(x=newuser,ymin=0,ymax=(count-previousCount),col='Actual new audience'))

#This is still a bit experimental
#I'm playing around trying to see what proportion or number of a users followers are new to, or subsumed by, the potential audience of the tag to date...
ggplot(ddj_ncount) + geom_linerange(aes(x=newuser,ymin=0,ymax=1-(count-previousCount)/userFoCount)) + theme(axis.text.x=element_text(angle=-90,size=6)) + xlab(NULL)

In the next couple of posts in this series, I’ll start to describe how we can chart the potential increase in audience count as a delta for each new tagger, along with a couple of ways of trying to get some initial sort of sense out of the graph file, such as the distribution of the potential number of “views” of a tag across the unique potential audience members…

PS See also the follow on post More Thoughts on Potential Audience Metrics for Hashtag Communities

Written by Tony Hirst

February 9, 2012 at 12:30 am

Dangers of a Walled Garden…

with one comment

Reading a recent Economist article (The value of friendship) about the announcement last week that Facebook is to float as a public company, and being amazed as ever about how these valuations, err, work, I recalled a couple of observations from a @currybet post about the Guardian Facebook app (“The Guardian’s Facebook app” – Martin Belam at news:rewired). The first related to using Facebook apps to (only partially successfully) capture attention of folk on Facebook and get them to refocus it on the Guardian website:

We knew that 77% of visits to the Guardian from facebook.com only lasted for one page. A good hypothesis for this was that leaving the confines of Facebook to visit another site was an interruption to a Facebook session, rather than a decision to go off and browse another site. We began to wonder what it would be like if you could visit the Guardian whilst still within Facebook, signed in, chatting and sharing with your friends. Within that environment could we show users a selection of other content that would appeal to them, and tempt them to stay with our content a little bit longer, even if they weren’t on our domain.

The second thing that came to mind related to the economic/business models around the Facebook app itself:

The Guardian Facebook app is a canvas app. That means the bulk of the page is served by us within an iFrame on the Facebook domain. All the revenue from advertising served in that area of the page is ours, and for launch we engaged a sponsor to take the full inventory across the app. Facebook earn the revenue from advertising placed around the edges of the page.

I’m not sure if Facebook runs CPM (cost per thousand) display based ads, where advertisers pay per impression, or follows the Google AdWords model, where advertisers pay per click (PPC), but it got me wondering… A large number of folk on Facebook (and Twitter) share links to third party websites external to Facebook. As Martin Belam points out, the user return rate back to Facebook for folk visiting third party sites from Facebook seems very high – folk seem to follow a link from Facebook, consume that item, return to Facebook. Facebook makes an increasing chunk of its revenue from ads it sells on Facebook.com (though with the amount of furniture and Facebook open graph code it’s getting folk to include on their own websites, it presumably wouldn’t be so hard for them to roll out their own ad network to place ads on third party sites?), so keeping eyeballs on Facebook is presumably in their commercial interest.

In Twitter land, where the VC folk are presumably starting to wonder when the money tap will start to flow, I notice “sponsored tweets” are starting to appear in search results:

Another twitter search irrelevance

Relevance still appears to be quite low, possibly because they haven’t yet got enough ads to cover a wide range of keywords or prompts:

Dodgy twitter promoted tweet

(Personally, if the relevance score was low, I wouldn’t place the ad, or I’d serve an ad tuned to the user, rather than the content, per se…)

Again, with Twitter, a lot of sharing results in users being taken to external sites, from which they quickly return to the Twitter context. Keeping folk in the Twitter context for images and videos through pop-up viewers or embedded content in the client is also a strategy pursued in many Twitter clients.

So here’s the thought, though it’s probably a commercially suicidal one: at the moment, Facebook and Twitter and Google+ all automatically “linkify” URLs (though Google+ also takes the strategy of previewing the first few lines of a single linked to page within a Google+ post). That is, given a URL in a post, they turn it into a link. But what if they turned that linkifier off for a domain, unless a fee was paid to turn it back on? Or what if the linkifier was turned off if the number of clickthrus on links to a particular domain, or page within a domain, exceeded a particular threshold, and could only be turned on again at a metered, CPM rate? (Memories here of different models for getting folk to pay for bandwidth, because what we have here is access to bandwidth out of the immediate Facebook, Twitter or Google+ context).

As a revenue model, the losses associated with irritating users would probably outweigh any revenue benefits, but as a thought experiment, it maybe suggests that we need to start paying more attention to how these large attention-consuming services are increasingly trying to cocoon us in their context (anyone remember AOL, or to a lesser extent Yahoo, or Microsoft?), rather than playing nicely with the rest of the web.

PS Hmmm…”app”. One default interpretation of this is “app on phone”, but “Facebook app” means an app that runs on the Facebook platform… So for any given app, that it is an “app” implies that that particular variant means “software application that runs on a proprietary platform”, which might actually be a combination of hardware and software platforms (e.g. Facebook API and Android phone)???

Written by Tony Hirst

February 8, 2012 at 11:46 am

Posted in Anything you want

Tagged with ads, Facebook, Google, Twitter

Visualising Activity Around a Twitter Hashtag or Search Term Using R

with 4 comments

I think one of the valid criticisms around a lot of the visualisations I post here and on my various #f1datajunkie blogs is that I often don’t post any explanatory context around the visualisations. This is partly a result of the way I use my blog posts in a selfish way to document the evolution of my own practice, but not necessarily the “so what” elements that represent any meaning or sense I take from the visualisations. In many cases, this is because the understanding I come to of a dataset is typically the result of an (inter)active exploration of the data set; what I blog are the pieces of the puzzle that show how I personally set about developing a conversation with a dataset, pieces that you can try out if you want to…;-)

An approach that might get me more readers would be to post commentary around what I’ve learned about a dataset from having a conversation with it. A good example of this can be seen in @mediaczar’s post on How should Page Admins deal with Flame Wars?, where this visualisation of activity around a Facebook post is analysed in terms of effective (or not!) strategies for moderating a flame war.

@mediaczar visualisation of engagement around facebook flamewars

The chart shows a sequential ordering of posts in the order they were made along the x-axis, and the unique individual responsible for each post, ordered by accession to the debate along the y-axis. For interpretation and commentary, see the original post: How should Page Admins deal with Flame Wars? ;-)
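For anyone who finds the layout easier to follow as code, here is a minimal Python sketch of how the two chart coordinates could be derived from nothing more than a list of authors in posting order. The function name and the sample data are invented for the illustration; it says nothing about how the original chart was actually produced.

```python
def accession_coordinates(posts):
    """Map posts to chart coordinates.

    posts: list of author names, in the order the posts were made.
    Returns a list of (x, y) pairs, where x is the position of the post
    in the sequence and y is the order in which its author first joined.
    """
    rank = {}
    coords = []
    for x, author in enumerate(posts):
        # A new author gets the next free row; a returning one keeps their row
        y = rank.setdefault(author, len(rank))
        coords.append((x, y))
    return coords

# Made-up sample sequence of authors
coords = accession_coordinates(["alice", "bob", "alice", "carol", "bob"])
print(coords)
```

Each time y reaches a new maximum a new person has joined the conversation, while repeated values of y along a row show one individual coming back for more.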

One take away of the chart for me is that it provides a great snapshot of new people entering into a conversation (vertical lines) as well as engagement by an individual (horizontal lines). If we use a time proportional axis on x, we can also see engagement over time.

In a Twitter context, it’s likely that a rapid increase in numbers of folk engaging with a hashtag, for example, might be the result of an RT related burst of activity. For folk who have already engaged in hashtag usage, for example as part of a live event backchannel, a large number of near co-occurring tweets that are not RTs might signal some notable happenstance within the event.

To explore this idea, here’s a quick bit of R tooling inspired by Mat’s post… It uses the twitteR library and sources tweets via a Twitter search.

require(twitteR)
#Pull in a search around a hashtag.
searchTerm='#ukgc12'
rdmTweets <- searchTwitter(searchTerm, n=500)
# Note that the Twitter search API only goes back 1500 tweets

#Plot of tweet behaviour by user over time
#Based on @mediaczar's http://blog.magicbeanlab.com/networkanalysis/how-should-page-admins-deal-with-flame-wars/
#Make use of a handy dataframe creating twitteR helper function
tw.df=twListToDF(rdmTweets)
#@mediaczar's plot uses a list of users ordered by accession to user list
## 1) find earliest tweet in searchlist for each user [ http://stackoverflow.com/a/4189904/454773 ]
require(plyr)
tw.dfx=ddply(tw.df, .var = "screenName", .fun = function(x) {return(subset(x, created %in% min(created),select=c(screenName,created)))})
## 2) arrange the users in accession order
tw.dfxa=arrange(tw.dfx,-desc(created))
## 3) Use the username accession order to order the screenName factors in the searchlist
tw.df$screenName=factor(tw.df$screenName, levels = tw.dfxa$screenName)
#ggplot seems to be able to cope with time typed values...
require(ggplot2)
ggplot(tw.df)+geom_point(aes(x=created,y=screenName))

We can get a feeling for which occurrences were old-style RTs by identifying tweets that start with a classic RT, and then colouring each tweet appropriately (note there may be some overplotting/masking of points…I’m not sure how big the x-axis time bins are…)

#Identify and colour the RTs...
library(stringr)
#A helper function to remove @ symbols from user names...
trim <- function (x) sub('@','',x)
#Identify classic style RTs
tw.df$rt=sapply(tw.df$text,function(tweet) trim(str_match(tweet,"^RT (@[[:alnum:]_]*)")[2]))
tw.df$rtt=sapply(tw.df$rt,function(rt) if (is.na(rt)) 'T' else 'RT')
ggplot(tw.df)+geom_point(aes(x=created,y=screenName,col=rtt))

So now we can see when folk entered into the hashtag community via a classic RT.

We can also start to explore who was classically retweeted when:

#Generate a plot showing how a person is RTd
tw.df$rtof=sapply(tw.df$text,function(tweet) trim(str_match(tweet,"^RT (@[[:alnum:]_]*)")[2]))
#Note that this doesn't show how many RTs each person got in a given time period if they got more than one...
ggplot(subset(tw.df,subset=(!is.na(rtof))))+geom_point(aes(x=created,y=rtof))

Another view might show who was classically RTd by whom (activity along a row indicating someone was retweeted a lot through one or more tweets, activity within a column identifying an individual who RTs a lot…):

#We can start to get a feel for who RTs whom...
require(gdata)
#We don't want to display screenNames of folk who tweeted but didn't RT
tw.df.rt=drop.levels(subset(tw.df,subset=(!is.na(rtof))))
#Order the screennames of folk who did RT by accession order (ie order in which they RTd)
tw.df.rta=arrange(ddply(tw.df.rt, .var = "screenName", .fun = function(x) {return(subset(x, created %in% min(created),select=c(screenName,created)))}),-desc(created))
tw.df.rt$screenName=factor(tw.df.rt$screenName, levels = tw.df.rta$screenName)
# Plot who RTd whom
ggplot(subset(tw.df.rt,subset=(!is.na(rtof))))+geom_point(aes(x=screenName,y=rtof))+theme(axis.text.x=element_text(angle=-90,size=6)) + xlab(NULL)
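The point plots above only show that someone RTd someone else, not how often. A count of each (retweeter, retweeted) pair would fill that gap, and might be sketched in Python along the following lines. The function name, the regular expression and the sample tweets are assumptions made for the example, and only old-style RTs that begin RT @username are picked up.

```python
import re
from collections import Counter

# Assumption: an old-style RT starts with "RT @name"
RT_PATTERN = re.compile(r"^RT @(\w+)")

def rt_pair_counts(tweets):
    """Count how often each user old-style RTs each other user.

    tweets: iterable of (screen_name, text) pairs.
    Returns a Counter keyed by (retweeter, retweeted).
    """
    pairs = Counter()
    for screen_name, text in tweets:
        match = RT_PATTERN.match(text)
        if match:
            pairs[(screen_name, match.group(1))] += 1
    return pairs

# Made-up sample data
sample = [
    ("bob", "RT @alice: Kicking off #tag"),
    ("bob", "RT @alice: More on #tag"),
    ("carol", "RT @alice: Kicking off #tag"),
    ("alice", "Thanks all #tag"),
]
pairs = rt_pair_counts(sample)
print(pairs.most_common())
```

The counts could be mapped onto point size or colour in the who-RTd-whom chart, so that heavily repeated pairings stand out.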

What sense you might make of all this, or where to take it next, is down to you of course… Err, erm…?! ;-)

PS see also: http://blog.ouseful.info/2012/01/21/a-quick-view-over-a-mashe-google-spreadsheet-twitter-archive-of-ukgc2012-tweets/

Written by Tony Hirst

February 6, 2012 at 1:14 pm

OU Marketers Go After Competition Supported Editorial…?


Over the weekend, I noticed that the Guardian was offering readers the chance to win free study for an OU degree. Today, via a tweet, I see a link to a piece of editorial coverage from Friday, Live and learn with distance learning, on some of the motivations for studying for an OU degree, as well as a look at the commitment that’s involved in taking a distance learning degree.

The competition is prominently linked to:

OU advertorial and linked competition

I suspect we are going to see more of this…

I was also interested to see this tweet from @barnstormed on Sunday: Nice to see the @openuniversity on one of the electronic pitch-side advertising boards at Murrayfield :) #rugby #6nations [Anyone got a screenshot?]

See also a previous campaign: OU Course Discounts with the Tesco Clubcard, although I note this is about to come to an end?

End of OU/Tesco Clubcard deal

Hmmm…

Written by Tony Hirst

February 6, 2012 at 10:45 am

Posted in OU2.0
