Building With the Twitter API: Analyzing Your Followers

Welcome to the latest episode of our Twitter API series. In our last episode, I built Twixxr.com, which lets you discover influential women on Twitter for your account to follow. Today, I’m going to turn the focus inward to look at my own followers.

While I haven’t really used Facebook since 2013, I’ve remained active on Twitter—even as they pumped my feed with ads and annoyed me by trying to algorithmically optimize it.

Recently, I was verified and started to gather followers at a slightly faster rate. I was hopeful that I might see more response to my tweets. Generally, I’ve been surprised at how little response there usually is on Twitter for the average person.

I have nearly 1,900 followers, but rarely do people comment or retweet pieces that I think are important and of general interest. For example, not a single person shared my piece on the sharp spike in rape reports in Seattle or commentary on Bill Gates at his most outrageously hypocritical.

For a long time I’ve wanted to look more closely at my Twitter followers and answer some questions: Who exactly is following me? And why aren’t they more interactive? Is it possible that only 10% of my followers are real people?

Twitter’s been having trouble finding a buyer, and maybe this has something to do with it.

The Twitter API is a good tool to investigate this. Yet it has a ton of rate limits which make even something simple like analyzing your followers quite complex. In today’s episode, I’ll show you how I worked with the rate limits to assess and build a scoreboard of my followers.

If you have any questions or feedback, please post them below in the comments or reach out to me on Twitter @reifman.

Analyze Our Twitter Followers

[Image: the follower scoreboard]

Just above, you can see the basic scoreboard I’ve created. Today’s episode will focus mostly on the infrastructure and approach I took to create this. I hope I get a chance to write more about improving the scoring mechanism.

And yes, as you can see above, renowned gay rights leader and sex advice columnist Dan Savage follows me but never retweets anything I share. If there’s time today, we’ll analyze this to answer important questions like: is he real, a bot, or just following me for personal sex advice? What can we learn from his account to determine whether he’s likely ever to interact with me on Twitter or, for that matter, any of my other followers?

The scoreboard code is mostly a prototype which I’ve built on top of the Twixxr code from the last episode, but it’s not a live demo for people to use. I’m sharing so you can learn from it and build on it yourself.

Here are the basic elements of the code:

  • Creating the database to store my followers and related data.
  • Downloading my followers in pages of 20 followers each.
  • Tracking the cursors for the pages as I download 15 pages per rate limited window.
  • Storing data collected about my followers in the database.
  • Building a prototype scoring algorithm to score all of the followers.
  • Building a view to browse the scoreboard.

Diving Into the Code

Creating the Database Table Migrations

I used three tables to store all the data and help me work with the Twitter API rate limiting: one existing table that I extended and two new ones. If you’re not familiar with Yii database migrations, please see How to Program With Yii2: Working With the Database and Active Record.

First, I extended the SocialProfile table to record a lot more data from the followers’ accounts, such as whether they are verified, their location, and how many items they’ve favorited:

<?php
use yii\db\Schema;
use yii\db\Migration;

class m161026_221130_extend_social_profile_table extends Migration
{
    // Adds the extra follower fields to the existing social_profile table
    public function up()
    {
        $this->addColumn('{{%social_profile}}','social_id',Schema::TYPE_STRING.' NOT NULL');
        $this->addColumn('{{%social_profile}}','name',Schema::TYPE_STRING.' NOT NULL');
        $this->addColumn('{{%social_profile}}','screen_name',Schema::TYPE_STRING.' NOT NULL');
        $this->addColumn('{{%social_profile}}','description',Schema::TYPE_TEXT.' NOT NULL');
        $this->addColumn('{{%social_profile}}','url',Schema::TYPE_STRING.' NOT NULL');
        $this->addColumn('{{%social_profile}}','protected',Schema::TYPE_SMALLINT.' NOT NULL DEFAULT 0');
        $this->addColumn('{{%social_profile}}','favourites_count',Schema::TYPE_BIGINT.' NOT NULL DEFAULT 0');
        $this->addColumn('{{%social_profile}}','verified',Schema::TYPE_SMALLINT.' NOT NULL DEFAULT 0');
        $this->addColumn('{{%social_profile}}','location',Schema::TYPE_STRING.' NOT NULL');
        $this->addColumn('{{%social_profile}}','profile_location',Schema::TYPE_STRING.' NOT NULL');
        $this->addColumn('{{%social_profile}}','score',Schema::TYPE_BIGINT.' NOT NULL DEFAULT 0');
    }
}

Then, I built an indexing table called SocialFriend to track followers for specific accounts. If I decide to formalize this service publicly, I’ll need this. It links the User table with the user’s followers in the SocialProfile table.

<?php
use yii\db\Schema;
use yii\db\Migration;

class m161026_233916_create_social_friend_table extends Migration
{
    // Creates the index table linking a user to their followers' profiles
    public function up()
    {
        $tableOptions = null;
        if ($this->db->driverName === 'mysql') {
            $tableOptions = 'CHARACTER SET utf8 COLLATE utf8_unicode_ci ENGINE=InnoDB';
        }

        $this->createTable('{{%social_friend}}', [
            'id' => Schema::TYPE_PK,
            'user_id' => Schema::TYPE_BIGINT.' NOT NULL',
            'social_profile_id' => Schema::TYPE_BIGINT.' NOT NULL',
        ], $tableOptions);
    }
}

Next, the Twitter API returns followers in pages, 20 at a time by default. To request the next page, you have to track the cursors, essentially tags, that mark the next page to fetch.

Since you’re only allowed to make 15 requests for followers every 15 minutes, you have to store these cursors in the database. The table is called SocialCursor:

<?php
use yii\db\Schema;
use yii\db\Migration;

class m161027_001026_social_cursor_table extends Migration
{
    // Creates the table that remembers the next page cursor for each user
    public function up()
    {
        $tableOptions = null;
        if ($this->db->driverName === 'mysql') {
            $tableOptions = 'CHARACTER SET utf8 COLLATE utf8_unicode_ci ENGINE=InnoDB';
        }

        $this->createTable('{{%social_cursor}}', [
            'id' => Schema::TYPE_PK,
            'user_id' => Schema::TYPE_BIGINT.' NOT NULL',
            'next_cursor' => Schema::TYPE_STRING.' NOT NULL',
        ], $tableOptions);
    }
}

Eventually, I’ll build background cron tasks to manage all this, but for today’s prototype, I’m running these tasks by hand.
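To get a feel for how long a full download takes under these limits, here is a rough back-of-the-envelope sketch. The function name estimateWindows() is my own and not part of the tutorial code; the page size of 20 and the 15 requests per 15-minute window are the limits described above.

```php
<?php
// Hypothetical helper: estimates how many pages and rate-limit windows
// a full follower download needs. Not part of the tutorial's code base.
function estimateWindows(int $followers, int $pageSize = 20, int $requestsPerWindow = 15): array
{
    $pages = (int) ceil($followers / $pageSize);
    $windows = (int) ceil($pages / $requestsPerWindow);
    return ['pages' => $pages, 'windows' => $windows];
}

$estimate = estimateWindows(1900);
// 1,900 followers at 20 per page is 95 pages, which spans 7 windows.
echo $estimate['pages'] . ' pages over ' . $estimate['windows'] . " windows\n";
```

Seven windows means waiting out six 15-minute resets, so roughly an hour and a half for an account of this size. If the request asks for larger pages (followers/list accepts a count parameter of up to 200 per page, as far as I know), the same 1,900 followers would fit in 10 pages and a single window.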

Collecting the Followers and Their Account Data

Next, I created a method Twitter::getFollowers() to make the request. Here are the basics of the code:

public function getFollowers($user_id) {
  $sp = new SocialProfile();
  // Resume from the cursor saved by the previous run
  $next_cursor = SocialCursor::getCursor($user_id);
  ...
  while ($next_cursor>0) {
    $followers = $this->connection->get("followers/list",['cursor'=>$next_cursor]);
    // Any non-200 response (for example, 429 when rate limited) stops the run
    if ($this->connection->getLastHttpCode() != 200) {
      var_dump($this->connection);
      exit;
    }
    if (isset($followers->users)) {
      foreach ($followers->users as $u) {
        $n+=1;
        $users[]=$u;
        $sp->add($user_id,$u);
      }
      // Store the cursor for the next page so a later run can pick up here
      $next_cursor= $followers->next_cursor;
      SocialCursor::refreshCursor($user_id,$next_cursor);
      echo $next_cursor.'<br />';
      echo '======================================================<br />';
    } else {
      exit;
    }
  }
}

It gets the next_cursor and repeatedly asks for followers, $followers = $this->connection->get("followers/list",['cursor'=>$next_cursor]), until it hits rate limits.

The output looks something like this as it runs through each page of 20 results:

refresh cursor: 1489380833827620370
======================================================
refresh cursor: 1488086367811119559
======================================================
refresh cursor: 1486452899268510188
======================================================
refresh cursor: 1485593015909209633
======================================================
refresh cursor: 1485330282069552137
======================================================
refresh cursor: 1485256983607000799
======================================================
refresh cursor: 1484594012550322889
======================================================
refresh cursor: 1483359799854574028
======================================================
refresh cursor: 1481615590678791493
======================================================
refresh cursor: 1478424827838161031
======================================================
refresh cursor: 1477449626282716582
======================================================
refresh cursor: 1475751176809638917
======================================================
refresh cursor: 1473539961706830585
======================================================
refresh cursor: 1471375035531579849
======================================================
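To make the resume-after-rate-limit idea concrete, here is a minimal, self-contained sketch of the same loop with the API and the database stubbed out. Everything in it is an assumption for illustration: $fetchPage stands in for the followers/list request, the $store array stands in for the SocialCursor table, and the budget of 15 mirrors the requests allowed per window. It uses Twitter's convention that a cursor of -1 means the first page and 0 means there are no more pages.

```php
<?php
// Sketch only: $fetchPage stands in for the real followers/list request and
// $store for the SocialCursor table. Both are hypothetical stand-ins.
function downloadFollowers(callable $fetchPage, array &$store, int $userId, int $budget = 15): array
{
    // Start at the saved cursor, or at -1 (first page) if nothing is saved yet.
    $cursor = $store[$userId] ?? '-1';
    $collected = [];
    // A cursor of 0 means the last page has already been fetched.
    while ($cursor !== '0' && $budget > 0) {
        $page = $fetchPage($cursor);
        $budget--;
        foreach ($page['users'] as $user) {
            $collected[] = $user;
        }
        // Save the checkpoint after every page so the next run can resume here.
        $cursor = $page['next_cursor_str'];
        $store[$userId] = $cursor;
    }
    return $collected;
}

// Fake API: 40 pages with one follower each; cursors are simple page numbers.
$fakeFetch = function (string $cursor): array {
    $pageNo = $cursor === '-1' ? 1 : (int) $cursor;
    $next = $pageNo < 40 ? (string) ($pageNo + 1) : '0';
    return ['users' => ["follower_$pageNo"], 'next_cursor_str' => $next];
};

$store = [];
$first = downloadFollowers($fakeFetch, $store, 1);   // pages 1-15
$second = downloadFollowers($fakeFetch, $store, 1);  // pages 16-30
$third = downloadFollowers($fakeFetch, $store, 1);   // pages 31-40
echo count($first) . ', ' . count($second) . ', ' . count($third) . "\n";
```

Each call represents one rate-limit window. Because the cursor is written back after every page, a run that is cut short by a non-200 response loses at most the page that was in flight.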

The data is stored by the $sp->add($user_id,$u); calls. The SocialProfile::add() method is a different version of the fill() method from the Twixxr tutorial. It stores more data and manages the SocialFriend index:

public static function add($user_id,$profileObject=null) {
      $sp = SocialProfile::find()
        ->where(['social_id'=>$profileObject->id_str])
        ->one();
      if (!isset($profileObject->name) || empty($profileObject->name)) {
        $profileObject->name='Nameless';
      }
      if (!isset($profileObject->url) || empty($profileObject->url)) {
        $profileObject->url='';
      }
      if (!isset($profileObject->screen_name) || empty($profileObject->screen_name)) {
        $profileObject->screen_name='error_sn';
      }
      if (!isset($profileObject->description) || empty($profileObject->description)) {
        $profileObject->description='(empty)';
      }
      if (!isset($profileObject->profile_location) || empty($profileObject->profile_location)) {
        $profileObject->profile_location='';
      }
      if (!isset($profileObject->profile_image_url_https) || empty($profileObject->profile_image_url_https)) {
        $profileObject->profile_image_url_https='';
      }
      if (!is_null($sp)) {
        $sp->social_id = $profileObject->id_str; // same string ID used in the lookup above
        $sp->image_url = $profileObject->profile_image_url_https;
        $sp->follower_count= $profileObject->followers_count;
        $sp->status_count = $profileObject->statuses_count;
        $sp->friend_count = $profileObject->friends_count;
        $sp->listed_in = $profileObject->listed_count;
        $sp->url=$profileObject->url;
        if ($profileObject->protected) {
            $sp->protected=1;
        } else {
          $sp->protected=0;
        }
        if ($profileObject->verified) {
            $sp->verified=1;
        } else {
          $sp->verified=0;
        }
        $sp->favourites_count=$profileObject->favourites_count;
        $sp->location=$profileObject->location;
        $sp->profile_location=$profileObject->profile_location;
        $sp->name = $profileObject->name;
        $sp->description = $profileObject->description;
        if ($sp->validate()) {
            $sp->update();
        } else {
          var_dump($sp->getErrors());
        }
      } else {
        $sp = new SocialProfile();
        $sp->social_id = $profileObject->id_str; // same string ID used in the lookup above
        $sp->score = 0;
        $sp->header_url='';
        $sp->url=$profileObject->url;
        $sp->favourites_count=$profileObject->favourites_count;
        if ($profileObject->protected) {
            $sp->protected=1;
        } else {
          $sp->protected=0;
        }
        if ($profileObject->verified) {
            $sp->verified=1;
        } else {
          $sp->verified=0;
        }
        $sp->location=$profileObject->location;
        $sp->profile_location=$profileObject->profile_location;
        $sp->name = $profileObject->name;
        $sp->description = $profileObject->description;
        $sp->screen_name = $profileObject->screen_name;
        $sp->image_url = $profileObject->profile_image_url_https;
        $sp->follower_count= $profileObject->followers_count;
        $sp->status_count = $profileObject->statuses_count;
        $sp->friend_count = $profileObject->friends_count;
        $sp->listed_in = $profileObject->listed_count;
        if ($sp->validate()) {
            $sp->save();
        } else {
          var_dump($sp->getErrors());
        }
      }
      $sf = SocialFriend::find()
        ->where(['social_profile_id'=>$sp->id])
        ->andWhere(['user_id'=>$user_id])
        ->one();
      if (is_null($sf)) {
          $sf = new SocialFriend();
          $sf->user_id = $user_id;
          $sf->social_profile_id = $sp->id;
          $sf->save();
      }
      return $sp->id;
    }

It’s written to save new records or update old records so that in the future you could track your follower data and update it regularly, overwriting old data.

This last section at the end makes sure there is a SocialFriend index between the User table and the SocialProfile table.

$sf = SocialFriend::find()
        ->where(['social_profile_id'=>$sp->id])
        ->andWhere(['user_id'=>$user_id])
        ->one();
      if (is_null($sf)) {
          $sf = new SocialFriend();
          $sf->user_id = $user_id;
          $sf->social_profile_id = $sp->id;
          $sf->save();
      }
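The repeated isset()/empty() checks at the top of add() could also be folded into one table of defaults. Here is a minimal sketch, assuming the same fallback values as in the method above; applyProfileDefaults() is a hypothetical name and not part of the tutorial's code.

```php
<?php
// Hypothetical helper, not in the tutorial's code: fills in fallback values
// for profile fields that are missing or empty.
function applyProfileDefaults(stdClass $profile): stdClass
{
    $defaults = [
        'name' => 'Nameless',
        'url' => '',
        'screen_name' => 'error_sn',
        'description' => '(empty)',
        'profile_location' => '',
        'profile_image_url_https' => '',
    ];
    foreach ($defaults as $field => $fallback) {
        // empty() is also true for unset properties, so one check covers both cases.
        if (empty($profile->$field)) {
            $profile->$field = $fallback;
        }
    }
    return $profile;
}

// Made-up sample profile with an empty name and no description.
$raw = (object) ['id_str' => '12345', 'name' => '', 'screen_name' => 'example_user'];
$clean = applyProfileDefaults($raw);
echo $clean->name . ' / ' . $clean->screen_name . ' / ' . $clean->description . "\n";
```

Keeping the fallbacks in one array makes it easier to see which fields are guaranteed to be non-null before they reach the NOT NULL columns.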

Scoring Twitter Followers

[Image: example account with near-equal follower and following counts]

I had a handful of goals for my Twitter scoring:

  • Eliminate accounts that follow everyone that follows them. For example, they have 12,548 followers and follow 12,392 people (see above).
  • Eliminate accounts following more than say 1,500 accounts who are unlikely to ever see what I share. For example, Dan Savage follows 1,536 people.
  • Eliminate accounts that have very few posts or very few accounts they follow, likely abandoned accounts.
  • Eliminate accounts with few favorites—these are likely bots, not really using the app.

Similarly, I wanted to highlight some positive aspects:

  • Accounts that are verified
  • Accounts that have lots of followers
  • Accounts that follow fewer than 1,000 people, a sweet spot to me

Here’s some rough basic code from SocialProfile::score() that highlights some of the positives:

foreach ($all as $sp) {
        // score sp
        $score =0;
        // RULE IN
        if ($sp->verified==1) {
          $score+=1000;
        }
        // POSITIVE
        if ($sp->protected==1) {
          $score+=500;
        }
        if ($sp->follower_count > 10000) {
          $score+=500;
        } else if ($sp->follower_count > 3500) {
          $score+=750;
        } else if ($sp->follower_count > 1100) {
          $score+=1000;
        }  else if ($sp->follower_count > 1000) {
          $score+=250;
        } else if ($sp->follower_count> 500) {
          $score+=250;
        }

Here’s some code that eliminates some of the bad accounts:

        // RULE OUT
        // make this a percentage of magnitude
        $magnitude = $sp->follower_count/1000;
        if ($sp->follower_count > 1000 && abs($sp->follower_count-$sp->friend_count) < $magnitude) {
          $score-=2500;
        }
        if ($sp->friend_count > 7500) {
          $score-=10000;
        } else if ($sp->friend_count > 5000) {
          $score-=5000;
        } else if ($sp->friend_count > 2500) {
          $score-=2500;
        } else if ($sp->friend_count > 2000) {
          $score-=2000;
        } else if ($sp->friend_count > 1000) {
          $score-=250;
        } else if ($sp->friend_count > 750) {
          $score-=100;
        }
        if ($sp->follower_count < 100) {
          $score-=1000;
        }
        if ($sp->status_count < 35) {
          $score-=5000;
        }

Obviously, there’s a lot to play with here and a variety of ways to improve this. I hope I get a chance to spend more time on this.

As the method runs, it looks like this, but updates the SocialProfile table with scores as it goes:

DJMany -6300
gai_ltau -7850
Michal92B -900
InvestmentAdvsr -2900
TSSStweets -7500
sandcageapp -1750
dominicpouzin 1950
daletdykaaolch1 -7850
suzamack -8250
writingthrulife -7500
ryvr -1550
RichardAngwin -8300
DanielleMorrill -7300
ReversaCreates 2750
BoKnowsMarkting -7500
TheHMProA -8500
HouseMgmt101 750
itsmeKennethG -1250
drbobbiwegner -8500
Mizzfit_Bianca -7300
wilsonmar 700
CoachVibeke -7300
jhurwitz 0
PiedPiperComms 500
Prana2thePeople -1100
singlemomspower -2250
mouselink -7300
MotivatedGenY -7300
brett7three -7300
JovanWalker 2950
ITSPmagazine 450
RL_Miller -2250
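One thing worth checking in the RULE OUT code is how the follow-back rule behaves on the example from the goals list (12,548 followers, 12,392 following). With the threshold set to one thousandth of the follower count, the allowed gap is about 12.5, so a gap of 156 is not flagged. The sketch below compares that rule with a looser relative one. The 5% tolerance and both function names are my assumptions for illustration, not values from the scoring code.

```php
<?php
// Mirrors the rule in the scoring code: the gap between followers and
// following must be smaller than follower_count / 1000.
function looksLikeFollowBack(int $followers, int $friends): bool
{
    $magnitude = $followers / 1000;
    return $followers > 1000 && abs($followers - $friends) < $magnitude;
}

// Hypothetical variant: the gap must be within a relative tolerance (default 5%).
function looksLikeFollowBackRelative(int $followers, int $friends, float $tolerance = 0.05): bool
{
    return $followers > 1000 && abs($followers - $friends) < $followers * $tolerance;
}

var_dump(looksLikeFollowBack(12548, 12392));          // gap 156 vs. limit 12.548
var_dump(looksLikeFollowBackRelative(12548, 12392));  // gap 156 vs. limit 627.4
```

The first call prints false and the second prints true, so a wider tolerance is one possible tuning knob if accounts like that example should be penalized.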

Displaying the Scoreboard

Yii’s default grid makes it pretty easy to display the SocialProfile table and customize the scoreboard columns.

Here’s SocialProfileController::actionIndex():

/**
     * Lists all SocialProfile models.
     * @return mixed
     */
    public function actionIndex()
    {
        $searchModel = new SocialProfileSearch();
        $dataProvider = $searchModel->search(Yii::$app->request->queryParams);

        return $this->render('index', [
            'searchModel' => $searchModel,
            'dataProvider' => $dataProvider,
        ]);
    }

And here’s the grid view customized:

<?php
use yii\helpers\Html;
use yii\grid\GridView;
use yii\widgets\Pjax;
/* @var $this yii\web\View */
/* @var $searchModel frontend\models\SocialProfileSearch */
/* @var $dataProvider yii\data\ActiveDataProvider */
$this->title = Yii::t('frontend', 'Social Profiles');
$this->params['breadcrumbs'][] = $this->title;
?>
<div class="social-profile-index">
    <h1><?= Html::encode($this->title) ?></h1>
    <?php // echo $this->render('_search', ['model' => $searchModel]); ?>
<?php Pjax::begin(); ?>    <?= GridView::widget([
        'dataProvider' => $dataProvider,
        'filterModel' => $searchModel,
        'columns' => [
            ['class' => 'yii\grid\SerialColumn'],
            [
            'label'=>'Name',
              'format' => 'raw',
              'value' => function ($model) {
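                      // Encode values that come from Twitter before printing them as raw HTML
                      return '<div><span><strong><a href="https://twitter.com/'.Html::encode($model->screen_name).'">'.Html::encode($model->name).'</a></strong><br />'.Html::encode($model->screen_name).'</span></div>';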
                  },
          ],
            'score',
            [
            'label'=>'Follows',
              'format' => 'raw',
              'attribute'=>'friend_count',
            ],
            [
            'label'=>'Followers',
              'format' => 'raw',
              'attribute'=>'follower_count',
            ],
            [
            'label'=>'Tweets',
              'format' => 'raw',
              'attribute'=>'status_count',
            ],
            [
            'label'=>'Favs',
              'format' => 'raw',
              'attribute'=>'favourites_count',
            ],
            [
            'label'=>'Listed',
              'format' => 'raw',
              'attribute'=>'listed_in',
            ],
            [
            'label'=>'P',
              'format' => 'raw',
              'attribute'=>'protected',
            ],
          [
          'label'=>'V',
            'format' => 'raw',
            'attribute'=>'verified',
        ],

            // 'location',
            // 'profile_location',
            [
            //'contentOptions' => ['class' => 'col-lg-11 col-xs-10'],
            'label'=>'Pic',
              'format' => 'raw',
              'value' => function ($model) {
                      return '<div><span><img src="'.$model->image_url.'"></span></div>';
                  },
          ],
        ],
    ]); ?>
<?php Pjax::end(); ?></div>

Here’s what the top scores look like with my initial algorithm:

[Image: top scores on the scoreboard]

There are so many ways to improve and tune the scoring. I look forward to playing with it more.

And there’s more I’d like to write code for and expand my use of the API, for example:

  • Use PHP’s Gender extension to help separate companies from people (companies don’t interact much).
  • Look up the frequency of posts that people have made and the last time they used Twitter.
  • Use Twitter’s search API to see which followers have actually ever interacted with my content.
  • Provide feedback to the scoring to tune it.
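As a starting point for the posting-frequency idea, the user objects returned by the API carry a created_at timestamp and a statuses_count, which is enough for a rough tweets-per-day figure. Here is a minimal sketch; tweetsPerDay() is a hypothetical helper and the sample values are made up.

```php
<?php
// Hypothetical helper: average tweets per day over the life of the account.
// $createdAt uses Twitter's date format, e.g. "Mon Nov 29 21:18:15 +0000 2010".
function tweetsPerDay(string $createdAt, int $statusCount, DateTimeImmutable $now): float
{
    $created = DateTimeImmutable::createFromFormat('D M d H:i:s O Y', $createdAt);
    if ($created === false) {
        throw new InvalidArgumentException("Unrecognized date: $createdAt");
    }
    // Count at least one day so brand-new accounts don't divide by zero.
    $days = max(1, $created->diff($now)->days);
    return $statusCount / $days;
}

// Made-up sample: an account created exactly 100 days before $now with 250 tweets.
$now = new DateTimeImmutable('2016-11-01 00:00:00 +0000');
echo tweetsPerDay('Sun Jul 24 00:00:00 +0000 2016', 250, $now) . "\n";
```

A lifetime average hides accounts that were busy years ago and have since gone quiet, so this would probably need to be combined with the date of the follower's most recent tweet before it feeds into the score.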

Looking Ahead

I hope you find the scoring approach intriguing. There’s so much more that can be done to improve this. Please feel free to play with it and post your ideas below.

If you have any questions or suggestions, please post them in the comments. If you’d like to keep up on my future ThemeKeeper Tuts+ tutorials and other series, please visit my instructor page or follow @reifman. Definitely check out my startup series and Meeting Planner.
