How to get all tweets for a hashtag with Tweepy?

shuetisha.dev

I'm trying to collect every public tweet for a hashtag, but my code never gets past 299 tweets.

I'm also trying to fetch tweets from a specific time range, e.g. only tweets between May 2015 and July 2016. Is there a way to do this in the main loop, or should I write separate code for it?

Here is my code:

# if this is the first time, creates a new array which
# will store max id of the tweets for each keyword
if not os.path.isfile("max_ids.npy"):
    max_ids = np.empty(len(keywords))
    # every value is initialized as -1 in order to start from the beginning the first time program run
    max_ids.fill(-1)
else:
    max_ids = np.load("max_ids.npy")  # loads the previous max ids

# if there is any new keywords added, extends the max_ids array in order to correspond every keyword
if len(keywords) > len(max_ids):
    new_indexes = np.empty(len(keywords) - len(max_ids))
    new_indexes.fill(-1)
    max_ids = np.append(arr=max_ids, values=new_indexes)

count = 0
for i in range(len(keywords)):
    since_date="2015-01-01"
    sinceId = None
    tweetCount = 0
    maxTweets = 5000000000000000000000  # maximum tweets to find per keyword
    tweetsPerQry = 100
    searchQuery = "#{0}".format(keywords[i])
    while tweetCount < maxTweets:
        if max_ids[i] < 0:
            if not sinceId:
                new_tweets = api.search(q=searchQuery, count=tweetsPerQry)
            else:
                new_tweets = api.search(q=searchQuery, count=tweetsPerQry,
                                        since_id=sinceId)
        else:
            # continue from the stored max id for THIS keyword,
            # not the whole max_ids array
            if not sinceId:
                new_tweets = api.search(q=searchQuery, count=tweetsPerQry,
                                        max_id=str(int(max_ids[i]) - 1))
            else:
                new_tweets = api.search(q=searchQuery, count=tweetsPerQry,
                                        max_id=str(int(max_ids[i]) - 1),
                                        since_id=sinceId)
        if not new_tweets:
            print("Keyword: {0}      No more tweets found".format(searchQuery))
            break
        for tweet in new_tweets:
            count += 1
            print(count)

            file_write.write(
                       .
                       .
                       .
                         )

            item = {
                .
                .
                .
                .
                .
            }

            # instead of using mongo's id for _id, using tweet's id
            raw_data = tweet._json
            raw_data["_id"] = tweet.id
            raw_data.pop("id", None)

            try:
                db["Tweets"].insert_one(item)
            except pymongo.errors.DuplicateKeyError as e:
                print("Already exists in 'Tweets' collection.")
            try:
                db["RawTweets"].insert_one(raw_data)
            except pymongo.errors.DuplicateKeyError as e:
                print("Already exists in 'RawTweets' collection.")

        tweetCount += len(new_tweets)
        print("Downloaded {0} tweets".format(tweetCount))
        max_ids[i] = new_tweets[-1].id

np.save(arr=max_ids, file="max_ids.npy")  # saving in order to continue mining from where left next time program run
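One likely cause of the 299-tweet ceiling is the `max_id` walk-back: the original code passed `str(max_ids - 1)` (the whole NumPy array) instead of the current keyword's id, so paging backwards silently breaks. The paging logic can be isolated and checked against a stub, with no network or credentials needed. `paginate_by_max_id` and `fake_search` below are illustrative names, not part of Tweepy; this is a minimal sketch of the max_id-cursoring idea, assuming newest-first results as Twitter's search returns them.

```python
def paginate_by_max_id(search, per_page=100):
    """Yield pages of tweets, walking backwards with max_id
    until the search returns an empty page."""
    max_id = None
    while True:
        if max_id is None:
            page = search(count=per_page)
        else:
            page = search(count=per_page, max_id=max_id)
        if not page:
            return
        yield page
        # next request asks only for tweets strictly older than the last seen
        max_id = page[-1]["id"] - 1

# stub standing in for api.search: tweets with ids 10..1, newest first
def fake_search(count, max_id=None):
    ids = [i for i in range(10, 0, -1) if max_id is None or i <= max_id]
    return [{"id": i} for i in ids[:count]]

pages = list(paginate_by_max_id(fake_search, per_page=4))
collected = [t["id"] for page in pages for t in page]
```

With the stub, `collected` walks the full range 10 down to 1 across three pages; the real loop would stop the same way when `api.search` returns an empty list.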
Attila Kis

Sorry, I can't answer in comment, too long. :)

Sure :) Check this example: an Advanced Search for the #data keyword from May 2015 to July 2016 gives this URL: https://twitter.com/search?l=&q=%23data%20since%3A2015-05-01%20until%3A2016-07-31&src=typd

import requests

session = requests.Session()
keyword = 'data'
date1 = '2015-05-01'
date2 = '2016-07-31'
url = ('https://twitter.com/search?l=&q=%23{0}%20since%3A{1}%20until%3A{2}&src=typd'
       .format(keyword, date1, date2))
response = session.get(url, stream=True)

Now we have the requested tweets. You will probably run into problems with pagination; the pagination URL looks like this:

https://twitter.com/i/search/timeline?vertical=news&q=%23data%20since%3A2015-05-01%20until%3A2016-07-31&src=typd&include_available_features=1&include_entities=1&max_position=TWEET-759522481271078912-759538448860581892-BD1UO2FFu9QAAAAAAAAETAAAAAcAAAASAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA&reset_error_state=false

You could probably put in a random tweet id, or parse the first page, or request some data from Twitter first. It can be done.

Use Chrome's Network tab to find all the request details :)

