Requests Input
Last updated 1 year ago.

Can you give a few more examples? It would help to understand the URL pattern if you showed a page from the actual site you are trying to scrape.


Sure. Here is one: http://www.nfl.com/player/chrisjohnson/262/profile, where the variable part would be 262, and another: http://www.nfl.com/player/russellwilson/2532975/profile. So the variable part always sits between the player name and "profile". Ideally, I'd like a base URL of http://www.nfl.com/player with a request for { $playername }/\S*/profile. I just found something in the Guzzle docs, but I don't know the syntax for it: http://guzzle.readthedocs.org/en/latest/http-client/uri-templa...
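
For reference, the pattern described above (name slug, then numeric ID, then "profile") can be matched with a regular expression. Note that \S* would also match slashes, so a single path segment is safer written as [^/]+. This is a sketch: `parseProfileUrl` is a hypothetical helper, and it assumes every profile URL has exactly this shape.

```php
<?php
// Hypothetical helper: split a profile URL of the form
//   http://www.nfl.com/player/{name}/{id}/profile
// into its name slug and numeric ID; returns null for any other URL.
function parseProfileUrl(string $url): ?array
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return null;
    }
    // [^/]+ matches exactly one path segment (the slug); \d+ matches the ID.
    if (preg_match('~^/player/([^/]+)/(\d+)/profile/?$~', $path, $m) !== 1) {
        return null;
    }
    return ['name' => $m[1], 'id' => (int) $m[2]];
}
```

For example, parseProfileUrl('http://www.nfl.com/player/russellwilson/2532975/profile') gives ['name' => 'russellwilson', 'id' => 2532975].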


OK.

Consider the URL: http://www.nfl.com/player/chrisjohnson/262/profile

When the server receives this URL, it looks the player up in its database by ID, not by name. So chrisjohnson is not what identifies the player on the NFL site; 262 is.

Try this:

http://www.nfl.com/player/idontcare/262/profile and you will still see Chris Johnson's profile.

So, to scrape profiles, you will have to use player IDs.

The docs page you mentioned is only about a helper for building a URL from variable parameters. For example, to get the profile URL you would do:

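$player = new Guzzle\Http\Client('http://www.nfl.com/player/xyz/{id}/', array(
    'id' => $player_id, // fills the {id} placeholder in the base URL
));

$profile = $player->get('profile')->send(); // get() only builds the request; send() performs it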

The player name in the URL is only there for SEO and to help users.
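
To show what the {id} placeholder amounts to, here is a minimal sketch of simple (RFC 6570 level 1) template expansion in plain PHP. `expandSimple` is a hypothetical helper, not Guzzle's code; Guzzle's own expander handles the fuller template syntax, including forms such as {?data*}.

```php
<?php
// Hypothetical helper, RFC 6570 "level 1" only: each {name} is replaced by the
// percent-encoded value of $vars['name']; an undefined variable expands to ''.
function expandSimple(string $template, array $vars): string
{
    return preg_replace_callback(
        '/\{([A-Za-z0-9_]+)\}/',
        function (array $m) use ($vars): string {
            $name = $m[1];
            return array_key_exists($name, $vars)
                ? rawurlencode((string) $vars[$name]) // encode everything but unreserved chars
                : '';
        },
        $template
    );
}

// The "xyz" slug is arbitrary; only the ID selects the player.
$url = expandSimple('http://www.nfl.com/player/xyz/{id}/profile', ['id' => 262]);
// $url is 'http://www.nfl.com/player/xyz/262/profile'
```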


Awesome, that makes a lot more sense now. That leads me to one more question: how should I go about automating the collection of the player IDs I will now need? Thanks for your time on this, by the way; things are making more sense.


There are many ways to go about this. I will list two of them.

Option 1: You know the player's name (the actual name, NOT a username)

You can use the NFL site's search:

/search?category=name&filter={player_name}&playerType=current
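
A plain-PHP sketch of the URL that search produces, using the built-in http_build_query(). The page location http://www.nfl.com/players/search is an assumption based on the players/ base URL used in this thread, and `buildSearchUrl` is a hypothetical helper.

```php
<?php
// Hypothetical helper: build the player-name search URL from the
// category/filter/playerType parameters shown above.
function buildSearchUrl(string $playerName): string
{
    $query = http_build_query([
        'category'   => 'name',
        'filter'     => $playerName,
        'playerType' => 'current',
    ], '', '&', PHP_QUERY_RFC3986); // RFC 3986 mode encodes spaces as %20
    return 'http://www.nfl.com/players/search?' . $query;
}
```

For example, buildSearchUrl('Chris Johnson') gives http://www.nfl.com/players/search?category=name&filter=Chris%20Johnson&playerType=current.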

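$client = new Guzzle\Http\Client();

// Absolute URLs: the search lives under /players/, profiles under /player/
$results = $client->get(['http://www.nfl.com/players/search{?data*}', [
    'data' => [
        'category'   => 'name',
        'playerType' => 'current',
        'filter'     => $player_name,
    ],
]])->send();

// $player_id = pick the player's profile link out of $results and parse the ID from it

$profile = $client->get(['http://www.nfl.com/player/idontcare/{id}/profile', [
    'id' => $player_id, // template variables go in their own array
]])->send();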
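
The "parse for id" step could look like the sketch below: load the results HTML with PHP's DOM extension, collect every link whose path matches /player/{slug}/{id}/profile, and keep the numeric IDs. The link shape comes from the profile URLs earlier in this thread; the actual results markup is not known here, and `extractPlayerIds` is a hypothetical helper.

```php
<?php
// Hypothetical helper: collect player IDs from every profile link in a page.
// Assumes profile links look like /player/{slug}/{id}/profile (absolute or relative).
function extractPlayerIds(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; silence libxml warnings while parsing.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $ids = [];
    foreach ((new DOMXPath($doc))->query('//a[@href]') as $link) {
        $path = parse_url($link->getAttribute('href'), PHP_URL_PATH);
        if (is_string($path) && preg_match('~/player/[^/]+/(\d+)/profile/?$~', $path, $m) === 1) {
            $ids[] = (int) $m[1];
        }
    }
    // The same player may be linked more than once; keep each ID once, in order.
    return array_values(array_unique($ids));
}
```

The same function would apply to a team roster page, as long as it links to profiles in the same way.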

Option 2: Get the IDs of every player on a team in one go

Scrape the team roster page, e.g. /search?category=team&filter=3430&playerType=current


Hmm, I tried the example above, but it sent me to the results page showing "Displaying 1 - 25 of 275089". So I tried a different query that better suits my needs (filtering by position), but got the same issue. Here is what I had:

$position = 'quarterback';
$player = new Guzzle\Http\Client('http://www.nfl.com/players/');

$results = $player->get(['/search{?data*}', [
    'category'       => 'position',
    'playerType'     => 'current',
    'conference'     => 'ALL',
    'd-447263-p'     => '1',
    'filter'         => $position,
    'conferenceAbbr' => 'null'
]]);

return $results->send();

Any chance I could borrow you on IRC for a minute or so? (My IRC name is the same as here.)

EDIT: Still looking into this. From the docs and articles, it seems like the getQuery() method is the way to do this, but when I use:

$player->getQuery()
    ->set('category', 'position')
    ->set('playerType', 'current')
    ->set('conference', 'ALL')
    ->set('d-447263-p', 1)
    ->set('filter', $position)
    ->set('conferenceAbbr', 'null');

I get "Call to undefined method Guzzle\Http\Client::getQuery()"


Okay, it took a while, but I've got the query syntax working in Guzzle now:

$position = 'quarterback';
$client = new Guzzle\Http\Client('http://www.nfl.com/players/');
$request = $client->get(); // no path: request the base URL itself

// The query string belongs to the request, not the client
$q = $request->getQuery();

$q->set('category', 'position');
$q->set('playerType', 'current');
$q->set('conference', 'ALL');
$q->set('d-447263-p', 1); // results page number
$q->set('filter', $position);
$q->set('conferenceAbbr', 'null');

return $request->send();

Now I just need to figure out how to vary 'd-447263-p' (currently 1) to cover however many pages there are. Maybe a loop?
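
A loop is the usual answer. A sketch, assuming pages are requested by setting 'd-447263-p' to 1, 2, 3 and so on, and that a page past the end returns no player rows; if the site instead repeats the last page, following the "Next" link (as suggested further down) is more robust. `collectAllPages` and the `$fetchPage` callable are hypothetical, standing in for the Guzzle request above plus an ID extractor.

```php
<?php
// Hypothetical helper: page through numbered results until a page comes back empty.
// $fetchPage is a hypothetical callable (int $page): array that would send the
// request with 'd-447263-p' set to $page and return the player IDs on that page.
function collectAllPages(callable $fetchPage, int $maxPages = 500): array
{
    $all = [];
    for ($page = 1; $page <= $maxPages; $page++) { // $maxPages guards against endless loops
        $ids = $fetchPage($page);
        if ($ids === []) {
            break; // an empty page is taken to mean we are past the last one
        }
        foreach ($ids as $id) {
            $all[] = $id;
        }
    }
    return $all;
}
```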


Oh! I just saw I made a mistake.

Because the template references {?data*}, the query parameters have to be wrapped in an array under the data key.
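
To see why the key matters: under RFC 6570, {?data*} expands the variable named data. When data is an associative array, the * (explode) modifier turns each key/value pair into its own query parameter; when no variable named data is supplied, the whole expression expands to nothing, which fits the unfiltered results seen earlier. The sketch below handles only this one form of expression; `expandQueryExplode` is a hypothetical helper, not Guzzle's implementation.

```php
<?php
// Hypothetical helper, not Guzzle's implementation: expands only "{?name*}"
// (RFC 6570 form-style query with the explode modifier) for string or
// associative-array values. List values are outside this sketch.
function expandQueryExplode(string $template, array $vars): string
{
    return preg_replace_callback(
        '/\{\?([A-Za-z0-9_]+)\*\}/',
        function (array $m) use ($vars): string {
            $name = $m[1];
            $value = $vars[$name] ?? null;
            if ($value === null || $value === []) {
                return ''; // undefined or empty variable: the expression vanishes
            }
            if (!is_array($value)) {
                return '?' . $name . '=' . rawurlencode((string) $value);
            }
            $pairs = [];
            foreach ($value as $key => $item) { // explode: one pair per array entry
                $pairs[] = rawurlencode((string) $key) . '=' . rawurlencode((string) $item);
            }
            return '?' . implode('&', $pairs);
        },
        $template
    );
}
```

With the flat array from the earlier attempt, expandQueryExplode('/search{?data*}', ['category' => 'position', 'filter' => 'quarterback']) returns just '/search'; wrapping the same array as ['data' => [...]] returns '/search?category=position&filter=quarterback'.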

I have corrected the earlier answer.

Great that you figured out the solution. I'd like to add a bit:

Goutte's crawler can select a link by its text (selectLink()). So just look for the "Next" link in the footer of the search results and keep running the scraper on each next page until the "Next" link disappears!
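
The same idea can be sketched without Goutte, using PHP's DOM extension: find the link whose text is "Next", follow its href, and stop when there is none. `crawlNextLinks` and the `$fetch` callable (standing in for an HTTP request) are hypothetical; the link text "Next" comes from the suggestion above, and the real footer markup may differ.

```php
<?php
// Hypothetical helper: follow "Next" links from $startUrl until none remains.
// $fetch is a hypothetical callable (string $url): string returning page HTML.
// Returns the visited URLs; a real scraper would also parse each page.
function crawlNextLinks(string $startUrl, callable $fetch, int $maxPages = 500): array
{
    $visited = [];
    $url = $startUrl;
    // Stop on no next link, on the page cap, or if a URL repeats (a cycle).
    while ($url !== null && count($visited) < $maxPages && !in_array($url, $visited, true)) {
        $visited[] = $url;
        $html = $fetch($url);
        if ($html === '') {
            break; // DOMDocument::loadHTML() rejects empty input
        }
        $doc = new DOMDocument();
        $previous = libxml_use_internal_errors(true);
        $doc->loadHTML($html);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        // First <a href> whose whitespace-trimmed text is exactly "Next".
        $next = (new DOMXPath($doc))->query('//a[normalize-space(.)="Next"][@href]')->item(0);
        // Hrefs are used as-is; relative links would need resolving against $url.
        $url = $next instanceof DOMElement ? $next->getAttribute('href') : null;
    }
    return $visited;
}
```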



© 2024 Laravel.io - All rights reserved.