Hello, I'm doing a website like this one and I would need some help. I'm going to do it simpler at first so it's easier to program. The format will have legacy cards under 0.3 as legal. So the maximum deck price would be around 10.
The first thing that has to be done is a database saving all the legal cards. For that I have done a python script that loops through the gatherer pages. Once I have the cardName I go to a different website and scrape it again for the price. If the card already exists the price is updated. This script should be run at the beginning of each month to update the legal cards for the format. That can be done with a cronjob in the server.
Next I've thought about doing the web in php and queriying the database for single cards, multiple cards in a copy paste way and in a file browsing way for .txt files. I haven't got into this yet.
Any help is appreciated. I would like suggestions as to how accelerate the updating of the prices as right now it would take around 8 hours to update the 17k cards that are in legacy. This only has to be done once a month but still I think it could be improved.
Depending on where you scrape the prices... really a direct API will probably be faster...
Add a field to indicate "Price Age", update all your prices ONCE to set the price age for each card. As users access their deck lists, check the price age for each card. If the price age is < some set time, don't bother updating the price. If the price age > some set time, then update only those cards.
This means you'll update the cards on an as-needed basis. The obvious trade off is that, user experience will be impacted in a worse use case scenario where every card in a deck List needs a price update.
Edit:
I should point out this technique requires a lot of care to avoid race and collision problems. Especially as you scale with more users and/or decks. Once you hit that wall, I would switch to a "popular" card update based on usage statistics of the decks. Popular cards would be auto-updated monthly, less popular cards will be updated as needed. This obviously assumes you're still able to scrape prices. Scraping is OK for one shots but consistent scraping will draw the attention of admins. You really need an API to do it properly.
---
Your price point of .4 is very low. You can statistically ignore cards over a certain dollar level. In essence, there is no point in updating or checking for prices on say... Power 9. Extreme example I know, but you get the idea. Setting a cut off of say... anything over $40 already rotated out of Standard can be discarded or ignored.
It's hard to do a detailed analyses without looking at source code.
Edit:
Legacy isn't a widely changing format outside of new cards introduced with each printing and the ocassional ban. There's not a hugely pressing issue to scan Gatherer once a month for cards going in or out when there's already a known schedule. This can alleviate the pressure for scanning for new cards.
The entire issue can be sidestepped entirely by utilizing freely available data packages that have done all of the grunt work. I use one such package to regularly reload my own database and the process is fast. Even if I had to parse raw text, the entire process is measured in seconds.
Sorry I didn't explained the format correctly. The Heirloom format has legal cards based on card price. All cards that are common and $0.05 or cheaper are legal, uncommon $0.10, Rare $0.20, Mythic $1.00 As long as one version is cheap enough that makes it legal (Promo and premade deck cards are evaluated based on their original highest rarity from a legal set). Legality of cards shifts at the beginning of each month to last for one month.
With those prices average deck price would be around $5. So having a single price limit for all cards as am doing it right now, card price limit would be $5/37 ~= $0.14. But I've considered an average deck price of $15 which gives a price limit of $15/37 ~= $0.4. I would like opinions about which should be the price limit. And if it's better to have one for each rarity or a single one for all of them?
Can you give me some reference to look for an extensive mysql query system for php (some open source web with advanced search maybe). I want to do something like magiccards.info where you can select color, card type, subtype etc.
I would need help selecting an API which allows trend price checking.
Sorry I didn't explained the format correctly. The Heirloom format has legal cards based on card price. All cards that are common and $0.05 or cheaper are legal, uncommon $0.10, Rare $0.20, Mythic $1.00 As long as one version is cheap enough that makes it legal (Promo and premade deck cards are evaluated based on their original highest rarity from a legal set). Legality of cards shifts at the beginning of each month to last for one month.
I didn't misunderstand. I just didn't feel like looking up a relevant card price as a better example
With those prices average deck price would be around $5. So having a single price limit for all cards as am doing it right now, card price limit would be $5/37 ~= $0.14. But I've considered an average deck price of $15 which gives a price limit of $15/37 ~= $0.4. I would like opinions about which should be the price limit. And if it's better to have one for each rarity or a single one for all of them?
None of them. I get the format limitations, I don't necessarily agree with it as a viable format. I feel it creates a sort of pseudo Standard format that has the potential for an even higher rotation than Standard. That constant rotation is one of my biggest dislikes for Standard.
I don't feel like analyzing any of the data right now. But let's hypothetically say there is a striking card in an older-limited run set that is highly unlikely to see a reprint, like something from Ice Age. Let's say this card, given the very limited nature of this new format, proves to be an all star. What will happen to the price of the card as the format grows? It might go up. Might even go high enough to be banned by virtue of its price point. The card is banned and demand drops. Then it's unbanned and the price rises again.
Worse, some cards are so sensitive to buy outs that all someone has to do to get rid of the card from this format is to get those clueless ******* speculators to buy a bunch and push the floor up long enough to initiate an auto ban. Then the floor could potentially be set at the higher price and boom, the card is gone. If the floor isn't set, then it'll drop and the whole process starts again.
I imagine this monthly yoyoing of the price and bans might ultimately up prove disastrous to the long term health of a format so closely tied to card cost. A deck might initially cost only $15 to build, but if you have to keep buying new cards because of the price fluctuations, what are the realized price savings?
Not to mention what prices do you look at? TCGPlayer? eBay? Troll & Toad? U.S. prices? Europe?
Like I said, I get where this format is coming from, I just don't necessarily agree with it.
Please I understand, I'm not against the format per se either. I'm against Standard for many reasons. I'm rarely against fan created formats.
Can you give me some references to look for an extensive mysql query system for php (some open source web with advanced search maybe). I want to do something like magiccards.info where you can select color, card type, subtype etc.
I never really applied any of the open source packages to something like an MtG database.
I built my own SQL DB using MariaDB. (I refuse to build new projects using Oracle or MySQL due to Oracle's ongoing abuse of the market.) Then I wrote my own code to pull down updates from mtgjson.com. I'm indecisive on whether the SQL should be manually or auto stuffed. The code seems resilient enough to run on its own. I have a quirky statistic I like to track so the Perl does a lot of heavy calculations and JSON is a different structure than a typical relational DB. The entire fetch/update/calculate/stuff process is done with Perl and is stupid fast even though I slurp the entire JSON object model in one go. I chose Perl because I already have an existing Perl backend. The entire thing can easily be done in Python (I already checked). I'm just too lazy to rewrite it all.
The above JSON model has almost all of that data already and if it doesn't, you can extrapolate it. If you control the SQL structure, the PHP queries is nuts easy. Just don't let the references get out of control.
I would need help selecting an API which allows trend price checking.
That's going to be a problem. I don't know of any that is available without stipulations.
The goto is usually TCGPlayer. They have information on their site to sign up to get access. I considered it myself but my own project has too many overlaps with their services so I doubt they want competition. Furthermore, they seem more interested in er... projects that get customers back to their website to buy cards. Either way, TCGP wants to see a completed or nearly complete project to demo to them before they'll give you the keys. Something I don't have ready.
It seems that most other sites that track $$ in some fashion have similar requirements or they, themselves, grab the data from TCGP. This creates a very interesting problem which is another topic for another time.
I suggest just writing a wrapper for an API so you can build the rest of the components without having to worry about it for the time being.
I'm not really concerned with the banning list changing every month because the format is not going to be so popular. It's just a casual format that keeps the game more balanced between friends that would other way invest a very different amount of money in their cards. It also keeps the Rare and Uncommon cards so the format is more like normal magic than pauper.
I've changed the gathererScraper to get this information from the card: name, rarity, cmc, color, multiverseid, text, artist, mkm_value, format, expansion.
TODO list
Cache downloaded webs to make things quicker while testing.
I have to fix double sided cards.
Cards without cmc are giving problems.
I have to get the hole text because right now I only get some.
I have to get: id, flavor, tcg_value, mtgo_trader_value, cost, type
I've uploaded the web because it's already working pretty well but know I've seen that 000webhosting doesn't support python so I'll have to rewrite the code for php or look for another host. I consider this version 1.0 even if I haven't uploaded it yet.
I've added:
Cache downloaded webs to make things quicker while testing.
I have fixed the price checking for double sided cards.
The only thing that has to be done is add a new field when saving the database with the image url from wizards and then show that image in the web when hoving over the name. I'll do it as soon as I have some time. Also I'm planning to add the possibility to search in multiple languages and show the card images in the language in which the search was performed or if that's too difficult just have an option to select in which language should the images be displayed. The images from different languages can be gotten at magiccards.info.
I can't figure out how to do the advanced search yet. It get's too complicated as I add search features. I would need an example of an advanced search to see how to structure the code so it doesn't get out of hand.
It seems Magic Card Market has an api here to see the price for all the magic cards in CSV. Maybe I can get the prices from there instead of scraping the website. I have to see how to import the price from the CSV to the database.
There are a few changes I've thought off, I'd like the average deck price to be similar to a preconstructed deck and the number of cards by rarity in the format to be proportional to the number of cards by rarity in a set which is approximately 101 commons, 60 uncommons, 53 rares, and 15 mythic rares.
(f:legacy ((usd<=0.20 r:common) or (usd<=0.25 r:uncommon) or (usd<=0.5 (r:rare or r:mythic)))) = 13750
(f:legacy ((eur<=0.20 r:common) or (eur<=0.25 r:uncommon) or (eur<=0.5 (r:rare or r:mythic))) ((tix<=0.05 r:common) or (tix<=0.1 r:uncommon) or (tix<=0.2 r:rare) or (tix<=0.4 r:mythic))) = 12430
Instead of working with a database work with the json from mtgjson that way I don't have to worry about downloading all the database every time a new set comes out which is the thing that last the most.
Update prices only every two set releases because I don't like how much standard rotates now.
I've added the option to search by rarity I need to add search by type, contained words, etc.
Hello, I'm doing a website like this one and I would need some help. I'm going to do it simpler at first so it's easier to program. The format will have legacy cards under 0.3 as legal. So the maximum deck price would be around 10.
The first thing that has to be done is a database saving all the legal cards. For that I have done a python script that loops through the gatherer pages. Once I have the cardName I go to a different website and scrape it again for the price. If the card already exists the price is updated. This script should be run at the beginning of each month to update the legal cards for the format. That can be done with a cronjob in the server.
Next I've thought about doing the web in php and queriying the database for single cards, multiple cards in a copy paste way and in a file browsing way for .txt files. I haven't got into this yet.
Any help is appreciated. I would like suggestions as to how accelerate the updating of the prices as right now it would take around 8 hours to update the 17k cards that are in legacy. This only has to be done once a month but still I think it could be improved.
The Heirloom format is described here:
MTG Heirloom Legality Checker
Depending on where you scrape the prices... really a direct API will probably be faster...
Add a field to indicate "Price Age", update all your prices ONCE to set the price age for each card. As users access their deck lists, check the price age for each card. If the price age is < some set time, don't bother updating the price. If the price age > some set time, then update only those cards.
This means you'll update the cards on an as-needed basis. The obvious trade off is that, user experience will be impacted in a worse use case scenario where every card in a deck List needs a price update.
Edit:
I should point out this technique requires a lot of care to avoid race and collision problems. Especially as you scale with more users and/or decks. Once you hit that wall, I would switch to a "popular" card update based on usage statistics of the decks. Popular cards would be auto-updated monthly, less popular cards will be updated as needed. This obviously assumes you're still able to scrape prices. Scraping is OK for one shots but consistent scraping will draw the attention of admins. You really need an API to do it properly.
---
Your price point of .4 is very low. You can statistically ignore cards over a certain dollar level. In essence, there is no point in updating or checking for prices on say... Power 9. Extreme example I know, but you get the idea. Setting a cut off of say... anything over $40 already rotated out of Standard can be discarded or ignored.
It's hard to do a detailed analyses without looking at source code.
Edit:
Legacy isn't a widely changing format outside of new cards introduced with each printing and the ocassional ban. There's not a hugely pressing issue to scan Gatherer once a month for cards going in or out when there's already a known schedule. This can alleviate the pressure for scanning for new cards.
The entire issue can be sidestepped entirely by utilizing freely available data packages that have done all of the grunt work. I use one such package to regularly reload my own database and the process is fast. Even if I had to parse raw text, the entire process is measured in seconds.
With those prices average deck price would be around $5. So having a single price limit for all cards as am doing it right now, card price limit would be $5/37 ~= $0.14. But I've considered an average deck price of $15 which gives a price limit of $15/37 ~= $0.4. I would like opinions about which should be the price limit. And if it's better to have one for each rarity or a single one for all of them?
Can you give me some reference to look for an extensive mysql query system for php (some open source web with advanced search maybe). I want to do something like magiccards.info where you can select color, card type, subtype etc.
I would need help selecting an API which allows trend price checking.
I didn't misunderstand. I just didn't feel like looking up a relevant card price as a better example
None of them. I get the format limitations, I don't necessarily agree with it as a viable format. I feel it creates a sort of pseudo Standard format that has the potential for an even higher rotation than Standard. That constant rotation is one of my biggest dislikes for Standard.
I don't feel like analyzing any of the data right now. But let's hypothetically say there is a striking card in an older-limited run set that is highly unlikely to see a reprint, like something from Ice Age. Let's say this card, given the very limited nature of this new format, proves to be an all star. What will happen to the price of the card as the format grows? It might go up. Might even go high enough to be banned by virtue of its price point. The card is banned and demand drops. Then it's unbanned and the price rises again.
Worse, some cards are so sensitive to buy outs that all someone has to do to get rid of the card from this format is to get those clueless ******* speculators to buy a bunch and push the floor up long enough to initiate an auto ban. Then the floor could potentially be set at the higher price and boom, the card is gone. If the floor isn't set, then it'll drop and the whole process starts again.
I imagine this monthly yoyoing of the price and bans might ultimately up prove disastrous to the long term health of a format so closely tied to card cost. A deck might initially cost only $15 to build, but if you have to keep buying new cards because of the price fluctuations, what are the realized price savings?
Not to mention what prices do you look at? TCGPlayer? eBay? Troll & Toad? U.S. prices? Europe?
Like I said, I get where this format is coming from, I just don't necessarily agree with it.
Please I understand, I'm not against the format per se either. I'm against Standard for many reasons. I'm rarely against fan created formats.
I never really applied any of the open source packages to something like an MtG database.
I built my own SQL DB using MariaDB. (I refuse to build new projects using Oracle or MySQL due to Oracle's ongoing abuse of the market.) Then I wrote my own code to pull down updates from mtgjson.com. I'm indecisive on whether the SQL should be manually or auto stuffed. The code seems resilient enough to run on its own. I have a quirky statistic I like to track so the Perl does a lot of heavy calculations and JSON is a different structure than a typical relational DB. The entire fetch/update/calculate/stuff process is done with Perl and is stupid fast even though I slurp the entire JSON object model in one go. I chose Perl because I already have an existing Perl backend. The entire thing can easily be done in Python (I already checked). I'm just too lazy to rewrite it all.
The above JSON model has almost all of that data already and if it doesn't, you can extrapolate it. If you control the SQL structure, the PHP queries is nuts easy. Just don't let the references get out of control.
That's going to be a problem. I don't know of any that is available without stipulations.
The goto is usually TCGPlayer. They have information on their site to sign up to get access. I considered it myself but my own project has too many overlaps with their services so I doubt they want competition. Furthermore, they seem more interested in er... projects that get customers back to their website to buy cards. Either way, TCGP wants to see a completed or nearly complete project to demo to them before they'll give you the keys. Something I don't have ready.
It seems that most other sites that track $$ in some fashion have similar requirements or they, themselves, grab the data from TCGP. This creates a very interesting problem which is another topic for another time.
I suggest just writing a wrapper for an API so you can build the rest of the components without having to worry about it for the time being.
I've changed the gathererScraper to get this information from the card: name, rarity, cmc, color, multiverseid, text, artist, mkm_value, format, expansion.
TODO list
I've added:
TODO list:
(f:legacy ((usd<=0.20 r:common) or (usd<=0.25 r:uncommon) or (usd<=0.5 (r:rare or r:mythic)))) = 13750
(f:legacy ((eur<=0.20 r:common) or (eur<=0.25 r:uncommon) or (eur<=0.5 (r:rare or r:mythic))) ((tix<=0.05 r:common) or (tix<=0.1 r:uncommon) or (tix<=0.2 r:rare) or (tix<=0.4 r:mythic))) = 12430
Instead of working with a database work with the json from mtgjson that way I don't have to worry about downloading all the database every time a new set comes out which is the thing that last the most.
Update prices only every two set releases because I don't like how much standard rotates now.
I've added the option to search by rarity I need to add search by type, contained words, etc.