
====== Working With Data ======

===== Tools =====

==== PDF ====

  * [[http://tabula.technology/|Tabula]] - Extract tables from PDFs into a CSV file or Microsoft Excel spreadsheet through a simple, easy-to-use interface (also listed at [[http://tabula.nerdpower.org/|tabula.nerdpower.org]]).
  * [[http://community.coherentpdf.com/ | Coherent PDF Command line tools]] - Command-line tools for manipulating PDF files.
  * [[https://code.google.com/p/peepdf/ | PeePDF]] - Python PDF analysis tool.
  * [[https://pdftables.com/|pdftables.com]] - Extract tables from PDFs.

==== Twitter ====

  * [[http://tags.hawksey.info/|TAGS]] - A free Google Sheets template that lets you set up and run automated collection of Twitter search results.
  * [[https://github.com/digitalmethodsinitiative/dmi-tcat/wiki |Twitter Capture and Analysis Toolset (DMI-TCAT)]] - The Digital Methods Initiative's set of tools for retrieving, collecting, and analyzing tweets in various ways. It is written mostly in PHP and runs in a web server (LAMP) environment.

==== Working with Other Formats ====

  * [[http://nytimes.github.io/svg-crowbar/ | SVG Crowbar]] - Export SVG infographics in your browser as standalone SVG files.
  * [[http://bost.ocks.org/mike/make/ | Why Use Make?]] - Mike Bostock on using Make to build data workflows.
  * [[https://github.com/jsoma/tabletop | Tabletop.js]] - Use a Google Spreadsheet as a data source from JavaScript.
  * [[http://misoproject.com/dataset/ | Miso Dataset]] - Client-side data transformation and management library.
  * [[http://tablesorter.com/docs/ | tablesorter.js]] - Client-side table sorting, built on jQuery.
  * [[https://github.com/zzolo/so-you-want-to-open-some-data | Technical guidelines for opening up data]]
  * [[https://github.com/propublica/guides/blob/master/data-bulletproofing.md#a-guide-to-bulletproofing-your-data | A guide to bulletproofing your data]]
  * [[https://scraperwiki.com/|ScraperWiki]] - Scrape data from websites.
  * [[https://import.io|import.io]] - Turn any website into a table of data or a structured API without writing any code.
  * [[http://vis.stanford.edu/wrangler/ | Wrangler]] - An interactive tool for data cleaning and transformation, so you spend less time formatting and more time analyzing your data.
  * [[http://www.openheatmap.com/ | OpenHeatMap]] - Turn your spreadsheet into a map.
  * [[http://quartz.github.io/Chartbuilder/ | ChartBuilder]] - Chart-making tool from Quartz.
  * [[http://plot.ly | Plotly]] - A collaborative data analysis and graphing tool.
  * [[https://github.com/mages/googleVis | googleVis]] - Interface between R and the Google Chart Tools.
  * [[http://tools.digitalmethods.net/beta/deduplicate |Deduplicate Tags]] - Repeats each tag in a tag cloud as many times as its value.
  * [[https://tools.digitalmethods.net/beta/disqusScraper/ | Disqus Comment Scraper]] - Scrapes threads and comments from websites that use the Disqus commenting system.
  * [[https://apps.facebook.com/netvizz/ | netvizz]] - Extracts various datasets from Facebook.
  * [[http://tools.medialab.sciences-po.fr/table2net/ |Table 2 Net]] - Extract a network from a table: choose one column for nodes and one column for edges. Cells may contain multiple items.
  * [[http://tools.medialab.sciences-po.fr/sciencescape/ | ScienceScape]] - Helpers for scientometrics: convert files, build networks, and visualize data from Scopus or Web of Knowledge.

===== Tutorials =====
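As a small worked example of the table-to-network idea behind Table 2 Net (one column for nodes, one for edges, several items per cell), here is a minimal Python sketch. It is an assumption about one reasonable approach, not Table 2 Net's actual algorithm. Two node values are linked when they share a value in the edge column, cells may hold several items separated by ";", and each edge weight counts the shared values. The helper name ''table_to_network'' and the ''author''/''paper'' columns are hypothetical.

```python
import csv
import io
from collections import Counter
from itertools import combinations


def table_to_network(rows, node_col, edge_col, sep=";"):
    """Hypothetical helper: link node values that share a value in edge_col.

    Cells in either column may hold several items separated by `sep`.
    Returns a Counter mapping sorted (node_a, node_b) pairs to the number
    of edge-column values the two nodes share.
    """
    groups = {}  # edge value -> set of node values that appear with it
    for row in rows:
        nodes = [n.strip() for n in row[node_col].split(sep) if n.strip()]
        links = [e.strip() for e in row[edge_col].split(sep) if e.strip()]
        for link in links:
            groups.setdefault(link, set()).update(nodes)

    weights = Counter()
    for members in groups.values():
        # Every pair of nodes sharing this edge value gets one unit of weight.
        for a, b in combinations(sorted(members), 2):
            weights[(a, b)] += 1
    return weights


# Usage with an in-memory CSV; the data is made up for illustration.
data = io.StringIO(
    "author,paper\n"
    "Alice;Bob,P1\n"
    "Bob;Carol,P2\n"
    "Alice,P2\n"
)
edges = table_to_network(csv.DictReader(data), "author", "paper")
for (a, b), w in sorted(edges.items()):
    print(a, b, w)
# Alice Bob 2
# Alice Carol 1
# Bob Carol 1
```

Grouping by edge value first means a node listed twice under the same value is counted once, because each group is a set. The resulting pairs can be written out as an edge list for a tool such as Gephi.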

===== Books =====

  * [[https://s3.amazonaws.com/leada/handbook/Handbook_Pt1.pdf| Data Analytics Handbook, Part 1 (Data Scientists / Data Analysts) ]]
  * [[http://www.math.umass.edu/~lavine/Book/book.html | Introduction to Statistical Thought]]
  * [[http://mitpress.mit.edu/books/street-fighting-mathematics | Street-Fighting Mathematics]] - A Creative Commons edition is also available.
  * [[http://cran.r-project.org/doc/contrib/usingR.pdf | Using R]]
  * [[http://cran.r-project.org/doc/manuals/R-intro.pdf | Introduction to R]]
  * [[http://tables2graphs.com/doku.php | Using Graphs Instead of Tables]]
  * [[http://ipsur.r-forge.r-project.org/book/ | IPSUR: Introduction to Probability and Statistics Using R by G. Jay Kerns]]
  * [[http://en.wikibooks.org/wiki/R_Programming | R Programming – a wikibook]]