HK Court Lists Archive

A web application that automatically scrapes and archives the Hong Kong court lists every day. The front end would offer the archived court list data as a searchable database, free to the public and without restrictions on use.

What is the origin of this project?

Hong Kong has very little open legal data. Currently, http://legalref.judiciary.gov.hk offers only a limited selection of judgments and court documents, such as High Court judgments. Private, paywalled services such as D-Law exist, but their data is patchy and expensive (>$100 per document order).

All court cases are otherwise recorded in the court lists as soon as they enter the justice system, at the "Mention". Currently, the court lists are available for 7 days only: 3 days before and after the current day. There is no publicly available archive.

As a matter of principle, justice cannot exist without transparency. Open legal data is crucial to a sound justice system.

What social problem are you trying to solve?

Journalists often learn of a case only after that short window has closed, for example from the GIS system, and with limited information on the case. Once the window has passed, it is impossible to search the court lists by the individual's or organization's name, case number, date, or nature of charge.

The web app would be useful not only to journalists hoping to pursue a case or research the background of an individual or an organization, but also to due diligence professionals, legal professionals, and the general public.

How do we begin from scratch?

What challenges do we face?

What to do:

What resources do you need?

python ==> scrapy

manual => frequency 

error > retry 

server space estimation, data compression 

SQL database for managing large datasets

crawl from different levels : e.g. 

morph.io
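The "error > retry" note above could be sketched roughly as follows. This is only an illustration using the Python standard library, not the project's actual scraper: the function name `fetch_with_retry`, the number of attempts, and the delays are all assumptions.

```python
import time
import urllib.request


def fetch_with_retry(url, attempts=3, base_delay=1.0,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch a page, retrying on network errors with exponential backoff.

    `opener` and `sleep` are parameters only so the function can be
    exercised without real network access or real waiting.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            with opener(url, timeout=30) as response:
                return response.read()
        except OSError as error:
            # urllib's URLError/HTTPError and socket timeouts are all OSError subclasses.
            last_error = error
            if attempt < attempts - 1:
                # Wait 1x, 2x, 4x ... the base delay before the next try.
                sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"giving up on {url} after {attempts} attempts") from last_error
```

If the scraper ends up being built on Scrapy, this helper is probably unnecessary, since Scrapy ships its own retry middleware that is configured through settings such as `RETRY_TIMES`.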

10,000 characters = 10 KB per court per day

10,000 characters x 20 courts x 5 days x 50 weeks

= 50,000,000 characters

= approx. 50 MB per year
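The estimate above can be reproduced with a few lines of Python, which also makes it easy to change the inputs. The figures are the ones from the notes. The one-byte-per-character default is an assumption implied by "10,000 characters = 10 KB", and the constant and function names are made up for this sketch.

```python
CHARS_PER_COURT_PER_DAY = 10_000
COURTS = 20
DAYS_PER_WEEK = 5
WEEKS_PER_YEAR = 50


def yearly_storage_bytes(bytes_per_char=1):
    """Rough yearly size of the raw text, before any compression."""
    chars = CHARS_PER_COURT_PER_DAY * COURTS * DAYS_PER_WEEK * WEEKS_PER_YEAR
    return chars * bytes_per_char


print(yearly_storage_bytes() / 1_000_000)   # 50.0 (MB, at 1 byte per character)
print(yearly_storage_bytes(3) / 1_000_000)  # 150.0 (MB, at 3 bytes per character)
```

The second figure is a pessimistic bound for lists stored as UTF-8, where a Chinese character typically takes 3 bytes. Either way the raw text stays in the low hundreds of megabytes per year.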

computing speed as data accumulates in the long term?
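One common way to keep searches fast as the archive grows is to index the columns people search on. The sketch below uses Python's built-in `sqlite3` module purely as an illustration. The table name `hearings`, its columns (taken from the search fields mentioned earlier: name, case number, date, nature of charge), and the helper functions are all hypothetical, and the real project may well choose a different database.

```python
import sqlite3


def open_archive(path=":memory:"):
    """Open (or create) the archive and make sure the table and indexes exist."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS hearings (
               id INTEGER PRIMARY KEY,
               hearing_date TEXT NOT NULL,  -- ISO date, e.g. 2024-01-31
               court TEXT NOT NULL,
               case_number TEXT NOT NULL,
               party_name TEXT NOT NULL,
               charge TEXT
           )"""
    )
    # Indexes let lookups stay fast without scanning every archived row.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_case_number ON hearings (case_number)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_party_name ON hearings (party_name)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_hearing_date ON hearings (hearing_date)")
    return conn


def add_hearing(conn, hearing_date, court, case_number, party_name, charge=None):
    """Insert one court list entry; the `with` block commits on success."""
    with conn:
        conn.execute(
            "INSERT INTO hearings (hearing_date, court, case_number, party_name, charge) "
            "VALUES (?, ?, ?, ?, ?)",
            (hearing_date, court, case_number, party_name, charge),
        )


def find_by_case_number(conn, case_number):
    """Return all archived entries for a case number, oldest first."""
    cursor = conn.execute(
        "SELECT hearing_date, court, case_number, party_name, charge "
        "FROM hearings WHERE case_number = ? ORDER BY hearing_date",
        (case_number,),
    )
    return cursor.fetchall()
```

Exact-match lookups by case number, name, or date can use these indexes. Free-text search over names or charges would need something more, such as a full-text index.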

Progress

1. demo using morph.io (by Omar K.)
1. wget (by Kennon Wong)
1. ...

 

Relevant links:

https://www.judiciary.hk/tc/crt_lists/daily_caulist.htm

https://www.d-law.com/