HK Court Lists Archive

Edit history

Time Author Version
2018-11-29 20:46 SelinaC r1762
(60 lines unchanged)
*...
*...
- Who are we?
- Selina Cheng, reporter
- selinakycheng@gmail.com
-
- Relevant links:
+ Relevant links:
https://www.judiciary.hk/tc/crt_lists/daily_caulist.htm
https://www.d-law.com/
2018-06-25 06:59 – 07:05 Omar K. r1457 – r1761
(51 lines unchanged)
computing speed as data accumulates in the long term?
- Who are we?
+ Progress
+ * demo using morph.io (by Omar K.)
+ * https://morph.io/oktak/daily_caulist (source code: https://github.com/oktak/daily_caulist)
+ *This is a very preliminary prototype. It currently tackles only one of the 29 court case hearing lists.
+ *morph.io supports scraping the sites once per day and provides CSV and SQLite downloads for further storage
+ *TODO: extend scrape.py to tackle all court case hearing lists, and set up a permanent database hosting server
+ *wget (by Kennon Wong)
+ *...
+ *...
+ Who are we?
Selina Cheng, reporter
selinakycheng@gmail.com
(4 lines unchanged)
2018-06-23 07:22 – 10:28 SelinaC r1245 – r1456
(34 lines unchanged)
What resource do you need?
-
- Who are we?
+
+ python => scrapy
+ manual => frequency
+ error => retry
+ server space estimation, data compression
+ SQL database for managing large datasets
+
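The "error => retry" note above could be handled with a small wrapper like the one below. This is only a minimal sketch: the function name `fetch_with_retry`, its parameters, and the doubling wait are assumptions made for illustration, not something the pad or scrape.py specifies.

```python
import time


def fetch_with_retry(fetch, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts run out.

    fetch is any zero-argument callable that raises on failure
    (hypothetical: for example, a function that downloads one court list).
    The wait doubles after each failed attempt.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception as error:  # a real scraper would catch narrower errors
            last_error = error
            if attempt < attempts - 1:
                # Wait base_delay, then 2x, then 4x, ... before trying again.
                sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"gave up after {attempts} attempts") from last_error
```

Passing `sleep` as a parameter keeps the wrapper easy to test without real waiting; a daily scraper would simply use the default.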
+
+ crawl from different levels : e.g.
+ morph.io
+ 10,000 characters = 10 KB per court per day
+ 10,000 characters x 20 courts x 5 days x 50 weeks = 50,000,000 characters
+ = approx. 50 MB per year
+ computing speed as data accumulates in the long term?
+
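The storage estimate in the notes above can be reproduced with a few lines of Python. The figures (10,000 characters, 20 courts, 5 days, 50 weeks) come from the notes; the `bytes_per_char` parameter is an added assumption, since the notes treat one character as one byte while Chinese text stored as UTF-8 usually takes three bytes per character.

```python
def yearly_bytes(chars_per_court_per_day=10_000, courts=20,
                 days_per_week=5, weeks_per_year=50, bytes_per_char=1):
    """Rough uncompressed storage needed for one year of court lists."""
    chars = chars_per_court_per_day * courts * days_per_week * weeks_per_year
    return chars * bytes_per_char


# One byte per character, as in the notes: about 50 MB per year.
print(yearly_bytes() / 1_000_000, "MB")

# Assumption: mostly Chinese text stored as UTF-8 (three bytes per character).
print(yearly_bytes(bytes_per_char=3) / 1_000_000, "MB")
```

Even under the three-byte assumption the yearly total stays in the low hundreds of megabytes, before any compression.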
+ Who are we?
Selina Cheng, reporter
selinakycheng@gmail.com
(4 lines unchanged)
2018-06-22 18:02 – 18:03 Brian Leung r1239 – r1244
(17 lines unchanged)
The Challenges we face?
- *There may be challenge from the government or organizations over violation of privacy (although the only private info would be the name).
+ *There may be challenges from the government or organizations over violation of privacy (although the only private info would be the name).
*There may be government restriction on the use of legal data
*Long-term archive maintenance
(4 lines unchanged)
*Seek legal advice on privacy issues
*Build a scraper, possibly with the help of existing open source tools, at fixed daily intervals
- *Collect each data item, create data structure
*Build a database to store the data scraped
- *API
*Build a front-end web application, with data entry points: search by parties' name, date, court, nature of charge, etc. (Ref: Pacer.gov) then offer a full list of data available.
*Long-term database maintenance
(13 lines unchanged)
2018-06-22 15:15 – 15:16 Brian Leung r1237 – r1238
- *HK Court Lists Archive
+ HK Court Lists Archive
A web application that automatically scrapes and archives Hong Kong court lists daily. The front-end application would offer the court list data as a searchable database, free to the public and without restrictions on use.
(28 lines unchanged)
*Build a front-end web application, with data entry points: search by parties' name, date, court, nature of charge, etc. (Ref: Pacer.gov) then offer a full list of data available.
*Long-term database maintenance
- *Might need fundraising efforts to hire coders for longer term development, and server space rental
+ *Might need fundraising efforts to hire coders for longer-term development, and server space rental
**
(10 lines unchanged)
2018-06-22 10:45 – 11:14 SelinaC r26 – r1236
*HK Court Lists Archive
+
+ A web application that automatically scrapes and archives Hong Kong court lists daily. The front-end application would offer the court list data as a searchable database, free to the public and without restrictions on use.
What is the origin of this Project?
+ Hong Kong has a very limited amount of open legal data. Currently, http://legalref.judiciary.gov.hk offers only a small number of judgments and court documents, such as High Court judgments. Other private, pay-walled services like D-Law exist, but their data is patchy and expensive (>$100 per document order).
+
+ All court cases are otherwise recorded in the court lists, as soon as the cases enter the justice system, at the "Mention." Currently, the court lists are available for 7 days only: 3 days before and after the current day. There is no publicly available archive.
+ *
+ As a matter of principle, justice cannot exist without transparency. Open legal data is crucial to a sound justice system.
+
What social problem are you trying to solve?
- How did we begin from scratch?
+ Journalists often learn of a case after that short window has passed, such as from the GIS system, with limited information on the case. Once past the window, it is impossible to search by the individual's or organization's name, case number, date, or nature of charge.
+
+ The web app would be useful not only to journalists hoping to pursue a case or research an individual's or organization's background, but also to due diligence professionals, legal professionals, and the general public.
+
+ How did we begin from scratch?
+
The Challenges we face?
- What to do:
- What resource do you need?
- Who are we?
- Relevant links:
+ *There may be challenge from the government or organizations over violation of privacy (although the only private info would be the name).
+ *There may be government restriction on the use of legal data
+ *Long-term archive maintenance
+ *Long-term server space, and possibly server maintenance
+ *$
+
+ What to do:
+ *Seek legal advice on privacy issues
+ *Build a scraper, possibly with the help of existing open source tools, at fixed daily intervals
+ *Collect each data item, create data structure
+ *Build a database to store the data scraped
+ *API
+ *Build a front-end web application, with data entry points: search by parties' name, date, court, nature of charge, etc. (Ref: Pacer.gov) then offer a full list of data available.
+ *Long-term database maintenance
+ *Might need fundraising efforts to hire coders for longer term development, and server space rental
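The "create data structure", "build a database" and "search by parties' name" steps above might look something like the following SQLite sketch. The table name `hearings`, its columns, and the helper functions are hypothetical and chosen only to match the search fields listed (parties' name, date, court, nature of charge, plus a case number); the real court lists may need a different structure.

```python
import sqlite3

# Hypothetical schema: one row per hearing entry scraped from a daily list.
SCHEMA = """
CREATE TABLE IF NOT EXISTS hearings (
    id           INTEGER PRIMARY KEY,
    hearing_date TEXT NOT NULL,   -- ISO date, e.g. 2018-06-22
    court        TEXT NOT NULL,
    case_number  TEXT,
    parties      TEXT,
    nature       TEXT             -- nature of charge or claim
);
CREATE INDEX IF NOT EXISTS idx_hearings_date ON hearings (hearing_date);
"""


def open_archive(path=":memory:"):
    """Open (or create) the archive database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_hearing(conn, hearing_date, court, case_number, parties, nature):
    """Store one scraped hearing entry."""
    conn.execute(
        "INSERT INTO hearings (hearing_date, court, case_number, parties, nature) "
        "VALUES (?, ?, ?, ?, ?)",
        (hearing_date, court, case_number, parties, nature),
    )
    conn.commit()


def search_by_party(conn, name):
    """Return all hearings whose parties field contains the given name."""
    cursor = conn.execute(
        "SELECT hearing_date, court, case_number, parties, nature "
        "FROM hearings WHERE parties LIKE ? ORDER BY hearing_date",
        (f"%{name}%",),
    )
    return cursor.fetchall()
```

For example, `search_by_party(open_archive("archive.db"), "Example Company")` would list every stored hearing naming that party, with `archive.db` being a made-up file name. Searches by date, court or nature of charge would follow the same pattern with a different `WHERE` clause.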
+
+ **
+
+ What resource do you need?
+
+ Who are we?
+ Selina Cheng, reporter
+ selinakycheng@gmail.com
+
+ Relevant links:
+ https://www.judiciary.hk/tc/crt_lists/daily_caulist.htm
+ https://www.d-law.com/
2018-06-22 09:32 (unknown) r25
(10 lines unchanged)
2018-06-22 09:32 SelinaC r24
(7 lines unchanged)
What resource do you need?
Who are we?
+ Relevant links:
2018-06-22 09:32 (unknown) r23
(9 lines unchanged)
2018-06-22 09:32 SelinaC r22
(6 lines unchanged)
What to do:
What resource do you need?
+ Who are we?
2018-06-22 09:32 (unknown) r21
(8 lines unchanged)
2018-06-22 09:32 SelinaC r20
(5 lines unchanged)
The Challenges we face?
What to do:
+ What resource do you need?
2018-06-22 09:32 (unknown) r19
(7 lines unchanged)
2018-06-22 09:32 SelinaC r18
(4 lines unchanged)
How did we begin from scratch?
The Challenges we face?
+ What to do:
2018-06-22 09:32 (unknown) r17
(6 lines unchanged)
2018-06-22 09:32 SelinaC r16
(3 lines unchanged)
What social problem are you trying to solve?
How did we begin from scratch?
+ The Challenges we face?
2018-06-22 09:32 (unknown) r15
(5 lines unchanged)
2018-06-22 09:32 SelinaC r14
(2 lines unchanged)
What is the origin of this Project?
What social problem are you trying to solve?
+ How did we begin from scratch?
2018-06-22 09:32 (unknown) r13
(4 lines unchanged)
2018-06-22 09:30 – 09:32 SelinaC r1 – r12
- Untitled
+ *HK Court Lists Archive
- This pad text is synchronized as you type, so that everyone viewing this page sees the same text. This allows you to collaborate seamlessly on documents!
+ What is the origin of this Project?
+ What social problem are you trying to solve?
2018-06-22 09:30 (unknown) r0
+ Untitled
+ This pad text is synchronized as you type, so that everyone viewing this page sees the same text. This allows you to collaborate seamlessly on documents!